# Example notebook

In [2]:
import numpy as np
import scanpy as sc
from anndata import AnnData
import scimb

In [3]:
%load_ext autoreload
%autoreload 2

In [4]:
adata = sc.datasets.pbmc3k_processed()

In [5]:
adata.obs.louvain.value_counts()

louvain
CD4 T cells          1144
CD14+ Monocytes       480
B cells               342
CD8 T cells           316
NK cells              154
FCGR3A+ Monocytes     150
Dendritic cells        37
Megakaryocytes         15
Name: count, dtype: int64

In [6]:
adata_balanced = scimb.pp.oversample(
    adata,
    "louvain",
)
adata_balanced.obs.louvain.value_counts()

  utils.warn_names_duplicates("obs")


louvain
CD4 T cells          1144
CD14+ Monocytes      1144
B cells              1144
CD8 T cells          1144
NK cells             1144
FCGR3A+ Monocytes    1144
Dendritic cells      1144
Megakaryocytes       1144
Name: count, dtype: int64

In [7]:
adata_balanced = scimb.pp.oversample(adata, "louvain", method="RandomOverSampler")
adata_balanced.obs.louvain.value_counts()

  utils.warn_names_duplicates("obs")


louvain
CD4 T cells          1144
CD14+ Monocytes      1144
B cells              1144
CD8 T cells          1144
NK cells             1144
FCGR3A+ Monocytes    1144
Dendritic cells      1144
Megakaryocytes       1144
Name: count, dtype: int64
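
The balanced counts above are what random oversampling produces when every class is resampled with replacement up to the size of the largest class. The following is a minimal pure-Python sketch of that idea, not the actual implementation in `scimb` or imbalanced-learn; the function name `random_oversample` and the toy labels are made up for illustration.

```python
import random
from collections import Counter


def random_oversample(labels, seed=0):
    """Return sample indices such that every class reaches the majority count."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Every class is grown to the size of the largest one.
    target = max(len(idxs) for idxs in by_class.values())

    selected = []
    for idxs in by_class.values():
        selected.extend(idxs)  # keep all original samples
        # Draw the missing samples with replacement from the same class.
        selected.extend(rng.choices(idxs, k=target - len(idxs)))
    return selected


# Hypothetical toy labels: 6 "T", 3 "B", 1 "NK".
labels = ["T"] * 6 + ["B"] * 3 + ["NK"] * 1
indices = random_oversample(labels)
balanced = Counter(labels[i] for i in indices)
print(balanced)
```

Because minority samples are drawn repeatedly, the same observation can appear more than once in the result, which is consistent with the duplicate `obs` names warning emitted above.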

In [17]:
adata_balanced = scimb.pp.oversample(adata, "louvain", method="SMOTE")
adata_balanced.obs.louvain.value_counts()



louvain
CD4 T cells          1144
CD14+ Monocytes      1144
B cells              1144
CD8 T cells          1144
NK cells             1144
FCGR3A+ Monocytes    1144
Dendritic cells      1144
Megakaryocytes       1144
Name: count, dtype: int64

In [20]:
adata_balanced.obs

              louvain
0         CD4 T cells
1             B cells
2         CD4 T cells
3     CD14+ Monocytes
4            NK cells
...               ...
9147         NK cells
9148         NK cells
9149         NK cells
9150         NK cells


In [6]:
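from imblearn.over_sampling import SMOTE

# Feature matrix (cells x genes) and the class labels to balance on
X = adata.X
y = adata.obs.louvain.values

# Fixed seed so the synthetic samples are reproducible
sm = SMOTE(random_state=42)

sm.fit(X, y)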

In [7]:
sm.fit_resample(X, y)

(array([[-0.17146951, -0.28081203, -0.04667679, ..., -0.09826884,
         -0.2090951 , -0.5312034 ],
        [-0.21458222, -0.37265295, -0.05480444, ..., -0.26684406,
         -0.31314576, -0.5966544 ],
        [-0.37688747, -0.2950843 , -0.0575275 , ..., -0.15865596,
         -0.17087643,  1.379     ],
        ...,
        [-0.23402742,  0.8719506 , -0.04928692, ..., -0.09815607,
         -0.18415484, -0.5116575 ],
        [ 3.3320024 , -0.24518597, -0.04460106, ..., -0.0367176 ,
         -0.16262281,  1.4521854 ],
        [-0.26825947,  1.1068604 , -0.04966896, ..., -0.07021379,
         -0.15046501, -0.487915  ]], dtype=float32),
 array(['CD4 T cells', 'B cells', 'CD4 T cells', ..., 'NK cells',
        'NK cells', 'NK cells'], dtype=object))
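
SMOTE does not copy existing rows: each synthetic sample is an interpolation between a minority-class sample and one of its nearest neighbours from the same class. Below is a minimal pure-Python sketch of that interpolation step only. The helper name `smote_point` and the toy vectors are hypothetical, and the sketch leaves out the k-nearest-neighbour search that `imblearn.over_sampling.SMOTE` performs.

```python
import random


def smote_point(x, neighbor, rng):
    """Return a synthetic point on the segment between x and neighbor."""
    lam = rng.random()  # interpolation weight, uniform in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]


rng = random.Random(42)
x = [0.0, 1.0]          # a minority-class sample (toy values)
neighbor = [2.0, 3.0]   # one of its same-class neighbours (toy values)
synthetic = smote_point(x, neighbor, rng)
print(synthetic)
```

Each coordinate of the synthetic point lies between the corresponding coordinates of the two inputs, so the new rows are new feature vectors rather than duplicates of existing ones.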

With MyST, a text cell of a notebook such as this one can link to the documentation of a function or a class.

Take the function {func}`scimb.pp.basic_preproc` as an example.
Clicking the text takes you to the API documentation of the function.
Check the raw markdown of this cell to see how this is specified.

This also works for any package configured for `intersphinx`. In `docs/conf.py`, look for the `intersphinx_mapping` variable.
It lists the packages (dependencies of this package) for which this functionality is supported.
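
For illustration, an `intersphinx_mapping` entry in `docs/conf.py` might look like the sketch below. The specific packages and URLs are assumptions chosen as examples; the actual list in this project's `conf.py` may differ.

```python
# docs/conf.py (sketch; the entries below are illustrative assumptions)

# Each key is a project name. Each value is a tuple of
# (base URL of that project's built documentation, inventory location).
# None as the second element tells Sphinx to fetch objects.inv from the base URL.
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "anndata": ("https://anndata.readthedocs.io/en/stable/", None),
    "numpy": ("https://numpy.org/doc/stable/", None),
}
```

With a mapping like this in place, a role such as `{class}` followed by `anndata.AnnData` in backticks can resolve against the `anndata` inventory.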

For instance, we can link to the class {class}`anndata.AnnData`, to the attribute {attr}`anndata.AnnData.obs` or the method {meth}`anndata.AnnData.write`.

Again, check the raw markdown of this cell to see how each of these links is specified.

You can read more about this in the [intersphinx page](https://www.sphinx-doc.org/en/master/usage/extensions/intersphinx.html) and the [myst page](https://myst-parser.readthedocs.io/en/v0.15.1/syntax/syntax.html#roles-an-in-line-extension-point).

In [3]:
scimb.pp.basic_preproc(adata)

Implement a preprocessing function here.

0