# All of Annsel

This notebook tells you just about everything you need to use `annsel`. It's a good starting point to get a feel for the package.

:::{note}
:class: dropdown

You should be familiar with [`AnnData`](https://anndata.readthedocs.io/en/latest/) beforehand.
:::

## Set up Data

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
import annsel as an

We will load a leukemic bone marrow cytometry dataset :cite:p:`Triana2021`. You can view the dataset on [CellxGene](https://cellxgene.cziscience.com/e/b3a5a10f-b1cb-4e8e-abce-bf345448625b.cxg/).

In [None]:
adata = an.datasets.leukemic_bone_marrow_dataset()

View the contents of the `AnnData` object.


Importing `annsel` will automatically register the accessor and add the `an` attribute to the `AnnData` object's namespace.

You can access the methods of the accessor using the `an` attribute on an `AnnData` object.

```python
anndata.AnnData.an.<method_name>
```

## Filter


We can filter the `AnnData` object using the `filter` method. Filtering can be applied to the `obs`, `var`, `X` (with a layer), `obs_names` and `var_names`.


Let's filter the `AnnData` object by the Cell Type stored in `obs["Cell_label"]`.

In [None]:
adata.an.filter(obs=an.col(["Cell_label"]) == "CD8+CD103+ tissue resident memory T cells")

:::{note}
This is equivalent to:

```python
adata[adata.obs["Cell_label"] == "CD8+CD103+ tissue resident memory T cells", :]
```
:::





You can also apply other expressions to the `an.col` callable. For example, `is_in` lets you filter by a list of values. Under the hood, `annsel` uses [Narwhals](https://narwhals-dev.github.io/narwhals/) to apply these expressions. A full list of the expressions that can be applied to a column is available in the [Narwhals `Expr` reference](https://narwhals-dev.github.io/narwhals/api-reference/expr/).

In [None]:
adata.an.filter(obs=an.col(["Cell_label"]).is_in(["Classical Monocytes", "CD8+CD103+ tissue resident memory T cells"]))

We can also combine multiple Predicates using the `&` and `|` operators.

In [None]:
adata.an.filter(
    obs=(an.col(["Cell_label"]).is_in(["Classical Monocytes", "CD8+CD103+ tissue resident memory T cells"]))
    & (an.col(["sex"]) == "male")
)

Alternatively, if you pass a tuple of Predicates, the `&` operator is applied between them automatically.

In [None]:
adata.an.filter(
    obs=(
        an.col(["Cell_label"]).is_in(["Classical Monocytes", "CD8+CD103+ tissue resident memory T cells"]),
        an.col(["sex"]) == "male",
    )
)

The `|` operator can be used between Predicates as well. Here we select all the cells that are either Classical Monocytes or CD8+CD103+ tissue resident memory T cells, or that come from male samples irrespective of the cell type.

In [None]:
adata.an.filter(
    obs=(
        an.col(["Cell_label"]).is_in(["Classical Monocytes", "CD8+CD103+ tissue resident memory T cells"])
        | (an.col(["sex"]) == "male")
    )
)
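
The difference between `&` and `|` is easy to check on a toy table. The following is a minimal sketch in plain pandas, not `annsel` itself: the `obs` table, its index and the `NK cells` label are made-up stand-ins for `adata.obs`.

```python
import pandas as pd

# Hypothetical stand-in for adata.obs; the values are made up for illustration.
obs = pd.DataFrame(
    {
        "Cell_label": ["Classical Monocytes", "NK cells", "Classical Monocytes", "NK cells"],
        "sex": ["male", "male", "female", "female"],
    },
    index=["cell_0", "cell_1", "cell_2", "cell_3"],
)

is_monocyte = obs["Cell_label"].isin(["Classical Monocytes"])
is_male = obs["sex"] == "male"

# `&` keeps rows where both conditions hold; `|` keeps rows where at least one holds.
both = obs[is_monocyte & is_male]
either = obs[is_monocyte | is_male]

print(list(both.index))
print(list(either.index))
```

With `&` only the male monocyte survives, while `|` also keeps the male non-monocyte and the female monocyte.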

We can also filter the `AnnData` object by a column of `var`. Here we keep only the genes with `vst.mean` greater than or equal to 3.

In [None]:
adata.an.filter(var=an.col(["vst.mean"]) >= 3)

Filtering can also be applied to the `X` matrix. Here we will filter the `AnnData` object to only include the cells where expression of the gene `ENSG00000205336` is greater than 1. To filter on a layer instead, pass the layer name to the `layer` argument, and the operation will be applied to the `var_names` of that layer.

In [None]:
adata.an.filter(
    x=an.col(["ENSG00000205336"]) > 1,
    layer=None,
)
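
Conceptually, a filter on `X` builds a boolean mask over cells from one gene's column. The sketch below illustrates that idea with NumPy and pandas only. It is an assumption about the semantics described above, not `annsel`'s implementation, and the matrix values, the `GENE_B` name, the `scaled` layer and the `cells_above` helper are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical expression matrix: 3 cells x 2 genes.
var_names = pd.Index(["ENSG00000205336", "GENE_B"])
X = np.array([[0.0, 5.0], [2.5, 0.0], [1.0, 3.0]])
layers = {"scaled": X * 2}  # hypothetical layer name


def cells_above(matrix, gene, threshold):
    """Boolean mask over cells whose value for `gene` is strictly above `threshold`."""
    column = var_names.get_loc(gene)
    return matrix[:, column] > threshold


# Mask computed on X itself (the `layer=None` case).
mask_x = cells_above(X, "ENSG00000205336", 1)
# Same condition evaluated on a layer instead of X.
mask_layer = cells_above(layers["scaled"], "ENSG00000205336", 1)

print(mask_x.tolist())
print(mask_layer.tolist())
```

The same condition can select different cells depending on which matrix it is evaluated on, which is why the `layer` argument matters.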

All of these filters can be combined in a single call.

In [None]:
adata.an.filter(
    obs=(
        an.col(["Cell_label"]).is_in(["Classical Monocytes", "CD8+CD103+ tissue resident memory T cells"]),
        an.col(["sex"]) == "male",
    ),
    var=an.col(["vst.mean"]) >= 3,
)

Filtering can also be applied to `var_names` and `obs_names` using the `an.var_names` and `an.obs_names` predicates respectively. These are special predicates which can only be applied to `var_names` and `obs_names` of the `AnnData` object.

Here, arbitrarily, we will filter the `AnnData` object to only include the cells with `obs_names` starting with `645` and the genes with `var_names` starting with `ENSG0000018`.

In [None]:
adata.an.filter(obs_names=an.obs_names.str.starts_with("645"), var_names=an.var_names.str.starts_with("ENSG0000018"))
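
For comparison, the same prefix match can be written directly against pandas indexes. This is a sketch on made-up names (the barcodes and the gene list below are toy values, not taken from the dataset), using `pandas.Index.str.startswith`.

```python
import pandas as pd

# Toy stand-ins for adata.obs_names and adata.var_names.
obs_names = pd.Index(["645_AAAC", "645_GGTT", "712_CCAA"])
var_names = pd.Index(["ENSG00000187608", "ENSG00000188976", "ENSG00000205336"])

# Boolean masks marking which names start with the given prefix.
keep_obs = obs_names.str.startswith("645")
keep_var = var_names.str.startswith("ENSG0000018")

print(list(obs_names[keep_obs]))
print(list(var_names[keep_var]))
```

With a real object, masks like these could be used as `adata[keep_obs, keep_var]`.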

## Select


We can also apply the `select` method to the `AnnData` object. This is similar to the `filter` method, but instead of testing values it keeps only the columns named in the Predicates. It can be applied to the `obs`, `var`, and `X`.

Here we will select the `Cell_label` and `sex` columns from the `obs` table, the `feature_name` column from the `var` table and the `ENSG00000205336` gene from `X`. This will return a new `AnnData` object with only these columns in the `obs`, `var` and `X` tables.

In [None]:
adata.an.select(
    obs=an.col(["Cell_label", "sex"]),
    var=an.col(["feature_name"]),
    x=an.col(["ENSG00000205336"]),
)
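
To illustrate the difference from `filter`, here is a sketch of column selection on a toy `obs` table in plain pandas. The table and its `donor` column are hypothetical, and this is only an analogy for what `select` does to each table.

```python
import pandas as pd

# Hypothetical stand-in for adata.obs with one extra column.
obs = pd.DataFrame(
    {
        "Cell_label": ["Classical Monocytes", "NK cells", "Classical Monocytes"],
        "sex": ["male", "female", "female"],
        "donor": ["d1", "d2", "d3"],
    },
    index=["cell_0", "cell_1", "cell_2"],
)

# Selecting keeps every row but only the requested columns.
selected = obs[["Cell_label", "sex"]]

print(selected.shape)
print(list(selected.columns))
```

All rows are retained. Only the set of columns changes.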

## Group By


We can also group the `AnnData` object by columns in `obs` and `var`. This will return a generator of `AnnData` objects, each subset to one group.



Here we will group the `AnnData` object by the `Cell_label` column in the `obs` table and the `feature_type` column in the `var` table.

If you pass `return_group_names=True`, the generator yields a tuple instead of a bare `AnnData` object. When you group by both `obs` and `var`, the tuple holds both group names followed by the `AnnData` object; when you group by only one of them, it holds that single group name and the `AnnData` object.

In [None]:
for i in adata.an.group_by(
    obs=an.col(["Cell_label"]),
    var=an.col(["feature_type"]),
    return_group_names=True,
):
    obs_group, var_group, _adata = i
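
The shape of the yielded tuples can be pictured with a small pandas sketch. It assumes that one item is produced for every combination of an `obs` group and a `var` group, which is an assumption about the behaviour and not something taken from `annsel`'s source. The tables, labels and the `iter_groups` helper are hypothetical.

```python
from itertools import product

import pandas as pd

# Toy stand-ins for adata.obs and adata.var.
obs = pd.DataFrame({"Cell_label": ["A", "B", "A"]}, index=["c0", "c1", "c2"])
var = pd.DataFrame({"feature_type": ["protein", "gene"]}, index=["g0", "g1"])


def iter_groups(obs, var):
    """Yield (obs_group, var_group, cell_ids, gene_ids) for every pair of groups."""
    obs_groups = obs.groupby("Cell_label").groups  # group name -> index labels
    var_groups = var.groupby("feature_type").groups
    for (o_name, o_idx), (v_name, v_idx) in product(obs_groups.items(), var_groups.items()):
        yield o_name, v_name, list(o_idx), list(v_idx)


groups = list(iter_groups(obs, var))
for obs_group, var_group, cell_ids, gene_ids in groups:
    print(obs_group, var_group, cell_ids, gene_ids)
```

With a real object, each pair of label lists would correspond to a subset such as `adata[cell_ids, gene_ids]`.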

## Pipe

We can also use `pipe` to apply functions to `AnnData` objects.

In [None]:
adata

In [None]:
import scanpy as sc

adata.an.pipe(sc.pl.embedding, basis="X_tsneni", color="Cell_label")
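
As a rough mental model, `pipe` passes the object as the first argument of the function and forwards the remaining arguments. This is an assumption based on the call above and on the convention used by `pandas.DataFrame.pipe`, not a description of `annsel`'s internals. The `pipe` and `count_matching` functions below are hypothetical and operate on a plain list.

```python
def pipe(obj, func, *args, **kwargs):
    """Call `func` with `obj` as its first argument and return the result."""
    return func(obj, *args, **kwargs)


def count_matching(labels, target):
    # Hypothetical helper standing in for a function such as a plotting call.
    return sum(label == target for label in labels)


# Toy labels, made up for illustration.
labels = ["Classical Monocytes", "NK cells", "Classical Monocytes"]

# Equivalent to count_matching(labels, target="Classical Monocytes").
n_monocytes = pipe(labels, count_matching, target="Classical Monocytes")
print(n_monocytes)
```

Because the call reads left to right, it slots naturally into a chain of method calls.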

We can chain together multiple methods as well.

In [None]:
adata.an.select(obs=an.col(["Cell_label"])).an.filter(
    obs=an.col(["Cell_label"]).is_in(["Classical Monocytes", "CD8+CD103+ tissue resident memory T cells"])
).an.pipe(sc.pl.embedding, basis="X_tsneni", color="Cell_label")