# CellTypist Tutorial

## Set Up

In [None]:
# import
import scanpy as sc
import pandas as pd
import celltypist
from celltypist import models

In [None]:
# download model 
models.download_models(model = 'Human_Lung_Atlas.pkl')
model = models.Model.load(model = 'Human_Lung_Atlas.pkl')
model

In [None]:
# read in data
adata_query = sc.read_h5ad('../../data/gtex/lung.h5ad')
adata_query

## Pre-processing

In [None]:
# normalize each cell to 10,000 total counts, then log1p-transform (the input CellTypist expects)
sc.pp.normalize_total(adata_query, target_sum = 1e4)
sc.pp.log1p(adata_query)
adata_query
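
To make the two normalization calls concrete, here is a minimal pure-Python sketch of what they do to a single cell: scale the counts so they sum to `target_sum`, then take `log(1 + x)`. The helper names `normalize_and_log1p` and `library_size` and the toy counts are made up for illustration; this is not how Scanpy implements it internally.

```python
import math

def normalize_and_log1p(counts, target_sum=1e4):
    """Scale one cell's counts to sum to target_sum, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        # An empty cell stays all zeros instead of dividing by zero
        return [0.0 for _ in counts]
    return [math.log1p(c * target_sum / total) for c in counts]

def library_size(log_values):
    """Undo log1p and sum, recovering the per-cell total after scaling."""
    return sum(math.expm1(v) for v in log_values)

cell = [3, 0, 7, 10]  # toy raw counts for one cell (hypothetical values)
transformed = normalize_and_log1p(cell)
recovered_total = library_size(transformed)
```

Undoing the log transform and summing should give back roughly 10,000 for every cell. The same idea makes a quick sanity check for data that may already have been normalized upstream.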

> If you encounter `KeyError: 'No "neighbors" in .uns'`, the likely cause is that the object already stores a `connectivities` matrix in `.obsp` without a matching `neighbors` entry in `.uns`, so the graph is not stored where the source code looks for it.

To fix this, run `adata_query.obsp.pop('connectivities', None)` before `celltypist.annotate(adata_query, model = model, majority_voting = True)`. This forces CellTypist to build a new neighbors graph that is stored correctly.

In [None]:
# drop the stored connectivities so a fresh neighbors graph is built during annotation
adata_query.obsp.pop('connectivities', None)

> If you encounter `"🛑 No features overlap with the model. Please provide gene symbols"` after running `celltypist.annotate(adata_query, model = model, majority_voting = True)`, check that `adata_query.var_names` holds gene symbols:

In [None]:
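# verify that adata_query.var_names contains gene symbols; if not, reassign them
# from the .var column that holds the symbols ('gene_name' here; adjust to your data)
gene_name_col = 'gene_name'
adata_query.var_names = adata_query.var[gene_name_col].astype(str).values
adata_query.var_names_make_unique()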
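
Before annotating, it can help to measure how many of the model's genes are present in the query. The sketch below compares two lists of gene names; `gene_overlap` is a hypothetical helper, and the toy identifiers only illustrate the mismatch between Ensembl IDs and gene symbols that produces the error above.

```python
def gene_overlap(query_genes, model_genes):
    """Return the shared gene names and the fraction of model genes found in the query."""
    query = set(query_genes)
    model = set(model_genes)
    shared = query & model
    fraction = len(shared) / len(model) if model else 0.0
    return sorted(shared), fraction

# Toy model features given as gene symbols (hypothetical subset)
symbols = ["TP53", "EGFR", "SFTPC"]

# Query indexed by Ensembl IDs: nothing matches the symbols
ensembl_like = ["ENSG00000141510", "ENSG00000146648"]
shared_ids, frac_ids = gene_overlap(ensembl_like, symbols)

# Query indexed by gene symbols: two of the three model genes are found
shared_sym, frac_sym = gene_overlap(["TP53", "EGFR", "ACTB"], symbols)
```

With the objects in this notebook, the call would presumably be `gene_overlap(adata_query.var_names, model.features)`, assuming `model.features` holds the model's gene names. A fraction near zero suggests that `var_names` still contains IDs rather than symbols.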

## Annotation

**CellTypist offers two prediction modes: individual (per-cell) prediction and majority voting. This tutorial uses majority voting.**
#### `majority_voting = True`: 
From the CellTypist website: "Prediction results are refined by a majority voting approach based on the idea that transcriptionally similar cells are more likely to form a (sub)cluster regardless of their individual prediction outcomes. The query data will be over-clustered (by Leiden clustering with a canonical Scanpy pipeline) and each resulting subcluster is assigned the identity supported by the dominant cell type predicted. Through this, distinguishable small subclusters will be assigned distinct labels, and homogenous subclusters will be assigned the same labels and iteratively converge to a bigger cluster."
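
The voting step described in the quote can be sketched in a few lines. This is only an illustration of the idea, not CellTypist's implementation: the Leiden over-clustering is replaced by hand-assigned cluster IDs, and `majority_vote` and the toy labels are made up for the example.

```python
from collections import Counter

def majority_vote(predicted_labels, clusters):
    """Give every cell the most frequent predicted label within its cluster."""
    votes = {}
    for label, cluster in zip(predicted_labels, clusters):
        votes.setdefault(cluster, Counter())[label] += 1
    # Ties go to the label seen first within the cluster
    dominant = {c: counter.most_common(1)[0][0] for c, counter in votes.items()}
    return [dominant[c] for c in clusters]

# Toy per-cell predictions and hand-assigned subclusters (hypothetical values)
per_cell = ["AT2", "AT2", "AT1", "Macrophage", "Macrophage", "AT2"]
clusters = ["0", "0", "0", "1", "1", "1"]
refined = majority_vote(per_cell, clusters)
```

Here the lone `AT1` call in cluster `0` and the lone `AT2` call in cluster `1` are overruled by their neighbors. That is the refinement majority voting is meant to provide.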

In [None]:
predictions = celltypist.annotate(adata_query, model = model, majority_voting = True)
predictions.predicted_labels

### Saving Results

In [None]:
predictions_adata = predictions.to_adata()
predictions_adata

In [None]:
predictions_adata.obs.to_csv('results.csv')
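
Once the table is saved, a quick tally of labels is a useful first look at the result. The sketch below counts labels in one column of a CSV using only the standard library. The column names `predicted_labels` and `majority_voting` are assumed to match what `to_adata()` writes into `.obs`, and the toy rows are made up, so check the header of your own `results.csv`.

```python
import csv
import io
from collections import Counter

def count_labels(csv_text, column="majority_voting"):
    """Count how many cells carry each label in the given CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)

# Toy stand-in for the contents of results.csv (hypothetical rows)
toy_csv = (
    "cell_id,predicted_labels,majority_voting\n"
    "c1,AT2,AT2\n"
    "c2,AT1,AT2\n"
    "c3,Macrophage,Macrophage\n"
)
label_counts = count_labels(toy_csv)
```

For the real file, read its text with `open('results.csv').read()` and pass it to `count_labels`.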