# Using CellTypist for cell type classification
This notebook demonstrates cell type classification of scRNA-seq query data: the most likely cell type label for each cell is retrieved from either the built-in CellTypist models or user-trained custom models.

*This is my modified version, which also draws on information from the GitHub repository.*

Only the main steps and key parameters are introduced in this notebook. Refer to the detailed [Usage](https://github.com/Teichlab/celltypist#usage) section if you want to learn more.

## Install CellTypist

In [1]:
!pip install celltypist



In [1]:
import scanpy as sc

In [2]:
import celltypist
from celltypist import models

In [10]:
#from google.colab import drive
#drive.mount('/content/drive')



In [3]:
#import os
#import gzip

In [3]:
# Enabling `force_update = True` will overwrite existing (old) models.
models.download_models(force_update = True)

📜 Retrieving model list from server https://celltypist.cog.sanger.ac.uk/models/models.json
📚 Total models in list: 48
📂 Storing models in /home/seriph/.celltypist/data/models
💾 Downloading model [1/48]: Immune_All_Low.pkl
💾 Downloading model [2/48]: Immune_All_High.pkl
💾 Downloading model [3/48]: Adult_CynomolgusMacaque_Hippocampus.pkl
💾 Downloading model [4/48]: Adult_Human_PancreaticIslet.pkl
💾 Downloading model [5/48]: Adult_Human_Skin.pkl
💾 Downloading model [6/48]: Adult_Mouse_Gut.pkl
💾 Downloading model [7/48]: Adult_Mouse_OlfactoryBulb.pkl
💾 Downloading model [8/48]: Adult_Pig_Hippocampus.pkl
💾 Downloading model [9/48]: Adult_RhesusMacaque_Hippocampus.pkl
💾 Downloading model [10/48]: Autopsy_COVID19_Lung.pkl
💾 Downloading model [11/48]: COVID19_HumanChallenge_Blood.pkl
💾 Downloading model [12/48]: COVID19_Immune_Landscape.pkl
💾 Downloading model [13/48]: Cells_Adult_Breast.pkl
💾 Downloading model [14/48]: Cells_Fetal_Lung.pkl
💾 Downloading model [15/48]: Cells_Human_Tonsil.pkl
💾
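
Downloading all 48 models is not strictly needed for this notebook. As a minimal sketch, assuming `models.download_models` accepts a `model` argument with one name or a list of names, only the two models used in the sections below could be fetched. `NEEDED_MODELS` and `download_needed_models` are hypothetical names introduced for this example.

```python
# Models used by the annotate calls in this notebook.
NEEDED_MODELS = ["Immune_All_Low.pkl", "Developing_Mouse_Brain.pkl"]


def download_needed_models(force_update: bool = False) -> None:
    """Download only the models listed in NEEDED_MODELS (hypothetical helper)."""
    # Imported lazily so the helper can be defined without CellTypist installed.
    from celltypist import models

    # Assumption: `model` restricts the download to the given model names.
    models.download_models(force_update=force_update, model=NEEDED_MODELS)
```

Calling `download_needed_models()` needs network access; with `force_update=False`, models already on disk are expected to be left untouched.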

In [4]:
models.models_path

'/home/seriph/.celltypist/data/models'
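
To check which models are actually on disk, the folder printed above can simply be listed. The sketch below uses only the standard library; `list_local_models` is a hypothetical helper, and its default folder is an assumption generalized from the path shown above (`models.models_path` remains the authoritative value).

```python
from pathlib import Path


def list_local_models(models_dir="~/.celltypist/data/models"):
    """Return the sorted file names of all .pkl files found in models_dir."""
    folder = Path(models_dir).expanduser()
    if not folder.is_dir():
        # Nothing downloaded yet, or a different storage location is in use.
        return []
    return sorted(path.name for path in folder.glob("*.pkl"))


print(list_local_models())
```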

In [7]:
models.models_description()

👉 Detailed model information can be found at `https://www.celltypist.org/models`


| | model | description |
|---|---|---|
| 0 | Immune_All_Low.pkl | immune sub-populations combined from 20 tissue... |
| 1 | Immune_All_High.pkl | immune populations combined from 20 tissues of... |
| 2 | Adult_Mouse_Gut.pkl | cell types in the adult mouse gut combined fro... |
| 3 | Autopsy_COVID19_Lung.pkl | cell types from the lungs of 16 SARS-CoV-2 inf... |
| 4 | COVID19_HumanChallenge_Blood.pkl | detailed blood cell states from 16 individuals... |
| 5 | COVID19_Immune_Landscape.pkl | immune subtypes from lung and blood of COVID-1... |
| 6 | Cells_Fetal_Lung.pkl | cell types from human embryonic and fetal lungs |
| 7 | Cells_Intestinal_Tract.pkl | intestinal cells from fetal, pediatric (health... |
| 8 | Cells_Lung_Airway.pkl | cell populations from scRNA-seq of five locati... |
| 9 | Developing_Human_Brain.pkl | cell types from the first-trimester developing... |
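
With this many models, a keyword search over the overview helps to shortlist candidates. The sketch below works on plain `(model, description)` pairs copied from the table above (descriptions left truncated as shown); `find_models` is a hypothetical helper, not part of CellTypist.

```python
def find_models(rows, keyword):
    """Return model names whose name or description mentions keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [
        name
        for name, description in rows
        if keyword in name.lower() or keyword in description.lower()
    ]


rows = [
    ("Immune_All_Low.pkl", "immune sub-populations combined from 20 tissue..."),
    ("Adult_Mouse_Gut.pkl", "cell types in the adult mouse gut combined fro..."),
    ("Cells_Fetal_Lung.pkl", "cell types from human embryonic and fetal lungs"),
    ("Developing_Human_Brain.pkl", "cell types from the first-trimester developing..."),
]

print(find_models(rows, "mouse"))  # ['Adult_Mouse_Gut.pkl']
print(find_models(rows, "lung"))   # ['Cells_Fetal_Lung.pkl']
```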


## CD14 dataset

In [5]:
input = '../Data/CD14Cleaned/CD14_Monocytes_cleaned.csv'

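Annotate the cleaned CD14 monocytes with the `Immune_All_Low.pkl` model (see the model overview above for what each model represents). Majority voting is enabled, so the per-cell predictions are refined within over-clustered subclusters.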

In [6]:
predictions = celltypist.annotate(input, model = 'Immune_All_Low.pkl', transpose_input = True, majority_voting = True, mode = 'best match')
predictions.to_table(folder = '../Data/CD14Cleaned/', prefix = "CD14Cleaned_Immune_All_Low_")

📁 Input file is '../Data/CD14Cleaned/CD14_Monocytes_cleaned.csv'
⏳ Loading data
🔬 Input data has 2438 cells and 32738 genes
🔗 Matching reference genes in the model
🧬 5278 features used for prediction
⚖️ Scaling input data
🖋️ Predicting labels
✅ Prediction done!
👀 Can not detect a neighborhood graph, will construct one before the over-clustering
⛓️ Over-clustering input data with resolution set to 5
🗳️ Majority voting the predictions
✅ Majority voting done!
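
The log above shows the two stages triggered by `majority_voting = True`: over-clustering, then voting. As a rough illustration of the voting step only, the sketch below gives every cell the most frequent predicted label of its subcluster. The function name, the toy labels, and the `min_prop` fallback to `'Heterogeneous'` are assumptions made for this example, not CellTypist's actual implementation.

```python
from collections import Counter, defaultdict


def majority_vote(predicted, clusters, min_prop=0.0):
    """Give every cell the dominant predicted label of its subcluster."""
    by_cluster = defaultdict(list)
    for label, cluster in zip(predicted, clusters):
        by_cluster[cluster].append(label)

    winner = {}
    for cluster, labels in by_cluster.items():
        label, count = Counter(labels).most_common(1)[0]
        # Use a placeholder when the dominant label has too little support.
        winner[cluster] = label if count / len(labels) >= min_prop else "Heterogeneous"

    return [winner[cluster] for cluster in clusters]


# Toy data: five cells in two subclusters.
predicted = ["Classical monocytes", "Classical monocytes", "DC", "DC", "DC"]
clusters = [0, 0, 0, 1, 1]
print(majority_vote(predicted, clusters))
```

In this toy case the third cell is relabeled from `DC` to `Classical monocytes`, because two of the three cells in its subcluster carry that label.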


In [7]:
predictions = celltypist.annotate(input, model = 'Immune_All_Low.pkl', transpose_input = True, majority_voting = True, mode = 'prob match', p_thres = 0.3)
predictions.to_table(folder = '../Data/CD14Cleaned/', prefix = "CD14Cleaned_Immune_All_Low_probMatch03_")

📁 Input file is '../Data/CD14Cleaned/CD14_Monocytes_cleaned.csv'
⏳ Loading data
🔬 Input data has 2438 cells and 32738 genes
🔗 Matching reference genes in the model
🧬 5278 features used for prediction
⚖️ Scaling input data
🖋️ Predicting labels
✅ Prediction done!
👀 Can not detect a neighborhood graph, will construct one before the over-clustering
⛓️ Over-clustering input data with resolution set to 5
🗳️ Majority voting the predictions
✅ Majority voting done!
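
The two runs above differ only in `mode`. A simplified sketch of the difference, for one cell and a toy probability mapping: `best match` always returns the single top label, while `prob match` keeps every label above `p_thres`, so a cell can end up with no label or with several. The function name, the strict `>` comparison, and the `'Unassigned'` and `'|'` conventions are assumptions based on the CellTypist documentation; the real decision is made on the model's score matrices.

```python
def assign_label(probabilities, mode="best match", p_thres=0.5):
    """Pick the label(s) for one cell from a {cell_type: probability} mapping."""
    if mode == "best match":
        # Always return the single highest-scoring cell type.
        return max(probabilities, key=probabilities.get)
    if mode == "prob match":
        # Keep every cell type whose probability exceeds the cutoff.
        hits = [cell_type for cell_type, p in probabilities.items() if p > p_thres]
        return "|".join(hits) if hits else "Unassigned"
    raise ValueError(f"unknown mode: {mode!r}")


# Toy probabilities for a single cell.
cell = {"Classical monocytes": 0.45, "Non-classical monocytes": 0.35, "DC2": 0.05}

print(assign_label(cell))                             # Classical monocytes
print(assign_label(cell, "prob match", p_thres=0.3))  # Classical monocytes|Non-classical monocytes
print(assign_label(cell, "prob match", p_thres=0.5))  # Unassigned
```

Lowering `p_thres` to 0.3, as done above, therefore trades fewer unassigned cells for more multi-label cells.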


## CD14 not cleaned - original dataset

In [27]:
input = '../Data/CD14Cleaned/CD14_raw_NOT_cleaned.csv'
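
Before annotating, it can help to confirm the orientation of the CSV, since `transpose_input = True` is meant for tables with genes as rows and cells as columns. The sketch below counts data rows and data columns with the standard library; `csv_shape` is a hypothetical helper and assumes one header row plus one leading column of row names.

```python
import csv


def csv_shape(path):
    """Return (n_data_rows, n_data_columns), excluding the header row and the name column."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        n_rows = sum(1 for _ in reader)
    return n_rows, len(header) - 1


# Usage (not run here, needs the data file):
# n_rows, n_cols = csv_shape(input)
# For a gene-by-cell table, n_rows should equal the number of genes in the log.
```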

In [28]:
predictions = celltypist.annotate(input, model = 'Immune_All_Low.pkl', transpose_input = True, majority_voting = True, mode = 'best match')
predictions.to_table(folder = '../Data/CD14Cleaned/', prefix = "CD14_Immune_All_Low_")

📁 Input file is '../Data/CD14Cleaned/CD14_raw_NOT_cleaned.csv'
⏳ Loading data
🔬 Input data has 2612 cells and 32738 genes
🔗 Matching reference genes in the model
🧬 5278 features used for prediction
⚖️ Scaling input data
🖋️ Predicting labels
✅ Prediction done!
👀 Can not detect a neighborhood graph, will construct one before the over-clustering
⛓️ Over-clustering input data with resolution set to 5
🗳️ Majority voting the predictions
✅ Majority voting done!
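
Having annotated both the cleaned and the raw CD14 data, a natural next step is to compare the label counts. The sketch below reads two exported tables with the standard library and puts the counts side by side. The helper names are hypothetical, and both the `majority_voting` column name and the `<prefix>predicted_labels.csv` file names are assumptions about what `to_table` writes.

```python
import csv
from collections import Counter


def label_counts(path, column="majority_voting"):
    """Count how many cells received each label in a predicted-labels CSV."""
    with open(path, newline="") as handle:
        return Counter(row[column] for row in csv.DictReader(handle))


def compare_counts(cleaned, raw):
    """Return {label: (cleaned_count, raw_count)} for every label seen in either table."""
    return {label: (cleaned[label], raw[label]) for label in sorted(set(cleaned) | set(raw))}


# Usage (not run here; file names are assumed):
# cleaned = label_counts("../Data/CD14Cleaned/CD14Cleaned_Immune_All_Low_predicted_labels.csv")
# raw = label_counts("../Data/CD14Cleaned/CD14_Immune_All_Low_predicted_labels.csv")
# for label, (n_cleaned, n_raw) in compare_counts(cleaned, raw).items():
#     print(label, n_cleaned, n_raw)
```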


## Mouse Brain La Manno - Loom file E13.5

In [5]:
input = '../Data/MouseCortexFromLoom/e13.5_ForebrainDorsal_cleaned.csv'

In [6]:
predictions = celltypist.annotate(input, model = 'Developing_Mouse_Brain.pkl', transpose_input = True, majority_voting = True, mode = 'best match')
predictions.to_table(folder = '../Data/MouseCortexFromLoom/', prefix = "E135_Devel_Mouse_Brain_")

📁 Input file is '../Data/MouseCortexFromLoom/e13.5_ForebrainDorsal_cleaned.csv'
⏳ Loading data
🔬 Input data has 4981 cells and 14282 genes
🔗 Matching reference genes in the model
🧬 5981 features used for prediction
⚖️ Scaling input data
🖋️ Predicting labels
✅ Prediction done!
👀 Can not detect a neighborhood graph, will construct one before the over-clustering
⛓️ Over-clustering input data with resolution set to 5
🗳️ Majority voting the predictions
✅ Majority voting done!


## Mouse Brain La Manno - Loom file E15.0

In [6]:
input = '../Data/MouseCortexFromLoom/e15.0_ForebrainDorsal_cleaned.csv'

In [7]:
predictions = celltypist.annotate(input, model = 'Developing_Mouse_Brain.pkl', transpose_input = True, majority_voting = True, mode = 'best match')
predictions.to_table(folder = '../Data/MouseCortexFromLoom/', prefix = "E150_Devel_Mouse_Brain_")

📁 Input file is '../Data/MouseCortexFromLoom/e15.0_ForebrainDorsal_cleaned.csv'
⏳ Loading data
🔬 Input data has 8562 cells and 14120 genes
🔗 Matching reference genes in the model
🧬 5902 features used for prediction
⚖️ Scaling input data
🖋️ Predicting labels
✅ Prediction done!
👀 Can not detect a neighborhood graph, will construct one before the over-clustering
⛓️ Over-clustering input data with resolution set to 10
🗳️ Majority voting the predictions
✅ Majority voting done!
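
Note that the over-clustering resolution here is 10, while the E13.5 run (4981 cells) used 5: the default appears to scale with the number of cells. The sketch below encodes one plausible tiering. Only the first two tiers are confirmed by the logs in this notebook; the remaining cutoffs are recalled from the CellTypist source, may differ between versions, and should be treated as assumptions. `default_resolution` is a hypothetical name.

```python
def default_resolution(n_cells):
    """Pick an over-clustering resolution from the number of cells (assumed tiers)."""
    # (exclusive upper bound on cell count, resolution)
    tiers = [(5_000, 5), (20_000, 10), (40_000, 15), (100_000, 20), (200_000, 25)]
    for upper_bound, resolution in tiers:
        if n_cells < upper_bound:
            return resolution
    return 30


# Cell counts taken from the La Manno logs in this notebook.
for n_cells in (4981, 8562, 2467):
    print(n_cells, default_resolution(n_cells))
```

For the three cell counts above this prints resolutions 5, 10 and 5, matching the logged values.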


## Mouse Brain La Manno - Loom file E17.5

In [8]:
input = '../Data/MouseCortexFromLoom/e17.5_ForebrainDorsal_cleaned.csv'

In [9]:
predictions = celltypist.annotate(input, model = 'Developing_Mouse_Brain.pkl', transpose_input = True, majority_voting = True, mode = 'best match')
predictions.to_table(folder = '../Data/MouseCortexFromLoom/', prefix = "E175_Devel_Mouse_Brain_")

📁 Input file is '../Data/MouseCortexFromLoom/e17.5_ForebrainDorsal_cleaned.csv'
⏳ Loading data
🔬 Input data has 2467 cells and 14227 genes
🔗 Matching reference genes in the model
🧬 5949 features used for prediction
⚖️ Scaling input data
🖋️ Predicting labels
✅ Prediction done!
👀 Can not detect a neighborhood graph, will construct one before the over-clustering
⛓️ Over-clustering input data with resolution set to 5
🗳️ Majority voting the predictions
✅ Majority voting done!
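
The three developmental stages are annotated with identical calls that differ only in the file name and output prefix, so they could also be run in a loop. A minimal sketch, reusing the paths and prefixes from the cells above; `build_jobs` and `annotate_all` are hypothetical helpers, and `annotate_all` needs CellTypist and the data files to run.

```python
DATA_DIR = "../Data/MouseCortexFromLoom/"
# Stage as written in the file name -> tag used in the output prefix.
STAGES = {"e13.5": "E135", "e15.0": "E150", "e17.5": "E175"}


def build_jobs(data_dir=DATA_DIR, stages=STAGES):
    """Return (input_csv, output_prefix) pairs, one per developmental stage."""
    return [
        (f"{data_dir}{stage}_ForebrainDorsal_cleaned.csv", f"{tag}_Devel_Mouse_Brain_")
        for stage, tag in stages.items()
    ]


def annotate_all(jobs, model="Developing_Mouse_Brain.pkl", out_dir=DATA_DIR):
    """Run the same annotation for every job and export the tables."""
    import celltypist  # imported lazily so build_jobs works without CellTypist

    for input_csv, prefix in jobs:
        predictions = celltypist.annotate(
            input_csv, model=model, transpose_input=True,
            majority_voting=True, mode="best match",
        )
        predictions.to_table(folder=out_dir, prefix=prefix)


for input_csv, prefix in build_jobs():
    print(input_csv, prefix)
```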


## Cortical cells DGE E13.5 (mouse)

In [10]:
input = '../Data/Yuzwa_MouseCortex/CorticalCells_GSM2861511_E135_cleaned.csv'

In [11]:
predictions = celltypist.annotate(input, model = 'Developing_Mouse_Brain.pkl', transpose_input = True, majority_voting = True, mode = 'best match')
predictions.to_table(folder = '../Data/Yuzwa_MouseCortex/', prefix = "E13_5_Devel_Mouse_Brain_")

📁 Input file is '../Data/Yuzwa_MouseCortex/CorticalCells_GSM2861511_E135_cleaned.csv'
⏳ Loading data
🔬 Input data has 1112 cells and 17082 genes
🔗 Matching reference genes in the model
🧬 6136 features used for prediction
⚖️ Scaling input data
🖋️ Predicting labels
✅ Prediction done!
👀 Can not detect a neighborhood graph, will construct one before the over-clustering
⛓️ Over-clustering input data with resolution set to 5
🗳️ Majority voting the predictions
✅ Majority voting done!
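
The "features used for prediction" line in the log (6136 here, out of 17082 input genes) reflects the overlap between the genes in the input and the genes the model was trained on. A simplified sketch of such a matching step, assuming exact matching of gene symbols; the function name and the toy gene lists are made up for illustration and do not reproduce CellTypist's code.

```python
def matched_features(input_genes, model_genes):
    """Return the model genes that are also present in the input, in model order."""
    available = set(input_genes)
    return [gene for gene in model_genes if gene in available]


# Toy gene lists (mouse-style symbols), for illustration only.
input_genes = ["Pax6", "Sox2", "Actb", "Tbr1"]
model_genes = ["Sox2", "Eomes", "Pax6", "Neurod2"]

print(matched_features(input_genes, model_genes))  # ['Sox2', 'Pax6']
```

A low overlap can hint at a mismatch in species or gene naming (for example human versus mouse symbols) between the data and the chosen model.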


## Cortical cells DGE E17.5 (mouse)

In [12]:
input = '../Data/Yuzwa_MouseCortex/CorticalCells_GSM2861514_E175_cleaned.csv'

In [13]:
predictions = celltypist.annotate(input, model = 'Developing_Mouse_Brain.pkl', transpose_input = True, majority_voting = True, mode = 'best match')
predictions.to_table(folder = '../Data/Yuzwa_MouseCortex/', prefix = "E17_5_Devel_Mouse_Brain_")

📁 Input file is '../Data/Yuzwa_MouseCortex/CorticalCells_GSM2861514_E175_cleaned.csv'
⏳ Loading data
🔬 Input data has 874 cells and 17085 genes
🔗 Matching reference genes in the model
🧬 6158 features used for prediction
⚖️ Scaling input data
🖋️ Predicting labels
✅ Prediction done!
👀 Can not detect a neighborhood graph, will construct one before the over-clustering
⛓️ Over-clustering input data with resolution set to 5
🗳️ Majority voting the predictions
✅ Majority voting done!


----------------------------------------------------------