
nrclaudio/MKA


Mouse Kidney Atlas

[Figure: workflow]

We present the Mouse Kidney Atlas (MKA), a comprehensive atlas of cellular heterogeneity in the healthy mouse kidney, built by integrating data from eight publicly available studies with scVI and scANVI. To resolve inconsistent cell type annotations between studies, we use scHPL to learn how cell type transcriptomic profiles relate across datasets. The resulting scHPL model can automatically label unseen cell populations at high resolution and accuracy. We demonstrate the value of the atlas by identifying robust and novel markers for poorly described cell types.

The MKA is publicly available to download, visualize, and explore interactively on cellxgene.

For more details, refer to: A comprehensive mouse kidney atlas enables rare cell population characterization and robust marker discovery.

File descriptions

  • models: Files containing the trained models used in the manuscript

  • notebooks: notebooks used to generate the figures presented in the manuscript

    • QC_scVI_scANVI: Figure 1

    • scHPL_ManualReannotation: Figures 2 and 3; Supplementary Figures 1, 2 and 3

    • scHPL_Evaluation: Figure 4; Supplementary Figures 4 and 5

    • Downstream_analyses: Figure 5; Supplementary Figure 6

  • MKA_Metamarkers.xlsx: Excel file with the identified metamarkers for each cell type label in the MKA.

    • Rank: overall ranking of the gene within a cell type. The higher the ranking, the better the gene marks that population, after accounting for batch differences and for the number of datasets in which the gene is detected.
    • AUROC: area under the receiver operating characteristic (ROC) curve. It indicates how well the gene alone separates the cell type from all other cells in a classification setting. For example, Podxl has an AUROC of 0.9 for podocytes: a randomly chosen podocyte has roughly a 90% chance of expressing Podxl more highly than a randomly chosen non-podocyte, which makes Podxl a very good podocyte marker.
  • functions.py: helper functions used across the code

  • hyper_tune.py: Ray Tune implementation used to optimize scVI model hyperparameters
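To make the AUROC column concrete, here is a minimal NumPy sketch of the AUROC of one gene used as a one-vs-rest classifier for one cell type. It uses the standard Mann-Whitney formulation; it is not necessarily how the metamarker scores in MKA_Metamarkers.xlsx were computed internally, and the function name `auroc` and the toy expression values are hypothetical.

```python
import numpy as np


def auroc(expression, is_target):
    """One-vs-rest AUROC of a single gene for a single cell type.

    Equals the probability that a randomly chosen target cell expresses the
    gene more highly than a randomly chosen non-target cell (ties count 0.5),
    i.e. the Mann-Whitney U statistic divided by n_target * n_other.
    """
    expression = np.asarray(expression, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    pos = expression[is_target]
    neg = expression[~is_target]
    # Compare every target cell with every non-target cell.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))


# Hypothetical expression values of one gene; the first cells listed are the target type.
print(auroc([5, 4, 3, 0, 1, 0], [1, 1, 1, 0, 0, 0]))  # 1.0: perfect separation
print(auroc([2, 0, 1, 1], [1, 0, 1, 0]))  # 0.875
```

An AUROC of 0.5 means the gene carries no information about the cell type, while values close to 1 mean target cells almost always express it more highly than other cells.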

Using the trained models

If you want to use the models for your own research, you will need the HVG-filtered (highly variable gene) matrix they were trained on. You can find the AnnData object on Zenodo. Once downloaded, you can:

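# Run from the folder that contains the cloned MKA repository
import os
import scvi
import scanpy as sc

os.chdir("MKA")
# HVG-filtered AnnData object downloaded from Zenodo, saved inside the MKA folder
adata = sc.read_h5ad("adata.h5ad")
# Load the trained scANVI model and attach it to the matching AnnData object
atlas_model = scvi.model.SCANVI.load("models/scANVI_model_full", adata=adata)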
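The saved models expect exactly the HVG set they were trained on, in the same order. If you want to bring your own data, the sketch below shows one way to align a query count matrix to the reference gene list, assuming plain NumPy arrays and lists of gene names. `align_to_reference` and the toy gene list are hypothetical and not part of this repository; for AnnData objects, scvi-tools provides `SCANVI.prepare_query_anndata` for a similar reorder-and-pad step.

```python
import numpy as np


def align_to_reference(matrix, query_genes, reference_genes):
    """Reorder the columns of a cells x genes matrix to follow reference_genes.

    Reference genes absent from the query are zero-filled and query genes not in
    the reference are dropped. Returns (aligned_matrix, zero_filled_genes).
    """
    matrix = np.asarray(matrix, dtype=float)
    column_of = {gene: i for i, gene in enumerate(query_genes)}
    aligned = np.zeros((matrix.shape[0], len(reference_genes)))
    missing = []
    for j, gene in enumerate(reference_genes):
        if gene in column_of:
            aligned[:, j] = matrix[:, column_of[gene]]
        else:
            missing.append(gene)
    return aligned, missing


# Toy query: 2 cells x 3 genes, in a different order from the (hypothetical) reference list.
counts = np.array([[1, 2, 3], [4, 5, 6]])
aligned, missing = align_to_reference(
    counts, ["Umod", "Podxl", "Nphs1"], ["Podxl", "Slc34a1", "Umod"]
)
print(aligned)  # [[2. 0. 1.]
                #  [5. 0. 4.]]
print(missing)  # ['Slc34a1']
```

Returning the zero-filled genes makes it easy to check how much of the reference feature space your data actually covers before mapping it onto the atlas.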

Hyperparameter Optimization

Ray Tune was used to train 1000 different hyperparameter and model configurations.

The metrics tracked at each training epoch were 'elbo_validation', 'reconstruction_loss', and 'silhouette_score'. Batch and cell type silhouette scores computed on the latent space were used as the objective functions to maximize.
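A silhouette score measures, for each cell, how much closer it is to cells sharing its label than to the nearest group with a different label, s = (b - a) / max(a, b), averaged over cells. The NumPy sketch below computes it on a latent embedding for any label vector, so the same function can be applied once with cell type labels and once with batch labels. How the two scores were rescaled or combined into the tuning objective in hyper_tune.py is not described here; the function name `silhouette` and the toy embedding are hypothetical.

```python
import numpy as np


def silhouette(latent, labels):
    """Mean silhouette width of a cells x dims embedding under a label vector.

    For each cell: a = mean distance to other cells with the same label,
    b = smallest mean distance to the cells of any other label,
    s = (b - a) / max(a, b). Cells whose label is unique score 0.
    """
    latent = np.asarray(latent, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    if len(groups) < 2:
        raise ValueError("silhouette needs at least two distinct labels")
    # Pairwise Euclidean distances between all cells (fine for small toy data).
    diff = latent[:, None, :] - latent[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False  # exclude the cell itself
        if not same.any():
            continue  # singleton label: silhouette stays 0
        a = dist[i, same].mean()
        b = min(dist[i, labels == g].mean() for g in groups if g != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())


# Toy latent space: two well-separated clusters of two cells each.
latent = [[0, 0], [0, 1], [10, 0], [10, 1]]
cell_type_asw = silhouette(latent, ["A", "A", "B", "B"])  # cell types separate well
batch_asw = silhouette(latent, ["b1", "b2", "b1", "b2"])  # each cluster mixes both batches
```

A high cell type silhouette means cell types remain separated in the latent space, while a batch silhouette near zero or below means batches are mixed within clusters rather than forming clusters of their own.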

The search space was defined as follows:

  • model configuration
    • dropout rate: loguniform distribution between 1e-4 and 1e-1
    • number of layers: random integer between 1 and 3
    • number of latent dimensions: random integer between 20 and 31
  • plan configuration
    • learning rate: loguniform distribution between 1e-4 and 1e-1
  • atlas architecture
    • subset: random boolean (True/False), used to test the effect of filtering the feature space
    • number of highly variable genes (HVGs): random choice from 2000 to 8000 in increments of 1000
    • continuous covariates: random choice between 'pct_counts_mt' and None
    • categorical covariates: random choice between 'Source' and None, where 'Source' is the starting material (nuclei or cells)
  • number of epochs: random integer between 100 and 201
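The search space above can be sketched with the Python standard library so the sampling logic is easy to inspect; in Ray Tune the same kinds of distributions are expressed with `tune.loguniform`, `tune.randint`, and `tune.choice`. This is a stand-in, not the code in hyper_tune.py: the `sample_config` function and all dictionary keys are hypothetical, and the integer bounds follow the wording above literally.

```python
import math
import random


def sample_config(rng):
    """Draw one hypothetical configuration from the search space listed above."""

    def loguniform(low, high):
        # Uniform in log space, like ray.tune.loguniform(low, high).
        return math.exp(rng.uniform(math.log(low), math.log(high)))

    # Note: random.randint includes both endpoints, whereas ray.tune.randint
    # excludes the upper bound; check hyper_tune.py for the exact ranges used.
    return {
        "model": {
            "dropout_rate": loguniform(1e-4, 1e-1),
            "n_layers": rng.randint(1, 3),
            "n_latent": rng.randint(20, 31),
        },
        "plan": {"lr": loguniform(1e-4, 1e-1)},
        "atlas": {
            "subset": rng.choice([True, False]),
            "n_hvgs": rng.choice(range(2000, 8001, 1000)),
            "continuous_covariates": rng.choice(["pct_counts_mt", None]),
            "categorical_covariates": rng.choice(["Source", None]),
        },
        "max_epochs": rng.randint(100, 201),
    }


# Draw as many trials as the search described above (seeded for reproducibility).
rng = random.Random(0)
configs = [sample_config(rng) for _ in range(1000)]
```

Sampling the dropout rate and learning rate log-uniformly gives each order of magnitude between 1e-4 and 1e-1 roughly equal coverage, which a plain uniform draw would not.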

Datasets

The following table lists all studies included in the MKA.

| Publication | Abbreviation | Accession number |
|---|---|---|
| Wu et al., 2019 | Wu19 | GSE119531 |
| Miao et al., 2021 | Miao21 | GSE157079 |
| Park et al., 2018 | Park18 | GSE107585 |
| Kirita et al., 2020 | Kirita20 | GSE139107 |
| Dumas et al., 2020 | Dumas20 | E-MTAB-8145 |
| Conway et al., 2020 | Conway20 | GSE140023 |
| Hinze et al., 2021 | Hinze21 | GSE145690 |
| Janosevic et al., 2021 | Janosevic21 | GSE151658 |
