
insct ("Insight")

INtegration of millions of Single Cells using batch-aware Triplet networks

INSCT is a deep learning algorithm which calculates an integrated embedding for scRNA-seq data. With INSCT, you can:

  • Integrate scRNA-seq datasets across batches with/without labels.
  • Generate a low-dimensional representation of the scRNA-seq data.
  • Integrate millions of cells on personal computers.

For more information, check out our manuscript.

How does it work?


  1. INSCT learns a data representation that integrates cells across batches. The goal of the network is to minimize the distance between Anchor and Positive while maximizing the distance between Anchor and Negative. Anchor and Positive pairs consist of transcriptionally similar cells from different batches. The Negative is a transcriptionally dissimilar cell sampled from the same batch as the Anchor.
  2. The principal components (PCs) of the three cells corresponding to Anchor, Positive and Negative are fed into three identical neural networks that share weights. The triplet loss function is used to train the network weights, and the activations of the two-dimensional embedding layer represent the integrated embedding.
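As a rough illustration of the two steps above, the sketch below embeds (Anchor, Positive, Negative) PC vectors with one shared set of weights and computes a margin-based triplet loss, mean(max(d(a,p) - d(a,n) + margin, 0)). The single tanh layer, the squared Euclidean distance, the margin of 1.0 and the 50 input PCs are assumptions for illustration, not INSCT's actual architecture or settings; training (updating the weights by gradient descent on this loss) is omitted.

```python
import numpy as np

def embed(x, W, b):
    """Map PC vectors to a 2-D embedding with a single tanh layer (assumed architecture)."""
    return np.tanh(x @ W + b)

def triplet_loss(anchor, positive, negative, W, b, margin=1.0):
    """Mean margin-based triplet loss over a batch of triplets.

    All three inputs go through the same W and b, which is what
    "three identical networks that share weights" amounts to.
    """
    ea, ep, en = embed(anchor, W, b), embed(positive, W, b), embed(negative, W, b)
    d_ap = np.sum((ea - ep) ** 2, axis=1)  # squared anchor-positive distance
    d_an = np.sum((ea - en) ** 2, axis=1)  # squared anchor-negative distance
    # Zero loss once d_an exceeds d_ap by at least `margin`.
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))

rng = np.random.default_rng(0)
n_pcs = 50                                    # assumed number of input PCs
W = rng.normal(scale=0.1, size=(n_pcs, 2))    # shared weights: PCs -> 2-D
b = np.zeros(2)
a = rng.normal(size=(8, n_pcs))               # 8 anchor cells
p = a + rng.normal(scale=0.05, size=a.shape)  # positives close to their anchors
n = rng.normal(size=(8, n_pcs))               # unrelated negatives
print(triplet_loss(a, p, n, W, b))
```

When every triplet is degenerate (Positive and Negative identical to the Anchor), both distances are zero and the loss equals the margin, which is a quick sanity check on the formula.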

To learn an integrated embedding that overcomes batch effects, INSCT samples triplets in a batch-aware manner.
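The batch-aware rule described above (Positive: a similar cell from a different batch; Negative: a dissimilar cell from the Anchor's own batch) can be sketched in a few lines of NumPy. This is not INSCT's sampling code: taking the single nearest cross-batch cell as the Positive, treating "dissimilar" as "not among the Anchor's `n_neighbors` closest same-batch cells", and the function name `sample_triplets` are all hypothetical choices, and the all-pairs distance matrix only suits toy data.

```python
import numpy as np

def sample_triplets(pcs, batches, n_neighbors=5, seed=0):
    """Draw one (anchor, positive, negative) index triplet per cell, where possible.

    positive: the closest cell in PC space from a *different* batch
    negative: a random cell from the anchor's *own* batch, excluding its
              n_neighbors closest same-batch cells (a crude "dissimilar" proxy)
    """
    rng = np.random.default_rng(seed)
    batches = np.asarray(batches)
    idx = np.arange(len(pcs))
    # All pairwise squared distances: fine for a toy example, far too costly for millions of cells.
    d = np.sum((pcs[:, None, :] - pcs[None, :, :]) ** 2, axis=-1)
    triplets = []
    for i in idx:
        other = idx[batches != batches[i]]
        same = idx[(batches == batches[i]) & (idx != i)]
        if other.size == 0 or same.size == 0:
            continue
        pos = other[np.argmin(d[i, other])]
        # Exclude the anchor's closest same-batch cells before picking the negative.
        near_same = same[np.argsort(d[i, same])[:n_neighbors]]
        candidates = np.setdiff1d(same, near_same)
        if candidates.size == 0:
            continue
        triplets.append((i, pos, rng.choice(candidates)))
    return np.array(triplets)

# Toy data: 2 cell groups x 2 batches, with an additive shift for batch 1.
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 20)
batch = np.tile([0, 1], 20)
pcs = np.where(group[:, None] == 0, 0.0, 10.0) + rng.normal(scale=0.3, size=(40, 10))
pcs[batch == 1] += 3.0
trip = sample_triplets(pcs, batch)
print(trip[:3])
```

Because the batch shift here is smaller than the distance between groups, each Positive comes from the Anchor's own group even though it always comes from the other batch.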


What does it do?

For example, we simulated scRNA-seq data in which batch effects dominate the embedding.


INSCT, however, learns an integrated embedding in which cells cluster by group rather than by batch.


Check out our interactive tutorials!

The following notebooks can be run in your web browser and let you explore INSCT interactively. We have prepared the following analysis examples:

  1. Simulation dataset
  2. Pancreas dataset

Notebooks to reproduce the analyses described in our preprint can be found in the reproducibility folder.


INSCT depends on the following Python packages. These need to be installed separately:


To install INSCT, follow these instructions:


Install directly from GitHub using pip:

pip install git+

Download the package from GitHub and install it locally:

git clone
cd insct
pip install .


Unsupervised model

Triplets are sampled based on transcriptional similarity. Inputs:

  1. AnnData object with PCs
  2. Batch vector
from insct.tnn import TNN
model = TNN()
# adata: AnnData object with PCs; batch_name names the batch vector
model.fit(X=adata, batch_name='batch')

Supervised model

Triplets are sampled based on both transcriptional similarity and known labels. Inputs:

  1. AnnData object with PCs
  2. Batch vector
  3. Celltype vector
model = TNN()
# celltype_name names the cell-type vector used alongside the batch vector
model.fit(X=adata, batch_name='batch', celltype_name='Celltypes')
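One way to picture how known labels constrain triplet sampling is shown below: the Positive must share the Anchor's cell type (and come from another batch), while the Negative must have a different cell type (and come from the Anchor's batch). This is an illustrative sketch, not INSCT's implementation; the function name `sample_labeled_triplets`, the nearest-neighbor choice of Positive and the toy labels are assumptions.

```python
import numpy as np

def sample_labeled_triplets(pcs, batches, labels, seed=0):
    """Label-aware triplet sampling (illustrative, not INSCT's implementation).

    positive: closest cell in PC space from another batch with the SAME label
    negative: random cell from the anchor's batch with a DIFFERENT label
    """
    rng = np.random.default_rng(seed)
    batches, labels = np.asarray(batches), np.asarray(labels)
    triplets = []
    for i in range(len(pcs)):
        pos_pool = np.flatnonzero((batches != batches[i]) & (labels == labels[i]))
        neg_pool = np.flatnonzero((batches == batches[i]) & (labels != labels[i]))
        if pos_pool.size == 0 or neg_pool.size == 0:
            continue  # this anchor has no valid positive or negative
        # Among label-matched cells, prefer the transcriptionally closest one.
        d = np.sum((pcs[pos_pool] - pcs[i]) ** 2, axis=1)
        triplets.append((i, pos_pool[np.argmin(d)], rng.choice(neg_pool)))
    return np.array(triplets)

rng = np.random.default_rng(2)
labels = np.array(["alpha", "beta"] * 6)  # toy cell-type vector
batches = np.repeat(["b1", "b2"], 6)      # toy batch vector
pcs = rng.normal(size=(12, 5))
trip = sample_labeled_triplets(pcs, batches, labels)
print(trip)
```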

Semi-supervised model

Triplets are sampled based on transcriptional similarity and on the known labels, ignoring the labels selected by the masking vector. Inputs:

  1. AnnData object with PCs
  2. Batch vector
  3. Celltype vector
  4. Masking vector (which labels to ignore)
model = TNN()
# mask_batch sets which labels to ignore; batch_name is a placeholder for your own value
model.fit(X=adata, batch_name='batch', celltype_name='Celltypes', mask_batch=batch_name)
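To illustrate what ignoring some labels can mean for sampling, the sketch below assumes one plausible reading of the masking vector: every cell in a masked batch gets the placeholder label `"unknown"`, label constraints apply only when both cells of a pair are labeled, and otherwise the choice falls back to transcriptional similarity alone. The helpers `mask_labels`, `compatible_positive`, `compatible_negative` and `sample_semisupervised_triplets` are hypothetical and are not part of INSCT's API.

```python
import numpy as np

UNKNOWN = "unknown"  # hypothetical placeholder for a hidden label

def mask_labels(labels, batches, mask_batch):
    """Return a copy of `labels` with every cell of `mask_batch` set to UNKNOWN."""
    masked = np.asarray(labels, dtype=object).copy()
    masked[np.asarray(batches) == mask_batch] = UNKNOWN
    return masked

def compatible_positive(labels, i, j):
    # Two known labels must match; an unknown label never rules a pair out.
    return labels[i] == UNKNOWN or labels[j] == UNKNOWN or labels[i] == labels[j]

def compatible_negative(labels, i, j):
    # Two known labels must differ; an unknown label never rules a pair out.
    return labels[i] == UNKNOWN or labels[j] == UNKNOWN or labels[i] != labels[j]

def sample_semisupervised_triplets(pcs, batches, labels, n_neighbors=3, seed=0):
    rng = np.random.default_rng(seed)
    batches = np.asarray(batches)
    triplets = []
    for i in range(len(pcs)):
        order = np.argsort(np.sum((pcs - pcs[i]) ** 2, axis=1))  # closest cells first
        pos = [j for j in order if batches[j] != batches[i] and compatible_positive(labels, i, j)]
        same = [j for j in order if batches[j] == batches[i] and j != i]
        # Skip the closest same-batch cells so negatives are also dissimilar in PC space.
        neg = [j for j in same[n_neighbors:] if compatible_negative(labels, i, j)]
        if pos and neg:
            triplets.append((i, pos[0], rng.choice(neg)))
    return np.array(triplets)

rng = np.random.default_rng(3)
batches = np.repeat(["b1", "b2"], 8)
labels = mask_labels(np.array(["alpha", "beta"] * 8), batches, mask_batch="b2")
pcs = rng.normal(size=(16, 5))
trip = sample_semisupervised_triplets(pcs, batches, labels)
print(trip)
```

In this toy run, anchors from the masked batch `"b2"` are sampled by similarity only, while anchors from `"b1"` still get Negatives with a different known label.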


Output:

  1. Coordinates for the integrated embedding