GiniClust3

GiniClust3: a fast and memory-efficient tool for rare cell type identification

GiniClust is a clustering method designed specifically for rare cell type detection. It uses the Gini index to identify genes associated with rare cell types without prior knowledge, in contrast to traditional clustering methods, which rely on highly variable genes. A cluster-aware, weighted consensus clustering approach then combines the outcomes of Gini index-based and Fano factor-based clustering to identify both common and rare cell types. This new version, GiniClust3, is substantially faster and uses less memory so that it can handle large datasets; it can now identify rare cell types in datasets of over a million cells. Previous versions are available here: GiniClust (https://github.com/lanjiangboston/GiniClust) and GiniClust2 (https://github.com/dtsoucas/GiniClust2).
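
To make the two gene statistics named above concrete, here is a minimal sketch of the textbook Gini index and Fano factor for a single gene's expression across cells. It is written in plain Python so it runs without extra dependencies. The function names and the toy expression vectors are illustrative assumptions, and GiniClust3 may normalize or fit these values differently internally.

```python
from statistics import mean, pvariance


def gini_index(values):
    """Textbook Gini index of non-negative values (0 = even, near 1 = concentrated)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def fano_factor(values):
    """Variance-to-mean ratio, using the population variance."""
    m = mean(values)
    return pvariance(values) / m if m > 0 else 0.0


# Hypothetical expression of two genes across 100 cells.
rare_marker = [0] * 98 + [50, 50]   # expressed in only 2 cells
housekeeping = [5] * 100            # expressed evenly in every cell

print("rare marker :", gini_index(rare_marker), fano_factor(rare_marker))
print("housekeeping:", gini_index(housekeeping), fano_factor(housekeeping))
```

In this toy case the gene expressed in only two cells has a Gini index of about 0.98, while the evenly expressed gene has a Gini index of 0, which is why a high Gini index is a useful flag for genes concentrated in a small group of cells.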

A schematic overview of the GiniClust3 pipeline

Prerequisites

Scanpy must be installed first; see https://scanpy.readthedocs.io/en/stable/installation.html.

Installation

Install using Anaconda (recommended)

conda install -c rdong giniclust3

OR download the repository from GitHub and install

python setup.py install

OR install using pip

pip install giniclust3
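
After installing, it can help to confirm that the packages are visible to the Python interpreter you intend to use. The following is a small sketch using only the standard library; `check_installed` is a hypothetical helper, and the module names are the import names used in the usage example of this README.

```python
from importlib.util import find_spec


def check_installed(module_names):
    """Map each top-level module name to True if Python can locate it for import."""
    return {name: find_spec(name) is not None for name in module_names}


# Import names taken from the usage example in this README.
for name, found in check_installed(["scanpy", "anndata", "giniclust3"]).items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

A module reported as MISSING usually means it was installed into a different environment than the one running the script.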

Usage and example:

Import associated packages

import scanpy as sc
import numpy as np
import giniclust3 as gc
import anndata

Read single cell file

adataRaw=sc.read_csv("./data/GSM1599495_ES_d0_biorep_techrep1.csv",first_column_names=True)

Filter expression matrix

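###In the example CSV, rows are genes and columns are cells, so at this point Scanpy's "cells" (rows) are genes and its "genes" (columns) are cells
sc.pp.filter_cells(adataRaw,min_genes=3) ###Keeps rows (here: genes) detected in at least 3 columns (cells)
sc.pp.filter_genes(adataRaw,min_cells=200) ###Keeps columns (here: cells) with at least 200 detected rows (genes)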

Format expression matrix

###The example CSV file has cells as columns and genes as rows. Skip this step if the input matrix already has genes as columns and cells as rows
adataSC=anndata.AnnData(X=adataRaw.X.T,obs=adataRaw.var,var=adataRaw.obs)

Normalization

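sc.pp.normalize_per_cell(adataSC, counts_per_cell_after=1e4) ###Scale each cell to 10,000 total counts; newer Scanpy releases deprecate this function in favor of sc.pp.normalize_total(adataSC, target_sum=1e4)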
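
The arithmetic behind this normalization step is simple: each cell's counts are rescaled so that they sum to the same total (here 1e4). The sketch below illustrates that in plain Python on a toy matrix. It is not Scanpy's implementation; `scale_cells_to_total` and the sample counts are assumptions made for illustration.

```python
def scale_cells_to_total(counts, target_sum=1e4):
    """Scale each cell (row) so that its counts add up to target_sum."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        # All-zero cells stay all-zero instead of triggering a division by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([value * scale for value in cell])
    return normalized


# Hypothetical raw counts: 2 cells (rows) x 3 genes (columns).
raw = [[10, 30, 60],
       [1, 1, 2]]
scaled = scale_cells_to_total(raw)
for cell in scaled:
    print(cell, "sum =", sum(cell))
```

After scaling, both cells have the same total, so differences in sequencing depth no longer dominate comparisons between cells.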

Perform GiniIndexClust

gc.gini.calGini(adataSC) ###Calculate Gini Index
adataGini=gc.gini.clusterGini(adataSC,neighbors=3) ###Cluster based on Gini Index

Perform FanoFactorClust

gc.fano.calFano(adataSC) ###Calculate Fano factor
adataFano=gc.fano.clusterFano(adataSC) ###Cluster based on Fano factor

ConsensusClust

consensusCluster={}
consensusCluster['giniCluster']=np.array(adataSC.obs['rare'].values.tolist())
consensusCluster['fanoCluster']=np.array(adataSC.obs['fano'].values.tolist())
gc.consensus.generateMtilde(consensusCluster) ###Generate consensus matrix
gc.consensus.clusterMtilde(consensusCluster) ###Cluster consensus matrix
gc.consensus.projectFinalCluster(consensusCluster) ###Projection to each cell
np.savetxt("final.txt",consensusCluster['finalCluster'], delimiter="\t",fmt='%s')
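
As a rough intuition for what a consensus matrix captures, the sketch below blends two cell-by-cell co-association matrices, one per clustering. It is a deliberate simplification: it uses a single global weight, whereas GiniClust describes a cluster-aware weighting, and none of the function names here are part of the GiniClust3 API.

```python
def co_association(labels):
    """n x n matrix with 1.0 where two cells share a cluster label, else 0.0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def weighted_consensus(gini_labels, fano_labels, gini_weight=0.5):
    """Blend two co-association matrices with one global weight (an assumption)."""
    m_gini = co_association(gini_labels)
    m_fano = co_association(fano_labels)
    n = len(gini_labels)
    return [
        [gini_weight * m_gini[i][j] + (1.0 - gini_weight) * m_fano[i][j]
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical labels for 4 cells from the two clusterings.
gini_labels = ["common", "common", "rare", "rare"]
fano_labels = ["A", "A", "A", "B"]
for row in weighted_consensus(gini_labels, fano_labels):
    print(row)
```

Entries of 1.0 mark pairs of cells that both clusterings place together, 0.0 marks pairs that both separate, and intermediate values mark disagreements that the final clustering step has to resolve.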

UMAP visualization

adataGini.obs['final']=consensusCluster['finalCluster']
adataFano.obs['final']=consensusCluster['finalCluster']
gc.plot.umapGini(adataGini)
gc.plot.umapFano(adataFano)
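
Once the final labels are saved, a quick way to spot candidate rare cell types is to count the cells per cluster. The sketch below assumes that final.txt holds one cluster label per line, which is what np.savetxt writes for a one-dimensional array. The helper name and the demo labels are hypothetical, and the demo writes its own temporary file so that it runs on its own.

```python
import os
import tempfile
from collections import Counter


def cluster_sizes(path):
    """Count how many cells carry each label in a one-label-per-line file."""
    with open(path) as handle:
        labels = [line.strip() for line in handle if line.strip()]
    return Counter(labels)


# Hypothetical labels standing in for a real final.txt.
demo_labels = ["0", "0", "0", "1", "0", "2", "2"]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "final.txt")
    with open(demo_path, "w") as handle:
        handle.write("\n".join(demo_labels) + "\n")
    sizes = cluster_sizes(demo_path)

# Smallest clusters first: these are the candidates for rare cell types.
for label, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"cluster {label}: {size} cells")
```

To use it on real output, call `cluster_sizes("final.txt")` in the directory where the consensus step wrote its result.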

Citation

NA

License

Copyright (C) 2019 YuanLab. See the LICENSE file for license rights and limitations (MIT).
