Skip to content

Geneset based clustering and CytOF Compatibility #510

@biskra

Description

@biskra

I want to start off by mentioning how much I love scanpy. I recommend this package to everyone I know who is doing single cell sequencing. I love how PAGA and UMAP can be integrated together. PAGA makes appreciation of data topology so simple. In addition, it's so easy to do velocity analysis with the scvelo integration. Installing scanpy as well as hdf5/loom compatibility is remarkably easier on python than in R, which gives scanpy users an obvious advantage.

I've learned so much using this package and it has allowed me to display my data in a simple and intuitive way. Truly, from the bottom of my heart, thank you, this package is awesome and I deeply appreciate all the work the scanpy team and Theis lab has put into it.

With that said, I find myself wanting to do things with scanpy that aren't currently supported.

(1) Clustering cells based on a restricted set of genes as opposed to principle components: I understand that this is a biased approach, but I find myself wanting to do this to explore datasets and how genes vary together. I believe it would aid in understanding data topology to reduce the dimensionality of the dataset by looking at a subset of key genes that are picked by an expert who has domain knowledge. I think this is best integrated with a separate sc.pp.neighbors-like function that uses a gene list, as opposed to PCs. Integration into downstream plotting can be done with separate functions that have a similar name (sc.tl.umap_genelist or sc.tl.umap_sparse; sc.tl.paga_genelist or sc.tl.paga_sparse).

I envision that this would also be a nice stepping stone towards multi-modal analysis with CITE-seq.

(2) I want to use PAGA and UMAP to display my CytOF data as well as integrate scRNA-seq/CytOF data together. I saw a pre-print on bioRxiv that indicated that you are working on this aspect of PAGA/scanpy. This was very exciting to me and I am glad that this is being developed. Is it possible to access the current version of this software?

I think that major partitions identified with CytOF or scRNA-seq can be linked together providing a coarse-grained mechanism to demonstrate how heterogeneity identified with each technique relates to each other based on a given experimental time point.

(3) Histogram integration in the plotting api for QC metrics would be helpful. While scatter plots and violin plots are effective, I find myself wanting to make cuts (bounds on percent mitochondrial or nGenes) in my data based off histograms.

Again, thank you so much for the amazing software!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions