Minimum cell cutoff in sc.pl.dotplot #1829

Open
1 of 5 tasks
gokceneraslan opened this issue May 4, 2021 · 2 comments
Labels
Enhancement ✨ good first issue easy first issue to get started in OSS community contribution!

Comments

@gokceneraslan
Collaborator

gokceneraslan commented May 4, 2021

  • Additional function parameters / changed functionality / changed defaults?
  • New analysis tool: A simple analysis tool you have been using and are missing in sc.tools?
  • New plotting function: A kind of plot you would like to see in sc.pl?
  • External tools: Do you know an existing package that should go into sc.external.*?
  • Other?

Especially when we visualize large datasets with multiple categorical variables (e.g. patient, disease, cell type) using sc.pl.dotplot and pass a sequence to the groupby argument (e.g. sc.pl.dotplot(ad, 'genex', groupby=['individual', 'disease_status', 'cell type'])), some rows end up with too few cells. For those rows, summary statistics like the fraction of nonzero expressors or the mean expression are not very robust.

To avoid that, I think it'd be cool to have a minimum observation cutoff in the function, where e.g. min_cells=5 would show only the groupby combinations with at least 5 cells. Without this option, this sort of filtering becomes an annoying pandas exercise (which some might enjoy but possibly not everyone).
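
For reference, the pandas workaround mentioned above might look roughly like the sketch below. It is not part of scanpy: the helper name `min_cells_mask` and the toy `obs` table are made up for illustration, and the column names only mirror the example in this issue.

```python
from typing import Sequence

import pandas as pd


def min_cells_mask(obs: pd.DataFrame, groupby: Sequence[str], min_cells: int = 5) -> pd.Series:
    """Hypothetical helper: True for rows whose groupby combination has >= min_cells rows."""
    keys = list(groupby)
    # transform("size") broadcasts each group's row count back onto its member rows
    sizes = obs.groupby(keys, observed=True)[keys[0]].transform("size")
    return sizes >= min_cells


# Toy stand-in for ad.obs: (P1, T) has 5 cells, (P1, B) has 1, (P2, T) has 2
obs = pd.DataFrame(
    {
        "individual": ["P1"] * 6 + ["P2"] * 2,
        "cell type": ["T"] * 5 + ["B"] + ["T"] * 2,
    }
)
mask = min_cells_mask(obs, ["individual", "cell type"], min_cells=5)
```

With a real AnnData object, the mask would presumably be computed from `ad.obs` and used to subset before plotting, e.g. `sc.pl.dotplot(ad[mask.to_numpy()], 'genex', groupby=[...])`. That still leaves the filtering to the user, which is what a `min_cells` argument would avoid.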

@ivirshup
Member

ivirshup commented May 4, 2021

I wonder if min_cells is too specific, and if there is a more generalizable way to handle this.

Do we give any indication of how many cells are in a group right now? I feel like this would be important for the user to even know the stats could be unreliable.

Misc other thoughts:

  • I think this could make sense to address in the plotting classes
  • Could be cool to be able to pass a grouped anndata to sc.pl.dotplot, something like:
sc.pl.dotplot(
    adata.groupby([...]).select(lambda x: len(x) > 5),
    genes,
    ...
)

@gokceneraslan
Collaborator Author

> Do we give any indication of how many cells are in a group right now? I feel like this would be important for the user to even know the stats could be unreliable.

sc.pl.dotplot(..., return_fig=True).add_totals().show()

is one way to check the cell numbers, but there is no intuitive way to do the filtering.
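
The cell numbers can also be inspected without plotting, directly from the obs table. This is a small pandas sketch, assuming the grouping columns live in `ad.obs`; the helper name `group_sizes` and the toy table are hypothetical.

```python
import pandas as pd


def group_sizes(obs: pd.DataFrame, groupby) -> pd.Series:
    """Hypothetical helper: number of rows (cells) per observed groupby combination."""
    # observed=True skips category combinations that never occur in the data
    return obs.groupby(list(groupby), observed=True).size().sort_values().rename("n_cells")


# Toy stand-in for ad.obs
obs = pd.DataFrame(
    {
        "individual": ["P1"] * 6 + ["P2"] * 2,
        "cell type": ["T"] * 5 + ["B"] + ["T"] * 2,
    }
)
sizes = group_sizes(obs, ["individual", "cell type"])
small_groups = sizes[sizes < 5]  # combinations a min_cells=5 cutoff would hide
```

Sorting ascending puts the smallest groups first, so the rows with unreliable statistics are the ones at the top.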
