dotplot with x axis being one variable and y axis being another variable #1876
Comments
That's an interesting idea and I see how it would be useful. I don't think it's going to be easy to implement, since I believe our code is heavily based around having groups of observations on one axis, groups of variables on the other. Definitely something to keep in mind for a refactor though.
@zhangguy, adding on to some thoughts from your PR #2055 (comment)

From my reading of that PR, you added a boolean argument, `groupby_expand`:

    pbmc = sc.datasets.pbmc3k_processed().raw.to_adata()
    # Integer division: np.repeat needs an integer repeat count
    pbmc.obs["sampleid"] = np.repeat(["s1", "s2"], pbmc.n_obs // 2)
    sc.pl.dotplot(pbmc, var_names='LDHB', groupby=['louvain', 'sampleid'], groupby_expand=True)

Instead of having an argument which changes the interpretation of the earlier arguments, I would prefer more orthogonal arguments. I think you'd be able to get an output close to what you would currently like with:

    import numpy as np
    import pandas as pd
    import scanpy as sc

    pbmc = sc.datasets.pbmc3k_processed().raw.to_adata()
    # Integer division: np.repeat needs an integer repeat count
    pbmc.obs["sampleid"] = np.repeat(["s1", "s2"], pbmc.n_obs // 2)

    # One row per cell: expression of LDHB plus the two grouping variables
    df = sc.get.obs_df(pbmc, ["LDHB", "louvain", "sampleid"])
    summarized = df.pivot_table(
        index=["louvain", "sampleid"],
        values="LDHB",
        aggfunc=[np.mean, np.count_nonzero]
    )
    # Move sampleid from the row index to the columns
    color_df = summarized["mean"].unstack()
    size_df = summarized["count_nonzero"].unstack()

    # I don't think the var_names or groupby variables are actually important here
    sc.pl.DotPlot(
        pbmc,
        var_names="LDHB", groupby=["louvain", "sampleid"],  # Just here so it doesn't error
        dot_color_df=color_df, dot_size_df=size_df,
    ).style(cmap="Reds").show()

I think this functionality could be more generic, and inspired by the [...]

    # Imaginary implementation:
    sc.pl.heatmap(
        pbmc,
        var_names="LDHB",
        row_groups="louvain",
        col_groups="sampleid"
    )

    sc.pl.heatmap(
        pbmc,
        var_names=["LDHB", "LYZ", "CD79A"],
        row_groups="louvain",
        col_groups="sampleid"
    )

What do you think about that?
Thanks @ivirshup! I like the lines you suggested; perhaps I can adopt them to make creating color_df/size_df more elegant.

This is the output:

[Image: resulting dotplot]

But I can certainly change groupby_expand from a bool to an actual variable.

Thanks
That could be up to you. Which makes more sense depends on what the user is trying to achieve. For instance, I'm not sure if it makes sense to allow splitting the columns by both variables and groups, or if that's the wrong abstraction.

This code could definitely be a lot simpler. Would definitely appreciate help here! I think some of the concepts used in [...]

Maybe a good first step would be to fix how [...] so the dotplot looks right when the user provides the dot size and dot color dataframes? That would make these plots possible, and would give an interface for trying later approaches.
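For anyone who wants to see what such a plot could look like in the meantime, here is a minimal sketch that draws a dot grid from a color dataframe and a size dataframe with plain matplotlib. It is not scanpy's implementation: `dot_grid` is a hypothetical helper, the numbers are made up, and the sizes are assumed to be fractions in [0, 1].

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def dot_grid(color_df, size_df, max_dot_size=300, cmap="Reds"):
    """Draw one dot per (row, column) cell. Hypothetical helper, not a scanpy API."""
    if not (color_df.index.equals(size_df.index)
            and color_df.columns.equals(size_df.columns)):
        raise ValueError("color_df and size_df must share index and columns")
    n_rows, n_cols = color_df.shape
    # Cell coordinates: column position on x, row position on y
    xs, ys = np.meshgrid(np.arange(n_cols), np.arange(n_rows))
    fig, ax = plt.subplots(figsize=(1 + n_cols, 1 + 0.5 * n_rows))
    points = ax.scatter(
        xs.ravel(), ys.ravel(),
        c=color_df.to_numpy().ravel(),
        s=size_df.to_numpy().ravel() * max_dot_size,  # sizes assumed in [0, 1]
        cmap=cmap, edgecolors="black", linewidths=0.5,
    )
    ax.set_xticks(np.arange(n_cols))
    ax.set_xticklabels(color_df.columns)
    ax.set_yticks(np.arange(n_rows))
    ax.set_yticklabels(color_df.index)
    ax.set_xlim(-0.5, n_cols - 0.5)
    ax.set_ylim(n_rows - 0.5, -0.5)  # first row at the top
    fig.colorbar(points, ax=ax, label="mean expression")
    return fig, ax


# Made-up summary values, for illustration only
color_df = pd.DataFrame(
    [[1.2, 0.8], [0.3, 0.5], [2.0, 1.7]],
    index=["B cells", "T cells", "NK cells"], columns=["s1", "s2"],
)
size_df = pd.DataFrame(
    [[0.9, 0.7], [0.2, 0.4], [1.0, 0.95]],
    index=color_df.index, columns=color_df.columns,
)
fig, ax = dot_grid(color_df, size_df)
```

Calling `fig.savefig(...)` on the returned figure writes it to disk. Keeping the rows and columns as plain dataframe index and columns means either axis can hold groups of observations, which is the layout being asked for here.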
Hi @ivirshup
For the heatmap, I think you were referring to
Hi,
I'm wondering whether a new feature could be added to sc.pl.dotplot, if it is not too much work. Say I'm interested in just one gene, and I want to plot its expression across two conditions. I understand that this can currently be achieved with groupby = ['var1', 'var2'], but the result is a single column, and the conditions are coerced into var1_var2. Is it possible to add a feature to the plotting function to change this behavior? I want var1 on the x axis and var2 on the y axis.
Thank you very much!
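To make the request concrete, here is a rough sketch of the two-way summary such a plot would need, using pandas only. The data is synthetic (a stand-in for a per-cell table with one gene and two grouping columns), and the column names `LDHB`, `louvain` and `sampleid` are just examples, not anything the plotting function requires.

```python
import numpy as np
import pandas as pd

# Synthetic per-cell table; values are made up for illustration
rng = np.random.default_rng(0)
n_cells = 200
df = pd.DataFrame({
    "LDHB": rng.poisson(1.0, size=n_cells).astype(float),
    "louvain": pd.Categorical(
        rng.choice(["B cells", "T cells", "NK cells"], size=n_cells)
    ),
    "sampleid": pd.Categorical(np.repeat(["s1", "s2"], n_cells // 2)),
})

grouped = df.groupby(["louvain", "sampleid"], observed=True)["LDHB"]

# Rows: louvain (y axis); columns: sampleid (x axis)
mean_expr = grouped.mean().unstack("sampleid")
# Fraction of cells in each group with non-zero expression
frac_expr = grouped.agg(lambda x: (x > 0).mean()).unstack("sampleid")

print(mean_expr.round(2))
print(frac_expr.round(2))
```

Each table has one variable on the rows and the other on the columns, rather than a single combined var1_var2 category. The mean could drive dot color and the fraction could drive dot size.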