dotplot with x axis being one variable and y axis being another variable #1876

zhangguy · 2021-06-15T19:41:35Z

Hi,
I'm wondering whether it would be possible to add a new feature to sc.pl.dotplot, if it is not too much work. Say I'm interested in just one gene, and I want to plot its expression across two conditions. I understand that this can currently be achieved with groupby = ['var1', 'var2'], but the plot will then have only one column, and the conditions will be coerced into combined var1_var2 labels. Is it possible to add an option to the plotting function that changes this behavior? I want var1 to be the x axis and var2 to be the y axis.

Thank you very much!
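
To make the request concrete, here is a minimal sketch on toy data using plain pandas rather than scanpy. The column names var1, var2 and gene are placeholders, and the "combined label" step only approximates what the plotting code does internally:

```python
import pandas as pd

# Toy per-cell table; "var1" and "var2" stand in for any two obs columns.
obs = pd.DataFrame({
    "var1": ["a", "a", "b", "b", "a", "b"],
    "var2": ["x", "y", "x", "y", "x", "y"],
    "gene": [1.0, 0.0, 2.0, 4.0, 3.0, 0.0],
})

# Current behaviour, roughly: both grouping columns collapse into one label
# per cell, so every var1/var2 combination becomes its own row of the plot.
combined = obs["var1"] + "_" + obs["var2"]
one_axis = obs.groupby(combined)["gene"].mean()

# Requested layout: var1 along the x axis (columns), var2 along the y axis (rows).
two_axes = obs.pivot_table(index="var2", columns="var1", values="gene", aggfunc="mean")

print(one_axis)
print(two_axes)
```

The first result has four entries (a_x, a_y, b_x, b_y) on a single axis, while the second arranges the same means in a 2 x 2 grid.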

ivirshup · 2021-06-18T06:20:09Z

That's an interesting idea and I see how it would be useful. I don't think it's going to be easy to implement, since I believe our code is heavily based around having groups of observations on one axis and groups of variables on the other.

Definitely something to keep in mind for a refactor though.

ivirshup · 2021-12-06T18:37:28Z

@zhangguy, adding on to some thoughts from your PR #2055 (comment)

From my reading of that PR, you added a boolean argument groupby_expand which, when True, assumes groupby has two values: a grouping variable for the rows of the plot and a grouping variable for the columns of the plot. It also assumes var_names is a single variable, which is used to fill the cells of the plot. As an example:

import numpy as np
import scanpy as sc

pbmc = sc.datasets.pbmc3k_processed().raw.to_adata()
pbmc.obs["sampleid"] = np.repeat(["s1", "s2"], pbmc.n_obs // 2)

# groupby_expand is the argument proposed in PR #2055
sc.pl.dotplot(pbmc, var_names='LDHB', groupby=['louvain', 'sampleid'], groupby_expand=True)

Instead of having an argument which changes the interpretation of the earlier arguments, I would prefer more orthogonal arguments.

I think you'd be able to get an output close to what you would currently like with:

import scanpy as sc, pandas as pd, numpy as np

pbmc = sc.datasets.pbmc3k_processed().raw.to_adata()
pbmc.obs["sampleid"] = np.repeat(["s1", "s2"], pbmc.n_obs // 2)
df = sc.get.obs_df(pbmc, ["LDHB", "louvain", "sampleid"])

summarized = df.pivot_table(
    index=["louvain", "sampleid"],
    values="LDHB",
    aggfunc=[np.mean, np.count_nonzero]
)
color_df = summarized["mean"].unstack()
size_df = summarized["count_nonzero"].unstack()

# I don't think the var_names or groupby variables are actually important here
sc.pl.DotPlot(
    pbmc,
    var_names="LDHB",  groupby=["louvain", "sampleid"],  # Just here so it doesn't error
    dot_color_df=color_df, dot_size_df=size_df,
).style(cmap="Reds").show()
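
For readers without the dataset at hand, the pivot step above can be tried on synthetic data. The sketch below is an illustration rather than part of the original suggestion: the toy table stands in for the output of sc.get.obs_df, and it assumes dot sizes should be the fraction of cells with non-zero expression (as in the default dotplot) rather than the raw count that np.count_nonzero returns.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for sc.get.obs_df(pbmc, ["LDHB", "louvain", "sampleid"]).
rng = np.random.default_rng(0)
n_cells = 200
df = pd.DataFrame({
    "LDHB": rng.poisson(1.0, n_cells).astype(float),
    "louvain": pd.Categorical(rng.choice(["B cells", "T cells", "NK cells"], n_cells)),
    "sampleid": np.repeat(["s1", "s2"], n_cells // 2),
})

# Group by both variables, then move "sampleid" to the columns so that
# rows are clusters and columns are samples.
grouped = df.groupby(["louvain", "sampleid"], observed=True)["LDHB"]
color_df = grouped.mean().unstack("sampleid")                         # mean expression
size_df = grouped.agg(lambda x: (x > 0).mean()).unstack("sampleid")   # fraction expressing

print(color_df.round(2))
print(size_df.round(2))
```

Both frames share the same index and columns, which is what a call passing dot_color_df and dot_size_df together would presumably need.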

I think this functionality could be more generic, inspired by the pd.pivot_table function. This could end up looking like:

# Imaginary implementation:
sc.pl.heatmap(
    pbmc,
    var_names="LDHB",
    row_groups="louvain",
    col_groups="sampleid"
)

sc.pl.heatmap(
    pbmc,
    var_names=["LDHB", "LYZ", "CD79A"],
    row_groups="louvain",
    col_groups="sampleid"
)
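
As a rough illustration of what such an interface might compute before drawing anything, here is a sketch built on pd.pivot_table. The helper summarize_by_groups is hypothetical (it is not a scanpy function), and the toy table stands in for per-cell expression joined with obs annotations:

```python
import pandas as pd

def summarize_by_groups(df, var_names, row_groups, col_groups):
    """Hypothetical helper: mean of each variable per (row group, column group)."""
    if isinstance(var_names, str):
        var_names = [var_names]
    return df.pivot_table(
        index=row_groups,
        columns=col_groups,
        values=var_names,
        aggfunc="mean",
    )

cells = pd.DataFrame({
    "louvain": ["B", "B", "T", "T", "B", "T"],
    "sampleid": ["s1", "s2", "s1", "s2", "s1", "s2"],
    "LDHB": [1.0, 2.0, 0.0, 4.0, 3.0, 2.0],
    "LYZ": [0.0, 0.0, 5.0, 1.0, 2.0, 3.0],
})

one_gene = summarize_by_groups(cells, "LDHB", "louvain", "sampleid")
two_genes = summarize_by_groups(cells, ["LDHB", "LYZ"], "louvain", "sampleid")

print(one_gene)
print(two_genes)
```

With several variables, the columns become (variable, column group) pairs, so one open design question is whether the plot should nest samples within genes or genes within samples.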

What do you think about that?

zhangguy · 2021-12-07T15:42:40Z

Thanks @ivirshup !

I like these lines you suggested. Perhaps I can adopt them to make the creation of color_df/size_df more elegant:

import scanpy as sc, pandas as pd, numpy as np

pbmc = sc.datasets.pbmc3k_processed().raw.to_adata()
pbmc.obs["sampleid"] = np.repeat(["s1", "s2"], pbmc.n_obs // 2)
df = sc.get.obs_df(pbmc, ["LDHB", "louvain", "sampleid"])

summarized = df.pivot_table(
    index=["louvain", "sampleid"],
    values="LDHB",
    aggfunc=[np.mean, np.count_nonzero]
)
color_df = summarized["mean"].unstack()
size_df = summarized["count_nonzero"].unstack()

# I don't think the var_names or groupby variables are actually important here
sc.pl.DotPlot(
    pbmc,
    var_names="LDHB",  groupby=["louvain", "sampleid"],  # Just here so it doesn't error
    dot_color_df=color_df, dot_size_df=size_df,
).style(cmap="Reds").show()

This is the output:

Some work is needed to adjust the grid/axis size, legend, and scale. This is actually the reason I built on top of the _dotplot and _baseplot functions/classes to implement the solution: to give the plots the same style as the scanpy dotplot without doing too much work on the cosmetics.

But I can certainly change groupby_expand from a bool to an actual variable, group_cols, as you suggested in #2055. Or should we call it col_groups as you did in your sc.pl.heatmap pseudo code?
I'd be more than happy to make it more generalized, i.e., to sc.pl.heatmap, but I may need some time to understand sc.pl.heatmap first. The plotting functions are getting really complex- it took me some time to understand _dotplot and _baseplot :)

Thanks

ivirshup · 2021-12-08T16:13:53Z

> Or should we call it col_groups as you did in your sc.pl.heatmap pseudo code?

That's up to you. Which name makes more sense depends on what the user is trying to achieve. For instance, I'm not sure whether it makes sense to allow splitting the columns by both variables and groups, or whether that's the wrong abstraction.

> I'd be more than happy to make it more generalized, i.e., to sc.pl.heatmap, but I may need some time to understand sc.pl.heatmap first. The plotting functions are getting really complex- it took me some time to understand _dotplot and _baseplot :)

This code could definitely be a lot simpler, and I would definitely appreciate help here! I think some of the concepts used in seaborn could be quite useful, though it looks like they're under heavy refactoring at the moment (relevant seaborn branch).

Maybe a good first step would be to fix things so that the dotplot looks right when the user provides the dot size and dot color dataframes? That would make these plots possible, and would give an interface for trying later approaches.

zhangguy · 2021-12-23T01:00:56Z

Hi @ivirshup
I made some updates to PR #2055 . The column grouping argument was changed to a string/list argument 'col_groups'.
A few examples:

import numpy as np
import scanpy as sc

pbmc = sc.datasets.pbmc3k_processed().raw.to_adata()
pbmc.obs["sampleid"] = np.repeat(["s1", "s2"], pbmc.n_obs // 2)
pbmc.obs["condition"] = np.tile(["c1", "c2"], pbmc.n_obs // 2)

## plot one gene, one column grouping variable
sc.pl.dotplot(pbmc, var_names='C1QA', groupby='louvain', col_groups='sampleid')

## plot two genes, one column grouping variable
sc.pl.dotplot(pbmc, var_names=['C1QA', 'CD19'], groupby='louvain', col_groups='sampleid')

## plot two genes, two column grouping variables
sc.pl.dotplot(pbmc, var_names=['C1QA', 'CD19'], groupby='louvain', col_groups=['sampleid', 'condition'])

## or we could use the same variables as the y axis
sc.pl.dotplot(pbmc, var_names=['C1QA', 'CD19'], groupby=['sampleid', 'condition'], col_groups='louvain')

For the heatmap, I think you were referring to sc.pl.matrixplot. sc.pl.heatmap is a different function, which plots each cell as a row and each gene as a column. col_groups was also added to sc.pl.matrixplot:

## plot two genes, two column grouping variables
sc.pl.matrixplot(pbmc, var_names=['C1QA', 'CD19'], groupby='louvain', col_groups=['sampleid', 'condition'])

The row_groups you proposed in your hypothetical sc.pl.heatmap implementation is equivalent to the current groupby argument in sc.pl.dotplot/sc.pl.matrixplot. I think it might be good to keep it as is for now. For this kind of change, it would be better to do a coordinated update across all plotting functions, because I see quite a few functions use the groupby argument.

zhangguy added the Enhancement ✨ label Jun 15, 2021

ivirshup added the Area - Plotting 🌺 label Jun 18, 2021

zhangguy mentioned this issue Nov 20, 2021: allowing dotplot to use two variables in groupby as x and y axis #2055 (Open)

ivirshup mentioned this issue Jan 19, 2022: Domino plot/cell information in dotplot #2107 (Open, 5 tasks)

yugeji mentioned this issue Nov 28, 2023: obs. vs. obs for all grouped plots using BasePlot #2769 (Open, 6 tasks)
