BUG/VIS: groupby.hist/plot() should pass group keys as labels #6279

Closed
TomAugspurger opened this issue Feb 6, 2014 · 12 comments · Fixed by #33493

Comments

@TomAugspurger
Contributor

At least wherever possible.

In [2]: df = pd.DataFrame(np.random.randn(30, 2), columns=['A', 'B'])

In [3]: df['C'] = 15 * ['a'] + 15 * ['b']

In [4]: ax = df.groupby('C')['A'].hist()

[image: hist_leg]
Ideally the group keys would be the legend for the plot.
I can take this one.

@jreback jreback added this to the 0.14.0 milestone Feb 6, 2014
@TomAugspurger
Contributor Author

I'm still working on this. The interface between the groupby and plotting methods is a bit messy.

Interestingly enough, df.groupby('C')['A'].hist() and df['A'].hist(by=df['C']) follow two completely different code paths, and produce different results.

>>> df.groupby('C')['A'].hist()
C
a    Axes(0.552174,0.15;0.347826x0.75)
b    Axes(0.552174,0.15;0.347826x0.75)
dtype: object

[image: group_then_hist]

and

>>> df['A'].hist(by=df['C'])
array([<matplotlib.axes.AxesSubplot object at 0x1111bba10>,
       <matplotlib.axes.AxesSubplot object at 0x1111ef710>], dtype=object)

[image: hist_then_group]
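The difference between the two call paths can be checked by looking at what each one returns. This is only a minimal sketch, assuming a reasonably recent pandas with matplotlib installed; the random seed and the headless `Agg` backend are choices made here for illustration, and the exact return types may differ between pandas versions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=["A", "B"])
df["C"] = 15 * ["a"] + 15 * ["b"]

# Path 1: group first, then call hist() on each group.
grouped_axes = df.groupby("C")["A"].hist()

# Path 2: call hist() once and let it split the data with `by`.
by_axes = df["A"].hist(by=df["C"])

print(type(grouped_axes))
print(type(by_axes))
```

In the output quoted above, the first call gives a Series indexed by group key and the second gives an array of axes, one per group.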

@fonnesbeck

Is there currently a hack to get a legend for plots like this (i.e. the first two plots where histograms are on the same axis)? At present I have no way of knowing which histogram belongs to which series.

@TomAugspurger
Contributor Author

Sorry, haven't gotten around to fixing this. My current workaround is to do the groupby and then iterate over the groups:

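import matplotlib.pyplot as plt

groups = df.groupby("age_bin")['Impressions']
fig, ax = plt.subplots()

# Draw every group on the same axes, using the group key as the label
for k, v in groups:
    v.hist(label=k, alpha=.75, ax=ax)

ax.legend()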

That will give you something like

[image: hist]
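For anyone who wants to try the loop without the `age_bin` / `Impressions` data, here is a self-contained version of the same workaround applied to the example frame from the top of this issue. It is a sketch, not a fix: the seed and the `Agg` backend are assumptions added for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=["A", "B"])
df["C"] = 15 * ["a"] + 15 * ["b"]

fig, ax = plt.subplots()

# Iterating a SeriesGroupBy yields (group key, Series) pairs.
for key, values in df.groupby("C")["A"]:
    # `label` is forwarded to matplotlib, so the legend can pick it up.
    values.hist(label=key, alpha=0.75, ax=ax)

ax.legend()
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
```

Passing the same `ax` to every call is what keeps all histograms on one plot.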

@jreback
Contributor

jreback commented Apr 6, 2014

@TomAugspurger fixable for 0.14? push? wont-fix?

@TomAugspurger TomAugspurger modified the milestones: 0.15.0, 0.14.0 Apr 28, 2014
@TomAugspurger
Contributor Author

@jreback pushing

@jreback
Contributor

jreback commented Apr 28, 2014

ok......(of course if you do fix by release time, then can pull forward)

@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jorisvandenbossche
Member

@TomAugspurger this is actually also true for Groupby.plot(), which does not show a legend either

@jorisvandenbossche jorisvandenbossche changed the title BUG/VIS: groupby.hist() should pass group keys as labels BUG/VIS: groupby.hist/plot() should pass group keys as labels Mar 12, 2015
@mattayes
Contributor

Is this issue abandoned?

@TomAugspurger
Contributor Author

Still open. PRs welcome if it's something you'd use.

@Jeitan

Jeitan commented Sep 9, 2017

This is something I'd really like to see ... I took a look at the code, but I think the various pathways are way too confounding for me to tackle.

However, I did embark on a journey of exploration and compiled all the possible ways to do this and how they behave out of the box, plus another workaround that turns out to function nicely but isn't necessarily obvious. It might be useful for anybody who wants to tackle this issue. It's in a spreadsheet that is accessible (I think) here.

The workaround involves pivot: df.pivot(values='A', columns='C').plot.hist(stacked=True).
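A minimal sketch of the pivot workaround on the example frame from the top of this issue, assuming a recent pandas with matplotlib installed (the seed, the `wide` name and the `Agg` backend are additions made here for illustration). Pivoting turns each group into its own column, so the column names end up as legend entries.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=["A", "B"])
df["C"] = 15 * ["a"] + 15 * ["b"]

# One column per category in 'C'; rows from the other category become NaN.
wide = df.pivot(values="A", columns="C")

# Each column is drawn as its own histogram and labelled by column name.
ax = wide.plot.hist(stacked=True)
labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(labels)
```

This relies on the index being unique, which holds for the default integer index used here.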

@judimaci

Is there a way to plot these histograms on a 3rd axis as a 3D plot or on subplots to have a better comparison of each?

@Jeitan

Jeitan commented Oct 25, 2018

@judimaci I haven't looked at this again since last year, but at that time at least it was actually quite easy to make subplots, just not intuitive how to do it. If you want to plot values of a column 'A' grouped by categories in column 'C', something along the lines of df.pivot(values='A', columns='C').plot.hist(subplots=True) should work.
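Spelled out on the example frame from the top of this issue, the subplot variant might look like the sketch below. It assumes a recent pandas with matplotlib installed; the seed, the `wide` name and the `Agg` backend are additions made here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
df = pd.DataFrame(rng.standard_normal((30, 2)), columns=["A", "B"])
df["C"] = 15 * ["a"] + 15 * ["b"]

# One column per category in 'C', as in the stacked variant.
wide = df.pivot(values="A", columns="C")

# subplots=True draws each column on its own axes instead of a shared one.
axes = wide.plot.hist(subplots=True)
print(len(axes))
```

With `subplots=True` the call returns one axes object per column, so each category can be compared side by side.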
