Stacked bar graph: Bar segments are sorted in opposite order of legend. #6014

Closed
csaid opened this issue Jan 20, 2014 · 9 comments
Comments

@csaid

csaid commented Jan 20, 2014

import pandas as pd

# The legend is sorted 'A, B, C', but the bar segments are sorted 'C, B, A'.
d = {'A': [2, 3], 'B': [4, 5], 'C': [6, 7]}
df = pd.DataFrame(data=d)
df.plot(kind='bar', stacked=True)

Thanks, and here's my full version string:

INSTALLED VERSIONS

Python: 2.7.6.final.0
OS: Darwin
Release: 13.0.0
Processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.13.0
Cython: Not installed
Numpy: 1.8.0
Scipy: 0.13.1
statsmodels: Not installed
patsy: Not installed
scikits.timeseries: Not installed
dateutil: 2.2
pytz: 2013.9
bottleneck: Not installed
PyTables: Not Installed
numexpr: Not Installed
matplotlib: 1.3.1
openpyxl: Not installed
xlrd: Not installed
xlwt: Not installed
xlsxwriter: Not installed
sqlalchemy: Not installed
lxml: Not installed
bs4: Not installed
html5lib: Not installed
bigquery: Not installed
apiclient: Not installed

@bjornarneson
Contributor

This seems like the expected behavior. A stacked column or bar chart should plot the first variable nearest to the intersection of the X & Y axes, right?

@csaid
Author

csaid commented Jan 21, 2014

[screenshot, 2014-01-21 1:14:30 pm]
From top to bottom the sequence in the legend is ABC, but the sequence in the bar segments is CBA. I don't have a strong opinion about which sequence is correct, but I definitely think there should be consistency between the sequence in the bars and the sequence in the legend.

@bjornarneson
Contributor

I appreciate what you are saying, but if you think about the order of the segments & legend as 'first to last' rather than 'top to bottom', the current behavior makes sense. The first value is always the one nearest to the x-axis.

Imagine if the values of a, b, and c were all negative. 'Top to bottom' takes on a different meaning on the other side of the x-axis...or if the dataset was rendered as a horizontal bar chart.

[image]

@csaid
Author

csaid commented Jan 21, 2014

Thanks for the comment, although I totally disagree :)

Stacked bar graphs are hardly ever negative. In the typical case they are positive, and it is a lot easier for people to match segments to labels if they follow the same order in the 'top to bottom' sense (i.e., the visually intuitive sense), not the 'first to last' sense.

I'm new to issue reporting on GitHub and so I don't know the etiquette for these things, but I'm curious if we can get someone else's opinion on this.

@ghost

ghost commented Jan 27, 2014

Let's try a constructional proof:

  • The quantities to be displayed have a well-defined ordering.
  • Reading from top to bottom is quite popular.
  • We must stack things in bottom-up order. I learned this the hard way
    playing "hanoi towers" as a small child. It was tough going there for a while.
  • The reading order of items in the legend should conform to the ordering on the quantities.
  • The stacking order should conform to the ordering on the quantities.
  • Since the legend is ordered top to bottom and the bars are stacked bottom to top,
    they express the same ordering on the quantities in the plot as shown, and the plot
    is therefore consistent. ∎

That's not convincing in the least, but it shows the behavior makes sense for some
definition of sense.

Show me a major piece of related software that does this differently: ggplot2,
excel, tableau or one of the big stats packages and we'll seriously consider changing it.

Demonstrate that there's in fact a consensus among most of those and this ain't it, and
we will change it.

The etiquette on GH issues is to argue your point well while remaining polite. Amuse if you're able.

@csaid
Author

csaid commented Jan 27, 2014

Hi all,
I probably shouldn't have used 'A, B, C' as my example because those already have a well-defined ordering. Where this really matters is when you have categorical variables without a well-defined ordering. Here is the default behavior in pandas. I think this is hard to read.
[screenshot, 2014-01-26 11:17:33 pm]

Here is the default behavior in Excel, which I think is much easier to read. I also like how Excel preserves the order of the categories rather than sorting alphabetically.
[screenshot, 2014-01-26 11:06:07 pm]

I personally think the Excel-style visually matching ordering should be the default, but if that's not possible, do you think there could be a simple "reverse labels" option in the plot method, or some other simple way to accomplish this? Thanks.

P.S. I don't have Tableau, but I did a Google image search, and it looks like the segments and labels are sorted in the same way visually, as in Excel.
https://www.google.com/search?q=tableau+stacked+bar+graph&tbm=isch
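
For readers looking for the "some other simple way" asked about above: a minimal sketch, assuming only standard matplotlib calls (`Axes.get_legend_handles_labels` and `Axes.legend`), that reverses the legend entries after pandas has drawn the plot. It is a workaround on the caller's side, not the change made in pandas itself, and the `Agg` backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so this runs anywhere
import pandas as pd

d = {'A': [2, 3], 'B': [4, 5], 'C': [6, 7]}
df = pd.DataFrame(data=d)

# df.plot returns the matplotlib Axes the bars were drawn on.
ax = df.plot(kind='bar', stacked=True)

# Fetch the legend entries in their current order, then rebuild the
# legend with both lists reversed so it reads in the opposite direction.
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[::-1], labels[::-1])

# Collect the legend text as displayed, top to bottom.
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```

Whether the reversed legend then matches the bar segments top to bottom depends on the pandas version in use, so it is worth checking the rendered figure rather than assuming.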

@ghost

ghost commented Jan 27, 2014

I've opened a PR for 0.14.0; we'll see how it goes. The plotting part of pandas is really fragile.
Also, I suggest you make use of pd.options.display.mpl_style='default'; that color scheme
is really tragic.

@csaid
Author

csaid commented Jan 27, 2014

Thanks y-p, much appreciated.

@ghost

ghost commented Feb 7, 2014

I just merged #6118.
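
A hedged note for later readers: current pandas documents `legend='reverse'` as an accepted value for the `legend` argument of the plot method. This thread does not say whether that option is what the merged PR introduced, so treat the sketch below as an illustration of the documented option rather than a description of that PR. The helper `legend_texts` is a hypothetical name used only here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so this runs anywhere
import pandas as pd

df = pd.DataFrame({'A': [2, 3], 'B': [4, 5], 'C': [6, 7]})


def legend_texts(ax):
    """Return the legend labels of an Axes in display order (hypothetical helper)."""
    return [t.get_text() for t in ax.get_legend().get_texts()]


# Each call without ax= draws on a fresh figure, so the two legends are independent.
default_order = legend_texts(df.plot(kind='bar', stacked=True))
reversed_order = legend_texts(df.plot(kind='bar', stacked=True, legend='reverse'))
```

Comparing `default_order` with `reversed_order` on your own installation shows which of the two matches the stacking order of the segments.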

@ghost ghost closed this as completed Feb 7, 2014