new behavior regarding Series.plot.bar for logy=True in 0.21 #18394

Closed
pcluo opened this issue Nov 20, 2017 · 13 comments
Labels
good first issue Regression Functionality that used to work in a prior pandas version Testing pandas testing functions or related to the test suite Visualization plotting

Comments

@pcluo
Contributor

pcluo commented Nov 20, 2017

Code Sample

from pandas import Series
Series([10**x for x in range(5)]).plot.bar()
Series([10**x for x in range(5)]).plot.bar(logy=True)

Problem description

The behavior changed from the previous version, 0.20. In that version, Series.plot.bar used one color for all bars. In the new version, it uses different colors for the bars even for a single series. When the logy argument is used, the bars cannot be seen; the fill color seems to be the same as the background color.

image
image

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.21.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.7.0
xarray: None
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: 0.4.0
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: 0.7.11.None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0

@PMeira

PMeira commented Jan 16, 2018

Bumped into the same issue. You can pass a bottom argument to fix it, e.g.

Series([10**x for x in range(5)]).plot.bar(logy=True, bottom=1)

@Deborah-Digges

@PMeira Why was the behavior of Series.plot.bar changed to plot the bars in different colors? Visually, these colors add nothing and only clutter the chart. Different colors should be used only when they correspond to differences of meaning in the data.

Why is the default behavior to provide an unnecessarily visually overwhelming graph? It took me some time to realize why my bars suddenly started acting strangely. Now, I pass color='cornflowerblue' to get all my bars the same pleasant hue.

Reference for visual appeal and the use of colors: http://www.perceptualedge.com/articles/visual_business_intelligence/rules_for_using_color.pdf
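The single-color workaround mentioned above can be checked programmatically. The following is a minimal sketch, assuming a headless matplotlib backend (Agg) and a reasonably recent pandas and matplotlib; the variable names are illustrative and not taken from the thread.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is required
import matplotlib.pyplot as plt
import pandas as pd

fig, ax = plt.subplots()
# Pass an explicit color so every bar of the single Series shares one hue.
pd.Series([10**x for x in range(5)]).plot.bar(ax=ax, color="cornflowerblue")

# Collect the distinct RGBA fill colors of the bar rectangles.
face_colors = {patch.get_facecolor() for patch in ax.patches}
n_bars = len(ax.patches)
plt.close(fig)

print(n_bars, "bars,", len(face_colors), "distinct fill color(s)")
```

If the explicit color is honored, the set of fill colors should contain a single entry.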

@PMeira

PMeira commented Apr 2, 2018

@Deborah-Digges I completely agree, I don't quite understand the change either. Maybe a dev can enlighten us.

@Deborah-Digges

Thanks @PMeira. Apologies, I assumed you were a dev!

@TomAugspurger TomAugspurger added the Visualization plotting label Apr 2, 2018
@TomAugspurger TomAugspurger added this to the Next Major Release milestone Apr 2, 2018
@TomAugspurger TomAugspurger added the Regression Functionality that used to work in a prior pandas version label Apr 2, 2018
@TomAugspurger
Contributor

Sorry about missing this when the issue was opened.

Is anyone interested in debugging this to see where things break? We're doing a release in 2-3 weeks, would be nice to get this fixed.

@TomAugspurger
Contributor

Actually, the log issue seems to be fixed on master.

gh

Tests confirming that this is fixed would be welcome.

@Deborah-Digges

@TomAugspurger The issue I raised is really different from this one, even though it was just closed as a duplicate. I raised it separately because it is more a question of visual encoding than of broken functionality. There's probably a reason the bars were made differently colored after version 0.20, and that's something I want to understand. It seems unrelated to the broken functionality that resulted from this change (described above).

@TomAugspurger TomAugspurger added Testing pandas testing functions or related to the test suite good first issue and removed Difficulty Intermediate labels Apr 2, 2018
@pcluo
Contributor Author

pcluo commented Apr 2, 2018

I should have been clearer in my initial post. Visual encoding is a design choice, while the log issue was a bug. However, I agree with @Deborah-Digges and do prefer a single color when plotting one variable. Right now, I use Series.plot(color='C0') to get the default behavior of previous versions.

For the log issue, I'm not entirely sure how to assert an output that is a graph. Otherwise, I am happy to produce a test later this week @TomAugspurger.

@TomAugspurger
Contributor

For the test, I'd imagine that in the broken version, ax.patches[-1].get_height() would be some small number. Ideally you'd write a test that checks it's the correct height, ensure that the test fails on the broken version, test that it works on the new version, then make a PR with the test.

That said, I can't reproduce this even with pandas 0.21.0. We'll want to make sure the issue is reproducible before writing a test for it.
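A test along the lines suggested above might look like the sketch below. It assumes a headless matplotlib backend (Agg); the helper and test names are hypothetical, and whether this check would actually have failed on the affected versions is not verified here, since a rendering-only problem may leave the stored bar height unchanged.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the test can run without a display
import matplotlib.pyplot as plt
import pandas as pd


def last_bar_height(values, **plot_kwargs):
    """Plot values as a bar chart and return the height of the last bar patch."""
    fig, ax = plt.subplots()
    try:
        pd.Series(values).plot.bar(ax=ax, **plot_kwargs)
        return ax.patches[-1].get_height()
    finally:
        plt.close(fig)  # always release the figure, even if plotting fails


def test_bar_logy_height():
    # Hypothetical test name; the last bar should keep its data value as height.
    values = [10**x for x in range(5)]
    assert last_bar_height(values, logy=True) == values[-1]


test_bar_logy_height()
```

Following the advice in the comment, such a test would first be run against the broken combination of versions to confirm that it fails there before being proposed in a PR.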

@pcluo
Contributor Author

pcluo commented Apr 9, 2018

@TomAugspurger the logy issue seems to be related to matplotlib. I can reproduce it with pandas 0.21.0 and matplotlib 2.1.0 (the versions listed in my first post), but not with the newer matplotlib 2.2.2.
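To see which case a given environment falls into, the installed matplotlib version can be compared against the versions reported in this thread. This is only a sketch: the lookup table and the function name are illustrative, and it records just the two matplotlib versions that were actually tested here.

```python
# Outcomes reported in this thread, keyed by matplotlib version.
REPORTED = {
    "2.1.0": "bars invisible with logy=True",
    "2.2.2": "bars render as expected",
}


def reported_status(mpl_version):
    """Return the outcome reported in this thread for a matplotlib version."""
    return REPORTED.get(mpl_version, "not tested in this thread")


try:
    import matplotlib
    print(matplotlib.__version__, "->", reported_status(matplotlib.__version__))
except ImportError:
    print("matplotlib is not installed")
```

Versions other than the two listed are reported as untested rather than guessed at.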

@jtweeder

@TomAugspurger, this issue may be old and simply left open, but you previously asked for more tests confirming that this is fixed. It appears that @pcluo is correct about the matplotlib issue.

I tested with matplotlib 2.1.0 and pandas 0.22.0 and reproduced a similar issue.
oldversion

Then I tested matplotlib 2.2.2 and pandas 0.22.0 and saw no issue.
newversion

I also tested matplotlib 2.2.2 with pandas 0.20.3 and saw no issue, so I think matplotlib may be the culprit.
third

Hope that provides some useful information!

@adamshamsudeen

I think this issue can be closed.

@TomAugspurger
Contributor

I think that #18394 (comment) was asking for additional unit tests within pandas, to ensure that future regressions are caught. But given that this seems to have been a matplotlib regression, we can hopefully assume they've put regression tests in place.

6 participants