Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: DataFrame.hist() does not get along with matplotlib.pyplot.tight_layout() #15602

Closed
wants to merge 2 commits into from

Conversation

ResidentMario
Copy link
Contributor

@ResidentMario ResidentMario commented Mar 7, 2017

@ResidentMario
Copy link
Contributor Author

ResidentMario commented Mar 8, 2017

@jreback What's up with these tests? AppVeyor passes, TravisCI fails once, CircleCI fails lots of times. Is this somehow platform-dependent or are they not all running the same tests...?

From the CircleCI output it looks like the _check_axes_shape tests rely on the old behavior (set_visible(False)) of removing the subplots in unnecessary axis positions. As I understand it in the new behavior (axis('off')) the subplots exist, they just aren't displayed, so they still have a BBox and avoid confusing tight_layout with a None.

This looks like it's the only side effect. If so, I can just swap in 4s for 3s in the tests and they should all pass...

@jreback
Copy link
Contributor

jreback commented Mar 8, 2017

@ResidentMario all of the CI tests different things (well there is some overlap). For example appveyor doesn't run any slow tests. What you are changing are slow tests (most plotting tests are slow), so they pass. circleci mostly tests slow, but using different locales. Travis generally tests everything. only 1 run is slow.

So you need to click thru and examine the test failures and repro locally.

@ResidentMario
Copy link
Contributor Author

Done, thanks.

Suppose the following code example:

import numpy as np
from numpy import random

n = 100

gender = np.random.choice(['Male', 'Female'], size=n)
classroom = np.random.choice(['A', 'B', 'C'], size=n)

hist_df = pd.DataFrame({'gender': gender,
                        'classroom': classroom,
                        'height': random.normal(66, 4, size=n),
                        'weight': random.normal(161, 32, size=n),
                        'category': random.randint(4, size=n)})
hist_df.hist(column='height', by=hist_df.category, layout=(4, 2))

The user has reserved space for way more subplots than there are things to fill them.

The current behavior doesn't reserve any space for these additional plots:

image

The new behavior does:

image

It seems to me that it is impossible to have both this old space-preserving behavior and the new tight_layout compatibility at the same time. tight_layout effectively requires that everything has a bounding box, and having a bounding box requires space be reserved for said bounding box. So you can have one, or the other, but not both.

I don't think this old space-saving behavior is very useful, so I think that this is an acceptable trade-off to make. The user is manually specifying a layout that doesn't actually make sense; pandas doesn't necessarily have to conform the output to something that does. We can instead hold the user to providing input that makes sense in the first place.

@ResidentMario
Copy link
Contributor Author

ResidentMario commented Mar 8, 2017

OK, this is more complicated than I thought at first.

With:

hist_df.boxplot(column=['height', 'weight', 'category'], by='gender', return_type='axes')

Old:

image

New:

image

No good, obviously. But this should be fixable...

I wonder if this wouldn't be easier to fix in the matplotlib source. cf. matplotlib#8225.

@ResidentMario
Copy link
Contributor Author

Closing this PR as the fix is downstreamed to matplotlib, cf. here.

@ResidentMario ResidentMario deleted the mpl-fix branch March 12, 2017 23:51
@jorisvandenbossche jorisvandenbossche added this to the No action milestone Mar 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataFrame.hist() does not get along with matplotlib.pyplot.tight_layout()
3 participants