Histogram gap artifacts #4340

Closed
fonnesbeck opened this issue Apr 15, 2015 · 4 comments

@fonnesbeck

I'm hoping there is a way to improve the default behavior of hist, which often generates gap artifacts between classes in histograms. For example, even using clean integer data that has natural breaks:

lsl_dr.degree_hl.dropna().value_counts()

6    10446
3     3127
4     3041
5     2705
2     1416
0      549
1      170
dtype: int64

I get a result like the following:

[image: bad hist]

This plot is misleading because it suggests there are intervening classes with zero counts in between the bars. I assume this is a solved problem, since it does not occur in other packages (e.g. ggplot).

@fonnesbeck
Author

I'm going to try my hand at a pull request for this. Stay tuned.

@tacaswell
Member

Pretty sure the issue is that the default number of bins is 10. We don't currently try to do anything intelligent about guessing 'good' values for bins.

Also see #4316
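
For illustration, here is a minimal sketch of why 10 equal-width bins leave gaps with this data, and one common workaround. It uses only `numpy.histogram` and rebuilds the data from the counts posted above; the variable names are made up for the example, and the half-integer edges are one possible fix, not something proposed in this thread.

```python
import numpy as np

# Category counts copied from the value_counts() output in the issue.
counts_by_value = {0: 549, 1: 170, 2: 1416, 3: 3127, 4: 3041, 5: 2705, 6: 10446}
values = np.array(sorted(counts_by_value))
data = np.repeat(values, [counts_by_value[v] for v in values])

# 10 equal-width bins across [0, 6]: each bin is 0.6 wide, so the
# 7 distinct integers occupy 7 different bins and 3 bins stay empty.
default_counts, default_edges = np.histogram(data, bins=10)
n_empty = int(np.sum(default_counts == 0))

# Workaround (an assumption of this sketch): one bin per integer,
# with edges at -0.5, 0.5, ..., 6.5 so each bin is centred on a value.
int_edges = np.arange(data.min() - 0.5, data.max() + 1.5)
int_counts, _ = np.histogram(data, bins=int_edges)

print("empty bins with bins=10:", n_empty)
print("counts with integer-centred bins:", int_counts.tolist())
```

The same array of edges can be passed as the `bins` argument of matplotlib's `hist`, which should draw seven adjacent bars with no empty slots between them.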

@efiring
Member

efiring commented Apr 15, 2015

On 2015/04/15 3:54 AM, Chris Fonnesbeck wrote:

I'm going to try my hand at a pull request for this. Stay tuned.

Chris, check the numpy mailing list. The pull request for this should go
to numpy, not matplotlib. There is a discussion going on now about
this, on the numpy list. Moving that along, to ensure the functionality
gets into numpy.histogram soon, would be good. (It might be in danger
of being delayed or derailed by overly complex proposals.)

If auto bin selection is added to numpy.histogram, we will support it in
matplotlib. Depending on how complicated it is we could consider
including a backport to handle the case of people using up-to-date
matplotlib with older numpy.

@tacaswell
Member

closing as not-a-matplotlib problem.

A version of numpy with 'auto' as a valid bin argument is now out (but I don't think it works well with categorical axes yet).
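
As a rough sketch of what that looks like, assuming a NumPy release that accepts `bins="auto"`: the data below is rebuilt from the counts in the original report, and the variable names are invented for the example. How many bins `"auto"` picks depends on the data and on the NumPy version, so it should not be assumed to remove the gaps for small integer-valued data like this.

```python
import numpy as np

# Same counts as in the original report, for values 0 through 6.
counts = [549, 170, 1416, 3127, 3041, 2705, 10446]
data = np.repeat(np.arange(7), counts)

# Let NumPy choose the bin edges instead of using a fixed bin count.
auto_counts, auto_edges = np.histogram(data, bins="auto")

print(len(auto_counts), "bins chosen")
print("empty bins:", int(np.sum(auto_counts == 0)))
```

If the chosen bins are narrower than 1, empty bins between the integers are still expected, in which case explicit per-integer edges remain the safer choice.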
