Large values in histograms not showing #609

Closed
LeoBreebaart opened this issue Dec 6, 2011 · 6 comments

@LeoBreebaart

I posted this to matplotlib-users, and was asked to file a bug for it. So:

It would appear that Axes.hist() does not handle large input
values the way I was expecting it to.

For example:

import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)
# Plot as expected: single bar in the center:
#result = ax.hist([1.0e+14], 5)
# Plot remains completely empty:
result = ax.hist([1.0e+16], 5)
print("result:", result)
plt.show()

My hypothesis is that the large value in x is causing the bin
interval size to become infinitesimally small, but is it
conceptually wrong of me to expect a histogram for such large
values to still work? If so, what would be a workaround? I don't
control the data I am trying to plot, and sometimes there is, yes,
only a single value, and yes, it's that large...

(All this is done with matplotlib 1.1.0 on Debian stable (v6.0.x)
for Python 2.6.6. uname: Linux foo 2.6.32-5-686 #1 SMP Mon
Oct 3 04:15:24 UTC 2011 i686 GNU/Linux).

@tonysyu
Contributor

tonysyu commented Jan 2, 2012

You're right: a large value for x causes the bin width to go to zero. If anything, the bug is in numpy, but I'm not really sure I'd call it a bug. For example, calling np.histogram([1e16], 3) gives

(array([0, 0, 1]),
 array([  1.00000000e+16,   1.00000000e+16,   1.00000000e+16,
         1.00000000e+16]))

Notice that the bin edges (the second return value) are all equal. By default, numpy uses a range of width 1 centered on a single data point, so the default of 10 bins gives bins of width 0.1. This, combined with the fact that the relative resolution of float64 is about 1e-15, means that the bin edges are indistinguishable at a magnitude of 1e16. Maybe the default range for a single value should be normalized (so that it scales up with the size), but that would be a big change in behavior (although this is only relevant to single-valued data, which I imagine wouldn't affect many people).
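To make the precision argument concrete, here is a small sketch using only numpy. It assumes the single-value fallback range of (x - 0.5, x + 0.5) described above; the variable names are mine, not from numpy.

```python
import numpy as np

x = 1.0e16

# Gap between x and the next representable float64 (absolute resolution at x).
spacing = np.spacing(x)

# Assumed fallback range for a single value: half a unit on either side.
# At this magnitude the offsets are smaller than half the spacing,
# so both ends round back to x itself.
lo, hi = x - 0.5, x + 0.5

print("spacing near x:", spacing)
print("range collapses to a point:", lo == x == hi)
```

If the two ends of the range are the same float, every bin edge computed between them is necessarily the same value too.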

As for workarounds, it depends on what you want. Personally, I'd just scale up the data range for single points. Something like:

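import numpy as np

x = [1.0e+16]
# Span 90% to 110% of the value so the bin edges stay distinct
# (ax is the Axes from the example in the original report).
default_range = np.array((0.9, 1.1)) * x
result = ax.hist(x, 5, range=default_range)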
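The same idea can be wrapped in a small helper so it also covers inputs with more than one value. This is only a sketch: `padded_range` and its `rel_pad` parameter are hypothetical names, not part of numpy or matplotlib, and the 10% margin simply mirrors the factors used above.

```python
import numpy as np

def padded_range(values, rel_pad=0.1):
    """Return (low, high) widened by a margin relative to the data magnitude."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    # Scale the margin with the largest magnitude so it never drops below
    # float resolution; use an absolute pad of 0.5 if all values are zero.
    pad = rel_pad * max(abs(lo), abs(hi)) or 0.5
    return lo - pad, hi + pad

x = [1.0e16]
counts, edges = np.histogram(x, bins=5, range=padded_range(x))
print(counts)
print(np.diff(edges))  # bin widths, all strictly positive
```

The returned tuple can be passed the same way to `ax.hist(x, 5, range=padded_range(x))`.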

@efiring
Copy link
Member

efiring commented May 29, 2013

The explanation and work-around sound right, so I am closing this.

@efiring efiring closed this as completed May 29, 2013
@rlpitts

rlpitts commented Mar 14, 2018

I'd really like to reopen this because hexbin does the same thing and there's no range kwarg I can use to apply this workaround.

@efiring
Member

efiring commented Mar 14, 2018

@rlpitts I suggest you post a message to matplotlib-users giving a simple example of what you want to do, and what the present problem or limitation is.

@efiring
Member

efiring commented Mar 14, 2018

@rlpitts, I'm not sure I understand the problem you are referring to, but I think the "extent" kwarg of hexbin is analogous to "range" in hist.
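As a rough illustration of that suggestion, the sketch below passes explicit binning limits to hexbin through `extent`, given as (xmin, xmax, ymin, ymax). The data are synthetic stand-ins invented for the example, and the gridsize is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in data (not from this thread).
x = rng.uniform(10.0, 50.0, 500)
y = rng.uniform(1.0e24, 1.0e27, 500)

fig, ax = plt.subplots()
# extent plays the role that range plays in hist: explicit binning limits.
extent = (x.min(), x.max(), y.min(), y.max())
hb = ax.hexbin(x, y, gridsize=20, extent=extent)

# Per-hexagon counts; a nonzero total means points were actually binned.
counts = hb.get_array()
print(counts.sum())
```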

@rlpitts

rlpitts commented Mar 15, 2018

I have a map of temperatures and a map of molecular hydrogen column densities for some nebulae. The temperatures are in the 10s of K, but the column densities are typically between 10^24 and 10^27 H2/m^2. I'm trying to make some hexbin 2D histograms of the column density vs. temperature distribution so I could try to fit a line to it and see if it has the slope I predict, but the plots just end up blank with no error message (it's worth noting that I'm using cm.cubehelix_r so bins with a single value would be white). My work-around was to put them both in log space, but people in the field aren't used to working in units of log(T) and it obscures the point I'm trying to make.
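One option that may fit this case, offered as a sketch rather than a confirmed fix: hexbin accepts `xscale` and `yscale` arguments, and `yscale='log'` bins in log10 of y while the axis ticks stay labeled in the original units. The data below are synthetic stand-ins with roughly the magnitudes described, not the actual maps.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-ins: temperatures in K, column densities in H2/m^2.
temperature = rng.uniform(10.0, 40.0, 1000)
column_density = 10.0 ** rng.uniform(24.0, 27.0, 1000)

fig, ax = plt.subplots()
# Bin logarithmically in y; tick labels remain in H2/m^2, not log units.
hb = ax.hexbin(temperature, column_density, gridsize=30,
               yscale="log", cmap="cubehelix_r")
ax.set_xlabel("T [K]")
ax.set_ylabel("N(H2) [m^-2]")

counts = hb.get_array()
print(ax.get_yscale(), counts.sum())
```

Whether this helps depends on what causes the blank plot, which the thread does not establish.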
