Large values in histograms not showing #609

Closed
LeoBreebaart opened this Issue · 2 comments

@LeoBreebaart

I posted this to matplotlib-users, and was asked to file a bug for it. So:

It would appear that Axes.hist() does not handle large input
values the way I was expecting it to.

For example:

import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)
# Plot as expected: single bar in the center:
#result = ax.hist([1.0e+14], 5)
# Plot remains completely empty:
result = ax.hist([1.0e+16], 5)
print "result:", result
plt.show()

My hypothesis is that the large value in x is causing the bin
interval size to become infinitesimally small, but is it
conceptually wrong of me to expect a histogram for such large
values to still work? If so, what would be a workaround? I don't
control the data I am trying to plot, and sometimes there is, yes,
only a single value, and yes, it's that large...

(All this is done with matplotlib 1.1.0 on Debian stable (v6.0.x)
for Python 2.6.6. uname: Linux foo 2.6.32-5-686 #1 SMP Mon
Oct 3 04:15:24 UTC 2011 i686 GNU/Linux).

@tonysyu

You're right: a large value for x causes the bin width to go to zero. If anything, the bug is in numpy, but I'm not really sure I'd call it a bug. For example, calling np.histogram([1e16], 3) gives

(array([0, 0, 1]),
 array([  1.00000000e+16,   1.00000000e+16,   1.00000000e+16,
         1.00000000e+16]))

Notice that the bin edges (the second return value) are all equal. For a single data point, numpy defaults to a unit-wide range centered on that point, so the default 10 bins are each 0.1 wide. Because the relative resolution of float64 is about 1e-15, neighboring representable values near 1e16 are 2 apart, so edges that close together are indistinguishable. Maybe the default range for a single value should be normalized (so that it scales with the magnitude of the value), but that would be a big change in behavior (although it is only relevant to single-valued data, which I imagine wouldn't affect many people).
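To make the precision argument concrete, here is a small sketch that uses only float arithmetic and `np.linspace`. It imitates the unit-wide default range rather than calling numpy's histogram code, so take it as an illustration of the rounding, not of numpy's internals:

```python
import numpy as np

value = 1.0e16

# Gap between 1e16 and the next representable float64.
gap = np.spacing(value)
print("spacing near 1e16:", gap)

# A unit-wide range centered on the value collapses onto the value itself,
# because +/-0.5 is less than half the gap and is rounded away.
low, high = value - 0.5, value + 0.5
print("low == high:", low == high)

# Edges for 3 bins over that range, built as evenly spaced points.
edges = np.linspace(low, high, 4)
print("distinct edges:", np.unique(edges).size)
```

With only one distinct edge value, every bin has zero width, which is consistent with the empty plot.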

As for workarounds, it depends on what you want. Personally, I'd just scale up the data range for single points. Something like:

import numpy as np

x = [1.0e+16]
# Use a range of +/-10% around the single value so the bin edges stay distinct
default_range = np.array((0.9, 1.1)) * x
result = ax.hist(x, 5, range=default_range)
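
If the data may hold either one value or many, the same idea could be wrapped in a small helper. This is only a sketch: `padded_range` and its `rel_pad` parameter are hypothetical names, not part of numpy or matplotlib, and the 0.5 fallback for a value of exactly zero is an assumption.

```python
import numpy as np

def padded_range(values, rel_pad=0.1):
    """Return a (low, high) range that keeps histogram bin edges distinct.

    Hypothetical helper, not part of numpy or matplotlib.
    """
    values = np.asarray(values, dtype=float)
    low, high = values.min(), values.max()
    if low == high:
        # Degenerate range: pad relative to the magnitude of the value,
        # falling back to an absolute pad of 0.5 when the value is 0.
        pad = rel_pad * abs(low) if low != 0 else 0.5
        low, high = low - pad, high + pad
    return float(low), float(high)

x = [1.0e16]
counts, edges = np.histogram(x, bins=5, range=padded_range(x))
print(counts)
print(edges)
```

The same tuple can be passed to matplotlib as `ax.hist(x, 5, range=padded_range(x))`. When the data already span more than one value, the helper returns the plain min and max, so it only changes the range in the single-valued case.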

@efiring
Owner

The explanation and work-around sound right, so I am closing this.

@efiring closed this