pyplot.hist: normalization fails #6357

khyox · 2016-05-01T13:43:26Z

Matplotlib 1.5.1 under python 3.5.1.
The provided example for pyplot.hist is not working as expected for different mu and sigma values:

"""
Demo of the histogram (hist) function with a few features.

In addition to the basic histogram, this demo shows a few optional features:

    * Setting the number of data bins
    * The ``normed`` flag, which normalizes bin heights so that the integral of
      the histogram is 1. The resulting histogram is a probability density.
    * Setting the face color of the bars
    * Setting the opacity (alpha value).

"""
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt


# example data
mu = 0.5*5  # mean of distribution
sigma = 0.1*5  # standard deviation of distribution
x = mu + sigma * np.random.randn(10000)

num_bins = 50
# the histogram of the data
n, bins, patches = plt.hist(x, num_bins, normed=1, facecolor='green', alpha=0.5)
# add a 'best fit' line
y = mlab.normpdf(bins, mu, sigma)
plt.plot(bins, y, 'r--')
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title(r'Normed histogram of IQ: $\mu=%f$, $\sigma=%f$' % (mu, sigma))

# Tweak spacing to prevent clipping of ylabel
plt.subplots_adjust(left=0.15)
plt.show()

If the parameters mu and sigma are reduced to:

mu = 0.5*2  # mean of distribution
sigma = 0.1*2  # standard deviation of distribution

the plot becomes even more clearly unnormalized:

This normalization failure also occurs for non-Gaussian datasets.

jenshnielsen · 2016-05-01T16:15:05Z

Are you sure that you are interpreting the normalization correctly? It normalizes the area under the curve, not the sum of the counts. See #6353 for a similar issue.
I haven't done an independent calculation of the normalization, but the area under the curve in the last example is roughly (drawing a triangle around the data) 1*2*(0.5) = 1, so I don't see how this is clearly not normalized.
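
The area-versus-sum distinction can be checked numerically. The following is a minimal sketch, not code from this thread: it assumes that `numpy.histogram` with `density=True` applies the same normalization as the `normed=1` call in the report, and the seed and variable names are illustrative only.

```python
import numpy as np

np.random.seed(0)  # fixed seed (an assumption) so the run is repeatable
mu, sigma = 1.0, 0.2  # the reduced parameters from the report
x = mu + sigma * np.random.randn(10000)

# density=True scales the bin heights so that the histogram integrates to 1
heights, edges = np.histogram(x, bins=50, density=True)
widths = np.diff(edges)

sum_of_heights = heights.sum()   # not 1: it depends on the bin width
area = (heights * widths).sum()  # 1 up to floating-point rounding

print("sum of bar heights:", sum_of_heights)
print("area under the bars:", area)
```

For sigma = 0.2 the Gaussian peak is 1/(0.2*sqrt(2*pi)), which is about 2, so bars taller than 1 are expected and match the triangle estimate above.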

khyox · 2016-05-01T18:06:34Z

@jenshnielsen Thank you very much for your fast and useful reply. OMG, I was clearly misinterpreting the normalization despite all the comments. I took for granted the approach in which the normalized count is the count in a class divided by the total number of observations. My slip aside, I suggest improving the web documentation of pyplot.hist in two ways:

An improved explanation of the normalization factor, so that it clearly shows that the normalized count is the count in the class divided by the product of the number of observations and the class width. I think that (n/(len(x)dbin)` (literally) is not very clear in that sense.
And, please, in the provided example in the doc replace:

plt.ylabel('Probability')

with:

plt.ylabel('Probability Density')

Thanks again!
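
The two conventions contrasted in this comment can be compared side by side. This is a hedged sketch rather than the library's implementation: it uses `numpy.histogram` as a stand-in for `pyplot.hist`, and the names `fraction` and `density_by_hand` are made up for illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed (an assumption) so the run is repeatable
x = 1.0 + 0.2 * np.random.randn(10000)

counts, edges = np.histogram(x, bins=50)  # raw counts per bin
widths = np.diff(edges)

# Convention assumed in the original report: fraction of observations per bin
fraction = counts / len(x)

# Convention of the normalized histogram: count / (N * bin width)
density_by_hand = counts / (len(x) * widths)

# Reference values from numpy's own density normalization
density_numpy, _ = np.histogram(x, bins=50, density=True)

print("hand formula matches density=True:", np.allclose(density_by_hand, density_numpy))
print("fractions sum to:", fraction.sum())
```

The fractions sum to 1, whereas the densities only integrate to 1 once each is multiplied by its bin width, which is why 'Probability Density' is the more accurate axis label.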

tacaswell · 2016-05-01T18:13:45Z

@khyox Can you open up a pull request making those changes to the documentation?

khyox · 2016-05-01T18:45:13Z

@tacaswell Of course, Thomas, and very sorry for not having done it yet (about to travel to Japan for the 1st time and a bit bewildered... 👾).

khyox · 2016-05-08T09:14:55Z

PR ready! 📦

DOC: Minor improvements to histogram docs closes #6353 and closes #6357

khyox changed the title ~~pyplot.hist: normalization fails for input values under some threshold~~ pyplot.hist: normalization fails May 1, 2016

khyox closed this as completed May 1, 2016

khyox mentioned this issue May 8, 2016

Minor improvements concerning #6353 and #6357 #6386

Merged

tacaswell added a commit that referenced this issue May 9, 2016

Merge pull request #6386 from khyox/master

aeb6112

