pyplot.hist: normalization fails #6357

Closed
khyox opened this issue May 1, 2016 · 5 comments

Comments

@khyox
Contributor

khyox commented May 1, 2016

Matplotlib 1.5.1 under Python 3.5.1.
The provided example for pyplot.hist does not work as expected for different mu and sigma values:

"""
Demo of the histogram (hist) function with a few features.

In addition to the basic histogram, this demo shows a few optional features:

    * Setting the number of data bins
    * The ``normed`` flag, which normalizes bin heights so that the integral of
      the histogram is 1. The resulting histogram is a probability density.
    * Setting the face color of the bars
    * Setting the opacity (alpha value).

"""
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt


# example data
mu = 0.5*5  # mean of distribution
sigma = 0.1*5  # standard deviation of distribution
x = mu + sigma * np.random.randn(10000)

num_bins = 50
# the histogram of the data
n, bins, patches = plt.hist(x, num_bins, normed=1, facecolor='green', alpha=0.5)
# add a 'best fit' line
y = mlab.normpdf(bins, mu, sigma)
plt.plot(bins, y, 'r--')
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title(r'Normed histogram of IQ: $\mu=%f$, $\sigma=%f$' % (mu, sigma))

# Tweak spacing to prevent clipping of ylabel
plt.subplots_adjust(left=0.15)
plt.show()

[Figure: hist_normed]

If the parameters mu and sigma are reduced to:

mu = 0.5*2  # mean of distribution
sigma = 0.1*2  # standard deviation of distribution

the plot becomes even more clearly unnormalized:

[Figure: hist_normed_fail]

This normalization failure also occurs for non-Gaussian datasets.

khyox changed the title from "pyplot.hist: normalization fails for input values under some threshold" to "pyplot.hist: normalization fails" on May 1, 2016
@jenshnielsen
Member

Are you sure that you are interpreting the normalization correctly? It normalizes the area under the curve, not the sum of the counts. See #6353 for a similar issue.
I haven't done an independent calculation of the normalization, but the area under the curve in the last example is roughly (drawing a triangle around the data) 1*2*(0.5) = 1, so I don't see how this is clearly not normalized.
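
The triangle estimate above can be checked numerically. The following is only a minimal sketch, not Matplotlib's implementation: it assumes equal-width bins spanning the data range and uses the standard library alone, and the seed and the variable names are illustrative choices. It uses the reduced values from the report (mu = 1, sigma = 0.2) and shows that the tallest bar is close to the peak of the Gaussian density, which is well above 1, while the total area of the bars is still 1.

```python
import math
import random

random.seed(0)  # arbitrary seed, only to make the run repeatable
mu, sigma = 1.0, 0.2  # the reduced values from the report (0.5*2 and 0.1*2)
x = [random.gauss(mu, sigma) for _ in range(10000)]

# Equal-width bins spanning the data range (an assumption of this sketch).
num_bins = 50
lo, hi = min(x), max(x)
width = (hi - lo) / num_bins

counts = [0] * num_bins
for v in x:
    # Clamp so that the maximum value falls into the last bin.
    i = min(int((v - lo) / width), num_bins - 1)
    counts[i] += 1

# Density normalization: count / (number of observations * bin width).
density = [c / (len(x) * width) for c in counts]

area = sum(d * width for d in density)  # area under the bars
peak_pdf = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # Gaussian density at mu

print(f"tallest bar: {max(density):.2f}")
print(f"pdf peak:    {peak_pdf:.2f}")
print(f"total area:  {area:.6f}")
```

Bar heights above 1 are therefore expected whenever the distribution is narrow enough; only the area is constrained to 1.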

@khyox
Contributor Author

khyox commented May 1, 2016

@jenshnielsen Thank you very much for your fast and useful reply. OMG, I was clearly misinterpreting the normalization despite all the comments. I took for granted the approach in which the normalized count is the count in a class divided by the total number of observations. Beyond my slip, I suggest improving the web documentation of pyplot.hist in two ways:

  • An improved explanation of the normalization factor, so that it clearly shows that the normalized count is the count in the class divided by the product of the number of observations and the class width. I think that (n/(len(x)dbin)` (literally) is not very clear in that sense.
  • And, please, in the provided example in the doc replace:
plt.ylabel('Probability')

with:

plt.ylabel('Probability Density')

Thanks again!
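
The two conventions discussed in this comment can be compared on a toy dataset. This is a hedged sketch with hypothetical counts and bin edges (none of these numbers come from the issue): it contrasts the relative frequency, count / N, with the density, count / (N * width), and shows that the first sums to 1 while the second integrates to 1.

```python
# Hypothetical toy data: 8 observations in 3 bins of unequal width.
counts = [2, 5, 1]
edges = [0.0, 0.5, 1.0, 2.0]

n_obs = sum(counts)
widths = [right - left for left, right in zip(edges[:-1], edges[1:])]

# Relative frequency: count / N. The heights themselves sum to 1.
rel_freq = [c / n_obs for c in counts]

# Probability density: count / (N * bin width). Height times width sums to 1.
density = [c / (n_obs * w) for c, w in zip(counts, widths)]

print(rel_freq)  # [0.25, 0.625, 0.125]
print(density)   # [0.5, 1.25, 0.125]
print(sum(rel_freq))  # 1.0
print(sum(d * w for d, w in zip(density, widths)))  # 1.0
```

The middle bar has a density of 1.25, which is not a probability; this is why the label 'Probability Density' suggested above describes the y axis more accurately than 'Probability'.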

khyox closed this as completed on May 1, 2016
@tacaswell
Member

@khyox Can you open up a pull request making those changes to the documentation?

@khyox
Contributor Author

khyox commented May 1, 2016

@tacaswell Of course, Thomas, and very sorry for not having done it yet (I am about to travel to Japan for the first time and a bit bewildered... 👾).

@khyox
Contributor Author

khyox commented May 8, 2016

PR ready! 📦

tacaswell added a commit that referenced this issue May 9, 2016
DOC: Minor improvements to histogram docs

closes #6353 and closes #6357