[MRG] sample from a truncated normal instead of clipping samples from a normal #12177
Related to "Merge iterativeimputer branch into master" #11977
What does this implement/fix? Explain your changes.
When sampling from the posterior with boundary values given, the current implementation clips values sampled from a normal distribution. This can lead to undesired oversampling of the boundary values. For example, with boundaries [0, 2], a mean of 0, and a standard deviation of 1, the difference between clipping samples from a normal and sampling from a truncated normal is shown here:
from scipy.stats import norm, truncnorm
import matplotlib.pyplot as plt
import numpy as np

norm_dis = norm(loc=0, scale=1)
truc_dis = truncnorm(a=0, b=2, loc=0, scale=1)

trucs = truc_dis.rvs(10000)
norms = norm_dis.rvs(10000)
norms = np.clip(norms, 0, 2)

plt.hist(norms, histtype='step')
plt.hist(trucs, histtype='step')
plt.legend(["clipped normal", "truncated normal"])
plt.show()
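To put a number on the oversampling in this example: clipping moves all probability mass below the lower bound onto the bound itself, and likewise for the upper bound. A minimal sketch using the normal CDF (the variable names are only for illustration):

```python
from scipy.stats import norm

lower, upper = 0.0, 2.0
dist = norm(loc=0.0, scale=1.0)

# Fraction of clipped samples that land exactly on each boundary.
mass_at_lower = dist.cdf(lower)  # P(X <= 0) for a standard normal
mass_at_upper = dist.sf(upper)   # P(X > 2), via the survival function

print(f"mass at lower bound: {mass_at_lower:.4f}")  # 0.5000
print(f"mass at upper bound: {mass_at_upper:.4f}")  # 0.0228
```

So with these settings roughly half of the clipped samples are exactly 0, whereas a truncated normal puts no point mass on either boundary.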
When sampling from the posterior, this PR samples from a truncated normal distribution instead of clipping values that have been sampled from a normal distribution.
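One detail worth keeping in mind with `scipy.stats.truncnorm`: its `a` and `b` arguments are expressed in standard deviations relative to `loc`, not as absolute bounds. The example above works with `a=0, b=2` only because `loc=0` and `scale=1`. A sketch of the general conversion follows; the helper name `sample_truncated_normal` and its signature are hypothetical and not taken from this PR:

```python
import numpy as np
from scipy.stats import truncnorm


def sample_truncated_normal(mean, std, lower, upper, size, random_state=None):
    """Draw samples from N(mean, std**2) restricted to [lower, upper].

    Hypothetical helper for illustration only.
    """
    # truncnorm expects the bounds standardized by loc and scale.
    a = (lower - mean) / std
    b = (upper - mean) / std
    dist = truncnorm(a=a, b=b, loc=mean, scale=std)
    return dist.rvs(size=size, random_state=random_state)


samples = sample_truncated_normal(
    mean=1.0, std=2.0, lower=0.0, upper=2.0, size=1000, random_state=0
)
print(samples.min(), samples.max())  # both lie within [0, 2]
```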
Any other comments?
To impute values within boundaries, the
 page 149, https://cran.r-project.org/web/packages/mice/mice.pdf
Some tests fail with older versions of scipy. I think we might need a backport in
Actually, based on the consensus emerging from the discussion in #12184, I don't think we need to backport: scikit-learn 0.21 will require scipy >= 0.17.0.
So please remove the backport and instead skip the tests that fail because of the lack of
import pytest
import scipy.stats

# TODO remove the skipif marker as soon as scipy < 0.17 support is dropped
@pytest.mark.skipif(not hasattr(scipy.stats, 'truncnorm'),
                    reason='need scipy.stats.truncnorm')
def test_something():
    ...
ogrisel left a comment
Here are some comments.
The new test is nice, but it does not really check that truncated normal != clipped normal, right? I guess this is fine, though. I am not sure what we could do to test this better.
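One possible direction, sketched here on raw samples rather than on the imputer itself (so this is an assumption about what such a check could look like, not the test in this PR): compare the samples against the truncated normal CDF with a Kolmogorov-Smirnov statistic. Truncated samples should sit close to that CDF, while clipped samples carry a large point mass at the lower bound and should sit far from it. The thresholds below are illustrative.

```python
import numpy as np
from scipy.stats import kstest, norm, truncnorm

rng = np.random.RandomState(0)
lower, upper, n_samples = 0.0, 2.0, 10000

# Reference: standard normal truncated to [lower, upper]. With loc=0 and
# scale=1 the standardized bounds coincide with the absolute ones.
reference = truncnorm(a=lower, b=upper, loc=0.0, scale=1.0)

truncated = reference.rvs(size=n_samples, random_state=rng)
clipped = np.clip(
    norm(loc=0.0, scale=1.0).rvs(size=n_samples, random_state=rng),
    lower,
    upper,
)

# KS statistic: largest gap between the empirical CDF and the reference CDF.
ks_truncated = kstest(truncated, reference.cdf).statistic
ks_clipped = kstest(clipped, reference.cdf).statistic

assert ks_truncated < 0.03  # close to the truncated normal
assert ks_clipped > 0.4     # about half the mass is stuck at the lower bound
```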