gh-108322: Optimize statistics.NormalDist.samples() #108324

Merged 6 commits on Aug 27, 2023
5 changes: 5 additions & 0 deletions Doc/library/statistics.rst
@@ -828,6 +828,11 @@ of applications in statistics.
       number generator. This is useful for creating reproducible results,
       even in a multi-threading context.
 
+      .. versionchanged:: 3.13
+
+         Switched to a faster algorithm. To reproduce samples from previous
+         versions, use :func:`random.seed` and :func:`random.gauss`.
+
    .. method:: NormalDist.pdf(x)
 
       Using a `probability density function (pdf)
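The versionchanged note above can be illustrated with a small sketch. It assumes the pre-3.13 behaviour is exactly the loop removed from Lib/statistics.py in this PR (a seeded generator's gauss() called n times); legacy_samples is a hypothetical helper name, not part of the statistics module.

```python
import random


def legacy_samples(mu, sigma, n, seed):
    """Hypothetical helper: mirrors the sampling loop this PR removes.

    Assumes the old NormalDist.samples(n, seed=seed) drew every value from
    random.Random(seed).gauss, as shown in the deleted lines of the diff.
    """
    gauss = random.Random(seed).gauss
    return [gauss(mu, sigma) for _ in range(n)]


# Equivalent spelling with the module-level functions named in the docs note.
random.seed(42)
via_module = [random.gauss(10.0, 2.0) for _ in range(5)]

# Both spellings walk the same generator state, so the lists match.
print(legacy_samples(10.0, 2.0, 5, seed=42) == via_module)
```

Code that pinned exact sample values to a seed (for example in doctests, like the linear_regression examples in this PR) would need a helper of this kind or updated expected values.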
12 changes: 7 additions & 5 deletions Lib/statistics.py
@@ -1135,7 +1135,7 @@ def linear_regression(x, y, /, *, proportional=False):
>>> noise = NormalDist().samples(5, seed=42)
>>> y = [3 * x[i] + 2 + noise[i] for i in range(5)]
>>> linear_regression(x, y) #doctest: +ELLIPSIS
- LinearRegression(slope=3.09078914170..., intercept=1.75684970486...)
+ LinearRegression(slope=3.17495..., intercept=1.00925...)

If *proportional* is true, the independent variable *x* and the
dependent variable *y* are assumed to be directly proportional.
@@ -1148,7 +1148,7 @@ def linear_regression(x, y, /, *, proportional=False):

>>> y = [3 * x[i] + noise[i] for i in range(5)]
>>> linear_regression(x, y, proportional=True) #doctest: +ELLIPSIS
- LinearRegression(slope=3.02447542484..., intercept=0.0)
+ LinearRegression(slope=2.90475..., intercept=0.0)

"""
n = len(x)
@@ -1279,9 +1279,11 @@ def from_samples(cls, data):

     def samples(self, n, *, seed=None):
         "Generate *n* samples for a given mean and standard deviation."
-        gauss = random.gauss if seed is None else random.Random(seed).gauss
-        mu, sigma = self._mu, self._sigma
-        return [gauss(mu, sigma) for _ in repeat(None, n)]
+        rnd = random.random if seed is None else random.Random(seed).random
+        inv_cdf = _normal_dist_inv_cdf
+        mu = self._mu
+        sigma = self._sigma
+        return [inv_cdf(rnd(), mu, sigma) for _ in repeat(None, n)]
 
     def pdf(self, x):
         "Probability density function. P(x <= X < x+dx) / dx"
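For readers unfamiliar with the technique, the new samples() body is an instance of inverse transform sampling: draw u uniformly from the unit interval, then map it through the inverse CDF. The sketch below shows the idea using only the public NormalDist.inv_cdf() method rather than the private _normal_dist_inv_cdf helper used in the diff. The function name inverse_cdf_samples and the explicit guard against u == 0.0 are assumptions made for this illustration and are not taken from the PR.

```python
import random
from statistics import NormalDist


def inverse_cdf_samples(dist, n, seed=None):
    """Illustrative sketch of inverse transform sampling (hypothetical helper).

    Draws u from random() and maps it through dist.inv_cdf(u).
    """
    rnd = random.random if seed is None else random.Random(seed).random
    out = []
    while len(out) < n:
        u = rnd()  # uniform on [0.0, 1.0)
        # The public inv_cdf() rejects p <= 0, so skip the rare exact zero.
        if u > 0.0:
            out.append(dist.inv_cdf(u))
    return out


# Usage: seeded draws are repeatable, and the sample follows the distribution.
iq = NormalDist(mu=100, sigma=15)
draws = inverse_cdf_samples(iq, 10_000, seed=1)
print(len(draws))
```

One uniform draw yields one normal variate, so the loop needs only random() and a single function call per sample. Going through the public method adds call overhead, so this sketch is not expected to match the speed of the code in the diff.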
@@ -0,0 +1,2 @@
+ Speed-up NormalDist.samples() by using the inverse CDF method instead of
+ calling random.gauss().