Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: stats.truncnorm: add example about truncation points #18477

Merged
merged 4 commits into from Jun 19, 2023

Conversation

YoussefAli1
Copy link
Contributor

Reference issue

Attempted to solve the documentation problem outlined in issue #18316. The author of this thread was confused on the functionality of scipy.stats.truncnorm.rvs(), and thought that the a, b arguments represented raw values at which the normal distribution is to be cut off, whereas these arguments actually represented the number of standard deviations below and above the mean value at which the distribution is to be truncated.

What does this implement/fix?

Attempted to clarify the documentation to better communicate the purpose of these arguments and forgo any confusion.

Additional information

This is my first open-source contribution. Please let me know how I can better help your project!

@mdhaber
Copy link
Contributor

mdhaber commented May 17, 2023

@YoussefAli1 since this is a doc-only PR, please include [skip actions] [skip cirrus] in the body of your commit messages to avoid running CI jobs that are not needed. (No problem; this is specific to SciPy and not something you'd be expected to know before contributing.)

This change may be helpful, but since this issue keeps coming up, I'm not sure if this slight rewording (while helpful) is enough. In gh-18316, I suggested an example. Can you add something like that to this PR?

@YoussefAli1
Copy link
Contributor Author

Hi @mdhaber, first off, thanks for the clarifying how I should structure my commits. Also, after rereading my edit, I think you’re right in that it’s too small of a change. I will word it better, keeping the example you posted in the thread in mind.

@j-bowhay j-bowhay added scipy.stats Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org labels May 17, 2023
@YoussefAli1
Copy link
Contributor Author

Hi @mdhaber, I've edited the notes and added some code examples of the function's workings with explanatory text, similar to the one you posted in the thread you linked. If these are acceptable, I'll resubmit these edits for a pull request.

class truncnorm_gen(rv_continuous):
r"""A truncated normal continuous random variable.
%(before_notes)s

Notes
-----
A truncated normal distribution of mean ``loc`` (default
0), and standard deviation ``scale`` (default 1). ``a`` and
``b`` denote the number of standard deviations to the left and right 
(respectively) of ``loc`` at which the normal distribution will be truncated.

If using raw integers ``myclip_a`` and ``myclip_b`` in sample space 
as truncation values instead of the number of standard deviations to 
the left and right of ``loc``, then the proper arguments ``a`` and
``b`` can be obtained according to the formula::

    a, b = (myclip_a - loc) / scale, (myclip_b - loc) / scale

Examples
--------
>>> import matplotlib.pyplot as plt
>>> import scipy.stats as stats

Set function arguments::

>>> loc = 0
>>> scale = 2
>>> a = -1.5
>>> b = 1.5
    
Truncate a normal distribution generated using .rvs()::

>>> norm = stats.truncnorm.rvs(a=a, b=b, loc=loc, scale=scale, size=100000)

Plot this truncated distribution::

>>> plt.hist(norm, bins=50)
>>> plt.show()

If the user wishes to truncate the normal distribution at -1.5 and
1.5 as raw values, and *not* as a number of standard deviations
away from ``loc``, these raw values must be converted using the
aforementioned formula::

>>> loc = 0
>>> scale = 2
>>> myclip_a, myclip_b = -1.5, 1.5
>>> a, b = (myclip_a - loc) / scale, (myclip_b - loc) / scale

>>> norm = stats.truncnorm.rvs(a=a, b=b, loc=loc, scale=scale, size=100000)
>>> plt.hist(norm, bins=50)
>>> plt.show()

"""

@mdhaber
Copy link
Contributor

mdhaber commented May 19, 2023

Look at the current documentation of truncnorm:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.truncnorm.html

We would not want to replace that example entirely but extend it. Your new text should go after this line.
https://github.com/scipy/scipy/blob/cdd65461d090926bd006884932e6c7d724434980/scipy/stats/_continuous_distns.py#LL9219C16-L9219C16

It would be better to use a different value for loc in your example so the user can what changes in full generality.

I would suggest beginning with a bridge between the examples that are already there and the example you are adding. Something like:

In the examples above, loc=0 and scale=1, so the plot is truncated at a on the left and b on the right. However, we were to produce the same histogram with loc = 1 and scale=2.

(make the histogram again with those values)

Continue by explaining why the truncation points appear to have changed and how to interpret them. Finish by showing how we can make the truncation points actually be a and b again.

Please go ahead and try that, then push the changes here. (Actually, before you make another commit, please merge upstream/main. E.g. check out your branch, git fetch upstream, git merge upstream/main. This should fix the issue with CircleCI, which does not merge main with your branch automatically.)

@mdhaber
Copy link
Contributor

mdhaber commented Jun 9, 2023

Since CircleCI is not working, can someone build the documentation locally and post a screenshot?

@mdhaber mdhaber changed the title Clarified documentation for truncnorm_gen() DOC: stats.truncnorm: add example about truncation points Jun 9, 2023
Copy link
Contributor

@mdhaber mdhaber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, the parameter names should have single backticks, but that's not how the rest of this documentation was before, so let's just be consistent.

scipy/stats/_continuous_distns.py Outdated Show resolved Hide resolved
scipy/stats/_continuous_distns.py Outdated Show resolved Hide resolved
scipy/stats/_continuous_distns.py Outdated Show resolved Hide resolved
@mdhaber mdhaber merged commit c634fb9 into scipy:main Jun 19, 2023
1 check passed
@j-bowhay j-bowhay added this to the 1.12.0 milestone Jun 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants