Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mstats.winsorize documentation needs clarification #7750

Closed
pv opened this issue Aug 18, 2017 · 1 comment · Fixed by #8356
Closed

mstats.winsorize documentation needs clarification #7750

pv opened this issue Aug 18, 2017 · 1 comment · Fixed by #8356
Labels
Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org scipy.stats
Milestone

Comments

@pv
Copy link
Member

pv commented Aug 18, 2017

stats.mstats.winsorize documentation needs some clarification:

From #7717 (comment):

This shows the difference (look at the number of values at start/end that are identical):

In [11]: data
Out[11]: 
masked_array(data = [77 87 88 114 151 -- 219 246 253 262 296 299 306 376 428 515 666 1310 2611],
             mask = [False False False False False  True False False False False False False
 False False False False False False False],
       fill_value = 999999)

In [12]: mstats.winsorize(data, (0.2, 0.2))
Out[12]: 
masked_array(data = [114 114 114 114 151 515 219 246 253 262 296 299 306 376 428 515 515 515
 515],
             mask = [False False False False False False False False False False False False
 False False False False False False False],
       fill_value = 999999)

In [13]: mstats.winsorize(data, (0.2, 0.2), inclusive=(False, False))
Out[13]: 
masked_array(data = [151 151 151 151 151 428 219 246 253 262 296 299 306 376 428 428 428 428
 428],
             mask = [False False False False False False False False False False False False
 False False False False False False False],
       fill_value = 999999)

I think it's the description of inclusive in the docstring that's a little confusing, it talks about number of data being masked while the truncation is about indices used to construct the lowidx-upidx interval of data to keep. Can you try to reformulate that description?

@pv pv added Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org scipy.stats labels Aug 18, 2017
@pv pv changed the title mstats.winsorize documentation mstats.winsorize documentation needs clarification Aug 18, 2017
@deepakgouda
Copy link
Contributor

According to the documentation, if inclusive is True then the number of data being masked on each side should be rounded and if False, then it should be truncated. But the code in scipy/scipystats/mstats_basic.py line 1763

if low_include:          #low_include is the first entry of the inclusive tuple
       lowidx = int(low_limit * n)
else:
       lowidx = np.round(low_limit * n).astype(int)

does the opposite.

So, either the documentation has to be changed or the if clause. Shall I make a PR?

deepakgouda added a commit to deepakgouda/scipy that referenced this issue Feb 2, 2018
@ev-br ev-br added this to the 1.1.0 milestone Apr 8, 2018
ev-br added a commit that referenced this issue Apr 8, 2018
DOC: Corrected Documentation Issue #7750 winsorize function
StrahinjaLukic pushed a commit to StrahinjaLukic/scipy that referenced this issue Apr 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Issues related to the SciPy documentation. Also check https://github.com/scipy/scipy.org scipy.stats
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants