# DOC: specify which definition of skewness is used #9505

Merged
merged 5 commits into from Dec 18, 2018
Merged

# DOC: specify which definition of skewness is used#9505

merged 5 commits into from Dec 18, 2018

## Conversation

Contributor

### sjpet commented Nov 19, 2018

 Since there are a few ways of estimating skewness, it would be helpful to specify which definition is used by scipy in the docstring of scipy.stats.skew.
 DOC: specify which definition of skewness is used 
 ba18ad5 
Member

### rlucas7 commented Nov 21, 2018

 @sjpet I remember in yulesimon distribution I had to sit down and convert either central to non-central or vice versa for skew + kurtosis , glad you're calling this out! I'm not sure what "adjusted Fisher-Pearson" means, Fisher and Pearson have many things attached to their names! The description is probably too specialized for a general user. Looks like it occurs only once on wiki page: https://en.wikipedia.org/wiki/Skewness perhaps a :math: equation in the doc-string would help clarify (make explicit) the formula used?
 Moved details to Notes and added equations 
 bd1281a 
Contributor Author

### sjpet commented Nov 26, 2018 • edited

 I moved the details to Notes as the docstring standard seems to suggest, where I also elaborated a bit and added equations, as suggested by @rlucas7.
Member

### rlucas7 commented on scipy/stats/stats.py in bd1281aNov 27, 2018

 Here I would add a parens after skewness and insert (g_1) to be clear to the reader that g_1 is the skewnewss. Also, beneath, the m_3, m_2 and k_3 could be anything (not defined in the docstring). I'd either use standard math/probability notations (expectation etc) or define the quantities in the docstring.
Contributor Author

### sjpet replied Nov 27, 2018

 I'm not sure I agree. g_1 is not the skewness, it's a specific estimator of the skewness, so it would be something like "The sample skewness is defined as the adjusted Fisher-Pearson standardized moment coefficient (g_1), i.e. g_1 = ..." which is rather redundant. If one wishes to be abundantly clear, sample skewness ≜ g_1 = ...  is an option. As for the second point, I could be persuaded that defining it in terms of expectation is reasonable, but I would expect anyone that is interested in knowing which skewness estimator is used would be familiar enough with statistics to understand what m means in this context.
Member

### rlucas7 replied Nov 27, 2018 • edited

 I'm not sure I agree. g_1 is not the skewness, it's a specific estimator of the skewness, agreed. The way the docstring is written the only way for a reader to determine this is skew statistic (and not skew parameter) is with the word sample or perhaps if they correctly infer what g_1, m_2 etc denote. If the goal is to make it clear and explicit to the reader which one is used, perhaps write something like: """ g_1 = \frac{\sum_{i=1}^N (x_i -\bar{x})^3}{s^3} where \bar{x} is the sample mean and s is the sample standard deviation. """ expect anyone that is interested in knowing which skewness estimator is used would be familiar enough with >statistics to understand what m means in this context. I'm not sure there are clear conventions on the letters used to denote sample skewness that is why I'm recommending to be explicit in the docstring via an equation.
added 2 commits Nov 28, 2018
 Additional mathematical clarifications 
 d3d60d4 
 Change n to N in equation for G_1 for consistency 
 3d4c007 
Contributor Author

### sjpet commented Nov 28, 2018

 Using s can be a trap since that is often used for the unbiased sample standard deviation, which isn't quite what is done here. We're using the biased sample third and second central moments, and given their similarity, it would be odd to clarify only one of them. My new suggestion adds an equation for m_i as the (biased sample) i:th central moment.
Member

### rlucas7 commented Nov 30, 2018

 I'm not sure if the circleci failures are just flaky tests or a setup issue. Does your dev version build and pass all tests successfully? what the errors appear to be in circleci are similar to what I was getting from the most recent release (1.2.0). Here is the issue and how I resolved it, (if it helps) #9496 … On Wed, Nov 28, 2018 at 3:46 AM Stefan Peterson ***@***.***> wrote: Using s can be a trap since that is often used for the unbiased sample standard deviation, which isn't quite what is done here. We're using the biased sample third and second central moments, and given their similarity, it would be odd to clarify only one of them. My new suggestion adds an equation for m_i as the (biased sample) i:th central moment. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#9505 (comment)>, or mute the thread . -- Sincerely, -Lucas
Contributor Author

### sjpet commented Nov 30, 2018

 I'm sorry, but that doesn't appear to be similar at all. The issue you linked stopped during test collection due to an import error, whereas the CircleCI failures here are two testcases failing: one of them (scipy/sparse/linalg/isolve/tests/test_lgmres.py::TestLGMRES::test_arnoldi) due to a crashed worker and the other one (scipy/spatial/tests/test_kdtree.py::test_ckdtree_memuse) due to an apparent memory leak. I dare say none of them is related to a docstring change in scipy.stats and besides, they both pass on my local machine.
Member

### rgommers commented Nov 30, 2018

 These failures are common and can be ignored. We're about to resolve the PyPy issues in another PR.
Member

### rlucas7 commented Dec 1, 2018

 These failures are common and can be ignored. Good to know, Thanks Ralf. @sjpet I agree that you should ignore the errors on the CI.
Member

### rlucas7 commented Dec 6, 2018 • edited

 Using s can be a trap since that is often used for the unbiased sample standard deviation, which isn't quite what is done here. We're using the biased sample third and second central moments, and given their similarity, it would be odd to clarify only one of them. My new suggestion adds an equation for m_i as the (biased sample) i:th central moment. Great. This is very clear and explicit for the general reader, thanks for making these changes. My only remaining suggestion is to not use a superscript on $i^th$. The word $i$th is a word in english; https://en.wiktionary.org/wiki/ith unfortunately lots of papers and books use the super scripting needlessly perpetuating the usage.
 Drop superscript for ith 
 62cf02f 
Contributor Author

### sjpet commented Dec 6, 2018

 My only remaining suggestion is to not use a superscript on $i^th$. The word $i$th is a word in english; https://en.wiktionary.org/wiki/ith unfortunately lots of papers and books use the super scripting needlessly perpetuating the usage. As far as I am able to find, using superscript for the ordinal indicator is not incorrect, but a matter of style and perhaps somewhat archaic and generally not recommended (http://publish.ucc.ie/doc/wordedit?sectoc=5#noautostyle, https://english.stackexchange.com/questions/111265/should-ordinal-indicators-be-inline). I have updated accordingly.
Member

### rlucas7 commented Dec 7, 2018

 Thanks @sjpet, Not superscripting is my preference-respecting what came before-if superscripting is common practice in the current docstrings we should maintain consistent usage. To be clear my preference is not superscripting, but I’m flexible on the preference. 👍 Sincerely, … -Lucas Roberts On Dec 6, 2018, at 2:40 AM, Stefan Peterson ***@***.***> wrote: My only remaining suggestion is to not use a superscript on $i^th$. The word $i$th is a word in english; https://en.wiktionary.org/wiki/ith unfortunately lots of papers and books use the super scripting needlessly perpetuating the usage. As far as I am able to find, using superscript for the ordinal indicator is not incorrect, but a matter of style and perhaps somewhat archaic and generally not recommended (http://publish.ucc.ie/doc/wordedit?sectoc=5#noautostyle, https://english.stackexchange.com/questions/111265/should-ordinal-indicators-be-inline). I have updated accordingly. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
Contributor Author

### sjpet commented Dec 17, 2018

 I think we're ready then. Anything else needed for this to be merged, @rgommers?
Member

### rgommers commented Dec 18, 2018

 Rendered math looks good to me, and this looks like a useful doc improvement. Thanks @sjpet, and thanks @rlucas7 for reviewing! Merging.
merged commit 0464ae0 into scipy:master Dec 18, 2018
4 checks passed
4 checks passed
ci/circleci: build_docs Your tests passed on CircleCI!
Details
ci/circleci: pypy3 Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
added this to the 1.3.0 milestone Dec 18, 2018