
Use "percentile" as it is defined in statistics #499

Closed
wants to merge 1 commit into from


bitglue

@bitglue bitglue commented Mar 26, 2015

Merriam-Webster defines percentile as:

  • one of 100 equal parts that a group of people can be divided into in order to rank them
  • a value on a scale of 100 that indicates the percent of a distribution that is equal to or below it <a score in the 95th percentile>

Notice that when we are talking about a "percentile" as a subset of the sample, each percentile is 1% of the samples. For example, if the sample contains 100 values, the 90th percentile is 1 value, not 90 as the previous usage implies.

Relevant issue: #157
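To make the arithmetic in the 100-value example concrete, here is a minimal sketch of the "one of 100 equal parts" reading. `percentile_band` is a hypothetical helper, and it assumes the sample size is a positive multiple of 100 so the parts come out exactly equal:

```python
def percentile_band(sorted_values, k):
    """Return the k-th of 100 equal parts (k = 1..100) of a sorted sample.

    Hypothetical helper: assumes len(sorted_values) is a positive multiple
    of 100, so every part holds exactly len(sorted_values) // 100 values.
    """
    n = len(sorted_values)
    if n == 0 or n % 100 != 0:
        raise ValueError("sample size must be a positive multiple of 100")
    if not 1 <= k <= 100:
        raise ValueError("k must be between 1 and 100")
    size = n // 100
    return sorted_values[(k - 1) * size : k * size]


samples = list(range(1, 101))           # 100 already-sorted values
band_90 = percentile_band(samples, 90)  # the 90th of 100 equal parts
lowest_90 = samples[:90]                # the previous usage: everything up to the 90th percentile

print(len(band_90), band_90)  # 1 [90]
print(len(lowest_90))         # 90
```

Under this reading the 90th part holds a single value, while the previous usage covers 90 values.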

@terrorobe

terrorobe commented Aug 21, 2015

+1 on the clarification!

This leaves no room for interpretation, which is good when it comes to metrics.

@bitglue
Author

bitglue commented Mar 15, 2016

@sam-at-github I'm not sure that concise explanation makes sense to me, or that I agree that it's congruent with the definition of "percentile" that comes to mind when I think of the word. When I hear "percentile", I think of a value (not a group of samples) which divides the samples such that some percentage is below, and the rest are above.

The median is the 50th percentile, meaning it's the value below which half the samples fall, and half above. It's not a group of samples: it's a value.
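A minimal sketch of "percentile as a value", using the nearest-rank definition. This is only one of several common definitions; interpolating methods (for example NumPy's default) can return values that are not in the sample at all. The function name `percentile` is illustrative:

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample value such that at
    least p percent of the samples are less than or equal to it."""
    if not values:
        raise ValueError("cannot take a percentile of an empty sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted sample
    return ordered[rank - 1]


timings = [40, 15, 50, 35, 20]
print(percentile(timings, 50))  # 35, the median: a single value, not a group of samples
print(percentile(timings, 90))  # 50
```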

@sgpinkus

sgpinkus commented Mar 24, 2016

Oh, you're right. Not sure what I was thinking. Sorry. A percentile is a value. Every definition I have seen is consistent with the Wikipedia definition:

A percentile (or a centile) is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall. For example, the 20th percentile is the value (or score) below which 20 percent of the observations may be found.

But then, wouldn't a more concise change be:

Note that the mean metric is the mean value of all timings recorded during the flush interval, whereas mean_$PCT is the mean of all timings which fell between the zeroth and the $PCT percentile for that flush interval. The same holds for sum and upper. See issue #157 for a more detailed explanation of the calculation.

Four words and a link?
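The proposed wording can be illustrated with a short sketch. `threshold_stats` is a hypothetical helper, not statsd's actual code. It assumes `pct` is an integer from 1 to 100 and that the cutoff is pct% of the sample count, rounded half-up to a whole number of samples (minimum one). A real implementation may round differently.

```python
import math


def threshold_stats(timings, pct):
    """Hypothetical helper: summarize only the lowest pct percent of timings.

    Assumes pct is an integer in 1..100 and that the cutoff is pct% of the
    sample count rounded half-up to a whole number of samples (minimum one).
    """
    if not timings:
        raise ValueError("no timings recorded in this interval")
    ordered = sorted(timings)
    count = len(ordered)
    num_in_threshold = max(1, math.floor(pct * count / 100 + 0.5))
    kept = ordered[:num_in_threshold]  # samples from the zeroth up to the pct-th percentile
    total = sum(kept)
    return {
        "mean": sum(ordered) / count,             # mean of ALL timings in the interval
        f"mean_{pct}": total / num_in_threshold,  # mean of the lowest pct% only
        f"sum_{pct}": total,
        f"upper_{pct}": kept[-1],                 # largest timing that was kept
    }


stats = threshold_stats([3, 1, 4, 10, 5, 9, 2, 6, 8, 7], 90)
print(stats)
# {'mean': 5.5, 'mean_90': 5.0, 'sum_90': 45, 'upper_90': 9}
```

Dropping the slowest 10% (the single value 10) is what pulls mean_90 below the overall mean.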

@terrorobe

terrorobe commented Mar 24, 2016

@sam-at-github With documentation you shouldn't aim for concise, you should aim for understandable. @bitglue proposed a change which is, as far as I can tell, correct, explains each value in detail and provides pointers so that people who don't have a background in statistics know exactly what they're looking at.

@sgpinkus

sgpinkus commented Mar 24, 2016

IMO it is too wordy. That's why I commented in the first place. Concise and understandable aren't mutually exclusive. Rather they are complementary. Both are valid goals.

@bitglue
Author

bitglue commented Mar 24, 2016

If someone at Etsy shows some interest in merging this, I could certainly work on the wording to make it more concise.

@coykitten
Contributor

coykitten commented Mar 24, 2016

@bitglue Definitely interested. I'd be happy to see concise language in the first paragraph with the detailed information listed as examples with extra context.

@bitglue
Author

bitglue commented Jul 2, 2019

Giving up on this 4-year-old change.

@bitglue bitglue closed this Jul 2, 2019
4 participants