Percentile 1 method

Evgenii Neumerzhitckii edited this page Aug 6, 2017 · 53 revisions

The percentile method finds the value at a given percentile in a dataset. The same algorithm is used by the PERCENTILE and PERCENTILE.INC functions in Microsoft Excel and by the PERCENTILE function in Google Sheets. It is equivalent to the 7th sample quantile method from the Hyndman and Fan paper (1996).

The function receives two parameters:

  • values: array of values in the dataset.
  • percentile: percentile between 0 and 1 inclusive.

For example, suppose we want to calculate the 40th percentile for the following measurements: 35, 20, 50, 40, 15. We call the percentile function, passing the measurements and the percentile as the decimal value 0.4.

Sigma.percentile([35, 20, 50, 40, 15], percentile: 0.4)
// Result: 29

We use the following algorithm to calculate the percentile value:

1) Sort the dataset

First, we sort the dataset from the lowest to the highest value.

sortedValues = [15, 20, 35, 40, 50]

2) Find the rank

Second, we find the rank of the 40th percentile. The rank is the position of an element in the sorted dataset: rank 1 is the first element, rank 2 is the second, and so on. The rank can be fractional. For example, a rank of 3.35 refers to a value between the third and fourth elements.

Equation 1

rank = percentile * (count - 1) + 1    

Where:

  • percentile is the percentile argument of the function.
  • count is the size of the dataset, that is, the number of elements in the array passed as the values argument.

We substitute the arguments into Equation 1:

rank = 0.4 * (5 - 1) + 1 = 2.6
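
Equation 1 can be checked with a few lines of Swift. This is a minimal sketch, not the library's code: the helper name percentileRank is hypothetical and introduced only for illustration.

```swift
// Hypothetical helper (not part of the library): Equation 1.
// Returns the 1-based, possibly fractional rank for a percentile in 0...1.
func percentileRank(percentile: Double, count: Int) -> Double {
    return percentile * Double(count - 1) + 1
}

// 40th percentile of a dataset with 5 elements: approximately 2.6
let exampleRank = percentileRank(percentile: 0.4, count: 5)
print(exampleRank)
```

Note that the rank runs from 1 (percentile 0) to count (percentile 1), so it always points inside the sorted dataset.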

3) Get the integer and fractional part of the rank

Next we calculate the integer and fractional parts of the rank. The integer part of 2.6 is 2 and the fractional part of 2.6 is 0.6.

rankInteger = 2
rankFraction = 0.6
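
One way to split the rank in Swift is sketched below. It assumes the rank is never negative, which holds for Equation 1 when the percentile is between 0 and 1. The function name splitRank is hypothetical.

```swift
// Hypothetical helper: splits a non-negative rank into its integer and fractional parts.
func splitRank(_ rank: Double) -> (integer: Int, fraction: Double) {
    let integerPart = Int(rank)                   // Int(_:) truncates toward zero
    let fractionPart = rank - Double(integerPart) // what remains after the integer part
    return (integerPart, fractionPart)
}

let parts = splitRank(2.6)
// parts.integer is 2; parts.fraction is approximately 0.6
// (floating-point arithmetic may leave a tiny rounding error in the fraction)
print(parts.integer, parts.fraction)
```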

4) Find the element at rank

Then, we look at the sorted dataset {15, 20, 35, 40, 50} and find the elements at ranks rankInteger and rankInteger + 1. In our example, rankInteger is 2, so we need the second and third elements, which are 20 and 35.

elementValue = 20
elementPlusOneValue = 35
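
In code, the lookup needs some care because ranks start at 1 while Swift array indices start at 0. The sketch below also reuses the last element when rankInteger equals the array size (which happens for a percentile of 1). That fallback is an assumption made here for illustration; it does not change the result, because the fractional part is zero in that case.

```swift
let sortedValues: [Double] = [15, 20, 35, 40, 50]
let rankInteger = 2

// Rank 2 is the second element, which lives at index 1.
let elementValue = sortedValues[rankInteger - 1]

// Rank 3 lives at index 2. If there is no next element, fall back to the same one.
let elementPlusOneValue = rankInteger < sortedValues.count
    ? sortedValues[rankInteger]
    : elementValue

print(elementValue, elementPlusOneValue)
```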

5) Calculate the percentile value

Finally, we calculate the resulting percentile value by linearly interpolating between elementValue and elementPlusOneValue according to rankFraction.

percentileValue = elementValue + rankFraction * (elementPlusOneValue - elementValue)
percentileValue = 20 + 0.6 * (35 - 20) = 29
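
The interpolation step on its own can be sketched as a small function. The name interpolate and its parameter labels are hypothetical.

```swift
// Hypothetical helper: linear interpolation between two neighbouring elements.
// A fraction of 0 returns `lower`, a fraction of 1 returns `upper`.
func interpolate(from lower: Double, to upper: Double, fraction: Double) -> Double {
    return lower + fraction * (upper - lower)
}

// 60% of the way from 20 to 35: approximately 29
let interpolated = interpolate(from: 20, to: 35, fraction: 0.6)
print(interpolated)
```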

Edge cases

The result is undefined, and the function returns nil, in the following situations:

  1. The supplied values array is empty.
  2. The supplied percentile value is negative or greater than 1.

0th, 50th and 100th percentiles

  • The percentile argument of 0 returns the minimum value in the dataset.
  • The percentile argument of 0.5 returns the median value.
  • The percentile argument of 1 returns the maximum value in the dataset.
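
The five steps and the edge cases above can be combined into one function. This is a minimal sketch under stated assumptions, not the library's actual implementation: the name percentileSketch is hypothetical, and reusing the last element when the rank equals the array size is a choice made here to keep the array access in bounds.

```swift
// Hypothetical sketch of the algorithm described on this page.
func percentileSketch(_ values: [Double], percentile: Double) -> Double? {
    // Edge cases: empty dataset, or percentile outside 0...1 (NaN also fails these checks).
    guard !values.isEmpty, percentile >= 0, percentile <= 1 else { return nil }

    // 1) Sort the dataset.
    let sortedValues = values.sorted()

    // 2) Find the 1-based rank (Equation 1).
    let rank = percentile * Double(sortedValues.count - 1) + 1

    // 3) Split the rank into integer and fractional parts.
    let rankInteger = Int(rank)
    let rankFraction = rank - Double(rankInteger)

    // 4) Find the elements at rankInteger and rankInteger + 1 (arrays are 0-based).
    let elementValue = sortedValues[rankInteger - 1]
    let elementPlusOneValue = rankInteger < sortedValues.count
        ? sortedValues[rankInteger]
        : elementValue // assumption: no next element when percentile is 1

    // 5) Interpolate between the two elements.
    return elementValue + rankFraction * (elementPlusOneValue - elementValue)
}

let measurements: [Double] = [35, 20, 50, 40, 15]
print(percentileSketch(measurements, percentile: 0.4) as Any) // approximately 29
print(percentileSketch(measurements, percentile: 0) as Any)   // minimum
print(percentileSketch(measurements, percentile: 0.5) as Any) // median
print(percentileSketch(measurements, percentile: 1) as Any)   // maximum
print(percentileSketch([], percentile: 0.4) as Any)           // nil
```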

The result

The algorithm described above found the 40th percentile to be 29 for the set of measurements: 35, 20, 50, 40, 15.
