# Quartiles, Quantiles, and Interquartile Range


## Quartiles

![quartiles.svg](attachment:quartiles.svg)

The values that split the data into fourths are the quartiles. In the image above, the first quartile (Q1) is `10`, the second quartile (Q2) is `13`, and the third quartile (Q3) is `22`. These three values split the data into four groups that each contain five data points.

The base R function `quantile()` is used to find quartiles.

**Example:**

Find the first, second, and third quartiles of the given dataset:

```r
dataset <- c(50, 10, 4, -3, 4, -20, 2)

# Pass the fraction of the data that should fall below each quartile
first_quartile <- quantile(dataset, 0.25)
second_quartile <- quantile(dataset, 0.5)
third_quartile <- quantile(dataset, 0.75)

# Print the three quartiles
first_quartile
second_quartile
third_quartile
```
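
Because the second argument of `quantile()` can also be a vector, the three quartiles can be requested in a single call. The following is a minimal sketch using the same dataset; the variable name `quartiles` is only illustrative.

```r
dataset <- c(50, 10, 4, -3, 4, -20, 2)

# Request all three quartiles at once; the result is a named numeric vector
quartiles <- quantile(dataset, c(0.25, 0.5, 0.75))
quartiles

# An individual quartile can be pulled out by its name
quartiles[["50%"]]
```

The names of the returned vector (`25%`, `50%`, `75%`) come from the probabilities passed in, which makes the output easier to read than three separate results.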

## Quantiles

The base R function `quantile()` calculates the quantiles of a dataset.

The first parameter of `quantile()` is the dataset you are using. The second parameter is a single number or a vector of numbers between `0` and `1`. Each number gives the proportion of the data that should fall at or below the corresponding split point; for example, `0.25` marks the first quartile.

**Example:**

Find the value that splits the first 10% of the data apart from the remaining 90%:

```r
dataset <- c(5, 10, -20, 42, -9, 10)
ten_percent <- quantile(dataset, 0.10)
ten_percent
```

Taken on its own, this value doesn't split the dataset into groups of equal size, so it isn't a complete set of quantiles. It is only the first of the nine 10-quantiles, also called the 10th percentile.

However, it is useful if you are curious about whether a data point is in the bottom 10% of the dataset.
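
As a rough sketch of that check, the cutoff returned by `quantile()` can be compared against every data point. The variable names `cutoff` and `in_bottom_ten_percent` are assumptions made for this illustration.

```r
dataset <- c(5, 10, -20, 42, -9, 10)

# Value that separates the lowest 10% of the data from the rest
cutoff <- quantile(dataset, 0.10)

# Logical vector: TRUE for each data point at or below the cutoff
in_bottom_ten_percent <- dataset <= cutoff

# Keep only the data points in the bottom 10%
dataset[in_bottom_ten_percent]
```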

### Many Quantiles

Quantiles are usually a set of values that split the data into groups of equal size.

**Example:**

To get the 5-quantiles, or the four values that split the data into five groups of equal size:

```r
dataset <- c(5, 10, -20, 42, -9, 10)
# Four cut points that split the data into five groups of equal size
five_quantiles <- quantile(dataset, c(0.2, 0.4, 0.6, 0.8))
five_quantiles
```
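
Typing the probabilities by hand gets tedious for larger sets of quantiles. One possible approach is to generate them from the number of groups. The helper `q_quantiles()` below is hypothetical (it is not part of base R) and is only a sketch of the idea.

```r
# Hypothetical helper (not part of base R): returns the q - 1 cut points
# that split x into q groups of roughly equal size
q_quantiles <- function(x, q) {
  probs <- seq_len(q - 1) / q   # e.g. q = 5 gives 0.2, 0.4, 0.6, 0.8
  quantile(x, probs)
}

dataset <- c(5, 10, -20, 42, -9, 10)

q_quantiles(dataset, 5)   # the 5-quantiles
q_quantiles(dataset, 4)   # the quartiles
```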

### Common Quantiles

* The `2-quantile` splits the data into two groups of equal size. Half the data will be above this value, and half will be below it. This is also known as the `median`.

* The `4-quantiles`, or the `quartiles`, split the data into four groups of equal size.

* The `percentiles` split the data into 100 groups of equal size. They are commonly used to compare new data points to the dataset. For example, if your height is above the 80th percentile, your height is above the value that splits the lowest 80% of the data from the remaining 20%.
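
The common quantiles listed above can be tried out with a few short calls. This is a minimal sketch that reuses the small dataset from the earlier examples, so the numbers are purely illustrative.

```r
dataset <- c(5, 10, -20, 42, -9, 10)

# The 2-quantile is the median, so these two calls agree
quantile(dataset, 0.5)
median(dataset)

# With no second argument, quantile() returns the minimum, the three
# quartiles, and the maximum
quantile(dataset)

# The 80th percentile: the value that splits the lowest 80% of the data
# from the remaining 20%
quantile(dataset, 0.80)
```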

## Interquartile Range

![outliers.svg](attachment:outliers.svg)

In this image, most of the data is between `0` and `15`. However, there is one large negative outlier (`-20`) and one large positive outlier (`40`). This makes the range of the dataset `60` (the difference between `40` and `-20`). That’s not very representative of the spread of the majority of the data!

The `interquartile range (IQR)` is a descriptive statistic that tries to solve this problem. The IQR ignores the tails of the dataset, so it describes the range spanned by the middle half of your data.

The interquartile range is the difference between the third quartile (Q3) and the first quartile (Q1).

The `stats` package, which is loaded by default in R, has an `IQR()` function that calculates the interquartile range.

```r
dataset <- c(4, 10, 38, 85, 193)
interquartile_range <- IQR(dataset)
interquartile_range
```
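
To connect `IQR()` back to the definition, the same number can be computed by hand as Q3 minus Q1 and compared with the full range. This is a small sketch; the names `manual_iqr` and `full_range` are assumptions chosen for the example.

```r
dataset <- c(4, 10, 38, 85, 193)

# IQR computed by hand: third quartile minus first quartile
q1 <- quantile(dataset, 0.25)
q3 <- quantile(dataset, 0.75)
manual_iqr <- unname(q3 - q1)   # drop the "75%" label carried over from q3
manual_iqr

# Full range for comparison: maximum minus minimum
full_range <- diff(range(dataset))
full_range
```

The hand-computed value should match the result of `IQR(dataset)`, while the full range is much larger because it includes the extreme values at both ends.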