# Quartiles and Interquartile Range (IQR)

Quartiles and the Interquartile Range (IQR) describe the *spread* and *distribution* of a dataset, with a focus on how the data is spread around the median. The three quartiles split a sorted dataset into four parts, each holding roughly 25% of the values.

We can use quartiles and IQR to:

* **Summarize Data:** Providing key values describing the distribution.
* **Compare Datasets:** Evaluating differences in spread and central tendency.
* **Identify Outliers:** Establishing thresholds for unusually high or low values.
* **Understand Distributions:** Gaining insight into data spread and skewness.
* **Data Analysis:** Checking these values as a first step before further analysis.

---

## Quartiles (Q1, Q2, Q3)
**Q2 (Second Quartile / Median)** is the middle value of the dataset. It divides the data into two halves (50% of the data below, 50% above). This is the same as the median we discussed previously.

**Q1 (First Quartile / Lower Quartile)** is the median of the *lower half* of the data. 25% of the data falls below Q1, and 75% falls above. It is also known as the 25th percentile.

**Q3 (Third Quartile / Upper Quartile)** is the median of the *upper half* of the data. 75% of the data falls below Q3, and 25% falls above. It is also known as the 75th percentile.

**Important Note:** There are slightly different methods for calculating Q1 and Q3, especially when dealing with datasets with an even number of data points. There isn't one universally agreed-upon method. The methods described below are common and illustrate the core concepts.
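
To make the point about differing methods concrete, here is a minimal sketch using `statistics.quantiles` from Python's standard library. This function is not used elsewhere in this notebook; it is chosen here only because it exposes two common conventions (`exclusive` and `inclusive`). The data is the set of golf scores used in the examples below, and the variable names are illustrative.

```python
from statistics import quantiles

# Golf scores used in Example 1 (already sorted)
scores = [66, 67, 67, 68, 68, 68, 68, 69, 69, 69, 70, 70, 70, 71, 71, 72, 73, 75]

# n=4 asks for the three cut points (Q1, Q2, Q3) that split the data into quarters
q_exclusive = quantiles(scores, n=4, method="exclusive")
q_inclusive = quantiles(scores, n=4, method="inclusive")

print("exclusive:", q_exclusive)  # exclusive: [68.0, 69.0, 71.0]
print("inclusive:", q_inclusive)  # inclusive: [68.0, 69.0, 70.75]
```

Both conventions agree on Q1 and the median for this data but disagree on Q3, so it is worth stating which method was used whenever quartiles are reported.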

---

## Calculating Quartiles: Example 1 (Even Number of Data Points)

Let's say we have the following golf scores (already sorted):

`66, 67, 67, 68, 68, 68, 68, 69, 69, 69, 70, 70, 70, 71, 71, 72, 73, 75` (18 data points)

1.  **Find Q2 (Median):** Since there are an even number of data points, the median is the average of the two middle values: (69 + 69) / 2 = 69.
2.  **Find Q1 (Lower Quartile):** Find the median of the *lower half* of the data. Because the total number of data points is even, the overall median falls between two values and is not itself a data point, so nothing is excluded and each half has 9 values.
    *   Lower half: `66, 67, 67, 68, 68, 68, 68, 69, 69` (9 data points)
    *   Q1 is the middle value of this lower half: 68
3.  **Find Q3 (Upper Quartile):** Find the median of the *upper half* of the data (again, nothing is excluded because the number of data points is even).
    *   Upper half: `69, 70, 70, 70, 71, 71, 72, 73, 75` (9 data points)
    *   Q3 is the middle value of this upper half: 71
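
The steps above can be sketched in plain Python. The helper name `quartiles_median_of_halves` is made up for this illustration, and the sketch assumes the input has at least two values. It also drops the overall median from both halves when the number of data points is odd.

```python
from statistics import median

def quartiles_median_of_halves(values):
    """Return (Q1, Q2, Q3) as the medians of the lower half, all data, and the upper half."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]  # everything below the midpoint
    # Odd n: skip the middle value itself. Even n: the upper half starts at the midpoint.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

scores = [66, 67, 67, 68, 68, 68, 68, 69, 69, 69, 70, 70, 70, 71, 71, 72, 73, 75]
q1, q2, q3 = quartiles_median_of_halves(scores)
print(q1, q2, q3)  # 68 69.0 71
```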

---
## Calculating Quartiles: Example 2 (Odd Number of Data Points)
Let's remove one of the 68s from the scores in Example 1.

`66, 67, 67, 68, 68, 68, 69, 69, 69, 70, 70, 70, 71, 71, 72, 73, 75` (17 data points)

1.  **Find Q2 (Median):** Since there are an odd number of data points, the median is the single middle value: 69.
2.  **Find Q1 (Lower Quartile):** Find the median of the *lower half* of the data (*excluding* the overall median).
    *   Lower half: `66, 67, 67, 68, 68, 68, 69, 69` (8 data points)
    *   The lower half has an even number of data points. Q1 is the average of the middle two values: (68 + 68) / 2 = 68
3.  **Find Q3 (Upper Quartile):** Find the median of the *upper half* of the data (*excluding* the overall median).
    *   Upper half: `70, 70, 70, 71, 71, 72, 73, 75` (8 data points)
    *   The upper half has an even number of data points. Q3 is the average of the middle two values: (71 + 71) / 2 = 71

---

## Interquartile Range (IQR)
The IQR is the difference between the third quartile (Q3) and the first quartile (Q1).  It represents the range of the *middle 50%* of the data.

**Formula:**  

$IQR = Q3 - Q1$

* **Example 1:** IQR = 71 - 68 = 3
* **Example 2:** IQR = 71 - 68 = 3

A *larger* IQR indicates that the middle 50% of the data is more spread out.
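
As a rough illustration of reading the IQR as a measure of spread, the sketch below compares the Example 1 scores with a second, made-up dataset (`wider_scores` is hypothetical and not part of the original data). It assumes `statistics.quantiles` with `method="exclusive"`, which reproduces the hand-computed quartiles for Example 1 but is not the same algorithm in general.

```python
from statistics import quantiles

def iqr(values):
    """IQR = Q3 - Q1, with quartiles from statistics.quantiles (exclusive method)."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return q3 - q1

scores = [66, 67, 67, 68, 68, 68, 68, 69, 69, 69, 70, 70, 70, 71, 71, 72, 73, 75]
wider_scores = [60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80]  # hypothetical, more spread out

print(iqr(scores))        # 3.0
print(iqr(wider_scores))  # 12.0
```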

---

## Five-Number Summary
The five-number summary provides a concise description of a dataset's distribution:

1.  **Minimum:** The smallest value in the dataset.
2.  **Q1 (First Quartile):** The 25th percentile.
3.  **Median (Q2):** The 50th percentile.
4.  **Q3 (Third Quartile):** The 75th percentile.
5.  **Maximum:** The largest value in the dataset.

**Example 1 (Five-Number Summary):** 66, 68, 69, 71, 75  
**Range:** Max - Min = 75 - 66 = 9
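
A five-number summary is easy to bundle into a small helper. The sketch below is one possible shape for it: the `FiveNumberSummary` type and the `five_number_summary` function are names invented for this example, and the quartiles again come from `statistics.quantiles` with `method="exclusive"`, which matches the hand calculation for Example 1 but may not for other data.

```python
from statistics import quantiles
from typing import NamedTuple

class FiveNumberSummary(NamedTuple):
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

def five_number_summary(values):
    """Collect min, Q1, median, Q3, and max into one record."""
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    return FiveNumberSummary(min(values), q1, q2, q3, max(values))

scores = [66, 67, 67, 68, 68, 68, 68, 69, 69, 69, 70, 70, 70, 71, 71, 72, 73, 75]
summary = five_number_summary(scores)
print(summary)
# FiveNumberSummary(minimum=66, q1=68.0, median=69.0, q3=71.0, maximum=75)
print("Range:", summary.maximum - summary.minimum)  # Range: 9
```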

---

## Outlier Detection (Using IQR)
A common rule of thumb for identifying potential outliers uses the IQR:

* **Lower Bound:**  Q1 - 1.5 * IQR
* **Upper Bound:**  Q3 + 1.5 * IQR

Any data point *below* the lower bound or *above* the upper bound is considered a potential outlier.

**Example 1:**  
* Lower Bound: 68 - 1.5 * 3 = 63.5
* Upper Bound: 71 + 1.5 * 3 = 75.5
* Every score lies between 66 and 75, which is inside these bounds, so there are no outliers according to this rule.

**Important Note:** This is just a *rule of thumb*.  Whether a value is truly an outlier depends on the context of the data and the specific analysis.
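
The rule can be written as two short functions. This is only a sketch: `iqr_fences` and `potential_outliers` are illustrative names, and the quartiles are passed in explicitly so that the effect of the quartile method is visible. The second call uses Q3 = 70.75, the value that linear interpolation (NumPy's default) gives for this data.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) bounds of the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def potential_outliers(values, q1, q3, k=1.5):
    """Return the values that fall outside the IQR fences."""
    lower, upper = iqr_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]

scores = [66, 67, 67, 68, 68, 68, 68, 69, 69, 69, 70, 70, 70, 71, 71, 72, 73, 75]

# Quartiles from the hand calculation in Example 1
print(iqr_fences(68, 71))                     # (63.5, 75.5)
print(potential_outliers(scores, 68, 71))     # []

# Quartiles from linear interpolation: a slightly smaller Q3 is enough to flag 75
print(iqr_fences(68, 70.75))                  # (63.875, 74.875)
print(potential_outliers(scores, 68, 70.75))  # [75]
```

Because the fences depend directly on Q1 and Q3, a borderline value such as 75 can be flagged under one quartile convention and not under another.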

---

## Calculating Quartiles and IQR in Python
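We can calculate quartiles and the IQR in Python using NumPy and SciPy. Note that `np.quantile` and `stats.iqr` use linear interpolation by default, so their results can differ from the hand method used in the examples above (the median of each half). For Example 1, NumPy gives Q3 = 70.75 rather than 71, which is enough to flag 75 as a potential outlier.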

```python
import numpy as np
from scipy import stats

# Example data (from Example 1 - even number of data points)
data = np.array([66, 67, 67, 68, 68, 68, 68, 69, 69, 69, 70, 70, 70, 71, 71, 72, 73, 75])

# Calculate quartiles using NumPy (default method: linear interpolation)
q1 = np.quantile(data, 0.25)  # 25th percentile
median = np.quantile(data, 0.50)  # 50th percentile (median)
q3 = np.quantile(data, 0.75)  # 75th percentile

print(f"Q1: {q1}")
print(f"Median (Q2): {median}")
print(f"Q3: {q3}")

# Calculate IQR
iqr = q3 - q1
# or, using scipy.stats:
iqr_scipy = stats.iqr(data)

print(f"IQR (NumPy): {iqr}")
print(f"IQR (SciPy): {iqr_scipy}")  # Same value as the NumPy result

# Five-Number Summary
minimum = np.min(data)
maximum = np.max(data)
print(f"Five-Number Summary: {minimum}, {q1}, {median}, {q3}, {maximum}")

# Outlier Detection
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
print(f"Lower Bound for Outliers: {lower_bound}")
print(f"Upper Bound for Outliers: {upper_bound}")

outliers = data[(data < lower_bound) | (data > upper_bound)]
print(f"Outliers: {outliers}")  # Output: Outliers: [75]

# --- Example 2 (odd number of data points) ---
data2 = np.array([66, 67, 67, 68, 68, 68, 69, 69, 69, 70, 70, 70, 71, 71, 72, 73, 75])
q1_2 = np.quantile(data2, 0.25)
median_2 = np.quantile(data2, 0.50)
q3_2 = np.quantile(data2, 0.75)
iqr_2 = stats.iqr(data2)
print(f"\nExample 2 - Q1: {q1_2}, Median: {median_2}, Q3: {q3_2}, IQR: {iqr_2}")
```

```
Q1: 68.0
Median (Q2): 69.0
Q3: 70.75
IQR (NumPy): 2.75
IQR (SciPy): 2.75
Five-Number Summary: 66, 68.0, 69.0, 70.75, 75
Lower Bound for Outliers: 63.875
Upper Bound for Outliers: 74.875
Outliers: [75]

Example 2 - Q1: 68.0, Median: 69.0, Q3: 71.0, IQR: 3.0
```
