# Measures of Dispersion

- **Dispersion** tells us how much the data values **spread out** from the center (mean/median).  
- While measures of central tendency (mean, median, mode) give a "center", **dispersion shows variability**.  
- High dispersion = data is spread out.  
- Low dispersion = data is closely clustered.  

## Common measures of dispersion:

1. Range
2. Percentage and percentile
3. Quartiles (box plot)
4. Mean Deviation
5. Variance
6. Standard Deviation

# 1. *Range*

- The **simplest measure of dispersion**.  
- It shows the **difference between the largest and smallest value** in a dataset. 

Without an outlier: {1, 2, 3, 4, 5, 6}

Range = 6 - 1 = 5

With an outlier: {1, 2, 3, 4, 5, 1000}

Range = 1000 - 1 = 999

**Example:**
- Dataset = [12, 15, 18, 22, 25]
    - Maximum Value = 25  
    - Minimum Value = 12  
    - Range = 25 - 12 = **13**

**Key Points:**
- Easy to calculate.  
- Gives a quick idea of spread.  
- **Very sensitive to outliers** (one extreme value can change the range drastically).
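
The calculation above can be sketched in a few lines of Python. The helper name `data_range` is made up for this illustration; the datasets are the ones used in this section.

```python
def data_range(values):
    """Return max - min for a non-empty sequence of numbers."""
    return max(values) - min(values)

print(data_range([1, 2, 3, 4, 5, 6]))      # 5
print(data_range([1, 2, 3, 4, 5, 1000]))   # 999 (one outlier dominates)
print(data_range([12, 15, 18, 22, 25]))    # 13
```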

# 2. *Percentage & Percentile*


### **Percentage**
- A **percentage** is a way of expressing a number as a fraction of **100**.  
- It is widely used to compare values and show proportions.  

### Example:
- A student scored **45 marks out of 60**.  
`Percentage = (45/60) * 100 = 75%`

### Key Points:
- Always calculated with respect to a **total/whole**.  
- Helpful in **comparisons** (e.g., exam scores, growth rates, discounts).  
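- When the part cannot exceed the whole (e.g., exam scores), the percentage lies between **0% and 100%**; percentage changes such as growth rates can fall outside that range.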
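
As a quick check of the arithmetic, here is a minimal sketch; the function name `percentage` is only an illustrative choice.

```python
def percentage(part, whole):
    """Express `part` as a percentage of `whole`."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return 100 * part / whole

print(percentage(45, 60))  # 75.0
```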

### **Percentile**
- A **percentile** indicates the value below which a given percentage of data falls.  
- It helps to understand the **relative standing** of a value in a dataset. 

### Formula

Percentile rank of a value x = ((Number of values below x) / (Total number of values n)) * 100

### Example:
Dataset = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  

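- **25th Percentile (P25)** → Value below which about 25% of data lies = **32.5**  
- **50th Percentile (P50)** → This is also the **Median** = **55**  
- **75th Percentile (P75)** → Value below which about 75% of data lies = **77.5**
- These values use linear interpolation between neighbouring data points. Other conventions differ slightly: the nearest-rank method, for example, gives P25 = 30 and P75 = 80.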
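
Percentile values depend on the convention used, so the sketch below shows two of them side by side. It assumes Python's standard `statistics.quantiles` with `method="inclusive"` (linear interpolation between sorted data points); the helpers `percentile_rank` and `nearest_rank` are hypothetical names written only for this illustration.

```python
import math
import statistics

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

def percentile_rank(values, x):
    """Percentage of values strictly below x (the formula above)."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

def nearest_rank(values, p):
    """p-th percentile by the nearest-rank convention (no interpolation)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Linear interpolation: quantiles(n=4) returns the three quartile cut points.
p25, p50, p75 = statistics.quantiles(data, n=4, method="inclusive")

print(p25, p50, p75)                                    # 32.5 55.0 77.5
print(nearest_rank(data, 25), nearest_rank(data, 75))   # 30 80
print(percentile_rank(data, 30))                        # 20.0
```

Both conventions are in common use; what matters is stating which one a reported percentile follows.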

### Key Points:
- **Median** is the 50th percentile.  
- **Quartiles** are special percentiles:  
  - Q1 = 25th percentile  
  - Q2 = 50th percentile (Median)  
  - Q3 = 75th percentile  
- Percentiles are widely used in **exams, health indicators, and data analysis**.  


# 3. *Quartiles*

- **Quartiles** divide a dataset into **four equal parts** after sorting.  
- Each quartile contains **25% of the data**.  
- Quartiles are based on **percentiles**:  
  - **Q1 (First Quartile / 25th Percentile)** → 25% of data falls below this value.  
  - **Q2 (Second Quartile / 50th Percentile / Median)** → 50% of data falls below this value.  
  - **Q3 (Third Quartile / 75th Percentile)** → 75% of data falls below this value.  

- Method to find the quartiles:
  - Sort the numbers in ascending order.
  - Cut the sorted list into 4 equal parts.
  - The quartiles are the values at the three cut points.


### Example:
Dataset = [6, 7, 8, 10, 12, 13, 15, 16, 20]  

- **Q1 (25th percentile)** = 8  
- **Q2 (Median / 50th percentile)** = 12  
- **Q3 (75th percentile)** = 15  

So,  
- 25% of data lies below **Q1**  
- 50% of data lies below **Q2**  
- 75% of data lies below **Q3**


### Key Points:
- Quartiles are used to measure **spread and variability**.  
- Help detect **outliers** (via Interquartile Range, IQR).  
- **IQR = Q3 - Q1** → Spread of the middle 50% of the data.  
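
The quartiles and IQR of the example dataset can be reproduced with the standard library. This sketch assumes `method="inclusive"` (linear interpolation), which matches the values quoted above; the default `"exclusive"` method can give slightly different cut points.

```python
import statistics

data = [6, 7, 8, 10, 12, 13, 15, 16, 20]

# quantiles(n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3)  # 8.0 12.0 15.0
print(iqr)         # 7.0
```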

![image.png](attachment:image.png)

## Interquartile Range

![image.png](attachment:image.png)

Since the IQR depends only on Q1 and Q3, it is largely unaffected by outliers.

![image.png](attachment:image.png)

- Upper fence & lower fence: any value outside the lower and upper fences is called an outlier.

lower fence = Q1 - 1.5 * IQR

upper fence = Q3 + 1.5 * IQR
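
A minimal sketch of outlier detection with fences, assuming the usual convention in which the fences are measured from the quartiles (Q1 - 1.5 × IQR and Q3 + 1.5 × IQR). The function name `iqr_outliers`, the multiplier parameter `k`, and the extra value 100 in the data are all made up for this illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical data: the quartile example's values plus one extreme value.
lower, upper, outliers = iqr_outliers([6, 7, 8, 10, 12, 13, 15, 16, 20, 100])
print(lower, upper, outliers)  # -2.375 26.625 [100]
```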

# 4. Mean Deviation

*Mean Deviation (also called Average Deviation) is the average of the absolute differences of each data point from the mean or median.  
It tells us how much the values in a dataset deviate (spread out) on average.*

![image.png](attachment:image.png)

**Example:**  
Data: 2, 4, 6, 8  

1. Mean = (2 + 4 + 6 + 8) / 4 = 5  
2. Deviations from mean:  
   - |2 - 5| = 3  
   - |4 - 5| = 1  
   - |6 - 5| = 1  
   - |8 - 5| = 3  
3. Mean Deviation = (3 + 1 + 1 + 3) / 4 = 2  

So, the **Mean Deviation = 2**.  

Unlike variance, mean deviation does **not square** the differences; it uses **absolute values**, so the units remain the same as the data.
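
The worked example can be checked with a short sketch. The function name `mean_deviation` and its optional `center` argument are hypothetical; passing the median as `center` gives the mean deviation about the median instead.

```python
import statistics

def mean_deviation(values, center=None):
    """Average absolute distance from `center` (defaults to the mean)."""
    if center is None:
        center = statistics.fmean(values)
    return sum(abs(v - center) for v in values) / len(values)

data = [2, 4, 6, 8]
print(mean_deviation(data))                           # 2.0
print(mean_deviation(data, statistics.median(data)))  # 2.0 (median is also 5)
```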

# 5. Variance

*Variance is a measure of how spread out the data is around the mean.  
It is the average of the squared differences from the mean.*

![image.png](attachment:image.png)

Variance sample = ![alt text](sample_variance.svg)

Why do we calculate statistics on a sample?
- We usually have no access to the complete population.

Why do we use (n-1) instead of n in the sample variance?
- Dividing by (n-1) rather than n makes the sample variance an unbiased estimator of the population variance.
- Since we have no access to the population, we estimate its variance from the sample.
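
To make "unbiased" concrete, the sketch below treats [2, 4, 6, 8] as a tiny population and enumerates every possible sample of size 2 drawn with replacement. Averaged over all samples, dividing by n-1 recovers the population variance, while dividing by n underestimates it. The helper `variance` and its `ddof` parameter (the amount subtracted from n in the divisor) are names chosen for this illustration.

```python
from itertools import product

population = [2, 4, 6, 8]

def variance(values, ddof=0):
    """Sum of squared deviations from the mean, divided by (n - ddof)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)

# Every ordered sample of size 2, drawn with replacement (16 in total).
samples = list(product(population, repeat=2))

avg_div_n = sum(variance(s, ddof=0) for s in samples) / len(samples)
avg_div_n_minus_1 = sum(variance(s, ddof=1) for s in samples) / len(samples)

print(variance(population))   # 5.0  (true population variance)
print(avg_div_n)              # 2.5  (dividing by n underestimates)
print(avg_div_n_minus_1)      # 5.0  (dividing by n-1 matches on average)
```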

### How to calculate variance?

- Calculate the mean.
- For each number in the data, subtract the mean from the number.
- Square each difference.
- Calculate the average of the squared differences (divide by n for a population, by n-1 for a sample).

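### Why do we square the differences?

Positive and negative differences cancel each other out; their sum is always zero.

Squaring makes every term non-negative. Taking absolute values would also prevent the cancellation, but that gives the mean deviation rather than the variance.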

**Example:**  
Data: 2, 4, 6, 8  

1. Mean = (2 + 4 + 6 + 8) / 4 = 5  
2. Squared differences:  
   - (2 - 5)² = 9  
   - (4 - 5)² = 1  
   - (6 - 5)² = 1  
   - (8 - 5)² = 9  
3. Variance = (9 + 1 + 1 + 9) / 4 = 5  

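So, the **population variance = 5**. (Treating the same four values as a sample, the divisor is n-1 = 3 and the sample variance is 20 / 3 ≈ 6.67.)  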
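
The same example, following the steps listed above and cross-checked against Python's standard `statistics` module (`pvariance` divides by n, `variance` divides by n-1). The variable names are illustrative.

```python
import statistics

data = [2, 4, 6, 8]

# Manual calculation: mean, squared differences, then the average.
mean = sum(data) / len(data)                      # 5.0
squared_diffs = [(x - mean) ** 2 for x in data]   # [9.0, 1.0, 1.0, 9.0]
population_variance = sum(squared_diffs) / len(data)
sample_variance = sum(squared_diffs) / (len(data) - 1)

print(population_variance)   # 5.0
print(sample_variance)       # 6.666666666666667

# Library equivalents for comparison.
print(statistics.pvariance(data))
print(statistics.variance(data))
```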

![image.png](attachment:image.png)

# 6. Standard Deviation

*Standard Deviation (SD) is a measure of how spread out the values are around the mean.  
It is the square root of the variance, which makes it easier to interpret since it is in the same units as the data.*


![image.png](attachment:image.png)

### Why Standard Deviation?

- Variance can be a large number because it is built from squared differences, which makes it hard to compare against individual data values.

- The variance formula squares the differences, so its units are squared as well. SD, unlike variance, is in the same units as the data.  
   Example: If data is in centimeters, SD is also in centimeters → easier to interpret. 

**Example:**  
Data: 2, 4, 6, 8  

1. Mean = (2 + 4 + 6 + 8) / 4 = 5  
2. Squared differences:  
   - (2 - 5)² = 9  
   - (4 - 5)² = 1  
   - (6 - 5)² = 1  
   - (8 - 5)² = 9  
3. Variance = (9 + 1 + 1 + 9) / 4 = 5  
4. Standard Deviation = √5 ≈ 2.236  

So, the **Standard Deviation ≈ 2.236**.  
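
A short sketch confirming the result with the standard library: `pstdev` is the population standard deviation (divisor n) and `stdev` is the sample version (divisor n-1).

```python
import math
import statistics

data = [2, 4, 6, 8]

# Square root of the population variance, as in the worked example.
population_sd = math.sqrt(statistics.pvariance(data))

print(round(population_sd, 3))             # 2.236
print(round(statistics.pstdev(data), 3))   # 2.236
print(round(statistics.stdev(data), 3))    # 2.582 (sample SD, divisor n-1)
```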

![image.png](attachment:image.png)