Dispersion is the **spread of the data**. It measures how far the values lie from each other and from the center of the data.

In many datasets, the values cluster closely around the mean. In others, the values are spread widely around the mean. This dispersion can be measured by:

1. Inter Quartile Range (IQR)
2. Range
3. Standard Deviation
4. Variance

### Inter Quartile Range

Quartiles are special percentiles.

The 1st quartile (Q1) is the same as the 25th percentile.

The 2nd quartile (Q2) is the same as the 50th percentile (the median).

The 3rd quartile (Q3) is the same as the 75th percentile.

Steps to find the quartiles:

1. Sort the data from the smallest to the largest value.
2. Divide the ordered data into 4 equal parts. The three cut points are Q1, Q2 and Q3.

For percentiles, the ordered data is divided into 100 equal parts.
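
As a quick check that quartiles are special percentiles, the sketch below cuts the same ordered data into 4 parts and into 100 parts with the standard library's `statistics.quantiles` and compares the matching cut points. The data values are the ones used in the IQR examples of this section. The function's default `'exclusive'` method is only one of several quartile conventions, so the numbers can differ slightly from those produced by other methods.

```python
import math
import statistics

# Step 1: sort the data from smallest to largest
data = sorted([32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96, 97,
               101, 105, 112, 116])

# Step 2: divide the ordered data into 4 parts (3 cut points)
quartiles = statistics.quantiles(data, n=4)

# For percentiles, divide the ordered data into 100 parts (99 cut points)
percentiles = statistics.quantiles(data, n=100)

# Q1, Q2, Q3 should match the 25th, 50th and 75th percentiles
matching = (percentiles[24], percentiles[49], percentiles[74])
for name, q, p in zip(("Q1", "Q2", "Q3"), quartiles, matching):
    print(name, q, p, math.isclose(q, p))
```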

**The interquartile range is the difference between the third quartile (Q3) and the first quartile (Q1).**

**IQR = Q3 - Q1**

![image.png](attachment:image.png)

The interquartile range is the spread of the middle half (50%) of the data. For skewed distributions or data sets with outliers, it is usually a better measure of variability than the range or the standard deviation. Because it is based on values that come from the middle half of the distribution, it is unlikely to be influenced by outliers.
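
The robustness claim can be illustrated with a small experiment. The sketch below is only an illustration under stated assumptions: the helper names `iqr` and `value_range` are hypothetical, the extreme value 1000 is made up for the demonstration, and quartiles are computed with `statistics.quantiles` using the `'inclusive'` method, which is one convention among several.

```python
import statistics

def iqr(values):
    """Interquartile range, Q3 - Q1 (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def value_range(values):
    """Range, max - min (hypothetical helper)."""
    return max(values) - min(values)

data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96, 97,
        101, 105, 112, 116]

# Same data with one artificial extreme value appended
data_with_outlier = data + [1000]

print("Range:", value_range(data), "->", value_range(data_with_outlier))
print("IQR:  ", iqr(data), "->", iqr(data_with_outlier))
```

The range jumps by roughly the size of the extreme value, while the IQR changes only slightly because the middle half of the data barely moves.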

**Uses:**

1. The interquartile range has a breakdown point of 25%, which is why it is often preferred over the total range.
2. The IQR is used to build box plots, simple graphical representations of a distribution.
3. The IQR can be used to identify outliers in a data set.
4. Together with the median, the IQR summarizes the data: the median gives the central tendency and the IQR gives the spread around it.
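
One common way to use the IQR for outlier detection is to flag values that fall more than 1.5 × IQR below Q1 or above Q3. The sketch below assumes that conventional 1.5 multiplier (it is not specified in this section), uses `statistics.quantiles` with the `'inclusive'` method as the quartile convention, and appends a made-up value of 250 to the example data so that there is something to detect.

```python
import statistics

# Example data plus one artificial extreme value (250) for illustration
data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96, 97,
        101, 105, 112, 116, 250]

q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Conventional fences: 1.5 * IQR beyond the quartiles (assumed multiplier)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```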

**Decision Making**

1. A data set with a higher interquartile range (IQR) has more variability.
2. A data set with a lower interquartile range (IQR) has values that are more tightly clustered, which is preferable when consistency is the goal.

In [2]:
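# Interquartile range using numpy.median

import numpy as np

# The data is already sorted and has an even number of values (20),
# so the lower and upper halves are data[:10] and data[10:]
data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96, 97,
        101, 105, 112, 116]

# First quartile (Q1): median of the lower half
Q1 = np.median(data[:10])

# Third quartile (Q3): median of the upper half
Q3 = np.median(data[10:])

# Interquartile range (IQR)
IQR = Q3 - Q1

print(IQR)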

34.0


In [3]:
# Interquartile range using numpy.percentile

import numpy as np

data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96, 97,
        101, 105, 112, 116]

# NumPy 1.22 renamed the `interpolation` keyword to `method`;
# on older NumPy versions use interpolation='midpoint' instead

# First quartile (Q1)
Q1 = np.percentile(data, 25, method='midpoint')

# Third quartile (Q3)
Q3 = np.percentile(data, 75, method='midpoint')

# Interquartile range (IQR)
IQR = Q3 - Q1

print(IQR)

34.0


In [4]:
# Interquartile range using scipy.stats.iqr

from scipy import stats
  
data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96, 97, 
        101, 105, 112, 116]
  
# Interquartile range (IQR)
IQR = stats.iqr(data, interpolation = 'midpoint')
  
print(IQR)

34.0


### Range

The range is the difference between the largest and the smallest value in the data.

**Range = Max - Min**
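
A minimal sketch of the formula, reusing the data from the IQR examples (the variable name `data_range` is chosen only for this illustration):

```python
data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96, 97,
        101, 105, 112, 116]

# Range = Max - Min
data_range = max(data) - min(data)

print(data_range)  # 116 - 32 = 84
```

Because the range depends only on the two most extreme values, a single unusually large or small value changes it completely.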

### Standard Deviation

The most common measure of spread is the standard deviation.

**The standard deviation is a measure of how far the data deviate from the mean value.**

The standard deviation formula varies for population and sample. Both formulas are similar, but not the same.

* Symbol used for Sample Standard Deviation  –  “s” (lowercase)
* Symbol used for Population Standard Deviation – “σ” (sigma, lower case)

Sample standard deviation:
![image-2.png](attachment:image-2.png)

Population standard deviation:
![image.png](attachment:image.png)
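
Written out, the standard textbook forms of the two formulas are as follows. The symbols used here (n and N for the number of values, x̄ for the sample mean, μ for the population mean) are the usual conventions and are assumed to match the images above.

$$
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
$$

The only structural difference is the divisor: n - 1 for a sample and N for a population.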

A low standard deviation indicates that the data are clustered closely around the mean, whereas a high standard deviation indicates that the data are spread far from the mean. A useful property of the standard deviation is that, unlike the variance, it is expressed in the same units as the data.

In [6]:
# Python code to demonstrate the stdev() function

# Importing the statistics module
import statistics

# Creating a simple data set
sample = [1, 2, 3, 4, 5]

# Prints the sample standard deviation
# xbar defaults to None, so the mean is computed automatically
print(f"Standard Deviation of sample is {statistics.stdev(sample)}")

Standard Deviation of sample is 1.5811388300841898
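
The difference between the sample and population formulas can be checked numerically. The sketch below is a minimal illustration using the same five values. The helper name `std_dev` and its `ddof` parameter are hypothetical, introduced only for this example. It divides by n - 1 for a sample and by n for a population, then compares the results with `statistics.stdev` and `statistics.pstdev`.

```python
import math
import statistics

def std_dev(values, ddof):
    """Standard deviation with divisor len(values) - ddof (hypothetical helper)."""
    mean = sum(values) / len(values)
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (len(values) - ddof))

sample = [1, 2, 3, 4, 5]

sample_sd = std_dev(sample, ddof=1)      # sample formula: divide by n - 1
population_sd = std_dev(sample, ddof=0)  # population formula: divide by n

print(sample_sd, statistics.stdev(sample))
print(population_sd, statistics.pstdev(sample))
```

The sample value is slightly larger than the population value because dividing by n - 1 instead of n increases the result.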


### Variance

Variance is a measure of how data points differ from the mean. In layman's terms, variance measures how far a set of numbers is spread out from its mean (average) value. It is calculated as the average of the squared deviations from the mean.

The symbol σ² represents the population variance and the symbol s² represents the sample variance.

![image.png](attachment:image.png)

A low value for variance indicates that the data are clustered together and are not spread apart widely, whereas a high value would indicate that the data in the given set are much more spread apart from the average value. 

#### Difference between standard deviation and variance

Variance is the average of the squared deviations from the mean, while standard deviation is the square root of this number. Both measures reflect variability in a distribution, but their units differ: standard deviation is expressed in the same units as the original values, while variance is expressed in squared units.
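
The relationship can be verified with a short sketch. It computes the sample variance by hand (dividing by n - 1, which is what `statistics.variance` does) and checks that its square root equals the sample standard deviation. The variable names are chosen only for this illustration, and the values are the same as in the variance example in this section.

```python
import math
import statistics

sample = [2.74, 1.23, 2.63, 2.22, 3, 1.98]

# Sample variance by hand: sum of squared deviations divided by n - 1
mean = sum(sample) / len(sample)
sample_variance = sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)

# Standard deviation is the square root of the variance
sample_sd = math.sqrt(sample_variance)

print(sample_variance, statistics.variance(sample))
print(sample_sd, statistics.stdev(sample))
```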

In [8]:
# Python code to demonstrate the working of
# the variance() function of the statistics module

# Importing the statistics module
import statistics

# Creating a sample of data
sample = [2.74, 1.23, 2.63, 2.22, 3, 1.98]

# Prints the variance of the sample set
# The function automatically calculates
# the mean and uses it as xbar
print(f"Variance of sample set is {statistics.variance(sample)}")

Variance of sample set is 0.40924
