# Measures of Dispersion

Measures of dispersion are statistics that describe how much the values in a dataset vary, or spread out, around its central tendency (the mean or median). In other words, they quantify the diversity or variability of the dataset. The most common measures of dispersion are the range, variance, and standard deviation; this article also covers the coefficient of variation and the interquartile range.

Range: The range is the simplest measure of dispersion: the difference between the highest and lowest values in a dataset. It gives a rough idea of how far the data spreads. To calculate it, subtract the lowest value from the highest. For instance, for the dataset 5, 7, 10, 12, 15, the range is 15 - 5 = 10. However, the range depends only on the two extreme values, so it says nothing about how the values in between are distributed, and it is highly sensitive to outliers.

# Range = largest_value - smallest_value

In [18]:
import numpy as np
import pandas as pd
import statistics

In [3]:
d = [1,2,3,4,5,6,7,8,9]

In [4]:
smallest_value = min(d)

In [5]:
smallest_value

1

In [6]:
largest_value = max(d)

In [7]:
largest_value

9

In [8]:
range_of_d = largest_value - smallest_value

In [9]:
range_of_d

8

In [10]:
data = pd.read_csv("Students_marks.csv")

In [11]:
data

Unnamed: 0,Roll No.,Name,Gender,Data Structure,Mathematics,Operating System,Average marks,Ranking
0,1,Avneet,F,25,19,17,20.33,Good
1,2,Jay,M,23,21,14,19.33,Okay
2,3,Sadaf,F,28,23,21,24.0,Good
3,4,Prakash,M,21,27,28,25.33,Very Good
4,5,Sarita,F,23,22,9,18.0,Okay
5,6,Vasudha,F,190,30,28,82.66,Very Good
6,7,Ankur,M,24,24,22,23.33,Good
7,8,Laksh,M,26,25,27,26.0,Very Good
8,9,Aman,M,23,18,25,22.0,Good
9,10,Rajat,M,24,22,15,20.33,Good


In [13]:
data["Mathematics"].max() - data["Mathematics"].min()

12
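To illustrate why the range alone can mislead, here is a minimal sketch with two made-up datasets (the lists `clustered` and `spread_out` and the helper `value_range` are illustrative assumptions, not part of the marks data). Both have the same range, yet their values are distributed very differently.

```python
import statistics

# Two made-up datasets (illustrative only) with the same minimum and maximum.
clustered = [5, 10, 10, 10, 10, 10, 15]
spread_out = [5, 5, 5, 10, 15, 15, 15]


def value_range(values):
    """Return the difference between the largest and smallest value."""
    return max(values) - min(values)


range_clustered = value_range(clustered)
range_spread_out = value_range(spread_out)

# The ranges are identical, but the population standard deviations differ.
sd_clustered = statistics.pstdev(clustered)
sd_spread_out = statistics.pstdev(spread_out)

print("ranges:", range_clustered, range_spread_out)
print("standard deviations:", sd_clustered, sd_spread_out)
```

Both ranges come out equal, while the standard deviation of `spread_out` is noticeably larger, which is the kind of difference the measures in the following sections are meant to capture.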

# Variance

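Variance: The variance is the average of the squared deviations of the data points from the mean; it measures how far the data points lie from the mean. To compute it, square the difference between each data point and the mean, add up these squared differences, and divide the sum by the number of data points. The formula for the (population) variance is:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$

where $N$ is the number of data points and $\mu$ is their mean. Note that `statistics.variance`, used in the cells below, computes the sample variance, which divides by $N - 1$ instead of $N$; `statistics.pvariance` gives the population variance defined above.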

In [19]:
statistics.variance(data["Operating System"])

43.82222222222222

In [20]:
statistics.variance(data["Mathematics"])

12.988888888888889
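As a cross-check, the calculation can be sketched by hand. The list `maths` below is copied from the Mathematics column of the marks table (an assumption made so the snippet runs without the CSV file). It computes both the population variance (dividing by n) and the sample variance (dividing by n - 1) and compares them with the `statistics` module.

```python
import statistics

# Mathematics marks copied from the marks table shown earlier.
maths = [19, 21, 23, 27, 22, 30, 24, 25, 18, 22]

n = len(maths)
mean = sum(maths) / n

# Squared deviation of each data point from the mean.
squared_deviations = [(x - mean) ** 2 for x in maths]

population_variance = sum(squared_deviations) / n      # divide by n
sample_variance = sum(squared_deviations) / (n - 1)    # divide by n - 1

print(population_variance, statistics.pvariance(maths))
print(sample_variance, statistics.variance(maths))
```

The sample variance agrees with the `statistics.variance` output above, which suggests that the cells in this section report the sample rather than the population variance.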

# Standard Deviation

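The standard deviation is the square root of the variance. It is a widely used measure of dispersion because it is expressed in the same units as the data, which makes it easy to interpret, and it has desirable mathematical properties. It indicates how far the data points typically lie from the mean. A small standard deviation indicates that the data points are tightly clustered around the mean, while a large standard deviation indicates that they are spread out. The formula for the (population) standard deviation is:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$

where $N$ is the number of data points and $\mu$ is their mean. `statistics.pstdev`, used in the cells below, computes this population standard deviation; `statistics.stdev` gives the sample version, which divides by $N - 1$.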

In [23]:
statistics.pstdev(data["Mathematics"])

3.4190641994557516

In [24]:
statistics.pstdev(data["Operating System"])

6.2801273872430325
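The relationship between variance and standard deviation can be checked with a short sketch. The list `os_marks` is copied from the Operating System column of the marks table (an assumption so the snippet runs without the CSV file).

```python
import math
import statistics

# Operating System marks copied from the marks table shown earlier.
os_marks = [17, 14, 21, 28, 9, 28, 22, 27, 25, 15]

# Standard deviation is the square root of the matching variance.
population_sd = math.sqrt(statistics.pvariance(os_marks))  # divides by n
sample_sd = math.sqrt(statistics.variance(os_marks))       # divides by n - 1

print(population_sd, statistics.pstdev(os_marks))
print(sample_sd, statistics.stdev(os_marks))
```

The population value matches the `statistics.pstdev` output above. The sample value is slightly larger because its divisor is smaller.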

# Coefficient of Variation

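The coefficient of variation is a measure of dispersion that expresses the standard deviation as a percentage of the mean. Because it is unit-free, it is used to compare the variability of datasets with different means. The formula for the coefficient of variation is:

$$CV = \frac{\sigma}{\mu} \times 100\%$$

where $\sigma$ is the standard deviation and $\mu$ is the mean.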
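A minimal sketch of this calculation follows. The helper name `coefficient_of_variation` is hypothetical, the two lists are copied from the Mathematics and Operating System columns of the marks table, and the population standard deviation is assumed (the sample version could be substituted).

```python
import statistics


def coefficient_of_variation(values):
    """Return the population standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        # The ratio is undefined when the mean is zero.
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean * 100


# Marks copied from the marks table shown earlier.
maths = [19, 21, 23, 27, 22, 30, 24, 25, 18, 22]
os_marks = [17, 14, 21, 28, 9, 28, 22, 27, 25, 15]

cv_maths = coefficient_of_variation(maths)
cv_os = coefficient_of_variation(os_marks)

print(f"Mathematics CV: {cv_maths:.2f}%")
print(f"Operating System CV: {cv_os:.2f}%")
```

Under these assumptions, the Operating System marks show roughly twice the relative variability of the Mathematics marks, even though the two subjects have similar means.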

# Interquartile Range

Quantile: A quantile is a cut point that splits a distribution at a given fraction, i.e., it is the value below which that fraction of the observations falls. For example, the 0.25 quantile is the value below which 25% of the values lie.

Quartiles (four equal parts), quintiles (five equal parts), and percentiles (one hundred equal parts) are commonly used types of quantiles.
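These quantile types can be sketched with `statistics.quantiles` from the standard library (available from Python 3.8). The list `maths` is copied from the Mathematics column of the marks table, and `method="inclusive"` is an assumption chosen because it interpolates linearly between the sorted data points.

```python
import statistics

# Mathematics marks copied from the marks table shown earlier.
maths = [19, 21, 23, 27, 22, 30, 24, 25, 18, 22]

# n equal parts are separated by n - 1 cut points.
quartiles = statistics.quantiles(maths, n=4, method="inclusive")      # 3 cut points
quintiles = statistics.quantiles(maths, n=5, method="inclusive")      # 4 cut points
percentiles = statistics.quantiles(maths, n=100, method="inclusive")  # 99 cut points

print("quartiles:", quartiles)
print("quintiles:", quintiles)
print("25th percentile:", percentiles[24])
```

The first quartile and the 25th percentile are the same cut point, which is why the two terms are often used interchangeably.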

The interquartile range is a measure of dispersion that is based on the quartiles of the dataset. The quartiles divide the dataset into four equal parts, with each part representing 25% of the data. The interquartile range is the difference between the upper quartile (Q3) and the lower quartile (Q1). It represents the range of the middle 50% of the dataset, which is less affected by outliers than the range.

Dividing a distribution into four equal portions requires three cut points: the first quartile (Q1), the second quartile (Q2, the median), and the third quartile (Q3). The first quartile is the value below which a quarter of all values fall. In a graphical representation, the region below Q1 corresponds to 25% of the total area of the distribution. The region below Q2 comprises 50% of all values.

The interquartile range is the distance between the 25th and 75th percentiles, which is also the height of the box in a box plot.



In [26]:
Q1 = data["Mathematics"].quantile(0.25)

In [29]:
Q1

21.25

In [30]:
Q3 = data["Mathematics"].quantile(0.75)

In [31]:
Q3

24.75

In [32]:
IQR = Q3 - Q1

In [33]:
IQR

3.5

In [34]:
upper_limit = Q3 + 1.5*IQR
lower_limit = Q1 - 1.5*IQR

In [35]:
upper_limit

30.0

In [36]:
lower_limit

16.0

In [43]:
# Keep rows inside the limits (inclusive): a value equal to a limit is not an outlier.
data[(data["Mathematics"] <= upper_limit) & (data["Mathematics"] >= lower_limit)]

Unnamed: 0,Roll No.,Name,Gender,Data Structure,Mathematics,Operating System,Average marks,Ranking
0,1,Avneet,F,25,19,17,20.33,Good
1,2,Jay,M,23,21,14,19.33,Okay
2,3,Sadaf,F,28,23,21,24.0,Good
3,4,Prakash,M,21,27,28,25.33,Very Good
4,5,Sarita,F,23,22,9,18.0,Okay
5,6,Vasudha,F,190,30,28,82.66,Very Good
6,7,Ankur,M,24,24,22,23.33,Good
7,8,Laksh,M,26,25,27,26.0,Very Good
8,9,Aman,M,23,18,25,22.0,Good
9,10,Rajat,M,24,22,15,20.33,Good


In [44]:
# Outliers lie above the upper limit OR below the lower limit, so combine with | rather than &.
data[(data["Mathematics"] > upper_limit) | (data["Mathematics"] < lower_limit)]

Unnamed: 0,Roll No.,Name,Gender,Data Structure,Mathematics,Operating System,Average marks,Ranking
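No Mathematics mark falls outside the limits, so the result is empty. For contrast, the same rule can be sketched on the Data Structure column, which contains an unusually large value. The helper `iqr_outliers` is a hypothetical name, the list is copied from the marks table, and `statistics.quantiles` with `method="inclusive"` is assumed as a stand-in for the pandas `quantile` call so that the snippet runs without the CSV file.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (lower_limit, upper_limit, outliers) using the 1.5 * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    # A value is an outlier if it lies strictly outside either limit.
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return lower_limit, upper_limit, outliers


# Data Structure marks copied from the marks table shown earlier.
data_structure = [25, 23, 28, 21, 23, 190, 24, 26, 23, 24]

lower_limit, upper_limit, outliers = iqr_outliers(data_structure)
print(lower_limit, upper_limit, outliers)
```

Under these assumptions, the only value flagged is the mark of 190, which lies far above the upper limit.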
