### Import necessary packages

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# <h1><center><u><b>STATISTICS</b></u></center></h1>

Statistics, at its core, is a discipline that deals with extracting meaning from data. It is the science that incorporates several interconnected elements:
* collection,
* classification,
* comparison,
* analysis,
* interpretation, and
* presentation of data.

In general, statistical methods can be divided into two categories:
* **Descriptive Statistics** and
* **Inferential Statistics**.

Each plays an essential role in data analysis, but they serve distinct purposes and are used in different scenarios.


<h1><b>Descriptive Statistics</b></h1>

Descriptive statistics is a set of brief descriptive coefficients that summarize a given data set, which may represent either an entire population or a sample of it.

## **Measures of Central Tendency**

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data.

**Central Tendency**:

  1.   ***Mathematical Average***:
    *   *Arithmetic Mean*:
        *   for individual series (ungrouped data)
        *   for discrete series
        *   for continuous series
    *   *Geometric Mean*
    *   *Harmonic Mean*
  2.   ***Positional Average***:
    *   *Median*
    *   *Mode*
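
The measures listed above can be sketched with Python's standard `statistics` module. This is only an illustration: the sample `data` is hypothetical and is not one of the worked examples in this notebook, and `statistics.geometric_mean` assumes Python 3.8 or later.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 8]

# Mathematical averages
arithmetic_mean = statistics.mean(data)            # sum(x) / N
geometric_mean = statistics.geometric_mean(data)   # N-th root of the product of the values
harmonic_mean = statistics.harmonic_mean(data)     # N / sum(1 / x)

# Positional averages
median = statistics.median(data)                   # middle value of the sorted data
mode = statistics.mode(data)                       # most frequent value

print(arithmetic_mean, geometric_mean, harmonic_mean, median, mode)
```

For positive data, the harmonic mean never exceeds the geometric mean, which in turn never exceeds the arithmetic mean.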

### **Arithmetic Mean**

<h4><b>For Individual Series</b></h4>

(Q.) The following table contains the half-yearly bonuses paid to 10 workers in a factory:

| S. No.              | 1  | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---------------------|----|----|----|----|----|----|----|----|----|----|
| Half-yearly bonuses | 150| 200| 300| 650| 250| 180| 400| 500| 550| 220|

Find out the arithmetic mean.

(Soln.)

**Method 1:-**
Finding arithmetic mean for individual series (or ungrouped data) using direct method:

\begin{align}
\bar{x} = \frac{\sum x}{N}
\end{align}
$\text{where, $N$ is the number of observations.}$

In [17]:
# @title Using Direct Method { vertical-output: true, display-mode: "form" }
x = [150,200,300,650,250,180,400,500,550,220]
table = {'Half-yearly Bonuses(x)':pd.Series(x)}
df = pd.DataFrame(table)
df.index = range(1,11)
df

|    | Half-yearly Bonuses(x) |
|----|------------------------|
| 1  | 150 |
| 2  | 200 |
| 3  | 300 |
| 4  | 650 |
| 5  | 250 |
| 6  | 180 |
| 7  | 400 |
| 8  | 500 |
| 9  | 550 |
| 10 | 220 |


In [18]:
# @title Mean Using Direct Method { vertical-output: true, display-mode: "form" }
mean = df['Half-yearly Bonuses(x)'].mean(axis=0)
mean

340.0

**Method 2:-**
Finding arithmetic mean for individual series (or ungrouped data) using short-cut method:

\begin{align}
\bar{x} = a + \frac{\sum (x-a)}{N}
\end{align}
$\text{where, $a$ is the assumed mean.}$

Let's assume $a = 400.$

In [20]:
# @title Using Short-cut Method { vertical-output: true, display-mode: "form" }
x = np.array([150,200,300,650,250,180,400,500,550,220])
a = 400
d = x - a
table = {'Half-yearly Bonuses(x)':pd.Series(x),'Deviations':pd.Series(d)}
df = pd.DataFrame(table)
df.index = range(1,11)
df

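|    | Half-yearly Bonuses(x) | Deviations |
|----|------------------------|------------|
| 1  | 150 | -250 |
| 2  | 200 | -200 |
| 3  | 300 | -100 |
| 4  | 650 | 250  |
| 5  | 250 | -150 |
| 6  | 180 | -220 |
| 7  | 400 | 0    |
| 8  | 500 | 100  |
| 9  | 550 | 150  |
| 10 | 220 | -180 |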


In [16]:
# @title Mean Using Short-cut Method { vertical-output: true, display-mode: "form" }
mean = a + df['Deviations'].mean(axis=0)
mean

340.0
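
The short-cut method should give the same answer whatever value is picked for the assumed mean. The sketch below checks this on the bonus data; `shortcut_mean` is a hypothetical helper written for this illustration, not a NumPy or pandas function.

```python
import numpy as np

x = np.array([150, 200, 300, 650, 250, 180, 400, 500, 550, 220])

def shortcut_mean(values, a):
    """Arithmetic mean via the short-cut method with assumed mean `a` (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    deviations = values - a                  # d = x - a
    return a + deviations.sum() / values.size

direct = x.sum() / x.size                    # direct method: sum(x) / N

# Try several assumed means, including ones far from the data.
results = {a: shortcut_mean(x, a) for a in (0, 300, 400, 1000)}
print(direct, results)
```

A value of `a` close to the true mean only keeps the deviations small, which matters for hand calculation, not for the result.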

<h4><b>For Discrete Series</b></h4>

(Q.) Find out the arithmetic mean of the following frequency distribution of marks of students in a test in Mathematics:

| Marks               | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 |
|---------------------|----|----|----|----|----|----|----|----|
| Number of Students  |  3 |  6 | 10 | 12 |  9 |  6 |  2 |  2 |

(Soln.)

**Method 1:-**
Finding arithmetic mean for discrete series using direct method:

\begin{align}
\bar{x} = \frac{\sum fx}{\sum f}
\end{align}

In [28]:
# @title Using Direct Method { vertical-output: true, display-mode: "form" }
x  = np.array([10,20,30,40,50,60,70,80])
f  = np.array([3,6,10,12,9,6,2,2])
fx = np.multiply(f,x)
table = {'Marks(x)':pd.Series(x),'Number of Students(f)':pd.Series(f),'fx':pd.Series(fx)}
df = pd.DataFrame(table)
df.index = range(1,9)  # 1-based row labels; 'Marks(x)' stays a regular column
df

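|   | Marks(x) | Number of Students(f) | fx  |
|---|----------|-----------------------|-----|
| 1 | 10 | 3  | 30  |
| 2 | 20 | 6  | 120 |
| 3 | 30 | 10 | 300 |
| 4 | 40 | 12 | 480 |
| 5 | 50 | 9  | 450 |
| 6 | 60 | 6  | 360 |
| 7 | 70 | 2  | 140 |
| 8 | 80 | 2  | 160 |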


In [29]:
# @title Mean Using Direct Method { vertical-output: true, display-mode: "form" }
mean = df['fx'].sum() / df['Number of Students(f)'].sum()
mean

40.8
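
The direct method for a discrete series is a frequency-weighted average, so the result can be cross-checked in two ways. The sketch below uses `np.average` with `weights`, and also expands the frequency table back into an individual series with `np.repeat`.

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50, 60, 70, 80])   # marks
f = np.array([3, 6, 10, 12, 9, 6, 2, 2])         # number of students

# Weighted average: sum(f * x) / sum(f)
weighted = np.average(x, weights=f)

# Expand the table into one entry per student, then take the plain mean.
individual = np.repeat(x, f)
expanded = individual.mean()

print(weighted, expanded, individual.size)
```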

**Method 2:-**
Finding arithmetic mean for discrete series using short-cut method:

\begin{align}
\bar{x} = a + \frac{\sum f \left( x-a \right)}{\sum f}
\end{align}
$\text{where, $a$ is the assumed mean.}$

Let's assume $a = 40.$

In [31]:
# @title Using Short-cut Method { vertical-output: true, display-mode: "form" }
x  = np.array([10,20,30,40,50,60,70,80])
f  = np.array([3,6,10,12,9,6,2,2])
a  = 40
d  = x-a
fx = np.multiply(f,x)  # recomputed here so this cell does not depend on an earlier one
fd = np.multiply(f,d)
table = {'Marks(x)':pd.Series(x),'Number of Students(f)':pd.Series(f),'fx':pd.Series(fx),'Deviation(d)':pd.Series(d),'fd':pd.Series(fd)}
df = pd.DataFrame(table)
df.index = range(1,9)  # 1-based row labels; 'Marks(x)' stays a regular column
df

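|   | Marks(x) | Number of Students(f) | fx  | Deviation(d) | fd   |
|---|----------|-----------------------|-----|--------------|------|
| 1 | 10 | 3  | 30  | -30 | -90  |
| 2 | 20 | 6  | 120 | -20 | -120 |
| 3 | 30 | 10 | 300 | -10 | -100 |
| 4 | 40 | 12 | 480 | 0   | 0    |
| 5 | 50 | 9  | 450 | 10  | 90   |
| 6 | 60 | 6  | 360 | 20  | 120  |
| 7 | 70 | 2  | 140 | 30  | 60   |
| 8 | 80 | 2  | 160 | 40  | 80   |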


In [32]:
# @title Mean Using Short-cut Method { vertical-output: true, display-mode: "form" }
mean = a + df['fd'].sum() / df['Number of Students(f)'].sum()
mean

40.8
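
A short derivation shows why the short-cut method agrees with the direct method for any assumed mean $a$. Since $a$ is a constant, it factors out of the weighted sum:

\begin{align}
\frac{\sum f \left( x-a \right)}{\sum f} = \frac{\sum fx - a\sum f}{\sum f} = \frac{\sum fx}{\sum f} - a = \bar{x} - a
\end{align}

Adding $a$ to both sides recovers $\bar{x}$. For this dataset with $a = 40$, $\sum f \left( x-a \right) = 40$ and $\sum f = 50$, so

\begin{align}
\bar{x} = 40 + \frac{40}{50} = 40.8
\end{align}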

**Method 3:-**
Finding arithmetic mean for discrete series using step-deviation method:

\begin{align}
\bar{x} = a + \frac{\sum f \bar{d}}{\sum f} \times h
\end{align}
$\text{where, $a$ is the assumed mean,}$
\begin{align}
\bar{d} = \frac{x-a}{h},
\end{align}
$\text{and $h$ is the common factor of the deviations (the step size).}$

Let's assume $a = 40.$ For the given dataset, $h = 10.$

In [24]:
# @title Using Step-deviation Method { vertical-output: true, display-mode: "form" }
x  = np.array([10,20,30,40,50,60,70,80])
f  = np.array([3,6,10,12,9,6,2,2])
a  = 40
h  = x[1] - x[0]
d  = x - a
d_bar = d / h
fd_bar = np.multiply(f,d_bar)
table = {'Marks(x)':pd.Series(x),'Number of Students(f)':pd.Series(f),'Deviation(d)':pd.Series(d),'d_bar':pd.Series(d_bar),'fd_bar':pd.Series(fd_bar)}
df = pd.DataFrame(table)
df.index = range(1,9)  # 1-based row labels; 'Marks(x)' stays a regular column
df

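|   | Marks(x) | Number of Students(f) | Deviation(d) | d_bar | fd_bar |
|---|----------|-----------------------|--------------|-------|--------|
| 1 | 10 | 3  | -30 | -3.0 | -9.0  |
| 2 | 20 | 6  | -20 | -2.0 | -12.0 |
| 3 | 30 | 10 | -10 | -1.0 | -10.0 |
| 4 | 40 | 12 | 0   | 0.0  | 0.0   |
| 5 | 50 | 9  | 10  | 1.0  | 9.0   |
| 6 | 60 | 6  | 20  | 2.0  | 12.0  |
| 7 | 70 | 2  | 30  | 3.0  | 6.0   |
| 8 | 80 | 2  | 40  | 4.0  | 8.0   |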


In [26]:
# @title Mean Using Step-deviation Method { vertical-output: true, display-mode: "form" }
mean = a + (df['fd_bar'].sum() / df['Number of Students(f)'].sum()) * h
mean

40.8
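
The step-deviation calculation above can be wrapped in a small function to confirm that the result does not depend on the particular $a$ and $h$ chosen. `step_deviation_mean` is a hypothetical helper for this sketch, and the `(a, h)` pairs are arbitrary test values.

```python
import numpy as np

def step_deviation_mean(x, f, a, h):
    """Mean of a frequency distribution via the step-deviation method (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    d_bar = (x - a) / h                        # step deviations
    return a + (f * d_bar).sum() / f.sum() * h

x = [10, 20, 30, 40, 50, 60, 70, 80]
f = [3, 6, 10, 12, 9, 6, 2, 2]

direct = np.dot(f, x) / np.sum(f)              # direct method, for comparison

# Different assumed means and step sizes should all give the same mean.
step_results = [step_deviation_mean(x, f, a, h)
                for a, h in [(40, 10), (30, 10), (50, 5), (0, 1)]]
print(direct, step_results)
```

Choosing $h$ as the common difference of the values simply makes every step deviation a whole number, which is what makes the method convenient by hand.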

<h4><b>For Continuous Series</b></h4>

(Q.) Find the arithmetic mean for the following data:

| Marks | Number of Students |
|-------|--------------------|
|  0-10 |       5            |
| 10-20 |      10            |
| 20-30 |      40            |
| 30-40 |      20            |
| 40-50 |      25            |

(Soln.)

**Method 1:-** Finding arithmetic mean for continuous series using direct method:

\begin{align}
\bar{x} = \frac{\sum{fm}}{\sum{f}}
\end{align}
$\text{where, $m$ is the midpoint of the class interval.}$

In [35]:
# @title Using Direct Method { vertical-output: true, display-mode: "form" }
marks = np.arange(0,60,10)
x  = [f'{marks[i]} - {marks[i+1]}' for i in range(len(marks)-1)]
m  = np.array([(marks[i]+marks[i+1])/2 for i in range(len(marks)-1)])
f  = np.array([5,10,40,20,25])
fm = np.multiply(f,m)
table = {'x':pd.Series(x),'m':pd.Series(m),'f':pd.Series(f),'fm':pd.Series(fm)}
df = pd.DataFrame(table)
df.index = range(1,len(x)+1)  # 1-based row labels; 'x' stays a regular column
df

|   | x       | m    | f  | fm     |
|---|---------|------|----|--------|
| 1 | 0 - 10  | 5.0  | 5  | 25.0   |
| 2 | 10 - 20 | 15.0 | 10 | 150.0  |
| 3 | 20 - 30 | 25.0 | 40 | 1000.0 |
| 4 | 30 - 40 | 35.0 | 20 | 700.0  |
| 5 | 40 - 50 | 45.0 | 25 | 1125.0 |


In [36]:
# @title Mean Using Direct Method { vertical-output: true, display-mode: "form" }
mean = df['fm'].sum() / df['f'].sum()
mean

30.0
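
As a compact cross-check, the midpoints can be computed from the class boundaries with array slicing instead of a list comprehension, and the mean taken as a weighted average. This is a sketch of an alternative style; the name `edges` is introduced here for illustration.

```python
import numpy as np

edges = np.arange(0, 60, 10)          # class boundaries: 0, 10, 20, 30, 40, 50
f = np.array([5, 10, 40, 20, 25])     # number of students in each class

# Midpoint of each class: average of its lower and upper boundary.
m = (edges[:-1] + edges[1:]) / 2

# Direct method: sum(f * m) / sum(f)
grouped_mean = np.average(m, weights=f)
print(m, grouped_mean)
```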

**Method 2:-** Finding arithmetic mean for continuous series using short-cut method:

\begin{align}
\bar{x} = a + \frac{\sum{fd}}{\sum{f}}
\end{align}
$\text{where, $a$ is the assumed mean, $m$ is the midpoint of the class interval,}$
\begin{align}
d = m - a,
\end{align}
$\text{and $d$ is the deviation of the midpoint from the assumed mean.}$

Let's assume $a = 25.$

In [38]:
# @title Using Short-cut Method { vertical-output: true, display-mode: "form" }
marks = np.arange(0,60,10)
a  = 25
x  = [f'{marks[i]} - {marks[i+1]}' for i in range(len(marks)-1)]
m  = np.array([(marks[i]+marks[i+1])/2 for i in range(len(marks)-1)])
f  = np.array([5,10,40,20,25])
d  = m - a
fd = np.multiply(f,d)
table = {'x':pd.Series(x),'m':pd.Series(m),'f':pd.Series(f),'d':pd.Series(d),'fd':pd.Series(fd)}
df = pd.DataFrame(table)
df.index = range(1,len(x)+1)  # 1-based row labels; 'x' stays a regular column
df

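|   | x       | m    | f  | d     | fd     |
|---|---------|------|----|-------|--------|
| 1 | 0 - 10  | 5.0  | 5  | -20.0 | -100.0 |
| 2 | 10 - 20 | 15.0 | 10 | -10.0 | -100.0 |
| 3 | 20 - 30 | 25.0 | 40 | 0.0   | 0.0    |
| 4 | 30 - 40 | 35.0 | 20 | 10.0  | 200.0  |
| 5 | 40 - 50 | 45.0 | 25 | 20.0  | 500.0  |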


In [39]:
# @title Mean Using Short-cut Method { vertical-output: true, display-mode: "form" }
mean = a + df['fd'].sum() / df['f'].sum()
mean

30.0

**Method 3:-**
Finding arithmetic mean for continuous series using step-deviation method:

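\begin{align}
\bar{x} = a + \frac{\sum f \bar{d}}{\sum f} \times h
\end{align}
$\text{where, $a$ is the assumed mean, $m$ is the midpoint of the class interval,}$
\begin{align}
\bar{d} = \frac{m-a}{h},
\end{align}
$\text{and $h$ is the common factor of the deviations (here, the class width).}$

Let's assume $a = 25.$ For the given dataset, $h = 10.$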

In [41]:
# @title Using Step-deviation Method { vertical-output: true, display-mode: "form" }
marks = np.arange(0,60,10)
a  = 25
x  = [f'{marks[i]} - {marks[i+1]}' for i in range(len(marks)-1)]
m  = np.array([(marks[i]+marks[i+1])/2 for i in range(len(marks)-1)])
h  = m[1] - m[0]
f  = np.array([5,10,40,20,25])
d  = m - a
d_bar  = d / h
fd_bar = np.multiply(f,d_bar)
table = {'x':pd.Series(x),'m':pd.Series(m),'f':pd.Series(f),'d':pd.Series(d),'d_bar':pd.Series(d_bar),'fd_bar':pd.Series(fd_bar)}
df = pd.DataFrame(table)
df.index = range(1,len(x)+1)  # 1-based row labels; 'x' stays a regular column
df

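|   | x       | m    | f  | d     | d_bar | fd_bar |
|---|---------|------|----|-------|-------|--------|
| 1 | 0 - 10  | 5.0  | 5  | -20.0 | -2.0  | -10.0  |
| 2 | 10 - 20 | 15.0 | 10 | -10.0 | -1.0  | -10.0  |
| 3 | 20 - 30 | 25.0 | 40 | 0.0   | 0.0   | 0.0    |
| 4 | 30 - 40 | 35.0 | 20 | 10.0  | 1.0   | 20.0   |
| 5 | 40 - 50 | 45.0 | 25 | 20.0  | 2.0   | 50.0   |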


In [42]:
# @title Mean Using Step-deviation Method { vertical-output: true, display-mode: "form" }
mean = a + (df['fd_bar'].sum() / df['f'].sum()) * h
mean

30.0
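
All three methods for a continuous series treat every observation in a class as if it sat at the class midpoint, so the grouped mean is an approximation to the mean of the raw values. The sketch below illustrates this with synthetic marks drawn from a seeded random generator; the raw data is hypothetical and is not the dataset from the question above.

```python
import numpy as np

# Synthetic raw marks in [0, 50), generated only for illustration.
rng = np.random.default_rng(0)
raw = rng.integers(0, 50, size=100)

edges = np.arange(0, 60, 10)                 # class boundaries 0, 10, ..., 50
counts, _ = np.histogram(raw, bins=edges)    # frequency of each class
m = (edges[:-1] + edges[1:]) / 2             # class midpoints

raw_mean = raw.mean()                        # mean of the individual values
grouped_mean = np.average(m, weights=counts) # mean computed from the grouped table

print(raw_mean, grouped_mean)
```

Because each value lies within half a class width of its midpoint, the two means can differ by at most half the class width (5 marks here), and usually by much less.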