### Import necessary packages

In [24]:
import numpy as np
import pandas as pd
import sympy as smp
import matplotlib.pyplot as plt

# <h1><center><u><b>STATISTICS</b></u></center></h1>

Statistics, at its core, is the discipline of extracting meaning from data. It is the science that incorporates several interconnected activities:
* collection,
* classification,
* comparison,
* analysis,
* interpretation, and
* presentation of data.

In general, statistical methods can be divided into two categories:
* **Descriptive Statistics** and
* **Inferential Statistics**.

Each plays an essential role in data analysis, but they serve distinct purposes and are used in different scenarios.


<h1><b>Descriptive Statistics</b></h1>

Descriptive statistics consists of brief descriptive coefficients that summarize a given data set, which may represent either an entire population or a sample of it.

# **Measures of Central Tendency**

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data.

**Central Tendency**:

  1.   *Mathematical Average*:
    *   Arithmetic Mean
    *   Geometric Mean
    *   Harmonic Mean
  2.   *Positional Average*:
    *   Median
    *   Mode

## **Mathematical Average**

### **Arithmetic Mean**

<h4><b>For Individual Series</b></h4>

(Q.) The following table contains the half-yearly bonuses paid to 10 workers in a factory:

| S. No.              | 1  | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---------------------|----|----|----|----|----|----|----|----|----|----|
| Half-yearly bonuses | 150| 200| 300| 650| 250| 180| 400| 500| 550| 220|

Find out the arithmetic mean.

(A.)

**Method 1:-**
Finding arithmetic mean for individual series (or ungrouped data) using direct method:

\begin{align}
\bar{x} = \frac{\sum x}{N}
\end{align}
$\text{where, $N$ is the number of observations.}$

In [25]:
# @title Using Direct Method { vertical-output: true, display-mode: "form" }
x = [150,200,300,650,250,180,400,500,550,220]
table = {'Half-yearly Bonuses(x)':pd.Series(x)}
df = pd.DataFrame(table)
df.index = range(1,11)
df

|    | Half-yearly Bonuses(x) |
|----|------------------------|
| 1  | 150 |
| 2  | 200 |
| 3  | 300 |
| 4  | 650 |
| 5  | 250 |
| 6  | 180 |
| 7  | 400 |
| 8  | 500 |
| 9  | 550 |
| 10 | 220 |


In [26]:
# @title Mean Using Direct Method { vertical-output: true, display-mode: "form" }
mean = df['Half-yearly Bonuses(x)'].mean(axis=0)
mean

340.0

**Method 2:-**
Finding arithmetic mean for individual series (or ungrouped data) using short-cut method:

\begin{align}
\bar{x} = a + \frac{\sum \left(x-a\right)}{N}
\end{align}
$\text{where, $a$ is the assumed mean.}$

Let's assume $a = 400.$

In [27]:
# @title Using Short-cut Method { vertical-output: true, display-mode: "form" }
x = np.array([150,200,300,650,250,180,400,500,550,220])
a = 400
d = x - a
table = {'Half-yearly Bonuses(x)':pd.Series(x),'Deviations':pd.Series(d)}
df = pd.DataFrame(table)
df.index = range(1,11)
df

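|    | Half-yearly Bonuses(x) | Deviations |
|----|------------------------|------------|
| 1  | 150 | -250 |
| 2  | 200 | -200 |
| 3  | 300 | -100 |
| 4  | 650 | 250  |
| 5  | 250 | -150 |
| 6  | 180 | -220 |
| 7  | 400 | 0    |
| 8  | 500 | 100  |
| 9  | 550 | 150  |
| 10 | 220 | -180 |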


In [28]:
# @title Mean Using Short-cut Method { vertical-output: true, display-mode: "form" }
mean = a + df['Deviations'].mean(axis=0)
mean

340.0

<h4><b>For Discrete Series</b></h4>

(Q.) Find out the arithmetic mean of the following frequency distribution of marks of students in a test in Mathematics:

| Marks               | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 |
|---------------------|----|----|----|----|----|----|----|----|
| Number of Students  |  3 |  6 | 10 | 12 |  9 |  6 |  2 |  2 |

(A.)

**Method 1:-**
Finding arithmetic mean for discrete series using direct method:

\begin{align}
\bar{x} = \frac{\sum fx}{\sum f}
\end{align}

In [29]:
# @title Using Direct Method { vertical-output: true, display-mode: "form" }
x  = np.array([10,20,30,40,50,60,70,80])
f  = np.array([3,6,10,12,9,6,2,2])
fx = np.multiply(f,x)
table = {'Marks(x)':pd.Series(x),'Number of Students(f)':pd.Series(f),'fx':pd.Series(fx)}
df = pd.DataFrame(table)
df.index = range(1,9)
df

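|   | Marks(x) | Number of Students(f) | fx  |
|---|----------|-----------------------|-----|
| 1 | 10 | 3  | 30  |
| 2 | 20 | 6  | 120 |
| 3 | 30 | 10 | 300 |
| 4 | 40 | 12 | 480 |
| 5 | 50 | 9  | 450 |
| 6 | 60 | 6  | 360 |
| 7 | 70 | 2  | 140 |
| 8 | 80 | 2  | 160 |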


In [30]:
# @title Mean Using Direct Method { vertical-output: true, display-mode: "form" }
mean = df['fx'].sum() / df['Number of Students(f)'].sum()
mean

40.8

**Method 2:-**
Finding arithmetic mean for discrete series using short-cut method:

\begin{align}
\bar{x} = a + \frac{\sum f \left( x-a \right)}{\sum f}
\end{align}
$\text{where, $a$ is the assumed mean.}$

Let's assume $a = 40.$

In [31]:
# @title Using Short-cut Method { vertical-output: true, display-mode: "form" }
x  = np.array([10,20,30,40,50,60,70,80])
f  = np.array([3,6,10,12,9,6,2,2])
a  = 40
d  = x-a
fx = np.multiply(f,x)   # recomputed here so this cell does not depend on the previous cell's state
fd = np.multiply(f,d)
table = {'Marks(x)':pd.Series(x),'Number of Students(f)':pd.Series(f),'fx':pd.Series(fx),'Deviation(d)':pd.Series(d),'fd':pd.Series(fd)}
df = pd.DataFrame(table)
df.index = range(1,9)
df

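|   | Marks(x) | Number of Students(f) | fx  | Deviation(d) | fd   |
|---|----------|-----------------------|-----|--------------|------|
| 1 | 10 | 3  | 30  | -30 | -90  |
| 2 | 20 | 6  | 120 | -20 | -120 |
| 3 | 30 | 10 | 300 | -10 | -100 |
| 4 | 40 | 12 | 480 | 0   | 0    |
| 5 | 50 | 9  | 450 | 10  | 90   |
| 6 | 60 | 6  | 360 | 20  | 120  |
| 7 | 70 | 2  | 140 | 30  | 60   |
| 8 | 80 | 2  | 160 | 40  | 80   |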


In [32]:
# @title Mean Using Short-cut Method { vertical-output: true, display-mode: "form" }
mean = a + df['fd'].sum() / df['Number of Students(f)'].sum()
mean

40.8

**Method 3:-**
Finding arithmetic mean for discrete series using step-deviation method:

\begin{align}
\bar{x} = a + \frac{\sum f \bar{d}h}{\sum {f}}
\end{align}
$\text{where, $a$ is the assumed mean,}$
\begin{align}
\bar{d} = \frac{x-a}{h},
\end{align}
$\text{and $h$ is the common step (the constant difference between successive values of $x$).}$

Let's assume $a = 40.$ For the given dataset, $h = 10.$

In [33]:
# @title Using Step-deviation Method { vertical-output: true, display-mode: "form" }
x  = np.array([10,20,30,40,50,60,70,80])
f  = np.array([3,6,10,12,9,6,2,2])
a  = 40
h  = x[1] - x[0]
d  = x - a
d_bar = d / h
fd_bar = np.multiply(f,d_bar)
table = {'Marks(x)':pd.Series(x),'Number of Students(f)':pd.Series(f),'Deviation(d)':pd.Series(d),'d_bar':pd.Series(d_bar),'fd_bar':pd.Series(fd_bar)}
df = pd.DataFrame(table)
df.index = range(1,9)
df

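|   | Marks(x) | Number of Students(f) | Deviation(d) | d_bar | fd_bar |
|---|----------|-----------------------|--------------|-------|--------|
| 1 | 10 | 3  | -30 | -3.0 | -9.0  |
| 2 | 20 | 6  | -20 | -2.0 | -12.0 |
| 3 | 30 | 10 | -10 | -1.0 | -10.0 |
| 4 | 40 | 12 | 0   | 0.0  | 0.0   |
| 5 | 50 | 9  | 10  | 1.0  | 9.0   |
| 6 | 60 | 6  | 20  | 2.0  | 12.0  |
| 7 | 70 | 2  | 30  | 3.0  | 6.0   |
| 8 | 80 | 2  | 40  | 4.0  | 8.0   |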


In [34]:
# @title Mean Using Step-deviation Method { vertical-output: true, display-mode: "form" }
mean = a + (df['fd_bar'].sum() / df['Number of Students(f)'].sum()) * h
mean

40.8

<h4><b>For Continuous Series</b></h4>

(Q.) Find the arithmetic mean for the following data:

| Marks | Number of Students |
|-------|--------------------|
|  0-10 |       5            |
| 10-20 |      10            |
| 20-30 |      40            |
| 30-40 |      20            |
| 40-50 |      25            |

(A.)

**Method 1:-** Finding arithmetic mean for continuous series using direct method:

\begin{align}
\bar{x} = \frac{\sum{fm}}{\sum{f}}
\end{align}
$\text{where, $m$ is the midpoint of the class interval.}$

In [35]:
# @title Using Direct Method { vertical-output: true, display-mode: "form" }
marks = np.arange(0,60,10)
x  = [f'{marks[i]} - {marks[i+1]}' for i in range(len(marks)-1)]
m  = np.array([(marks[i]+marks[i+1])/2 for i in range(len(marks)-1)])
f  = np.array([5,10,40,20,25])
fm = np.multiply(f,m)
table = {'x':pd.Series(x),'m':pd.Series(m),'f':pd.Series(f),'fm':pd.Series(fm)}
df = pd.DataFrame(table)
df.index = range(1,len(x)+1)
df

|   | x       | m    | f  | fm     |
|---|---------|------|----|--------|
| 1 | 0 - 10  | 5.0  | 5  | 25.0   |
| 2 | 10 - 20 | 15.0 | 10 | 150.0  |
| 3 | 20 - 30 | 25.0 | 40 | 1000.0 |
| 4 | 30 - 40 | 35.0 | 20 | 700.0  |
| 5 | 40 - 50 | 45.0 | 25 | 1125.0 |


In [36]:
# @title Mean Using Direct Method { vertical-output: true, display-mode: "form" }
mean = df['fm'].sum() / df['f'].sum()
mean

30.0

**Method 2:-** Finding arithmetic mean for continuous series using short-cut method:

\begin{align}
\bar{x} = a + \frac{\sum{fd}}{\sum{f}}
\end{align}
$\text{where, $a$ is the assumed mean, $m$ is the midpoint of the class interval,}$
\begin{align}
d = m - a,
\end{align}
$\text{and $d$ is the deviation.}$

Let's assume $a = 25.$

In [37]:
# @title Using Short-cut Method { vertical-output: true, display-mode: "form" }
marks = np.arange(0,60,10)
a  = 25
x  = [f'{marks[i]} - {marks[i+1]}' for i in range(len(marks)-1)]
m  = np.array([(marks[i]+marks[i+1])/2 for i in range(len(marks)-1)])
f  = np.array([5,10,40,20,25])
d  = m - a
fd = np.multiply(f,d)
table = {'x':pd.Series(x),'m':pd.Series(m),'f':pd.Series(f),'d':pd.Series(d),'fd':pd.Series(fd)}
df = pd.DataFrame(table)
df.index = range(1,len(x)+1)
df

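|   | x       | m    | f  | d     | fd     |
|---|---------|------|----|-------|--------|
| 1 | 0 - 10  | 5.0  | 5  | -20.0 | -100.0 |
| 2 | 10 - 20 | 15.0 | 10 | -10.0 | -100.0 |
| 3 | 20 - 30 | 25.0 | 40 | 0.0   | 0.0    |
| 4 | 30 - 40 | 35.0 | 20 | 10.0  | 200.0  |
| 5 | 40 - 50 | 45.0 | 25 | 20.0  | 500.0  |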


In [38]:
# @title Mean Using Short-cut Method { vertical-output: true, display-mode: "form" }
mean = a + df['fd'].sum() / df['f'].sum()
mean

30.0

**Method 3:-**
Finding arithmetic mean for continuous series using step-deviation method:

\begin{align}
\bar{x} = a + \frac{\sum f \bar{d}h}{\sum {f}}
\end{align}
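$\text{where, $a$ is the assumed mean, $m$ is the midpoint of the class interval,}$
\begin{align}
\bar{d} = \frac{m-a}{h},
\end{align}
$\text{and $h$ is the common step (the class width).}$

Let's assume $a = 25.$ For the given dataset, $h = 10.$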

In [39]:
# @title Using Step-deviation Method { vertical-output: true, display-mode: "form" }
marks = np.arange(0,60,10)
a  = 25
x  = [f'{marks[i]} - {marks[i+1]}' for i in range(len(marks)-1)]
m  = np.array([(marks[i]+marks[i+1])/2 for i in range(len(marks)-1)])
h  = m[1] - m[0]
f  = np.array([5,10,40,20,25])
d  = m - a
d_bar  = d / h
fd_bar = np.multiply(f,d_bar)
table = {'x':pd.Series(x),'m':pd.Series(m),'f':pd.Series(f),'d':pd.Series(d),'d_bar':pd.Series(d_bar),'fd_bar':pd.Series(fd_bar)}
df = pd.DataFrame(table)
df.index = range(1,len(x)+1)
df

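|   | x       | m    | f  | d     | d_bar | fd_bar |
|---|---------|------|----|-------|-------|--------|
| 1 | 0 - 10  | 5.0  | 5  | -20.0 | -2.0  | -10.0  |
| 2 | 10 - 20 | 15.0 | 10 | -10.0 | -1.0  | -10.0  |
| 3 | 20 - 30 | 25.0 | 40 | 0.0   | 0.0   | 0.0    |
| 4 | 30 - 40 | 35.0 | 20 | 10.0  | 1.0   | 20.0   |
| 5 | 40 - 50 | 45.0 | 25 | 20.0  | 2.0   | 50.0   |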


In [40]:
# @title Mean Using Step-deviation Method { vertical-output: true, display-mode: "form" }
mean = a + (df['fd_bar'].sum() / df['f'].sum()) * h
mean

30.0

<h4><b>Mathematical Properties of Arithmetic Mean</b></h4>

1. The sum of the deviations of the items from their arithmetic mean is zero.

\begin{align}
\sum{\left(x-\bar{x}\right)} &= 0
\end{align}
Proof:
\begin{align}
\bar{x} = \frac{\sum{x}}{n}
\end{align}
Multiplying both sides by $n$,
\begin{align}
{n}\bar{x} = \sum{x}
\end{align}
Rearranging the equation,
\begin{align}
&\implies
\sum{x} - {n}\bar{x} &= 0 \\
&\implies
\sum{x} - \sum{\bar{x}} &= 0 \\
&\implies
\sum{\left(x - \bar{x}\right)} &= 0.
\end{align}
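As a quick numerical check of this property, the sketch below reuses the half-yearly bonuses from the individual-series example. The variable names are illustrative. The deviations are summed directly, and any residue would come from floating-point rounding.

```python
import numpy as np

# Half-yearly bonuses from the individual-series example
x = np.array([150, 200, 300, 650, 250, 180, 400, 500, 550, 220])
x_bar = x.mean()              # arithmetic mean of the series

deviations = x - x_bar        # deviation of each item from the mean
total = deviations.sum()      # expected to be zero (up to rounding)
print(x_bar, total)
```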

2. Combined mean of two groups:

Let $\bar{x}_{1}$ and $\bar{x}_{2}$ be the arithmetic means of the first and second groups respectively, and let $N_{1}$ and $N_{2}$ be the numbers of observations in them. Then the combined mean of the two groups is given by,
\begin{align}
\bar{x}_{1,2} = \frac{N_{1}\bar{x}_{1}+N_{2}\bar{x}_{2}}{N_{1}+N_{2}}.
\end{align}

(Q.) A cooperative bank has two branches employing $50$ and $70$ workers respectively. The average monthly salaries paid by the two branches are $360$ INR and $390$ INR respectively. Calculate the mean salary of all the employees.

(A.) Given:

$\bar{x}_{1} = 360, \bar{x}_{2} = 390,N_{1} = 50,N_{2} = 70$

$\therefore$ Combined mean
\begin{align}
\bar{x}_{1,2} &= \frac{N_{1}\bar{x}_{1}+N_{2}\bar{x}_{2}}{N_{1}+N_{2}} \\
\implies
\bar{x}_{1,2} &= \frac{50\times360+70\times390}{50+70} \\
\implies
\bar{x}_{1,2} &= 377.5.
\end{align}
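The same calculation can be sketched as a small helper. `combined_mean` is a hypothetical name used only for illustration. It applies $\sum N_{i}\bar{x}_{i} / \sum N_{i}$, which extends naturally to more than two groups.

```python
def combined_mean(means, sizes):
    """Combined arithmetic mean of several groups: sum(N_i * xbar_i) / sum(N_i)."""
    total = sum(n * m for n, m in zip(sizes, means))
    return total / sum(sizes)

# Cooperative bank example: 50 workers averaging 360 INR, 70 workers averaging 390 INR
print(combined_mean([360, 390], [50, 70]))  # 377.5
```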

3. Weighted Arithmetic Mean:

\begin{align}
\bar{x}_{w} = \frac{\sum{wx}}{\sum{w}}.
\end{align}
where, $w$ is the weight of the observation $x$.

(Q.) Calculate the weighted mean by weighing each price by the quantity consumed.

| Articles of food | Quantity consumed in Kg | Price in INR per Kg |
|------------------|-------------------------|---------------------|
| Flour            | 11.50                   |  5.8                |
| Ghee             | 5.60                    | 58.4                |
| Sugar            | 0.28                    |  8.2                |
| Potato           | 0.16                    |  2.5                |
| Oil              | 0.35                    | 20.0                |

(A.) Taking the quantity consumed as the weight $w$ and the price as $x$,

In [41]:
# @title Weighted Mean { vertical-output: true, display-mode: "form" }
food = ['Flour','Ghee','Sugar','Potato','Oil']
w = np.array([11.50,5.60,0.28,0.16,0.35])
x = np.array([5.8,58.4,8.2,2.5,20.0])
wx = np.multiply(w,x)
table = {'Food':pd.Series(food),'w':pd.Series(w),'x':pd.Series(x),'wx':pd.Series(wx)}
df = pd.DataFrame(table)
df


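|   | Food   | w     | x    | wx     |
|---|--------|-------|------|--------|
| 0 | Flour  | 11.5  | 5.8  | 66.7   |
| 1 | Ghee   | 5.6   | 58.4 | 327.04 |
| 2 | Sugar  | 0.28  | 8.2  | 2.296  |
| 3 | Potato | 0.16  | 2.5  | 0.4    |
| 4 | Oil    | 0.35  | 20.0 | 7.0    |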


In [42]:
# @title Weighted Mean { vertical-output: true, display-mode: "form" }
mean = df['wx'].sum() / df['w'].sum()
round(mean,2)

22.55

### **Geometric Mean**

<h4><b>For Individual Series</b></h4>

Geometric Mean of $n$ individual data points:
\begin{align}
GM = \left(x_{1} \cdot x_{2} \cdot x_{3} \ldots x_{n}\right)^{1/n}.
\end{align}
Logarithm form of Geometric mean:
\begin{align}
\log{GM} = \frac{\log {x_{1}} + \log {x_{2}} + \log {x_{3}} + \ldots + \log {x_{n}}}{n}.
\end{align}
Taking $\text{antilog}$ on both the sides:
\begin{align}
\therefore GM = \text{antilog}\left(\frac{\log{x_{1}}+\log{x_{2}}+\log{x_{3}}+\ldots+\log{x_{n}}}{n}\right).
\end{align}
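Both forms above can be compared numerically. This sketch assumes the half-yearly bonuses from the arithmetic-mean example as input. It uses base-10 logarithms to match the antilog convention, although any base works as long as the antilog uses the same base. The product form can overflow for long series, which is one practical reason to prefer the logarithm form.

```python
import numpy as np

# Half-yearly bonuses from the individual-series example (all values must be positive)
x = np.array([150, 200, 300, 650, 250, 180, 400, 500, 550, 220], dtype=float)
n = len(x)

# Direct form: n-th root of the product of the values
gm_direct = np.prod(x) ** (1 / n)

# Logarithm form: antilog (base 10) of the mean of log10 values
gm_log = 10 ** (np.log10(x).sum() / n)

print(round(gm_direct, 4), round(gm_log, 4))
```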

<h4><b>For Discrete Series</b></h4>

For a frequency distribution in which the values
$x_{1} , x_{2} , x_{3}, \dots, x_{n}$
of a variate $x$ occur with frequencies
$f_{1} , f_{2} , f_{3}, \dots, f_{n}$
respectively, the geometric mean is given by:
\begin{align}
GM = \left(x_{1}^{f_{1}} \cdot x_{2}^{f_{2}} \cdot x_{3}^{f_{3}} \ldots x_{n}^{f_{n}}\right)^{1/N},
\end{align}
where,
\begin{align}
N = \sum_{i=1}^{n}{f_{i}}.
\end{align}

Taking logarithm on both the sides,
\begin{align}
\log{GM} &= \frac{\log {x_{1}^{f_{1}}} + \log {x_{2}^{f_{2}}} + \log {x_{3}^{f_{3}}} + \ldots + \log {x_{n}^{f_{n}}}}{N} \\
&= \frac{1}{N}\left(f_{1}\cdot\log{x_{1}}+f_{2}\cdot\log{x_{2}}+f_{3}\cdot\log{x_{3}}+\ldots+f_{n}\cdot\log{x_{n}}\right) \\
&= \frac{1}{N}\sum_{i=1}^{n}f_{i}\cdot\log{x_{i}}.
\end{align}
Taking $\text{antilog}$ on both the sides,
\begin{align}
\therefore GM = \text{antilog}\left( \frac{\sum_{i=1}^{n}f_{i}\cdot\log{x_{i}}}{N}\right).
\end{align}
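A minimal sketch of the discrete-series formula, assuming the marks distribution from the arithmetic-mean example as data. As a cross-check, it expands the distribution back into individual observations with `np.repeat` and applies the individual-series formula.

```python
import numpy as np

# Marks and number of students from the discrete-series example
x = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
f = np.array([3, 6, 10, 12, 9, 6, 2, 2])
N = f.sum()

# GM = antilog( sum(f_i * log x_i) / N )
gm = 10 ** ((f * np.log10(x)).sum() / N)

# Cross-check: repeat each value f_i times and use the individual-series formula
gm_expanded = 10 ** np.log10(np.repeat(x, f)).mean()
print(round(gm, 4))
```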

<h4><b>For Continuous Series</b></h4>

For a continuous series with $n$ class intervals, where the $i^{th}$ class occurs with frequency $f_{i}$, the geometric mean is given by:
\begin{align}
GM = \text{antilog}\left( \frac{\sum_{i=1}^{n}f_{i}\cdot\log{m_{i}}}{N}\right).
\end{align}
where,
\begin{align}
N &= \sum_{i=1}^{n}{f_{i}},
\end{align}
$f_{i}$ is the frequency of the $i^{th}$ class interval and $m_{i}$ is its midpoint (the average of its lower and upper limits).
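For a continuous series, the formula only changes in replacing each value with its class midpoint. This sketch assumes the class limits and frequencies from the continuous-series arithmetic-mean example (0-10 through 40-50). Note that the geometric mean is undefined if any midpoint is zero or negative.

```python
import numpy as np

# Class limits and frequencies from the continuous-series example
lower = np.array([0, 10, 20, 30, 40], dtype=float)
upper = lower + 10
f = np.array([5, 10, 40, 20, 25])

m = (lower + upper) / 2      # class midpoints m_i
N = f.sum()

# GM = antilog( sum(f_i * log m_i) / N )
gm = 10 ** ((f * np.log10(m)).sum() / N)
print(round(gm, 4))
```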

### **Harmonic Mean**

<h4><b>For Individual Series</b></h4>

The Harmonic Mean (HM) is the reciprocal of the arithmetic mean of the reciprocals of the data values. It gives greater weight to the smaller items, which makes it suitable for averaging rates, such as speeds over equal distances. For $n$ individual data points
$x_{1},x_{2},x_{3},\dots,x_{n},$
the harmonic mean is given by,
\begin{align}
HM = \frac{n}{\sum_{i=1}^{n}\frac{1}{x_{i}}}.
\end{align}
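A short sketch that assumes the half-yearly bonuses as input and compares the three means. For positive values, $AM \geq GM \geq HM$, with equality only when all values are equal, so the printed numbers should respect that ordering.

```python
import numpy as np

# Half-yearly bonuses from the individual-series example
x = np.array([150, 200, 300, 650, 250, 180, 400, 500, 550, 220], dtype=float)
n = len(x)

hm = n / (1 / x).sum()           # n divided by the sum of reciprocals
gm = 10 ** np.log10(x).mean()    # geometric mean, for comparison
am = x.mean()                    # arithmetic mean
print(round(hm, 2), round(gm, 2), am)
```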

<h4><b>For Discrete Series</b></h4>

For a frequency distribution in which the values
$x_{1} , x_{2} , x_{3}, \dots, x_{n}$
of a variate $x$ occur with frequencies
$f_{1} , f_{2} , f_{3}, \dots, f_{n}$
respectively, the harmonic mean is given by:
\begin{align}
HM = \frac{N}{\sum_{i=1}^{n}\frac{f_{i}}{x_{i}}},
\end{align}
where,
\begin{align}
N = \sum_{i=1}^{n}{f_{i}}.
\end{align}
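A minimal sketch of the discrete formula, again assuming the marks distribution from the arithmetic-mean example. It cross-checks against the individual-series formula applied to the expanded data.

```python
import numpy as np

# Marks and number of students from the discrete-series example
x = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
f = np.array([3, 6, 10, 12, 9, 6, 2, 2])
N = f.sum()

# HM = N / sum(f_i / x_i)
hm = N / (f / x).sum()

# Cross-check: expand into individual observations and apply n / sum(1/x)
expanded = np.repeat(x, f)
hm_expanded = len(expanded) / (1 / expanded).sum()
print(round(hm, 4))
```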

<h4><b>For Continuous Series</b></h4>

For a continuous series with $n$ class intervals, where the $i^{th}$ class occurs with frequency $f_{i}$, the harmonic mean is given by:
\begin{align}
HM = \frac{N}{\sum_{i=1}^{n}\frac{f_{i}}{m_{i}}},
\end{align}
where,
\begin{align}
N = \sum_{i=1}^{n}{f_{i}},
\end{align}
and $m_{i}$ is the midpoint of the $i^{th}$ class interval (the average of its lower and upper limits).
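The continuous case uses class midpoints in place of the values. This sketch assumes the class intervals and frequencies from the continuous-series arithmetic-mean example and compares the result with that example's arithmetic mean.

```python
import numpy as np

# Class limits 0-10, ..., 40-50 and frequencies from the continuous-series example
lower = np.arange(0, 50, 10, dtype=float)
upper = lower + 10
f = np.array([5, 10, 40, 20, 25])
m = (lower + upper) / 2          # class midpoints m_i
N = f.sum()

hm = N / (f / m).sum()           # HM = N / sum(f_i / m_i)
am = (f * m).sum() / N           # arithmetic mean, for comparison
print(round(hm, 4), am)
```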

## **Moments**

In a probability distribution, the first raw moment is the expected value (mean), the second central moment is the variance, the third standardized moment is the skewness, and the fourth standardized moment is the kurtosis. The mathematical concept is closely related to the concept of moment in physics. Central moments (moments about the mean) describe the shape of the distribution independently of translation.

### **Moments about mean**

<h4><b>For Individual Series</b></h4>

For an individual series, $x_{1},x_{2},x_{3},\dots,x_{n},$ moment about the mean $\bar{x}$ is given by,
\begin{align}
\mu_{r} &= \frac{\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{r}}{n}.
\end{align}

<h4><b>For Discrete Series</b></h4>

For a discrete series,
\begin{align}
\mu_{r} &= \frac{\sum_{i=1}^{n}f_{i}\cdot\left(x_{i}-\bar{x}\right)^{r}}{N}.
\end{align}
where,
\begin{align}
N &= \sum_{i=1}^{n}f_{i}.
\end{align}

<h4><b>For Continuous Series</b></h4>

For a continuous series,
\begin{align}
\mu_{r} &= \frac{\sum_{i=1}^{n}f_{i}\cdot\left(m_{i}-\bar{x}\right)^{r}}{N}.
\end{align}
where,
\begin{align}
N &= \sum_{i=1}^{n}f_{i}.
\end{align}
and $m_{i}$ is the midpoint of the $i^{th}$ class interval (the average of its lower and upper limits).
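All three cases share one computation, with frequencies of 1 for an individual series and midpoints in place of values for a continuous series. The sketch below defines a hypothetical helper `central_moment` and applies it to the marks distribution from the discrete-series example, which is an assumed choice of data.

```python
import numpy as np

def central_moment(x, r, f=None):
    """r-th moment about the mean; f=None treats x as an individual series.
    For a continuous series, pass the class midpoints as x."""
    x = np.asarray(x, dtype=float)
    f = np.ones_like(x) if f is None else np.asarray(f, dtype=float)
    N = f.sum()
    x_bar = (f * x).sum() / N
    return (f * (x - x_bar) ** r).sum() / N

marks = [10, 20, 30, 40, 50, 60, 70, 80]
students = [3, 6, 10, 12, 9, 6, 2, 2]
for r in range(1, 5):
    print(r, round(central_moment(marks, r, students), 4))
```

The first central moment should come out as zero (up to rounding), matching the property of deviations from the mean shown earlier.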

### **Moments about arbitrary point**

<h4><b>For Individual Series</b></h4>

For an individual series, $x_{1},x_{2},x_{3},\dots,x_{n},$ moment about an arbitrary point $A$ is given by,
\begin{align}
\mu_{r}' &= \frac{\sum_{i=1}^{n}\left(x_{i}-A\right)^{r}}{n}.
\end{align}

<h4><b>For Discrete Series</b></h4>

For a discrete series, the moment about an arbitrary point $A$ is given by,
\begin{align}
\mu_{r}' &= \frac{\sum_{i=1}^{n}f_{i}\cdot\left(x_{i}-A\right)^{r}}{N}.
\end{align}
where,
\begin{align}
N &= \sum_{i=1}^{n}f_{i}.
\end{align}

<h4><b>For Continuous Series</b></h4>

For a continuous series,
\begin{align}
\mu_{r}' &= \frac{\sum_{i=1}^{n}f_{i}\cdot\left(m_{i}-A\right)^{r}}{N}.
\end{align}
where,
\begin{align}
N &= \sum_{i=1}^{n}f_{i}.
\end{align}
and $m_{i}$ is the midpoint of the $i^{th}$ class interval (the average of its lower and upper limits).

### **Relations between various moments**

1. For $r=1,$
\begin{align}
\mu_{1}' = \bar{x} - A.
\end{align}
2. Moments about the origin:
\begin{align}
v_{r} = \frac{\sum_{i=1}^{n}f_{i}\cdot x_{i}^{r}}{N}.
\end{align}
3. Relation between $\mu_{r}$ and $\mu_{r}':$
\begin{align}
\mu_{1} &= 0.\\
\mu_{2} &= \mu_{2}' - \mu_{1}'^{2}.\\
\mu_{3} &= \mu_{3}' - 3\mu_{2}'\mu_{1}' + 2\mu_{1}'^{3}.\\
\mu_{4} &= \mu_{4}' - 4\mu_{3}'\mu_{1}' + 6\mu_{2}'\mu_{1}'^{2} - 3\mu_{1}'^{4}.
\end{align}
4. Relation between $v_{r}$ and $\mu_{r}:$
\begin{align}
v_{1} &= \bar{x}.\\
v_{2} &= \mu_{2} + \bar{x}^{2}.\\
v_{3} &= \mu_{3} + 3\mu_{2}\bar{x} + \bar{x}^{3}.\\
v_{4} &= \mu_{4} + 4\mu_{3}\bar{x} + 6\mu_{2}\bar{x}^{2} + \bar{x}^{4}.
\end{align}
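These relations can be checked numerically. The sketch below is an illustration under assumptions: it uses the marks distribution from the discrete-series example, and takes $A = 40$ (the assumed mean used earlier). It computes $\mu_{r}$, $\mu_{r}'$ and $v_{r}$ directly from their definitions and compares them with the formulas above. The helper name `moment` is hypothetical.

```python
import numpy as np

# Marks distribution from the discrete-series example; A = 40 is the assumed mean used earlier
x = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
f = np.array([3, 6, 10, 12, 9, 6, 2, 2], dtype=float)
N = f.sum()
A = 40.0
x_bar = (f * x).sum() / N

def moment(about, r):
    """r-th moment of the distribution about the point `about`."""
    return (f * (x - about) ** r).sum() / N

mu = {r: moment(x_bar, r) for r in range(1, 5)}   # mu_r: about the mean
mu_a = {r: moment(A, r) for r in range(1, 5)}     # mu_r': about A
v = {r: moment(0.0, r) for r in range(1, 5)}      # v_r: about the origin

m1 = mu_a[1]
checks = {
    "mu_1' = x_bar - A": np.isclose(m1, x_bar - A),
    "mu_2": np.isclose(mu[2], mu_a[2] - m1**2),
    "mu_3": np.isclose(mu[3], mu_a[3] - 3*mu_a[2]*m1 + 2*m1**3),
    "mu_4": np.isclose(mu[4], mu_a[4] - 4*mu_a[3]*m1 + 6*mu_a[2]*m1**2 - 3*m1**4),
    "v_2": np.isclose(v[2], mu[2] + x_bar**2),
    "v_3": np.isclose(v[3], mu[3] + 3*mu[2]*x_bar + x_bar**3),
    "v_4": np.isclose(v[4], mu[4] + 4*mu[3]*x_bar + 6*mu[2]*x_bar**2 + x_bar**4),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```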

<h4><b>Karl Pearson's Coefficients:</b></h4>

*   $\beta$ coefficients:
$\beta_{1} = \frac{\mu_{3}^{2}}{\mu_{2}^{3}}\text{ and }\beta_{2} = \frac{\mu_{4}}{\mu_{2}^{2}}.$
*   $\gamma$ coefficients:
$\gamma_{1} = \sqrt{\beta_{1}}\text{ and }\gamma_{2} = \beta_{2} - 3.$
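A minimal sketch computing these coefficients, assuming the marks distribution from the discrete-series example as data. Following the definition above, $\gamma_{1} = \sqrt{\beta_{1}}$ is non-negative. The comment notes that $\mu_{3}/\mu_{2}^{3/2}$ gives the same magnitude while keeping the sign of $\mu_{3}$.

```python
import numpy as np

# Marks distribution from the discrete-series example
x = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
f = np.array([3, 6, 10, 12, 9, 6, 2, 2], dtype=float)
N = f.sum()
x_bar = (f * x).sum() / N

# Central moments mu_2, mu_3, mu_4
mu2, mu3, mu4 = ((f * (x - x_bar) ** r).sum() / N for r in (2, 3, 4))

beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
gamma1 = np.sqrt(beta1)      # as defined above; mu3 / mu2**1.5 would also keep the sign of mu3
gamma2 = beta2 - 3           # excess kurtosis relative to the normal distribution (beta2 = 3)
print(round(beta1, 4), round(beta2, 4), round(gamma1, 4), round(gamma2, 4))
```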

### **Moment Generating Functions**

<h4><b>For Continuous Variable</b></h4>

Moment Generating Function for a continuous variable $x$ is defined as,
\begin{align}
M_{x}\left(t\right) &= E\left(e^{tx}\right)\\
&= \int_{a}^{b}e^{tx}f\left(x\right)dx\\ \\
\therefore
M_{x}\left(t\right) &= \int_{a}^{b}e^{tx}f\left(x\right)dx.
\end{align}
The $r^{th}$ moment about the origin is
\begin{align}
v_{r} = \left.\frac{d^{r}}{dt^{r}}M_{x}\left(t\right)\right|_{t=0}.
\end{align}
(Q.) Obtain the moment generating function of the random variable $x$ having the following probability distribution $f(x),$
\begin{equation}
f\left(x\right)=
    \begin{cases}
        x & \text{for } 0<x<1 \\
        2-x & \text{for } 1 \leq x <2 \\
        0 & \text{elsewhere.}
    \end{cases}
\end{equation}

Also determine the moments $v_{1}$, $v_{2}$, variance $\mu_{2}$ and standard deviation.

(A.)
\begin{equation}
M_{x}(t) = \int_{-\infty}^{\infty}e^{tx}f(x)dx
\end{equation}
Given: $f(x)$ evaluates to $0$ everywhere except $x\in(0,1)$ and $x\in[1,2).$
\begin{equation}
M_{x}(t) = \int_{0}^{1}xe^{tx}dx \ + \int_{1}^{2}(2-x)e^{tx}dx
\end{equation}

In [43]:
# @title Evaluating the integral { vertical-output: true, display-mode: "form" }
x, t = smp.symbols('x t',real = True,nonzero = True)
f1 = smp.exp(t*x)*x
f2 = smp.exp(t*x)*(2-x)
M = (smp.integrate(f1,(x,0,1)) + smp.integrate(f2,(x,1,2))).simplify()
M

(exp(2*t) - 2*exp(t) + 1)/t**2

\begin{equation}
M_{x}(t) = \frac{e^{2t}-2e^{t}+1}{t^{2}} = \frac{\left(e^{t}-1\right)^{2}}{t^{2}}
\end{equation}
Using the Maclaurin series expansion of $e^{t}$,
\begin{equation}
M_{x}(t) = \frac{\left(\left[1+\frac{t}{1!}+\frac{t^{2}}{2!}+\frac{t^{3}}{3!}\dots\right]-1\right)^{2}}{t^{2}}
\end{equation}
Simplifying the expression,
\begin{align}
M_{x}(t) &= \frac{\left(\frac{t}{1!}+\frac{t^{2}}{2!}+\frac{t^{3}}{3!}\dots\right)^{2}}{t^{2}} \\
&= \frac{t^{2}\cdot\left(1+\frac{t}{2}+\frac{t^{2}}{6}\dots\right)^{2}}{t^{2}} \\
&= \left(1+\frac{t}{2}+\frac{t^{2}}{6}\dots\right)\cdot\left(1+\frac{t}{2}+\frac{t^{2}}{6}\dots\right) \\
&= 1+t+\frac{7}{12}t^{2}\dots
\end{align}
Keeping terms up to $t^{2}$ (higher-order terms do not affect $v_{1}$ or $v_{2}$),
\begin{equation}
\therefore M_{x}(t) \approx 1+t+\frac{7}{12}t^{2}.
\end{equation}
Mean $\bar{x} = v_{1} = \left.\frac{d}{dt}M_{x}(t)\right|_{t=0}$.
Differentiating $M_{x}(t)$ once with respect to $t$ and evaluating at $t=0$,
\begin{align}
\left.\frac{d}{dt}M_{x}(t)\right|_{t=0} &= \left.1+\frac{14}{12}t\right|_{t=0} = 1.
\end{align}
$\therefore$ Mean $\bar{x} = v_{1} = 1.$

Variance $(V): v_{2} = \mu_{2}+\bar{x}^{2} \implies \mu_{2} = v_{2} - \bar{x}^{2}$ and $v_{2} = \left.\frac{d^{2}}{dt^{2}}M_{x}(t)\right|_{t=0}$.

Differentiating twice with respect to $t$ and evaluating at $t=0$,
\begin{align}
\left.\frac{d^{2}}{dt^{2}}M_{x}(t)\right|_{t=0} &= \left.\frac{14}{12}\right|_{t=0} = \frac{7}{6}.
\end{align}
$\therefore$ Variance $(V),$ $\mu_{2} = v_{2} - \bar{x}^{2} = \frac{7}{6} - 1 = \frac{1}{6}.$

Standard Deviation $(\sigma) = \sqrt{\text{Variance$(V)$}}.$

$\therefore$ Standard Deviation $(\sigma) = \frac{1}{\sqrt{6}}.$
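The closed form $M_{x}(t) = \frac{(e^{t}-1)^{2}}{t^{2}}$ is a $0/0$ form at $t=0$, so substituting $t=0$ into its derivatives directly does not work. The hand calculation above avoids this with a series expansion. The SymPy sketch below does the same, using the fact that $v_{r}$ equals $r!$ times the coefficient of $t^{r}$ in the Maclaurin series of $M_{x}(t)$.

```python
import sympy as smp

t = smp.symbols('t', real=True)
M = (smp.exp(t) - 1) ** 2 / t ** 2           # MGF obtained above

# Maclaurin series up to t^2 (the removable singularity at t = 0 is handled by the expansion)
series = smp.series(M, t, 0, 3).removeO()
v1 = series.coeff(t, 1) * smp.factorial(1)   # v_r = r! * (coefficient of t^r)
v2 = series.coeff(t, 2) * smp.factorial(2)
mu2 = smp.simplify(v2 - v1 ** 2)             # variance
sigma = smp.sqrt(mu2)                        # standard deviation
print(v1, v2, mu2, sigma)
```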

(Q.) Find the moment generating function of the following exponential distribution:

\begin{equation}
f(x) = \frac{1}{c}e^{\frac{-x}{c}};\ x\in[0,\infty),\ c>0.
\end{equation}
Hence, find its mean and standard deviation.

(A.) Using the definition of the moment generating function (the integral converges only for $t < \frac{1}{c}$):
\begin{equation}
M_{x}\left(t\right) = \int_{0}^{\infty}e^{tx}\frac{e^{\frac{-x}{c}}}{c}dx = \frac{1}{c}\int_{0}^{\infty}e^{\left(t-\frac{1}{c}\right)x}dx.
\end{equation}

In [81]:
# @title Evaluating the integral { vertical-output: true, display-mode: "form" }
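x, c, t = smp.symbols('x c t',real = True,positive = True)
f = 1/c * smp.exp((t-1/c)*x)
# SymPy returns a Piecewise result, since the integral only converges when t < 1/c
M = smp.integrate(f,(x,0,smp.oo)).simplify()
# Keep the closed-form (convergent) branch of the Piecewise expression
M = smp.piecewise_fold(M).args[0][0]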
M

-1/(c*t - 1)

\begin{equation}
M_{x}(t) = -\frac{1}{ct-1} = \left[1-ct\right]^{-1}
\end{equation}
Using the binomial (geometric series) expansion, valid for $|ct| < 1$,
\begin{equation}
M_{x}(t) = 1+ct+c^{2}t^{2}+c^{3}t^{3}+\dots
\end{equation}
Mean $\bar{x} = v_{1} = \left.\frac{d}{dt}M_{x}(t)\right|_{t=0}$.
Differentiating $M_{x}(t)$ once with respect to $t$ and evaluating at $t=0$,
\begin{align}
\left.\frac{d}{dt}M_{x}(t)\right|_{t=0} = \left[c+2c^{2}t+3c^{3}t^{2}+\dots\right]_{t=0} = c.
\end{align}
$\therefore$ Mean $\bar{x} = v_{1} = c.$

Variance $(V): v_{2} = \mu_{2} + \bar{x}^{2} \implies \mu_{2} = v_{2} - \bar{x}^{2}$ and $v_{2} = \left.\frac{d^{2}}{dt^{2}}M_{x}(t)\right|_{t=0}$.

Differentiating twice with respect to $t$ and evaluating at $t=0$,
\begin{align}
\left.\frac{d^{2}}{dt^{2}}M_{x}(t)\right|_{t=0} &= \left.2c^{2}+6c^{3}t+\dots\right|_{t=0} = 2c^{2}.
\end{align}
$\therefore$ Variance $(V),$ $\mu_{2} = v_{2} - \bar{x}^{2} = 2c^{2} - c^{2} = c^{2}.$

Standard Deviation $(\sigma) = \sqrt{\text{Variance$(V)$}}.$

$\therefore$ Standard Deviation $(\sigma) = \sqrt{c^{2}} = c.$
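Because $M_{x}(t) = (1-ct)^{-1}$ is smooth at $t=0$, the moments can also be obtained by differentiating symbolically and substituting $t=0$, without a series expansion. A minimal SymPy sketch, assuming $c > 0$ as in the question:

```python
import sympy as smp

t = smp.symbols('t', real=True)
c = smp.symbols('c', positive=True)
M = 1 / (1 - c * t)                     # MGF of the exponential distribution (valid for t < 1/c)

v1 = smp.diff(M, t, 1).subs(t, 0)       # first moment about the origin (the mean)
v2 = smp.diff(M, t, 2).subs(t, 0)       # second moment about the origin
mu2 = smp.simplify(v2 - v1 ** 2)        # variance, mu_2 = v_2 - x_bar^2
sigma = smp.sqrt(mu2)                   # standard deviation (c > 0)
print(v1, v2, mu2, sigma)
```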