# Chi-square test of independence
### Calculating the chi-square value from a contingency table of categorical variables

* Null Hypothesis - $H_0$ : The random variables are independent
* Alternate Hypothesis - $H_A$ : The random variables are not independent

**Steps**:
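1. Prepare the contingency table from the given data
2. Calculate the expected frequency $E_{ij}$ of each cell from the contingency table using the formula
  $$E_{ij} = \frac{(\text{marginal freq. of RV1})_i \times (\text{marginal freq. of RV2})_j}{\text{total}}$$
3. Calculate the chi-square value by summing over all cells, where $O_{ij}$ is the observed frequency of a cell
  $$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
4. Find the tabulated (critical) chi-square value at the 5% significance level with the function **stats.chi2.ppf(1-0.05, df=dof)**, where dof = (number of rows - 1) * (number of columns - 1)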

The null hypothesis will be rejected if **Tabulated Value < Calculated Value**, that is, if the calculated chi-square value exceeds the critical value.
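
The steps above can be sketched end to end as one small helper. This is only a minimal sketch, not the code used in the rest of this notebook: the function name `chi_square_independence` and its `alpha` parameter are hypothetical, and the observed counts are typed in by hand from the City and Brand contingency table that the notebook builds further down.

```python
import numpy as np
from scipy import stats

def chi_square_independence(observed, alpha=0.05):
    """Hypothetical helper: chi-square test of independence on a table of counts."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1)
    col_totals = observed.sum(axis=0)
    total = observed.sum()
    # Step 2: expected count = row total * column total / grand total
    expected = np.outer(row_totals, col_totals) / total
    # Step 3: sum the squared deviations, each scaled by its expected count
    statistic = ((observed - expected) ** 2 / expected).sum()
    # Step 4: critical value at significance level alpha
    dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    critical = stats.chi2.ppf(1 - alpha, df=dof)
    # Decision rule: reject H0 when the statistic exceeds the critical value
    return statistic, critical, dof, statistic > critical

observed = [[165, 47, 191],   # Chennai: brands A, B, C
            [279, 73, 225]]   # Mumbai:  brands A, B, C
statistic, critical, dof, reject = chi_square_independence(observed)
print(statistic, critical, dof, reject)
```

Working on a plain array of counts without the `All` margins keeps the row and column totals in one place and avoids the nested loops.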

**Import packages**

In [None]:
import numpy as np
import pandas as pd
from scipy import stats

In [None]:
from google.colab import files
uploaded = files.upload()

Saving chiSquare.csv to chiSquare.csv


In [None]:
import io
df = pd.read_csv(io.BytesIO(uploaded['chiSquare.csv']))

In [None]:
df.head()

Unnamed: 0,City,Brand
0,Mumbai,A
1,Chennai,C
2,Mumbai,A
3,Mumbai,C
4,Chennai,C


In [None]:
contingTab = pd.crosstab(df.City, df.Brand, margins=True)

In [None]:
contingTab

Brand,A,B,C,All
City,,,,
Chennai,165,47,191,403
Mumbai,279,73,225,577
All,444,120,416,980


**Calculate the expected frequency**

In [None]:
contingTab['A']

City
Chennai    165
Mumbai     279
All        444
Name: A, dtype: int64

In [None]:
contingTab['A']['Chennai']

165

In [None]:
contingTab.transpose()['Chennai']['All']

403

In [None]:
contingTab['All']['All']

980

In [None]:
contingTab.transpose()

In [None]:
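cities = list(df['City'].unique())
brands = list(df['Brand'].unique())

# Grand total of all observations (bottom-right cell of the table)
total = contingTab.loc['All', 'All']

exp1 = {}

for i in cities:
  exp2 = {}
  for j in brands:
    # Expected count = row total of city i * column total of brand j / grand total
    exp2[j] = contingTab.loc[i, 'All'] * contingTab.loc['All', j] / total

  exp1[i] = exp2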


In [None]:
# Sanity check by hand: expected count for (Chennai, A)
403*444/980

In [None]:
exp1

{'Mumbai': {'A': 261.41632653061225,
  'C': 244.93061224489796,
  'B': 70.65306122448979},
 'Chennai': {'A': 182.58367346938775,
  'C': 171.06938775510204,
  'B': 49.3469387755102}}
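
As a cross-check on the loop above, SciPy ships a helper, `scipy.stats.contingency.expected_freq`, that computes the same expected counts from a table of observed counts. The sketch below assumes the observed counts are entered by hand without the `All` margins; the variable names are illustrative only.

```python
import numpy as np
from scipy.stats.contingency import expected_freq

observed = np.array([[165, 47, 191],   # Chennai: brands A, B, C
                     [279, 73, 225]])  # Mumbai:  brands A, B, C
expected = expected_freq(observed)
print(expected)

# Expected counts keep the same row and column totals as the observed counts
row_ok = np.allclose(expected.sum(axis=1), observed.sum(axis=1))
col_ok = np.allclose(expected.sum(axis=0), observed.sum(axis=0))
print(row_ok, col_ok)
```

The marginal totals are preserved because each expected count is just the grand total redistributed in proportion to the row and column totals.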

**Chi square calculation**

In [None]:
chiSquareCal = 0
for i in cities:
  for j in brands:
    # Contribution of one cell: (observed - expected)^2 / expected
    val = (contingTab.loc[i, j] - exp1[i][j])**2/exp1[i][j]
    chiSquareCal = chiSquareCal + val

In [None]:
chiSquareCal

7.009543616823935
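
It can also help to look at the individual terms of the sum, since they show which cells deviate most from independence. The following is a minimal sketch under the assumption that the observed counts are entered by hand (rows Chennai and Mumbai, columns A, B, C); `contributions` is an illustrative name, not a variable used elsewhere in the notebook.

```python
import numpy as np

observed = np.array([[165, 47, 191],   # Chennai: brands A, B, C
                     [279, 73, 225]],  # Mumbai:  brands A, B, C
                    dtype=float)
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# One term of the chi-square sum per cell
contributions = (observed - expected) ** 2 / expected
print(np.round(contributions, 3))

# Their total is the chi-square statistic
print(contributions.sum())
```

With these counts the largest single term comes from the Chennai / brand C cell, and the brand B cells contribute very little.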

**Degrees of freedom**

In [None]:
dof = (len(cities)-1) * (len(brands)-1)
dof

2

### Tabulated (critical) value of chi-square

In [None]:
stats.chi2.ppf(1-0.05, df=dof)

5.991464547107979

Here the tabulated value (5.99) is less than the calculated value (7.01), so the null hypothesis of independence is rejected at the 5% significance level.

**Shortcut to the chi-squared test**

In [None]:
# chi2_contingency expects only the observed counts, so drop the 'All' margins
contab = contingTab.loc[['Chennai', 'Mumbai'], ['A', 'B', 'C']].values
stats.chi2_contingency(contab)

(7.009543616823934,
 0.03005363054744611,
 2,
 array([[182.58367347,  49.34693878, 171.06938776],
        [261.41632653,  70.65306122, 244.93061224]]))
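
The four values returned by `stats.chi2_contingency` are the statistic, the p-value, the degrees of freedom and the expected frequencies, so they can be unpacked into named variables. The sketch below again assumes hand-entered counts. It also checks one detail of the function: its `correction` parameter (Yates' continuity correction) is only applied when there is one degree of freedom, so it should make no difference for this 2 x 3 table.

```python
import numpy as np
from scipy import stats

observed = np.array([[165, 47, 191],   # Chennai: brands A, B, C
                     [279, 73, 225]])  # Mumbai:  brands A, B, C

# Unpack the result into named values
chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)

# Yates' continuity correction only applies when dof == 1 (a 2x2 table)
chi2_uncorrected = stats.chi2_contingency(observed, correction=False)[0]

print(chi2_stat, p_value, dof)
print(np.isclose(chi2_stat, chi2_uncorrected))
```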

**Computing the p-value**

In [None]:
1 - stats.chi2.cdf(chiSquareCal, dof)

0.030053630547446142
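
The p-value is the upper-tail probability of the chi-square distribution beyond the calculated value. As a hedged aside, the sketch below compares three ways of getting it: `1 - cdf` as above, SciPy's survival function `stats.chi2.sf`, and the closed form $e^{-x/2}$, which holds only for 2 degrees of freedom. The statistic is typed in from the output above, and the variable names are illustrative.

```python
import math
from scipy import stats

chi_square_value = 7.009543616823935  # statistic calculated earlier
dof = 2

p_cdf = 1 - stats.chi2.cdf(chi_square_value, dof)
p_sf = stats.chi2.sf(chi_square_value, dof)   # survival function = 1 - cdf
p_closed = math.exp(-chi_square_value / 2)    # valid only when dof == 2

print(p_cdf, p_sf, p_closed)
# Rejecting when p < 0.05 is the same decision as statistic > critical value
print(p_sf < 0.05)
```

For very large statistics `sf` is generally the safer choice, because `1 - cdf` loses precision once the cdf is close to 1.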