# NEP Case Study

## Problem Statement

**After the draft National Education Policy (NEP) 2020 was published in a country, a survey of 578 college teachers was conducted. Each teacher was asked whether they voted for the ruling party in 2019 and whether they are in favour of or against the NEP. The table below shows the results. Do they provide evidence about whether voting preference is independent of opinion on the NEP?**

|  | Favours NEP | Against NEP | Total |
| --- | --- | --- | --- |
| Voted for ruling party | 205 | 30 | 235 |
| Did not vote for ruling party | 64 | 279 | 343 |
| Total | 269 | 309 | 578 |


From the data, most of the teachers who voted for the ruling party (205 out of 235) favour the NEP, whereas most of those who did not vote for the ruling party (279 out of 343) are against it. Let's perform a hypothesis test to see whether there is enough statistical evidence to support this observation.
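The proportions quoted above can be checked directly from the table. A minimal sketch, assuming the counts are typed in by hand rather than read from `NEP.csv`; the names `observed` and `row_share` are illustrative:

```python
import pandas as pd

# Observed counts copied from the survey table (rows: voting preference, columns: opinion on NEP)
observed = pd.DataFrame(
    {"Favours NEP": [205, 64], "Against NEP": [30, 279]},
    index=["Voted for ruling party", "Did not vote for ruling party"],
)

# Divide each row by its row total to get the share of each voting group for/against NEP
row_share = observed.div(observed.sum(axis=1), axis=0)
print(row_share.round(3))
```

About 87.2% of the ruling-party voters favour the NEP, compared with about 18.7% of the other teachers; the test below checks whether a gap like this could plausibly arise by chance under independence.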

### Import the necessary libraries

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency   # for the chi-square test
```

### Reading the data into the DataFrame

```python
df = pd.read_csv('NEP.csv')
```

```python
df.head()
```

|  | - | Favours NEP | Against NEP |
| --- | --- | --- | --- |
| 0 | Voted for ruling party | 205 | 30 |
| 1 | Did not vote for ruling party | 64 | 279 |


* The data is a two-by-two contingency table: the columns hold the counts for 'Favours NEP' and 'Against NEP', and the rows hold the counts for 'Voted for ruling party' and 'Did not vote for ruling party'.
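Because `chi2_contingency` only needs the numeric counts, the same two-by-two table can also be written as a plain NumPy array when `NEP.csv` is not at hand. A minimal sketch; the layout (rows = voting preference, columns = opinion) mirrors the survey table, and the name `table` is illustrative:

```python
import numpy as np

# Rows: voted / did not vote for the ruling party; columns: favours / against NEP
table = np.array([[205, 30],
                  [64, 279]])

# Marginal totals should reproduce the "Total" row and column of the survey table
row_totals = table.sum(axis=1)   # one total per voting group
col_totals = table.sum(axis=0)   # one total per opinion
grand_total = table.sum()        # number of teachers surveyed
print(row_totals, col_totals, grand_total)
```

Checking that the margins match 235/343, 269/309 and 578 is a quick guard against a typo in the counts before running the test.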

## Step 1: Define null and alternative hypotheses

$H_0:$ Voting preference is independent of opinion on NEP

$H_a:$ Voting preference is NOT independent of opinion on NEP
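Stated in terms of cell probabilities, independence means every joint probability factorises into the product of its margins. Writing $p_{ij}$ for the probability that a teacher falls in row $i$ (voting preference) and column $j$ (opinion on NEP), and $p_{i\cdot}$, $p_{\cdot j}$ for the row and column marginal probabilities (notation introduced here for illustration), the hypotheses read:

$$
H_0:\ p_{ij} = p_{i\cdot}\, p_{\cdot j} \quad \text{for all } i, j \in \{1, 2\},
\qquad
H_a:\ p_{ij} \neq p_{i\cdot}\, p_{\cdot j} \quad \text{for at least one } (i, j)
$$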

## Step 2: Select the appropriate test

> The formulated hypotheses can be tested using a Chi-square test of independence of attributes, concerning the two categorical variables, opinion on NEP (in favour of/against the policy) and voting preference (voted/did not vote for ruling party).
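For reference, the test statistic compares each observed count $O_{ij}$ with the count $E_{ij}$ expected under $H_0$, where $N = 578$ is the number of teachers (these symbols are notation introduced here). For a two-by-two table, `scipy.stats.chi2_contingency` applies Yates' continuity correction by default, which moves each observed count up to 0.5 closer to its expected count; when every $|O_{ij} - E_{ij}| \ge 0.5$, as in this survey, that gives:

$$
E_{ij} = \frac{(\text{row } i \text{ total})\,(\text{column } j \text{ total})}{N},
\qquad
\chi^2_{\text{Yates}} = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{\left(|O_{ij} - E_{ij}| - 0.5\right)^2}{E_{ij}},
\qquad
\text{df} = (2-1)(2-1) = 1
$$

Without the correction, the $0.5$ term is dropped and the statistic is Pearson's $\sum_{i,j} (O_{ij} - E_{ij})^2 / E_{ij}$.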

* Categorical variables - Yes
* Expected frequency in each cell of the table is at least 5 - Yes, the smallest expected count (voted for the ruling party and favours NEP) is $235 \times 269 / 578 \approx 109.4$. The condition concerns the expected counts under $H_0$, not the observed ones.
* Random sampling from the population - Yes, we are informed that the collected sample is a simple random sample.
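The expected-frequency condition can be verified numerically. A minimal sketch with the counts typed in by hand; it applies the usual formula (row total × column total / grand total) with NumPy broadcasting:

```python
import numpy as np

# Observed counts: rows = voted / did not vote for ruling party, columns = favours / against NEP
observed = np.array([[205, 30],
                     [64, 279]])

# Expected count under independence: (row total x column total) / grand total
row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 2)
expected = row_totals * col_totals / observed.sum()

print(expected.round(2))
# The assumption is about expected counts, not observed ones
print("All expected counts >= 5:", bool((expected >= 5).all()))
```

The expected counts come out at roughly 109.37, 125.63, 159.63 and 183.37, all well above 5. The fourth value returned by `chi2_contingency` (`exp_freq` in Step 5) holds the same matrix.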

## Step 3: Decide the significance level

Here, we select $\alpha = 0.05$.
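An equivalent way to use $\alpha$ is through the critical value of the chi-square distribution with 1 degree of freedom, since a two-by-two table has $(2-1)(2-1) = 1$. A minimal sketch using `scipy.stats.chi2.ppf`:

```python
from scipy.stats import chi2

alpha = 0.05
dof = 1  # (rows - 1) * (columns - 1) for a 2x2 table

# Reject H0 if the chi-square statistic exceeds this value
critical_value = chi2.ppf(1 - alpha, dof)
print(round(critical_value, 3))   # 3.841
```

Comparing the statistic with this critical value always leads to the same decision as comparing the p-value with $\alpha$.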

## Step 4: Data Preparation

```python
# prepare the data by dropping the first column (the row labels)
data = df.drop(df.columns[0], axis=1)
data
```

|  | Favours NEP | Against NEP |
| --- | --- | --- |
| 0 | 205 | 30 |
| 1 | 64 | 279 |
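Instead of dropping the label column, it can be kept as the row index so the printed table stays readable. A minimal sketch; since `NEP.csv` itself is not shown, the CSV text below is an assumed reconstruction from the `df.head()` output (a first column named `-` holding the row labels):

```python
import io

import pandas as pd
from scipy.stats import chi2_contingency

# Assumed reconstruction of NEP.csv, based on the df.head() output above
csv_text = """-,Favours NEP,Against NEP
Voted for ruling party,205,30
Did not vote for ruling party,64,279
"""

# index_col=0 turns the label column into the row index instead of a data column
data = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(data)

# The counts passed to the test are the same as after dropping the column
chi, p_val, dof, exp_freq = chi2_contingency(data)
print(dof, p_val)
```

Either preparation feeds the same 2×2 counts to `chi2_contingency`, so the p-value is unchanged.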


## Step 5: Calculate the p-value

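```python
# use chi2_contingency() to get the test statistic, p-value, degrees of freedom and expected frequencies
# (for a 2x2 table, SciPy applies Yates' continuity correction by default)
chi, p_val, dof, exp_freq = chi2_contingency(data)
# print the p-value
print('The p-value is', p_val)
```

```
The p-value is 1.1307328231776248e-58
```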
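To see where this p-value comes from, the corrected statistic can be computed by hand and compared with SciPy's result. A minimal sketch with the counts typed in directly; `chi2.sf` gives the upper-tail probability of the chi-square distribution with 1 degree of freedom:

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

observed = np.array([[205, 30],
                     [64, 279]])

# Expected counts under independence
expected = (observed.sum(axis=1, keepdims=True)
            * observed.sum(axis=0, keepdims=True) / observed.sum())

# Yates' continuity correction: shrink each |O - E| by 0.5 (every |O - E| here is far above 0.5)
stat_manual = (((np.abs(observed - expected) - 0.5) ** 2) / expected).sum()
p_manual = chi2.sf(stat_manual, df=1)

# Compare with SciPy, which uses correction=True by default for 2x2 tables
stat_scipy, p_scipy, dof, _ = chi2_contingency(observed)
print(stat_manual, stat_scipy)
print(p_manual, p_scipy)
```

Passing `correction=False` to `chi2_contingency` gives the uncorrected Pearson statistic instead; with counts this large, the p-value stays far below 0.05 either way.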


## Step 6: Compare the p-value with $\alpha$

```python
# compare the p-value with the chosen significance level
alpha = 0.05
if p_val < alpha:
    print(f'As the p-value {p_val} is less than the level of significance, we reject the null hypothesis.')
else:
    print(f'As the p-value {p_val} is greater than or equal to the level of significance, we fail to reject the null hypothesis.')
```

```
As the p-value 1.1307328231776248e-58 is less than the level of significance, we reject the null hypothesis.
```


## Step 7: Conclusion

Since the p-value is less than 0.05, we reject the null hypothesis. Hence, we have enough statistical evidence to say that voting preference is NOT independent of opinion on NEP.

### Insight

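Voting preference is NOT independent of opinion on NEP: about 87% of the teachers who voted for the ruling party favour the NEP (205 of 235), compared with about 19% of those who did not (64 of 343).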