# Hypothesis Testing Q4 | CustomerOrderForm dataset

### To determine whether the defective percentage varies by center at a 5% significance level, we can use a hypothesis test.
### The steps are:

### 1. Import necessary libraries and load the data from the CSV file into a pandas DataFrame.
### 2. Define null and alternative hypotheses.
### 3. Calculate the test statistic and p-value using the chi-square contingency test.
### 4. Compare the p-value with the significance level to determine whether to reject or fail to reject the null hypothesis.

In [1]:
# Import necessary libraries
import pandas as pd
from scipy.stats import chi2_contingency

In [2]:
# Load data from CSV file into a pandas DataFrame
df = pd.read_csv('CustomerOrderForm.csv')

In [3]:
df.head()

Unnamed: 0,Phillippines,Indonesia,Malta,India
0,Error Free,Error Free,Defective,Error Free
1,Error Free,Error Free,Error Free,Defective
2,Error Free,Defective,Defective,Error Free
3,Error Free,Error Free,Error Free,Error Free
4,Error Free,Error Free,Defective,Error Free


In [4]:
df.shape

(300, 4)

In [5]:
df.describe()

Unnamed: 0,Phillippines,Indonesia,Malta,India
count,300,300,300,300
unique,2,2,2,2
top,Error Free,Error Free,Error Free,Error Free
freq,271,267,269,280


In [6]:
df.isnull().sum()

Phillippines    0
Indonesia       0
Malta           0
India           0
dtype: int64

In [7]:
# Check the value counts for each center (one column per center)
for center in df.columns:
    print(df[center].value_counts())


Error Free    271
Defective      29
Name: Phillippines, dtype: int64
Error Free    267
Defective      33
Name: Indonesia, dtype: int64
Error Free    269
Defective      31
Name: Malta, dtype: int64
Error Free    280
Defective      20
Name: India, dtype: int64
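Since the question is about defective percentages, it can help to turn the counts above into rates before testing. The following is a minimal sketch that uses only the counts printed above; the names `defective_counts`, `orders_per_center`, and `defective_rates` are illustrative and not part of the original notebook.

```python
# Defective counts per center, copied from the value counts above.
defective_counts = {"Phillippines": 29, "Indonesia": 33, "Malta": 31, "India": 20}
orders_per_center = 300  # df.shape reports 300 rows and no nulls

# Defective percentage for each center.
defective_rates = {
    center: 100 * count / orders_per_center
    for center, count in defective_counts.items()
}

for center, rate in defective_rates.items():
    print(f"{center}: {rate:.2f}% defective")
```

The rates differ somewhat between centers; the chi-square test is what tells us whether differences of this size are plausible under sampling variation alone.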


In [8]:
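# Create the contingency table from the value counts above
# Rows: Error Free, Defective; columns: Phillippines, Indonesia, Malta, India
contingency_table = [[271,267,269,280],
                    [29,33,31,20]]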
print(contingency_table)

[[271, 267, 269, 280], [29, 33, 31, 20]]
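The table above is typed in by hand, which is easy to get wrong if the data changes. As a hedged alternative, the same table could be derived from the DataFrame. The sketch below does not read `CustomerOrderForm.csv`; it rebuilds a stand-in frame (`df_demo`, a hypothetical name) with the same counts so that it runs on its own. With the real data, `df` would take the place of `df_demo`.

```python
import pandas as pd

# Stand-in for the CSV: columns with the same counts as shown above.
# Row order is arbitrary here and does not affect the counts.
defective_counts = {"Phillippines": 29, "Indonesia": 33, "Malta": 31, "India": 20}
n_rows = 300
df_demo = pd.DataFrame({
    center: ["Defective"] * k + ["Error Free"] * (n_rows - k)
    for center, k in defective_counts.items()
})

# Count each label per column, then fix the row order to match the manual table.
counts = pd.DataFrame({
    center: df_demo[center].value_counts() for center in df_demo.columns
}).loc[["Error Free", "Defective"]]

contingency_from_df = counts.to_numpy().tolist()
print(contingency_from_df)
```

Fixing the row order with `.loc` keeps the result independent of the order in which `value_counts` happens to return the labels.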


### Define null and alternative hypotheses
### H0: The defective percentage does not vary by center.
### HA: The defective percentage varies by center.

In [9]:
alpha = 0.05

In [10]:
# Calculate test statistic and p-value using chi-square contingency test
test_statistic, p_value, dof, expected_values = chi2_contingency(contingency_table)
print("P_value = ", p_value,"\n","degree of freedom = ",dof,'\n', "Expected Values =", expected_values)

P_value =  0.2771020991233135 
 degree of freedom =  3 
 Expected Values = [[271.75 271.75 271.75 271.75]
 [ 28.25  28.25  28.25  28.25]]
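To see where these numbers come from, the test can be reproduced by hand: each expected count is (row total × column total) / grand total, the statistic is the sum of (observed − expected)² / expected, and the degrees of freedom are (rows − 1) × (columns − 1). The sketch below is an illustration with NumPy and `scipy.stats.chi2`, not the notebook's own method; the `*_manual` names are made up for this example.

```python
import numpy as np
from scipy.stats import chi2

# Observed counts: rows are Error Free / Defective, columns are the four centers.
observed = np.array([[271, 267, 269, 280],
                     [29, 33, 31, 20]])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 4)
grand_total = observed.sum()

# Expected counts under H0 (same defective rate at every center).
expected_manual = row_totals * col_totals / grand_total

# Pearson chi-square statistic and its degrees of freedom.
statistic_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Upper-tail probability of the chi-square distribution.
p_value_manual = chi2.sf(statistic_manual, dof_manual)

print("Expected values =", expected_manual)
print("Test statistic =", statistic_manual)
print("Degrees of freedom =", dof_manual)
print("P-value =", p_value_manual)
```

The expected values, degrees of freedom, and p-value should agree with the `chi2_contingency` output above, since no continuity correction is applied when there are more than one degree of freedom.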


In [11]:
# Compare p-value with significance level
if p_value < alpha:
    print("We reject the null hypothesis.")
else:
    print("We fail to reject the null hypothesis.")

We fail to reject the null hypothesis.
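The same decision can also be reached by comparing the test statistic with the critical value of the chi-square distribution, which is equivalent to the p-value comparison. This is a small sketch under that framing; `critical_value` and `reject_h0` are illustrative names, and the table is the one defined earlier.

```python
from scipy.stats import chi2, chi2_contingency

table = [[271, 267, 269, 280],
         [29, 33, 31, 20]]
significance_level = 0.05

# chi2_contingency returns the statistic, p-value, dof, and expected counts.
statistic, p, degrees_of_freedom, _ = chi2_contingency(table)

# Critical value: the statistic must exceed this to reject H0 at the 5% level.
critical_value = chi2.ppf(1 - significance_level, degrees_of_freedom)

reject_h0 = statistic > critical_value
print("Test statistic =", statistic)
print("Critical value =", critical_value)
print("Reject H0:", reject_h0)
```

Both rules give the same answer by construction: the statistic exceeds the critical value exactly when the p-value falls below the significance level.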


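### Since the p-value (about 0.277) is greater than 0.05, we fail to reject the null hypothesis, i.e. there is not enough evidence at the 5% significance level to conclude that the defective percentage varies by center.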