# HYPOTHESIS TESTING

In [1]:
import pandas as pd
import numpy as np
import scipy
from scipy import stats
import statsmodels.api as sm
from scipy.stats import chi2
from scipy.stats import chi2_contingency

### Q1

In [2]:
cutlet=pd.read_csv("/Cutlets.csv")
cutlet.head()

Unnamed: 0,Unit A,Unit B
0,6.809,6.7703
1,6.4376,7.5093
2,6.9157,6.73
3,7.3012,6.7878
4,7.4488,7.1522


In [3]:
p=stats.ttest_ind(cutlet["Unit A"],cutlet["Unit B"])
p[1]

0.4722394724599501

**Assumptions & Test**                       
**µa**= Mean of Unit A                                                              
**µb**= Mean of Unit B                                                              
since the significance level is 5%                                 
∴ **α** = 5/100=0.05                                                          
Null Hypothesis(**h0**)  -> **µa = µb**                                              
Alternate Hypothesis(**h1**) -> **µa ≠ µb**  
Test used: **two-sample, two-tailed t-test**  
Here **p=0.472 & α=0.05** i.e. **p>α**  
**∴ We fail to reject the Null Hypothesis(h0); the data do not support the Alternate Hypothesis(h1)**  
**∴ There is no significant difference in the diameter of the cutlets between Units A and B.**
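
The decision rule used for Q1 can be sketched as a small helper. This is a minimal sketch rather than the notebook's own method: the function name `two_sample_t_decision` is hypothetical, the Levene check of the equal-variance assumption is an added step, and the sample values are synthetic, not the contents of `Cutlets.csv`.

```python
import numpy as np
from scipy import stats


def two_sample_t_decision(a, b, alpha=0.05):
    """Return (p_value, reject_h0) for a two-sided, two-sample t-test."""
    # Levene's test checks the equal-variance assumption of the pooled t-test.
    _, levene_p = stats.levene(a, b)
    equal_var = bool(levene_p > alpha)
    # equal_var=False makes ttest_ind run Welch's t-test instead.
    _, p_value = stats.ttest_ind(a, b, equal_var=equal_var)
    return p_value, bool(p_value < alpha)


# Synthetic diameters (assumption): NOT the values in Cutlets.csv.
rng = np.random.default_rng(0)
unit_a = rng.normal(7.0, 0.3, size=35)
unit_b = rng.normal(7.0, 0.3, size=35)

p_value, reject_h0 = two_sample_t_decision(unit_a, unit_b)
print(f"p = {p_value:.4f}, reject h0: {reject_h0}")
```

With the real data, the two columns `cutlet["Unit A"]` and `cutlet["Unit B"]` would be passed in place of the synthetic arrays.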

### Q2

In [4]:
labtat=pd.read_csv("/LabTat.csv")
labtat.head()

Unnamed: 0,Laboratory 1,Laboratory 2,Laboratory 3,Laboratory 4
0,185.35,165.53,176.7,166.13
1,170.49,185.91,198.45,160.79
2,192.77,194.92,201.23,185.18
3,177.33,183.0,199.61,176.42
4,193.41,169.57,204.63,152.6


In [5]:
p=stats.f_oneway(labtat.iloc[:,0], labtat.iloc[:,1],labtat.iloc[:,2],labtat.iloc[:,3])
p[1]

2.1156708949992414e-57

**Assumptions & Test**                       
**µ1**= Mean of Laboratory 1                                                              
**µ2**= Mean of Laboratory 2             
**µ3**= Mean of Laboratory 3                
**µ4**= Mean of Laboratory 4                                 
since the significance level is 5%                                 
∴ **α** = 5/100=0.05                                                          
Null Hypothesis(**h0**)  -> **µ1 = µ2 = µ3 = µ4**                                                                               
Alternate Hypothesis(**h1**) -> **At least one mean µ differs from the others**  
using **one-way ANOVA Test**  
Here **p=2.12e-57 (≈0) & α=0.05** i.e. **p<α**  
**∴ We reject the Null Hypothesis(h0) in favour of the Alternate Hypothesis(h1)**  
**∴ There is a significant difference in the average Turn Around Time (TAT) of reports among the laboratories on the preferred list.**
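
The one-way ANOVA decision for Q2 can also be written so that columns are selected by name rather than by position. The sketch below is illustrative only: `anova_decision` is a hypothetical helper, and the turnaround times are synthetic stand-ins, not the contents of `LabTat.csv`.

```python
import numpy as np
import pandas as pd
from scipy import stats


def anova_decision(df, alpha=0.05):
    """Run a one-way ANOVA across all columns of df; return (p_value, reject_h0)."""
    # Each column is treated as one independent group.
    samples = [df[col].to_numpy() for col in df.columns]
    result = stats.f_oneway(*samples)
    return result.pvalue, bool(result.pvalue < alpha)


# Synthetic TAT values (assumption): NOT the values in LabTat.csv.
rng = np.random.default_rng(1)
labs = pd.DataFrame({
    "Laboratory 1": rng.normal(178, 13, size=120),
    "Laboratory 2": rng.normal(178, 15, size=120),
    "Laboratory 3": rng.normal(200, 16, size=120),
    "Laboratory 4": rng.normal(164, 15, size=120),
})

p_value, reject_h0 = anova_decision(labs)
print(f"p = {p_value:.3e}, reject h0: {reject_h0}")
```

Note that a significant ANOVA result only says that at least one group mean differs; it does not identify which laboratory is responsible.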

### Q3

In [6]:
buyerratio=pd.read_csv("/BuyerRatio.csv")
buyerratio

Unnamed: 0,Observed Values,East,West,North,South
0,Males,50,142,131,70
1,Females,435,1523,1356,750


In [7]:
p=stats.chi2_contingency([buyerratio["East"],buyerratio["West"],buyerratio["North"],buyerratio["South"]])
p[1]

0.6603094907091882

In [8]:
# critical value(x2c)
c=chi2.ppf(0.95,3)
c

7.814727903251179

In [9]:
#chi square (x2) fail to reject ho
p[0]

1.5959455386610577

**Assumptions & Test**                       
**p[0]** = chi square (x2)                                                              
**p[1]** = p value
**c**    = critical value
since the significance level is 5%                                 
∴ **α** = 5/100=0.05                                                          
Null Hypothesis(**h0**)  -> **All proportions are equal i.e. gender and region are Independent**  
Alternate Hypothesis(**h1**) -> **Not all proportions are equal i.e. gender and region are Dependent**  
using **Chi Square Test**  
Here **p=0.6603 & α=0.05** i.e. **p>α**  and **χ2=1.596 & χ2c=7.815** i.e. **χ2<χ2c**  
**∴ We fail to reject the Null Hypothesis(h0); the data do not support the Alternate Hypothesis(h1)**  
**∴ There is no significant evidence that the male-female buyer ratios differ across regions.**
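
The Q3 test can be reproduced directly from the observed counts printed in cell `In [6]`. This sketch makes two choices that differ from the cells above, both of which are assumptions of the example: it unpacks the result into named variables instead of indexing `p[0]` and `p[1]`, and it takes the degrees of freedom for the critical value from the test result instead of hard-coding 3.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

alpha = 0.05

# Observed buyer counts from the table above: columns are East, West, North, South.
observed = np.array([
    [50, 142, 131, 70],       # Males
    [435, 1523, 1356, 750],   # Females
])

# Transposing the table (as the notebook does) gives the same statistic.
stat, p_value, dof, expected = chi2_contingency(observed)

# Critical value at the (1 - alpha) quantile, using dof from the test itself.
critical = chi2.ppf(1 - alpha, dof)

reject_h0 = bool(p_value < alpha)
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}, dof = {dof}, critical = {critical:.3f}")
print("reject h0:", reject_h0)
```

The p-value comparison and the critical-value comparison always agree, so either one alone is enough to reach the decision.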

### Q4

In [10]:
cof=pd.read_csv("/customerorderform.csv")
cof.head()

Unnamed: 0,Phillippines,Indonesia,Malta,India
0,Error Free,Error Free,Defective,Error Free
1,Error Free,Error Free,Error Free,Defective
2,Error Free,Defective,Defective,Error Free
3,Error Free,Error Free,Error Free,Error Free
4,Error Free,Error Free,Defective,Error Free


In [11]:
p=stats.chi2_contingency([cof["Phillippines"].value_counts(),cof["Indonesia"].value_counts(),
                          cof["Malta"].value_counts(),cof["India"].value_counts()])
p[1]

0.2771020991233144

In [12]:
#chi square (x2) fail to reject ho
p[0]

3.8589606858203545

**Assumptions & Test**                       
**p[0]** = chi square (x2)                                                              
**p[1]** = p value
**c**    = critical value
since the significance level is 5%                                 
∴ **α** = 5/100=0.05                                                          
Null Hypothesis(**h0**)  -> **The defective % does not vary by centre i.e. defect status and centre are Independent**  
Alternate Hypothesis(**h1**) -> **The defective % varies by centre i.e. defect status and centre are Dependent**  
using **Chi Square Test**  
Here **p=0.277 & α=0.05** i.e. **p>α**  and **χ2=3.859 & χ2c=7.815** i.e. **χ2<χ2c**  
**∴ We fail to reject the Null Hypothesis(h0); the data do not support the Alternate Hypothesis(h1)**  
**∴ The defective % does not vary significantly by centre.**
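
One caveat when building the contingency table for Q4: `value_counts()` sorts by frequency, so a column in which "Defective" outnumbers "Error Free" would return its counts in the opposite order from the others, and stacking the results would misalign the rows. A minimal sketch of a more defensive construction follows. The data are hypothetical (not `customerorderform.csv`), and reindexing each column to a fixed label order is an added step, not the notebook's method.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical order-form outcomes (assumption): NOT the real CSV contents.
forms = pd.DataFrame({
    "Phillippines": ["Error Free"] * 8 + ["Defective"] * 2,
    "Indonesia": ["Error Free"] * 7 + ["Defective"] * 3,
    "Malta": ["Error Free"] * 3 + ["Defective"] * 7,  # majority label flips here
    "India": ["Error Free"] * 9 + ["Defective"] * 1,
})

labels = ["Error Free", "Defective"]

# Reindex every column's counts to the same label order so rows line up.
table = pd.DataFrame({
    col: forms[col].value_counts().reindex(labels, fill_value=0)
    for col in forms.columns
})

stat, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2 = {stat:.3f}, p = {p_value:.4f}, dof = {dof}")
```

If every centre has more error-free than defective forms, the positional stacking and this construction give the same table.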