A Pearson’s chi-square test is a statistical test for categorical data. 
It is used to determine whether your data are significantly different from what you expected. 
There are two types of Pearson’s chi-square tests:

1) The chi-square goodness of fit test is used to test whether the frequency distribution of a categorical variable is different from your expectations.

2) The chi-square test of independence is used to test whether two categorical variables are related to each other.

Chi-square is often written as χ² and is pronounced “kai-square” (rhymes with “eye-square”).
It is also called chi-squared.

Pearson’s chi-square (χ²) tests, often referred to simply as chi-square tests, are among the most common nonparametric tests. Nonparametric tests are used for data that don’t follow the assumptions of parametric tests, especially the assumption of a normal distribution.

In [1]:
# chi2 is the chi-square distribution: use it to turn a manually computed statistic into a p-value
from scipy.stats import chi2

In [2]:
# chisquare computes the goodness-of-fit statistic and its p-value in one call
from scipy.stats import chisquare

In [None]:
#          HEAD TAIL
#EXPECTED  25    25
#OBSERVED  28    22

In [None]:
# Ho = COIN IS FAIR
# Ha = NOT FAIR

In [None]:
# CHI SQUARE = SUM OVER ALL CATEGORIES OF (OBSERVED - EXPECTED)**2 / EXPECTED
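The summation can be written as a small Python function. This is a minimal sketch. The name `pearson_chi_square` is our own and is not part of SciPy.

```python
def pearson_chi_square(observed, expected):
    """Sum of (observed - expected)**2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# coin example: 28 heads and 22 tails observed, 25 of each expected
stat = pearson_chi_square([28, 22], [25, 25])
print(stat)  # 0.72
```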

In [None]:
# FOR HEAD = (28-25)**2/25
# FOR TAIL = (22-25)**2/25
# CHI STAT  = FOR HEAD + FOR TAIL

In [None]:
# FOR HEAD = (25-25)**2/25 = 0  -  OBSERVED EQUALS EXPECTED, CONSISTENT WITH A FAIR COIN (FAIL TO REJECT NULL HYPOTHESIS)

In [None]:
# RELATION BETWEEN CHI SQUARE AND P-VALUE
# CHI SQUARE IS NEVER NEGATIVE; IT IS 0 ONLY WHEN EVERY OBSERVED COUNT EQUALS ITS EXPECTED COUNT
# THEREFORE WHEN CHI SQUARE IS LOW - P-VALUE IS HIGH (FAIL TO REJECT NULL HYPOTHESIS)

In [None]:
# IF CHI SQUARE EXCEEDS THE CRITICAL VALUE FOR THE GIVEN DF AND ALPHA (E.G. 3.841 FOR DF = 1, ALPHA = 0.05) - REJECT NULL HYPOTHESIS
# THEREFORE WHEN CHI SQUARE IS HIGH - P-VALUE IS LOW

In [None]:
# TO REJECT NULL HYPOTHESIS, P-VALUE SHOULD BE SMALLER THAN ALPHA
# IF P-VALUE IS HIGHER THAN ALPHA, WE FAIL TO REJECT THE NULL HYPOTHESIS (THIS DOES NOT PROVE IT IS TRUE)
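The two decision rules (compare the statistic with a critical value, or compare the p-value with alpha) always agree, because the p-value decreases as the statistic grows. A minimal sketch using `scipy.stats.chi2`, reusing the coin example's statistic:

```python
from scipy.stats import chi2

alpha = 0.05
df = 1           # two categories (head, tail): k - 1 = 1
chi_stat = 0.72  # statistic from the coin example

# critical value: the point whose upper-tail area equals alpha
critical = chi2.ppf(1 - alpha, df)
# p-value: upper-tail area beyond the observed statistic
p_value = chi2.sf(chi_stat, df)

print(round(critical, 3))                    # 3.841
print(chi_stat > critical, p_value < alpha)  # False False
```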

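Regarding the hypotheses to be tested, all chi-square tests follow the same general pattern. In the test of independence, the null hypothesis states that there is no relationship between the two variables, while the research (alternative) hypothesis states that there is a relationship. In the goodness of fit test, the null hypothesis states that the variable follows the expected distribution, while the research hypothesis states that it does not.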
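Under the null hypothesis of independence, each expected count is the row total times the column total divided by the grand total. A minimal sketch with NumPy that rebuilds the statistic and degrees of freedom by hand, using the table from the first assignment question. It can be cross-checked against `chi2_contingency`, which applies no continuity correction here because the table has more than one degree of freedom.

```python
import numpy as np
from scipy.stats import chi2_contingency

# table from the first assignment question (rows x columns of counts)
observed = np.array([[67, 213, 74],
                     [411, 633, 129],
                     [85, 51, 7],
                     [27, 60, 15]])

# under H0: expected = row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)  # shape (4, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 3)
expected = row_totals * col_totals / observed.sum()

stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(round(stat, 4), dof)  # 94.2688 6
```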

In [None]:
# Q1

In [None]:
# chi square for coin toss

In [2]:
# Ho = fair
# Ha = not fair

In [16]:
# alpha = 0.05
exp = [25,25] # expected counts under the assumption that Ho is true (50 tosses, 25 each)
obs = [28,22] # observed counts

In [13]:
# chi square
chi_stat = (28-25)**2/25 + (22-25)**2/25
chi_stat

0.72

In [6]:
# arguments: (chi_stat, df); df = number of categories - 1 = 2 - 1 = 1
# p-value = P(X >= chi_stat) = 1 - cdf(chi_stat)
p_value = 1-chi2.cdf(chi_stat,df=1)
p_value

0.3961439091520741
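`1 - chi2.cdf(...)` works here, but SciPy also provides the survival function `chi2.sf`, which returns the same upper-tail probability without the subtraction. A small sketch showing why that matters once the statistic is large:

```python
from scipy.stats import chi2

# chi2.sf(x, df) is the survival function 1 - cdf(x, df), computed directly
p_coin = chi2.sf(0.72, df=1)
print(p_coin)  # same value as 1 - chi2.cdf(0.72, df=1) above

# for a very large statistic, cdf is within rounding of 1.0,
# so 1 - cdf collapses to (essentially) zero while sf keeps the tiny tail
print(1 - chi2.cdf(100, df=1))
print(chi2.sf(100, df=1))
```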

In [None]:
# since the p-value (0.396) is greater than alpha (0.05), we fail to reject the null hypothesis: there is no evidence that the coin is unfair.

In [None]:
# OR using chisquare

In [17]:
chi_stat,p_value = chisquare(obs,exp)

In [11]:
chi_stat,p_value

(0.72, 0.3961439091520741)

In [None]:
# Q2

In [21]:
# alpha = 0.05
exp = [25,25]
obs = [45,5]

In [22]:
chisquare(obs,exp)

Power_divergenceResult(statistic=32.0, pvalue=1.5417257900280013e-08)
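For a fair-coin null the expected counts are uniform, which is also what `chisquare` assumes when `f_exp` is not given. A quick check on the same data (with p ≈ 1.5e-08 < 0.05, the null hypothesis of a fair coin is rejected):

```python
from scipy.stats import chisquare

obs = [45, 5]

# explicit expected counts: 50 tosses, 25 of each under a fair coin
explicit = chisquare(obs, f_exp=[25, 25])
# f_exp omitted: chisquare assumes all categories are equally likely
default = chisquare(obs)

print(explicit.statistic, default.statistic)  # 32.0 32.0
```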

#### ASSIGNMENT

In [None]:
# Q1

In [3]:
# chi2_contingency is used when the data are in the form of a contingency table (crosstab)

from scipy.stats import chi2_contingency

In [1]:
observed = [[67,213,74],[411,633,129],[85,51,7],[27,60,15]]

In [5]:
chi2_contingency(observed)

(94.26880078578765,
 3.925170647869838e-18,
 6,
 array([[117.86681716, 191.18397291,  44.94920993],
        [390.55869074, 633.49943567, 148.94187359],
        [ 47.61286682,  77.22968397,  18.15744921],
        [ 33.96162528,  55.08690745,  12.95146727]]))

In [8]:
observed = [[67,213,74], [411,633,129], [85,51,7], [27,60,15]]
test_stat, pvalue, dof, expected = chi2_contingency(observed)

In [9]:
alpha = 0.05

In [10]:
if pvalue < alpha:
    print('reject null')
else:
    print('fail to reject null')

reject null


In [None]:
# Q2

In [11]:
obs = [[33,218],[25,389],[20,393],[17,178]]

In [12]:
chi_stat, pvalue, dof, expected = chi2_contingency(obs)
chi_stat, pvalue, dof, expected

(17.51186847271713,
 0.000554511571355531,
 3,
 array([[ 18.73134328, 232.26865672],
        [ 30.89552239, 383.10447761],
        [ 30.82089552, 382.17910448],
        [ 14.55223881, 180.44776119]]))
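The expected-count array returned as the fourth value is also useful for checking the test's assumptions. A commonly cited rule of thumb is that the chi-square approximation is reliable when every expected count is at least 5; the threshold is a convention, not something `chi2_contingency` enforces. A minimal check on the Q2 table:

```python
from scipy.stats import chi2_contingency

obs = [[33, 218], [25, 389], [20, 393], [17, 178]]
stat, pvalue, dof, expected = chi2_contingency(obs)

# rule of thumb: every expected count should be at least 5
print(expected.min() >= 5)  # True for this table
```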

In [None]:
# Q3

In [13]:
obs = [[77,149,78],[23,56,36],[8,24,29],[6,15,8]]

In [15]:
stat, pvalue, dof, expected = chi2_contingency(obs)

In [17]:
pvalue

0.038193742691133806

In [18]:
0.05 > pvalue

True

In [None]:
# Q4

In [65]:
obs = [[335,348,318],[35,23,50]]

In [21]:
alpha = 0.001

In [20]:
chi2_contingency(obs)

(11.519544916042339,
 0.003151828690194211,
 2,
 array([[333.96753832, 334.87015329, 332.16230839],
        [ 36.03246168,  36.12984671,  35.83769161]]))
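The decision depends on the significance level, so it matters which alpha the comparison uses. A short sketch evaluating the Q4 p-value at three common levels:

```python
from scipy.stats import chi2_contingency

obs = [[335, 348, 318], [35, 23, 50]]
stat, pvalue, dof, expected = chi2_contingency(obs)

# the same p-value can lead to different decisions at different alphas
decisions = {}
for alpha in (0.05, 0.01, 0.001):
    decisions[alpha] = "reject H0" if pvalue < alpha else "fail to reject H0"
    print(alpha, decisions[alpha])
```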

In [22]:
type(chi2_contingency(obs))

tuple

In [27]:
# compare against the significance level defined above (alpha = 0.001)
chi2_contingency(obs)[1] < alpha

False

In [None]:
# Q5

In [28]:
obs = [[6,38,31],[14,31,4],[50,50,5]]

In [29]:
stat,pvalue,a,b = chi2_contingency(obs)

In [31]:
pvalue

2.0217185191724964e-12

In [34]:
pvalue<0.01

True

In [None]:
# Q6

In [36]:
obs = [[75,106,46],[106,161,61],[98,183,52],[48,102,14]]

In [37]:
stat,pvalue,dof,er = chi2_contingency(obs)

In [38]:
pvalue

0.015293451318673136

In [39]:
pvalue<0.05

True

In [None]:
# Q7

In [48]:
obs=[73,38,18]
obs

[73, 38, 18]

In [49]:
exp = [0.60*129,0.28*129,0.12*129]
exp

[77.39999999999999, 36.120000000000005, 15.479999999999999]
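The expected counts can be derived from the hypothesised proportions and the observed total instead of typing 129 by hand. A minimal sketch (the name `proportions` is our own):

```python
from scipy.stats import chisquare

obs = [73, 38, 18]
proportions = [0.60, 0.28, 0.12]  # hypothesised distribution under H0

n = sum(obs)                        # 129, computed rather than hard-coded
exp = [p * n for p in proportions]  # expected counts

result = chisquare(obs, exp)
print(result.statistic, result.pvalue)
```

Because p ≈ 0.68 > 0.05, the observed counts are consistent with the 60% / 28% / 12% split.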

In [40]:
73 +38+18

129

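In [None]:
# manual check for Q7: three categories, so df = k - 1 = 2
chi_stat = sum((o - e)**2 / e for o, e in zip(obs, exp))
pvalue = 1 - chi2.cdf(chi_stat, df=len(obs) - 1)
pvalue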

In [50]:
chisquare(obs,exp)

Power_divergenceResult(statistic=0.7582133628645247, pvalue=0.6844725882551137)


In [None]:
# Q8

In [None]:
# coin tossed 100 times

In [54]:
exp = [50,50]
obs = [48,52]

In [55]:
chisquare(obs,exp)

Power_divergenceResult(statistic=0.16, pvalue=0.6891565167793516)

In [None]:
# Q9

In [56]:
total = 70+80+50
total

200

In [59]:
obs = [70,80,50]

In [None]:
#expected = 30%, 40%, 30%

In [57]:
exp = [0.30*200,0.40*200,0.30*200]
exp

[60.0, 80.0, 60.0]

In [60]:
chisquare(obs,exp)

Power_divergenceResult(statistic=3.3333333333333335, pvalue=0.1888756028375618)

In [62]:
significance_level = 0.05

In [64]:
chisquare(obs,exp)[1] < significance_level

False

In [None]:
# p-value (0.189) > 0.05: fail to reject the null hypothesis (the observed counts are consistent with 30%, 40%, 30%)

### REVISION

In [1]:
from scipy.stats import chi2_contingency
from scipy.stats import chisquare


In [None]:
# Q1

In [2]:
data = [[67,213,74],[411,633,129],[85,51,7],[27,60,15]]

In [3]:
chi2_contingency(data)

(94.26880078578765,
 3.925170647869838e-18,
 6,
 array([[117.86681716, 191.18397291,  44.94920993],
        [390.55869074, 633.49943567, 148.94187359],
        [ 47.61286682,  77.22968397,  18.15744921],
        [ 33.96162528,  55.08690745,  12.95146727]]))

In [None]:
# Ho: not associated, Ha: associated
# p-value (3.9e-18) < 0.05, so reject Ho: the variables are associated

In [None]:
# Q2

In [None]:
# Ho: no relationship exists, Ha: a relationship exists

In [4]:
data = [[33,218],[25,389],[20,393],[17,178]]

In [5]:
chi2_contingency(data)

(17.51186847271713,
 0.000554511571355531,
 3,
 array([[ 18.73134328, 232.26865672],
        [ 30.89552239, 383.10447761],
        [ 30.82089552, 382.17910448],
        [ 14.55223881, 180.44776119]]))

In [None]:
# p-value (0.00055) < 0.05, so reject Ho: educational level and diabetic state are associated

In [None]:
# Q3

In [None]:
# Ho: no relationship exists, Ha: a relationship exists

In [8]:
data = [[77,149,78],[23,56,36],[8,24,29],[6,15,8]]

In [9]:
chi2_contingency(data)

(13.322313008960627,
 0.038193742691133806,
 6,
 array([[ 68.08644401, 145.72888016,  90.18467583],
        [ 25.75638507,  55.12770138,  34.11591356],
        [ 13.66208251,  29.24165029,  18.09626719],
        [  6.49508841,  13.90176817,   8.60314342]]))

In [None]:
# p-value (0.038) < 0.05, so reject Ho: a relationship exists

In [None]:
# Q4

In [10]:
data = [[335,348,318],[35,23,50]]

In [11]:
chi2_contingency(data)

(11.519544916042339,
 0.003151828690194211,
 2,
 array([[333.96753832, 334.87015329, 332.16230839],
        [ 36.03246168,  36.12984671,  35.83769161]]))

In [None]:
# p-value (0.0032) > alpha (0.001), so fail to reject Ho: no significant association at this level

In [None]:
# Q5

In [12]:
data = [[6,38,31],[14,31,4],[50,50,5]]

In [13]:
chi2_contingency(data)

(60.74604310295546,
 2.0217185191724964e-12,
 4,
 array([[22.92576419, 38.97379913, 13.10043668],
        [14.97816594, 25.4628821 ,  8.55895197],
        [32.09606987, 54.56331878, 18.34061135]]))