# Hypothesis Testing

Hypothesis testing is a process that forms the foundation of inferential statistics: it uses sample data to judge a claim about a population.

To conduct a hypothesis test, we follow a set procedure:

1.) Define the null and alternative hypotheses
2.) State the significance level (the value of alpha)
3.) Calculate the degrees of freedom
4.) State the decision rule
5.) Calculate the test statistic
6.) State the result based on the test statistic and p-value
7.) Draw a conclusion
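
The steps above can be sketched end to end for a two-sided one-sample t-test. This is only a minimal illustration: the helper name `one_sample_t_procedure`, the sample values and the hypothesised mean of 40 are assumptions made up for the example.

```python
import numpy as np
import scipy.stats as stats


def one_sample_t_procedure(sample, popmean, alpha=0.05):
    """Walk through the seven steps for a two-sided one-sample t-test.

    Hypothetical helper, for illustration only.
    """
    sample = np.asarray(sample, dtype=float)
    # Step 1: H0: mean == popmean, H1: mean != popmean (two-sided).
    # Step 2: the significance level is the `alpha` argument.
    # Step 3: degrees of freedom.
    df = sample.size - 1
    # Step 4: decision rule - reject H0 when |t| exceeds the critical value.
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Step 5: test statistic and p-value.
    t_stat, p_value = stats.ttest_1samp(sample, popmean)
    # Steps 6 and 7: state the result and draw the conclusion.
    reject = bool(abs(t_stat) > t_crit)
    return {"df": df, "t_crit": float(t_crit), "t_stat": float(t_stat),
            "p_value": float(p_value), "reject_h0": reject}


# Made-up ages, tested against a made-up population mean of 40.
result = one_sample_t_procedure([38, 41, 35, 44, 39, 36, 42, 37], popmean=40)
print(result)
```

The critical-value rule in step 4 and the comparison of the p-value with alpha always give the same decision, so either can be reported in step 6.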

# One Sample T-Test

A one sample t-test checks whether a sample mean differs from the population mean.

In [1]:
%matplotlib inline

import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import math

In [3]:
## We create two sets of randomly generated values.
## The first holds the ages of voters across the country - this is our population
## The second holds the ages of voters in Melbourne - this is our sample

np.random.seed(6)

Australia_ages1 = stats.poisson.rvs(loc=18, mu=35, size=150000)
Australia_ages2 = stats.poisson.rvs(loc=18, mu=10, size=100000)
Australia_ages = np.concatenate((Australia_ages1, Australia_ages2))

Melbourne_ages1 = stats.poisson.rvs(loc=18, mu=30, size=30)
Melbourne_ages2 = stats.poisson.rvs(loc=18, mu=10, size=20)
Melbourne_ages = np.concatenate((Melbourne_ages1, Melbourne_ages2))

print( Australia_ages.mean() )
print( Melbourne_ages.mean() )

43.000112
39.26


H0 - The sample mean does not differ from the population mean


H1 - The sample mean differs from the population mean

In [4]:
stats.ttest_1samp(a = Melbourne_ages,               
                 popmean = Australia_ages.mean()) 

Ttest_1sampResult(statistic=-2.5742714883655027, pvalue=0.013118685425061678)

In [22]:
# The test result shows the test statistic "t" is equal to -2.574.

# This statistic tells us how many standard errors the sample mean lies from the mean stated under the null hypothesis.
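
To make the meaning of the t-statistic concrete, it can be recomputed by hand as (sample mean - hypothesised mean) / (sample standard deviation / sqrt(n)). The sketch below uses a small made-up sample and a made-up hypothesised mean, not the notebook's data, and compares the manual value with `stats.ttest_1samp`.

```python
import numpy as np
import scipy.stats as stats

# Made-up sample and hypothesised mean, for illustration only.
sample = np.array([38, 41, 35, 44, 39, 36, 42, 37], dtype=float)
popmean = 40.0

n = sample.size
std_err = sample.std(ddof=1) / np.sqrt(n)   # ddof=1 gives the sample standard deviation
t_manual = (sample.mean() - popmean) / std_err

t_scipy = stats.ttest_1samp(sample, popmean).statistic
print(t_manual, t_scipy)   # the two values should agree
```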

In [27]:
## With df = n - 1 = 50 - 1 = 49 and alpha = 0.05 for a two-tailed test, the t-distribution table gives a critical value of 2.0096

## It means that the lower limit of the t-statistic is -2.0096 and the upper limit is +2.0096.

## Any t-statistic below the lower limit or above the upper limit leads to rejection of the null hypothesis
## based on the t-statistic.

## We use two-tailed critical values because the sample mean can differ from the population mean in either
## of the two directions.

## Below we calculate the lower and upper limits through stats.t.ppf

In [7]:
stats.t.ppf(q=0.025,  # Quantile to check
            df=49)  # Degrees of freedom

-2.0095752344892093

In [8]:
stats.t.ppf(q=0.975,  # Quantile to check
            df=49)  # Degrees of freedom

2.009575234489209

In [15]:
## We can see that the t-statistic (-2.574) lies outside the two quantiles, so we reject the null hypothesis at alpha = 0.05.

## Let's now build a confidence interval around the sample mean and check whether the p-value leads to the same inference.

In [16]:
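# Standard error of the mean: standard deviation / sqrt(sample size).
# Note: NumPy's .std() defaults to ddof=0; pass ddof=1 for the sample standard deviation.
sigma = Melbourne_ages.std()/math.sqrt(50)

stats.t.interval(0.95,                        # Confidence level
                 df = 49,                     # Degrees of freedom
                 loc = Melbourne_ages.mean(), # Sample mean
                 scale= sigma)                # Standard error estimate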

(36.369669080722176, 42.15033091927782)

In [19]:
## We can see that the 95% confidence interval built around the sample mean does not capture the population mean (43.000112).

## This agrees with the t-test: the p-value (= 0.013118685425061678) is less than the significance level ( = 0.05) and thus
## we can reject the null hypothesis.

## Let's increase the confidence level to 99% and see whether the interval around the sample mean captures the population mean.

In [20]:
stats.t.interval(0.99,                        # Confidence level (passed positionally, as in the 95% call)
                 df = 49,                     # Degrees of freedom
                 loc = Melbourne_ages.mean(), # Sample mean
                 scale= sigma)                # Standard dev estimate

(35.40547994092107, 43.11452005907893)

In [21]:
## The 99% confidence interval around the sample mean captures the population mean.

## In this case, our p-value (= 0.013118685425061678) is NOT less than the significance level ( = 0.01) and thus
## we cannot reject the null hypothesis at that level.
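
The link between the confidence interval and the p-value can be checked directly: a two-sided test at level alpha rejects exactly when the hypothesised mean falls outside the (1 - alpha) confidence interval, provided both use the same standard error. The sketch below is an illustration with made-up data; the helper name `ci_and_test_agree` is hypothetical, and it uses `ddof=1` so that the interval and the test line up exactly.

```python
import numpy as np
import scipy.stats as stats


def ci_and_test_agree(sample, popmean, confidence):
    """Return (popmean outside the CI, test rejects at 1 - confidence)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    std_err = sample.std(ddof=1) / np.sqrt(n)
    low, high = stats.t.interval(confidence, df=n - 1,
                                 loc=sample.mean(), scale=std_err)
    p_value = stats.ttest_1samp(sample, popmean).pvalue
    outside_interval = not (low <= popmean <= high)
    rejects = bool(p_value < 1 - confidence)
    return outside_interval, rejects


# Made-up sample and hypothesised mean, for illustration only.
sample = [38, 41, 35, 44, 39, 36, 42, 37]
print(ci_and_test_agree(sample, popmean=42.5, confidence=0.95))
print(ci_and_test_agree(sample, popmean=42.5, confidence=0.99))
```

With this made-up sample, the hypothesised mean lies outside the 95% interval but inside the 99% interval, the same pattern as in the example above.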

Conclusion - At the 5% significance level the sample mean differs from the population mean; at the stricter 1% level the difference is not statistically significant.

# Two Sample T-Test

A two-sample t-test checks whether the means of two independent samples differ from one another. Unlike a one-sample t-test, which tests a sample mean against the population mean, a two-sample t-test tests it against the mean of another independent sample.

In [29]:
## Randomly generating the ages of voters from the city of Brisbane.

np.random.seed(12)
Brisbane_ages1 = stats.poisson.rvs(loc=18, mu=33, size=30)
Brisbane_ages2 = stats.poisson.rvs(loc=18, mu=13, size=20)
Brisbane_ages = np.concatenate((Brisbane_ages1, Brisbane_ages2))

print( Brisbane_ages.mean() )

42.8


H0: The mean voter age in Melbourne is equal to mean voter age in Brisbane.

H1: The mean voter age in Melbourne is not equal to mean voter age in Brisbane.

In [31]:
stats.ttest_ind(a= Melbourne_ages,
                b= Brisbane_ages,
                equal_var=False)    # Do not assume equal variances (Welch's t-test)

Ttest_indResult(statistic=-1.7083870793286842, pvalue=0.09073104343957748)

In [32]:
## Here, the p-value (= 0.09073104343957748) is greater than the significance level ( = 0.01), and also greater than 0.05.

## Thus, we fail to reject the null hypothesis
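
With `equal_var=False`, `stats.ttest_ind` runs Welch's t-test, which does not assume equal variances. As a rough sketch of what that computes, the statistic and the Welch-Satterthwaite degrees of freedom can be reproduced by hand. The two samples below are made up for illustration and are not the notebook's data.

```python
import numpy as np
import scipy.stats as stats

# Made-up samples, for illustration only.
a = np.array([38, 41, 35, 44, 39, 36, 42, 37], dtype=float)
b = np.array([45, 40, 47, 43, 50, 41, 46], dtype=float)

se2_a = a.var(ddof=1) / a.size   # squared standard error of mean(a)
se2_b = b.var(ddof=1) / b.size   # squared standard error of mean(b)

t_manual = (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)

# Welch-Satterthwaite approximation to the degrees of freedom.
df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (a.size - 1) + se2_b ** 2 / (b.size - 1))
p_manual = 2 * stats.t.sf(abs(t_manual), df)   # two-sided p-value

result = stats.ttest_ind(a, b, equal_var=False)
print(t_manual, p_manual)
print(result.statistic, result.pvalue)   # should match the manual values
```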

Conclusion - There is no statistically significant difference between the mean voter age in Melbourne and the mean voter age in Brisbane.

# Paired T-Test

In a paired t-test, we compare two sets of measurements taken on the same subjects. The two groups are therefore not independent but paired.

For example, a hospital wants to check the effectiveness of a weight-loss drug. To do so, it records two sets of data: the weights of the same set of patients before and after they took the drug.
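
Because the observations are paired, the test reduces to a one-sample t-test on the per-subject differences against a mean of zero. The sketch below checks this equivalence on made-up data; the weights, the seed and the variable names are assumptions for illustration only.

```python
import numpy as np
import scipy.stats as stats

# Made-up paired measurements, for illustration only.
rng = np.random.default_rng(0)
before = rng.normal(loc=80, scale=10, size=30)
after = before + rng.normal(loc=-1, scale=3, size=30)

paired = stats.ttest_rel(before, after)
one_sample = stats.ttest_1samp(before - after, 0.0)

print(paired.statistic, paired.pvalue)
print(one_sample.statistic, one_sample.pvalue)   # same test, same numbers
```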

In [41]:
np.random.seed(11)

before= stats.norm.rvs(scale=30, loc=250, size=100)

after = before + stats.norm.rvs(scale=5, loc=-1.25, size=100)

weight_df = pd.DataFrame({"weight_before":before,
                          "weight_after":after,
                          "weight_change":after-before})

weight_df.head()

| | weight_before | weight_after | weight_change |
|---|---|---|---|
| 0 | 302.483642 | 305.605006 | 3.121364 |
| 1 | 241.41781 | 240.526071 | -0.891739 |
| 2 | 235.463046 | 226.017788 | -9.445258 |
| 3 | 170.400443 | 165.91393 | -4.486513 |
| 4 | 249.751461 | 252.590309 | 2.838848 |


In [42]:
weight_df.describe()             

| | weight_before | weight_after | weight_change |
|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 |
| mean | 250.345546 | 249.115171 | -1.230375 |
| std | 28.132539 | 28.422183 | 4.783696 |
| min | 170.400443 | 165.91393 | -11.495286 |
| 25% | 230.421042 | 229.148236 | -4.046211 |
| 50% | 250.830805 | 251.134089 | -1.413463 |
| 75% | 270.637145 | 268.927258 | 1.738673 |
| max | 314.700233 | 316.720357 | 9.759282 |


In [43]:
## We can see from the above data summary that the patients lost about 1.23 kg on average after taking the pills.

H0: The mean weight does not differ before and after the experiment

H1: The mean weight differs before and after the experiment

In [44]:
stats.ttest_rel(a = before,
                b = after)

Ttest_relResult(statistic=2.5720175998568284, pvalue=0.011596444318439857)

In [45]:
## Here, we can see that the p-value (= 0.011596444318439857) is greater than the significance level ( = 0.01)

## Thus, we fail to reject the null hypothesis at the 1% level.

Conclusion - At the 1% significance level there is not enough evidence that the mean weight of patients differs before and after the experiment, so the pill cannot be called effective at that level. At the 5% level the same p-value would lead us to reject the null hypothesis.

# Chi-Squared Goodness-Of-Fit Test

The chi-squared goodness-of-fit test is the categorical analogue of the one-sample t-test. It tests whether the distribution of a sample matches the distribution of its population. Since the test is applied to categorical data, we work with the counts of each class rather than with numeric values.

In [46]:
import scipy.stats as stats

In [48]:
## Creating illustrative demographic data for Australia and Melbourne

Australia = pd.DataFrame(["white"]*100000 + ["hispanic"]*60000 +\
                        ["black"]*50000 + ["asian"]*15000 + ["other"]*35000)
           

Melbourne = pd.DataFrame(["white"]*600 + ["hispanic"]*300 + \
                         ["black"]*250 +["asian"]*75 + ["other"]*150)

Australia_table = pd.crosstab(index=Australia[0], columns="count")
Melbourne_table = pd.crosstab(index=Melbourne[0], columns="count")

print( "Australia")
print(Australia_table)
print(" ")
print( "Melbourne")
print(Melbourne_table)

Australia
col_0      count
0               
asian      15000
black      50000
hispanic   60000
other      35000
white     100000
 
Melbourne
col_0     count
0              
asian        75
black       250
hispanic    300
other       150
white       600


H0: The distribution of sample is same as that of the population

H1: The distribution of sample is not same as that of the population

In [49]:
## calculating chi-square test statistic

observed = Melbourne_table

Australia_ratios = Australia_table/len(Australia)  # Get population ratios

expected = Australia_ratios * len(Melbourne)   # Get expected counts

chi_squared_stat = (((observed-expected)**2)/expected).sum()

print(chi_squared_stat)

col_0
count    18.194805
dtype: float64


In [50]:
## Note: The chi-squared test assumes none of the expected counts are less than 5.
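
This assumption can be checked before running the test. The sketch below recomputes the expected counts from the Australia and Melbourne tables shown above (the counts are copied from those tables; the variable names are only illustrative) and confirms that none of them falls below 5.

```python
import numpy as np

# Category counts copied from the Australia table (asian, black, hispanic, other, white).
population_counts = np.array([15000, 50000, 60000, 35000, 100000])
sample_size = 75 + 250 + 300 + 150 + 600   # total of the Melbourne table

# Expected count per category = population share * sample size.
expected_counts = population_counts / population_counts.sum() * sample_size

print(expected_counts)
print((expected_counts >= 5).all())   # True means the assumption holds
```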

In [51]:
crit = stats.chi2.ppf(q = 0.95, # Critical value for alpha = 0.05 (95% confidence)
                      df = 4)   # df = number of categories - 1

print("Critical value")
print(crit)

p_value = 1 - stats.chi2.cdf(x=chi_squared_stat,  # Find the p-value
                             df=4)
print("P value")
print(p_value)

Critical value
9.487729036781154
P value
[0.00113047]


In [52]:
## Since our chi-square statistic ( = 18.194805) is greater than the critical value ( = 9.487729036781154),
## we reject the null hypothesis and conclude that the sample distribution differs from the population distribution.

In [None]:
## The same inference can be drawn from the p-value.

## The p-value ( = 0.00113047) is less than the significance level ( = 0.05), so we again reject the null hypothesis.

In [53]:
## Direct way of conducting a chi-square goodness of fit test

stats.chisquare(f_obs= observed,   # Array of observed counts
                f_exp= expected)   # Array of expected counts

Power_divergenceResult(statistic=array([18.19480519]), pvalue=array([0.00113047]))

Conclusion - Since the null hypothesis is rejected, we can say that the distribution of demographics in Melbourne is not identical to that of Australia as a whole.

# Chi-Squared Test of Independence

The chi-squared test of independence essentially checks whether two categorical variables are independent of each other.

For example, it can be used to check whether a person's gender is related to the vote they cast. If the test finds no evidence against independence, we cannot say that gender has any bearing on the vote.

In [54]:
np.random.seed(10)

# Sample data randomly at fixed probabilities
voter_race = np.random.choice(a= ["asian","black","hispanic","other","white"],
                              p = [0.05, 0.15 ,0.25, 0.05, 0.5],
                              size=1000)

# Sample data randomly at fixed probabilities
voter_party = np.random.choice(a= ["democrat","independent","republican"],
                              p = [0.4, 0.2, 0.4],
                              size=1000)

voters = pd.DataFrame({"race":voter_race, 
                       "party":voter_party})

voter_tab = pd.crosstab(voters.race, voters.party, margins = True)

voter_tab.columns = ["democrat","independent","republican","row_totals"]

voter_tab.index = ["asian","black","hispanic","other","white","col_totals"]

observed = voter_tab.iloc[0:5,0:3]   # Get table without totals for later use
voter_tab

| | democrat | independent | republican | row_totals |
|---|---|---|---|---|
| asian | 21 | 7 | 32 | 60 |
| black | 65 | 25 | 64 | 154 |
| hispanic | 107 | 50 | 94 | 251 |
| other | 15 | 8 | 15 | 38 |
| white | 189 | 96 | 212 | 497 |
| col_totals | 397 | 186 | 417 | 1000 |


H0:  The party a person votes for is independent of the person's race

H1:  The party a person votes for depends on the person's race

In [55]:
expected =  np.outer(voter_tab["row_totals"][0:5],
                     voter_tab.loc["col_totals"][0:3]) / 1000

expected = pd.DataFrame(expected)

expected.columns = ["democrat","independent","republican"]
expected.index = ["asian","black","hispanic","other","white"]

expected

| | democrat | independent | republican |
|---|---|---|---|
| asian | 23.82 | 11.16 | 25.02 |
| black | 61.138 | 28.644 | 64.218 |
| hispanic | 99.647 | 46.686 | 104.667 |
| other | 15.086 | 7.068 | 15.846 |
| white | 197.309 | 92.442 | 207.249 |


In [56]:
chi_squared_stat = (((observed-expected)**2)/expected).sum().sum()

print(chi_squared_stat)

## We call .sum() twice: once to get the column sums and a second time to add the column sums together, returning the 
## sum of the entire 2D table.

7.169321280162059


In [57]:
crit = stats.chi2.ppf(q = 0.95, # Critical value for alpha = 0.05 (95% confidence)
                      df = 8)   # df = (5 - 1) * (3 - 1) = 8

print("Critical value")
print(crit)

p_value = 1 - stats.chi2.cdf(x=chi_squared_stat,  # Find the p-value
                             df=8)
print("P value")
print(p_value)

## The degrees of freedom for a test of independence equal (number of rows - 1) x (number of columns - 1).
## In this case we have a 5x3 table so df = 4x2 = 8.

Critical value
15.50731305586545
P value
0.518479392948842


In [58]:
## Here, the p-value (= 0.518479392948842) is greater than the significance level (=0.05)

## and

## the chi-square test statistic ( = 7.169321280162059) is less than the critical value (= 15.50731305586545)

## Either result on its own shows that the test fails to reject the null hypothesis.
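
The two decision rules are not independent checks: comparing the statistic with the critical value and comparing the p-value with alpha always lead to the same decision. The sketch below illustrates this with the two chi-square statistics reported in this notebook; the helper name `chi2_decisions` is hypothetical.

```python
import scipy.stats as stats


def chi2_decisions(stat, df, alpha=0.05):
    """Return (reject by critical value, reject by p-value)."""
    crit = stats.chi2.ppf(1 - alpha, df)
    p_value = stats.chi2.sf(stat, df)   # survival function, i.e. 1 - cdf
    return bool(stat > crit), bool(p_value < alpha)


# Statistics taken from the two chi-square tests in this notebook.
print(chi2_decisions(7.169321280162059, df=8))   # test of independence
print(chi2_decisions(18.194805, df=4))           # goodness-of-fit test
```

Using `stats.chi2.sf` instead of `1 - stats.chi2.cdf` gives the same p-value and tends to be more accurate for very small tail probabilities.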

In [59]:
## Direct way of applying the chi-squared test of independence

stats.chi2_contingency(observed= observed)

## The output includes the test statistic, the p-value and the degrees of freedom, followed by the expected counts.

(7.169321280162059,
 0.518479392948842,
 8,
 array([[ 23.82 ,  11.16 ,  25.02 ],
        [ 61.138,  28.644,  64.218],
        [ 99.647,  46.686, 104.667],
        [ 15.086,   7.068,  15.846],
        [197.309,  92.442, 207.249]]))

Conclusion - The relationship between an individual's race and the vote cast is statistically insignificant, i.e. we find no evidence that the two variables are dependent.

# One-Way ANOVA

ANOVA stands for "analysis of variance". It is used to see how a numerical value differs across several categorical groups. Whereas a t-test compares the means of two groups, ANOVA is used to compare more than two groups.

Another fundamental difference between ANOVA and the t-test is that the t-test uses the t-distribution while ANOVA uses the F-distribution.

We do not run a separate t-test for each pair of groups because doing so increases the chance of a false positive somewhere among the tests.
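
A rough calculation shows how quickly the false-positive risk grows. With five groups there are 10 possible pairwise t-tests; if each is run at alpha = 0.05 and the tests are treated as independent (a simplifying assumption, since pairwise tests share data), the chance of at least one false positive is about 40%.

```python
from math import comb

alpha = 0.05
n_groups = 5

n_pairs = comb(n_groups, 2)                   # number of pairwise t-tests
familywise_error = 1 - (1 - alpha) ** n_pairs  # P(at least one false positive)

print(n_pairs, round(familywise_error, 3))    # 10 0.401
```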

Another form of ANOVA is two-way ANOVA. (I am keeping that for some other project.)

In [60]:
import scipy.stats as stats

In [62]:
## Creating a set of voter ages and then grouping them by race.

np.random.seed(12)

races =   ["asian","black","hispanic","other","white"]

# Generate random data
voter_race = np.random.choice(a= races,
                              p = [0.05, 0.15 ,0.25, 0.05, 0.5],
                              size=1000)

voter_age = stats.poisson.rvs(loc=18,
                              mu=30,
                              size=1000)

# Group age data by race
voter_frame = pd.DataFrame({"race":voter_race,"age":voter_age})
groups = voter_frame.groupby("race").groups

# Extract individual groups
asian = voter_age[groups["asian"]]
black = voter_age[groups["black"]]
hispanic = voter_age[groups["hispanic"]]
other = voter_age[groups["other"]]
white = voter_age[groups["white"]]


H0:   The mean age of voters is identical across races

H1:   The mean age of voters is not identical across races

In [67]:
# Perform the ANOVA
stats.f_oneway(asian, black, hispanic, other, white)

F_onewayResult(statistic=1.7744689357329695, pvalue=0.13173183201930463)

In [68]:
## Since the p-value (= 0.13173183201930463) is greater than 0.05 - we fail to reject the null hypothesis.
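
To see where the F-statistic comes from, it can be computed by hand as the ratio of the between-group mean square to the within-group mean square. The sketch below uses three small made-up groups (not the notebook's voter data) and compares the manual result with `stats.f_oneway`.

```python
import numpy as np
import scipy.stats as stats

# Made-up groups, for illustration only.
sample_groups = [
    np.array([30, 35, 32, 38, 31], dtype=float),
    np.array([40, 42, 39, 45, 41, 43], dtype=float),
    np.array([33, 36, 34, 37], dtype=float),
]

all_values = np.concatenate(sample_groups)
grand_mean = all_values.mean()
k = len(sample_groups)   # number of groups
n = all_values.size      # total number of observations

# Variation of the group means around the grand mean.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in sample_groups)
# Variation of the observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in sample_groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))
p_manual = stats.f.sf(f_manual, k - 1, n - k)

result = stats.f_oneway(*sample_groups)
print(f_manual, p_manual)
print(result.statistic, result.pvalue)   # should match the manual values
```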

Conclusion - Since the test fails to reject the null hypothesis, we conclude that there is no statistically significant difference in the mean age of voters across races.