# Hypothesis Testing

A hypothesis test tells us whether an observed difference, for example between two data sets, is statistically significant, or whether it could plausibly be explained by random variation alone.

# Overview

This module covers:

1) One-sample and two-sample t-tests

2) ANOVA

3) Type I and Type II errors

4) Chi-Squared Tests



Let's consider a data science student who is trying to decide between two GPUs. He wants to use the GPU to run deep learning algorithms for his research, so the only thing he cares about is speed.

He picks a deep learning algorithm and a large data set and runs it on both GPUs 15 times, timing each run in hours. The results are given in the arrays GPU1 and GPU2 below.

In [None]:
from scipy import stats 
import numpy as np

In [None]:
GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU2 = np.array([11,13,10,13,12,9,11,12,12,11,12,12,10,11,13])

**Assumption:** GPU1 and GPU2 are independent random samples drawn from normally distributed populations.

Note: the t-test functions (`ttest_1samp`, `ttest_ind`, `ttest_rel`) can be imported from `scipy.stats`.

# T-tests

**One-sample t-test**

* Check whether the mean of GPU1 is equal to zero.

1. Null hypothesis: the mean is equal to zero.
2. Alternative hypothesis: the mean is not equal to zero.
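The following is a minimal sketch of what `ttest_1samp` computes. It uses the one-sample t statistic t = (mean - mu0) / (s / sqrt(n)), where s is the sample standard deviation, and compares it against a t distribution with n - 1 degrees of freedom. The result is checked against SciPy. The helper name `one_sample_t` is hypothetical and is not part of any library.

```python
import numpy as np
from scipy import stats

# GPU1 run times (hours), copied from the notebook
GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])


def one_sample_t(x, mu0):
    """Return (t, two-sided p-value) for H0: population mean == mu0."""
    x = np.asarray(x, dtype=float)
    n = x.size
    se = x.std(ddof=1) / np.sqrt(n)        # standard error of the mean (sample std, ddof=1)
    t = (x.mean() - mu0) / se
    p = 2 * stats.t.sf(abs(t), df=n - 1)   # area in both tails of a t distribution with n-1 df
    return t, p


t_manual, p_manual = one_sample_t(GPU1, 0)
t_scipy, p_scipy = stats.ttest_1samp(GPU1, 0)
print(t_manual, p_manual)
print(t_scipy, p_scipy)
```

The manual and SciPy values should match. This shows that the test is simply the distance of the sample mean from mu0, measured in standard errors.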

In [None]:
from scipy.stats import ttest_1samp
from statsmodels.stats.power import ttest_power

t_stats, p_values = ttest_1samp(GPU1,0)
print(t_stats, p_values)

The p-value is far below alpha = 0.05, so we reject the null hypothesis: the mean run time of GPU1 is not equal to zero.

In [None]:
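# Let's also calculate the power of the test.
# Effect size (Cohen's d) for a one-sample test: (sample mean - hypothesized mean) / sample standard deviation
from statsmodels.stats.power import ttest_power
effect_size_1 = (np.mean(GPU1) - 0) / np.std(GPU1, ddof=1)
effect_size_1

In [None]:
print(ttest_power(effect_size_1, nobs=15, alpha=0.05, alternative="two-sided"))

The power is essentially 1.0 (100%). With an effect size this large (about 8.8), a one-sample t-test on 15 runs almost certainly detects that the mean differs from zero.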
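Power can also be computed directly from the noncentral t distribution. The sketch below does this using only SciPy. It assumes a two-sided one-sample test with noncentrality effect_size * sqrt(n) and n - 1 degrees of freedom, which is expected to agree closely with `ttest_power`. The function name `one_sample_t_power` is hypothetical.

```python
import numpy as np
from scipy import stats


def one_sample_t_power(effect_size, nobs, alpha=0.05):
    """Two-sided power of a one-sample t-test, from the noncentral t distribution."""
    df = nobs - 1
    nc = effect_size * np.sqrt(nobs)        # noncentrality parameter under Ha
    t_crit = stats.t.isf(alpha / 2, df)     # two-sided critical value under H0
    # Power = P(|T| > t_crit) when T follows a noncentral t with this nc
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)


GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
d = GPU1.mean() / GPU1.std(ddof=1)          # effect size against mu0 = 0
print(d, one_sample_t_power(d, nobs=15))
print(one_sample_t_power(0.5, nobs=15))     # a much smaller effect gives much lower power
```

When the effect size is 0, the "power" reduces to alpha, because then every rejection is a Type I error.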

**Given:**

* Null hypothesis: the two GPUs have the same mean run time (mu1 = mu2).
* Alternative hypothesis: the mean run times differ (mu1 != mu2).
* Perform a two-sample t-test and decide whether to reject the null hypothesis.
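`ttest_ind` with its default `equal_var=True` is Student's two-sample t-test with a pooled variance. The following minimal sketch reproduces it by hand so that the pooling step is visible. The helper `pooled_t_test` is a hypothetical name used only for illustration.

```python
import numpy as np
from scipy import stats

# GPU run times (hours), copied from the notebook
GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU2 = np.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11, 12, 12, 10, 11, 13])


def pooled_t_test(x, y):
    """Student's two-sample t-test with pooled variance; returns (t, two-sided p)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = x.size, y.size
    df = nx + ny - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / df
    se = np.sqrt(sp2 * (1 / nx + 1 / ny))   # standard error of the difference in means
    t = (x.mean() - y.mean()) / se
    p = 2 * stats.t.sf(abs(t), df)
    return t, p


t_manual, p_manual = pooled_t_test(GPU1, GPU2)
print(t_manual, p_manual)
print(stats.ttest_ind(GPU1, GPU2))
```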

In [None]:
from scipy.stats import ttest_ind
t_stats,p_values = ttest_ind(GPU1,GPU2)
print(t_stats,p_values)

The p-value is below 0.05, so we reject the null hypothesis: there is a significant difference between the mean run times of GPU1 and GPU2.

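In [None]:
# Cohen's d for two independent samples: difference in means divided by the pooled
# sample standard deviation. Note the parentheses around (n1 + n2 - 2) and ddof=1.
n1, n2 = len(GPU1), len(GPU2)
pooled_sd_12 = np.sqrt(((n1 - 1) * np.var(GPU1, ddof=1) + (n2 - 1) * np.var(GPU2, ddof=1)) / (n1 + n2 - 2))
effect_size_12 = (np.mean(GPU1) - np.mean(GPU2)) / pooled_sd_12
effect_size_12

In [None]:
# ttest_power is for one-sample (or paired) tests; TTestIndPower handles two independent samples
from statsmodels.stats.power import TTestIndPower
print(TTestIndPower().power(effect_size=abs(effect_size_12), nobs1=n1, alpha=0.05, ratio=n2 / n1, alternative="two-sided"))

The effect size is about -0.96 and the power is roughly 0.7 (about 70%). With 15 runs per GPU, a true difference of this size would be detected about 70% of the time.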

He then tries a third GPU, GPU3.

In [None]:
GPU3 = np.array([9,10,9,11,10,13,12,9,12,12,13,12,13,10,11])

**Assumption:** GPU1 and GPU3 are independent random samples drawn from normally distributed populations.

Perform a two-sample t-test and check whether there is a significant difference between the speeds of GPU1 and GPU3.

In [None]:
from scipy.stats import ttest_ind
t_stats,p_values = ttest_ind(GPU1,GPU3)
print(t_stats,p_values)

The p-value is above 0.05, so we do not reject the null hypothesis: there is no significant difference between the mean run times of GPU1 and GPU3.

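In [None]:
# Cohen's d for two independent samples, using the pooled sample standard deviation.
# Note the parentheses around (n1 + n3 - 2) and the sample variances (ddof=1).
n1, n3 = len(GPU1), len(GPU3)
pooled_sd_13 = np.sqrt(((n1 - 1) * np.var(GPU1, ddof=1) + (n3 - 1) * np.var(GPU3, ddof=1)) / (n1 + n3 - 2))
effect_size_13 = (np.mean(GPU1) - np.mean(GPU3)) / pooled_sd_13
effect_size_13

In [None]:
# TTestIndPower computes power for two independent samples (ttest_power is for one-sample tests)
from statsmodels.stats.power import TTestIndPower
print(TTestIndPower().power(effect_size=abs(effect_size_13), nobs1=n1, alpha=0.05, ratio=n3 / n1, alternative="two-sided"))

The effect size is about -0.55 and the power is roughly 0.3 (about 30%). With only 15 runs per GPU, the test had a low chance of detecting a difference of this size. Failing to reject the null hypothesis is therefore weak evidence that the two GPUs are equally fast.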

# ANOVA

If you need to compare the means of more than two data sets at a time, a one-way ANOVA is the usual choice. Running several pairwise t-tests instead would inflate the overall Type I error rate.

The results from three experiments with overlapping 95% confidence intervals are given below. We want to check whether the mean results of the three experiments differ significantly.

Before conducting the ANOVA, check whether the assumption of equal variances holds, using Levene's test. If it does not hold, we cannot rely on the result of the ANOVA.
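Levene's test can be understood as an ANOVA on absolute deviations from a group center. SciPy's `stats.levene` uses the median as the center by default (`center='median'`). The following minimal sketch reproduces it with `f_oneway` under that assumption.

```python
import numpy as np
from scipy import stats

# Experiment results, copied from the notebook
e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])

# Step 1: absolute deviation of each value from its own group's median
deviations = [np.abs(g - np.median(g)) for g in (e1, e2, e3)]

# Step 2: a one-way ANOVA on those deviations gives Levene's W statistic and p-value
w_manual, p_manual = stats.f_oneway(*deviations)
print(w_manual, p_manual)
print(stats.levene(e1, e2, e3))
```

If one group's values sit much farther from their median than the others, the mean deviations differ between groups and the ANOVA on the deviations flags unequal variances.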

In [None]:
import numpy as np

e1 = np.array([1.595440,1.419730,0.000000,0.000000])
e2 = np.array([1.433800,2.079700,0.892139,2.384740])
e3 = np.array([0.036930,0.938018,0.995956,1.006970])

**Assumption:** e1, e2 and e3 are independent random samples drawn from normally distributed populations.

**Perform Levene's test on the data**

Levene's test tests the null hypothesis that all input samples come from populations with equal variances. It is an alternative to Bartlett's test (`stats.bartlett`) that is more robust when the samples deviate significantly from normality.

In [None]:
# Levene's test: H0 = all groups have equal variances (it does not test normality)
stats.levene(e1, e2, e3)

The p-value is above 0.05, so we do not reject H0. There is no evidence that the variances differ, so the equal-variance assumption of ANOVA is reasonable and we can rely on the ANOVA result.

The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.
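The one-way ANOVA F statistic is the ratio of the between-group mean square to the within-group mean square. The following minimal sketch computes it from its definition and compares it with `stats.f_oneway`. The helper name `one_way_anova` is hypothetical.

```python
import numpy as np
from scipy import stats

# Experiment results, copied from the notebook
e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])


def one_way_anova(*groups):
    """Return (F, p) for H0: all groups share the same population mean."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                              # number of groups
    n = sum(g.size for g in groups)              # total number of observations
    grand_mean = np.concatenate(groups).mean()
    # Between-group sum of squares: spread of the group means around the grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    p = stats.f.sf(f, k - 1, n - k)              # upper tail of F(k-1, n-k)
    return f, p


f_manual, p_manual = one_way_anova(e1, e2, e3)
print(f_manual, p_manual)
print(stats.f_oneway(e1, e2, e3))
```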

Use the `stats.f_oneway()` function to perform the one-way ANOVA test.

In [None]:
#e1.shape
#e2.shape
#e3.shape
stats.f_oneway(e1,e2,e3)

The p-value is above 0.05, so we do not reject the null hypothesis: there is no significant difference between the mean results of the three experiments.

In one or two sentences each, explain Type I and Type II errors.

# Type I error (alpha)

A Type I error is rejecting the null hypothesis when it is actually true (a "false positive" finding). Alpha, the probability of a Type I error, is the significance level of the test. 1 - alpha is the probability of correctly not rejecting a true null hypothesis, which is the confidence level of the test.

# Type II error (beta)

A Type II error is failing to reject the null hypothesis when it is actually false (a "false negative" finding). Beta is the probability of a Type II error, and 1 - beta is the power of the test: the probability that it detects a real effect.
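Both error rates can be estimated by simulation. The following sketch is based on arbitrary assumptions: standard-normal noise, n = 15, an effect size of 1.0, 2000 simulated experiments and seed 0. It repeatedly runs a one-sample t-test of H0: mean = 0. When H0 is true, the rejection rate estimates alpha (the Type I error rate). When H0 is false, the rejection rate estimates the power, and 1 minus that estimates beta.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, n_sims = 0.05, 15, 2000


def rejection_rate(true_mean):
    """Fraction of simulated one-sample t-tests (H0: mean = 0) that reject H0."""
    samples = rng.normal(loc=true_mean, scale=1.0, size=(n_sims, n))
    p = stats.ttest_1samp(samples, 0.0, axis=1).pvalue
    return np.mean(p < alpha)


type_i_rate = rejection_rate(0.0)   # H0 true: every rejection is a Type I error
power = rejection_rate(1.0)         # H0 false (effect size 1): rejections are correct
type_ii_rate = 1 - power            # beta: the share of real effects that were missed
print(type_i_rate, power, type_ii_rate)
```

The Type I rate should land near 0.05. Raising n or the true effect size pushes the power up and beta down.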



**Suppose you are the manager of a restaurant. You want to determine whether the mean waiting time to place an order has changed in the past month from its previous population mean value of 4.5 minutes. State the null and alternative hypotheses.**

# Null Hypothesis: H0: mu = 4.5

The mean waiting time has not changed in the past month. It is still equal to 4.5 minutes.

# Alternative Hypothesis: Ha: mu != 4.5

The mean waiting time to place an order has changed in the past month. It is no longer equal to 4.5 minutes.
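These hypotheses map directly onto a two-sided one-sample t-test. The sketch below assumes the population standard deviation is unknown, so a t-test rather than a z-test is used. The waiting times are **hypothetical** values invented for illustration; the source provides no data for this exercise.

```python
import numpy as np
from scipy import stats

# HYPOTHETICAL waiting times (minutes) for orders placed in the past month;
# invented for illustration, not taken from the source.
waiting_times = np.array([4.2, 5.1, 4.8, 3.9, 5.5, 4.7, 5.0, 4.4, 5.3, 4.9])

alpha = 0.05
# H0: mu = 4.5   vs   Ha: mu != 4.5 (two-sided)
t_stat, p_value = stats.ttest_1samp(waiting_times, popmean=4.5)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(t_stat, p_value, decision)
```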

# Chi square test

Let's create a small contingency table of dice-roll counts for four players. Each of the six rows (d1 to d6) holds one count per player.

In [None]:
import numpy as np

d1 = [5, 8, 3, 8]
d2 = [9, 6, 8, 5]
d3 = [8, 12, 7, 2]
d4 = [4, 16, 7, 3]
d5 = [3, 9, 6, 5]
d6 = [7, 2, 5, 7]

df = np.array([d1, d2, d3, d4, d5, d6])

Run the test using the SciPy stats library.

The significance threshold (alpha) is usually set at either 0.05 or 0.01. The result is significant (i.e. we reject the null hypothesis) if the p-value is below the threshold.

For our purposes, we’ll use 0.01 as the threshold.

Use the `stats.chi2_contingency()` function.

This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table. The null hypothesis is that the row and column variables are independent.
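The following minimal sketch shows what `chi2_contingency` computes when `correction=False`. It takes the expected counts under independence, E_ij = (row total i) * (column total j) / grand total, and computes the Pearson statistic sum((O - E)^2 / E) with (rows - 1) * (columns - 1) degrees of freedom.

```python
import numpy as np
from scipy import stats

# Dice-roll counts from the notebook (rows d1..d6, one column per player)
observed = np.array([
    [5, 8, 3, 8],
    [9, 6, 8, 5],
    [8, 12, 7, 2],
    [4, 16, 7, 3],
    [3, 9, 6, 5],
    [7, 2, 5, 7],
])

# Expected count under independence: E_ij = R_i * C_j / N
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed.sum()

# Pearson chi-square statistic, degrees of freedom and upper-tail p-value
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = stats.chi2.sf(chi2_manual, dof_manual)
print(chi2_manual, p_manual, dof_manual)

chi2_s, p_s, dof_s, expected_s = stats.chi2_contingency(observed, correction=False)
```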

Print the following:

* chi2 statistic
* p-value
* degrees of freedom
* expected frequencies

In [None]:
df.dtype

In [None]:
# chi2_contingency needs a table of numeric counts, so keep the integer values
# (a categorical dtype is not needed); pandas is used only for a labeled display
import pandas as pd
df = pd.DataFrame([d1, d2, d3, d4, d5, d6], index=["d1", "d2", "d3", "d4", "d5", "d6"])
df

In [None]:
# Returns the chi2 statistic, p-value, degrees of freedom and expected frequencies
chi2, p, dof, expected = stats.chi2_contingency(df, correction=False)
print("chi2 statistic:", chi2)
print("p-value:", p)
print("degrees of freedom:", dof)
print()
print("expected frequencies:\n", expected)

The p-value is above 0.01, so we do not reject the null hypothesis of independence.

# Z-test

Compute z-scores for the dice data above with `stats.zscore` from SciPy. Convert each z-score to an upper-tail p-value, then take the mean of the resulting array.
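By default, `stats.zscore` standardizes each column (`axis=0`) using the population standard deviation (`ddof=0`). The following minimal sketch reproduces that step by hand. It then converts each z-score to an upper-tail probability P(Z > z) with `stats.norm.sf`, which equals 1 - Phi(z).

```python
import numpy as np
from scipy import stats

# Dice-roll counts from the notebook (rows d1..d6, one column per player)
table = np.array([
    [5, 8, 3, 8],
    [9, 6, 8, 5],
    [8, 12, 7, 2],
    [4, 16, 7, 3],
    [3, 9, 6, 5],
    [7, 2, 5, 7],
], dtype=float)

# z = (x - column mean) / column std, with ddof=0, column by column
z_manual = (table - table.mean(axis=0)) / table.std(axis=0)
# Upper-tail p-value for each z-score: P(Z > z) = 1 - Phi(z)
upper_tail_p = stats.norm.sf(z_manual)
print(np.round(z_manual, 3))
print(upper_tail_p.mean())
```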



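In [None]:
# Standardize each column of the count table: z = (x - column mean) / column std (ddof=0)
from scipy import stats, special
z_scores = stats.zscore(np.asarray(df, dtype=float))
z_scores

In [None]:
# Upper-tail p-value for each z-score: P(Z > z) = 1 - Phi(z)
p_values = 1 - special.ndtr(z_scores)
p_values

In [None]:
# Mean of all the p-values in the table
p_values.mean()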


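The mean p-value is above 0.05, so under this exercise's rule we do not reject the null hypothesis. Note that averaging per-cell upper-tail probabilities is a descriptive summary, not a standard Z-test of a stated null hypothesis.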

A paired-sample t-test compares the means of the same group at two different times.

The basic two-sample t-test is designed for testing differences between independent groups. In some cases, you might be interested in testing differences between samples of the same group at different points in time. We can conduct a paired t-test using the SciPy function `stats.ttest_rel()`.
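A paired t-test is equivalent to a one-sample t-test of the per-subject differences against 0. The following sketch checks this equivalence with simulated data shaped like the notebook's example. The `random_state` seeds are an added assumption so the result is reproducible.

```python
import numpy as np
from scipy import stats

# Simulated data in the same spirit as the notebook; random_state fixed for reproducibility
before = stats.norm.rvs(scale=30, loc=100, size=500, random_state=1)
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500, random_state=2)

paired = stats.ttest_rel(before, after)
# Equivalent formulation: one-sample t-test of the differences against 0
on_differences = stats.ttest_1samp(before - after, 0.0)
print(paired)
print(on_differences)
```

Pairing removes the large between-patient spread (std 30) from the comparison. Only the spread of the changes (std 5) enters the standard error, which is why a small average change can still be detected.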

In [None]:
before = stats.norm.rvs(scale=30, loc=100, size=500)  # 500 draws from a normal distribution with mean 100 and std 30
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500)  # same patients after treatment: average change of -1.25

Test whether a weight-loss drug works by comparing the weights of the same group of patients before and after treatment, using the data above.

In [None]:
#before.shape
#after.shape
stats.ttest_rel(before,after)

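H0: mu_before = mu_after and Ha: mu_before != mu_after. The p-value is below 0.05, so we reject the null hypothesis: the mean weight changed after treatment. The data are randomly generated, so the exact p-value varies between runs.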

Reference : https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html
