<a href="https://colab.research.google.com/github/piyush1856/hypothesis-testing-scipy/blob/main/t_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from scipy.stats import norm
from scipy.stats import ttest_1samp
from scipy.stats import ttest_ind # 2 sample t-test
from scipy.stats import ttest_rel # paired / dependent t-test

Green gram cultivation is traditionally known to yield 12.0 quintals per hectare on average.

To increase crop yields, scientists have developed a new variety of green gram that can supposedly produce more than the expected average yield of 12 quintals per hectare.

To test this claim, the new variety was grown on 10 randomly selected farmers' fields.

The yield (quintals/hectare) was recorded as: [14.3,12.6,13.7,10.9,13.7,12.0,11.4,12.0,12.6,13.1]

With a 5% significance level, can we conclude that the average yield of this variety of green grams is more than the expected yield (12 quintals/hectare)?

In [5]:
# H0 : C <= 12
# Ha : C > 12

data =  [14.3,12.6,13.7,10.9,13.7,12.0,11.4,12.0,12.6,13.1]
population_mean = 12

t_stat, pvalue = ttest_1samp(data, population_mean, alternative = 'greater')
print(t_stat, pvalue)

alpha = 0.05

if pvalue < alpha:
  print('Reject H0')
else:
  print ('Fail to Reject H0')

1.8356436949305335 0.04979938002326665
Reject H0
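
The statistic printed above can be reproduced by hand from the one-sample formula t = (sample mean - hypothesized mean) / (s / sqrt(n)). The following is a minimal sketch using only the standard library; every name other than `data` and `population_mean` is illustrative and not part of the original notebook.

```python
import math
import statistics

data = [14.3, 12.6, 13.7, 10.9, 13.7, 12.0, 11.4, 12.0, 12.6, 13.1]
population_mean = 12

n = len(data)
sample_mean = statistics.mean(data)
sample_std = statistics.stdev(data)          # sample standard deviation (n - 1 denominator)
standard_error = sample_std / math.sqrt(n)   # estimated standard error of the mean

# One-sample t statistic
t_manual = (sample_mean - population_mean) / standard_error
print(t_manual)
```

The reported p-value (about 0.0498) sits only just below the 5% threshold, so the rejection here is a borderline one.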


Body fat percentages of a sample of gym-going men and women are recorded.

men = [13.3, 6.0, 20.0, 8.0, 14.0, 19.0, 18.0, 25.0, 16.0, 24.0, 15.0, 1.0, 15.0]

women = [22.0, 16.0, 21.7, 21.0, 30.0, 26.0, 12.0, 23.2, 28.0, 23.0]

Perform an appropriate test to check if the mean body fat percentage of men and women is statistically different.

Assume the significance level to be 5%.

In [7]:
# H0 : M = W
# Ha : M != W

men = [13.3, 6.0, 20.0, 8.0, 14.0, 19.0, 18.0, 25.0, 16.0, 24.0, 15.0, 1.0, 15.0]
women = [22.0, 16.0, 21.7, 21.0, 30.0, 26.0, 12.0, 23.2, 28.0, 23.0]

t_stat, pvalue = ttest_ind(men, women)
print(t_stat, pvalue)

alpha = 0.05

if pvalue < alpha:
  print('Reject H0')
else:
  print ('Fail to Reject H0')

-2.799960442832919 0.010730607904197957
Reject H0
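
By default `ttest_ind` runs Student's t-test with a pooled variance estimate (`equal_var=True`), which assumes both groups share the same population variance. The problem statement says nothing about the variances, so one reasonable sanity check, sketched here as an assumption rather than as part of the original solution, is to also run Welch's t-test and see whether the conclusion changes.

```python
from scipy.stats import ttest_ind

men = [13.3, 6.0, 20.0, 8.0, 14.0, 19.0, 18.0, 25.0, 16.0, 24.0, 15.0, 1.0, 15.0]
women = [22.0, 16.0, 21.7, 21.0, 30.0, 26.0, 12.0, 23.2, 28.0, 23.0]

# Student's t-test: pooled variance, assumes equal population variances
t_pooled, p_pooled = ttest_ind(men, women)

# Welch's t-test: does not assume equal population variances
t_welch, p_welch = ttest_ind(men, women, equal_var=False)

print("pooled:", t_pooled, p_pooled)
print("welch: ", t_welch, p_welch)
```

If both variants lead to the same decision at the chosen alpha, the result does not hinge on the equal-variance assumption.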


The quality assurance department claims that, on average, non-fat milk contains more than 190 mg of calcium per 500 ml packet.

To check this claim, 45 packets of milk are collected and the calcium content of each is recorded.

Perform an appropriate test to check the claim with a 90% confidence level.

data = [193, 321, 222, 158, 176, 149, 154, 223, 233, 177, 280, 244, 138, 210, 167, 129, 254, 167, 194, 191, 128, 191, 144, 184, 330, 216, 212, 142, 216, 197, 231, 133, 205, 192, 195, 243, 224, 137, 234, 171, 176, 249, 222, 234, 191]

In [9]:
# H0 : NFM <= 190
# Ha : NFM > 190

population_mean = 190
data = [193, 321, 222, 158, 176, 149, 154, 223, 233, 177, 280, 244, 138, 210, 167, 129, 254, 167, 194, 191, 128, 191, 144, 184, 330, 216, 212, 142, 216, 197, 231, 133, 205, 192, 195, 243, 224, 137, 234, 171, 176, 249, 222, 234, 191]

t_stat, pvalue = ttest_1samp(data, population_mean, alternative = 'greater')
print(t_stat, pvalue)

alpha = 0.1

if pvalue < alpha:
  print('Reject H0')
else:
  print ('Fail to Reject H0')

1.3689029903414232 0.08898891556150607
Reject H0
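
A 90% confidence level corresponds to alpha = 1 - 0.90 = 0.1. The same decision can be reached by comparing the t statistic with the upper-tail critical value of the t distribution instead of comparing the p-value with alpha. The sketch below uses `scipy.stats.t.ppf` for the critical value; the names `df`, `t_critical`, `reject_by_critical_value`, and `reject_by_pvalue` are illustrative.

```python
from scipy.stats import t, ttest_1samp

population_mean = 190
data = [193, 321, 222, 158, 176, 149, 154, 223, 233, 177, 280, 244, 138, 210,
        167, 129, 254, 167, 194, 191, 128, 191, 144, 184, 330, 216, 212, 142,
        216, 197, 231, 133, 205, 192, 195, 243, 224, 137, 234, 171, 176, 249,
        222, 234, 191]

alpha = 0.1            # 90% confidence level
df = len(data) - 1     # degrees of freedom for a one-sample t-test

t_stat, pvalue = ttest_1samp(data, population_mean, alternative='greater')
t_critical = t.ppf(1 - alpha, df)   # upper-tail critical value

# For a right-tailed test, the two decision rules are equivalent
reject_by_critical_value = t_stat > t_critical
reject_by_pvalue = pvalue < alpha
print(t_stat, t_critical)
print(reject_by_critical_value, reject_by_pvalue)
```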


There are 8 females and 12 males in a coaching class.

After a practice test, the coach wants to know whether the average score of females is greater than the average score of males.

The data below give the scores of the females and males in the class.

female_scores=[25,30,45,49,47,35,32,42]

male_scores=[45,47,25,22,29,32,27,28,40,49,50,33]

Use an appropriate test to check whether the coach's assumption is supported at a 2% significance level.

In [13]:
# H0 : M = F
# Ha : F > M
female_scores = [25,30,45,49,47,35,32,42]

male_scores = [45,47,25,22,29,32,27,28,40,49,50,33]

t_stat, pvalue = ttest_ind(female_scores, male_scores, alternative = 'greater')
print(t_stat, pvalue)

alpha = 0.02

if pvalue < alpha:
  print('Reject H0')
else:
  print ('Fail to Reject H0')

0.5795450171026676 0.2847023809445894
Fail to Reject H0
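
For reference, the two-sample statistic above can be recomputed from the pooled-variance formula that `ttest_ind` uses by default. This is a minimal standard-library sketch; names such as `pooled_var` and `t_manual` are illustrative and not from the original notebook.

```python
import math
import statistics

female_scores = [25, 30, 45, 49, 47, 35, 32, 42]
male_scores = [45, 47, 25, 22, 29, 32, 27, 28, 40, 49, 50, 33]

n_f, n_m = len(female_scores), len(male_scores)
var_f = statistics.variance(female_scores)   # sample variance (n - 1 denominator)
var_m = statistics.variance(male_scores)

# Pooled variance: the two sample variances weighted by their degrees of freedom
pooled_var = ((n_f - 1) * var_f + (n_m - 1) * var_m) / (n_f + n_m - 2)
standard_error = math.sqrt(pooled_var * (1 / n_f + 1 / n_m))

mean_difference = statistics.mean(female_scores) - statistics.mean(male_scores)
t_manual = mean_difference / standard_error
print(t_manual)
```

The order of the arguments matters for a one-sided test: with `alternative = 'greater'`, the first sample (`female_scores`) is the one hypothesized to have the larger mean.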


An experiment was performed to compare the effects of ammonium chloride and urea on grain yield (in quintals per hectare). The results are given in the arrays below:

Ammonium_chloride = [13.4, 10.9, 11.2, 11.8, 14, 15.3, 14.2, 12.6, 17, 16.2, 16.5, 15.7]

Urea = [12, 11.7, 10.7, 11.2, 14.8, 14.4, 13.9, 13.7, 16.9, 16, 15.6, 16]

Conduct an appropriate test to compare the two at a 95% confidence level and choose the appropriate option below.

In [15]:
# H0 : A = U
# Ha : A != U

Ammonium_chloride = [13.4, 10.9, 11.2, 11.8, 14, 15.3, 14.2, 12.6, 17, 16.2, 16.5, 15.7]
Urea = [12, 11.7, 10.7, 11.2, 14.8, 14.4, 13.9, 13.7, 16.9, 16, 15.6, 16]

t_stat, pvalue = ttest_ind(Ammonium_chloride, Urea)
print(t_stat, pvalue)

alpha = 0.05

if pvalue < alpha:
  print('Reject H0')
else:
  print ('Fail to Reject H0')

0.1846496543760765 0.8551954147800473
Fail to Reject H0


A Zumba trainer claims to customers that their new dance routine helps them lose more weight.

The weights of 8 people were recorded before and after following the new Zumba routine for a month:

wt_before = [85, 74, 63.5, 69.4, 71.6, 65,90,78]

wt_after = [82, 71, 64, 65.2, 67.8, 64.7,95,77]

Test the trainer's claim with 90% confidence. What is the p-value?

In [19]:
# H0 : before = after
# Ha : before > after

wt_before = [85, 74, 63.5, 69.4, 71.6, 65,90,78]

wt_after = [82, 71, 64, 65.2, 67.8, 64.7,95,77]

statistic, pvalue = ttest_rel(wt_before,wt_after, alternative = 'greater')

print("Test statistic:",statistic)
print("pvalue:",pvalue)

alpha = 0.1

if pvalue < alpha:
  print('Reject H0')
else:
  print ('Fail to Reject H0')

Test statistic: 1.1421853793555032
pvalue: 0.14546808501326386
Fail to Reject H0
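
A paired t-test is equivalent to a one-sample t-test on the per-person differences, tested against a mean of 0. The sketch below checks that equivalence on the same data; `differences`, `stat_diff`, and `p_diff` are illustrative names.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_rel

wt_before = [85, 74, 63.5, 69.4, 71.6, 65, 90, 78]
wt_after = [82, 71, 64, 65.2, 67.8, 64.7, 95, 77]

# Positive values mean the person weighed less after the routine
differences = np.array(wt_before) - np.array(wt_after)

stat_rel, p_rel = ttest_rel(wt_before, wt_after, alternative='greater')
stat_diff, p_diff = ttest_1samp(differences, 0, alternative='greater')

print(stat_rel, p_rel)
print(stat_diff, p_diff)
```

Both calls should print the same statistic and p-value, which is why the measurements must be matched person by person and listed in the same order.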


You are appointed as a Data Analyst for a training program deployed by the Government of India.

The participants’ skills were tested before and after the training using some metrics on a scale of 10.

before = [2.45, 0.69, 1.80, 2.80, 0.07, 1.67, 2.93, 0.47, 1.45, 1.34]   

after = [7.71, 2.17, 5.65, 8.79, 0.23, 5.23, 9.19, 1.49, 4.56, 4.20]

Conduct an appropriate test to assess a statistically significant increase in the average skill score after the training program, and then answer the below questions accordingly.

Note: Perform the test at alpha = 5%.

In [21]:
# H0 : before = after
# Ha : before < after

before = [2.45, 0.69, 1.80, 2.80, 0.07, 1.67, 2.93, 0.47, 1.45, 1.34]
after = [7.71, 2.17, 5.65, 8.79, 0.23, 5.23, 9.19, 1.49, 4.56, 4.20]

statistic, pvalue = ttest_rel(before,after, alternative = 'less')

print("Test statistic:",statistic)
print("pvalue:",pvalue)

alpha = 0.05

if pvalue < alpha:
  print('Reject H0')
else:
  print ('Fail to Reject H0')

Test statistic: -5.111096450191605
pvalue: 0.00031778119819482275
Reject H0
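
The problem statement asks for alpha = 5%, that is, a significance level of 0.05, which is the same threshold as a 95% confidence level. Because the two quantities are easy to mix up, the conversion can be made explicit. The helpers below (`alpha_from_confidence` and `decide`) are hypothetical names introduced only for this sketch; they are not part of the original notebook or of SciPy.

```python
def alpha_from_confidence(confidence):
    """Convert a confidence level (e.g. 0.95) to a significance level (e.g. 0.05)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1 - confidence


def decide(pvalue, alpha):
    """Apply the decision rule used throughout this notebook."""
    return 'Reject H0' if pvalue < alpha else 'Fail to Reject H0'


pvalue = 0.00031778119819482275   # p-value printed by the cell above
alpha = 0.05                      # "alpha = 5%" from the problem statement

print(decide(pvalue, alpha))
# The same threshold expressed as a 95% confidence level
print(decide(pvalue, alpha_from_confidence(0.95)))
```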


The average British man is 175.3 cm tall. A survey recorded the heights of 10 UK men and we want to know whether the mean of the sample is different from the population mean.

survey_height = [177.3, 182.7, 169.6, 176.3, 180.3, 179.4, 178.5, 177.2, 181.8, 176.5]

Perform an appropriate test at a 5% significance level and choose the correct conclusion below.

In [23]:
# H0 : M = 175.3
# Ha : M != 175.3

population_mean = 175.3
survey_height = [177.3, 182.7, 169.6, 176.3, 180.3, 179.4, 178.5, 177.2, 181.8, 176.5]

t_stat, pvalue = ttest_1samp(survey_height, population_mean)
print(t_stat, pvalue)

alpha = 0.05

if pvalue < alpha:
  print('Reject H0')
else:
  print('Fail to Reject H0')

2.295568968083183 0.04734137339747034
Reject H0
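
A two-sided test at the 5% level rejects H0 exactly when the hypothesized mean falls outside the two-sided 95% confidence interval for the mean. The sketch below builds that interval from the t distribution; `ci_low`, `ci_high`, and the other intermediate names are illustrative.

```python
import numpy as np
from scipy.stats import t

population_mean = 175.3
survey_height = [177.3, 182.7, 169.6, 176.3, 180.3, 179.4, 178.5, 177.2, 181.8, 176.5]

n = len(survey_height)
sample_mean = np.mean(survey_height)
standard_error = np.std(survey_height, ddof=1) / np.sqrt(n)   # ddof=1 gives the sample std

# Two-sided 95% interval: sample_mean +/- t_{0.975, n-1} * standard_error
t_critical = t.ppf(0.975, n - 1)
ci_low = sample_mean - t_critical * standard_error
ci_high = sample_mean + t_critical * standard_error

print(ci_low, ci_high)
print("population mean inside interval:", ci_low <= population_mean <= ci_high)
```

The interval also shows how far the plausible range for the mean sits from 175.3, which a p-value alone does not convey.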


Samples of IQ scores are collected from two competing schools, as follows:

school_1 = [115, 111, 112, 101, 95, 98, 100, 90, 89, 108]

school_2 = [107, 103, 91, 99, 104, 98, 117, 113, 92, 96, 108, 115, 116, 88]

Perform an appropriate test with a 5% significance level to check if there is any statistically significant difference in the mean IQs of these schools.

In [25]:
# H0 : S1 = S2
# Ha : S1 != S2

school_1 = [115, 111, 112, 101, 95, 98, 100, 90, 89, 108]
school_2 = [107, 103, 91, 99, 104, 98, 117, 113, 92, 96, 108, 115, 116, 88]

t_stat, pvalue = ttest_ind(school_1, school_2)
print(t_stat, pvalue)

alpha = 0.05

if pvalue < alpha:
  print('Reject H0')
else:
  print ('Fail to Reject H0')

-0.36928503344341035 0.7154458095186707
Fail to Reject H0
