# **In-Class Lab: T-Test (One-Sample & Two-Sample)**

In [209]:
import math
import numpy as np
import scipy.stats as stats

## Exercise 1: One-Sample T-Test - Mean Weight of a Population
### Null Hypothesis H0: μ = 70
### Alternative Hypothesis Ha: μ ≠ 70
### Significance level: 0.05

In [534]:
sample = [72, 68, 75, 71, 69, 70, 73, 68]
sample_size = len(sample)
sample_mean = np.mean(sample)
# Sample standard deviation (ddof=1); np.std defaults to the population formula (ddof=0)
sample_std = np.std(sample, ddof=1)
pop_mean = 70

# Calculate the t-statistic
t = (sample_mean - pop_mean) / (sample_std / math.sqrt(sample_size))

# Degrees of freedom
df = sample_size - 1

# Two-sided p-value from the t-distribution
p_value = 2 * stats.t.sf(abs(t), df)

print(f"t: {t:.4f}")
print(f"df: {df}")
print(f"p_value: {p_value:.4g}")

t: 0.8510
df: 7
p_value: 0.4229


In [536]:
# Calculate using package
res = stats.ttest_1samp(sample, pop_mean, alternative = "two-sided")
print(f"t: {res.statistic}")
print(f"df: {res.df}")
print(f"p_value: {res.pvalue}")

t: 0.8509629433967633
df: 7
p_value: 0.42294092668021854


## Conclusion
Since the p-value (0.4229) is greater than 0.05, we fail to reject the null hypothesis. There is no significant evidence that the average weight differs from the claimed 70 kg.
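The one-sample calculation used in this exercise can be wrapped in a small helper and cross-checked against SciPy. This is a minimal sketch: the function name `one_sample_t` is hypothetical (not part of the lab), and it assumes the sample standard deviation (`ddof=1`), which is the convention `stats.ttest_1samp` follows.

```python
import math
import numpy as np
import scipy.stats as stats


def one_sample_t(sample, pop_mean):
    """Return (t, df, two-sided p-value) for a one-sample t-test."""
    n = len(sample)
    mean = np.mean(sample)
    sd = np.std(sample, ddof=1)  # sample standard deviation
    t = (mean - pop_mean) / (sd / math.sqrt(n))
    df = n - 1
    p = 2 * stats.t.sf(abs(t), df)  # two-sided tail area
    return t, df, p


weights = [72, 68, 75, 71, 69, 70, 73, 68]
t_stat, dof, p_value = one_sample_t(weights, 70)

# Cross-check the hand calculation against the library result
res = stats.ttest_1samp(weights, 70, alternative="two-sided")
print(math.isclose(t_stat, res.statistic), math.isclose(p_value, res.pvalue))
```

The same helper can be reused for the other one-sample exercises by passing a different sample and hypothesized mean.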

## Exercise 2: One-Sample T-Test - Average Height of Students
### Null Hypothesis H0: μ = 165
### Alternative Hypothesis Ha: μ ≠ 165
### Significance level: 0.05

In [549]:
sample = [164, 162, 168, 167, 165, 166, 160, 159, 170, 163]
sample_size = len(sample)
sample_mean = np.mean(sample)
# Sample standard deviation (ddof=1); np.std defaults to the population formula (ddof=0)
sample_std = np.std(sample, ddof=1)
pop_mean = 165

# Calculate the t-statistic
t = (sample_mean - pop_mean) / (sample_std / math.sqrt(sample_size))

# Degrees of freedom
df = sample_size - 1

# Two-sided p-value from the t-distribution
p_value = 2 * stats.t.sf(abs(t), df)

print(f"t: {t:.4f}")
print(f"df: {df}")
print(f"p_value: {p_value:.4g}")

t: -0.5417
df: 9
p_value: 0.6012


In [551]:
# Calculate using package
res = stats.ttest_1samp(sample, pop_mean, alternative = "two-sided")
print(f"t: {res.statistic}")
print(f"df: {res.df}")
print(f"p_value: {res.pvalue}")

t: -0.5417363388859563
df: 9
p_value: 0.6011521875426783


## Conclusion
Since the p-value (0.6012) is greater than 0.05, we fail to reject the null hypothesis. There is no significant evidence that the average height of students differs from the claimed 165 cm.

## Exercise 3: One-Sample T-Test - Customer Satisfaction Scores
### Null Hypothesis H0: μ = 4.5
### Alternative Hypothesis Ha: μ ≠ 4.5
### Significance level: 0.05

In [555]:
sample = [4.2, 4.4, 4.5, 4.7, 4.5, 4.6, 4.4, 4.3, 4.5, 4.6, 4.2, 4.5]
sample_size = len(sample)
sample_mean = np.mean(sample)
# Sample standard deviation (ddof=1); np.std defaults to the population formula (ddof=0)
sample_std = np.std(sample, ddof=1)
pop_mean = 4.5

# Calculate the t-statistic
t = (sample_mean - pop_mean) / (sample_std / math.sqrt(sample_size))

# Degrees of freedom
df = sample_size - 1

# Two-sided p-value from the t-distribution
p_value = 2 * stats.t.sf(abs(t), df)

print(f"t: {t:.4f}")
print(f"df: {df}")
print(f"p_value: {p_value:.4g}")

t: -1.1055
df: 11
p_value: 0.2925


In [557]:
# Calculate using package
res = stats.ttest_1samp(sample, pop_mean, alternative = "two-sided")
print(f"t: {res.statistic}")
print(f"df: {res.df}")
print(f"p_value: {res.pvalue}")

t: -1.1055415967851299
df: 11
p_value: 0.2925184553957747


## Conclusion
Since the p-value (0.2925) is greater than 0.05, we fail to reject the null hypothesis. There is no significant evidence that the average customer satisfaction score differs from the claimed 4.5.

## Exercise 4: Two-Sample T-Test - Exam Scores of Two Classes
### Null Hypothesis H0: μA = μB
### Alternative Hypothesis Ha: μA ≠ μB
### Significance level: 0.05

In [566]:
class_a = [85, 78, 90, 88, 84, 91, 89]
class_b = [82, 80, 88, 86, 85, 79, 87]

# Class A
n1 = len(class_a)
x1 = np.mean(class_a)
s1 = np.std(class_a, ddof=1)

# Class B
n2 = len(class_b)
x2 = np.mean(class_b)
s2 = np.std(class_b, ddof=1)

# Calculate the t-statistic (Welch's t-test, unequal variances)
t = (x1 - x2) / (math.sqrt((s1 ** 2 / n1) + (s2 ** 2 / n2)))

# Degrees of freedom (Welch-Satterthwaite approximation)
df = ((s1 ** 2 / n1) + (s2 ** 2 / n2)) ** 2 / (((s1 ** 2 / n1) ** 2 / (n1 - 1)) + ((s2 ** 2 / n2) ** 2 / (n2 - 1)))

# Two-sided p-value from the t-distribution
p_value = 2 * stats.t.sf(abs(t), df)

print(f"t: {t}")
print(f"df: {df}")
print(f"p_value: {p_value:.4g}")

t: 1.1886087235395912
df: 11.354805693267235
p_value: 0.2589


In [568]:
# Conduct t-test using package
r = stats.ttest_ind(class_a, class_b, equal_var=False, alternative="two-sided")
print(f"t: {r.statistic}")
print(f"df: {r.df}")
print(f"p_value: {r.pvalue}")

t: 1.1886087235395912
df: 11.35480569326723
p_value: 0.2588586458733691


## Conclusion
Since the p-value (0.2589) is greater than 0.05, we fail to reject the null hypothesis. There is no significant difference between the mean exam scores of class A and class B.
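The two-sample calculation repeats across several exercises, so it can be collected into one function. The sketch below is illustrative only: `welch_t` is a hypothetical name, and it assumes Welch's unequal-variance test, matching the `equal_var=False` setting used with `stats.ttest_ind` in this lab.

```python
import math
import numpy as np
import scipy.stats as stats


def welch_t(group_1, group_2):
    """Return (t, df, two-sided p-value) for Welch's two-sample t-test."""
    n1, n2 = len(group_1), len(group_2)
    # Squared standard errors of each group mean
    v1 = np.var(group_1, ddof=1) / n1
    v2 = np.var(group_2, ddof=1) / n2
    t = (np.mean(group_1) - np.mean(group_2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p


class_a = [85, 78, 90, 88, 84, 91, 89]
class_b = [82, 80, 88, 86, 85, 79, 87]
t_stat, dof, p_value = welch_t(class_a, class_b)

# Cross-check against the library implementation
r = stats.ttest_ind(class_a, class_b, equal_var=False, alternative="two-sided")
print(math.isclose(t_stat, r.statistic), math.isclose(p_value, r.pvalue))
```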

## Exercise 5: Two-Sample T-Test - Sales Performance Before and After Training
### Null Hypothesis H0: μA = μB
### Alternative Hypothesis Ha: μA ≠ μB
### Significance level: 0.05

In [570]:
before = [1200, 1300, 1250, 1400, 1350, 1500]
after = [1450, 1380, 1550, 1600, 1500, 1580]

# Before
n1 = len(before)
x1 = np.mean(before)
s1 = np.std(before, ddof=1)

# After
n2 = len(after)
x2 = np.mean(after)
s2 = np.std(after, ddof=1)

# Calculate the t-statistic (Welch's t-test, unequal variances)
t = (x1 - x2) / (math.sqrt((s1 ** 2 / n1) + (s2 ** 2 / n2)))

# Degrees of freedom (Welch-Satterthwaite approximation)
df = ((s1 ** 2 / n1) + (s2 ** 2 / n2)) ** 2 / (((s1 ** 2 / n1) ** 2 / (n1 - 1)) + ((s2 ** 2 / n2) ** 2 / (n2 - 1)))

# Two-sided p-value from the t-distribution
p_value = 2 * stats.t.sf(abs(t), df)

print(f"t: {t}")
print(f"df: {df}")
print(f"p_value: {p_value:.4g}")

t: -3.163967663492102
df: 9.423554373813323
p_value: 0.01084


In [572]:
# Conduct t-test using package
r = stats.ttest_ind(before, after, equal_var=False, alternative="two-sided")
print(f"t: {r.statistic}")
print(f"df: {r.df}")
print(f"p_value: {r.pvalue}")

t: -3.163967663492102
df: 9.423554373813323
p_value: 0.010844155393129377


## Conclusion
Since the p-value (0.0108) is less than 0.05, we reject the null hypothesis. There is a significant difference in sales performance before and after training.
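The exercise treats the before and after figures as two independent samples. If the six values in each list belonged to the same six salespeople in the same order (an assumption the lab does not state), a paired t-test on the per-person differences would be the usual alternative. A short sketch using `stats.ttest_rel`:

```python
import scipy.stats as stats

before = [1200, 1300, 1250, 1400, 1350, 1500]
after = [1450, 1380, 1550, 1600, 1500, 1580]

# Paired test: only valid if before[i] and after[i] come from the same person
paired = stats.ttest_rel(before, after, alternative="two-sided")
print(f"t: {paired.statistic}")
print(f"p_value: {paired.pvalue}")
```

Under that pairing assumption the test works with n - 1 = 5 degrees of freedom instead of the Welch approximation.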

## Exercise 6: Two-Sample T-Test - Blood Pressure Medication
### Null Hypothesis H0: μA = μB
### Alternative Hypothesis Ha: μA ≠ μB
### Significance level: 0.05

In [574]:
medication_group = [10, 12, 9, 14, 11, 13]
placebo_group = [3, 5, 2, 4, 6, 5]

# Medication Group
n1 = len(medication_group)
x1 = np.mean(medication_group)
s1 = np.std(medication_group, ddof=1)

# Placebo_group
n2 = len(placebo_group)
x2 = np.mean(placebo_group)
s2 = np.std(placebo_group, ddof=1)

# Calculate the t-statistic (Welch's t-test, unequal variances)
t = (x1 - x2) / (math.sqrt((s1 ** 2 / n1) + (s2 ** 2 / n2)))

# Degrees of freedom (Welch-Satterthwaite approximation)
df = ((s1 ** 2 / n1) + (s2 ** 2 / n2)) ** 2 / (((s1 ** 2 / n1) ** 2 / (n1 - 1)) + ((s2 ** 2 / n2) ** 2 / (n2 - 1)))

# Two-sided p-value from the t-distribution
p_value = 2 * stats.t.sf(abs(t), df)

print(f"t: {t}")
print(f"df: {df}")
print(f"p_value: {p_value:.4g}")

t: 7.545937746270388
df: 9.475409836065575
p_value: 2.648e-05


In [576]:
# Conduct t-test using package
r = stats.ttest_ind(medication_group, placebo_group, equal_var=False, alternative="two-sided")
print(f"t: {r.statistic}")
print(f"df: {r.df}")
print(f"p_value: {r.pvalue}")

t: 7.545937746270389
df: 9.475409836065577
p_value: 2.6480982690014405e-05


## Conclusion
Since the p-value (2.648e-05) is less than 0.05, we reject the null hypothesis. There is a significant difference in blood pressure response between the medication and placebo groups.

## Exercise 7: One-Sample T-Test - Test if a Coin is Fair
### Null Hypothesis H0: μ = 0.5
### Alternative Hypothesis Ha: μ ≠ 0.5
### Significance level: 0.05

In [276]:
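sample_size = 100
sample_mean = 0.58  # observed proportion of heads (58 heads in 100 flips)
pop_mean = 0.5
# Standard deviation of a single flip, estimated from the sample proportion
sample_std = math.sqrt(sample_mean * (1 - sample_mean))

# Calculate the test statistic; with n = 100 binary outcomes the normal
# approximation applies, so this is a z-statistic
z = (sample_mean - pop_mean) / (sample_std / math.sqrt(sample_size))

# Two-sided p-value from the standard normal distribution
p_value = 2 * stats.norm.sf(abs(z))

print(f"z: {z:.4f}")
print(f"p_value: {p_value:.4g}")

z: 1.6209
p_value: 0.105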


In [594]:
from statsmodels.stats.proportion import proportions_ztest

head_counts = 58
total_flips = 100
proportion_null = 0.5

z_stat, p_value = proportions_ztest(head_counts, total_flips, proportion_null)
print(f"z-value: {z_stat}")
print(f"p-value: {p_value}")

z-value: 1.6208817969462146
p-value: 0.10504299146532581


## Conclusion
Since the p-value (0.105) is greater than 0.05, we fail to reject the null hypothesis. There is no significant evidence to suggest that the coin is biased.
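Because the coin data are counts of heads, the z-test conclusion can also be checked with an exact binomial test, which avoids the normal approximation. The sketch below assumes a SciPy version that provides `scipy.stats.binomtest` (1.7 or later) and reuses the 58 heads in 100 flips from this exercise.

```python
from scipy.stats import binomtest

head_counts = 58
total_flips = 100

# Exact two-sided test of H0: P(heads) = 0.5
result = binomtest(head_counts, total_flips, p=0.5, alternative="two-sided")
print(f"p-value: {result.pvalue}")
```

The exact p-value is also above 0.05, so the decision does not change.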

## Exercise 8: Two-Sample T-Test - Comparison of Test Scores
### Null Hypothesis H0: μA = μB
### Alternative Hypothesis Ha: μA ≠ μB
### Significance level: 0.05

In [578]:
music = [80, 85, 78, 90, 87, 76]
silence = [88, 82, 84, 89, 91, 85]

# Music
n1 = len(music)
x1 = np.mean(music)
s1 = np.std(music, ddof=1)

# Silence
n2 = len(silence)
x2 = np.mean(silence)
s2 = np.std(silence, ddof=1)

# Calculate the t-statistic (Welch's t-test, unequal variances)
t = (x1 - x2) / (math.sqrt((s1 ** 2 / n1) + (s2 ** 2 / n2)))

# Degrees of freedom (Welch-Satterthwaite approximation)
df = ((s1 ** 2 / n1) + (s2 ** 2 / n2)) ** 2 / (((s1 ** 2 / n1) ** 2 / (n1 - 1)) + ((s2 ** 2 / n2) ** 2 / (n2 - 1)))

# Two-sided p-value from the t-distribution
p_value = 2 * stats.t.sf(abs(t), df)

print(f"t: {t}")
print(f"df: {df}")
print(f"p_value: {p_value:.4g}")

t: -1.4529052821772128
df: 8.320229488632087
p_value: 0.1829


In [580]:
# Conduct t-test using package
r = stats.ttest_ind(music, silence, equal_var=False, alternative="two-sided")
print(f"t: {r.statistic}")
print(f"df: {r.df}")
print(f"p_value: {r.pvalue}")

t: -1.4529052821772128
df: 8.320229488632087
p_value: 0.1828990115470744


## Conclusion
Since the p-value (0.1829) is greater than 0.05, we fail to reject the null hypothesis. There is no significant difference between the test scores of the two groups.

## Exercise 9: One-Sample T-Test - Average Lifespan of a Product
### Null Hypothesis H0: μ = 5
### Alternative Hypothesis Ha: μ ≠ 5
### Significance level: 0.05

In [560]:
sample = [4.8, 5.1, 4.9, 5.3, 4.7, 5.2, 5.0, 4.9]
sample_size = len(sample)
sample_mean = np.mean(sample)
# Sample standard deviation (ddof=1); np.std defaults to the population formula (ddof=0)
sample_std = np.std(sample, ddof=1)
pop_mean = 5

# Calculate the t-statistic
t = (sample_mean - pop_mean) / (sample_std / math.sqrt(sample_size))

# Degrees of freedom
df = sample_size - 1

# Two-sided p-value from the t-distribution
p_value = 2 * stats.t.sf(abs(t), df)

print(f"t: {t:.4f}")
print(f"df: {df}")
print(f"p_value: {p_value:.4g}")

t: -0.1741
df: 7
p_value: 0.8667


In [562]:
# Calculate using package
res = stats.ttest_1samp(sample, pop_mean, alternative = "two-sided")
print(f"t: {res.statistic}")
print(f"df: {res.df}")
print(f"p_value: {res.pvalue}")

t: -0.17407765595570038
df: 7
p_value: 0.8667318497550061


## Conclusion
Since the p-value (0.8667) is greater than 0.05, we fail to reject the null hypothesis. There is no significant evidence against the manufacturer's claim that the average lifespan of the product is 5 years.

## Exercise 10: Two-Sample T-Test - Average Income of Two Cities
### Null Hypothesis H0: μA = μB
### Alternative Hypothesis Ha: μA ≠ μB
### Significance level: 0.05

In [582]:
city_a = [45000, 48000, 47000, 50000, 49000, 46000]
city_b = [43000, 42000, 44000, 41000, 45000, 43000]

# CityA
n1 = len(city_a)
x1 = np.mean(city_a)
s1 = np.std(city_a, ddof=1)

# CityB
n2 = len(city_b)
x2 = np.mean(city_b)
s2 = np.std(city_b, ddof=1)

# Calculate the t-statistic (Welch's t-test, unequal variances)
t = (x1 - x2) / (math.sqrt((s1 ** 2 / n1) + (s2 ** 2 / n2)))

# Degrees of freedom (Welch-Satterthwaite approximation)
df = ((s1 ** 2 / n1) + (s2 ** 2 / n2)) ** 2 / (((s1 ** 2 / n1) ** 2 / (n1 - 1)) + ((s2 ** 2 / n2) ** 2 / (n2 - 1)))

# Two-sided p-value from the t-distribution
p_value = 2 * stats.t.sf(abs(t), df)

print(f"t: {t}")
print(f"df: {df}")
print(f"p_value: {p_value:.4g}")

t: 4.700096710803842
df: 9.307692307692308
p_value: 0.001022


In [584]:
# Conduct t-test using package
r = stats.ttest_ind(city_a, city_b, equal_var=False, alternative="two-sided")
print(f"t: {r.statistic}")
print(f"df: {r.df}")
print(f"p_value: {r.pvalue}")

t: 4.700096710803842
df: 9.307692307692305
p_value: 0.0010223286792561605


## Conclusion
Since the p-value (0.0010) is less than 0.05, we reject the null hypothesis. There is a significant difference in average income between the two cities.