In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import scipy.stats as stats
%matplotlib inline

We use a two-sample t-test for equality of means when the population standard deviations are unknown. The test compares the means of two independent samples, estimating the unknown variances from the data.

**Assumptions for 2 sample t-test**
* The data should be continuous.
* Each sample should be approximately normally distributed.
* The variances of the two populations should be roughly equal.
* The samples should be randomly sampled.

**Example 1:**  A retail chain wants to know if there’s a significant difference in the average monthly sales between two of its branches. Data has been collected for the last 12 months for each branch, and the sales figures are as follows:
* Branch A Sales (in thousands)= [82, 91, 88, 90, 86, 89, 92, 87, 85, 84, 90, 93]
* Branch B Sales (in thousands)= [75, 78, 74, 80, 76, 77, 79, 74, 78, 76, 75, 77]

Testing the null hypothesis 

>$H_0:\mu_A=\mu_B$

against the alternate hypothesis

>$H_1:\mu_A\neq\mu_B$


In [2]:
#checking for the assumptions
# Data
branch_a = np.array([82, 91, 88, 90, 86, 89, 92, 87, 85, 84, 90, 93])
branch_b = np.array([75, 78, 74, 80, 76, 77, 79, 74, 78, 76, 75, 77])

#Normality test
shapiro_a = stats.shapiro(branch_a)
shapiro_b = stats.shapiro(branch_b)

#Variance equality test
levene_test = stats.levene(branch_a, branch_b)
print(shapiro_a)
print(shapiro_b)
print(levene_test)

ShapiroResult(statistic=0.9744110107421875, pvalue=0.9510981440544128)
ShapiroResult(statistic=0.9548278450965881, pvalue=0.708243191242218)
LeveneResult(statistic=3.8091872791519434, pvalue=0.06381770275720347)


Normality is plausible for both groups because each Shapiro-Wilk p-value is greater than 0.05. Levene's test also gives a p-value above 0.05 (about 0.064), so there is no significant evidence of unequal variances. The assumptions of the pooled t-test are therefore reasonable for these data.
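The same three checks are repeated in every example, so they can be wrapped in a small helper. This is only a sketch: `check_assumptions` and its dictionary keys are hypothetical names introduced here, and the 0.05 cutoff is the significance level used throughout this notebook.

```python
import numpy as np
import scipy.stats as stats

def check_assumptions(sample_1, sample_2, alpha=0.05):
    """Hypothetical helper: Shapiro-Wilk on each sample, Levene on the pair."""
    shapiro_1 = stats.shapiro(sample_1)
    shapiro_2 = stats.shapiro(sample_2)
    levene = stats.levene(sample_1, sample_2)
    # A p-value above alpha means the test found no significant violation.
    return {
        "normal_1": bool(shapiro_1.pvalue > alpha),
        "normal_2": bool(shapiro_2.pvalue > alpha),
        "equal_var": bool(levene.pvalue > alpha),
    }

branch_a = np.array([82, 91, 88, 90, 86, 89, 92, 87, 85, 84, 90, 93])
branch_b = np.array([75, 78, 74, 80, 76, 77, 79, 74, 78, 76, 75, 77])
checks = check_assumptions(branch_a, branch_b)
print(checks)
```

A `True` entry only means the corresponding test did not reject at the chosen level; it does not prove that the assumption holds.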


In [3]:
#Testing the hypothesis using p value method
t_stat,p_value=stats.ttest_ind(branch_a,branch_b,equal_var=True)
p_value

7.551307557431908e-10

Since the p-value is less than the significance level (alpha = 0.05), we reject the null hypothesis.

In [4]:
#testing the hypothesis using critical value method
df=(len(branch_a)+len(branch_b))-2
alpha=0.05
#critical value
critical_value=stats.t.ppf(1-alpha/2,df)
critical_value

2.0738730679040147

In [5]:
t_stat

10.260036145576816

Since the test statistic (10.26) is greater than the critical value (2.07), we reject the null hypothesis.
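To make the connection between the two methods explicit, the pooled t statistic can be computed by hand and compared with `stats.ttest_ind`. The sketch below uses the Branch A and Branch B data from Example 1; the variable names (`pooled_var`, `std_error`, `t_manual`, `p_manual`) are introduced here for illustration.

```python
import numpy as np
import scipy.stats as stats

branch_a = np.array([82, 91, 88, 90, 86, 89, 92, 87, 85, 84, 90, 93])
branch_b = np.array([75, 78, 74, 80, 76, 77, 79, 74, 78, 76, 75, 77])

n_a, n_b = len(branch_a), len(branch_b)
df = n_a + n_b - 2

# Sample variances use ddof=1 (unbiased estimator).
var_a = branch_a.var(ddof=1)
var_b = branch_b.var(ddof=1)

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df

# Standard error of the difference in means under equal variances.
std_error = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_manual = (branch_a.mean() - branch_b.mean()) / std_error
# Two-sided p-value: probability in both tails beyond |t|.
p_manual = 2 * stats.t.sf(abs(t_manual), df)

print(t_manual, p_manual)
```

Both values should agree with the output of `stats.ttest_ind(branch_a, branch_b, equal_var=True)` up to floating-point rounding.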

**Example 2:**  An instructor wants to test if two different teaching methods produce different mean scores on a final exam. She gathers scores from 15 students taught using Method A and 15 students using Method B.
* Method A Scores: [78, 82, 88, 76, 81, 79, 83, 84, 85, 80, 82, 87, 86, 79, 81]
* Method B Scores: [71, 72, 75, 73, 74, 76, 77, 72, 73, 74, 75, 76, 74, 72, 73]

Testing the null hypothesis

>$H_0:\mu_A=\mu_B$

against the alternate hypothesis

>$H_1:\mu_A\neq\mu_B$

In [6]:
#checking for assumptions

method_a= [78, 82, 88, 76, 81, 79, 83, 84, 85, 80, 82, 87, 86, 79, 81]
method_b= [71, 72, 75, 73, 74, 76, 77, 72, 73, 74, 75, 76, 74, 72, 73]

#normality test
norm_a=stats.shapiro(method_a)
norm_b=stats.shapiro(method_b)

#equal variance test
var=stats.levene(method_a,method_b)

print(norm_a)
print(norm_b)
print(var)

ShapiroResult(statistic=0.9798839092254639, pvalue=0.968704879283905)
ShapiroResult(statistic=0.958675742149353, pvalue=0.6693927645683289)
LeveneResult(statistic=5.447470817120624, pvalue=0.026999494258396706)


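Normality is plausible for both groups because each Shapiro-Wilk p-value is greater than 0.05. However, Levene's test gives a p-value of about 0.027, which is below 0.05, so the equal-variance assumption is not supported for these data. Welch's t-test (`equal_var=False`) would be the safer choice here; the pooled test is kept for comparison with the other examples, and its result should be read with that caveat.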

In [7]:
#calculating hypothesis using p value method
t_stat,p_value=stats.ttest_ind(method_a,method_b,equal_var=True)
p_value

5.205407585307876e-09

Since the p-value is less than alpha (0.05), we reject the null hypothesis.

In [8]:
#calculating hypothesis using critical value method
df=(len(method_a)+len(method_b))-2
alpha=0.05
#critical value
critical_value=stats.t.ppf(1-alpha/2,df)
critical_value

2.048407141795244

In [9]:
t_stat

8.279819684796212

Since the test statistic (8.28) is greater than the critical value (2.05), we reject the null hypothesis.
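Because the Levene p-value printed for Example 2 (about 0.027) is below 0.05, a Welch test, which does not assume equal variances, is a reasonable cross-check. The sketch below runs both versions on the Method A and Method B scores; `pooled` and `welch` are names introduced here.

```python
import scipy.stats as stats

method_a = [78, 82, 88, 76, 81, 79, 83, 84, 85, 80, 82, 87, 86, 79, 81]
method_b = [71, 72, 75, 73, 74, 76, 77, 72, 73, 74, 75, 76, 74, 72, 73]

# Pooled (Student) t-test: assumes equal population variances.
pooled = stats.ttest_ind(method_a, method_b, equal_var=True)
# Welch t-test: drops the equal-variance assumption.
welch = stats.ttest_ind(method_a, method_b, equal_var=False)

print("pooled:", pooled.statistic, pooled.pvalue)
print("welch: ", welch.statistic, welch.pvalue)
```

With equal sample sizes, as here, the two t statistics coincide; only the degrees of freedom, and therefore the p-value, differ. The Welch p-value is still far below 0.05, so the conclusion for this example does not change.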

**Example 3:** Two retail stores, Store A and Store B, want to know if there’s a significant difference in average weekly sales over a period of 10 weeks.

Testing the null hypothesis
>$H_0:\mu_A=\mu_B$

against the alternate hypothesis

>$H_1:\mu_A\neq\mu_B$


In [10]:
#checking the assumptions
# Sample data
store_A =[230, 250, 245, 240, 255, 260, 250, 240, 245, 255]
store_B=[225, 235, 240, 238, 250, 248, 235, 240, 230, 245]

#checking the normality

norm_a=stats.shapiro(store_A)
norm_b=stats.shapiro(store_B)
#checking for the equal variances
var=stats.levene(store_A,store_B)
print(norm_a)
print(norm_b)
print(var)

ShapiroResult(statistic=0.9650188088417053, pvalue=0.8412168025970459)
ShapiroResult(statistic=0.9731948375701904, pvalue=0.9187834858894348)
LeveneResult(statistic=0.21818181818181817, pvalue=0.6460333051897074)


Normality is plausible for both groups because each Shapiro-Wilk p-value is greater than 0.05. Levene's test also gives a p-value above 0.05 (about 0.646), so there is no significant evidence of unequal variances. The assumptions of the pooled t-test are therefore reasonable for these data.

In [11]:
#checking for the hypothesis using p value approach
t_stat,p_value=stats.ttest_ind(store_A,store_B,equal_var=True)
p_value

0.03746345484653633

Since the p-value (about 0.037) is less than alpha (0.05), we reject the null hypothesis.

In [12]:
#calculating the hypothesis using critical value
alpha=0.05
df=(len(store_A)+len(store_B))-2
#critical_value
critical_value=stats.t.ppf(1-alpha/2,df)
critical_value

2.10092204024096

In [13]:
t_stat

2.2464211843101096

Since the test statistic (2.25) is greater than the critical value (2.10), we reject the null hypothesis.
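The result in Example 3 is much closer to the cutoff than in the earlier examples, so a confidence interval for the difference in means gives a useful sense of scale. The following is a minimal sketch under the same equal-variance assumption; `diff`, `std_error`, and `ci` are names introduced here.

```python
import numpy as np
import scipy.stats as stats

store_A = np.array([230, 250, 245, 240, 255, 260, 250, 240, 245, 255])
store_B = np.array([225, 235, 240, 238, 250, 248, 235, 240, 230, 245])

n_a, n_b = len(store_A), len(store_B)
df = n_a + n_b - 2
alpha = 0.05

diff = store_A.mean() - store_B.mean()
pooled_var = ((n_a - 1) * store_A.var(ddof=1)
              + (n_b - 1) * store_B.var(ddof=1)) / df
std_error = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Two-sided critical value, same as in the critical value method.
critical_value = stats.t.ppf(1 - alpha / 2, df)
ci = (diff - critical_value * std_error, diff + critical_value * std_error)

print(diff, ci)
```

The interval excludes zero, which matches the rejection of the null hypothesis at the 5% level, but its lower end lies close to zero, so the estimated difference is not very precise.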

**Example 4:** Two classes (X and Y) want to know if Class X has higher average test scores than Class Y.

Testing the null hypothesis

>$H_0:\mu_X=\mu_Y$

against the alternate hypothesis

>$H_1:\mu_X>\mu_Y$

In [14]:
#checking for the assumptions
class_X = [75, 80, 82, 78, 76, 85, 79, 77, 84, 83]
class_Y = [72, 74, 78, 73, 71, 76, 74, 72, 75, 77]

#normality test
norm_a=stats.shapiro(class_X)
norm_b=stats.shapiro(class_Y)
#variance test
var=stats.levene(class_X,class_Y)
print(norm_a)
print(norm_b)
print(var)

ShapiroResult(statistic=0.9527984857559204, pvalue=0.7016611695289612)
ShapiroResult(statistic=0.9611218571662903, pvalue=0.798586368560791)
LeveneResult(statistic=2.5928571428571425, pvalue=0.12474330263945262)


Normality is plausible for both groups because each Shapiro-Wilk p-value is greater than 0.05. Levene's test also gives a p-value above 0.05 (about 0.125), so there is no significant evidence of unequal variances. The assumptions of the pooled t-test are therefore reasonable for these data.

In [15]:
#checking the hypothesis using p value approach
t_stat,p_value=stats.ttest_ind(class_X,class_Y,equal_var=True,alternative='greater')
p_value

0.00020504480685939883

Since the p-value is less than alpha (0.05), we reject the null hypothesis.

In [16]:
#calculating hypothesis using critical value method
df=(len(class_X)+len(class_Y))-2
alpha=0.05
critical_value=stats.t.ppf(1-alpha,df)
critical_value

1.7340636066175354

In [17]:
t_stat

4.322539189865529

Since the test statistic (4.32) is greater than the one-sided critical value (1.73), we reject the null hypothesis.
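Example 4 is the first one-sided test, so it may help to see how the `alternative` argument of `stats.ttest_ind` changes the p-value while leaving the t statistic untouched. This sketch assumes SciPy 1.6 or later, where `alternative` is available; `two_sided`, `greater`, and `less` are names introduced here.

```python
import scipy.stats as stats

class_X = [75, 80, 82, 78, 76, 85, 79, 77, 84, 83]
class_Y = [72, 74, 78, 73, 71, 76, 74, 72, 75, 77]

two_sided = stats.ttest_ind(class_X, class_Y, equal_var=True)
greater = stats.ttest_ind(class_X, class_Y, equal_var=True,
                          alternative='greater')
less = stats.ttest_ind(class_X, class_Y, equal_var=True,
                       alternative='less')

# The t statistic is the same in all three cases; only the tail changes.
print(two_sided.statistic, greater.statistic, less.statistic)
print(two_sided.pvalue, greater.pvalue, less.pvalue)
```

Because the t statistic is positive, the `'greater'` p-value is half of the two-sided one, and the `'greater'` and `'less'` p-values add up to 1.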

**Example 5:** Two factories, Factory A and Factory B, manufacture the same product. Management wants to know if Factory A produces higher-quality items, on average, than Factory B.

Testing the null hypothesis

>$H_0:\mu_A=\mu_B$

against the alternate hypothesis

>$H_1:\mu_A>\mu_B$

In [18]:
#checking for assumptions
factory_A = [85, 87, 89, 88, 90, 92, 88, 87, 91, 90]
factory_B= [80, 82, 81, 83, 85, 84, 83, 82, 84, 85]

#normality test
norm_a=stats.shapiro(factory_A)
norm_b=stats.shapiro(factory_B)
#variance test
var=stats.levene(factory_A,factory_B)
print(norm_a)
print(norm_b)
print(var)


ShapiroResult(statistic=0.9774341583251953, pvalue=0.9499722123146057)
ShapiroResult(statistic=0.9480838179588318, pvalue=0.6458869576454163)
LeveneResult(statistic=0.7309644670050758, pvalue=0.40380736669006767)


Normality is plausible for both groups because each Shapiro-Wilk p-value is greater than 0.05. Levene's test also gives a p-value above 0.05 (about 0.404), so there is no significant evidence of unequal variances. The assumptions of the pooled t-test are therefore reasonable for these data.

In [19]:
#checking the hypothesis using p value approach
t_stat,p_value=stats.ttest_ind(factory_A,factory_B,equal_var=True,alternative='greater')
p_value

1.0884380624297208e-06

Since the p-value is less than alpha (0.05), we reject the null hypothesis.

In [20]:
#calculating the hypothesis using critical value approach
alpha=0.05
df=(len(factory_A)+len(factory_B))-2
#critical_value
critical_value=stats.t.ppf(1-alpha,df)
critical_value

1.7340636066175354

In [21]:
t_stat

6.824841502808801

Since the test statistic (6.82) is greater than the one-sided critical value (1.73), we reject the null hypothesis.

**Example 6:** A company wants to determine if customer satisfaction scores in Region X are significantly lower than in Region Y.



Testing the null hypothesis

>$H_0:\mu_X=\mu_Y$

against the alternate hypothesis

>$H_1:\mu_X<\mu_Y$

In [22]:
#checking for the assumptions
region_X = [70, 72, 68, 69, 71, 67, 70, 69, 72, 68]
region_Y = [75, 76, 74, 77, 75, 78, 76, 77, 74, 76]
#normality test
norm_X = stats.shapiro(region_X)
norm_Y = stats.shapiro(region_Y)
#variance test
p_var = stats.levene(region_X, region_Y)
print(norm_X)
print(norm_Y)
print(p_var)

ShapiroResult(statistic=0.9433293342590332, pvalue=0.5906188488006592)
ShapiroResult(statistic=0.941914439201355, pvalue=0.5745060443878174)
LeveneResult(statistic=1.1162790697674414, pvalue=0.3046957902099586)


Normality is plausible for both groups because each Shapiro-Wilk p-value is greater than 0.05. Levene's test also gives a p-value above 0.05 (about 0.305), so there is no significant evidence of unequal variances. The assumptions of the pooled t-test are therefore reasonable for these data.

In [23]:
#checking the hypothesis using p value approach
t_stat, p_value = stats.ttest_ind(region_X, region_Y, equal_var=True, alternative='less')
p_value

1.94307207199171e-08

Since the p-value is less than alpha (0.05), we reject the null hypothesis.

In [24]:
#checking the hypothesis using critical value approach
alpha = 0.05
df = len(region_X) + len(region_Y) - 2
#left-tailed critical value (negative)
critical = stats.t.ppf(alpha, df)
critical

-1.734063606617536

In [25]:
t_stat

-9.075870678421362

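For a left-tailed test we reject when the test statistic falls below the (negative) critical value. Since the test statistic (-9.08) is less than the critical value (-1.73), we reject the null hypothesis.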

**Example 7:** A factory wants to determine if the average production time for Team A is significantly lower than for Team B.



Testing the null hypothesis 

>$H_0:\mu_A=\mu_B$

against the alternate hypothesis

>$H_1:\mu_A<\mu_B$

In [26]:
#checking for the assumptions
# Production times for Team A and Team B
team_A = [40, 42, 39, 41, 40, 38, 39, 41, 40, 39]
team_B = [45, 47, 46, 48, 45, 46, 47, 48, 46, 45]
#normality test
norm_A = stats.shapiro(team_A)
norm_B = stats.shapiro(team_B)
#variance test
p_var = stats.levene(team_A, team_B)
print(norm_A)
print(norm_B)
print(p_var)

ShapiroResult(statistic=0.9519402980804443, pvalue=0.691487729549408)
ShapiroResult(statistic=0.8780678510665894, pvalue=0.12398403882980347)
LeveneResult(statistic=0.0, pvalue=1.0)


Normality is plausible for both groups because each Shapiro-Wilk p-value is greater than 0.05 (about 0.69 for Team A and 0.12 for Team B). Levene's test gives a p-value of 1.0, so there is no evidence of unequal variances. The assumptions of the pooled t-test are therefore reasonable for these data.

In [27]:
#calculating the hypothesis using p value approach
t_stat, p_value = stats.ttest_ind(team_A, team_B, equal_var=True, alternative='less')
p_value

2.084247962015636e-10

Since the p-value is less than alpha (0.05), we reject the null hypothesis.

In [28]:
#calculating the hypothesis using critical value approach
alpha=0.05
df = len(team_A) + len(team_B) - 2
critical = stats.t.ppf(alpha, df)
critical

-1.734063606617536

In [29]:
t_stat

-12.143146215046576

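For a left-tailed test we reject when the test statistic falls below the (negative) critical value. Since the test statistic (-12.14) is less than the critical value (-1.73), we reject the null hypothesis.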
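The critical value rule differs by alternative: compare |t| with the upper critical value for a two-sided test, t with the upper critical value for a right-tailed test, and t with the lower (negative) critical value for a left-tailed test. The sketch below collects these three rules in one place; `critical_value_decision` is a hypothetical helper introduced here, not a SciPy function.

```python
import scipy.stats as stats

def critical_value_decision(t_stat, df, alpha=0.05, alternative="two-sided"):
    """Hypothetical helper: return (critical value, reject H0?)."""
    if alternative == "two-sided":
        critical = stats.t.ppf(1 - alpha / 2, df)
        reject = abs(t_stat) > critical
    elif alternative == "greater":
        critical = stats.t.ppf(1 - alpha, df)
        reject = t_stat > critical
    elif alternative == "less":
        # Lower-tail critical value is negative; reject when t is below it.
        critical = stats.t.ppf(alpha, df)
        reject = t_stat < critical
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return critical, bool(reject)

team_A = [40, 42, 39, 41, 40, 38, 39, 41, 40, 39]
team_B = [45, 47, 46, 48, 45, 46, 47, 48, 46, 45]
t_stat, p_value = stats.ttest_ind(team_A, team_B, equal_var=True,
                                  alternative='less')
df = len(team_A) + len(team_B) - 2
critical, reject = critical_value_decision(t_stat, df, alternative="less")
print(critical, reject)
```

Passing the same `alternative` string to both `stats.ttest_ind` and the helper keeps the p-value method and the critical value method consistent with each other.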