### Instructions

1. Here is another simple example of a two-sample t-test with pooled variances (used when the population variances are assumed equal). This time, however, the test is one-sided.

In a packing plant, a machine packs cartons with jars. It is claimed that a new machine packs faster on average than the machine currently in use. To test that hypothesis, the times each machine takes to pack ten cartons are recorded. The results, in seconds, are in the file `files_for_lab/machine.txt`.
Assuming the conditions for the t-test are met, do the data provide sufficient evidence that the new machine packs faster than the old one?
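For reference, the pooled two-sample t statistic for this one-sided test can be written as follows. The notation is chosen here for illustration: $\bar{x}_1, s_1^2, n_1$ are the mean, sample variance and size of the new machine's times, and $\bar{x}_2, s_2^2, n_2$ are those of the old machine.

```latex
\begin{aligned}
H_0 &: \mu_{\text{new}} = \mu_{\text{old}}, \qquad H_1: \mu_{\text{new}} < \mu_{\text{old}} \\
s_p^2 &= \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2} \\
t &= \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad \mathrm{df} = n_1 + n_2 - 2
\end{aligned}
```

The data are packing times, so a smaller mean time for the new machine means it packs faster. $H_0$ is therefore rejected only when $t$ lies far enough in the left tail.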

In [None]:
# Libraries
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import scipy.stats as st
import statsmodels.api as sm
import statsmodels.formula.api as smf

In [None]:
data = pd.read_csv("files_for_lab/machine.txt",sep="\t", encoding="utf-16")
data

| | New machine | Old machine |
|---|---|---|
| 0 | 42.1 | 42.7 |
| 1 | 41.0 | 43.6 |
| 2 | 41.3 | 43.8 |
| 3 | 41.8 | 43.3 |
| 4 | 42.4 | 42.5 |
| 5 | 42.8 | 43.5 |
| 6 | 43.2 | 43.1 |
| 7 | 42.3 | 41.7 |
| 8 | 41.8 | 44.0 |
| 9 | 42.7 | 44.1 |


In [22]:
list(data)

['New machine', '    Old machine']
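The instructions tell us to assume that the t-test conditions hold. As an optional sanity check, the sketch below tests normality of each sample with the Shapiro-Wilk test and equality of variances with Levene's test. The packing times are copied from the table above so that the block is self-contained.

```python
import numpy as np
import scipy.stats as st

# Packing times (seconds) copied from files_for_lab/machine.txt
new = np.array([42.1, 41.0, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7])
old = np.array([42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44.0, 44.1])

# Shapiro-Wilk: H0 = the sample comes from a normal distribution
shapiro_new = st.shapiro(new)
shapiro_old = st.shapiro(old)

# Levene (median-centred by default): H0 = the two populations have equal variances
levene_res = st.levene(new, old)

print(f"Shapiro p (new): {shapiro_new.pvalue:.3f}")
print(f"Shapiro p (old): {shapiro_old.pvalue:.3f}")
print(f"Levene p:        {levene_res.pvalue:.3f}")
```

A large Levene p-value means the data give no reason to doubt the equal-variance assumption that justifies pooling. With only ten observations per group, however, these checks have little power.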

In [34]:
# null_hypothesis = "The new machine's mean packing time equals the old machine's mean packing time"  # H₀: mean_new == mean_old
# alt_hypothesis = "The new machine's mean packing time is lower than the old machine's, i.e. the new machine packs faster"  # H₁: mean_new < mean_old

df = data.copy()
df.columns = df.columns.str.strip()  # the raw header is "    Old machine" (leading spaces)
new = df["New machine"]
old = df["Old machine"]

# Significance level
alpha = 0.05

# Pooled (equal-variance) two-sample t-test for independent samples, left-tailed
t_stat, p_value = st.ttest_ind(new, old, equal_var=True, alternative="less")
print(f"Test Statistic (t): {t_stat:.2f}")
print(f"P-Value: {p_value:.4f}")
print()

# Decision-Making
if p_value > alpha:
    print("Fail to Reject the Null Hypothesis: There is not sufficient evidence that the new machine has a lower mean packing time than the old machine.")
else:
    print("Reject the Null Hypothesis: There is sufficient evidence to conclude that the new machine has a lower mean packing time than the old machine, i.e. it packs faster.")


Test Statistic (t): -3.40
P-Value: 0.0016

Reject the Null Hypothesis: There is sufficient evidence to conclude that the new machine has a lower mean packing time than the old machine, i.e. it packs faster.
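To show what `st.ttest_ind(..., equal_var=True, alternative="less")` computes, the sketch below reproduces the pooled t statistic and its left-tailed p-value by hand from the same packing times. For comparison, it also computes the two-sided p-value.

```python
import numpy as np
import scipy.stats as st

# Packing times (seconds) copied from files_for_lab/machine.txt
new = np.array([42.1, 41.0, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7])
old = np.array([42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44.0, 44.1])

n1, n2 = len(new), len(old)
s1_sq, s2_sq = new.var(ddof=1), old.var(ddof=1)  # sample variances

# Pooled variance and standard error of the difference in means
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
se = np.sqrt(sp_sq * (1 / n1 + 1 / n2))

t_manual = (new.mean() - old.mean()) / se
df = n1 + n2 - 2

# Left-tailed p-value: P(T <= t) under H0
p_less = st.t.cdf(t_manual, df)
# Two-sided p-value, for comparison
p_two = 2 * st.t.sf(abs(t_manual), df)

print(f"t = {t_manual:.2f}, df = {df}")
print(f"one-sided p = {p_less:.4f}, two-sided p = {p_two:.4f}")
```

The one-sided p-value equals half the two-sided one here. That holds only because $t$ is negative, which is the direction predicted by $H_1$. If $t$ had been positive, the left-tailed p-value would have been above 0.5.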


2. An additional problem (optional): here we cannot assume that the population variances are equal, so the variances must not be pooled.
   Independent random samples of 17 sophomores and 13 juniors attending a large university yield the following grade point averages. The data are provided in the file `files_for_lab/student_gpa.txt`.
   At the 5% significance level, do the data provide sufficient evidence to conclude that the mean GPAs of sophomores and juniors at the university differ?

   The test statistic is calculated as shown here: [link to the image - Test statistics calculation for Unpooled Variance Case](https://education-team-2020.s3-eu-west-1.amazonaws.com/data-analytics/7.04/7.04-unpooled_variances.png)

   The lab takes the degrees of freedom to be `(n1-1)+(n2-1)`. This is the same value as in the pooled case. The standard unpooled (Welch) test instead uses the Welch-Satterthwaite approximation, which never exceeds `(n1-1)+(n2-1)`; this is the approximation `st.ttest_ind(..., equal_var=False)` applies.
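The linked image is not reproduced here. The sketch below assumes it shows the standard unpooled statistic $t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2}$. `unpooled_t_test` is a hypothetical helper written for illustration; it pairs that statistic with the Welch-Satterthwaite degrees of freedom. The helper is run on the packing-time data from the first problem, because the full GPA file is not shown in this notebook.

```python
import numpy as np
import scipy.stats as st


def unpooled_t_test(a, b):
    """Two-sided unpooled (Welch) t-test; returns (t, Welch df, p-value).

    Hypothetical helper for illustration, not part of the lab files.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a[~np.isnan(a)]  # drop missing values (e.g. from unequal column lengths)
    b = b[~np.isnan(b)]
    n1, n2 = len(a), len(b)
    v1, v2 = a.var(ddof=1) / n1, b.var(ddof=1) / n2  # squared standard errors

    t = (a.mean() - b.mean()) / np.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    p = 2 * st.t.sf(abs(t), df)
    return t, df, p


# Illustration on the packing-time data from the first problem
new = [42.1, 41.0, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7]
old = [42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44.0, 44.1]
t, df, p = unpooled_t_test(new, old)
print(f"t = {t:.2f}, Welch df = {df:.1f}, p = {p:.4f}")
print("simplified df, (n1-1)+(n2-1):", (len(new) - 1) + (len(old) - 1))
```

The two samples here are the same size, so the unpooled and pooled t statistics coincide. Only the degrees of freedom change, and with them the p-value. Using `(n1-1)+(n2-1)` instead of the Welch value gives a slightly smaller p-value.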

In [29]:
data2 = pd.read_csv("files_for_lab/student_gpa.txt",sep="\t")
data2

| | Sophomores | Juniors |
|---|---|---|
| 0 | 3.04 | 2.56 |
| 1 | 1.71 | 2.77 |
| 2 | 3.3 | 2.7 |
| 3 | 2.88 | 3.0 |
| 4 | 2.11 | 2.98 |
| 5 | 2.6 | 3.47 |
| 6 | 2.92 | 3.26 |
| 7 | 3.6 | 3.2 |
| 8 | 2.28 | 3.19 |
| 9 | 2.82 | 2.65 |


In [30]:
list(data2)

['Sophomores', '  Juniors']

In [39]:
data2.columns = data2.columns.str.strip()  # the raw header is "  Juniors" (leading spaces)
soph = data2["Sophomores"]
jun = data2["Juniors"]

In [44]:
# null_hypothesis = "The mean GPAs of sophomores and juniors are equal"  # H₀: mean_soph == mean_jun
# alt_hypothesis = "The mean GPAs of sophomores and juniors are not equal"  # H₁: mean_soph != mean_jun

# The two groups are independent samples of different students, so this calls for an
# independent two-sample t-test, not a paired one. equal_var=False gives the
# unpooled (Welch) version; nan_policy="omit" ignores the missing cells
# (the Juniors column is shorter: 13 values vs 17 sophomores).
t_stat, p_value = st.ttest_ind(soph, jun, equal_var=False, nan_policy="omit")
print(f"Test Statistic (t): {t_stat:.2f}")
print(f"P-Value: {p_value:.5f}")
print()

# Significance level
alpha = 0.05

# Decision-Making
if p_value > alpha:
    print("Fail to Reject the Null Hypothesis: No significant difference in the mean GPAs of sophomores and juniors.")
else:
    print("Reject the Null Hypothesis: There is a significant difference in the mean GPAs of sophomores and juniors.")
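The choice between a paired and an independent-samples test matters. A paired test such as `st.ttest_rel` works on row-wise differences. Those differences are only meaningful when each row is a matched pair, for example the same subject measured twice. The sophomore and junior samples are different students, so pairing them by row position would be arbitrary. The sketch below reuses the packing-time data to show that `st.ttest_rel` is a one-sample t-test on the differences, and that it generally gives a different statistic from the independent-samples test.

```python
import numpy as np
import scipy.stats as st

# Packing times (seconds) copied from files_for_lab/machine.txt
new = np.array([42.1, 41.0, 41.3, 41.8, 42.4, 42.8, 43.2, 42.3, 41.8, 42.7])
old = np.array([42.7, 43.6, 43.8, 43.3, 42.5, 43.5, 43.1, 41.7, 44.0, 44.1])

# Paired test: analyses the row-wise differences new[i] - old[i] (df = n - 1 = 9)
paired = st.ttest_rel(new, old)
# The same test written as a one-sample t-test of the differences against 0
one_sample = st.ttest_1samp(new - old, 0.0)
# Independent-samples (Welch) test: compares the two group means directly
welch = st.ttest_ind(new, old, equal_var=False)

print(f"paired:      t = {paired.statistic:.2f}, p = {paired.pvalue:.4f}")
print(f"one-sample:  t = {one_sample.statistic:.2f}, p = {one_sample.pvalue:.4f}")
print(f"independent: t = {welch.statistic:.2f}, p = {welch.pvalue:.4f}")
```

`st.ttest_rel` also requires both inputs to have the same length. Samples of 17 and 13 students therefore cannot be paired without discarding data.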
