# Hypothesis Testing (T-test)

**Hypothesis testing** is a statistical method for deciding whether a sample provides enough evidence to reject a claim about a population. A **T-test** is used when the **population standard deviation is unknown** and must be estimated from the sample. This matters most when the **sample size is small** (typically $n < 30$), but the test remains valid for larger samples, where its results approach those of a Z-test.
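To make the sample-size rule of thumb concrete, the sketch below compares two-tailed 5% critical values of the t-distribution with the standard normal value. The degrees of freedom listed are arbitrary choices for illustration, not values taken from the examples in this notebook.

```python
import scipy.stats as st

# Two-tailed 5% critical value of the standard normal distribution.
z_crit = st.norm.ppf(0.975)

# Degrees of freedom chosen only for illustration.
dfs = [5, 10, 29, 100, 1000]
t_crits = [st.t.ppf(0.975, df) for df in dfs]

for df, t_crit in zip(dfs, t_crits):
    print(f"df={df:>4}: t critical = {t_crit:.3f}, z critical = {z_crit:.3f}")
```

The t critical value is always larger than the normal one and shrinks toward it as the degrees of freedom grow, which is why the distinction matters mainly for small samples.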

---

## Purpose of T-test

The T-test helps determine whether an observed difference in means (for example, between a **sample mean** and a hypothesized **population mean**) is statistically significant or plausibly due to chance. It is commonly used for:  
- Testing whether the sample mean differs from the population mean.  
- Comparing two independent sample means.  
- Testing paired samples (dependent samples).
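Each of the three use cases above has a corresponding function in `scipy.stats`. The sketch below runs all three on made-up data; the arrays, the seed, and the hypothesized mean of 50 are assumptions chosen only so the example runs.

```python
import numpy as np
import scipy.stats as st

# Hypothetical data, generated only to make the example runnable.
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=50, scale=5, size=20)
sample_b = rng.normal(loc=53, scale=5, size=20)
sample_a_after = sample_a + rng.normal(loc=2, scale=1, size=20)

# One-sample test: does the mean of sample_a differ from 50?
one_sample = st.ttest_1samp(sample_a, popmean=50)

# Two independent samples; equal_var=False selects Welch's t-test.
two_sample = st.ttest_ind(sample_a, sample_b, equal_var=False)

# Paired samples: the same units measured twice.
paired = st.ttest_rel(sample_a_after, sample_a)

results = [("one-sample", one_sample), ("independent", two_sample), ("paired", paired)]
for name, res in results:
    print(f"{name}: t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```

All three functions report a two-tailed p-value by default.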

---

## Steps in Hypothesis Testing (T-test)

1. **Formulate Hypotheses**  
   - **Null Hypothesis** $H_0$: The average weight of a bag of chips is **greater than or equal to 150 grams** (the bags are not underweight).  
     $$H_0: \mu \geq 150$$  
   - **Alternative Hypothesis** $H_1$: The average weight of a bag of chips is **less than 150 grams**.  
     $$H_1: \mu < 150$$  

2. **Set the Significance Level**  
   The significance level, denoted by $\alpha$, is the probability of rejecting the null hypothesis when it is actually true (a Type I error). A typical choice is $\alpha = 0.05$.  

3. **Calculate the T-statistic**  
   The T-statistic measures how far the sample mean is from the population mean in terms of standard errors:  
   $$T = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$$  
   Where:  
   - $\bar{X} = 148$ is the sample mean  
   - $\mu_0 = 150$ is the population mean under the null hypothesis  
   - $s = 5$ is the sample standard deviation  
   - $n = 25$ is the sample size  

   Substituting the values:  
   $$T = \frac{148 - 150}{5 / \sqrt{25}} = \frac{-2}{5 / 5} = \frac{-2}{1} = -2.0$$  

4. **Determine the Critical Value or p-value**  
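   - For a **one-tailed (left-tailed) T-test** at $\alpha = 0.05$ with $n - 1 = 24$ degrees of freedom, the critical value from the T-table is approximately $-t_{0.05, 24} = -1.711$.  
   - Equivalently, the p-value is $P(T \leq -2.0)$ for a t-distribution with 24 degrees of freedom; $H_0$ is rejected when this p-value is below $\alpha$.  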

5. **Decision**  
   - Since $T = -2.0 < -1.711$, we reject $H_0$.  

6. **Conclusion**  
   - The sample provides sufficient evidence to reject the manufacturer’s claim. The average weight of the bags of potato chips is likely **less than 150 grams**.
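The six steps above can be sketched as a single function that works from summary statistics. The helper name `left_tailed_t_test` is hypothetical, and the p-value line is an addition that covers the p-value route mentioned in step 4.

```python
import numpy as np
import scipy.stats as st

def left_tailed_t_test(sample_mean, mu0, s, n, alpha=0.05):
    """One-sample, left-tailed t-test from summary statistics (hypothetical helper)."""
    df = n - 1
    t_stat = (sample_mean - mu0) / (s / np.sqrt(n))
    t_crit = st.t.ppf(alpha, df)    # lower-tail critical value (negative)
    p_value = st.t.cdf(t_stat, df)  # P(T <= t_stat) under H0
    return t_stat, t_crit, p_value, p_value < alpha

# Values from the chip-bag example used in the steps above.
t_stat, t_crit, p_value, reject = left_tailed_t_test(148, 150, 5, 25)
print(f"t = {t_stat:.3f}, critical = {t_crit:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

Comparing the statistic with the critical value and comparing the p-value with $\alpha$ always lead to the same decision.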

---

## Example Question:

**A manufacturer claims that the average weight of a bag of potato chips is 150 grams. A sample of 25 bags is taken, and the average weight is found to be 148 grams, with a standard deviation of 5 grams. Test the manufacturer's claim using a one-tailed T-test with a significance level of 0.05.**


In [48]:
import scipy.stats as st
import numpy as np

In [49]:
# Given 
sample_mean = 148
pop_mean = 150
sample_size = 25
std = 5
alpha = 0.05

In [50]:
df = sample_size-1

In [51]:
t_table = st.t.ppf(alpha, df)
t_table

-1.7108820799094282

In [52]:
t_cal = (sample_mean - pop_mean)/(std/np.sqrt(sample_size))
t_cal

-2.0

In [53]:
# Left-tailed test: reject H0 when the calculated t falls below the critical value.
if t_cal < t_table:
    print('Ha is right')  # Ha is right means the claim of the company is wrong.
else:
    print('H0 is right')

Ha is right


## Example 2:

**A company wants to test whether there is a difference in productivity between two teams. They randomly select 20 employees from each team and record their productivity scores. The mean productivity score for Team A is 80 with a standard deviation of 5, while the mean productivity score for Team B is 75 with a standard deviation of 6. Test at a 5% level of significance whether there is a difference in productivity between the two teams.**

---

### Step 1: Set the Hypotheses
- **Null Hypothesis $H_0$**: There is no difference in the mean productivity scores between Team A and Team B.  
  $H_0: \mu_A = \mu_B$

- **Alternative Hypothesis $H_1$**: There is a difference in the mean productivity scores between Team A and Team B.  
  $H_1: \mu_A \neq \mu_B$

This will be a **two-tailed test** since we are testing for any difference (not direction-specific).

---

### Step 2: Calculate the Test Statistic
The formula for the **t-statistic** for two independent samples is:

$$
t = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}
$$

Where:  
- $\bar{X}_A = 80$: Mean of Team A  
- $\bar{X}_B = 75$: Mean of Team B  
- $s_A = 5$: Standard deviation of Team A  
- $s_B = 6$: Standard deviation of Team B  
- $n_A = 20$: Sample size for Team A  
- $n_B = 20$: Sample size for Team B  

Now, plug in the values:

$$
t = \frac{80 - 75}{\sqrt{\frac{5^2}{20} + \frac{6^2}{20}}}  
= \frac{5}{\sqrt{\frac{25}{20} + \frac{36}{20}}}  
= \frac{5}{\sqrt{1.25 + 1.8}}  
= \frac{5}{\sqrt{3.05}}  
= \frac{5}{1.746} \approx 2.86
$$
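The same statistic can be obtained directly from the summary statistics with `scipy.stats.ttest_ind_from_stats`. This is a minimal sketch that assumes the reported standard deviations are sample standard deviations; `equal_var=False` keeps the variances unpooled, matching the formula above.

```python
import scipy.stats as st

# Summary statistics from the Team A / Team B example.
res = st.ttest_ind_from_stats(
    mean1=80, std1=5, nobs1=20,
    mean2=75, std2=6, nobs2=20,
    equal_var=False,  # do not pool the variances (Welch's t-test)
)
print(f"t = {res.statistic:.3f}, two-tailed p = {res.pvalue:.4f}")
```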

---

### Step 3: Determine the Critical Value and p-value
- **Degrees of freedom** (using Welch's approximation) are given by:

$$
df \approx \frac{\left(\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}\right)^2}{\frac{\left(\frac{s_A^2}{n_A}\right)^2}{n_A - 1} + \frac{\left(\frac{s_B^2}{n_B}\right)^2}{n_B - 1}}
$$

Plugging in the values:

$$
df \approx \frac{(1.25 + 1.8)^2}{\frac{1.25^2}{19} + \frac{1.8^2}{19}}  
\approx \frac{9.3025}{0.0822 + 0.1705}  
\approx \frac{9.3025}{0.2527} \approx 36.8
$$

We round down to 36 degrees of freedom.

- **Critical value** for a two-tailed test with $\alpha = 0.05$ and 36 degrees of freedom is approximately **±2.028**.

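- For simplicity, the code below uses the pooled degrees of freedom $df = n_A + n_B - 2 = 38$ instead. The corresponding two-tailed critical value is approximately **±2.024**, so the conclusion does not change.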
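The Welch degrees-of-freedom formula is easy to get wrong by hand, so a short sketch of the calculation may help. The helper name `welch_df` is hypothetical, and the result is rounded down before the critical value is looked up, as in the text.

```python
import scipy.stats as st

def welch_df(s_a, n_a, s_b, n_b):
    """Welch-Satterthwaite degrees of freedom (hypothetical helper)."""
    v_a = s_a**2 / n_a
    v_b = s_b**2 / n_b
    return (v_a + v_b) ** 2 / (v_a**2 / (n_a - 1) + v_b**2 / (n_b - 1))

df_welch = welch_df(5, 20, 6, 20)
# Two-tailed test at alpha = 0.05, with df rounded down to an integer.
t_crit = st.t.ppf(1 - 0.05 / 2, int(df_welch))
print(f"Welch df = {df_welch:.2f}, critical value = +/-{t_crit:.3f}")
```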

---

### Step 4: Conclusion
- **Compare the t-statistic**: $t = 2.86$  
- **Critical value**: ±2.028  

Since $|t| = 2.86$ is greater than the critical value of 2.028, we **reject the null hypothesis**.

---

### Step 5: Final Conclusion
At the 5% significance level, there is sufficient evidence to conclude that there is a significant difference in the mean productivity scores between Team A and Team B.

---


In [54]:
# Given
a_sample_mean = 80
b_sample_mean = 75
a_sample_size = b_sample_size = 20
std_a = 5
std_b = 6
alpha = 0.05

In [55]:
df = a_sample_size + b_sample_size - 2  # independent samples: different employees in each team
df

38

In [56]:
# Two-tailed test: split alpha across both tails and take the upper critical value.
t_table = st.t.ppf(1 - alpha/2, df)
round(t_table, 3)

2.024

In [57]:
t_cal = (a_sample_mean - b_sample_mean)/np.sqrt(std_a**2/a_sample_size + std_b**2/b_sample_size)
t_cal

2.862991671569341

In [58]:
# Reject H0 when |t_cal| exceeds the critical value.
if abs(t_cal) > t_table:
    print('Ha is right')    # Ha is right means there is a difference in productivity
else:
    print('H0 is right')

Ha is right


## Example 3:

A company wants to test whether a new training program improves the typing speed of its employees. The typing speed of 20 employees was recorded before and after the training program. The data is given below. Test at a 5% level of significance whether the training program has an effect on the typing speed of the employees.

- **Before:** 50, 60, 45, 65, 55, 70, 40, 75, 80, 65, 70, 60, 50, 55, 45, 75, 60, 50, 65, 70
- **After:** 60, 70, 55, 75, 65, 80, 50, 85, 90, 70, 75, 65, 55, 60, 50, 80, 65, 55, 70, 75


In [59]:
# Given
before = np.array([50, 60, 45, 65, 55, 70, 40, 75, 80, 65, 70, 60, 50, 55, 45, 75, 60, 50, 65, 70])
after = np.array([60, 70, 55, 75, 65, 80, 50, 85, 90, 70, 75, 65, 55, 60, 50, 80, 65, 55, 70, 75])

sample_size = 20
alpha = 0.025  # two-tailed test at the 5% level: 0.05/2 in each tail

In [60]:
# The same employees are measured twice, so work with the paired differences.
diff = after - before
mean_diff = np.mean(diff)
std_diff = np.std(diff, ddof=1)  # sample standard deviation of the differences

df = sample_size-1  # paired samples: one difference per employee


In [61]:
t_table = st.t.ppf(1-alpha, df)
t_table

2.093024054408263

In [62]:
# Paired t-statistic: mean difference divided by its standard error.
t_cal = mean_diff/(std_diff/np.sqrt(sample_size))
round(t_cal, 2)

12.7

In [63]:
# Two-tailed test: reject H0 when |t_cal| exceeds the critical value.
if abs(t_cal) > t_table:
    print('Ha is right')    # Ha is right means the program has an effect
else:
    print('H0 is right')

Ha is right
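Because the same 20 employees are measured before and after training, the data are paired, and the test can also be run with `scipy.stats.ttest_rel`. The sketch below repeats the arrays so that it runs on its own; it is offered as a cross-check rather than as part of the original notebook.

```python
import numpy as np
import scipy.stats as st

before = np.array([50, 60, 45, 65, 55, 70, 40, 75, 80, 65,
                   70, 60, 50, 55, 45, 75, 60, 50, 65, 70])
after = np.array([60, 70, 55, 75, 65, 80, 50, 85, 90, 70,
                  75, 65, 55, 60, 50, 80, 65, 55, 70, 75])

# Paired t-test on the per-employee differences (two-tailed by default).
res = st.ttest_rel(after, before)
print(f"t = {res.statistic:.3f}, two-tailed p = {res.pvalue:.3g}")
print("Reject H0" if res.pvalue < 0.05 else "Fail to reject H0")
```

A paired test uses the variability of the per-employee differences, which here is much smaller than the variability between employees.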
