# Hypothesis Testing (Z-test)

**Hypothesis testing** is a statistical method used to determine whether there is enough evidence in a sample to support or reject a claim about the population. The **Z-test** is used when the **population standard deviation** is known, and the **sample size is large** (typically $n \geq 30$).

---

## Purpose of Z-test

The Z-test helps determine whether the difference between the **sample mean** and the **population mean** is statistically significant or due to random chance. It is commonly used for:  
- Testing whether the sample mean differs from the population mean.  
- Comparing two independent sample means.  
- Testing population proportions for large samples.

---

## Steps in Hypothesis Testing (Z-test)

1. **Formulate Hypotheses**  
   - **Null Hypothesis** $H_0$: The population mean equals the hypothesized value $\mu_0$.  
     $$H_0: \mu = \mu_0$$  
   - **Alternative Hypothesis** $H_1$: The population mean differs from $\mu_0$ (two-tailed). For a one-tailed test, use $\mu > \mu_0$ or $\mu < \mu_0$ instead.  
     $$H_1: \mu \neq \mu_0$$  

2. **Set the Significance Level**  
   The significance level is denoted by $\alpha$ and represents the probability of rejecting the null hypothesis when it is true. A typical choice is $\alpha = 0.05$.  

3. **Calculate the Z-statistic**  
   The Z-statistic measures how far the sample mean is from the population mean in terms of standard errors:  
   $$Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$$  
   Where:  
   - $\bar{X}$ is the sample mean  
   - $\mu_0$ is the population mean under the null hypothesis  
   - $\sigma$ is the population standard deviation  
   - $n$ is the sample size  

4. **Determine the Critical Value or p-value**  
   - Use a **Z-table** to find the critical value corresponding to the chosen $\alpha$.  
   - Alternatively, calculate the **p-value**, which is the probability of obtaining a result at least as extreme as the observed Z-statistic if $H_0$ is true.

5. **Decision Rule**  
   - If $|Z| > Z_{\alpha/2}$ (the two-tailed critical value), reject $H_0$. For a one-tailed test, compare $Z$ with $Z_{\alpha}$ in the direction of $H_1$.  
   - If the **p-value** is less than $\alpha$, reject $H_0$.

6. **Conclusion**  
   - If $H_0$ is rejected, the sample provides sufficient evidence to support the alternative hypothesis.  
   - If $H_0$ is not rejected, there is insufficient evidence to support the alternative hypothesis.
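
The six steps above can be sketched as a small helper. This is a minimal illustration, not a reference implementation: the function name `one_sample_z_test` and the numbers in the usage lines are hypothetical, and it uses the standard library's `statistics.NormalDist` for the normal distribution so that it runs without extra packages.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample Z-test.

    Returns (z, critical value, p-value, reject H0?).
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    z = (sample_mean - mu0) / (sigma / sqrt(n))   # step 3: Z-statistic
    z_crit = std_normal.inv_cdf(1 - alpha / 2)    # step 4: critical value
    p_value = 2 * (1 - std_normal.cdf(abs(z)))    # step 4: two-tailed p-value
    reject = p_value < alpha                      # step 5: decision rule
    return z, z_crit, p_value, reject


# Hypothetical inputs, chosen only to exercise the function
z, z_crit, p_value, reject = one_sample_z_test(
    sample_mean=52, mu0=50, sigma=10, n=100
)
print(z, round(z_crit, 2), round(p_value, 4), reject)
```

For a two-tailed test, the critical-value rule and the p-value rule always give the same decision, so checking either one is enough.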

---

## Example

Suppose the average height of adults in a population is $\mu_0 = 170$ cm with a known standard deviation $\sigma = 5$ cm. A sample of $n = 50$ adults has an average height of $\bar{X} = 172$ cm. Test at the 5% significance level ($\alpha = 0.05$) whether the sample mean differs significantly from the population mean.

1. **Hypotheses**:  
   $$H_0: \mu = 170 \quad \text{vs} \quad H_1: \mu \neq 170$$  

2. **Calculate Z-statistic**:  
   $$Z = \frac{172 - 170}{5 / \sqrt{50}} = \frac{2}{0.707} \approx 2.83$$  

3. **Determine Critical Value**:  
   - At $\alpha = 0.05$, the critical value from the Z-table is $Z_{0.025} = 1.96$.  

4. **Decision**:  
   - Since $|2.83| > 1.96$, reject $H_0$.  

5. **Conclusion**:  
   - The sample provides sufficient evidence that the population mean height differs from 170 cm at the 5% significance level.
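
As a rough cross-check, the same example can be worked through the p-value route. The sketch below uses only the numbers stated in the example. The variable names are illustrative, and `statistics.NormalDist` stands in for a Z-table.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 170, 5        # hypothesized mean and known population sd (cm)
n, sample_mean = 50, 172   # sample size and sample mean (cm)
alpha = 0.05

z = (sample_mean - mu0) / (sigma / sqrt(n))
# Two-tailed p-value: probability of a |Z| at least this large under H0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The p-value falls well below 0.05, which agrees with the critical-value decision above.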

---

## Question 1:

**Scenario:**  
A teacher claims that the mean score of the students in his class is greater than 82, with a standard deviation of 20. A sample of 81 students was selected, with a mean score of 90. Test the teacher's claim at a 5% level of significance.

---

### Step 1: Set the Hypotheses  
- **Null Hypothesis $H_0$**: The mean score of the students is 82 or less.  
  $H_0: \mu \leq 82$

- **Alternative Hypothesis $H_1$**: The mean score of the students is greater than 82.  
  $H_1: \mu > 82$

This will be a **one-tailed test** since the teacher claims the mean score is greater than 82.

---

### Step 2: Calculate the Test Statistic  
Since the **population standard deviation $\sigma$** is known, we use the **z-test** for a single sample.

The formula for the **z-statistic** is:

$$
z = \frac{\bar{X} - \mu_0}{\frac{\sigma}{\sqrt{n}}}
$$

Where:  
- $\bar{X} = 90$: Sample mean  
- $\mu_0 = 82$: Claimed population mean  
- $\sigma = 20$: Population standard deviation  
- $n = 81$: Sample size  

Now, plug in the values:

$$
z = \frac{90 - 82}{\frac{20}{\sqrt{81}}}  
= \frac{8}{\frac{20}{9}}  
= \frac{8}{2.22} \approx 3.6
$$

---

### Step 3: Determine the Critical Value  
For a **one-tailed z-test** at a **5% level of significance** ($\alpha = 0.05$), the critical value is:

$$
z_{\alpha} = 1.645
$$

---

### Step 4: Conclusion  
- **Compare the z-statistic**: $z = 3.6$  
- **Critical value**: $z_{\alpha} = 1.645$

Since $z = 3.6$ is greater than the critical value of $1.645$, we **reject the null hypothesis**.

---

### Step 5: Final Conclusion  
At the 5% significance level, there is sufficient evidence to support the teacher's claim that the mean score of the students is greater than 82.

---

In [1]:
import scipy.stats as st
import numpy as np

In [2]:
# Given
sample_mean = 90
sample_size = 81
pop_mean = 82
std = 20

# Let
alpha = 0.05

In [3]:
# Calculating Z-Statistic
z_cal = (sample_mean - pop_mean)/(std/np.sqrt(sample_size))
z_cal


3.5999999999999996

In [4]:
# Z-Table value
z_table = st.norm.ppf(1-alpha)
z_table

1.6448536269514722

In [5]:
if z_cal > z_table:
    print('Reject H0')  # evidence supports the teacher's claim (Ha)
else:
    print('Fail to reject H0')

Reject H0
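
The same decision can be reached with a p-value instead of a critical value. The following is a minimal sketch that reuses the figures from Question 1. It is written with the standard library's `statistics.NormalDist` so it runs on its own, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, pop_mean = 90, 82
std, sample_size = 20, 81
alpha = 0.05

z_cal = (sample_mean - pop_mean) / (std / sqrt(sample_size))
# One-tailed (upper-tail) p-value, because H1 is mu > 82
p_value = 1 - NormalDist().cdf(z_cal)

print(f"z = {z_cal:.2f}, p-value = {p_value:.6f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Because the alternative hypothesis is directional, only the upper tail counts toward the p-value, so it is not doubled as in a two-tailed test.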


## Question 2:

**Scenario:**  
Imagine you work for an e-commerce company, and your team is responsible for analyzing customer purchase data. You want to determine whether a new website design has led to a significant increase in the average purchase amount compared to the old design.

---

### Step 1: Set the Hypotheses  
- **Null Hypothesis $H_0$**: There is no increase in the average purchase amount with the new design.  
  $H_0: \mu_{\text{new}} = \mu_{\text{old}}$

- **Alternative Hypothesis $H_1$**: The new design has led to a significant increase in the average purchase amount.  
  $H_1: \mu_{\text{new}} > \mu_{\text{old}}$

This will be a **one-tailed test** since we are testing for an increase in the average purchase amount.

---

### Step 2: Calculate the Sample Statistics  
- **Old design data (n = 30):**  
  $[45.2, 42.8, 38.9, 43.5, 41.0, 44.6, 40.5, 42.7, 39.8, 41.4, 44.3, 39.7, 42.1, 40.6, 43.0, 42.2, 41.5, 39.6, 44.0, 41.3, 38.7, 43.9, 42.8, 43.7, 41.3, 40.9, 41.9, 43.6, 42.5, 41.6]$

  - Mean $\bar{X}_{\text{old}} \approx 41.99$  
  - Sample standard deviation $s_{\text{old}} \approx 1.7$

- **New design data (n = 30):**  
  $[48.5, 49.1, 50.2, 47.8, 48.7, 49.9, 48.0, 50.5, 48.9, 49.6, 48.2, 48.9, 49.7, 50.3, 49.4, 50.1, 48.6, 48.3, 49.0, 50.0, 48.4, 49.3, 49.5, 48.8, 50.6, 50.4, 48.1, 49.2, 50.7, 50.8]$

  - Mean $\bar{X}_{\text{new}} \approx 49.32$  
  - Sample standard deviation $s_{\text{new}} \approx 0.9$
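
The summary statistics can be recomputed directly from the two lists. This is a small sketch using the standard library. The names `old_design` and `new_design` are illustrative, and `stdev` is the sample standard deviation (it divides by $n - 1$).

```python
from statistics import mean, stdev

old_design = [45.2, 42.8, 38.9, 43.5, 41.0, 44.6, 40.5, 42.7, 39.8, 41.4,
              44.3, 39.7, 42.1, 40.6, 43.0, 42.2, 41.5, 39.6, 44.0, 41.3,
              38.7, 43.9, 42.8, 43.7, 41.3, 40.9, 41.9, 43.6, 42.5, 41.6]

new_design = [48.5, 49.1, 50.2, 47.8, 48.7, 49.9, 48.0, 50.5, 48.9, 49.6,
              48.2, 48.9, 49.7, 50.3, 49.4, 50.1, 48.6, 48.3, 49.0, 50.0,
              48.4, 49.3, 49.5, 48.8, 50.6, 50.4, 48.1, 49.2, 50.7, 50.8]

for label, data in (("old", old_design), ("new", new_design)):
    # Report sample size, mean, and sample standard deviation per design
    print(f"{label}: n={len(data)}, mean={mean(data):.2f}, s={stdev(data):.2f}")
```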

---

### Step 3: Calculate the Test Statistic  
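Assuming both designs share a **known population standard deviation** ($\sigma = 2.5$, the value used in the calculation below), we use the **z-test** for two independent samples. The sample standard deviations from Step 2 are not used in this formula.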

The formula for the z-statistic is:

$$
z = \frac{\bar{X}_{\text{new}} - \bar{X}_{\text{old}}}{\sigma \sqrt{\frac{1}{n_{\text{new}}} + \frac{1}{n_{\text{old}}}}}
$$

Where:  
- $\bar{X}_{\text{new}} \approx 49.32$  
- $\bar{X}_{\text{old}} \approx 41.99$  
- $\sigma = 2.5$  
- $n_{\text{new}} = n_{\text{old}} = 30$

Now, plug in the values:

$$
z = \frac{49.32 - 41.99}{2.5 \sqrt{\frac{1}{30} + \frac{1}{30}}}
= \frac{7.33}{2.5 \sqrt{0.0667}}
= \frac{7.33}{2.5 \times 0.2582}
= \frac{7.33}{0.6455} \approx 11.36
$$

---

### Step 4: Determine the Critical Value  
For a **one-tailed z-test** at a **5% level of significance** ($\alpha = 0.05$), the critical value is:

$$
z_{\alpha} = 1.645
$$

---

### Step 5: Conclusion  
- **Compare the z-statistic**: $z \approx 11.36$  
- **Critical value**: $z_{\alpha} = 1.645$

Since $z \approx 11.36$ is greater than the critical value of $1.645$, we **reject the null hypothesis**.

---

### Step 6: Final Conclusion  
At the 5% significance level, there is sufficient evidence to conclude that the new website design has led to a significant increase in the average purchase amount compared to the old design.

---

In [6]:
# Given

old_design_data = np.array([45.2, 42.8, 38.9, 43.5, 41.0, 44.6, 40.5, 42.7, 39.8, 41.4, 44.3, 39.7, 
 42.1, 40.6, 43.0, 42.2, 41.5, 39.6, 44.0, 41.3, 38.7, 43.9, 42.8, 43.7, 
 41.3, 40.9, 41.9, 43.6, 42.5, 41.6])

new_design_data = np.array([48.5, 49.1, 50.2, 47.8, 48.7, 49.9, 48.0, 50.5, 48.9, 49.6, 48.2, 48.9, 
 49.7, 50.3, 49.4, 50.1, 48.6, 48.3, 49.0, 50.0, 48.4, 49.3, 49.5, 48.8, 
 50.6, 50.4, 48.1, 49.2, 50.7, 50.8])

std = 2.5

# Let
alpha = 0.05

In [7]:
new_mean = np.mean(new_design_data)
old_mean = np.mean(old_design_data)
sample_size = len(new_design_data)

old_mean, new_mean, sample_size

(41.986666666666665, 49.316666666666656, 30)

In [8]:
# Calculating Z-Statistic (assumes both designs share the known population std)
z_cal = (new_mean - old_mean)/(std*(np.sqrt(1/sample_size + 1/sample_size)))
z_cal

11.355587171080135

In [9]:
# Z-Table Value
z_table = st.norm.ppf(1-alpha)
z_table

1.6448536269514722

In [10]:
if z_cal > z_table:
    print('Reject H0')  # evidence that the new design increases the average purchase (Ha)
else:
    print('Fail to reject H0')

Reject H0
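
The two-sample calculation can also be wrapped in a helper that returns a p-value. This is a sketch under the same assumption as above (one known $\sigma$ shared by both designs). The function name `two_sample_z_upper` is hypothetical, and the means are the rounded values computed from the data. The upper tail is computed with `math.erfc`, because `1 - cdf(z)` rounds to exactly zero for a statistic this large.

```python
from math import erfc, sqrt


def two_sample_z_upper(mean_new, mean_old, sigma, n_new, n_old):
    """z and upper-tail p-value for H1: mu_new > mu_old, common known sigma."""
    se = sigma * sqrt(1 / n_new + 1 / n_old)  # standard error of the difference
    z = (mean_new - mean_old) / se
    p_value = 0.5 * erfc(z / sqrt(2))         # P(Z >= z) for a standard normal
    return z, p_value


alpha = 0.05
z, p_value = two_sample_z_upper(49.3167, 41.9867, sigma=2.5, n_new=30, n_old=30)
print(f"z = {z:.2f}, p-value = {p_value:.3g}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```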
