# Hypothesis Testing

Businesses make many decisions every day. Forming hypotheses and testing them with experiments is a more data-driven approach to decision-making than relying on guesswork.

---

## What is Hypothesis Testing?

- Hypothesis testing is a statistical method used to make decisions or inferences about population parameters based on sample data.  
- It helps to **evaluate claims**: the data either provide enough evidence to reject a claim or they do not. A test never proves a claim outright.  
- It provides a structured framework to define the problem and make data-centric decisions.  

---

### **Statistical Hypothesis**

A **statistical hypothesis** is a formal statement of the idea or assumption a researcher holds about the outcome before doing an experiment. Stating it up front provides a structured framework for testing and decision-making.

There are two types of hypotheses:

1. **Null Hypothesis (H₀)** – assumes no change or effect; old beliefs hold true.  
2. **Alternative Hypothesis (Hₐ)** – assumes there is a new effect, difference, or relationship.

---

### **Example 1: Census of Height**

Suppose we want to test whether the average height in a population differs from 160.

$$H_{0}: \mu = 160$$  
$$H_{a}: \mu \neq 160$$

---

### **Example 2: Fish Farm**

Suppose we want to test if the average weight of fish is greater than 2 kg.

$$H_{0}: \mu = 2$$  
$$H_{a}: \mu > 2$$

In general, any new claim or proposed change is defined in the **alternative hypothesis**.

---

## Types of Hypothesis Tests

### 1. **Two-Tailed Test**
Used when we are checking for any difference (increase or decrease).

$$H_{0}: \mu = 163$$  
$$H_{a}: \mu \neq 163$$

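Rejecting H₀ here only tells us that the mean differs from 163; determining the direction of the difference requires further investigation (for example, checking the sign of the test statistic).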

---

### 2. **One-Tailed Test**
Used when we expect the difference to be in a specific direction (greater or smaller).

$$H_{0}: \mu = 2$$  
$$H_{a}: \mu > 2$$

Here the researcher believes the true mean is greater than the hypothesized value, so only the upper tail is tested. A belief that it is smaller would use the lower tail instead.

---

## Interpreting Results

- If the **null hypothesis (H₀)** is rejected in favor of the **alternative hypothesis (Hₐ)**, we say the result is **statistically significant**.  
- This means a result at least as extreme as the one observed would be unlikely to occur by random chance alone if H₀ were true.

Example:  
A sample mean of 2.1 kg may be statistically significantly higher than 2 kg, yet a 0.1 kg difference is not necessarily **practically significant** for business use.

---

## Steps of Performing a Hypothesis Test

1. **State the hypotheses** – H₀ and Hₐ  
2. **Choose the significance level (α)** – usually 0.05  
3. **Select the appropriate statistical test** (z-test, t-test, etc.)  
4. **Collect the sample data**  
5. **Compute the test statistic and compare with the critical value**  
6. **Make a decision** – reject or fail to reject H₀  
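The six steps can be sketched in code using the fish farm example. This is a minimal illustration, not a prescribed workflow: the weights are hypothetical values made up for the example, a one-sample t-test is assumed because σ is unknown, and the `alternative` argument of `scipy.stats.ttest_1samp` requires SciPy 1.6 or later.

```python
from scipy import stats

# Step 1: state the hypotheses -> H0: mu = 2, Ha: mu > 2 (one-tailed, upper).
mu0 = 2.0

# Step 2: choose the significance level.
alpha = 0.05

# Step 3: select the test. Sigma is unknown here, so a one-sample t-test is assumed.

# Step 4: collect the sample (hypothetical fish weights in kg).
weights = [2.1, 2.3, 1.9, 2.4, 2.2, 2.0, 2.5, 2.1, 2.3, 2.2]

# Step 5: compute the test statistic and the one-tailed critical value.
result = stats.ttest_1samp(weights, popmean=mu0, alternative="greater")
t_crit = stats.t.ppf(1 - alpha, df=len(weights) - 1)

# Step 6: make a decision (the p-value rule and the critical-value rule agree).
decision = "Reject H0" if result.statistic > t_crit else "Fail to reject H0"

print(f"t = {result.statistic:.3f}, critical value = {t_crit:.3f}, p = {result.pvalue:.4f}")
print(decision)
```

With these made-up weights the sample mean is 2.2 kg and the test rejects H₀; different data could easily lead to the opposite decision.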

---

# Type I and Type II Errors

- **Type I Error (α):** Rejecting a true null hypothesis.  
- **Type II Error (β):** Failing to reject a false null hypothesis.  

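A researcher cannot commit both errors in the same test, because a Type I error requires H₀ to be true and a Type II error requires H₀ to be false.

- **α (alpha)** is the probability of a Type I error; this error can only occur when H₀ is rejected.  
- **β (beta)** is the probability of a Type II error; this error can only occur when H₀ is not rejected but is false.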

---

## Example of a Statistical Test (When Population Information is Known)

The **Z-test formula**:

$$z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}$$

Where:
- $\bar{x}$ = sample mean  
- $\mu$ = population mean (hypothesized)  
- $\sigma$ = population standard deviation  
- $n$ = sample size  

Example Hypothesis:

$$H_{0}: \mu = 170$$  
$$H_{a}: \mu \neq 170$$

This is a **two-tailed test**.

The critical z-value for a 0.05 significance level (two-tailed) is **±1.96**.

If the calculated z < -1.96 or z > +1.96 → **Reject H₀**
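The formula and decision rule above can be written as a small helper. This is a sketch using only the Python standard library; the function name `z_test_two_tailed` and the input numbers are hypothetical, chosen so that the statistic comes out near -2.45.

```python
import math
from statistics import NormalDist


def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test for a mean when the population sigma is known."""
    # Test statistic: distance of the sample mean from mu0 in standard errors.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Critical value: alpha is split across both tails (about 1.96 for alpha = 0.05).
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    reject = abs(z) > z_crit
    return z, z_crit, reject


# Hypothetical inputs, for illustration only.
z, z_crit, reject = z_test_two_tailed(sample_mean=167.55, mu0=170, sigma=10, n=100)
print(f"z = {z:.2f}, critical value = ±{z_crit:.2f}")
print("Reject H0" if reject else "Fail to reject H0")
```

Computing the critical value from `alpha` rather than hard-coding 1.96 keeps the helper usable at other significance levels.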

---

### Example Result:

- Observed z-value: -2.45  
- Critical value: ±1.96  
- Decision: **Reject H₀**

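Whether this decision is a Type I error depends on the true mean, which is normally unknown. If the true mean really is not equal to 170, rejecting H₀ was correct and we did **not** make a Type I error.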
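To see what α means in practice, one can simulate many experiments in which H₀ is actually true and count how often the two-tailed z-test rejects it anyway. The sketch below uses only the standard library; the population values (mean 170, σ = 10, n = 30) and the function name `type_i_error_rate` are assumptions made for the illustration.

```python
import math
import random
from statistics import NormalDist


def type_i_error_rate(mu0=170.0, sigma=10.0, n=30, alpha=0.05, trials=10_000, seed=0):
    """Fraction of simulated tests that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose true mean equals mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # every rejection here is a Type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rate should land close to the chosen α, which is the sense in which α is the Type I error probability.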

---

# p-value (Observed Significance Level)

Another approach to making a decision is using the **p-value**.

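The p-value is the probability, assuming H₀ is true, of observing a test statistic at least as extreme as the one obtained. Equivalently, it represents the smallest α level at which H₀ can be rejected.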

- If **p < 0.05** → strong evidence against H₀ → **Reject H₀**  
- If **p > 0.05** → weak evidence → **Fail to reject H₀**  
- If **p ≈ 0.05** → borderline evidence → decision uncertain  

Example:  
If z = 2.45 in a two-tailed test, p ≈ 0.014  
→ Reject H₀ at α = 0.05  
→ Fail to reject H₀ at α = 0.001  
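The p-value for a z statistic can be computed from the standard normal distribution. A minimal standard-library sketch follows; the helper name `z_p_values` is hypothetical.

```python
from statistics import NormalDist


def z_p_values(z):
    """One-tailed and two-tailed p-values for a z statistic."""
    # Area under the standard normal curve beyond |z| in a single tail.
    tail_area = 1 - NormalDist().cdf(abs(z))
    return {"one_tailed": tail_area, "two_tailed": 2 * tail_area}


p = z_p_values(2.45)
print(f"one-tailed p = {p['one_tailed']:.4f}")
print(f"two-tailed p = {p['two_tailed']:.4f}")

# Compare the two-tailed p-value against two significance levels.
for alpha in (0.05, 0.001):
    decision = "Reject H0" if p["two_tailed"] < alpha else "Fail to reject H0"
    print(f"alpha = {alpha}: {decision}")
```

The two-tailed value is twice the one-tailed value because both tails count as "at least as extreme".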

---

# t-test for Mean Estimation of Population

Used when population standard deviation (σ) is **unknown**.

$$t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}}$$

Where:  
- $\bar{x}$ = sample mean  
- $\mu$ = population mean (expected)  
- $s$ = sample standard deviation  
- $n$ = sample size  
- **Degrees of freedom (df)** = n - 1  

If **p < 0.05**, reject the null hypothesis.
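The t formula can be checked by computing it by hand and comparing the result with `scipy.stats.ttest_1samp`. The heights below are hypothetical values invented for the example, and `one_sample_t` is an illustrative helper name, not part of any library.

```python
import math
from scipy import stats


def one_sample_t(sample, mu0):
    """Two-tailed one-sample t-test computed directly from the formula."""
    n = len(sample)
    x_bar = sum(sample) / n
    # Sample standard deviation with n - 1 in the denominator.
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    t_stat = (x_bar - mu0) / (s / math.sqrt(n))
    df = n - 1
    # Two-tailed p-value from the t distribution with df degrees of freedom.
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, df, p_value


heights = [168, 172, 165, 171, 169, 166, 170, 167]  # hypothetical sample
t_stat, df, p_value = one_sample_t(heights, mu0=170)
check = stats.ttest_1samp(heights, popmean=170)

print(f"manual: t = {t_stat:.3f}, df = {df}, p = {p_value:.3f}")
print(f"scipy : t = {check.statistic:.3f}, p = {check.pvalue:.3f}")
```

In practice the library call is enough; the manual version is only there to show where the numbers come from.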

---

### Example Results:

| Test | Hypothesized Mean | t-Statistic | p-Value | Decision |
|------|--------------------|-------------|----------|-----------|
| t-test 1 | 170 | -2.26 | 0.026 | Reject H₀ |
| t-test 2 | 168 | -0.68 | 0.49 | Fail to reject H₀ |
| t-test 3 | 169 | -1.47 | 0.14 | Fail to reject H₀ |

---

### Summary:

- **Reject H₀** → p < 0.05 (statistically significant)  
- **Fail to Reject H₀** → p > 0.05 (not significant)  

---

 **Conclusion:**
Hypothesis testing allows data-driven decisions by providing a structured way to test claims.  
However, statistical significance should always be interpreted with **business context and practical importance** in mind.



## Now let's make things easy for us devs by writing a function that calculates everything for us

Import Libraries

In [2]:
import numpy as np
import math
from scipy import stats
import pandas as pd

# ==========================
# Function: compare_two_means
# Purpose: Compare the means of two independent samples
# ==========================
def compare_two_means(sample1, sample2, label1="Group 1", label2="Group 2", alpha=0.05, assume_equal_var=False):
    """
    Performs a complete comparison between two independent samples including:
    - Descriptive statistics
    - Variance equality check (Levene's test)
    - Appropriate t-test (Student's or Welch's)
    - Confidence interval of mean difference
    - Effect size (Cohen's d)
    - Final interpretation summary
    """

    # --------------------------
    # Step 0: Header
    # --------------------------
    print("="*70)
    print(f"Comparing Means between {label1} and {label2}")
    print("="*70)

    # --------------------------
    # Step 1: Hypotheses
    # --------------------------
    print("\nHypotheses:")
    print(f"H0: The means of {label1} and {label2} are equal.")
    print(f"H1: The means of {label1} and {label2} are not equal.")     

    # --------------------------
    # Step 2: Descriptive Statistics
    # --------------------------
    mean1, mean2 = np.mean(sample1), np.mean(sample2)
    std1, std2 = np.std(sample1, ddof=1), np.std(sample2, ddof=1)
    var1, var2 = np.var(sample1, ddof=1), np.var(sample2, ddof=1)
    n1, n2 = len(sample1), len(sample2)

    print("\nStep 2: Sample Summary")
    print(f"{label1}: n={n1}, mean={mean1:.2f}, std={std1:.2f}")
    print(f"{label2}: n={n2}, mean={mean2:.2f}, std={std2:.2f}\n")

    # --------------------------
    # Step 3: Variance Check (Levene’s Test)
    # --------------------------
    print("Step 3: Variance Check")
    lev_stat, lev_p = stats.levene(sample1, sample2)
    print(f"Levene’s test p-value: {lev_p:.4f} → {'Equal variances' if lev_p > alpha else 'Unequal variances'}")

    # assume_equal_var=True forces Student's t-test; otherwise Levene's result decides.
    equal_var = assume_equal_var if assume_equal_var else (lev_p > alpha)
    print(f"→ Proceeding with {'Student’s t-test (equal variances)' if equal_var else 'Welch’s t-test (unequal variances)'}\n")

    # --------------------------
    # Step 4: Perform T-Test
    # --------------------------
    t_stat, p_val = stats.ttest_ind(sample1, sample2, equal_var=equal_var)
    print("Step 4: T-Test")
    print(f"t-statistic = {t_stat:.3f}, p-value = {p_val:.4f}")
    print(f"→ {'Reject H0' if p_val < alpha else 'Fail to reject H0'}\n")

    # --------------------------
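    # Step 5: Confidence Interval at the (1 - alpha) level (95% when alpha=0.05)
    # --------------------------
    ci_level = (1 - alpha) * 100
    print(f"Step 5: Confidence Interval ({ci_level:.0f}%)")
    mean_diff = mean1 - mean2
    # Note: this is the unpooled (Welch) standard error and it is used for both
    # branches, so with equal variances it only approximates the pooled
    # standard error that stats.ttest_ind uses in Step 4.
    se = math.sqrt(var1/n1 + var2/n2)
    # Degrees of freedom: n1 + n2 - 2 for Student's test, Welch-Satterthwaite otherwise.
    df = n1 + n2 - 2 if equal_var else ( (var1/n1 + var2/n2)**2 ) / \
        ((var1/n1)**2/(n1-1) + (var2/n2)**2/(n2-1))
    t_crit = stats.t.ppf(1 - alpha/2, df=df)
    lower = mean_diff - t_crit * se
    upper = mean_diff + t_crit * se
    print(f"Mean difference = {mean_diff:.3f}")
    print(f"{ci_level:.0f}% CI: ({lower:.3f}, {upper:.3f})")
    print(f"→ 0 {'is NOT' if lower>0 or upper<0 else 'IS'} within the interval → "
          f"{'Significant difference' if lower>0 or upper<0 else 'No significant difference'}\n")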

    # --------------------------
    # Step 6: Effect Size (Cohen’s d)
    # --------------------------
    print("Step 6: Effect Size (Cohen’s d)")
    pooled_std = math.sqrt(((n1 - 1)*std1**2 + (n2 - 1)*std2**2) / (n1 + n2 - 2))
    cohen_d = mean_diff / pooled_std
    effect_strength = ("Very small" if abs(cohen_d) < 0.2 else
                       "Small" if abs(cohen_d) < 0.5 else
                       "Medium" if abs(cohen_d) < 0.8 else "Large")
    print(f"Cohen’s d = {cohen_d:.3f} → {effect_strength} effect\n")

    # --------------------------
    # Step 7: Final Interpretation
    # --------------------------
    print("Step 7: Interpretation Summary")
    print("-"*70)
    if p_val < alpha:
        print(f"We reject the null hypothesis. There IS a statistically significant difference between {label1} and {label2}.")
    else:
        print(f"We fail to reject the null hypothesis. There is NO statistically significant difference between {label1} and {label2}.")
    print(f"Effect Size: {effect_strength} ({cohen_d:.3f})")
    print(f"Confidence Interval: ({lower:.3f}, {upper:.3f})")
    print("="*70)


In [5]:
# Load data from the Excel workbook
data = pd.read_excel("../Datasets/Sample - Superstore.xls")

# Extract sample values
appSalesData = data[data['Sub-Category'] == 'Appliances']['Sales'].values
accSalesData = data[data['Sub-Category'] == 'Accessories']['Sales'].values

# Compare means
compare_two_means(appSalesData, accSalesData, label1="Appliances", label2="Accessories", alpha=0.05)


Comparing Means between Appliances and Accessories

Hypotheses:
H0: The means of Appliances and Accessories are equal.
H1: The means of Appliances and Accessories are not equal.

Step 2: Sample Summary
Appliances: n=466, mean=230.76, std=388.95
Accessories: n=775, mean=215.97, std=334.97

Step 3: Variance Check
Levene’s test p-value: 0.1525 → Equal variances
→ Proceeding with Student’s t-test (equal variances)

Step 4: T-Test
t-statistic = 0.708, p-value = 0.4791
→ Fail to reject H0

Step 5: Confidence Interval (95%)
Mean difference = 14.781
95% CI: (-27.725, 57.287)
→ 0 IS within the interval → No significant difference

Step 6: Effect Size (Cohen’s d)
Cohen’s d = 0.041 → Very small effect

Step 7: Interpretation Summary
----------------------------------------------------------------------
We fail to reject the null hypothesis. There is NO statistically significant difference between Appliances and Accessories.
Effect Size: Very small (0.041)
Confidence Interval: (-27.725, 57.287)
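In `compare_two_means`, Step 5 computes the standard error as `sqrt(var1/n1 + var2/n2)` in both branches, while Student's t-test in Step 4 uses a pooled standard error. The sketch below shows one possible way to build an interval that always agrees with the test decision. The function name `mean_diff_ci` and the synthetic samples are assumptions for illustration, not part of the notebook above.

```python
import numpy as np
from scipy import stats


def mean_diff_ci(sample1, sample2, alpha=0.05, equal_var=True):
    """Confidence interval for mean1 - mean2 that matches the chosen t-test."""
    x1 = np.asarray(sample1, dtype=float)
    x2 = np.asarray(sample2, dtype=float)
    n1, n2 = len(x1), len(x2)
    v1, v2 = x1.var(ddof=1), x2.var(ddof=1)

    if equal_var:
        # Student's t-test: pooled variance and n1 + n2 - 2 degrees of freedom.
        pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Welch's t-test: unpooled SE and Welch-Satterthwaite degrees of freedom.
        se = np.sqrt(v1 / n1 + v2 / n2)
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )

    diff = x1.mean() - x2.mean()
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return diff - t_crit * se, diff + t_crit * se


# Synthetic data, generated only to exercise the function.
rng = np.random.default_rng(0)
a = rng.normal(10, 2, size=40)
b = rng.normal(11, 3, size=25)
for ev in (True, False):
    lower, upper = mean_diff_ci(a, b, equal_var=ev)
    p_val = stats.ttest_ind(a, b, equal_var=ev).pvalue
    print(f"equal_var={ev}: CI = ({lower:.3f}, {upper:.3f}), p = {p_val:.4f}")
```

Because the interval and the test share the same standard error and degrees of freedom, the interval excludes 0 exactly when the p-value falls below `alpha`.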
