<a href="https://colab.research.google.com/github/swopnimghimire-123123/Maths_For_ML/blob/main/08_Inferential_Stat_%26_Hypothesis_Testing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Colab 8: Inferential Statistics & Hypothesis Testing Basics
`Learning Goals`

- Understand inferential statistics (sample → population).

- Learn about hypotheses (null vs alternative).

- Understand errors (Type I & II).

- Learn about significance level (α) and p-values.

- Build intuition for when/why hypothesis testing is used.

### Inferential Statistics & Hypothesis Testing Basics

**1. Inferential Statistics**

*   **Essence:** Using a sample to draw conclusions about a population.
*   **Use Case:** We can’t test everyone, so we test a sample.
*   **Example:** Surveying 100 students to estimate average study time for the entire school.
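
The survey example can be sketched in a few lines. This is a minimal illustration, not real survey data: the population of 2,000 students, the mean of 3 study hours, and the random seed are all assumptions made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: daily study hours for 2,000 students (simulated).
population = rng.normal(loc=3.0, scale=1.0, size=2000)

# Survey a random sample of 100 students, without replacement.
sample = rng.choice(population, size=100, replace=False)

sample_mean = sample.mean()
# Standard error of the mean: sample standard deviation / sqrt(n).
standard_error = sample.std(ddof=1) / np.sqrt(len(sample))

print("Population mean (normally unknown):", round(population.mean(), 2))
print("Sample mean (our estimate):", round(sample_mean, 2))
print("Standard error of the estimate:", round(standard_error, 2))
```

In practice the population mean is not available; the standard error indicates roughly how far the sample mean is likely to be from it.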

**2. Null vs Alternative Hypothesis**

*   **Null (H₀):** “No effect” / “No difference.”
*   **Alternative (H₁):** “There is an effect.”
*   **Example:**
    *   H₀: “Studying has no effect on exam scores.”
    *   H₁: “Studying increases exam scores.”
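
The wording of H₁ decides whether a test is one-sided or two-sided. The sketch below assumes SciPy 1.6 or later, where `ttest_1samp` accepts an `alternative` argument; the scores and the reference mean of 70 are hypothetical values chosen for illustration.

```python
from scipy import stats

# Hypothetical exam scores (illustrative only).
scores = [68, 74, 71, 77, 73, 70, 75, 72]

# H1: "the mean differs from 70" (two-sided)
t_two, p_two_sided = stats.ttest_1samp(scores, 70, alternative="two-sided")

# H1: "the mean is greater than 70" (one-sided, like "studying increases scores")
t_greater, p_greater = stats.ttest_1samp(scores, 70, alternative="greater")

print("two-sided p-value:", p_two_sided)
print("one-sided p-value:", p_greater)
```

The test statistic is the same in both cases; only the p-value changes, because the one-sided test counts evidence in a single direction.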

**3. Errors**

*   **Type I Error (False Positive):** Rejecting H₀ when it’s actually true.
    *   **Example:** Saying a medicine works when it doesn’t.
*   **Type II Error (False Negative):** Not rejecting H₀ when it’s false.
    *   **Example:** Failing to detect that the medicine really works.

**4. Significance Level (α)**

*   The cutoff for deciding whether to reject H₀.
*   Common choice: α = 0.05 → we accept a 5% chance of rejecting H₀ when it is actually true (a Type I error).
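
To see how α relates to the two error types, one can simulate many experiments and count how often a t-test rejects H₀. This is only a sketch under assumed settings: normally distributed data, 20 observations per experiment, 2,000 repetitions, and a true effect of 0.5 for the case where H₀ is false. The helper `rejection_rate` is a hypothetical name introduced for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
n_trials, n = 2000, 20  # assumed simulation settings

def rejection_rate(true_mean):
    """Fraction of simulated samples in which H0: mean = 0 is rejected."""
    samples = rng.normal(loc=true_mean, scale=1.0, size=(n_trials, n))
    _, p_values = stats.ttest_1samp(samples, 0.0, axis=1)
    return np.mean(p_values < alpha)

# H0 is true (mean really is 0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false (mean is 0.5): every non-rejection is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=0.5)

print("Estimated Type I error rate:", type_1_rate)
print("Estimated Type II error rate:", type_2_rate)
```

The Type I rate should land close to α, while the Type II rate depends on the effect size and the sample size rather than on α alone.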

**5. p-value**

*   Probability of observing data at least as extreme as ours, assuming H₀ is true. It is not the probability that H₀ itself is true.
*   If p-value < α → reject H₀.
*   **Example:** p = 0.02 < 0.05 → evidence against H₀.
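
A p-value can also be computed directly from the distribution that H₀ implies. The sketch below uses a hypothetical observation of 60 heads in 100 coin flips and asks how likely a result at least that extreme is for a fair coin; the numbers are illustrative, not taken from any dataset.

```python
from scipy import stats

n_flips, n_heads = 100, 60  # hypothetical observation

# Under H0 (fair coin), the number of heads follows Binomial(n=100, p=0.5).
# One-sided p-value: P(X >= 60). Since sf(k) = P(X > k), we pass k = 59.
p_value = stats.binom.sf(n_heads - 1, n_flips, 0.5)

# The same value from SciPy's exact binomial test (SciPy 1.7+).
p_check = stats.binomtest(n_heads, n=n_flips, p=0.5, alternative="greater").pvalue

print("p-value:", p_value)
print("Reject H0 at alpha = 0.05:", p_value < 0.05)
```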

**6. General flow of hypothesis testing:**

*   State H₀ and H₁.
*   Choose significance level α.
*   Collect sample data.
*   Calculate a test statistic (z, t, etc.).
*   Find p-value.
*   Compare p-value to α → decide.
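
The steps above can be sketched by hand for a one-sample t-test. This uses the same exam scores as the worked example in the next section and computes the statistic manually instead of calling `ttest_1samp`; treat it as an illustration of the flow rather than a recommended implementation.

```python
import numpy as np
from scipy import stats

# Steps 1-2: H0: mean = 70, H1: mean > 70, significance level alpha = 0.05
mu_0, alpha = 70, 0.05

# Step 3: sample data
scores = np.array([72, 75, 78, 80, 85, 90, 88])

# Step 4: test statistic t = (x_bar - mu_0) / (s / sqrt(n))
n = len(scores)
t_stat = (scores.mean() - mu_0) / (scores.std(ddof=1) / np.sqrt(n))

# Step 5: one-sided p-value from the t distribution with n - 1 degrees of freedom
p_value = stats.t.sf(t_stat, df=n - 1)

# Step 6: compare the p-value to alpha and decide
decision = "reject H0" if p_value < alpha else "fail to reject H0"

print("t-statistic:", t_stat)
print("p-value:", p_value)
print("Decision:", decision)
```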

**7. Let's test: Do students who study more hours score higher than 70 on average?**

In [None]:
import numpy as np
from scipy import stats

# Sample: exam scores of students who studied a lot
scores = [72, 75, 78, 80, 85, 90, 88]

# Null hypothesis H0: mean = 70
# Alternative hypothesis H1: mean > 70

t_stat, p_value = stats.ttest_1samp(scores, 70)

print("t-statistic:", t_stat)
# One-tailed p-value: halving the two-sided value is valid here
# because t_stat > 0, the direction H1 predicts.
print("p-value:", p_value / 2)

### If p-value < 0.05 → reject H₀ → evidence that the mean score of these students is above 70.

t-statistic: 4.371732137816399
p-value: 0.002355091980694754


### Practice Problems

1.  **Problem 1:**
    Take a sample of exam scores [60, 65, 70, 72, 68, 74].
    Test if the mean is different from 65.

2.  **Problem 2:**
    Simulate 30 coin flips and test if the probability of heads is 0.5. (Hint: Binomial test)

In [None]:
# Problem 1
import numpy as np
from scipy import stats

# Sample
score = [60, 65, 70, 72, 68, 74]

t_stat, p_value = stats.ttest_1samp(score, 65)
print("t-statistic:", t_stat)
print("p-value:", p_value)

# If p-value < 0.05 → reject H₀ → the mean is significantly
# different from 65.
# Otherwise, fail to reject H₀: the data are consistent with a mean of 65.

t-statistic: 1.5280897273222351
p-value: 0.18702919929376058


In [None]:
# Problem 2
import numpy as np
from scipy.stats import binomtest

# Simulating 30 coin flips ( 0 = tails , 1 = heads)
np.random.seed(42)
flips = np.random.choice([0,1],size=30,p=[0.5,0.5])
heads_count = np.sum(flips)
print("Coin flips:",flips)
print("Number of heads:",heads_count)
print("Number of tails:",30-heads_count)

# perform binomial test
test_result = binomtest(heads_count, n=30, p=0.5)


# print result
print("\nBinomial test result:")
print("p-value:", test_result.pvalue)
if test_result.pvalue < 0.05:
    print("Reject null hypothesis: Coin might not be fair")
else:
    print("Fail to reject null hypothesis: Coin seems fair")

Coin flips: [0 1 1 1 0 0 0 1 1 1 0 1 1 0 0 0 0 1 0 0 1 0 0 0 0 1 0 1 1 0]
Number of heads: 13
Number of tails: 17

Binomial test result:
p-value: 0.5846647117286922
Fail to reject null hypothesis: Coin seems fair


### Conclusion

*   Inferential statistics lets us generalize from sample → population.
*   Hypothesis testing is a structured way to decide using data.
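*   The null hypothesis is the default; p-values and α guide the decision to reject it or not.
*   Errors are always possible, but we control their rates: α sets the Type I error rate, and larger samples reduce the Type II error rate.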
*   In ML, hypothesis testing connects to feature importance, A/B testing, model evaluation.