

### Hypothesis Testing: Making Decisions with Data

At its core, **Inferential Statistics** is the science of using data from a small **sample** to draw a **conclusion (or inference)** about a much larger **population**. Hypothesis testing is the formal, structured procedure we use to make these data-driven decisions.

Think of it as the scientific method applied to data. You start with a claim or an assumption, collect evidence, and then decide whether the evidence is strong enough to support or reject that claim.

### The 4-Step Hypothesis Testing Mechanism

The entire process can be understood through a legal analogy: a person is accused of a crime and must face trial in court.

#### Step 1: State the Null Hypothesis (H₀)

The **Null Hypothesis (H₀)** is the default assumption, the status quo, or the statement of "no effect" or "no difference." It is the claim that we initially assume to be true before we look at any evidence.

*   **In the Courtroom:** The defendant is **"presumed innocent until proven guilty."** The null hypothesis is `H₀: The person is not guilty`. This is the starting assumption of the court.
*   **In Statistics:** It's the baseline claim we are testing against. It almost always contains a statement of equality (=, ≤, or ≥).

**Example:**
A new college opens in a district where the average pass percentage is 85%.
*   **H₀: The new college's pass percentage is the same as the district average (μ = 85%).**
We start by assuming the new college is not special or different.

#### Step 2: State the Alternate Hypothesis (H₁)

The **Alternate Hypothesis (H₁ or Hₐ)** is the "opposite" of the null hypothesis. It is the claim the researcher is actually trying to find evidence for. It represents a new theory or a claim that there *is* an effect or a difference.

*   **In the Courtroom:** The prosecutor's claim is `H₁: The person is guilty`. They must present evidence to convince the jury to reject the initial assumption of innocence.
*   **In Statistics:** This is what we will conclude if our evidence is strong enough. It contains a statement of inequality (≠, <, or >).

**Example:**
We want to see if the new college has a *different* pass percentage.
*   **H₁: The new college's pass percentage is not the same as the district average (μ ≠ 85%).**
This is the claim we are testing.

#### Step 3: Gather Evidence (Experiments & Statistical Analysis)

This is the phase where we collect data and analyze it to see how well it supports the alternate hypothesis.

*   **In the Courtroom:** The prosecutor presents evidence to the jury, such as **DNA samples, fingerprint tests, or witness testimonies**.
*   **In Statistics:** We collect a **sample** and calculate **test statistics** from the data. The goal is to determine the probability of seeing our sample results if the null hypothesis were actually true. This probability is the famous **p-value**.

**Key outputs of this step:**
*   **Test Statistic:** A number calculated from the sample data (e.g., a z-score or t-score) that measures how far our sample result is from the null hypothesis.
*   **p-value:** The probability of obtaining our sample results (or more extreme results) purely by random chance, *assuming the null hypothesis is true*. A small p-value means our result is very surprising and unlikely to have happened by chance if H₀ is true.

#### Step 4: Make a Decision (Reject or Fail to Reject the Null Hypothesis)

This is the "verdict" stage. We compare the strength of our evidence (the p-value) to a pre-determined threshold called the **significance level (α, alpha)**, which is often set at 0.05 (or 5%).

There are two possible outcomes:

1.  **Reject the Null Hypothesis (H₀):**
    *   **Condition:** If the p-value is very small (typically p ≤ α).
    *   **Meaning:** The evidence from our sample is so strong that it's highly unlikely to have occurred by random chance if H₀ were true. Therefore, we reject the null hypothesis in favor of the alternate hypothesis.
    *   **Courtroom Analogy:** The evidence (DNA, etc.) is overwhelming. The jury rejects the assumption of innocence and delivers a **"Guilty"** verdict.

2.  **Fail to Reject the Null Hypothesis (H₀):**
    *   **Condition:** If the p-value is not small enough (p > α).
    *   **Meaning:** The evidence from our sample is not strong enough. The results could have plausibly happened by random chance even if H₀ were true. We don't have enough evidence to say H₀ is false.
    *   **Courtroom Analogy:** The evidence is weak or inconclusive. The jury cannot reject the assumption of innocence. The verdict is **"Not Guilty."**

**Crucial Point:** Notice we say "fail to reject H₀" not "accept H₀." Just like a "not guilty" verdict doesn't prove the person is innocent—it just means there wasn't enough evidence to convict—failing to reject the null hypothesis doesn't prove it's true. It simply means our sample didn't provide sufficient evidence to overturn it.



### The P-value: The Measure of Surprise

In hypothesis testing, we start by assuming the **null hypothesis (H₀)** is true. We then collect data and ask ourselves: "If the null hypothesis were really true, how surprising is our data?" The **p-value** is the number that quantifies this surprise.

**Formal Definition:**
> The p-value is the probability of observing your sample results (or results even more extreme) purely by random chance, **assuming the null hypothesis is true**.

*   A **small p-value** (e.g., ≤ 0.05) means your observed data is very surprising and unlikely to have happened by chance if H₀ were true. This is strong evidence *against* the null hypothesis.
*   A **large p-value** (e.g., > 0.05) means your observed data is not surprising. It's something that could plausibly happen by chance if H₀ were true. This is weak evidence *against* the null hypothesis.

---

### The 5-Step Hypothesis Test with the P-value (Coin Example)

Let's walk through a detailed example: we want to determine if a coin is fair.

#### Step 1: State the Null Hypothesis (H₀)
We start with the default assumption of "no effect" or "no difference."
*   **H₀: The coin is fair.** (The probability of heads, P(H), is 0.5).

#### Step 2: State the Alternate Hypothesis (H₁)
This is the claim we are looking for evidence to support.
*   **H₁: The coin is not fair.** (P(H) is not 0.5).

#### Step 3: Conduct the Experiment
We need to gather evidence.
*   **Experiment:** Flip the coin 100 times and count the number of heads.

#### Step 4: Set the Significance Level (α) and Understand the Distribution
This is the crucial step where the p-value comes into play.

*   **Significance Level (α):** This is our "threshold of surprise." Before we even run the experiment, we decide on a cutoff point. A common choice is **α = 0.05 (or 5%)**. This means we are willing to accept a 5% risk of incorrectly rejecting a true null hypothesis.

*   **The Distribution Graph:** The bell curve represents the **sampling distribution** of all possible outcomes of flipping a *fair* coin 100 times.
    *   **Center (50 heads):** The most likely outcome for a fair coin is 50 heads.
    *   **The Central 95% Region:** This is the central area of the graph (often loosely labeled the "95% confidence interval"). If the coin is fair, 95% of the time our result will fall within this range. These are the "not surprising" outcomes.
    *   **The Rejection Regions (The Tails):** The two extreme ends of the distribution make up the remaining 5%. Since our α = 0.05 and we are testing for "not equal to" (a two-tailed test), we split this into 2.5% on each side. If our result lands in one of these tails, it's considered a **statistically significant** (i.e., very surprising) result.
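The boundaries of the central 95% region can be estimated with a short calculation. The sketch below assumes the normal (bell curve) approximation to the number of heads in 100 fair flips; the variable names are chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist

n_flips = 100   # number of flips, from the example
p_fair = 0.5    # P(heads) under H0
alpha = 0.05    # significance level

# Mean and standard deviation of the number of heads if the coin is fair
mean_heads = n_flips * p_fair
sd_heads = sqrt(n_flips * p_fair * (1 - p_fair))

# Two-tailed test: alpha / 2 goes into each tail
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

lower_bound = mean_heads - z_crit * sd_heads
upper_bound = mean_heads + z_crit * sd_heads

print(f"Not surprising for a fair coin: {lower_bound:.1f} to {upper_bound:.1f} heads")
```

Under this approximation, head counts of roughly 40 or fewer, or 60 or more, land in the rejection regions.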

#### Step 5: Calculate the P-value and Make a Conclusion
Now we look at our experimental result and see where it falls. Let's imagine we ran the experiment and got **70 heads**.

1.  **Locate the result:** We find 70 on the graph. It's far out in the right-hand rejection region.
2.  **Calculate the p-value:** We calculate the probability of getting a result *at least as extreme* as 70 heads, assuming the coin is fair. "As extreme" means 70 or more heads, OR its mirror image, 30 or fewer heads. The p-value is the total area of both of these tails combined. For this result, the p-value is far below **0.001**, because 70 heads lies four standard deviations above the expected 50.
3.  **Compare p-value to α:** This is the final decision rule.
    *   `p-value (< 0.001) ≤ α (0.05)`

**The Conclusion:**
Since our p-value is less than our significance level, we **Reject the Null Hypothesis (H₀)**.

**Interpretation:**
"Assuming the coin was fair, the probability of getting a result as extreme as 70 heads by pure random chance is less than 0.1% (p < 0.001). This is so unlikely that we reject our initial assumption. The evidence strongly suggests that the coin is not fair."

If we had gotten 54 heads, the p-value would have been large (p ≈ 0.48). Since 0.48 > 0.05, we would **Fail to Reject the Null Hypothesis**, because a result of 54 heads is not surprising for a fair coin.
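The two-tailed p-values in this walkthrough can be checked with an exact binomial calculation. The following is a minimal sketch using only Python's standard library. The helper name `two_sided_binomial_p` is an assumption made for illustration, and the exact figures it produces may differ from rounded or illustrative values quoted in the text.

```python
from math import comb

def two_sided_binomial_p(heads, n=100):
    """Two-sided p-value for a fair coin.

    Adds up the probability of every outcome that is at least as far
    from n / 2 as the observed number of heads.
    """
    distance = abs(heads - n / 2)
    extreme_outcomes = sum(
        comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= distance
    )
    # Each of the 2**n flip sequences is equally likely under H0
    return extreme_outcomes / 2 ** n

alpha = 0.05
p_70 = two_sided_binomial_p(70)
p_54 = two_sided_binomial_p(54)

print(f"70 heads: p = {p_70:.6f}, reject H0: {p_70 <= alpha}")
print(f"54 heads: p = {p_54:.4f}, reject H0: {p_54 <= alpha}")
```

With 70 heads the p-value is well below 0.001, so H₀ is rejected; with 54 heads it is roughly 0.48, so H₀ is not rejected.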



### The Student's t-Distribution: Handling the Unknown

In statistical analysis, the Z-test is a powerful tool for hypothesis testing. However, it relies on a critical piece of information that is often unavailable in real-world scenarios: the **population standard deviation (σ)**.

The fundamental question then becomes:
> "How do we perform a reliable analysis when we don't know the population's true standard deviation?"

The answer is the **Student's t-distribution**.

#### When to Use the t-distribution Instead of the Z-distribution

You should use the t-distribution (and a **t-test**) under the following conditions:
1.  The **population standard deviation (σ) is unknown**.
2.  The **sample size is small (typically n < 30)**.
3.  The population from which the sample is drawn is assumed to be approximately normally distributed.

Instead of the unknown `σ`, we use the **sample standard deviation (s)** as an estimate.

#### The t-statistic vs. the Z-statistic

The formulas for the two statistics look very similar, but their underlying distributions are different.

*   **Z-statistic:**
    `Z = (x̄ - μ) / (σ / √n)`
    This formula uses the *true population parameter* `σ`. The Z-statistic follows a standard normal distribution (Z-distribution).

*   **t-statistic:**
    `t = (x̄ - μ) / (s / √n)`
    This formula uses the *sample statistic* `s` as an estimate for `σ`. The t-statistic follows a Student's t-distribution.

The key difference is that `s` (the sample standard deviation) will vary from sample to sample, while `σ` is a fixed constant for the population. This extra variability introduced by estimating the standard deviation is what makes the t-distribution different from the Z-distribution.
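To make the t-statistic formula concrete, here is a small sketch that computes `x̄`, `s`, and `t` from raw data. The five height values and the hypothesized mean of 168 are hypothetical numbers invented for illustration; `statistics.stdev` is the sample standard deviation (it divides by `n - 1`).

```python
from math import sqrt
from statistics import mean, stdev

sample = [170, 172, 168, 171, 169]  # hypothetical heights in cm
mu_0 = 168                          # hypothesized population mean (assumed)

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)  # sample standard deviation, our estimate of sigma

# t = (x_bar - mu) / (s / sqrt(n))
t_stat = (x_bar - mu_0) / (s / sqrt(n))

print(f"x_bar = {x_bar}, s = {s:.4f}, t = {t_stat:.4f}")
```

A different sample from the same population would give a different `s`, which is exactly the extra variability the t-distribution accounts for.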

#### Characteristics of the t-distribution

The t-distribution is a family of distributions that shares some similarities with the normal distribution but has one key difference:
*   **Shape:** It is symmetric and bell-shaped, just like the normal distribution.
*   **Tails:** The t-distribution has **"fatter" or "heavier" tails** than the normal distribution. This means it assigns more probability to extreme values. This accounts for the extra uncertainty we have because we are using `s` to estimate `σ`.
*   **Degrees of Freedom (df):** The exact shape of the t-distribution is determined by a parameter called the **degrees of freedom (df)**.

As the **degrees of freedom increase** (which happens as the sample size `n` gets larger), the t-distribution gets closer and closer to the standard normal distribution. By the time `n` is over 30, the two are very close: the two-tailed 5% critical value is about 2.04 for the t-distribution versus 1.96 for the normal.

#### Understanding Degrees of Freedom (df)

Degrees of freedom represent the number of independent pieces of information available to estimate a parameter. For a one-sample t-test, the formula is:

**df = n - 1**

where `n` is the sample size.

**Intuitive Explanation:**
Imagine you have a sample of 3 people (`n=3`), and you know the average height of these 3 people is 170 cm.
*   The first person's height can be anything—it's free to vary.
*   The second person's height can also be anything—it's free to vary.
*   However, once you know the heights of the first two people, the height of the **third person is fixed**. It must be a specific value to make the average of all three equal to 170 cm.

So, in a sample of 3, only 2 values are "free" to vary. Therefore, the degrees of freedom is `3 - 1 = 2`.

When performing a t-test, you will use a **t-table**, which requires you to know both your desired significance level (alpha) and the degrees of freedom to find the critical t-value for your hypothesis test.
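A t-table lookup can also be approximated numerically. The sketch below is one possible approach using only the standard library: it integrates the t-distribution's density with Simpson's rule and finds the critical value by bisection. The function names `t_cdf` and `t_critical` are hypothetical helpers, not part of any library, and the search range of 0 to 100 is an assumption that suits common values of alpha and df.

```python
from math import exp, lgamma, pi, sqrt

def t_cdf(x, df, steps=1000):
    """Approximate P(T <= x) for Student's t with df degrees of freedom."""
    # Normalizing constant of the t density (lgamma avoids overflow for large df)
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    if x == 0:
        return 0.5
    # Simpson's rule on [0, x]; steps must be even
    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # The distribution is symmetric, so half the area lies below 0
    return 0.5 + total * h / 3

def t_critical(alpha, df):
    """Two-tailed critical value: the t with area alpha / 2 in the upper tail."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0  # assumed search range
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

critical_values = {df: t_critical(0.05, df) for df in (2, 10, 29, 1000)}
for df, value in critical_values.items():
    print(f"df = {df:4d}: critical t = {value:.3f}")
```

The critical value shrinks toward the normal distribution's 1.96 as the degrees of freedom grow, which reflects the thinner tails at larger sample sizes.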


### Hypothesis Testing & Statistical Analysis: An Overview

Hypothesis testing is the formal procedure for using sample data to draw conclusions about a population. The goal is to determine if there is enough evidence to reject a default assumption (the null hypothesis). The choice of statistical test depends on the type of data and the research question.

These notes cover some of the most fundamental tests:
*   **Z-test & t-test:** Used to compare the **average** (mean) of a group to a known value or to compare the averages of two groups.
*   **Chi-Square (χ²) test:** Used to analyze **categorical data** to see if there is a relationship between two variables (e.g., is there a relationship between gender and voting preference?).
*   **ANOVA (Analysis of Variance):** Used to compare the **averages** of three or more groups simultaneously. It analyzes the **variance** between and within groups.

This guide will focus on the **Z-test**, as detailed in the notes.

---

### The Z-test

The Z-test is a statistical test used to determine whether a sample mean differs significantly from a population mean (or whether two population means differ) when the population variance is known and the sample size is large.

**When to use a Z-test:**
1.  You are comparing a sample mean to a population mean.
2.  The **population standard deviation (σ) is known**.
3.  The **sample size (n) is large (typically n > 30)**, or the population is known to be normally distributed.

The output of a Z-test is a **Z-score** (or Z-statistic), which is then used to calculate a **p-value**.

---

### Example 1: Average Heights (A Two-Tailed Test)

This is a classic example of a two-tailed test, where we are looking for any significant *difference* (either higher or lower) from the population mean.

**Scenario:** The average height of all residents in a city is **168 cm** (μ) with a population standard deviation of **3.9 cm** (σ). A doctor measures a sample of **36 individuals** (n) and finds their average height to be **169.5 cm** (x̄).

**Question:** At a 95% confidence level (α = 0.05), is there enough evidence to say the true average height is different from 168 cm?

#### Step 1: State the Null and Alternate Hypotheses
*   **Null Hypothesis (H₀):** `μ = 168 cm` (The true average height is 168 cm).
*   **Alternate Hypothesis (H₁):** `μ ≠ 168 cm` (The true average height is *not* 168 cm).

#### Step 2: Establish the Decision Boundary (Critical Value Method)
*   **Confidence Level:** 95%
*   **Significance Level (α):** `1 - 0.95 = 0.05`.
*   **Test Type:** Two-tailed test, because we are testing for "not equal to." We split the alpha between the two tails: `0.05 / 2 = 0.025` in each tail.
*   **Critical Z-values:** We look up the Z-score that corresponds to the cumulative area of `1 - 0.025 = 0.975`. This gives us a critical Z-value of **±1.96**.

**Decision Rule:** If our calculated Z-score is **less than -1.96** or **greater than +1.96**, we will reject the null hypothesis.

#### Step 3: Calculate the Z-statistic
`Z = (x̄ - μ) / (σ / √n)`
`Z = (169.5 - 168) / (3.9 / √36)`
`Z = 1.5 / (3.9 / 6)`
`Z = 1.5 / 0.65`
`Z ≈ 2.31`

#### Step 4: Make a Conclusion
*   **Using the Critical Value Method:** Our Z-score of **2.31** is greater than the critical value of **1.96**. Therefore, it falls in the rejection region.
*   **Using the P-value Method:**
    1.  Find the probability associated with a Z-score of 2.31. The area to the right of 2.31 is approximately **0.01044**.
    2.  Since this is a two-tailed test, we must double this value: `p-value = 2 * 0.01044 = 0.02088`.
    3.  Compare the p-value to alpha: `0.02088 < 0.05`.

**Final Conclusion:** Both methods lead to the same conclusion: We **reject the null hypothesis**. The evidence suggests that the average height of the residents is significantly different from 168 cm. Based on the sample, it appears to be higher.
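The calculation in this example can be reproduced with Python's standard library. This is a minimal sketch using `statistics.NormalDist` for the standard normal distribution; the variable names are illustrative. Because it does not round the Z-score to two decimals, its p-value may differ slightly from the table-based figure.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 168.0    # population mean under H0 (cm)
sigma = 3.9     # known population standard deviation (cm)
n = 36          # sample size
x_bar = 169.5   # sample mean (cm)
alpha = 0.05    # significance level

std_normal = NormalDist()  # mean 0, standard deviation 1

# Z = (x_bar - mu) / (sigma / sqrt(n))
standard_error = sigma / sqrt(n)
z = (x_bar - mu_0) / standard_error

# Critical value method: alpha / 2 in each tail
z_crit = std_normal.inv_cdf(1 - alpha / 2)

# P-value method: double the area beyond |z|
p_value = 2 * (1 - std_normal.cdf(abs(z)))

reject_h0 = p_value <= alpha
print(f"Z = {z:.2f}, critical = ±{z_crit:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```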

---

### Example 2: Light Bulb Warranty (A One-Tailed Test)

This is an example of a one-tailed test, where we are looking for a difference in a *specific direction* (in this case, "less than").

**Scenario:** A factory produces bulbs with an average warranty of **5 years** (μ) and a standard deviation of **0.5 years** (σ). A worker tests a sample of **40 bulbs** (n) and finds the average life is **4.8 years** (x̄).

**Question:** At a 2% significance level (α = 0.02), is there enough evidence to support the idea that the bulbs malfunction in *less than* 5 years?

#### Step 1: State the Hypotheses
*   **Null Hypothesis (H₀):** `μ = 5 years` (The average warranty is 5 years).
*   **Alternate Hypothesis (H₁):** `μ < 5 years` (The average warranty is *less than* 5 years). This is what the worker believes.

#### Step 2: Establish the Decision Boundary
*   **Significance Level (α):** 0.02.
*   **Test Type:** One-tailed test (left-tailed, because we are testing for "less than"). The entire rejection region of 2% is in the left tail.
*   **Critical Z-value:** We look for the Z-score corresponding to an area of 0.02 in the left tail, which is approximately **-2.05**.

**Decision Rule:** If our calculated Z-score is **less than -2.05**, we will reject H₀.

#### Step 3: Calculate the Z-statistic
`Z = (x̄ - μ) / (σ / √n)`
`Z = (4.8 - 5.0) / (0.50 / √40)`
`Z = -0.2 / (0.50 / 6.32)`
`Z = -0.2 / 0.079`
`Z ≈ -2.53`

#### Step 4: Make a Conclusion
*   **Using the Critical Value Method:** The calculated Z-score of **-2.53** is less than the critical value of **-2.05**. It falls in the rejection region.
*   **Using the P-value Method:**
    1.  The p-value is the area to the left of our calculated Z-score of -2.53. This area is **0.0057**.
    2.  Compare the p-value to alpha: `0.0057 < 0.02`.

**Final Conclusion:** Since `0.0057 < 0.02`, we **reject the null hypothesis**. There is strong evidence to support the worker's belief that the bulbs last, on average, for less than 5 years. The warranty should likely be revised.
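Both worked examples follow the same recipe, so the steps can be wrapped in one function. The sketch below is an illustration only: `z_test` and its `tail` parameter are hypothetical names, not a library API. It is applied here to the light bulb numbers.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def z_test(x_bar, mu_0, sigma, n, alpha, tail="two"):
    """One-sample Z-test; tail is 'left', 'right', or 'two'.

    Returns (z, critical value, p-value, reject H0?).
    """
    z = (x_bar - mu_0) / (sigma / sqrt(n))
    if tail == "left":
        critical = STD_NORMAL.inv_cdf(alpha)
        p_value = STD_NORMAL.cdf(z)            # area to the left of z
    elif tail == "right":
        critical = STD_NORMAL.inv_cdf(1 - alpha)
        p_value = 1 - STD_NORMAL.cdf(z)        # area to the right of z
    else:
        critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, critical, p_value, p_value <= alpha

# Light bulb example: H1 is "mu < 5", so the test is left-tailed
z, critical, p_value, reject_h0 = z_test(
    x_bar=4.8, mu_0=5.0, sigma=0.5, n=40, alpha=0.02, tail="left"
)
print(f"Z = {z:.2f}, critical = {critical:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

Passing `tail="two"` with the height example's numbers reproduces the two-tailed calculation from Example 1.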


### One-Sample t-Test: The Goal

The one-sample t-test is a statistical procedure used to determine if the **mean of a single sample** is significantly different from a **known or hypothesized population mean**. We use a t-test instead of a Z-test because the **population standard deviation (σ) is unknown**, and we must use the **sample standard deviation (s)** as an estimate.

---

### The Scenario: Testing a New Medication

**Problem:** The average IQ in the general population is **100**. A research team wants to know if a new medication affects intelligence. They give the medication to a sample of **30 participants** and find that their average IQ is **140**, with a sample standard deviation of **20**.

**Question:** Using a 95% confidence level (α = 0.05), is there enough evidence to conclude that the medication has an effect (either positive or negative) on intelligence?

---

### The Hypothesis Testing Procedure

We follow the standard 5-step process for hypothesis testing.

#### Step 1: State the Null and Alternate Hypotheses

First, we state our assumptions. The null hypothesis represents "no effect," while the alternate hypothesis represents what the researchers are trying to find evidence for.

*   **Null Hypothesis (H₀): `μ = 100`**
    This assumes the medication has **no effect**, and the average IQ of the group taking it is the same as the general population mean.

*   **Alternate Hypothesis (H₁): `μ ≠ 100`**
    This assumes the medication **does have an effect**, causing the average IQ to be different from the population mean. This is a **two-tailed test** because we are interested in any difference, whether it's an increase or a decrease in IQ.

#### Step 2: Set the Significance Level (α)

This is our threshold for what we consider "statistically significant."
*   **Significance Level (α) = 0.05**
    This means we are willing to accept a 5% risk of incorrectly rejecting the null hypothesis when it is actually true (a Type I error).

#### Step 3: Determine the Degrees of Freedom (df)

The shape of the t-distribution depends on the sample size, which is accounted for by the degrees of freedom.
*   **Formula:** `df = n - 1`
*   **Calculation:** `df = 30 - 1 = 29`

#### Step 4: Establish the Decision Rule (Find the Critical Value)

Based on our `α` and `df`, we find the critical t-values from a t-distribution table. These values create the boundaries for our "rejection regions."

*   **Significance Level (α):** 0.05
*   **Test Type:** Two-tailed, so we split the alpha: `0.05 / 2 = 0.025` in each tail.
*   **Degrees of Freedom (df):** 29
*   Looking up these values in a t-table gives us a critical t-value of **±2.045**.

**Decision Rule:** If our calculated t-statistic is **less than -2.045** or **greater than +2.045**, it will fall into the rejection region, and we will reject the null hypothesis.

#### Step 5: Calculate the Test Statistic (t-statistic)

Now, we calculate the t-statistic for our sample. This value tells us how many standard errors our sample mean is away from the population mean.

*   **Formula:** `t = (x̄ - μ) / (s / √n)`
    *   x̄ = Sample mean (140)
    *   μ = Population mean (100)
    *   s = Sample standard deviation (20)
    *   n = Sample size (30)

*   **Calculation:**
    `t = (140 - 100) / (20 / √30)`
    `t = 40 / (20 / 5.477)`
    `t = 40 / 3.65`
    `t ≈ 10.96`

---

### Conclusion and Interpretation

Now we compare our calculated t-statistic to our critical value from Step 4.

*   **Calculated t-statistic = 10.96**
*   **Critical t-value = ±2.045**

**Decision:**
Since **10.96 is much greater than 2.045**, our result falls deep inside the rejection region. Therefore, we **Reject the Null Hypothesis (H₀)**.

**Interpretation in Plain English:**
The results from our sample are highly statistically significant. There is very strong evidence to suggest that the medication has an effect on intelligence. Since the sample mean (IQ of 140) is significantly higher than the population mean (IQ of 100), we can conclude that the medication appears to substantially increase intelligence scores.
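The arithmetic of this t-test can be checked in a few lines. In this sketch the critical value 2.045 is taken from the t-table lookup in Step 4 rather than computed, and the variable names are illustrative. Without intermediate rounding the statistic comes out near 10.95; the 10.96 in the text results from rounding the standard error to 3.65 first.

```python
from math import sqrt

mu_0 = 100.0    # population mean under H0
x_bar = 140.0   # sample mean
s = 20.0        # sample standard deviation
n = 30          # sample size
t_crit = 2.045  # two-tailed critical value for alpha = 0.05, df = 29 (from the t-table)

df = n - 1
standard_error = s / sqrt(n)

# t = (x_bar - mu) / (s / sqrt(n))
t_stat = (x_bar - mu_0) / standard_error

# Two-tailed decision rule: reject if t falls beyond either critical value
reject_h0 = abs(t_stat) > t_crit

print(f"df = {df}, SE = {standard_error:.3f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```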



### Choosing Between a T-test and a Z-test: A Practical Guide

When conducting a hypothesis test for a single population mean, the choice between a t-test and a Z-test boils down to what information you have about the population. The decision process follows two key questions, just as laid out in your flowchart.

---

#### Question 1: Do you know the population standard deviation (σ)?

This is the most critical question. The **population standard deviation (σ)** is a parameter that describes the true variability of the entire population. In most real-world research, this value is **unknown**.

**Scenario A: NO, the population standard deviation (σ) is unknown.**
*   **Action:** **Use a t-test.**
*   **Reasoning:** Since the true population standard deviation is unknown, you must estimate it using the **sample standard deviation (s)**. This estimation introduces extra uncertainty into your analysis. The t-distribution is specifically designed to handle this uncertainty with its "fatter tails," making it the appropriate and more conservative choice. This is the most common scenario in practice.

**Scenario B: YES, the population standard deviation (σ) is known.**
*   **Action:** Proceed to Question 2.
*   **Reasoning:** Knowing the true population `σ` is rare but can happen in specific situations (e.g., in manufacturing quality control where long-term data exists, or in standardized testing like IQ scores). If you do have this information, you have the option of using the more powerful Z-test, but the sample size also plays a role.

---

#### Question 2 (only if you answered YES to Question 1): Is the sample size (n) large?

This question helps us decide if the conditions are robust enough for the Z-test.

**Sub-Scenario B1: YES, the sample size is large (n > 30).**
*   **Action:** **Use a Z-test.**
*   **Reasoning:** With a known population standard deviation `σ` and a large sample size, all the conditions for a Z-test are perfectly met. This is the ideal situation for using a Z-test.

**Sub-Scenario B2: NO, the sample size is small (n ≤ 30).**
*   **Action:** **Use a t-test (with a strong assumption).**
*   **Reasoning:** Even if you know `σ`, working with a very small sample is risky. Statisticians often prefer the more conservative t-test in this situation to account for the potential that the small sample may not be perfectly representative. To use a test in this scenario at all, you must also be confident that the **underlying population is approximately normally distributed**. Your flowchart simplifies this by defaulting to the t-test, which is a very safe and common practice.

---

### Summary Table

This table provides a concise summary of the logic in your flowchart.

| Do you know the population standard deviation (σ)? | Is the sample size (n) > 30? | Which Test to Use? |
|:----------------------------------------------------:|:----------------------------:|:--------------------:|
| **NO** (This is the most common case)              |           (Not relevant)           | **t-test**           |
| **YES**                                            |            **YES**             | **Z-test**           |
| **YES**                                            |             **NO**             | **t-test** *         |

*\*You can technically use a Z-test here if the population is known to be normally distributed, but using a t-test is often considered safer practice.*

By following the simple two-question logic in your flowchart, you can confidently select the appropriate statistical test for your data.
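The two-question logic is simple enough to express as a function. This is a sketch of the decision rule as summarized in the table above; `choose_test` is a hypothetical helper name, and the threshold of 30 is the rule of thumb used in these notes.

```python
def choose_test(sigma_known, n):
    """Return which test the two-question rule selects.

    sigma_known: True if the population standard deviation is known.
    n: sample size.
    """
    # Question 1: is the population standard deviation known?
    if not sigma_known:
        return "t-test"
    # Question 2: is the sample large?
    return "Z-test" if n > 30 else "t-test"

print(choose_test(sigma_known=False, n=100))  # sigma unknown
print(choose_test(sigma_known=True, n=36))    # sigma known, large sample
print(choose_test(sigma_known=True, n=12))    # sigma known, small sample
```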

![image.png](attachment:image.png)



### Type I and Type II Errors: The Risks of Decision-Making

In hypothesis testing, we use sample data to make a decision about a population. However, because we are not testing the entire population, our decision is never 100% certain. There is always a risk of making an incorrect conclusion. These potential mistakes are known as **Type I and Type II errors**.

As your notes clearly state, the situation can be broken down into two components:
*   **The Reality:** The true state of the null hypothesis (which is unknown to us). It is either true or false.
*   **Our Decision:** Based on our sample data, we either reject or fail to reject the null hypothesis.

Combining these two components gives us four possible outcomes, two of which are correct decisions and two of which are errors.

---

### The Four Possible Outcomes

This table summarizes the four scenarios laid out in your notes:

|                       | **Reality: H₀ is TRUE** (The defendant is innocent) | **Reality: H₀ is FALSE** (The defendant is guilty) |
|-----------------------|:----------------------------------------------------:|:-----------------------------------------------------:|
| **Our Decision: <br> Reject H₀**  | **Type I Error** (Convicting an innocent person)   | **Correct Decision (Power)** (Convicting a guilty person) |
| **Our Decision: <br> Fail to Reject H₀** | **Correct Decision** (Acquitting an innocent person) | **Type II Error** (Acquitting a guilty person)     |

---

#### Correct Decision #1 (Outcome 4 in your notes)
*   **Scenario:** The null hypothesis is true in reality, and we decide **not to reject** it.
*   **Example:** A new drug has no effect on blood pressure (H₀ is true), and our study concludes that there is not enough evidence to say it has an effect.
*   **Result:** This is a good and correct decision.

#### Correct Decision #2 (Outcome 1 in your notes)
*   **Scenario:** The null hypothesis is false in reality, and we correctly **reject** it.
*   **Example:** A new marketing campaign really does increase sales (H₀ is false), and our analysis correctly concludes that sales have significantly increased.
*   **Result:** This is also a good and correct decision. The probability of making this correct rejection is called the **power** of a test.

---

### The Two Types of Errors

#### Type I Error (Outcome 2 in your notes)
A Type I error is a **"false positive."** It occurs when you **reject a null hypothesis that is actually true**.

*   **In a Nutshell:** You conclude that there *is* an effect or a difference, when in reality, there isn't one.
*   **Legal Analogy:** Convicting an innocent person. You rejected the "not guilty" hypothesis, but it was true.
*   **Medical Analogy:** A medical test incorrectly tells a healthy person that they have a disease.
*   **The Probability (Alpha, α):** The probability of making a Type I error is controlled by the **significance level (α)** that you set for your test. If you set α = 0.05, you are accepting a 5% risk of making a Type I error.

#### Type II Error (Outcome 3 in your notes)
A Type II error is a **"false negative."** It occurs when you **fail to reject a null hypothesis that is actually false**.

*   **In a Nutshell:** You fail to find an effect or a difference that actually exists.
*   **Legal Analogy:** Acquitting a guilty person. You failed to reject the "not guilty" hypothesis, but it was false.
*   **Medical Analogy:** A medical test fails to detect a disease that a patient actually has.
*   **The Probability (Beta, β):** The probability of making a Type II error is denoted by **Beta (β)**. This is related to the "power" of a test (Power = 1 - β). Increasing your sample size is one of the most effective ways to decrease the chance of a Type II error.

### The Trade-off

There is an inherent trade-off between Type I and Type II errors.
*   If you make it **harder to reject H₀** (by lowering your alpha, e.g., from 0.05 to 0.01) to reduce your risk of a Type I error, you simultaneously **increase your risk of a Type II error**.
*   Conversely, making it easier to reject H₀ increases the risk of a Type I error while decreasing the risk of a Type II error.

The choice of which error is "worse" depends entirely on the context of the problem. Is it worse to send an innocent person to jail (Type I) or let a guilty person go free (Type II)? The answer shapes the standards of evidence required.
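The trade-off can be made concrete with a small calculation. The sketch below assumes a right-tailed Z-test and a hypothetical true effect: the numbers (a true mean 1.5 units above the null value, σ = 3.9, n = 36) are illustrative assumptions, and `type_ii_error` is a helper name invented for this example.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def type_ii_error(alpha, effect, sigma, n):
    """Beta for a right-tailed Z-test when the true mean exceeds mu_0 by `effect`."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    # Under the true mean, the Z-statistic is centered at effect * sqrt(n) / sigma
    shift = effect * sqrt(n) / sigma
    # Beta is the chance the statistic still falls below the critical value
    return STD_NORMAL.cdf(z_crit - shift)

beta_05 = type_ii_error(alpha=0.05, effect=1.5, sigma=3.9, n=36)
beta_01 = type_ii_error(alpha=0.01, effect=1.5, sigma=3.9, n=36)
beta_01_large_n = type_ii_error(alpha=0.01, effect=1.5, sigma=3.9, n=100)

print(f"alpha = 0.05, n = 36:  beta = {beta_05:.3f}, power = {1 - beta_05:.3f}")
print(f"alpha = 0.01, n = 36:  beta = {beta_01:.3f}, power = {1 - beta_01:.3f}")
print(f"alpha = 0.01, n = 100: beta = {beta_01_large_n:.3f}, power = {1 - beta_01_large_n:.3f}")
```

Under these assumptions, tightening alpha from 0.05 to 0.01 raises β, while a larger sample brings β back down without loosening alpha.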



### Bayes' Theorem: The Foundation of Bayesian Statistics

Bayesian statistics is an approach to data analysis that treats probability as a degree of belief in a statement. The core engine that drives this approach is **Bayes' Theorem**, a mathematical formula for updating our beliefs in light of new evidence.

---

### Part 1: The Building Blocks of Probability

To understand Bayes' Theorem, we first need to distinguish between two types of events, as outlined in your notes.

#### 1. Independent Events

Two events are independent if the outcome of one has absolutely no effect on the outcome of the other.

*   **Example: Rolling a Die and Tossing a Coin**
    *   The probability of rolling a '4' is `P(Roll=4) = 1/6`.
    *   The probability of a coin landing on 'Heads' is `P(Toss=Heads) = 1/2`.
    *   These events are independent. Rolling the die does not change the probability of the coin toss.
    *   The probability of *both* happening is simply their probabilities multiplied: `P(Roll=4 and Toss=Heads) = (1/6) * (1/2) = 1/12`.

#### 2. Dependent Events and Conditional Probability

Two events are dependent if the outcome of the first event changes the probability of the second event. This introduces the crucial idea of **Conditional Probability**.

*   **Conditional Probability:** The probability of an event B occurring, *given that* event A has already occurred. This is written as **P(B|A)** and read as "the probability of B given A."

*   **Example from your notes:** Imagine a box with 5 balls (2 Red, 3 Yellow). You draw two balls *without replacement*.
    *   The probability of the first ball being Red is `P(R) = 2/5`.
    *   Now, *given that the first ball was Red*, there are only 4 balls left (1 Red, 3 Yellow). The probability of the second ball being Yellow is now different.
    *   This is a conditional probability: `P(Y|R) = 3/4`.
    *   The probability of *both* events happening in this order is: `P(R and Y) = P(R) * P(Y|R) = (2/5) * (3/4) = 6/20 = 3/10`.
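
The same numbers can be recovered by enumerating every possible ordered draw. This sketch assumes each ball is equally likely to be picked; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

balls = ["R", "R", "Y", "Y", "Y"]

# Every ordered pair of distinct balls is equally likely (20 in total).
draws = list(permutations(range(len(balls)), 2))

first_red = [d for d in draws if balls[d[0]] == "R"]
red_then_yellow = [d for d in first_red if balls[d[1]] == "Y"]

p_r = Fraction(len(first_red), len(draws))
p_y_given_r = Fraction(len(red_then_yellow), len(first_red))
p_r_and_y = Fraction(len(red_then_yellow), len(draws))

print(p_r, p_y_given_r, p_r_and_y)   # 2/5 3/4 3/10
```

Note that the conditional probability is computed by restricting attention to the draws where the first ball was Red, which is exactly what "given that" means.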

---

### Part 2: Deriving and Understanding Bayes' Theorem

Bayes' Theorem is elegantly derived from the rules of conditional probability.

The rule for joint probability can be written in two ways:
1.  `P(A and B) = P(A) * P(B|A)`
2.  `P(B and A) = P(B) * P(A|B)`

Since `P(A and B)` is the same as `P(B and A)`, we can set these equal:
`P(A) * P(B|A) = P(B) * P(A|B)`

By simply rearranging this equation to solve for `P(A|B)`, we get Bayes' Theorem:

**P(A|B) = [ P(B|A) * P(A) ] / P(B)**

#### The Components of Bayes' Theorem

This formula allows us to "flip" the conditional probability. If we know `P(B|A)`, we can find `P(A|B)`. Each component has a special name:

*   **P(A|B): The Posterior Probability**
    This is what we want to calculate. It's the **updated** probability of our hypothesis `A` being true, after considering the new evidence `B`.

*   **P(A): The Prior Probability**
    This is our **initial belief** in the probability of hypothesis `A` being true, *before* we see any new evidence.

*   **P(B|A): The Likelihood**
    This is the probability of observing the evidence `B`, *given that* our hypothesis `A` is true.

*   **P(B): The Marginal Probability (or Evidence)**
    This is the total probability of observing the evidence `B` under all possible circumstances. It acts as a normalizing constant, ensuring that the posterior probabilities across all possible hypotheses sum to 1.

**In essence: Posterior = (Likelihood * Prior) / Evidence**
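
A short numerical sketch may help here. The scenario and numbers are hypothetical (a condition with a 1% prior, a test that detects it 95% of the time and gives a false alarm 5% of the time), and the evidence `P(B)` is expanded with the law of total probability, which these notes do not derive.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """P(A|B) via Bayes' Theorem.

    P(B) is expanded as P(B|A)P(A) + P(B|not A)P(not A).
    """
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical inputs: P(A) = 0.01, P(B|A) = 0.95, P(B|not A) = 0.05
p = posterior(prior=0.01, likelihood=0.95, likelihood_if_not=0.05)
print(round(p, 3))   # 0.161
```

Even with a fairly accurate test, the posterior stays low because the prior was low; the evidence updates the belief rather than replacing it.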

---

### Part 3: Application in Machine Learning (Naive Bayes Classifier)

Bayes' Theorem is the backbone of the popular **Naive Bayes** classification algorithm, which is used for tasks like spam detection and document classification.

Your notes show this perfectly with the house price prediction dataset.
*   **Goal:** Predict `y` (e.g., Price = "High") given the features `x₁, x₂, x₃` (Size of house, No. of rooms, Location).
*   **We want to find:** `P(y | x₁, x₂, x₃)`

Using Bayes' Theorem, the formula becomes:
**`P(y | x₁, x₂, x₃) = [ P(x₁, x₂, x₃ | y) * P(y) ] / P(x₁, x₂, x₃)`**

To make this calculation feasible, Naive Bayes makes a "naive" but powerful assumption: **all features (x₁, x₂, x₃) are conditionally independent of each other, given the class y.**

This assumption simplifies the "Likelihood" term dramatically:
`P(x₁, x₂, x₃ | y)` becomes `P(x₁|y) * P(x₂|y) * P(x₃|y)`

**How the classifier works:**
1.  It learns the **prior probability** `P(y)` from the training data (e.g., the overall percentage of "High" price houses).
2.  It learns the **likelihood** `P(xᵢ|y)` for each feature (e.g., the probability of having 4 rooms *given* the price is "High").
3.  To classify a new house, it plugs these learned probabilities into the formula for each possible class of `y` (e.g., "High", "Medium", "Low") and predicts the class that gives the highest posterior probability.
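
The three steps above can be sketched with plain counting. The training rows below are hypothetical, the features are treated as categorical, and no smoothing is applied (so an unseen feature value drives a class score to zero). Because `P(x₁, x₂, x₃)` is the same for every class, the sketch compares unnormalised scores only.

```python
from collections import Counter, defaultdict

# Hypothetical training rows: (size, rooms, location) -> price class.
rows = [
    (("large", 4, "city"), "High"),
    (("large", 3, "city"), "High"),
    (("small", 2, "suburb"), "Low"),
    (("small", 3, "suburb"), "Low"),
    (("large", 4, "suburb"), "High"),
    (("small", 2, "city"), "Low"),
]

class_counts = Counter(label for _, label in rows)
# feature_counts[label][i][value] = rows of that class whose i-th feature equals value
feature_counts = defaultdict(lambda: defaultdict(Counter))
for features, label in rows:
    for i, value in enumerate(features):
        feature_counts[label][i][value] += 1

def predict(features):
    """Return the class maximising P(y) * product of P(x_i | y), plus all scores."""
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / len(rows)                               # prior P(y)
        for i, value in enumerate(features):
            score *= feature_counts[label][i][value] / n_label    # likelihood P(x_i | y)
        scores[label] = score
    return max(scores, key=scores.get), scores

label, scores = predict(("large", 4, "city"))
print(label)   # High
```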



### Confidence Intervals and Margin of Error

In inferential statistics, we often start with a sample to make conclusions about an entire population. A **point estimate**, such as a **sample mean (x̄)**, is our single "best guess" for the unknown **population mean (μ)**.

#### The Problem with Point Estimates

A point estimate is almost never exactly correct. Due to random sampling variability, a sample mean can be slightly higher or lower than the true population mean. Relying on a single number gives us no sense of our uncertainty.

**Solution:** Instead of a single value, we define a **range of plausible values** that is likely to contain the population parameter. This range is called a **Confidence Interval**.

---

### Calculating the Confidence Interval

A confidence interval gives us a range that we believe contains the true population parameter with a certain level of confidence (e.g., 95% confident).

#### The General Formula

The formula to construct a confidence interval is built around the point estimate and accounts for the uncertainty with a margin of error.

**Confidence Interval = Point Estimate ± Margin of Error**

Where:
*   **Point Estimate:** Our sample statistic (e.g., the sample mean, x̄).
*   **Margin of Error:** The "buffer" we add and subtract to our point estimate to create the interval. It quantifies how much we expect our sample statistic to vary.

#### Formula for the Margin of Error

The calculation for the margin of error depends on whether we know the population standard deviation (`σ`).

1.  **When σ is known - We use the Z-distribution:**
    **Margin of Error = Z_α/₂ * (σ / √n)**
    *   As a common rule of thumb, the same Z-based formula is also used when σ is unknown but the sample is large (n > 30), with the sample standard deviation `s` substituted for `σ`.

2.  **When σ is unknown and sample size n ≤ 30 - We use the t-distribution:**
    **Margin of Error = t_α/₂ * (s / √n)**
    *   `s` is the *sample* standard deviation.
    *   We also need to know the **degrees of freedom (df = n - 1)** to find the correct t-value.

For the σ-known case, this gives us the full formula for a confidence interval for a mean:
**`CI = x̄ ± Z_α/₂ * (σ / √n)`**

---

### Worked Example: CAT Exam Scores

**Scenario:**
On the verbal section of the CAT exam, the **population standard deviation (σ) is known to be 100**. A random sample of **25 test-takers (n)** has a **mean score of 520 (x̄)**.

**Goal:**
Construct a **95% confidence interval** for the true average CAT exam score.

#### Step 1: Identify Given Information
*   Sample Mean (x̄) = 520
*   Population Standard Deviation (σ) = 100
*   Sample Size (n) = 25
*   Confidence Level = 95%, which means the **Significance Level (α) = 0.05**.

#### Step 2: Find the Critical Value (Z_α/₂)
Since we need to construct a confidence interval, we are interested in the values that bound the middle 95% of the distribution. This is a **two-tailed** approach.

1.  The significance level `α` is split between the two tails: `α/2 = 0.05 / 2 = 0.025`.
2.  This means we have 2.5% of the area in the upper tail and 2.5% in the lower tail.
3.  To find the Z-score for the upper bound, we look for the area to the left of it, which is `1 - 0.025 = 0.9750`.
4.  Using a Z-table, the Z-score that corresponds to a cumulative area of 0.9750 is **1.96**.
    *   Therefore, **Z_α/₂ = 1.96**.
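
Instead of a printed Z-table, the same critical value can be obtained from the inverse CDF of the standard normal distribution. This is a minimal sketch using Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

confidence = 0.95
alpha = 1 - confidence

# Z-score with a cumulative area of 1 - alpha/2 = 0.975 to its left.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z_crit, 2))   # 1.96
```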

#### Step 3: Calculate the Margin of Error
Margin of Error = `1.96 * (100 / √25)`
Margin of Error = `1.96 * (100 / 5)`
Margin of Error = `1.96 * 20`
Margin of Error = `39.2`

#### Step 4: Construct the Confidence Interval
Now we add and subtract the margin of error from our sample mean.

*   **Lower Confidence Interval Bound:**
    `Lower CI = x̄ - Margin of Error`
    `Lower CI = 520 - 39.2 = 480.8`

*   **Higher Confidence Interval Bound:**
    `Higher CI = x̄ + Margin of Error`
    `Higher CI = 520 + 39.2 = 559.2`

**Final Confidence Interval:** `[480.8, 559.2]`
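
The four steps can be wrapped in a small helper to double-check the arithmetic. This is a sketch for the σ-known case only; the function name `z_confidence_interval` is an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, sigma, n, confidence=0.95):
    """CI for a mean when the population standard deviation is known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / sqrt(n)
    return mean - margin, mean + margin

low, high = z_confidence_interval(mean=520, sigma=100, n=25)
print(round(low, 1), round(high, 1))   # 480.8 559.2
```

Raising `confidence` (say, to 0.99) widens the interval, while a larger `n` narrows it.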

---

### Conclusion and Interpretation

Based on our sample data, we can make the following statement:

> **"We are 95% confident that the true average CAT exam score for the entire population lies between 480.8 and 559.2."**

This interval gives us a range of plausible values for the population mean and reflects our level of certainty. If a null hypothesis value (e.g., H₀: μ = 480) falls outside this interval, we would reject it at the 0.05 significance level in a two-tailed test. If it falls within the interval (e.g., H₀: μ = 500), we would fail to reject it.


### The Chi-Square (χ²) Test

The Chi-Square test is a fundamental statistical procedure used for analyzing **categorical data**. It is a **non-parametric test**, which means it does not make assumptions about the data following a specific distribution (like the normal distribution).

The most common application is the **Chi-Square Goodness of Fit Test**.

#### Chi-Square Goodness of Fit Test

This test is used to determine if a sample of categorical data is a "good fit" for a hypothesized distribution for the population. In simpler terms, it helps us answer the question:
> "Do the observations from my sample match what I would expect to see based on a pre-existing theory about the population?"

The test makes a comparison between:
1.  **Observed Frequencies:** The actual counts collected from our sample data. This is our "observed categorical distribution."
2.  **Expected Frequencies:** The counts we would *expect* to see in our sample if the null hypothesis (the theory about the population) were true. This is our "theoretical categorical distribution."

---

### Understanding with an Example: Bike Color Preferences

**Scenario:** We want to investigate the color preferences for bikes among a population of males.

*   **The Theory (Null Hypothesis):** A long-standing theory claims that all bike colors are equally popular. For three colors—Yellow, Red, and Orange—this means:
    *   P(Yellow) = 1/3
    *   P(Red) = 1/3
    *   P(Orange) = 1/3
    This is our **theoretical distribution**.

*   **The Sample (Our Observation):** We take a sample of 98 people and record their preferences.
    *   Yellow: 22 people
    *   Red: 17 people
    *   Orange: 59 people
    This is our **observed distribution**.

**The Goal:** The Chi-Square Goodness of Fit test will help us determine if the differences between our observed counts (22, 17, 59) and the expected counts (which would be 98/3 ≈ 32.7 for each color) are statistically significant, or if they could have occurred simply due to random chance.
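
As a rough sketch of where this example leads, the statistic can be computed directly from the counts. Two things are assumed here that the scenario does not state: a significance level of α = 0.05, and the use of a closed-form tail probability that holds only when there are 2 degrees of freedom (three categories minus one).

```python
from math import exp

observed = {"Yellow": 22, "Red": 17, "Orange": 59}
total = sum(observed.values())        # 98
expected = total / len(observed)      # equal popularity under H0

chi_sq = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1                # 2

# For df = 2 only, the chi-square upper-tail probability is exp(-x / 2).
p_value = exp(-chi_sq / 2)
print(round(chi_sq, 2), p_value < 0.05)   # 32.22 True
```

Under these assumptions the observed preferences are very unlikely if all three colors were equally popular, mainly because of the large Orange count.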

---

### Worked Example: Left-Handedness in a Science Class

**Problem Statement:**
In a science class of **75 students**, **11 are left-handed**. Does this class fit the established theory that **12% of the general population is left-handed**?

#### Step 1: Identify the Categories
The two categories for our variable (handedness) are:
1.  Left-Handed
2.  Right-Handed

#### Step 2: Determine the Observed Frequencies
These are the actual counts from our sample of 75 students.
*   **Observed Left-Handed:** 11
*   **Observed Right-Handed:** 75 - 11 = 64

| Category      | Observed Frequency |
|---------------|--------------------|
| Left-Handed   | 11                 |
| Right-Handed  | 64                 |
| **Total**     | **75**             |

#### Step 3: Calculate the Expected Frequencies
These are the counts we would expect if our sample perfectly matched the population theory.
*   **Theory (Null Hypothesis):** 12% of people are left-handed (and therefore 88% are right-handed).

*   **Expected Left-Handed:**
    `12% of 75 = 0.12 * 75 = 9`

*   **Expected Right-Handed:**
    `88% of 75 = 0.88 * 75 = 66`

| Category      | Observed Frequency | Expected Frequency |
|---------------|--------------------|--------------------|
| Left-Handed   | 11                 | 9                  |
| Right-Handed  | 64                 | 66                 |
| **Total**     | **75**             | **75**             |

#### Step 4: Perform the Chi-Square Test

Now we have the necessary components to conduct the Chi-Square test. We would use the following formula to calculate a Chi-Square statistic:

**χ² = Σ [ (Observed - Expected)² / Expected ]**

This statistic quantifies the total difference between our observed and expected counts. We then compare this statistic to a critical value from a Chi-Square distribution (or calculate a p-value) to make our final conclusion about whether our sample data is a "good fit" for the theory.
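
To finish the example, here is a minimal sketch of the calculation. It assumes α = 0.05 (the problem statement does not give a significance level) and uses a closed-form tail probability that is valid only for 1 degree of freedom (two categories minus one).

```python
from math import erfc, sqrt

observed = [11, 64]    # left-handed, right-handed
expected = [9, 66]     # 12% and 88% of 75

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For df = 1 only, the chi-square upper-tail probability is erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi_sq / 2))
print(round(chi_sq, 3), p_value > 0.05)   # 0.505 True
```

The statistic (about 0.505) is far below 3.841, the critical value for df = 1 at α = 0.05, so under these assumptions we would fail to reject H₀: the class is consistent with the 12% theory.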


### Performing Hypothesis Testing with the Chi-Square (χ²) Test

This example demonstrates how to use the Chi-Square Goodness of Fit test to determine if the distribution of a categorical variable in a recent sample has significantly changed from a previously known population distribution.

---

### The Problem Statement

*   **In 2010**, a census revealed the distribution of weights in a city:
    *   Less than 50 kg: 20%
    *   50 to 75 kg: 30%
    *   Greater than 75 kg: 50%

*   **In 2020**, a new sample of **500 individuals** was taken, with the following results:
    *   Less than 50 kg: 140
    *   50 to 75 kg: 160
    *   Greater than 75 kg: 200

**The Question:**
Using a significance level of **α = 0.05**, can we conclude that the population distribution of weights has changed in the last 10 years?

---

### Step 1: Establish Observed vs. Expected Frequencies

First, we need to organize our data into what was actually observed and what we would have expected based on the old (2010) distribution.

*   **Observed Frequencies (from the 2020 sample):**
    *   Less than 50 kg: 140
    *   50 to 75 kg: 160
    *   Greater than 75 kg: 200

*   **Expected Frequencies (based on the 2010 percentages applied to the new sample size of 500):**
    *   **Less than 50 kg:** 20% of 500 = 0.20 * 500 = **100**
    *   **50 to 75 kg:** 30% of 500 = 0.30 * 500 = **150**
    *   **Greater than 75 kg:** 50% of 500 = 0.50 * 500 = **250**

**Summary Table:**
| Weight Category     | Observed (2020 Sample) | Expected (Based on 2010 Theory) |
|---------------------|------------------------|---------------------------------|
| Less than 50 kg     | 140                    | 100                             |
| 50 to 75 kg         | 160                    | 150                             |
| Greater than 75 kg  | 200                    | 250                             |
| **Total**           | **500**                | **500**                         |

---

### Step 2: Formulate the Hypotheses

*   **Null Hypothesis (H₀):** There is **no significant difference** between the 2020 weight distribution and the 2010 distribution. (The data meets the expectation).
*   **Alternate Hypothesis (H₁):** There **is a significant difference**. (The data does not meet the expectation).

---

### Step 3: Determine the Decision Boundary

*   **Significance Level (α):** 0.05
*   **Degrees of Freedom (df):** The formula is `df = k - 1`, where `k` is the number of categories.
    *   `df = 3 - 1 = 2`
*   **Find the Critical Value:** Using a Chi-Square table with `α = 0.05` and `df = 2`, we find the critical value. This value defines the boundary of our rejection region.
    *   **Critical χ² Value = 5.991**

**Decision Rule:**
If our calculated Chi-Square test statistic is **greater than 5.991**, we will reject the null hypothesis.

---

### Step 4: Calculate the Chi-Square (χ²) Test Statistic

We use the formula that sums the squared differences between observed and expected frequencies, scaled by the expected frequency.

**χ² = Σ [ (Observed - Expected)² / Expected ]**

1.  **For "< 50 kg":**
    `(140 - 100)² / 100 = 40² / 100 = 1600 / 100 = 16.0`

2.  **For "50 to 75 kg":**
    `(160 - 150)² / 150 = 10² / 150 = 100 / 150 ≈ 0.67`

3.  **For "> 75 kg":**
    `(200 - 250)² / 250 = (-50)² / 250 = 2500 / 250 = 10.0`

**Total χ² Statistic:**
`χ² = 16.0 + 0.67 + 10.0 = 26.67`

---

### Step 5: Make a Conclusion

Now we compare our calculated statistic to our critical value.

*   **Calculated χ² = 26.67**
*   **Critical χ² = 5.991**

**Decision:**
Since `26.67` is much greater than `5.991`, our result falls deep into the rejection region.

**Conclusion:**
We **reject the null hypothesis (H₀)**. There is a statistically significant difference between the weight distribution of the population in 2020 compared to 2010. The sample data provides strong evidence that the distribution of weights has changed over the last ten years.
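
The whole test can be reproduced in a few lines. This sketch relies on a shortcut that holds only for df = 2, where the upper-tail probability of the Chi-Square distribution is `exp(-x / 2)`; for other degrees of freedom a table or a statistics library would be needed.

```python
from math import exp, log

observed = [140, 160, 200]               # 2020 sample
proportions_2010 = [0.20, 0.30, 0.50]    # 2010 census
n = sum(observed)
expected = [p * n for p in proportions_2010]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
alpha = 0.05

# Valid for df = 2 only: P(chi-square > x) = exp(-x / 2),
# so the critical value is -2 * ln(alpha).
critical = -2 * log(alpha)
p_value = exp(-chi_sq / 2)

print(round(chi_sq, 2), round(critical, 3), chi_sq > critical)   # 26.67 5.991 True
```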



### An Introduction to ANOVA (Analysis of Variance)

ANOVA is a powerful statistical method used for hypothesis testing. While tests like the t-test and Z-test are used to compare the means of one or two groups, ANOVA extends this capability to three or more groups.

#### Definition

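**Analysis of Variance (ANOVA)** is a statistical method used to **compare the means of two or more groups** to determine if there is a statistically significant difference between them. In practice it is mostly used for three or more groups, since a t-test already handles the two-group case.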

Although its goal is to compare means, ANOVA works (as its name suggests) by analyzing the **variance** within each group and the variance between the groups. If the variation *between* the groups is much larger than the variation *within* each group, it suggests that the group means are indeed different.

---

### Key Components of ANOVA

To understand how ANOVA is structured and how data is organized for the test, we need to understand two fundamental components: **Factors** and **Levels**.

#### 1. Factors

A **factor** is the **independent variable** (or categorical variable) that you are investigating. It is the variable that you believe has an effect on the outcome you are measuring. A factor is what you use to split your data into different groups.

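*   The levels of a factor can be **independent** (each group contains different participants) or **dependent** (the same participants are measured at every level), which will be discussed in more detail when covering the different types of ANOVA.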

#### 2. Levels

**Levels** (also known as groups) are the different **categories or values** within a factor. Each level represents one of the groups that you are comparing.

---

### Examples to Illustrate Factors and Levels

#### Example 1: Medicine Dosage

Imagine a study testing the effectiveness of a new drug on patient recovery time.
*   **Factor:** The **Medicine** itself is the independent variable we are studying.
*   **Levels:** The different **Dosages** of the medicine are the levels.
    *   Level 1: 5 mg group
    *   Level 2: 10 mg group
    *   Level 3: 15 mg group

In this scenario, we would use ANOVA to compare the average recovery times of the three dosage groups to see if the medicine's dosage has a significant effect.

#### Example 2: Mode of Payment

Imagine a retail company wants to see if the average transaction amount differs based on the payment method used by customers.
*   **Factor:** The **Mode of Payment** is the categorical variable of interest.
*   **Levels:** The different payment methods are the levels.
    *   Level 1: Google Pay
    *   Level 2: PhonePe (UPI)
    *   Level 3: IMPS (Bank Transfer)
    *   Level 4: NEFT (Bank Transfer)

Here, ANOVA could be used to compare the mean transaction amounts across the four payment groups to determine if customers tend to spend significantly more or less using a particular method.

By understanding factors and levels, we can properly structure our data and experiments to be analyzed using the powerful techniques of ANOVA. The next steps will involve understanding the assumptions of ANOVA and the different types (e.g., One-Way ANOVA, Two-Way ANOVA).
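
One simple way to organize data for ANOVA is to key the observations by level. The sketch below uses the payment example with hypothetical transaction amounts; the numbers are invented purely to show the layout.

```python
from statistics import mean

# One factor, four levels; each level maps to its own group of observations.
factor = "Mode of Payment"
amounts_by_level = {            # hypothetical transaction amounts
    "Google Pay": [250, 400, 310],
    "PhonePe": [520, 480, 610],
    "IMPS": [1500, 1750, 1620],
    "NEFT": [2100, 1900, 2300],
}

for level, amounts in amounts_by_level.items():
    print(f"{factor} = {level}: n={len(amounts)}, mean={mean(amounts):.1f}")
```

ANOVA would then ask whether the differences among these group means are larger than the spread within each group would explain.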



### The Assumptions of ANOVA

For the results of an ANOVA test to be valid and reliable, the data should meet several key assumptions. Violating these assumptions can lead to inaccurate conclusions, so it is important to check them before interpreting the test results.

---

#### 1. Normality of Sampling Distribution of Means

This is a foundational assumption for many parametric tests, including ANOVA.

*   **What it means:** The distribution of the **sample means** for each group should be **normally distributed** (i.e., follow a Gaussian or bell-shaped curve).
*   **Connection to Central Limit Theorem (CLT):** This assumption is often satisfied if the sample size for each group is sufficiently large (typically n > 30). According to the CLT, even if the underlying population data is not normal, the sampling distribution of the mean will tend to be normal as the sample size increases.
*   **How to check:** For smaller samples, you can use statistical tests for normality (like the Shapiro-Wilk test) or visual methods (like a Q-Q plot).

#### 2. Absence of Outliers

ANOVA is sensitive to outliers, as they can disproportionately affect the group means and variances.

*   **What it means:** The dataset should not contain significant outliers. Outlying scores can distort the results and lead to incorrect conclusions about the differences between groups.
*   **How to handle:** Before running an ANOVA, it is crucial to inspect the data for outliers and remove them if they are found to be data entry errors or if they represent anomalies that are not relevant to the research question.
*   **How to check:** Common methods for detecting outliers include **box plots** (identifying points beyond the whiskers) and using the **Interquartile Range (IQR)** rule.

#### 3. Homogeneity of Variance (Homoscedasticity)

This assumption is critical for ensuring that the comparison between groups is fair.

*   **What it means:** The **population variance** of the dependent variable should be **equal for all groups** (levels) being compared. In other words, the spread of the data within each group should be roughly the same.
*   **Notation:** If you have three groups, this assumption can be written as: **σ₁² = σ₂² = σ₃²**.
*   **Why it's important:** ANOVA works by comparing the variance *between* groups to the variance *within* groups. If the within-group variances are very different, this comparison becomes unreliable.
*   **How to check:** This can be checked using statistical tests like **Levene's test** or **Bartlett's test**.

#### 4. Samples are Independent and Random

This assumption relates to the design of the study and how the data was collected.

*   **What it means:**
    *   **Independence:** The observations in one group must be independent of the observations in the other groups. The selection of one participant should not influence the selection of another.
    *   **Random Selection:** The samples from each group should be drawn randomly from their respective populations to ensure they are representative and to minimize bias.
*   **Why it's important:** Independence ensures that the information from one group does not provide any information about another, which is a core requirement for making a valid comparison.

---

By verifying these four assumptions—**Normality, Absence of Outliers, Homogeneity of Variance, and Independence**—you can be more confident in the results of your ANOVA test.
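
Two of these checks can be sketched with the standard library alone: the IQR rule for outliers and a side-by-side look at the group variances. The scores below are hypothetical, and this is only a first-pass screen; formal tests such as Shapiro-Wilk or Levene's are usually run with a statistics library and are not shown here.

```python
from statistics import quantiles, variance

def iqr_outliers(values):
    """Values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical scores for three groups.
groups = {
    "A": [12, 14, 13, 15, 14, 13, 40],
    "B": [11, 12, 13, 12, 14, 13, 12],
    "C": [15, 16, 14, 15, 17, 16, 15],
}

for name, values in groups.items():
    print(name, "outliers:", iqr_outliers(values),
          "variance:", round(variance(values), 2))
```

In this made-up data, the single extreme score in group A is flagged as an outlier and also inflates that group's variance well above the others, which illustrates why outliers and unequal variances are often investigated together.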


### Types of ANOVA

ANOVA is a versatile statistical method, and the specific type you use depends on the structure of your data—specifically, how many factors (independent variables) you are testing and how the levels (groups) within those factors are related.

---

### 1. One-Way ANOVA

This is the simplest form of ANOVA, used when you are investigating the effect of **one factor** on a dependent variable.

*   **Definition:** A One-Way ANOVA is used when you have **one factor with at least two levels (groups)**, and these **levels are independent**.
*   **"Independent Levels" means:** The participants in one group are completely separate from the participants in the other groups. There is no connection or relationship between them.

#### Example: Medication Dosage for Headaches

**Scenario:** A doctor wants to test if different dosages of a new medication have different effects on reducing headache pain.

1.  **Split Participants:** The participants are randomly split into three separate, independent groups.
2.  **Administer Medication:**
    *   Group 1 receives a **10 mg** dosage.
    *   Group 2 receives a **20 mg** dosage.
    *   Group 3 receives a **30 mg** dosage.
3.  **Collect Data:** After a set time, all participants are asked to rate their headache pain on a scale of 1 to 10.

**Analysis:**
*   **Factor:** The **Medication Dosage** is the single independent variable.
*   **Levels:** There are three independent levels: **10 mg, 20 mg, and 30 mg**.
*   **ANOVA's Role:** A One-Way ANOVA would be used to compare the mean headache ratings across the three groups to see if there is a statistically significant difference.

---

### 2. Repeated Measures ANOVA

This type of ANOVA is used when you have one factor, but the same subjects are tested under each condition or level.

*   **Definition:** A Repeated Measures ANOVA is used when you have **one factor with at least two levels**, and the **levels are dependent**.
*   **"Dependent Levels" means:** The measurements at each level are taken from the **same group of participants**. This often occurs when measuring something over time or under different conditions.

#### Example: Running Performance Over Time

**Scenario:** A coach wants to track the running performance of a group of athletes over three days.

1.  **Test Participants:** The *same group* of athletes is tested each day.
2.  **Collect Data:**
    *   **Day 1:** Record the distance each athlete runs.
    *   **Day 2:** Record the distance the same athletes run.
    *   **Day 3:** Record the distance the same athletes run.

**Analysis:**
*   **Factor:** The **Time** (or day of running) is the single independent variable.
*   **Levels:** There are three dependent levels: **Day 1, Day 2, and Day 3**. They are dependent because the data for each level comes from the same people. A person's performance on Day 1 could influence their performance on Day 2.
*   **ANOVA's Role:** A Repeated Measures ANOVA would compare the mean distances run on each of the three days to see if performance significantly changes over time.

---

### 3. Factorial ANOVA

This is a more complex type of ANOVA that allows you to investigate the effects of **two or more factors simultaneously**.

*   **Definition:** A Factorial ANOVA is used when you have **two or more factors**, where each factor has at least two levels. The levels can be independent or dependent.
*   **Main Advantage:** This test not only analyzes the main effect of each factor individually but also checks for an **interaction effect** between the factors (i.e., does the effect of one factor depend on the level of another factor?).

#### Example: Running Performance by Time and Gender

Let's expand on the previous example by adding a second factor.

**Scenario:** A coach wants to see how running performance changes over three days and whether this change is different for males and females.

**Data Collection:**
*   A group of **male athletes** runs and has their distance recorded on Day 1, Day 2, and Day 3.
*   A separate group of **female athletes** does the same.

**Analysis:**
*   **Factor 1:** **Time** (with three dependent levels: Day 1, Day 2, Day 3).
*   **Factor 2:** **Gender** (with two independent levels: Male, Female).
*   **ANOVA's Role:** A Factorial ANOVA can answer three questions:
    1.  Is there a significant **main effect of Time**? (Does performance change over the three days, regardless of gender?)
    2.  Is there a significant **main effect of Gender**? (Is there an overall difference in performance between males and females?)
    3.  Is there a significant **interaction effect**? (Does the change in performance over the three days *depend* on whether the athlete is male or female? For example, perhaps males improve more over time than females, or vice versa.)



### Performing Hypothesis Testing in ANOVA

ANOVA compares the means of three or more groups by analyzing their variances. The core of this process is the **F-test**, which calculates an **F-statistic**. This statistic is a ratio that compares the variation *between* the groups to the variation *within* the groups.

**F-statistic = Variance between samples / Variance within samples**

If the variance between the groups is significantly larger than the variance within them, it suggests that the group means are different.

---

### One-Way ANOVA Worked Example: Headache Medication

**Scenario:**
Doctors want to test a new medication that reduces headaches. They split participants into three groups (conditions) based on dosage: 15 mg, 30 mg, and 45 mg. After taking the medication, patients are asked to rate their headache relief on a scale of 1 to 10 (where 10 is the best relief).

**The Data:**
| 15 mg Dosage | 30 mg Dosage | 45 mg Dosage |
|--------------|--------------|--------------|
| 9            | 7            | 4            |
| 8            | 6            | 3            |
| 7            | 6            | 2            |
| 8            | 7            | 3            |
| 8            | 8            | 4            |
| 9            | 6            | 3            |
| 8            | 7            | 2            |

**Question:** Using a significance level of **α = 0.05**, are there any significant differences in headache relief between the three dosage conditions?

---

### Step 1: Define the Null and Alternate Hypotheses

*   **Null Hypothesis (H₀):** There is **no difference** between the mean headache relief scores of the three dosage groups.
    *   `μ₁₅_mg = μ₃₀_mg = μ₄₅_mg`

*   **Alternate Hypothesis (H₁):** **At least one** of the mean scores is different from the others. (Note: We do *not* say `μ₁₅ ≠ μ₃₀ ≠ μ₄₅`, as this would imply *all* means are different from each other).

### Step 2: Set the Significance Level

*   **α = 0.05**, which corresponds to a 95% confidence level.

### Step 3: Calculate the Degrees of Freedom (df)

To use the F-table and find our critical value, we need to calculate the degrees of freedom for the variation *between* groups and *within* groups.

*   **Number of groups (a):** 3
*   **Number of observations per group (n):** 7
*   **Total number of observations (N):** a * n = 3 * 7 = 21

1.  **Degrees of Freedom Between (df_between):**
    `df_between = a - 1 = 3 - 1 = 2`

2.  **Degrees of Freedom Within (df_within):**
    `df_within = N - a = 21 - 3 = 18`

3.  **Degrees of Freedom Total (df_total):**
    `df_total = N - 1 = 21 - 1 = 20`
    (Check: `df_between + df_within = 2 + 18 = 20`, which matches `df_total`).

We will use `df_between = 2` and `df_within = 18` to find our critical value.

### Step 4: Determine the Decision Boundary

We use an **F-distribution table** to find the critical value that defines our rejection region.
*   The F-distribution is a **right-skewed distribution**.
*   The rejection region (defined by α) is in the right tail.

Using the F-table with `α = 0.05`, `df₁ (numerator) = 2`, and `df₂ (denominator) = 18`, we find the critical value.
*   **Critical F-value = 3.5546**

**Decision Rule:**
If our calculated F-statistic is **greater than 3.5546**, we will **reject the null hypothesis**.

### Step 5: Calculate the F-Test Statistic

Calculating the F-statistic is a multi-step process that involves partitioning the total variance into "between-group" and "within-group" components. This is typically organized in an **ANOVA summary table**.

**ANOVA Summary Table**

| Source of Variation | Sum of Squares (SS) | df | Mean Square (MS) | F-statistic |
|---------------------|---------------------|----|------------------|-------------|
| **Between Groups**  | 98.67               | 2  | 49.34            | **86.56**   |
| **Within Groups**   | 10.29               | 18 | 0.57             |             |
| **Total**           | 108.96              | 20 |                  |             |

**Calculation Breakdown:**
1.  **Sum of Squares (SS):**
    *   `SS_between`: Measures the variation between the means of the different groups. The calculation yields **98.67**.
    ![ss between.png](<attachment:ss between.png>)
    
    *   `SS_within`: Measures the variation of the data points within each group. The calculation yields **10.29**.
    ![ss within.png](<attachment:ss within.png>)

    *   `SS_total`: The sum of the two, representing the total variation in the data, is **108.96**.

2.  **Mean Square (MS):** This is the "average" sum of squares and is our measure of variance.
    *   `MS = SS / df`
    *   `MS_between = 98.67 / 2 = 49.34` (This is the variance *between* samples).
    *   `MS_within = 10.29 / 18 ≈ 0.57` (This is the variance *within* samples).

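3.  **F-statistic:** The ratio of the two variances.
    *   `F = MS_between / MS_within`
    *   `F = 49.34 / 0.57 ≈ 86.56`
    *   This value uses the rounded mean squares. Carrying full precision through the calculation gives `F ≈ 86.33`, which leads to the same decision.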
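
The summary table can be reproduced from the raw scores. This is a minimal sketch of a one-way ANOVA using the standard definitions of the between-group and within-group sums of squares; the variable names are illustrative.

```python
from statistics import mean

groups = {
    "15 mg": [9, 8, 7, 8, 8, 9, 8],
    "30 mg": [7, 6, 6, 7, 8, 6, 7],
    "45 mg": [4, 3, 2, 3, 4, 3, 2],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)
a, N = len(groups), len(all_scores)

# Between-group variation: group means around the grand mean.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
# Within-group variation: scores around their own group mean.
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

df_between, df_within = a - 1, N - a
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(round(ss_between, 2), round(ss_within, 2), round(f_stat, 2))   # 98.67 10.29 86.33
```

Run as written, this gives F ≈ 86.33, slightly below the hand-computed 86.56 because nothing is rounded along the way. Either value is far above the critical value of 3.5546.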

---

### Step 6: Make a Final Conclusion

Now we compare our calculated F-statistic to our critical value.

*   **Calculated F-statistic = 86.56**
*   **Critical F-value = 3.5546**

**Decision:**
Since `86.56` is far greater than `3.5546`, the result falls in the rejection region.

**Conclusion:**
We **reject the null hypothesis (H₀)**. There is a statistically significant difference in headache relief among the three medication dosages. The data provides strong evidence that not all dosage levels have the same effect.