### **Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact the validity of the results.**

**Answer:**
ANOVA assumes:
1. **Independence of observations** – observations are independent of one another, both within and across groups.
2. **Normality** – the residuals (equivalently, the data within each group) should be approximately normally distributed.
3. **Homogeneity of variances** – all groups should have similar variances (homoscedasticity).

**Violations Examples:**
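- **Non-independence**: The same subjects are tested multiple times but analysed as if the measurements were independent, which typically inflates the Type I error rate.
- **Non-normality**: Strongly skewed data or outliers in one group, which matters most when samples are small.
- **Heteroscedasticity**: One group has a much larger variance than the others, which distorts the F-test, especially when group sizes are unequal.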
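
The normality and equal-variance assumptions can be screened with standard tests. The sketch below uses `scipy.stats.shapiro` and `scipy.stats.levene` on synthetic data; the group names, sizes, and spreads are made up for illustration. Independence cannot be checked this way, because it depends on how the data were collected.

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration only: three groups, the third with a
# deliberately larger spread to mimic heteroscedasticity.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, scale=1.0, size=30)
group_b = rng.normal(loc=10.5, scale=1.0, size=30)
group_c = rng.normal(loc=11.0, scale=5.0, size=30)

# Shapiro-Wilk: the null hypothesis is that the sample is normally distributed.
for name, values in [("A", group_a), ("B", group_b), ("C", group_c)]:
    w_stat, p_normal = stats.shapiro(values)
    print(f"group {name}: Shapiro-Wilk W = {w_stat:.3f}, p = {p_normal:.3f}")

# Levene: the null hypothesis is that all groups share the same variance.
levene_stat, levene_p = stats.levene(group_a, group_b, group_c)
print(f"Levene W = {levene_stat:.3f}, p = {levene_p:.4f}")
```

A small Levene p-value suggests unequal variances, in which case Welch's ANOVA or a transformation may be a better choice than the classical F-test.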

---

### **Q2. What are the three types of ANOVA, and in what situations would each be used?**

**Answer:**
1. **One-way ANOVA** – Compares means of three or more groups based on one factor.
   - *Example*: Weight loss across three diets.
2. **Two-way ANOVA** – Examines the effect of two independent variables and their interaction.
   - *Example*: Task completion time across software and experience levels.
3. **Repeated Measures ANOVA** – Used when the same subjects are measured across different conditions.
   - *Example*: Daily sales of the same stores tracked over 30 days.

---

### **Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?**

**Answer:**
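ANOVA partitions the total sum of squares (SST) into two parts, so that SST = SSE + SSR:
- **Between-group sum of squares (SSE, explained)** – variation due to differences between the group means.
- **Within-group sum of squares (SSR, residual)** – residual or error variation of observations around their own group mean.

Some textbooks swap these abbreviations (SSE for "error", SSR for "regression"); the labels here follow the explained/residual naming used in Q4.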

This matters because the F-statistic is the ratio of the between-group mean square to the within-group mean square: it shows how much of the variation in the data is due to the factor being tested versus random error.
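
As a minimal sketch of the partition, the sums of squares can be computed by hand with NumPy. The three groups and their values below are hypothetical and chosen so that the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical observations for three groups (illustration only).
groups = {
    "a": np.array([4.0, 5.0, 6.0]),
    "b": np.array([7.0, 8.0, 9.0]),
    "c": np.array([1.0, 2.0, 3.0]),
}

all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()

# Total: squared deviations of every observation from the grand mean.
sst = ((all_values - grand_mean) ** 2).sum()
# Between groups: squared deviations of group means from the grand mean,
# weighted by group size.
sse = sum(len(v) * (v.mean() - grand_mean) ** 2 for v in groups.values())
# Within groups: squared deviations of observations from their own group mean.
ssr = sum(((v - v.mean()) ** 2).sum() for v in groups.values())

# F is the ratio of the two mean squares (sum of squares / degrees of freedom).
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_stat = (sse / df_between) / (ssr / df_within)

print(f"SST = {sst:.1f}, SSE = {sse:.1f}, SSR = {ssr:.1f}, F = {f_stat:.1f}")
# prints: SST = 60.0, SSE = 54.0, SSR = 6.0, F = 27.0
```

Here almost all of the variation (54 of 60) sits between the groups, which is why the F-statistic is large.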

---

### **Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual sum of squares (SSR) in a one-way ANOVA using Python?**

**Answer:**
Using `statsmodels`:
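```python
import statsmodels.api as sm
from statsmodels.formula.api import ols
model = ols('outcome ~ C(group)', data=df).fit()  # C() treats group as categorical
anova_table = sm.stats.anova_lm(model)            # columns include df, sum_sq, F
```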
From the resulting table you can extract:
- SSE (explained, between groups) = `sum_sq` in the factor row
- SSR (residual, within groups) = `sum_sq` in the `Residual` row
- SST = SSE + SSR

---

### **Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?**

**Answer:**
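```python
import statsmodels.api as sm
from statsmodels.formula.api import ols
model = ols('response ~ C(factor1) * C(factor2)', data=df).fit()  # '*' adds both main effects and the interaction
sm.stats.anova_lm(model, typ=2)  # Type II sums of squares
```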
This gives F-statistics and p-values for:
- Main effects: `C(factor1)` and `C(factor2)`
- Interaction effect: `C(factor1):C(factor2)`

---

### **Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02. What can you conclude?**

**Answer:**
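Since p = 0.02 is below the conventional 0.05 significance level, you **reject the null hypothesis** that all group means are equal. There is a **statistically significant difference**: at least one group mean differs from the others. ANOVA alone does not say which groups differ, so a post-hoc test is needed to find out.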

---

### **Q7. In a repeated measures ANOVA, how would you handle missing data?**

**Answer:**
- **Listwise deletion** – Drop subjects with any missing value (simple but can reduce power).
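- **Imputation** – Replace missing values with estimates. Mean imputation is simple but understates variability; multiple imputation carries the uncertainty of the imputed values into the analysis.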
- **Mixed models** – More flexible, can handle missing data assuming it's missing at random (MAR).
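
The cost of listwise deletion is easiest to see by counting what survives. The sketch below uses a hypothetical wide-format table (the subject IDs, condition names, and scores are made up) and compares complete cases with the long-format layout that a mixed model would typically be fitted to.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one row per subject, one column per condition.
wide = pd.DataFrame({
    "subject": [1, 2, 3, 4],
    "cond_1": [5.1, 4.8, np.nan, 5.5],
    "cond_2": [5.9, 5.2, 6.1, np.nan],
    "cond_3": [6.4, 5.7, 6.6, 6.0],
})

# Listwise deletion: keep only subjects observed in every condition.
complete_cases = wide.dropna()

# Long format: one row per observed measurement; only the missing cells
# are dropped, so partially observed subjects still contribute.
long = (
    wide.melt(id_vars="subject", var_name="condition", value_name="score")
        .dropna(subset=["score"])
)

print(f"{len(complete_cases)} of {len(wide)} subjects survive listwise deletion")
print(f"{len(long)} of 12 measurements remain in long format")
# prints: 2 of 4 subjects survive listwise deletion
# prints: 10 of 12 measurements remain in long format
```

Two missing cells remove half of the subjects under listwise deletion, whereas the long-format data keeps ten of the twelve measurements.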

---

### **Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one?**

**Answer:**
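- **Tukey's HSD** – Controls the family-wise Type I error rate; used when comparing all pairs of group means.
- **Bonferroni** – Divides the significance level by the number of comparisons; conservative, so best suited to a small number of planned comparisons.
- **Scheffé’s Test** – Covers any contrast, including complex ones involving several groups, at the cost of lower power for simple pairwise comparisons.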

*Example:* Use **Tukey’s** if ANOVA shows a difference among diets and you want to know which pairs differ.
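
A minimal sketch of Tukey's HSD with `statsmodels` is shown below. The weight-loss values are synthetic, with diet C given a clearly larger mean on purpose; the diet labels and sample sizes are assumptions.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Synthetic weight loss (kg) for three hypothetical diets, 20 people each.
rng = np.random.default_rng(1)
loss = np.concatenate([
    rng.normal(2.0, 1.0, 20),  # diet A
    rng.normal(2.2, 1.0, 20),  # diet B
    rng.normal(5.0, 1.0, 20),  # diet C, deliberately more effective
])
diet = np.repeat(["A", "B", "C"], 20)

# All pairwise comparisons with the family-wise error rate held at alpha.
result = pairwise_tukeyhsd(endog=loss, groups=diet, alpha=0.05)
print(result.summary())

# result.reject holds one boolean per pair, in the order A-B, A-C, B-C.
print(result.reject)
```

The summary table lists the mean difference, adjusted p-value, and confidence interval for each pair, which identifies the pairs that differ.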

---

### **Q9. Conduct a one-way ANOVA for three diets.**

**Result:**
```plaintext
F = 14.34, p = 0.000014
```

**Interpretation:**
There are **significant differences** in weight loss between at least two diets.
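
The data behind this result is not shown here, so the following sketch uses made-up weight-loss values to illustrate the call; it will not reproduce the F and p reported above.

```python
from scipy import stats

# Hypothetical weight loss (kg) per participant; not the original data.
diet_a = [2.1, 3.4, 1.8, 2.9, 3.0]
diet_b = [3.9, 4.6, 4.1, 5.2, 4.4]
diet_c = [1.2, 0.8, 2.0, 1.5, 1.1]

# One-way ANOVA: the null hypothesis is that all diet means are equal.
f_stat, p_value = stats.f_oneway(diet_a, diet_b, diet_c)
print(f"F = {f_stat:.2f}, p = {p_value:.6f}")

if p_value < 0.05:
    print("Reject the null hypothesis: at least one diet mean differs.")
else:
    print("Fail to reject the null hypothesis.")
```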

---

### **Q10. Conduct a two-way ANOVA for software programs and experience level.**

**Result:**
```
Main Effects:
- Program:     F = 12.87, p = 0.00016  → Significant
- Experience:  F = 14.45, p = 0.00087  → Significant

Interaction:
- Program * Experience: F = 0.08, p = 0.923 → Not significant
```

**Interpretation:**
Both program type and experience level have significant effects, but there is **no evidence of an interaction effect**: the effect of the program does not appear to depend on experience level.

---

### **Q11. Conduct a two-sample t-test for teaching method.**

**Result:**
```
t-statistic = -4.11, p-value = 0.000081
```

**Interpretation:**
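There is a **significant difference** in test scores between the two teaching methods. The new teaching method appears more effective; the group means should be compared to confirm the direction, since the sign of the t-statistic depends on the order in which the groups are passed to the test.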
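
As an illustration of how such a test is run, the sketch below applies `scipy.stats.ttest_ind` to hypothetical scores. The numbers are invented and will not reproduce the result above.

```python
import numpy as np
from scipy import stats

# Hypothetical test scores; not the original data.
traditional = [72, 68, 75, 70, 66, 74, 69, 71]
new_method = [78, 82, 77, 85, 80, 79, 83, 81]

# Independent two-sample t-test. The sign of t follows the argument order:
# negative here means the first group's mean is below the second's.
t_stat, p_value = stats.ttest_ind(traditional, new_method)
print(f"t = {t_stat:.2f}, p = {p_value:.6f}")
print(f"means: traditional = {np.mean(traditional):.2f}, "
      f"new = {np.mean(new_method):.2f}")
```

Passing `equal_var=False` runs Welch's t-test instead, which does not assume equal variances in the two groups.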

---

### **Q12. Conduct a repeated measures ANOVA for sales in three stores.**

**Result:**
```
F = 5.85, p = 0.0048 → Significant
```

**Tukey’s Post-hoc Test:**
```
Store A vs B: Not significant
Store A vs C: Significant
Store B vs C: Significant
```

**Interpretation:**
Store C has **significantly higher** sales than both Store A and B.
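
A sketch of how a repeated measures ANOVA could be run with `statsmodels` follows. It assumes each day acts as the repeated "subject" measured at every store; that layout, the store means, and the noise levels are all made up, so the output will not match the result above. Paired t-tests with a Bonferroni correction are shown as one common follow-up for within-subject data; this is a swapped-in alternative, not the Tukey procedure reported above.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# Synthetic sales: 10 days, each observed at three stores (hypothetical means).
rng = np.random.default_rng(7)
store_means = {"A": 100.0, "B": 102.0, "C": 115.0}
rows = []
for day in range(1, 11):
    day_effect = rng.normal(0.0, 5.0)  # shared by all stores on the same day
    for store, mean in store_means.items():
        rows.append({"day": day, "store": store,
                     "sales": mean + day_effect + rng.normal(0.0, 3.0)})
sales = pd.DataFrame(rows)

# Repeated measures ANOVA with store as the within-subject factor.
result = AnovaRM(data=sales, depvar="sales", subject="day",
                 within=["store"]).fit()
print(result.anova_table)

# Follow-up: paired t-tests per store pair, Bonferroni-adjusted.
wide = sales.pivot(index="day", columns="store", values="sales")
pairs = list(combinations(wide.columns, 2))
for first, second in pairs:
    t_stat, p_raw = stats.ttest_rel(wide[first], wide[second])
    p_adj = min(p_raw * len(pairs), 1.0)
    print(f"Store {first} vs {second}: t = {t_stat:.2f}, adjusted p = {p_adj:.4f}")
```

The direction of any difference should be read from the store means (for example `wide.mean()`), since the tests themselves only indicate that a difference exists.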
