Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.

Q2. What are the three types of ANOVA, and in what situations would each be used?

Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual
sum of squares (SSR) in a one-way ANOVA using Python?

Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?

Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?

Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.

Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results.

Q10. A company wants to know if there are any significant differences in the average time it takes to
complete a task using three different software programs: Program A, Program B, and Program C. They
randomly assign 30 employees to one of the programs and record the time it takes each employee to
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
interaction effects between the software programs and employee experience level (novice vs.
experienced). Report the F-statistics and p-values, and interpret the results.

Q11. An educational researcher is interested in whether a new teaching method improves student test
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
two-sample t-test using Python to determine if there are any significant differences in test scores
between the two groups. If the results are significant, follow up with a post-hoc test to determine which
group(s) differ significantly from each other.

Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any
significant differences in sales between the three stores. If the results are significant, follow up with a
post-hoc test to determine which store(s) differ significantly from each other.

Q1. Assumptions for ANOVA:
   - Independence: Observations are independent of each other, both within and between groups.
   - Normality: The residuals (the differences between observed and predicted values) are normally distributed within each group.
   - Homogeneity of variance: The variances of the populations from which the samples are drawn are equal (homoscedasticity).
   
   Violations impacting validity:
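   - Non-normality: Strongly skewed or heavy-tailed residuals (e.g. reaction times with a long right tail) make p-values and confidence intervals inaccurate, especially with small samples.
   - Non-homogeneity of variance: Unequal variances can inflate the Type I error rate or reduce the power of the test, particularly when group sizes are also unequal (e.g. one group's standard deviation is several times another's).
   - Lack of independence: Correlated observations (e.g. repeated measurements on the same person treated as separate subjects) make the p-values and confidence intervals inaccurate and typically inflate the Type I error rate.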
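
   The normality and equal-variance assumptions can be checked before running the ANOVA. The sketch below uses SciPy's Shapiro-Wilk and Levene tests on hypothetical data (the group values and the names `groups` and `residuals` are illustrative, not from the question); independence cannot be tested this way and has to follow from the study design.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three independent groups
groups = {
    "g1": np.array([10, 15, 12, 14, 18], dtype=float),
    "g2": np.array([8, 11, 13, 12, 9], dtype=float),
    "g3": np.array([16, 20, 17, 19, 15], dtype=float),
}

# Residuals in a one-way ANOVA are deviations from each group's own mean
residuals = np.concatenate([values - values.mean() for values in groups.values()])

# Shapiro-Wilk: a small p-value suggests the residuals are not normal
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Levene: a small p-value suggests the group variances differ
levene_stat, levene_p = stats.levene(*groups.values())

print(f"Shapiro-Wilk p = {shapiro_p:.3f}")
print(f"Levene p = {levene_p:.3f}")
```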

Q2. Three types of ANOVA:
   - One-way ANOVA: Used when comparing the means of three or more independent groups on one dependent variable.
   - Two-way ANOVA: Used when there are two independent variables (factors), each with multiple levels, and one dependent variable.
   - Repeated measures ANOVA: Used when the same subjects are measured at multiple time points or under multiple conditions.

Q3. Partitioning of variance in ANOVA:
   ANOVA decomposes the total variability observed in the data into components that add up, SST = SSE + SSR:
   - Total sum of squares (SST): Variability of the dependent variable across all observations.
   - Explained sum of squares (SSE): Variability in the dependent variable explained by the independent variable(s).
   - Residual sum of squares (SSR): Unexplained variability in the dependent variable after accounting for the effects of the independent variable(s).
   Understanding this concept is important because it helps assess the proportion of variance in the dependent variable that can be attributed to the independent variable(s).
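
   The proportion mentioned above is often reported as eta squared, the explained sum of squares divided by the total. A minimal sketch, assuming a one-way design; the helper name `eta_squared` and the example values are hypothetical:

```python
import numpy as np

def eta_squared(groups):
    """Return SSE / SST, the share of total variability explained by group membership."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    # Total sum of squares: every observation against the grand mean
    sst = np.sum((all_values - grand_mean) ** 2)
    # Explained (between-group) sum of squares, weighted by group size
    sse = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    return sse / sst

# Hypothetical data: two groups of three observations
example_groups = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])]
print(eta_squared(example_groups))
```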

Q4. Calculation of SST, SSE, and SSR in one-way ANOVA using Python:
```python
import numpy as np

# Sample data for three groups
group1 = np.array([10, 15, 12, 14, 18])
group2 = np.array([8, 11, 13, 12, 9])
group3 = np.array([16, 20, 17, 19, 15])
groups = [group1, group2, group3]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# SST: squared deviations of every observation from the grand mean
SST = np.sum((all_values - grand_mean) ** 2)

# SSE (explained, between groups): squared deviations of each group mean
# from the grand mean, weighted by group size
SSE = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# SSR (residual, within groups): squared deviations of each observation
# from its own group mean
SSR = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# The partition SST = SSE + SSR should hold up to rounding
assert np.isclose(SST, SSE + SSR)
print("SST:", SST, "SSE:", SSE, "SSR:", SSR)
```

Q5. Calculation of main effects and interaction effects in two-way ANOVA using Python:
```python
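# Assuming you have data in a pandas DataFrame 'df' with columns 'factor1', 'factor2', and 'dependent_variable'
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Main effects: C(factor1) and C(factor2); interaction effect: C(factor1):C(factor2)
model = ols('dependent_variable ~ C(factor1) + C(factor2) + C(factor1):C(factor2)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
# Each row reports the sum of squares, degrees of freedom, F-statistic, and p-value for one effect
print(anova_table)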
```

Q6. Interpretation of a one-way ANOVA result:
   An F-statistic of 5.23 with a p-value of 0.02 is statistically significant at the conventional 0.05 level (though not at 0.01), so we reject the null hypothesis that all group means are equal. The result indicates that at least one group mean differs from the others; it does not say which ones, so post-hoc tests are needed to identify the specific groups that differ.
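
   The p-value is the upper-tail probability of the F distribution at the observed statistic. The question does not give the degrees of freedom, so the values below (3 groups, 30 observations) are assumptions for illustration only, and the resulting p-value need not equal 0.02. The helper name `f_test_p_value` is hypothetical.

```python
from scipy import stats

def f_test_p_value(f_statistic, df_between, df_within):
    """Upper-tail probability of the F distribution at the observed statistic."""
    return stats.f.sf(f_statistic, df_between, df_within)

# Assumed degrees of freedom: 3 groups (df = 2), 30 observations (df = 27)
p = f_test_p_value(5.23, df_between=2, df_within=27)
print(f"p = {p:.4f}, reject H0 at 0.05: {p < 0.05}")
```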

Q7. Handling missing data in repeated measures ANOVA:
   - Listwise or pairwise deletion: Drop subjects with any missing measurement (listwise), or drop them only from the comparisons that involve the missing value (pairwise). Both reduce power, and both give biased estimates unless the data are missing completely at random.
   - Imputation: Replace missing values with estimates, e.g. mean imputation or regression imputation. Single imputation treats the filled-in values as if they were observed, so it can introduce bias and understate variability.
   - Maximum likelihood estimation: Use all available data to estimate the model parameters, for example by fitting a linear mixed-effects model. This method is preferred when data are missing at random, but it can be computationally intensive.
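
   The first two options can be compared directly in pandas. The sketch below uses a small hypothetical wide-format table (one row per subject, one column per time point) and shows how listwise deletion shrinks the sample while mean imputation shrinks the spread.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one row per subject, one column per time point
wide = pd.DataFrame(
    {
        "t1": [5.0, 6.0, 7.0, 8.0],
        "t2": [6.0, np.nan, 8.0, 9.0],
        "t3": [7.0, 8.0, np.nan, 10.0],
    },
    index=["s1", "s2", "s3", "s4"],
)

# Listwise deletion: keep only subjects measured at every time point
complete_cases = wide.dropna()

# Mean imputation: fill each gap with that time point's observed mean
mean_imputed = wide.fillna(wide.mean())

print("Subjects kept after listwise deletion:", len(complete_cases))
print("Std of t2 before/after imputation:", wide["t2"].std(), mean_imputed["t2"].std())
```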

Q8. Common post-hoc tests after ANOVA:
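   - Tukey's Honestly Significant Difference (HSD) test: Used when all pairwise comparisons between group means are of interest. It controls the familywise error rate.
   - Bonferroni correction: Divides the significance threshold by the number of comparisons to maintain the overall significance level. It is best suited to a small number of planned comparisons, because it becomes very conservative as the number grows.
   - Scheffe's method: A conservative procedure that controls the familywise error rate across all possible contrasts, not only pairwise ones. It is useful when complex or unplanned comparisons (e.g. one group against the average of two others) are of interest.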

   Example: After conducting a one-way ANOVA to compare the effectiveness of three teaching methods on student performance, a post-hoc test like Tukey's HSD would be necessary to determine which specific pairs of teaching methods show significant differences in performance.
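
   A minimal sketch of the teaching-methods example, assuming SciPy 1.8 or later, which provides `scipy.stats.tukey_hsd`. The scores are hypothetical.

```python
from scipy import stats

# Hypothetical test scores under three teaching methods
method_a = [72, 75, 78, 74, 71]
method_b = [80, 83, 79, 85, 82]
method_c = [73, 76, 74, 77, 72]

# Tukey's HSD over all pairs of groups
result = stats.tukey_hsd(method_a, method_b, method_c)

# result.pvalue[i, j] is the adjusted p-value for comparing group i with group j
labels = ["A", "B", "C"]
for i in range(3):
    for j in range(i + 1, 3):
        print(f"{labels[i]} vs {labels[j]}: p = {result.pvalue[i, j]:.4f}")
```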

Q9. One-way ANOVA in Python:
```python
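import scipy.stats as stats

# Illustrative weight-loss data for diets A, B, and C
# (the 50-participant dataset from the question is not provided)
diet_a = [2, 4, 6, 5, 3]
diet_b = [7, 9, 8, 10, 6]
diet_c = [11, 13, 12, 15, 10]

# Conduct one-way ANOVA
f_statistic, p_value = stats.f_oneway(diet_a, diet_b, diet_c)
print("F-statistic:", f_statistic)
print("p-value:", p_value)

# Interpret at the 0.05 significance level
if p_value < 0.05:
    print("Reject H0: at least one diet's mean weight loss differs.")
else:
    print("Fail to reject H0: no significant difference detected.")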
```

Q10. Two-way ANOVA in Python:
```python
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Assuming you have data in a pandas DataFrame 'df' with columns 'software_program', 'experience_level', and 'task_completion_time'
model = ols('task_completion_time ~ C(software_program) * C(experience_level)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)
```

Q11. Two-sample t-test in Python:
```python
import scipy.stats as stats

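# Assuming 'control_group' and 'experimental_group' are sequences of test scores
t_statistic, p_value = stats.ttest_ind(control_group, experimental_group)
print("t-statistic:", t_statistic)
print("p-value:", p_value)
# With only two groups, a significant t-test already identifies which groups differ,
# so no post-hoc test is needed; compare the two group means to see the direction.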
```
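
   A runnable variant on simulated scores, assuming 50 students per group with invented means and spread. It uses Welch's t-test (`equal_var=False`), which is a swapped-in alternative that does not assume equal variances, and adds Cohen's d as a simple effect-size measure.

```python
import numpy as np
from scipy import stats

# Simulated scores for 50 students per group; the means and spread are invented
rng = np.random.default_rng(42)
control_group = rng.normal(loc=70, scale=10, size=50)
experimental_group = rng.normal(loc=75, scale=10, size=50)

# equal_var=False requests Welch's t-test, which does not assume equal variances
t_statistic, p_value = stats.ttest_ind(control_group, experimental_group, equal_var=False)

# Cohen's d using the average of the two sample variances (equal group sizes)
pooled_sd = np.sqrt((control_group.var(ddof=1) + experimental_group.var(ddof=1)) / 2)
cohens_d = (experimental_group.mean() - control_group.mean()) / pooled_sd

print(f"t = {t_statistic:.3f}, p = {p_value:.4f}, d = {cohens_d:.2f}")
```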

Q12. Repeated measures ANOVA in Python:
```python
from pingouin import rm_anova

# Assuming a long-format pandas DataFrame 'df' with columns 'store', 'day', and 'daily_sales'
# (one row per store-day pair); each day plays the role of the repeated 'subject'
rm_anova_result = rm_anova(dv='daily_sales', within='store', subject='day', data=df)
print(rm_anova_result)
```
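
   If the repeated measures ANOVA is significant, one possible follow-up is a set of paired t-tests with a Bonferroni correction. The sketch below uses SciPy only and simulated sales figures; the store means, the shared day effect, and the noise level are assumptions for illustration.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Simulated daily sales for the same 30 days at each store; the means are invented
rng = np.random.default_rng(1)
day_effect = rng.normal(loc=0, scale=50, size=30)  # shared day-to-day variation
sales = {
    "A": 1000 + day_effect + rng.normal(0, 20, size=30),
    "B": 1100 + day_effect + rng.normal(0, 20, size=30),
    "C": 1005 + day_effect + rng.normal(0, 20, size=30),
}

pairs = list(combinations(sales, 2))
adjusted = {}
for first, second in pairs:
    # Paired test because both stores are observed on the same days
    _, p = stats.ttest_rel(sales[first], sales[second])
    # Bonferroni: multiply by the number of comparisons, capped at 1
    adjusted[(first, second)] = min(p * len(pairs), 1.0)

for pair, p_adj in adjusted.items():
    print(pair, f"adjusted p = {p_adj:.4f}")
```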