Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.

Q2. What are the three types of ANOVA, and in what situations would each be used?


Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?


Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual
sum of squares (SSR) in a one-way ANOVA using Python?


Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?


Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?


Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?


Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.


Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results.


Q10. A company wants to know if there are any significant differences in the average time it takes to
complete a task using three different software programs: Program A, Program B, and Program C. They
randomly assign 30 employees to one of the programs and record the time it takes each employee to
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
interaction effects between the software programs and employee experience level (novice vs.
experienced). Report the F-statistics and p-values, and interpret the results.


Q11. An educational researcher is interested in whether a new teaching method improves student test
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
two-sample t-test using Python to determine if there are any significant differences in test scores
between the two groups. If the results are significant, follow up with a post-hoc test to determine which
group(s) differ significantly from each other.


Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any
significant differences in sales between the three stores. If the results are significant, follow up with a
post-hoc test to determine which store(s) differ significantly from each other.
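
The partitioning of variance asked about in Q3 and Q4 can be checked numerically. The sketch below is a minimal illustration, not a reference solution: the weight-loss values are made up for the example, and the acronyms follow the wording of Q4 (SSE = explained, between-group sum of squares; SSR = residual, within-group sum of squares), which some textbooks swap. It computes the three sums of squares by hand, builds the F-statistic from them, and compares the result with `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Hypothetical weight-loss values (kg) for three diets; illustrative only.
groups = {
    "A": np.array([2.1, 3.4, 1.9, 2.8, 3.0]),
    "B": np.array([3.9, 4.2, 3.1, 4.8, 4.0]),
    "C": np.array([1.2, 0.8, 1.9, 1.5, 1.1]),
}

all_values = np.concatenate(list(groups.values()))
grand_mean = all_values.mean()

# Total variation: every observation against the grand mean.
sst = ((all_values - grand_mean) ** 2).sum()
# Explained (between-group) variation: group means against the grand mean.
sse = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups.values())
# Residual (within-group) variation: observations against their own group mean.
ssr = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

k = len(groups)        # number of groups
n = all_values.size    # total number of observations

# F is the ratio of the explained mean square to the residual mean square.
f_manual = (sse / (k - 1)) / (ssr / (n - k))
p_manual = stats.f.sf(f_manual, k - 1, n - k)

# Cross-check against SciPy's one-way ANOVA.
f_scipy, p_scipy = stats.f_oneway(*groups.values())

print(f"SST={sst:.4f}  SSE={sse:.4f}  SSR={ssr:.4f}")
print(f"manual: F={f_manual:.4f}, p={p_manual:.6f}")
print(f"scipy:  F={f_scipy:.4f}, p={p_scipy:.6f}")
```

The identity SST = SSE + SSR is what makes the F-ratio meaningful: all variation in the response is attributed either to group membership or to noise within groups.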

In [1]:
import numpy as np
import scipy.stats as stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.formula.api import ols
import statsmodels.api as sm

# Q1: Assumptions of ANOVA (No code needed)

# Q2: Three types of ANOVA (No code needed)

# Q3: Partitioning variance in ANOVA (No code needed)

# Q4: Calculate SST, SSE, SSR in one-way ANOVA

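def calculate_anova_sumsquares(data, dependent_var, independent_var):
    # Returns (SST, SSE, SSR): total, explained (between-group), and residual (within-group) sums of squares.
    y = data[dependent_var]
    group_means = data.groupby(independent_var)[dependent_var].transform('mean')
    sst = ((y - y.mean()) ** 2).sum()
    sse = ((group_means - y.mean()) ** 2).sum()
    ssr = ((y - group_means) ** 2).sum()
    return sst, sse, ssr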

# Q5: Calculate main effects and interaction effects in two-way ANOVA

def two_way_anova(data, dependent_var, factor1, factor2):
    model = ols(f'{dependent_var} ~ C({factor1}) + C({factor2}) + C({factor1}):C({factor2})', data=data).fit()
    anova_table = sm.stats.anova_lm(model, typ=2)
    return anova_table

# Q6: Interpretation of ANOVA results (No code needed)

# Q7: Handling missing data in repeated measures ANOVA (No code needed)

# Q8: Common post-hoc tests (Tukey HSD)

def posthoc_tukey(data, dependent_var, independent_var):
    from statsmodels.stats.multicomp import pairwise_tukeyhsd
    tukey = pairwise_tukeyhsd(endog=data[dependent_var], groups=data[independent_var], alpha=0.05)
    return tukey.summary()

# Q9: One-way ANOVA for diet comparison

def one_way_anova(data, dependent_var, independent_var):
    model = ols(f'{dependent_var} ~ C({independent_var})', data=data).fit()
    anova_table = sm.stats.anova_lm(model, typ=2)
    return anova_table

# Q10: Two-way ANOVA for software performance comparison

def two_way_anova_interaction(data, dependent_var, factor1, factor2):
    model = ols(f'{dependent_var} ~ C({factor1}) * C({factor2})', data=data).fit()
    anova_table = sm.stats.anova_lm(model, typ=2)
    return anova_table

# Q11: Two-sample t-test for new teaching method

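def two_sample_ttest(data, group_col, value_col):
    # Split the values by group label; a two-sample test needs exactly two groups.
    labels = data[group_col].unique()
    if len(labels) != 2:
        raise ValueError(f'Expected exactly 2 groups in {group_col}, found {len(labels)}')
    group1 = data.loc[data[group_col] == labels[0], value_col]
    group2 = data.loc[data[group_col] == labels[1], value_col]
    return stats.ttest_ind(group1, group2)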

# Q12: Repeated measures ANOVA for store sales

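def repeated_measures_anova(data, subject_col, time_col, value_col):
    # time_col is the within-subject factor (the store in Q12); subject_col is the repeated unit (the day).
    # Treating the subject as a blocking factor reproduces the repeated measures F-test for the
    # within-subject factor when the data are balanced (one observation per subject and condition).
    # No sphericity correction is applied.
    model = ols(f'{value_col} ~ C({time_col}) + C({subject_col})', data=data).fit()
    anova_table = sm.stats.anova_lm(model, typ=2)
    return anova_table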
