Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact the validity of the results.

Assumptions of ANOVA:

Normality: The residuals within each group should be approximately normally distributed.
Homogeneity of variances (Homoscedasticity): The variance of residuals should be approximately equal across all groups.
Independence: Observations should be independent of one another, both within each group and across groups.
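
Violations and Impacts:

Violation of Normality: Strongly skewed or heavy-tailed residuals (for example, reaction times or income data in small samples) distort the sampling distribution of the F-statistic, so the reported p-value may be inaccurate. ANOVA is fairly robust to moderate departures from normality when group sizes are large and roughly equal.
Violation of Homogeneity of Variances: If one group is much more variable than the others (for example, a treatment group whose standard deviation is several times that of the control group), the pooled error term is misleading, and p-values and confidence intervals can be wrong, especially when group sizes are unequal.
Violation of Independence: If observations are not independent (for example, repeated measurements on the same subject, or students clustered within classrooms, treated as separate observations), the error variance is typically underestimated, which inflates the type I error rate.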
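
The normality and equal-variance assumptions can be screened with standard tests. The following is a minimal sketch, assuming simulated data (the group means, spread, and sizes are made up for illustration) and using scipy.stats.shapiro and scipy.stats.levene. Independence usually cannot be checked this way; it depends on how the data were collected.

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups drawn from normal distributions (illustrative only)
rng = np.random.default_rng(42)
groups = [rng.normal(loc=mu, scale=2.0, size=30) for mu in (10.0, 12.0, 11.0)]

# Residuals are the deviations of each observation from its own group mean
residuals = np.concatenate([g - g.mean() for g in groups])

# Shapiro-Wilk test: a small p-value suggests the residuals are not normal
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Levene's test (median-centered): a small p-value suggests the group variances differ
levene_stat, levene_p = stats.levene(*groups, center="median")

print(f"Shapiro-Wilk p = {shapiro_p:.3f}, Levene p = {levene_p:.3f}")
```

These tests are only a screen: with very large samples they flag trivial departures, so residual plots are a useful complement.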

Q2. What are the three types of ANOVA, and in what situations would each be used?

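One-Way ANOVA: Used to compare the means of three or more groups defined by a single categorical independent variable, for example test scores under three teaching methods.
Two-Way ANOVA: Used to study the effects of two independent variables (factors) on a dependent variable, including whether the factors interact, for example crop yield by fertilizer type and irrigation level.
Repeated Measures ANOVA: Used when measurements are taken on the same subjects at multiple time points or under different conditions, for example blood pressure recorded before, during, and after a treatment.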

Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

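Partitioning of Variance: ANOVA breaks the total variation in the data into components attributable to different sources. In a one-way ANOVA there are two: between-group variation (explained by the factor) and within-group variation (the residual, or error). In designs with more factors, the explained part is split further into main effects and interactions.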

Understanding this concept is important because it helps to quantify the contributions of different factors to the overall variance, allowing us to determine if the differences between group means are statistically significant.
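
For a one-way design, the partition can be written out explicitly. The notation is an assumption introduced here: k groups, n_i observations in group i, N observations in total, y_ij the j-th observation in group i, ȳ_i the mean of group i, and ȳ the grand mean.

```latex
\begin{aligned}
SS_{\text{total}} &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}\right)^2 \\
                  &= \underbrace{\sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}\right)^2}_{SS_{\text{between}}}
                   + \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2}_{SS_{\text{within}}}, \\[4pt]
F &= \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
\end{aligned}
```

The F-statistic is the ratio of the two mean squares, which is why a large share of between-group variation relative to within-group variation produces a small p-value.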

Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual sum of squares (SSR) in a one-way ANOVA using Python?

In [None]:
import numpy as np
import scipy.stats as stats

data = [group1_data, group2_data, group3_data, ...]  # List of arrays for each group

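grand_mean = np.mean(np.concatenate(data))

# Total sum of squares: squared deviations of every observation from the grand mean
sst = sum(np.sum((group - grand_mean)**2) for group in data)
# Residual (within-group) sum of squares: deviations from each group's own mean
ssr = sum(np.sum((group - np.mean(group))**2) for group in data)
# Explained (between-group) sum of squares: what remains after removing the residual part
sse = sst - ssr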
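
As a sanity check, the decomposition can be run on a small made-up dataset and compared with scipy.stats.f_oneway. This is a sketch under stated assumptions: the three groups below are hypothetical toy values, and the names ss_between and ss_within are used to avoid ambiguity (in the question's naming, the explained sum of squares is the between-group term and the residual sum of squares is the within-group term).

```python
import numpy as np
from scipy import stats

# Hypothetical toy data: three groups of unequal size
data = [
    np.array([4.0, 5.0, 6.0]),
    np.array([7.0, 8.0, 9.0, 10.0]),
    np.array([1.0, 2.0, 3.0]),
]

all_values = np.concatenate(data)
grand_mean = all_values.mean()

# Total, between-group (explained), and within-group (residual) sums of squares
sst = np.sum((all_values - grand_mean) ** 2)
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in data)
ss_within = sum(np.sum((g - g.mean()) ** 2) for g in data)

# F-statistic from the mean squares, compared with SciPy's result
k, n = len(data), len(all_values)
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))
f_scipy, p_scipy = stats.f_oneway(*data)

print(sst, ss_between, ss_within)
print(f_manual, f_scipy, p_scipy)
```

Computing the between-group term directly, rather than by subtraction, makes it possible to confirm that the two components add up to the total.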


Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

You can perform a two-way ANOVA using libraries like statsmodels and then analyze the main effects and interaction effects. Here's how you might do it:

In [None]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

model = ols('dependent_var ~ C(factor1) * C(factor2)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

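# Sums of squares attributed to each main effect and to the interaction
main_effect_factor1 = anova_table['sum_sq']['C(factor1)']
main_effect_factor2 = anova_table['sum_sq']['C(factor2)']
interaction_effect = anova_table['sum_sq']['C(factor1):C(factor2)']

# The 'F' and 'PR(>F)' columns hold the F-statistic and p-value for each effect
print(anova_table[['sum_sq', 'df', 'F', 'PR(>F)']])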
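
To see what these sums of squares represent, the same quantities can be computed by hand for a balanced design (equal replicates in every cell). This is a minimal sketch, not the statsmodels implementation: the data are simulated, and the layout (2 levels of factor1, 3 levels of factor2, 4 replicates) and the added effect are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical balanced design: y[i, j, r] is replicate r at level i of factor1, level j of factor2
rng = np.random.default_rng(0)
a, b, n = 2, 3, 4
y = rng.normal(loc=10.0, scale=1.0, size=(a, b, n))
y[1] += 2.0  # assumed main effect of factor1, added for illustration

grand = y.mean()
mean_a = y.mean(axis=(1, 2))   # factor1 level means, shape (a,)
mean_b = y.mean(axis=(0, 2))   # factor2 level means, shape (b,)
mean_ab = y.mean(axis=2)       # cell means, shape (a, b)

# Main effects: variation of level means around the grand mean
ss_a = b * n * np.sum((mean_a - grand) ** 2)
ss_b = a * n * np.sum((mean_b - grand) ** 2)
# Interaction: what the cell means show beyond the two additive main effects
ss_ab = n * np.sum((mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
# Error: variation of replicates around their own cell mean
ss_err = np.sum((y - mean_ab[:, :, None]) ** 2)
ss_total = np.sum((y - grand) ** 2)

df_a, df_b, df_ab, df_err = a - 1, b - 1, (a - 1) * (b - 1), a * b * (n - 1)
ms_err = ss_err / df_err

# F-statistics and p-values for each effect
f_a, f_b, f_ab = (ss_a / df_a) / ms_err, (ss_b / df_b) / ms_err, (ss_ab / df_ab) / ms_err
p_a, p_b, p_ab = (stats.f.sf(f, d, df_err) for f, d in ((f_a, df_a), (f_b, df_b), (f_ab, df_ab)))

print(f"factor1: F={f_a:.2f}, p={p_a:.4f}")
print(f"factor2: F={f_b:.2f}, p={p_b:.4f}")
print(f"interaction: F={f_ab:.2f}, p={p_ab:.4f}")
```

The clean additive split holds only for balanced data; with unequal cell sizes the sums of squares depend on the type (I, II, or III) requested from the library.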


Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02. What can you conclude about the differences between the groups, and how would you interpret these results?

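Because the p-value (0.02) is below the conventional significance level of 0.05, you reject the null hypothesis that all group means are equal and conclude that at least one group mean differs from the others. The F-statistic of 5.23 means the between-group mean square is about 5.2 times the within-group mean square. However, the F-test alone doesn't tell you which groups differ or how large the differences are. You would need to perform post-hoc tests to determine which specific group(s) differ from each other, and report an effect size alongside the p-value.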
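
The link between an F-statistic and its p-value depends on the degrees of freedom, which the question does not state. The sketch below assumes 3 groups and 18 observations (so 2 and 15 degrees of freedom); these numbers are hypothetical and were chosen only because they give a p-value close to the one reported.

```python
from scipy import stats

f_stat = 5.23
df_between = 2   # assumed: 3 groups - 1
df_within = 15   # assumed: 18 observations - 3 groups

# Upper-tail probability of the F distribution: P(F >= f_stat) under the null hypothesis
p_value = stats.f.sf(f_stat, df_between, df_within)

alpha = 0.05
reject_null = p_value < alpha
print(f"p = {p_value:.3f}, reject H0 at alpha={alpha}: {reject_null}")
```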

Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential consequences of using different methods to handle missing data?

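Handling missing data in repeated measures ANOVA is challenging because the classical analysis requires a complete set of measurements for every subject. Common options are:

Deletion (listwise): Drop every subject with any missing measurement. This is simple but reduces sample size and statistical power, and it biases the estimates unless the data are missing completely at random.
Imputation: Fill in missing values, for example with the mean, the last observation carried forward, or multiple imputation. Single-value imputation tends to understate variability and can produce p-values that are too small; multiple imputation accounts for the uncertainty but requires a plausible imputation model.
Mixed-effects models: Fit a model that uses all available observations from each subject instead of a classical repeated measures ANOVA. These give valid results when data are missing at random, but they are more complex to specify.

Whichever method is used, the conclusions are only as reliable as the assumption made about why the data are missing.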
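
The cost of listwise deletion is easy to quantify. The following sketch uses a small made-up table of scores (rows are subjects, columns are time points, np.nan marks a missed measurement) and counts how much data survives deletion compared with how much was actually observed.

```python
import numpy as np

# Hypothetical wide-format scores: 5 subjects measured at 3 time points
scores = np.array([
    [5.0, 6.0, 7.0],
    [4.0, np.nan, 6.0],
    [6.0, 7.0, np.nan],
    [5.0, 5.0, 6.0],
    [np.nan, 6.0, 8.0],
])

# Listwise deletion keeps only subjects with no missing measurements
complete_rows = ~np.isnan(scores).any(axis=1)
n_subjects_kept = int(complete_rows.sum())
n_obs_listwise = int(scores[complete_rows].size)

# A method that uses all available data (such as a mixed-effects model) keeps every observed value
n_obs_available = int((~np.isnan(scores)).sum())

print(n_subjects_kept, n_obs_listwise, n_obs_available)
```

Here three missing values out of fifteen remove three of five subjects, so listwise deletion analyzes half of the values that were actually observed.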

Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide an example of a situation where a post-hoc test might be necessary.

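Common post-hoc tests include Tukey's HSD, Bonferroni, and Scheffé tests. They are used to compare group means after obtaining a significant result in ANOVA while controlling the overall type I error rate. Tukey's HSD is suited to comparing all pairs of group means. The Bonferroni correction is simple and works well for a small number of planned comparisons, but it becomes conservative as the number of comparisons grows. Scheffé's method covers arbitrary contrasts (for example, one group against the average of two others) and is the most conservative of the three for simple pairwise comparisons. For example, if a one-way ANOVA comparing three diets is significant, you might use Tukey's HSD to determine which specific pairs of diets have significantly different means.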
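
As an illustration of the simplest of these, the sketch below applies a Bonferroni correction to pairwise t-tests. The data are simulated and the group labels and means are hypothetical; this shows the correction itself, not Tukey's HSD or Scheffé's method.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Hypothetical data: group C has a clearly higher mean than A and B
rng = np.random.default_rng(0)
groups = {
    "A": rng.normal(10.0, 1.0, 20),
    "B": rng.normal(10.5, 1.0, 20),
    "C": rng.normal(13.0, 1.0, 20),
}

pairs = list(combinations(groups, 2))  # all pairwise comparisons
adjusted = {}
for g1, g2 in pairs:
    _, p = stats.ttest_ind(groups[g1], groups[g2])
    # Bonferroni: multiply each p-value by the number of comparisons, capped at 1
    adjusted[(g1, g2)] = min(1.0, p * len(pairs))

for pair, p_adj in adjusted.items():
    print(pair, f"adjusted p = {p_adj:.4f}")
```

A pair is declared different when its adjusted p-value falls below the chosen significance level.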

Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from 50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python to determine if there are any significant differences between the mean weight loss of the three diets. Report the F-statistic and p-value, and interpret the results.

To conduct a one-way ANOVA in Python, you can use the scipy.stats.f_oneway function. Here's an example:

In [None]:
import numpy as np
import scipy.stats as stats

diet_A = np.array([...])  # Weight loss data for Diet A
