Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.

Q2. What are the three types of ANOVA, and in what situations would each be used?

Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual
sum of squares (SSR) in a one-way ANOVA using Python?

Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?

Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?

Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.

Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results.

Q10. A company wants to know if there are any significant differences in the average time it takes to
complete a task using three different software programs: Program A, Program B, and Program C. They
randomly assign 30 employees to one of the programs and record the time it takes each employee to
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
interaction effects between the software programs and employee experience level (novice vs.
experienced). Report the F-statistics and p-values, and interpret the results.

Q11. An educational researcher is interested in whether a new teaching method improves student test
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
two-sample t-test using Python to determine if there are any significant differences in test scores
between the two groups. If the results are significant, follow up with a post-hoc test to determine which
group(s) differ significantly from each other.

Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any significant
differences in sales between the three stores. If the results are significant, follow up with a post-hoc
test to determine which store(s) differ significantly from each other.

Q1. Assumptions for using ANOVA:
- Independence: Observations within each group are independent of each other.
- Normality: The residuals (deviations from the group means) are normally distributed for each group.
- Homogeneity of variances: The variability of the residuals is the same across all groups.

Violations impacting validity:
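- Violation of independence: Observations within groups may not be independent, for example in a repeated measures design where the same participants are measured multiple times. Analyzing such data as if the observations were independent uses the wrong error term, so the F-statistic and p-value can no longer be trusted.
- Violation of normality: If the residuals do not follow a normal distribution, the p-value may be inaccurate. Strong skew or extreme outliers are the usual causes, and the problem is most serious with small or unequal group sizes.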
- Violation of homogeneity of variances: If the variability of the residuals differs across groups, it can affect the validity. Unequal variances can distort the F-statistic and lead to incorrect conclusions.
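
The normality and equal-variance assumptions can be checked before running the ANOVA. The following is a minimal sketch, assuming three independent groups of synthetic data (the values and the seed are placeholders, not data from this assignment); it applies `scipy.stats.shapiro` to the pooled residuals and `scipy.stats.levene` to the groups.

```python
import numpy as np
from scipy import stats

# Hypothetical example data: three independent groups of 20 observations each
rng = np.random.default_rng(42)
groups = [rng.normal(loc=mu, scale=1.0, size=20) for mu in (5.0, 5.5, 6.0)]

# Normality: Shapiro-Wilk test on the pooled residuals
# (each observation minus the mean of its own group)
residuals = np.concatenate([g - g.mean() for g in groups])
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Homogeneity of variances: Levene's test across the groups
levene_stat, levene_p = stats.levene(*groups)

print(f"Shapiro-Wilk p-value: {shapiro_p:.3f}")
print(f"Levene p-value: {levene_p:.3f}")
```

A small p-value in either test suggests the corresponding assumption may be violated. Independence cannot be tested this way; it has to be judged from the study design.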

Q2. The three types of ANOVA are:
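- One-way ANOVA: Used to compare means across three or more independent groups defined by a single factor (with only two groups it is equivalent to an independent-samples t-test).
- Two-way ANOVA: Used when the groups are defined by two independent variables (factors) considered simultaneously, which allows testing each factor's main effect and the interaction between them.
- Repeated measures ANOVA: Used when the same subjects are measured under different conditions or at different time points, so the observations are not independent.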

Q3. Partitioning of variance in ANOVA refers to dividing the total variability in the data (total sum of squares, SST) into two components: the variability explained by the factors (explained sum of squares, SSE) and the variability not explained by the factors (residual sum of squares, SSR), so that SST = SSE + SSR. Understanding this concept is important because the F-statistic is built from these two components, each divided by its degrees of freedom, and because the ratio SSE/SST gives the proportion of variance accounted for by the factors, allowing us to assess how strongly they influence the dependent variable.

Q4. Calculation of sum of squares in a one-way ANOVA using Python:
- Total sum of squares (SST): `SST = sum((x - grand_mean)**2)` where x is the data and grand_mean is the mean of all data points.
- Explained sum of squares (SSE): `SSE = sum(n * (group_mean - grand_mean)**2)` where the sum runs over the groups, group_mean is the mean of each group, and n is the number of observations in that group.
- Residual sum of squares (SSR): `SSR = sum((x - group_mean)**2)` where x is each data point and group_mean is the mean of the group that point belongs to.
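
The three formulas above can be sketched with NumPy as follows. This is an illustration rather than a required implementation; the group values are small placeholder numbers, and the variable names are chosen here for readability. The last lines check the result against `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Placeholder data: one array of observations per group
groups = [
    np.array([2.5, 3.1, 1.8, 2.3, 2.9]),
    np.array([1.9, 2.4, 1.5, 2.1, 2.6]),
    np.array([2.2, 1.7, 2.8, 2.0, 1.6]),
]

all_data = np.concatenate(groups)
grand_mean = all_data.mean()

# Total, explained (between-group), and residual (within-group) sums of squares
sst = np.sum((all_data - grand_mean) ** 2)
sse = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ssr = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# F-statistic: ratio of the two mean squares
df_between = len(groups) - 1
df_within = len(all_data) - len(groups)
f_manual = (sse / df_between) / (ssr / df_within)

# Cross-check against SciPy's one-way ANOVA
f_scipy, p_scipy = stats.f_oneway(*groups)

print("SST:", sst, "SSE:", sse, "SSR:", ssr)
print("F (manual):", f_manual, "F (scipy):", f_scipy)
```

The identity SST = SSE + SSR holds up to floating-point rounding, which is a quick way to catch a mistake in any of the three calculations.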

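Q5. In a two-way ANOVA, the main effects represent the contribution of each factor averaged over the levels of the other factor, while the interaction effect describes whether the effect of one factor depends on the level of the other. In Python this is usually done with statsmodels: fit an OLS model whose formula includes both factors and their interaction (for example `y ~ C(A) + C(B) + C(A):C(B)`), then pass the fitted model to `anova_lm`, which reports a sum of squares, F-statistic, and p-value for each main effect and for the interaction. scipy's `f_oneway` covers only the one-way case. With unbalanced data the results depend on the type of sum of squares requested, so it's advisable to consult the statsmodels documentation for the `typ` argument of `anova_lm`.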
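
To make the definitions concrete, the effects can also be computed by hand from the cell means. The sketch below assumes a balanced 2x2 design with made-up values (the factor names `A` and `B` and the response `y` are hypothetical); it gives the effect estimates only, not their significance tests.

```python
import pandas as pd

# Hypothetical balanced 2x2 design with two observations per cell
df = pd.DataFrame({
    "A": ["a1", "a1", "a2", "a2", "a1", "a1", "a2", "a2"],
    "B": ["b1", "b2", "b1", "b2", "b1", "b2", "b1", "b2"],
    "y": [10.0, 12.0, 14.0, 20.0, 11.0, 13.0, 15.0, 21.0],
})

grand_mean = df["y"].mean()

# Main effects: marginal mean of each level minus the grand mean
main_A = df.groupby("A")["y"].mean() - grand_mean
main_B = df.groupby("B")["y"].mean() - grand_mean

# Cell means as a table with levels of A in rows and levels of B in columns
cell_means = df.groupby(["A", "B"])["y"].mean().unstack()

# Interaction: what is left of each cell mean after removing the
# additive prediction (grand mean + effect of A + effect of B)
additive = grand_mean + main_A.values[:, None] + main_B.values[None, :]
interaction = cell_means - additive

print(main_A, main_B, interaction, sep="\n\n")
```

If every interaction term is close to zero, the two factors act additively; nonzero terms mean the effect of one factor changes across the levels of the other.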

Q6. With an F-statistic of 5.23 and a p-value of 0.02, the result is significant at the conventional 0.05 level, so we reject the null hypothesis that all group means are equal and conclude that at least one group mean differs from the others. The F-statistic measures the ratio of between-group variability to within-group variability; a larger value suggests greater between-group differences relative to within-group differences. The p-value is the probability of obtaining an F-statistic at least this large if the null hypothesis of no group differences were true. The test does not show which groups differ or how large the differences are, so a post-hoc test and an effect size are needed to answer those questions, and the result would not be significant at a stricter 0.01 level.
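
The link between the F-statistic and the p-value can be sketched with the F distribution in SciPy. The question does not give the degrees of freedom, so the values below (3 groups and 15 observations in total) are assumptions chosen purely for illustration.

```python
from scipy import stats

f_statistic = 5.23
df_between = 2   # assumed: 3 groups - 1
df_within = 12   # assumed: 15 observations - 3 groups

# p-value: upper-tail area of the F distribution beyond the observed statistic
p_value = stats.f.sf(f_statistic, df_between, df_within)

# Critical value for a test at the 0.05 significance level
critical_f = stats.f.ppf(0.95, df_between, df_within)

print(f"p-value: {p_value:.4f}")
print(f"critical F at alpha = 0.05: {critical_f:.3f}")
print("Reject H0" if f_statistic > critical_f else "Fail to reject H0")
```

Comparing the p-value with alpha and comparing the F-statistic with the critical value are two views of the same decision.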

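Q7. Handling missing data in a repeated measures ANOVA can be challenging because the standard analysis needs a complete set of measurements for every subject. One approach is listwise deletion, removing every subject with any missing measurement, but this reduces the sample size and power and may lead to biased results if the missingness is related to the outcome. Another option is to use imputation to estimate the missing values; simple methods such as mean imputation tend to understate variability, while multiple imputation is designed to reflect the uncertainty in the filled-in values. A third option is to fit a linear mixed-effects model, which can use all available observations without requiring complete cases. Different methods can lead to different results, and the choice should be based on the nature of the missingness and the assumptions made. It is crucial to consider the potential consequences of each method, such as bias or distorted variability, and to report the missing data handling procedure in the analysis.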
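
The difference between two of these options can be seen on a small example. This is a minimal sketch with made-up values, assuming wide-format data (one row per subject, one column per time point; the names `t1` to `t3` and `s1` to `s5` are hypothetical).

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data with two missing measurements
wide = pd.DataFrame(
    {
        "t1": [5.1, 4.8, 6.0, 5.5, 5.9],
        "t2": [5.6, np.nan, 6.4, 5.8, 6.1],
        "t3": [6.0, 5.2, np.nan, 6.1, 6.5],
    },
    index=pd.Index(["s1", "s2", "s3", "s4", "s5"], name="subject"),
)

# Option 1: listwise deletion keeps only subjects with complete data
complete_cases = wide.dropna()

# Option 2: mean imputation fills each gap with that time point's mean
imputed = wide.fillna(wide.mean())

print("Subjects kept after listwise deletion:", len(complete_cases))
print("Variances, complete cases:", complete_cases.var().round(3).to_dict())
print("Variances, mean-imputed:", imputed.var().round(3).to_dict())
```

Listwise deletion drops two of the five subjects here, while mean imputation keeps all five but fills the gaps with values that have no spread of their own, which is why it tends to understate variability.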

Q8. Common post-hoc tests used after ANOVA include Tukey's Honestly Significant Difference (HSD), Bonferroni correction, and Scheffe's method. These tests are used to compare group means pairwise after finding a significant overall difference in the ANOVA.

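- Tukey's HSD: Compares every pair of group means while controlling the family-wise error rate; the usual choice when all pairwise comparisons are of interest.
- Bonferroni correction: Adjusts the significance level by dividing it by the number of comparisons to control the family-wise error rate; best suited to a small number of planned comparisons.
- Scheffe's method: Provides a more conservative approach that controls the error rate over all possible contrasts, not only pairwise comparisons; useful when complex comparisons (such as one group against the average of two others) are tested.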

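Post-hoc tests are necessary when the ANOVA results indicate a significant overall difference between groups but do not specify which groups differ significantly. For example, if a one-way ANOVA comparing three diets is significant, a post-hoc test is needed to determine whether A differs from B, A from C, or B from C.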
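
As an illustration, Tukey's HSD is available in statsmodels as `pairwise_tukeyhsd`. The sketch below assumes three groups of synthetic data (the group means, spread, and seed are placeholders).

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical data: 10 observations for each of three groups
rng = np.random.default_rng(7)
values = np.concatenate([
    rng.normal(loc=2.5, scale=0.5, size=10),  # group A
    rng.normal(loc=2.0, scale=0.5, size=10),  # group B
    rng.normal(loc=3.5, scale=0.5, size=10),  # group C
])
labels = np.repeat(["A", "B", "C"], 10)

# Tukey's HSD compares every pair of groups at a family-wise alpha of 0.05
tukey = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(tukey.summary())
```

The summary table has one row per pair of groups, with the mean difference, an adjusted p-value, a confidence interval, and a reject flag.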



In [None]:
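# Q9: one-way ANOVA on weight loss for three diets
import scipy.stats as stats

# Weight loss data for each diet group
# (illustrative values, five per group; replace with the data from all 50 participants)
diet_A = [2.5, 3.1, 1.8, 2.3, 2.9]  # List of weight loss values for Diet A
diet_B = [1.9, 2.4, 1.5, 2.1, 2.6]  # List of weight loss values for Diet B
diet_C = [2.2, 1.7, 2.8, 2.0, 1.6]  # List of weight loss values for Diet C

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

# Report the results
print("F-statistic:", f_statistic)
print("p-value:", p_value)

# Interpret the result at the 0.05 significance level
if p_value < 0.05:
    print("Reject H0: at least one diet has a different mean weight loss.")
else:
    print("Fail to reject H0: no significant difference in mean weight loss.")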


In [None]:
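# Q10: two-way ANOVA on task time by software program and experience level
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Create a DataFrame with the data: 30 employees, 5 per Software x Experience cell.
# All three columns must have the same length (30 rows).
# The six base times repeat once per cell; a small synthetic noise term is added
# so the residual variance is not zero. Replace 'Time' with the recorded values.
rng = np.random.default_rng(0)
base_times = [15.2, 14.5, 16.3, 13.8, 15.7, 15.9]  # One time value for each combination
data = {
    'Software': ['A', 'B', 'C'] * 10,
    'Experience': ['Novice', 'Experienced'] * 15,
    'Time': np.tile(base_times, 5) + rng.normal(0, 0.5, size=30)
}
df = pd.DataFrame(data)

# Perform two-way ANOVA with both main effects and their interaction
model = ols('Time ~ Software + Experience + Software:Experience', data=df).fit()
anova_table = sm.stats.anova_lm(model)

# Report the results: one row each for Software, Experience, and the interaction,
# with the F-statistic in column 'F' and the p-value in column 'PR(>F)'
print(anova_table)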
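
Q11. The following is a minimal sketch of the two-sample t-test, assuming 50 students per group and synthetic test scores (the means, spread, and seed are placeholders, since the question does not supply data). It uses Welch's version of the test (`equal_var=False`), which does not assume equal variances in the two groups.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder scores: 100 students split evenly between the two groups
rng = np.random.default_rng(0)
control = rng.normal(loc=70, scale=10, size=50)       # traditional teaching method
experimental = rng.normal(loc=75, scale=10, size=50)  # new teaching method

# Welch's two-sample t-test (does not assume equal variances)
t_statistic, p_value = stats.ttest_ind(experimental, control, equal_var=False)

print("t-statistic:", t_statistic)
print("p-value:", p_value)
print("Mean difference (experimental - control):", experimental.mean() - control.mean())
```

With only two groups, a significant t-test already identifies which groups differ, so no separate post-hoc test is needed; the direction of the difference is read from the two group means.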


In [None]:
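# Q12: repeated measures ANOVA on daily sales for three stores
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Create a DataFrame with the data: 30 days x 3 stores in long format (90 rows).
# All three columns must have the same length. The sales figures are synthetic
# placeholders built from a store baseline, a shared day effect, and noise;
# replace 'Sales' with the recorded values.
rng = np.random.default_rng(0)
store_baseline = np.repeat([100, 95, 105], 30)          # Stores A, B, C
day_effect = np.tile(rng.normal(0, 5, size=30), 3)      # Same for all stores on a given day
data = {
    'Day': list(range(1, 31)) * 3,
    'Store': ['A'] * 30 + ['B'] * 30 + ['C'] * 30,
    'Sales': store_baseline + day_effect + rng.normal(0, 3, size=90)
}
df = pd.DataFrame(data)

# Perform repeated measures ANOVA, treating Day as the repeated (blocking) factor;
# the 'Store' row of the table tests for differences between the stores
model = ols('Sales ~ Store + C(Day)', data=df).fit()
anova_table = sm.stats.anova_lm(model)

# Report the results
print(anova_table)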
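
Q12 follow-up. If the store effect is significant, one possible post-hoc approach is paired t-tests between stores with a Bonferroni correction. The sketch below is self-contained and assumes synthetic sales figures (store baselines, noise levels, and seed are placeholders); it also runs the same repeated measures ANOVA through statsmodels' `AnovaRM` class, with each day acting as the "subject".

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM
from statsmodels.stats.multitest import multipletests

# Synthetic placeholder data: 30 days of sales for each of three stores
rng = np.random.default_rng(1)
n_days = 30
day_effect = rng.normal(0, 5, size=n_days)  # variation shared by all stores on a day
sales = {
    store: baseline + day_effect + rng.normal(0, 3, size=n_days)
    for store, baseline in [("A", 100), ("B", 95), ("C", 105)]
}

# Long format: one row per (day, store) pair
long_df = pd.DataFrame({
    "Day": np.tile(np.arange(1, n_days + 1), 3),
    "Store": np.repeat(list(sales), n_days),
    "Sales": np.concatenate(list(sales.values())),
})

# Repeated measures ANOVA with Store as the within-subject factor
rm_result = AnovaRM(long_df, depvar="Sales", subject="Day", within=["Store"]).fit()
print(rm_result)

# Post-hoc: paired t-tests for each pair of stores, Bonferroni-adjusted
pairs = list(combinations(sales, 2))
raw_p = [stats.ttest_rel(sales[a], sales[b]).pvalue for a, b in pairs]
reject, adjusted_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")

for (a, b), p_adj, rej in zip(pairs, adjusted_p, reject):
    print(f"Store {a} vs Store {b}: adjusted p = {p_adj:.4f}, reject H0 = {rej}")
```

Paired tests are used because the three stores are observed on the same days; an independent-samples test would ignore that pairing.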
