Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.

Q2. What are the three types of ANOVA, and in what situations would each be used?

Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual
sum of squares (SSR) in a one-way ANOVA using Python?

Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?

Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?

Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.

Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results.

Q10. A company wants to know if there are any significant differences in the average time it takes to
complete a task using three different software programs: Program A, Program B, and Program C. They
randomly assign 30 employees to one of the programs and record the time it takes each employee to
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
interaction effects between the software programs and employee experience level (novice vs.
experienced). Report the F-statistics and p-values, and interpret the results.

Q11. An educational researcher is interested in whether a new teaching method improves student test
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
two-sample t-test using Python to determine if there are any significant differences in test scores
between the two groups. If the results are significant, follow up with a post-hoc test to determine which
group(s) differ significantly from each other.

Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any
significant differences in sales between the three stores. If the results are significant, follow up with a post-
hoc test to determine which store(s) differ significantly from each other.

# Q1. Assumptions Required to Use ANOVA and Violations
ANOVA (Analysis of Variance) has several key assumptions that must be met for the results to be valid:

Independence: The observations within and across groups must be independent of each other. This means the value of one observation should not influence or be influenced by another observation.
Violation: If the data are collected from repeated measurements on the same subjects or from clusters (e.g., students within classrooms), the results may be invalid.

Normality: The data in each group (equivalently, the model residuals) should be approximately normally distributed. This can be checked using normality tests such as Shapiro-Wilk.
Violation: If the data are heavily skewed or contain outliers, the F-test used in ANOVA can be distorted, leading to incorrect conclusions.

Homogeneity of Variances: The variances of the groups being compared should be approximately equal. This is also known as homoscedasticity. Levene’s test or Bartlett’s test can be used to check this assumption.
Violation: If the variances are unequal (heteroscedasticity), the F-test becomes less reliable, especially when the group sizes differ.
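
The normality and equal-variance checks mentioned above can be sketched with SciPy. This is a minimal illustration, assuming three simulated groups (the names `group_a`, `group_b`, `group_c` and their values are hypothetical, not data from the assignment):

```python
import numpy as np
from scipy import stats

# Hypothetical simulated groups; replace with real observations
rng = np.random.default_rng(0)
group_a = rng.normal(50, 5, 30)
group_b = rng.normal(52, 5, 30)
group_c = rng.normal(55, 5, 30)
groups = [group_a, group_b, group_c]

# Shapiro-Wilk test per group: a small p-value suggests non-normality
shapiro_pvalues = [stats.shapiro(g)[1] for g in groups]

# Levene's test across groups: a small p-value suggests unequal variances
levene_stat, levene_p = stats.levene(*groups)

print("Shapiro-Wilk p-values:", shapiro_pvalues)
print("Levene p-value:", levene_p)
```

A p-value above the chosen significance level only means the test found no evidence of a violation; it does not prove the assumption holds.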


# Q2. Types of ANOVA
One-Way ANOVA: Used when you have one categorical independent variable with two or more levels (typically three or more, since two groups reduce to a t-test), and you want to test if there is a difference in the means of the groups.
Example: Comparing the test scores of students from three different teaching methods.

Two-Way ANOVA: Used when you have two categorical independent variables and want to test for the main effects of each factor, as well as the interaction effect between them.
Example: Comparing the effects of different teaching methods and student gender on test scores.

Repeated Measures ANOVA: Used when the same subjects are measured multiple times under different conditions, or when there are repeated observations across time or conditions.
Example: Testing the effectiveness of different diets on weight loss over time with the same participants.


# Q3. Partitioning of Variance in ANOVA
In ANOVA, the total variability in the data (the total sum of squares) is partitioned into two components:

Between-group variance: Measures the variability between the means of different groups.
Within-group variance (residual variance): Measures the variability within each group.

This partitioning is important because:

If the between-group variance is significantly larger than the within-group variance, it suggests that the group means are different from each other.
The F-statistic is the ratio of the between-group mean square to the within-group mean square (each sum of squares divided by its degrees of freedom), and a large value indicates that at least one group mean is different.
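
The partition can be checked numerically. The sketch below uses illustrative values for three groups, confirms that the total sum of squares equals the between-group plus within-group sums of squares, and rebuilds the F-statistic from the mean squares. The variable names are chosen for this example only.

```python
import numpy as np
from scipy import stats

# Illustrative data: three groups of five observations
groups = [
    np.array([23, 21, 25, 20, 22]),
    np.array([30, 32, 29, 28, 31]),
    np.array([35, 37, 33, 36, 38]),
]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Sums of squares: total = between + within
ss_total = ((all_values - grand_mean) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Degrees of freedom and mean squares
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)

# Compare with SciPy's one-way ANOVA
f_scipy, p_scipy = stats.f_oneway(*groups)
print(f_manual, f_scipy)
```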

In [None]:
# Q4. Calculating SST, SSE, and SSR in One-Way ANOVA Using Python

import numpy as np
import scipy.stats as stats

# Example data: 3 groups
group1 = [23, 21, 25, 20, 22]
group2 = [30, 32, 29, 28, 31]
group3 = [35, 37, 33, 36, 38]

# Collect the groups in a list so they can be unpacked into f_oneway
data = [group1, group2, group3]

# Perform one-way ANOVA
F_statistic, p_value = stats.f_oneway(*data)

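# Calculate SST, SSE, SSR manually.
# Naming follows the question: SSE = explained (between-group) sum of squares,
# SSR = residual (within-group) sum of squares.
overall_mean = np.mean(np.concatenate(data))
sst = np.sum((np.concatenate(data) - overall_mean) ** 2)

# Explained sum of squares (SSE): group size times squared deviation of each group mean
means = [np.mean(group) for group in data]
sse = np.sum([len(group) * (mean - overall_mean) ** 2 for group, mean in zip(data, means)])

# Residual sum of squares (SSR): what remains after removing the explained part
ssr = sst - sse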

sst, sse, ssr


In [None]:
# Q5. Calculating Main Effects and Interaction Effects in Two-Way ANOVA Using Python

import statsmodels.api as sm
from statsmodels.formula.api import ols
import pandas as pd

# Example data
data = {
    'Program': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C'],
    'Experience': ['Novice', 'Experienced', 'Novice', 'Experienced', 'Novice', 'Experienced', 'Novice', 'Experienced', 'Novice', 'Experienced', 'Novice', 'Experienced'],
    'Time': [20, 18, 22, 19, 25, 23, 21, 20, 23, 22, 24, 23]
}

df = pd.DataFrame(data)

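# Fit the model: 'Program * Experience' expands to both main effects plus their interaction
model = ols('Time ~ Program * Experience', data=df).fit()
# Type II ANOVA table: one row per main effect, one for the interaction, one for the residual
anova_table = sm.stats.anova_lm(model, typ=2)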

anova_table
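
The ANOVA table reports F-tests, but the effects themselves can also be read off the group means. The sketch below is one way to do that for a balanced design, using the same example values: a main effect is a marginal mean minus the grand mean, and an interaction effect is what remains of a cell mean after removing both main effects. The variable names are chosen for this illustration.

```python
import pandas as pd

# Same illustrative data as the two-way ANOVA example (balanced: 2 values per cell)
df = pd.DataFrame({
    "Program": ["A", "A", "B", "B", "C", "C"] * 2,
    "Experience": ["Novice", "Experienced"] * 6,
    "Time": [20, 18, 22, 19, 25, 23, 21, 20, 23, 22, 24, 23],
})

grand_mean = df["Time"].mean()
program_means = df.groupby("Program")["Time"].mean()
experience_means = df.groupby("Experience")["Time"].mean()
cell_means = df.groupby(["Program", "Experience"])["Time"].mean().unstack()

# Main effects: deviation of each marginal mean from the grand mean
program_effects = program_means - grand_mean
experience_effects = experience_means - grand_mean

# Interaction effects: cell mean - program mean - experience mean + grand mean
interaction_effects = (
    cell_means.sub(program_means, axis=0).sub(experience_means, axis=1) + grand_mean
)

print(program_effects)
print(experience_effects)
print(interaction_effects)
```

Interaction effects close to zero in every cell suggest that the effect of the program does not depend on experience level.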


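# Q6. One-Way ANOVA Results Interpretation (F-statistic = 5.23, p-value = 0.02)
Since the p-value (0.02) is less than the conventional significance level of 0.05, you can reject the null hypothesis that all group means are equal. This means at least one group mean differs significantly from the others; the test does not say which groups differ, so a post-hoc test is needed for that. The F-statistic of 5.23 is the ratio of between-group to within-group variance (mean squares), so the variability between the group means is about five times the variability within the groups. On its own it does not measure how large the differences between the groups are.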

# Q7. Handling Missing Data in Repeated Measures ANOVA
In repeated measures ANOVA, missing data can be handled using:

Listwise deletion: Removing all subjects with any missing data.
Imputation: Filling in missing values based on the mean, median, or other estimation methods.

Listwise deletion reduces the sample size, which lowers statistical power, and it can bias the results if the data are not missing completely at random. Imputation keeps every subject, but simple methods such as mean imputation shrink the variance and can introduce bias if not done carefully.
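
The two approaches can be compared with pandas. This is a minimal sketch on a small hypothetical wide-format table (one row per subject, one column per time point); the column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical repeated measures: 5 subjects, 3 time points, 2 missing values
wide = pd.DataFrame({
    "subject": [1, 2, 3, 4, 5],
    "t1": [10.0, 12.0, 11.0, 13.0, 9.0],
    "t2": [11.0, np.nan, 12.0, 14.0, 10.0],
    "t3": [12.0, 13.0, np.nan, 15.0, 11.0],
}).set_index("subject")

# Listwise deletion: drop every subject with any missing measurement
complete_cases = wide.dropna()

# Mean imputation: fill each gap with the mean of its time point
mean_imputed = wide.fillna(wide.mean())

print("Subjects kept after listwise deletion:", len(complete_cases))
print("Variance of t2, observed values only:", wide["t2"].var())
print("Variance of t2 after mean imputation:", mean_imputed["t2"].var())
```

Listwise deletion keeps only the subjects with complete data, while mean imputation keeps all subjects but lowers the variance of the imputed column.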

# Q8. Post-Hoc Tests After ANOVA
Common post-hoc tests include:

Tukey’s HSD (Honestly Significant Difference): Used when comparing all pairwise group differences; it controls the family-wise error rate across those comparisons.
Bonferroni correction: Adjusts p-values to control the Type I error rate when conducting multiple tests. It is simple and general, but it becomes conservative as the number of comparisons grows.

Example of when to use: After performing a one-way ANOVA and finding a significant difference, use a post-hoc test to determine which specific groups differ.
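
As a sketch of that follow-up step, Tukey’s HSD can be run with `pairwise_tukeyhsd` from statsmodels. The scores and the group labels `g1`, `g2`, `g3` below are illustrative values, not data from the assignment.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Illustrative data: three groups of five observations each
scores = np.array([23, 21, 25, 20, 22,
                   30, 32, 29, 28, 31,
                   35, 37, 33, 36, 38])
labels = np.repeat(["g1", "g2", "g3"], 5)

# Compare every pair of groups while controlling the family-wise error rate
tukey = pairwise_tukeyhsd(endog=scores, groups=labels, alpha=0.05)

# The summary lists each pair, the mean difference, and whether H0 is rejected
print(tukey.summary())
```

Each row of the summary corresponds to one pair of groups; `reject=True` marks pairs whose means differ at the chosen alpha.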

In [None]:
# Q9. One-Way ANOVA for Weight Loss with Three Diets

import numpy as np
import scipy.stats as stats

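# Illustrative data for weight loss from three diets
# (5 values per diet for brevity; replace with the data from all 50 participants)
diet_A = [2, 3, 1, 4, 2]
diet_B = [3, 4, 5, 4, 3]
diet_C = [5, 6, 7, 5, 6]

# Perform one-way ANOVA
F_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

# Interpretation: for this sample data F is about 16.2 and p < 0.001, so at the
# 0.05 level we reject the null hypothesis that all three diets have the same
# mean weight loss. A post-hoc test is needed to see which diets differ.
F_statistic, p_value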


In [None]:
# Q10. Two-Way ANOVA for Software Programs and Experience Level

import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Example data
data = {
    'Software': ['A', 'A', 'B', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'C', 'C'],
    'Experience': ['Novice', 'Experienced', 'Novice', 'Experienced', 'Novice', 'Experienced', 'Novice', 'Experienced', 'Novice', 'Experienced', 'Novice', 'Experienced'],
    'Time': [20, 18, 22, 19, 25, 23, 21, 20, 23, 22, 24, 23]
}

df = pd.DataFrame(data)

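# Fit the model: 'Software * Experience' expands to both main effects plus their interaction
model = ols('Time ~ Software * Experience', data=df).fit()
# Type II ANOVA table: the F and PR(>F) columns give the F-statistic and p-value for each effect
anova_table = sm.stats.anova_lm(model, typ=2)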

anova_table



In [None]:
# Q11. Two-Sample T-Test for Test Scores

import scipy.stats as stats

# Example data for test scores
control_group = [70, 75, 80, 68, 77]
experimental_group = [85, 90, 88, 84, 89]

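# Two-sample t-test (assumes equal variances; pass equal_var=False for Welch's t-test)
t_stat, p_value = stats.ttest_ind(control_group, experimental_group)

# With only two groups, a significant t-test already identifies which groups differ,
# so no post-hoc test is needed; the sign of t_stat shows which mean is larger.
t_stat, p_value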


In [None]:
# Q12. Repeated Measures ANOVA for Sales

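import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Example sales data: each of the 30 days is a "subject" measured at all 3 stores
rng = np.random.default_rng(0)  # fixed seed so the placeholder data is reproducible
data = {
    'Store': ['A', 'B', 'C'] * 30,
    'Sales': rng.normal(100, 10, 90),  # Replace with real sales data
    'Day': np.repeat(np.arange(1, 31), 3)
}

df = pd.DataFrame(data)

# Fit the repeated measures ANOVA with Store as the within-subject factor.
# (An OLS model with the full Store x Day interaction would be saturated:
# 90 parameters for 90 observations leaves no residual degrees of freedom.)
results = AnovaRM(data=df, depvar='Sales', subject='Day', within=['Store']).fit()
anova_table = results.anova_table

anova_table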
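
If the repeated measures ANOVA is significant, one common follow-up is a set of paired t-tests between stores with a Bonferroni correction. The sketch below assumes simulated placeholder sales in wide format (one row per day, one column per store); the store means used to generate the data are hypothetical.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Placeholder sales for 30 days at 3 stores; replace with real sales data
rng = np.random.default_rng(42)
wide = pd.DataFrame({
    "A": rng.normal(100, 10, 30),
    "B": rng.normal(105, 10, 30),
    "C": rng.normal(110, 10, 30),
}, index=np.arange(1, 31))

# Paired t-test for every pair of stores (same days, so observations are paired)
pairs = list(combinations(wide.columns, 2))
raw_p = [stats.ttest_rel(wide[a], wide[b]).pvalue for a, b in pairs]

# Bonferroni correction across the three comparisons
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")

for (a, b), p, r in zip(pairs, adj_p, reject):
    print(f"Store {a} vs Store {b}: adjusted p = {p:.4f}, reject H0 = {r}")
```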
