In [8]:
### Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact the validity of the results.

#ANOVA (Analysis of Variance) assumes:

#1. Independence: Observations are randomly sampled and independent of each other.

#Violation: Non-independent observations (e.g., repeated measures on the same participants, or clustered data such as students within classrooms) analyzed as if they were independent.

#2. Normality: The outcome (equivalently, the residuals) is approximately normally distributed within each group.

#Violation: Strongly skewed or heavy-tailed data, especially with small samples.

#3. Homogeneity of variance (Homoscedasticity): Variances are equal across groups.

#Violation: Unequal variances (e.g., one group has a much larger variance than the others), which matters most when group sizes are also unequal.

#4. No significant outliers: No data points are extreme relative to the rest of their group.

#Violation: Influential outliers, which can distort group means and inflate the within-group variance.
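
#A minimal sketch of how the normality and equal-variance assumptions could be checked with scipy.stats. The three groups are made-up illustrative values, and the 0.05 cutoff is only a common convention, not a rule from this notebook. With very small samples these tests have little power, so plots of the data are worth checking as well.

```python
import numpy as np
from scipy.stats import shapiro, levene

# Hypothetical sample data (illustrative values only)
groups = {
    "group1": np.array([23, 21, 19, 24, 22]),
    "group2": np.array([18, 20, 19, 17, 21]),
    "group3": np.array([25, 24, 23, 26, 27]),
}

# Normality: Shapiro-Wilk test run separately within each group
shapiro_p = {}
for name, values in groups.items():
    stat, p = shapiro(values)
    shapiro_p[name] = p
    print(f"{name}: Shapiro-Wilk W = {stat:.3f}, p = {p:.3f}")

# Homogeneity of variance: Levene's test across all groups
# (scipy's default centers on the median, the Brown-Forsythe variant)
levene_stat, levene_p = levene(*groups.values())
print(f"Levene: W = {levene_stat:.3f}, p = {levene_p:.3f}")

# A small p-value (e.g., below 0.05) suggests the assumption may be violated
```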



In [9]:
### Q2. What are the three types of ANOVA, and in what situations would each be used?

#1. One-way ANOVA:

#Used to compare means across independent groups defined by a single factor (typically three or more groups; with two groups it is equivalent to an independent-samples t-test).

#Example: Comparing the average scores of students from different schools.

#2. Two-way ANOVA:

#Used to examine the main effect of each of two independent variables (factors) on a continuous outcome variable, as well as the interaction between them.

#Example: Investigating the effect of gender (male/female) and age group (young/old) on cognitive performance.

#3. Repeated-measures ANOVA:

#Used to compare means across three or more related groups (e.g., same participants measured at different time points).

#Example: Measuring the same participants' cognitive performance before, during, and after a training program.

In [10]:
### Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

#1. Between-group sum of squares (SSB): Variation due to differences between the group means and the grand mean.
#2. Within-group sum of squares (SSW): Variation of individual observations around their own group mean.
#3. Total sum of squares (SST): Variation of all observations around the grand mean; SST = SSB + SSW.

#Understanding variance partitioning is crucial because it:

#1. Identifies sources of variation: Helps you understand whether group differences or individual differences contribute more to the total variation.
#2. Yields the F-statistic: Each sum of squares is divided by its degrees of freedom to give a mean square, and the ratio F = MSB / MSW is used to test statistical significance.
#3. Assesses effect size: Partitioning variance gives effect size estimates (e.g., eta-squared = SSB / SST), which indicate the practical significance of the findings.
#4. Informs research decisions: Understanding variance partitioning can guide decisions about future studies, such as identifying factors contributing to variation and optimizing experimental design.

In [11]:
### Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual sum of squares (SSR) in a one-way ANOVA using Python?

import numpy as np

# Sample data
group1 = np.array([23, 21, 19, 24, 22])
group2 = np.array([18, 20, 19, 17, 21])
group3 = np.array([25, 24, 23, 26, 27])

# Combine data into a single array
data = np.concatenate((group1, group2, group3))

# Calculate Total Sum of Squares (SST)
SST = np.sum((data - np.mean(data))**2)

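# Calculate Explained (between-group) Sum of Squares (SSE)
# Note: multiplying by len(group1) is only valid because all groups have the same size
group_means = np.array([np.mean(group1), np.mean(group2), np.mean(group3)])
SSE = np.sum((group_means - np.mean(data))**2) * len(group1)

# Calculate Residual (within-group) Sum of Squares (SSR) as the remainder of the partition
SSR = SST - SSE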

print("SST:", SST)
print("SSE:", SSE)
print("SSR:", SSR)


SST: 124.93333333333334
SSE: 90.13333333333334
SSR: 34.8
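
#The calculation above relies on equal group sizes. A minimal sketch of a version that weights each group by its own size, and carries the partition through to the F-statistic, p-value and eta-squared, is shown below. The function name one_way_anova and the unequal-sized groups are hypothetical; scipy.stats.f_oneway is used only as a cross-check.

```python
import numpy as np
from scipy.stats import f, f_oneway

def one_way_anova(*groups):
    """Return SS between, SS within, F, p-value and eta-squared."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    data = np.concatenate(groups)
    grand_mean = data.mean()

    # Between-group SS: each group's squared deviation weighted by its size
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group SS: deviations of observations from their own group mean
    ss_within = sum(np.sum((g - g.mean()) ** 2) for g in groups)

    df_between = len(groups) - 1
    df_within = len(data) - len(groups)

    # F is the ratio of mean squares; the p-value is the upper tail of F
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    p_value = f.sf(f_stat, df_between, df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return ss_between, ss_within, f_stat, p_value, eta_squared

# Hypothetical groups with unequal sizes (illustrative values only)
g1 = [23, 21, 19, 24, 22]
g2 = [18, 20, 19, 17]
g3 = [25, 24, 23, 26, 27, 28]

ss_between, ss_within, f_stat, p_value, eta_squared = one_way_anova(g1, g2, g3)
f_ref, p_ref = f_oneway(g1, g2, g3)  # cross-check against scipy

print("F (manual): {:.4f}, F (scipy): {:.4f}".format(f_stat, f_ref))
print("p (manual): {:.6f}, p (scipy): {:.6f}".format(p_value, p_ref))
print("eta-squared: {:.3f}".format(eta_squared))
```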


In [16]:
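### Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?
import numpy as np
from scipy.stats import f


#1. Define your data:


# Sample data: one array per cell of a balanced 2x2 design (factor A x factor B)
A1B1 = np.array([23, 21, 19, 24, 22])
A1B2 = np.array([18, 20, 19, 17, 21])
A2B1 = np.array([25, 24, 23, 26, 27])
A2B2 = np.array([22, 23, 24, 25, 26])

# Stack into shape (levels of A, levels of B, observations per cell)
cells = np.array([[A1B1, A1B2], [A2B1, A2B2]], dtype=float)
a, b, n = cells.shape
grand_mean = cells.mean()


#2. Partition the sums of squares (these formulas assume equal cell sizes):


SS_A = b * n * np.sum((cells.mean(axis=(1, 2)) - grand_mean) ** 2)  # factor A
SS_B = a * n * np.sum((cells.mean(axis=(0, 2)) - grand_mean) ** 2)  # factor B
SS_cells = n * np.sum((cells.mean(axis=2) - grand_mean) ** 2)       # all cell means
SS_AB = SS_cells - SS_A - SS_B  # interaction: cell variation not explained by A or B
SS_within = np.sum((cells - cells.mean(axis=2, keepdims=True)) ** 2)


#3. Calculate the main effects and the interaction effect:


df_A, df_B, df_AB = a - 1, b - 1, (a - 1) * (b - 1)
df_within = a * b * (n - 1)
MS_within = SS_within / df_within  # common error term for all three tests

F_A = (SS_A / df_A) / MS_within
F_B = (SS_B / df_B) / MS_within
F_AB = (SS_AB / df_AB) / MS_within

p_A = f.sf(F_A, df_A, df_within)
p_B = f.sf(F_B, df_B, df_within)
p_AB = f.sf(F_AB, df_AB, df_within)


#Note: f_oneway from scipy.stats is not used here. Running it on the four cells gives a single omnibus test that cannot separate the main effects from the interaction, and running it on the pooled levels of one factor uses the wrong error term. The F-tests are therefore built from the two-way partition, with p-values from the F distribution (scipy.stats.f).

#4. Print the results:


print("Main effect of factor A: F-statistic = {:.2f}, p-value = {:.3f}".format(F_A, p_A))
print("Main effect of factor B: F-statistic = {:.2f}, p-value = {:.3f}".format(F_B, p_B))
print("Interaction effect: F-statistic = {:.2f}, p-value = {:.3f}".format(F_AB, p_AB))



Main effect of factor A: F-statistic = 30.02, p-value = 0.000
Main effect of factor B: F-statistic = 6.45, p-value = 0.022
Interaction effect: F-statistic = 1.45, p-value = 0.247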


In [17]:
### Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02. What can you conclude about the differences between the groups, and how would you interpret these results?

#1. There is a statistically significant difference between the groups, as the p-value (0.02) is less than the typical significance level (0.05).
#2. The null hypothesis is rejected, which means that the assumption of equal means across all groups is not supported by the data.
#3. At least one group mean is significantly different from the others, but the F-statistic alone cannot specify which group(s) differ.




In [18]:
### Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential consequences of using different methods to handle missing data?

#In repeated measures ANOVA, missing data can be handled in several ways, each with potential consequences:

#1. Listwise deletion: Remove participants with any missing data.

#Consequence: Reduced sample size and statistical power, potential bias if missingness is systematic.

#2. Pairwise deletion: Remove only the specific data points that are missing.

#Consequence: Unequal sample sizes across time points, potential bias if missingness is systematic.

#3. Mean imputation: Replace missing values with the mean of the respective time point.

#Consequence: Underestimation of variance, potential bias if missingness is systematic.

#4. Regression imputation: Use a regression model to predict missing values.

#Consequence: Can be effective if missingness is random, but understates variability because imputed values fall exactly on the regression line, and may introduce bias if missingness is systematic.

#5. Multiple imputation: Create multiple datasets with different imputed values, analyze each dataset, and combine results.

#Consequence: Reflects the uncertainty in the imputed values, but can be computationally intensive and requires careful implementation.

#6. Maximum likelihood estimation: Use statistical software to estimate model parameters from all available data, accounting for missing values.

#Consequence: Gives unbiased estimates and accurate standard errors when data are missing at random, but usually means fitting a model such as a linear mixed model rather than a classical repeated measures ANOVA.
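
#A minimal sketch of the first and third methods above using NumPy only. The scores array is made-up illustrative data (rows are participants, columns are time points, np.nan marks a missing value). It shows how listwise deletion shrinks the sample and how mean imputation shrinks the variance at the affected time points.

```python
import numpy as np

# Hypothetical repeated-measures data: 5 participants x 3 time points
scores = np.array([
    [10.0, 12.0, 14.0],
    [9.0, np.nan, 13.0],
    [11.0, 13.0, 15.0],
    [8.0, 10.0, np.nan],
    [12.0, 15.0, 17.0],
])

# Listwise deletion: keep only participants with no missing values
complete_cases = scores[~np.isnan(scores).any(axis=1)]
print("Participants before:", scores.shape[0], "after:", complete_cases.shape[0])

# Mean imputation: replace each missing value with its time point's mean
time_point_means = np.nanmean(scores, axis=0)
imputed = np.where(np.isnan(scores), time_point_means, scores)

# Compare sample variances per time point (observed values vs. imputed data)
var_observed = np.nanvar(scores, axis=0, ddof=1)
var_imputed = imputed.var(axis=0, ddof=1)
print("Variance of observed values:", var_observed)
print("Variance after mean imputation:", var_imputed)
```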

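#When handling missing data in repeated measures ANOVA, it's essential to consider why the data are missing, since each method above can give biased results when missingness is systematic.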



In [19]:
### Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide an example of a situation where a post-hoc test might be necessary.
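#1. Tukey's HSD (Honestly Significant Difference): Compares all possible pairs of means while controlling the family-wise Type I error rate. Use when all pairwise comparisons are of interest.

#2. Scheffé's test: Covers all possible contrasts, both pairwise and complex (e.g., the average of two groups versus a third). Use when you want to explore contrasts that were not planned in advance; it is conservative for simple pairwise comparisons.

#3. Bonferroni correction: Divides the alpha level by the number of comparisons. Use when you have a small number of planned comparisons and want to control the family-wise error rate; it becomes conservative as the number of comparisons grows.

#4. Dunnett's test: Compares each group to a single control group. Use when you have a specific control group and want to compare the others to it.

#5. Newman-Keuls test: Compares ordered means in a stepwise manner. Use when you want to identify which groups differ without specifying hypotheses, keeping in mind that it is more powerful than Tukey's HSD but controls the family-wise error rate less strictly.

#Example: If a one-way ANOVA comparing the average scores of students from three schools is significant, a post-hoc test such as Tukey's HSD is needed to identify which schools differ from each other.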
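
#A minimal sketch of a post-hoc comparison with Tukey's HSD, reusing the sample groups from Q4. It assumes a SciPy version that provides scipy.stats.tukey_hsd (added in SciPy 1.8); the 0.05 threshold is a convention, not part of the test.

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

# Sample groups from Q4
group1 = np.array([23, 21, 19, 24, 22])
group2 = np.array([18, 20, 19, 17, 21])
group3 = np.array([25, 24, 23, 26, 27])

# Omnibus test first: is there any difference among the group means?
F, p = f_oneway(group1, group2, group3)
print("One-way ANOVA: F = {:.2f}, p = {:.4f}".format(F, p))

# Post-hoc: all pairwise comparisons with family-wise error control
result = tukey_hsd(group1, group2, group3)

# result.pvalue[i, j] is the adjusted p-value for group i vs group j
names = ["group1", "group2", "group3"]
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print("{} vs {}: mean difference = {:.2f}, p = {:.4f}".format(
            names[i], names[j], result.statistic[i, j], result.pvalue[i, j]))
```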


