In [2]:
# Q1. Assumptions required to use ANOVA:

# Independence: The observations within each group are independent of each other.
# Normality: The populations from which the samples are drawn should follow a normal distribution.
# Homogeneity of variances: The variability of the dependent variable should be similar across all groups.

# Violations that could impact the validity of ANOVA results:

# Violation of independence: If observations within groups are not independent, such as in a repeated measures design where the same participants are measured multiple times, the assumption is violated.
# Violation of normality: If the populations do not follow a normal distribution, it can affect the validity of ANOVA results. However, ANOVA is known to be robust to moderate departures from normality if the group sizes are large enough.
# Violation of homogeneity of variances: If the variability of the dependent variable differs significantly across groups, it can impact the validity of ANOVA results. This violation is known as heteroscedasticity.
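
# The normality and equal-variance assumptions above can be screened before running ANOVA. The following is a minimal sketch, assuming three simulated groups (the data, group means, and seed are hypothetical). It uses the Shapiro-Wilk test for normality and Levene's test for homogeneity of variances, both from scipy.stats.

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups simulated from normal populations with equal spread
rng = np.random.default_rng(0)
groups = [rng.normal(loc=mu, scale=2.0, size=30) for mu in (10.0, 11.0, 12.0)]

# Normality: Shapiro-Wilk test per group (null hypothesis: the sample is normal)
shapiro_p_values = [stats.shapiro(g)[1] for g in groups]

# Homogeneity of variances: Levene's test (null hypothesis: equal variances)
levene_stat, levene_p = stats.levene(*groups)

print("Shapiro-Wilk p-values:", shapiro_p_values)
print("Levene p-value:", levene_p)
```

# A small p-value in either test suggests the corresponding assumption may not hold. Independence cannot be tested this way; it follows from the study design.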


# Q2. The three types of ANOVA are:

# One-way ANOVA: Used when comparing the means of two or more (typically three or more) independent groups, defined by a single factor, on a single dependent variable.
# Two-way ANOVA: Used when comparing the means of a single dependent variable across the levels of two independent variables (factors). It examines the main effect of each factor and their interaction effect.
# Repeated measures ANOVA: Used when comparing the means of a single dependent variable measured on the same participants under different conditions. It accounts for within-subject correlation and analyzes the effect of the within-subject factor(s).


# Q3. Partitioning of variance in ANOVA:

# The partitioning of variance refers to the decomposition of the total variation in the data into different components. In ANOVA, the total variation is partitioned into two main components:

# Between-group variation (explained variation): It represents the variation in the dependent variable that can be explained by the group differences or factors.
# Within-group variation (residual variation): It represents the unexplained or random variation within each group, which is not attributable to the group differences.

# Understanding this concept is important because it allows us to quantify the proportion of the total variation in the data that can be explained by the factors being studied. By comparing the explained variation to the residual variation, ANOVA determines whether the group differences are statistically significant.
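
# The decomposition described above can be checked numerically. This is a small sketch with made-up measurements for three groups (the values are hypothetical). It computes the total, between-group, and within-group sums of squares directly with NumPy and confirms that the two components add up to the total.

```python
import numpy as np

# Hypothetical measurements for three groups
groups = [
    np.array([4.0, 5.0, 6.0]),
    np.array([6.0, 7.0, 8.0]),
    np.array([9.0, 10.0, 11.0]),
]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Total variation: squared deviations of every observation from the grand mean
ss_total = ((all_values - grand_mean) ** 2).sum()

# Between-group variation: squared deviations of each group mean from the
# grand mean, weighted by the group size
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group variation: squared deviations of observations from their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Proportion of total variation explained by group membership (eta squared)
eta_squared = ss_between / ss_total

print("SS total:", ss_total)
print("SS between + SS within:", ss_between + ss_within)
print("Eta squared:", eta_squared)
```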


In [4]:
# Q4. Calculating the sums of squares in a one-way ANOVA:
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Assume you have your data stored in a Pandas DataFrame called 'df'
model = ols('dependent_variable ~ group_variable', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

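# Extract the sum of squares values
# (here SSR is the explained/between-group sum of squares and SSE is the residual sum of squares)
SSR = anova_table['sum_sq']['group_variable']  # variation explained by the grouping factor
SSE = anova_table['sum_sq']['Residual']        # unexplained (within-group) variation
SST = SSR + SSE                                # total variation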


In [7]:
# Q5. Main effects and interaction effect in a two-way ANOVA:
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Assume you have your data stored in a Pandas DataFrame called 'df'
model = ols(
    'dependent_variable ~ factor1 + factor2 + factor1:factor2', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

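# Extract the sum of squares attributed to each main effect and to the interaction
# (the 'F' and 'PR(>F)' columns of anova_table hold the corresponding test statistics and p-values)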
main_effect_factor1 = anova_table['sum_sq']['factor1']
main_effect_factor2 = anova_table['sum_sq']['factor2']
interaction_effect = anova_table['sum_sq']['factor1:factor2']


In [8]:
# Q6. Interpretation of one-way ANOVA results:

# Given an F-statistic of 5.23 and a p-value of 0.02 in a one-way ANOVA, we can conclude at the conventional 0.05 significance level that there is a statistically significant difference between the groups being compared. The F-statistic is the ratio of the between-group (explained) mean square to the within-group (unexplained) mean square, so larger values indicate that the group means differ more than within-group variability alone would suggest. The p-value of 0.02 means that, if the null hypothesis of equal group means were true, an F-statistic at least as large as 5.23 would be observed only about 2% of the time. Therefore, we reject the null hypothesis and conclude that at least one group mean differs from the others. The ANOVA itself does not indicate which groups differ.
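
# The link between an F-statistic and its p-value can be sketched with the F distribution in scipy.stats. The question does not state the number of groups or the sample size, so the 3 groups and 30 observations used here are assumptions for illustration only. The resulting p-value therefore need not equal the 0.02 quoted above.

```python
from scipy import stats

f_statistic = 5.23

# Hypothetical design (not given in the question): 3 groups, 30 observations in total
k, n = 3, 30
df_between = k - 1   # numerator degrees of freedom
df_within = n - k    # denominator degrees of freedom

# p-value: probability of an F at least this large if the null hypothesis is true
p_value = stats.f.sf(f_statistic, df_between, df_within)

# Equivalent decision rule: compare F with the critical value at alpha = 0.05
alpha = 0.05
critical_value = stats.f.ppf(1 - alpha, df_between, df_within)
reject_null = f_statistic > critical_value

print("p-value:", p_value)
print("Critical value:", critical_value)
print("Reject the null hypothesis:", reject_null)
```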

# Q7. Handling missing data in repeated measures ANOVA:

# When dealing with missing data in repeated measures ANOVA, there are a few possible approaches:

# Complete-case analysis: Exclude participants with missing data from the analysis. This approach reduces the sample size but avoids imputation. However, it may introduce bias if missingness is related to the outcome or other variables of interest.

# Pairwise deletion: Analyze only the available data for each pair of time points. This approach maximizes the use of available data but may yield biased estimates if missingness is related to the outcome.

# Imputation: Fill in missing values using methods such as mean imputation, regression imputation, or multiple imputation. Imputation allows all participants to be included, but it relies on assumptions about the missing values. Single-value methods such as mean imputation also understate the variability of the data, whereas multiple imputation carries the uncertainty about the missing values into the results.

# The consequences depend on the method and on why the data are missing. Complete-case analysis and pairwise deletion reduce power and may give biased results if missingness is related to the outcome or other variables. Imputation preserves the sample size, but its precision and validity depend on how well the imputation assumptions hold.
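
# Two of the approaches above, complete-case analysis and mean imputation, can be sketched with pandas. The DataFrame below is hypothetical: one row per participant and one column per time point (t1, t2, t3 are made-up names).

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format repeated measures data with two missing values
df_rm = pd.DataFrame({
    "t1": [5.0, 6.0, 7.0, 8.0],
    "t2": [6.0, np.nan, 8.0, 9.0],
    "t3": [7.0, 8.0, np.nan, 10.0],
})

# Complete-case analysis: keep only participants measured at every time point
complete_cases = df_rm.dropna()

# Mean imputation: replace each missing value with the mean of its time point
mean_imputed = df_rm.fillna(df_rm.mean())

print("Participants kept (complete cases):", len(complete_cases))
print("Participants kept (mean imputation):", len(mean_imputed))
print("Variance of t2, observed values only:", df_rm["t2"].var())
print("Variance of t2, after mean imputation:", mean_imputed["t2"].var())
```

# Complete-case analysis drops half of this small sample, while mean imputation keeps every participant but shrinks the variance of the imputed columns.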

# Q8. Common post-hoc tests used after ANOVA:

# Tukey's Honestly Significant Difference (HSD): This test compares all possible pairs of group means and controls the familywise error rate. It is well suited to situations where every pairwise comparison is of interest.

# Bonferroni correction: It adjusts the significance level for each comparison (dividing it by the number of comparisons) to maintain the familywise error rate. It is generally more conservative than Tukey's HSD when all pairs are compared and loses power as the number of comparisons grows, so it is best suited to a small number of planned comparisons.

# Scheffe's test: This test is more conservative than Tukey's HSD for pairwise comparisons but allows for complex comparisons, such as contrasts or comparisons between subsets of groups.

# The choice of post-hoc test depends on the specific research question and the desired balance between the risk of Type I errors and the power to detect differences. Post-hoc tests are necessary when the overall ANOVA results are significant and there are three or more groups, to identify which specific groups differ significantly from each other.
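
# As an illustration of one of these procedures, the following sketch applies the Bonferroni correction to pairwise independent-samples t-tests using scipy.stats.ttest_ind. The three groups and their values are hypothetical. Note that this is not Tukey's HSD: it simply multiplies each raw p-value by the number of comparisons and caps the result at 1.

```python
from itertools import combinations
from scipy import stats

# Hypothetical measurements for three groups
groups = {
    "A": [2.1, 2.5, 1.9, 2.3, 2.2],
    "B": [3.0, 3.4, 2.9, 3.1, 3.3],
    "C": [2.2, 2.4, 2.0, 2.6, 2.1],
}

# All pairwise comparisons: (A, B), (A, C), (B, C)
pairs = list(combinations(groups, 2))
n_comparisons = len(pairs)

adjusted_p_values = {}
for a, b in pairs:
    # Raw p-value from an independent two-sample t-test
    _, raw_p = stats.ttest_ind(groups[a], groups[b])
    # Bonferroni adjustment: scale by the number of comparisons, capped at 1
    adjusted_p_values[(a, b)] = min(raw_p * n_comparisons, 1.0)

for pair, p in adjusted_p_values.items():
    print(pair, "Bonferroni-adjusted p-value:", p)
```

# A pair is declared significantly different when its adjusted p-value falls below the chosen significance level (for example 0.05).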


In [10]:
# Q9. One-way ANOVA on weight loss across three diets:
import scipy.stats as stats

# Assume the weight loss data is stored in three separate lists: diet_A, diet_B, diet_C

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

# Print the results
print("F-statistic:", f_statistic)
print("p-value:", p_value)
