
Q1. Assumptions for ANOVA and Examples of Violations:
To use ANOVA (Analysis of Variance) effectively, certain assumptions need to be met. These assumptions include:

Independence: Observations are assumed to be independent of each other, both within and between groups.
Example of Violation: In a study measuring the effect of a new drug on blood pressure, if multiple measurements are taken from the same individual and treated as independent observations, the assumption of independence is violated.

Normality: The dependent variable should be approximately normally distributed within each group (equivalently, the residuals should be approximately normal).
Example of Violation: In an ANOVA comparing test scores between different schools, if the scores within each school are heavily skewed (for example, most students scoring near the maximum), the assumption of normality is violated.

Homogeneity of Variance (Homoscedasticity): The variance of the dependent variable is assumed to be equal across all groups.
Example of Violation: In an ANOVA comparing the weight loss of different diet programs, if the variability in weight loss is much larger in one group compared to others, the assumption of homogeneity of variance is violated.

Independence of Errors: The errors or residuals (deviations of individual observations from the group means) are assumed to be independent of each other.
Example of Violation: In a study measuring the effect of exercise on heart rate, if measurements taken over time are correlated (e.g., the heart rate at one time point is likely to be related to the heart rate at the next time point), the assumption of independence of errors is violated.

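Violations of these assumptions can undermine the validity of ANOVA results, leading to incorrect conclusions or less reliable findings. It is important to check the assumptions and, if they are violated, consider alternatives such as transforming the data, using Welch's ANOVA when variances differ, using a nonparametric test such as Kruskal-Wallis, or using a repeated-measures or mixed model when observations are not independent.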
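
As a rough sketch, the normality and equal-variance assumptions can be checked with scipy: the Shapiro-Wilk test within each group and Levene's test across groups. The data below are synthetic and hypothetical (group C is deliberately given a larger spread). Independence cannot be tested this way; it has to be judged from the study design.

```python
import numpy as np
from scipy import stats

# Synthetic, hypothetical data: three groups of 30 observations each
rng = np.random.default_rng(0)
groups = {
    "A": rng.normal(loc=10, scale=2, size=30),
    "B": rng.normal(loc=11, scale=2, size=30),
    "C": rng.normal(loc=12, scale=6, size=30),  # deliberately larger spread
}

# Normality: Shapiro-Wilk test within each group (H0: the data are normal)
normality_p = {name: stats.shapiro(values).pvalue for name, values in groups.items()}

# Homogeneity of variance: Levene's test across all groups (H0: equal variances)
levene_p = stats.levene(*groups.values()).pvalue

for name, p in normality_p.items():
    print(f"Shapiro-Wilk p-value for group {name}: {p:.3f}")
print(f"Levene p-value: {levene_p:.4f}")
```

A small Levene p-value (for example, below 0.05) suggests unequal variances. When normality is doubtful, `scipy.stats.kruskal` runs the Kruskal-Wallis test as a nonparametric alternative.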

Q2. Types of ANOVA and Their Usage:
The three main types of ANOVA are:

One-Way ANOVA: Used when comparing the means of a continuous dependent variable across two or more independent groups.
Example: Comparing the average test scores of students across three different schools.

Two-Way ANOVA: Used when studying the effects of two categorical independent variables on a continuous dependent variable.
Example: Investigating the impact of both gender and age group on the average income of individuals.

Repeated Measures ANOVA: Used when the same subjects are measured under every level of the independent variable (a within-subjects factor) and the outcome is a continuous dependent variable.
Example: Assessing the effectiveness of three different treatments administered to the same group of patients at different time points.

Each type of ANOVA is appropriate for specific research designs and hypotheses, allowing for the examination of different sources of variation and interactions.

Q3. Partitioning of Variance in ANOVA and Its Importance:
The partitioning of variance in ANOVA involves breaking down the total variability in the data into different components based on the sources of variation. It is important to understand this concept because it provides valuable insights into the relative contributions of these sources and helps interpret the results.

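In ANOVA, the total sum of squares (SST) represents the total variability in the data. It is partitioned into two components, SST = SSE + SSR: the explained sum of squares (SSE), also called the between-group sum of squares, and the residual sum of squares (SSR), also called the within-group or error sum of squares. The SSE represents the variability explained by the model (the differences between group means), while the SSR represents the unexplained variability (the random variation of observations around their own group mean). Note that many ANOVA texts use the opposite abbreviations, with SSE standing for the error sum of squares.

Dividing SSE by SST gives the proportion of variance accounted for by the group factor (eta-squared, η²). To test statistical significance, each sum of squares is first divided by its degrees of freedom to give a mean square, and the F-statistic is their ratio, F = (SSE / (k - 1)) / (SSR / (N - k)), where k is the number of groups and N is the total number of observations. A large F means the differences between groups are large relative to the variation within groups.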
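
To make the partition concrete, here is a minimal numpy sketch on a small hypothetical dataset, using this section's naming (SSE = explained, SSR = residual). It checks that SST = SSE + SSR and that the F-statistic built from mean squares matches `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Hypothetical example data: three groups of four observations
groups = [
    np.array([4.0, 5.0, 6.0, 5.0]),
    np.array([7.0, 8.0, 6.0, 7.0]),
    np.array([9.0, 10.0, 11.0, 10.0]),
]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

SST = np.sum((all_values - grand_mean) ** 2)                      # total
SSE = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # explained (between groups)
SSR = sum(np.sum((g - g.mean()) ** 2) for g in groups)            # residual (within groups)

k, n = len(groups), len(all_values)
F = (SSE / (k - 1)) / (SSR / (n - k))  # ratio of mean squares
eta_squared = SSE / SST                # proportion of variance explained

print(f"SST = {SST:.2f}, SSE = {SSE:.2f}, SSR = {SSR:.2f}")
print(f"F = {F:.2f} (scipy: {stats.f_oneway(*groups).statistic:.2f}), eta^2 = {eta_squared:.2f}")
```

For these numbers SSE ≈ 50.67 and SSR = 6, so SST ≈ 56.67, F = 38.0 and η² ≈ 0.89.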

In [None]:
#Q4
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Assuming you have the data in a dataframe 'data' with dependent variable 'y' and group variable 'group';
# C() tells statsmodels to treat 'group' as a categorical factor
model = ols('y ~ C(group)', data=data).fit()
anova_table = sm.stats.anova_lm(model)

SSE = anova_table.loc['C(group)', 'sum_sq']   # explained (between-group) sum of squares
SSR = anova_table.loc['Residual', 'sum_sq']   # residual (within-group) sum of squares
SST = SSE + SSR                               # total sum of squares

print("SST:", SST, "SSE:", SSE, "SSR:", SSR)

In [None]:
#Q5
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Assuming you have the data in a dataframe 'data' with dependent variable 'y' and two categorical
# independent variables 'x1' and 'x2'; C() makes statsmodels treat them as factors
model = ols('y ~ C(x1) + C(x2) + C(x1):C(x2)', data=data).fit()
anova_table = sm.stats.anova_lm(model, typ=2)  # Type II sums of squares
print(anova_table)

# Sums of squares for each main effect and for the interaction
main_effect_x1 = anova_table.loc['C(x1)', 'sum_sq']
main_effect_x2 = anova_table.loc['C(x2)', 'sum_sq']
interaction_effect = anova_table.loc['C(x1):C(x2)', 'sum_sq']


Q6. Interpretation of One-Way ANOVA Results:
In a one-way ANOVA, the F-statistic and the associated p-value are used to assess the differences between the groups. In the given scenario with an F-statistic of 5.23 and a p-value of 0.02, you can draw the following conclusions:

The F-statistic of 5.23 is the ratio of between-group to within-group variance; values well above 1 suggest that the group means differ by more than random variation alone would produce. The p-value of 0.02 means that, if all group means were truly equal, an F-statistic at least as large as 5.23 would occur only 2% of the time. Because 0.02 is below the conventional significance level of 0.05, the result is statistically significant and the null hypothesis of equal means is rejected.

Therefore, you can conclude that at least one group mean differs from the others. However, determining which specific groups differ requires further analysis, such as post-hoc tests, to establish the nature and direction of the differences.
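
The p-value is the upper-tail area of the F distribution beyond the observed statistic, so it depends on the degrees of freedom, which the scenario does not give. The sketch below assumes hypothetical values (3 groups and 30 observations, so df = 2 and 27); with other degrees of freedom the p-value changes and will not necessarily equal 0.02.

```python
from scipy import stats

F_observed = 5.23
# Hypothetical degrees of freedom (not given in the scenario): k = 3 groups, N = 30 observations
df_between, df_within = 2, 27

p_value = stats.f.sf(F_observed, df_between, df_within)  # P(F >= 5.23) when H0 is true
F_critical = stats.f.ppf(0.95, df_between, df_within)    # cut-off for alpha = 0.05

print(f"p-value = {p_value:.4f}, critical F = {F_critical:.3f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0")
```

Comparing F with the critical value and comparing the p-value with 0.05 always lead to the same decision.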

Q7. Handling Missing Data in Repeated Measures ANOVA and Consequences of Different Methods:
When dealing with missing data in a repeated measures ANOVA, several approaches can be considered:

Complete Case Analysis (Listwise Deletion): This excludes any participant with a missing value at any time point or condition. It reduces the sample size, but the analysis uses only participants with complete data. Classical repeated measures ANOVA requires a complete set of measurements for every participant, so this is the simplest way to make the data usable.

Pairwise Deletion: Each computation (for example, each pairwise comparison or correlation) uses all participants who have data for the variables involved, so different parts of the analysis can rest on different subsets of participants. It retains more data but can introduce bias if the missingness is not random.

Imputation: Missing values are replaced with estimates using methods such as mean imputation, regression imputation, or multiple imputation. This retains all participants, but single imputation methods such as mean imputation understate the true variability, and any method can introduce bias if the imputation model is misspecified. Multiple imputation accounts for the extra uncertainty by combining results across several imputed datasets.

The choice of method depends on why the data are missing and on the assumptions you are willing to make. Complete case analysis discards information and reduces statistical power, and it can give biased estimates unless the data are missing completely at random. Pairwise deletion uses more data but may also give biased estimates if missingness is related to the outcome. Imputation methods introduce assumptions and uncertainty associated with the imputation model.
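
A small pandas sketch contrasting two of these options on hypothetical long-format repeated-measures data (4 subjects, 3 time points, 2 missing scores). The column names `subject`, `time` and `score` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: 4 subjects measured at 3 time points, two scores missing
df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "time": ["t1", "t2", "t3"] * 4,
    "score": [5.0, 6.0, 7.0, 4.0, np.nan, 6.0, 6.0, 7.0, 8.0, 5.0, 5.5, np.nan],
})

# Complete case (listwise) analysis: drop every subject with at least one missing score
has_missing = df["score"].isna().groupby(df["subject"]).transform("any")
complete_cases = df[~has_missing]

# Simple mean imputation: replace each missing score with the mean of its time point
imputed = df.copy()
imputed["score"] = imputed.groupby("time")["score"].transform(lambda s: s.fillna(s.mean()))

print(complete_cases["subject"].nunique(), "of", df["subject"].nunique(), "subjects kept")
print(imputed)
```

Listwise deletion keeps only 2 of the 4 subjects here, while mean imputation keeps all 4 but fills every gap at a time point with the same value, which understates the spread of the data. Multiple imputation is not shown.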

Q8. Common Post-hoc Tests after ANOVA and Their Usage:
Post-hoc tests are conducted after an ANOVA to identify specific group differences when the overall ANOVA test indicates statistical significance. Some commonly used post-hoc tests include:

Tukey's Honestly Significant Difference (HSD) test: This test compares all possible pairwise differences between group means, providing simultaneous confidence intervals for each comparison. It controls the family-wise error rate.

Bonferroni correction: This method adjusts the significance level for each pairwise comparison by dividing the desired significance level by the number of comparisons. It controls the family-wise error rate but can be conservative.

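Scheffé's test: This test controls the family-wise error rate for all possible contrasts, including complex ones such as comparing the average of two groups with a third. It is more conservative than other post-hoc tests for simple pairwise comparisons, so it is most useful when many or complex contrasts are of interest.

Fisher's Least Significant Difference (LSD) test: This test compares group means using pairwise t-tests based on the pooled within-group variance, which assumes equal variances. It does not adjust for multiple comparisons; its only protection is that it is applied after a significant overall F-test. It is therefore the least conservative of these methods, and with more than three groups the family-wise error rate can exceed the nominal level.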

The choice of post-hoc test depends on factors such as the research question, the number of comparisons, the desired control of Type I error rate, and assumptions about the data. Post-hoc tests help to determine which specific groups differ significantly from each other after finding a significant result in the ANOVA.

Example: Suppose you conducted a one-way ANOVA to compare the performance of three different teaching methods (A, B, and C) on student test scores. After obtaining a significant result, you would conduct post-hoc tests to identify the specific differences between the teaching methods.
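
For the teaching-methods example, Tukey's HSD can be run with `pairwise_tukeyhsd` from `statsmodels.stats.multicomp`. The scores below are randomly generated placeholders, not real data.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical test scores for three teaching methods (10 students each)
rng = np.random.default_rng(42)
scores = np.concatenate([
    rng.normal(70, 5, 10),  # method A
    rng.normal(75, 5, 10),  # method B
    rng.normal(82, 5, 10),  # method C
])
methods = np.repeat(["A", "B", "C"], 10)

# Tukey's HSD: all pairwise comparisons with the family-wise error rate held at alpha
result = pairwise_tukeyhsd(endog=scores, groups=methods, alpha=0.05)
print(result.summary())
print(result.reject)  # one True/False per pair: A-B, A-C, B-C
```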






In [None]:
#Q9
import scipy.stats as stats

# Assuming you have your weight loss data in three lists: diet_a, diet_b, and diet_c
f_statistic, p_value = stats.f_oneway(diet_a, diet_b, diet_c)

print("F statistic:", f_statistic)
print("p-value:", p_value)

The p-value is the probability of obtaining an F-statistic at least as large as the one observed if all three diets actually produced the same mean weight loss. If p-value < 0.05, you would reject the null hypothesis (that the diets have the same mean weight loss) and conclude that at least one diet differs significantly from the others; a post-hoc test is then needed to see which ones.

In [None]:
#Q10
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Assuming your data is in a DataFrame df, with 'time' as the response variable,
# 'software' as the first factor, and 'experience' as the second factor
model = ols('time ~ C(software) + C(experience) + C(software):C(experience)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

print(anova_table)


In [None]:
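#Q11
import scipy.stats as stats

# Assuming the scores for the control and experimental groups are in lists control_scores and experimental_scores
t_statistic, p_value = stats.ttest_ind(control_scores, experimental_scores)

print("t statistic:", t_statistic)
print("p-value:", p_value)

# With only two groups, a significant t-test already identifies which groups differ,
# so no post-hoc test is needed; the sign of t gives the direction of the difference
if p_value < 0.05:
    print("Significant difference between the control and experimental groups")
else:
    print("No significant difference between the groups")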
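
With only two groups, the pooled-variance two-sample t-test and a one-way ANOVA are equivalent: F equals t² and the p-values match, which is why no post-hoc step is needed. A quick check on hypothetical scores:

```python
import numpy as np
from scipy import stats

# Hypothetical scores for two groups
control = np.array([72.0, 68.0, 75.0, 70.0, 69.0, 74.0])
experimental = np.array([78.0, 80.0, 74.0, 79.0, 82.0, 77.0])

t_res = stats.ttest_ind(control, experimental)  # pooled-variance two-sample t-test
f_res = stats.f_oneway(control, experimental)   # one-way ANOVA with two groups

print(f"t^2 = {t_res.statistic ** 2:.4f}, F = {f_res.statistic:.4f}")
print(f"p-values: {t_res.pvalue:.6f} vs {f_res.pvalue:.6f}")
```
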

In [None]:
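#Q12
from itertools import combinations

import scipy.stats as stats
from statsmodels.stats.anova import AnovaRM
from statsmodels.stats.multitest import multipletests

# Assuming your data is in a long-format DataFrame df, with 'sales' as the response variable and 'store' as the factor
# 'subject' should be a variable representing each day (since measurements are repeated for each day)
# AnovaRM needs balanced data: exactly one sales value per store for every day
res = AnovaRM(df, 'sales', 'subject', within=['store']).fit()
print(res)

p_value = res.anova_table['Pr > F'].iloc[0]
if p_value < 0.05:
    # Post-hoc: paired t-tests between stores on the same days, Bonferroni-corrected.
    # Tukey's HSD assumes independent groups, so paired tests suit repeated measures better.
    wide = df.pivot(index='subject', columns='store', values='sales')
    pairs = list(combinations(wide.columns, 2))
    raw_p = [stats.ttest_rel(wide[a], wide[b]).pvalue for a, b in pairs]
    reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method='bonferroni')
    for (a, b), p, r in zip(pairs, adj_p, reject):
        print(f"{a} vs {b}: adjusted p = {p:.4f}, significant = {r}")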
