Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.

The assumptions required to use ANOVA are:

1. Independence: The observations are independent of each other, both within and between groups. In other words, the value of one observation should not affect the value of any other observation.

2. Normality: The dependent variable is approximately normally distributed within each group (equivalently, the model residuals are approximately normal). This means that the data in each group should roughly follow a bell-shaped curve.

3. Homogeneity of variance: The variance of the data within each group is equal. This means that the spread of the data should be consistent across all groups.

If any of these assumptions are violated, the validity of the results can be impacted. 

Here are some examples of violations that could occur:

Violation of independence: This can occur if the observations within each group are not truly independent. For example, if participants in one group influence each other's responses, this could violate the independence assumption.

Violation of normality: This can occur if the data within each group is not normally distributed. For example, if the data is skewed or has outliers, this could violate the normality assumption.

Violation of homogeneity of variance: This can occur if the variance of the data within each group is not equal. For example, if one group has a much larger variance than the others, this could violate the homogeneity of variance assumption.

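When these assumptions are violated, the accuracy and reliability of the ANOVA results can suffer. ANOVA is fairly robust to moderate departures from normality when the groups are reasonably large and of similar size, but strong skew or outliers in small samples can distort the F-test. Unequal variances are most damaging when group sizes are also unequal, because the Type I error rate can then be well above or below the nominal level. Violations of independence are usually the most serious: correlated observations make the within-group variation look smaller than it really is, which inflates the F-statistic. It is therefore important to check the assumptions before using ANOVA, and to consider alternatives such as Welch's ANOVA (for unequal variances) or the Kruskal-Wallis test (for non-normal data) when they are violated.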
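The normality and equal-variance checks described above can be sketched with SciPy. This is a minimal illustration on simulated data; the group names, sample sizes, and distribution parameters are assumptions made for the example, and independence has to be judged from the study design rather than from a test.

```python
import numpy as np
from scipy import stats

# Simulated example data (hypothetical groups, not from a real study)
rng = np.random.default_rng(0)
groups = {
    "A": rng.normal(50, 5, 30),
    "B": rng.normal(52, 5, 30),
    "C": rng.normal(55, 5, 30),
}

# Normality: Shapiro-Wilk test within each group
shapiro_p = {name: stats.shapiro(values).pvalue for name, values in groups.items()}

# Homogeneity of variance: Levene's test across all groups
levene_stat, levene_p = stats.levene(*groups.values())

for name, p in shapiro_p.items():
    print(f"Shapiro-Wilk p-value for group {name}: {p:.3f}")
print(f"Levene p-value: {levene_p:.3f}")
```

A small p-value from either test suggests the corresponding assumption may not hold. With large samples these tests flag even trivial departures, so plots of the residuals are a useful complement.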

Q2. What are the three types of ANOVA, and in what situations would each be used?

The three types of ANOVA are:
1. One-way ANOVA: It is used when there is one independent variable (also called a factor) with three or more levels or groups, and one dependent variable. For example, a one-way ANOVA might be used to compare the mean scores of students from different schools on a math test. The independent variable would be school (with three or more levels, such as School A, School B, and School C), and the dependent variable would be math test score.

2. Repeated measures ANOVA: Repeated measures ANOVA is used when you want to test for differences in means across multiple time points or conditions for the same group of participants. For example, a repeated measures ANOVA might be used to test for differences in blood pressure at different time points before and after an exercise program.

3. Factorial ANOVA: Factorial ANOVA is used when there are two or more independent variables, and one dependent variable. For example, a factorial ANOVA might be used to test for the effect of both gender and age on job satisfaction. In this case, gender would be one independent variable (with two levels: male and female), and age would be the other independent variable (with three levels: 20s, 30s, and 40s). The dependent variable would be job satisfaction.

Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?

The partitioning of variance in ANOVA refers to the process of breaking down the total variance in the data into different sources of variation. In ANOVA, the total variance in the data is partitioned into two types of variance: 

1. Between-group variation: It represents the variation between the means of the different groups being compared. Its sum of squares is the sum of the squared differences between each group mean and the overall mean, weighted by the number of observations in each group.
2. Within-group variation: It represents the variation within each group. Its sum of squares is the sum of the squared differences between each observation and its respective group mean.

Partitioning the variance helps us quantify how much of the variation in the data can be attributed to differences between the groups being compared, and how much is due to individual differences within each group. This information is useful for interpreting the results of the analysis and for identifying sources of variation that may be contributing to the differences between the groups.

It also provides the basis for calculating the F-statistic, which is used to test the null hypothesis that all group means are equal. Each sum of squares is divided by its degrees of freedom to give a mean square, and the F-statistic is the ratio of the between-group mean square to the within-group mean square. It indicates whether the observed differences between the groups are larger than would be expected from chance variation alone.

Finally, understanding the partitioning of variance can help us to identify potential sources of error or bias in the data, which can impact the validity of the results. For example, if there is a large amount of within-group variance relative to the between-group variance, this could indicate that there is a significant amount of individual variation that is not accounted for in the analysis, or that there are other factors that are influencing the dependent variable that are not being considered in the study design.

In summary, the partitioning of variance in ANOVA is an important concept that helps us to understand the sources of variation in the data and how these sources of variation contribute to the observed differences between the groups. It provides a basis for calculating the F-statistic and testing the null hypothesis, and can help us to identify potential sources of error or bias in the data.
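As a rough illustration of the partition, the sketch below computes the between-group, within-group, and total sums of squares by hand with NumPy and compares the resulting F-statistic with `scipy.stats.f_oneway`. The three small groups are made-up numbers chosen so the arithmetic is easy to follow.

```python
import numpy as np
from scipy.stats import f_oneway

# Made-up data: three groups of three observations each
groups = [
    np.array([4.0, 5.0, 6.0]),
    np.array([7.0, 8.0, 9.0]),
    np.array([1.0, 2.0, 3.0]),
]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group SS: squared distance of each group mean from the grand mean,
# weighted by group size
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group SS: squared distance of each observation from its own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Total SS: squared distance of each observation from the grand mean
ss_total = ((all_values - grand_mean) ** 2).sum()

# Mean squares and F-statistic
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_manual = (ss_between / df_between) / (ss_within / df_within)

f_scipy, p_value = f_oneway(*groups)

print("SS between:", ss_between)  # 54.0
print("SS within:", ss_within)    # 6.0
print("SS total:", ss_total)      # 60.0
print("F (manual):", f_manual, "F (scipy):", f_scipy)
```

Here the between-group and within-group sums of squares (54 and 6) add up to the total (60), and the hand-computed F of 27 agrees with SciPy.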

Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual
sum of squares (SSR) in a one-way ANOVA using Python?

In [10]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# create example data
data = {'group': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'score': [80, 85, 90, 75, 78, 80, 70, 72, 76]}
df = pd.DataFrame(data)

# fit ANOVA model
model = ols('score ~ group', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

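# extract sums of squares
# anova_lm returns one row per model term plus a 'Residual' row
SSE = anova_table.loc['group', 'sum_sq']      # explained (between-group) sum of squares
SSR = anova_table.loc['Residual', 'sum_sq']   # residual (within-group) sum of squares
SST = SSE + SSR                               # total sum of squares

print('SST:', round(SST,2))
print('SSE:', round(SSE,2))
print('SSR:', round(SSR,2))

SST: 312.22
SSE: 230.89
SSR: 81.33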


Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

In [24]:
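# Importing libraries
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Create a dataframe
# Note: both factor columns are built with the same np.repeat call, so they are
# identical in this sample data and the two factors are completely confounded.
df = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
                   'Watering': np.repeat(['daily', 'weekly'], 15),
                   'height': [14, 16, 15, 15, 16, 13, 12, 11,
                              14, 15, 16, 16, 17, 18, 14, 13,
                              14, 14, 14, 15, 16, 16, 17, 18,
                              14, 13, 14, 14, 14, 15]})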


# Performing two-way ANOVA
model = ols('height ~ C(Fertilizer) + C(Watering) + C(Fertilizer):C(Watering)', data=df).fit()
result = sm.stats.anova_lm(model, typ=2)

# Print the result
print(result)


                                 sum_sq    df             F    PR(>F)
C(Fertilizer)              4.390651e-13   1.0  1.589719e-13  1.000000
C(Watering)                8.630952e-02   1.0  3.125000e-02  0.860956
C(Fertilizer):C(Watering)  3.333333e-02   1.0  1.206897e-02  0.913305
Residual                   7.733333e+01  28.0           NaN       NaN


Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?

The F-statistic measures the ratio of variation between groups to variation within groups. A larger F-statistic indicates more variation between the group means relative to the variation within the groups. An F-statistic of 5.23 means the between-group mean square is about five times the within-group mean square. On its own this value does not establish significance; it has to be compared with the F distribution for the design's degrees of freedom, which is what the p-value does.

The p-value of 0.02 is the probability of obtaining an F-statistic at least as large as 5.23 if the null hypothesis (all group means are equal) were true. Because 0.02 is below the conventional significance level of 0.05, we reject the null hypothesis.

To interpret these results, you can conclude that at least one group mean differs from the others on the variable being measured. The ANOVA does not show which groups differ or how large the differences are; a post-hoc test and an effect size measure are needed for that.
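The link between an F-statistic and its p-value can be sketched with `scipy.stats.f`. The question does not state the degrees of freedom, so the values below (3 groups, 30 observations) are assumptions chosen only for illustration, and the resulting p-value will not necessarily equal 0.02.

```python
from scipy.stats import f

f_stat = 5.23
df_between = 2    # assumed: 3 groups - 1
df_within = 27    # assumed: 30 observations - 3 groups

# Upper-tail probability of the F distribution: P(F >= f_stat) under the null
p_value = f.sf(f_stat, df_between, df_within)

# Critical value at the 5% significance level for the same degrees of freedom
critical_value = f.ppf(0.95, df_between, df_within)

print(f"p-value: {p_value:.4f}")
print(f"critical F at alpha = 0.05: {critical_value:.3f}")
print("reject null:", f_stat > critical_value)
```

Comparing the p-value with alpha and comparing the F-statistic with the critical value are two views of the same decision.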

Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?

In a repeated measures ANOVA, missing data can be handled in several ways. Here are some common methods:

1. Pairwise deletion: This method involves removing cases with missing data on a variable-by-variable basis. For example, if a participant has missing data on one of the measures, their data can still be included in the analysis for the other measures.

2. Listwise deletion: This method involves removing cases with missing data on any of the variables.

3. Imputation: This method involves replacing missing values with estimated values based on the available data. There are several methods of imputation, including mean imputation, regression imputation, and multiple imputation.

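The choice of method affects the power, precision, and bias of the results. Listwise deletion discards every participant with any missing measurement, so it can reduce the sample size and power substantially, and it gives unbiased estimates only if the data are missing completely at random. Pairwise deletion keeps more data, but different comparisons are then based on different subsets of participants, which can make results inconsistent. Imputation retains the full sample, but it can introduce bias if the imputation model is misspecified or the assumptions about the missing data are not met. Mean imputation in particular understates the variability in the data, whereas multiple imputation carries the uncertainty about the missing values into the standard errors.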
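The difference between listwise deletion and mean imputation can be sketched with pandas. The blood pressure readings below are hypothetical, with one row per participant and one column per time point.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: rows are participants, columns are time points
wide = pd.DataFrame({
    't1': [120, 125, 130, 128, 122],
    't2': [118, np.nan, 127, 125, 120],
    't3': [115, 119, np.nan, 121, 117],
})

# Listwise deletion: drop every participant with at least one missing value
listwise = wide.dropna()
print("participants kept after listwise deletion:", len(listwise))  # 3 of 5

# Mean imputation: replace each missing value with that column's mean
imputed = wide.fillna(wide.mean())

# Mean imputation shrinks the variance of the imputed column
print("variance of t2, observed values only:", round(wide['t2'].var(), 2))
print("variance of t2, after mean imputation:", round(imputed['t2'].var(), 2))
```

Listwise deletion loses two of the five participants here, while mean imputation keeps all five but lowers the variance of `t2`, which is one way it can make results look more precise than they are.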

Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.

Post-hoc tests are used to determine which groups in an ANOVA are significantly different from each other after a statistically significant omnibus ANOVA test result. Here are some common post-hoc tests used after ANOVA:

1. Tukey's HSD: This test compares all possible pairs of means while controlling the family-wise error rate, and reports an adjusted p-value for each comparison. It assumes equal variances and was designed for equal sample sizes; the Tukey-Kramer variant handles unequal sample sizes. It is a good default when all pairwise comparisons are of interest, and in that setting it is usually more powerful than Bonferroni or Scheffe.

2. Bonferroni correction: This method adjusts the alpha level by dividing it by the number of comparisons. It does not require equal sample sizes and can be applied to any set of tests. It works well for a small number of planned comparisons, but becomes increasingly conservative as the number of comparisons grows.

3. Scheffe's method: This test controls the error rate across all possible contrasts, including complex ones such as the average of two groups against a third. That generality makes it the most conservative of these tests for simple pairwise comparisons. It can be used with unequal sample sizes, but it assumes equal variances.

4. Dunnett's test: This test compares each group mean to a control group mean. It is appropriate when there is a control group and the other groups are being compared to the control group.

5. Games-Howell test: This test is similar to Tukey's HSD, but it does not assume equal variances or sample sizes. It is appropriate when the variances and sample sizes are unequal.

For example, suppose a researcher is conducting a study on the effectiveness of three different therapies for treating depression. They conduct an ANOVA and find a statistically significant difference between the three groups. To determine which therapies are significantly different from each other, the researcher might use a post-hoc test such as Tukey's HSD or Bonferroni correction.
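A minimal sketch of the therapy example using `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later). The depression scores are made-up numbers, and the group names are assumptions for the illustration.

```python
import numpy as np
from scipy.stats import f_oneway, tukey_hsd

# Hypothetical improvement scores for three therapies
therapy_a = np.array([10, 12, 11, 13, 12, 11])
therapy_b = np.array([14, 15, 13, 16, 15, 14])
therapy_c = np.array([10, 11, 12, 10, 11, 12])

# Omnibus test first
f_stat, p_value = f_oneway(therapy_a, therapy_b, therapy_c)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Pairwise comparisons with family-wise error control
tukey = tukey_hsd(therapy_a, therapy_b, therapy_c)

# tukey.pvalue[i, j] is the adjusted p-value for group i vs group j
labels = ['A', 'B', 'C']
for i in range(3):
    for j in range(i + 1, 3):
        print(f"{labels[i]} vs {labels[j]}: adjusted p = {tukey.pvalue[i, j]:.4f}")
```

In this made-up data, therapy B differs from A and from C, while A and C do not differ from each other, which is the kind of detail the omnibus F-test alone cannot provide.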

Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results.

In [29]:
import numpy as np
from scipy.stats import f_oneway

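# Create sample data: 50 simulated weight-loss values per diet
# (no random seed is set, so the numbers change on every run)
diet_a = np.random.normal(5, 2, 50)
diet_b = np.random.normal(4, 2, 50)
diet_c = np.random.normal(6, 2, 50)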

# Conduct one-way ANOVA
f_statistic, p_value = f_oneway(diet_a, diet_b, diet_c)

# Report results
print("F-statistic:", f_statistic)
print("p-value:", p_value)

F-statistic: 8.607816506474096
p-value: 0.0002916476260397513


For the randomly generated data above, the results can be interpreted as follows:

The F-statistic is 8.61 and the p-value is 0.0003, which is less than the alpha level of 0.05 typically used for hypothesis testing. This indicates that there is a statistically significant difference between the mean weight loss of the three diets.
We can conclude that at least one of the diets (A, B, or C) has a mean weight loss that differs from the others. The ANOVA alone does not show which diets differ; a post-hoc test would be needed for that.
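The question describes 50 participants in total, whereas the cell above simulates 50 per diet. A variant that splits 50 simulated participants across the three diets and fixes the random seed for reproducibility might look like this. The group sizes (17, 17, 16), the seed, and the distribution parameters are all assumptions for the sketch.

```python
import numpy as np
from scipy.stats import f_oneway

# Seeded generator so the results are reproducible
rng = np.random.default_rng(42)

# 50 simulated participants in total, split as evenly as possible
diet_a = rng.normal(5, 2, 17)
diet_b = rng.normal(4, 2, 17)
diet_c = rng.normal(6, 2, 16)

f_statistic, p_value = f_oneway(diet_a, diet_b, diet_c)

# Degrees of freedom: groups - 1 and total observations - groups
n_total = len(diet_a) + len(diet_b) + len(diet_c)
df_between = 3 - 1
df_within = n_total - 3

print(f"F({df_between}, {df_within}) = {f_statistic:.2f}, p = {p_value:.4f}")
```

Reporting the degrees of freedom alongside F and p lets a reader check the result against an F table.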

Q10. A company wants to know if there are any significant differences in the average time it takes to
complete a task using three different software programs: Program A, Program B, and Program C. They
randomly assign 30 employees to one of the programs and record the time it takes each employee to
complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or
interaction effects between the software programs and employee experience level (novice vs.
experienced). Report the F-statistics and p-values, and interpret the results.

In [63]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Create sample data (simulated; no random seed is set, so results vary between runs)
data = pd.DataFrame({
    'Software': ['A', 'A', 'B', 'B', 'C', 'C'] * 5,
    'Experience': np.random.choice(np.array(["Novice", "Experienced"]), size=30),
    'Time': np.random.randint(low=15, high=61, size=30)})

# Conduct two-way ANOVA
model = ols('Time ~ C(Software) + C(Experience) + C(Software):C(Experience)', data=data).fit()
table = sm.stats.anova_lm(model, typ=2)

# Report results
print(table)

                                sum_sq    df         F    PR(>F)
C(Software)                 580.384286   2.0  1.353281  0.277408
C(Experience)               177.005714   1.0  0.825448  0.372626
C(Software):C(Experience)   367.830000   2.0  0.857668  0.436745
Residual                   5146.464286  24.0       NaN       NaN


The first row shows the results for the main effect of "Software". The sum of squares for "Software" is 580.384, and there are 2 degrees of freedom associated with this factor. The F-statistic is 1.353 and the p-value is 0.277, which indicates that there is not enough evidence to reject the null hypothesis of no significant difference between the mean times to complete the task for the three software programs.

The second row shows the results for the main effect of "Experience". The sum of squares for "Experience" is 177.006, and there is 1 degree of freedom associated with this factor. The F-statistic is 0.825 and the p-value is 0.373, which indicates that there is not enough evidence to reject the null hypothesis of no significant difference between the mean times to complete the task for the novice and experienced employees.

The third row shows the results for the interaction effect between "Software" and "Experience". The sum of squares for the interaction effect is 367.830, and there are 2 degrees of freedom associated with this factor. The F-statistic is 0.858 and the p-value is 0.437, which indicates that there is not enough evidence to reject the null hypothesis of no significant interaction effect between the two factors.

The last row shows the results for the residual or error term, which is the variation in the data that is not explained by the factors or their interaction. The sum of squares for the residual is 5146.464, and there are 24 degrees of freedom associated with this term.

Overall, based on these results, there is no evidence of a significant difference in the mean times to complete the task between the three software programs or between novice and experienced employees, and there is no significant interaction effect between the two factors.
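Because experience level is assigned at random in the simulated data, the six Software x Experience cells will generally have different numbers of employees. Before interpreting a two-way table it can help to inspect the cell counts and cell means. The sketch below rebuilds comparable data with a fixed seed, which is an assumption added for reproducibility.

```python
import numpy as np
import pandas as pd

# Simulated data in the same shape as the example above, with a fixed seed
rng = np.random.default_rng(1)
data = pd.DataFrame({
    'Software': ['A', 'A', 'B', 'B', 'C', 'C'] * 5,
    'Experience': rng.choice(['Novice', 'Experienced'], size=30),
    'Time': rng.integers(15, 61, size=30),
})

# Number of employees in each Software x Experience cell
cell_counts = pd.crosstab(data['Software'], data['Experience'])
print(cell_counts)

# Mean completion time in each cell; roughly parallel rows suggest little interaction
cell_means = data.groupby(['Software', 'Experience'])['Time'].mean().unstack()
print(cell_means.round(1))
```

When the cell counts are unequal, the different types of sums of squares (`typ=1`, `typ=2`, `typ=3` in `anova_lm`) can give different answers, so the choice should be stated when reporting results.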

Q11. An educational researcher is interested in whether a new teaching method improves student test
scores. They randomly assign 100 students to either the control group (traditional teaching method) or the
experimental group (new teaching method) and administer a test at the end of the semester. Conduct a
two-sample t-test using Python to determine if there are any significant differences in test scores
between the two groups. If the results are significant, follow up with a post-hoc test to determine which
group(s) differ significantly from each other.

In [67]:
import numpy as np
from scipy.stats import ttest_ind

# generate data
np.random.seed(123)
control_scores = np.random.normal(75, 10, size=100)
experimental_scores = np.random.normal(80, 10, size=100)

# conduct t-test
t_stat, p_val = ttest_ind(control_scores, experimental_scores)

# report results
print("t-statistic: {:.3f}".format(t_stat))
print("p-value: {:.3f}".format(p_val))

t-statistic: -3.032
p-value: 0.003


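Assuming a significance level of 0.05, the p-value of 0.003 is less than 0.05, so we reject the null hypothesis and conclude that there is a significant difference in test scores between the two groups. Because there are only two groups, this result already identifies which groups differ, so a post-hoc test is not strictly required. The Tukey HSD test below is run for completeness; with a single pair of groups it reproduces the same comparison.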

In [68]:
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# combine scores and group labels
all_scores = np.concatenate((control_scores, experimental_scores))
group_labels = np.concatenate((np.repeat("Control", 100), np.repeat("Experimental", 100)))

# perform Tukey's HSD test
tukey_results = pairwise_tukeyhsd(all_scores, group_labels, alpha=0.05)

# print results
print(tukey_results)

   Multiple Comparison of Means - Tukey HSD, FWER=0.05   
 group1    group2    meandiff p-adj  lower  upper  reject
---------------------------------------------------------
Control Experimental   4.5336 0.0028 1.5846 7.4826   True
---------------------------------------------------------
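With two groups, the pooled two-sample t-test and a one-way ANOVA are the same test: the F-statistic equals the square of the t-statistic and the p-values match, which is why a post-hoc comparison gives no new information here. The sketch below checks this on simulated scores and also runs Welch's t-test, which does not assume equal variances. The seed and distribution parameters are assumptions for the illustration.

```python
import numpy as np
from scipy.stats import f_oneway, ttest_ind

# Simulated test scores for the two groups
rng = np.random.default_rng(123)
control = rng.normal(75, 10, 100)
experimental = rng.normal(80, 10, 100)

# Pooled-variance t-test and one-way ANOVA on the same two groups
t_stat, t_p = ttest_ind(control, experimental)
f_stat, f_p = f_oneway(control, experimental)

print(f"t^2 = {t_stat**2:.4f}, F = {f_stat:.4f}")
print(f"t-test p = {t_p:.6f}, ANOVA p = {f_p:.6f}")

# Welch's t-test: an alternative when the group variances may differ
welch_t, welch_p = ttest_ind(control, experimental, equal_var=False)
print(f"Welch t = {welch_t:.3f}, p = {welch_p:.6f}")
```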


Q12. A researcher wants to know if there are any significant differences in the average daily sales of three
retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store
on those days. Conduct a repeated measures ANOVA using Python to determine if there are any
significant differences in sales between the three stores. If the results are significant, follow up with a post-
hoc test to determine which store(s) differ significantly from each other.

In [90]:
import numpy as np
import pandas as pd
from scipy import stats

# Generate random sales data for Store A, Store B, and Store C
np.random.seed(125)
store_a_sales = np.random.randint(1000, 5000, 30)
store_b_sales = np.random.randint(1500, 4500, 30)
store_c_sales = np.random.randint(2000, 4000, 30)

# Create a pandas dataframe
df = pd.DataFrame({'Store A': store_a_sales, 'Store B': store_b_sales, 'Store C': store_c_sales})
df.head()

   Store A  Store B  Store C
0     4005     2515     3925
1     3205     3128     3366
2     2250     4186     2792
3     4927     2769     2750
4     2279     2143     2859


In [91]:
# Reshape the dataframe
df = pd.melt(df.reset_index(), id_vars=['index'], value_vars=['Store A', 'Store B', 'Store C'])
df.columns = ['Day', 'Store', 'Sales']
df.head()

   Day    Store  Sales
0    0  Store A   4005
1    1  Store A   3205
2    2  Store A   2250
3    3  Store A   4927
4    4  Store A   2279


In [92]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Fit a one-way ANOVA of Sales on Store
# (Day is not included as a subject factor, so the repeated-measures structure is not modeled)
model = ols('Sales ~ C(Store, Sum)', data=df).fit()
anova_table = sm.stats.anova_lm(model, typ=2)

# Print the ANOVA table
print(anova_table)

                     sum_sq    df         F    PR(>F)
C(Store, Sum)  1.231069e+04   2.0  0.008232  0.991802
Residual       6.505191e+07  87.0       NaN       NaN


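The ANOVA table shows a non-significant effect of store on sales, with an F-statistic of 0.008 and a p-value of 0.992. This suggests that there is no significant difference in the average daily sales between the three stores. Note that the model fitted above is a one-way between-groups ANOVA: it treats the 90 sales figures as independent and does not use Day as the repeated-measures subject, so it does not account for the fact that the three stores were observed on the same 30 days.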

Since the ANOVA did not yield a significant result, there is no need for a post-hoc test to determine which stores differ significantly from each other.
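A repeated measures analysis that treats each day as the subject and Store as the within-subject factor can be sketched with `AnovaRM` from statsmodels. The data are simulated with a seeded generator, so the numbers will not match the table above; the variable names `wide`, `long_df`, and `rm_table` are choices made for this example.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Simulated daily sales for three stores over the same 30 days
rng = np.random.default_rng(125)
wide = pd.DataFrame({
    'Store A': rng.integers(1000, 5000, 30),
    'Store B': rng.integers(1500, 4500, 30),
    'Store C': rng.integers(2000, 4000, 30),
})

# Reshape to long format: one row per (Day, Store) pair
long_df = (wide.reset_index()
               .melt(id_vars='index', var_name='Store', value_name='Sales')
               .rename(columns={'index': 'Day'}))

# Day is the repeated-measures subject; Store is the within-subject factor
rm_result = AnovaRM(long_df, depvar='Sales', subject='Day', within=['Store']).fit()
rm_table = rm_result.anova_table
print(rm_table)
```

With 30 days and 3 stores, the test has 2 numerator and 58 denominator degrees of freedom, compared with 87 residual degrees of freedom in the between-groups model. If the effect were significant, paired comparisons between stores with a multiple-comparison correction would be a natural follow-up.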