Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact
the validity of the results.


There are three primary assumptions in ANOVA:

The responses for each factor level have a normal population distribution. <br>
These distributions have the same variance.<br>
The data are independent.<br>

Potential assumption violations include:

Implicit factors: lack of independence within a sample<br>
Lack of independence: lack of independence between samples<br>
Outliers: apparent nonnormality by a few data points<br>
Nonnormality: nonnormality of entire samples<br>
Unequal population variances<br>
Patterns in plots of data: detecting assumption violations graphically<br>
Special problems with small sample sizes<br>
Special problems with unbalanced sample sizes<br>
Multiple comparisons: effects of assumption violations on multiple comparison tests
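
The normality and equal-variance assumptions listed above can be screened numerically. The following is a minimal sketch, assuming three hypothetical groups of simulated data (the group names, sizes, and parameters are illustrative only). It uses the Shapiro-Wilk test for normality within each group and Levene's test for equality of variances. Independence cannot be tested this way; it has to be judged from the study design.

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups of 30 simulated observations each
rng = np.random.default_rng(0)
groups = {
    "A": rng.normal(loc=50, scale=5, size=30),
    "B": rng.normal(loc=52, scale=5, size=30),
    "C": rng.normal(loc=55, scale=5, size=30),
}

# Normality within each group (Shapiro-Wilk):
# a small p-value suggests the group is not normally distributed
normality_p = {name: stats.shapiro(values).pvalue for name, values in groups.items()}

# Homogeneity of variance (Levene):
# a small p-value suggests the group variances are not equal
levene_stat, levene_p = stats.levene(*groups.values())

for name, p in normality_p.items():
    print(f"Shapiro-Wilk p-value for group {name}: {p:.3f}")
print(f"Levene p-value: {levene_p:.3f}")
```

Small p-values here are a prompt to look at the data graphically (for example with Q-Q plots and residual plots) rather than a final verdict, especially with small samples.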

Q2. What are the three types of ANOVA, and in what situations would each be used?


One-Way ANOVA

A one-way ANOVA has just one independent variable. For example, differences in IQ can be assessed by Country, and Country can have 2, 20, or more different categories to compare.

Two-Way ANOVA

A two-way ANOVA (also called a factorial ANOVA) uses two independent variables. Expanding the example above, a two-way ANOVA can examine differences in IQ scores (the dependent variable) by Country (independent variable 1) and Gender (independent variable 2). A two-way ANOVA can also be used to examine the interaction between the two independent variables. An interaction indicates that the differences are not uniform across all categories of the independent variables. For example, females may have higher IQ scores overall compared to males, but this difference could be greater (or smaller) in European countries than in North American countries.

N-Way ANOVA

A researcher can also use more than two independent variables; this is an n-way ANOVA (with n being the number of independent variables). For example, potential differences in IQ scores can be examined by Country, Gender, Age group, Ethnicity, etc., simultaneously.

Q3. What is the partitioning of variance in ANOVA, and why is it important to understand this concept?


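Partitioning of variance in ANOVA means splitting the total variability in the response into a part explained by differences between the group means and a part left over within the groups:

Total sum of squares = between-group (explained) sum of squares + within-group (residual) sum of squares

This matters because the hypothesis test is built directly from these two components. The hypotheses are:

Null hypothesis (H0): μ1 = μ2 = μ3 = ... = μk (k = number of levels)

Alternate hypothesis (Ha): At least one group mean differs from the others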

The test statistic in ANOVA is the F test:

F = (Variance between samples) / (Variance within samples)
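
The ratio above can be computed by hand. The following is a minimal sketch, assuming three small hypothetical groups (the scores are made up for illustration). It splits the total sum of squares into between-group and within-group parts, divides each by its degrees of freedom to obtain the two variances, and compares the resulting F with `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for three groups
groups = [
    np.array([10.0, 15.0, 12.0, 14.0]),
    np.array([20.0, 12.0, 18.0, 17.0]),
    np.array([18.0, 25.0, 22.0, 21.0]),
]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()
k = len(groups)            # number of groups
n_total = all_scores.size  # total number of observations

# Partition the total sum of squares
ss_total = ((all_scores - grand_mean) ** 2).sum()
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Mean squares: each sum of squares divided by its degrees of freedom
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_manual = ms_between / ms_within

# Reference value from SciPy
f_scipy, p_scipy = stats.f_oneway(*groups)

print(f"SS total = {ss_total:.3f}, SS between + SS within = {ss_between + ss_within:.3f}")
print(f"F (by hand) = {f_manual:.4f}, F (scipy) = {f_scipy:.4f}, p = {p_scipy:.4f}")
```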

Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual
sum of squares (SSR) in a one-way ANOVA using Python?


In [1]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

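# Create a sample dataset with a categorical variable and a continuous variable
# Note: each group has a single observation here, so the model fits every point
# exactly and the residual sum of squares is zero up to floating-point error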
df = pd.DataFrame({'group': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'score': [10, 15, 20, 12, 18, 25]})

# Fit a one-way ANOVA model
model = ols('score ~ group', data=df).fit()

# Calculate the total sum of squares (SST)
sst = sum((df['score'] - df['score'].mean()) ** 2)

# Calculate the explained sum of squares (SSE)
sse = sum((model.predict(df) - df['score'].mean()) ** 2)

# Calculate the residual sum of squares (SSR)
ssr = sum((df['score'] - model.predict(df)) ** 2)

print('SST:', sst)
print('SSE:', sse)
print('SSR:', ssr)

SST: 151.33333333333331
SSE: 151.3333333333332
SSR: 6.626431603856499e-29
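
Because each group above contains one observation, the residual sum of squares is essentially zero. The following is a minimal sketch of the same calculation, assuming the same six scores are instead arranged as three groups of two (a hypothetical regrouping, chosen only so that there is within-group variation). In a one-way ANOVA the fitted value for each observation is its group mean, so the group means are used here in place of `model.predict`.

```python
import pandas as pd

# Hypothetical regrouping of the six scores: two observations per group
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "C", "C"],
    "score": [10, 15, 20, 12, 18, 25],
})

grand_mean = df["score"].mean()
# Fitted value for each row = mean of that row's group
group_means = df.groupby("group")["score"].transform("mean")

sst = ((df["score"] - grand_mean) ** 2).sum()   # total
sse = ((group_means - grand_mean) ** 2).sum()   # explained (between groups)
ssr = ((df["score"] - group_means) ** 2).sum()  # residual (within groups)

print(f"SST: {sst:.2f}")  # 151.33
print(f"SSE: {sse:.2f}")  # 82.33
print(f"SSR: {ssr:.2f}")  # 69.00
```

With replication, the identity SST = SSE + SSR still holds, but the residual part is no longer zero.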


Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?


In [2]:
import seaborn as sns
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Load the tips dataset from seaborn
tips = sns.load_dataset("tips")

# Create a formula for the ANOVA model
formula = 'total_bill ~ sex + time + sex:time'

# Fit the ANOVA model
model = ols(formula, data=tips).fit()

# Perform the ANOVA and print the results table
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)

                sum_sq     df         F    PR(>F)
sex         231.460310    1.0  3.022685  0.083390
time        473.011803    1.0  6.177153  0.013623
sex:time      3.371906    1.0  0.044034  0.833968
Residual  18377.855945  240.0       NaN       NaN


Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02.
What can you conclude about the differences between the groups, and how would you interpret these
results?


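The F-statistic is 5.23 and the p-value is 0.02. Since p is less than the 0.05 significance level, we reject the null hypothesis that all group means are equal and conclude that at least one group mean differs significantly from the others. The ANOVA alone does not show which groups differ; a post-hoc test is needed for that.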

Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential
consequences of using different methods to handle missing data?


Handling missing data in a repeated measures ANOVA can be challenging, as the data is often dependent and the same individuals are measured across multiple time points or conditions. There are different methods to handle missing data, including:

Listwise deletion: This method involves excluding all individuals who have missing data on any of the variables. While this method is straightforward, it can lead to a reduction in sample size and loss of statistical power.

Mean imputation: This method involves replacing missing values with the mean value for that variable. While this method is easy to implement, it artificially shrinks the variance and weakens the correlations between repeated measurements, so standard errors come out too small. The means themselves are also biased unless the data are missing completely at random.

Maximum likelihood estimation: This method involves fitting a statistical model (for example, a linear mixed model) to all available observations rather than filling in the missing values. It can provide unbiased estimates of the means and variances when the data are missing at random, but it requires a more sophisticated statistical model and may not work well for small sample sizes.

The consequences of using different methods to handle missing data vary with the amount and pattern of missing data, as well as the method used. In general, listwise deletion reduces statistical power and can bias the results unless the data are missing completely at random, while mean imputation understates the variability in the data. Maximum likelihood estimation can provide unbiased estimates of the means and variances, but it may not work well for small sample sizes or when the data are not missing at random. It is important to carefully consider the amount and pattern of missing data and choose a method that is appropriate for the specific research question and data at hand.
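
The effect of the two simpler methods can be seen on a small example. The following is a minimal sketch, assuming a hypothetical wide-format table of six subjects measured at three time points (all values are made up). It shows how listwise deletion shrinks the sample and how mean imputation shrinks the variance.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: 6 subjects, 3 repeated measurements
data = pd.DataFrame(
    {
        "t1": [10.0, 12.0, 11.0, 14.0, 13.0, 15.0],
        "t2": [11.0, np.nan, 12.0, 15.0, 15.0, 16.0],
        "t3": [13.0, 14.0, np.nan, 17.0, 16.0, 18.0],
    },
    index=pd.Index(range(1, 7), name="subject"),
)

# Listwise deletion: drop every subject with any missing measurement
listwise = data.dropna()

# Mean imputation: replace each missing value with its column mean
mean_imputed = data.fillna(data.mean())

var_observed = data["t2"].var()         # pandas skips NaN by default
var_imputed = mean_imputed["t2"].var()

print("Subjects kept after listwise deletion:", len(listwise))
print("Variance of t2, observed values only:", round(var_observed, 3))
print("Variance of t2 after mean imputation:", round(var_imputed, 3))
```

The column mean is unchanged by mean imputation, but the variance drops because the filled-in values sit exactly at the mean.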

Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide
an example of a situation where a post-hoc test might be necessary.

Post-hoc tests are used to determine which groups differ significantly from each other after obtaining a significant result from an ANOVA. Some common post-hoc tests include Tukey's HSD, Bonferroni correction, Scheffé's method, and Dunnett's test.

Tukey's HSD: This test is used to compare all possible pairs of groups to determine which pairs have a significant difference. It controls the family-wise error rate (FWER), which is the probability of making at least one type I error among all the comparisons. Tukey's HSD assumes equal variances across groups and is most commonly used when sample sizes are equal; the Tukey-Kramer variant handles unequal sample sizes.

Bonferroni correction: This method controls the FWER by dividing the significance level by the number of comparisons. For example, if there are four groups and the significance level is set to 0.05, then the adjusted significance level would be 0.05/6 = 0.0083, since there are six pairwise comparisons. The Bonferroni correction is a conservative method. It can be applied to any set of tests, so it is a common choice when only a small number of planned comparisons are of interest.

Scheffé's method: This test also controls the FWER. It is valid for all possible contrasts among the group means, including complex ones (for example, the average of two groups against a third), not only pairwise comparisons. For pairwise comparisons alone it is more conservative than Tukey's HSD.

Dunnett's test: This test is used to compare each group to a control group. It controls the family-wise error rate for these comparisons.

An example of a situation where a post-hoc test might be necessary is a study comparing the effectiveness of three different types of pain medication. An ANOVA might reveal a significant difference between the groups, but a post-hoc test would be necessary to determine which specific pairs of groups differ significantly from each other.
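
The Bonferroni arithmetic for four groups can be sketched in code. The following is a minimal example, assuming four hypothetical groups of simulated data (names and parameters are illustrative). It enumerates the six pairwise comparisons, divides the significance level by that number, and applies the adjusted threshold to independent two-sample t-tests.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Hypothetical data: four groups of 20 simulated observations each
rng = np.random.default_rng(42)
groups = {
    "group_1": rng.normal(5.0, 1.0, 20),
    "group_2": rng.normal(5.5, 1.0, 20),
    "group_3": rng.normal(6.5, 1.0, 20),
    "group_4": rng.normal(5.0, 1.0, 20),
}

alpha = 0.05
pairs = list(combinations(groups, 2))  # all pairwise comparisons
alpha_adjusted = alpha / len(pairs)    # Bonferroni-adjusted threshold

print(f"{len(pairs)} comparisons, adjusted alpha = {alpha_adjusted:.4f}")
for name_1, name_2 in pairs:
    t_stat, p_value = stats.ttest_ind(groups[name_1], groups[name_2])
    decision = "reject H0" if p_value < alpha_adjusted else "fail to reject H0"
    print(f"{name_1} vs {name_2}: p = {p_value:.4f} -> {decision}")
```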

Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from
50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python
to determine if there are any significant differences between the mean weight loss of the three diets.
Report the F-statistic and p-value, and interpret the results.

In [1]:
# H0: all three diets have the same mean weight loss
# Ha: at least one diet's mean weight loss differs from the others
import numpy as np
import pandas as pd
# For convenience, 51 observations (17 per diet) are simulated instead of 50 so that the groups are balanced
df = pd.DataFrame({"diet":np.repeat(["A","B","C"],17),"weight":np.random.randint(40,70,51)})

In [3]:
import statsmodels.api as sm
from statsmodels.formula.api import ols
  
model = ols(
    'weight ~ C(diet)', data=df).fit()
results = sm.stats.anova_lm(model, typ=1)

results

            df       sum_sq    mean_sq         F    PR(>F)
C(diet)    2.0    50.156863  25.078431  0.296597  0.744696
Residual  48.0  4058.588235  84.553922       NaN       NaN


From the one-way ANOVA, F = 0.296597 and the p-value = 0.744696.

N = 51, a = 3, n = 17<br>
dof (between) = a - 1 = 2<br>
dof (within) = N - a = 48<br>
dof (total) = N - 1 = 50

The critical value of F at alpha = 0.05 with (2, 48) degrees of freedom is approximately 3.19. Decision rule: reject the null hypothesis if F > 3.19.

Inference: Since F = 0.2966 is below the critical value (equivalently, p = 0.7447 > 0.05), we fail to reject the null hypothesis. The data provide no evidence that mean weight loss differs among the three diets.
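
The critical value and p-value can be looked up from the F distribution rather than a printed table. The following is a minimal sketch using `scipy.stats.f`, with the degrees of freedom (2 and 48) and the F statistic taken from the ANOVA table above.

```python
from scipy import stats

alpha = 0.05
df_between = 2         # a - 1
df_within = 48         # N - a
f_observed = 0.296597  # F statistic from the ANOVA table

# Critical value: the point with probability alpha in the upper tail
f_critical = stats.f.ppf(1 - alpha, df_between, df_within)

# p-value: probability of an F at least this large under H0
p_value = stats.f.sf(f_observed, df_between, df_within)

reject_null = f_observed > f_critical

print(f"Critical F({df_between}, {df_within}) at alpha = {alpha}: {f_critical:.4f}")
print(f"p-value for F = {f_observed}: {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

Comparing F with the critical value and comparing the p-value with alpha always lead to the same decision.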

Q10. A company wants to know if there are any significant differences in the average time it takes to complete a task using three different software programs: Program A, Program B, and Program C. They randomly assign 30 employees to one of the programs and record the time it takes each employee to complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or interaction effects between the software programs and employee experience level (novice vs. experienced). Report the F-statistics and p-values, and interpret the results.

In [5]:
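# H0 (program): mean completion time is the same for all three programs
# H0 (experience): mean completion time is the same for novice and experienced employees
# H0 (interaction): the effect of program does not depend on experience level
# Ha (for each test): the corresponding H0 does not hold
# import libraries: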
import numpy as np
import pandas as pd
# Create a dataframe
prog = np.repeat(["Program A","Program B","Program C"], 10)
np.random.shuffle(prog)
exp = np.repeat(['novice', 'experienced'], 15)
np.random.shuffle(exp)
df = pd.DataFrame({'Program': prog,
                          'Exp_lvl':exp,
                          'time': np.random.randint(20,60,30)})

In [6]:
# Importing libraries
import statsmodels.api as sm
from statsmodels.formula.api import ols
  
# Performing two-way ANOVA
model = ols(
    'time ~ C(Exp_lvl) + C(Program) +\
    C(Exp_lvl):C(Program)', data=df).fit()
results = sm.stats.anova_lm(model, typ=2)
#results
results

                            sum_sq    df         F    PR(>F)
C(Exp_lvl)               41.128358   1.0  0.313898  0.580491
C(Program)              146.928358   2.0  0.560690  0.578113
C(Exp_lvl):C(Program)   381.281166   2.0  1.454998  0.253263
Residual               3144.590476  24.0       NaN       NaN


For Exp_lvl the critical value of F at alpha = 0.05 is 4.2597 (degrees of freedom 1 and 24). For Program and for the interaction of Exp_lvl with Program it is 3.4028 (degrees of freedom 2 and 24). Decision rule: reject the null hypothesis if the F statistic exceeds the corresponding critical value. The F values from the test are:

Exp_lvl = 0.31 (p = 0.58)<br>
Program = 0.56 (p = 0.58)<br>
Exp_lvl and Program = 1.45 (p = 0.25)

Inference: Since none of the F statistics exceeds its critical value, we fail to reject any of the null hypotheses. The data provide no evidence of a main effect of experience level, a main effect of program, or an interaction between them on completion time.

Q11. An educational researcher is interested in whether a new teaching method improves student test scores. They randomly assign 100 students to either the control group (traditional teaching method) or the experimental group (new teaching method) and administer a test at the end of the semester. Conduct a two-sample t-test using Python to determine if there are any significant differences in test scores between the two groups. If the results are significant, follow up with a post-hoc test to determine which group(s) differ significantly from each other.

In [7]:
import numpy as np
from scipy.stats import ttest_ind

# generate sample data (simulated scores, 100 per group)
control_scores = np.random.normal(70, 10, 100)
experimental_scores = np.random.normal(75, 10, 100)

# conduct two-sample t-test
t_statistic, p_value = ttest_ind(control_scores, experimental_scores)

# print results
print("t-statistic:", t_statistic)
print("p-value:", p_value)

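# conduct post-hoc test (Tukey's HSD)
# Note: with only two groups there is a single comparison, so this
# reproduces the t-test result (the adjusted p-value matches the t-test p-value)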
from statsmodels.stats.multicomp import pairwise_tukeyhsd

tukey_results = pairwise_tukeyhsd(np.concatenate((control_scores, experimental_scores)),
                                  np.concatenate((np.repeat('control', 100), np.repeat('experimental', 100))))

print(tukey_results)

t-statistic: -2.885705467129249
p-value: 0.004338429120937719
   Multiple Comparison of Means - Tukey HSD, FWER=0.05   
 group1    group2    meandiff p-adj  lower  upper  reject
---------------------------------------------------------
control experimental   4.0434 0.0043 1.2802 6.8065   True
---------------------------------------------------------
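
With only two groups, the pooled two-sample t-test and a one-way ANOVA are the same test, which is why the post-hoc p-value above equals the t-test p-value. The following is a minimal sketch of that equivalence, assuming freshly simulated data with a fixed seed (so the numbers will differ from the output above): the F statistic equals the square of the t statistic and the p-values agree.

```python
import numpy as np
from scipy import stats

# Hypothetical simulated scores, seeded for reproducibility
rng = np.random.default_rng(1)
control = rng.normal(70, 10, 100)
experimental = rng.normal(75, 10, 100)

# Pooled-variance two-sample t-test (equal_var=True is the default)
t_stat, p_t = stats.ttest_ind(control, experimental)

# One-way ANOVA on the same two groups
f_stat, p_f = stats.f_oneway(control, experimental)

print(f"t^2 = {t_stat ** 2:.6f}, F = {f_stat:.6f}")
print(f"t-test p = {p_t:.6f}, ANOVA p = {p_f:.6f}")
```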


Q12. A researcher wants to know if there are any significant differences in the average daily sales of three retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store on those days. Conduct a repeated measures ANOVA using Python to determine if there are any significant differences in sales between the three stores. If the results are significant, follow up with a post-hoc test to determine which store(s) differ significantly from each other.

In [12]:
import pandas as pd
import numpy as np
import scipy.stats as stats
import pingouin as pg

# create a sample dataset
np.random.seed(123)
data = pd.DataFrame({
    'day': np.repeat(range(1, 31), 3),
    'store': np.tile(['A', 'B', 'C'], 30),
    'sales': np.random.normal(loc=1000, scale=100, size=90)
})

# conduct repeated measures ANOVA
rm_anova = pg.rm_anova(dv='sales', within='store', subject='day', data=data)
print(rm_anova)

  Source  ddof1  ddof2         F     p-unc      ng2       eps
0  store      2     58  1.669709  0.197225  0.03671  0.959348


In [13]:
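# conduct pairwise t-tests with Bonferroni correction
# (shown for illustration only: the repeated measures ANOVA above was not significant, p = 0.197)
# Note: newer pingouin releases rename this function to pairwise_tests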
posthoc = pg.pairwise_ttests(dv='sales', within='store', subject='day', data=data, padjust='bonf')
print(posthoc)

  Contrast  A  B  Paired  Parametric         T   dof alternative     p-unc  \
0    store  A  B    True        True -1.740227  29.0   two-sided  0.092423   
1    store  A  C    True        True -0.892032  29.0   two-sided  0.379718   
2    store  B  C    True        True  0.998930  29.0   two-sided  0.326091   

     p-corr p-adjust   BF10    hedges  
0  0.277268     bonf  0.742 -0.453587  
1  1.000000     bonf   0.28 -0.256064  
2  0.978273     bonf  0.307  0.216494  


