# SA1-29 BAYQUEN **two-way mixed model ANOVA**
GitHub link: https://github.com/notfolded/APM1220/blob/main/SA1-29.ipynb

In [9]:
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import seaborn as sns
import pingouin as pg

In [38]:
# loading the dataset

weightloss_df = pd.read_csv("/content/drive/MyDrive/Applied Multivariate Data Anlysis/Weight Loss by Diet Type and Time.csv")

# Reshape the data for analysis
weightloss_long = pd.melt(weightloss_df,
                          id_vars=['Participant', 'Diet Type'],
                          value_vars=['Baseline', 'After 1 month', 'After 2 months'],
                          var_name='Time', value_name='Weight Loss')

# Convert 'Time' to categorical and 'Weight Loss' to numeric
weightloss_long['Time'] = pd.Categorical(weightloss_long['Time'],
                                         categories=['Baseline', 'After 1 month', 'After 2 months'],
                                         ordered=True)
weightloss_long['Weight Loss'] = pd.to_numeric(weightloss_long['Weight Loss'])


weightloss_long.head()

   Participant Diet Type      Time  Weight Loss
0            1  Low-Carb  Baseline          0.0
1            2  Low-Carb  Baseline          0.0
2            3  Low-Carb  Baseline          0.0
3            4  Low-Carb  Baseline          0.0
4            5  Low-Carb  Baseline          0.0


# **Assumption Validation**



**Normality Assumption**


In [40]:
# Check normality
for diet in weightloss_long['Diet Type'].unique():
    for time in weightloss_long['Time'].unique():
        _, p_value = stats.shapiro(weightloss_long[(weightloss_long['Diet Type'] == diet) &
                                                   (weightloss_long['Time'] == time)]['Weight Loss'])
        print(f"Shapiro-Wilk test p-value for {diet} at {time}: {p_value:.4f}")


Shapiro-Wilk test p-value for Low-Carb at Baseline: 1.0000
Shapiro-Wilk test p-value for Low-Carb at After 1 month: 0.6951
Shapiro-Wilk test p-value for Low-Carb at After 2 months: 0.9347
Shapiro-Wilk test p-value for Low-Fat at Baseline: 1.0000
Shapiro-Wilk test p-value for Low-Fat at After 1 month: 0.6459
Shapiro-Wilk test p-value for Low-Fat at After 2 months: 0.4941


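For both diet types after 1 month and after 2 months, the p-values are well above 0.05 (0.49 to 0.93), so we fail to reject the null hypothesis of normality. At Baseline every value is 0, so the reported p = 1.0000 reflects constant data rather than a meaningful test. Overall, there is no evidence against the normality assumption.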

**Sphericity Test**

In [16]:
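# Mauchly's test for sphericity (using pingouin)
# Uses the long-format frame built above (weightloss_long); df_melt is not defined in this notebook
sphericity_results = pg.sphericity(weightloss_long, dv='Weight Loss', within='Time', subject='Participant')
print(f"Mauchly's test for sphericity: {sphericity_results}")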


Mauchly's test for sphericity: SpherResults(spher=False, W=0.09429927870847703, chi2=42.5030712889439, dof=2, pval=5.896242510895627e-10)


Mauchly's test is significant (W = 0.094, chi2 = 42.50, p < 0.001), so the sphericity assumption is violated. We therefore apply the Greenhouse-Geisser correction when interpreting the results of the within-subjects factor (Time).
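To make the correction concrete, here is a minimal sketch of how the Greenhouse-Geisser epsilon can be computed by hand from a subjects-by-time matrix. The function name `greenhouse_geisser_epsilon` and the example numbers are hypothetical, and the sketch treats all subjects as one group; it is not the routine pingouin runs internally. Epsilon ranges from 1/(k-1) (maximal violation) to 1 (sphericity holds), and the within-subjects degrees of freedom are multiplied by it.

```python
import numpy as np

def greenhouse_geisser_epsilon(wide):
    """Greenhouse-Geisser epsilon for a 2-D array:
    one row per subject, one column per repeated measurement."""
    wide = np.asarray(wide, dtype=float)
    k = wide.shape[1]                      # number of time points
    cov = np.cov(wide, rowvar=False)       # k x k covariance across time points
    # Double-centre the covariance matrix (remove row and column means).
    centred = (cov
               - cov.mean(axis=0, keepdims=True)
               - cov.mean(axis=1, keepdims=True)
               + cov.mean())
    # epsilon = (sum of eigenvalues)^2 / ((k - 1) * sum of squared eigenvalues)
    return np.trace(centred) ** 2 / ((k - 1) * np.sum(centred ** 2))

# Hypothetical weight-loss values for four subjects at three time points.
example = np.array([
    [0.0, 1.2, 2.9],
    [0.0, 1.0, 2.4],
    [0.0, 1.5, 3.1],
    [0.0, 0.9, 2.7],
])
eps = greenhouse_geisser_epsilon(example)
print(f"epsilon = {eps:.3f}; corrected df for Time = {eps * (example.shape[1] - 1):.3f}")
```

An epsilon close to its lower bound means the F-test for Time should be read against noticeably fewer degrees of freedom than the uncorrected test.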

**Homogeneity of Variance**

In [45]:
# Check homogeneity of variance
levene_results = []
for time in weightloss_long['Time'].unique():
    stat, p = stats.levene(weightloss_long[(weightloss_long['Diet Type'] == 'Low-Carb') &
                                           (weightloss_long['Time'] == time)]['Weight Loss'],
                           weightloss_long[(weightloss_long['Diet Type'] == 'Low-Fat') &
                                           (weightloss_long['Time'] == time)]['Weight Loss'])
    levene_results.append((time, stat, p))

print("\nLevene's test results:")
for time, stat, p in levene_results:
    print(f"{time}: statistic = {stat:.4f}, p-value = {p:.4f}")


Levene's test results:
Baseline: statistic = nan, p-value = nan
After 1 month: statistic = 0.8588, p-value = 0.3663
After 2 months: statistic = 2.6036, p-value = 0.1240




At Baseline the statistic is nan: every participant starts at 0, so both groups have zero variance and Levene's statistic reduces to 0/0. For "After 1 month" (p = 0.3663) and "After 2 months" (p = 0.1240), the p-values are greater than 0.05, so the variances between groups are not significantly different at these time points. Hence, the homogeneity of variance assumption is satisfied for these time points.
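The nan at Baseline can be reproduced with a small hand-rolled version of the median-centred Levene statistic. This is only a sketch: `levene_w` is a hypothetical helper, the sample values are made up, and it returns the statistic without a p-value.

```python
import numpy as np

def levene_w(*groups):
    """Median-centred Levene (Brown-Forsythe) W statistic for two or more groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations from each group's median.
    z = [np.abs(g - np.median(g)) for g in groups]
    z_bar_i = [zi.mean() for zi in z]
    z_bar = np.concatenate(z).mean()
    numer = (n_total - k) * np.sum([len(zi) * (m - z_bar) ** 2
                                    for zi, m in zip(z, z_bar_i)])
    denom = (k - 1) * np.sum([np.sum((zi - m) ** 2)
                              for zi, m in zip(z, z_bar_i)])
    # With constant data both terms are 0, so the ratio is 0/0 = nan.
    with np.errstate(invalid="ignore", divide="ignore"):
        return numer / denom

print(levene_w([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))   # groups with different spread
print(levene_w([0.0, 0.0, 0.0], [0.0, 0.0, 0.0]))   # all-zero baseline
```

The second call mirrors the Baseline situation, where every deviation from the median is 0 and the statistic is undefined.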

# **Performing the Two-way Mixed ANOVA**

In [47]:
# 1. Within-subjects effect (Time)
time_effects = {}
for diet in weightloss_long['Diet Type'].unique():
    diet_data = weightloss_long[weightloss_long['Diet Type'] == diet]
    f_val, p_val = stats.f_oneway(diet_data[diet_data['Time'] == 'Baseline']['Weight Loss'],
                                  diet_data[diet_data['Time'] == 'After 1 month']['Weight Loss'],
                                  diet_data[diet_data['Time'] == 'After 2 months']['Weight Loss'])
    time_effects[diet] = {'F-value': f_val, 'p-value': p_val}

print("Within-subjects effect (Time):")
for diet, result in time_effects.items():
    print(f"{diet}: F = {result['F-value']:.4f}, p = {result['p-value']:.4f}")

# 2. Between-subjects effect (Diet Type)
diet_effects = {}
for time in weightloss_long['Time'].unique():
    time_data = weightloss_long[weightloss_long['Time'] == time]
    f_val, p_val = stats.f_oneway(time_data[time_data['Diet Type'] == 'Low-Carb']['Weight Loss'],
                                  time_data[time_data['Diet Type'] == 'Low-Fat']['Weight Loss'])
    diet_effects[time] = {'F-value': f_val, 'p-value': p_val}

print("\nBetween-subjects effect (Diet Type):")
for time, result in diet_effects.items():
    print(f"{time}: F = {result['F-value']:.4f}, p = {result['p-value']:.4f}")

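# 3. Interaction effect (Diet Type x Time)
# statsmodels supplies `ols` and `sm`, which the import cell at the top does not load
import statsmodels.api as sm
from statsmodels.formula.api import ols

model = ols('Q("Weight Loss") ~ C(Q("Diet Type")) + C(Time) + C(Q("Diet Type")):C(Time)',
            data=weightloss_long).fit()
anova_table = sm.stats.anova_lm(model, typ=2)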

print("\nInteraction effect (Diet Type x Time):")
print(anova_table)

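# Calculate effect sizes (partial eta-squared): SS_effect / (SS_effect + SS_residual)
def partial_eta_squared(aov):
    aov = aov.copy()  # leave the caller's table untouched
    ss_residual = aov['sum_sq'].iloc[-1]  # the last row is the Residual
    aov['eta2'] = aov['sum_sq'] / (aov['sum_sq'] + ss_residual)
    aov.loc[aov.index[-1], 'eta2'] = float('nan')  # not defined for the Residual row
    return aov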

anova_table = partial_eta_squared(anova_table)
print("\nANOVA table with effect sizes:")
print(anova_table)

Within-subjects effect (Time):
Low-Carb: F = 969.4394, p = 0.0000
Low-Fat: F = 670.3252, p = 0.0000

Between-subjects effect (Diet Type):
Baseline: F = nan, p = nan
After 1 month: F = 57.3064, p = 0.0000
After 2 months: F = 309.9753, p = 0.0000

Interaction effect (Diet Type x Time):
                              sum_sq    df            F        PR(>F)
C(Q("Diet Type"))           9.440667   1.0   328.054054  1.326833e-24
C(Time)                    93.334333   2.0  1621.638996  6.087392e-49
C(Q("Diet Type")):C(Time)   7.424333   2.0   128.994208  2.709523e-21
Residual                    1.554000  54.0          NaN           NaN

ANOVA table with effect sizes:
                              sum_sq    df            F        PR(>F)  \
C(Q("Diet Type"))           9.440667   1.0   328.054054  1.326833e-24   
C(Time)                    93.334333   2.0  1621.638996  6.087392e-49   
C(Q("Diet Type")):C(Time)   7.424333   2.0   128.994208  2.709523e-21   
Residual                    1.554000  54.



- **Effect of Time within each diet (Within-Subjects Effect)**:

  For both diet types, a one-way ANOVA across the three time points shows a highly significant effect of time on weight loss (Low-Carb: F = 969.44; Low-Fat: F = 670.33; both p < 0.0001). Weight loss therefore changed significantly over the course of the study under both the Low-Carb and Low-Fat diets. The larger F-value for the Low-Carb diet suggests that the effect of time might be more pronounced in this group.

- **Effect of Diet Type at each time point (Between-Subjects Effect)**:

  The 'nan' values at Baseline arise because every participant starts at 0 kg weight loss, so there is no variation within or between groups and the F-statistic is undefined. After 1 month and 2 months, there are highly significant differences between the diet types (p < 0.0001). The much larger F-value at 2 months (309.9753) compared to 1 month (57.3064) suggests that the difference between diet types becomes more pronounced over time.

- **Interaction Effect (Diet Type x Time)**:

  The Diet Type x Time interaction is highly significant (F = 128.99, p < 0.0001), so the difference between the diets depends on the time point. Partial eta-squared, computed from the sums of squares printed above as SS_effect / (SS_effect + SS_residual), is about 0.98 for Time, 0.86 for Diet Type, and 0.83 for the interaction. All three are far above 0.14, which is typically considered a large effect. The passage of time therefore has the most substantial impact on weight loss, followed by the type of diet, and then how the diet type's effect changes over time.
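The OLS table above tests every effect against a single residual with 54 degrees of freedom. In a mixed design the between-subjects factor is usually tested against subject-to-subject variation within groups, and the within-subjects terms against the subject-by-time residual. The sketch below shows that partition for a balanced design using NumPy only. The function `mixed_anova_table` is hypothetical and the data are synthetic, chosen to match the shape of this study (2 diets, 10 participants each, 3 time points); pingouin's `pg.mixed_anova` is a ready-made alternative for real data.

```python
import numpy as np

def mixed_anova_table(y):
    """Sums of squares, degrees of freedom and F ratios for a balanced
    mixed design. y has shape (groups, subjects_per_group, time_points)."""
    y = np.asarray(y, dtype=float)
    a, n, b = y.shape
    grand = y.mean()
    group = y.mean(axis=(1, 2))   # one mean per diet
    time = y.mean(axis=(0, 1))    # one mean per time point
    cell = y.mean(axis=1)         # diet x time means
    subj = y.mean(axis=2)         # one mean per participant

    ss = {
        "between": n * b * np.sum((group - grand) ** 2),
        "subjects": b * np.sum((subj - group[:, None]) ** 2),
        "time": a * n * np.sum((time - grand) ** 2),
        "interaction": n * np.sum(
            (cell - group[:, None] - time[None, :] + grand) ** 2),
        "error": np.sum(
            (y - cell[:, None, :] - subj[:, :, None] + group[:, None, None]) ** 2),
    }
    df = {
        "between": a - 1,
        "subjects": a * (n - 1),
        "time": b - 1,
        "interaction": (a - 1) * (b - 1),
        "error": a * (n - 1) * (b - 1),
    }
    ms = {key: ss[key] / df[key] for key in ss}
    f = {
        "between": ms["between"] / ms["subjects"],   # diet vs. subjects within diet
        "time": ms["time"] / ms["error"],            # time vs. subject x time residual
        "interaction": ms["interaction"] / ms["error"],
    }
    return ss, df, f

# Synthetic data (not the study's data): noise plus a common time trend.
rng = np.random.default_rng(0)
y = rng.normal(size=(2, 10, 3)) + np.array([0.0, 2.0, 4.0])
ss, df, f = mixed_anova_table(y)
for effect, value in f.items():
    print(f"{effect}: F = {value:.3f}")
print(f"error df: subjects = {df['subjects']}, subject x time = {df['error']}")
```

With this design the two error terms carry 18 and 36 degrees of freedom, rather than the single 54 of the OLS residual.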

**Post-hoc test**

In [35]:
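# Pairwise comparisons with Bonferroni correction, run on the long-format frame built above.
# The dependent variable column is named 'Weight Loss' (with a space).
# Note: newer pingouin releases expose this function as pg.pairwise_tests.
posthoc_results = pg.pairwise_ttests(dv='Weight Loss', within='Time', between='Diet Type',
                                     subject='Participant', data=weightloss_long, padjust='bonf')
print(posthoc_results)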

           Contrast            Time               A               B Paired  \
0              Time               -   After 1 month  After 2 months   True   
1              Time               -   After 1 month        Baseline   True   
2              Time               -  After 2 months        Baseline   True   
3         Diet Type               -        Low-Carb         Low-Fat  False   
4  Time * Diet Type   After 1 month        Low-Carb         Low-Fat  False   
5  Time * Diet Type  After 2 months        Low-Carb         Low-Fat  False   
6  Time * Diet Type        Baseline        Low-Carb         Low-Fat  False   

  Parametric          T   dof alternative         p-unc        p-corr  \
0       True -12.440914  19.0   two-sided  1.405194e-10  4.215581e-10   
1       True  17.536628  19.0   two-sided  3.423323e-13  1.026997e-12   
2       True  15.141360  19.0   two-sided  4.666314e-12  1.399894e-11   
3       True  13.217690  18.0   two-sided  1.048878e-10           NaN   
4       Tr



**Time Contrasts**

- After 1 month vs. After 2 months:

  There's a significant difference in weight loss between 1 month and 2 months (p < 0.001). The negative T-value and Hedges' g indicate that weight loss is greater at 2 months.

- After 1 month vs. Baseline:

  There's a significant difference between 1 month and baseline (p < 0.001). The positive T-value and large Hedges' g indicate substantially more weight loss at 1 month compared to baseline.

- After 2 months vs. Baseline:

  There's a significant difference between 2 months and baseline (p < 0.001). The positive T-value and large Hedges' g show substantially more weight loss at 2 months compared to baseline.
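The `p-corr` values for the three Time contrasts are consistent with a Bonferroni adjustment over three comparisons: each uncorrected p-value is multiplied by 3 and capped at 1. A minimal sketch, using the `p-unc` values printed in the table above (the helper name `bonferroni` is hypothetical):

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# p-unc for the three Time contrasts, copied from the post-hoc output.
p_unc = [1.405194e-10, 3.423323e-13, 4.666314e-12]
p_corr = bonferroni(p_unc)
for raw, adjusted in zip(p_unc, p_corr):
    print(f"p-unc = {raw:.6e} -> p-corr = {adjusted:.6e}")
```

Even after tripling, all three p-values stay far below 0.001, which is why the conclusions for the Time contrasts do not change under the correction.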

**Diet Type Contrast**

- Low-Carb vs. Low-Fat:

   There's a significant overall difference between Low-Carb and Low-Fat diets (p < 0.001). The positive T-value and large Hedges' g indicate that the Low-Carb diet results in more weight loss overall.

**Time x Diet Type Interaction**

- After 1 month: Low-Carb vs. Low-Fat

  There's a significant difference between diet types at 1 month (p < 0.001). The positive T-value and Hedges' g indicate that Low-Carb results in more weight loss at 1 month.

- After 2 months: Low-Carb vs. Low-Fat

  There's a significant difference between diet types at 2 months (p < 0.001). The positive T-value and very large Hedges' g indicate that Low-Carb results in substantially more weight loss at 2 months.

- Baseline: Low-Carb vs. Low-Fat

  All values are NaN. Every participant starts at 0 weight loss, so both groups have identical means and zero variance at baseline, and the t-statistic and effect size are undefined (0/0).
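The same 0/0 behaviour can be seen in the effect size. The sketch below computes Hedges' g for two independent groups using the common small-sample correction applied to Cohen's d. The helper `hedges_g` and the sample values are hypothetical, and the sketch does not claim to match pingouin's handling of paired contrasts.

```python
import numpy as np

def hedges_g(x, y):
    """Hedges' g for two independent samples: Cohen's d with a small-sample correction."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    # Pooled variance from the two sample variances.
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    # A pooled standard deviation of 0 makes d, and therefore g, undefined (nan).
    with np.errstate(invalid="ignore", divide="ignore"):
        d = (x.mean() - y.mean()) / np.sqrt(pooled_var)
    return d * (1 - 3 / (4 * (nx + ny) - 9))

print(hedges_g([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))   # groups that differ
print(hedges_g([0.0, 0.0, 0.0], [0.0, 0.0, 0.0]))   # all-zero baseline
```

The second call corresponds to the Baseline row: with no spread in either group there is no scale against which to express the (zero) mean difference.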