# SA1-30 BAYQUEN **3-Factor ANOVA**
GitHub link: https://github.com/notfolded/APM1220/blob/main/SA1-30.ipynb

In [3]:
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.stats.anova import AnovaRM
import pingouin as pg


In [52]:
# loading the dataset

response_df = pd.read_csv("/content/drive/MyDrive/Applied Multivariate Data Anlysis/server_response_time_replicates.csv")
response_df['Time'] = pd.Categorical(response_df['Time'], categories=["Baseline", "1 Month", "2 Months"], ordered=True)

response_df['Server_Type_Protocol'] = response_df['Server Type'] + "_" + response_df['Security Protocol']

response_df.head()



|   | Server | Server Type | Security Protocol | Time | Response Time | Server_Type_Protocol |
|---|---|---|---|---|---|---|
| 0 | 1 | Linux | SSL | Baseline | 93.881199 | Linux_SSL |
| 1 | 1 | Linux | SSL | 1 Month | 105.47571 | Linux_SSL |
| 2 | 1 | Linux | SSL | 2 Months | 111.870916 | Linux_SSL |
| 3 | 2 | Windows | TLS | Baseline | 160.105153 | Windows_TLS |
| 4 | 2 | Windows | TLS | 1 Month | 152.757781 | Windows_TLS |
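The mixed ANOVA used later assumes that each server belongs to exactly one configuration and is measured once at every time point. A quick way to confirm that layout is sketched below. It runs on synthetic data (an assumption: 20 servers, five per configuration, values drawn with NumPy in place of the real CSV); with the real data the same checks would run on `response_df`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV (assumption): 20 servers, 5 per configuration, 3 time points.
rng = np.random.default_rng(0)
times = ["Baseline", "1 Month", "2 Months"]
configs = [("Linux", "SSL"), ("Linux", "TLS"), ("Windows", "SSL"), ("Windows", "TLS")]
rows = []
for cfg_index, (server_type, protocol) in enumerate(configs):
    for rep in range(5):
        server = cfg_index * 5 + rep + 1
        for t in times:
            rows.append({"Server": server, "Server Type": server_type,
                         "Security Protocol": protocol, "Time": t,
                         "Response Time": rng.normal(120, 10)})
df = pd.DataFrame(rows)
df["Time"] = pd.Categorical(df["Time"], categories=times, ordered=True)
df["Server_Type_Protocol"] = df["Server Type"] + "_" + df["Security Protocol"]

# Within-subjects check: every server has exactly one row per time point.
rows_per_server = df.groupby("Server").size()
times_per_server = df.groupby("Server")["Time"].nunique()
complete_repeated = bool(((rows_per_server == len(times)) &
                          (times_per_server == len(times))).all())

# Between-subjects check: every server belongs to exactly one configuration.
single_config = bool(df.groupby("Server")["Server_Type_Protocol"].nunique().eq(1).all())

# Balance check: number of distinct servers in each configuration.
servers_per_config = df.groupby("Server_Type_Protocol")["Server"].nunique()

print(servers_per_config)
print("Complete repeated measures:", complete_repeated)
print("One configuration per server:", single_config)
```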


# **Assumption Validation**

**Normality Assumption**

In [54]:
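# Check normality (Shapiro-Wilk) for each configuration x time cell;
# observed=True keeps only the observed combinations of the categorical Time levels
normality_test = response_df.groupby(['Server_Type_Protocol', 'Time'], observed=True)['Response Time'].apply(pg.normality)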
print(normality_test)

                                                           W      pval  normal
Server_Type_Protocol Time                                                     
Linux_SSL            Baseline Linux_SSL   Baseline  0.948601  0.707390    True
                     1 Month  Linux_SSL   1 Month   0.910743  0.486346    True
                     2 Months Linux_SSL   2 Months  0.898632  0.424295    True
Linux_TLS            Baseline Linux_TLS   Baseline  0.925163  0.510552    True
                     1 Month  Linux_TLS   1 Month   0.915007  0.431623    True
                     2 Months Linux_TLS   2 Months  0.924185  0.502602    True
Windows_SSL          Baseline Windows_SSL Baseline  0.825995  0.129775    True
                     1 Month  Windows_SSL 1 Month   0.906941  0.449427    True
                     2 Months Windows_SSL 2 Months  0.909555  0.464921    True
Windows_TLS          Baseline Windows_TLS Baseline  0.910347  0.484245    True
                     1 Month  Windows_TLS 1 Month   



For each combination of Server_Type_Protocol and Time, the Shapiro-Wilk p-values are above 0.05, so we fail to reject the null hypothesis of normality for every group. This is consistent with the normality assumption. Failing to reject does not prove normality, however, and with only a few servers per cell the test has little power to detect departures from it.
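The per-cell check can also be written with SciPy alone, which makes the mechanics explicit: group by configuration and time, then run `scipy.stats.shapiro` on each cell. The sketch below uses synthetic data (an assumption: four configurations, five servers each, three time points) and passes `observed=True` so that pandas groups only by category combinations that actually occur.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (assumption): 4 configurations x 5 servers x 3 time points.
rng = np.random.default_rng(1)
times = ["Baseline", "1 Month", "2 Months"]
configs = ["Linux_SSL", "Linux_TLS", "Windows_SSL", "Windows_TLS"]
df = pd.DataFrame(
    [(c, t, rng.normal(120, 10)) for c in configs for _ in range(5) for t in times],
    columns=["Server_Type_Protocol", "Time", "Response Time"],
)
df["Time"] = pd.Categorical(df["Time"], categories=times, ordered=True)

# observed=True groups only by category combinations present in the data, which
# avoids the FutureWarning recent pandas versions emit for categorical keys.
grouped = df.groupby(["Server_Type_Protocol", "Time"], observed=True)["Response Time"]
normality = pd.DataFrame({
    "W": grouped.agg(lambda s: stats.shapiro(s)[0]),     # Shapiro-Wilk statistic
    "pval": grouped.agg(lambda s: stats.shapiro(s)[1]),  # Shapiro-Wilk p-value
})
normality["normal"] = normality["pval"] > 0.05
print(normality)
```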

**Sphericity Assumption**


In [40]:
# Check sphericity of the within-subjects factor Time (Mauchly's test)
sphericity_test = pg.sphericity(response_df, dv='Response Time', within='Time', subject='Server')
print("Sphericity test result:\n", sphericity_test)


Sphericity test result:
 SpherResults(spher=False, W=0.5592525843401225, chi2=10.460773035729904, dof=2, pval=0.0053514564693362306)


The sphericity test indicates a violation of the sphericity assumption (W = 0.559, χ²(2) = 10.46, p = 0.0054). Since p < 0.05, we reject the null hypothesis that the variances of the differences between time points are equal. The Greenhouse-Geisser correction should therefore be applied to adjust the degrees of freedom for the Time factor in the ANOVA.
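Mauchly's W and the Greenhouse-Geisser ε both come from the covariance matrix of orthonormal contrasts between the time points. The sketch below computes them with NumPy on synthetic repeated measures (an assumption: 20 servers by 3 time points, with extra noise at the last time point so sphericity fails). It ignores the between-subjects grouping, as the `pg.sphericity` call above does. As a cross-check, it also recomputes χ² from the reported W, assuming n = 20 servers (inferred from DF2 = 16 with four configurations in the ANOVA table).

```python
import numpy as np
from scipy import stats

def mauchly_gg(Y):
    """Mauchly's W, its chi-square approximation and the Greenhouse-Geisser epsilon
    for an (n_subjects, k_levels) array of repeated measurements."""
    Y = np.asarray(Y, dtype=float)
    n, k = Y.shape
    p = k - 1
    # Orthonormal contrasts orthogonal to the constant vector (QR of [1, e1, ..., e_p]).
    basis = np.column_stack([np.ones(k), np.eye(k)[:, :p]])
    q, _ = np.linalg.qr(basis)
    C = q[:, 1:]
    T = C.T @ np.cov(Y, rowvar=False) @ C  # covariance matrix of the p contrasts
    W = np.linalg.det(T) / (np.trace(T) / p) ** p
    chi2 = -(n - 1 - (2 * p**2 + p + 2) / (6 * p)) * np.log(W)
    dof = p * (p + 1) // 2 - 1
    eps = np.trace(T) ** 2 / (p * np.trace(T @ T))
    return W, chi2, dof, stats.chi2.sf(chi2, dof), eps

# Synthetic repeated measures (assumption): 20 servers x 3 time points, with a
# server-level offset and extra noise at the last time point.
rng = np.random.default_rng(5)
Y = 110 + rng.normal(0, 15, size=(20, 1)) + rng.normal(0, 4, size=(20, 3))
Y[:, 2] += rng.normal(0, 10, size=20)
W, chi2, dof, pval, eps = mauchly_gg(Y)
print(f"W = {W:.3f}, chi2({dof}) = {chi2:.2f}, p = {pval:.4f}, GG epsilon = {eps:.3f}")

# For k = 3 levels the statistic reduces to -(n - 2) * ln(W); with n = 20 (assumption).
chi2_reported = -(20 - 2) * np.log(0.5592525843401225)
print(f"chi2 implied by the reported W: {chi2_reported:.2f}")
```

The ε returned here is bounded between 1/(k − 1) and 1; values near the lower bound mean the correction shrinks the degrees of freedom the most.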

**Homogeneity Assumption**

In [51]:
# Checking for homogeneity of variance across the four server configurations
# (Server_Type_Protocol was already created when the data were loaded)

# Perform Levene's test for homogeneity of variances
levene_test = pg.homoscedasticity(response_df, dv='Response Time', group='Server_Type_Protocol')
print(levene_test)

               W      pval  equal_var
levene  0.781134  0.509468       True


The p-value for Levene’s test is 0.509, which is greater than 0.05. This means we fail to reject the null hypothesis of equal variances, and the homogeneity of variance assumption is satisfied.
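The Levene test above pools all three time points, so each server contributes three correlated values. A variant that keeps one value per server per test is to run Levene's test across configurations separately at each time point. The sketch below does that with `scipy.stats.levene` on synthetic data (an assumption: four configurations, five servers each). SciPy defaults to `center='median'` (the Brown-Forsythe variant), which may differ from the variant pingouin used above; `center='mean'` gives the original Levene statistic.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (assumption): 4 configurations x 5 servers x 3 time points.
rng = np.random.default_rng(2)
times = ["Baseline", "1 Month", "2 Months"]
configs = ["Linux_SSL", "Linux_TLS", "Windows_SSL", "Windows_TLS"]
df = pd.DataFrame(
    [(c, t, rng.normal(120, 10)) for c in configs for _ in range(5) for t in times],
    columns=["Server_Type_Protocol", "Time", "Response Time"],
)

results = {}
for t in times:
    at_time = df[df["Time"] == t]
    # One sample of response times per configuration at this time point.
    samples = [g["Response Time"].to_numpy()
               for _, g in at_time.groupby("Server_Type_Protocol")]
    # scipy's default center="median" is the Brown-Forsythe variant.
    stat, pval = stats.levene(*samples)
    results[t] = {"W": stat, "pval": pval, "equal_var": pval > 0.05}

levene_by_time = pd.DataFrame.from_dict(results, orient="index")
print(levene_by_time)
```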

**Independence Assumption**

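Observations from different servers are assumed to be independent: each server belongs to exactly one Server Type and Security Protocol combination, and one server's response times do not influence another's. The three measurements on the same server (Baseline, 1 Month, 2 Months) are not independent of each other; that dependence is expected in a repeated-measures design and is accounted for by treating Time as a within-subjects factor.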

# **Performing the Three-Way ANOVA**

In [55]:
# Mixed ANOVA: Server Type and Security Protocol are combined into one four-level
# between-subjects factor (Server_Type_Protocol); Time is the within-subjects factor
anova_results = pg.mixed_anova(dv='Response Time',
                               within='Time',
                               between='Server_Type_Protocol',
                               subject='Server',
                               data=response_df)

print(anova_results)

                 Source            SS  DF1  DF2           MS          F  \
0  Server_Type_Protocol  18616.553305    3   16  6205.517768  22.712627   
1                  Time    530.953324    2   32   265.476662  21.234966   
2           Interaction   1101.863049    6   32   183.643842  14.689317   

          p-unc  p-GG-corr       np2       eps sphericity   W-spher   p-spher  
0  5.204016e-06        NaN  0.809836       NaN        NaN       NaN       NaN  
1  1.351152e-06   0.009115  0.570296  0.694084      False  0.559253  0.005351  
2  5.518366e-08        NaN  0.733635       NaN        NaN       NaN       NaN  
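The η² values quoted in the interpretations come from the `np2` column, i.e. partial eta squared, SS_effect / (SS_effect + SS_error). Since F = (SS_effect / DF1) / (SS_error / DF2), partial eta squared can be recovered from each F ratio and its degrees of freedom. The plain-Python check below uses only numbers printed in the table above.

```python
# F, DF1 and DF2 copied from the mixed ANOVA table above.
effects = {
    "Server_Type_Protocol": (22.712627, 3, 16),
    "Time": (21.234966, 2, 32),
    "Interaction": (14.689317, 6, 32),
}

# Partial eta squared: SS_effect / (SS_effect + SS_error) = F*DF1 / (F*DF1 + DF2).
np2 = {name: F * df1 / (F * df1 + df2) for name, (F, df1, df2) in effects.items()}
for name, value in np2.items():
    print(f"{name}: np2 = {value:.3f}")
```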


**1. Server Type and Security Protocol (between-subjects)**

  There is a significant main effect of the combined Server Type and Security Protocol factor on response times (F(3, 16) = 22.71, p < .001). Response times differ significantly across the four server configurations (Linux_SSL, Linux_TLS, Windows_SSL, Windows_TLS). The large partial eta squared (ηp² = 0.81) indicates a strong effect.

  > **Main Effect of Server Type (Linux vs. Windows)**
  Because Server Type and Security Protocol enter the model as a single combined factor, the test above does not isolate Server Type, and the ηp² of 0.81 describes the four-level configuration factor rather than Linux vs. Windows alone. The significant configuration × Time interaction also means that any difference between server types need not be the same at every time point. Server Type would need to be tested as its own factor to claim a main effect.

  > **Main Effect of Security Protocol (TLS vs. SSL)**
  The same limitation applies here: the significant combined factor shows that configurations differ, but it cannot say how much of that difference is due to the security protocol. One protocol may be more efficient than the other, but that would need to be tested with Security Protocol as a separate factor.
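In a balanced split-plot design, the between-subjects effects can be tested on each server's mean response time, so one way to separate Server Type, Security Protocol, and their interaction is a two-way ANOVA on per-server means. The sketch below implements the classic balanced-design sums of squares on synthetic means (an assumption: five servers per Server Type × Security Protocol cell, with invented offsets). With the real data, the means could come from `response_df.groupby(['Server', 'Server Type', 'Security Protocol'], as_index=False)['Response Time'].mean()`. It only applies to balanced data; unbalanced designs need a different sums-of-squares approach.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic per-server means (assumption): 5 servers per Server Type x Security Protocol cell.
rng = np.random.default_rng(3)
rows = []
for st in ["Linux", "Windows"]:
    for sp in ["SSL", "TLS"]:
        shift = (30 if st == "Windows" else 0) + (5 if sp == "TLS" else 0)
        for _ in range(5):
            rows.append({"Server Type": st, "Security Protocol": sp,
                         "Mean Response": 110 + shift + rng.normal(0, 8)})
means = pd.DataFrame(rows)

def two_way_balanced_anova(data, dv, a, b):
    """Two-way between-subjects ANOVA with interaction, balanced design only."""
    grand = data[dv].mean()
    cell = data.groupby([a, b])[dv]
    n = int(cell.size().iloc[0])  # replicates per cell
    la, lb = data[a].nunique(), data[b].nunique()
    ss_a = n * lb * ((data.groupby(a)[dv].mean() - grand) ** 2).sum()
    ss_b = n * la * ((data.groupby(b)[dv].mean() - grand) ** 2).sum()
    ss_cells = n * ((cell.mean() - grand) ** 2).sum()
    ss_ab = ss_cells - ss_a - ss_b
    ss_err = ((data[dv] - cell.transform("mean")) ** 2).sum()
    df_a, df_b = la - 1, lb - 1
    df_ab, df_err = df_a * df_b, len(data) - la * lb
    ms_err = ss_err / df_err
    rows = []
    for name, ss, df in [(a, ss_a, df_a), (b, ss_b, df_b), (f"{a} * {b}", ss_ab, df_ab)]:
        F = (ss / df) / ms_err
        rows.append({"Source": name, "SS": ss, "DF": df, "F": F,
                     "p-unc": stats.f.sf(F, df, df_err), "np2": ss / (ss + ss_err)})
    rows.append({"Source": "Residual", "SS": ss_err, "DF": df_err,
                 "F": np.nan, "p-unc": np.nan, "np2": np.nan})
    return pd.DataFrame(rows)

table = two_way_balanced_anova(means, "Mean Response", "Server Type", "Security Protocol")
print(table)
```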

**2. Time (within-subjects, with Greenhouse-Geisser correction)**

  There is a significant main effect of Time (F(2, 32) = 21.23), meaning that server response times change significantly over the three time points (Baseline, 1 Month, 2 Months). The effect remains significant after the Greenhouse-Geisser correction (ε = 0.69, p-GG-corr = 0.009). The effect size is also substantial (ηp² = 0.57), indicating a notable change in response times over time.
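The Greenhouse-Geisser correction keeps the F ratio and multiplies both degrees of freedom by ε. Applying that arithmetic with SciPy to the values in the mixed ANOVA table above reproduces the uncorrected p-value for Time and gives a corrected p-value well below 0.001, smaller than the 0.009 in the p-GG-corr column. One possible explanation, an assumption worth checking against the pingouin source, is that `mixed_anova` takes its corrected p-value from a separate one-way repeated-measures ANOVA on Time. The sketch also applies the same ε to the interaction as an approximation, since the table leaves that correction blank. Either way, both effects stay significant.

```python
from scipy import stats

# F, DF1, DF2 and eps copied from the mixed ANOVA table above.
eps = 0.694084
effects = {
    "Time": (21.234966, 2, 32),
    "Interaction": (14.689317, 6, 32),
}

results = {}
for name, (F, df1, df2) in effects.items():
    p_unc = stats.f.sf(F, df1, df2)
    # Greenhouse-Geisser: keep F, multiply both degrees of freedom by epsilon.
    p_gg = stats.f.sf(F, eps * df1, eps * df2)
    results[name] = (p_unc, p_gg)
    print(f"{name}: F({eps * df1:.2f}, {eps * df2:.2f}) = {F:.2f}, "
          f"p-unc = {p_unc:.2e}, p-GG = {p_gg:.2e}")
```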

**3. Interaction between Server Configuration and Time**

  There is a significant interaction between Server_Type_Protocol and Time (F(6, 32) = 14.69, p < .001): the effect of Time on server response times depends on the server configuration (the combination of Server Type and Security Protocol). The effect size is large (ηp² = 0.73). Note that the table reports no sphericity-corrected p-value for this interaction, even though it involves the within-subjects factor.

**Post-Hoc Tests**

In [57]:
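# Post-hoc Tukey HSD comparisons of the three time points within each configuration.
# Note: between='Time' treats the time points as independent groups and ignores
# that the same servers are measured at every time point.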
for protocol in response_df['Server_Type_Protocol'].unique():
    print(f"Post-hoc for {protocol}:")
    post_hoc_results = pg.pairwise_tukey(response_df[response_df['Server_Type_Protocol'] == protocol],
                                          dv='Response Time',
                                          between='Time')
    print(post_hoc_results)



Post-hoc for Linux_SSL:
          A         B     mean(A)     mean(B)       diff        se         T  \
0  Baseline   1 Month  102.672297  113.475776 -10.803479  5.412746 -1.995933   
1  Baseline  2 Months  102.672297  116.379736 -13.707439  5.412746 -2.532437   
2   1 Month  2 Months  113.475776  116.379736  -2.903959  5.412746 -0.536504   

    p-tukey    hedges  
0  0.168817 -1.213436  
1  0.074874 -1.559312  
2  0.855688 -0.333250  
Post-hoc for Windows_TLS:
          A         B     mean(A)     mean(B)       diff        se         T  \
0  Baseline   1 Month  149.531083  144.339316   5.191767  8.640503  0.600864   
1  Baseline  2 Months  149.531083  154.990024  -5.458941  8.640503 -0.631785   
2   1 Month  2 Months  144.339316  154.990024 -10.650707  8.640503 -1.232649   

    p-tukey    hedges  
0  0.823160  0.372802  
1  0.806826 -0.388445  
2  0.464894 -0.751291  
Post-hoc for Windows_SSL:
          A         B     mean(A)     mean(B)      diff        se         T  \
0  Baseline

- **Linux SSL**

  1. The comparison between Baseline and 1 Month shows a difference of -10.80 ms (mean 102.67 ms at Baseline vs. 113.48 ms at 1 Month) with a p-value of 0.169, indicating no significant difference.

  2. The comparison between Baseline and 2 Months shows a difference of -13.71 ms (102.67 ms vs. 116.38 ms) with a p-value of 0.075, which is close to but does not reach the 0.05 significance level.

  3. There is no significant difference between 1 Month and 2 Months (p = 0.856).

- **Windows TLS**

  1. No significant differences are observed in any of the comparisons (all p-values > 0.05). Response times do not change significantly from Baseline to 1 Month, Baseline to 2 Months, or 1 Month to 2 Months.

- **Windows SSL**

  1. There are no significant differences in response times across the three time points (all p-values > 0.05).

- **Linux TLS**

  1. The comparison between Baseline and 1 Month shows a significant difference (p = 0.048), indicating that response times improved from Baseline to 1 Month.

  2. The comparison between Baseline and 2 Months shows a significant difference (p = 0.006), indicating a notable improvement in response times over time.

  3. There is no significant difference between 1 Month and 2 Months (p = 0.608).
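The Tukey comparisons above treat Baseline, 1 Month, and 2 Months as independent groups, even though the same servers are measured at each time point. One common alternative for follow-up comparisons in a repeated-measures design is paired t-tests with a multiplicity correction. The sketch below uses `scipy.stats.ttest_rel` and a hand-written Holm adjustment on synthetic data (an assumption: five servers in one configuration, in wide form with one column per time point). With the real data, the wide table for one configuration could be built with `pivot(index='Server', columns='Time', values='Response Time')`. Recent pingouin versions also provide `pg.pairwise_tests` with `within`, `subject`, and `padjust` arguments, which may be worth checking against the installed version.

```python
import numpy as np
import pandas as pd
from itertools import combinations
from scipy import stats

# Synthetic wide data (assumption): one configuration, 5 servers (rows) x 3 time points.
rng = np.random.default_rng(4)
times = ["Baseline", "1 Month", "2 Months"]
server_effect = rng.normal(0, 15, size=(5, 1))
wide = pd.DataFrame(110 + server_effect + rng.normal(0, 4, size=(5, 3)) + np.array([0, 8, 12]),
                    columns=times)

def holm(pvals):
    """Holm step-down adjustment; returns adjusted p-values in the original order."""
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    adjusted = np.empty(m)
    running_max = 0.0
    for rank, idx in enumerate(np.argsort(pvals)):
        running_max = max(running_max, (m - rank) * pvals[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted

pairs = list(combinations(times, 2))
raw = [stats.ttest_rel(wide[a], wide[b]).pvalue for a, b in pairs]  # paired by server
posthoc = pd.DataFrame({
    "A": [a for a, _ in pairs],
    "B": [b for _, b in pairs],
    "mean_diff": [(wide[a] - wide[b]).mean() for a, b in pairs],
    "p-unc": raw,
    "p-holm": holm(raw),
})
print(posthoc)
```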