# Hypothesis Testing

[1. Completion times with VR are significantly higher than without VR - t-test](#1)

[2. The start modality doesn't influence the completion times (Mean of completion times for both start modalities is the same) - t-test](#2)

[3. Completion times for WBC are significantly higher than for CHIRON controller (t-test)](#3)

[4. Individual t-test for each of the 4 combinations](#4)

[5. LMM for 2x2x3 mixed user study design (probably the more appropriate analysis)](#5)

In [16]:
import pandas as pd
from scipy.stats import ttest_ind

# Load the provided CSV file
file_path = 'C:/Users/sophi/Alemanha/Thesis/IUS-analysis/cleaned_data/completion_times_cleaned_partial_times_separated.csv'
data = pd.read_csv(file_path)

df = data  # alias used by the later cells
df

| Unnamed: 0 | Folder | Controller | Start_Modality | Task_1 | Task_2 | Task_3 | Task_4 | Task_5 | Subject | Trial | Modality | Total Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AAHF21-RW-WITH-VR-TRIAL-2-ROSBAG | CHIRON | WITH-VR | 75.644989 | 48.872307 | 39.200264 | 17.848114 | 30.904282 | AAHF21 | 2 | WITH-VR | 212.469955 |
| 1 | AAHF21-RW-WITH-VR-TRIAL-1-ROSBAG | CHIRON | WITH-VR | 165.343307 | 307.602310 | 34.984314 | 27.264221 | 37.816284 | AAHF21 | 1 | WITH-VR | 573.010437 |
| 2 | AAHF21-RW-NO-VR-TRIAL-2-ROSBAG | CHIRON | WITH-VR | 37.954384 | 44.784332 | 19.000168 | 17.976135 | 20.240181 | AAHF21 | 2 | NO-VR | 139.955200 |
| 3 | AAHF21-RW-NO-VR-TRIAL-1-ROSBAG | CHIRON | WITH-VR | 26.210884 | 28.624367 | 22.064034 | 13.376114 | 18.192238 | AAHF21 | 1 | NO-VR | 108.467636 |
| 4 | AAHF21-RW-NO-VR-TRIAL-3-ROSBAG | CHIRON | WITH-VR | 38.964264 | 41.384285 | 17.336159 | 12.672020 | 16.224594 | AAHF21 | 3 | NO-VR | 126.581322 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 102 | SASD12-RW-NO-VR-TRIAL-2-ROSBAG | WBC | NO-VR | 91.417801 | 63.719894 | 37.655945 | 12.304108 | 24.816162 | SASD12 | 2 | NO-VR | 229.913910 |
| 103 | SASD12-RW-WITH-VR-TRIAL-1-ROSBAG | WBC | NO-VR | 83.553024 | 98.992661 | 48.496201 | 45.216324 | 101.504700 | SASD12 | 1 | WITH-VR | 377.762909 |
| 104 | SASD12-RW-NO-VR-TRIAL-3-ROSBAG | WBC | NO-VR | 80.978241 | 58.496277 | 58.080185 | 44.944183 | 38.200272 | SASD12 | 3 | NO-VR | 280.699158 |
| 105 | SASD12-RW-WITH-VR-TRIAL-2-ROSBAG | WBC | NO-VR | 103.480209 | 114.760712 | 48.392136 | 31.728241 | 96.000732 | SASD12 | 2 | WITH-VR | 394.362030 |


## 1. Completion times with VR are significantly higher than without VR <a id='1'></a>

Hypothesis:

- Null Hypothesis (H₀): The mean completion time for WITH-VR trials is not higher than the mean completion time for NO-VR trials.
- Alternative Hypothesis (H₁): The mean completion time for WITH-VR trials is higher than the mean completion time for NO-VR trials.

A one-tailed independent t-test was used to compare the total completion time between the two groups, WITH-VR and NO-VR, with all trials pooled.

In [2]:
# Step 1: Filter the data based on the "Modality" column
with_vr = data[data['Modality'] == 'WITH-VR']['Total Time']
no_vr = data[data['Modality'] == 'NO-VR']['Total Time']

# Step 2: Perform the statistical hypothesis test (one-tailed: WITH-VR > NO-VR)
t_stat, p_value = ttest_ind(with_vr, no_vr, alternative='greater')

# Step 3: Summarize the results
results = {
    "t_statistic": t_stat,
    "p_value": p_value,
    "mean_with_vr": with_vr.mean(),
    "mean_no_vr": no_vr.mean(),
    "sample_size_with_vr": len(with_vr),
    "sample_size_no_vr": len(no_vr)
}

# Display the results
print("Hypothesis Testing Results:")
for key, value in results.items():
    print(f"{key}: {value}")

# Conclusion
if p_value < 0.05:
    print("\nConclusion: Completion times for trials with the modality 'WITH-VR' are significantly higher than 'NO-VR' trials.")
else:
    print("\nConclusion: Completion times for 'WITH-VR' trials are not significantly higher than for 'NO-VR' trials.")

Hypothesis Testing Results:
t_statistic: 4.509245471307329
p_value: 8.49651055796249e-06
mean_with_vr: 402.01578756479114
mean_no_vr: 269.6437383478338
sample_size_with_vr: 52
sample_size_no_vr: 55

Conclusion: Completion times for trials with the modality 'WITH-VR' are significantly higher than 'NO-VR' trials.


<mark style="background-color: green; color: white;">
We reject the null hypothesis H₀: controlling the robot with the headset on takes significantly longer than controlling it without the headset when all 3 trials are pooled.
</mark>
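
For reference, the statistic behind `ttest_ind` with its default `equal_var=True` can be reproduced by hand. The following is a minimal sketch on made-up numbers, not the study data; the helper name `pooled_t_statistic` and the toy lists are hypothetical.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled variance."""
    n_a, n_b = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t_stat = (statistics.mean(a) - statistics.mean(b)) / standard_error
    degrees_of_freedom = n_a + n_b - 2
    return t_stat, degrees_of_freedom


# Made-up numbers for illustration only.
toy_with_vr = [5.0, 7.0, 9.0]
toy_no_vr = [2.0, 4.0, 6.0]
t_toy, df_toy = pooled_t_statistic(toy_with_vr, toy_no_vr)
print(f"t = {t_toy:.4f}, df = {df_toy}")  # prints "t = 1.8371, df = 4"
```

With `alternative='greater'`, the p-value is the upper-tail probability of this statistic, so only a positive t (first group larger) counts as evidence against H₀.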

## 2. The start modality doesn't influence the completion times (Mean of completion times for both start modalities is the same) <a id='2'></a>

Hypothesis:

- Null Hypothesis (H₀): The mean completion times for trials starting with WITH-VR and NO-VR are the same.
- Alternative Hypothesis (H₁): The mean completion times for trials starting with WITH-VR and NO-VR are different.

A two-tailed independent t-test was conducted to compare the mean total completion time for the two start modalities.

In [3]:

# Step 1: Filter the data based on the "Start_Modality" column
start_with_vr = data[data['Start_Modality'] == 'WITH-VR']['Total Time']
start_no_vr = data[data['Start_Modality'] == 'NO-VR']['Total Time']

# Step 2: Perform the statistical hypothesis test
# Null Hypothesis (H0): The means of both groups are equal
# Alternative Hypothesis (H1): The means of both groups are different
t_stat, p_value = ttest_ind(start_with_vr, start_no_vr, alternative='two-sided')

# Step 3: Summarize the results
results_start_modality = {
    "t_statistic": t_stat,
    "p_value": p_value,
    "mean_start_with_vr": start_with_vr.mean(),
    "mean_start_no_vr": start_no_vr.mean(),
    "sample_size_start_with_vr": len(start_with_vr),
    "sample_size_start_no_vr": len(start_no_vr)
}

# Display the results
print("Hypothesis Testing Results for Start Modality:")
for key, value in results_start_modality.items():
    print(f"{key}: {value}")

# Conclusion
if p_value < 0.05:
    print("\nConclusion: The start modality significantly influences the completion times.")
else:
    print("\nConclusion: The start modality does not significantly influence the completion times.")


Hypothesis Testing Results for Start Modality:
t_statistic: -0.5256423119248483
p_value: 0.6002450987324301
mean_start_with_vr: 325.7918776078658
mean_start_no_vr: 342.6283325782189
sample_size_start_with_vr: 55
sample_size_start_no_vr: 52

Conclusion: The start modality does not significantly influence the completion times.


<mark style="background-color: green; color: white;">
We fail to reject the null hypothesis H₀: there is no evidence that the start modality influences the completion times when looking at the mean over all trials.
</mark>
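
The sections of this notebook mix one-tailed (`alternative='greater'`) and two-tailed (`alternative='two-sided'`) tests. The sketch below shows how the two p-values relate for the same data. It uses synthetic samples, not the study data; the names `group_a` and `group_b` and the distribution parameters are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Synthetic samples (assumption: these are NOT the study data).
group_a = rng.normal(loc=330.0, scale=150.0, size=55)
group_b = rng.normal(loc=340.0, scale=150.0, size=52)

_, p_two_sided = ttest_ind(group_a, group_b, alternative='two-sided')
_, p_greater = ttest_ind(group_a, group_b, alternative='greater')
_, p_less = ttest_ind(group_a, group_b, alternative='less')

# The two-sided p-value is twice the smaller one-sided p-value,
# and the two one-sided p-values sum to 1.
print(p_two_sided, 2 * min(p_greater, p_less))
print(p_greater + p_less)
```

A one-tailed test is therefore only half as strict as the two-tailed test when the effect goes in the hypothesized direction, which is why the direction should be fixed before looking at the data.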

## 3. Completion times for WBC are significantly higher than for CHIRON controller <a id='3'></a>

Hypothesis:

- Null Hypothesis (H₀): The mean completion time for the WBC controller is not higher than the mean completion time for the CHIRON controller.
- Alternative Hypothesis (H₁): The mean completion time for the WBC controller is higher than the mean completion time for the CHIRON controller.

A one-tailed independent t-test was conducted to compare the mean total completion time for the WBC and CHIRON controllers.

In [4]:
# Step 1: Filter the data based on the "Controller" column
wbc_controller = data[data['Controller'] == 'WBC']['Total Time']
chiron_controller = data[data['Controller'] == 'CHIRON']['Total Time']

# Step 2: Perform the statistical hypothesis test
# Null Hypothesis (H0): Completion times for WBC are not significantly higher than CHIRON
# Alternative Hypothesis (H1): Completion times for WBC are significantly higher than CHIRON
t_stat, p_value = ttest_ind(wbc_controller, chiron_controller, alternative='greater')

# Step 3: Summarize the results
results_controller = {
    "t_statistic": t_stat,
    "p_value": p_value,
    "mean_wbc": wbc_controller.mean(),
    "mean_chiron": chiron_controller.mean(),
    "sample_size_wbc": len(wbc_controller),
    "sample_size_chiron": len(chiron_controller)
}

# Display the results
print("Hypothesis Testing Results for Controller Type:")
for key, value in results_controller.items():
    print(f"{key}: {value}")

# Conclusion
if p_value < 0.05:
    print("\nConclusion: Completion times for WBC are significantly higher than for CHIRON.")
else:
    print("\nConclusion: Completion times for WBC are not significantly higher than for CHIRON.")

Hypothesis Testing Results for Controller Type:
t_statistic: 5.430893171879472
p_value: 1.8229285885671913e-07
mean_wbc: 414.5773465025659
mean_chiron: 260.56753376552035
sample_size_wbc: 51
sample_size_chiron: 56

Conclusion: Completion times for WBC are significantly higher than for CHIRON.


<mark style="background-color: green; color: white;">
We reject the Null Hypothesis H₀, meaning that completion times for WBC are significantly higher than for CHIRON when looking at the mean of all trials.
</mark>

## 4. Individual t-tests for WITH-VR / NO-VR for each controller <a id='4'></a>

Null hypotheses tested here:

- A. No significant difference between CHIRON WITH VR and CHIRON NO VR for completion times
- B. No significant difference between WBC WITH VR and WBC NO VR for completion times
- C. No significant difference between CHIRON WITH VR and WBC WITH VR for completion times
- D. No significant difference between CHIRON NO VR and WBC NO VR for completion times


In [31]:
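# Subjects missing one of the two modalities.
# Note: df_pivot is built in the next cell (Step 2, before dropna), so run that part first.
missing_rows = df_pivot[df_pivot.isnull().any(axis=1)]
print(missing_rows)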


Modality Subject Controller       NO-VR     WITH-VR
12        LEMT02        WBC         NaN  710.234985
16        REKD03        WBC  287.957314         NaN
19        XHKB15     CHIRON  201.130758         NaN
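
The `NaN` entries above arise because pivoting creates one column per modality, and a subject without any trial in one modality has nothing to fill that column with. The following toy sketch illustrates the mechanism; the subjects `S1` to `S3` and their times are invented for illustration.

```python
import pandas as pd

# Toy data (hypothetical subjects); S3 has no WITH-VR trial.
toy = pd.DataFrame({
    "Subject": ["S1", "S1", "S2", "S2", "S3"],
    "Controller": ["CHIRON", "CHIRON", "WBC", "WBC", "WBC"],
    "Modality": ["NO-VR", "WITH-VR", "NO-VR", "WITH-VR", "NO-VR"],
    "Total_Time": [100.0, 150.0, 200.0, 260.0, 180.0],
})

# One row per subject, one column per modality.
toy_pivot = toy.pivot(index=["Subject", "Controller"],
                      columns="Modality",
                      values="Total_Time").reset_index()

# Rows with a missing modality cannot enter a paired comparison.
incomplete = toy_pivot[toy_pivot.isnull().any(axis=1)]
complete = toy_pivot.dropna(subset=["WITH-VR", "NO-VR"])
print(incomplete)
print(len(complete))
```

Dropping incomplete rows keeps the paired tests valid but discards those subjects entirely, which is one motivation for the mixed model in section 5.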


In [34]:
import pandas as pd
from scipy.stats import ttest_rel, ttest_ind

df.rename(columns={"Total Time": "Total_Time"}, inplace=True)


# -----------------------------------------------------------------------------
# Step 1: Group the data by Subject, Controller, and Modality,
#         computing the mean of Total_Time for each condition.
# -----------------------------------------------------------------------------
df_grouped = df.groupby(['Subject', 'Controller', 'Modality'])['Total_Time'].mean().reset_index()

# -----------------------------------------------------------------------------
# Step 2: Pivot the DataFrame so that each row represents one subject's
#         performance with their controller, with separate columns for each modality.
# -----------------------------------------------------------------------------
df_pivot = df_grouped.pivot(index=['Subject', 'Controller'], 
                            columns='Modality', 
                            values='Total_Time').reset_index()

print("Pivoted DataFrame before dropping missing values:")
print(df_pivot.head())

# -----------------------------------------------------------------------------
# Step 3: Drop rows that have missing values in the pivoted DataFrame.
#         This ensures that each row has both a WITH-VR and a NO-VR value.
# -----------------------------------------------------------------------------
df_pivot = df_pivot.dropna(subset=['WITH-VR', 'NO-VR'])
print("Pivoted DataFrame after dropping missing rows:")
print(df_pivot.head())

# -----------------------------------------------------------------------------
# Step 4: Perform the comparisons
# -----------------------------------------------------------------------------

# 1. CHIRON: WITH-VR vs. WITHOUT-VR (paired t-test)
chiron_df = df_pivot[df_pivot['Controller'] == 'CHIRON']
if not chiron_df.empty:
    t_stat, p_value = ttest_rel(chiron_df['WITH-VR'], chiron_df['NO-VR'])
    print("CHIRON: WITH-VR vs. WITHOUT-VR: t-statistic = {:.3f}, p-value = {:.3f}".format(t_stat, p_value))
else:
    print("No CHIRON data available for comparison.")

# 2. WBC: WITH-VR vs. WITHOUT-VR (paired t-test)
wbc_df = df_pivot[df_pivot['Controller'] == 'WBC']
if not wbc_df.empty:
    t_stat, p_value = ttest_rel(wbc_df['WITH-VR'], wbc_df['NO-VR'])
    print("WBC: WITH-VR vs. WITHOUT-VR: t-statistic = {:.3f}, p-value = {:.3f}".format(t_stat, p_value))
else:
    print("No WBC data available for comparison.")

# 3. WITH-VR: CHIRON vs. WBC (independent t-test)
if not chiron_df.empty and not wbc_df.empty:
    t_stat, p_value = ttest_ind(chiron_df['WITH-VR'], wbc_df['WITH-VR'], equal_var=False)
    print("WITH-VR: CHIRON vs. WBC: t-statistic = {:.3f}, p-value = {:.3f}".format(t_stat, p_value))
else:
    print("Insufficient data for CHIRON vs. WBC comparison in WITH-VR.")

# 4. WITHOUT-VR: CHIRON vs. WBC (independent t-test)
if not chiron_df.empty and not wbc_df.empty:
    t_stat, p_value = ttest_ind(chiron_df['NO-VR'], wbc_df['NO-VR'], equal_var=False)
    print("WITHOUT-VR: CHIRON vs. WBC: t-statistic = {:.3f}, p-value = {:.3f}".format(t_stat, p_value))
else:
    print("Insufficient data for CHIRON vs. WBC comparison in WITHOUT-VR.")


Pivoted DataFrame before dropping missing values:
Modality Subject Controller       NO-VR     WITH-VR
0         AAHF21     CHIRON  125.001386  332.985158
1         ARAH22        WBC  275.245967  329.201385
2         ATSF08     CHIRON  212.648794  252.253448
3         BTHH23        WBC  363.826803  564.708476
4         CEEJ05     CHIRON  303.251938  381.647603
Pivoted DataFrame after dropping missing rows:
Modality Subject Controller       NO-VR     WITH-VR
0         AAHF21     CHIRON  125.001386  332.985158
1         ARAH22        WBC  275.245967  329.201385
2         ATSF08     CHIRON  212.648794  252.253448
3         BTHH23        WBC  363.826803  564.708476
4         CEEJ05     CHIRON  303.251938  381.647603
CHIRON: WITH-VR vs. WITHOUT-VR: t-statistic = 3.204, p-value = 0.013
WBC: WITH-VR vs. WITHOUT-VR: t-statistic = 4.724, p-value = 0.002
WITH-VR: CHIRON vs. WBC: t-statistic = -3.103, p-value = 0.012
WITHOUT-VR: CHIRON vs. WBC: t-statistic = -2.510, p-value = 0.024
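
For reference, the two kinds of test used in the cell above have the following standard forms. Comparisons 1 and 2 use `ttest_rel` (paired), and comparisons 3 and 4 pass `equal_var=False` to `ttest_ind`, which selects Welch's t-test. The symbols are introduced here for illustration: $d_i$ is the per-subject difference WITH-VR minus NO-VR, and $\bar{x}_k$, $s_k^2$, $n_k$ are the mean, sample variance, and number of subjects for controller $k$.

```latex
% Paired t-test on per-subject differences d_i (n subjects, n - 1 degrees of freedom)
t_{\text{paired}} = \frac{\bar{d}}{s_d / \sqrt{n}}

% Welch's t-test for two independent groups with unequal variances
t_{\text{Welch}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```

With CHIRON passed as the first sample, a negative Welch statistic means the CHIRON mean is lower than the WBC mean.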


# Interpretation of T-Test Results

Below are the results of our t-test comparisons along with their interpretations:

---

### 1. CHIRON: WITH-VR vs. WITHOUT-VR
- **t-statistic:** 3.204  
- **p-value:** 0.013

**Interpretation:**  
The positive t-statistic (3.204) with a p-value of 0.013 (which is less than 0.05) indicates a statistically significant difference between the two conditions. For participants using the **CHIRON** controller, the mean Total_Time for the **WITH-VR** condition is significantly different from the **WITHOUT-VR** condition.  
- Since the t-statistic is positive, it suggests that **WITH-VR** trials took longer than **WITHOUT-VR** trials.

---

### 2. WBC: WITH-VR vs. WITHOUT-VR
- **t-statistic:** 4.724  
- **p-value:** 0.002

**Interpretation:**  
For the **WBC** controller, the comparison between **WITH-VR** and **WITHOUT-VR** is also statistically significant (p = 0.002). The positive t-statistic (4.724) indicates that the **WITH-VR** condition leads to a longer Total_Time compared to the **WITHOUT-VR** condition for participants using WBC.

---

### 3. WITH-VR: CHIRON vs. WBC
- **t-statistic:** -3.103  
- **p-value:** 0.012

**Interpretation:**  
When comparing the **WITH-VR** conditions between controllers:
- The negative t-statistic (-3.103) suggests that the mean Total_Time for **CHIRON** is lower than that for **WBC**.
- With a p-value of 0.012 (which is significant), we conclude that **CHIRON with VR** is significantly faster than **WBC with VR**.

---

### 4. WITHOUT-VR: CHIRON vs. WBC
- **t-statistic:** -2.510  
- **p-value:** 0.024

**Interpretation:**  
For the **WITHOUT-VR** conditions:
- The negative t-statistic (-2.510) indicates that the mean Total_Time for **CHIRON** is lower than that for **WBC**.
- The p-value (0.024) is below the 0.05 threshold, meaning this difference is statistically significant. Thus, **CHIRON without VR** is significantly faster than **WBC without VR**.

---

## Summary

- **Within the same controller:**  
  Both CHIRON and WBC showed significantly longer completion times when using **WITH-VR** compared to **WITHOUT-VR**.

- **Between controllers:**  
  For both visualization modalities, **CHIRON** resulted in significantly faster completion times compared to **WBC**.

These findings suggest that:
- The **VR visualization** increases completion times for both controllers.
- The **CHIRON controller** performs better (i.e., faster task completion) than the **WBC controller** regardless of the visualization modality.
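
Because four tests are run on the same data set, one optional robustness check is a multiple-comparison correction. The sketch below applies the Holm step-down procedure to the four p-values printed above. It is an illustrative addition rather than part of the original analysis, the helper `holm_adjust` is hypothetical, and the inputs are the rounded three-decimal values, so the adjusted values are approximate.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


# p-values as printed above (rounded to three decimals).
reported = {
    "CHIRON: WITH-VR vs NO-VR": 0.013,
    "WBC: WITH-VR vs NO-VR": 0.002,
    "WITH-VR: CHIRON vs WBC": 0.012,
    "NO-VR: CHIRON vs WBC": 0.024,
}
adjusted = holm_adjust(list(reported.values()))
for label, p_adj in zip(reported, adjusted):
    print(f"{label}: Holm-adjusted p = {p_adj:.3f}")
```

Under these rounded inputs the adjusted values are 0.036, 0.008, 0.036, and 0.036, so all four comparisons would remain below 0.05 after the Holm correction.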


## 5. LMM for 2x2x3 mixed user study design (probably the more appropriate analysis) <a id='5'></a>

# Why Choose a Linear Mixed Model Over a Repeated Measures ANOVA?

There are several reasons for preferring a **linear mixed model (LMM)** over a traditional **repeated measures ANOVA** in many experimental settings (source: https://vsni.co.uk/blogs/anova-vs-linear-mixed-models-choosing-the-right-tool-for-your-statistical-analysis/):

1. **Flexibility with Missing Data:**
   - **LMM:** Can handle missing data in a principled way (assuming data are missing at random) without discarding entire subjects. This is especially useful when some participants have incomplete data.
   - **Repeated Measures ANOVA:** Typically requires complete data for all conditions, so any missing value often leads to dropping the subject or employing imputation techniques.

2. **Handling Unbalanced Designs:**
   - **LMM:** Can accommodate unbalanced data (e.g., different numbers of observations per participant) without requiring strict data balancing.
   - **Repeated Measures ANOVA:** Assumes a balanced design, which can limit its applicability if some conditions or trials have fewer observations.

3. **Modeling Random Effects:**
   - **LMM:** Explicitly models random effects (like random intercepts or slopes for subjects), capturing inter-subject variability. This allows for a more accurate representation of the hierarchical structure of the data (e.g., repeated measurements nested within subjects).
   - **Repeated Measures ANOVA:** Often only implicitly accounts for within-subject correlations and may rely on assumptions like sphericity, which can be violated in practice.

4. **Flexibility in Covariance Structures:**
   - **LMM:** Allows for the specification of different covariance structures to model the relationship between repeated measurements, which can be tailored to the data.
   - **Repeated Measures ANOVA:** Assumes sphericity (compound symmetry, i.e., equal variances and equal covariances, is a sufficient special case) unless corrections (e.g., Greenhouse-Geisser) are applied.

5. **Enhanced Inference:**
   - **LMM:** Provides more flexibility in hypothesis testing, including the possibility of testing complex interactions and accommodating covariates.
   - **Repeated Measures ANOVA:** Can be more restrictive, especially when the data do not meet its underlying assumptions.

---

**In summary**, a linear mixed model is often preferred because it is more robust to missing or unbalanced data, allows for explicit modeling of subject-level variability, and provides greater flexibility in terms of specifying the structure of the data. These advantages are particularly beneficial in complex experimental designs where assumptions of traditional repeated measures ANOVA may not hold.
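
As a sketch of what "random intercepts for subjects" means, a random-intercept model of the kind fitted below can be written as follows. The notation is introduced here for illustration: $y_{ij}$ is the `Total_Time` of observation $j$ from subject $i$, $\mathbf{x}_{ij}$ holds the dummy-coded fixed effects, and $u_i$ is the subject-specific offset.

```latex
y_{ij} = \mathbf{x}_{ij}^{\top} \boldsymbol{\beta} + u_i + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Every observation from the same subject shares the same $u_i$, which is how the model accounts for repeated measures without requiring each subject to have the same number of observations.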



In [35]:
import pandas as pd
import statsmodels.formula.api as smf

df.rename(columns={"Total Time": "Total_Time"}, inplace=True)
# --- Ensure that relevant columns are treated as categorical variables ---
df['Subject'] = df['Subject'].astype('category')
df['Controller'] = df['Controller'].astype('category')
df['Modality'] = df['Modality'].astype('category')
df['Trial'] = df['Trial'].astype('category')  # Use 'category' if you want to treat trial as a factor

# --- Fit a linear mixed-effects model ---
# The model formula includes main effects and interactions among Controller, Modality, and Trial.
# A random intercept is specified for each Subject to account for repeated measures.
model = smf.mixedlm("Total_Time ~ Controller * Modality * Trial", 
                    data=df, 
                    groups="Subject")
result = model.fit()

# --- Output the results ---
print(result.summary())


                             Mixed Linear Model Regression Results
Model:                          MixedLM              Dependent Variable:              Total_Time
No. Observations:               107                  Method:                          REML      
No. Groups:                     20                   Scale:                           7655.8291 
Min. group size:                1                    Log-Likelihood:                  -589.0224 
Max. group size:                6                    Converged:                       Yes       
Mean group size:                5.3                                                             
------------------------------------------------------------------------------------------------
                                                  Coef.   Std.Err.   z    P>|z|  [0.025   0.975]
------------------------------------------------------------------------------------------------
Intercept                                         257.215   

# Mixed Linear Model Regression Results Overview

This overview explains the interpretation of the mixed-effects model output for the analysis of `Total_Time`. The model includes fixed effects for **Controller**, **Modality**, and **Trial** (with interactions) and a random intercept for each **Participant** to account for repeated measurements.

---

## Model Overview

- **Dependent Variable:** `Total_Time`
- **Fixed Effects:**
  - **Controller** (between-subjects; baseline = `CHIRON`)
  - **Modality** (within-subjects; baseline = `NO-VR`, referred to as **WITHOUT-VR** below)
  - **Trial** (within-subjects; baseline = **Trial 1**)
- **Random Effects:**
  - Random intercepts for each **Participant** (the `Subject` column)

---

## Coefficient Interpretations

### Intercept
- **Coefficient:** 257.215 (p < 0.001)
- **Interpretation:**  
  The estimated average `Total_Time` for the **baseline condition**: participants using the `CHIRON` controller, in the `NO-VR` (WITHOUT-VR) modality, on **Trial 1**.

---

### Controller[T.WBC]
- **Coefficient:** 158.011 (p = 0.006)
- **Interpretation:**  
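  In the reference condition (**WITHOUT-VR**, **Trial 1**), participants using the **WBC** controller have an estimated `Total_Time` that is **158.011 units higher** than those using the **CHIRON** controller. Because the model includes interactions, this coefficient is a simple effect at the reference levels rather than an average over all conditions.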

---

### Modality[T.WITH-VR]
- **Coefficient:** 111.989 (p = 0.007)
- **Interpretation:**  
  For the reference controller and trial (**CHIRON**, **Trial 1**), the **WITH-VR** modality is associated with an increase in `Total_Time` of approximately **112 units** compared to the **WITHOUT-VR** condition.

---

### Trial Effects

- **Trial[T.2]:**
  - **Coefficient:** -44.991 (p = 0.265)
  - **Interpretation:**  
    For the reference levels (**CHIRON**, **WITHOUT-VR**), **Trial 2** is associated with a decrease of about **45 units** in `Total_Time` compared to **Trial 1**. This effect is not statistically significant.

- **Trial[T.3]:**
  - **Coefficient:** -63.811 (p = 0.114)
  - **Interpretation:**  
    For the same reference levels, **Trial 3** shows a decrease of approximately **64 units** compared to **Trial 1**, but this effect is not statistically significant.

---

### Two-Way Interactions

- **Controller[T.WBC]:Modality[T.WITH-VR]**
  - **Coefficient:** 11.679 (p = 0.844)
  - **Interpretation:**  
    In **Trial 1**, the increase associated with **WITH-VR** is about **11.679 units** larger for **WBC** than for **CHIRON**. This interaction is small and not statistically significant, suggesting that in Trial 1 the combined effect is approximately the sum of the two individual effects.

- **Controller[T.WBC]:Trial[T.2]**
  - **Coefficient:** -91.815 (p = 0.112)
  - **Interpretation:**  
    In the **WITHOUT-VR** condition, the difference between **WBC** and **CHIRON** is about **91.815 units** smaller in **Trial 2** than in **Trial 1**, though not significantly.

- **Controller[T.WBC]:Trial[T.3]**
  - **Coefficient:** -58.032 (p = 0.325)
  - **Interpretation:**  
    Likewise, in the **WITHOUT-VR** condition the difference between **WBC** and **CHIRON** is about **58.032 units** smaller in **Trial 3** than in **Trial 1**, which is also not statistically significant.

- **Modality[T.WITH-VR]:Trial[T.2]**
  - **Coefficient:** -37.875 (p = 0.512)
  - **Interpretation:**  
    For **CHIRON**, the increase associated with **WITH-VR** is about **37.875 units** smaller in **Trial 2** than in **Trial 1**, but this effect is not statistically significant.

- **Modality[T.WITH-VR]:Trial[T.3]**
  - **Coefficient:** -62.559 (p = 0.278)
  - **Interpretation:**  
    For **CHIRON**, the increase associated with **WITH-VR** is about **62.559 units** smaller in **Trial 3** than in **Trial 1**; this effect is also not statistically significant.

---

### Three-Way Interactions

- **Controller[T.WBC]:Modality[T.WITH-VR]:Trial[T.2]**
  - **Coefficient:** 164.217 (p = 0.048)
  - **Interpretation:**  
    This interaction (p = 0.048, only just below the 0.05 threshold) suggests that in **Trial 2**, the combination of **WBC** and **WITH-VR** adds about **164.217 units** to `Total_Time` beyond what the main effects and two-way interactions would predict.

- **Controller[T.WBC]:Modality[T.WITH-VR]:Trial[T.3]**
  - **Coefficient:** 114.496 (p = 0.173)
  - **Interpretation:**  
    For **Trial 3**, a similar pattern is observed, but this interaction is not statistically significant.
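
Since every coefficient is relative to the baseline (CHIRON, NO-VR, Trial 1), the model-implied mean for any condition is the intercept plus all coefficients whose terms are "switched on" for that condition. The sketch below adds up the coefficients quoted in this overview. The short keys such as `"WBC:VR:T2"` are shorthand introduced here for the full term names (e.g. `Controller[T.WBC]:Modality[T.WITH-VR]:Trial[T.2]`), and the helper `predicted_mean` is hypothetical.

```python
# Fixed-effect coefficients as quoted in the interpretation above.
coef = {
    "Intercept": 257.215,
    "WBC": 158.011,
    "VR": 111.989,
    "T2": -44.991,
    "T3": -63.811,
    "WBC:VR": 11.679,
    "WBC:T2": -91.815,
    "WBC:T3": -58.032,
    "VR:T2": -37.875,
    "VR:T3": -62.559,
    "WBC:VR:T2": 164.217,
    "WBC:VR:T3": 114.496,
}


def predicted_mean(controller, modality, trial):
    """Sum the intercept and every term whose dummy variables are all active."""
    active = set()
    if controller == "WBC":
        active.add("WBC")
    if modality == "WITH-VR":
        active.add("VR")
    if trial in (2, 3):
        active.add(f"T{trial}")
    total = coef["Intercept"]
    for name, value in coef.items():
        if name != "Intercept" and set(name.split(":")) <= active:
            total += value
    return total


for trial in (1, 2, 3):
    for controller in ("CHIRON", "WBC"):
        for modality in ("NO-VR", "WITH-VR"):
            mean = predicted_mean(controller, modality, trial)
            print(trial, controller, modality, round(mean, 3))
```

For example, WBC with VR in Trial 1 comes out at 257.215 + 158.011 + 111.989 + 11.679 = 538.894. These are fixed-effect predictions for an average subject (random intercept of zero), not observed cell means.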

---

### Random Effects

- **Subject Var:** 7870.822
  - **Interpretation:**  
    This represents the estimated variance among participants' intercepts, indicating substantial differences in baseline `Total_Time` across participants.
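
One way to put the subject variance in context is the intraclass correlation (ICC): the share of the total unexplained variance that is due to differences between subjects. The sketch below uses the `Subject Var` value quoted above and the `Scale` value (the residual variance) from the model summary header; the variable names are chosen here for illustration.

```python
subject_var = 7870.822    # "Subject Var" quoted above
residual_var = 7655.8291  # "Scale" in the model summary header

# Share of the remaining variance attributable to between-subject differences.
icc = subject_var / (subject_var + residual_var)
print(f"ICC = {icc:.3f}")  # prints "ICC = 0.507"
```

An ICC of roughly 0.5 indicates that about half of the variance left after the fixed effects is between subjects, which supports modeling a per-subject random intercept rather than treating trials as independent.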

---

## Key Takeaways

- **Main Effects (estimated at the reference levels of the other factors):**
  - The **WBC** controller is associated with longer task completion times than the **CHIRON** controller.
  - The **WITH-VR** modality leads to longer task completion times compared to **WITHOUT-VR**.
  
- **Trial Effects:**
  - There is a trend toward improvement (i.e., reduced `Total_Time`) in later trials, though these effects are not statistically significant.
  
- **Interactions:**
  - Most interaction terms are not significant. The exception is the three-way interaction for **Trial 2** (p = 0.048), which suggests that in **Trial 2** the combination of **WBC** with **WITH-VR** increases `Total_Time` more than the lower-order terms predict; given how close the p-value is to 0.05, this should be read with caution.

---

## Implications for the Study

- **Controller and Modality:**  
  The analysis suggests that the **CHIRON** controller and the **WITHOUT-VR** condition may result in faster task completion times.

- **Learning Effects:**  
  Although there is a trend indicating improvement over trials, these effects are not statistically robust in this model.

- **Combined Effects:**  
  The significant three-way interaction in **Trial 2** warrants further investigation, as it indicates that the combination of **WBC** and **WITH-VR** in that trial has a unique impact on performance.
