# NASA TLX Hypothesis Testing

# Load necessary libraries
import pandas as pd
from scipy.stats import wilcoxon, mannwhitneyu

# Load the dataset
file_path = "NASA_TLX.csv"  # Replace with actual file path
df_nasa_tlx = pd.read_csv(file_path)

# Ensure correct data types
df_nasa_tlx["Modality"] = df_nasa_tlx["Modality"].astype(str)
df_nasa_tlx["Controller"] = df_nasa_tlx["Controller"].astype(str)

# Merge WITH-VR and NO-VR scores by Participant ID to ensure correct pairing
pair_keys = ["ID", "Controller", "Starting Modality"]
df_with_vr = df_nasa_tlx[df_nasa_tlx["Modality"] == "with VR"].set_index(pair_keys)
df_no_vr = df_nasa_tlx[df_nasa_tlx["Modality"] == "without VR"].set_index(pair_keys)

# join() aligns rows on the participant index, not on row order; how="inner" keeps only
# participants with both a WITH-VR and a NO-VR entry, so no NaN pairs reach the test
df_paired = df_with_vr.join(df_no_vr, how="inner", lsuffix="_WITH_VR", rsuffix="_NO_VR")

# NASA TLX subscales to test
SUBSCALES = ["Mental Demand", "Physical Demand", "Temporal Demand", "Performance", "Effort", "Frustration"]

# Wilcoxon Signed-Rank Test (WITH-VR vs. NO-VR): paired, within-subject comparison
wilcoxon_results = {}
for col in SUBSCALES:
    with_vr = df_paired[f"{col}_WITH_VR"].values
    no_vr = df_paired[f"{col}_NO_VR"].values
    wilcoxon_results[col] = wilcoxon(with_vr, no_vr)

# Mann-Whitney U Test (SBC vs. WBC): between-subject comparison.
# Note: df_nasa_tlx holds both the WITH-VR and the NO-VR row of every participant,
# so each participant contributes two ratings to their controller group here.
mannwhitney_results = {}
for col in SUBSCALES:
    sbc = df_nasa_tlx.loc[df_nasa_tlx["Controller"] == "SBC", col]
    wbc = df_nasa_tlx.loc[df_nasa_tlx["Controller"] == "WBC", col]
    # Explicit two-sided alternative (the default in current SciPy versions)
    mannwhitney_results[col] = mannwhitneyu(sbc, wbc, alternative="two-sided")

# Display test results
print("Wilcoxon Signed-Rank Test Results (WITH-VR vs. NO-VR):")
for key, value in wilcoxon_results.items():
    print(f"{key}: Statistic={value.statistic}, p-value={value.pvalue}")

print("\nMann-Whitney U Test Results (SBC vs. WBC):")
for key, value in mannwhitney_results.items():
    print(f"{key}: Statistic={value.statistic}, p-value={value.pvalue}")


Wilcoxon Signed-Rank Test Results (WITH-VR vs. NO-VR):
Mental Demand: Statistic=18.5, p-value=0.001983812456991301
Physical Demand: Statistic=13.5, p-value=0.0024687152258023645
Temporal Demand: Statistic=53.5, p-value=0.058258056640625
Performance: Statistic=11.5, p-value=0.0019433669168753918
Effort: Statistic=1.5, p-value=0.0003717844394774219
Frustration: Statistic=30.5, p-value=0.016428489183327274

Mann-Whitney U Test Results (SBC vs. WBC):
Mental Demand: Statistic=195.5, p-value=0.9133823160315043
Physical Demand: Statistic=286.0, p-value=0.020170689721619887
Temporal Demand: Statistic=188.0, p-value=0.7546709930608135
Performance: Statistic=194.5, p-value=0.8912931703788556
Effort: Statistic=183.5, p-value=0.6632400571708958
Frustration: Statistic=104.0, p-value=0.009247469336982264
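
In the Mann-Whitney loop above, `df_nasa_tlx` still contains both the WITH-VR and the NO-VR row of every participant, so each participant contributes two correlated ratings to their controller group, while the test treats all of them as independent observations. One possible adjustment, not part of the original analysis, is to average each participant's two ratings before comparing controllers. The sketch below assumes the `ID`, `Controller`, and subscale columns used above; `controller_test_per_participant` is a hypothetical helper and `toy` is made-up data, not the study's.

```python
import pandas as pd
from scipy.stats import mannwhitneyu


def controller_test_per_participant(df, col):
    """Average each participant's ratings across modalities, then compare controllers.

    Hypothetical helper: assumes columns "ID", "Controller" and `col`.
    Returns the Mann-Whitney result and the number of participants per group.
    """
    # One value per participant: the mean of their WITH-VR and NO-VR ratings
    per_participant = df.groupby(["ID", "Controller"], as_index=False)[col].mean()
    sbc = per_participant.loc[per_participant["Controller"] == "SBC", col]
    wbc = per_participant.loc[per_participant["Controller"] == "WBC", col]
    return mannwhitneyu(sbc, wbc, alternative="two-sided"), len(sbc), len(wbc)


# Hypothetical toy data: 4 participants, one row per modality each
toy = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3, 4, 4],
    "Controller": ["SBC"] * 4 + ["WBC"] * 4,
    "Modality": ["with VR", "without VR"] * 4,
    "Physical Demand": [60, 40, 55, 35, 30, 20, 25, 15],
})

result, n_sbc, n_wbc = controller_test_per_participant(toy, "Physical Demand")
print(n_sbc, n_wbc, result.statistic, result.pvalue)
```

Applied to the real data, for example `controller_test_per_participant(df_nasa_tlx, "Physical Demand")`, each group shrinks to one value per participant, so the p-values would differ from those printed above.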

## Statistical Approach and Justification

Hypotheses were tested with the Wilcoxon Signed-Rank Test for within-subject comparisons and the Mann-Whitney U Test for between-subject comparisons. These non-parametric tests were chosen over a two-way (mixed) ANOVA because the normality assumption was violated: a Shapiro-Wilk test indicated non-normality in several NASA TLX subscales (p < 0.05). Since ANOVA relies on normally distributed residuals (sphericity is not a concern here, because the within-subject factor has only two levels), the non-parametric tests are the more robust choice. The Wilcoxon Signed-Rank Test was used for WITH-VR vs. NO-VR because every participant experienced both conditions, so the data are paired. The Mann-Whitney U Test was applied to SBC vs. WBC because each participant used only one controller, so the groups are independent.
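
The paragraph above cites a Shapiro-Wilk test, but the notebook cell does not show it, and which values it was run on is not stated. The sketch below assumes it is run per subscale on the WITH-VR minus NO-VR differences, the quantity whose normality a paired t-test would depend on. `normality_report` is a hypothetical helper and `diffs` is made-up illustration data, not the study's.

```python
import numpy as np
from scipy.stats import shapiro


def normality_report(samples, alpha=0.05):
    """Shapiro-Wilk test per named sample; flags samples whose p-value is below alpha."""
    report = {}
    for name, values in samples.items():
        stat, p = shapiro(values)
        report[name] = {"W": float(stat), "p": float(p), "non_normal": bool(p < alpha)}
    return report


# Hypothetical WITH-VR minus NO-VR differences for one subscale (not the study data):
# most participants show no change and a few show a large increase, a strongly skewed shape
diffs = np.array([0] * 16 + [40, 45, 50, 60], dtype=float)

for name, res in normality_report({"Effort difference": diffs}).items():
    print(f"{name}: W={res['W']:.3f}, p={res['p']:.4f}, non-normal={res['non_normal']}")
```

With the real data, each sample could be built as `df_paired[f"{col}_WITH_VR"] - df_paired[f"{col}_NO_VR"]` for every subscale `col`.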

## Results from the NASA TLX Analysis

The Wilcoxon Signed-Rank Test showed that WITH-VR significantly increased Mental Demand (p = 0.002), Physical Demand (p = 0.002), Performance ratings (p = 0.002; on the NASA TLX Performance scale, higher scores indicate poorer self-rated performance), Effort (p < 0.001), and Frustration (p = 0.016). Temporal Demand did not reach significance at α = 0.05 (p = 0.058). WITH-VR therefore led to a higher perceived workload, requiring more cognitive and physical effort and lowering self-rated performance. The Mann-Whitney U Test found that controller type did not significantly affect most workload dimensions, the exceptions being Physical Demand (p = 0.020) and Frustration (p = 0.009). The group means show the direction of these differences: SBC was rated as more physically demanding (mean Physical Demand 43.75 vs. 29.25 for WBC), whereas WBC was rated as more frustrating (mean Frustration 41.75 vs. 23.25 for SBC). Controller choice thus had little influence on the other workload dimensions, but SBC induced more physical strain and WBC more frustration.
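
Both tests above are two-sided, so the direction of each effect ("increased", "more physically demanding") comes from descriptive statistics rather than from the p-values. A minimal sketch of those descriptives, assuming the `<subscale>_WITH_VR` / `<subscale>_NO_VR` column naming of `df_paired` and the `Controller` column of `df_nasa_tlx`; the helper names and toy frames are hypothetical.

```python
import pandas as pd


def paired_direction(df_paired, subscales):
    """Median of (WITH-VR minus NO-VR) per subscale; positive means higher with VR."""
    return {
        col: float((df_paired[f"{col}_WITH_VR"] - df_paired[f"{col}_NO_VR"]).median())
        for col in subscales
    }


def controller_means(df, subscales):
    """Mean rating of each subscale per controller group."""
    return df.groupby("Controller")[subscales].mean()


# Hypothetical toy data (not the study data)
toy_paired = pd.DataFrame({
    "Effort_WITH_VR": [70, 65, 80, 60],
    "Effort_NO_VR": [50, 55, 60, 45],
})
print(paired_direction(toy_paired, ["Effort"]))

toy_long = pd.DataFrame({
    "Controller": ["SBC", "SBC", "WBC", "WBC"],
    "Physical Demand": [50, 40, 30, 20],
})
print(controller_means(toy_long, ["Physical Demand"]))
```

The median paired difference matches what the signed-rank test is sensitive to, while the controller means correspond to the figures quoted in the paragraph above.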

## Interpretation and Conclusions

Visualization modality had the clearest effect on perceived workload: five of the six subscales differed significantly between conditions, with WITH-VR rated as more demanding and more frustrating than NO-VR. This suggests that the VR tasks required greater cognitive and physical effort. The controller effect was weaker, reaching significance only for Physical Demand and Frustration, which points to usability differences between the two controllers.
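
p-values indicate how surprising a difference is, not how large it is, so comparing the strength of the modality and controller effects is easier with an effect size. One common choice, not used in the original analysis, is the rank-biserial correlation: for paired data r = (T+ − T−) / (T+ + T−), where T+ and T− are the rank sums of the positive and negative differences, and for independent groups r = 2U / (n1 · n2) − 1. Both range from −1 to 1, with positive values meaning the first sample tends to be higher. The helper names below are hypothetical and the inputs are toy values.

```python
import numpy as np
from scipy.stats import rankdata


def rank_biserial_paired(x, y):
    """Matched-pairs rank-biserial correlation for paired samples x and y.

    Zero differences are dropped (as in the Wilcoxon "wilcox" zero method);
    tied absolute differences receive average ranks.
    """
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    d = d[d != 0]
    if d.size == 0:
        return 0.0
    ranks = rankdata(np.abs(d))
    t_plus = ranks[d > 0].sum()
    t_minus = ranks[d < 0].sum()
    return float((t_plus - t_minus) / (t_plus + t_minus))


def rank_biserial_independent(u, n1, n2):
    """Rank-biserial correlation from the Mann-Whitney U of the first sample."""
    return 2.0 * u / (n1 * n2) - 1.0


# Toy values: three large positive differences and one small negative one
print(rank_biserial_paired([5, 6, 7, 8], [1, 2, 3, 9]))
print(rank_biserial_independent(4.0, 2, 2))
```

With the study data, `x` and `y` would be the `_WITH_VR` and `_NO_VR` columns of `df_paired`, and `n1`, `n2` the sizes of the samples actually passed to `mannwhitneyu` (rows, not participants, if both modality rows are pooled).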