# Task Performance Hypothesis Testing

In [1]:
df_rw

NameError: name 'df_rw' is not defined

In [4]:
# Load necessary libraries
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import shapiro

# Load the dataset
file_path = "task_performance.csv"  # Replace with actual file path
df_task_performance = pd.read_csv(file_path)

# Filter only real-world (RW) trials
df_rw = df_task_performance.filter(regex="^PERSONAL|^RW")

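# Melt the dataset to long format, using only the RW columns as score values
# so that any other PERSONAL_* columns are not treated as task scores
df_long = df_rw.melt(id_vars=["PERSONAL_Controller being tested?", "PERSONAL_participant_code"],
                      value_vars=[c for c in df_rw.columns if c.startswith("RW")],
                      var_name="Task", value_name="Score")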

# Extract Modality, Trial Number, and Subtask from column names
df_long["Modality"] = df_long["Task"].apply(lambda x: "WITH-VR" if "WITH-VR" in x else "NO-VR")
df_long["Trial"] = df_long["Task"].str.extract(r"TASK (\d+)", expand=False).astype(int)  # expand=False returns a Series
df_long["Subtask"] = df_long["Task"].str.extract(r"\[([^\]]+)\]")  # Extract subtask names

# Rename columns for clarity
df_long.rename(columns={"PERSONAL_Controller being tested?": "Controller",
                        "PERSONAL_participant_code": "Participant"}, inplace=True)

# Drop the original "Task" column as it has been split into meaningful components
df_long.drop(columns=["Task"], inplace=True)

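# Drop missing scores so the normality test and the model use the same observations
df_long = df_long.dropna(subset=["Score"])

# Check normality of the raw scores using the Shapiro-Wilk test
shapiro_test = shapiro(df_long["Score"])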

# Fit the Linear Mixed-Effects Model (LMM)
lmm_model_performance = smf.mixedlm(
    "Score ~ Controller * Modality * Trial * Subtask", 
    df_long, 
    groups=df_long["Participant"]
)

# Fit the model
lmm_result_performance = lmm_model_performance.fit()

# Display results
print("Shapiro-Wilk Test for Normality:")
print(f"Statistic={shapiro_test.statistic}, p-value={shapiro_test.pvalue}\n")

print("Linear Mixed-Effects Model Results:")
print(lmm_result_performance.summary())


AttributeError: partially initialized module 'pandas' has no attribute 'core' (most likely due to a circular import)
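
The column-name parsing in the cell above can be checked in isolation, without pandas. The following is a minimal sketch that applies the same three rules (modality keyword, `TASK <n>`, bracketed subtask) with the standard `re` module. The column names and the helper `parse_column` are hypothetical, since the real headers of `task_performance.csv` are not shown here.

```python
import re

# Hypothetical column names; the real headers in task_performance.csv may differ.
columns = [
    "RW_NO-VR TASK 1 [Grasp]",
    "RW_WITH-VR TASK 2 [Place]",
]


def parse_column(name):
    """Split a wide-format column name into Modality, Trial, and Subtask."""
    modality = "WITH-VR" if "WITH-VR" in name else "NO-VR"
    trial_match = re.search(r"TASK (\d+)", name)
    subtask_match = re.search(r"\[([^\]]+)\]", name)
    return {
        "Modality": modality,
        # None signals a column that does not follow the expected pattern
        "Trial": int(trial_match.group(1)) if trial_match else None,
        "Subtask": subtask_match.group(1) if subtask_match else None,
    }


parsed = [parse_column(c) for c in columns]
for row in parsed:
    print(row)
```

A column that yields `None` for `Trial` would produce a missing value in the pandas version and make `.astype(int)` fail, so such columns are worth excluding before melting.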

To analyze task performance across tasks and trials, a linear mixed-effects model (LMM) was fitted with Controller, Modality, Trial Number, and Task as fixed effects and a random intercept per participant to account for repeated measures. Because the LMM is fitted on long-format data, it uses every observed score and handles missing values without listwise or pairwise deletion. A Shapiro-Wilk test indicated that task scores were not normally distributed (p < 0.05). Since an LMM, like ANOVA, assumes approximately normal residuals, this result is a reason to inspect the model residuals; the LMM was preferred over repeated-measures ANOVA for its handling of repeated measures and missing data.
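
The missing-data point can be illustrated with a small sketch. The participants and scores below are hypothetical, not taken from the study. It compares how many observed scores survive complete-case (listwise) deletion, which a wide-format repeated-measures analysis typically requires, with how many survive in the long format that the LMM is fitted on.

```python
# Hypothetical scores: participant -> one score per condition (None = missing).
wide = {
    "P01": {"NO-VR": 10, "WITH-VR": 9},
    "P02": {"NO-VR": 8, "WITH-VR": None},  # one missing measurement
    "P03": {"NO-VR": 9, "WITH-VR": 10},
}

# Listwise deletion: drop every participant who has at least one gap.
complete_cases = {p: s for p, s in wide.items() if None not in s.values()}

# Long format: one row per observed score; only the missing cell is dropped.
long_rows = [
    (participant, condition, score)
    for participant, scores in wide.items()
    for condition, score in scores.items()
    if score is not None
]

scores_kept_listwise = sum(len(s) for s in complete_cases.values())
print("Scores kept after listwise deletion:", scores_kept_listwise)
print("Scores kept in long format:", len(long_rows))
```

In this toy case the long format retains P02's observed NO-VR score, which listwise deletion would discard along with the missing cell.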

The model estimated a high baseline score of 9.40/10 (intercept, p < 0.0001). Controller type did not significantly affect task performance: the estimated score for WBC was 1.50 points lower (p = 0.268). Visualization modality likewise had no significant effect: WITH-VR was associated with a 1.20-point decrease (p = 0.371). Performance remained stable across trials, with no detectable learning or fatigue effects (p = 1.000), and scores did not differ significantly between tasks, so no single task was consistently harder or easier than the others. No interaction between Controller, Modality, Trial, and Task was significant (all p > 0.05), so there is no evidence that the effect of any one factor depended on the others.
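
To make the coefficients concrete, the sketch below turns the reported estimates into predicted scores. It assumes treatment (dummy) coding, which is the default for categorical terms in the formula interface, with SBC and NO-VR as the reference levels. The helper `predicted_score` is illustrative only and uses the main effects alone, because the interaction, Trial, and Subtask estimates are not reported in the text.

```python
# Fixed-effect estimates reported in the text.
intercept = 9.40       # baseline score at the reference levels
coef_wbc = -1.50       # Controller: WBC relative to the reference controller
coef_with_vr = -1.20   # Modality: WITH-VR relative to the reference modality


def predicted_score(wbc, with_vr):
    """Predicted score from the reported main effects only.

    wbc and with_vr are 0/1 indicators. All other terms (Trial, Subtask,
    interactions) are held at their reference levels and omitted, so the
    result is meaningful only when at most one indicator is set to 1.
    """
    return intercept + coef_wbc * wbc + coef_with_vr * with_vr


print(f"Reference condition:           {predicted_score(0, 0):.2f}")
print(f"WBC, reference modality:       {predicted_score(1, 0):.2f}")
print(f"Reference controller, WITH-VR: {predicted_score(0, 1):.2f}")
```

Under these assumptions the coefficients are offsets from the intercept, not condition means in their own right. Neither offset is statistically distinguishable from zero here.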

These results suggest that task performance was consistent across controllers, modalities, and trials, with no evidence that WITH-VR or WBC hindered performance. Participants scored above 9 on average, indicating accurate task execution across conditions. Unlike the completion time analysis, where WITH-VR significantly increased task duration, performance scores were unaffected by visualization modality: VR may slow execution, but it does not appear to cause task failure or lower performance quality. Likewise, controller type did not significantly affect task scores, so the data give no indication that SBC and WBC differ in task effectiveness.