In [1]:
### Imports
import pandas as pd
import numpy as np

## Data Analysis

This notebook analyzes evaluation results comparing AI-generated chart interpretations under two conditions:

1. **Image only**: o4-mini receives the chart without any context
2. **Image + context**: o4-mini receives the chart together with its surrounding textual context

Each interpretation was evaluated on four quality dimensions using 7-point Likert scales:
- Accuracy
- Clarity
- Completeness
- Relevance

In addition, evaluators gave an overall preference between the two conditions.
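
Before any analysis, it can help to confirm that every score actually lies on the 7-point scale. The sketch below is a minimal example, not part of the original pipeline. It assumes the scale is coded as the integers 1 to 7 and that the score columns are named `{condition}_{metric}`, as in the loading code later in this notebook. The helper `find_out_of_range` and the toy data are hypothetical.

```python
import pandas as pd

METRICS = ["accuracy", "clarity", "completeness", "relevance"]
CONDITIONS = ["with_context", "without_context"]


def find_out_of_range(df, low=1, high=7):
    """Return the score columns that contain missing or out-of-scale values.

    Assumption: scores are coded low..high inclusive (1..7 for a 7-point scale).
    """
    score_cols = [f"{c}_{m}" for c in CONDITIONS for m in METRICS]
    bad = []
    for col in score_cols:
        values = df[col]
        # Series.between is inclusive on both ends by default; NaN fails the check
        if values.isna().any() or not values.between(low, high).all():
            bad.append(col)
    return bad


# Hypothetical toy data: three items, all scores valid except one
toy = pd.DataFrame({f"{c}_{m}": [1, 4, 7] for c in CONDITIONS for m in METRICS})
toy.loc[2, "with_context_clarity"] = 9  # deliberately off the scale

print(find_out_of_range(toy))
```

An empty list means every score column is complete and within range.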

### Research Questions

1. Do context-enhanced interpretations score higher on quality dimensions?
2. Which condition do evaluators prefer overall?
3. How do the quality dimensions correlate with each other?

### 1. Dataset Loading & Preparation

In [None]:
### --- Reading the data ---
df = pd.read_csv('data/evaluation_results.csv')

### --- Restructuring the DataFrame to long format ---
metrics = ["accuracy", "clarity", "relevance", "completeness"]
conditions = ["with_context", "without_context"]

# Convert the wide dataset to long format: one row per (item, condition) pair
long_rows = []
for _, row in df.iterrows():
    for cond in conditions:
        long_rows.append({
            "item_index": row.item_index,
            "condition": cond,
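            **{m: row[f"{cond}_{m}"] for m in metrics},
            # "overall" is the unweighted mean of the four metric scores
            # (distinct from the evaluators' preference rating stored below)
            "overall": sum(row[f"{cond}_{m}"] for m in metrics) / len(metrics),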
            "preference": row.preference_actual
        })

# Build the long-format DataFrame from the collected rows
long_df = pd.DataFrame(long_rows)

### --- Factorizing Categorical Variables ---

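# Encode the condition and preference columns as integers with Series.map
# NB: both columns share the same codes (with_context = 1, without_context = 0),
# so condition can be compared directly with preference; "equal" is coded -1.
# Any label missing from a mapping becomes NaN.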
long_df['condition'] = long_df['condition'].map({
    "with_context": 1,
    "without_context": 0
})

long_df['preference'] = long_df['preference'].map({
    "with_context": 1,
    "without_context": 0,
    "equal": -1
})
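
The row-by-row loop above is easy to read, but the same reshape can also be written without `iterrows`. The following is a minimal sketch of one alternative, not the method used in this notebook. It assumes the same column names as above (`item_index`, `preference_actual`, and `{condition}_{metric}`). The function `to_long` and the toy values are hypothetical, and the integer encoding is left out so that it can be applied afterwards exactly as above.

```python
import pandas as pd

metrics = ["accuracy", "clarity", "relevance", "completeness"]
conditions = ["with_context", "without_context"]


def to_long(wide):
    """Reshape one-row-per-item data into one row per (item, condition)."""
    parts = []
    for cond in conditions:
        # Map e.g. "with_context_accuracy" -> "accuracy" for this condition
        cols = {f"{cond}_{m}": m for m in metrics}
        part = wide[["item_index", *cols]].rename(columns=cols)
        part.insert(1, "condition", cond)
        part["overall"] = part[metrics].mean(axis=1)  # unweighted mean of the four scores
        part["preference"] = wide["preference_actual"]  # aligned on the shared index
        parts.append(part)
    # A stable sort on the original index interleaves the conditions per item,
    # reproducing the row order of the loop-based version
    return pd.concat(parts).sort_index(kind="mergesort").reset_index(drop=True)


# Hypothetical toy data with two items
toy = pd.DataFrame({
    "item_index": [0, 1],
    "with_context_accuracy": [6, 5],
    "with_context_clarity": [7, 5],
    "with_context_relevance": [6, 4],
    "with_context_completeness": [5, 6],
    "without_context_accuracy": [4, 5],
    "without_context_clarity": [5, 5],
    "without_context_relevance": [3, 4],
    "without_context_completeness": [4, 6],
    "preference_actual": ["with_context", "equal"],
})

long_toy = to_long(toy)
print(long_toy)
```

For a dataset of this kind the two versions should produce the same table; the column-wise version mainly pays off on larger files, where `iterrows` becomes slow.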

### 2. Addressing Research Questions