# Multi-Model Personality Simulation - Study 2 (Likert Format)

This notebook implements the original Study 2 BFI-2 to Mini-Marker simulation with **Likert format** personality descriptions. It runs the simulation across multiple LLMs through the unified `portal.py` interface.

## Models to Test
- GPT-4
- GPT-4o  
- Llama-3.3-70B-Instruct
- DeepSeek-V3

## Data Flow
1. Load and preprocess Soto BFI-2 data
2. Drop participants with missing responses on the selected items (reverse coding is not applied here; it is deferred to the analysis stage)
3. Map numeric responses to **Likert format** descriptions (e.g., "Is outgoing, sociable: 5")
4. Generate personality simulation prompts
5. Run simulations across multiple models
6. Save results for analysis

## Key Difference from Expanded Format
- Uses concise Likert-style descriptions instead of expanded narrative descriptions
- Personality assignment performance is expected to differ from the expanded format; compare the two sets of results after running the analysis

## Next Steps
After running this notebook, use `study_2_likert_analysis.ipynb` for comprehensive analysis of the results.

In [1]:
import pandas as pd
import sys
from pathlib import Path

# Add shared modules to path
sys.path.append('../shared')

from simulation_utils import (
    SimulationConfig, 
    run_bfi_to_minimarker_simulation,
    retry_failed_participants
)
from schema_bfi2 import likert_scale
from mini_marker_prompt import get_likert_prompt

## Data Loading and Preprocessing

In [2]:
# Load the Soto BFI-2 dataset
data_path = Path('../../raw_data/Soto_data.xlsx')
if not data_path.exists():
    print(f"Data file not found at {data_path}")
    print("Please ensure the raw_data/Soto_data.xlsx file exists in the project root")
    raise FileNotFoundError(f"Data file not found: {data_path}")

data = pd.read_excel(data_path, sheet_name='data')
print(f"Loaded data shape: {data.shape}")
data.head()

Loaded data shape: (470, 704)


Unnamed: 0,case_id,age,sex,ethnicity,rel_acquaintance,rel_friend,rel_roommate,rel_boygirlfriend,rel_relative,rel_other,...,tneo_n3_dep,tneo_n4_sel,tneo_n5_imp,tneo_n6_vul,tneo_o1_fan,tneo_o2_aes,tneo_o3_fee,tneo_o4_act,tneo_o5_ide,tneo_o6_val
0,1,27.0,M,2.0,,,,,,,...,51.25,40.181818,64.0,55.102041,46.639344,46.969697,66.7,57.065217,41.984127,58.039216
1,2,26.0,M,3.0,,,,,,,...,69.632353,60.636364,66.272727,65.306122,54.836066,56.439394,51.7,51.630435,51.904762,45.784314
2,3,24.0,F,4.0,,,,,,,...,60.441176,74.272727,54.909091,65.306122,75.327869,56.439394,56.7,40.76087,51.904762,58.039216
3,4,33.0,M,3.0,,1.0,,,,,...,67.794118,58.363636,64.0,52.55102,54.836066,50.757576,36.7,65.217391,63.809524,58.039216
4,5,23.0,F,5.0,,,,,,,...,62.279412,67.454545,41.272727,60.204082,50.737705,48.863636,49.2,46.195652,38.015873,38.431373


In [3]:
tda_columns = [f"tda{i}" for i in range(1, 41)]
sbfi_columns = [f"bfi{i}" for i in range(1, 61)]
selected_columns = tda_columns + sbfi_columns

print(f"Original data shape: {data.shape}")

# Remove rows with missing values in the selected columns
data = data.dropna(subset=selected_columns)
print(f"Data shape after removing missing values: {data.shape}")

Original data shape: (470, 704)
Data shape after removing missing values: (438, 704)
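
Before dropping rows, it can be useful to see which items account for the missing responses. The following is a minimal sketch on a toy frame; the values are made up for illustration and `required` stands in for `selected_columns`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are illustrative only).
toy = pd.DataFrame({
    "bfi1": [5, np.nan, 3, 4],
    "bfi2": [2, 4, np.nan, 1],
    "tda1": [7, 8, 9, np.nan],
})
required = ["bfi1", "bfi2"]  # plays the role of selected_columns

# Count missing responses per required column before dropping anything.
missing_per_column = toy[required].isna().sum()

# Keep only rows that are complete on the required columns.
complete = toy.dropna(subset=required)
dropped = len(toy) - len(complete)
print(missing_per_column.to_dict(), f"dropped {dropped} of {len(toy)} rows")
```

Note that a row with a gap in a column outside `required` is kept, which is why the column list passed to `dropna(subset=...)` matters.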


In [4]:
# NOTE: For Likert format, we do NOT apply reverse coding to the personality descriptions
# The reverse coding is only applied during analysis, not during personality assignment
# This matches the original study workflow exactly

print("Likert format: Using original BFI-2 scores without reverse coding for personality descriptions")

Likert format: Using original BFI-2 scores without reverse coding for personality descriptions
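
For reference, reverse coding on a 1-5 scale maps a score `x` to `6 - x`. The sketch below shows what that step could look like at analysis time. The helper `reverse_code` and the column list passed to it are hypothetical; the actual reverse-keyed items should come from the BFI-2 scoring key.

```python
import pandas as pd

SCALE_MIN, SCALE_MAX = 1, 5

def reverse_code(scores: pd.DataFrame, reverse_keyed: list) -> pd.DataFrame:
    """Return a copy with the listed columns flipped on a 1-5 scale (x -> 6 - x)."""
    out = scores.copy()
    out[reverse_keyed] = (SCALE_MIN + SCALE_MAX) - out[reverse_keyed]
    return out

# Illustrative data; the choice of reverse-keyed column is an assumption.
toy_scores = pd.DataFrame({"bfi1": [5, 2], "bfi3": [2, 4]})
toy_reversed = reverse_code(toy_scores, reverse_keyed=["bfi3"])
print(toy_reversed)
```

Working on a copy keeps the raw scores available for building the prompt, which matches the workflow here: the model sees the original responses, and only the analysis uses the flipped values.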


In [5]:
# Map numeric values to Likert format descriptions
def convert_values_to_string(series, mapping):
    # Copy the series to not alter the original data
    series_converted = series.copy()
    # Apply the string mapping
    if series.name in mapping:
        series_converted = series_converted.apply(lambda x: f"{mapping[series.name]} {x};")
    return series_converted

# Apply the mapping column by column (DataFrame.apply defaults to axis=0)
mapped_data = data[sbfi_columns].apply(lambda col: convert_values_to_string(col, likert_scale))
# Join the 60 per-item strings into one description per participant
mapped_data['combined_bfi2'] = mapped_data[sbfi_columns].apply(lambda row: ' '.join(row), axis=1)

# Add combined description to original data
data['combined_bfi2'] = mapped_data['combined_bfi2']

print("Likert format personality descriptions created successfully")
print(f"Final data shape: {data.shape}")

Likert format personality descriptions created successfully
Final data shape: (438, 705)
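
The mapping step can be checked on a small example. This sketch assumes `likert_scale` is a dict from column name to an item label ending in a colon, which is what the sample output suggests; `toy_scale` and `describe_row` are hypothetical names used only here.

```python
import pandas as pd

# Hypothetical two-item stand-in for likert_scale (assumed shape: column -> "label:").
toy_scale = {
    "bfi1": "Is outgoing, sociable:",
    "bfi2": "Is compassionate, has a soft heart:",
}
toy = pd.DataFrame({"bfi1": [5, 1], "bfi2": [4, 3]})

def describe_row(row: pd.Series, scale: dict) -> str:
    """Join 'label: score;' fragments for one participant, in scale order."""
    return " ".join(f"{label} {int(row[col])};" for col, label in scale.items())

# One description string per participant (row-wise apply).
descriptions = toy.apply(lambda row: describe_row(row, toy_scale), axis=1)
print(descriptions.iloc[0])
```

Casting each score with `int` avoids descriptions such as `5.0` when a column has been read as floating point.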


In [6]:
# Preview a personality description
print("Sample Likert format personality description:")
print(data.iloc[0]['combined_bfi2'][:500] + "...")

Sample Likert format personality description:
Is outgoing, sociable: 5; Is compassionate, has a soft heart: 5; Tends to be disorganized: 2; Is relaxed, handles stress well: 3; Has few artistic interests: 2; Has an assertive personality: 4; Is respectful, treats others with respect: 5; Tends to be lazy: 4; Stays optimistic after experiencing a setback: 5; Is curious about many different things: 2; Rarely feels excited or eager: 2; Tends to find fault with others: 2; Is dependable, steady: 5; Is moody, has up and down mood swings: 4; Is inven...


In [7]:
# Test prompt generation with first participant
first_participant = data.iloc[0]
sample_prompt = get_likert_prompt(first_participant['combined_bfi2'])

print("="*80)
print("COMPLETE LIKERT FORMAT PROMPT SENT TO LLM")
print("="*80)
print(sample_prompt)
print("="*80)
print(f"Prompt length: {len(sample_prompt)} characters")
print(f"Prompt word count: {len(sample_prompt.split())} words")


COMPLETE LIKERT FORMAT PROMPT SENT TO LLM
### Context ###
You are participating in a personality psychology study. You have been assigned with personality traits.

### Your Assigned Personality ### 
The number indicates the extent to which you agree or disagree with that statement. 1 means 'Disagree Strongly', 3 means 'Neutral', and 5 means 'Agree Strongly'.

Is outgoing, sociable: 5; Is compassionate, has a soft heart: 5; Tends to be disorganized: 2; Is relaxed, handles stress well: 3; Has few artistic interests: 2; Has an assertive personality: 4; Is respectful, treats others with respect: 5; Tends to be lazy: 4; Stays optimistic after experiencing a setback: 5; Is curious about many different things: 2; Rarely feels excited or eager: 2; Tends to find fault with others: 2; Is dependable, steady: 5; Is moody, has up and down mood swings: 4; Is inventive, finds clever ways to do things: 4; Tends to be quiet: 2; Feels little sympathy for others: 1; Is systematic, likes to keep things in

## Multi-Model Simulation Configuration

In [8]:
# Configuration for different models and temperatures
# List every model to test in models_to_test. Full set:
# models_to_test = ['openai-gpt-3.5-turbo-0125', 'gpt-4', 'gpt-4o', 'llama', 'deepseek']
# The run recorded in this notebook used a single model:
models_to_test = ['openai-gpt-3.5-turbo-0125']
temperatures = [1]  
batch_size = 25  # Smaller batch size for stability across different APIs

# Create participant data list from DataFrame
participants_data = data.to_dict('records')
print(f"Prepared {len(participants_data)} participants for simulation")

Prepared 438 participants for simulation


## Run Simulations for All Models

In [9]:
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from datetime import datetime
import time

# Thread-safe logging
log_lock = threading.Lock()

def safe_print(message, prefix="INFO"):
    """Thread-safe printing with timestamp and prefix"""
    timestamp = datetime.now().strftime("%H:%M:%S")
    with log_lock:
        print(f"[{timestamp}] {prefix}: {message}")

def run_simulation(model, temperature):
    simulation_id = f"{model}_temp{temperature}"
    
    # Start message
    safe_print(f"Starting simulation: {model} (temp={temperature})", "START")
    
    config = SimulationConfig(
        model=model,
        temperature=temperature,
        batch_size=batch_size,
        max_workers=10
    )
    
    start_time = time.time()
    
    try:
        results = run_bfi_to_minimarker_simulation(
            participants_data=participants_data,
            config=config,
            output_dir="study_2_likert_results"
        )
        
        # Check for failures
        failed_count = sum(1 for r in results if isinstance(r, dict) and 'error' in r)
        duration = time.time() - start_time
        
        if failed_count > 0:
            safe_print(f"Completed {simulation_id} in {duration:.1f}s - WARNING: {failed_count} participants failed", "WARN")
        else:
            safe_print(f"Completed {simulation_id} in {duration:.1f}s - All participants successful", "SUCCESS")
        
        return (simulation_id, results)
        
    except Exception as e:
        duration = time.time() - start_time
        safe_print(f"Failed {simulation_id} after {duration:.1f}s - Error: {str(e)}", "ERROR")
        return (simulation_id, {"error": str(e)})

# Main execution
print("="*80)
print("STARTING PARALLEL SIMULATIONS")
print(f"Models: {models_to_test}")
print(f"Temperatures: {temperatures}")
print(f"Total combinations: {len(models_to_test) * len(temperatures)}")
print("="*80)

all_results = {}
start_time = time.time()

# Use ThreadPoolExecutor for parallel execution
with ThreadPoolExecutor(max_workers=len(models_to_test)) as executor:
    # Submit all jobs
    futures = [
        executor.submit(run_simulation, model, temperature)
        for model in models_to_test
        for temperature in temperatures
    ]
    
    # Collect results as they complete
    completed_count = 0
    total_jobs = len(futures)
    
    for future in as_completed(futures):
        key, result = future.result()
        all_results[key] = result
        completed_count += 1
        
        # Progress update
        safe_print(f"Progress: {completed_count}/{total_jobs} simulations completed", "PROGRESS")

total_duration = time.time() - start_time

# Final summary
print("\n" + "="*80)
print("SIMULATION SUMMARY")
print("="*80)
print(f"Total time: {total_duration:.1f} seconds")
print(f"Completed simulations: {len(all_results)}")

# Categorize results
successful = []
failed = []

for key, result in all_results.items():
    if isinstance(result, dict) and 'error' in result:
        failed.append(key)
    else:
        # Check for partial failures
        if isinstance(result, list):
            failed_participants = sum(1 for r in result if isinstance(r, dict) and 'error' in r)
            if failed_participants > 0:
                print(f"  {key}: SUCCESS (with {failed_participants} failed participants)")
            else:
                print(f"  {key}: SUCCESS")
            successful.append(key)
        else:
            successful.append(key)

if failed:
    print(f"\nFailed simulations ({len(failed)}):")
    for key in failed:
        print(f"  {key}: {all_results[key].get('error', 'Unknown error')}")

print("="*80)

STARTING PARALLEL SIMULATIONS
Models: ['openai-gpt-3.5-turbo-0125']
Temperatures: [1]
Total combinations: 1
[15:44:32] START: Starting simulation: openai-gpt-3.5-turbo-0125 (temp=1)
Starting simulation for 438 participants using openai-gpt-3.5-turbo-0125
Temperature: 1, Batch size: 25
Processing participants 0 to 24
Completed batch 0 to 24
Processing participants 25 to 49
Completed batch 25 to 49
Processing participants 50 to 74
Completed batch 50 to 74
Processing participants 75 to 99
Completed batch 75 to 99
Processing participants 100 to 124
Completed batch 100 to 124
Processing participants 125 to 149
Completed batch 125 to 149
Processing participants 150 to 174
OpenAI API error: Request timed out.
Error in get_personality_response: Request timed out.
OpenAI API error: Request timed out.
Error in get_personality_response: Request timed out.
Completed batch 150 to 174
Processing participants 175 to 199
JSON parsing failed (attempt #1): Failed to parse JSON:
{
"Bashful": 2,
"Bold": 4
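
The log above shows timeouts and one response that failed JSON parsing on its first attempt. How `simulation_utils` recovers from this is not shown in this notebook. One common approach is to extract the brace-delimited span from the reply and treat anything unparseable as a retryable failure, sketched here with a hypothetical helper.

```python
import json
import re

def extract_json_object(text: str) -> dict:
    """Parse the span from the first '{' to the last '}' in `text`.

    Raises ValueError if no such span exists or it is not valid JSON,
    so the caller can decide whether to retry the request.
    """
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON in response: {exc}") from exc

# Illustrative reply with extra text around the JSON payload.
reply = 'Sure! Here are the ratings:\n{"Bashful": 2, "Bold": 4}\nLet me know.'
ratings = extract_json_object(reply)
print(ratings)
```

A truncated reply with no closing brace raises `ValueError` instead of returning partial ratings, which keeps incomplete responses out of the results.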

## Retry Failed Participants (if any)

In [10]:
# Retry any failed participants
for key, results in all_results.items():
    if isinstance(results, list):
        failed_count = sum(1 for r in results if isinstance(r, dict) and 'error' in r)
        if failed_count > 0:
            print(f"Retrying {failed_count} failed participants for {key}")
            
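            # Extract model and temperature from key (split on the last '_temp' only,
            # so a model name that itself contains '_temp' is still parsed correctly)
            model, temperature_str = key.rsplit('_temp', 1)
            temperature = float(temperature_str)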
            
            config = SimulationConfig(
                model=model,
                temperature=temperature,
                batch_size=batch_size
            )
            
            updated_results = retry_failed_participants(
                results=results,
                participants_data=participants_data,
                prompt_generator=get_likert_prompt,  # Use likert-specific prompt function
                config=config,
                personality_key='combined_bfi2'
            )
            
            all_results[key] = updated_results
            
            # Save updated results
            from simulation_utils import save_simulation_results
            save_simulation_results(updated_results, "study_2_likert_results", "bfi_to_minimarker_likert_retried", config)

print("Retry process completed")

Retry process completed
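
The retry cell recovers the model and temperature from the result key, so key construction and parsing need to stay in sync. A minimal sketch of the round trip, using hypothetical helper names and placeholder model names:

```python
def make_simulation_id(model: str, temperature: float) -> str:
    """Build the key used to index results, e.g. 'model-a_temp1'."""
    return f"{model}_temp{temperature}"

def parse_simulation_id(simulation_id: str) -> tuple:
    """Split on the last '_temp' so model names containing '_temp' survive."""
    model, temperature_str = simulation_id.rsplit("_temp", 1)
    return model, float(temperature_str)

# Placeholder model names, not real model identifiers.
for model, temperature in [("model-a", 1), ("my_temp_model", 0.5)]:
    key = make_simulation_id(model, temperature)
    print(key, "->", parse_simulation_id(key))
```

The parsed temperature is always a float, so a key built from the integer `1` comes back as `1.0`. If output file names embed the temperature, that difference may be worth keeping in mind.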


## Results Summary

In [11]:
# Analyze results summary
print("Likert Format Simulation Results Summary:")
print("=" * 50)

for key, results in all_results.items():
    if isinstance(results, list):
        total_participants = len(results)
        successful = sum(1 for r in results if not (isinstance(r, dict) and 'error' in r))
        failed = total_participants - successful
        # Guard against an empty result list to avoid division by zero
        success_rate = (successful / total_participants) * 100 if total_participants else 0.0
        
        print(f"{key}:")
        print(f"  Total: {total_participants}, Successful: {successful}, Failed: {failed}")
        print(f"  Success Rate: {success_rate:.1f}%")
        print()
    else:
        print(f"{key}: FAILED - {results.get('error', 'Unknown error')}")
        print()

Likert Format Simulation Results Summary:
openai-gpt-3.5-turbo-0125_temp1:
  Total: 438, Successful: 438, Failed: 0
  Success Rate: 100.0%
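
The same summary can be computed by a small reusable function. This sketch assumes, as the cells above do, that a failed participant is represented by a dict containing an `'error'` key; `summarize` is a hypothetical helper.

```python
def summarize(results: list) -> dict:
    """Count successes and failures in a list of per-participant results."""
    total = len(results)
    failed = sum(1 for r in results if isinstance(r, dict) and "error" in r)
    successful = total - failed
    # An empty list yields a 0.0 success rate rather than a division error.
    success_rate = 100.0 * successful / total if total else 0.0
    return {
        "total": total,
        "successful": successful,
        "failed": failed,
        "success_rate": success_rate,
    }

# Illustrative results: one parsed response and one failure.
toy_results = [{"Bashful": 2, "Bold": 4}, {"error": "Request timed out."}]
summary = summarize(toy_results)
print(summary)
```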



## Save Preprocessed Data

In [12]:
# Save the preprocessed data for reference
output_path = Path('study_2_likert_results')
output_path.mkdir(exist_ok=True)

data.to_csv(output_path / 'study2_likert_preprocessed_data.csv', index=False)
print(f"Preprocessed data saved to {output_path / 'study2_likert_preprocessed_data.csv'}")

print("\n" + "="*60)
print("LIKERT FORMAT SIMULATION COMPLETE!")
print("\nNext steps:")
print("1. Run study_2_likert_analysis.ipynb for comprehensive analysis")
print("2. Results are saved in study_2_likert_results/ directory")
print("3. Compare with expanded format results for format comparison")
print("4. Preprocessed data available for validation")
print("="*60)

Preprocessed data saved to study_2_likert_results/study2_likert_preprocessed_data.csv

LIKERT FORMAT SIMULATION COMPLETE!

Next steps:
1. Run study_2_likert_analysis.ipynb for comprehensive analysis
2. Results are saved in study_2_likert_results/ directory
3. Compare with expanded format results for format comparison
4. Preprocessed data available for validation
