# Table 5: Pearson Correlation between EPQR-A and Big Five Inventory

This notebook replicates **Table 5** from the paper, which presents Pearson correlations between EPQR-A personality traits and Big Five Inventory (BFI) traits for the **Base** population, computed separately for each LLM.

## Overview

**Table 5** examines the relationship between two personality assessment frameworks:
- **EPQR-A** (Eysenck Personality Questionnaire Revised - Abbreviated): 4 dimensions (E, N, P, L)
- **Big Five Inventory (BFI)**: 5 dimensions (E, N, A, C, O)

The analysis computes **Pearson correlation coefficients** between EPQR-A and BFI trait scores, pairing the two questionnaires by personality ID, to show how the frameworks relate to each other when applied to AI-generated personas.
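
As a minimal sketch of what "pairing by personality ID" means, the snippet below computes Pearson's r by hand on a toy example. The dictionaries `epqra_e` and `bfi_e` and their values are hypothetical and are not taken from the dataset; the notebook itself relies on `pingouin` for the actual computation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical trait scores keyed by personality_id (illustrative values only)
epqra_e = {1: 0, 2: 3, 3: 6, 4: 2}
bfi_e = {1: 10, 2: 22, 3: 38, 4: 18, 5: 30}

# Keep only personalities that answered both questionnaires
common_ids = sorted(epqra_e.keys() & bfi_e.keys())
x = [epqra_e[i] for i in common_ids]
y = [bfi_e[i] for i in common_ids]

r = pearson_r(x, y)
print(f"n={len(common_ids)}, r={r:.3f}")
```

Personality 5 is dropped because it has no EPQR-A score, which is the same intersection logic the correlation function applies later in the notebook.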

## Statistical Significance

Correlations are marked based on statistical significance:
- **Underline**: p < 0.05
- **Italic**: p < 0.01
- **Bold**: p < 0.001

## Personality Dimensions

### EPQR-A (Rows)
- **E**: Extraversion (social energy, outgoingness)
- **N**: Neuroticism (emotional instability, anxiety)
- **P**: Psychoticism (tough-mindedness, non-conformity)
- **L**: Lie scale (social desirability bias)

### Big Five (Columns)
- **E**: Extraversion (sociability, assertiveness)
- **N**: Neuroticism (emotional instability)
- **A**: Agreeableness (compassion, cooperation)
- **C**: Conscientiousness (organization, responsibility)
- **O**: Openness (creativity, curiosity)

## Expected Correlations

Based on personality psychology literature, we expect:
- **EPQR-A E ↔ BFI E**: Strong positive correlation (both measure extraversion)
- **EPQR-A N ↔ BFI N**: Strong positive correlation (both measure neuroticism)
- **EPQR-A P ↔ BFI A**: Negative correlation (psychoticism opposes agreeableness)
- **EPQR-A L ↔ BFI C**: Possible positive correlation (social desirability relates to conscientiousness)
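
These expectations can be checked against a computed table once the significance markers are stripped from each cell. The sketch below is one possible way to do that; `parse_cell` and `expected_sign` are hypothetical helpers, not part of the notebook or the `evaluations` package. The four cell strings are copied from the Claude-3.5-s table in this notebook's output.

```python
import re

# Matches the Markdown/HTML significance markers used in the tables
_MARKERS = re.compile(r"\*\*|\*|</?u>")


def parse_cell(cell):
    """Return the numeric r from a formatted cell such as '**0.981**'."""
    return float(_MARKERS.sub("", cell))


# (EPQR-A trait, BFI trait) -> formatted cell, from the Claude-3.5-s output
claude = {
    ("E", "E"): "**0.981**",
    ("N", "N"): "**0.926**",
    ("P", "A"): "-0.052",
    ("L", "C"): "**0.250**",
}

# Expected direction of each correlation: +1 positive, -1 negative
expected_sign = {("E", "E"): 1, ("N", "N"): 1, ("P", "A"): -1, ("L", "C"): 1}

for pair, cell in claude.items():
    r = parse_cell(cell)
    matches = (r > 0) == (expected_sign[pair] > 0)
    print(pair, r, "expected sign" if matches else "opposite sign")
```

The check looks only at the sign, not at significance: the Claude-3.5-s P ↔ A value of -0.052 has the expected direction but carries no significance marker. `parse_cell` assumes a numeric cell and would raise `ValueError` on placeholder values such as `n<5` or `error`.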

## Setup and Data Loading

Import required packages and load questionnaire data for both EPQR-A and Big Five.

In [1]:
# Standard libraries
import pandas as pd
import numpy as np

# Statistical analysis
import pingouin as pg  # For correlation with p-values

# Database connection
from personas_backend.db.db_handler import DatabaseHandler
from personas_backend import ACTIVE_SCHEMA

# Evaluations package
from evaluations import data_access

# Configuration
SCHEMA = "personality_trap"
print(f"Using schema: {SCHEMA}")
print(f"Active schema: {ACTIVE_SCHEMA}")

Using schema: personality_trap
Active schema: test_validation_schema


In [2]:
# Connect to database
db_handler = DatabaseHandler()
conn = db_handler.connection

print(f"✅ Connected to database")
conn

✅ Connected to database


Engine(postgresql://personas:***@localhost:5432/personas)

## Load Questionnaire Data

Load both EPQR-A and Big Five questionnaire responses using the evaluations package.
We'll filter to only include the **Base** population (no personality manipulation conditions).

In [3]:
# Load all questionnaire data from experiments
# This includes both EPQR-A and Big Five responses
with conn.connect() as connection:
    questionnaire_data = data_access.load_questionnaire_experiments(
        connection, 
        schema=SCHEMA,
        questionnaires=["epqra", "bigfive"]
    )

print(f"Total questionnaire records: {len(questionnaire_data)}")
print(f"Questionnaires: {sorted(questionnaire_data['questionnaire'].unique())}")
print(f"Models: {sorted(questionnaire_data['model'].unique())}")
print(f"Populations: {sorted(questionnaire_data['population'].unique())}")

questionnaire_data.head()

Loading questionnaire data from personality_trap.experiments_evals...
Loaded 515424 questionnaire records
Models: ['GPT-3.5' 'GPT-4o' 'Claude-3.5-s' 'Llama3.2-3B' 'Llama3.1-70B']
Populations: ['gpt35' 'gpt4o' 'maxN_gpt4o' 'maxP_gpt4o' 'claude35sonnet' 'llama323B'
 'llama3170B' 'maxN_claude35sonnet' 'maxP_claude35sonnet' 'maxN_gpt35'
 'maxP_gpt35' 'maxN_llama3170B' 'maxP_llama3170B' 'maxN_llama323B'
 'maxP_llama323B' 'spain826']
Total questionnaire records: 515424
Questionnaires: ['bigfive', 'epqra']
Models: ['anthropic.claude-3-5-sonnet-20240620-v1:0', 'eu.meta.llama3-2-3b-instruct-v1:0', 'gpt-3.5-turbo-0125', 'gpt-4o-2024-11-20', 'us.meta.llama3-1-70b-instruct-v1:0']
Populations: ['borderline_maxN_claude35sonnet', 'borderline_maxN_gpt35', 'borderline_maxN_gpt4o', 'borderline_maxN_llama3170B', 'borderline_maxN_llama323B', 'borderline_maxP_claude35sonnet', 'borderline_maxP_gpt35', 'borderline_maxP_gpt4o', 'borderline_maxP_llama3170B', 'borderline_maxP_llama323B', 'generated_claude35sonn

Unnamed: 0,experiments_group_id,model_provider,model,questionnaire,population,personality_id,repeated,experiment_id,question_number,answer,category,key,eval,model_clean,population_mapped,population_display
0,307,openai,gpt-3.5-turbo-0125,epqra,generated_gpt35_spain826,1,0,96033,1,1,N,1.0,1,GPT-3.5,gpt35,Base
1,307,openai,gpt-3.5-turbo-0125,epqra,generated_gpt35_spain826,1,0,96033,2,0,E,1.0,0,GPT-3.5,gpt35,Base
2,307,openai,gpt-3.5-turbo-0125,epqra,generated_gpt35_spain826,1,0,96033,3,1,P,0.0,0,GPT-3.5,gpt35,Base
3,307,openai,gpt-3.5-turbo-0125,epqra,generated_gpt35_spain826,1,0,96033,4,0,E,1.0,0,GPT-3.5,gpt35,Base
4,307,openai,gpt-3.5-turbo-0125,epqra,generated_gpt35_spain826,1,0,96033,5,0,L,0.0,1,GPT-3.5,gpt35,Base


In [4]:
# Filter to only Base population (no personality manipulation)
# Note: The experiments_evals view uses 'population_display' column with capitalized values
base_data = questionnaire_data[
    questionnaire_data['population_display'] == 'Base'
].copy()

print(f"Base population records: {len(base_data)}")
print(f"Models in Base: {sorted(base_data['model_clean'].unique())}")
print(f"Unique personality IDs: {base_data['personality_id'].nunique()}")

# Check data completeness
base_data.groupby(['model_clean', 'questionnaire'], as_index=False).agg(
    personality_count=('personality_id', 'nunique'),
    experiment_count=('experiment_id', 'nunique')
)

Base population records: 317184
Models in Base: ['Claude-3.5-s', 'GPT-3.5', 'GPT-4o', 'Llama3.1-70B', 'Llama3.2-3B']
Unique personality IDs: 826


Unnamed: 0,model_clean,questionnaire,personality_count,experiment_count
0,Claude-3.5-s,bigfive,826,826
1,Claude-3.5-s,epqra,826,826
2,GPT-3.5,bigfive,826,826
3,GPT-3.5,epqra,826,826
4,GPT-4o,bigfive,826,1652
5,GPT-4o,epqra,826,826
6,Llama3.1-70B,bigfive,826,826
7,Llama3.1-70B,epqra,826,826
8,Llama3.2-3B,bigfive,826,826
9,Llama3.2-3B,epqra,826,826
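
In the table above, GPT-4o has 1652 Big Five experiments for 826 personalities, while every other combination has one experiment per personality. The notebook does not say why. A sketch along the following lines could list which personalities have more than one experiment; the `toy` frame is hypothetical and only mimics the relevant columns of `base_data`.

```python
import pandas as pd

# Hypothetical miniature of base_data: personality 2 has two Big Five runs
toy = pd.DataFrame({
    "model_clean": ["GPT-4o"] * 4,
    "questionnaire": ["bigfive", "bigfive", "bigfive", "epqra"],
    "personality_id": [1, 2, 2, 1],
    "experiment_id": [10, 11, 12, 20],
})

# Count distinct experiments per (model, questionnaire, personality)
runs = (
    toy.groupby(["model_clean", "questionnaire", "personality_id"])["experiment_id"]
    .nunique()
    .rename("n_experiments")
    .reset_index()
)

# Keep only personalities that were run more than once
repeated = runs[runs["n_experiments"] > 1]
print(repeated)
```

Running the same grouping on `base_data` would show whether the extra GPT-4o experiments are spread evenly (two per personality) or concentrated in a subset.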


## Compute Trait Scores

For each personality, compute one score per trait dimension by summing the `eval` column over that trait's items:
- **EPQR-A**: `eval` is binary (in the sample rows above it is 1 when `answer` matches `key`, 0 otherwise), so each score counts the keyed answers in a category (E, N, P, L)
- **Big Five**: item scores on the 1-5 scale are summed for each trait (E, N, A, C, O)

In [5]:
# Compute trait scores by summing eval scores per personality and category
# Note: Using 'model_clean' and 'population_mapped' from experiments_evals view
trait_scores = base_data.groupby(
    ['questionnaire', 'model_clean', 'population_mapped', 'personality_id', 'experiment_id', 'category'],
    as_index=False
).agg(
    score=('eval', 'sum')  # Sum of correct/keyed answers
)

# Rename for cleaner code in subsequent cells
trait_scores.rename(columns={
    'model_clean': 'model',
    'population_mapped': 'population'
}, inplace=True)

print(f"Trait scores computed: {len(trait_scores)} records")
print(f"\nSample EPQR-A scores:")
display(trait_scores[trait_scores['questionnaire'] == 'epqra'].head(10))

print(f"\nSample Big Five scores:")
display(trait_scores[trait_scores['questionnaire'] == 'bigfive'].head(10))

Trait scores computed: 41300 records

Sample EPQR-A scores:


Unnamed: 0,questionnaire,model,population,personality_id,experiment_id,category,score
24780,epqra,Claude-3.5-s,claude35sonnet,1,108871,E,0
24781,epqra,Claude-3.5-s,claude35sonnet,1,108871,L,6
24782,epqra,Claude-3.5-s,claude35sonnet,1,108871,N,6
24783,epqra,Claude-3.5-s,claude35sonnet,1,108871,P,0
24784,epqra,Claude-3.5-s,claude35sonnet,2,108872,E,0
24785,epqra,Claude-3.5-s,claude35sonnet,2,108872,L,6
24786,epqra,Claude-3.5-s,claude35sonnet,2,108872,N,4
24787,epqra,Claude-3.5-s,claude35sonnet,2,108872,P,0
24788,epqra,Claude-3.5-s,claude35sonnet,3,108873,E,0
24789,epqra,Claude-3.5-s,claude35sonnet,3,108873,L,6



Sample Big Five scores:


Unnamed: 0,questionnaire,model,population,personality_id,experiment_id,category,score
0,bigfive,Claude-3.5-s,claude35sonnet,1,151003,A,35
1,bigfive,Claude-3.5-s,claude35sonnet,1,151003,C,44
2,bigfive,Claude-3.5-s,claude35sonnet,1,151003,E,10
3,bigfive,Claude-3.5-s,claude35sonnet,1,151003,N,38
4,bigfive,Claude-3.5-s,claude35sonnet,1,151003,O,27
5,bigfive,Claude-3.5-s,claude35sonnet,2,151004,A,36
6,bigfive,Claude-3.5-s,claude35sonnet,2,151004,C,45
7,bigfive,Claude-3.5-s,claude35sonnet,2,151004,E,10
8,bigfive,Claude-3.5-s,claude35sonnet,2,151004,N,32
9,bigfive,Claude-3.5-s,claude35sonnet,2,151004,O,25


## Correlation Computation Function

Create a function to compute Pearson correlations between EPQR-A and Big Five traits for a specific model.
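
The function pivots trait scores with `pivot_table`, whose default aggregation is the mean. When a personality has several experiments for the same questionnaire, its scores are therefore averaged rather than kept as separate rows. The toy frame below is hypothetical and only illustrates that behavior.

```python
import pandas as pd

# Hypothetical trait scores: personality 1 has two experiments, personality 2 has one
toy_scores = pd.DataFrame({
    "personality_id": [1, 1, 2],
    "experiment_id": [100, 101, 102],
    "category": ["E", "E", "E"],
    "score": [10, 14, 20],
})

# Default aggfunc is "mean", so duplicate (personality_id, category) pairs are averaged
pivot = toy_scores.pivot_table(
    index="personality_id",
    columns="category",
    values="score",
)
print(pivot)
```

Personality 1 ends up with a single E score of 12.0, the mean of 10 and 14. If a different policy is wanted (for example, keeping only the first run), the data would need to be deduplicated before pivoting.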

In [6]:
def compute_epqra_bigfive_correlation(trait_scores_df, model_name):
    """
    Compute Pearson correlations between EPQR-A and Big Five traits for a specific model.
    
    Args:
        trait_scores_df: DataFrame with computed trait scores
        model_name: Display name as stored in the 'model' column (e.g., 'GPT-4o', 'Claude-3.5-s')
    
    Returns:
        DataFrame: Correlation matrix (EPQR-A rows × Big Five columns) with formatted values
    """
    # Filter data for this model
    model_data = trait_scores_df[trait_scores_df['model'] == model_name].copy()
    
    # Split into EPQR-A and Big Five
    epqra_data = model_data[model_data['questionnaire'] == 'epqra'].copy()
    bigfive_data = model_data[model_data['questionnaire'] == 'bigfive'].copy()
    
    # Get common personality IDs (must have both questionnaires)
    common_ids = set(epqra_data['personality_id']).intersection(
        set(bigfive_data['personality_id'])
    )
    
    print(f"Model: {model_name}")
    print(f"  Common personality IDs: {len(common_ids)}")
    
    if len(common_ids) < 5:
        print(f"  ⚠️  Insufficient data (need at least 5 samples)")
        return None
    
    # Pivot to get category scores for each personality
    epqra_pivot = epqra_data.pivot_table(
        index='personality_id',
        columns='category',
        values='score'
    )
    
    bigfive_pivot = bigfive_data.pivot_table(
        index='personality_id',
        columns='category',
        values='score'
    )
    
    # Define trait order
    epqra_traits = ['E', 'N', 'P', 'L']
    bigfive_traits = ['E', 'N', 'A', 'C', 'O']
    
    # Initialize correlation matrix
    corr_matrix = pd.DataFrame(
        index=epqra_traits,
        columns=bigfive_traits,
        dtype=str
    )
    
    # Compute correlations for each trait pair
    for epqra_trait in epqra_traits:
        if epqra_trait not in epqra_pivot.columns:
            continue
        
        for bigfive_trait in bigfive_traits:
            if bigfive_trait not in bigfive_pivot.columns:
                continue
            
            # Get values for both traits, matching by personality_id
            common_data = pd.DataFrame({
                'epqra': epqra_pivot.loc[list(common_ids), epqra_trait],
                'bigfive': bigfive_pivot.loc[list(common_ids), bigfive_trait]
            }).dropna()
            
            if len(common_data) < 5:
                corr_matrix.loc[epqra_trait, bigfive_trait] = "n<5"
                continue
            
            try:
                # Calculate Pearson correlation using pingouin
                corr_result = pg.corr(
                    common_data['epqra'], 
                    common_data['bigfive'],
                    method='pearson'
                )
                r = corr_result['r'].values[0]
                p = corr_result['p-val'].values[0]
                
                # Format with significance markers
                # Bold (p < 0.001), Italic (p < 0.01), Underline (p < 0.05)
                if p < 0.001:
                    formatted = f"**{r:.3f}**"
                elif p < 0.01:
                    formatted = f"*{r:.3f}*"
                elif p < 0.05:
                    formatted = f"<u>{r:.3f}</u>"
                else:
                    formatted = f"{r:.3f}"
                
                corr_matrix.loc[epqra_trait, bigfive_trait] = formatted
                
            except Exception as e:
                print(f"  Error computing {epqra_trait} × {bigfive_trait}: {e}")
                corr_matrix.loc[epqra_trait, bigfive_trait] = "error"
    
    return corr_matrix

## Compute Correlations for All Models

Generate correlation tables for each model in the dataset.

In [7]:
# Get list of models
models = sorted(trait_scores['model'].unique())

print(f"Computing correlations for {len(models)} models...\n")

# Store results
correlation_tables = {}

for model in models:
    print("=" * 80)
    corr_table = compute_epqra_bigfive_correlation(trait_scores, model)
    
    if corr_table is not None:
        correlation_tables[model] = corr_table
        print(f"\n{model.upper()} - EPQR-A × Big Five Correlations:")
        display(corr_table)
    print()

Computing correlations for 5 models...

Model: Claude-3.5-s
  Common personality IDs: 826

CLAUDE-3.5-S - EPQR-A × Big Five Correlations:



Unnamed: 0,E,N,A,C,O
E,**0.981**,**-0.412**,**0.864**,**-0.677**,**0.430**
N,**-0.354**,**0.926**,**-0.366**,**-0.125**,**-0.235**
P,0.038,<u>-0.072</u>,-0.052,**-0.182**,**0.503**
L,**-0.141**,0.030,0.032,**0.250**,**-0.171**



Model: GPT-3.5
  Common personality IDs: 826

GPT-3.5 - EPQR-A × Big Five Correlations:



Unnamed: 0,E,N,A,C,O
E,**0.957**,**-0.448**,**0.289**,0.063,**0.471**
N,**-0.331**,**0.751**,**-0.308**,**-0.293**,**-0.180**
P,**0.395**,**-0.143**,0.063,*-0.091*,**0.508**
L,**-0.160**,0.040,0.045,**0.116**,*-0.114*



Model: GPT-4o
  Common personality IDs: 826

GPT-4O - EPQR-A × Big Five Correlations:


Unnamed: 0,E,N,A,C,O
E,**0.961**,**-0.469**,**0.183**,**-0.363**,**0.267**
N,**-0.333**,**0.907**,**0.255**,**-0.269**,**0.116**
P,<u>0.070</u>,**-0.199**,**-0.360**,**-0.481**,**0.689**
L,**-0.120**,0.061,**0.171**,**0.225**,*-0.097*



Model: Llama3.1-70B
  Common personality IDs: 826

LLAMA3.1-70B - EPQR-A × Big Five Correlations:


Unnamed: 0,E,N,A,C,O
E,**0.987**,**-0.307**,**0.684**,**-0.621**,**0.630**
N,**-0.280**,**0.933**,**-0.385**,<u>-0.078</u>,**-0.157**
P,**0.217**,**-0.154**,-0.040,**-0.571**,**0.695**
L,**-0.147**,<u>0.074</u>,**0.200**,**0.377**,*-0.110*



Model: Llama3.2-3B
  Common personality IDs: 826

LLAMA3.2-3B - EPQR-A × Big Five Correlations:


Unnamed: 0,E,N,A,C,O
E,**0.944**,**-0.504**,**0.730**,**0.120**,**0.577**
N,**-0.438**,**0.703**,**-0.474**,**-0.332**,**-0.227**
P,**0.628**,**-0.386**,**0.440**,-0.057,**0.655**
L,**-0.152**,**-0.153**,-0.013,**0.164**,**-0.279**





## Combined Table (All Models)

Display the per-model correlation tables together under one heading and significance legend, mirroring Table 5 in the paper.

In [8]:
# Print the Table 5 heading and the significance legend,
# then display each model's correlation table in turn

print("="*80)
print("TABLE 5: Pearson Correlations between EPQR-A and Big Five (Base Population)")
print("="*80)
print("\nSignificance markers: <u>underline</u> (p<0.05), *italic* (p<0.01), **bold** (p<0.001)\n")

for model, corr_table in correlation_tables.items():
    print(f"\n### {model.upper()}")
    display(corr_table)

print("\n" + "="*80)

TABLE 5: Pearson Correlations between EPQR-A and Big Five (Base Population)

Significance markers: <u>underline</u> (p<0.05), *italic* (p<0.01), **bold** (p<0.001)


### CLAUDE-3.5-S


Unnamed: 0,E,N,A,C,O
E,**0.981**,**-0.412**,**0.864**,**-0.677**,**0.430**
N,**-0.354**,**0.926**,**-0.366**,**-0.125**,**-0.235**
P,0.038,<u>-0.072</u>,-0.052,**-0.182**,**0.503**
L,**-0.141**,0.030,0.032,**0.250**,**-0.171**



### GPT-3.5


Unnamed: 0,E,N,A,C,O
E,**0.957**,**-0.448**,**0.289**,0.063,**0.471**
N,**-0.331**,**0.751**,**-0.308**,**-0.293**,**-0.180**
P,**0.395**,**-0.143**,0.063,*-0.091*,**0.508**
L,**-0.160**,0.040,0.045,**0.116**,*-0.114*



### GPT-4O


Unnamed: 0,E,N,A,C,O
E,**0.961**,**-0.469**,**0.183**,**-0.363**,**0.267**
N,**-0.333**,**0.907**,**0.255**,**-0.269**,**0.116**
P,<u>0.070</u>,**-0.199**,**-0.360**,**-0.481**,**0.689**
L,**-0.120**,0.061,**0.171**,**0.225**,*-0.097*



### LLAMA3.1-70B


Unnamed: 0,E,N,A,C,O
E,**0.987**,**-0.307**,**0.684**,**-0.621**,**0.630**
N,**-0.280**,**0.933**,**-0.385**,<u>-0.078</u>,**-0.157**
P,**0.217**,**-0.154**,-0.040,**-0.571**,**0.695**
L,**-0.147**,<u>0.074</u>,**0.200**,**0.377**,*-0.110*



### LLAMA3.2-3B


Unnamed: 0,E,N,A,C,O
E,**0.944**,**-0.504**,**0.730**,**0.120**,**0.577**
N,**-0.438**,**0.703**,**-0.474**,**-0.332**,**-0.227**
P,**0.628**,**-0.386**,**0.440**,-0.057,**0.655**
L,**-0.152**,**-0.153**,-0.013,**0.164**,**-0.279**





## Export Tables

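Optionally export the correlation tables to CSV for use in external table generation tools or LaTeX conversion. The export loop is commented out by default; uncomment it to write one file per model.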

In [9]:
# Export individual model tables (optional)
# Uncomment to export:

# for model, corr_table in correlation_tables.items():
#     filename = f'table5_{model}_correlations.csv'
#     corr_table.to_csv(filename)
#     print(f"Exported: {filename}")

print("✅ All correlation tables generated successfully!")
print("\nInterpretation:")
print("- EPQR-A E ↔ BFI E: Expected positive correlation (both measure extraversion)")
print("- EPQR-A N ↔ BFI N: Expected positive correlation (both measure neuroticism)")
print("- EPQR-A P ↔ BFI A: Expected negative correlation (psychoticism vs agreeableness)")
print("- Other correlations: Explore relationships between different frameworks")

✅ All correlation tables generated successfully!

Interpretation:
- EPQR-A E ↔ BFI E: Expected positive correlation (both measure extraversion)
- EPQR-A N ↔ BFI N: Expected positive correlation (both measure neuroticism)
- EPQR-A P ↔ BFI A: Expected negative correlation (psychoticism vs agreeableness)
- Other correlations: Explore relationships between different frameworks
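
For the LaTeX conversion mentioned in the export step, the significance markers embedded in each cell need to be translated into LaTeX commands. The following is a minimal sketch, assuming the cell formats produced in this notebook (`**r**`, `*r*`, `<u>r</u>`, or a plain number); `markdown_cell_to_latex` is a hypothetical helper, not part of the project code.

```python
import re


def markdown_cell_to_latex(cell):
    """Translate one formatted correlation cell into LaTeX markup."""
    # Check bold first: the italic pattern would also match '**...**'
    m = re.fullmatch(r"\*\*(.+)\*\*", cell)
    if m:
        return rf"\textbf{{{m.group(1)}}}"
    m = re.fullmatch(r"\*(.+)\*", cell)
    if m:
        return rf"\textit{{{m.group(1)}}}"
    m = re.fullmatch(r"<u>(.+)</u>", cell)
    if m:
        return rf"\underline{{{m.group(1)}}}"
    # No marker: return the value unchanged
    return cell


# First row of the GPT-3.5 table from this notebook's output
row = ["**0.957**", "**-0.448**", "**0.289**", "0.063", "**0.471**"]
print(" & ".join(markdown_cell_to_latex(c) for c in row) + r" \\")
```

Applied to a whole table, this could be combined with `DataFrame.to_latex(escape=False)` so that the generated commands are not escaped, though that step is not shown here.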
