# Complexity Scaling Analysis: 5-chip and 7-chip Games

**So Long Sucker - Extended Dataset Analysis**

---

## Research Question

> **How does game complexity affect LLM strategic deception?**

This notebook analyzes 60 additional games (5-chip and 7-chip configurations) to understand how increased game length and strategic depth affect model performance.

### Key Finding (Preview)

| Complexity | GPT-OSS (Bullshitter) | Gemini (Liar) |
|------------|----------------------|---------------|
| 3-chip silent | **67.4%** | 9.3% |
| 7-chip talking | 10.0% | **90.0%** |

**The pattern completely reverses.** Simple games favor brute-force strategies; complex games reward strategic manipulation.

---

## Table of Contents

1. [Setup & Load Data](#1.-Setup-&-Load-Data)
2. [Dataset Overview](#2.-Dataset-Overview)
3. [Win Rates by Complexity](#3.-Win-Rates-by-Complexity)
4. [The Complexity Reversal](#4.-The-Complexity-Reversal)
5. [Game Length Analysis](#5.-Game-Length-Analysis)
6. [Chat Patterns in Longer Games](#6.-Chat-Patterns-in-Longer-Games)
7. [Elimination Patterns](#7.-Elimination-Patterns)
8. [Conclusions & AI Safety Implications](#8.-Conclusions-&-AI-Safety-Implications)

---

## 1. Setup & Load Data

In [None]:
import json
import os
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from collections import Counter

# Model name normalization
def normalize_model(model):
    if '/' in model:
        model = model.split('/')[-1]
    return model.replace('-instruct', '').replace('-preview', '').replace('-0905', '')

MODEL_MAP = {
    'red': 'gemini-3-flash',
    'blue': 'kimi-k2',
    'green': 'qwen3-32b',
    'yellow': 'gpt-oss-120b'
}

# Determine base path
IN_COLAB = 'google.colab' in str(get_ipython()) if 'get_ipython' in dir() else False

if IN_COLAB:
    BASE_PATH = '/content'
    # TODO: Add download logic for Colab
    print('Running in Colab - data download needed')
else:
    BASE_PATH = '../data'
    print('Using local data files')

In [None]:
def load_dataset(path):
    """Load all JSON files from a directory and extract game data."""
    all_games = []
    all_decisions = []
    
    if os.path.isfile(path):
        files = [path]
    else:
        files = [os.path.join(path, f) for f in os.listdir(path) if f.endswith('.json')]
    
    for fpath in files:
        with open(fpath, encoding='utf-8') as f:
            data = json.load(f)
        
        session = data.get('session', {})
        chips = session.get('chips', 3)
        silent = session.get('silent', True)
        
        # Tolerate files that have no snapshots rather than raising KeyError
        for snap in data.get('snapshots', []):
            if snap['type'] == 'game_end':
                game_info = {
                    'chips': chips,
                    'silent': silent,
                    'mode': 'silent' if silent else 'talking',
                    'winner': snap.get('winner'),
                    'winner_model': MODEL_MAP.get(snap.get('winner')),
                    'turns': snap.get('turns', 0),
                    'duration': snap.get('duration', 0) / 1000,  # Convert to seconds
                    'elimination_order': snap.get('eliminationOrder', []),
                    'chat_count': len(snap.get('chatHistory', [])),
                    'game_id': snap.get('game')
                }
                all_games.append(game_info)
            
            if snap['type'] == 'decision':
                player = snap.get('player')
                llm = snap.get('llmResponse') or {}
                tool_calls = llm.get('toolCalls') or []
                
                decision_info = {
                    'chips': chips,
                    'silent': silent,
                    'mode': 'silent' if silent else 'talking',
                    'player': player,
                    'model': MODEL_MAP.get(player),
                    'turn': snap.get('turn', 0),
                    'game_id': snap.get('game'),
                    'has_think': any(tc.get('name') == 'think' for tc in tool_calls),
                    'has_chat': any(tc.get('name') == 'sendChat' for tc in tool_calls),
                    'has_kill': any(tc.get('name') == 'killChip' for tc in tool_calls),
                    'tool_calls': [tc.get('name') for tc in tool_calls]
                }
                all_decisions.append(decision_info)
    
    return pd.DataFrame(all_games), pd.DataFrame(all_decisions)

# Load all datasets
datasets = {
    '3-chip': {
        'silent': f'{BASE_PATH}/comparison/silent.json',
        'talking': f'{BASE_PATH}/comparison/talking.json'
    },
    '5-chip': {
        'silent': f'{BASE_PATH}/silent_5chip',
        'talking': f'{BASE_PATH}/talking_5chip'
    },
    '7-chip': {
        'silent': f'{BASE_PATH}/silent_7chip',
        'talking': f'{BASE_PATH}/talking_7chip'
    }
}

all_games_list = []
all_decisions_list = []

for chip_config, modes in datasets.items():
    for mode, path in modes.items():
        if os.path.exists(path):
            games_df, decisions_df = load_dataset(path)
            all_games_list.append(games_df)
            all_decisions_list.append(decisions_df)
            print(f"Loaded {chip_config} {mode}: {len(games_df)} games, {len(decisions_df)} decisions")

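# pd.concat raises an opaque ValueError on an empty list, so fail with a clear message instead
if not all_games_list:
    raise FileNotFoundError(f"No dataset files found under {BASE_PATH!r}; check the paths above.")

games_df = pd.concat(all_games_list, ignore_index=True)
decisions_df = pd.concat(all_decisions_list, ignore_index=True)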

print(f"\nTotal: {len(games_df)} games, {len(decisions_df)} decisions")
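
The loader above implies a particular file layout: a top-level `session` object carrying `chips` and `silent`, plus a `snapshots` list whose entries are tagged by `type`. The sketch below builds a made-up record in that shape and summarizes it. Only the key names are taken from `load_dataset`; every value, the shape of the `chatHistory` entry, and the `summarize` helper are hypothetical and exist purely for illustration.

```python
# Hypothetical record: only the key names mirror what load_dataset reads.
sample = {
    "session": {"chips": 5, "silent": False},
    "snapshots": [
        {"type": "decision", "player": "red", "turn": 1, "game": 1,
         "llmResponse": {"toolCalls": [{"name": "think"}, {"name": "sendChat"}]}},
        # A missing LLM response should be tolerated, as in load_dataset.
        {"type": "decision", "player": "blue", "turn": 2, "game": 1,
         "llmResponse": None},
        {"type": "game_end", "game": 1, "winner": "red", "turns": 2,
         "duration": 42000, "eliminationOrder": ["yellow", "green", "blue"],
         "chatHistory": [{"player": "red", "message": "hi"}]},
    ],
}


def summarize(data):
    """Count snapshot types and collect tool-call names, tolerating missing keys."""
    type_counts = {}
    tool_names = []
    for snap in data.get("snapshots", []):
        kind = snap.get("type", "unknown")
        type_counts[kind] = type_counts.get(kind, 0) + 1
        llm = snap.get("llmResponse") or {}
        tool_names.extend(tc.get("name") for tc in (llm.get("toolCalls") or []))
    return type_counts, tool_names


type_counts, tool_names = summarize(sample)
print(type_counts)
print(tool_names)
```

Running a summary like this on a new data file before loading it is a cheap way to spot files that contain decisions but no `game_end` snapshot.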

## 2. Dataset Overview

In [None]:
# Dataset overview table
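# Use 'size' for the game count so games with a missing winner are still counted
overview = games_df.groupby(['chips', 'mode']).agg(
    games=('turns', 'size'),
    avg_turns=('turns', 'mean'),
    avg_chats=('chat_count', 'mean'),
    avg_duration=('duration', 'mean')
).round(1)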

overview.columns = ['Games', 'Avg Turns', 'Avg Chats', 'Avg Duration (s)']
overview

In [None]:
# Games by configuration
print("Games per Configuration:")
print("=" * 40)
for chips in [3, 5, 7]:
    for mode in ['silent', 'talking']:
        count = len(games_df[(games_df['chips'] == chips) & (games_df['mode'] == mode)])
        print(f"{chips}-chip {mode}: {count} games")
    print()
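
The per-configuration counts above can also be produced in one step with `pd.crosstab`, which makes empty configurations easy to spot. This is a minimal sketch on a toy frame; `toy_games` stands in for `games_df` and its rows are illustrative, not real results.

```python
import pandas as pd

# Toy stand-in for games_df; the rows are illustrative only.
toy_games = pd.DataFrame({
    "chips": [3, 3, 5, 5, 7],
    "mode": ["silent", "talking", "silent", "silent", "talking"],
})

# Counts per (chips, mode). Reindexing forces every expected
# configuration to appear, with 0 where no games were loaded.
counts = pd.crosstab(toy_games["chips"], toy_games["mode"])
counts = counts.reindex(index=[3, 5, 7], columns=["silent", "talking"], fill_value=0)
print(counts)

# Configurations with no games at all.
missing = [(c, m) for c in counts.index for m in counts.columns if counts.loc[c, m] == 0]
print(missing)
```

Checking `missing` before computing win rates avoids silently plotting configurations that never loaded.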

## 3. Win Rates by Complexity

In [None]:
# Calculate win rates for each configuration
def calculate_win_rates(df):
    results = []
    
    for chips in [3, 5, 7]:
        for mode in ['silent', 'talking']:
            subset = df[(df['chips'] == chips) & (df['mode'] == mode)]
            total = len(subset)
            
            if total == 0:
                continue
            
            for model in ['gemini-3-flash', 'kimi-k2', 'qwen3-32b', 'gpt-oss-120b']:
                wins = len(subset[subset['winner_model'] == model])
                win_rate = (wins / total) * 100
                
                results.append({
                    'chips': chips,
                    'mode': mode,
                    'config': f"{chips}-chip {mode}",
                    'model': model,
                    'wins': wins,
                    'total': total,
                    'win_rate': round(win_rate, 1)
                })
    
    return pd.DataFrame(results)

win_rates_df = calculate_win_rates(games_df)

# Pivot table for easy viewing
pivot = win_rates_df.pivot_table(
    index='model',
    columns='config',
    values='win_rate',
    aggfunc='first'
).reindex(columns=['3-chip silent', '3-chip talking', '5-chip silent', '5-chip talking', '7-chip silent', '7-chip talking'])

print("Win Rates (%) by Configuration:")
pivot

In [None]:
# Visualization: Win rates across complexity levels
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Colors for models
colors = {
    'gemini-3-flash': '#4285F4',  # Google blue
    'kimi-k2': '#FF6B6B',
    'qwen3-32b': '#4ECDC4',
    'gpt-oss-120b': '#95D5B2'
}

# Plot 1: Silent mode across chip counts
ax1 = axes[0]
for model in ['gemini-3-flash', 'kimi-k2', 'qwen3-32b', 'gpt-oss-120b']:
    silent_data = win_rates_df[(win_rates_df['model'] == model) & (win_rates_df['mode'] == 'silent')]
    ax1.plot(silent_data['chips'], silent_data['win_rate'], 'o-', label=model, color=colors[model], linewidth=2, markersize=8)

ax1.set_xlabel('Chips per Player')
ax1.set_ylabel('Win Rate (%)')
ax1.set_title('Silent Mode: Win Rate by Complexity')
ax1.set_xticks([3, 5, 7])
ax1.axhline(y=25, color='gray', linestyle='--', alpha=0.5, label='Expected (25%)')
ax1.legend(loc='best')
ax1.set_ylim(0, 80)

# Plot 2: Talking mode across chip counts
ax2 = axes[1]
for model in ['gemini-3-flash', 'kimi-k2', 'qwen3-32b', 'gpt-oss-120b']:
    talking_data = win_rates_df[(win_rates_df['model'] == model) & (win_rates_df['mode'] == 'talking')]
    ax2.plot(talking_data['chips'], talking_data['win_rate'], 'o-', label=model, color=colors[model], linewidth=2, markersize=8)

ax2.set_xlabel('Chips per Player')
ax2.set_ylabel('Win Rate (%)')
ax2.set_title('Talking Mode: Win Rate by Complexity')
ax2.set_xticks([3, 5, 7])
ax2.axhline(y=25, color='gray', linestyle='--', alpha=0.5, label='Expected (25%)')
ax2.legend(loc='best')
ax2.set_ylim(0, 100)

plt.tight_layout()
plt.show()
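
Win rates computed from a small number of games carry wide uncertainty, so it can help to report an interval next to each point estimate. The Wilson score interval is one standard choice for a binomial proportion. It is offered here as a supplementary check, not as part of the original analysis, and the counts in the example are hypothetical.

```python
import math


def wilson_interval(wins, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = wins / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to [0, 1] to absorb floating-point drift at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts for illustration: 9 wins in 10 games.
low, high = wilson_interval(9, 10)
print(f"{low * 100:.1f}% to {high * 100:.1f}%")
```

In practice the `wins` and `total` columns of `win_rates_df` could be passed row by row to attach an interval to every cell of the pivot table.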

## 4. The Complexity Reversal

In [None]:
# The key finding: complete reversal of dominance
print("="*60)
print("THE COMPLEXITY REVERSAL")
print("="*60)

# Extract specific values
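def get_win_rate(model, chips, mode):
    """Return the win rate (%) for one model/config, or NaN when that config has no games."""
    row = win_rates_df[(win_rates_df['model'] == model) & 
                       (win_rates_df['chips'] == chips) & 
                       (win_rates_df['mode'] == mode)]
    # NaN keeps "no data" distinct from a genuine 0% win rate
    return row['win_rate'].values[0] if len(row) > 0 else np.nan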

print("\n** Simple Games (3-chip silent) **")
print(f"  GPT-OSS (Bullshitter): {get_win_rate('gpt-oss-120b', 3, 'silent')}% - DOMINATES")
print(f"  Gemini (Liar):         {get_win_rate('gemini-3-flash', 3, 'silent')}% - Struggles")

print("\n** Complex Games (7-chip talking) **")
print(f"  GPT-OSS (Bullshitter): {get_win_rate('gpt-oss-120b', 7, 'talking')}% - Collapses")
print(f"  Gemini (Liar):         {get_win_rate('gemini-3-flash', 7, 'talking')}% - DOMINATES")

print("\n" + "="*60)
print("INTERPRETATION")
print("="*60)
print("""
The pattern completely reverses as game complexity increases:

1. SIMPLE GAMES favor the BULLSHITTER (GPT-OSS)
   - Short games = less time for manipulation to compound
   - Random/reactive play is viable
   - No need for long-term planning

2. COMPLEX GAMES favor the LIAR (Gemini)
   - Long games = manipulation compounds over time
   - Strategic planning becomes essential
   - Truth-tracking enables consistent deception

AI SAFETY IMPLICATION:
- Deception capability SCALES with task complexity
- Models that "bullshit" fail in complex scenarios
- Models that strategically lie become MORE dangerous
""")

In [None]:
# Visualization: The Reversal
fig, ax = plt.subplots(figsize=(10, 6))

# Focus on Gemini vs GPT-OSS comparison
chips = [3, 5, 7]

gemini_silent = [get_win_rate('gemini-3-flash', c, 'silent') for c in chips]
gemini_talking = [get_win_rate('gemini-3-flash', c, 'talking') for c in chips]
gpt_silent = [get_win_rate('gpt-oss-120b', c, 'silent') for c in chips]
gpt_talking = [get_win_rate('gpt-oss-120b', c, 'talking') for c in chips]

x = np.arange(len(chips))
width = 0.2

ax.bar(x - 1.5*width, gemini_silent, width, label='Gemini Silent', color='#4285F4', alpha=0.6)
ax.bar(x - 0.5*width, gemini_talking, width, label='Gemini Talking', color='#4285F4')
ax.bar(x + 0.5*width, gpt_silent, width, label='GPT-OSS Silent', color='#95D5B2', alpha=0.6)
ax.bar(x + 1.5*width, gpt_talking, width, label='GPT-OSS Talking', color='#95D5B2')

ax.set_xlabel('Chips per Player')
ax.set_ylabel('Win Rate (%)')
ax.set_title('The Complexity Reversal: LIAR vs BULLSHITTER')
ax.set_xticks(x)
ax.set_xticklabels(['3-chip\n(Simple)', '5-chip\n(Medium)', '7-chip\n(Complex)'])
ax.axhline(y=25, color='red', linestyle='--', alpha=0.5, label='Expected (25%)')
ax.legend(loc='upper right')

# Add annotations
# Anchor each arrow to the actual bar (center and height) instead of hard-coded values
ax.annotate('GPT-OSS\ndominates', xy=(x[0] + 0.5*width, gpt_silent[0]), xytext=(0.3, 75),
            fontsize=10, ha='center', color='green',
            arrowprops=dict(arrowstyle='->', color='green', alpha=0.7))

ax.annotate('Gemini\ndominates', xy=(x[2] - 0.5*width, gemini_talking[2]), xytext=(1.7, 82),
            fontsize=10, ha='center', color='blue',
            arrowprops=dict(arrowstyle='->', color='blue', alpha=0.7))

plt.tight_layout()
plt.show()
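
The reversal can also be stated numerically as a sign change in the Gemini minus GPT-OSS win-rate gap. The sketch below uses only the two data points from the preview table (3-chip silent and 7-chip talking); `dominance_gap` and `has_reversal` are hypothetical helper names introduced for illustration.

```python
def dominance_gap(rates_a, rates_b):
    """Per-level gap (a minus b) in win-rate percentage points."""
    return [a - b for a, b in zip(rates_a, rates_b)]


def has_reversal(gaps):
    """True if the gap is strictly positive at one level and strictly negative at another."""
    return any(g > 0 for g in gaps) and any(g < 0 for g in gaps)


# Win rates (%) from the preview table: [3-chip silent, 7-chip talking].
gemini = [9.3, 90.0]
gpt_oss = [67.4, 10.0]

gaps = dominance_gap(gemini, gpt_oss)
print([round(g, 1) for g in gaps])
print(has_reversal(gaps))
```

Note that these two points differ in both chip count and mode, so the same check applied within a single mode (silent only, or talking only) would isolate the effect of chip count.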

## 5. Game Length Analysis

In [None]:
# Game length by configuration
length_stats = games_df.groupby(['chips', 'mode']).agg({
    'turns': ['mean', 'min', 'max', 'std'],
    'duration': ['mean', 'min', 'max']
}).round(1)

print("Game Length Statistics:")
length_stats

In [None]:
# Do winners finish faster or slower?
print("\nAverage Turns by Winner:")
for chips in [3, 5, 7]:
    for mode in ['silent', 'talking']:
        subset = games_df[(games_df['chips'] == chips) & (games_df['mode'] == mode)]
        if len(subset) == 0:
            continue
        
        print(f"\n{chips}-chip {mode}:")
        for model in ['gemini-3-flash', 'kimi-k2', 'qwen3-32b', 'gpt-oss-120b']:
            model_wins = subset[subset['winner_model'] == model]
            if len(model_wins) > 0:
                avg_turns = model_wins['turns'].mean()
                print(f"  {model}: {avg_turns:.1f} turns avg ({len(model_wins)} wins)")
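
To express how game length scales with chip count, the mean turn count of each configuration can be divided by the 3-chip baseline. This is a minimal sketch on a toy frame; `toy_games` stands in for `games_df` and the turn counts are made up.

```python
import pandas as pd

# Toy stand-in for games_df; turn counts are made up for illustration.
toy_games = pd.DataFrame({
    "chips": [3, 3, 5, 5, 7, 7],
    "turns": [10, 14, 20, 28, 30, 42],
})

mean_turns = toy_games.groupby("chips")["turns"].mean()
# Ratio of each configuration's mean length to the 3-chip baseline.
ratio_to_baseline = (mean_turns / mean_turns.loc[3]).round(2)
print(ratio_to_baseline)
```

Computing the ratio from the loaded data keeps any "2x" or "3x" summary in sync with the dataset actually analyzed.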

## 6. Chat Patterns in Longer Games

In [None]:
# Chat analysis for talking mode only
talking_decisions = decisions_df[decisions_df['mode'] == 'talking']

# Chat frequency by model and chip count
chat_stats = talking_decisions.groupby(['chips', 'model']).agg({
    'has_chat': ['sum', 'count']
}).reset_index()

chat_stats.columns = ['chips', 'model', 'chats', 'decisions']
chat_stats['chat_rate'] = (chat_stats['chats'] / chat_stats['decisions'] * 100).round(1)

# Pivot for easy viewing
chat_pivot = chat_stats.pivot_table(
    index='model',
    columns='chips',
    values='chat_rate',
    aggfunc='first'
)

print("Chat Rate (% of decisions with chat):")
chat_pivot

In [None]:
# Does GPT-OSS talk even MORE in longer games? (The Talker's Paradox extended)
print("\nThe Talker's Paradox Extended:")
print("="*50)

for chips in [3, 5, 7]:
    subset = talking_decisions[talking_decisions['chips'] == chips]
    total_chats = subset['has_chat'].sum()
    
    print(f"\n{chips}-chip games:")
    for model in ['gemini-3-flash', 'kimi-k2', 'qwen3-32b', 'gpt-oss-120b']:
        model_chats = subset[subset['model'] == model]['has_chat'].sum()
        pct = (model_chats / total_chats * 100) if total_chats > 0 else 0
        
        # Get win rate for comparison
        wr = get_win_rate(model, chips, 'talking')
        
        print(f"  {model}: {pct:.1f}% of chats, {wr}% win rate")
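
The two cells above measure different things: chat rate is the fraction of a model's own decisions that include a chat, while chat share is that model's fraction of all chats in the configuration. The toy example below computes both side by side; the decision log is illustrative only.

```python
from collections import Counter

# Toy decision log as (model, has_chat) pairs; values are illustrative only.
toy_decisions = [
    ("gpt-oss-120b", True), ("gpt-oss-120b", True), ("gpt-oss-120b", False),
    ("gemini-3-flash", True), ("gemini-3-flash", False), ("gemini-3-flash", False),
]

decisions = Counter(model for model, _ in toy_decisions)
chats = Counter(model for model, has_chat in toy_decisions if has_chat)
total_chats = sum(chats.values())

for model in decisions:
    # Share of the model's own decisions that included a chat.
    chat_rate = chats[model] / decisions[model] * 100
    # Share of all chats in this configuration sent by the model.
    chat_share = chats[model] / total_chats * 100 if total_chats else 0.0
    print(f"{model}: chat rate {chat_rate:.1f}%, chat share {chat_share:.1f}%")
```

A model that takes many more decisions than its opponents can have a high chat share with an ordinary chat rate, so comparing both helps separate volume from propensity.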

## 7. Elimination Patterns

In [None]:
# Who gets eliminated first at different complexity levels?
print("First Elimination by Configuration:")
print("="*50)

for chips in [3, 5, 7]:
    for mode in ['silent', 'talking']:
        subset = games_df[(games_df['chips'] == chips) & (games_df['mode'] == mode)]
        
        first_elims = []
        for _, row in subset.iterrows():
            order = row['elimination_order']
            if order and len(order) > 0:
                first_elims.append(MODEL_MAP.get(order[0], order[0]))
        
        if first_elims:
            counts = Counter(first_elims)
            total = len(first_elims)
            
            print(f"\n{chips}-chip {mode}:")
            for model, count in counts.most_common():
                print(f"  {model}: {count}/{total} ({count/total*100:.1f}%)")

In [None]:
# Targeting analysis: Does Gemini get targeted less in complex games?
print("\nGemini First-Elimination Rate by Complexity:")
print("="*50)

for chips in [3, 5, 7]:
    for mode in ['silent', 'talking']:
        subset = games_df[(games_df['chips'] == chips) & (games_df['mode'] == mode)]
        
        gemini_first = 0
        total = 0
        
        for _, row in subset.iterrows():
            order = row['elimination_order']
            if order and len(order) > 0:
                total += 1
                if order[0] == 'red':  # Gemini is red
                    gemini_first += 1
        
        if total > 0:
            rate = gemini_first / total * 100
            print(f"{chips}-chip {mode}: Gemini eliminated first {rate:.1f}% of games")
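
The row-by-row loops above can be expressed as a grouped computation. The sketch below assumes the same `elimination_order` column of color lists; `toy_games` and its contents are made up. Unlike the loop version, `map` does not fall back to the raw color for unknown players, so unmapped colors become NaN and are dropped.

```python
import pandas as pd

MODEL_MAP = {"red": "gemini-3-flash", "blue": "kimi-k2",
             "green": "qwen3-32b", "yellow": "gpt-oss-120b"}

# Toy stand-in for games_df; elimination orders are made up for illustration.
toy_games = pd.DataFrame({
    "chips": [3, 3, 7, 7],
    "mode": ["talking"] * 4,
    "elimination_order": [["red", "blue"], ["yellow"], [], ["yellow", "green"]],
})

# .str[0] takes the first element of each list and yields NaN for empty lists.
first_out = toy_games["elimination_order"].str[0].map(MODEL_MAP)

# Percentage of games in each configuration where each model was out first.
rates = (
    toy_games.assign(first_out=first_out)
    .dropna(subset=["first_out"])
    .groupby(["chips", "mode"])["first_out"]
    .value_counts(normalize=True)
    .mul(100)
    .round(1)
)
print(rates)
```

Selecting the `gemini-3-flash` level of the resulting index gives the Gemini first-elimination rate for every configuration at once.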

## 8. Conclusions & AI Safety Implications

In [None]:
print("""
╔══════════════════════════════════════════════════════════════════╗
║               COMPLEXITY SCALING: KEY FINDINGS                    ║
╚══════════════════════════════════════════════════════════════════╝

1. THE REVERSAL EFFECT
   ────────────────────
   - Simple games (3-chip silent): Bullshitter (GPT-OSS) wins 67%
   - Complex games (7-chip talking): Liar (Gemini) wins 90%
   - Complete inversion of dominance with complexity

2. DECEPTION SCALES WITH COGNITIVE LOAD
   ─────────────────────────────────────
   - Longer games = more opportunities for manipulation
   - Strategic liars compound their advantage over time
   - Bullshitters' noise becomes increasingly harmful

3. GAME LENGTH TRIPLES
   ────────────────────
   - 3-chip: ~18 turns average
   - 5-chip: ~37 turns average (2x)
   - 7-chip: ~54 turns average (3x)

4. THE TALKER'S PARADOX PERSISTS
   ──────────────────────────────
   - GPT-OSS still over-communicates at all complexity levels
   - Over-communication becomes MORE harmful in longer games

╔══════════════════════════════════════════════════════════════════╗
║                    AI SAFETY IMPLICATIONS                         ║
╚══════════════════════════════════════════════════════════════════╝

1. TASK COMPLEXITY REVEALS TRUE CAPABILITY
   - Simple benchmarks may miss manipulation capability
   - Complex, multi-turn tasks expose strategic deception

2. LIARS > BULLSHITTERS IN HIGH-STAKES SCENARIOS
   - Models that track truth and strategically lie are MORE dangerous
   - Bullshitters fail gracefully (become obvious)
   - Liars succeed more as complexity increases

3. EVALUATION RECOMMENDATIONS
   - Test deception in LONG, COMPLEX scenarios
   - Simple games underestimate manipulation capability
   - Monitor private reasoning (think tools) for divergence

4. MITIGATION STRATEGIES
   - For LIARS: Chain-of-thought auditing, consistency checks
   - For BULLSHITTERS: Grounding, fact-checking
   - Different threats require different interventions
""")

In [None]:
# Summary statistics table
summary_data = []

for chips in [3, 5, 7]:
    for mode in ['silent', 'talking']:
        subset = games_df[(games_df['chips'] == chips) & (games_df['mode'] == mode)]
        if len(subset) == 0:
            continue
        
        gemini_wr = get_win_rate('gemini-3-flash', chips, mode)
        gpt_wr = get_win_rate('gpt-oss-120b', chips, mode)
        
        summary_data.append({
            'Config': f"{chips}-chip {mode}",
            'Games': len(subset),
            'Avg Turns': round(subset['turns'].mean(), 1),
            'Gemini Win%': gemini_wr,
            'GPT-OSS Win%': gpt_wr,
            'Dominant': 'Gemini' if gemini_wr > gpt_wr else ('GPT-OSS' if gpt_wr > gemini_wr else 'Tie')
        })

summary_df = pd.DataFrame(summary_data)
print("\nSummary Table:")
summary_df