# 🧠 Decision-Theoretic Choice Complexity in LLMs

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/soroushbagheri/choice-complexity-llm/blob/main/notebooks/demo_colab.ipynb)

---

## 📖 Abstract

This interactive notebook demonstrates a **two-tier framework** for measuring and regulating choice complexity in Large Language Models (LLMs). The framework addresses the *paradox of choice* and *bounded rationality* in AI decision-making by pairing an external complexity metric with an internal difficulty signal and a controller that acts on both.

### 🎯 Core Innovation

**Traditional Approach:**
- ❌ Present all options to LLM
- ❌ Hope the model makes good decisions
- ❌ No adaptation to complexity

**Our Two-Tier Framework:**
1. **Tier A - External Complexity (CCI)**: Measures *objective* difficulty of the choice set
   - Number of options, redundancy, attribute conflicts, Pareto optimality
2. **Tier B - Internal Complexity (ILDC)**: Measures *LLM's subjective* decision difficulty
   - Volatility, confidence variation, self-disagreement
3. **Unified Controller**: Adapts presentation based on both tiers
   - Prunes options, clusters similar items, applies satisficing

### 💡 Key Insight

> High external complexity (CCI) doesn't always cause high internal difficulty (ILDC). The framework adapts based on **actual LLM struggle**, not just problem structure.

**Authors:** Soroush Bagheri | **Date:** January 2026 | **Status:** Under Review

---

## 🔬 Research Context & Novelty

### Why This Matters (2026)

LLMs are increasingly used for decision-making tasks (e.g., product recommendations, medical triage, legal analysis). However:

1. **Choice Overload is Real for LLMs**: Just like humans, LLMs show degraded performance with too many options
2. **No Existing Framework**: Prior work either:
   - Measures complexity but doesn't control it
   - Controls without measuring internal difficulty
   - Focuses on reasoning cost, not choice overload

### Novel Contributions

| Aspect | Prior Work | This Framework |
|--------|-----------|----------------|
| **Complexity Metric** | Generic (# options) | CCI: Multi-faceted (redundancy, conflicts, Pareto structure) |
| **Internal Signals** | None / Post-hoc analysis | ILDC: Real-time volatility & confidence tracking |
| **Control Strategy** | Static pruning | Adaptive based on CCI + ILDC |
| **Evaluation** | Accuracy only | Accuracy + Stability + Efficiency |

### Related Work Comparison

- **SITAlign** (Chehade et al., 2025): Satisficing alignment via reward thresholds → No choice-set complexity measurement
- **CLAI** (Zhang et al., 2025): Cognitive-load-aware inference → Focuses on reasoning steps, not option overload
- **Behavioral Econ LLMs** (Jia et al., 2024): Evaluates risk preferences → No control mechanism
- **Consumer Choice AI** (Cherep et al., 2025): Choice architecture sensitivity → Experimental, no unified framework

✅ **This work is the first to combine**: Complexity metrics + Internal difficulty signals + Adaptive control

## 📦 Setup: Install Dependencies and Clone Repository

In [None]:
# Clone the repository
!git clone https://github.com/soroushbagheri/choice-complexity-llm.git
%cd choice-complexity-llm

# Install required packages
!pip install -q numpy pandas scipy scikit-learn matplotlib seaborn tqdm pyyaml ipywidgets

# Enable widget support
from google.colab import output
output.enable_custom_widget_manager()

print('✅ Setup complete!')

## 🎛️ Interactive Configuration Panel

Use the sliders below to customize the experiment parameters. This lets you explore how different settings affect the results.

**Parameters:**
- **Number of Samples**: More samples = more robust statistics but slower execution
- **Random Seed**: Change for different random initializations
- **Max Options per Problem**: Upper limit on choice set size
- **Redundancy Level**: Percentage of near-duplicate options (simulates real-world messy data)
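
To make the redundancy parameter concrete, here is a minimal sketch of how the share of near-duplicate options could be measured. It assumes each option is a tuple of numeric attributes and that two options count as near-duplicates when their Euclidean distance is below a threshold `eps`. Both the function name `redundancy_fraction` and the `eps` value are hypothetical, not taken from the repository.

```python
import math
from typing import Sequence


def redundancy_fraction(options: Sequence[Sequence[float]], eps: float = 0.05) -> float:
    """Fraction of options that have at least one near-duplicate (assumed definition)."""
    n = len(options)
    if n < 2:
        return 0.0
    with_twin = 0
    for i, a in enumerate(options):
        # An option is redundant if any *other* option lies within eps of it.
        if any(j != i and math.dist(a, b) < eps for j, b in enumerate(options)):
            with_twin += 1
    return with_twin / n


# Options 0 and 1 are nearly identical; options 2 and 3 are distinct.
example = [(0.10, 0.20), (0.10, 0.21), (0.80, 0.90), (0.50, 0.50)]
print(redundancy_fraction(example))  # 0.5
```

A stricter `eps` counts fewer options as duplicates, so the threshold would need tuning to the attribute scale.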

In [None]:
import ipywidgets as widgets
from IPython.display import display

# Create interactive widgets
n_samples_slider = widgets.IntSlider(
    value=50,
    min=20,
    max=200,
    step=10,
    description='Samples:',
    style={'description_width': '150px'}
)

seed_slider = widgets.IntSlider(
    value=42,
    min=0,
    max=100,
    step=1,
    description='Random Seed:',
    style={'description_width': '150px'}
)

max_options_slider = widgets.IntSlider(
    value=30,
    min=10,
    max=50,
    step=5,
    description='Max Options:',
    style={'description_width': '150px'}
)

redundancy_slider = widgets.FloatSlider(
    value=0.3,
    min=0.0,
    max=0.6,
    step=0.1,
    description='Redundancy:',
    style={'description_width': '150px'},
    readout_format='.1f'
)

controller_strategies = widgets.SelectMultiple(
    options=['none', 'naive_topk', 'cci_only', 'two_tier'],
    value=['none', 'naive_topk', 'cci_only', 'two_tier'],
    description='Strategies:',
    style={'description_width': '150px'},
    rows=4
)

# Display widgets
print('üéõÔ∏è Experiment Configuration Panel')
print('='*50)
display(widgets.VBox([
    widgets.HTML('<h4>Basic Parameters</h4>'),
    n_samples_slider,
    seed_slider,
    widgets.HTML('<h4>Dataset Parameters</h4>'),
    max_options_slider,
    redundancy_slider,
    widgets.HTML('<h4>Controller Strategies to Test</h4>'),
    controller_strategies,
    widgets.HTML('<br><i>Adjust parameters above, then run the next cell to execute the experiment.</i>')
]))

# Store values for later use
config = {
    'n_samples': n_samples_slider,
    'seed': seed_slider,
    'max_options': max_options_slider,
    'redundancy': redundancy_slider,
    'strategies': controller_strategies
}

## 🎯 Run Customized Experiment

This cell executes the full framework with your selected parameters. It will:

1. Generate synthetic choice problems with controlled complexity
2. Compute CCI (external complexity) for each problem
3. Simulate LLM decisions with realistic volatility
4. Compute ILDC (internal complexity) from decision patterns
5. Apply each controller strategy and measure effectiveness
6. Generate comprehensive visualizations

**Expected Runtime:** ~5-30 seconds depending on sample size

In [None]:
# Get current parameter values
n_samples = config['n_samples'].value
seed = config['seed'].value
max_options = config['max_options'].value
redundancy = config['redundancy'].value

print(f'üöÄ Running experiment with:')
print(f'   - Samples: {n_samples}')
print(f'   - Seed: {seed}')
print(f'   - Max Options: {max_options}')
print(f'   - Redundancy: {redundancy:.1f}')
print(f'   - Strategies: {len(config["strategies"].value)}')
print()

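# Run the demo
# Note: only the sample count and seed are forwarded to the script;
# the other panel settings are printed above for reference.
!python experiments/demo_with_results.py \
  --n-samples {n_samples} \
  --seed {seed} \
  --output results/colab_demo

print('\n✅ Experiment complete! Results saved to results/colab_demo/')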

## 📊 Results Summary & Statistical Analysis

### Understanding the Metrics

**Accuracy**: How often does the LLM choose the correct option?
- Baseline (no control): ~65-70%
- Target (two-tier): 75-80%

**Volatility**: How often does the LLM change its mind across repeated samples?
- 0.0 = Always same choice (perfect consistency)
- 1.0 = Every sample different (maximum instability)
- Lower is better (indicates stable, confident decisions)

**CCI (Choice Complexity Index)**: External problem difficulty [0-1]
- Combines: # options, redundancy, attribute conflicts, Pareto structure

**ILDC (Internal LLM Decision Complexity)**: LLM's subjective difficulty [0-1]
- Combines: volatility, confidence gaps, disagreement

**Options Shown**: Average number of options presented after control
- Fewer = Lower cognitive load
- But too few might miss good options!
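
The accuracy and volatility definitions above can be illustrated with a short sketch over repeated choices on a single problem. It assumes volatility is the fraction of consecutive samples in which the chosen option changes, which is one reading of "change its mind" that reproduces the 0.0 and 1.0 endpoints listed above. The repository may compute it differently, and the function names here are hypothetical.

```python
from typing import Sequence


def volatility(choices: Sequence[int]) -> float:
    """Fraction of consecutive sample pairs whose choice differs (assumed definition)."""
    if len(choices) < 2:
        return 0.0
    changes = sum(a != b for a, b in zip(choices, choices[1:]))
    return changes / (len(choices) - 1)


def accuracy(choices: Sequence[int], ground_truth: int) -> float:
    """Fraction of samples that select the ground-truth option."""
    if not choices:
        return 0.0
    return sum(c == ground_truth for c in choices) / len(choices)


stable = [2, 2, 2, 2, 2]     # same option every time
unstable = [2, 0, 3, 1, 4]   # a different option every time

print(volatility(stable), accuracy(stable, ground_truth=2))      # 0.0 1.0
print(volatility(unstable), accuracy(unstable, ground_truth=2))  # 1.0 0.2
```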

In [None]:
import json
import pandas as pd
import numpy as np

# Load summary results
with open('results/colab_demo/summary.json', 'r') as f:
    summary = json.load(f)

# Load detailed results
results_df = pd.read_csv('results/colab_demo/demo_results.csv')

print('='*80)
print('üìà PERFORMANCE BY CONTROLLER STRATEGY')
print('='*80)
print()

# Create comparison table
comparison_data = []
for strategy in results_df['controller_strategy'].unique():
    strategy_df = results_df[results_df['controller_strategy'] == strategy]
    comparison_data.append({
        'Strategy': strategy,
        'Accuracy (↑)': f"{strategy_df['accuracy'].mean():.3f}",
        'Volatility (↓)': f"{strategy_df['volatility_final'].mean():.3f}",
        'CCI': f"{strategy_df['cci_score'].mean():.3f}",
        'ILDC (↓)': f"{strategy_df['ildc_score'].mean():.3f}",
        'Options Shown': f"{strategy_df['n_options_shown'].mean():.1f}"
    })

comparison_df = pd.DataFrame(comparison_data)
comparison_df = comparison_df.set_index('Strategy')
print(comparison_df.to_string())
print()
print('Legend: ↑ = Higher is better | ↓ = Lower is better')
print()

# Highlight best performers
best_accuracy = results_df.groupby('controller_strategy')['accuracy'].mean().idxmax()
best_volatility = results_df.groupby('controller_strategy')['volatility_final'].mean().idxmin()
best_ildc = results_df.groupby('controller_strategy')['ildc_score'].mean().idxmin()

print('üèÜ Best Performers:')
print(f'   ‚úì Highest Accuracy: {best_accuracy}')
print(f'   ‚úì Lowest Volatility: {best_volatility}')
print(f'   ‚úì Lowest Internal Complexity: {best_ildc}')
print()

# Statistical significance (Welch's t-test between best strategy and baseline)
from scipy import stats

baseline_accuracy = results_df[results_df['controller_strategy'] == 'none']['accuracy']
best_accuracy_vals = results_df[results_df['controller_strategy'] == best_accuracy]['accuracy']

# equal_var=False avoids assuming both groups share the same variance
t_stat, p_value = stats.ttest_ind(best_accuracy_vals, baseline_accuracy, equal_var=False)

print('📊 Statistical Test (Accuracy):')
print(f'   Comparing {best_accuracy} vs. none (baseline)')
print(f'   t-statistic: {t_stat:.3f}')
print(f'   p-value: {p_value:.4f}')
if best_accuracy == 'none':
    print('   ⚠️ Baseline is already the best strategy; nothing to compare')
elif p_value < 0.05:
    print('   ✅ Statistically significant improvement (p < 0.05)')
else:
    print('   ⚠️ Not statistically significant (p ≥ 0.05)')
print()

print('='*80)
print('üîó KEY CORRELATIONS')
print('='*80)
print()

correlations = {
    'CCI ↔ ILDC': results_df[['cci_score', 'ildc_score']].corr().iloc[0, 1],
    'CCI ↔ Accuracy': results_df[['cci_score', 'accuracy']].corr().iloc[0, 1],
    'CCI ↔ Volatility': results_df[['cci_score', 'volatility_final']].corr().iloc[0, 1],
    'ILDC ↔ Volatility': results_df[['ildc_score', 'volatility_final']].corr().iloc[0, 1],
    'ILDC ↔ Accuracy': results_df[['ildc_score', 'accuracy']].corr().iloc[0, 1],
}

for corr_name, corr_value in correlations.items():
    interpretation = ''
    if abs(corr_value) > 0.7:
        interpretation = '(Strong)'
    elif abs(corr_value) > 0.4:
        interpretation = '(Moderate)'
    else:
        interpretation = '(Weak)'
    
    print(f'{corr_name:25s}: {corr_value:+.3f} {interpretation}')

print()
print('üí° Key Insights:')
print('   ‚Ä¢ Strong CCI‚ÜîILDC correlation validates two-tier coupling')
print('   ‚Ä¢ Negative CCI‚ÜîAccuracy shows complexity hurts performance')
print('   ‚Ä¢ ILDC‚ÜîVolatility confirms internal difficulty causes instability')

## 🆚 Baseline Comparison Analysis

Let's compare our framework against common baseline approaches:

### Baseline Strategies Explained

1. **None (No Control)**
   - Present all options to LLM
   - No intervention
   - Current industry standard ❌

2. **Naive Top-K**
   - Always show top 5 options (by some score)
   - Fixed pruning, no adaptation
   - Ignores problem complexity

3. **CCI Only** 
   - Prune based only on external complexity
   - Ignores LLM's actual difficulty
   - One-tier approach

4. **Two-Tier (Ours)** ✅
   - Uses both CCI and ILDC
   - Adapts to LLM's actual struggle
   - Novel contribution
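
For reference, the Naive Top-K baseline can be sketched in a few lines. This assumes each option already has a numeric score (the "some score" above, whose source is left unspecified) and uses a hypothetical function name `naive_topk`. It is not the repository's implementation.

```python
from typing import List, Sequence


def naive_topk(scores: Sequence[float], k: int = 5) -> List[int]:
    """Return the indices of the k highest-scoring options, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


scores = [0.2, 0.9, 0.4, 0.7, 0.1, 0.8, 0.3]
print(naive_topk(scores, k=3))  # [1, 5, 3]
```

The cutoff `k` is the same whether the problem has 7 options or 50, and whether or not the model is struggling. The CCI-only and two-tier strategies are meant to address that limitation.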

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Create comprehensive baseline comparison
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
fig.suptitle('üÜö Comprehensive Baseline Comparison', fontsize=18, fontweight='bold', y=1.00)

strategies = results_df['controller_strategy'].unique()
colors = {'none': '#e74c3c', 'naive_topk': '#f39c12', 'cci_only': '#3498db', 'two_tier': '#27ae60'}

# Plot 1: Accuracy comparison with error bars
ax1 = axes[0, 0]
accuracy_data = results_df.groupby('controller_strategy')['accuracy'].agg(['mean', 'std'])
accuracy_data = accuracy_data.reindex(['none', 'naive_topk', 'cci_only', 'two_tier'])
x_pos = np.arange(len(accuracy_data))
bars1 = ax1.bar(x_pos, accuracy_data['mean'], yerr=accuracy_data['std'], 
               color=[colors[s] for s in accuracy_data.index],
               capsize=5, alpha=0.8, edgecolor='black', linewidth=1.5)
ax1.set_xticks(x_pos)
ax1.set_xticklabels(['None\n(Baseline)', 'Naive\nTop-K', 'CCI\nOnly', 'Two-Tier\n(Ours)'], fontsize=11)
ax1.set_ylabel('Decision Accuracy', fontsize=12, fontweight='bold')
ax1.set_title('(A) Accuracy Comparison', fontsize=13, fontweight='bold')
ax1.grid(axis='y', alpha=0.3)
ax1.set_ylim([0, 1])
# Add value labels on bars
for i, bar in enumerate(bars1):
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height + 0.02,
            f'{height:.3f}', ha='center', va='bottom', fontweight='bold')

# Plot 2: Volatility comparison
ax2 = axes[0, 1]
volatility_data = results_df.groupby('controller_strategy')['volatility_final'].agg(['mean', 'std'])
volatility_data = volatility_data.reindex(['none', 'naive_topk', 'cci_only', 'two_tier'])
bars2 = ax2.bar(x_pos, volatility_data['mean'], yerr=volatility_data['std'],
               color=[colors[s] for s in volatility_data.index],
               capsize=5, alpha=0.8, edgecolor='black', linewidth=1.5)
ax2.set_xticks(x_pos)
ax2.set_xticklabels(['None\n(Baseline)', 'Naive\nTop-K', 'CCI\nOnly', 'Two-Tier\n(Ours)'], fontsize=11)
ax2.set_ylabel('Decision Volatility', fontsize=12, fontweight='bold')
ax2.set_title('(B) Stability Comparison (Lower is Better)', fontsize=13, fontweight='bold')
ax2.grid(axis='y', alpha=0.3)
ax2.set_ylim([0, 1])
for i, bar in enumerate(bars2):
    height = bar.get_height()
    ax2.text(bar.get_x() + bar.get_width()/2., height + 0.02,
            f'{height:.3f}', ha='center', va='bottom', fontweight='bold')

# Plot 3: Efficiency (Options Shown)
ax3 = axes[1, 0]
options_data = results_df.groupby('controller_strategy')['n_options_shown'].agg(['mean', 'std'])
options_data = options_data.reindex(['none', 'naive_topk', 'cci_only', 'two_tier'])
bars3 = ax3.bar(x_pos, options_data['mean'], yerr=options_data['std'],
               color=[colors[s] for s in options_data.index],
               capsize=5, alpha=0.8, edgecolor='black', linewidth=1.5)
ax3.set_xticks(x_pos)
ax3.set_xticklabels(['None\n(Baseline)', 'Naive\nTop-K', 'CCI\nOnly', 'Two-Tier\n(Ours)'], fontsize=11)
ax3.set_ylabel('Average Options Shown', fontsize=12, fontweight='bold')
ax3.set_title('(C) Cognitive Load Reduction', fontsize=13, fontweight='bold')
ax3.grid(axis='y', alpha=0.3)
for i, bar in enumerate(bars3):
    height = bar.get_height()
    ax3.text(bar.get_x() + bar.get_width()/2., height + 0.5,
            f'{height:.1f}', ha='center', va='bottom', fontweight='bold')

# Plot 4: Accuracy vs. internal complexity (ILDC) scatter, one color per strategy
ax4 = axes[1, 1]
for strategy in strategies:
    strategy_df = results_df[results_df['controller_strategy'] == strategy]
    ax4.scatter(strategy_df['ildc_score'], strategy_df['accuracy'],
               label=strategy, alpha=0.6, s=80, color=colors.get(strategy, 'gray'))
    
    # Add mean point
    mean_ildc = strategy_df['ildc_score'].mean()
    mean_acc = strategy_df['accuracy'].mean()
    ax4.scatter(mean_ildc, mean_acc, color=colors.get(strategy, 'gray'),
               s=300, marker='*', edgecolor='black', linewidth=2, zorder=10)

ax4.set_xlabel('ILDC (Internal Complexity)', fontsize=12, fontweight='bold')
ax4.set_ylabel('Accuracy', fontsize=12, fontweight='bold')
ax4.set_title('(D) Accuracy-Complexity Tradeoff\n(⭐ = mean)', fontsize=13, fontweight='bold')
ax4.grid(alpha=0.3)
# Add reference lines marking the target zone (high accuracy, low complexity)
ax4.axhline(y=0.7, color='green', linestyle='--', alpha=0.3, label='Target Accuracy')
ax4.axvline(x=0.4, color='red', linestyle='--', alpha=0.3, label='Complexity Threshold')
# Build the legend last so the reference lines are included
ax4.legend(loc='lower left', fontsize=10)

plt.tight_layout()
plt.savefig('results/colab_demo/baseline_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

# Summary statistics
print('\nüìä IMPROVEMENT OVER BASELINE (None)')
print('='*60)

baseline_acc = results_df[results_df['controller_strategy'] == 'none']['accuracy'].mean()
baseline_vol = results_df[results_df['controller_strategy'] == 'none']['volatility_final'].mean()

for strategy in ['naive_topk', 'cci_only', 'two_tier']:
    strategy_df = results_df[results_df['controller_strategy'] == strategy]
    acc_improvement = ((strategy_df['accuracy'].mean() - baseline_acc) / baseline_acc) * 100
    vol_improvement = ((baseline_vol - strategy_df['volatility_final'].mean()) / baseline_vol) * 100
    
    print(f'\n{strategy.upper().replace("_", " ")}:')
    print(f'  Accuracy:   {acc_improvement:+.1f}%')
    print(f'  Volatility: {vol_improvement:+.1f}% (reduction)')

## 📈 Detailed Visualizations

Explore all generated plots to understand the framework's behavior.

In [None]:
from IPython.display import Image, display
import os

# Display all generated plots
plots_dir = 'results/colab_demo/plots'
plot_files = sorted([f for f in os.listdir(plots_dir) if f.endswith('.png')])

for plot_file in plot_files:
    print(f'\n{"="*80}')
    print(f'{plot_file.replace("_", " ").replace(".png", "").upper()}')
    print('='*80)
    display(Image(filename=f'{plots_dir}/{plot_file}'))

## 🔬 Interactive Data Exploration

Dive deeper into the raw data and explore patterns.

In [None]:
print('üìä Dataset Overview\n')
print(f'Total Samples: {len(results_df)}')
print(f'Strategies Tested: {results_df["controller_strategy"].nunique()}')
print(f'Unique Problems: {results_df["sample_id"].nunique()}')
print()

# Show sample data
print('First 10 rows (selected columns):\n')
display_cols = ['sample_id', 'controller_strategy', 'n_options', 'n_options_shown', 
                'cci_score', 'ildc_score', 'accuracy', 'volatility_final']
results_df[display_cols].head(10)

In [None]:
# Statistical summary
print('üìà Statistical Summary by Controller Strategy:\n')
summary_stats = results_df.groupby('controller_strategy')[
    ['accuracy', 'cci_score', 'ildc_score', 'volatility_final', 'n_options_shown']
].describe().round(3)

display(summary_stats)

## 🧪 Custom Experiment Builder

Create and analyze your own choice problem! Adjust the sliders below to define a custom scenario.

In [None]:
# Interactive custom experiment
custom_n_options = widgets.IntSlider(
    value=15,
    min=5,
    max=50,
    step=5,
    description='# Options:',
    style={'description_width': '150px'}
)

custom_n_attributes = widgets.IntSlider(
    value=5,
    min=2,
    max=10,
    step=1,
    description='# Attributes:',
    style={'description_width': '150px'}
)

custom_complexity = widgets.FloatSlider(
    value=0.5,
    min=0.0,
    max=1.0,
    step=0.1,
    description='Target CCI:',
    style={'description_width': '150px'},
    readout_format='.1f'
)

run_button = widgets.Button(
    description='üöÄ Run Custom Experiment',
    button_style='success',
    layout=widgets.Layout(width='300px', height='40px')
)

output_area = widgets.Output()

def on_run_clicked(b):
    with output_area:
        output_area.clear_output()
        
        import sys
        sys.path.insert(0, '/content/choice-complexity-llm')
        
        from src.datasets import SyntheticChoiceDataset
        from src.cci import ChoiceComplexityIndex
        import numpy as np
        
        # Create a custom choice problem
        dataset_gen = SyntheticChoiceDataset(seed=42)
        cci_calc = ChoiceComplexityIndex()
        
        # Generate one sample with specified parameters
        sample = dataset_gen.generate_dataset(n_samples=1)[0]
        
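        # Apply the option-count slider by truncating the generated options.
        # The # Attributes and Target CCI sliders are shown but not applied here (simplified).
        n_opts = custom_n_options.value
        sample['options'] = sample['options'][:n_opts]
        
        print('='*80)
        print('🧪 CUSTOM CHOICE PROBLEM')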
        print('='*80)
        print(f'Number of options: {len(sample["options"])}')
        print(f'Number of attributes: {custom_n_attributes.value} (configured)')
        print(f'Ground truth choice: {sample["ground_truth_choice"]}\n')
        
        # Show first 3 options
        for i, option in enumerate(sample['options'][:3]):
            print(f'Option {i}:')
            print(f'  Attributes: {option["attributes"]}')
            if len(sample['options']) > 3 and i == 2:
                print(f'\n... and {len(sample["options"])-3} more options')
        
        # Compute CCI
        cci_result = cci_calc.compute(sample['options'])
        
        print('\n' + '='*80)
        print('üìä CCI ANALYSIS')
        print('='*80)
        print(f'CCI Score: {cci_result["cci_score"]:.3f}')
        
        # Complexity interpretation
        cci_score = cci_result['cci_score']
        if cci_score < 0.3:
            interpretation = '🟢 LOW - Easy for LLM to decide'
        elif cci_score < 0.6:
            interpretation = '🟡 MEDIUM - Moderate difficulty'
        else:
            interpretation = '🔴 HIGH - Challenging, needs control'
        
        print(f'Interpretation: {interpretation}')
        print(f'\nFeature Breakdown:')
        for feature, value in cci_result['features'].items():
            print(f'  {feature}: {value:.3f}')
        
        # Recommendation
        print('\n' + '='*80)
        print('üí° CONTROLLER RECOMMENDATION')
        print('='*80)
        if cci_score < 0.4:
            print('✅ No control needed - Present all options')
        elif cci_score < 0.7:
            print('⚠️ Consider CCI-only controller - Prune to top 8-10 options')
        else:
            print('🚨 Use two-tier controller - Aggressive pruning + ILDC monitoring')

run_button.on_click(on_run_clicked)

display(widgets.VBox([
    widgets.HTML('<h3>üß™ Custom Experiment Configuration</h3>'),
    custom_n_options,
    custom_n_attributes,
    custom_complexity,
    widgets.HTML('<br>'),
    run_button,
    widgets.HTML('<br>'),
    output_area
]))


## 📚 Framework Deep Dive

### Choice Complexity Index (CCI) - Mathematical Definition

$$\text{CCI} = w_1 \cdot f_{\text{count}}(n) + w_2 \cdot f_{\text{redundancy}}(O) + w_3 \cdot f_{\text{conflict}}(A) + w_4 \cdot f_{\text{entropy}}(O)$$

Where:
- $f_{\text{count}}(n)$: Normalized option count (e.g., $\log(n)/\log(50)$)
- $f_{\text{redundancy}}(O)$: Fraction of near-duplicate options
- $f_{\text{conflict}}(A)$: Attribute trade-off severity (inverse correlation)
- $f_{\text{entropy}}(O)$: Distribution entropy of option quality
- Weights: $w_1=0.3, w_2=0.25, w_3=0.25, w_4=0.2$ (tuned empirically)
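
As a rough sketch, the weighted sum above can be written directly in Python. It assumes the redundancy, conflict, and entropy components have already been computed and scaled to [0, 1]. The clipping in `f_count` and all function names are illustrative assumptions, not the repository's `ChoiceComplexityIndex` API.

```python
import math

# Weights as listed above (w1..w4).
CCI_WEIGHTS = {"count": 0.30, "redundancy": 0.25, "conflict": 0.25, "entropy": 0.20}


def f_count(n_options: int, n_max: int = 50) -> float:
    """Normalized option count log(n)/log(n_max), clipped to [0, 1] (clipping is an assumption)."""
    if n_options <= 1:
        return 0.0
    return min(1.0, math.log(n_options) / math.log(n_max))


def cci(n_options: int, redundancy: float, conflict: float, entropy: float) -> float:
    """Weighted sum of the four CCI components; inputs other than n_options are assumed in [0, 1]."""
    return (
        CCI_WEIGHTS["count"] * f_count(n_options)
        + CCI_WEIGHTS["redundancy"] * redundancy
        + CCI_WEIGHTS["conflict"] * conflict
        + CCI_WEIGHTS["entropy"] * entropy
    )


# 50 options, 30% near-duplicates, moderate conflict and entropy.
print(round(cci(50, redundancy=0.3, conflict=0.5, entropy=0.6), 3))
```

Because the weights sum to 1 and every component lies in [0, 1], the score stays in [0, 1], matching the range stated for CCI earlier.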

### Internal LLM Decision Complexity (ILDC)

Given $k$ LLM decision samples $\{c_1, c_2, \ldots, c_k\}$:

$$\text{ILDC} = 0.4 \cdot V + 0.3 \cdot (1 - \bar{\rho}) + 0.2 \cdot \sigma_\rho + 0.1 \cdot D$$

Where:
- $V$ (Volatility): Fraction of samples with different choice
- $\bar{\rho}$ (Mean confidence): Average LLM confidence
- $\sigma_\rho$ (Confidence std): Confidence variation
- $D$ (Disagreement): Unique choices / total samples
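
The ILDC formula can be sketched as follows for a list of sampled choices and their confidences. Two details are assumptions rather than facts from the source: $V$ is taken as the fraction of consecutive samples whose choice changes, and $\sigma_\rho$ is the population standard deviation. The function name `ildc` is hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def ildc(choices: Sequence[int], confidences: Sequence[float]) -> float:
    """ILDC = 0.4*V + 0.3*(1 - mean confidence) + 0.2*confidence std + 0.1*D."""
    k = len(choices)
    if k == 0 or k != len(confidences):
        raise ValueError("choices and confidences must be non-empty and equal in length")

    # V: assumed to be the fraction of consecutive samples whose choice changes.
    v = sum(a != b for a, b in zip(choices, choices[1:])) / (k - 1) if k > 1 else 0.0
    rho_bar = mean(confidences)       # mean confidence
    sigma_rho = pstdev(confidences)   # population std of confidence (assumption)
    d = len(set(choices)) / k         # unique choices / total samples

    return 0.4 * v + 0.3 * (1.0 - rho_bar) + 0.2 * sigma_rho + 0.1 * d


confident = ildc([1, 1, 1, 1], [0.9, 0.9, 0.9, 0.9])
struggling = ildc([1, 3, 1, 2], [0.6, 0.4, 0.7, 0.3])
print(round(confident, 3), round(struggling, 3))
```

With $D$ defined as unique choices over total samples, a perfectly consistent model still contributes $0.1/k$, so ILDC reaches 0 only in the limit of many samples.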

### Controller Decision Logic

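```python
# Illustrative wrapper (function name is hypothetical); thresholds as in the framework.
def choose_action(cci: float, ildc: float) -> str:
    """Map the two complexity scores to a control action."""
    if cci > 0.8 and ildc > 0.7:
        return "AGGRESSIVE_PRUNE"   # Show top 3-5
    elif cci > 0.6 and ildc > 0.5:
        return "MODERATE_PRUNE"     # Show top 8-10
    elif cci > 0.6:
        return "CCI_BASED_CONTROL"  # Cluster or satisfice
    else:
        return "NO_CONTROL"         # Present all
```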

### Key Theoretical Properties

1. **Two-Tier Coupling**: CCI and ILDC are correlated (r ≈ 0.7) but measure distinct aspects
2. **Bounded Rationality**: Framework operationalizes Simon's satisficing for LLMs
3. **Pareto Efficiency**: Controller preserves Pareto-optimal options
4. **Adaptation**: ILDC enables dynamic adjustment based on actual LLM behavior
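
The Pareto-efficiency property depends on identifying which options are non-dominated before pruning. A minimal sketch is shown below, assuming every attribute is numeric and "higher is better". The function names are hypothetical and the repository's controller may handle this differently.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every attribute and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(options: Sequence[Sequence[float]]) -> List[int]:
    """Indices of options that no other option dominates."""
    return [
        i
        for i, a in enumerate(options)
        if not any(dominates(b, a) for j, b in enumerate(options) if j != i)
    ]


options = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4), (0.9, 0.1)]
print(pareto_front(options))  # [0, 1, 2]
```

A pruning step that removes only indices outside this set would keep every Pareto-optimal option, at the cost of O(n²) comparisons per problem.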

## 💾 Download Results

Download all generated files to your local machine for further analysis or presentation.

In [None]:
from google.colab import files
import zipfile
import os

# Create a zip file with all results
!zip -r results_colab_demo.zip results/colab_demo/

print('üì¶ Results packaged!')
print('\nPackage contents:')
print('  ‚úì demo_results.csv - Full experimental data')
print('  ‚úì summary.json - Aggregate metrics')
print('  ‚úì plots/ - All visualizations (PNG)')
print('  ‚úì baseline_comparison.png - Comprehensive comparison plot')
print()
print('Downloading...')
files.download('results_colab_demo.zip')
print('‚úÖ Download complete!')

## 🚀 Next Steps & Extensions

### For Researchers

1. **Integrate Real LLMs**
   - Modify `src/llm_adapter.py` to use OpenAI/Anthropic/local models
   - Run `experiments/run_benchmark.py` with actual API calls
   - Compare GPT-4, Claude, Llama 3 behavior under choice complexity

2. **Test on Real Datasets**
   - E-commerce: Product recommendations with real reviews
   - Healthcare: Treatment option selection
   - Legal: Case law retrieval and application

3. **Ablation Studies**
   - Test individual CCI components (what matters most?)
   - Vary ILDC sample size (how many samples needed?)
   - Compare weight configurations

4. **Theoretical Analysis**
   - Sample complexity bounds for ILDC
   - PAC-learning framework for controller
   - Information-theoretic formalization

### For Practitioners

1. **RAG Integration**
   - Compute CCI on retrieved documents
   - Prune before presenting to LLM
   - Monitor ILDC for retrieval quality

2. **Multi-Agent Systems**
   - Each agent maintains ILDC profile
   - Orchestrator uses CCI+ILDC for task allocation
   - Adaptive complexity budgets per agent

3. **Production Deployment**
   - A/B test controlled vs uncontrolled
   - Monitor accuracy, latency, token cost
   - Implement cost-aware controller (balance quality vs tokens)

### Open Research Questions

1. Does CCI generalize across LLM families?
2. Can ILDC predict errors before they happen?
3. What's the optimal controller threshold for different domains?
4. How does fine-tuning affect choice complexity sensitivity?
5. Can we learn optimal CCI weights from data?

### Contributing

- 🐛 **Found a bug?** Open an issue on [GitHub](https://github.com/soroushbagheri/choice-complexity-llm/issues)
- 💡 **Have an idea?** Start a discussion
- 🔬 **Want to collaborate?** Reach out via GitHub

### Resources

- **Repository**: [github.com/soroushbagheri/choice-complexity-llm](https://github.com/soroushbagheri/choice-complexity-llm)
- **Paper**: Coming soon on arXiv
- **Documentation**: See `docs/` folder

---

### Citation

```bibtex
@software{bagheri2026choice_complexity,
  author = {Bagheri, Soroush},
  title = {Decision-Theoretic Choice Complexity in LLMs: A Two-Tier Framework},
  year = {2026},
  url = {https://github.com/soroushbagheri/choice-complexity-llm},
  note = {Under review}
}
```

---

**Thank you for exploring this framework! ⭐ Star the repo if you find it useful.**