# Intention Collapse: Experiment 4.1 (v2 - Corrected)
## Correlating Intention Metrics with Reasoning Accuracy

**Version 2.0** - Includes fixes for:
- ✅ Activation extraction bug (size mismatch)
- ✅ Length bias control (babble condition)
- ✅ Layer-wise dim_eff analysis
- ✅ Sanity checks for answer extraction
- ✅ Robust error handling throughout

### Metrics Implemented
1. **Intention Entropy** $H_{int}(I)$: Shannon entropy (in bits) of the top-k next-token distribution at the first generated token
2. **Effective Dimensionality** $dim_{eff}(I)$: number of principal components needed to explain 90% of the variance of hidden activations
3. **Latent Recoverability** $Recov(I; Z)$: cross-validated accuracy of a linear probe predicting answer correctness from activations
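
Section 3 implements the three metrics as follows. Here $k$ = `entropy_top_k`, $\tau$ = `variance_threshold`, and $F$ is the number of cross-validation folds:

$$H_{int}(I) = -\sum_{i=1}^{k} p_i \log_2 p_i, \qquad p_i = \frac{\exp(z_{(i)})}{\sum_{j=1}^{k} \exp(z_{(j)})}$$

where $z_{(1)} \ge \dots \ge z_{(k)}$ are the $k$ largest logits at the first generated token.

$$dim_{eff}(I) = \min\left\{ d \;:\; \frac{\sum_{j=1}^{d} \lambda_j}{\sum_{j} \lambda_j} \ge \tau \right\}$$

where $\lambda_1 \ge \lambda_2 \ge \dots$ are the PCA eigenvalues (explained variances) of the activations.

$$Recov(I; Z) = \frac{1}{F} \sum_{f=1}^{F} \mathrm{acc}_f$$

where $\mathrm{acc}_f$ is the held-out accuracy on fold $f$ of a logistic-regression probe that predicts correctness $Z$ from $I$.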

### Experimental Conditions
- **Baseline**: Zero-shot (direct answer)
- **Enhanced**: Chain-of-thought reasoning
- **Babble**: Length-matched negative control (to rule out length bias)

---
## 1. Setup and Installation

In [None]:
# Install dependencies
!pip install -q torch transformers accelerate bitsandbytes
!pip install -q datasets scikit-learn scipy
!pip install -q matplotlib seaborn tqdm pyyaml

In [None]:
# Verify GPU availability
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("⚠️ WARNING: No GPU detected. Please enable GPU in Runtime > Change runtime type")

In [None]:
# Configure Hugging Face token
try:
    from google.colab import userdata
    HF_TOKEN = userdata.get('HF_TOKEN')
    print("✓ Loaded HF_TOKEN from Colab Secrets")
except Exception:  # Not running on Colab, or the secret is not set
    import getpass
    HF_TOKEN = getpass.getpass("Enter your Hugging Face token: ")
    print("✓ Token entered manually")

from huggingface_hub import login
login(token=HF_TOKEN, add_to_git_credential=False)

---
## 2. Configuration

In [None]:
# =============================================================================
# EXPERIMENT CONFIGURATION
# =============================================================================

CONFIG = {
    # Model settings
    'model_name': 'mistralai/Mistral-7B-Instruct-v0.3',
    'quantization': '4bit',
    'extraction_layers': [27, 28, 29, 30, 31],  # Last 5 layers
    
    # Dataset settings
    'dataset': 'gsm8k',
    'subset_size': 200,  # Adjust based on time/resources
    'seed': 42,
    
    # Generation settings
    'max_new_tokens_baseline': 50,
    'max_new_tokens_cot': 512,
    'max_new_tokens_babble': 512,  # Same token budget as CoT (realized lengths can still differ)
    'temperature': 0.0,  # Greedy decoding
    
    # Metric settings
    'variance_threshold': 0.90,
    'entropy_top_k': 100,
    'probe_regularization': 1.0,
    
    # Output settings
    'output_dir': 'results',
    'save_figures': True,
}

# Prompt templates for each condition
PROMPTS = {
    'baseline': """Solve this math problem. Give only the final numerical answer.

Problem: {question}
Answer:""",
    
    'enhanced': """Solve this math problem step by step. Show your reasoning, then give the final answer after ####.

Problem: {question}
Solution:""",
    
    # Negative control: generates long text without reasoning
    # This controls for the "length bias" critique
    'babble': """Given this math problem, write a long stream of consciousness about numbers, 
calculations, and mathematical concepts. Do NOT solve the problem - just write 
loosely related mathematical musings for about 100 words.

Problem: {question}
Stream of consciousness:"""
}

print("Configuration loaded successfully!")
print(f"Model: {CONFIG['model_name']}")
print(f"Dataset: {CONFIG['dataset']} ({CONFIG['subset_size']} problems)")
print(f"Conditions: baseline, enhanced (CoT), babble (negative control)")
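
The hardcoded `extraction_layers` assume a 32-layer model such as Mistral-7B. The sketch below derives the indices from the layer count instead, using a hypothetical helper `last_n_layers`. After loading, you could pass it `model.config.num_hidden_layers`:

```python
from typing import List


def last_n_layers(num_hidden_layers: int, n: int = 5) -> List[int]:
    """Return the 0-indexed positions of the final n decoder layers (hypothetical helper)."""
    if not 1 <= n <= num_hidden_layers:
        raise ValueError(f"n must be in [1, {num_hidden_layers}], got {n}")
    return list(range(num_hidden_layers - n, num_hidden_layers))


# Mistral-7B has 32 decoder layers, so its last five are 27..31
print(last_n_layers(32, 5))  # [27, 28, 29, 30, 31]
```

Deriving the list this way keeps the "last 5 layers" comment true if you swap in a model with a different depth.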

---
## 3. Core Implementation (with fixes)

In [None]:
# =============================================================================
# IMPORTS
# =============================================================================

import numpy as np
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from datasets import load_dataset
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.auto import tqdm
import re
import random
from typing import List, Dict, Tuple, Optional, Any
from dataclasses import dataclass
from contextlib import contextmanager
import warnings
import os

warnings.filterwarnings('ignore')

# Set seeds for reproducibility
random.seed(CONFIG['seed'])
np.random.seed(CONFIG['seed'])
torch.manual_seed(CONFIG['seed'])

print("All imports successful!")

In [None]:
# =============================================================================
# DATA STRUCTURES
# =============================================================================

@dataclass
class MathProblem:
    """Container for a math problem from GSM8K."""
    question: str
    answer: str
    final_answer: str
    idx: int

@dataclass
class IntentionMetrics:
    """Container for intention metrics for a single example."""
    entropy: float
    dim_eff: int = 0
    recoverability: Optional[float] = None

@dataclass
class ExperimentResult:
    """Results from a single problem evaluation."""
    problem_idx: int
    condition: str
    question: str
    ground_truth: str
    model_output: str
    extracted_answer: str
    is_correct: bool
    metrics: IntentionMetrics
    activations: Optional[np.ndarray] = None
    output_length: int = 0

In [None]:
# =============================================================================
# ACTIVATION EXTRACTION (FIXED VERSION)
# =============================================================================
# Key fix: Capture only the last token at each generation step
# This prevents the size mismatch error when concatenating

class ActivationExtractor:
    """
    Extract hidden state activations from specified transformer layers.
    
    FIXED: Collapses to last token immediately to avoid size mismatch
    when concatenating tensors from different generation steps.
    """
    
    def __init__(self, model, layers: List[int]):
        self.model = model
        self.layers = sorted(layers)
        self._hooks = []
        self._activations = {l: [] for l in layers}
        self._is_capturing = False
    
    def _get_layer_module(self, layer_idx: int):
        """Get the module for a specific layer."""
        return self.model.model.layers[layer_idx]
    
    def _create_hook(self, layer_idx: int):
        """Create a hook function - captures only last token to avoid size mismatch."""
        def hook(module, input, output):
            if not self._is_capturing:
                return
            hidden_states = output[0] if isinstance(output, tuple) else output
            # FIX: Capture only the last token position, so every captured
            # tensor has shape [batch, hidden_dim]. Upcast to float32 so later
            # NumPy/PCA statistics are not computed in float16.
            last_hidden = hidden_states[:, -1, :].detach().float().cpu()
            self._activations[layer_idx].append(last_hidden)
        return hook
    
    def _register_hooks(self):
        """Register forward hooks on specified layers."""
        for layer_idx in self.layers:
            layer_module = self._get_layer_module(layer_idx)
            hook = layer_module.register_forward_hook(self._create_hook(layer_idx))
            self._hooks.append(hook)
    
    def _remove_hooks(self):
        """Remove all registered hooks."""
        for hook in self._hooks:
            hook.remove()
        self._hooks = []
    
    def clear(self):
        """Clear stored activations."""
        self._activations = {l: [] for l in self.layers}
    
    @contextmanager
    def capture(self):
        """Context manager for capturing activations."""
        self.clear()
        self._register_hooks()
        self._is_capturing = True
        try:
            yield self
        finally:
            self._is_capturing = False
            self._remove_hooks()
    
    def get_activations(self, aggregate: str = "last") -> np.ndarray:
        """
        Get extracted activations.
        
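        Args:
            aggregate: "last" for the last generation step, "mean" for the mean
                across steps, or "all" to keep every step (trajectory analysis)
            
        Returns:
            Array of shape (n_layers, hidden_dim) for "last"/"mean", or
            (n_layers, n_steps, hidden_dim) for "all"
        """
        if aggregate not in ("last", "mean", "all"):
            raise ValueError(f"Unknown aggregate mode: {aggregate!r}")
        
        all_activations = []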
        for l in self.layers:
            if not self._activations[l]:
                raise ValueError(f"No activations captured for layer {l}")
            
            # Now safe to concatenate: all tensors are [batch, hidden_dim]
            layer_acts = torch.cat(self._activations[l], dim=0)  # [n_steps, hidden_dim]
            
            if aggregate == "last":
                # Take the last generation step (final intention state)
                layer_acts = layer_acts[-1, :]  # [hidden_dim]
            elif aggregate == "mean":
                # Mean across all generation steps
                layer_acts = layer_acts.mean(dim=0)  # [hidden_dim]
            elif aggregate == "all":
                # Return all steps (for trajectory analysis)
                pass
            
            all_activations.append(layer_acts.numpy())
        
        # Stack layers: (n_layers, hidden_dim)
        return np.stack(all_activations, axis=0)
    
    def get_num_steps(self) -> int:
        """Get number of generation steps captured."""
        if not self._activations[self.layers[0]]:
            return 0
        return len(self._activations[self.layers[0]])

print("✓ ActivationExtractor defined (with fix for size mismatch)")
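
The toy sketch below shows why the hook keeps only the last position. Plain `nn.Linear` layers stand in for `model.model.layers`; this is an assumption, and no real model is loaded. The first pass mimics prefill over a 7-token prompt, and later passes mimic single-token decode steps. Storing full `[batch, seq, hidden]` outputs would make `torch.cat(..., dim=0)` fail because the sequence dimensions differ (7 vs 1).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_dim = 8

# Toy stand-ins for model.model.layers (an assumption: no real model is loaded)
layers = nn.ModuleList([nn.Linear(hidden_dim, hidden_dim) for _ in range(3)])
captured = {i: [] for i in range(len(layers))}


def make_hook(idx):
    def hook(module, inputs, output):
        hs = output[0] if isinstance(output, tuple) else output
        # Keep only the last position: [batch, seq, hidden] -> [batch, hidden]
        captured[idx].append(hs[:, -1, :].detach().float().cpu())
    return hook


handles = [layer.register_forward_hook(make_hook(i)) for i, layer in enumerate(layers)]


def run_layers(x):
    for layer in layers:
        x = layer(x)
    return x


with torch.no_grad():
    run_layers(torch.randn(1, 7, hidden_dim))      # "prefill": 7 prompt positions
    for _ in range(4):                             # 4 "decode" steps, 1 position each
        run_layers(torch.randn(1, 1, hidden_dim))

for h in handles:
    h.remove()

# Every entry is [1, hidden_dim], so concatenation along dim 0 works;
# storing full [1, seq, hidden] outputs would fail here (seq 7 vs 1).
stacked = torch.cat(captured[2], dim=0)
print(stacked.shape)  # torch.Size([5, 8])
```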

In [None]:
# =============================================================================
# INTENTION METRICS IMPLEMENTATION
# =============================================================================

def compute_intention_entropy(logits: torch.Tensor, top_k: int = 100) -> float:
    """
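    Compute intention entropy H_int(I) from logits, in bits.
    
    Lower entropy indicates a more "decided" intention state.
    We measure this on the FIRST generated token to capture
    the model's initial decisiveness. When top_k > 0, the softmax is
    taken over the top_k logits only (renormalized), so the value is
    bounded above by log2(top_k).
    """
    logits = logits.float()  # avoid float16 softmax precision loss
    if logits.dim() == 2:
        logits = logits[-1]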
    
    if top_k > 0 and top_k < logits.size(-1):
        top_logits, _ = torch.topk(logits, top_k)
        probs = F.softmax(top_logits, dim=-1)
    else:
        probs = F.softmax(logits, dim=-1)
    
    eps = 1e-10
    log_probs = torch.log2(probs + eps)
    entropy = -torch.sum(probs * log_probs).item()
    
    return entropy


def compute_effective_dimensionality(
    activations: np.ndarray,
    variance_threshold: float = 0.90
) -> Tuple[int, np.ndarray]:
    """
    Compute effective intention dimensionality dim_eff(I) using PCA.
    
    Higher dim_eff suggests a "richer" intention state with more
    information being processed.
    """
    # Handle different input shapes
    if activations.ndim == 1:
        # Single vector - can't do PCA meaningfully
        return 1, np.array([1.0])
    
    if activations.ndim == 3:
        # (n_layers, n_samples, hidden_dim) -> flatten
        n_layers, n_samples, hidden_dim = activations.shape
        activations = activations.reshape(n_layers * n_samples, hidden_dim)
    
    n_samples, n_features = activations.shape
    n_components = min(n_samples, n_features)
    
    if n_components < 2:
        return 1, np.array([1.0])
    
    # Fit PCA
    pca = PCA(n_components=n_components)
    pca.fit(activations)
    
    # Find smallest k such that cumulative variance >= threshold
    cumulative_variance = np.cumsum(pca.explained_variance_ratio_)
    dim_eff = np.searchsorted(cumulative_variance, variance_threshold) + 1
    dim_eff = min(dim_eff, n_components)
    
    return int(dim_eff), pca.explained_variance_ratio_


def train_recoverability_probe(
    activations: np.ndarray,
    labels: np.ndarray,
    cv_folds: int = 5
) -> Tuple[float, float]:
    """
    Train a linear probe to measure latent recoverability Recov(I; Z).
    
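    Probe accuracy should be compared with the majority-class baseline
    max(p, 1 - p), where p is the verbalized accuracy. A probe that beats
    that baseline suggests the model "knows more than it says", i.e.
    information is lost during collapse.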
    """
    if activations.ndim == 3:
        n_layers, n_samples, hidden_dim = activations.shape
        activations = activations.transpose(1, 0, 2).reshape(n_samples, -1)
    elif activations.ndim == 2 and activations.shape[0] != len(labels):
        # Reshape if needed
        activations = activations.reshape(len(labels), -1)
    
    labels = np.asarray(labels).astype(int)
    
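    # A probe needs both classes present
    if len(np.unique(labels)) < 2:
        return 0.5, 0.0
    
    # Keep the fold count no larger than the smallest class, so every
    # stratified fold can contain both classes
    min_class_count = int(np.bincount(labels).min())
    cv_folds = max(2, min(cv_folds, min_class_count))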
    
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('classifier', LogisticRegression(
            C=1.0/CONFIG['probe_regularization'],
            max_iter=1000,
            random_state=CONFIG['seed'],
            solver='lbfgs'
        ))
    ])
    
    try:
        cv_scores = cross_val_score(pipeline, activations, labels, cv=cv_folds)
        return cv_scores.mean(), cv_scores.std()
    except Exception as e:
        print(f"Warning: Probe training failed: {e}")
        return 0.5, 0.0

print("✓ Metrics functions defined")
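
As a numeric sanity check on these definitions, here is a NumPy-only sketch. The helpers `topk_entropy_bits` and `pca_dim_eff` are hypothetical, not the functions above. An SVD of the centered data gives the same explained-variance ratios as scikit-learn's `PCA` when all components are kept.

```python
import numpy as np


def topk_entropy_bits(logits, top_k):
    """Entropy in bits of the softmax over the top_k logits (renormalized)."""
    top = np.sort(np.asarray(logits, dtype=np.float64))[-top_k:]
    z = top - top.max()              # shift for a numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    p = p[p > 0]                     # treat 0 * log(0) as 0
    return float(-(p * np.log2(p)).sum())


def pca_dim_eff(x, threshold=0.90):
    """Smallest d whose leading principal components explain >= threshold of variance."""
    xc = x - x.mean(axis=0)
    s = np.linalg.svd(xc, compute_uv=False)
    ratios = s**2 / np.sum(s**2)
    d = int(np.searchsorted(np.cumsum(ratios), threshold) + 1)
    return min(d, len(ratios))


# Four tied top logits -> uniform over 4 tokens -> exactly 2 bits
logits = np.array([5.0, 5.0, 5.0, 5.0, -1.0, -2.0])
print(topk_entropy_bits(logits, top_k=4))  # 2.0

# A rank-3 point cloud embedded in 50 dimensions
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.standard_normal((50, 3)))  # 3 orthonormal columns
x = rng.standard_normal((500, 3)) @ basis.T
print(pca_dim_eff(x, 0.90))  # 3
```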

In [None]:
# =============================================================================
# DATA UTILITIES (IMPROVED)
# =============================================================================

def load_gsm8k_problems(subset_size: int, seed: int) -> List[MathProblem]:
    """Load GSM8K dataset."""
    dataset = load_dataset("gsm8k", "main", split="test")
    
    if subset_size < len(dataset):
        # Local RNG: draws the same indices as seeding the global `random`, without the side effect
        rng = random.Random(seed)
        indices = rng.sample(range(len(dataset)), subset_size)
        dataset = dataset.select(indices)
    
    problems = []
    for idx, item in enumerate(dataset):
        solution = item['answer']
        match = re.search(r'####\s*(.+?)$', solution, re.MULTILINE)
        final_answer = match.group(1).strip().replace(',', '') if match else ""
        
        problems.append(MathProblem(
            question=item['question'],
            answer=solution,
            final_answer=final_answer,
            idx=idx
        ))
    
    return problems


def extract_answer(model_output: str) -> str:
    """
    Extract numerical answer from model output.
    Improved to handle more formats.
    """
    output = model_output.strip()
    
    # Priority patterns (most specific first)
    patterns = [
        r'####\s*(-?[\d,]+\.?\d*)',           # GSM8K format
        r'[Ff]inal [Aa]nswer[:\s]+(-?[\d,]+\.?\d*)',
        r'[Aa]nswer[:\s]+\$?(-?[\d,]+\.?\d*)',
        r'[Tt]he answer is[:\s]+\$?(-?[\d,]+\.?\d*)',
        r'[Tt]otal[:\s]+\$?(-?[\d,]+\.?\d*)',
        r'=\s*\$?(-?[\d,]+\.?\d*)\s*$',
        r'\$(-?[\d,]+\.?\d*)\s*$',            # Dollar amount at end
    ]
    
    for pattern in patterns:
        match = re.search(pattern, output)
        if match:
            return match.group(1).replace(',', '')
    
    # Fallback: last number in output
    numbers = re.findall(r'-?[\d,]+\.?\d*', output)
    numbers = [n.replace(',', '') for n in numbers if n.replace(',', '').replace('.', '').replace('-', '').isdigit()]
    if numbers:
        return numbers[-1]
    
    return ""


def evaluate_answer(predicted: str, ground_truth: str, tolerance: float = 1e-6) -> bool:
    """
    Check if predicted answer matches ground truth.
    Handles both integer and float comparisons. Empty strings never match,
    and decimals are not truncated (e.g. 18.9 does not match 18).
    """
    def _clean(s: str) -> str:
        return s.strip().replace(',', '').replace('$', '').replace('%', '').rstrip('.')
    
    pred_clean = _clean(predicted)
    truth_clean = _clean(ground_truth)
    
    # A failed extraction (or an unparsed ground truth) is never correct
    if not pred_clean or not truth_clean:
        return False
    
    # Exact string match
    if pred_clean == truth_clean:
        return True
    
    # Numerical comparison with a tight relative tolerance
    # (GSM8K reference answers are exact numbers)
    try:
        pred_num = float(pred_clean)
        truth_num = float(truth_clean)
    except ValueError:
        return False
    
    return abs(pred_num - truth_num) <= tolerance * max(1.0, abs(truth_num))

print("✓ Data utilities defined")
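
The standalone sketch below shows the matching behaviors worth checking. It uses a hypothetical `answers_match`, not the notebook's own function. Commas and a trailing period are ignored, decimals are never truncated, and two empty strings do not count as a match.

```python
import re


def clean_answer(s: str) -> str:
    """Strip whitespace, thousands separators, '$', and a trailing period."""
    return s.strip().replace(',', '').replace('$', '').rstrip('.')


def answers_match(predicted: str, truth: str, rel_tol: float = 1e-6) -> bool:
    """Strict match: no integer truncation, and empty strings never match."""
    p, t = clean_answer(predicted), clean_answer(truth)
    if not p or not t:
        return False
    try:
        pv, tv = float(p), float(t)
    except ValueError:
        return p == t
    return abs(pv - tv) <= rel_tol * max(1.0, abs(tv))


# GSM8K reference solutions put the final answer after '####'
ref = "She has 3 + 4 = 7 apples.\n#### 1,200"
truth = re.search(r'####\s*(.+?)$', ref, re.MULTILINE).group(1)

print(answers_match("1200.", truth))  # True
print(answers_match("18.9", "18"))    # False
print(answers_match("", ""))          # False
```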

---
## 4. Load Model and Data

In [None]:
# =============================================================================
# LOAD MODEL
# =============================================================================

print(f"Loading model: {CONFIG['model_name']}")
print("This may take a few minutes...")

# Configure quantization
if CONFIG['quantization'] == '4bit':
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4"
    )
elif CONFIG['quantization'] == '8bit':
    quantization_config = BitsAndBytesConfig(load_in_8bit=True)
else:
    quantization_config = None

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(CONFIG['model_name'], token=HF_TOKEN)
tokenizer.pad_token = tokenizer.eos_token

# Load model
model = AutoModelForCausalLM.from_pretrained(
    CONFIG['model_name'],
    quantization_config=quantization_config,
    device_map="auto",
    token=HF_TOKEN,
    torch_dtype=torch.float16
)
model.eval()

print(f"\n✓ Model loaded successfully!")
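# With 4-bit bitsandbytes weights, packed tensors report about half their true
# element count, so this understates the real parameter count
print(f"  - Parameters (as stored): {sum(p.numel() for p in model.parameters()):,}")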
print(f"  - Layers: {model.config.num_hidden_layers}")
print(f"  - Hidden size: {model.config.hidden_size}")

In [None]:
# =============================================================================
# LOAD DATASET
# =============================================================================

print(f"Loading GSM8K dataset ({CONFIG['subset_size']} problems)...")

problems = load_gsm8k_problems(
    subset_size=CONFIG['subset_size'],
    seed=CONFIG['seed']
)

print(f"\n✓ Loaded {len(problems)} problems")
print(f"\nExample problem:")
print(f"  Question: {problems[0].question[:200]}...")
print(f"  Answer: {problems[0].final_answer}")

---
## 5. Run Experiment

In [None]:
# =============================================================================
# EXPERIMENT RUNNER
# =============================================================================

def run_single_problem(
    model,
    tokenizer,
    extractor: ActivationExtractor,
    problem: MathProblem,
    condition: str,
    prompt_template: str,
    max_new_tokens: int
) -> ExperimentResult:
    """
    Run model on a single problem and extract metrics.
    """
    # Format prompt
    prompt = prompt_template.format(question=problem.question)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    # Generate with activation capture
    activations = None
    with torch.no_grad():
        with extractor.capture():
            outputs = model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                temperature=CONFIG['temperature'] if CONFIG['temperature'] > 0 else None,
                do_sample=CONFIG['temperature'] > 0,
                pad_token_id=tokenizer.eos_token_id,
                return_dict_in_generate=True,
                output_scores=True
            )
        
        # Get activations (now with fix)
        try:
            activations = extractor.get_activations(aggregate="last")
        except Exception as e:
            # This should rarely happen now with the fix
            print(f"Warning: Failed to get activations: {e}")
            activations = None
    
    # Decode output
    generated_ids = outputs.sequences[0][inputs['input_ids'].shape[1]:]
    model_output = tokenizer.decode(generated_ids, skip_special_tokens=True)
    
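    # Compute entropy from the first generated token. `scores` are the logits
    # after generate()'s logits processors, not necessarily the raw logits.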
    entropy = 0.0
    if outputs.scores:
        first_logits = outputs.scores[0][0]
        entropy = compute_intention_entropy(first_logits, top_k=CONFIG['entropy_top_k'])
    
    # Evaluate answer (only for baseline and enhanced, not babble)
    if condition == 'babble':
        extracted_answer = ""
        is_correct = False  # Not applicable for babble
    else:
        extracted_answer = extract_answer(model_output)
        is_correct = evaluate_answer(extracted_answer, problem.final_answer)
    
    output_length = len(model_output.split())
    
    return ExperimentResult(
        problem_idx=problem.idx,
        condition=condition,
        question=problem.question,
        ground_truth=problem.final_answer,
        model_output=model_output,
        extracted_answer=extracted_answer,
        is_correct=is_correct,
        metrics=IntentionMetrics(entropy=entropy, dim_eff=0),
        activations=activations,
        output_length=output_length
    )

print("✓ Experiment runner defined")

In [None]:
# =============================================================================
# RUN MAIN EXPERIMENT
# =============================================================================

# Initialize extractor
extractor = ActivationExtractor(model, CONFIG['extraction_layers'])

# Storage for results - now includes 'babble' condition
conditions = ['baseline', 'enhanced', 'babble']
results = {c: [] for c in conditions}
all_activations = {c: [] for c in conditions}

print("="*60)
print("RUNNING EXPERIMENT 4.1: Intention Metrics vs. Accuracy")
print("="*60)
print("\nConditions:")
print("  - baseline: Direct answer (zero-shot)")
print("  - enhanced: Chain-of-thought reasoning")
print("  - babble: Length-matched negative control")

for condition in conditions:
    print(f"\n{'='*60}")
    print(f"CONDITION: {condition.upper()}")
    print(f"{'='*60}")
    
    prompt_template = PROMPTS[condition]
    
    # Set max tokens based on condition
    if condition == 'baseline':
        max_tokens = CONFIG['max_new_tokens_baseline']
    elif condition == 'enhanced':
        max_tokens = CONFIG['max_new_tokens_cot']
    else:  # babble
        max_tokens = CONFIG['max_new_tokens_babble']
    
    activation_success = 0
    
    for problem in tqdm(problems, desc=f"Running {condition}"):
        result = run_single_problem(
            model, tokenizer, extractor,
            problem, condition,
            prompt_template, max_tokens
        )
        results[condition].append(result)
        
        if result.activations is not None:
            all_activations[condition].append(result.activations)
            activation_success += 1
        
        # Clear CUDA cache periodically
        if problem.idx % 20 == 0:
            torch.cuda.empty_cache()
    
    # Print condition summary
    if condition != 'babble':
        accuracy = sum(r.is_correct for r in results[condition]) / len(results[condition])
        print(f"\n  Accuracy: {accuracy:.1%}")
    
    avg_length = np.mean([r.output_length for r in results[condition]])
    print(f"  Avg output length: {avg_length:.1f} words")
    print(f"  Activations captured: {activation_success}/{len(problems)} ({activation_success/len(problems)*100:.1f}%)")

print("\n" + "="*60)
print("EXPERIMENT COMPLETE")
print("="*60)
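
`all_activations[condition]` only receives entries for problems whose capture succeeded. Position `i` in it is therefore not guaranteed to correspond to `results[condition][i]`. The sketch below pairs features and labels from the same object instead. It uses a hypothetical `Row` stand-in for `ExperimentResult`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

import numpy as np


@dataclass
class Row:
    """Hypothetical stand-in for ExperimentResult (only the fields used here)."""
    is_correct: bool
    activations: Optional[np.ndarray]


def aligned_xy(rows: List[Row]) -> Tuple[np.ndarray, np.ndarray]:
    """Build probe inputs from rows that have activations, keeping X[i] and y[i] paired."""
    kept = [r for r in rows if r.activations is not None]
    if not kept:
        return np.empty((0, 0)), np.empty((0,), dtype=int)
    X = np.stack([r.activations.reshape(-1) for r in kept])
    y = np.array([r.is_correct for r in kept], dtype=int)
    return X, y


rows = [
    Row(False, None),             # capture failed for this problem
    Row(True, np.ones((2, 3))),
    Row(False, np.zeros((2, 3))),
]
X, y = aligned_xy(rows)
print(X.shape, y.tolist())        # (2, 6) [1, 0]

# Pairing by position in the filtered list instead picks the wrong labels:
naive_y = [rows[i].is_correct for i in range(len(X))]
print(naive_y)                    # [False, True]
```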

In [None]:
# =============================================================================
# COMPUTE DIM_EFF FOR ALL CONDITIONS
# =============================================================================

print("\nComputing effective dimensionality...")

dim_eff_results = {}

for condition in conditions:
    if all_activations[condition]:
        # Stack all activations: (n_examples, n_layers, hidden_dim)
        stacked = np.stack(all_activations[condition], axis=0)
        n_examples, n_layers, hidden_dim = stacked.shape
        
        # Global dim_eff across all examples
        flattened = stacked.reshape(n_examples * n_layers, hidden_dim)
        global_dim_eff, explained_var = compute_effective_dimensionality(
            flattened,
            variance_threshold=CONFIG['variance_threshold']
        )
        
        # Per-example dim_eff: PCA over this example's n_layers layer vectors
        # (after centering, at most n_layers - 1 components are non-zero).
        # Read activations from each result so values stay aligned with results.
        per_example_dims = []
        for result in results[condition]:
            if result.activations is not None:
                dim_eff, _ = compute_effective_dimensionality(
                    result.activations,
                    variance_threshold=CONFIG['variance_threshold']
                )
                result.metrics.dim_eff = dim_eff
                per_example_dims.append(dim_eff)
        
        dim_eff_results[condition] = {
            'global': global_dim_eff,
            'per_example': per_example_dims,
            'mean': np.mean(per_example_dims),
            'std': np.std(per_example_dims)
        }
        
        print(f"  {condition}: Global dim_eff = {global_dim_eff}, Mean per-example = {np.mean(per_example_dims):.1f} ± {np.std(per_example_dims):.1f}")
    else:
        print(f"  {condition}: No activations captured")
        dim_eff_results[condition] = None
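
Per-example `dim_eff` is computed from just `n_layers` = 5 vectors. PCA centers the data, so 5 points span at most 4 directions. The per-example value is therefore confined to 1 to 4, whatever the hidden size. A quick NumPy check on random vectors (hidden size 4096 is assumed here, matching Mistral-7B):

```python
import numpy as np


def explained_variance_ratios(x):
    """PCA explained-variance ratios via SVD of the centered data."""
    xc = x - x.mean(axis=0)
    s = np.linalg.svd(xc, compute_uv=False)
    return s**2 / np.sum(s**2)


rng = np.random.default_rng(0)
one_example = rng.standard_normal((5, 4096))   # 5 layer vectors, hidden size 4096
ratios = explained_variance_ratios(one_example)
print(np.round(ratios, 3))                     # the 5th ratio is ~0
print(int(np.sum(ratios > 1e-10)))             # 4
```

The global and layer-wise analyses pool many examples, so they are not subject to this ceiling.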

In [None]:
# =============================================================================
# COMPUTE DIM_EFF BY LAYER (for layer-wise analysis)
# =============================================================================

print("\nComputing layer-wise dimensionality...")

layer_dim_eff = {c: [] for c in conditions}

for condition in conditions:
    if all_activations[condition]:
        stacked = np.stack(all_activations[condition], axis=0)  # (n_examples, n_layers, hidden_dim)
        n_examples, n_layers, hidden_dim = stacked.shape
        
        for layer_idx in range(n_layers):
            layer_acts = stacked[:, layer_idx, :]  # (n_examples, hidden_dim)
            dim_eff, _ = compute_effective_dimensionality(layer_acts, CONFIG['variance_threshold'])
            layer_dim_eff[condition].append(dim_eff)
        
        print(f"  {condition}: {layer_dim_eff[condition]}")

In [None]:
# =============================================================================
# TRAIN RECOVERABILITY PROBES
# =============================================================================

print("\nTraining linear probes for recoverability...")

probe_results = {}

for condition in ['baseline', 'enhanced']:  # Not for babble (no correct answers)
    # Keep only results whose activations were captured, so X and y stay paired
    kept = [r for r in results[condition] if r.activations is not None]
    if len(kept) >= 10:
        # Prepare data: flatten layers into one feature vector per example
        X = np.stack([r.activations for r in kept], axis=0).reshape(len(kept), -1)
        y = np.array([r.is_correct for r in kept])
        
        # Train probe
        mean_acc, std_acc = train_recoverability_probe(X, y, cv_folds=5)
        
        # Accuracy of always predicting the majority class
        majority_baseline = max(y.mean(), 1 - y.mean())
        
        probe_results[condition] = {
            'mean': mean_acc,
            'std': std_acc,
            'verbalized_accuracy': y.mean(),
            'majority_baseline': majority_baseline
        }
        
        print(f"  {condition}:")
        print(f"    Probe accuracy (Recov): {mean_acc:.3f} ± {std_acc:.3f}")
        print(f"    Verbalized accuracy:    {y.mean():.3f}")
        print(f"    Majority baseline:      {majority_baseline:.3f}")
        print(f"    Recoverability gap:     {mean_acc - y.mean():+.3f}")
        print(f"    Gap vs. majority:       {mean_acc - majority_baseline:+.3f}")
    else:
        print(f"  {condition}: Insufficient data for probe training")
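
A probe that predicts correctness should be judged against the majority-class rate $\max(p, 1-p)$, not against the verbalized accuracy $p$ itself. With 20% accuracy, a probe that always says "incorrect" already scores 80%. The sketch below demonstrates this on synthetic, signal-free data using scikit-learn's `DummyClassifier`:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
y = np.array([0] * 80 + [1] * 20)         # 20% "correct"
X = rng.standard_normal((len(y), 16))      # features with no signal at all

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
majority = DummyClassifier(strategy="most_frequent")
majority_acc = cross_val_score(majority, X, y, cv=cv).mean()

print(f"verbalized accuracy:     {y.mean():.2f}")     # 0.20
print(f"majority-class accuracy: {majority_acc:.2f}")  # 0.80
```

Another option is to pass `scoring="balanced_accuracy"` to `cross_val_score`, which neutralizes class imbalance in the probe score itself.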

---
## 6. Sanity Checks

In [None]:
# =============================================================================
# SANITY CHECK: INSPECT BASELINE EXAMPLES
# =============================================================================

print("="*60)
print("SANITY CHECK: BASELINE EXAMPLES")
print("="*60)

baseline_results = results['baseline']
correct_examples = [r for r in baseline_results if r.is_correct][:2]
incorrect_examples = [r for r in baseline_results if not r.is_correct][:5]
sample_examples = correct_examples + incorrect_examples

for i, r in enumerate(sample_examples):
    status = "✓ CORRECT" if r.is_correct else "✗ WRONG"
    print(f"\n--- Example {i+1} ({status}) ---")
    print(f"Question: {r.question[:120]}...")
    print(f"Ground truth: {r.ground_truth}")
    print(f"Model output: {r.model_output[:150]}...")
    print(f"Extracted answer: '{r.extracted_answer}'")

In [None]:
# =============================================================================
# SANITY CHECK: INSPECT ENHANCED (CoT) EXAMPLES
# =============================================================================

print("="*60)
print("SANITY CHECK: ENHANCED (CoT) EXAMPLES")
print("="*60)

enhanced_results = results['enhanced']
correct_cot = [r for r in enhanced_results if r.is_correct][:2]
incorrect_cot = [r for r in enhanced_results if not r.is_correct][:2]
sample_cot = correct_cot + incorrect_cot

for i, r in enumerate(sample_cot):
    status = "✓ CORRECT" if r.is_correct else "✗ WRONG"
    print(f"\n--- Example {i+1} ({status}) ---")
    print(f"Question: {r.question[:120]}...")
    print(f"Ground truth: {r.ground_truth}")
    print(f"Model output: {r.model_output[:400]}...")
    print(f"Extracted answer: '{r.extracted_answer}'")

In [None]:
# =============================================================================
# SANITY CHECK: INSPECT BABBLE (CONTROL) EXAMPLES
# =============================================================================

print("="*60)
print("SANITY CHECK: BABBLE (CONTROL) EXAMPLES")
print("="*60)

babble_results = results['babble'][:3]

for i, r in enumerate(babble_results):
    print(f"\n--- Example {i+1} ---")
    print(f"Question: {r.question[:120]}...")
    print(f"Model output: {r.model_output[:400]}...")
    print(f"Output length: {r.output_length} words")

In [None]:
# =============================================================================
# DIAGNOSTIC: ACTIVATION CAPTURE SUCCESS
# =============================================================================

print("="*60)
print("ACTIVATION CAPTURE DIAGNOSTIC")
print("="*60)

for condition in conditions:
    n_captured = len(all_activations[condition])
    n_total = len(results[condition])
    pct = (n_captured / n_total * 100) if n_total > 0 else 0
    
    print(f"\n{condition.upper()}:")
    print(f"  Activations captured: {n_captured}/{n_total} ({pct:.1f}%)")
    
    if n_captured > 0:
        sample = all_activations[condition][0]
        print(f"  Sample shape: {sample.shape}")
        print(f"  Sample stats: mean={sample.mean():.4f}, std={sample.std():.4f}")

---
## 7. Analyze Results

In [None]:
# =============================================================================
# COMPILE RESULTS
# =============================================================================

# Extract metrics into arrays
metrics_data = {}

for condition in conditions:
    metrics_data[condition] = {
        'entropy': [r.metrics.entropy for r in results[condition]],
        'dim_eff': [r.metrics.dim_eff for r in results[condition]],
        'correct': [r.is_correct for r in results[condition]],
        'output_length': [r.output_length for r in results[condition]]
    }

# Print summary statistics
print("\n" + "="*60)
print("SUMMARY STATISTICS")
print("="*60)

for condition in conditions:
    print(f"\n{condition.upper()}:")
    data = metrics_data[condition]
    if condition != 'babble':
        print(f"  Accuracy:       {np.mean(data['correct']):.1%}")
    print(f"  Entropy:        {np.mean(data['entropy']):.2f} ± {np.std(data['entropy']):.2f}")
    print(f"  Dim_eff:        {np.mean(data['dim_eff']):.1f} ± {np.std(data['dim_eff']):.1f}")
    print(f"  Output length:  {np.mean(data['output_length']):.1f} ± {np.std(data['output_length']):.1f} words")
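
The babble condition only controls for length if its realized lengths resemble CoT lengths. Sharing `max_new_tokens` does not guarantee that, and the babble prompt asks for about 100 words. The sketch below compares the two word-count distributions with a Mann-Whitney U test, using a hypothetical `length_match_report` helper; this is one reasonable choice of two-sample test. In the notebook, the inputs would be `metrics_data['enhanced']['output_length']` and `metrics_data['babble']['output_length']`.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def length_match_report(cot_lengths, babble_lengths):
    """Compare two word-count distributions (hypothetical helper)."""
    stat, p = mannwhitneyu(cot_lengths, babble_lengths, alternative="two-sided")
    return {
        "cot_median": float(np.median(cot_lengths)),
        "babble_median": float(np.median(babble_lengths)),
        "U": float(stat),
        "p_value": float(p),
    }


# Toy data: CoT outputs much longer than babble outputs
report = length_match_report(list(range(100, 110)), list(range(10, 20)))
print(report["cot_median"], report["babble_median"])  # 104.5 14.5
print(report["p_value"] < 0.01)                        # True -> lengths are not matched
```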

In [None]:
# =============================================================================
# CORRELATION ANALYSIS
# =============================================================================

print("\n" + "="*60)
print("CORRELATION ANALYSIS")
print("="*60)

correlation_results = {}

for condition in ['baseline', 'enhanced']:  # Not for babble
    print(f"\n{condition.upper()}:")
    data = metrics_data[condition]
    correct = np.array(data['correct']).astype(int)
    
    # Entropy vs Correctness
    try:
        r_entropy, p_entropy = stats.pointbiserialr(correct, data['entropy'])
        print(f"  Entropy vs Correct:     r={r_entropy:.3f}, p={p_entropy:.4f}")
    except Exception as e:
        r_entropy, p_entropy = np.nan, np.nan
        print(f"  Entropy vs Correct:     Could not compute ({e})")

    # Dim_eff vs Correctness
    # Apply the same boolean mask to both arrays so each dim_eff value stays
    # paired with its own problem (slicing correct[:n] would misalign them).
    try:
        dim_eff_arr = np.array(data['dim_eff'], dtype=float)
        valid = dim_eff_arr > 0
        if valid.sum() > 10:
            r_dim, p_dim = stats.pointbiserialr(correct[valid], dim_eff_arr[valid])
            print(f"  Dim_eff vs Correct:     r={r_dim:.3f}, p={p_dim:.4f}")
        else:
            r_dim, p_dim = np.nan, np.nan
            print(f"  Dim_eff vs Correct:     Insufficient data")
    except Exception as e:
        r_dim, p_dim = np.nan, np.nan
        print(f"  Dim_eff vs Correct:     Could not compute ({e})")

    # Output length vs Correctness
    try:
        r_len, p_len = stats.pointbiserialr(correct, data['output_length'])
        print(f"  Output len vs Correct:  r={r_len:.3f}, p={p_len:.4f}")
    except Exception as e:
        r_len, p_len = np.nan, np.nan
        print(f"  Output len vs Correct:  Could not compute ({e})")
    
    correlation_results[condition] = {
        'entropy_correct': (r_entropy, p_entropy),
        'dim_correct': (r_dim, p_dim),
        'length_correct': (r_len, p_len)
    }
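
Point-biserial correlation is Pearson's r computed with one variable coded 0/1, so `stats.pointbiserialr(correct, x)` and `stats.pearsonr(correct, x)` give the same r. When some items have to be excluded (here, `dim_eff <= 0`), the exclusion must use one boolean mask applied to both arrays. The hypothetical helper below sketches that pattern. The `> 10` minimum mirrors the threshold in the cell above, and the function also returns NaN when only one correctness class survives the filter.

```python
import numpy as np
from scipy import stats

def masked_pointbiserial(correct, values, min_n=10):
    """Point-biserial r between binary `correct` and `values`,
    keeping only items whose metric is finite and > 0.

    The mask is applied to BOTH arrays so each metric value stays
    paired with the correctness flag of the same problem.
    """
    correct = np.asarray(correct, dtype=int)
    values = np.asarray(values, dtype=float)
    valid = np.isfinite(values) & (values > 0)
    # Need more than min_n items and both classes present
    if valid.sum() <= min_n or np.unique(correct[valid]).size < 2:
        return np.nan, np.nan
    r, p = stats.pointbiserialr(correct[valid], values[valid])
    return r, p
```

Inside the loop above, this could replace the dim_eff block as `r_dim, p_dim = masked_pointbiserial(correct, data['dim_eff'])`.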

In [None]:
# =============================================================================
# KEY FINDINGS
# =============================================================================

print("\n" + "="*60)
print("KEY FINDINGS")
print("="*60)

# 1. Accuracy comparison
base_acc = np.mean(metrics_data['baseline']['correct'])
enh_acc = np.mean(metrics_data['enhanced']['correct'])
print(f"\n1. ACCURACY:")
print(f"   Baseline: {base_acc:.1%}")
print(f"   Enhanced (CoT): {enh_acc:.1%}")
print(f"   Improvement: {(enh_acc - base_acc)*100:+.1f} percentage points")

# 2. Entropy comparison
base_ent = np.mean(metrics_data['baseline']['entropy'])
enh_ent = np.mean(metrics_data['enhanced']['entropy'])
babble_ent = np.mean(metrics_data['babble']['entropy'])
print(f"\n2. INTENTION ENTROPY (lower = more decided):")
print(f"   Baseline: {base_ent:.2f}")
print(f"   Enhanced (CoT): {enh_ent:.2f}")
print(f"   Babble (control): {babble_ent:.2f}")
if base_ent > 0:
    ent_change = (enh_ent - base_ent) / base_ent * 100
    print(f"   → CoT vs baseline entropy change: {ent_change:+.1f}% (negative = reduction)")

# 3. Dim_eff comparison (LENGTH BIAS TEST)
base_dim = np.mean([d for d in metrics_data['baseline']['dim_eff'] if d > 0] or [0])
enh_dim = np.mean([d for d in metrics_data['enhanced']['dim_eff'] if d > 0] or [0])
babble_dim = np.mean([d for d in metrics_data['babble']['dim_eff'] if d > 0] or [0])
print(f"\n3. EFFECTIVE DIMENSIONALITY (higher = richer intention):")
print(f"   Baseline: {base_dim:.1f}")
print(f"   Enhanced (CoT): {enh_dim:.1f}")
print(f"   Babble (control): {babble_dim:.1f}")
if enh_dim > 0 and babble_dim > 0:
    # Descriptive comparison of means only; this is not a significance test.
    if enh_dim > babble_dim:
        print(f"   → dim_eff(CoT) > dim_eff(Babble): consistent with no pure length effect ✓")
    else:
        print(f"   → dim_eff(CoT) ≤ dim_eff(Babble): possible length bias ⚠️")

# 4. Output length comparison
base_len = np.mean(metrics_data['baseline']['output_length'])
enh_len = np.mean(metrics_data['enhanced']['output_length'])
babble_len = np.mean(metrics_data['babble']['output_length'])
print(f"\n4. OUTPUT LENGTH (words):")
print(f"   Baseline: {base_len:.1f}")
print(f"   Enhanced (CoT): {enh_len:.1f}")
print(f"   Babble (control): {babble_len:.1f}")

# 5. Recoverability
print(f"\n5. RECOVERABILITY (probe accuracy vs verbalized):")
for condition in ['baseline', 'enhanced']:
    if condition in probe_results:
        probe_acc = probe_results[condition]['mean']
        verb_acc = probe_results[condition]['verbalized_accuracy']
        gap = probe_acc - verb_acc
        print(f"   {condition}: Probe={probe_acc:.3f}, Verbalized={verb_acc:.3f}, Gap={gap:+.3f}")
    else:
        print(f"   {condition}: Not computed")
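
The dim_eff check above compares two means with no measure of uncertainty, so sampling noise alone could produce a small gap in either direction. A permutation test is one assumption-light way to ask whether the CoT vs Babble gap is larger than what random relabeling produces. This is a sketch. `permutation_test_mean_diff` is a hypothetical helper, and it treats the two conditions as independent samples. If both conditions are scored on the same problems and the `d > 0` filter keeps the pairs intact, a paired test such as `scipy.stats.wilcoxon` on the per-problem differences would make use of that pairing.

```python
import numpy as np

def permutation_test_mean_diff(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for mean(a) - mean(b).

    Returns (observed_diff, p_value). Assumes a and b are independent
    samples, e.g. per-problem dim_eff under CoT vs Babble.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        diff = perm[:a.size].mean() - perm[a.size:].mean()
        # Small tolerance so float rounding does not drop ties
        if np.abs(diff) >= np.abs(observed) - 1e-12:
            count += 1
    # The +1 correction keeps the p-value strictly positive
    return observed, (count + 1) / (n_perm + 1)
```

For this notebook, the inputs would be the filtered lists, e.g. `[d for d in metrics_data['enhanced']['dim_eff'] if d > 0]` and the same for `'babble'`.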

---
## 8. Visualizations

In [None]:
# =============================================================================
# VISUALIZATION SETUP
# =============================================================================

plt.rcParams.update({
    'font.family': 'serif',
    'font.size': 10,
    'axes.labelsize': 11,
    'axes.titlesize': 12,
    'xtick.labelsize': 9,
    'ytick.labelsize': 9,
    'legend.fontsize': 9,
    'figure.titlesize': 13,
})

sns.set_style("whitegrid")

COLORS = {
    'baseline': '#1f77b4',
    'enhanced': '#ff7f0e',
    'babble': '#9467bd',
    'correct': '#2ca02c',
    'incorrect': '#d62728',
}

os.makedirs(CONFIG['output_dir'], exist_ok=True)
print("✓ Visualization setup complete")

In [None]:
# =============================================================================
# FIGURE 1: MAIN COMPARISON (Accuracy, Entropy, Dim_eff)
# =============================================================================

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# (A) Accuracy comparison
ax = axes[0]
conds = ['Baseline', 'CoT']
accs = [base_acc, enh_acc]
colors = [COLORS['baseline'], COLORS['enhanced']]
bars = ax.bar(conds, accs, color=colors, alpha=0.8)
ax.set_ylabel('Accuracy')
ax.set_title('(A) Task Accuracy')
ax.set_ylim(0, 1)
for bar, acc in zip(bars, accs):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,
           f'{acc:.1%}', ha='center', fontsize=10)

# (B) Entropy comparison (all 3 conditions)
ax = axes[1]
conds = ['Baseline', 'CoT', 'Babble']
ents = [base_ent, enh_ent, babble_ent]
colors = [COLORS['baseline'], COLORS['enhanced'], COLORS['babble']]
bars = ax.bar(conds, ents, color=colors, alpha=0.8)
ax.set_ylabel('Intention Entropy (bits)')
ax.set_title('(B) Intention Entropy $H_{int}(I)$')
for bar, ent in zip(bars, ents):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,
           f'{ent:.2f}', ha='center', fontsize=10)

# (C) Dim_eff comparison (all 3 conditions)
ax = axes[2]
dims = [base_dim, enh_dim, babble_dim]
bars = ax.bar(conds, dims, color=colors, alpha=0.8)
ax.set_ylabel('Effective Dimensionality')
ax.set_title('(C) Intention Richness $dim_{eff}(I)$')
for bar, dim in zip(bars, dims):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5,
           f'{dim:.1f}', ha='center', fontsize=10)

plt.tight_layout()
if CONFIG['save_figures']:
    plt.savefig(f"{CONFIG['output_dir']}/fig1_main_comparison.pdf", bbox_inches='tight', dpi=300)
    plt.savefig(f"{CONFIG['output_dir']}/fig1_main_comparison.png", bbox_inches='tight', dpi=300)
plt.show()
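
Panel (B) labels $H_{int}(I)$ in bits. That is correct only if the entropy was computed with base-2 logarithms, and the metric code is not shown in this section. Many routines, for example `torch.distributions.Categorical(...).entropy()`, return nats, and dividing by $\ln 2$ converts them to bits. A minimal NumPy sketch follows. It assumes $H_{int}$ is the Shannon entropy $-\sum_v p_v \log_2 p_v$ of the softmax over next-token logits, and `entropy_bits_from_logits` is a hypothetical name.

```python
import numpy as np

def entropy_bits_from_logits(logits):
    """Shannon entropy (in bits) of softmax(logits) along the last axis."""
    z = np.asarray(logits, dtype=np.float64)
    z = z - z.max(axis=-1, keepdims=True)                       # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))   # natural-log softmax
    p = np.exp(log_p)
    h_nats = -(p * log_p).sum(axis=-1)
    return h_nats / np.log(2)                                   # nats -> bits
```

A uniform distribution over 8 tokens gives exactly 3 bits. A sharply peaked one gives a value close to 0.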

In [None]:
# =============================================================================
# FIGURE 2: LAYER-WISE DIM_EFF (Gemini's suggestion)
# =============================================================================

if any(layer_dim_eff[c] for c in conditions):
    fig, ax = plt.subplots(figsize=(10, 5))
    
    x = np.arange(len(CONFIG['extraction_layers']))
    
    for condition in conditions:
        if layer_dim_eff[condition]:
            ax.plot(x, layer_dim_eff[condition], 'o-', 
                   color=COLORS[condition], 
                   label=condition.title(),
                   linewidth=2, markersize=8)
    
    ax.set_xticks(x)
    ax.set_xticklabels([f"L{l}" for l in CONFIG['extraction_layers']])
    ax.set_xlabel('Layer')
    ax.set_ylabel('Effective Dimensionality ($dim_{eff}$)')
    ax.set_title('Intention Richness Across Layers')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    if CONFIG['save_figures']:
        plt.savefig(f"{CONFIG['output_dir']}/fig2_layer_dim_eff.pdf", bbox_inches='tight', dpi=300)
        plt.savefig(f"{CONFIG['output_dir']}/fig2_layer_dim_eff.png", bbox_inches='tight', dpi=300)
    plt.show()
else:
    print("⚠️ No layer-wise dim_eff data available for plotting")
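
The layer-wise plot depends on how $dim_{eff}$ is defined, and this section only calls it "PCA-based". Two common PCA-based definitions are sketched below as assumptions, not as the notebook's implementation:

- the number of components needed to reach a variance threshold (90% here is an arbitrary choice);
- the participation ratio $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$ over the PCA eigenvalues $\lambda_i$.

Both are bounded by the rank of the centered activation matrix, which is at most $\min(n - 1, d)$ for $n$ rows and $d$ hidden units. If the rows are token positions from a single generation, longer outputs raise that ceiling. That is one mechanism by which length could inflate $dim_{eff}$.

```python
import numpy as np

def dim_eff_variance_threshold(acts, threshold=0.90):
    """Number of principal components needed to explain `threshold` of the variance.

    acts: array of shape (n_samples, hidden_dim), e.g. one layer's activations.
    """
    X = np.asarray(acts, dtype=np.float64)
    X = X - X.mean(axis=0, keepdims=True)          # center before PCA
    var = np.linalg.svd(X, compute_uv=False) ** 2  # PCA eigenvalues (up to a constant)
    if var.sum() == 0:
        return 0
    cum = np.cumsum(var) / var.sum()
    # First index where cumulative variance reaches the threshold, as a count
    return min(int(np.searchsorted(cum, threshold)) + 1, var.size)

def participation_ratio(acts):
    """Continuous alternative: (sum λ)^2 / sum λ^2 over PCA eigenvalues λ."""
    X = np.asarray(acts, dtype=np.float64)
    X = X - X.mean(axis=0, keepdims=True)
    lam = np.linalg.svd(X, compute_uv=False) ** 2
    return float(lam.sum() ** 2 / (lam ** 2).sum()) if lam.sum() > 0 else 0.0
```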

In [None]:
# =============================================================================
# FIGURE 3: ENTROPY DISTRIBUTION BY CORRECTNESS
# =============================================================================

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

for idx, condition in enumerate(['baseline', 'enhanced']):
    ax = axes[idx]
    
    entropy_vals = metrics_data[condition]['entropy']
    correct_vals = metrics_data[condition]['correct']
    
    entropy_correct = [e for e, c in zip(entropy_vals, correct_vals) if c]
    entropy_incorrect = [e for e, c in zip(entropy_vals, correct_vals) if not c]
    
    data = [entropy_incorrect, entropy_correct]
    labels = ['Incorrect', 'Correct']
    colors_box = [COLORS['incorrect'], COLORS['correct']]
    
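    # Set tick labels separately: boxplot's `labels` kwarg was renamed
    # `tick_labels` in Matplotlib 3.9, so this form works on old and new versions.
    bp = ax.boxplot(data, patch_artist=True)
    ax.set_xticks([1, 2])
    ax.set_xticklabels(labels)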
    for patch, color in zip(bp['boxes'], colors_box):
        patch.set_facecolor(color)
        patch.set_alpha(0.7)
    
    ax.set_ylabel('Intention Entropy (bits)')
    ax.set_title(f'{condition.title()}')
    
    # Add correlation info
    if condition in correlation_results:
        r, p = correlation_results[condition]['entropy_correct']
        if not np.isnan(r):
            ax.text(0.95, 0.95, f'r = {r:.3f}\np = {p:.3f}',
                   transform=ax.transAxes, ha='right', va='top',
                   fontsize=9, bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))

plt.suptitle('Entropy Distribution by Answer Correctness', fontsize=12)
plt.tight_layout()
if CONFIG['save_figures']:
    plt.savefig(f"{CONFIG['output_dir']}/fig3_entropy_correctness.pdf", bbox_inches='tight', dpi=300)
    plt.savefig(f"{CONFIG['output_dir']}/fig3_entropy_correctness.png", bbox_inches='tight', dpi=300)
plt.show()
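
Point-biserial r measures a linear association. AUROC is a threshold-free complement: it is the probability that a randomly chosen correct answer has lower entropy than a randomly chosen incorrect one, with ties counted as one half. The sketch below uses scikit-learn's `roc_auc_score`, and `entropy_auroc` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def entropy_auroc(entropy, correct):
    """AUROC for predicting correctness from intention entropy.

    Uses -entropy as the score, so AUROC > 0.5 means lower entropy
    tends to go with correct answers. Returns NaN if only one class is present.
    """
    y = np.asarray(correct, dtype=int)
    scores = -np.asarray(entropy, dtype=float)
    if np.unique(y).size < 2:
        return float("nan")
    return float(roc_auc_score(y, scores))
```

For example, `entropy_auroc(metrics_data['enhanced']['entropy'], metrics_data['enhanced']['correct'])`.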

In [None]:
# =============================================================================
# FIGURE 4: LENGTH BIAS CONTROL (Critical for reviewers)
# =============================================================================

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# (A) Output length comparison
ax = axes[0]
conds = ['Baseline', 'CoT', 'Babble']
lengths = [base_len, enh_len, babble_len]
colors = [COLORS['baseline'], COLORS['enhanced'], COLORS['babble']]
bars = ax.bar(conds, lengths, color=colors, alpha=0.8)
ax.set_ylabel('Output Length (words)')
ax.set_title('(A) Output Length by Condition')
for bar, length in zip(bars, lengths):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 2,
           f'{length:.0f}', ha='center', fontsize=10)

# (B) Dim_eff vs Length scatter
ax = axes[1]
for condition in conditions:
    dims = [d for d in metrics_data[condition]['dim_eff'] if d > 0]
    lens = [metrics_data[condition]['output_length'][i] 
           for i, d in enumerate(metrics_data[condition]['dim_eff']) if d > 0]
    if dims and lens:
        ax.scatter(lens, dims, alpha=0.5, color=COLORS[condition], 
                  label=condition.title(), s=30)

ax.set_xlabel('Output Length (words)')
ax.set_ylabel('Effective Dimensionality')
ax.set_title('(B) Dim_eff vs Output Length')
ax.legend()

plt.tight_layout()
if CONFIG['save_figures']:
    plt.savefig(f"{CONFIG['output_dir']}/fig4_length_bias_control.pdf", bbox_inches='tight', dpi=300)
    plt.savefig(f"{CONFIG['output_dir']}/fig4_length_bias_control.png", bbox_inches='tight', dpi=300)
plt.show()
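
Panel (B) shows the raw relationship between output length and dim_eff. Comparing condition means, however, does not adjust for any leftover length difference between CoT and Babble. One way to adjust is to regress dim_eff on output length plus a CoT indicator and read off the indicator's coefficient. That coefficient is the CoT-minus-Babble difference at equal length under a linear model. The sketch below gives a point estimate only (no standard errors) and assumes a linear length effect. `cot_effect_controlling_length` is a hypothetical name.

```python
import numpy as np

def cot_effect_controlling_length(dim_cot, len_cot, dim_babble, len_babble):
    """Fit dim_eff ~ b0 + b1 * length + b2 * is_cot by ordinary least squares.

    Returns b2: the CoT-vs-Babble difference in dim_eff at equal output length.
    """
    y = np.concatenate([dim_cot, dim_babble]).astype(float)
    length = np.concatenate([len_cot, len_babble]).astype(float)
    is_cot = np.concatenate([np.ones(len(dim_cot)), np.zeros(len(dim_babble))])
    X = np.column_stack([np.ones_like(y), length, is_cot])  # intercept, length, indicator
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[2])
```

With the notebook's data, apply the `d > 0` filter to dim_eff and output_length together, as in panel (B), before calling it.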

In [None]:
# =============================================================================
# FIGURE 5: RECOVERABILITY GAP
# =============================================================================

if probe_results:
    fig, ax = plt.subplots(figsize=(8, 5))
    
    x = np.arange(len(probe_results))
    width = 0.35
    
    conditions_with_probes = list(probe_results.keys())
    probe_accs = [probe_results[c]['mean'] for c in conditions_with_probes]
    probe_stds = [probe_results[c]['std'] for c in conditions_with_probes]
    verb_accs = [probe_results[c]['verbalized_accuracy'] for c in conditions_with_probes]
    
    bars1 = ax.bar(x - width/2, probe_accs, width, label='Probe on I (Recoverability)',
                   color=COLORS['enhanced'], alpha=0.8, yerr=probe_stds, capsize=5)
    bars2 = ax.bar(x + width/2, verb_accs, width, label='Verbalized Output',
                   color=COLORS['baseline'], alpha=0.8)
    
    ax.set_ylabel('Accuracy')
    ax.set_xticks(x)
    ax.set_xticklabels([c.title() for c in conditions_with_probes])
    ax.set_ylim(0, 1)
    ax.legend(loc='upper left')
    ax.set_title('Information Recovery: Probe vs. Verbalized Output')
    
    # Add gap annotations
    for i, (p, v) in enumerate(zip(probe_accs, verb_accs)):
        gap = p - v
        mid_y = (p + v) / 2
        ax.annotate(f'Gap: {gap:+.1%}', xy=(i, mid_y), fontsize=9,
                   ha='center', va='center',
                   bbox=dict(boxstyle='round', facecolor='yellow', alpha=0.5))
    
    plt.tight_layout()
    if CONFIG['save_figures']:
        plt.savefig(f"{CONFIG['output_dir']}/fig5_recoverability_gap.pdf", bbox_inches='tight', dpi=300)
        plt.savefig(f"{CONFIG['output_dir']}/fig5_recoverability_gap.png", bbox_inches='tight', dpi=300)
    plt.show()
else:
    print("⚠️ Probe results not available for plotting")
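
According to the metric definitions at the top of the notebook, the Recov probe predicts correctness from activations. Its accuracy therefore has a floor set by class imbalance: a classifier that always predicts the majority label scores $\max(a, 1 - a)$, where $a$ is the fraction of correct answers. Reporting that floor next to the probe makes the gap easier to interpret. Below is a minimal sketch with scikit-learn. It assumes a logistic-regression probe and 5-fold stratified cross-validation, since the notebook's own probe settings are not shown in this section, and `probe_vs_majority` is a hypothetical helper.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

def probe_vs_majority(X, y, n_splits=5, seed=0):
    """Cross-validated accuracy of a linear probe and of a majority-class baseline."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    probe = LogisticRegression(max_iter=1000)
    dummy = DummyClassifier(strategy="most_frequent")  # always predicts the majority label
    probe_acc = cross_val_score(probe, X, y, cv=cv, scoring="accuracy").mean()
    dummy_acc = cross_val_score(dummy, X, y, cv=cv, scoring="accuracy").mean()
    return float(probe_acc), float(dummy_acc)
```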

---
## 9. Export Results

In [None]:
# =============================================================================
# EXPORT RESULTS TABLE
# =============================================================================

import pandas as pd

# Build summary table
rows = []
rows.append(['Accuracy', f"{base_acc:.1%}", f"{enh_acc:.1%}", "N/A"])
rows.append(['Entropy (mean±std)', 
             f"{base_ent:.2f}±{np.std(metrics_data['baseline']['entropy']):.2f}",
             f"{enh_ent:.2f}±{np.std(metrics_data['enhanced']['entropy']):.2f}",
             f"{babble_ent:.2f}±{np.std(metrics_data['babble']['entropy']):.2f}"])
rows.append(['Dim_eff (mean±std)',
             f"{base_dim:.1f}±{np.std([d for d in metrics_data['baseline']['dim_eff'] if d > 0] or [0]):.1f}",
             f"{enh_dim:.1f}±{np.std([d for d in metrics_data['enhanced']['dim_eff'] if d > 0] or [0]):.1f}",
             f"{babble_dim:.1f}±{np.std([d for d in metrics_data['babble']['dim_eff'] if d > 0] or [0]):.1f}"])
rows.append(['Output Length',
             f"{base_len:.1f}",
             f"{enh_len:.1f}",
             f"{babble_len:.1f}"])

if 'baseline' in probe_results:
    rows.append(['Probe Accuracy',
                 f"{probe_results['baseline']['mean']:.3f}±{probe_results['baseline']['std']:.3f}",
                 f"{probe_results['enhanced']['mean']:.3f}±{probe_results['enhanced']['std']:.3f}" if 'enhanced' in probe_results else "N/A",
                 "N/A"])

summary_table = pd.DataFrame(rows, columns=['Metric', 'Baseline', 'CoT', 'Babble'])

print("\n" + "="*60)
print("RESULTS TABLE (for paper)")
print("="*60)
print(summary_table.to_string(index=False))

# Save to CSV
summary_table.to_csv(f"{CONFIG['output_dir']}/results_summary.csv", index=False)
print(f"\n✓ Results saved to {CONFIG['output_dir']}/results_summary.csv")

In [None]:
# =============================================================================
# EXPORT LATEX TABLE
# =============================================================================

latex_lines = [
    r"\begin{table}[h]",
    r"\centering",
    r"\caption{Experiment 4.1 Results: Intention Metrics vs. Reasoning Accuracy}",
    r"\label{tab:exp41_results}",
    r"\begin{tabular}{lccc}",
    r"\hline",
    r"\textbf{Metric} & \textbf{Baseline} & \textbf{CoT} & \textbf{Babble} \\",
    r"\hline",
    f"Accuracy & {base_acc*100:.1f}\\% & {enh_acc*100:.1f}\\% & N/A \\\\",  # escape %: it starts a LaTeX comment
    f"$H_{{int}}(I)$ & {base_ent:.2f} & {enh_ent:.2f} & {babble_ent:.2f} \\\\",
    f"$dim_{{eff}}(I)$ & {base_dim:.1f} & {enh_dim:.1f} & {babble_dim:.1f} \\\\",
    f"Output Length & {base_len:.0f} & {enh_len:.0f} & {babble_len:.0f} \\\\",
]

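if probe_results:
    # Show N/A (not 0.000) when a condition has no probe result
    base_probe = f"{probe_results['baseline']['mean']:.3f}" if 'baseline' in probe_results else "N/A"
    enh_probe = f"{probe_results['enhanced']['mean']:.3f}" if 'enhanced' in probe_results else "N/A"
    latex_lines.append(f"Probe Accuracy & {base_probe} & {enh_probe} & N/A \\\\")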

latex_lines.extend([
    r"\hline",
    r"\end{tabular}",
    r"\end{table}"
])

latex_table = "\n".join(latex_lines)

print("\nLaTeX Table:")
print(latex_table)

with open(f"{CONFIG['output_dir']}/results_table.tex", 'w') as f:
    f.write(latex_table)
print(f"\n✓ LaTeX table saved to {CONFIG['output_dir']}/results_table.tex")
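
LaTeX treats `%`, `&`, `_`, `#`, `$`, `{`, `}`, `~`, `^`, and `\` as special characters. Any plain-text value placed in a table cell, such as a percentage produced by `:.1%`, needs escaping. Otherwise the rest of the row is silently commented out or fails to compile. The small character-by-character escaper below is a hypothetical helper, and escaping one character at a time avoids double-escaping. Do not apply it to cells that intentionally contain math, such as `$H_{int}(I)$`.

```python
def latex_escape(text):
    """Escape LaTeX special characters in a plain-text table cell."""
    replacements = {
        "\\": r"\textbackslash{}",
        "&": r"\&",
        "%": r"\%",
        "$": r"\$",
        "#": r"\#",
        "_": r"\_",
        "{": r"\{",
        "}": r"\}",
        "~": r"\textasciitilde{}",
        "^": r"\textasciicircum{}",
    }
    # Map each character independently so inserted backslashes are not re-escaped
    return "".join(replacements.get(ch, ch) for ch in str(text))
```

For example, `latex_escape(f"{base_acc:.1%}")` yields a cell like `45.0\%`.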

In [None]:
# =============================================================================
# SAVE FULL RESULTS AS JSON
# =============================================================================

import json

full_results = {
    'config': CONFIG,
    'summary': {
        'baseline_accuracy': float(base_acc),
        'enhanced_accuracy': float(enh_acc),
        'baseline_entropy': float(base_ent),
        'enhanced_entropy': float(enh_ent),
        'babble_entropy': float(babble_ent),
        'baseline_dim_eff': float(base_dim),
        'enhanced_dim_eff': float(enh_dim),
        'babble_dim_eff': float(babble_dim),
    },
    'layer_dim_eff': {k: [float(v) for v in vals] for k, vals in layer_dim_eff.items() if vals},  # float avoids truncating non-integer dim_eff
    'probe_results': probe_results if probe_results else {},
    'n_problems': len(problems),
}

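def _to_jsonable(obj):
    # Convert NumPy scalars/arrays (e.g. inside probe_results) to plain Python types
    if isinstance(obj, np.generic):
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

with open(f"{CONFIG['output_dir']}/full_results.json", 'w') as f:
    json.dump(full_results, f, indent=2, default=_to_jsonable)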

print(f"✓ Full results saved to {CONFIG['output_dir']}/full_results.json")

In [None]:
# =============================================================================
# LIST ALL GENERATED FILES
# =============================================================================

print("\n" + "="*60)
print("GENERATED FILES")
print("="*60)

for f in sorted(os.listdir(CONFIG['output_dir'])):
    filepath = os.path.join(CONFIG['output_dir'], f)
    size = os.path.getsize(filepath)
    print(f"  {f} ({size/1024:.1f} KB)")

---
## 10. Conclusions

### Summary of Findings

This experiment tested three key predictions of the Intention Collapse framework:

1. **Intention Entropy**: Lower entropy indicates a more "decided" intention state
2. **Effective Dimensionality**: Higher dim_eff correlates with richer reasoning
3. **Recoverability**: Pre-collapse states contain more information than verbalized outputs

### Methodological Strengths

- **Babble control condition** tests whether dim_eff differences can be explained by output length alone
- **Layer-wise analysis** shows how dim_eff varies across the extraction layers
- **Sanity checks** verify answer extraction is working correctly

### Next Steps

- [ ] Increase N to 500+ for statistical power
- [ ] Try other models (Llama-3-8B, Qwen-2-7B)
- [ ] Implement Experiments 4.2 and 4.3
- [ ] Add statistical significance tests (e.g., McNemar's test for paired accuracy; paired t-tests or Wilcoxon signed-rank tests for entropy and dim_eff)
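
For the significance-testing item above: accuracy under Baseline and CoT is a paired binary outcome on the same problems, so an exact McNemar test is one natural choice. It uses only the discordant problems (right under one condition, wrong under the other) and tests whether they split 50/50. The sketch below uses `scipy.stats.binomtest` (SciPy 1.7 or later) and assumes both correctness lists are in the same problem order. `mcnemar_exact` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import binomtest

def mcnemar_exact(correct_a, correct_b):
    """Exact McNemar test for paired binary outcomes on the same problems.

    Returns (n01, n10, p_value), where n01 counts problems wrong under A
    but right under B, and n10 counts the reverse.
    """
    a = np.asarray(correct_a, dtype=bool)
    b = np.asarray(correct_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("both conditions must be scored on the same problems")
    n01 = int(np.sum(~a & b))
    n10 = int(np.sum(a & ~b))
    n = n01 + n10
    if n == 0:
        return n01, n10, 1.0  # no discordant pairs: no evidence of a difference
    return n01, n10, float(binomtest(n01, n, p=0.5).pvalue)
```

Usage: `mcnemar_exact(metrics_data['baseline']['correct'], metrics_data['enhanced']['correct'])`.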