# Self-Referential IO Demo: Coherence Gate vs Uncontrolled Tempo

This notebook demonstrates why **self-referential AI capabilities** (tool loops, self-eval gates) require governance controls.

We compare two scenarios:
- **Scenario A**: No coherence gate → tempo amplification → validation lag explodes
- **Scenario B**: P10-style coherence gate → tempo controlled → metrics stable

**Runtime**: < 1 minute, no GPU required.

**Reference**: [S-RIOCS Standard](../docs/ai_safety/self_referential_io.md) | [FIT v2.4](../docs/v2.4.md)

In [None]:
# Cell 0: Dependencies (standard library + matplotlib/numpy)
# pip install matplotlib numpy

import numpy as np
import matplotlib.pyplot as plt
from dataclasses import dataclass, field
from typing import List, Tuple
import random

random.seed(42)
np.random.seed(42)

print("Dependencies loaded.")

In [None]:
# Cell 1: Metrics Tracker (VL / RDPR / GBR)

@dataclass
class MetricsTracker:
    """Tracks the three governance metrics from S-RIOCS."""
    
    validation_lag: List[float] = field(default_factory=list)  # VL: hours from a change until its evaluation closes
    rollback_feasibility: List[float] = field(default_factory=list)  # Proxy for RDPR (0-1 scale)
    gate_bypass_events: List[int] = field(default_factory=list)  # Cumulative bypass count
    tempo: List[float] = field(default_factory=list)  # Cycle time (lower = faster)
    
    _cumulative_bypass: int = 0
    
    def record(self, vl: float, rollback_feas: float, bypassed: bool, cycle_time: float):
        self.validation_lag.append(vl)
        self.rollback_feasibility.append(rollback_feas)
        if bypassed:
            self._cumulative_bypass += 1
        self.gate_bypass_events.append(self._cumulative_bypass)
        self.tempo.append(cycle_time)
    
    def summary(self) -> dict:
        return {
            "final_VL": self.validation_lag[-1] if self.validation_lag else 0,
            "mean_VL": np.mean(self.validation_lag) if self.validation_lag else 0,
            "final_rollback_feasibility": self.rollback_feasibility[-1] if self.rollback_feasibility else 1,
            "total_bypasses": self._cumulative_bypass,
            "final_tempo": self.tempo[-1] if self.tempo else 1,
        }

print("MetricsTracker defined.")
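The tracker stores a *cumulative* bypass count, while the Gate Bypass Rate (GBR) listed under Next Steps is a fraction of changes. Below is a minimal sketch of converting one into the other. It assumes that each simulation step counts as one change. The helper names `per_step_bypasses` and `gate_bypass_rate` are hypothetical and are not part of S-RIOCS.

```python
from typing import List


def per_step_bypasses(cumulative: List[int]) -> List[int]:
    """Recover per-step bypass flags (0 or 1) from a cumulative count series."""
    flags = []
    previous = 0
    for total in cumulative:
        flags.append(total - previous)
        previous = total
    return flags


def gate_bypass_rate(cumulative: List[int], window: int = 0) -> float:
    """Fraction of changes that bypassed the gate.

    Assumption: one change per step. With window > 0, only the last
    `window` steps are counted (a rolling GBR); otherwise the whole run.
    """
    flags = per_step_bypasses(cumulative)
    if window > 0:
        flags = flags[-window:]
    if not flags:
        return 0.0
    return sum(flags) / len(flags)


# Example: a run where steps 2 and 5 (0-based) were bypasses
history = [0, 0, 1, 1, 1, 2]
print(per_step_bypasses(history))          # [0, 0, 1, 0, 0, 1]
print(gate_bypass_rate(history))           # 2 of 6 steps, about 0.33
print(gate_bypass_rate(history, window=2)) # last 2 steps: 0.5
```

In the notebook, you could call `gate_bypass_rate(tracker_a.gate_bypass_events)` after the scenarios have run. A rolling window makes a recent spike in bypasses easier to see than a whole-run average does.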

In [None]:
# Cell 2: Self-Referential Agent

@dataclass
class SelfReferentialAgent:
    """
    A toy agent with self-eval and tool-loop capabilities.
    
    - self_eval(): produces a confidence score (may have self-confirmation bias)
    - external_eval(): independent second estimator (no self-confirmation bias)
    - tool_call(): simulates an external action; each call deepens the loop
    - coherence_gate(): P10-style multi-estimator agreement check
    """
    
    use_coherence_gate: bool = False
    coherence_threshold: float = 0.3  # Max allowed disagreement between estimators
    
    # Internal state
    confidence: float = 0.5
    action_count: int = 0
    loop_depth: int = 0
    max_loop_depth: int = 10  # Hard limit (only enforced if gate is on)
    
    # Self-confirmation bias strength (0 = no bias, 1 = full bias)
    self_confirmation_bias: float = 0.3
    
    def self_eval(self) -> float:
        """
        Self-evaluation with potential confirmation bias.
        Returns confidence score [0, 1].
        """
        # Base evaluation (noisy)
        true_signal = 0.5 + 0.1 * np.sin(self.action_count * 0.1)
        noise = np.random.normal(0, 0.1)
        
        # Self-confirmation bias: tends to confirm current confidence
        biased_eval = (
            (1 - self.self_confirmation_bias) * (true_signal + noise) +
            self.self_confirmation_bias * self.confidence
        )
        
        return np.clip(biased_eval, 0, 1)
    
    def external_eval(self) -> float:
        """
        Independent external evaluation (no self-confirmation bias).
        Used as second estimator in coherence gate.
        """
        true_signal = 0.5 + 0.1 * np.sin(self.action_count * 0.1)
        noise = np.random.normal(0, 0.15)  # Slightly noisier
        return np.clip(true_signal + noise, 0, 1)
    
    def coherence_gate(self) -> Tuple[bool, float]:
        """
        P10-style coherence gate: check if multiple estimators agree.
        Returns (passed, disagreement_score).
        """
        self_score = self.self_eval()
        external_score = self.external_eval()
        disagreement = abs(self_score - external_score)
        
        passed = disagreement <= self.coherence_threshold
        return passed, disagreement
    
    def tool_call(self) -> bool:
        """
        Simulate a tool call / action.
        Returns True if action was taken.
        """
        self.action_count += 1
        self.loop_depth += 1
        
        # Action affects confidence (feedback loop)
        self.confidence = 0.7 * self.confidence + 0.3 * self.self_eval()
        
        return True
    
    def should_continue_loop(self) -> bool:
        """
        Decide whether to continue the tool loop.
        Without gate: continues if self-eval is high enough.
        With gate: also checks coherence and depth limit.
        """
        self_score = self.self_eval()
        
        if self.use_coherence_gate:
            # Check depth limit
            if self.loop_depth >= self.max_loop_depth:
                return False
            
            # Check coherence
            passed, _ = self.coherence_gate()
            if not passed:
                return False
            
            return self_score > 0.4
        else:
            # No gate: just use self-eval (self-confirmation prone)
            return self_score > 0.3  # Lower threshold = more likely to continue
    
    def reset_loop(self):
        """Reset loop depth for new decision cycle."""
        self.loop_depth = 0

print("SelfReferentialAgent defined.")
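The gate in `SelfReferentialAgent` compares two estimators. Below is a minimal sketch of the same idea extended to k estimators. It assumes that agreement is measured as the spread (max minus min) of the scores. For k = 2 this reduces to `abs(self_score - external_score)`. The function name and the spread criterion are illustrative assumptions, not the P10 definition.

```python
from typing import Sequence, Tuple


def multi_estimator_gate(scores: Sequence[float], threshold: float = 0.3) -> Tuple[bool, float]:
    """Pass only if all estimator scores lie within `threshold` of each other.

    Disagreement is the spread max(scores) - min(scores); for two scores this
    equals their absolute difference, as in SelfReferentialAgent.coherence_gate.
    """
    if len(scores) < 2:
        raise ValueError("A coherence gate needs at least two independent estimators")
    disagreement = max(scores) - min(scores)
    return disagreement <= threshold, disagreement


# Two estimators: same behaviour as the two-way check above
print(multi_estimator_gate([0.62, 0.45]))        # passes: spread ~0.17

# A third estimator that disagrees strongly blocks the action
print(multi_estimator_gate([0.62, 0.45, 0.95]))  # fails: spread ~0.50
```

Using the spread, rather than an average of pairwise differences, means a single outlying estimator is enough to stop the loop. That is the conservative choice for a gate.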

In [None]:
# Cell 3: Simulation Runner

def run_simulation(
    agent: SelfReferentialAgent,
    num_steps: int = 100,
    base_cycle_time: float = 1.0,
    base_validation_time: float = 2.0,
) -> MetricsTracker:
    """
    Run simulation and track metrics.
    
    Key dynamics:
    - Without gate: cycle time shrinks quickly (tempo amplification), so VL grows fast
    - With gate: cycle time is floored at 0.5, so VL grows far more slowly
    """
    tracker = MetricsTracker()
    
    cycle_time = base_cycle_time
    validation_backlog = 0.0
    rollback_feasibility = 1.0
    
    for step in range(num_steps):
        agent.reset_loop()
        
        # Inner loop: agent decides how many tool calls to make
        loop_actions = 0
        while agent.should_continue_loop():
            agent.tool_call()
            loop_actions += 1
            
            # Safety cap for the ungated case (at most 51 tool calls per step)
            if loop_actions > 50:
                break
        
        # Tempo dynamics
        if agent.use_coherence_gate:
            # Stylized assumption: the gate holds tempo to a slow, fixed decline floored at 0.5
            cycle_time = max(0.5, base_cycle_time - 0.01 * step + np.random.normal(0, 0.05))
        else:
            # No gate: tempo accelerates (cycle time shrinks)
            acceleration = 0.02 * (1 + loop_actions * 0.1)
            cycle_time = max(0.1, cycle_time - acceleration + np.random.normal(0, 0.02))
        
        # Validation lag dynamics
        # The backlog grows when changes are produced faster than validation can process them
        validation_processed = base_validation_time * 0.8  # Validation capacity (changes per unit time)
        changes_produced = 1.0 / max(cycle_time, 0.1)  # Changes per unit time
        
        if changes_produced > validation_processed:
            validation_backlog += (changes_produced - validation_processed) * 0.5
        else:
            validation_backlog = max(0, validation_backlog - 0.3)
        
        current_vl = validation_backlog + cycle_time  # VL = backlog + current cycle
        
        # Rollback feasibility decreases with accumulated unvalidated changes
        if agent.use_coherence_gate:
            rollback_feasibility = max(0.6, 1.0 - 0.002 * step + np.random.normal(0, 0.02))
        else:
            rollback_feasibility = max(0.1, rollback_feasibility - 0.005 * loop_actions + np.random.normal(0, 0.01))
        
        # Bypass detection
        bypassed = False
        if agent.use_coherence_gate:
            # Rare manual bypass of an active gate (1% chance per step)
            bypassed = random.random() < 0.01
        else:
            # No gate to bypass: flag excessive loops (> 5 tool calls) as implicit bypasses
            bypassed = loop_actions > 5
        
        tracker.record(
            vl=current_vl,
            rollback_feas=rollback_feasibility,
            bypassed=bypassed,
            cycle_time=cycle_time
        )
    
    return tracker

print("Simulation runner defined.")
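The validation-lag rule in `run_simulation` works like a simple queue. When the change rate `1 / cycle_time` exceeds the validation capacity `0.8 * base_validation_time`, the backlog grows by half the excess. Otherwise it drains by 0.3 per step. The sketch below isolates that rule, using the same constants as the cell above, to show where the crossover sits and how fast the backlog grows at each tempo floor. The helper names are hypothetical.

```python
def backlog_step(backlog: float, cycle_time: float, base_validation_time: float = 2.0) -> float:
    """One step of the backlog update used in run_simulation (same constants)."""
    capacity = base_validation_time * 0.8            # changes validated per unit time
    changes_produced = 1.0 / max(cycle_time, 0.1)    # changes produced per unit time
    if changes_produced > capacity:
        return backlog + (changes_produced - capacity) * 0.5
    return max(0.0, backlog - 0.3)


def crossover_cycle_time(base_validation_time: float = 2.0) -> float:
    """Cycle time below which the backlog starts to grow."""
    return 1.0 / (base_validation_time * 0.8)


def backlog_after(cycle_time: float, steps: int) -> float:
    """Backlog after `steps` steps at a constant cycle time, starting empty."""
    backlog = 0.0
    for _ in range(steps):
        backlog = backlog_step(backlog, cycle_time)
    return backlog


print(f"Crossover cycle time: {crossover_cycle_time():.3f}")
for ct in (1.0, 0.8, 0.5, 0.1):
    print(f"cycle_time={ct:<5} backlog after 50 steps: {backlog_after(ct, 50):.1f}")
```

At the gated floor of 0.5, the backlog still grows by about 0.2 per step, so the gate slows validation lag rather than freezing it. At the ungated floor of 0.1, it grows by about 4.2 per step.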

In [None]:
# Cell 4: Run Scenario A - No Coherence Gate

print("=" * 60)
print("SCENARIO A: No Coherence Gate")
print("Self-eval directly gates actions, no multi-estimator check")
print("=" * 60)

agent_a = SelfReferentialAgent(
    use_coherence_gate=False,
    self_confirmation_bias=0.4,  # Moderate bias
)

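# Reset random seed so this cell is reproducible on its own (matches Scenario B)
random.seed(42)
np.random.seed(42)

tracker_a = run_simulation(agent_a, num_steps=100)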

summary_a = tracker_a.summary()
print("\nResults after 100 steps:")
print(f"  Final Validation Lag: {summary_a['final_VL']:.2f} hours")
print(f"  Mean Validation Lag:  {summary_a['mean_VL']:.2f} hours")
print(f"  Rollback Feasibility: {summary_a['final_rollback_feasibility']:.2%}")
print(f"  Total Bypass Events:  {summary_a['total_bypasses']}")
print(f"  Final Cycle Time:     {summary_a['final_tempo']:.3f} (lower = faster tempo)")

In [None]:
# Cell 5: Run Scenario B - With P10 Coherence Gate

print("=" * 60)
print("SCENARIO B: With P10-Style Coherence Gate")
print("Multi-estimator agreement required, loop depth bounded")
print("=" * 60)

agent_b = SelfReferentialAgent(
    use_coherence_gate=True,
    coherence_threshold=0.3,
    max_loop_depth=10,
    self_confirmation_bias=0.4,  # Same bias, but gate catches it
)

# Reset random seed for fair comparison
random.seed(42)
np.random.seed(42)

tracker_b = run_simulation(agent_b, num_steps=100)

summary_b = tracker_b.summary()
print("\nResults after 100 steps:")
print(f"  Final Validation Lag: {summary_b['final_VL']:.2f} hours")
print(f"  Mean Validation Lag:  {summary_b['mean_VL']:.2f} hours")
print(f"  Rollback Feasibility: {summary_b['final_rollback_feasibility']:.2%}")
print(f"  Total Bypass Events:  {summary_b['total_bypasses']}")
print(f"  Final Cycle Time:     {summary_b['final_tempo']:.3f}")

In [None]:
# Cell 6: Visualization - Side by Side Comparison
import os
import random

import matplotlib.pyplot as plt
import numpy as np

required = ['SelfReferentialAgent', 'run_simulation']
missing = [name for name in required if name not in globals()]
if missing:
    raise RuntimeError(
        'Missing definitions: ' + ', '.join(missing) + '. Run Cells 0-3 first (or use Run All).'
    )

tracker_a = globals().get('tracker_a')
if tracker_a is None:
    print("tracker_a not found; running Scenario A (No Gate) for plotting...")
    random.seed(42)
    np.random.seed(42)
    agent_a = SelfReferentialAgent(
        use_coherence_gate=False,
        self_confirmation_bias=0.4,
    )
    tracker_a = run_simulation(agent_a, num_steps=100)
    globals()['tracker_a'] = tracker_a
    globals()['summary_a'] = tracker_a.summary()

tracker_b = globals().get('tracker_b')
if tracker_b is None:
    print("tracker_b not found; running Scenario B (With Gate) for plotting...")
    random.seed(42)
    np.random.seed(42)
    agent_b = SelfReferentialAgent(
        use_coherence_gate=True,
        coherence_threshold=0.3,
        max_loop_depth=10,
        self_confirmation_bias=0.4,
    )
    tracker_b = run_simulation(agent_b, num_steps=100)
    globals()['tracker_b'] = tracker_b
    globals()['summary_b'] = tracker_b.summary()

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
fig.suptitle('Self-Referential Agent: No Gate vs P10 Coherence Gate', fontsize=14, fontweight='bold')

steps = range(len(tracker_a.validation_lag))

# Top Left: Tempo (Cycle Time)
ax1 = axes[0, 0]
ax1.plot(steps, tracker_a.tempo, 'r-', label='No Gate', alpha=0.8, linewidth=2)
ax1.plot(steps, tracker_b.tempo, 'g-', label='With Gate', alpha=0.8, linewidth=2)
ax1.axhline(y=0.5, color='orange', linestyle='--', alpha=0.5, label='Danger threshold')
ax1.set_xlabel('Step')
ax1.set_ylabel('Cycle Time (lower = faster tempo)')
ax1.set_title('Tempo Amplification')
ax1.legend()
ax1.grid(True, alpha=0.3)
ax1.set_ylim(0, 1.2)

# Top Right: Validation Lag
ax2 = axes[0, 1]
ax2.plot(steps, tracker_a.validation_lag, 'r-', label='No Gate', alpha=0.8, linewidth=2)
ax2.plot(steps, tracker_b.validation_lag, 'g-', label='With Gate', alpha=0.8, linewidth=2)
ax2.axhline(y=10, color='orange', linestyle='--', alpha=0.5, label='VL_MAX threshold')
ax2.set_xlabel('Step')
ax2.set_ylabel('Validation Lag (hours)')
ax2.set_title('Validation Lag (VL) - Governance Cannot Keep Up')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Bottom Left: Rollback Feasibility
ax3 = axes[1, 0]
ax3.plot(steps, tracker_a.rollback_feasibility, 'r-', label='No Gate', alpha=0.8, linewidth=2)
ax3.plot(steps, tracker_b.rollback_feasibility, 'g-', label='With Gate', alpha=0.8, linewidth=2)
ax3.axhline(y=0.5, color='orange', linestyle='--', alpha=0.5, label='RDPR_MIN threshold')
ax3.set_xlabel('Step')
ax3.set_ylabel('Rollback Feasibility (0-1)')
ax3.set_title('Rollback Feasibility - Can We Undo?')
ax3.legend()
ax3.grid(True, alpha=0.3)
ax3.set_ylim(0, 1.1)

# Bottom Right: Cumulative Bypass Events
ax4 = axes[1, 1]
ax4.plot(steps, tracker_a.gate_bypass_events, 'r-', label='No Gate (implicit bypasses)', alpha=0.8, linewidth=2)
ax4.plot(steps, tracker_b.gate_bypass_events, 'g-', label='With Gate', alpha=0.8, linewidth=2)
ax4.set_xlabel('Step')
ax4.set_ylabel('Cumulative Bypass Count')
ax4.set_title('Cumulative Gate Bypasses (GBR proxy) - Governance Integrity')
ax4.legend()
ax4.grid(True, alpha=0.3)

plt.tight_layout()

# Save to docs/ai_safety/figures for easy reference in documentation
output_path = '../docs/ai_safety/figures/self_referential_io_comparison.png'
os.makedirs(os.path.dirname(output_path), exist_ok=True)
plt.savefig(output_path, dpi=150, bbox_inches='tight')
plt.show()

print(f"\nFigure saved to: {output_path}")

In [None]:
# Cell 7: IO-SR Classification for This Demo

print("=" * 60)
print("IO-SR CLASSIFICATION: Mapping Demo to S-RIOCS Categories")
print("=" * 60)

io_sr_mapping = [
    {
        "capability": "tool_call() with no depth limit",
        "io_category": "IO-SR-1 (Unbounded tool-use loops)",
        "why_io": "Tempo amplification; cycle time shrinks faster than governance can adapt",
        "mitigation": "max_loop_depth + coherence gate",
    },
    {
        "capability": "self_eval() directly gates actions",
        "io_category": "IO-SR-4 (Self-evaluation as gate)",
        "why_io": "Self-confirmation loop; 'passing the gate' replaces 'meeting intent'",
        "mitigation": "Multi-estimator coherence check (self + external)",
    },
    {
        "capability": "confidence updated by self_eval feedback",
        "io_category": "IO-SR-3 (Memory write-back)",
        "why_io": "Internal state accumulates bias; hard to 'rollback' learned confidence",
        "mitigation": "Bounded update rate + periodic reset/audit",
    },
]

for i, item in enumerate(io_sr_mapping, 1):
    print(f"\n[{i}] {item['capability']}")
    print(f"    Category:   {item['io_category']}")
    print(f"    Why IO:     {item['why_io']}")
    print(f"    Mitigation: {item['mitigation']}")
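The cell above prints the mapping as free text. Below is a minimal sketch that stores the same kind of entries as a typed register. It assumes the five category codes listed under Next Steps (IO-SR-1 to IO-SR-5). It also assumes a hypothetical rule that every entry must name a mitigation before it is accepted. `IORegisterEntry` and `validate_register` are illustrative names, and the S-RIOCS YAML templates may define different fields. The third example entry leaves its mitigation blank on purpose so that the check has something to flag.

```python
from dataclasses import dataclass
from typing import List

# Category codes as listed in this notebook's Next Steps section
IO_SR_CATEGORIES = {
    "IO-SR-1": "Tool loops",
    "IO-SR-2": "Self-modifying policies",
    "IO-SR-3": "Memory write-back",
    "IO-SR-4": "Self-eval gates",
    "IO-SR-5": "Continuous deployment",
}


@dataclass(frozen=True)
class IORegisterEntry:
    capability: str
    category: str    # expected to be a key of IO_SR_CATEGORIES
    mitigation: str  # empty string means no mitigation recorded yet


def validate_register(entries: List[IORegisterEntry]) -> List[str]:
    """Return a list of problems; an empty list means the register passes."""
    problems = []
    for entry in entries:
        if entry.category not in IO_SR_CATEGORIES:
            problems.append(f"{entry.capability}: unknown category {entry.category!r}")
        if not entry.mitigation.strip():
            problems.append(f"{entry.capability}: no mitigation recorded")
    return problems


register = [
    IORegisterEntry("tool_call() with no depth limit", "IO-SR-1", "max_loop_depth + coherence gate"),
    IORegisterEntry("self_eval() directly gates actions", "IO-SR-4", "Multi-estimator coherence check"),
    IORegisterEntry("confidence updated by self_eval feedback", "IO-SR-3", ""),
]

for problem in validate_register(register):
    print(problem)
# -> confidence updated by self_eval feedback: no mitigation recorded
```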

In [None]:
# Cell 8: Summary Comparison Table

# Ensure summary variables exist
if 'summary_a' not in globals() or summary_a is None:
    if 'tracker_a' in globals() and tracker_a is not None:
        summary_a = tracker_a.summary()
    else:
        raise RuntimeError("tracker_a not found. Run Cell 4 first (or use Run All).")

if 'summary_b' not in globals() or summary_b is None:
    if 'tracker_b' in globals() and tracker_b is not None:
        summary_b = tracker_b.summary()
    else:
        raise RuntimeError("tracker_b not found. Run Cell 5 first (or use Run All).")

print("=" * 60)
print("SUMMARY: No Gate vs With Gate")
print("=" * 60)

print(f"\n{'Metric':<30} {'No Gate':>15} {'With Gate':>15} {'Delta (B-A)':>15}")
print("-" * 75)

metrics = [
    ("Final Validation Lag (hrs)", summary_a['final_VL'], summary_b['final_VL']),
    ("Mean Validation Lag (hrs)", summary_a['mean_VL'], summary_b['mean_VL']),
    ("Rollback Feasibility", summary_a['final_rollback_feasibility'], summary_b['final_rollback_feasibility']),
    ("Total Bypass Events", summary_a['total_bypasses'], summary_b['total_bypasses']),
    ("Final Cycle Time", summary_a['final_tempo'], summary_b['final_tempo']),
]

for name, val_a, val_b in metrics:
    delta = val_b - val_a
    delta_str = f"{delta:+.2f}" if isinstance(delta, float) else f"{delta:+d}"
    print(f"{name:<30} {val_a:>15.2f} {val_b:>15.2f} {delta_str:>15}")

print("\n" + "=" * 60)
print("INTERPRETATION")
print("=" * 60)
print("""
Without coherence gate:
  - Tempo accelerates uncontrollably (cycle time falls to its 0.1 floor)
  - Validation lag explodes (governance can't keep up)
  - Rollback feasibility degrades (accumulated unvalidated changes)
  - Many implicit 'bypasses' (excessive loops without a check)

With P10-style coherence gate:
  - Tempo stays bounded (cycle time >= 0.5)
  - Validation lag grows far more slowly (backlog accumulates only once cycle time < ~0.625)
  - Rollback feasibility stays high (floored at 0.6 in this model)
  - Minimal bypass events (~1% chance per step)

Key insight: Self-referential capability is not the problem.
The problem is self-reference WITHOUT multi-estimator governance.
""")

## Next Steps

### For Your Team

1. **Add VL/RDPR/GBR to your dashboard**
   - Validation Lag: time from deploy → eval closure
   - Rollback Drill Pass Rate: % of drills that succeed
   - Gate Bypass Rate: % of changes that skip required gates

2. **Classify your changes with IO-SR Register**
   - IO-SR-1: Tool loops
   - IO-SR-2: Self-modifying policies
   - IO-SR-3: Memory write-back
   - IO-SR-4: Self-eval gates
   - IO-SR-5: Continuous deployment

3. **Run a rollback drill**
   - Pick one recent IO-class change
   - Attempt rollback in staging
   - Record: time, success/fail, blockers
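For steps 1 and 3, here is a minimal sketch of computing Validation Lag and Rollback Drill Pass Rate from simple records. It assumes that each change has a deploy timestamp and an optional eval-closure timestamp, and that each drill records pass/fail, duration, and blockers. The record types and field names are hypothetical, so adapt them to whatever your deployment and eval tooling actually logs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class ChangeRecord:
    change_id: str
    deployed_at: datetime
    eval_closed_at: Optional[datetime]  # None = evaluation still open


@dataclass
class RollbackDrill:
    change_id: str
    succeeded: bool
    duration_min: float
    blockers: str = ""


def validation_lag_hours(changes: List[ChangeRecord], now: datetime) -> List[float]:
    """VL per change: deploy -> eval closure, or deploy -> now if still open."""
    lags = []
    for c in changes:
        end = c.eval_closed_at if c.eval_closed_at is not None else now
        lags.append((end - c.deployed_at).total_seconds() / 3600)
    return lags


def rollback_drill_pass_rate(drills: List[RollbackDrill]) -> float:
    """RDPR: fraction of drills that succeeded.

    Assumption: no drills run counts as 0.0 (no demonstrated rollback ability).
    """
    if not drills:
        return 0.0
    return sum(d.succeeded for d in drills) / len(drills)


t0 = datetime(2024, 1, 1, 9, 0)
changes = [
    ChangeRecord("c1", t0, t0 + timedelta(hours=4)),
    ChangeRecord("c2", t0 + timedelta(hours=1), None),  # still open
]
drills = [
    RollbackDrill("c1", succeeded=True, duration_min=12),
    RollbackDrill("c2", succeeded=False, duration_min=45, blockers="schema migration"),
]

now = t0 + timedelta(hours=10)
lags = validation_lag_hours(changes, now)
print(lags)                              # [4.0, 9.0]
print(median(lags))                      # 6.5
print(rollback_drill_pass_rate(drills))  # 0.5
```

Counting still-open evaluations up to `now` stops the VL figure from looking healthy just because the slowest evaluations have not closed yet.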

### References

- [S-RIOCS Standard](../docs/ai_safety/self_referential_io.md) — Full IO-SR categories, YAML templates, RACI
- [FIT v2.4 Spec](../docs/v2.4.md) — Theoretical foundation (EST, tempo mismatch, coherence gates)
- [IO-SR Mapping](../docs/ai_safety/io_sr_mapping.md) — Quick reference for IO classes

---

*This demo is part of the FIT (Force–Information–Time) framework.*  
*License: CC BY 4.0*