# Temporal Uncertainty Tracking in Conversational RAG
## Getting Started Notebook

This notebook demonstrates how to use the Temporal Uncertainty Router for conversational question answering.

**Paper**: "Temporal Uncertainty Tracking in Conversational RAG: Learning to Route Multi-Turn Queries Through Uncertainty Evolution"

## 1. Setup and Installation

In [None]:
# Install dependencies (if not already installed)
# !pip install -r ../requirements.txt

import sys
sys.path.append('..')

import torch
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import AutoTokenizer

# Import our modules
from src.data.dataloader import ConversationDataLoader, create_dataloaders
from src.models.temporal_router import TemporalUncertaintyRouter, ConversationState
from src.models.baselines import create_baseline_models
from src.evaluation.metrics import (
    compute_routing_metrics,
    compute_uncertainty_decay_rate,
    compute_epistemic_convergence_speed
)

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

print("Setup complete!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

## 2. Load Data

We'll use the CoQA dataset for this example.

In [None]:
# Load CoQA dataset
data_loader = ConversationDataLoader(
    dataset_name='coqa',
    cache_dir='../data/cache',
    min_turns=3,
    max_turns=10
)

train_conversations, val_conversations = data_loader.load_and_preprocess()

print(f"Loaded {len(train_conversations)} training conversations")
print(f"Loaded {len(val_conversations)} validation conversations")

# Show example conversation
example = train_conversations[0]
print(f"\nExample conversation: {example.conversation_id}")
print(f"Context: {example.context[:200]}...")
print(f"\nNumber of turns: {len(example.turns)}")
for i, turn in enumerate(example.turns[:3]):
    print(f"\nTurn {i}:")
    print(f"  Q: {turn.question}")
    print(f"  A: {turn.answer}")

## 3. Initialize Model

Create the Temporal Uncertainty Router model.

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

# Initialize model
model = TemporalUncertaintyRouter(
    encoder_name='bert-base-uncased',
    embedding_dim=768,
    hidden_dim=256,
    num_lstm_layers=2,
    num_sources=4,
    dropout=0.1,
    num_mc_samples=10
).to(device)

num_params = sum(p.numel() for p in model.parameters())
num_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"\nModel initialized!")
print(f"Total parameters: {num_params:,}")
print(f"Trainable parameters: {num_trainable:,}")
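
The `dropout=0.1` and `num_mc_samples=10` arguments suggest that the router estimates uncertainty with Monte Carlo dropout, although this notebook does not confirm it. The following is a minimal sketch under that assumption, not the implementation in `src.models.temporal_router`. The toy linear routing head, the `mc_dropout_route` function, and the entropy-based split (aleatoric as expected entropy, epistemic as mutual information) are all hypothetical choices made for illustration.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mc_dropout_route(features, weights, num_mc_samples=10, dropout=0.1, seed=0):
    """Toy Monte Carlo dropout over a linear routing head (hypothetical).

    Returns (mean_probs, epistemic, aleatoric), where aleatoric is the mean
    entropy of the sampled distributions and epistemic is the entropy of the
    mean distribution minus that value (the mutual information).
    """
    rng = random.Random(seed)
    keep = 1.0 - dropout
    samples = []
    for _ in range(num_mc_samples):
        # Fresh dropout mask per pass; kept features are rescaled (inverted dropout).
        dropped = [x / keep if rng.random() < keep else 0.0 for x in features]
        logits = [sum(w * x for w, x in zip(row, dropped)) for row in weights]
        samples.append(softmax(logits))

    num_classes = len(weights)
    mean_probs = [
        sum(s[k] for s in samples) / num_mc_samples for k in range(num_classes)
    ]
    aleatoric = sum(entropy(s) for s in samples) / num_mc_samples
    epistemic = entropy(mean_probs) - aleatoric
    return mean_probs, epistemic, aleatoric

# Hypothetical inputs: 4 features, 4 routing sources.
features = [0.5, -1.2, 0.3, 0.8]
weights = [
    [1.0, 0.0, 0.5, -0.5],
    [0.2, 0.8, -0.3, 0.1],
    [-0.4, 0.3, 0.9, 0.0],
    [0.1, -0.6, 0.2, 0.7],
]
mean_probs, epistemic, aleatoric = mc_dropout_route(features, weights)
print("Mean routing probabilities:", [round(p, 4) for p in mean_probs])
print(f"Epistemic: {epistemic:.4f}  Aleatoric: {aleatoric:.4f}")
```

With `dropout=0.0` every pass is identical, so the epistemic term in this sketch collapses to (numerically) zero while the aleatoric term remains.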

## 4. Single Query Example

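Let's run the model on a single query. No trained checkpoint is loaded in this notebook, so the outputs below illustrate the model's interface rather than a tuned routing policy.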

In [None]:
# Initialize tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# Example query
context = "France is a country in Western Europe. Its capital is Paris, which is known for the Eiffel Tower."
question = "What is the capital of France?"

# Format input
input_text = f"Context: {context} Question: {question}"

# Tokenize
encoding = tokenizer(
    input_text,
    max_length=512,
    padding='max_length',
    truncation=True,
    return_tensors='pt'
).to(device)

# Forward pass
model.eval()
with torch.no_grad():
    output = model(
        input_ids=encoding['input_ids'],
        attention_mask=encoding['attention_mask']
    )

# Display results
routing_probs = output['routing_probs'].cpu().numpy()[0]
epistemic = output['epistemic_uncertainty'].cpu().numpy()[0, 0]
aleatoric = output['aleatoric_uncertainty'].cpu().numpy()[0, 0]

sources = ['Internal KB', 'External Search', 'Clarification', 'Multi-source']
predicted_route = np.argmax(routing_probs)

print("Query:", question)
print(f"\nPredicted route: {sources[predicted_route]}")
print(f"Confidence: {routing_probs[predicted_route]:.4f}")
print(f"\nRouting probabilities:")
for source, prob in zip(sources, routing_probs):
    print(f"  {source}: {prob:.4f}")
print(f"\nUncertainties:")
print(f"  Epistemic (model uncertainty): {epistemic:.4f}")
print(f"  Aleatoric (query ambiguity): {aleatoric:.4f}")
print(f"  Total: {np.sqrt(epistemic**2 + aleatoric**2):.4f}")
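
The "Total" line above combines the two components in quadrature, that is, as the Euclidean norm of the pair (epistemic, aleatoric). A small worked example of that arithmetic, using a hypothetical helper named `total_uncertainty` and made-up input values:

```python
import math

def total_uncertainty(epistemic, aleatoric):
    """Combine two uncertainty components in quadrature: sqrt(e^2 + a^2)."""
    return math.hypot(epistemic, aleatoric)

# Made-up values for illustration only.
both = total_uncertainty(0.3, 0.4)
only_epistemic = total_uncertainty(0.3, 0.0)

print(f"Both components:   {both:.4f}")
print(f"Epistemic only:    {only_epistemic:.4f}")
```

Under this combination the total is never smaller than the larger component and never larger than the sum of the two.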

## 5. Multi-Turn Conversation Example

Now let's see how uncertainty evolves across multiple turns.

In [None]:
# Create conversation state tracker
conv_state = ConversationState(max_history=10)

# Conversation turns
context = "The Eiffel Tower is a wrought-iron lattice tower in Paris, France. It was built in 1889 and stands 330 meters tall."
questions = [
    "Where is the Eiffel Tower located?",
    "When was it built?",
    "How tall is it?",
    "Who designed it?"
]

# Track metrics across turns
turn_metrics = []

model.eval()
for turn_id, question in enumerate(questions):
    print(f"\n{'='*60}")
    print(f"Turn {turn_id}: {question}")
    print('='*60)
    
    # Prepare input
    input_text = f"Context: {context} Question: {question}"
    encoding = tokenizer(
        input_text,
        max_length=512,
        padding='max_length',
        truncation=True,
        return_tensors='pt'
    ).to(device)
    
    # Forward pass with history
    with torch.no_grad():
        output = model(
            input_ids=encoding['input_ids'],
            attention_mask=encoding['attention_mask'],
            history_embeddings=conv_state.get_history_embeddings(),
            history_uncertainties=conv_state.get_history_uncertainties()
        )
    
    # Extract metrics
    routing_probs = output['routing_probs'].cpu().numpy()[0]
    epistemic = output['epistemic_uncertainty'].cpu().numpy()[0, 0]
    aleatoric = output['aleatoric_uncertainty'].cpu().numpy()[0, 0]
    
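    # Temporal metrics (e.g. UDR, ECS) may come back as empty tensors when
    # there is no history yet, so fall back to 0.0 in that case
    temporal_metrics = {
        k: v.cpu().numpy()[0, 0] if v.numel() > 0 else 0.0
        for k, v in output['temporal_metrics'].items()
    }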
    
    # Display
    predicted_route = np.argmax(routing_probs)
    print(f"\nPredicted route: {sources[predicted_route]}")
    print(f"Epistemic uncertainty: {epistemic:.4f}")
    print(f"Aleatoric uncertainty: {aleatoric:.4f}")
    
    if turn_id > 0:
        print(f"\nTemporal Metrics:")
        print(f"  UDR (Uncertainty Decay Rate): {temporal_metrics['udr']:.4f}")
        print(f"  ECS (Epistemic Convergence Speed): {temporal_metrics['ecs']:.4f}")
    
    # Update conversation state so the next turn sees this one as history
    conv_state.update(
        embedding=output['query_embedding'].squeeze(0),
        epistemic=output['epistemic_uncertainty'].squeeze(0),
        aleatoric=output['aleatoric_uncertainty'].squeeze(0),
        routing_decision=int(predicted_route)  # plain int instead of np.int64
    )
    
    # Track for visualization
    turn_metrics.append({
        'turn': turn_id,
        'question': question,
        'epistemic': epistemic,
        'aleatoric': aleatoric,
        'route': predicted_route,
        **temporal_metrics
    })
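
The loop above relies on `ConversationState` to hold a bounded window of past turns. As a rough mental model, here is a minimal sketch of such a tracker. `SimpleConversationState` is a hypothetical stand-in, not the class from `src.models.temporal_router`: it stores plain Python values rather than tensors, and returning `None` before the first turn is an assumption about how an empty history is signalled.

```python
from collections import deque

class SimpleConversationState:
    """Hypothetical stand-in for ConversationState: keeps the last `max_history` turns."""

    def __init__(self, max_history=10):
        # deque(maxlen=...) silently discards the oldest turn once full.
        self._turns = deque(maxlen=max_history)

    def update(self, embedding, epistemic, aleatoric, routing_decision):
        """Record one finished turn."""
        self._turns.append({
            "embedding": embedding,
            "epistemic": epistemic,
            "aleatoric": aleatoric,
            "route": routing_decision,
        })

    def get_history_embeddings(self):
        """Stored embeddings, oldest first, or None before the first turn."""
        if not self._turns:
            return None
        return [t["embedding"] for t in self._turns]

    def get_history_uncertainties(self):
        """Stored (epistemic, aleatoric) pairs, oldest first, or None if empty."""
        if not self._turns:
            return None
        return [(t["epistemic"], t["aleatoric"]) for t in self._turns]

    def __len__(self):
        return len(self._turns)

# Usage: with max_history=3, only the three most recent of five turns survive.
state = SimpleConversationState(max_history=3)
empty_history = state.get_history_embeddings()
for turn in range(5):
    state.update(
        embedding=[float(turn)],        # placeholder embedding
        epistemic=1.0 / (turn + 1),     # made-up, shrinking uncertainty
        aleatoric=0.1,
        routing_decision=0,
    )
print("Turns kept:", len(state))
print("Embeddings:", state.get_history_embeddings())
```

A bounded window keeps the history input to the model at a fixed maximum length regardless of how long the conversation runs.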

## 6. Visualize Uncertainty Evolution

In [None]:
import pandas as pd

# Create DataFrame
df = pd.DataFrame(turn_metrics)

# Plot uncertainty evolution
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Epistemic and Aleatoric uncertainty
ax = axes[0, 0]
ax.plot(df['turn'], df['epistemic'], marker='o', label='Epistemic', linewidth=2)
ax.plot(df['turn'], df['aleatoric'], marker='s', label='Aleatoric', linewidth=2)
ax.set_xlabel('Turn')
ax.set_ylabel('Uncertainty')
ax.set_title('Uncertainty Evolution Across Turns')
ax.legend()
ax.grid(True, alpha=0.3)

# Total uncertainty
ax = axes[0, 1]
total_uncertainty = np.sqrt(df['epistemic']**2 + df['aleatoric']**2)
ax.plot(df['turn'], total_uncertainty, marker='D', color='red', linewidth=2)
ax.set_xlabel('Turn')
ax.set_ylabel('Total Uncertainty')
ax.set_title('Total Uncertainty Over Time')
ax.grid(True, alpha=0.3)

# UDR
ax = axes[1, 0]
if len(df) > 1:
    ax.plot(df['turn'][1:], df['udr'][1:], marker='o', color='green', linewidth=2)
ax.set_xlabel('Turn')
ax.set_ylabel('UDR')
ax.set_title('Uncertainty Decay Rate (UDR)')
ax.axhline(y=0, color='k', linestyle='--', alpha=0.3)
ax.grid(True, alpha=0.3)

# Routing decisions
ax = axes[1, 1]
colors = ['blue', 'orange', 'red', 'purple']
route_colors = [colors[r] for r in df['route']]
ax.scatter(df['turn'], df['route'], c=route_colors, s=100)
ax.set_xlabel('Turn')
ax.set_ylabel('Route')
ax.set_yticks([0, 1, 2, 3])
ax.set_yticklabels(sources)
ax.set_title('Routing Decisions')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('uncertainty_evolution.png', dpi=300, bbox_inches='tight')
plt.show()

print("\nVisualization saved to 'uncertainty_evolution.png'")

## 7. Compare with Baselines

In [None]:
# Create baseline models
baselines = create_baseline_models(num_sources=4, device=device)

print("Baseline models:")
for name in baselines.keys():
    print(f"  - {name}")

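# Compare on a sample
# Note: predicted_route still holds the Temporal Router's decision for the
# final turn of the Section 5 conversation
print("\nExample routing decisions:")
print(f"Temporal Router (last turn of Section 5): {sources[predicted_route]}")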
print(f"Random: {sources[baselines['random']()]}")
print(f"Heuristic (turn 2): {sources[baselines['heuristic'](turn_id=2)]}")

## 8. Key Metrics

Summary of the novel temporal metrics introduced in this work.

In [None]:
print("Novel Temporal Metrics:\n")
print("1. Uncertainty Decay Rate (UDR):")
print("   - Measures how quickly uncertainty decreases across turns")
print("   - Higher values = faster uncertainty reduction")
print(f"   - Example value: {df['udr'].iloc[1]:.4f}")  # value at the second turn
print()
print("2. Epistemic Convergence Speed (ECS):")
print("   - Measures how quickly epistemic uncertainty converges to low values")
print("   - Higher values = faster convergence")
print(f"   - Example value: {df['ecs'].iloc[1]:.4f}")  # value at the second turn
print()
print("3. Routing Adaptation Score (RAS):")
print("   - Measures how well routing adapts to uncertainty changes")
print("   - Range: 0-1, higher = better adaptation")
print("   - Computed over full conversations")
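
The notebook describes UDR and ECS only qualitatively. The sketch below gives one possible formalization that matches those descriptions (higher means faster reduction or faster convergence). It is an illustrative assumption, not the definitions implemented in `src.evaluation.metrics`. The function names, the relative-drop formula, and the `threshold=0.1` default are all hypothetical.

```python
def uncertainty_decay_rate(uncertainties, eps=1e-8):
    """Mean relative per-turn decrease in uncertainty (illustrative definition).

    Positive when uncertainty shrinks from turn to turn, negative when it grows.
    """
    if len(uncertainties) < 2:
        return 0.0
    drops = [
        (prev - curr) / (prev + eps)
        for prev, curr in zip(uncertainties, uncertainties[1:])
    ]
    return sum(drops) / len(drops)

def epistemic_convergence_speed(epistemic, threshold=0.1):
    """Reciprocal of the first 1-based turn whose value is below `threshold`.

    Returns 0.0 if the threshold is never reached (illustrative definition).
    """
    for turn, value in enumerate(epistemic, start=1):
        if value < threshold:
            return 1.0 / turn
    return 0.0

# Made-up per-turn epistemic uncertainties for a four-turn conversation.
epistemic_per_turn = [0.8, 0.4, 0.2, 0.05]
udr = uncertainty_decay_rate(epistemic_per_turn)
ecs = epistemic_convergence_speed(epistemic_per_turn)

print(f"UDR: {udr:.4f}")
print(f"ECS: {ecs:.4f}")
```

In this example the uncertainty drops by 50%, 50%, and 75% across the three transitions, and the threshold is first crossed at the fourth turn.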

## Next Steps

1. **Train the model**: Use `scripts/train.py` to train on the full dataset
2. **Evaluate**: Use `scripts/evaluate.py` for comprehensive evaluation
3. **Analyze results**: Check `notebooks/02_model_analysis.ipynb` for detailed analysis
4. **Reproduce paper results**: Run `scripts/run_all_experiments.sh`

For more information, see `README.md` and the project documentation.