# RLCard Superhuman Protocol - Iterative Master Cycle

This notebook implements the master control panel for the iterative fine-tuning loop as part of the **RLCard Superhuman Protocol (Pillar 2)**.

## Overview
This notebook walks through one complete cycle of:
1. **Cloud Train** - Fine-tune the RLCard CFR model in Colab, using GPU acceleration when available
2. **Local Test** - Evaluate the fine-tuned model on your local machine (instructions in Step 6)
3. **Compare** - Compare the new results against the baseline to measure improvement

## Requirements
- Google Colab Pro (recommended for GPU access)
- Google Drive mounted
- Repository cloned in the Colab environment (Step 2 clones it if it is missing)

## Step 1: Environment Setup

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

import shutil
free_gb = shutil.disk_usage('/content/drive').free / (1024**3)
print("✅ Google Drive mounted successfully!")
print(f"Available space: {free_gb:.1f} GB")

In [None]:
# Check GPU availability
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("⚠️ No GPU available - training will be slower")

## Step 2: Repository Setup

In [None]:
# Clone the repository (if not already cloned)
import os
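if not os.path.exists('/content/poker-ai'):
    !git clone https://github.com/elliotttmiller/poker-ai.git /content/poker-ai
    # Confirm the clone actually produced the directory before reporting success
    if os.path.exists('/content/poker-ai'):
        print("✅ Repository cloned successfully")
    else:
        raise RuntimeError("git clone failed - check the output above")
else:
    print("✅ Repository already exists")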

# Change to repository directory
os.chdir('/content/poker-ai')
print(f"Working directory: {os.getcwd()}")

In [None]:
# Install requirements
!pip install -r requirements.txt
print("Requirements installation finished - check the pip output above for errors")

## Step 3: Model Preparation

Prepare the model directories on Google Drive for the training process.

In [None]:
# Setup Google Drive directories
import os
from pathlib import Path
from datetime import datetime

# Define Google Drive paths
drive_base = "/content/drive/MyDrive/poker-ai-models"
input_dir = f"{drive_base}/input"
output_dir = f"{drive_base}/output"
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
cycle_output_dir = f"{drive_base}/output/cycle_{timestamp}"

# Create directories
Path(input_dir).mkdir(parents=True, exist_ok=True)
Path(cycle_output_dir).mkdir(parents=True, exist_ok=True)

print(f"Input directory: {input_dir}")
print(f"Output directory: {cycle_output_dir}")
print("✅ Google Drive directories created")

In [None]:
# Copy the current best model to input directory
# (First time: copy the original RLCard model)
import shutil

# Start from the original pretrained model. For later cycles, point this
# at the fine-tuned model you want to continue training from.
local_model_path = "./models/cfr_pretrained_original"

if not os.path.exists(local_model_path):
    # Download the original model first
    print("Downloading original RLCard CFR model...")
    !python training/verify_pretrained_model.py

# Copy model to Google Drive input directory
if os.path.exists(local_model_path):
    shutil.copytree(local_model_path, f"{input_dir}/current_model", dirs_exist_ok=True)
    print(f"✅ Model copied to {input_dir}/current_model")
else:
    print("⚠️ No model found - creating placeholder")
    Path(f"{input_dir}/current_model").mkdir(parents=True, exist_ok=True)
    with open(f"{input_dir}/current_model/model_info.txt", 'w') as f:
        f.write("Original RLCard CFR model for fine-tuning\n")
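
For later cycles, the starting model could be picked up from a previous cycle's output instead of the original. The sketch below assumes the `cycle_<timestamp>/final_model` layout used elsewhere in this notebook; `find_latest_final_model` is a hypothetical helper, not part of the repository, and treating the newest cycle as the best one is an assumption you should check against your evaluation results.

```python
from pathlib import Path
from typing import Optional


def find_latest_final_model(output_root: str) -> Optional[Path]:
    """Return the newest cycle_*/final_model directory under output_root, or None.

    Hypothetical helper: assumes cycle directories are named
    cycle_YYYYMMDD_HHMMSS, so plain string order is chronological order.
    """
    root = Path(output_root)
    if not root.is_dir():
        return None
    # Keep only cycles that actually produced a final_model directory.
    candidates = sorted(
        cycle_dir / "final_model"
        for cycle_dir in root.glob("cycle_*")
        if (cycle_dir / "final_model").is_dir()
    )
    return candidates[-1] if candidates else None
```

Calling `find_latest_final_model(output_dir)` returns `None` on the first cycle, in which case the original model at `./models/cfr_pretrained_original` remains the starting point.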

## Step 4: Execute Training Burst 🚀

This step fine-tunes the RLCard CFR model on the Colab runtime by running `training/run_training_burst.py` with the parameters configured below.

In [None]:
# Configure training parameters
TRAINING_ITERATIONS = 1000  # Adjust based on your needs
LEARNING_RATE = 1e-5       # Low learning rate for fine-tuning

print("Training Configuration:")
print(f"  Iterations: {TRAINING_ITERATIONS:,}")
print(f"  Learning Rate: {LEARNING_RATE}")
print(f"  Input: {input_dir}/current_model")
print(f"  Output: {cycle_output_dir}")

In [None]:
# Execute the training burst
import subprocess
import time

start_time = time.time()

cmd = [
    'python', 'training/run_training_burst.py',
    '--model-input-dir', f'{input_dir}/current_model',
    '--model-output-dir', cycle_output_dir,
    '--num-iterations', str(TRAINING_ITERATIONS),
    '--learning-rate', str(LEARNING_RATE)
]

print("🚀 Starting training burst...")
print(f"Command: {' '.join(cmd)}")
print("-" * 50)

try:
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=3600)  # 1 hour timeout
    
    if result.returncode == 0:
        training_time = time.time() - start_time
        print("\n✅ Training burst completed successfully!")
        print(f"Time taken: {training_time:.1f} seconds ({training_time/60:.1f} minutes)")
        print("\nTraining output:")
        print(result.stdout)
    else:
        print(f"\n❌ Training failed with return code {result.returncode}")
        # The script may log its error to stdout rather than stderr, so show both
        print("Standard output:")
        print(result.stdout)
        print("Error output:")
        print(result.stderr)
        
except subprocess.TimeoutExpired:
    print("⏰ Training timed out after 1 hour")
except Exception as e:
    print(f"❌ Training failed with exception: {e}")
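
Because the cell above captures all output, nothing is shown until the training script exits. If you would rather watch progress as it happens, one option is to stream the output line by line. This is a minimal sketch, not the repository's method: `run_streaming` is a hypothetical helper, and unlike the cell above it applies no timeout.

```python
import subprocess
from typing import List


def run_streaming(cmd: List[str]) -> int:
    """Run cmd, echo its combined stdout/stderr as it arrives, return the exit code.

    Hypothetical helper; no timeout is applied in this sketch.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered in text mode
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        print(line, end="", flush=True)
    return proc.wait()
```

With the `cmd` list built above, `run_streaming(cmd)` returns the script's exit code. A Python child process buffers its output when writing to a pipe, so inserting `-u` after `'python'` in `cmd` may be needed to see lines promptly.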

## Step 5: Training Results Analysis

In [None]:
# Analyze training results
import json

training_report_path = f"{cycle_output_dir}/final_model/training_report.json"

if os.path.exists(training_report_path):
    with open(training_report_path, 'r') as f:
        training_stats = json.load(f)
    
    print("📊 TRAINING RESULTS SUMMARY")
    print("=" * 30)
    print(f"Status: {training_stats.get('status', 'Unknown')}")
    print(f"Iterations completed: {training_stats.get('iterations_completed', 0):,}")
    print(f"Learning rate: {training_stats.get('learning_rate', 'Unknown')}")
    print(f"Average payoff: {training_stats.get('avg_payoff', 0):.2f}")
    print(f"Total hands processed: {training_stats.get('total_hands', 0):,}")
    
    if 'start_time' in training_stats and 'end_time' in training_stats:
        from datetime import datetime
        start = datetime.fromisoformat(training_stats['start_time'])
        end = datetime.fromisoformat(training_stats['end_time'])
        duration = (end - start).total_seconds()
        print(f"Training duration: {duration:.1f} seconds ({duration/60:.1f} minutes)")
    
else:
    print("⚠️ Training report not found - check training output above")

## Step 6: Instructions for Local Testing 📋

**IMPORTANT**: The following steps must be performed on your local machine.

### 6.1 Download the Fine-Tuned Model

1. **Navigate to your Google Drive**
   - Go to Google Drive in your browser
   - Find the folder: `poker-ai-models/output/cycle_[timestamp]/final_model/`
   - Download the entire `final_model` folder

2. **Copy to Local Repository**
   ```bash
   # On your local machine:
   cd /path/to/your/poker-ai
   mkdir -p models/fine_tuned_v1
   # Copy the downloaded files to models/fine_tuned_v1/
   ```

### 6.2 Run Local Evaluation

Run the standardized evaluation on your local machine:

```bash
cd /path/to/your/poker-ai
python evaluation/run_standard_evaluation.py --model-path models/fine_tuned_v1
```

This will generate a new evaluation report in the `reports/` directory.

### 6.3 Compare Performance

Compare the new results with your baseline:

1. **Baseline**: `reports/baseline_performance_RLCard.txt`
2. **New Results**: `reports/fine_tuned_v1_evaluation_[timestamp].txt`

Look for improvements in:
- Win rate
- Average profit
- Average position
- Confidence intervals
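
When comparing win rates, a confidence interval shows how much of a difference could be sampling noise. The sketch below computes a Wilson score interval for a win rate. It assumes your evaluation report gives you a count of wins and a total number of games; `wilson_interval` is a hypothetical helper, and the notebook does not say how the repository's evaluation script computes its own intervals.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate; z=1.96 gives roughly 95% coverage."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z**2 / games
    centre = (p + z**2 / (2 * games)) / denom
    half_width = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return centre - half_width, centre + half_width
```

As an illustration, 56 wins in 100 games gives an interval of roughly 0.46 to 0.65. If the baseline and fine-tuned intervals overlap heavily, the observed change may not be meaningful, and more evaluation games may be needed.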


## Step 7: Performance Comparison Template 📈

Use this template to track your results:

In [None]:
# Performance tracking template
# Fill this in with your local evaluation results

performance_comparison = {
    "cycle_timestamp": timestamp,
    "training_iterations": TRAINING_ITERATIONS,
    "learning_rate": LEARNING_RATE,
    
    "baseline_performance": {
        "win_rate": 0.56,      # From baseline_performance_RLCard.txt
        "avg_profit": 62.78,   # Fill in your baseline values
        "avg_position": 1.4,
    },

    "new_performance": {
        "win_rate": None,      # Fill in after local evaluation
        "avg_profit": None,    # Fill in after local evaluation
        "avg_position": None,  # Fill in after local evaluation
    },

    "improvement": {
        "win_rate_delta": None,     # new_win_rate - baseline_win_rate
        "profit_delta": None,       # new_profit - baseline_profit
        "position_delta": None,     # baseline_position - new_position (a lower position is better, so a positive delta is an improvement)
    },
    
    "conclusion": "UPDATE_AFTER_LOCAL_TESTING"  # "IMPROVED", "NO_CHANGE", or "DEGRADED"
}

# Save comparison template to Google Drive
import json
comparison_path = f"{cycle_output_dir}/performance_comparison_template.json"
with open(comparison_path, 'w') as f:
    json.dump(performance_comparison, f, indent=2)

print("📊 Performance comparison template saved to:")
print(f"   {comparison_path}")
print("\n📋 Next steps:")
print("1. Download the fine-tuned model from Google Drive")
print("2. Run local evaluation")
print("3. Update the performance comparison template")
print("4. Decide whether to continue with this model or run another cycle")
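
Once the local results are in, the deltas described in the template comments can be computed rather than filled in by hand. The sketch below is one possible approach: `compute_improvement` and its `tolerance` parameter are hypothetical, and deciding the conclusion from `profit_delta` alone is an illustrative choice, not a rule taken from the repository.

```python
from typing import Dict, Optional


def compute_improvement(
    baseline: Dict[str, float],
    new: Dict[str, Optional[float]],
    tolerance: float = 0.0,
) -> Dict[str, object]:
    """Compute the template's deltas and a conclusion from two metric dicts.

    Hypothetical helper: the conclusion is decided on avg_profit alone.
    """
    keys = ("win_rate", "avg_profit", "avg_position")
    if any(new.get(k) is None for k in keys):
        # Local evaluation results have not been entered yet.
        return {
            "win_rate_delta": None,
            "profit_delta": None,
            "position_delta": None,
            "conclusion": "UPDATE_AFTER_LOCAL_TESTING",
        }

    win_rate_delta = new["win_rate"] - baseline["win_rate"]
    profit_delta = new["avg_profit"] - baseline["avg_profit"]
    # A lower position is better, so subtract the other way round:
    # a positive delta means the new model improved.
    position_delta = baseline["avg_position"] - new["avg_position"]

    if profit_delta > tolerance:
        conclusion = "IMPROVED"
    elif profit_delta < -tolerance:
        conclusion = "DEGRADED"
    else:
        conclusion = "NO_CHANGE"

    return {
        "win_rate_delta": win_rate_delta,
        "profit_delta": profit_delta,
        "position_delta": position_delta,
        "conclusion": conclusion,
    }
```

Pass it `performance_comparison["baseline_performance"]` and `performance_comparison["new_performance"]`. A non-zero `tolerance` avoids labelling a negligible profit change as an improvement.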

## Step 8: Cycle Completion Checklist ✅

Mark off each step as you complete it:

**Cloud Training (Colab):**
- [ ] Environment setup completed
- [ ] Repository cloned and requirements installed
- [ ] Model prepared in Google Drive
- [ ] Training burst executed successfully
- [ ] Training results analyzed

**Local Testing:**
- [ ] Fine-tuned model downloaded from Google Drive
- [ ] Model copied to local `models/` directory
- [ ] Standard evaluation executed locally
- [ ] New performance report generated

**Performance Analysis:**
- [ ] Baseline vs. new performance compared
- [ ] Performance deltas calculated
- [ ] Conclusion reached (IMPROVED/NO_CHANGE/DEGRADED)

**Decision:**
- [ ] If IMPROVED: Update baseline and prepare for next cycle
- [ ] If NO_CHANGE/DEGRADED: Analyze and adjust training parameters
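
Preparing for the next cycle after an IMPROVED result can be sketched as copying the new model over the training input. `promote_model` is a hypothetical helper, not part of the repository; it assumes the `final_model` and `current_model` directory names used in this notebook, and it does not update the baseline report.

```python
import shutil
from pathlib import Path


def promote_model(final_model_dir: str, current_model_dir: str, conclusion: str) -> bool:
    """Replace current_model_dir with final_model_dir when conclusion is IMPROVED.

    Hypothetical helper. Returns True if the model was promoted, False otherwise.
    """
    if conclusion != "IMPROVED":
        return False
    src = Path(final_model_dir)
    dst = Path(current_model_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"No model directory at {src}")
    # Remove the old input first so stale files from the previous model
    # cannot survive alongside the new one.
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(src, dst)
    return True
```

In Colab this could be called as `promote_model(f"{cycle_output_dir}/final_model", f"{input_dir}/current_model", conclusion)`, with `conclusion` taken from your completed comparison.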

---

## 🎯 Cycle Complete!

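Once every item in the checklist above is ticked, you have completed one full iteration of the **RLCard Superhuman Protocol**.

Repeat the cycle, feeding the best model so far back in as the training input, until local evaluation reaches the superhuman performance level you are targeting. Not every cycle will beat the baseline; when one does not, adjust the training parameters before running the next.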
