# PokerBot Championship Pipeline (Google Colab)

This notebook sets up and runs the full PokerBot championship pipeline in Colab:
1. Clone the GitHub repository
2. Install dependencies
3. Verify GPU availability
4. Generate training data
5. Train the model (GPU)
6. Validate results
7. Analyze and visualize results

## ⚙️ Setup Instructions
**IMPORTANT:** 
- Set Runtime Type to **GPU** (Runtime → Change runtime type → Hardware accelerator: GPU)
- Recommended: T4 or better for optimal performance
- Estimated total runtime: ~1-2 hours for quick training (1K samples)

## 📊 What This Notebook Does
- Trains a DeepStack neural network for poker decision-making
- Uses championship-level hyperparameters
- Generates comprehensive analysis reports
- Expected correlation: >0.85 (target for production use)

In [None]:
# 1. Clone the PokerBot GitHub repository (with safety checks)
import os
import sys

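# Check if already cloned to prevent re-cloning.
# Re-running this cell leaves the working directory inside the repo, so test for that first.
if os.path.basename(os.getcwd()) == 'pokerbot' and os.path.exists('.git'):
    print('✓ Already inside the repository, skipping clone.')
    !git pull origin main  # Update to latest version
elif os.path.exists('pokerbot'):
    print('✓ Repository already exists, skipping clone.')
    %cd pokerbot
    !git pull origin main  # Update to latest version
else:
    print('Cloning PokerBot repository...')
    !git clone https://github.com/elliotttmiller/pokerbot.git
    %cd pokerbot
    print('✓ Repository cloned successfully')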

# Verify we're in the right directory
if not os.path.exists('requirements.txt'):
    print('❌ ERROR: Not in the correct directory. Please restart and re-run.')
    sys.exit(1)

print(f'\n📁 Current directory: {os.getcwd()}')

In [None]:
# 2. Install required dependencies with progress tracking
import subprocess
import sys

print('Installing dependencies...')
print('This may take 2-3 minutes.\n')

# Install with progress
try:
    result = subprocess.run(
        [sys.executable, '-m', 'pip', 'install', '-q', '-r', 'requirements.txt'],
        capture_output=True,
        text=True,
        timeout=300  # 5 minute timeout
    )
    
    if result.returncode == 0:
        print('✓ All dependencies installed successfully')
    else:
        print(f'⚠️ Warning: pip exited with code {result.returncode}; some packages may have had issues:')
        print(result.stderr)
except subprocess.TimeoutExpired:
    print('❌ Installation timed out. Please try again.')
except Exception as e:
    print(f'❌ Error during installation: {e}')

# Verify critical packages
print('\n📦 Verifying critical packages...')
critical_packages = ['torch', 'numpy', 'pandas', 'matplotlib']
for package in critical_packages:
    try:
        __import__(package)
        print(f'  ✓ {package}')
    except ImportError:
        print(f'  ❌ {package} - MISSING!')

In [None]:
# 3. Verify GPU availability and system resources
import torch

try:
    import psutil
except ImportError:
    !pip install -q psutil
    import psutil

print('=== SYSTEM RESOURCES ===')
print(f'\n🖥️  CPU Cores: {psutil.cpu_count()}')
print(f'💾 RAM: {psutil.virtual_memory().total / (1024**3):.1f} GB')
print(f'💿 Disk: {psutil.disk_usage("/").free / (1024**3):.1f} GB free')

print('\n=== GPU STATUS ===')
if torch.cuda.is_available():
    print(f'✓ GPU Available: {torch.cuda.get_device_name(0)}')
    print(f'✓ CUDA Version: {torch.version.cuda}')
    print(f'✓ GPU Memory: {torch.cuda.get_device_properties(0).total_memory / (1024**3):.1f} GB')
    USE_GPU = True
    print('\n✅ Ready for GPU-accelerated training!')
else:
    print('❌ No GPU detected!')
    print('\n⚠️  WARNING: Training will be MUCH slower on CPU.')
    print('   Recommendation: Go to Runtime → Change runtime type → GPU')
    USE_GPU = False
    
    response = input('\nContinue without GPU? (yes/no): ')
    if response.lower() != 'yes':
        print('Please enable GPU and restart the notebook.')
        import sys
        sys.exit(0)

print('\n=== CONFIGURATION ===')
print(f'GPU Training: {"Enabled" if USE_GPU else "Disabled"}')
print('Ready to proceed! ✓')

In [None]:
# 4. Generate training data with progress monitoring
import time
import subprocess
from datetime import datetime

print('=== DATA GENERATION PHASE ===')
print('\n📊 Configuration:')
print('  - Samples: 1000 (quick training)')
print('  - CFR Iterations: 2000 (high quality)')
print('  - Estimated time: 15-20 minutes\n')

print('💡 TIP: For production use, increase to --samples 50000')
print('         (will take ~10 hours but give much better results)\n')

start_time = time.time()
print(f'⏰ Started at: {datetime.now().strftime("%H:%M:%S")}')
print('\nGenerating data... (this will take a while)\n')

try:
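    # Run data generation with real-time output
    !python scripts/generate_quick_data.py --samples 1000 --cfr-iters 2000

    # A failing `!` command does not raise a Python exception, so check the
    # exit status that IPython stores in `_exit_code` after each shell command.
    if globals().get('_exit_code', 0) != 0:
        raise RuntimeError(f'Data generation script exited with code {_exit_code}')
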
    elapsed = time.time() - start_time
    print(f'\n✓ Data generation completed in {elapsed/60:.1f} minutes')
    
    # Verify output
    import os
    if os.path.exists('src/train_samples'):
        files = os.listdir('src/train_samples')
        print(f'✓ Generated {len(files)} data files')
    else:
        print('⚠️  Warning: Output directory not found')
        
except Exception as e:
    print(f'\n❌ Error during data generation: {e}')
    print('Check the output above for details.')

In [None]:
# 5. Train model with championship configuration
import time
from datetime import datetime

print('=== TRAINING PHASE ===')
print('\n🎯 Configuration:')
print('  - Config: Championship (optimized hyperparameters)')
print('  - Epochs: 200')
print(f'  - GPU: {"Enabled" if USE_GPU else "Disabled"}')
print(f'  - Estimated time: {"30-40 minutes" if USE_GPU else "3-4 hours"}\n')

start_time = time.time()
print(f'⏰ Started at: {datetime.now().strftime("%H:%M:%S")}')
print('\nTraining... (watch for progress updates)\n')

try:
    # Build command based on GPU availability
    gpu_flag = '--use-gpu' if USE_GPU else ''
    cmd = f'python scripts/train_deepstack.py --config scripts/config/championship.json {gpu_flag} --epochs 200'
    
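    # Run training with real-time output
    !{cmd}

    # A failing `!` command does not raise a Python exception, so check the
    # exit status that IPython stores in `_exit_code` after each shell command.
    if globals().get('_exit_code', 0) != 0:
        raise RuntimeError(f'Training script exited with code {_exit_code}')
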
    elapsed = time.time() - start_time
    print(f'\n✓ Training completed in {elapsed/60:.1f} minutes')
    
    # Check for model file
    import os
    if os.path.exists('models/versions/best_model.pt'):
        size_mb = os.path.getsize('models/versions/best_model.pt') / (1024*1024)
        print(f'✓ Model saved: best_model.pt ({size_mb:.1f} MB)')
    else:
        print('⚠️  Warning: Model file not found at expected location')
        
except Exception as e:
    print(f'\n❌ Error during training: {e}')
    print('Check the output above for details.')

In [None]:
# 6. Validate model performance
import os

print('=== VALIDATION PHASE ===')
print('\n📊 Checking model quality metrics...\n')

try:
    # Check if model exists
    model_path = 'models/versions/best_model.pt'
    if not os.path.exists(model_path):
        print(f'❌ Model not found at {model_path}')
        print('Please ensure training completed successfully.')
    else:
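        # Run validation
        !python scripts/validate_deepstack_model.py --model models/versions/best_model.pt

        # `!` commands do not raise on failure; check the exit status IPython stores.
        if globals().get('_exit_code', 0) != 0:
            raise RuntimeError(f'Validation script exited with code {_exit_code}')
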
        print('\n=== INTERPRETATION GUIDE ===')
        print('✅ GOOD: Correlation > 0.85, Relative Error < 5%')
        print('⚠️  NEEDS IMPROVEMENT: Correlation 0.5-0.85, Relative Error 5-20%')
        print('❌ POOR: Correlation < 0.5, Relative Error > 20%')
        print('\n💡 If results are poor, try:')
        print('   1. Generate more data (--samples 10000)')
        print('   2. Train for more epochs (--epochs 300)')
        print('   3. Use adaptive bucket weighting')
        
except Exception as e:
    print(f'❌ Error during validation: {e}')

## 📈 Results Analysis and Visualization

The cells below provide detailed analysis and visualization of your trained model:

### What to Expect:
- **Loss Curves**: Training vs validation loss over epochs
- **Correlation Plots**: How well predictions match targets
- **Per-Street Analysis**: Performance breakdown by game stage
- **Per-Bucket Correlations**: Which hand types the model understands best
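
As a rough illustration of the two headline metrics, the sketch below computes a Pearson correlation and a relative error between predictions and targets in plain Python. The function names and the exact relative-error definition (total absolute error divided by total absolute target) are assumptions made for illustration; the repository's validation script may define them differently.

```python
import math


def pearson_correlation(preds, targets):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(preds)
    if n != len(targets) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(preds) / n
    mean_t = sum(targets) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(preds, targets))
    var_p = sum((p - mean_p) ** 2 for p in preds)
    var_t = sum((t - mean_t) ** 2 for t in targets)
    if var_p == 0 or var_t == 0:
        # Correlation is undefined for constant input; returning 0.0 is a
        # convention chosen for this sketch, not something the source specifies.
        return 0.0
    return cov / math.sqrt(var_p * var_t)


def relative_error(preds, targets):
    """Total absolute error divided by total absolute target (assumed definition)."""
    abs_err = sum(abs(p - t) for p, t in zip(preds, targets))
    abs_target = sum(abs(t) for t in targets)
    if abs_target == 0:
        raise ValueError("targets are all zero; relative error is undefined")
    return abs_err / abs_target
```

Both functions return plain floats, so a relative error of `0.05` corresponds to the 5% threshold quoted in this notebook.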

### Next Steps:
1. Review validation metrics above
2. Examine visualizations below
3. If performance is good (correlation > 0.85), download the model
4. If performance needs improvement, see troubleshooting tips

### Troubleshooting Tips:
- **Low correlation?** → Regenerate data with more samples or higher CFR iterations
- **High variance?** → Train for more epochs or increase batch size
- **Poor on specific streets?** → Check street coverage in data generation
- **Some buckets poor?** → Use adaptive bucket weighting (see OPTIMIZATION_GUIDE.md)

In [None]:
# 7. Generate and display analysis reports
import os
import matplotlib.pyplot as plt
from IPython.display import Image, display

print('=== GENERATING ANALYSIS REPORTS ===')
print('\nCreating visualizations...\n')

try:
    # Generate analysis report
    !python scripts/run_analysis_report.py
    
    print('\n=== VISUALIZATION PLOTS ===\n')
    
    # List of expected plots
    plots = [
        ('Loss Curves', 'models/reports/loss_curves.png'),
        ('Correlation Analysis', 'models/reports/correlation_curves.png'),
        ('Per-Street Performance', 'models/reports/per_street_correlation.png'),
        ('Per-Bucket Histogram', 'models/reports/per_bucket_corrs_hist.png')
    ]
    
    # Display each plot with error handling
    displayed_count = 0
    for title, plot_path in plots:
        try:
            if os.path.exists(plot_path):
                print(f'📊 {title}')
                display(Image(plot_path))
                displayed_count += 1
            else:
                print(f'⚠️  {title}: Not found at {plot_path}')
        except Exception as e:
            print(f'❌ Error displaying {title}: {e}')
    
    print(f'\n✓ Displayed {displayed_count}/{len(plots)} plots')
    
    if displayed_count == 0:
        print('\n💡 No plots were generated. This might mean:')
        print('   - Analysis script needs the model to be trained first')
        print('   - There was an error in report generation')
        print('   - Check the output above for error messages')
        
except Exception as e:
    print(f'\n❌ Error generating analysis: {e}')
    print('You can still download the model and use it, but visualizations are unavailable.')

---
## 🔧 Advanced Features (Optional)

The cells below provide advanced functionality for power users:
- Interactive metric visualization with detailed statistics
- Model and results download for local use

**Note:** These are optional and only needed for deeper analysis or custom workflows.

In [None]:
# 8. Advanced: Interactive metric visualization
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import os
import json

print('=== INTERACTIVE METRICS VISUALIZATION ===\n')

# Configure matplotlib for better display
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10

# 1. Training/Validation Loss Curves
metrics_path = 'models/reports/training_metrics.csv'
if os.path.exists(metrics_path):
    print('📊 Training Metrics Found')
    try:
        df = pd.read_csv(metrics_path)
        
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
        
        # Loss curves
        train_data = df[df['phase'] == 'train']
        val_data = df[df['phase'] == 'valid']
        
        ax1.plot(train_data['epoch'], train_data['loss'], 
                label='Training Loss', linewidth=2, alpha=0.8)
        ax1.plot(val_data['epoch'], val_data['loss'], 
                label='Validation Loss', linewidth=2, alpha=0.8)
        ax1.set_xlabel('Epoch', fontsize=12)
        ax1.set_ylabel('Loss', fontsize=12)
        ax1.set_title('Training vs Validation Loss', fontsize=14, fontweight='bold')
        ax1.legend(fontsize=11)
        ax1.grid(True, alpha=0.3)
        
        # Correlation over time (if available)
        if 'correlation' in df.columns:
            ax2.plot(val_data['epoch'], val_data['correlation'], 
                    linewidth=2, color='green', alpha=0.8)
            ax2.axhline(y=0.85, color='r', linestyle='--', 
                       label='Target (0.85)', alpha=0.7)
            ax2.set_xlabel('Epoch', fontsize=12)
            ax2.set_ylabel('Correlation', fontsize=12)
            ax2.set_title('Validation Correlation Over Time', 
                         fontsize=14, fontweight='bold')
            ax2.legend(fontsize=11)
            ax2.grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()
        
        # Print summary statistics
        print('\n📈 Summary Statistics:')
        print(f'  Final Training Loss: {train_data["loss"].iloc[-1]:.4f}')
        print(f'  Final Validation Loss: {val_data["loss"].iloc[-1]:.4f}')
        if 'correlation' in df.columns:
            print(f'  Final Correlation: {val_data["correlation"].iloc[-1]:.4f}')
        
    except Exception as e:
        print(f'❌ Error plotting metrics: {e}')
else:
    print(f'⚠️  Training metrics not found at {metrics_path}')

print('\n' + '='*50 + '\n')

# 2. Per-Bucket Correlation Distribution
corr_path = 'models/reports/per_bucket_corrs.json'
if os.path.exists(corr_path):
    print('📊 Per-Bucket Correlations Found')
    try:
        with open(corr_path) as f:
            data = json.load(f)
        
        bucket_corrs = np.array(data.get('bucket_corrs', []))
        
        if len(bucket_corrs) > 0:
            fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
            
            # Histogram
            ax1.hist(bucket_corrs, bins=30, alpha=0.7, color='skyblue', 
                    edgecolor='black')
            ax1.axvline(x=bucket_corrs.mean(), color='r', linestyle='--',
                       label=f'Mean: {bucket_corrs.mean():.3f}', linewidth=2)
            ax1.axvline(x=0.3, color='g', linestyle='--',
                       label='Target: 0.3', linewidth=2, alpha=0.7)
            ax1.set_xlabel('Correlation', fontsize=12)
            ax1.set_ylabel('Count', fontsize=12)
            ax1.set_title('Distribution of Per-Bucket Correlations', 
                         fontsize=14, fontweight='bold')
            ax1.legend(fontsize=11)
            ax1.grid(True, alpha=0.3, axis='y')
            
            # Box plot
            ax2.boxplot(bucket_corrs, vert=True)
            ax2.axhline(y=0.3, color='g', linestyle='--', 
                       label='Target: 0.3', alpha=0.7)
            ax2.set_ylabel('Correlation', fontsize=12)
            ax2.set_title('Per-Bucket Correlation Statistics', 
                         fontsize=14, fontweight='bold')
            ax2.legend(fontsize=11)
            ax2.grid(True, alpha=0.3, axis='y')
            
            plt.tight_layout()
            plt.show()
            
            # Statistics
            print('\n📈 Per-Bucket Statistics:')
            print(f'  Mean Correlation: {bucket_corrs.mean():.4f}')
            print(f'  Median Correlation: {np.median(bucket_corrs):.4f}')
            print(f'  Std Deviation: {bucket_corrs.std():.4f}')
            print(f'  Min: {bucket_corrs.min():.4f}, Max: {bucket_corrs.max():.4f}')
            print(f'  Buckets with Corr > 0.3: {(bucket_corrs > 0.3).sum()}/{len(bucket_corrs)}')
        
    except Exception as e:
        print(f'❌ Error plotting per-bucket correlations: {e}')
else:
    print(f'⚠️  Per-bucket correlations not found at {corr_path}')

print('\n✓ Visualization complete')

In [None]:
# 9. Download trained model and results
import shutil
import os
from datetime import datetime

try:
    from google.colab import files
    IN_COLAB = True
except ImportError:
    IN_COLAB = False
    print('⚠️  Not running in Google Colab - download functionality disabled')

if IN_COLAB:
    print('=== DOWNLOAD ARTIFACTS ===\n')
    print('Preparing files for download...\n')
    
    # Create timestamp for unique filename
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    archive_name = f'pokerbot_artifacts_{timestamp}'
    
    try:
        # Check what we have
        items_to_include = []
        
        if os.path.exists('models/versions/best_model.pt'):
            items_to_include.append('Trained model (best_model.pt)')
        
        if os.path.exists('models/reports'):
            report_files = os.listdir('models/reports')
            items_to_include.append(f'Analysis reports ({len(report_files)} files)')
        
        if os.path.exists('scripts/config/championship.json'):
            items_to_include.append('Championship config')
        
        print('📦 Package contents:')
        for item in items_to_include:
            print(f'  ✓ {item}')
        
        print(f'\nCreating archive: {archive_name}.zip...')
        
        # Create temporary directory for packaging
        os.makedirs('download_package', exist_ok=True)
        
        # Copy files to package directory
        if os.path.exists('models'):
            shutil.copytree('models', 'download_package/models', dirs_exist_ok=True)
        
        if os.path.exists('scripts/config'):
            shutil.copytree('scripts/config', 'download_package/config', dirs_exist_ok=True)
        
        # Create README for the package
        with open('download_package/README.txt', 'w') as f:
            f.write('PokerBot Training Artifacts\n')
            f.write('='*50 + '\n\n')
            f.write(f'Generated: {datetime.now()}\n\n')
            f.write('Contents:\n')
            for item in items_to_include:
                f.write(f'  - {item}\n')
            f.write('\nUsage:\n')
            f.write('  1. Extract this archive\n')
            f.write('  2. Model file: models/versions/best_model.pt\n')
            f.write('  3. Reports: models/reports/\n')
            f.write('  4. Config: config/championship.json\n')
        
        # Create ZIP archive
        shutil.make_archive(archive_name, 'zip', 'download_package')
        
        # Get file size
        file_size = os.path.getsize(f'{archive_name}.zip') / (1024 * 1024)
        print(f'✓ Archive created: {archive_name}.zip ({file_size:.1f} MB)')
        
        # Trigger download
        print('\n⬇️  Downloading...')
        files.download(f'{archive_name}.zip')
        print('✓ Download started!')
        
        # Cleanup
        shutil.rmtree('download_package')
        
        print('\n✅ Download complete!')
        print('\n💡 Next steps:')
        print('   1. Extract the ZIP file on your computer')
        print('   2. Use the model in your poker application')
        print('   3. Review the analysis reports for insights')
        
    except Exception as e:
        print(f'\n❌ Error creating download package: {e}')
        print('\n💡 Alternative: Manually download files from the Files panel')
else:
    print('\n💡 To download files in local Jupyter:')
    print('   - Navigate to the Files panel')
    print('   - Right-click on models/ folder → Download')

---
## 📚 Additional Resources and Tips

### 🎯 Performance Targets
- **Correlation**: > 0.85 (good), 0.5-0.85 (needs improvement), < 0.5 (poor)
- **Relative Error**: < 5% (good), 5-20% (needs improvement), > 20% (poor)
- **Coverage**: All streets should have > 95% coverage
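
The thresholds above can be turned into a small grading helper. This is a minimal sketch: the tier names follow the validation cell's interpretation guide (good / needs improvement / poor), and combining the two metrics by taking the worse grade is an assumption, since the notebook does not say how they should be combined. The function names are hypothetical.

```python
# Tiers ordered from best to worst; used to pick the worse of two grades.
_SEVERITY = ["good", "needs improvement", "poor"]


def grade_correlation(corr):
    """Grade a correlation value using the 0.85 and 0.5 thresholds."""
    if corr > 0.85:
        return "good"
    if corr >= 0.5:
        return "needs improvement"
    return "poor"


def grade_relative_error(rel_err):
    """Grade a relative error given as a fraction (0.05 means 5%)."""
    if rel_err < 0.05:
        return "good"
    if rel_err <= 0.20:
        return "needs improvement"
    return "poor"


def overall_grade(corr, rel_err):
    """Return the worse of the two per-metric grades (assumed combination rule)."""
    grades = (grade_correlation(corr), grade_relative_error(rel_err))
    return max(grades, key=_SEVERITY.index)
```

For example, `overall_grade(0.9, 0.10)` evaluates to `"needs improvement"` because the relative error falls in the 5-20% band even though the correlation clears 0.85.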

### 🔧 Troubleshooting Common Issues

**1. Out of Memory Errors**
- Reduce batch size: Edit `scripts/config/championship.json` and set `batch_size` to 256
- Use CPU instead: Remove the `--use-gpu` flag from the training command
- Restart runtime: Runtime → Restart runtime
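
The batch-size change can also be made from a notebook cell instead of editing the file by hand. The sketch below assumes `batch_size` is a top-level key in the JSON config, which this notebook does not confirm; check the file's structure first. The helper name `set_batch_size` is hypothetical.

```python
import json
from pathlib import Path


def set_batch_size(config_path, batch_size):
    """Rewrite a JSON config with a new top-level batch_size; return the old value."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    previous = config.get("batch_size")  # None if the key was absent
    config["batch_size"] = batch_size
    path.write_text(json.dumps(config, indent=2) + "\n")
    return previous


# Example, run from the repository root:
# set_batch_size("scripts/config/championship.json", 256)
```

Returning the previous value makes it easy to restore the original setting after an experiment.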

**2. Poor Model Performance (Low Correlation)**
- Generate more data: increase to `--samples 5000` or `--samples 10000`
- Higher quality: increase to `--cfr-iters 2500` or `--cfr-iters 3000`
- Train longer: increase to `--epochs 300`
- Use adaptive sampling: See OPTIMIZATION_GUIDE.md

**3. Training Takes Too Long**
- Verify GPU is enabled (see cell 3)
- Start with fewer samples for testing: `--samples 100 --cfr-iters 500`
- Reduce epochs for initial testing: `--epochs 50`
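
A quick smoke-test run can be assembled from the flags already used in this notebook. The sketch below only builds and prints the two command lines; the script paths and flags are the ones shown in the data-generation and training cells, while the helper name `build_commands` is hypothetical.

```python
import shlex
import sys


def build_commands(samples=100, cfr_iters=500, epochs=50, use_gpu=True):
    """Return (data_cmd, train_cmd) argument lists for a reduced-size test run."""
    data_cmd = [
        sys.executable, "scripts/generate_quick_data.py",
        "--samples", str(samples),
        "--cfr-iters", str(cfr_iters),
    ]
    train_cmd = [
        sys.executable, "scripts/train_deepstack.py",
        "--config", "scripts/config/championship.json",
        "--epochs", str(epochs),
    ]
    if use_gpu:
        train_cmd.append("--use-gpu")
    return data_cmd, train_cmd


data_cmd, train_cmd = build_commands()
print(shlex.join(data_cmd))
print(shlex.join(train_cmd))
```

The printed lines can be pasted into a cell after a `!`, or the argument lists can be passed to `subprocess.run` from the repository root.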

**4. Files Not Found Errors**
- Ensure previous cells ran successfully
- Check for error messages in cell outputs
- Re-run cells in order from top to bottom
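
To find out which step failed to produce its output, the expected paths can be checked in one pass. This is a minimal sketch; the paths are the ones referenced by the cells in this notebook, and the helper name `missing_artifacts` is hypothetical.

```python
import os

# Paths referenced by the notebook cells, relative to the repository root.
EXPECTED_ARTIFACTS = {
    "dependencies list": "requirements.txt",
    "training data": "src/train_samples",
    "trained model": "models/versions/best_model.pt",
    "analysis reports": "models/reports",
}


def missing_artifacts(root="."):
    """Return {label: path} for every expected artifact that is absent under root."""
    return {
        label: path
        for label, path in EXPECTED_ARTIFACTS.items()
        if not os.path.exists(os.path.join(root, path))
    }


for label, path in missing_artifacts().items():
    print(f"Missing {label}: {path}")
```

The first missing entry usually points at the cell to re-run: training data comes from step 4, the model from step 5, and the reports from step 7.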

### 📖 Documentation
- **QUICK_REFERENCE.md**: Quick start guide and common commands
- **OPTIMIZATION_GUIDE.md**: Comprehensive 8-page optimization guide
- **AUDIT_SUMMARY.md**: Summary of fixes and improvements
- **README.md**: General project information

### 💡 Pro Tips
1. **Start small**: Test with 100-1000 samples before scaling up
2. **Monitor progress**: Watch the training output for convergence
3. **Save checkpoints**: The training script automatically saves best models
4. **Use GPU**: Much faster than CPU training (the training cell estimates 30-40 minutes on GPU versus 3-4 hours on CPU)
5. **Download artifacts**: Save your trained models before closing the notebook

### 🔗 Repository
GitHub: https://github.com/elliotttmiller/pokerbot

### 📧 Support
For issues or questions, please open an issue on GitHub.

---
**Happy Training! 🎰🤖**