# 🚀 SalesAI - Multimodal Generative AI Training Notebook

**Complete training pipeline for the SalesAI multimodal AGI-like model with GPU acceleration**

This notebook walks through the full pipeline on Google Colab: environment setup, data preparation, supervised training with mixed precision, evaluation, reinforcement learning, model export, and inference.

## 🎯 What this notebook does:

1. **Environment Setup**: Install all required dependencies
2. **Repository Setup**: Clone the SalesAI codebase to Google Drive
3. **Data Preparation**: Build the tokenizer and the multimodal dataloaders
4. **Model Training**: Train the multimodal model with GPU acceleration
5. **Evaluation**: Text generation, code generation, and MoE expert-usage analysis
6. **Reinforcement Learning**: Train the DQN agent
7. **Model Saving**: Save trained model and artifacts
8. **Inference**: Test the trained model

## 🏗️ Model Architecture:
- **Multimodal Encoders**: Text, Vision, and Audio processing
- **Unified Transformer Backbone**: Cross-modal attention with MoE
- **Reinforcement Learning**: DQN agent for autonomous learning
- **Meta-Learning**: Rapid task adaptation capabilities

---

**Author**: N.E.N (Nthuku Elijah Nzeli) and SalesA Team  
**Model**: SalesA AI - Multimodal AGI-like Model

## 📋 Prerequisites and Setup

Before running this notebook, ensure you have:
1. **Google Colab Pro** (recommended for better GPU access)
2. **Google Drive** connected
3. **GitHub repository access**

### Runtime Configuration:
- **Hardware accelerator**: GPU (T4, V100, or A100)
- **Runtime type**: Python 3
- **RAM**: 16GB+ (recommended)

In [None]:
# Check GPU availability and configuration
import torch
import sys
import os

print("🔍 Checking system configuration...")
print(f"Python version: {sys.version}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU device: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
    print(f"GPU count: {torch.cuda.device_count()}")
    
    # Set device
    device = torch.device("cuda")
    print(f"✅ Using GPU: {device}")
else:
    device = torch.device("cpu")
    print("⚠️  No GPU detected, using CPU (training will be slower)")

# Check available memory
import psutil
print(f"RAM available: {psutil.virtual_memory().total / 1e9:.1f} GB")
print(f"RAM free: {psutil.virtual_memory().available / 1e9:.1f} GB")

## 📦 Step 1: Install Dependencies

Install all required packages for the SalesAI model training.
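
Before reinstalling everything, it can help to see which modules are actually missing from the runtime. The sketch below is a minimal check using only the standard library; the `missing_modules` helper and the module list are illustrative assumptions, not part of the SalesAI codebase. Note that it takes import names, which sometimes differ from pip names.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names, not pip names: Pillow is imported as PIL, scikit-learn as sklearn.
required = ["torch", "transformers", "datasets", "tiktoken", "sklearn", "PIL", "soundfile"]
print("Missing modules:", missing_modules(required))
```

An empty list means the install cell has nothing left to add for these modules.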

In [None]:
# Install system dependencies
print("📦 Installing system dependencies...")
!apt-get update -qq
!apt-get install -y ffmpeg

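# Install Python packages
# Note: Colab already ships with PyTorch, and torch was imported in the first cell.
# If pip replaces it here, restart the runtime before continuing.
print("\n📦 Installing Python packages...")
!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118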
!pip install -q transformers datasets tiktoken
!pip install -q scikit-learn numpy matplotlib tqdm
!pip install -q Pillow safetensors torchcodec
!pip install -q soundfile seaborn psutil
!pip install -q accelerate bitsandbytes
!pip install -q wandb tensorboard

print("✅ All dependencies installed successfully!")

## 📁 Step 2: Mount Google Drive and Setup Repository

Mount Google Drive and clone the SalesAI repository.

In [None]:
# Mount Google Drive
from google.colab import drive
import os

print("📁 Mounting Google Drive...")
drive.mount('/content/drive')

# Create project directory
project_dir = "/content/drive/MyDrive/SalesAI_Project"
os.makedirs(project_dir, exist_ok=True)
os.chdir(project_dir)

print(f"📂 Project directory: {project_dir}")
print(f"📂 Current working directory: {os.getcwd()}")

In [None]:
# Clone the SalesAI repository
# Replace with your actual repository URL
repo_url = "https://github.com/your-username/SalesAI.git"  # Update this URL

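# Clone only once: the checkout lives on Drive and survives between sessions,
# and git clone fails if the target directory already exists
if not os.path.isdir("SalesAI"):
    print(f"📥 Cloning repository: {repo_url}")
    !git clone {repo_url} SalesAI
else:
    print("📂 Repository already present, skipping clone")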

# Navigate to the project directory
os.chdir("SalesAI")
print(f"📂 Working directory: {os.getcwd()}")

# List project files
print("\n📋 Project structure:")
!ls -la

## ✅ Step 3: Verify Setup and Import Modules

Verify that all modules can be imported correctly.

In [None]:
# Add current directory to Python path
import sys
sys.path.append('.')

# Test imports
print("🔍 Testing imports...")

try:
    from config import SalesAConfig
    from model.salesa_model import SalesAModel
    from tokenizer import SalesATokenizer
    from data.dataset import MultimodalDataset
    from train import SalesATrainer
    from evaluate import SalesAEvaluator
    from rl.agent import DQNAgent, SimpleTextEnv
    
    print("✅ All core modules imported successfully!")
    
    # Test configuration
    config = SalesAConfig()
    print(f"✅ Configuration loaded: {config.model_name}")
    
except ImportError as e:
    print(f"❌ Import error: {e}")
    print("Please check the repository structure and file paths.")
    raise

## 📊 Step 4: Data Preparation and Dataset Setup

Prepare datasets for multimodal training.
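
The cell below falls back to a five-token vocabulary when the tiktoken-based build fails. As a rough illustration of what a vocabulary with reserved special tokens looks like, here is a minimal word-level sketch. It is not the `build_vocab_with_tiktoken` implementation; `build_word_vocab` and `encode` are hypothetical helpers, and only the special-token ids mirror the fallback used in this notebook.

```python
from collections import Counter

# Same special tokens and ids as the fallback vocabulary in this notebook.
SPECIAL_TOKENS = ["<pad>", "<unk>", "<bos>", "<eos>", "<code>"]


def build_word_vocab(texts, vocab_size):
    """Reserve ids 0-4 for special tokens, then add the most frequent words."""
    vocab = {token: idx for idx, token in enumerate(SPECIAL_TOKENS)}
    counts = Counter(word for text in texts for word in text.lower().split())
    for word, _ in counts.most_common():
        if len(vocab) >= vocab_size:
            break
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(text, vocab):
    """Wrap the sequence in <bos>/<eos> and map unknown words to <unk>."""
    unk = vocab["<unk>"]
    body = [vocab.get(word, unk) for word in text.lower().split()]
    return [vocab["<bos>"]] + body + [vocab["<eos>"]]


demo_vocab = build_word_vocab(["the model reads text", "the model sees images"], vocab_size=8)
print(demo_vocab)
print(encode("the model hears audio", demo_vocab))
```

With only the five special tokens, as in the fallback path, every ordinary word maps to `<unk>`, so a model trained on the fallback vocabulary cannot learn meaningful text.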

In [None]:
import torch
from datasets import load_dataset
from tokenizer import build_vocab_with_tiktoken, SalesATokenizer
from data.dataset import MultimodalDataset
from data.collate import create_multimodal_dataloaders

print("📊 Setting up datasets...")

# Load all supported datasets with full multimodal support
datasets_to_load = [
    "luma",           # Multimodal dataset (audio+text)
    "beans",          # Image classification dataset (vision+text)
    "humaneval"       # Code generation dataset (text)
]

print(f"📥 Loading {len(datasets_to_load)} datasets...")

# Initialize basic tokenizer first
tokenizer = SalesATokenizer(vocab_size=32000)

# Create raw dataset with error handling
try:
    raw_dataset = MultimodalDataset(
        config=SalesAConfig(),
        tokenizer=tokenizer,
        split="train",
        dataset_name=datasets_to_load
    )
    
    # Build vocabulary with error handling
    print("🔤 Building vocabulary...")
    try:
        # Get some samples for vocabulary building
        samples_for_vocab = []
        for i in range(min(50, len(raw_dataset))):
            try:
                sample = raw_dataset[i]
                if sample and "text" in sample:
                    samples_for_vocab.append(sample)
            except Exception:
                # Skip samples that fail to load or decode
                continue
        
        if samples_for_vocab:
            vocab, enc = build_vocab_with_tiktoken(
                samples_for_vocab,
                vocab_size=32000,
                model_name="gpt2"
            )
        else:
            raise Exception("No valid samples found for vocabulary building")
            
    except Exception as e:
        print(f"⚠️  Vocabulary building error: {e}")
        print("🔄 Using fallback vocabulary...")
        # Create basic vocabulary as fallback
        vocab = {"<pad>": 0, "<unk>": 1, "<bos>": 2, "<eos>": 3, "<code>": 4}
        enc = None
    
    # Update tokenizer with built vocabulary
    tokenizer = SalesATokenizer(
        vocab_size=32000,
        vocab=vocab,
        enc=enc
    )
    
    # Compute vocabulary size: dicts and lists support len(); other iterables are materialized first
    if hasattr(vocab, '__len__'):
        vocab_length = len(vocab)
    else:
        vocab_length = len(list(vocab)) if hasattr(vocab, '__iter__') else 0
    
    print(f"✅ Vocabulary built with {vocab_length} tokens")
    print(f"✅ Tokenizer initialized successfully")
    print(f"✅ Multimodal datasets loaded: Audio, Vision, and Text processing enabled")
    
except Exception as e:
    print(f"❌ Dataset error: {e}")
    print("🔄 Using synthetic data...")
    vocab = {"<pad>": 0, "<unk>": 1, "<bos>": 2, "<eos>": 3, "<code>": 4}
    tokenizer = SalesATokenizer(vocab_size=32000, vocab=vocab, enc=None)
    print(f"✅ Fallback tokenizer created with {len(vocab)} tokens")

In [None]:
# Create dataloaders for training
print("🔄 Creating dataloaders...")

# Load configuration
from utils.config import load_config
config = load_config("multimodal")

# Update config for GPU training
config.training.batch_size = 4  # Adjust based on GPU memory
config.training.use_mixed_precision = True
config.training.gradient_accumulation_steps = 8

# Create dataloaders
train_loader, val_loader = create_multimodal_dataloaders(
    config=config,
    tokenizer=tokenizer,
    batch_size=config.training.batch_size,
    dataset_name=datasets_to_load,
    task_type="multimodal"
)

print(f"✅ Training dataloader: {len(train_loader)} batches")
print(f"✅ Validation dataloader: {len(val_loader)} batches")

# Test a batch
print("\n🧪 Testing data loading...")
for batch in train_loader:
    print(f"Batch keys: {list(batch.keys())}")
    if 'input_ids' in batch:
        print(f"Input shape: {batch['input_ids'].shape}")
    if 'images' in batch:
        print(f"Images shape: {batch['images'].shape}")
    if 'audio' in batch:
        print(f"Audio shape: {batch['audio'].shape}")
    break

print("✅ Data loading test successful!")

## 🧠 Step 5: Initialize Model

Initialize the SalesAI multimodal model with GPU optimization.
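
The model-size figure printed in the cell below assumes 4 bytes per parameter, that is, fp32 weights only. The sketch below makes that arithmetic explicit and adds a rough training-time estimate. The helper names are illustrative, and the 16 bytes per parameter figure assumes fp32 weights, fp32 gradients, and the two fp32 moment buffers that AdamW keeps; activations are not counted.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weights_size_mb(num_params, dtype="float32"):
    """Memory needed to store the weights alone, in megabytes (1 MB = 1e6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


def adamw_training_size_mb(num_params):
    """Weights + gradients + two AdamW moment buffers, all fp32: 16 bytes per parameter."""
    return num_params * 16 / 1e6


num_params = 350_000_000  # illustrative parameter count, not the actual SalesAI size
print(f"fp32 weights: {weights_size_mb(num_params):.0f} MB")
print(f"fp16 weights: {weights_size_mb(num_params, 'float16'):.0f} MB")
print(f"AdamW training state (no activations): {adamw_training_size_mb(num_params):.0f} MB")
```

Comparing the last number against the GPU memory reported in the first cell gives a quick sanity check on whether the chosen batch size has any headroom.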

In [None]:
import torch
from model.salesa_model import SalesAModel

print("🧠 Initializing SalesAI model...")

# Initialize model with GPU-optimized configuration
model = SalesAModel(config)

# Move model to GPU
model = model.to(device)

# Enable mixed precision training
if config.training.use_mixed_precision:
    from torch.cuda.amp import autocast, GradScaler
    scaler = GradScaler()
    print("✅ Mixed precision training enabled")
else:
    scaler = None

# Print model information
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"📊 Model Information:")
print(f"   - Total parameters: {total_params:,}")
print(f"   - Trainable parameters: {trainable_params:,}")
print(f"   - Model size (fp32 weights): {total_params * 4 / 1e6:.1f} MB")
print(f"   - Device: {next(model.parameters()).device}")
print(f"   - Mixed precision: {config.training.use_mixed_precision}")

# Test model forward pass
print("\n🧪 Testing model forward pass...")
model.eval()
with torch.no_grad():
    # Create dummy input
    batch_size = 2
    seq_len = 128
    
    # Create the dummy tensors directly on the target device
    input_ids = torch.randint(0, config.model.vocab_size, (batch_size, seq_len), device=device)
    attention_mask = torch.ones_like(input_ids)
    
    # Test text-only forward pass
    outputs = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        task_type="text"
    )
    
    print(f"✅ Forward pass successful!")
    print(f"   - Output shape: {outputs.logits.shape}")
    print(f"   - Hidden states shape: {outputs.hidden_states[-1].shape}")

model.train()

## 🚀 Step 6: Model Training

Train the SalesAI model with comprehensive monitoring and optimization.
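
Two quantities drive the training setup in this step: the number of optimizer steps (which shrinks with gradient accumulation) and the cosine learning-rate curve. The sketch below computes both in plain Python. The helper names and the example numbers are illustrative assumptions; the formula is the closed form of cosine annealing over the first `total_steps` steps, and the step count assumes a trailing partial group of batches also triggers an optimizer step.

```python
import math


def optimizer_steps_per_epoch(num_batches, accumulation_steps):
    """One optimizer step per group of accumulated batches (partial last group included)."""
    return math.ceil(num_batches / accumulation_steps)


def cosine_lr(step, total_steps, base_lr, eta_min=1e-6):
    """Cosine annealing from base_lr at step 0 down to eta_min at total_steps."""
    progress = min(step, total_steps) / total_steps
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * progress))


# Illustrative values: batch_size and accumulation steps match this notebook's config
batch_size, accumulation_steps = 4, 8
num_batches, num_epochs, base_lr = 1000, 3, 1e-4

effective_batch_size = batch_size * accumulation_steps
total_steps = optimizer_steps_per_epoch(num_batches, accumulation_steps) * num_epochs

print(f"Effective batch size: {effective_batch_size}")
print(f"Total optimizer steps: {total_steps}")
for step in (0, total_steps // 2, total_steps):
    print(f"step {step:4d}: lr = {cosine_lr(step, total_steps, base_lr):.2e}")
```

The scheduler horizon should be expressed in optimizer steps; if it is set to the number of batches instead, the learning rate only travels a fraction of the way down the cosine curve.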

In [None]:
from train import SalesATrainer
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR
import time
from tqdm.notebook import tqdm

print("🚀 Setting up training...")

# Initialize trainer
trainer = SalesATrainer(config)
trainer.model = model
trainer.tokenizer = tokenizer

# Setup optimizer with weight decay
no_decay = ['bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
    {
        'params': [p for n, p in model.named_parameters() 
                   if not any(nd in n for nd in no_decay)],
        'weight_decay': config.training.weight_decay
    },
    {
        'params': [p for n, p in model.named_parameters() 
                   if any(nd in n for nd in no_decay)],
        'weight_decay': 0.0
    }
]

optimizer = optim.AdamW(
    optimizer_grouped_parameters,
    lr=config.training.learning_rate,
    betas=(0.9, 0.999),
    eps=1e-8
)

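# Setup learning rate scheduler
# The scheduler advances once per optimizer step, not once per batch, so T_max
# must count optimizer steps: batches per epoch divided by the accumulation factor
import math
steps_per_epoch = math.ceil(len(train_loader) / config.training.gradient_accumulation_steps)
scheduler = CosineAnnealingLR(
    optimizer,
    T_max=steps_per_epoch * config.training.num_epochs,
    eta_min=1e-6
)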

trainer.optimizer = optimizer
trainer.scheduler = scheduler
trainer.scaler = scaler

print(f"✅ Optimizer: AdamW with lr={config.training.learning_rate}")
print(f"✅ Scheduler: CosineAnnealingLR")
print(f"✅ Mixed precision: {config.training.use_mixed_precision}")
print(f"✅ Gradient accumulation steps: {config.training.gradient_accumulation_steps}")

In [None]:
# Training loop with progress tracking
print("🚀 Starting training loop...")

# Training metrics
training_history = {
    'train_loss': [],
    'val_loss': [],
    'learning_rate': [],
    'train_accuracy': [],
    'val_accuracy': []
}

best_val_loss = float('inf')
patience_counter = 0
start_time = time.time()

# Training epochs
for epoch in range(config.training.num_epochs):
    epoch_start_time = time.time()
    
    print(f"\n📅 Epoch {epoch + 1}/{config.training.num_epochs}")
    print("=" * 50)
    
    # Training phase
    model.train()
    train_loss = 0.0
    train_correct = 0
    train_total = 0
    
    train_pbar = tqdm(train_loader, desc=f"Training Epoch {epoch + 1}")
    
    for batch_idx, batch in enumerate(train_pbar):
        # Move batch to device
        batch = {k: v.to(device) if isinstance(v, torch.Tensor) else v 
                for k, v in batch.items()}
        
        # Forward pass with mixed precision
        if config.training.use_mixed_precision:
            with autocast():
                outputs = model(**batch)
                loss = outputs.loss
        else:
            outputs = model(**batch)
            loss = outputs.loss
        
        # Scale loss for gradient accumulation
        loss = loss / config.training.gradient_accumulation_steps
        
        # Backward pass
        if config.training.use_mixed_precision:
            scaler.scale(loss).backward()
        else:
            loss.backward()
        
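        # Gradient accumulation: step every N batches, and also on the last batch
        # so leftover gradients do not carry over into the next epoch
        is_last_batch = (batch_idx + 1) == len(train_loader)
        if (batch_idx + 1) % config.training.gradient_accumulation_steps == 0 or is_last_batch: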
            # Gradient clipping
            if config.training.use_mixed_precision:
                scaler.unscale_(optimizer)
                torch.nn.utils.clip_grad_norm_(model.parameters(), config.training.gradient_clip_norm)
                scaler.step(optimizer)
                scaler.update()
            else:
                torch.nn.utils.clip_grad_norm_(model.parameters(), config.training.gradient_clip_norm)
                optimizer.step()
            
            scheduler.step()
            optimizer.zero_grad()
        
        # Update metrics
        train_loss += loss.item() * config.training.gradient_accumulation_steps
        
        # Update progress bar
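        # Undo the accumulation scaling so the displayed loss is the true per-batch loss
        train_pbar.set_postfix({
            'loss': f"{loss.item() * config.training.gradient_accumulation_steps:.4f}",
            'lr': f"{scheduler.get_last_lr()[0]:.2e}"
        })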
    
    # Calculate average training loss
    avg_train_loss = train_loss / len(train_loader)
    
    # Validation phase
    model.eval()
    val_loss = 0.0
    val_correct = 0
    val_total = 0
    
    with torch.no_grad():
        val_pbar = tqdm(val_loader, desc=f"Validation Epoch {epoch + 1}")
        
        for batch in val_pbar:
            # Move batch to device
            batch = {k: v.to(device) if isinstance(v, torch.Tensor) else v 
                    for k, v in batch.items()}
            
            # Forward pass
            outputs = model(**batch)
            loss = outputs.loss
            
            val_loss += loss.item()
            
            val_pbar.set_postfix({'val_loss': f"{loss.item():.4f}"})
    
    avg_val_loss = val_loss / len(val_loader)
    
    # Update history
    training_history['train_loss'].append(avg_train_loss)
    training_history['val_loss'].append(avg_val_loss)
    training_history['learning_rate'].append(scheduler.get_last_lr()[0])
    
    # Print epoch summary
    epoch_time = time.time() - epoch_start_time
    print(f"\n📊 Epoch {epoch + 1} Summary:")
    print(f"   - Train Loss: {avg_train_loss:.4f}")
    print(f"   - Val Loss: {avg_val_loss:.4f}")
    print(f"   - Learning Rate: {scheduler.get_last_lr()[0]:.2e}")
    print(f"   - Time: {epoch_time:.1f}s")
    
    # Early stopping check
    if avg_val_loss < best_val_loss:
        best_val_loss = avg_val_loss
        patience_counter = 0
        
        # Save best model
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'scheduler_state_dict': scheduler.state_dict(),
            'best_val_loss': best_val_loss,
            'config': config
        }, f'/content/drive/MyDrive/SalesAI_Project/best_model_epoch_{epoch + 1}.pt')
        print(f"   - 💾 Best model saved!")
    else:
        patience_counter += 1
        print(f"   - ⏳ Early stopping patience: {patience_counter}/{config.training.early_stopping_patience}")
    
    # Early stopping
    if patience_counter >= config.training.early_stopping_patience:
        print(f"\n🛑 Early stopping triggered after {epoch + 1} epochs")
        break

total_training_time = time.time() - start_time
print(f"\n🎉 Training completed in {total_training_time / 3600:.1f} hours")
print(f"📊 Best validation loss: {best_val_loss:.4f}")

## 📈 Step 7: Training Visualization

Visualize training progress and metrics.
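
The last two plots in this step track the train/validation loss difference and ratio. As a small sketch of the arithmetic behind them, the hypothetical helper below computes both per epoch and guards against a zero training loss, which the plotting code does not.

```python
def loss_gap_and_ratio(train_losses, val_losses):
    """Per-epoch (train - val) difference and val/train ratio; ratio is None if train loss is 0."""
    rows = []
    for train, val in zip(train_losses, val_losses):
        ratio = val / train if train > 0 else None
        rows.append({"difference": train - val, "ratio": ratio})
    return rows


# Illustrative loss values, not results from an actual run
for epoch, row in enumerate(loss_gap_and_ratio([2.0, 1.0, 0.5], [2.5, 1.5, 1.5]), start=1):
    print(f"epoch {epoch}: difference={row['difference']:+.2f}, ratio={row['ratio']:.2f}")
```

A difference that grows more negative while the ratio climbs above 1 usually means the model is starting to overfit.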

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Set style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

# Create training plots
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('SalesAI Training Progress', fontsize=16, fontweight='bold')

# Plot 1: Training and Validation Loss
axes[0, 0].plot(training_history['train_loss'], label='Train Loss', linewidth=2)
axes[0, 0].plot(training_history['val_loss'], label='Validation Loss', linewidth=2)
axes[0, 0].set_title('Training and Validation Loss')
axes[0, 0].set_xlabel('Epoch')
axes[0, 0].set_ylabel('Loss')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Learning Rate
axes[0, 1].plot(training_history['learning_rate'], color='green', linewidth=2)
axes[0, 1].set_title('Learning Rate Schedule')
axes[0, 1].set_xlabel('Epoch')
axes[0, 1].set_ylabel('Learning Rate')
axes[0, 1].set_yscale('log')
axes[0, 1].grid(True, alpha=0.3)

# Plot 3: Loss Difference
loss_diff = np.array(training_history['train_loss']) - np.array(training_history['val_loss'])
axes[1, 0].plot(loss_diff, color='orange', linewidth=2)
axes[1, 0].axhline(y=0, color='red', linestyle='--', alpha=0.5)
axes[1, 0].set_title('Train-Val Loss Difference')
axes[1, 0].set_xlabel('Epoch')
axes[1, 0].set_ylabel('Loss Difference')
axes[1, 0].grid(True, alpha=0.3)

# Plot 4: Loss Ratio
loss_ratio = np.array(training_history['val_loss']) / np.array(training_history['train_loss'])
axes[1, 1].plot(loss_ratio, color='purple', linewidth=2)
axes[1, 1].axhline(y=1, color='red', linestyle='--', alpha=0.5)
axes[1, 1].set_title('Validation/Train Loss Ratio')
axes[1, 1].set_xlabel('Epoch')
axes[1, 1].set_ylabel('Loss Ratio')
axes[1, 1].grid(True, alpha=0.3)

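plt.tight_layout()

# Save before showing: with the inline backend the figure is closed once displayed,
# so calling plt.savefig() after plt.show() can write a blank image
fig.savefig('/content/drive/MyDrive/SalesAI_Project/training_plots.png', dpi=300, bbox_inches='tight')
print("📊 Training plots saved to Google Drive!")
plt.show()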

## 🔍 Step 8: Model Evaluation

Comprehensive evaluation of the trained model.
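
The generation calls in this step pass `top_p=0.9`. For readers unfamiliar with nucleus (top-p) sampling, here is a minimal pure-Python sketch of the filtering step. It illustrates the general technique and is not the `SalesAEvaluator` implementation; `top_p_filter` is a hypothetical helper.

```python
def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose total mass reaches top_p, renormalized."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for token_id, prob in ranked:
        kept.append((token_id, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    total = sum(prob for _, prob in kept)
    return {token_id: prob / total for token_id, prob in kept}


# Illustrative next-token distribution over a four-token vocabulary
filtered = top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.9)
print(filtered)
```

Sampling then draws only from the surviving tokens, which removes the unlikely tail while still allowing some variety.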

In [None]:
# Load the best model
print("🔍 Loading best model for evaluation...")

# Find the best model checkpoint
import glob
checkpoint_files = glob.glob('/content/drive/MyDrive/SalesAI_Project/best_model_epoch_*.pt')
if checkpoint_files:
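    # Pick the highest-epoch checkpoint. Checkpoints are written only when validation
    # loss improves, so within a single run this is also the best one.
    latest_checkpoint = max(checkpoint_files, key=lambda x: int(x.split('_')[-1].split('.')[0]))
    
    # Load checkpoint. weights_only=False is needed because the checkpoint also pickles
    # the config object; only use it for checkpoints you created yourself.
    checkpoint = torch.load(latest_checkpoint, map_location=device, weights_only=False)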
    model.load_state_dict(checkpoint['model_state_dict'])
    
    print(f"✅ Loaded checkpoint: {latest_checkpoint}")
    print(f"   - Epoch: {checkpoint['epoch'] + 1}")
    print(f"   - Best validation loss: {checkpoint['best_val_loss']:.4f}")
else:
    print("⚠️  No checkpoint found, using current model state")

model.eval()

In [None]:
# Text generation evaluation
print("📝 Evaluating text generation capabilities...")

from evaluate import SalesAEvaluator

# Initialize evaluator
evaluator = SalesAEvaluator(model, tokenizer)

# Test prompts
test_prompts = [
    "The future of artificial intelligence is",
    "In a world where technology continues to evolve",
    "Machine learning algorithms can",
    "The most important aspect of AGI is",
    "When we think about multimodal AI"
]

print("\n🧪 Text Generation Examples:")
print("=" * 60)

for i, prompt in enumerate(test_prompts, 1):
    print(f"\n📝 Prompt {i}: {prompt}")
    
    # Generate text
    generated_text = evaluator.generate_text(
        prompt=prompt,
        max_length=100,
        temperature=0.7,
        top_p=0.9
    )
    
    print(f"🤖 Generated: {generated_text}")
    
    # Calculate metrics
    metrics = evaluator.evaluate_text_quality(generated_text)
    print(f"📊 Metrics: Fluency={metrics['fluency']:.2f}, Coherence={metrics['coherence']:.2f}")

In [None]:
# Code generation evaluation
print("\n💻 Evaluating code generation capabilities...")

code_prompts = [
    "Write a function to calculate fibonacci numbers",
    "Create a class for a binary tree data structure",
    "Implement a quicksort algorithm",
    "Write a function to find the longest common subsequence",
    "Create a simple neural network class"
]

print("\n🧪 Code Generation Examples:")
print("=" * 60)

for i, prompt in enumerate(code_prompts, 1):
    print(f"\n💻 Code Prompt {i}: {prompt}")
    
    # Generate code
    generated_code = evaluator.generate_code(
        prompt=prompt,
        max_length=200,
        temperature=0.3,
        top_p=0.9
    )
    
    print(f"🤖 Generated Code:")
    print(f"```python")
    print(generated_code)
    print(f"```")
    
    # Evaluate code quality
    code_metrics = evaluator.evaluate_code_quality(generated_code)
    print(f"📊 Code Metrics: Syntax={code_metrics['syntax_valid']}, Completeness={code_metrics['completeness']:.2f}")

In [None]:
# Expert usage analysis
print("\n🔬 Analyzing MoE expert usage...")

# Get expert usage statistics
expert_stats = evaluator.analyze_expert_usage()

print(f"📊 Expert Usage Analysis for {len(expert_stats)} layers:")
print("=" * 60)

# Create expert usage visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 10))
fig.suptitle('MoE Expert Usage Analysis', fontsize=16, fontweight='bold')

for i, stats in enumerate(expert_stats[:4]):  # Show first 4 layers
    row, col = i // 2, i % 2
    
    # Expert usage distribution
    expert_usage = stats['expert_usage']
    axes[row, col].bar(range(len(expert_usage)), expert_usage, alpha=0.7)
    axes[row, col].set_title(f"Layer {stats['layer_name']} - Expert Usage")
    axes[row, col].set_xlabel('Expert Index')
    axes[row, col].set_ylabel('Usage Count')
    axes[row, col].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print expert statistics
for stats in expert_stats:
    print(f"Layer {stats['layer_name']}:")
    print(f"  - Load balance: {stats['load_balance']:.4f}")
    print(f"  - Expert utilization: {stats['utilization']:.2f}%")
    print(f"  - Most used expert: {stats['most_used_expert']}")
    print(f"  - Least used expert: {stats['least_used_expert']}")
    print()

## 🤖 Step 9: Reinforcement Learning Training

Train the RL agent for autonomous learning capabilities.
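
The agent below is configured with a `buffer_capacity`, which suggests an experience replay buffer. The sketch below shows the three standard DQN building blocks in plain Python: a fixed-capacity replay buffer, the one-step TD target, and epsilon-greedy action selection. It illustrates the general algorithm under those assumptions and is not the `DQNAgent` implementation from `rl/agent.py`; all names here are hypothetical.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest transitions are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, gamma=0.99, done=False):
    """One-step DQN target: r + gamma * max_a Q(s', a), without bootstrapping at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise take the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


demo_buffer = ReplayBuffer(capacity=3)
for t in range(5):
    demo_buffer.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
print(len(demo_buffer), [tr.state for tr in demo_buffer.buffer])
print(td_target(1.0, [0.5, 2.0], gamma=0.9))
```

Sampling past transitions at random breaks the correlation between consecutive steps, which is the main reason DQN training uses a buffer at all.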

In [None]:
# Setup RL training
print("🤖 Setting up Reinforcement Learning training...")

from rl.agent import DQNAgent, SimpleTextEnv

# Initialize environment and agent
env = SimpleTextEnv()
agent = DQNAgent(
    model=model,
    tokenizer=tokenizer,
    n_actions=config.data.action_dim,
    buffer_capacity=config.rl.buffer_capacity,
    memory_capacity=config.rl.memory_capacity
)

print(f"✅ Environment initialized")
print(f"✅ RL Agent initialized with {config.rl.buffer_capacity} buffer capacity")
print(f"✅ Episodic memory capacity: {config.rl.memory_capacity}")

In [None]:
# RL training loop
print("🤖 Starting RL training...")

# RL training metrics
rl_history = {
    'episode_rewards': [],
    'episode_losses': [],
    'memory_sizes': [],
    'buffer_sizes': [],
    'episode_lengths': []
}

best_rl_reward = float('-inf')
start_time = time.time()

# Training episodes
for episode in range(config.rl.num_episodes):
    episode_start_time = time.time()
    
    # Train one episode
    metrics = agent.train_episode(env)
    
    # Store metrics
    rl_history['episode_rewards'].append(metrics['reward'])
    rl_history['episode_losses'].append(metrics['avg_loss'])
    rl_history['memory_sizes'].append(metrics['memory_size'])
    rl_history['buffer_sizes'].append(metrics['buffer_size'])
    rl_history['episode_lengths'].append(metrics['episode_length'])
    
    # Print progress
    if (episode + 1) % 50 == 0:
        episode_time = time.time() - episode_start_time
        avg_reward = np.mean(rl_history['episode_rewards'][-50:])
        avg_loss = np.mean(rl_history['episode_losses'][-50:])
        
        print(f"Episode {episode + 1}/{config.rl.num_episodes}:")
        print(f"  - Reward: {metrics['reward']:.2f} (Avg: {avg_reward:.2f})")
        print(f"  - Loss: {metrics['avg_loss']:.4f} (Avg: {avg_loss:.4f})")
        print(f"  - Memory: {metrics['memory_size']}, Buffer: {metrics['buffer_size']}")
        print(f"  - Time: {episode_time:.1f}s")
        print()
    
    # Save best agent
    if metrics['reward'] > best_rl_reward:
        best_rl_reward = metrics['reward']
        torch.save({
            'episode': episode,
            'agent_state_dict': agent.state_dict(),
            'best_reward': best_rl_reward,
            'config': config
        }, f'/content/drive/MyDrive/SalesAI_Project/best_rl_agent_episode_{episode + 1}.pt')
        print(f"  - 💾 Best RL agent saved! (Reward: {best_rl_reward:.2f})")

total_rl_time = time.time() - start_time
print(f"\n🎉 RL training completed in {total_rl_time / 3600:.1f} hours")
print(f"📊 Best RL reward: {best_rl_reward:.2f}")

In [None]:
# RL training visualization
print("📈 Visualizing RL training progress...")

# Create RL plots
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
fig.suptitle('RL Training Progress', fontsize=16, fontweight='bold')

# Plot 1: Episode Rewards
axes[0, 0].plot(rl_history['episode_rewards'], alpha=0.6, linewidth=1)
axes[0, 0].plot(np.convolve(rl_history['episode_rewards'], np.ones(50)/50, mode='valid'), 
                color='red', linewidth=2, label='Moving Average')
axes[0, 0].set_title('Episode Rewards')
axes[0, 0].set_xlabel('Episode')
axes[0, 0].set_ylabel('Reward')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Episode Losses
axes[0, 1].plot(rl_history['episode_losses'], alpha=0.6, linewidth=1)
axes[0, 1].plot(np.convolve(rl_history['episode_losses'], np.ones(50)/50, mode='valid'), 
                color='red', linewidth=2, label='Moving Average')
axes[0, 1].set_title('Episode Losses')
axes[0, 1].set_xlabel('Episode')
axes[0, 1].set_ylabel('Loss')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Plot 3: Memory and Buffer Sizes
axes[0, 2].plot(rl_history['memory_sizes'], label='Memory Size', alpha=0.7)
axes[0, 2].plot(rl_history['buffer_sizes'], label='Buffer Size', alpha=0.7)
axes[0, 2].set_title('Memory and Buffer Usage')
axes[0, 2].set_xlabel('Episode')
axes[0, 2].set_ylabel('Size')
axes[0, 2].legend()
axes[0, 2].grid(True, alpha=0.3)

# Plot 4: Episode Lengths
axes[1, 0].plot(rl_history['episode_lengths'], alpha=0.6, linewidth=1)
axes[1, 0].plot(np.convolve(rl_history['episode_lengths'], np.ones(50)/50, mode='valid'), 
                color='red', linewidth=2, label='Moving Average')
axes[1, 0].set_title('Episode Lengths')
axes[1, 0].set_xlabel('Episode')
axes[1, 0].set_ylabel('Length')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Plot 5: Reward Distribution
axes[1, 1].hist(rl_history['episode_rewards'], bins=50, alpha=0.7, edgecolor='black')
axes[1, 1].set_title('Reward Distribution')
axes[1, 1].set_xlabel('Reward')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].grid(True, alpha=0.3)

# Plot 6: Loss Distribution
axes[1, 2].hist(rl_history['episode_losses'], bins=50, alpha=0.7, edgecolor='black')
axes[1, 2].set_title('Loss Distribution')
axes[1, 2].set_xlabel('Loss')
axes[1, 2].set_ylabel('Frequency')
axes[1, 2].grid(True, alpha=0.3)

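plt.tight_layout()

# Save before showing: with the inline backend the figure is closed once displayed,
# so calling plt.savefig() after plt.show() can write a blank image
fig.savefig('/content/drive/MyDrive/SalesAI_Project/rl_training_plots.png', dpi=300, bbox_inches='tight')
print("📊 RL training plots saved to Google Drive!")
plt.show()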

## 💾 Step 10: Save Model and Artifacts

Save the trained model, tokenizer, and all artifacts for future use.
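
Several of the values written to JSON in this step can trip up `json.dump`: integer dictionary keys are silently converted to strings, NumPy or PyTorch scalars may not be serializable, and a loss that is still `inf` is written as the non-standard token `Infinity`. The helper below is a minimal sketch of a pre-processing pass; `to_json_safe` is a hypothetical name and handles scalars only, not multi-element arrays or tensors.

```python
import json
import math


def to_json_safe(value):
    """Recursively convert a value into something json.dump can write as standard JSON."""
    if isinstance(value, dict):
        return {str(key): to_json_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(item) for item in value]
    if hasattr(value, "item"):
        # NumPy and PyTorch scalars expose .item(), which returns a plain Python number
        value = value.item()
    if isinstance(value, float) and not math.isfinite(value):
        return None  # inf and nan have no standard JSON representation
    return value


summary = {"best_val_loss": float("inf"), "id_to_token": {0: "<pad>", 1: "<unk>"}}
print(json.dumps(to_json_safe(summary), indent=2))
```

Passing each dictionary through such a helper before `json.dump` keeps the exported files readable by strict JSON parsers.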

In [None]:
# Save complete model artifacts
print("💾 Saving model artifacts...")

import json
from pathlib import Path

# Create export directory
export_dir = Path('/content/drive/MyDrive/SalesAI_Project/SalesAI_Model')
export_dir.mkdir(parents=True, exist_ok=True)

# Save model
model_path = export_dir / "model.pt"
torch.save({
    'model_state_dict': model.state_dict(),
    'config': config,
    'training_history': training_history,
    'rl_history': rl_history,
    'best_val_loss': best_val_loss,
    'best_rl_reward': best_rl_reward
}, model_path)
print(f"✅ Model saved to {model_path}")

# Save tokenizer
tokenizer_path = export_dir / "tokenizer.json"
tokenizer_config = {
    "vocab": tokenizer.vocab,
    "token_to_id": tokenizer.token_to_id,
    "id_to_token": tokenizer.id_to_token,
    "special_tokens": {
        "pad_token": tokenizer.pad_token,
        "unk_token": tokenizer.unk_token,
        "bos_token": tokenizer.bos_token,
        "eos_token": tokenizer.eos_token,
        "code_token": tokenizer.code_token
    }
}
with open(tokenizer_path, "w") as f:
    json.dump(tokenizer_config, f, indent=2)
print(f"✅ Tokenizer saved to {tokenizer_path}")

# Save configuration
config_path = export_dir / "config.json"
config_dict = {
    "model": {
        "name": config.model.name,
        "author": config.model.author,
        "vocab_size": config.model.vocab_size,
        "hidden_dim": config.model.hidden_dim,
        "num_layers": config.model.num_layers,
        "num_heads": config.model.num_heads,
        "num_experts": config.model.num_experts,
        "top_k": config.model.top_k
    },
    "training": {
        "batch_size": config.training.batch_size,
        "learning_rate": config.training.learning_rate,
        "num_epochs": config.training.num_epochs,
        "use_mixed_precision": config.training.use_mixed_precision
    },
    "rl": {
        "num_episodes": config.rl.num_episodes,
        "buffer_capacity": config.rl.buffer_capacity,
        "memory_capacity": config.rl.memory_capacity
    }
}
with open(config_path, "w") as f:
    json.dump(config_dict, f, indent=2)
print(f"✅ Configuration saved to {config_path}")

# Save training summary
summary_path = export_dir / "training_summary.json"
training_summary = {
    "model_info": {
        "total_parameters": sum(p.numel() for p in model.parameters()),
        "trainable_parameters": sum(p.numel() for p in model.parameters() if p.requires_grad),
        "model_size_mb": sum(p.numel() for p in model.parameters()) * 4 / 1e6
    },
    "training_results": {
        "best_val_loss": best_val_loss,
        "final_train_loss": training_history['train_loss'][-1] if training_history['train_loss'] else None,
        "final_val_loss": training_history['val_loss'][-1] if training_history['val_loss'] else None,
        "total_epochs_trained": len(training_history['train_loss'])
    },
    "rl_results": {
        # float() converts NumPy or tensor scalars to plain Python floats for json.dump
        "best_rl_reward": float(best_rl_reward),
        "final_avg_reward": float(np.mean(rl_history['episode_rewards'][-100:])) if rl_history['episode_rewards'] else None,
        "total_episodes_trained": len(rl_history['episode_rewards'])
    },
    "training_time": {
        "total_training_time_hours": total_training_time / 3600,
        "total_rl_time_hours": total_rl_time / 3600
    }
}
with open(summary_path, "w") as f:
    json.dump(training_summary, f, indent=2)
print(f"✅ Training summary saved to {summary_path}")

print(f"\n🎉 All artifacts saved to {export_dir}")
print(f"📁 Model artifacts include:")
print(f"   - model.pt (trained model weights)")
print(f"   - tokenizer.json (tokenizer configuration)")
print(f"   - config.json (model configuration)")
print(f"   - training_summary.json (training results)")
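
The summary above is written with `json.dump`, which only accepts plain Python types. If any metric ends up as a NumPy or PyTorch scalar, or if `best_val_loss` is still `inf` because no epoch completed, the file can fail to write or contain invalid JSON. The following is a minimal sketch of a sanitizing step; `to_jsonable` is a hypothetical helper, not part of the SalesAI codebase.

```python
import json
import math
from pathlib import Path


def to_jsonable(value):
    """Recursively convert a value into plain types that json.dump accepts.

    Hypothetical helper for illustration; handles scalars only, not arrays.
    """
    if isinstance(value, dict):
        return {str(k): to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    if isinstance(value, Path):
        return str(value)
    # NumPy and PyTorch scalars expose .item(), which returns a Python number.
    if hasattr(value, "item") and callable(value.item):
        value = value.item()
    # NaN and infinity are not valid JSON, so store them as null.
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value


example = {"best_val_loss": float("inf"), "epochs": 3, "path": Path("model.pt")}
print(json.dumps(to_jsonable(example), indent=2))
```

In the cell above, this would be used as `json.dump(to_jsonable(training_summary), f, indent=2)`.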

## 🎯 Step 11: Model Inference and Testing

Test the trained model with various inputs and demonstrate its capabilities.

In [None]:
# Load model for inference
print("🎯 Loading model for inference...")

# Load the saved model
model_checkpoint = torch.load(export_dir / "model.pt", map_location=device)
model.load_state_dict(model_checkpoint['model_state_dict'])
model.eval()

print("✅ Model loaded successfully for inference")
print(f"📊 Best validation loss: {model_checkpoint['best_val_loss']:.4f}")
print(f"📊 Best RL reward: {model_checkpoint['best_rl_reward']:.2f}")
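
Loading fails with a fairly opaque error if an artifact is missing from the export directory or if the checkpoint lacks one of the keys read above. A small pre-flight check can make that failure easier to diagnose. This is a sketch only: the file names and checkpoint keys are the ones used in this notebook, while both helper functions are hypothetical.

```python
from pathlib import Path

# File names listed in Step 10 and checkpoint keys read in the cell above.
EXPECTED_ARTIFACTS = ("model.pt", "tokenizer.json", "config.json", "training_summary.json")
EXPECTED_CHECKPOINT_KEYS = ("model_state_dict", "best_val_loss", "best_rl_reward")


def missing_artifacts(export_dir, expected=EXPECTED_ARTIFACTS):
    """Return the expected file names that are not present in export_dir."""
    export_dir = Path(export_dir)
    return [name for name in expected if not (export_dir / name).is_file()]


def missing_checkpoint_keys(checkpoint, expected=EXPECTED_CHECKPOINT_KEYS):
    """Return the expected keys that are absent from a loaded checkpoint dict."""
    return [key for key in expected if key not in checkpoint]
```

Calling `missing_artifacts(export_dir)` before `torch.load`, and `missing_checkpoint_keys(model_checkpoint)` right after it, should return empty lists when the export completed.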

In [None]:
# Interactive inference function
def generate_response(prompt, max_length=100, temperature=0.7, task_type="text"):
    """Generate response for given prompt"""
    model.eval()
    
    with torch.no_grad():
        # Tokenize input
        input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
        
        # Generate response
        if task_type == "code":
            # Add code token for code generation
            input_ids = torch.cat([
                # Create the code token directly on the target device
                torch.tensor([[tokenizer.code_token_id]], device=device),
                input_ids
            ], dim=1)
        
        # Generate with sampling
        generated_ids = model.generate(
            input_ids=input_ids,
            max_length=input_ids.shape[1] + max_length,
            temperature=temperature,
            do_sample=True,
            top_p=0.9,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id
        )
        
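        # Decode only the newly generated tokens. Slicing by token count is more
        # reliable than slicing the decoded string by len(prompt), because decoding
        # does not always reproduce the prompt character for character.
        new_token_ids = generated_ids[0][input_ids.shape[1]:]
        response = tokenizer.decode(new_token_ids, skip_special_tokens=True).strip()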
        
        return response

print("🎯 Interactive Inference Ready!")
print("Use the generate_response() function to test the model.")
print("Example: generate_response('Hello, how are you?', max_length=50)")
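
`generate_response` passes `temperature` and `top_p` through to `model.generate`. How SalesAI implements these internally is not shown in this notebook, so the block below is only a pure-Python sketch of what the two parameters conventionally mean: temperature rescales the logits before the softmax, and top-p (nucleus) sampling restricts sampling to the smallest set of tokens whose cumulative probability reaches `top_p`. All function names here are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose mass reaches top_p.

    Returns a dict mapping token index to renormalized probability.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Sample one token index using temperature scaling and nucleus filtering."""
    probs = softmax_with_temperature(logits, temperature)
    nucleus = top_p_filter(probs, top_p)
    ids = list(nucleus)
    return rng.choices(ids, weights=[nucleus[i] for i in ids], k=1)[0]
```

This is why the code examples in the next cell use `temperature=0.3`: a lower temperature concentrates probability on the most likely tokens, which tends to suit code better than free-form text.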

In [None]:
# Test various examples
print("🧪 Testing model with various examples...")
print("=" * 60)

# Text generation examples
text_examples = [
    "The future of artificial intelligence is",
    "In a world where technology continues to evolve",
    "The most important aspect of AGI is",
    "When we think about multimodal AI"
]

print("\n📝 Text Generation Examples:")
for i, example in enumerate(text_examples, 1):
    print(f"\n{i}. Prompt: {example}")
    response = generate_response(example, max_length=80, temperature=0.7)
    print(f"   Response: {response}")

# Code generation examples
code_examples = [
    "Write a function to calculate fibonacci numbers",
    "Create a class for a binary tree",
    "Implement a simple neural network"
]

print("\n\n💻 Code Generation Examples:")
for i, example in enumerate(code_examples, 1):
    print(f"\n{i}. Prompt: {example}")
    response = generate_response(example, max_length=150, temperature=0.3, task_type="code")
    print(f"   Response:\n```python\n{response}\n```")

## 📊 Step 12: Performance Analysis and Benchmarks

Analyze model size, memory use, and inference speed, then compare the architecture with GPT-2 reference configurations.

In [None]:
# Performance analysis
print("📊 Analyzing model performance...")

# Calculate model statistics
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
model_size_mb = total_params * 4 / 1e6  # assumes float32 (4 bytes per parameter)

# Memory usage
if torch.cuda.is_available():
    gpu_memory_allocated = torch.cuda.memory_allocated() / 1e6
    gpu_memory_reserved = torch.cuda.memory_reserved() / 1e6
else:
    gpu_memory_allocated = 0
    gpu_memory_reserved = 0

# Inference speed test
print("⚡ Testing inference speed...")
model.eval()
test_prompt = "The quick brown fox jumps over the lazy dog."
input_ids = tokenizer.encode(test_prompt, return_tensors="pt").to(device)

# Warm up
with torch.no_grad():
    for _ in range(5):
        _ = model(input_ids=input_ids)

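# Speed test. CUDA kernels run asynchronously, so synchronize before reading
# the clock; otherwise the timer mostly measures kernel launch overhead.
import time
if torch.cuda.is_available():
    torch.cuda.synchronize()
start_time = time.perf_counter()
with torch.no_grad():
    for _ in range(100):
        _ = model(input_ids=input_ids)
if torch.cuda.is_available():
    torch.cuda.synchronize()
end_time = time.perf_counter()

avg_inference_time = (end_time - start_time) / 100
# Forward-pass throughput over the prompt tokens, not autoregressive generation speed
tokens_per_second = input_ids.shape[1] / avg_inference_time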

# Print performance summary
print("\n📊 Performance Summary:")
print("=" * 50)
print(f"Model Architecture:")
print(f"  - Total parameters: {total_params:,}")
print(f"  - Trainable parameters: {trainable_params:,}")
print(f"  - Model size: {model_size_mb:.1f} MB")
print(f"  - Hidden dimension: {config.model.hidden_dim}")
print(f"  - Number of layers: {config.model.num_layers}")
print(f"  - Number of experts: {config.model.num_experts}")
print(f"  - Top-k experts: {config.model.top_k}")

print(f"\nTraining Results:")
print(f"  - Best validation loss: {best_val_loss:.4f}")
print(f"  - Training epochs: {len(training_history['train_loss'])}")
print(f"  - Total training time: {total_training_time / 3600:.1f} hours")

print(f"\nRL Results:")
print(f"  - Best RL reward: {best_rl_reward:.2f}")
print(f"  - RL episodes: {len(rl_history['episode_rewards'])}")
print(f"  - Total RL time: {total_rl_time / 3600:.1f} hours")

print(f"\nInference Performance:")
print(f"  - Average forward-pass time: {avg_inference_time*1000:.2f} ms")
print(f"  - Prompt tokens per second (forward pass): {tokens_per_second:.1f}")
print(f"  - GPU memory allocated: {gpu_memory_allocated:.1f} MB")
print(f"  - GPU memory reserved: {gpu_memory_reserved:.1f} MB")
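
The speed test above follows a common pattern: warm up, then time a fixed number of calls. On a GPU the calls return before the work has finished, so the clock should only be read after the device has been synchronized. The sketch below packages that pattern in a reusable form; `benchmark` and its `sync` parameter are hypothetical names, and the function itself depends only on the standard library.

```python
import time


def benchmark(fn, warmup=5, iters=100, sync=None):
    """Return the average wall-clock seconds per call of fn().

    sync is an optional zero-argument callable that blocks until pending
    device work has finished (for example torch.cuda.synchronize on a GPU).
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # finish warm-up work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # wait for queued work before stopping the clock
    return (time.perf_counter() - start) / iters


# CPU-only demonstration with a trivial workload.
avg_seconds = benchmark(lambda: sum(range(1000)), warmup=2, iters=20)
print(f"Average time per call: {avg_seconds * 1e6:.1f} microseconds")
```

In this notebook the call would look like `benchmark(lambda: model(input_ids=input_ids), sync=torch.cuda.synchronize)`, run inside `torch.no_grad()`.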

In [None]:
# Comparison with benchmarks
print("\n📈 Model Comparison with Benchmarks:")
print("=" * 50)

# Define benchmark models for comparison
benchmarks = {
    "GPT-2 Small": {
        "params": 124e6,
        "layers": 12,
        "hidden_dim": 768,
        "vocab_size": 50257
    },
    "GPT-2 Medium": {
        "params": 355e6,
        "layers": 24,
        "hidden_dim": 1024,
        "vocab_size": 50257
    },
    "SalesAI (Our Model)": {
        "params": total_params,
        "layers": config.model.num_layers,
        "hidden_dim": config.model.hidden_dim,
        "vocab_size": config.model.vocab_size
    }
}

# Create comparison table
print(f"{'Model':<20} {'Params (M)':<12} {'Layers':<8} {'Hidden':<8} {'Vocab':<8}")
print("-" * 60)

for model_name, specs in benchmarks.items():
    params_m = specs["params"] / 1e6
    print(f"{model_name:<20} {params_m:<12.1f} {specs['layers']:<8} {specs['hidden_dim']:<8} {specs['vocab_size']:<8}")

# The table above compares model size only; it does not measure output quality.
print("\n🎯 Architectural differences from the GPT-2 baselines:")
print("  - Multimodal encoders (text, vision, audio)")
print("  - Mixture of Experts layers")
print("  - Reinforcement learning integration")
print("  - Meta-learning capabilities")
print("  - Cross-modal attention mechanisms")
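
The parameter counts in the comparison table translate directly into weight storage: size is the number of parameters multiplied by the bytes per parameter. The notebook's `model_size_mb` uses 4 bytes, which corresponds to float32. The sketch below repeats that arithmetic for other common dtypes; `weights_size_mb` is a hypothetical helper, and the figures cover weights only, not activations or optimizer state.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weights_size_mb(num_params, dtype="float32"):
    """Approximate weight storage in MB (1 MB = 1e6 bytes, as in the notebook)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


# Parameter counts taken from the benchmark table above.
for name, params in {"GPT-2 Small": 124e6, "GPT-2 Medium": 355e6}.items():
    fp32 = weights_size_mb(params, "float32")
    fp16 = weights_size_mb(params, "float16")
    print(f"{name:<14} float32: {fp32:8.1f} MB   float16: {fp16:8.1f} MB")
```

Passing `total_params` to the same function gives the corresponding estimate for the SalesAI model.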

## 🎉 Conclusion and Next Steps

Congratulations! You have successfully trained a comprehensive multimodal generative AI model using the SalesAI framework. Here's what we accomplished:

### ✅ What We Built:

1. **Complete Training Pipeline**: From data preparation to model deployment
2. **Multimodal Model**: Text, vision, and audio processing capabilities
3. **MoE Architecture**: Efficient mixture of experts implementation
4. **RL Integration**: Autonomous learning through reinforcement learning
5. **Comprehensive Evaluation**: Performance analysis and benchmarking

### 🚀 Model Capabilities:

- **Text Generation**: Prompt continuation with temperature and top-p sampling (Step 11)
- **Code Generation**: Code completion conditioned on a dedicated code token
- **Multimodal Processing**: Text, vision, and audio encoders feeding a unified transformer backbone
- **Autonomous Learning**: A DQN agent trained with reinforcement learning
- **Efficient Architecture**: Mixture of Experts routing that activates only the top-k experts

### 📊 Training Results:

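The final values depend on your run. They are printed by the Step 12 performance summary and stored in `training_summary.json`:

- **Model Size**: `model_size_mb` (MB, assuming float32 weights)
- **Parameters**: `total_params`
- **Best Validation Loss**: `best_val_loss`
- **Best RL Reward**: `best_rl_reward`
- **Training Time**: `total_training_time / 3600` (hours)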

### 🔮 Next Steps:

1. **Model Deployment**: Deploy the trained model for production use
2. **Fine-tuning**: Fine-tune on specific domains or tasks
3. **Scaling**: Scale up the model with more parameters and data
4. **Integration**: Integrate with applications and APIs
5. **Research**: Explore advanced capabilities and improvements

### 📁 Saved Artifacts:

All model artifacts have been saved to Google Drive:
- **Model weights**: `/content/drive/MyDrive/SalesAI_Project/SalesAI_Model/model.pt`
- **Tokenizer**: `/content/drive/MyDrive/SalesAI_Project/SalesAI_Model/tokenizer.json`
- **Configuration**: `/content/drive/MyDrive/SalesAI_Project/SalesAI_Model/config.json`
- **Training summary**: `/content/drive/MyDrive/SalesAI_Project/SalesAI_Model/training_summary.json`

### 🎯 Usage Instructions:

To use the trained model in other environments:

```python
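import torch

# Recreate the model and tokenizer as in the earlier steps, then load the weights.
# Fall back to CPU when no GPU is available.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
checkpoint = torch.load('model.pt', map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.to(device).eval()

# Generate text (generate_response is defined in Step 11)
response = generate_response('Your prompt here', max_length=100)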
```

---

**Thank you for using SalesAI! 🚀**

This notebook walked through training, evaluating, exporting, and testing a multimodal model that combines text, vision, and audio encoders with a Mixture of Experts transformer backbone and a DQN agent.

**Built with ❤️ by N.E.N (Nthuku Elijah Nzeli) and the SalesA Team**

## 📚 Appendix: Additional Resources

### 🔗 Useful Links:

- **SalesAI Repository**: [GitHub Link]
- **PyTorch Documentation**: https://pytorch.org/docs/
- **Hugging Face Transformers**: https://huggingface.co/docs/transformers/
- **Google Colab**: https://colab.research.google.com/

### 📖 Further Reading:

1. **Mixture of Experts**: Switch Transformers paper
2. **Multimodal AI**: CLIP, DALL-E, and GPT-4V research
3. **Reinforcement Learning**: Deep Q-Networks and policy gradients
4. **Meta-Learning**: Few-shot learning and rapid adaptation

### 🛠️ Troubleshooting:

**Common Issues and Solutions:**

1. **Out of Memory Errors**:
   - Reduce batch size
   - Enable gradient checkpointing
   - Use mixed precision training

2. **Slow Training**:
   - Ensure GPU is being used
   - Check for CPU bottlenecks
   - Optimize data loading

3. **Poor Model Performance**:
   - Increase training epochs
   - Adjust learning rate
   - Add more training data
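
For the out-of-memory case, "reduce batch size" can be automated by retrying a single training step with progressively smaller batches. The sketch below assumes a hypothetical callback `try_step(batch_size)` that runs one forward and backward pass. It relies on the fact that PyTorch reports CUDA out-of-memory conditions as a `RuntimeError` (or a subclass of it) whose message contains "out of memory".

```python
def find_fitting_batch_size(try_step, start_batch_size, min_batch_size=1):
    """Halve the batch size until try_step(batch_size) runs without an OOM error.

    try_step is a hypothetical callback that runs one training step.
    Errors other than out-of-memory are re-raised unchanged.
    """
    batch_size = start_batch_size
    while batch_size >= min_batch_size:
        try:
            try_step(batch_size)
            return batch_size
        except RuntimeError as err:
            if "out of memory" not in str(err).lower():
                raise
            # In a real run, free cached GPU memory here before retrying.
            batch_size //= 2
    raise RuntimeError(f"No batch size >= {min_batch_size} fits in memory")


# Simulated step that only fits batches of 8 or fewer samples.
def fake_step(batch_size):
    if batch_size > 8:
        raise RuntimeError("CUDA out of memory (simulated)")


print(find_fitting_batch_size(fake_step, start_batch_size=64))
```

Once a fitting size is found, set `config.training.batch_size` to that value and restart training from a clean runtime.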

### 📞 Support:

For questions, issues, or contributions:
- **GitHub Issues**: [Repository Issues Page]
- **Email**: [Contact Email]
- **Documentation**: [Project Documentation]

---

**Happy Training! 🎉**