# Overseer AI Model Training Pipeline
## Google Gemma 3n Fine-tuning for System Assistant Tasks

### Overview
This notebook provides a comprehensive pipeline for training the Overseer AI system assistant using Google's Gemma 3n model. The training process includes:

1. **Model Acquisition**: Pulling the pre-trained Gemma 3n model via Ollama
2. **Dataset Integration**: Loading and preprocessing Kaggle datasets for system assistant training
3. **Fine-tuning**: Customizing the model for system monitoring and management tasks
4. **Evaluation**: Testing model performance on validation datasets
5. **Export**: Preparing the trained model for deployment

### Prerequisites
- Python 3.9+ with virtual environment
- Ollama installed and configured
- Kaggle API credentials
- CUDA-compatible GPU (recommended)
- At least 16GB RAM for model training
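
The prerequisites above can be checked before running any cell. The following is a minimal sketch using only the standard library; the function name `check_prerequisites` is hypothetical, and it assumes the Kaggle token lives at the default `~/.kaggle/kaggle.json` location that this notebook also checks later.

```python
import shutil
import sys
from pathlib import Path


def check_prerequisites(min_python=(3, 9)):
    """Return a dict mapping each prerequisite to True or False."""
    return {
        # Python 3.9+ is the minimum interpreter version listed above.
        "python": sys.version_info >= min_python,
        # shutil.which looks for the `ollama` executable on PATH.
        "ollama_cli": shutil.which("ollama") is not None,
        # Assumption: credentials are stored in the default token file.
        "kaggle_credentials": (Path.home() / ".kaggle" / "kaggle.json").exists(),
    }


for name, ok in check_prerequisites().items():
    print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

A missing Ollama binary or Kaggle token is not fatal for this notebook, since later cells fall back to Hugging Face and to a synthetic dataset.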

### Competition Context
This training pipeline is designed for the **Google Gemma 3n Impact Challenge**, focusing on creating an AI-powered system assistant that can:
- Interpret natural language commands for system management
- Provide intelligent recommendations for system optimization
- Assist with file management and organization
- Monitor system performance and health

## 🚀 Environment Setup and Dependencies

First, let's set up the required environment and install necessary packages for the training pipeline.

In [2]:
# Install required packages
!pip install -q torch transformers datasets accelerate
!pip install -q ollama kaggle pandas numpy matplotlib seaborn
!pip install -q scikit-learn tqdm wandb
!pip install -q peft bitsandbytes
!pip install -q huggingface-hub tokenizers

# Verify installations
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU device: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

PyTorch version: 2.7.1
CUDA available: False


In [3]:
# Import required libraries
import os
import json
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from datetime import datetime
import logging
from typing import Dict, List, Optional, Tuple

# ML and NLP libraries
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import (
    AutoTokenizer, AutoModelForCausalLM, 
    TrainingArguments, Trainer, DataCollatorForLanguageModeling,
    pipeline, BitsAndBytesConfig
)
from datasets import Dataset as HFDataset, load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from sklearn.metrics import accuracy_score, classification_report
import wandb
from tqdm import tqdm

# Ollama integration
import ollama

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Configure matplotlib
plt.style.use('default')
sns.set_palette("husl")

print("✅ All libraries imported successfully")
print(f"🕐 Training started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

Matplotlib is building the font cache; this may take a moment.
  from .autonotebook import tqdm as notebook_tqdm


✅ All libraries imported successfully
🕐 Training started at: 2025-07-16 22:40:19


## 1. 🤖 Pull Pre-trained Gemma 3n Model

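In this section, we pull the base Gemma model through Ollama for local inference and fall back to loading it from Hugging Face when Ollama is unavailable. Fine-tuning itself uses the Hugging Face checkpoint. Note that the configuration below points at Gemma 2 9B (`gemma2:9b`), not a Gemma 3n checkpoint.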

In [4]:
# Configuration for model setup
MODEL_NAME = "gemma2:9b"  # Ollama tag for Gemma 2 9B (not a Gemma 3n build)
HUGGING_FACE_MODEL = "google/gemma-2-9b-it"  # For direct HF access if needed
BASE_MODEL_PATH = "./models/base_gemma"
FINE_TUNED_MODEL_PATH = "./models/overseer_gemma"

# Create directories
os.makedirs("./models", exist_ok=True)
os.makedirs("./data", exist_ok=True)
os.makedirs("./outputs", exist_ok=True)

def check_ollama_status():
    """Check if Ollama is running and accessible"""
    try:
        response = ollama.list()
        print("✅ Ollama is running")
        print(f"📋 Available models: {[model['name'] for model in response['models']]}")
        return True
    except Exception as e:
        print(f"❌ Ollama not accessible: {e}")
        return False

def pull_gemma_model():
    """Pull the Gemma model via Ollama"""
    try:
        print(f"📥 Pulling {MODEL_NAME} model...")
        # Pull the model
        response = ollama.pull(MODEL_NAME)
        print(f"✅ Successfully pulled {MODEL_NAME}")
        return True
    except Exception as e:
        print(f"❌ Error pulling model: {e}")
        return False

def test_model_inference():
    """Test basic inference with the pulled model"""
    try:
        print("🧪 Testing model inference...")
        response = ollama.chat(
            model=MODEL_NAME,
            messages=[{
                'role': 'user',
                'content': 'What is system monitoring and why is it important?'
            }]
        )
        print("✅ Model inference test successful")
        print(f"📝 Sample response: {response['message']['content'][:200]}...")
        return True
    except Exception as e:
        print(f"❌ Model inference test failed: {e}")
        return False

# Execute model setup
print("🔧 Setting up Gemma model...")
if check_ollama_status():
    if pull_gemma_model():
        test_model_inference()
    else:
        print("⚠️  Falling back to Hugging Face model loading...")
else:
    print("⚠️  Ollama not available, will use Hugging Face directly")

🔧 Setting up Gemma model...
❌ Ollama not accessible: Failed to connect to Ollama. Please check that Ollama is downloaded, running and accessible. https://ollama.com/download
⚠️  Ollama not available, will use Hugging Face directly


In [None]:
# Load model and tokenizer from Hugging Face for training
def load_model_and_tokenizer():
    """Load the Gemma model and tokenizer for fine-tuning"""
    
    # Configure quantization for memory efficiency
    quantization_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
    )
    
    # Load tokenizer
    print("📚 Loading tokenizer...")
    tokenizer = AutoTokenizer.from_pretrained(
        HUGGING_FACE_MODEL,
        trust_remote_code=True,
        use_fast=True
    )
    
    # Add padding token if not present
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    
    # Load model
    print("🧠 Loading model...")
    model = AutoModelForCausalLM.from_pretrained(
        HUGGING_FACE_MODEL,
        quantization_config=quantization_config,
        device_map="auto",
        trust_remote_code=True,
        torch_dtype=torch.float16,
        use_cache=False
    )
    
    # Prepare model for training
    model = prepare_model_for_kbit_training(model)
    
    print(f"✅ Model loaded successfully")
    print(f"📊 Model parameters: {model.num_parameters():,}")
    print(f"🎯 Model device: {model.device}")
    
    return model, tokenizer

# Load the model and tokenizer
try:
    model, tokenizer = load_model_and_tokenizer()
    print("🎉 Model and tokenizer ready for training!")
except Exception as e:
    print(f"❌ Error loading model: {e}")
    model, tokenizer = None, None

## 2. 📊 Load and Explore Kaggle Dataset

We'll load relevant datasets from Kaggle that contain system administration commands, IT support conversations, and technical documentation to train our system assistant.
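
Downloaded datasets will generally not share the `instruction` / `response` / `category` columns that the rest of this notebook expects. As a rough sketch of one way to map heterogeneous rows onto that schema, the example below works on plain dictionaries; the alias table and the function name `normalize_records` are hypothetical and would need to be adapted to the actual column names of each dataset.

```python
from typing import Dict, Iterable, List

# Hypothetical alias table: source columns that could map onto each target field.
COLUMN_ALIASES = {
    "instruction": ("instruction", "question", "command", "text"),
    "response": ("response", "answer", "description", "reply"),
}


def normalize_records(rows: Iterable[Dict[str, str]], category: str) -> List[Dict[str, str]]:
    """Map rows with arbitrary column names onto instruction/response/category."""
    normalized = []
    for row in rows:
        record = {"category": category}
        for target, aliases in COLUMN_ALIASES.items():
            # Take the first alias that is present and non-empty in this row.
            value = next((row[a] for a in aliases if row.get(a)), None)
            if value is None:
                break  # Row cannot be mapped, so it is skipped.
            record[target] = str(value).strip()
        else:
            # Reached only when every target field was filled.
            normalized.append(record)
    return normalized


rows = [
    {"command": "df -h", "description": "Show disk usage per filesystem."},
    {"command": "uptime"},  # No response-like column, so this row is dropped.
]
print(normalize_records(rows, category="system_commands"))
```

Rows that cannot be mapped are dropped rather than padded with empty strings, which keeps the later length filters meaningful.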

In [None]:
# Setup Kaggle API and download datasets
import kaggle
from zipfile import ZipFile

# Dataset configurations
DATASETS = {
    "system_commands": {
        "name": "commandline-commands/linux-commands-dataset",
        "description": "Linux system commands and explanations"
    },
    "it_support": {
        "name": "thoughtvector/customer-support-on-twitter", 
        "description": "IT support conversations"
    },
    "tech_docs": {
        "name": "stanford-cs/tech-docs-corpus",
        "description": "Technical documentation corpus"
    }
}

def setup_kaggle_api():
    """Setup Kaggle API credentials"""
    try:
        # Check if kaggle.json exists
        kaggle_path = Path.home() / ".kaggle" / "kaggle.json"
        if not kaggle_path.exists():
            print("❌ Kaggle credentials not found!")
            print("Please set up your Kaggle API credentials:")
            print("1. Go to https://www.kaggle.com/account")
            print("2. Create API token")
            print("3. Place kaggle.json in ~/.kaggle/")
            return False
        
        # Test API access
        kaggle.api.authenticate()
        print("✅ Kaggle API authenticated successfully")
        return True
    except Exception as e:
        print(f"❌ Kaggle API setup failed: {e}")
        return False

def download_dataset(dataset_name: str, download_path: str):
    """Download a dataset from Kaggle"""
    try:
        print(f"📥 Downloading {dataset_name}...")
        kaggle.api.dataset_download_files(
            dataset_name,
            path=download_path,
            unzip=True,
            quiet=False
        )
        print(f"✅ Downloaded {dataset_name}")
        return True
    except Exception as e:
        print(f"❌ Error downloading {dataset_name}: {e}")
        return False

def create_synthetic_dataset():
    """Create a synthetic dataset for system assistant training"""
    print("🔧 Creating synthetic system assistant dataset...")
    
    # Sample system assistant conversations
    system_conversations = [
        {
            "user": "How do I check my CPU usage?",
            "assistant": "You can check CPU usage using several methods:\n1. Use 'top' command for real-time monitoring\n2. Use 'htop' for a more user-friendly interface\n3. Use 'ps aux' to see process-specific usage\n4. Use 'vmstat' for system statistics"
        },
        {
            "user": "My system is running slow, what should I check?",
            "assistant": "When your system is slow, check these areas:\n1. CPU usage with 'top' or 'htop'\n2. Memory usage with 'free -h'\n3. Disk space with 'df -h'\n4. Disk I/O with 'iotop'\n5. Network usage with 'nethogs'\n6. Running processes with 'ps aux'"
        },
        {
            "user": "How do I find large files on my system?",
            "assistant": "To find large files, use these commands:\n1. 'find / -type f -size +100M' - files larger than 100MB\n2. 'du -sh /*' - directory sizes\n3. 'ncdu' - interactive disk usage analyzer\n4. 'find / -type f -exec ls -la {} + | sort -k5 -nr | head -20' - top 20 largest files"
        },
        {
            "user": "What's the best way to monitor system performance?",
            "assistant": "For comprehensive system monitoring:\n1. Use 'htop' for real-time process monitoring\n2. Use 'iotop' for disk I/O monitoring\n3. Use 'netstat' or 'ss' for network connections\n4. Use 'vmstat' for virtual memory statistics\n5. Set up continuous monitoring with tools like 'sar' or 'dstat'\n6. Consider system monitoring tools like Nagios or Zabbix for advanced monitoring"
        }
    ]
    
    # Create training dataset
    training_data = []
    for conv in system_conversations:
        training_data.append({
            "instruction": conv["user"],
            "response": conv["assistant"],
            "category": "system_monitoring"
        })
    
    # Add file management examples
    file_management = [
        {
            "instruction": "How do I organize my files better?",
            "response": "Here are effective file organization strategies:\n1. Create a clear folder structure\n2. Use descriptive names for files and folders\n3. Implement a consistent naming convention\n4. Use tags or metadata when available\n5. Regularly clean up unnecessary files\n6. Use tools like 'find' to locate files quickly",
            "category": "file_management"
        },
        {
            "instruction": "How can I find duplicate files?",
            "response": "To find duplicate files:\n1. Use 'fdupes' command: 'fdupes -r /path/to/directory'\n2. Use 'rdfind' for advanced deduplication\n3. Use Python script with hashlib for custom solutions\n4. GUI tools like 'dupeGuru' for visual interface\n5. Use 'find' with MD5 checksums for manual checking",
            "category": "file_management"
        }
    ]
    
    training_data.extend(file_management)
    
    # Save synthetic dataset
    df = pd.DataFrame(training_data)
    df.to_csv("./data/synthetic_system_assistant.csv", index=False)
    print(f"✅ Created synthetic dataset with {len(df)} examples")
    return df

# Setup and download datasets
if setup_kaggle_api():
    # Try to download real datasets
    for dataset_key, dataset_info in DATASETS.items():
        download_path = f"./data/{dataset_key}"
        os.makedirs(download_path, exist_ok=True)
        download_dataset(dataset_info["name"], download_path)
else:
    print("⚠️  Using synthetic dataset instead")

# Create synthetic dataset for system assistant training
synthetic_df = create_synthetic_dataset()
print(f"📊 Synthetic dataset shape: {synthetic_df.shape}")
print(f"📋 Categories: {synthetic_df['category'].unique()}")

In [None]:
# Explore the dataset
def explore_dataset(df):
    """Explore and visualize the dataset"""
    print("📊 Dataset Exploration")
    print("=" * 50)
    
    # Basic statistics
    print(f"📈 Dataset shape: {df.shape}")
    print(f"📋 Columns: {list(df.columns)}")
    print(f"🔍 Data types:\n{df.dtypes}")
    
    # Check for missing values
    print(f"\n❓ Missing values:\n{df.isnull().sum()}")
    
    # Category distribution
    if 'category' in df.columns:
        print(f"\n🏷️ Category distribution:")
        category_counts = df['category'].value_counts()
        print(category_counts)
        
        # Visualize category distribution
        plt.figure(figsize=(10, 6))
        plt.subplot(1, 2, 1)
        category_counts.plot(kind='bar')
        plt.title('Category Distribution')
        plt.ylabel('Count')
        plt.xticks(rotation=45)
        
        plt.subplot(1, 2, 2)
        category_counts.plot(kind='pie', autopct='%1.1f%%')
        plt.title('Category Distribution (Pie)')
        plt.ylabel('')
        
        plt.tight_layout()
        plt.show()
    
    # Text length analysis
    if 'instruction' in df.columns and 'response' in df.columns:
        df['instruction_length'] = df['instruction'].str.len()
        df['response_length'] = df['response'].str.len()
        
        print(f"\n📏 Text length statistics:")
        print(f"Instruction length - Mean: {df['instruction_length'].mean():.1f}, Median: {df['instruction_length'].median():.1f}")
        print(f"Response length - Mean: {df['response_length'].mean():.1f}, Median: {df['response_length'].median():.1f}")
        
        # Visualize text lengths
        plt.figure(figsize=(12, 4))
        plt.subplot(1, 2, 1)
        plt.hist(df['instruction_length'], bins=20, alpha=0.7, label='Instructions')
        plt.hist(df['response_length'], bins=20, alpha=0.7, label='Responses')
        plt.xlabel('Text Length (characters)')
        plt.ylabel('Frequency')
        plt.title('Text Length Distribution')
        plt.legend()
        
        plt.subplot(1, 2, 2)
        plt.boxplot([df['instruction_length'], df['response_length']], 
                   labels=['Instructions', 'Responses'])
        plt.ylabel('Text Length (characters)')
        plt.title('Text Length Box Plot')
        
        plt.tight_layout()
        plt.show()
    
    # Sample data
    print(f"\n📝 Sample data:")
    print(df.head(3).to_string())
    
    return df

# Explore the synthetic dataset
explored_df = explore_dataset(synthetic_df)

# Load and combine with any downloaded datasets
combined_data = [synthetic_df]

# Check if real datasets were downloaded and process them
data_dir = Path("./data")
if data_dir.exists():
    for dataset_dir in data_dir.iterdir():
        if dataset_dir.is_dir() and dataset_dir.name != "synthetic_system_assistant.csv":
            print(f"\n🔍 Processing {dataset_dir.name}...")
            for file in dataset_dir.glob("*.csv"):
                try:
                    df = pd.read_csv(file)
                    print(f"📊 {file.name}: {df.shape}")
                    # Basic processing - adapt column names if needed
                    if df.shape[0] > 0:
                        combined_data.append(df)
                except Exception as e:
                    print(f"❌ Error processing {file}: {e}")

print(f"\n📊 Total datasets loaded: {len(combined_data)}")
print(f"📈 Combined data size: {sum(len(df) for df in combined_data)} samples")

## 3. 🔧 Preprocess Dataset

Now we'll preprocess the dataset for training, including tokenization, formatting, and creating train/validation splits.
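
The split is done in two stages, so the validation fraction has to be rescaled for the second stage: with `test_size=0.2` and `val_size=0.1`, the second split takes `0.1 / (1 - 0.2) = 0.125` of the remaining data. The sketch below works through that arithmetic; `split_sizes` is a hypothetical helper, and the rounding assumes scikit-learn's behaviour of taking the ceiling for the held-out part.

```python
import math


def split_sizes(n_samples, test_size=0.2, val_size=0.1):
    """Return (train, val, test) counts for a two-stage split."""
    # Stage 1: hold out the test set from the full dataset.
    n_test = math.ceil(test_size * n_samples)
    n_train_val = n_samples - n_test
    # Stage 2 acts on the remaining data only, so the validation fraction
    # is rescaled to stay at `val_size` of the full dataset.
    val_fraction = val_size / (1 - test_size)
    n_val = math.ceil(val_fraction * n_train_val)
    return n_train_val - n_val, n_val, n_test


print(split_sizes(1000))
print(split_sizes(6))
```

For a very small dataset the validation set can shrink to a single sample, which is too few to stratify by category.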

In [None]:
# Data preprocessing functions
from sklearn.model_selection import train_test_split

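def format_instruction_response(instruction: str, response: str) -> str:
    """Format instruction-response pair for training.

    Note: the <|im_start|>/<|im_end|> markers below follow the ChatML convention.
    They are not Gemma's native turn markers (<start_of_turn>/<end_of_turn>), so the
    Gemma tokenizer treats them as ordinary text rather than special tokens.
    """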
    return f"""<|im_start|>system
You are Overseer, an AI-powered system assistant that helps users with system monitoring, file management, and technical tasks. Provide helpful, accurate, and actionable advice.
<|im_end|>
<|im_start|>user
{instruction}
<|im_end|>
<|im_start|>assistant
{response}
<|im_end|>"""

def preprocess_dataset(df):
    """Preprocess the dataset for training"""
    print("🔧 Preprocessing dataset...")
    
    # Clean the data
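    df = df.dropna(subset=['instruction', 'response'])
    df = df[df['instruction'].str.len() > 10]  # Filter out very short instructions
    # Filter out very short responses; copy so that adding 'text' below
    # modifies an independent frame instead of a slice of the input
    df = df[df['response'].str.len() > 20].copy()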
    
    # Format for training
    df['text'] = df.apply(lambda row: format_instruction_response(
        row['instruction'], row['response']), axis=1)
    
    print(f"✅ Cleaned dataset: {len(df)} samples")
    return df

def tokenize_dataset(df, tokenizer, max_length=512):
    """Tokenize the dataset"""
    print("🔤 Tokenizing dataset...")
    
    def tokenize_function(examples):
        # Tokenize the text
        tokenized = tokenizer(
            examples['text'],
            truncation=True,
            padding=True,
            max_length=max_length,
            return_tensors="pt"
        )
        
        # For causal language modeling, labels are the same as input_ids
        tokenized["labels"] = tokenized["input_ids"].clone()
        return tokenized
    
    # Convert to Hugging Face dataset
    dataset = HFDataset.from_pandas(df)
    
    # Tokenize
    tokenized_dataset = dataset.map(
        tokenize_function,
        batched=True,
        remove_columns=dataset.column_names
    )
    
    print(f"✅ Tokenized dataset: {len(tokenized_dataset)} samples")
    return tokenized_dataset

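def create_data_splits(df, test_size=0.2, val_size=0.1):
    """Create train/validation/test splits"""
    print("📊 Creating data splits...")
    
    def split(frame, size):
        # Stratify by category when possible. Very small datasets (such as the
        # synthetic one) cannot satisfy stratification, so fall back to a random split.
        try:
            return train_test_split(
                frame, test_size=size, random_state=42, stratify=frame['category']
            )
        except ValueError:
            return train_test_split(frame, test_size=size, random_state=42)
    
    # First split: train + val, test
    train_val_df, test_df = split(df, test_size)
    
    # Second split: train, val (val_size is rescaled to the remaining data)
    train_df, val_df = split(train_val_df, val_size / (1 - test_size))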
    
    print(f"📈 Data splits:")
    print(f"  Train: {len(train_df)} samples ({len(train_df)/len(df)*100:.1f}%)")
    print(f"  Validation: {len(val_df)} samples ({len(val_df)/len(df)*100:.1f}%)")
    print(f"  Test: {len(test_df)} samples ({len(test_df)/len(df)*100:.1f}%)")
    
    return train_df, val_df, test_df

# Preprocess the main dataset
processed_df = preprocess_dataset(synthetic_df)

# Create data splits
train_df, val_df, test_df = create_data_splits(processed_df)

# Tokenize datasets
if tokenizer is not None:
    print("\n🔤 Tokenizing datasets...")
    train_dataset = tokenize_dataset(train_df, tokenizer)
    val_dataset = tokenize_dataset(val_df, tokenizer)
    test_dataset = tokenize_dataset(test_df, tokenizer)
    
    print("✅ All datasets tokenized successfully")
    
    # Show sample tokenized data (datasets returns plain Python lists by default,
    # so use len() rather than .shape)
    print(f"\n📝 Sample tokenized data:")
    print(f"Input IDs length: {len(train_dataset[0]['input_ids'])}")
    print(f"Labels length: {len(train_dataset[0]['labels'])}")
    print(f"First few tokens: {train_dataset[0]['input_ids'][:10]}")
else:
    print("❌ Tokenizer not available, skipping tokenization")

## 4. 🎯 Train the Model

Now we'll fine-tune the Gemma model using LoRA (Low-Rank Adaptation) for efficient training.
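
LoRA keeps each targeted weight matrix frozen and learns a low-rank update `B @ A` scaled by `lora_alpha / r`, which is why so few parameters are trainable. The sketch below estimates the adapter size for a single matrix; the 4096 x 4096 shape is a hypothetical example, not an actual Gemma layer dimension, and `lora_trainable_params` is an illustrative helper rather than a PEFT function.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters added by one LoRA adapter on a d_out x d_in weight matrix.

    The adapter consists of A (rank x d_in) and B (d_out x rank).
    """
    return rank * d_in + d_out * rank


# Hypothetical square projection; real layer shapes differ per module.
d_in = d_out = 4096
rank = 16

full = d_in * d_out
adapter = lora_trainable_params(d_in, d_out, rank)
print(f"full matrix:  {full:,} params")
print(f"LoRA adapter: {adapter:,} params ({adapter / full:.2%} of the matrix)")

# Effective batch size under gradient accumulation on a single device,
# using the values from the training configuration in this notebook.
per_device_batch, accumulation_steps = 1, 8
print(f"effective batch size: {per_device_batch * accumulation_steps}")
```

Raising the rank grows the adapter linearly, so `r` is the main knob for trading capacity against memory.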

In [None]:
# Training configuration
# Note: when max_steps is positive it overrides num_train_epochs.
TRAINING_CONFIG = {
    "output_dir": "./outputs/overseer_gemma_lora",
    "num_train_epochs": 3,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "warmup_steps": 50,
    "learning_rate": 2e-4,
    "weight_decay": 0.01,
    "logging_steps": 10,
    "save_steps": 100,
    "eval_steps": 100,
    "max_grad_norm": 1.0,
    "max_steps": 500,  # Limit steps for demo
    "dataloader_num_workers": 4,
    "remove_unused_columns": False,
    "group_by_length": True,
    "report_to": "none",  # Disable wandb for now
}

def setup_lora_config():
    """Setup LoRA configuration for efficient fine-tuning"""
    lora_config = LoraConfig(
        r=16,                  # Rank
        lora_alpha=32,         # LoRA scaling parameter
        target_modules=[       # Target modules for LoRA
            "q_proj",
            "k_proj", 
            "v_proj",
            "o_proj",
            "gate_proj",
            "up_proj",
            "down_proj",
        ],
        lora_dropout=0.1,      # Dropout probability
        bias="none",           # Bias type
        task_type="CAUSAL_LM", # Task type
    )
    
    print("🔧 LoRA configuration:")
    print(f"  Rank: {lora_config.r}")
    print(f"  Alpha: {lora_config.lora_alpha}")
    print(f"  Target modules: {lora_config.target_modules}")
    print(f"  Dropout: {lora_config.lora_dropout}")
    
    return lora_config

def setup_training_arguments():
    """Setup training arguments"""
    training_args = TrainingArguments(
        **TRAINING_CONFIG,
        fp16=torch.cuda.is_available(),  # Mixed precision requires a CUDA device
        gradient_checkpointing=True,  # Save memory
        optim="paged_adamw_32bit",   # Optimizer
        lr_scheduler_type="cosine",   # Learning rate scheduler
        save_total_limit=2,          # Limit saved checkpoints
        load_best_model_at_end=True, # Load best model at end
        metric_for_best_model="eval_loss",
        greater_is_better=False,
        eval_strategy="steps",       # Named evaluation_strategy in transformers < 4.41
        save_strategy="steps",
        logging_first_step=True,
        seed=42,
    )
    
    print("📊 Training configuration:")
    print(f"  Epochs: {training_args.num_train_epochs}")
    print(f"  Batch size: {training_args.per_device_train_batch_size}")
    print(f"  Gradient accumulation: {training_args.gradient_accumulation_steps}")
    print(f"  Learning rate: {training_args.learning_rate}")
    print(f"  Max steps: {training_args.max_steps}")
    print(f"  Output dir: {training_args.output_dir}")
    
    return training_args

# Setup LoRA and training configurations
lora_config = setup_lora_config()
training_args = setup_training_arguments()

# Create output directory
os.makedirs(TRAINING_CONFIG["output_dir"], exist_ok=True)

print("✅ Training configuration ready!")

In [None]:
# Setup model for LoRA training
if model is not None and tokenizer is not None:
    print("🔧 Setting up model for LoRA training...")
    
    # Apply LoRA to the model
    model = get_peft_model(model, lora_config)
    
    # Print trainable parameters
    model.print_trainable_parameters()
    
    # Setup data collator
    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer,
        mlm=False,  # Not masked language modeling
        pad_to_multiple_of=8,
    )
    
    # Setup trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        tokenizer=tokenizer,
        data_collator=data_collator,
    )
    
    print("✅ Trainer setup complete!")
    
    # Start training
    print("\n🚀 Starting training...")
    print("=" * 50)
    
    try:
        # Train the model
        train_result = trainer.train()
        
        # Save the model
        trainer.save_model()
        trainer.save_state()
        
        print("✅ Training completed successfully!")
        print(f"📊 Training metrics:")
        print(f"  Final loss: {train_result.training_loss:.4f}")
        print(f"  Training time: {train_result.metrics['train_runtime']:.2f}s")
        print(f"  Samples per second: {train_result.metrics['train_samples_per_second']:.2f}")
        
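        # Plot training loss (per-step entries in log_history use the 'loss' key;
        # 'train_loss' appears only once, in the final summary entry)
        if hasattr(trainer.state, 'log_history'):
            losses = [log['loss'] for log in trainer.state.log_history if 'loss' in log]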
            if losses:
                plt.figure(figsize=(10, 6))
                plt.plot(losses)
                plt.title('Training Loss')
                plt.xlabel('Steps')
                plt.ylabel('Loss')
                plt.grid(True)
                plt.show()
        
    except Exception as e:
        print(f"❌ Training failed: {e}")
        trainer = None
        
else:
    print("❌ Model or tokenizer not available, skipping training")
    trainer = None

## 5. 📈 Evaluate Model Performance

Let's evaluate our trained model on the test set and analyze its performance.
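
The evaluation cell reports perplexity as the exponential of the evaluation loss. As a small illustration of that relationship, the sketch below computes perplexity from per-token cross-entropy values; the numbers are made up for the example and are not results from this notebook.

```python
import math


def perplexity(token_losses):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    if not token_losses:
        raise ValueError("need at least one token loss")
    return math.exp(sum(token_losses) / len(token_losses))


# Hypothetical per-token cross-entropy values (natural log), not real results.
losses = [2.0, 1.5, 2.5]
print(f"mean loss:  {sum(losses) / len(losses):.2f}")
print(f"perplexity: {perplexity(losses):.2f}")
```

A perplexity of 1.0 corresponds to zero loss, and lower values mean the model assigns higher probability to the reference text.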

In [None]:
# Model evaluation functions
def evaluate_model(trainer, test_dataset):
    """Evaluate the trained model"""
    if trainer is None:
        print("❌ No trained model available for evaluation")
        return None
    
    print("📊 Evaluating model on test set...")
    
    # Evaluate on test set
    eval_results = trainer.evaluate(test_dataset)
    
    print(f"📈 Evaluation results:")
    print(f"  Test loss: {eval_results['eval_loss']:.4f}")
    print(f"  Test perplexity: {np.exp(eval_results['eval_loss']):.2f}")
    print(f"  Evaluation time: {eval_results['eval_runtime']:.2f}s")
    
    return eval_results

def test_model_inference(model, tokenizer, test_prompts):
    """Test model inference with sample prompts"""
    if model is None or tokenizer is None:
        print("❌ Model or tokenizer not available for testing")
        return
    
    print("🧪 Testing model inference...")
    print("=" * 50)
    
    # Create a pipeline for easier inference
    try:
        # For PEFT models, we need to merge adapters for inference
        merged_model = model.merge_and_unload()
        
        pipe = pipeline(
            "text-generation",
            model=merged_model,
            tokenizer=tokenizer,
            max_length=300,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id
        )
        
        for i, prompt in enumerate(test_prompts, 1):
            print(f"\n🔍 Test {i}: {prompt}")
            print("-" * 40)
            
            # Format prompt for inference
            formatted_prompt = f"""<|im_start|>system
You are Overseer, an AI-powered system assistant that helps users with system monitoring, file management, and technical tasks.
<|im_end|>
<|im_start|>user
{prompt}
<|im_end|>
<|im_start|>assistant
"""
            
            # Generate response
            response = pipe(formatted_prompt, max_new_tokens=150, return_full_text=False)
            generated_text = response[0]['generated_text']
            
            # Clean up response
            if '<|im_end|>' in generated_text:
                generated_text = generated_text.split('<|im_end|>')[0]
            
            print(f"🤖 Response: {generated_text.strip()}")
            
    except Exception as e:
        print(f"❌ Inference test failed: {e}")

def calculate_metrics(predictions, references):
    """Calculate additional metrics"""
    # This is a placeholder for more sophisticated metrics
    # In a real scenario, you might want to use BLEU, ROUGE, or custom metrics
    
    if not predictions or not references:
        return {}
    
    # Simple length-based metrics
    avg_pred_length = np.mean([len(p) for p in predictions])
    avg_ref_length = np.mean([len(r) for r in references])
    
    return {
        "avg_prediction_length": avg_pred_length,
        "avg_reference_length": avg_ref_length,
        "length_ratio": avg_pred_length / avg_ref_length if avg_ref_length > 0 else 0
    }

# Test prompts for evaluation
test_prompts = [
    "How do I check my system's memory usage?",
    "What's the best way to find files that are taking up too much space?",
    "My computer is running slowly. What should I do?",
    "How can I monitor network traffic on my system?",
    "What are some good practices for organizing files?",
    "How do I check which processes are using the most CPU?",
    "What's the difference between RAM and disk space?",
    "How can I automate system backups?",
]

# Evaluate the model
if trainer is not None:
    eval_results = evaluate_model(trainer, test_dataset)
    
    # Test inference
    test_model_inference(model, tokenizer, test_prompts)
    
    # Save evaluation results
    if eval_results:
        with open(f"{TRAINING_CONFIG['output_dir']}/evaluation_results.json", 'w') as f:
            json.dump(eval_results, f, indent=2)
        print(f"✅ Evaluation results saved to {TRAINING_CONFIG['output_dir']}/evaluation_results.json")
    
else:
    print("❌ No trained model available for evaluation")
    
    # Test with base model if available
    if model is not None and tokenizer is not None:
        print("\n🔄 Testing with base model...")
        test_model_inference(model, tokenizer, test_prompts[:3])  # Test fewer prompts

## 6. 📦 Export Trained Model

Finally, let's export the trained model for deployment and integration with the Overseer system.
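
The export step writes an Ollama Modelfile from a Python f-string, which needs care because Ollama's template placeholders use double braces. The sketch below shows the escaping in isolation: four braces in an f-string produce two in the output. The function `render_modelfile`, the path, and the prompt are hypothetical and only illustrate the mechanism.

```python
def render_modelfile(base_path: str, system_prompt: str) -> str:
    """Render a minimal Modelfile with literal double-brace placeholders."""
    # In an f-string, '{{' emits a single '{', so four braces emit two.
    return f'''FROM {base_path}
SYSTEM """{system_prompt}"""
TEMPLATE """{{{{ .System }}}}
{{{{ .Prompt }}}}"""
'''


# Hypothetical path and prompt, for illustration only.
text = render_modelfile("./models/example/merged_model", "You are a helpful assistant.")
print(text)
```

Checking the rendered text before running `ollama create` helps catch placeholders that were accidentally interpolated or left with the wrong number of braces.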

In [None]:
# Model export and deployment functions
def export_model_for_deployment(trainer, model, tokenizer, export_path):
    """Export the trained model for deployment"""
    print("📦 Exporting model for deployment...")
    
    if trainer is None or model is None or tokenizer is None:
        print("❌ No trained model available for export")
        return False
    
    try:
        # Create export directory
        export_dir = Path(export_path)
        export_dir.mkdir(parents=True, exist_ok=True)
        
        # Save the final model
        print("💾 Saving model and tokenizer...")
        trainer.save_model(export_path)
        tokenizer.save_pretrained(export_path)
        
        # Merge and save the final model (without LoRA adapters)
        merged_model_path = export_dir / "merged_model"
        merged_model_path.mkdir(exist_ok=True)
        
        print("🔗 Merging LoRA adapters...")
        merged_model = model.merge_and_unload()
        merged_model.save_pretrained(merged_model_path)
        tokenizer.save_pretrained(merged_model_path)
        
        # Save model configuration
        model_config = {
            "model_name": HUGGING_FACE_MODEL,
            "training_config": TRAINING_CONFIG,
            "lora_config": {
                "r": lora_config.r,
                "lora_alpha": lora_config.lora_alpha,
                # PEFT may store target_modules as a set, which json cannot serialize
                "target_modules": sorted(lora_config.target_modules),
                "lora_dropout": lora_config.lora_dropout,
            },
            "export_timestamp": datetime.now().isoformat(),
            "export_path": str(export_path),
        }
        
        with open(export_dir / "model_config.json", 'w') as f:
            json.dump(model_config, f, indent=2)
        
        print(f"✅ Model exported successfully to {export_path}")
        return True
        
    except Exception as e:
        print(f"❌ Export failed: {e}")
        return False

def create_ollama_modelfile(export_path, model_name="overseer-gemma"):
    """Create Ollama Modelfile for deployment"""
    print("🐋 Creating Ollama Modelfile...")
    
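    # Use an absolute path so FROM does not depend on where `ollama create` is run
    merged_model_dir = (Path(export_path) / "merged_model").resolve()

    modelfile_content = f"""# Overseer AI System Assistant - based on {HUGGING_FACE_MODEL}
FROM {merged_model_dir}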

# System prompt
SYSTEM \"\"\"You are Overseer, an AI-powered system assistant that helps users with:
- System monitoring and performance analysis
- File management and organization
- Technical troubleshooting
- Command-line assistance
- System optimization recommendations

You provide helpful, accurate, and actionable advice. Always prioritize user safety and system security.
\"\"\"

# Parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 2048

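# Template (Gemma turn format; Gemma has no separate system role,
# so the system prompt is prepended to the user turn)
TEMPLATE \"\"\"<start_of_turn>user
{{{{ if .System }}}}{{{{ .System }}}} {{{{ end }}}}{{{{ .Prompt }}}}<end_of_turn>
<start_of_turn>model
{{{{ .Response }}}}<end_of_turn>
\"\"\"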
"""
    
    modelfile_path = Path(export_path) / "Modelfile"
    with open(modelfile_path, 'w') as f:
        f.write(modelfile_content)
    
    print(f"✅ Modelfile created at {modelfile_path}")
    
    # Create deployment script
    deployment_script = f"""#!/bin/bash
# Overseer Model Deployment Script

echo "🚀 Deploying Overseer model to Ollama..."

# Create the model in Ollama
ollama create {model_name} -f {modelfile_path}

if [ $? -eq 0 ]; then
    echo "✅ Model '{model_name}' created successfully!"
    echo "🧪 Testing model..."
    echo "What is system monitoring?" | ollama run {model_name}
else
    echo "❌ Failed to create model"
    exit 1
fi

echo "🎉 Deployment complete!"
echo "You can now use the model with: ollama run {model_name}"
"""
    
    script_path = Path(export_path) / "deploy.sh"
    with open(script_path, 'w') as f:
        f.write(deployment_script)
    
    # Make script executable
    os.chmod(script_path, 0o755)
    
    print(f"✅ Deployment script created at {script_path}")
    return modelfile_path, script_path

def create_integration_example(export_path):
    """Create example integration code"""
    print("📝 Creating integration example...")
    
    integration_code = '''"""
Overseer AI Integration Example
This example shows how to integrate the trained Overseer model into your application.
"""

from datetime import datetime
from typing import Dict, Optional

import ollama

class OverseerAI:
    def __init__(self, model_name: str = "overseer-gemma"):
        self.model_name = model_name
        self.conversation_history = []
    
    def query(self, user_input: str, context: Optional[Dict] = None) -> str:
        """Query the Overseer AI model"""
        try:
            response = ollama.chat(
                model=self.model_name,
                messages=[
                    {
                        'role': 'user',
                        'content': user_input
                    }
                ],
                stream=False
            )
            
            ai_response = response['message']['content']
            
            # Store conversation history
            self.conversation_history.append({
                'user': user_input,
                'assistant': ai_response,
                'timestamp': datetime.now().isoformat()
            })
            
            return ai_response
            
        except Exception as e:
            return f"Error: {str(e)}"
    
    def get_system_advice(self, system_info: Dict) -> str:
        """Get system-specific advice"""
        prompt = f"""
        Based on this system information:
        - CPU Usage: {system_info.get('cpu_usage', 'N/A')}%
        - Memory Usage: {system_info.get('memory_usage', 'N/A')}%
        - Disk Usage: {system_info.get('disk_usage', 'N/A')}%
        - Running Processes: {system_info.get('process_count', 'N/A')}
        
        Please provide system optimization recommendations.
        """
        
        return self.query(prompt)
    
    def clear_history(self):
        """Clear conversation history"""
        self.conversation_history = []

# Example usage
if __name__ == "__main__":
    # Initialize Overseer AI
    overseer = OverseerAI()
    
    # Example queries
    examples = [
        "How do I check my system's memory usage?",
        "What should I do if my disk is almost full?",
        "How can I monitor network traffic?",
        "What are best practices for file organization?"
    ]
    
    print("🤖 Overseer AI Integration Example")
    print("=" * 50)
    
    for example in examples:
        print(f"\\n👤 User: {example}")
        response = overseer.query(example)
        print(f"🤖 Overseer: {response}")
'''
    
    integration_path = Path(export_path) / "integration_example.py"
    with open(integration_path, 'w') as f:
        f.write(integration_code)
    
    print(f"✅ Integration example created at {integration_path}")
    return integration_path

# Export the model
export_path = "./models/overseer_deployed"

if trainer is not None:
    # Export for deployment
    if export_model_for_deployment(trainer, model, tokenizer, export_path):
        # Create Ollama Modelfile
        modelfile_path, script_path = create_ollama_modelfile(export_path)
        
        # Create integration example
        integration_path = create_integration_example(export_path)
        
        print("\n🎉 Model export completed!")
        print("📁 Export structure:")
        print(f"  📦 LoRA adapter and tokenizer: {export_path}")
        print(f"  🔗 Merged model: {export_path}/merged_model")
        print(f"  🐋 Modelfile: {modelfile_path}")
        print(f"  🚀 Deploy script: {script_path}")
        print(f"  📝 Integration example: {integration_path}")
        
        print("\n🚀 Next steps:")
        print("1. Run the deployment script to add model to Ollama")
        print("2. Test the model with: ollama run overseer-gemma")
        print("3. Integrate into your application using the example code")
        print("4. Monitor performance and iterate on training data")
        
    else:
        print("❌ Model export failed")
else:
    print("❌ No trained model available for export")
    print("💡 You can still use the base model or try running the training again")

## 🎯 Training Summary and Next Steps

### What We Accomplished

1. **✅ Model Setup**: Configured and loaded the Gemma base model
2. **✅ Data Preparation**: Created and processed training datasets for system assistant tasks
3. **✅ Fine-tuning**: Implemented LoRA-based efficient fine-tuning
4. **✅ Evaluation**: Tested model performance on validation data
5. **✅ Export**: Prepared model for deployment with Ollama integration

### Key Metrics
- **Training Loss**: Track per step to confirm the run is converging
- **Evaluation Loss**: Track on the validation set to measure generalization and catch overfitting
- **Inference Speed**: Measure response latency, since system assistance needs to feel real-time
- **Memory Usage**: Measure the deployed footprint; quantization helps keep it small
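
Evaluation loss is often easier to read as perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is what a causal language model loss normally is. The function name `perplexity_from_loss` and the loss values are illustrative and do not come from this notebook's training run.

```python
import math


def perplexity_from_loss(eval_loss: float) -> float:
    """Convert a mean per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(eval_loss)


# Hypothetical loss values, for illustration only
for loss in (2.0, 1.5, 1.0):
    print(f"eval_loss={loss:.2f} -> perplexity={perplexity_from_loss(loss):.2f}")
```

A falling evaluation loss maps to a falling perplexity, so either number can be tracked; perplexity is simply on a scale that is easier to compare across runs.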

### Deployment Integration
The trained model is now ready for integration with the Overseer system:
- **Backend Integration**: Use with `gemma_engine.py` in the core system
- **API Endpoints**: Serve through FastAPI backend
- **Desktop App**: Connect through WebSocket for real-time assistance
- **System Monitoring**: Integrate with monitoring modules
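
As one possible shape for the backend integration, the sketch below calls a locally running Ollama server over its REST API using only the standard library. It assumes Ollama's default address (`http://localhost:11434`) and the `overseer-gemma` model name used by the deployment script. The helper names `build_generate_payload` and `ask_overseer` are hypothetical and are not part of `gemma_engine.py`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_generate_payload(prompt: str, model: str = "overseer-gemma") -> bytes:
    """Serialize a non-streaming generate request for Ollama."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def ask_overseer(prompt: str, model: str = "overseer-gemma", timeout: float = 60.0) -> str:
    """POST the prompt to a running Ollama server and return the generated text."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_generate_payload(prompt, model),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

Calling `ask_overseer("How do I check disk usage?")` requires the Ollama server to be running with the model already created; a FastAPI route or WebSocket handler could wrap the same function.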

### Performance Optimization
- **Quantization**: 4-bit quantization for memory efficiency
- **LoRA Adapters**: Efficient fine-tuning without full model retraining
- **Batch Processing**: Optimize for multiple concurrent requests
- **Caching**: Implement response caching for common queries
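
Response caching can be sketched as a thin wrapper around whatever function queries the model. The example below is a minimal in-memory version under stated assumptions: `CachedResponder` and its `backend` callable are hypothetical names, prompts are matched after lowercasing and collapsing whitespace, and the oldest entry is evicted when the cache is full.

```python
from typing import Callable, Dict


class CachedResponder:
    """Memoize model responses for normalized prompts."""

    def __init__(self, backend: Callable[[str], str], max_entries: int = 256):
        self._backend = backend          # function that actually queries the model
        self._max_entries = max_entries
        self._cache: Dict[str, str] = {}

    @staticmethod
    def _normalize(prompt: str) -> str:
        # Treat prompts that differ only in case or spacing as the same query
        return " ".join(prompt.lower().split())

    def query(self, prompt: str) -> str:
        key = self._normalize(prompt)
        if key in self._cache:
            return self._cache[key]
        answer = self._backend(prompt)
        if len(self._cache) >= self._max_entries:
            # Evict the oldest entry (dicts preserve insertion order)
            self._cache.pop(next(iter(self._cache)))
        self._cache[key] = answer
        return answer
```

Because the Modelfile samples with a non-zero temperature, a cached answer is just the first response that happened to be generated. Caching fits best for stable, frequently repeated questions rather than queries that depend on live system state.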

### Future Improvements
1. **Expand Training Data**: Add more diverse system administration scenarios
2. **Multi-modal Support**: Integrate with system logs and file analysis
3. **Continuous Learning**: Implement online learning from user interactions
4. **Performance Monitoring**: Track model performance in production
5. **A/B Testing**: Compare different model versions and configurations
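
For the A/B testing item, one common approach is to assign each user to a model variant deterministically, so the same user always sees the same version. The sketch below is an assumption about how this could be done, not part of the current pipeline; `assign_variant`, the salt, and the second model tag `overseer-gemma-v2` are hypothetical.

```python
import hashlib
from typing import Sequence


def assign_variant(user_id: str, variants: Sequence[str], salt: str = "overseer-ab-v1") -> str:
    """Map a user to one variant, stable across runs and processes."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer bucket
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# Hypothetical model tags for two fine-tuned versions
variants = ["overseer-gemma", "overseer-gemma-v2"]
for user in ("alice", "bob", "carol"):
    print(user, "->", assign_variant(user, variants))
```

Changing the salt reshuffles the assignment, which is useful when starting a new experiment.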

### Google Gemma 3n Impact Challenge
This training pipeline demonstrates:
- **Innovation**: Novel application of LLMs for system administration
- **Impact**: Practical AI assistant for productivity and system management
- **Scalability**: Efficient training and deployment methods
- **Responsibility**: Local processing for privacy and security

### Ready for Production
Once exported and deployed through Ollama, the model can be integrated into the Overseer system as an assistant for system monitoring, file management, and technical support tasks.