{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "header"
      },
      "source": [
        "# 🚀 Sarika AI - Stage 2 Training\n",
        "## Context Teachers Sequential Distillation\n",
        "\n",
        "**Author:** Noushad  \n",
        "**Repository:** [noushad999/sarika-ai](https://github.com/noushad999/sarika-ai)  \n",
        "**GPU Required:** T4 (16GB VRAM)  \n",
        "**Duration:** ~4-6 hours  \n",
        "\n",
        "### What This Does:\n",
        "- Loads Llama-3.1-8B student model\n",
        "- Trains with 6 specialized teacher models sequentially\n",
        "- Applies LoRA (only ~16M trainable parameters)\n",
        "- Saves checkpoints to Google Drive\n",
        "\n",
        "### Teachers:\n",
        "1. **Bengali Culture** - Mistral-7B\n",
        "2. **Emotional Intelligence** - Qwen2.5-7B\n",
        "3. **Conversation Flow** - Llama-3.1-8B\n",
        "4. **Humor** - Gemma-2-9B\n",
        "5. **Deep Conversations** - Mistral-7B\n",
        "6. **Crisis Support** - Qwen2.5-7B"
      ]
    },
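    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lora_param_estimate_note"
      },
      "source": [
        "The number of trainable LoRA parameters depends on the rank and on which modules are adapted. The cell below is a rough estimate, not output from the training script. It assumes the published Llama-3.1-8B dimensions (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128), which are not stated in this notebook, together with `LORA_R` and `LORA_TARGET_MODULES` from Step 5. Changing the rank or the module list changes the total."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "lora_param_estimate"
      },
      "outputs": [],
      "source": [
        "# Assumed Llama-3.1-8B dimensions (not stated in this notebook)\n",
        "HIDDEN = 4096\n",
        "INTERMEDIATE = 14336\n",
        "KV_DIM = 8 * 128  # 8 key/value heads x head dimension 128\n",
        "NUM_LAYERS = 32\n",
        "LORA_R = 16  # same value as TrainingConfig.LORA_R in Step 5\n",
        "\n",
        "# (in_features, out_features) of each LoRA target module\n",
        "shapes = {\n",
        "    \"q_proj\": (HIDDEN, HIDDEN),\n",
        "    \"k_proj\": (HIDDEN, KV_DIM),\n",
        "    \"v_proj\": (HIDDEN, KV_DIM),\n",
        "    \"o_proj\": (HIDDEN, HIDDEN),\n",
        "    \"gate_proj\": (HIDDEN, INTERMEDIATE),\n",
        "    \"up_proj\": (HIDDEN, INTERMEDIATE),\n",
        "    \"down_proj\": (INTERMEDIATE, HIDDEN),\n",
        "}\n",
        "\n",
        "# Each adapter adds two matrices: A (r x in) and B (out x r)\n",
        "per_layer = sum(LORA_R * (n_in + n_out) for n_in, n_out in shapes.values())\n",
        "total = per_layer * NUM_LAYERS\n",
        "\n",
        "print(f\"LoRA parameters per layer: {per_layer:,}\")\n",
        "print(f\"LoRA parameters in total:  {total:,}\")"
      ]
    },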
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "step1"
      },
      "source": [
        "## ✅ Step 1: Check GPU"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "check_gpu"
      },
      "outputs": [],
      "source": [
        "!nvidia-smi\n",
        "\n",
        "import torch\n",
        "print(f\"\\n{'='*60}\")\n",
        "print(f\"PyTorch Version: {torch.__version__}\")\n",
        "print(f\"CUDA Available: {torch.cuda.is_available()}\")\n",
        "if torch.cuda.is_available():\n",
        "    print(f\"GPU: {torch.cuda.get_device_name(0)}\")\n",
        "    print(f\"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f}GB\")\n",
        "print(f\"{'='*60}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "step2"
      },
      "source": [
        "## 📂 Step 2: Mount Drive & Clone Repository"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "mount_drive"
      },
      "outputs": [],
      "source": [
        "# Mount Google Drive\n",
        "from google.colab import drive\n",
        "drive.mount('/content/drive')\n",
        "\n",
        "# Navigate to Drive\n",
        "import os\n",
        "os.chdir('/content/drive/MyDrive')\n",
        "\n",
        "# Clone the repository, or update it in place if it already exists\n",
        "if not os.path.exists('sarika-ai'):\n",
        "    !git clone https://github.com/noushad999/sarika-ai.git\n",
        "    print(\"✓ Repository cloned\")\n",
        "else:\n",
        "    print(\"✓ Repository already exists\")\n",
        "    # -C runs git inside the repo without changing the working directory\n",
        "    !git -C sarika-ai pull origin main\n",
        "    print(\"✓ Repository updated\")\n",
        "\n",
        "# Change to project directory (done once, after either branch)\n",
        "%cd sarika-ai\n",
        "\n",
        "!ls -la"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "step3"
      },
      "source": [
        "## 📦 Step 3: Install Dependencies"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "install_deps"
      },
      "outputs": [],
      "source": [
        "# Install required packages\n",
        "!pip install -q transformers accelerate peft bitsandbytes datasets huggingface_hub scipy\n",
        "\n",
        "print(\"✓ Dependencies installed\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "step4"
      },
      "source": [
        "## 🔑 Step 4: HuggingFace Login"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "hf_login"
      },
      "outputs": [],
      "source": [
        "# Login to HuggingFace\n",
        "import os\n",
        "from getpass import getpass\n",
        "from huggingface_hub import login\n",
        "\n",
        "# getpass keeps the token out of the notebook output\n",
        "HF_TOKEN = getpass(\"Enter your HuggingFace token: \")\n",
        "login(token=HF_TOKEN)\n",
        "\n",
        "# Export the token so the training script, which runs as a subprocess, can read it\n",
        "os.environ[\"HF_TOKEN\"] = HF_TOKEN\n",
        "\n",
        "print(\"✓ Logged in to HuggingFace\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "step5"
      },
      "source": [
        "## ⚙️ Step 5: Update Config for Colab"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "config"
      },
      "outputs": [],
      "source": [
        "%%writefile ml/config.py\n",
        "\"\"\"\n",
        "Sarika AI - Configuration (Colab Optimized)\n",
        "\"\"\"\n",
        "\n",
        "import torch\n",
        "from pathlib import Path\n",
        "import os\n",
        "\n",
        "# Paths - Colab\n",
        "PROJECT_ROOT = Path(\"/content/drive/MyDrive/sarika-ai\")\n",
        "CHECKPOINT_DIR = PROJECT_ROOT / \"ml\" / \"checkpoints\"\n",
        "MODEL_DIR = PROJECT_ROOT / \"models\"\n",
        "DATA_DIR = PROJECT_ROOT / \"data\"\n",
        "HF_HOME = str(PROJECT_ROOT / \"models\" / \"cache\")\n",
        "\n",
        "# Create directories\n",
        "CHECKPOINT_DIR.mkdir(parents=True, exist_ok=True)\n",
        "MODEL_DIR.mkdir(parents=True, exist_ok=True)\n",
        "DATA_DIR.mkdir(parents=True, exist_ok=True)\n",
        "\n",
        "# Set environment\n",
        "os.environ[\"HF_HOME\"] = HF_HOME\n",
        "os.environ[\"TRANSFORMERS_CACHE\"] = HF_HOME\n",
        "os.environ[\"HF_DATASETS_CACHE\"] = str(DATA_DIR / \"datasets_cache\")\n",
        "\n",
        "# Device\n",
        "DEVICE = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
        "GPU_MEMORY_GB = 16  # T4\n",
        "\n",
        "# Tokens\n",
        "HF_TOKEN = os.getenv(\"HF_TOKEN\")\n",
        "\n",
        "# Training Configuration\n",
        "class TrainingConfig:\n",
        "    # LoRA parameters\n",
        "    LORA_R = 16\n",
        "    LORA_ALPHA = 32\n",
        "    LORA_DROPOUT = 0.05\n",
        "    LORA_TARGET_MODULES = [\n",
        "        \"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n",
        "        \"gate_proj\", \"up_proj\", \"down_proj\"\n",
        "    ]\n",
        "    \n",
        "    # Training hyperparameters\n",
        "    LEARNING_RATE = 2e-4\n",
        "    WEIGHT_DECAY = 0.01\n",
        "    MAX_GRAD_NORM = 1.0\n",
        "    MAX_SEQ_LENGTH = 512\n",
        "    BATCH_SIZE = 1\n",
        "    \n",
        "    # Distillation\n",
        "    DISTILLATION_ALPHA = 0.5\n",
        "    TEMPERATURE = 2.0\n",
        "\n",
        "class LogConfig:\n",
        "    LOG_STEPS = 10\n",
        "    SAVE_STEPS = 100\n",
        "\n",
        "class SpaceConfig:\n",
        "    MAX_TOTAL_USAGE = 50\n",
        "    CLEANUP_THRESHOLD = 80\n",
        "    MAX_CHECKPOINTS = 3\n",
        "\n",
        "print(f\"‚úì Config loaded\")\n",
        "print(f\"  Device: {DEVICE}\")\n",
        "print(f\"  Project: {PROJECT_ROOT}\")"
      ]
    },
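    {
      "cell_type": "markdown",
      "metadata": {
        "id": "distillation_loss_note"
      },
      "source": [
        "`DISTILLATION_ALPHA` and `TEMPERATURE` in the config above are the usual knobs of a knowledge-distillation loss. The cell below is a minimal sketch of how such a loss is commonly formed. It is not the code in `ml/training/stage2_context_teachers.py`, which this notebook does not show. The function name `distillation_loss`, the choice that alpha weights the soft-target term, and the assumption that student and teacher logits share one vocabulary are all illustrative."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "distillation_loss_sketch"
      },
      "outputs": [],
      "source": [
        "import torch\n",
        "import torch.nn.functional as F\n",
        "\n",
        "def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=2.0):\n",
        "    \"\"\"Hypothetical helper: blend hard-label and soft-label losses.\"\"\"\n",
        "    # Hard-label term: ordinary cross-entropy against the reference tokens\n",
        "    ce = F.cross_entropy(student_logits, labels)\n",
        "    # Soft-label term: KL divergence between temperature-softened distributions,\n",
        "    # scaled by T^2 so its gradients stay comparable to the hard-label term\n",
        "    kd = F.kl_div(\n",
        "        F.log_softmax(student_logits / temperature, dim=-1),\n",
        "        F.softmax(teacher_logits / temperature, dim=-1),\n",
        "        reduction=\"batchmean\",\n",
        "    ) * temperature ** 2\n",
        "    # Assumption: alpha weights the teacher (soft) term\n",
        "    return alpha * kd + (1 - alpha) * ce\n",
        "\n",
        "# Toy example: 4 token positions over a 10-token vocabulary\n",
        "student = torch.randn(4, 10)\n",
        "teacher = torch.randn(4, 10)\n",
        "labels = torch.randint(0, 10, (4,))\n",
        "print(distillation_loss(student, teacher, labels, alpha=0.5, temperature=2.0))"
      ]
    },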
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "step6"
      },
      "source": [
        "## 🚀 Step 6: Start Training!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "train"
      },
      "outputs": [],
      "source": [
        "# Run Stage 2 training\n",
        "!python ml/training/stage2_context_teachers.py"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "monitor"
      },
      "source": [
        "## 📊 Monitor Progress"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "check_gpu_usage"
      },
      "outputs": [],
      "source": [
        "# Check GPU usage\n",
        "!nvidia-smi"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "check_checkpoints"
      },
      "outputs": [],
      "source": [
        "# View checkpoints\n",
        "!ls -lh ml/checkpoints/stage2_context_teachers/"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "check_disk"
      },
      "outputs": [],
      "source": [
        "# Check disk usage\n",
        "!df -h | grep drive"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "download"
      },
      "source": [
        "## 💾 Download Final Model (Optional)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "download_model"
      },
      "outputs": [],
      "source": [
        "# Archive the final model as a gzipped tarball and download it\n",
        "!cd ml/checkpoints/stage2_context_teachers && tar -czf final_model.tar.gz final_model/\n",
        "\n",
        "from google.colab import files\n",
        "files.download('ml/checkpoints/stage2_context_teachers/final_model.tar.gz')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "update"
      },
      "source": [
        "## 🔄 Update Code (If Needed)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "git_pull"
      },
      "outputs": [],
      "source": [
        "# Pull latest changes from GitHub\n",
        "%cd /content/drive/MyDrive/sarika-ai\n",
        "!git pull origin main"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "troubleshoot"
      },
      "source": [
        "## 🛠️ Troubleshooting\n",
        "\n",
        "### Out of Memory?\n",
        "```python\n",
        "# Reduce batch size in config\n",
        "# Edit ml/config.py: BATCH_SIZE = 1\n",
        "```\n",
        "\n",
        "### Session Timeout?\n",
        "```python\n",
        "# Training resumes from last checkpoint automatically\n",
        "# Just re-run the training cell\n",
        "```\n",
        "\n",
        "### Model Download Failed?\n",
        "```python\n",
        "# Check HuggingFace token\n",
        "# Make sure you accepted Llama license: https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct\n",
        "```"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "gpuType": "T4",
      "provenance": [],
      "machine_shape": "hm"
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
