# 🚧 AU-Ggregates AI Server — Google Colab Deployment

**Pipeline:** Mistral-7B-Instruct-v0.2 (4-bit) → T5-text-to-SQL → Supabase

## Before you start:
1. **Runtime → Change runtime type → T4 GPU** (required!)
2. Get your **ngrok auth token** (free): https://dashboard.ngrok.com/get-started/your-authtoken
3. Get your **HuggingFace token** (free): https://huggingface.co/settings/tokens
4. Have your **Supabase URL + anon key** ready

## Steps:
Run each cell in order (Shift+Enter). The whole setup takes ~5 minutes.

| Cell | What it does | Time |
|------|-------------|------|
| 1 | Install Python packages | ~2 min |
| 2 | Set your secrets/keys | instant |
| 3 | Clone your GitHub repo | ~10 sec |
| 4 | Verify T4 GPU is active | instant |
| 5 | Login to HuggingFace | instant |
| 6 | Start server + ngrok tunnel | ~2 min (first load downloads the full Mistral weights, ~14GB, then quantizes to 4-bit) |
| 7 | Test the AI with a query | ~30 sec |
| 8 | Health check | instant |

In [None]:
# ============================================================
# CELL 1: Install all dependencies
# ============================================================
# Core web framework
# Quote the extras so the shell never treats [standard] as a glob pattern
!pip install -q fastapi "uvicorn[standard]" pydantic python-multipart

# Database
!pip install -q "supabase>=2.0.0,<2.28.0" psycopg2-binary

# AI Models (Mistral 7B + T5)
!pip install -q torch transformers accelerate bitsandbytes sentencepiece

# HuggingFace login
!pip install -q huggingface_hub

# Utilities
!pip install -q python-dotenv requests loguru sentry-sdk python-json-logger
!pip install -q sqlparse apscheduler

# ngrok for public URL
!pip install -q pyngrok

print('\n\u2705 All dependencies installed!')

In [None]:
# ============================================================
# CELL 2: Set your secrets
# ============================================================
# Option A: Use Colab Secrets (recommended - click the key icon in left sidebar)
# Option B: Paste values directly below

import os

# --- Try Colab Secrets first, fall back to manual values ---
try:
    from google.colab import userdata
    SUPABASE_URL    = userdata.get('SUPABASE_URL')
    SUPABASE_KEY    = userdata.get('SUPABASE_KEY')
    NGROK_TOKEN     = userdata.get('NGROK_TOKEN')
    HF_TOKEN        = userdata.get('HF_TOKEN')
    GITHUB_TOKEN    = userdata.get('GITHUB_TOKEN')
    print('\u2705 Loaded secrets from Colab Secrets')
except Exception:
    # --- Option B: Paste your values here ---
    # Never commit real credentials; the asserts below reject the 'YOUR...' placeholders
    SUPABASE_URL    = 'YOUR_SUPABASE_URL'
    SUPABASE_KEY    = 'YOUR_SUPABASE_ANON_KEY'
    NGROK_TOKEN     = 'YOUR_NGROK_TOKEN'
    HF_TOKEN        = 'YOUR_HF_TOKEN'
    GITHUB_TOKEN    = 'YOUR_GITHUB_TOKEN'
    print('\u2705 Using manual secret values (make sure you filled them in!)')

# --- Set environment variables for the app ---
os.environ['SUPABASE_URL']         = SUPABASE_URL
os.environ['SUPABASE_KEY']         = SUPABASE_KEY
os.environ['MISTRAL_MODEL']        = 'mistralai/Mistral-7B-Instruct-v0.2'
os.environ['MISTRAL_QUANTIZATION'] = '4bit'
os.environ['T5_MODEL_PATH']        = 'cssupport/t5-small-awesome-text-to-sql'
os.environ['ALLOWED_TABLES']       = 'ai_documents,Project,conversations'
os.environ['API_PORT']             = '8000'
os.environ['API_HOST']             = '0.0.0.0'
os.environ['ENVIRONMENT']          = 'production'
os.environ['CORS_ALLOW_ALL']       = 'true'  # Allow ngrok URLs

# Validate
assert SUPABASE_URL and not SUPABASE_URL.startswith('YOUR'), '\u274c Fill in SUPABASE_URL!'
assert SUPABASE_KEY and not SUPABASE_KEY.startswith('YOUR'), '\u274c Fill in SUPABASE_KEY!'
assert NGROK_TOKEN and not NGROK_TOKEN.startswith('YOUR'),   '\u274c Fill in NGROK_TOKEN!'
assert HF_TOKEN and not HF_TOKEN.startswith('YOUR'),         '\u274c Fill in HF_TOKEN!'
assert GITHUB_TOKEN and not GITHUB_TOKEN.startswith('YOUR'), '\u274c Fill in GITHUB_TOKEN!'

print(f'Supabase URL: {SUPABASE_URL[:40]}...')
print('\u2705 All secrets validated!')

In [None]:
# ============================================================
# CELL 3: Clone your GitHub repo
# ============================================================
import os

# Build authenticated clone URL (private repo needs token)
GITHUB_REPO = f'https://{GITHUB_TOKEN}@github.com/espinajc2004-max/auggregates-ai-data-look-up.git'

REPO_DIR = '/content/auggregates-ai-data-look-up'

if not os.path.exists(REPO_DIR):
    !git clone {GITHUB_REPO} {REPO_DIR}
else:
    !cd {REPO_DIR} && git pull
    print(f'\u2705 Updated existing repo at {REPO_DIR}')

os.chdir(REPO_DIR)
print(f'Working directory: {os.getcwd()}')

# Verify key files exist
for f in ['app/main.py', 'app/services/mistral_service.py', 'app/config/prompt_templates.py']:
    assert os.path.exists(f), f'\u274c Missing file: {f}'
print('\u2705 All key files present!')

In [None]:
# ============================================================
# CELL 4: Verify GPU
# ============================================================
import torch

print(f'PyTorch version: {torch.__version__}')
print(f'CUDA available:  {torch.cuda.is_available()}')

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    # The device-properties attribute is total_memory (bytes), not total_mem
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f'GPU:             {gpu_name}')
    print(f'VRAM:            {vram_gb:.1f} GB')
    print('\u2705 GPU ready! Mistral 4-bit needs ~5GB VRAM, T4 has 15GB. Good to go.')
else:
    print('\u274c No GPU detected!')
    print('Go to: Runtime \u2192 Change runtime type \u2192 T4 GPU')
    print('Then re-run all cells from the top.')

In [None]:
# ============================================================
# CELL 5: Login to HuggingFace
# ============================================================
# Needed to download Mistral-7B-Instruct-v0.2 from HuggingFace
# This model is NOT gated: no license agreement needed!

from huggingface_hub import login
login(token=HF_TOKEN)
print('\u2705 HuggingFace login successful!')

In [None]:
# ============================================================
# CELL 6: Start FastAPI server + ngrok tunnel
# ============================================================
import subprocess
import threading
import time
import requests as req
from pyngrok import ngrok, conf

# Make sure we're in the repo directory
import os
os.chdir('/content/auggregates-ai-data-look-up')

# Configure ngrok
conf.get_default().auth_token = NGROK_TOKEN

# Start FastAPI in a background thread
def run_server():
    subprocess.run([
        'python', '-m', 'uvicorn', 'app.main:app',
        '--host', '0.0.0.0',
        '--port', '8000',
        '--workers', '1',
        '--timeout-keep-alive', '120'
    ])

server_thread = threading.Thread(target=run_server, daemon=True)
server_thread.start()

# Wait for server to be ready (with retry)
print('\u23f3 Starting FastAPI server...')
print('   (First run downloads the Mistral weights ~14GB + T5 ~250MB, may take a few minutes)')

server_ready = False
for i in range(60):  # Wait up to 5 minutes
    try:
        r = req.get('http://localhost:8000/api/health', timeout=3)
        if r.status_code == 200:
            server_ready = True
            break
    except req.exceptions.RequestException:
        # Server not accepting connections yet; keep polling
        pass
    if i % 6 == 0 and i > 0:
        print(f'   Still loading... ({i*5}s elapsed)')
    time.sleep(5)

if not server_ready:
    print('\u26a0\ufe0f Server not responding yet, but starting ngrok anyway...')
    print('   The server may still be loading models. Try Cell 8 (health check) in a minute.')
else:
    print('\u2705 Server is running on port 8000')

# Open ngrok tunnel
tunnel = ngrok.connect(8000)
public_url = tunnel.public_url

print()
print('=' * 60)
# Emoji outside the BMP need \U escapes; '\ud83d\ude80'-style surrogate pairs cannot be printed
print('\U0001F680 SERVER IS LIVE!')
print(f'\U0001F310 Public URL:     {public_url}')
print(f'\U0001F4AC Chat endpoint:  {public_url}/api/chat/hybrid')
print(f'\U0001F3E5 Health check:   {public_url}/api/health')
print('=' * 60)
print()
print('\u261d\ufe0f Copy the Public URL above and use it in your frontend!')
print('   Replace your localhost:8000 with this URL.')
print()
print('\u26a0\ufe0f  This URL changes every time you restart. For a stable URL,')
print('   upgrade to ngrok paid plan or use a custom domain.')

In [None]:
# ============================================================
# CELL 7: Test the AI with a sample query
# ============================================================
import requests as req
import json

print('\U0001F9EA Sending test query: "pakita lahat ng expenses"')
print('   (First query may be slow ~30-60s while models warm up)\n')

try:
    response = req.post(
        f'{public_url}/api/chat/hybrid',
        json={'query': 'pakita lahat ng expenses'},
        timeout=180  # 3 min timeout for first query
    )

    print(f'Status: {response.status_code}')
    data = response.json()
    print(f'Pipeline: {data.get("metadata", {}).get("pipeline", "unknown")}')
    print(f'Message: {data.get("message", "")}')
    print(f'Results: {data.get("metadata", {}).get("row_count", 0)} rows')
    
    if data.get('metadata', {}).get('pipeline') == 'mistral+t5':
        print('\n\u2705 Full AI pipeline is working! (Mistral + T5)')
    elif data.get('metadata', {}).get('pipeline') == 'rule-based':
        print('\n\u26a0\ufe0f Using rule-based fallback (Mistral may still be loading)')
        print('   Wait 1-2 minutes and try again.')
    
    print(f'\nFull response:\n{json.dumps(data, indent=2, ensure_ascii=False)}')

except req.exceptions.Timeout:
    print('\u23f0 Request timed out. Models may still be loading.')
    print('   Wait 2 minutes and re-run this cell.')
except Exception as e:
    print(f'\u274c Error: {e}')
    print('   Make sure Cell 6 completed successfully.')

In [None]:
# ============================================================
# CELL 8: Health check
# ============================================================
import requests as req

try:
    r = req.get(f'{public_url}/api/health', timeout=10)
    print(f'Status: {r.status_code}')
    print(r.json())
    print('\n\u2705 Server is healthy!')
except Exception as e:
    print(f'\u274c Health check failed: {e}')

---
## 📝 Notes

**Keeping the server alive:**
- Colab disconnects after ~90 min of inactivity (free tier)
- Keep this tab open and active to prevent disconnection
- If disconnected, just re-run all cells from Cell 1

**Connecting your frontend:**
- Use the ngrok Public URL from Cell 6 as your API base URL
- Example: `fetch('https://xxxx.ngrok-free.app/api/chat/hybrid', { method: 'POST', ... })`
- The URL changes every restart — update your frontend config each time
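
As a rough Python counterpart to the `fetch` example, the sketch below builds the request for the chat endpoint without sending it. The `/api/chat/hybrid` path and the `query` field come from Cell 7; the function name `build_chat_request` and the placeholder URL are illustrative assumptions, and the base URL must be replaced with the one printed by Cell 6.

```python
import json
import urllib.request


def build_chat_request(base_url: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the hybrid chat endpoint."""
    # Strip a trailing slash so the joined URL never contains '//api'.
    url = base_url.rstrip('/') + '/api/chat/hybrid'
    body = json.dumps({'query': query}).encode('utf-8')
    return urllib.request.Request(
        url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


# Hypothetical placeholder; use the Public URL printed by Cell 6.
request = build_chat_request('https://xxxx.ngrok-free.app/', 'pakita lahat ng expenses')
print(request.full_url)
```

Sending it is then a matter of `urllib.request.urlopen(request, timeout=180)`; the generous timeout mirrors the one used in Cell 7 for the first query.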

**Performance:**
- First query after startup: ~30-60 seconds (model warmup)
- Subsequent queries: ~5-15 seconds
- T4 GPU has 15GB VRAM — Mistral 4-bit uses ~5GB, plenty of headroom
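
The ~5GB figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates weight storage only; the parameter count of roughly 7.24 billion is an assumption, and the real footprint is higher because of quantization metadata, activations, and the KV cache.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1024**3


# Assumed parameter count for Mistral-7B (about 7.24 billion).
MISTRAL_PARAMS = 7.24e9

four_bit = weight_memory_gib(MISTRAL_PARAMS, 4)
half_precision = weight_memory_gib(MISTRAL_PARAMS, 16)
print(f'4-bit weights: ~{four_bit:.1f} GiB')
print(f'fp16 weights:  ~{half_precision:.1f} GiB')
```

The gap between the 4-bit estimate and the observed ~5GB is runtime overhead, while the fp16 estimate suggests the unquantized model would leave very little headroom on a T4.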

**Troubleshooting:**
- `CUDA out of memory` → Runtime → Restart runtime, then re-run all cells
- `ngrok error` → Check your auth token at https://dashboard.ngrok.com
- `rule-based fallback` → Models still loading, wait 1-2 min and retry
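
The "wait and retry" advice for the rule-based fallback can be automated. The following is a minimal sketch and not part of the repo: `wait_for_full_pipeline` and its parameters are hypothetical names, and it assumes the response carries `metadata.pipeline` with the values `mistral+t5` and `rule-based`, as checked in Cell 7. The request function is passed in so the helper can be exercised without a live server.

```python
import time
from typing import Callable, Optional


def wait_for_full_pipeline(
    send_query: Callable[[], dict],
    attempts: int = 5,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call send_query until the response reports the 'mistral+t5' pipeline.

    Returns the first matching response, or None if every attempt
    came back from another pipeline (e.g. the rule-based fallback).
    """
    for attempt in range(attempts):
        data = send_query()
        if data.get('metadata', {}).get('pipeline') == 'mistral+t5':
            return data
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return None


# Stand-in for a real request: falls back twice, then succeeds.
_fake_responses = iter([
    {'metadata': {'pipeline': 'rule-based'}},
    {'metadata': {'pipeline': 'rule-based'}},
    {'metadata': {'pipeline': 'mistral+t5'}},
])
result = wait_for_full_pipeline(lambda: next(_fake_responses), sleep=lambda s: None)
print(result)
```

In the notebook, `send_query` could be something like `lambda: req.post(f'{public_url}/api/chat/hybrid', json={'query': 'pakita lahat ng expenses'}, timeout=180).json()`, reusing the names from Cell 7.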