# 🎯 ResuMate: Ollama LLM Server on Google Colab

This notebook runs **Ollama with Llama 3.2** on Google Colab's free GPU and exposes it through **ngrok** so your local ResuMate project can use it.

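### Steps:
1. **Run Cell 1**: install Ollama & ngrok
2. **Run Cell 2**: start Ollama & pull the model
3. **Run Cell 3**: expose Ollama via ngrok (copy the URL!)
4. **Paste the ngrok URL** into your local `model-server/.env` file as `OLLAMA_URL`
5. *(Optional)* **Run Cell 4**: test the connection
6. **Run Cell 5**: keep the session alive while you use ResuMate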

### ⚠️ Before Starting:
- Go to **Runtime → Change runtime type → T4 GPU**
- Get a free ngrok auth token from https://dashboard.ngrok.com/signup
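After switching the runtime, you can confirm from Python that a GPU is actually attached before installing anything. This is a minimal sketch that assumes the NVIDIA tool `nvidia-smi` is on `PATH`, as it normally is on Colab GPU runtimes; on a CPU-only runtime it simply reports that no GPU was found. `gpu_available` is an illustrative helper name.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi exists and lists at least one GPU."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False  # no NVIDIA tooling: most likely a CPU-only runtime
    result = subprocess.run(
        [smi, "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    # A non-zero exit code or empty output means no usable GPU was found
    return result.returncode == 0 and bool(result.stdout.strip())


if __name__ == "__main__":
    print("✅ GPU detected" if gpu_available() else "⚠️ No GPU: switch to a T4 runtime")
```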

## Cell 1: Install Ollama & Dependencies

In [None]:
# Install zstd (required for Ollama extraction)
!apt-get update -y
!apt-get install -y zstd curl ca-certificates

# Install Ollama on Colab
!curl -fsSL https://ollama.com/install.sh | sh

# Install pyngrok (a Python wrapper that downloads and manages the ngrok agent)
!pip install pyngrok -q

print("\n✅ Ollama and ngrok installed successfully!")

## Cell 2: Start Ollama Server & Pull Model

This starts the Ollama server in the background and downloads the `llama3.2:3b` model (Llama 3.2, 3B parameters).

**The first run takes about 2-3 minutes** to download the model (~2 GB).
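Cell 2 pulls the model with the `ollama pull` CLI. If you would rather pull from Python and print your own progress, Ollama's REST API also has a `/api/pull` endpoint that streams one JSON status object per line. The sketch below assumes the server is already running on the default `http://localhost:11434` and a recent Ollama version that accepts the `model` field; `format_progress` and `pull_model` are illustrative helper names, not part of any library.

```python
import json
import urllib.request


def format_progress(event: dict) -> str:
    """Turn one /api/pull status object into a short progress line."""
    status = event.get("status", "")
    total = event.get("total")
    completed = event.get("completed")
    if total and completed is not None:
        return f"{status}: {100 * completed / total:.0f}%"
    return status


def pull_model(model: str, base_url: str = "http://localhost:11434") -> None:
    """Pull a model through the REST API, printing streamed progress."""
    request = urllib.request.Request(
        f"{base_url}/api/pull",
        data=json.dumps({"model": model}).encode(),  # data= makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        for raw_line in response:  # newline-delimited JSON
            if not raw_line.strip():
                continue
            event = json.loads(raw_line)
            if "error" in event:
                raise RuntimeError(event["error"])
            print(format_progress(event))
```

With the server running, `pull_model("llama3.2:3b")` would do the same job as the `ollama pull` call in Cell 2.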

In [None]:
import os
import shutil
import subprocess
import time
import urllib.request

# Find ollama binary (handles PATH issues on Colab)
ollama_path = shutil.which("ollama") or "/usr/local/bin/ollama"

if not os.path.isfile(ollama_path):
    raise FileNotFoundError(
        "Ollama binary not found! Make sure you ran Cell 1 first.\n"
        "If you did, try: Runtime → Restart session, then re-run Cell 1."
    )

print(f"✅ Found Ollama at: {ollama_path}")

# Start Ollama server in background (listens on http://127.0.0.1:11434 by default)
process = subprocess.Popen(
    [ollama_path, "serve"],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL
)
print("⏳ Starting Ollama server...")

# Poll until the server answers instead of sleeping a fixed 5 seconds
for _ in range(30):
    try:
        urllib.request.urlopen("http://127.0.0.1:11434", timeout=2)
        break  # server is up (or an earlier instance is already serving)
    except OSError:
        if process.poll() is not None:
            raise RuntimeError("Ollama server exited before it became ready.")
        time.sleep(1)
else:
    raise TimeoutError("Ollama server did not respond within ~30 seconds.")

# Pull the model (change to llama3.1:8b if you want better quality)
MODEL = "llama3.2:3b"  # Options: llama3.2:3b, llama3.2:1b, llama3.1:8b, mistral:7b
print(f"\n📥 Pulling {MODEL}... (this takes 2-3 min on first run)")
subprocess.run([ollama_path, "pull", MODEL], check=True)

# Verify
print("\n📋 Installed models:")
subprocess.run([ollama_path, "list"], check=True)
print(f"\n✅ {MODEL} is ready!")

## Cell 3: Expose Ollama via ngrok

### 🔑 Get your ngrok auth token:
1. Go to https://dashboard.ngrok.com/signup (free account)
2. Copy your auth token from https://dashboard.ngrok.com/get-started/your-authtoken
3. Paste it below where it says `YOUR_NGROK_AUTH_TOKEN`
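Instead of pasting the token into the notebook, where it is easy to share by accident, you can store it in Colab's **Secrets** panel (the key icon in the left sidebar) and read it with `google.colab.userdata`. The sketch below assumes a secret named `NGROK_AUTH_TOKEN` (a name chosen here for illustration) and falls back to an environment variable of the same name when not running on Colab; `get_ngrok_token` is a hypothetical helper.

```python
import os


def get_ngrok_token(name: str = "NGROK_AUTH_TOKEN") -> str:
    """Read the ngrok token from Colab Secrets, else from an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Colab
        token = userdata.get(name)
    except Exception:
        # Not on Colab, secret missing, or notebook access to it not granted
        token = None
    token = token or os.environ.get(name)
    if not token:
        raise RuntimeError(f"No ngrok token found in Colab Secrets or ${name}")
    return token
```

In Cell 3 you would then call `ngrok.set_auth_token(get_ngrok_token())` instead of hard-coding the string.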

### After running this cell:
- Copy the **Public URL** printed below
- Paste it in your local `model-server/.env` file as `OLLAMA_URL=<paste_here>`
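On the local side, the ResuMate model server has to read `OLLAMA_URL` and call Ollama through the tunnel. This notebook doesn't show how ResuMate does that, so the sketch below is a hypothetical stand-in: `load_env_file` parses simple `KEY=VALUE` lines with the standard library (a package such as python-dotenv is the usual choice in practice), and `build_generate_request` prepares a non-streaming `/api/generate` call like the one in Cell 4.

```python
import json
import urllib.request
from pathlib import Path


def load_env_file(path: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and # comments (minimal .env subset)."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to <OLLAMA_URL>/api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",  # avoid a double slash
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )


# Usage (requires a live tunnel):
# env = load_env_file("model-server/.env")
# req = build_generate_request(env["OLLAMA_URL"], "llama3.2:3b", "Hello!")
# with urllib.request.urlopen(req, timeout=120) as resp:
#     print(json.loads(resp.read())["response"])
```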

In [None]:
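from pyngrok import ngrok

# ⬇️ PASTE YOUR NGROK AUTH TOKEN HERE ⬇️
NGROK_AUTH_TOKEN = "YOUR_NGROK_AUTH_TOKEN"  # <-- Replace this!

if NGROK_AUTH_TOKEN == "YOUR_NGROK_AUTH_TOKEN":
    raise ValueError("Replace YOUR_NGROK_AUTH_TOKEN with your real ngrok auth token first.")

# Set auth token
ngrok.set_auth_token(NGROK_AUTH_TOKEN)

# Stop any running ngrok process (this closes old tunnels)
ngrok.kill()

# Create tunnel to Ollama (port 11434).
# Ollama's FAQ recommends rewriting the Host header for ngrok; without it,
# an Ollama bound to localhost may reject tunneled requests with HTTP 403.
tunnel = ngrok.connect(11434, host_header="localhost:11434")
public_url = tunnel.public_url  # connect() returns an NgrokTunnel object, not a plain URL

print("="*60)
print("🚀 OLLAMA IS LIVE ON GOOGLE COLAB!")
print("="*60)
print(f"\n🔗 Your Public URL: {public_url}")
print("\n📋 Copy this and paste it into your local .env file:")
print(f"   OLLAMA_URL={public_url}")
print("\n⚠️  Keep this Colab tab OPEN: it disconnects after ~90 min of inactivity")
print("="*60)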

## Cell 4: Test the Connection (Optional)

Run this to verify Ollama is working correctly.
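You can run the same kind of check from your local machine, through the tunnel, before starting ResuMate. The sketch below assumes the `/api/tags` response shape that Cell 4 relies on (a JSON object with a `models` list whose entries carry a `name`); `model_names` and `has_model` are illustrative helper names.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", [])]


def has_model(base_url: str, model: str, timeout: float = 10) -> bool:
    """Return True if the Ollama server at base_url lists `model`."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.loads(resp.read())
    return model in model_names(payload)


# Usage (requires a live tunnel):
# print(has_model("https://<your-ngrok-url>", "llama3.2:3b"))
```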

In [None]:
import time

import requests

# Reuse the model chosen in Cell 2 if it is defined in this session
MODEL = globals().get("MODEL", "llama3.2:3b")

# Test locally first
print("🔍 Testing Ollama locally...")
try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    resp.raise_for_status()
    models = [m["name"] for m in resp.json().get("models", [])]
    print(f"✅ Ollama running! Models: {models}")
except Exception as e:
    print(f"❌ Ollama not responding: {e}")

# Test a quick generation
print("\n🧪 Testing model generation...")
try:
    start = time.time()
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": MODEL,
            "prompt": "Say 'Hello ResuMate!' in one line.",
            "stream": False,
            "options": {"temperature": 0.1, "num_predict": 50}
        },
        timeout=60
    )
    resp.raise_for_status()
    elapsed = round(time.time() - start, 2)
    result = resp.json().get("response", "")
    print(f"✅ Model response ({elapsed}s): {result.strip()}")
except Exception as e:
    print(f"❌ Generation failed: {e}")

# Test via ngrok (if tunnel is active)
print("\n🌐 Testing via ngrok tunnel...")
try:
    from pyngrok import ngrok
    tunnels = ngrok.get_tunnels()
    if tunnels:
        tunnel_url = tunnels[0].public_url
        resp = requests.get(f"{tunnel_url}/api/tags", timeout=10)
        resp.raise_for_status()  # a 403 can mean Ollama rejected the tunnel's Host header
        print(f"✅ ngrok tunnel working! URL: {tunnel_url}")
    else:
        print("⚠️  No ngrok tunnel found. Run Cell 3 first.")
except Exception as e:
    print(f"❌ ngrok test failed: {e}")

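## Cell 5: Keep Alive (run this to prevent disconnects)

Google Colab disconnects after ~90 minutes of inactivity. Running this cell keeps the notebook busy so the session is less likely to be treated as idle; it does not extend Colab's maximum session lifetime.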

**Press the Stop button when you're done using ResuMate.**
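The keep-alive loop in this cell only sleeps. A variant that also checks the Ollama server once a minute tells you early if the server has died. This is a sketch: `ollama_is_up` and `keep_alive` are illustrative helpers, and the check assumes Ollama's default address `http://127.0.0.1:11434`, whose root path returns HTTP 200 when the server is running.

```python
import time
import urllib.request
from typing import Optional


def ollama_is_up(base_url: str = "http://127.0.0.1:11434", timeout: float = 3) -> bool:
    """Return True if the Ollama server answers HTTP 200 at its root path."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, or HTTP error status
        return False


def keep_alive(interval_s: float = 60, max_checks: Optional[int] = None) -> None:
    """Check the server every interval_s seconds; run forever unless max_checks is set."""
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        state = "✅ up" if ollama_is_up() else "❌ down (re-run Cell 2)"
        print(f"🔄 check {checks}: Ollama {state}")
        time.sleep(interval_s)
```

In Colab, `keep_alive()` runs until you press Stop, just like the original loop.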

In [None]:
import time
from IPython.display import clear_output

print("🔄 Keep-alive running... (press Stop to end)")
print("   This helps prevent Colab from disconnecting.\n")

counter = 0
while True:
    counter += 1
    time.sleep(60)  # Wait one minute between status updates (no network ping)
    clear_output(wait=True)
    print(f"🔄 Keep-alive running... ({counter} min elapsed)")
    print("   Session is active. Press the Stop button to end.")