# Ollama AI Server for AIoT E-Learning Platform

This notebook sets up an Ollama server on Google Colab (free T4 GPU) and exposes it through an ngrok tunnel.

**Models:**
- `deepseek-coder:1.3b` - Fast code autocomplete (FIM)
- `codellama:13b-instruct` - Chat, code generation, review

**Usage:**
1. Run all cells in order
2. Copy the Ngrok URL printed at the end
3. Add it to your `.env.local` as `OLLAMA_BASE_URL`

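**Important:** On the free tier, Colab sessions end after at most ~12 hours, and often sooner if idle. When a session ends, re-run all cells and update `OLLAMA_BASE_URL` if the ngrok URL has changed.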

**Want to run it 24/7 for free?** See [RUN-OLLAMA-24-7-FREE.md](./RUN-OLLAMA-24-7-FREE.md) for running Ollama on your own PC or on Oracle Cloud Free Tier.
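On the application side, the values printed in Step 5 end up in `.env.local`. Below is a minimal Python sketch of reading them. It assumes a plain `KEY=VALUE` file with `#` comments and optional quotes. Your app most likely uses its framework's own env loader instead. The helper names and the fallback defaults are hypothetical.

```python
# Hypothetical helpers: a simplified .env.local reader, not any framework's real loader.
def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, '#' comments, and lines without '='."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace, then optional double or single quotes
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def load_ollama_config(text: str) -> dict[str, str]:
    """Merge parsed values over assumed defaults (the model names used in this notebook)."""
    env = parse_env_text(text)
    return {
        "base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434").rstrip("/"),
        "completion_model": env.get("OLLAMA_COMPLETION_MODEL", "deepseek-coder:1.3b"),
        "chat_model": env.get("OLLAMA_CHAT_MODEL", "codellama:13b-instruct"),
    }

# Example file contents (placeholder URL)
sample = """
# Ollama via ngrok
OLLAMA_BASE_URL=https://example.ngrok-free.app/
OLLAMA_CHAT_MODEL="codellama:13b-instruct"
"""
print(load_ollama_config(sample))
```

Stripping the trailing slash from the base URL lets callers append paths like `/api/tags` without producing `//`.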

## Step 1: Check GPU Availability

In [None]:
# Verify GPU is available
!nvidia-smi
print("\n" + "="*60)
print("GPU check complete. Make sure a T4 GPU is assigned.")
print("If no GPU is shown, go to Runtime > Change runtime type > T4 GPU")
print("="*60)
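To check the GPU from Python rather than reading the `nvidia-smi` table by eye, you can query only the GPU name and total memory with `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits`. This is a sketch. The parsing helper is hypothetical, and the function returns an empty list when `nvidia-smi` is missing or fails, which is what you would see without a GPU runtime.

```python
import shutil
import subprocess

def parse_gpu_csv(output: str) -> list[tuple[str, int]]:
    """Parse 'name, memory' lines (memory in MiB, no units) from nvidia-smi CSV output."""
    gpus = []
    for line in output.strip().splitlines():
        # rsplit on the last comma so only the final field is treated as memory
        name, mem = (part.strip() for part in line.rsplit(",", 1))
        gpus.append((name, int(mem)))
    return gpus

def query_gpus() -> list[tuple[str, int]]:
    """Return [(gpu_name, total_MiB), ...], or [] if nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)

gpus = query_gpus()
print(gpus or "No GPU detected: Runtime > Change runtime type > T4 GPU")
```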

## Step 2: Install Ollama

In [None]:
# Install Ollama
!curl -fsSL https://ollama.com/install.sh | sh
print("\n" + "="*60)
print("Ollama installed successfully!")
print("="*60)
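The success message in the install cell prints whether or not the script actually worked. The sketch below is a quick sanity check. It only confirms that an `ollama` executable is on `PATH` and shows whatever `ollama --version` reports. It does not verify GPU support. The helper name is hypothetical.

```python
import shutil
import subprocess
from typing import Optional

def find_ollama() -> Optional[str]:
    """Return the path of the `ollama` executable on PATH, or None if it is missing."""
    return shutil.which("ollama")

path = find_ollama()
if path is None:
    print("Ollama NOT found on PATH: re-run the install cell and check its output.")
else:
    # Show the version text; it may also warn that no server is running yet (Step 3 starts it)
    out = subprocess.run([path, "--version"], capture_output=True, text=True)
    print(path)
    print((out.stdout + out.stderr).strip())
```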

## Step 3: Start Ollama Server

In [None]:
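import os
import subprocess
import time
import requests

# Start Ollama server in background. Output goes to a log file, because an
# unread subprocess.PIPE can fill up and block the server once it logs enough.
with open("ollama.log", "w") as ollama_log:
    ollama_process = subprocess.Popen(
        ["ollama", "serve"],
        stdout=ollama_log,
        stderr=subprocess.STDOUT,
        env={**os.environ, "OLLAMA_HOST": "0.0.0.0:11434"},  # bind all interfaces
    )

# Wait up to ~30s for the API to respond
print("Starting Ollama server...")
for i in range(30):
    try:
        r = requests.get("http://localhost:11434/api/tags", timeout=2)
        if r.status_code == 200:
            print(f"Ollama server is running! (took ~{i+1}s)")
            break
    except requests.exceptions.RequestException:
        pass  # server not accepting connections yet
    time.sleep(1)
else:
    print("WARNING: Ollama server may not have started properly.")
    # Reading the log file is safe; reading a live process's stderr pipe would block
    with open("ollama.log") as f:
        print("Last log output:\n", f.read()[-500:])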
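The readiness loop in Step 3 is one instance of a general "poll until ready or time out" pattern. Below is a reusable, standard-library-only sketch. The names `wait_until` and `ollama_ready` are hypothetical. It uses `time.monotonic` so that wall-clock changes don't move the deadline.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

def wait_until(check: Callable[[], bool], timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Call `check` until it returns True or `timeout` seconds elapse; return the outcome."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

def ollama_ready(base_url: str = "http://localhost:11434") -> bool:
    """True if GET /api/tags answers with HTTP 200; False on refused connections or errors."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# Usage in Colab after starting the server: wait_until(ollama_ready, timeout=30)
```

Because the predicate is passed in, the same helper can also wait on the ngrok tunnel or on a model finishing its first load.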

## Step 4: Pull AI Models

This downloads both models (about 8 GB in total) to the Colab VM. Expect roughly 2-5 minutes, depending on download speed.

In [None]:
# Pull deepseek-coder:1.3b for fast autocomplete
print("Pulling deepseek-coder:1.3b (autocomplete model)...")
print("This model is ~776MB, should take 1-2 minutes.")
print("="*60)
!ollama pull deepseek-coder:1.3b
print("\ndeepseek-coder:1.3b ready!")

In [None]:
# Pull codellama:13b-instruct for chat/generation
print("Pulling codellama:13b-instruct (chat model)...")
print("This model is ~7.4GB, should take 3-5 minutes.")
print("="*60)
!ollama pull codellama:13b-instruct
print("\ncodellama:13b-instruct ready!")
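You can also pull models over HTTP instead of with `!ollama pull`, which is useful when you want progress reporting in plain Python. This sketch assumes the request and response shape described in Ollama's API docs. You send `{"model": ...}` to `POST /api/pull` and receive newline-delimited JSON objects with a `status` field. While a layer is downloading, those objects also carry `total` and `completed` byte counts. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional

def progress_percent(event: dict) -> Optional[float]:
    """Percent downloaded for a progress event, or None for status-only events."""
    total, done = event.get("total"), event.get("completed")
    if not total or done is None:
        return None
    return 100.0 * done / total

def iter_events(lines: Iterable[bytes]) -> Iterator[dict]:
    """Decode newline-delimited JSON, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)

def pull_model(model: str, base_url: str = "http://localhost:11434") -> None:
    """Stream POST /api/pull, printing only when the status or the 10% bucket changes."""
    req = urllib.request.Request(
        f"{base_url}/api/pull",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    last = None
    with urllib.request.urlopen(req) as resp:
        for event in iter_events(resp):  # HTTP response iterates line by line
            if "error" in event:
                raise RuntimeError(event["error"])
            pct = progress_percent(event)
            key = (event.get("status"), None if pct is None else int(pct // 10))
            if key != last:  # throttle output; progress events can arrive very frequently
                suffix = "" if pct is None else f" {pct:.0f}%"
                print(f"{event.get('status', '')}{suffix}")
                last = key

# Usage in Colab (server from Step 3 running): pull_model("deepseek-coder:1.3b")
```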

In [None]:
# Verify models are available
print("\nInstalled models:")
print("="*60)
!ollama list

# Quick test
import requests
r = requests.get("http://localhost:11434/api/tags")
models = [m['name'] for m in r.json().get('models', [])]
print(f"\nAPI reports {len(models)} models: {models}")

required = ['deepseek-coder:1.3b', 'codellama:13b-instruct']
for model in required:
    found = model in models  # exact name match; substring matching could give false positives
    status = 'OK' if found else 'MISSING'
    print(f"  {model}: {status}")
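If you compare model names exactly rather than by substring, there is one wrinkle. A model pulled without a tag (for example `ollama pull llama3`) is listed with Ollama's default `:latest` tag. The sketch below normalizes bare names before comparing. `missing_models` and `normalize` are hypothetical helpers, and the model names in the example are only illustrations.

```python
def normalize(name: str) -> str:
    """Append Ollama's default ':latest' tag when a name has no explicit tag."""
    return name if ":" in name else f"{name}:latest"

def missing_models(required: list[str], available: list[str]) -> list[str]:
    """Required names absent from `available`, compared exactly after normalization."""
    have = {normalize(m) for m in available}
    return [m for m in required if normalize(m) not in have]

# Illustrative: a differently tagged variant does not satisfy the requirement,
# while a bare name matches its ':latest' listing.
print(missing_models(["deepseek-coder:1.3b", "llama3"],
                     ["deepseek-coder:1.3b-base-q4_0", "llama3:latest"]))
```

In the verification cell, this could be called as `missing_models(required, models)`.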

## Step 5: Setup Ngrok Tunnel

**First time setup:**
1. Sign up at https://ngrok.com (free)
2. Get your auth token from https://dashboard.ngrok.com/get-started/your-authtoken
3. Run the cells below and paste the token when prompted (the input is hidden)

In [None]:
# Install pyngrok
!pip install pyngrok -q
print("pyngrok installed!")

In [None]:
from pyngrok import ngrok
import getpass

# Set your Ngrok auth token
# Option 1: Paste directly (less secure)
# NGROK_AUTH_TOKEN = "your-token-here"

# Option 2: Input securely (recommended)
NGROK_AUTH_TOKEN = getpass.getpass("Enter your Ngrok auth token: ")

# Configure and authenticate
ngrok.set_auth_token(NGROK_AUTH_TOKEN)
print("Ngrok authenticated!")
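If you would rather not paste the token every session, one option is to read it from an environment variable first and fall back to `getpass` only when that variable is empty. This is a sketch. The variable name `NGROK_AUTH_TOKEN` and the helper name are assumptions. In Colab you could populate the variable from the Secrets panel via `google.colab.userdata.get`.

```python
import getpass
import os

def get_ngrok_token(env_var: str = "NGROK_AUTH_TOKEN") -> str:
    """Return the token from `env_var` if set and non-empty, otherwise prompt without echo."""
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    return getpass.getpass("Enter your Ngrok auth token: ").strip()

# Usage: ngrok.set_auth_token(get_ngrok_token())
```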

In [None]:
from pyngrok import ngrok
import requests

# Kill any existing tunnels
ngrok.kill()

# Create tunnel to Ollama server
public_url = ngrok.connect(11434, "http")
OLLAMA_URL = public_url.public_url

print("\n" + "="*60)
print("NGROK TUNNEL ACTIVE")
print("="*60)
print(f"\nPublic URL: {OLLAMA_URL}")
print(f"\nAdd this to your .env.local:")
print(f"\n  OLLAMA_BASE_URL={OLLAMA_URL}")
print(f"  OLLAMA_COMPLETION_MODEL=deepseek-coder:1.3b")
print(f"  OLLAMA_CHAT_MODEL=codellama:13b-instruct")
print("\n" + "="*60)

# Verify tunnel works
try:
    # ngrok's free tier may show a browser warning page; this header skips it
    r = requests.get(
        f"{OLLAMA_URL}/api/tags",
        headers={"ngrok-skip-browser-warning": "true"},
        timeout=10
    )
    r.raise_for_status()  # don't report OK on a 4xx/5xx response
    print(f"\nTunnel verification: HTTP {r.status_code} OK")
    models = [m['name'] for m in r.json().get('models', [])]
    print(f"Models accessible via tunnel: {models}")
except Exception as e:
    print(f"\nTunnel verification failed: {e}")
    print("The tunnel may still work - try accessing the URL in your browser.")
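Any client outside Colab reaches the same Ollama HTTP API through the public URL. Below is a standard-library-only sketch of a non-streaming `POST /api/generate` request. `base_url` stands for whatever `OLLAMA_URL` printed. The `ngrok-skip-browser-warning` header is included for the same reason as in the verification call, and it should be harmless where not needed. The helper names are hypothetical.

```python
import json
import urllib.request

def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST /api/generate request with streaming disabled."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "ngrok-skip-browser-warning": "true",
        },
        method="POST",
    )

def generate(base_url: str, model: str, prompt: str, timeout: float = 300) -> str:
    """Send the request and return the `response` text from Ollama's JSON reply."""
    req = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]

# Usage: print(generate(OLLAMA_URL, "codellama:13b-instruct", "Say hi in one word."))
```

The generous default timeout reflects that the first request to a model also has to load it onto the GPU.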

## Step 6: Quick Model Test

In [None]:
import requests

# Test deepseek-coder autocomplete (FIM)
print("Testing deepseek-coder:1.3b (autocomplete)...")
print("-"*40)

# FIM layout: <｜fim▁begin｜>prefix<｜fim▁hole｜>suffix<｜fim▁end｜>
fim_prompt = "<\uff5cfim\u2581begin\uff5c>def fibonacci(n):\n    if n <= 1:\n        return n\n    <\uff5cfim\u2581hole\uff5c>\n\nprint(fibonacci(10))<\uff5cfim\u2581end\uff5c>"

r = requests.post("http://localhost:11434/api/generate", json={
    "model": "deepseek-coder:1.3b",
    "prompt": fim_prompt,
    "raw": True,  # send FIM tokens verbatim instead of wrapping them in the model's prompt template
    "stream": False,
    "options": {
        "temperature": 0.2,
        "num_predict": 64,
        "stop": ["\n\n", "<\uff5cfim\u2581begin\uff5c>", "<\uff5cfim\u2581hole\uff5c>", "<\uff5cfim\u2581end\uff5c>"]
    }
})
result = r.json()
print(f"Completion: {result.get('response', result.get('error', 'ERROR'))}")
print(f"Time: {result.get('total_duration', 0) / 1e9:.2f}s")

print("\n")

# Test codellama chat
print("Testing codellama:13b-instruct (chat)...")
print("-"*40)

r = requests.post("http://localhost:11434/api/chat", json={
    "model": "codellama:13b-instruct",
    "messages": [
        {"role": "user", "content": "Write a Python function to check if a string is a palindrome. Be concise."}
    ],
    "stream": False,
    "options": {
        "temperature": 0.3,
        "num_predict": 256
    }
})
result = r.json()
print(f"Response: {result.get('message', {}).get('content', 'ERROR')[:500]}")
print(f"Time: {result.get('total_duration', 0) / 1e9:.2f}s")

print("\n" + "="*60)
print("All tests complete! Server is ready.")
print("="*60)
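The FIM test builds its prompt by hand. A small helper makes the structure explicit: the code before the cursor, a hole marker, then the code after the cursor, all wrapped in deepseek-coder's special tokens. These are the same `<｜fim▁begin｜>`, `<｜fim▁hole｜>` and `<｜fim▁end｜>` strings used in the test. Because these tokens must reach the model verbatim, the payload sets Ollama's `raw` option so that no prompt template is applied. This is a sketch, and the function names are hypothetical.

```python
# deepseek-coder FIM tokens (fullwidth bar U+FF5C and U+2581), as in the test cell.
FIM_BEGIN = "<\uff5cfim\u2581begin\uff5c>"
FIM_HOLE = "<\uff5cfim\u2581hole\uff5c>"
FIM_END = "<\uff5cfim\u2581end\uff5c>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Code before the cursor, a hole marker, then code after the cursor."""
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"

def build_fim_payload(prefix: str, suffix: str, model: str = "deepseek-coder:1.3b") -> dict:
    """JSON body for POST /api/generate; raw=True bypasses the model's prompt template."""
    return {
        "model": model,
        "prompt": build_fim_prompt(prefix, suffix),
        "raw": True,
        "stream": False,
        "options": {
            "temperature": 0.2,
            "num_predict": 64,
            "stop": ["\n\n", FIM_BEGIN, FIM_HOLE, FIM_END],
        },
    }
```

In an editor integration, `prefix` and `suffix` would be the document text before and after the cursor position.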

## Step 7: Keep Server Alive

Run this cell to keep the server running. It will:
- Check the server every 60 seconds
- Restart Ollama after two consecutive failed health checks
- Reconnect ngrok if the tunnel drops (a reconnect may produce a new public URL)

**Keep this cell running!** Stopping it won't kill the server, only the health monitoring. While it runs, other cells in this notebook will queue behind it.

In [None]:
import os
import time
import subprocess
import requests
from datetime import datetime
from pyngrok import ngrok

def check_ollama_health():
    """Check if Ollama server is responding."""
    try:
        r = requests.get("http://localhost:11434/api/tags", timeout=5)
        return r.status_code == 200
    except requests.exceptions.RequestException:
        return False

def restart_ollama():
    """Kill any running `ollama serve` and start a new one."""
    print(f"[{datetime.now().strftime('%H:%M:%S')}] Restarting Ollama...")
    subprocess.run(["pkill", "-f", "ollama serve"], capture_output=True)
    time.sleep(2)
    # Append to the log file instead of an unread PIPE, which could fill and block the server
    with open("ollama.log", "a") as log:
        subprocess.Popen(
            ["ollama", "serve"],
            stdout=log,
            stderr=subprocess.STDOUT,
            env={**os.environ, "OLLAMA_HOST": "0.0.0.0:11434"},
        )
    # Wait up to ~15s for the API to come back
    for i in range(15):
        if check_ollama_health():
            print(f"  Ollama restarted successfully! (took ~{i+1}s)")
            return True
        time.sleep(1)
    print("  WARNING: Ollama restart may have failed.")
    return False

def check_ngrok_tunnel():
    """Check if at least one ngrok tunnel is active."""
    return len(ngrok.get_tunnels()) > 0

# Main keep-alive loop
print("="*60)
print("KEEP-ALIVE MONITOR STARTED")
print(f"Ngrok URL: {OLLAMA_URL}")
print("="*60)
print("\nMonitoring server health every 60 seconds...")
print("Press Stop (square button) to end monitoring.\n")

consecutive_failures = 0
check_count = 0

try:  # KeyboardInterrupt (Stop button) is caught below, even during sleep
    while True:
        check_count += 1
        now = datetime.now().strftime('%H:%M:%S')
        delay = 60

        try:
            # Check Ollama health; restart only after 2 consecutive failures
            ollama_ok = check_ollama_health()
            if not ollama_ok:
                consecutive_failures += 1
                print(f"[{now}] Ollama DOWN (failure #{consecutive_failures})")
                if consecutive_failures >= 2:
                    ollama_ok = restart_ollama()
                    consecutive_failures = 0
            else:
                consecutive_failures = 0

            # Check Ngrok tunnel
            ngrok_ok = check_ngrok_tunnel()
            if not ngrok_ok:
                print(f"[{now}] Ngrok tunnel lost! Reconnecting...")
                try:
                    public_url = ngrok.connect(11434, "http")
                    OLLAMA_URL = public_url.public_url
                    ngrok_ok = True
                    print(f"  New URL: {OLLAMA_URL}")
                    print("  Update OLLAMA_BASE_URL in .env.local!")
                except Exception as e:
                    print(f"  Ngrok reconnect failed: {e}")

            # Status report every 10 checks
            if check_count % 10 == 0:
                status = "OK" if ollama_ok else "DOWN"
                tunnel = "OK" if ngrok_ok else "DOWN"
                print(f"[{now}] Status report #{check_count}: Ollama={status}, Ngrok={tunnel}, URL={OLLAMA_URL}")

        except Exception as e:
            print(f"[{now}] Monitor error: {e}")
            delay = 30  # retry sooner after an unexpected error

        time.sleep(delay)

except KeyboardInterrupt:
    print("\nMonitoring stopped by user.")
    print("Server is still running. Re-run this cell to resume monitoring.")
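The monitor's restart rule can be isolated as a tiny state machine, which makes it testable without a running server. The rule is to restart Ollama only after two consecutive failed health checks and to reset the counter on any success. This is a sketch. The class name and the `threshold` parameter are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RestartPolicy:
    """Signal a restart after `threshold` consecutive failed health checks."""
    threshold: int = 2
    failures: int = 0

    def record(self, healthy: bool) -> bool:
        """Record one check result; return True when a restart should happen now."""
        if healthy:
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.failures = 0  # the next restart needs another full run of failures
            return True
        return False

policy = RestartPolicy()
print([policy.record(ok) for ok in [True, False, True, False, False, False, False]])
# -> [False, False, False, False, True, False, True]
```

Requiring two failures in a row avoids restarting the server (and reloading models) because of a single slow response, for example while a 13B model is busy generating.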