# GPT-OSS-20B on Google Colab for Chartelier

This notebook demonstrates running Chartelier with GPT-OSS-20B on Google Colab using an A100 GPU.

## Prerequisites
- Google Colab Pro+ account (for A100 access)
- GPU runtime enabled (Runtime -> Change runtime type -> A100 GPU)
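
Before cloning anything, it can help to confirm that the runtime actually exposes a GPU. The following is a minimal sketch, not part of the Chartelier repository: the helper name `gpu_name` is hypothetical, and it assumes the standard `nvidia-smi` tool is on the PATH when a GPU runtime is attached.

```python
import shutil
import subprocess


def gpu_name():
    """Return the GPU name reported by nvidia-smi, or None if no GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        return None  # No NVIDIA tools on PATH: most likely a CPU-only runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    names = result.stdout.strip().splitlines()
    return names[0].strip() if names else None


name = gpu_name()
if name is None:
    print("No GPU detected. Select an A100 runtime before continuing.")
elif "A100" not in name:
    print(f"Detected {name}; this notebook assumes an A100.")
else:
    print(f"GPU ready: {name}")
```

If the first message appears, change the runtime type and re-run the check before moving on to Step 1.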

## Step 1: Clone Repository and Setup Environment

In [None]:
# Clone the Chartelier repository
!git clone https://github.com/sog4be/chartelier.git
%cd chartelier

# Switch to the branch with Colab support
!git checkout feature/gpt-oss-20b-colab-support

In [None]:
# Run the setup script
!python colab/setup_gpt_oss.py

## Step 2: Start vLLM Server (Run in Background)

**Important**: This cell launches the server as a background process, then waits (up to about 5 minutes) until the server responds. Proceed to the next cells once it reports that the server is ready.

In [None]:
# Start vLLM server in the background
# This will download the model (~14GB) on first run
import subprocess
import time

import requests

# Send server output to a log file (the file name is arbitrary).
# An unread subprocess.PIPE can fill up and stall the server.
server_log = open("vllm_server.log", "w")
server_process = subprocess.Popen(
    ["python", "colab/start_vllm_server.py"], stdout=server_log, stderr=subprocess.STDOUT
)

print("🚀 Starting vLLM server...")
print("⏳ This may take 2-5 minutes on first run while downloading the model")

# Wait a bit for server to start
time.sleep(10)

# Poll the health endpoint until the server responds
max_attempts = 60  # About 5 minutes at 5 seconds per attempt
ready = False

for attempt in range(max_attempts):
    try:
        response = requests.get("http://localhost:8000/health", timeout=5)
        if response.status_code == 200:
            ready = True
            print("\n✅ vLLM server is ready!")
            break
    except requests.RequestException:
        pass  # Server is not accepting connections yet

    if attempt % 6 == 0:  # Print status every 30 seconds
        print(f"⏳ Waiting for server... ({attempt * 5}s elapsed)")

    time.sleep(5)

if not ready:
    print("❌ Server failed to start. Check vllm_server.log for details.")
else:
    # Check if model is loaded
    try:
        response = requests.get("http://localhost:8000/v1/models", timeout=5)
        if response.status_code == 200:
            models = response.json().get("data", [])
            if models:
                print(f"✅ Model loaded: {models[0]['id']}")
                print("\n🎉 You can now run the test in the next cell!")
    except (requests.RequestException, ValueError):
        print("⚠️ Could not verify model loading")

## Step 3: Test the Setup

First, let's verify the server is working with a simple test:

In [None]:
# Quick test of the vLLM server
import json

import requests

# Test the OpenAI-compatible endpoint
url = "http://localhost:8000/v1/chat/completions"
headers = {"Content-Type": "application/json"}
data = {
    "model": "openai/gpt-oss-20b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is 2+2?"},
    ],
    "temperature": 0.0,
    "max_tokens": 50,
}

try:
    response = requests.post(url, headers=headers, json=data, timeout=30)
    if response.status_code == 200:
        result = response.json()
        print("✅ Server test successful!")
        print(f"Response: {result['choices'][0]['message']['content']}")
    else:
        print(f"❌ Server returned error: {response.status_code}")
        print(response.text)
except Exception as e:
    print(f"❌ Failed to connect to server: {e}")
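
The test above indexes straight into `result['choices'][0]['message']['content']`, which raises if any of those fields is missing. As a defensive sketch (the helper name `first_reply` is hypothetical, and it assumes only the OpenAI-style response shape used above), the text can be pulled out like this:

```python
def first_reply(result):
    """Return the first choice's message text, or None if it is absent or empty."""
    choices = result.get("choices") or []
    if not choices:
        return None
    message = choices[0].get("message") or {}
    content = message.get("content")
    return content if content else None


# Example with a hand-written response dict of the same shape
sample = {"choices": [{"message": {"role": "assistant", "content": "4"}}]}
print(first_reply(sample))
```

A `None` result then signals "no usable answer" without a `KeyError` or `IndexError` hiding the real cause.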

## Step 4: Run Chartelier E2E Test

In [None]:
# Point Chartelier at the local vLLM server
import os

# Set environment variables for Chartelier
os.environ["CHARTELIER_LLM_MODEL"] = "openai/gpt-oss-20b"
os.environ["CHARTELIER_LLM_API_BASE"] = "http://localhost:8000/v1"
os.environ["CHARTELIER_LLM_API_KEY"] = "dummy"  # Placeholder: the local vLLM server does not need a real key
os.environ["CHARTELIER_LLM_TIMEOUT"] = "30"

print("Environment configured:")
print(f"  Model: {os.environ['CHARTELIER_LLM_MODEL']}")
print(f"  API Base: {os.environ['CHARTELIER_LLM_API_BASE']}")
print(f"  Timeout: {os.environ['CHARTELIER_LLM_TIMEOUT']}s")
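
One way to confirm these settings match what the server is actually serving is to query the model list at the configured API base. This is only a sketch using the standard library: the helpers `models_endpoint` and `served_model_ids` are hypothetical names, and the fallback values simply repeat the ones set above.

```python
import json
import os
import urllib.request


def models_endpoint(api_base):
    """Build the model-listing URL from an OpenAI-style API base."""
    return api_base.rstrip("/") + "/models"


def served_model_ids(api_base, timeout=5):
    """Return the model IDs the server reports, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(models_endpoint(api_base), timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        return None  # Connection refused, timeout, HTTP error, or invalid JSON
    return [m.get("id") for m in payload.get("data", [])]


api_base = os.environ.get("CHARTELIER_LLM_API_BASE", "http://localhost:8000/v1")
model = os.environ.get("CHARTELIER_LLM_MODEL", "openai/gpt-oss-20b")

ids = served_model_ids(api_base)
if ids is None:
    print(f"Could not reach {models_endpoint(api_base)}")
elif model in ids:
    print(f"Configured model is being served: {model}")
else:
    print(f"Configured model {model} not found; server reports: {ids}")
```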

In [None]:
# Run the E2E test
!python temp/test_e2e.py

## Step 5: View Generated Chart

If the test was successful, display the generated chart:

In [None]:
# Display the generated SVG chart
import os

from IPython.display import SVG, display

output_path = "temp/output.svg"
if os.path.exists(output_path):
    print("📊 Generated Chart:")
    display(SVG(filename=output_path))
else:
    print("❌ No output file found. The test may have failed.")

## Optional: Custom Visualization Test

Try creating your own visualization:

In [None]:
# Custom visualization test
import json
import sys

sys.path.insert(0, "src")

from chartelier.interfaces.mcp.handler import MCPHandler
from chartelier.interfaces.mcp.protocol import JSONRPCRequest, MCPMethod

# Sample data - different from the default test
custom_data = """date,temperature,city
2024-01-01,5,Tokyo
2024-01-02,7,Tokyo
2024-01-03,6,Tokyo
2024-01-04,8,Tokyo
2024-01-01,10,Osaka
2024-01-02,12,Osaka
2024-01-03,11,Osaka
2024-01-04,13,Osaka"""

# Create request
handler = MCPHandler()
request = JSONRPCRequest(
    id=2,
    method=MCPMethod.TOOLS_CALL,
    params={
        "name": "chartelier_visualize",
        "arguments": {
            "data": custom_data,
            "query": "Compare daily temperature trends between Tokyo and Osaka",
            "options": {
                "format": "svg",
                "width": 800,
                "height": 600,
            },
        },
    },
)

print("🎨 Creating custom visualization...")
print("Query: 'Compare daily temperature trends between Tokyo and Osaka'")

try:
    response_str = handler.handle_message(json.dumps(request.model_dump()))
    response = json.loads(response_str)

    if response.get("result", {}).get("isError"):
        print("❌ Visualization failed")
        print(response["result"]["content"][0]["text"])
    else:
        print("✅ Visualization successful!")

        # Save and display the result
        if "content" in response["result"] and len(response["result"]["content"]) > 0:
            content = response["result"]["content"][0]
            if content["type"] == "image" and "svg" in content.get("mimeType", ""):
                svg_data = content["data"]

                # Save to file
                with open("temp/custom_output.svg", "w", encoding="utf-8") as f:
                    f.write(svg_data)

                # Display
                from IPython.display import SVG, display

                display(SVG(data=svg_data))

                # Show metadata
                if "structuredContent" in response["result"]:
                    metadata = response["result"]["structuredContent"].get("metadata", {})
                    print("\n📊 Metadata:")
                    print(f"   Pattern: {metadata.get('pattern_id')}")
                    print(f"   Template: {metadata.get('template_id')}")
                    if metadata.get("processing_time_ms"):
                        print(f"   Processing time: {metadata['processing_time_ms']}ms")

except Exception as e:
    print(f"❌ Error: {e}")
    import traceback

    traceback.print_exc()
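
The nested checks in the cell above can be folded into a small helper when trying several queries in a row. The sketch below assumes only the response shape that cell already relies on (`result.isError`, `result.content[*].type`, `mimeType`, `data`). The name `extract_svg` is hypothetical, and unlike the cell above it scans every content item rather than only the first.

```python
def extract_svg(response):
    """Return the SVG payload from a tools/call response dict, or None if absent."""
    result = response.get("result") or {}
    if result.get("isError"):
        return None
    for item in result.get("content") or []:
        if item.get("type") == "image" and "svg" in item.get("mimeType", ""):
            return item.get("data")
    return None


# Example with a hand-written response of the same shape
sample = {
    "result": {
        "content": [{"type": "image", "mimeType": "image/svg+xml", "data": "<svg></svg>"}]
    }
}
print(extract_svg(sample))
```

With this in place, each query reduces to calling the handler, parsing the JSON, and displaying the result when `extract_svg` returns something other than `None`.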

## Cleanup

Stop the vLLM server when done:

In [None]:
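# Stop the vLLM server
import subprocess

try:
    server_process.terminate()
    server_process.wait(timeout=5)
    print("✅ vLLM server stopped")
except NameError:
    print("Server process was not started in this session")
except subprocess.TimeoutExpired:
    server_process.kill()  # Force-stop if it does not exit after terminate()
    print("✅ vLLM server force-stopped")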

## Troubleshooting

### Common Issues:

1. **No GPU available**: Make sure you've selected a GPU runtime (Runtime -> Change runtime type -> A100 GPU)
2. **Out of memory**: The A100 40GB should be sufficient, but if you get OOM errors, try restarting the runtime
3. **Model download slow**: The first run downloads the ~14GB model. This is normal, and the download is cached for future runs
4. **Server not starting**: Check the vLLM server output for specific errors
5. **Connection refused**: Make sure the vLLM server process started in Step 2 is still running
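
For the "Connection refused" case, a quick way to tell a stopped server from a misconfigured client is to test whether anything is listening on the port at all. This is a minimal sketch with a hypothetical helper name, `port_is_open`; port 8000 is the one used throughout this notebook.

```python
import socket


def port_is_open(host="localhost", port=8000, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # Refused, unreachable, or timed out


if port_is_open():
    print("Something is listening on port 8000; check the request URL and model name.")
else:
    print("Nothing is listening on port 8000; the vLLM server is not running.")
```

In the notebook itself, `server_process.poll()` gives a complementary signal: it returns `None` while the process started in Step 2 is still alive and an exit code once it has stopped.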