## Step 1: Deploy Gemma 3

Choose your model size based on your needs:

```bash
# Fast iteration, good quality
make deploy-gemma3-4b

# Better quality, still reasonable speed
make deploy-gemma3-12b

# Best quality, slower and more expensive
make deploy-gemma3-27b
```

## 1. Configure Your Project

Set your GCP project ID and Hugging Face token below:

In [None]:
import os

# Configure your GCP project
PROJECT_ID = "your-gcp-project-id"  # <-- CHANGE THIS

# Hugging Face token (required for Gemma - gated model)
# Get yours at: https://huggingface.co/settings/tokens
# Then accept Gemma terms at: https://huggingface.co/google/gemma-3-4b-it
HF_TOKEN = "hf_..."  # <-- CHANGE THIS

# Optional: Override region (default: us-central1)
REGION = "us-central1"

# Choose model size: gemma3-4b, gemma3-12b, or gemma3-27b
MODEL_PRESET = "gemma3-4b"

os.environ["TF_VAR_project_id"] = PROJECT_ID
os.environ["TF_VAR_region"] = REGION
os.environ["TF_VAR_hf_token"] = HF_TOKEN

print(f"Project: {PROJECT_ID}")
print(f"Region:  {REGION}")
print(f"Model:   {MODEL_PRESET}")
print(f"HF Token: {'Set' if HF_TOKEN.startswith('hf_') else 'Not set'}")
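Hard-coding the token in a notebook makes it easy to commit by accident. One alternative, sketched here under the assumption that you export the token as `HF_TOKEN` in your shell before starting Jupyter, is to read it from the environment and fail early if it looks wrong. The helper name `load_hf_token` is illustrative and not part of this project.

```python
import os
from typing import Mapping


def load_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Read the Hugging Face token from an environment mapping and sanity-check it."""
    # Assumption: the token was exported as HF_TOKEN before the notebook started.
    token = env.get("HF_TOKEN", "")
    if not token.startswith("hf_"):
        raise ValueError("HF_TOKEN is missing or does not start with 'hf_'")
    return token


# Demonstration with a fake environment; the token value is a placeholder.
print(load_hf_token({"HF_TOKEN": "hf_example"}))
```

In the configuration cell you could then write `HF_TOKEN = load_hf_token()` instead of pasting the token inline.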

## 2. Deploy Infrastructure

Initialize Terraform and deploy the Gemma preset:

In [None]:
%%bash
cd ../infrastructure/gcp

# Initialize Terraform (only needed once)
terraform init

In [None]:
%%bash -s "$MODEL_PRESET"
cd ../infrastructure/gcp

# Deploy using selected Gemma preset
terraform apply -var-file=presets/$1.tfvars -auto-approve

## 3. Get API Endpoint

Retrieve the vLLM server URL:

In [None]:
import json
import subprocess

# Get Terraform outputs
result = subprocess.run(
    ["terraform", "output", "-json"],
    check=True,  # Fail here with Terraform's error instead of on the JSON parse below
    cwd="../infrastructure/gcp",
    capture_output=True,
    text=True,
)
outputs = json.loads(result.stdout)

API_ENDPOINT = outputs["api_endpoint"]["value"]
print(f"API Endpoint: {API_ENDPOINT}")
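For reference, `terraform output -json` returns one object per output, wrapping the actual value under a `value` key alongside `type` and `sensitive`. The sketch below parses a hand-written sample payload (the address is a placeholder, not real output, and the exact URL shape of `api_endpoint` in this project is an assumption) and raises a readable error if the expected output is missing. `extract_output` is a hypothetical helper name.

```python
import json

# Hand-written sample shaped like `terraform output -json`; values are placeholders.
SAMPLE_OUTPUT = """
{
  "api_endpoint": {
    "sensitive": false,
    "type": "string",
    "value": "http://203.0.113.10:8000"
  }
}
"""


def extract_output(raw_json: str, name: str) -> str:
    """Return the `value` field of one Terraform output, or raise KeyError."""
    outputs = json.loads(raw_json)
    if name not in outputs:
        available = ", ".join(sorted(outputs)) or "none"
        raise KeyError(f"Output {name!r} not found (available: {available})")
    return outputs[name]["value"]


print(extract_output(SAMPLE_OUTPUT, "api_endpoint"))
```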

## 4. Wait for Server Ready

The vLLM server takes ~3-5 minutes to download and load the model. Run this cell to wait:

In [None]:
import time

import requests


def wait_for_server(endpoint: str, timeout: int = 600) -> bool:
    """Wait for vLLM server to become ready."""
    health_url = f"{endpoint}/health"
    start = time.time()

    print(f"Waiting for server at {health_url}...")
    while time.time() - start < timeout:
        try:
            resp = requests.get(health_url, timeout=5)
            if resp.status_code == 200:
                print(f"\nServer ready! ({int(time.time() - start)}s)")
                return True
        except requests.RequestException:
            pass
        print(".", end="", flush=True)
        time.sleep(10)

    print(f"\nTimeout after {timeout}s")
    return False


wait_for_server(API_ENDPOINT)

## 5. Run Inference with kanoa

Now let's use the kanoa library to interact with Gemma 3:

In [None]:
from kanoa.backends.vllm import VLLMBackend

# Model names by preset
MODEL_NAMES = {
    "gemma3-4b": "google/gemma-3-4b-it",
    "gemma3-12b": "google/gemma-3-12b-it",
    "gemma3-27b": "google/gemma-3-27b-it",
}

# Initialize the backend
backend = VLLMBackend(api_base=API_ENDPOINT, model=MODEL_NAMES[MODEL_PRESET])

print(f"Connected to: {backend.model}")

In [None]:
# Simple text query
response = backend.generate(
    prompt="What is machine learning? Explain in 2 sentences.", max_tokens=100
)
print(response)

In [None]:
# Code generation
response = backend.generate(
    prompt="Write a Python function to calculate the Fibonacci sequence up to n terms.",
    max_tokens=300,
)
print(response)

In [None]:
# Reasoning task
response = backend.generate(
    prompt="""Solve this step by step:
A train travels from City A to City B at 60 mph.
The return trip is at 40 mph.
What is the average speed for the entire round trip?""",
    max_tokens=300,
)
print(response)
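To check the model's answer to the train problem: with equal distances each way, the average speed is total distance divided by total time, which works out to the harmonic mean of the two speeds rather than their arithmetic mean. A small worked check (the function name is illustrative):

```python
def round_trip_average_speed(speed_out: float, speed_back: float) -> float:
    """Average speed over a round trip with the same distance each way."""
    distance = 1.0  # any positive one-way distance cancels out
    total_time = distance / speed_out + distance / speed_back
    return 2 * distance / total_time


print(round_trip_average_speed(60, 40))
```

The result is about 48 mph, below the 50 mph arithmetic mean, because the train spends more time on the slower leg.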

## 6. Cost Tracking

| Model | GPU | Cost/Hour |
|-------|-----|----------|
| gemma3-4b | L4 (24GB) | ~$0.70 |
| gemma3-12b | L4 (24GB) | ~$0.70 |
| gemma3-27b | A100 (80GB) | ~$3.00 |

The server has a 30-minute idle timeout: if no requests arrive in that window, it shuts down automatically.
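A rough session cost follows from the table: the hourly rate times the hours the instance is running, including up to 30 minutes of idle time before auto-shutdown. The sketch below uses the approximate rates listed above; disk, network, and regional price differences are not modeled, so treat the numbers as ballpark figures only.

```python
# Approximate hourly rates copied from the table above (USD).
HOURLY_RATE_USD = {
    "gemma3-4b": 0.70,
    "gemma3-12b": 0.70,
    "gemma3-27b": 3.00,
}
IDLE_TIMEOUT_HOURS = 0.5  # 30-minute idle timeout before auto-shutdown


def estimate_session_cost(preset: str, active_hours: float) -> float:
    """Estimate GPU cost for one session, counting the trailing idle window."""
    billed_hours = active_hours + IDLE_TIMEOUT_HOURS
    return round(HOURLY_RATE_USD[preset] * billed_hours, 2)


for preset in HOURLY_RATE_USD:
    cost = estimate_session_cost(preset, active_hours=2)
    print(f"{preset}: ~${cost:.2f} for 2 active hours")
```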

In [None]:
# Check instance status
!gcloud compute instances describe vllm-server --zone={REGION}-a --format="value(status)" 2>/dev/null || echo "Instance not found"

## 7. Cleanup

**Important**: Destroy the infrastructure when done to avoid charges!

In [None]:
%%bash -s "$MODEL_PRESET"
cd ../infrastructure/gcp

# Destroy all resources
terraform destroy -var-file=presets/$1.tfvars -auto-approve

---

## Next Steps

- Try **Molmo 7B** for multimodal tasks: [quickstart-molmo.ipynb](./quickstart-molmo.ipynb)
- See [infrastructure/gcp/README.md](../infrastructure/gcp/README.md) for more configuration options
- Check [docs/User_Setup_Guide.md](../docs/User_Setup_Guide.md) for detailed setup instructions

In [None]:
# Check deployment status (adjust the path to your local kanoa-mlops checkout)
!cd /home/lhzn/Projects/lhzn-io/kanoa-mlops && make status

## Step 2: Configure Connection

In [None]:
# UPDATE THESE after deployment!
VLLM_API_BASE = "http://YOUR_EXTERNAL_IP:8000/v1"

# Choose the model you deployed:
MODEL_NAME = "google/gemma-3-4b-it"  # For deploy-gemma3-4b
# MODEL_NAME = "google/gemma-3-12b-it"   # For deploy-gemma3-12b
# MODEL_NAME = "google/gemma-3-27b-it"   # For deploy-gemma3-27b

In [None]:
# Test connection
import requests

health_url = VLLM_API_BASE.replace("/v1", "/health")
try:
    response = requests.get(health_url, timeout=5)
    print(
        "✓ vLLM server is healthy!"
        if response.ok
        else f"⚠ Status: {response.status_code}"
    )
except Exception as e:
    print(f"✗ Connection failed: {e}")
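Note that `str.replace("/v1", "/health")` substitutes every occurrence of `/v1` in the URL, not only a trailing path segment. A slightly more defensive sketch using the standard library's `urllib.parse` is shown here; it assumes, as the cell above does, that the health check lives at `/health` on the server root. The helper name `health_url_for` is illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def health_url_for(api_base: str) -> str:
    """Build the /health URL from an API base such as http://host:8000/v1."""
    parts = urlsplit(api_base)
    # Keep scheme and host:port, replace the whole path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/health", "", ""))


print(health_url_for("http://YOUR_EXTERNAL_IP:8000/v1"))
```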

## Step 3: Create Sample Data & Visualization

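Generate 90 days of synthetic product engagement metrics (daily active users, session duration, conversion rate), then plot them as a four-panel dashboard.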

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Generate realistic business metrics
np.random.seed(123)
dates = pd.date_range("2024-01-01", periods=90, freq="D")

# Simulate user engagement metrics
base_users = 10000
daily_active_users = (
    base_users
    + np.cumsum(np.random.randn(90) * 100)
    + np.sin(np.arange(90) * 2 * np.pi / 7) * 500
)
session_duration = 5 + np.random.randn(90) * 0.5 + np.linspace(0, 1, 90)  # Trending up
conversion_rate = 0.03 + np.random.randn(90) * 0.005

# Create DataFrame
df = pd.DataFrame(
    {
        "date": dates,
        "daily_active_users": daily_active_users.astype(int),
        "avg_session_minutes": session_duration,
        "conversion_rate": conversion_rate.clip(0, 0.1),
    }
)

df.head()

In [None]:
# Create a comprehensive dashboard
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Panel 1: DAU with 7-day rolling average
ax1 = axes[0, 0]
ax1.plot(df["date"], df["daily_active_users"], alpha=0.4, label="Daily")
ax1.plot(
    df["date"],
    df["daily_active_users"].rolling(7).mean(),
    linewidth=2,
    label="7-day avg",
)
ax1.set_title("Daily Active Users")
ax1.set_ylabel("Users")
ax1.legend()
ax1.tick_params(axis="x", rotation=45)

# Panel 2: Session duration trend
ax2 = axes[0, 1]
ax2.fill_between(df["date"], df["avg_session_minutes"], alpha=0.3)
ax2.plot(
    df["date"], df["avg_session_minutes"].rolling(7).mean(), linewidth=2, color="green"
)
ax2.set_title("Avg Session Duration (minutes)")
ax2.set_ylabel("Minutes")
ax2.tick_params(axis="x", rotation=45)

# Panel 3: Conversion rate
ax3 = axes[1, 0]
mean_conversion = df["conversion_rate"].mean()  # compute once, not once per bar
colors = ["green" if x > mean_conversion else "red" for x in df["conversion_rate"]]
ax3.bar(df["date"], df["conversion_rate"] * 100, color=colors, alpha=0.6, width=1)
ax3.axhline(
    df["conversion_rate"].mean() * 100,
    color="black",
    linestyle="--",
    label=f"Avg: {df['conversion_rate'].mean() * 100:.2f}%",
)
ax3.set_title("Conversion Rate (%)")
ax3.set_ylabel("Percent")
ax3.legend()
ax3.tick_params(axis="x", rotation=45)

# Panel 4: DAU vs conversion scatter, colored by session duration
ax4 = axes[1, 1]
scatter = ax4.scatter(
    df["daily_active_users"],
    df["conversion_rate"] * 100,
    c=df["avg_session_minutes"],
    cmap="viridis",
    alpha=0.7,
)
plt.colorbar(scatter, ax=ax4, label="Session Duration (min)")
ax4.set_xlabel("Daily Active Users")
ax4.set_ylabel("Conversion Rate (%)")
ax4.set_title("DAU vs Conversion (colored by session length)")

plt.suptitle("Product Engagement Dashboard - Q1 2024", fontsize=14, fontweight="bold")
plt.tight_layout()
plt.show()
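The simulated DAU series includes a sine term with a 7-day period, and the first panel smooths it with a 7-day rolling mean. That window length is a natural match: any 7 consecutive samples of a 7-day sinusoid average to zero, so the rolling mean removes the weekly cycle and leaves the underlying drift. A standard-library check of that property, using plain lists as a stand-in for the pandas rolling call:

```python
import math

PERIOD_DAYS = 7
AMPLITUDE = 500  # same amplitude as the sine term in the DAU simulation

# The weekly component alone, over the same 90 days.
weekly = [AMPLITUDE * math.sin(2 * math.pi * day / PERIOD_DAYS) for day in range(90)]

# Trailing 7-day means, one per complete window.
rolling = [
    sum(weekly[end - PERIOD_DAYS : end]) / PERIOD_DAYS
    for end in range(PERIOD_DAYS, len(weekly) + 1)
]

largest = max(abs(value) for value in rolling)
print(f"largest |7-day mean| of the weekly term: {largest:.2e}")
```

The printed value is zero up to floating-point error, whereas the raw weekly term swings by several hundred users.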

## Step 4: Analyze with Gemma 3

In [None]:
from kanoa.backends import VLLMBackend

backend = VLLMBackend(
    api_base=VLLM_API_BASE,
    model=MODEL_NAME,
    temperature=0.7,
)

print(f"✓ Connected to {MODEL_NAME}")

In [None]:
# Comprehensive dashboard analysis
result = backend.interpret(
    fig=fig,
    data=df.describe(),  # Pass summary statistics
    context="Product engagement dashboard for a SaaS application. Q1 2024 data.",
    focus="Provide executive summary: key trends, concerns, and recommendations.",
    kb_context=None,
    custom_prompt=None,
)

print("Executive Summary")
print("=" * 60)
print(result.text)

## Step 5: Specialized Analysis Prompts

Gemma 3 excels at following specific instructions. Let's try different analysis angles.

In [None]:
# Statistical analysis focus
result = backend.interpret(
    fig=fig,
    data=df.describe(),
    context="Product metrics dashboard.",
    focus="Identify statistical patterns: seasonality, trends, correlations, outliers.",
    kb_context=None,
    custom_prompt=None,
)

print("Statistical Analysis")
print("-" * 40)
print(result.text)

In [None]:
# Custom prompt for specific format
custom_prompt = """
Analyze this dashboard and provide your response in this exact format:

## Key Metrics Summary
- DAU: [trend direction] from [start] to [end]
- Session Duration: [trend direction]
- Conversion: [assessment]

## Top 3 Insights
1. [Most important finding]
2. [Second finding]
3. [Third finding]

## Recommended Actions
1. [Immediate action]
2. [Short-term action]
3. [Long-term strategy]
"""

result = backend.interpret(
    fig=fig,
    data=None,
    context="SaaS product engagement metrics.",
    focus=None,
    kb_context=None,
    custom_prompt=custom_prompt,
)

print(result.text)
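When a prompt requests a fixed layout like this, it can help to verify that the reply actually contains the expected sections before using it downstream. A minimal sketch that splits text on `## ` headings follows; the sample response is written by hand for illustration and is not real model output, and `split_sections` is a hypothetical helper.

```python
def split_sections(markdown_text: str) -> dict[str, list[str]]:
    """Group non-empty lines under their preceding '## ' heading."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            current = stripped[3:].strip()
            sections[current] = []
        elif stripped and current is not None:
            sections[current].append(stripped)
    return sections


# Hypothetical response text, written by hand for illustration only.
sample_response = """
## Key Metrics Summary
- DAU: [trend direction] from [start] to [end]

## Top 3 Insights
1. [Most important finding]

## Recommended Actions
1. [Immediate action]
"""

sections = split_sections(sample_response)
expected = ["Key Metrics Summary", "Top 3 Insights", "Recommended Actions"]
missing = [name for name in expected if name not in sections]
print("missing sections:", missing)
```

In the notebook you would pass `result.text` instead of `sample_response`.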

## Step 6: Compare with Different Model Sizes

To compare outputs across model sizes, deploy each model first, then pass each endpoint to the helper below:

In [None]:
# Function to compare models (deploy each first)
def compare_models(fig, models_and_endpoints):
    """Compare interpretations from different models."""
    results = {}

    for name, (api_base, model_name) in models_and_endpoints.items():
        try:
            backend = VLLMBackend(api_base=api_base, model=model_name)
            result = backend.interpret(
                fig=fig,
                data=None,
                context="Product metrics dashboard.",
                focus="Summarize in 2-3 sentences.",
                kb_context=None,
                custom_prompt=None,
            )
            results[name] = result.text
        except Exception as e:
            results[name] = f"Error: {e}"

    return results


# Example usage (uncomment and modify endpoints):
# models = {
#     "Gemma 3 4B": ("http://IP1:8000/v1", "google/gemma-3-4b-it"),
#     "Gemma 3 12B": ("http://IP2:8000/v1", "google/gemma-3-12b-it"),
# }
# results = compare_models(fig, models)
# for name, text in results.items():
#     print(f"\n{name}:")
#     print(text)

## Step 7: Working with Real Data

The template below loads a CSV, plots the selected columns, and sends the figure to the model for interpretation.

In [None]:
# Template for loading your own data
def analyze_csv(filepath, x_col, y_cols, title="My Data"):
    """Load CSV and create visualization for analysis."""
    # Load data
    df = pd.read_csv(filepath)

    # Create plot
    fig, ax = plt.subplots(figsize=(10, 6))
    for col in y_cols:
        ax.plot(df[x_col], df[col], label=col, marker="o")
    ax.set_title(title)
    ax.legend()
    ax.grid(True, alpha=0.3)
    plt.xticks(rotation=45)
    plt.tight_layout()

    # Get interpretation
    result = backend.interpret(
        fig=fig,
        data=df.describe(),
        context=f"Analysis of {title}",
        focus="Describe the key patterns and provide insights.",
        kb_context=None,
        custom_prompt=None,
    )

    plt.show()
    return result.text


# Example:
# analysis = analyze_csv("my_data.csv", x_col="date", y_cols=["revenue", "costs"], title="Financial Data")
# print(analysis)

## Cleanup

```bash
# Stop (keeps disk for quick restart)
make stop

# Or fully destroy (no more charges)
make destroy

# Destroy ALL deployments
make clean-infra
```

In [None]:
# Uncomment to destroy (adjust the path to your local kanoa-mlops checkout)
# !cd /home/lhzn/Projects/lhzn-io/kanoa-mlops && make destroy

## Tips for Best Results with Gemma 3

1. **Be specific in `focus`**: "Identify outliers" works better than "analyze this"
2. **Use `custom_prompt` for formatting**: Gemma 3 follows structured prompts well
3. **Include `data` parameter**: Summary stats help ground the analysis
4. **Tune `temperature`**: Lower (around 0.3) for factual analysis, higher (around 0.8) for more creative interpretations

## Model Selection Guide

| Use Case | Recommended |
|----------|-------------|
| Quick iteration, prototyping | Gemma 3 4B |
| Production analysis | Gemma 3 12B |
| Complex reasoning, reports | Gemma 3 27B |
| Chart understanding | Molmo-7B |