# Run prompts on Ollama from Kaggle
    
This notebook shows **two ways** to use Ollama from Kaggle:
    
**A) Connect to a _remote_ Ollama endpoint (recommended).**  
Point the code to an Ollama server you control (e.g., your workstation with a tunnel, a cloud VM behind HTTPS+auth, etc.). Kaggle can then call it with `requests`.

**B) (Experimental) Run Ollama _inside Kaggle_.**  
Install Ollama, start the server in the notebook, and run a **small** model (e.g., `llama3.1:8b` or `phi3:mini`). A 20B model is likely too large and too slow to download within Kaggle session limits.
    
> **Note:** Kaggle notebooks run in an isolated VM. They **cannot reach your local machine** unless you expose it (e.g., via reverse proxy/tunnel). Also, make sure to enable **Internet** in the notebook settings when needed.


## A) Connect to a remote Ollama endpoint (recommended)

1. Expose your local or cloud Ollama at a secure URL (e.g., behind nginx + basic auth or a tunnel like Cloudflare Tunnel/ngrok).  
2. Set the base URL and model name below.  
3. Run the cell to send prompts.

> If you're exposing your **local** machine: start Ollama on your box and publish `http://localhost:11434` through a tunnel to a public **HTTPS** endpoint.
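
Before sending prompts, it can help to confirm that the public URL answers and that the model you want is installed on the server. The sketch below assumes the server exposes Ollama's `GET /api/tags` route, which lists installed models. The helper names `list_models`, `model_names`, and `has_model` are illustrative and not part of this notebook. It uses only the standard library, and any auth header your proxy needs is passed in `headers`.

```python
import json
import urllib.request
from typing import Dict, List, Optional

def model_names(tags_payload: dict) -> List[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    return [m["name"] for m in tags_payload.get("models", [])]

def list_models(base_url: str, headers: Optional[Dict[str, str]] = None,
                timeout: float = 10.0) -> List[str]:
    """Fetch {base_url}/api/tags and return the installed model names."""
    req = urllib.request.Request(base_url.rstrip("/") + "/api/tags",
                                 headers=headers or {})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return model_names(json.load(resp))

def has_model(names: List[str], wanted: str) -> bool:
    """True if `wanted` is in `names`; a bare name is treated as the `latest` tag."""
    if ":" not in wanted:
        wanted += ":latest"
    return wanted in names

# Example (needs a reachable endpoint, so it is left commented out):
# names = list_models("https://your-ollama.example.com")
# print(has_model(names, "gpt-oss:20b"))
```

If `list_models` raises a connection error, the tunnel or proxy is the likely problem. If it returns a list without your model, pull the model on the server first.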


In [None]:
import os
import requests
from typing import Optional

# --- CONFIG ---
# Point to your REMOTE Ollama endpoint (HTTPS strongly recommended)
# Strip any trailing slash so the joined URL below never contains "//api"
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "https://your-ollama.example.com").rstrip("/")  # e.g., from a tunnel or reverse proxy
OLLAMA_API = f"{OLLAMA_BASE_URL}/api/generate"
MODEL = os.getenv("OLLAMA_MODEL", "gpt-oss:20b")  # or any model available on your server

# Optional: basic auth or bearer token, if you secured your endpoint
BASIC_AUTH = None  # e.g., ('user', 'pass')
BEARER_TOKEN: Optional[str] = None

def run_prompt_remote(prompt: str, model: str = MODEL, stream: bool = False) -> str:
    """Send one prompt to the remote Ollama server and return the generated text."""
    if stream:
        # Streaming replies arrive as one JSON object per line and need a
        # line-by-line parser. Fail here, before any request is sent.
        raise NotImplementedError("Streaming parse not implemented in this snippet.")
    headers = {}
    if BEARER_TOKEN:
        headers["Authorization"] = f"Bearer {BEARER_TOKEN}"
    payload = {"model": model, "prompt": prompt, "stream": False}
    r = requests.post(OLLAMA_API, json=payload, headers=headers, auth=BASIC_AUTH, timeout=120)
    r.raise_for_status()  # surface 4xx/5xx responses (bad auth, unknown model, ...)
    return r.json()["response"]

# Example usage:
resp = run_prompt_remote("Give me 3 fun facts about llamas.")
print(resp[:1000])  # print first 1000 chars


## B) (Experimental) Run Ollama inside Kaggle

Whether this works depends on the session's disk, memory, and time limits. **Use a small model** to avoid long downloads and disk/memory issues.

### Steps
1. Enable **Internet** for the notebook.
2. Run the install cell.
3. Start the Ollama server in the background.
4. Pull a small model (e.g., `llama3.1:8b`).
5. Call the local API at `http://127.0.0.1:11434`.

> Tip: If the server isn't ready yet, add a short sleep/retry loop before calling the API.
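
The retry idea in the tip can be sketched as a small polling helper. This assumes the server answers a plain `GET /` with HTTP 200 once it is up, which is how Ollama normally behaves. `http_probe` and `wait_for_server` are illustrative names, not part of Ollama or this notebook, and the attempt count and delay are arbitrary defaults.

```python
import time
import urllib.request
from typing import Callable

def http_probe(url: str = "http://127.0.0.1:11434/") -> bool:
    """Return True if `url` answers with HTTP 200, False if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:
        # URLError, refused connections, and timeouts are all OSError subclasses
        return False

def wait_for_server(probe: Callable[[], bool] = http_probe,
                    attempts: int = 20, delay: float = 1.0) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for i in range(attempts):
        if probe():
            return True
        if i < attempts - 1:
            time.sleep(delay)
    return False

# Example: run after starting the server and before the first prompt.
# if not wait_for_server():
#     raise RuntimeError("Ollama server did not come up in time.")
```

Taking the probe as a parameter keeps the loop independent of the HTTP check, so the same helper could poll a remote endpoint with a different probe.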


In [None]:
# 1) Install Ollama (requires Internet). If this fails, re-run after enabling Internet in settings.
# You may be running as root in Kaggle; sudo may not be necessary.
!curl -fsSL https://ollama.com/install.sh | sh || echo "Install script failed — check Internet setting."


In [None]:
# 2) Start the Ollama server in the background for this session
import subprocess, time

# Send server output to a log file. Reading a pipe with readline() would block
# until the server prints something, and an unread pipe can eventually fill up.
log_file = open("ollama_server.log", "w")
server = subprocess.Popen(["ollama", "serve"], stdout=log_file, stderr=subprocess.STDOUT)

# Give it a moment to come up
time.sleep(3)

# poll() returns None while the process is still running
if server.poll() is None:
    print("Ollama server process is running; logs are in ollama_server.log")
else:
    print(f"Ollama server exited with code {server.returncode}; see ollama_server.log")


In [None]:
# 3) Pull a SMALL model to keep things quick. (20B models are usually impractical in Kaggle sessions.)
# Choose ONE of the below:
!ollama pull llama3.1:8b || echo "Pull failed — check disk/Internet."
# !ollama pull phi3:mini || echo "Pull failed — check disk/Internet."


In [None]:
# 4) Call the local endpoint from Python
import requests, time

LOCAL_API = "http://127.0.0.1:11434/api/generate"

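def run_prompt_local(prompt: str, model: str = "llama3.1:8b", stream: bool = False) -> str:
    if stream:
        # A streamed body is one JSON object per line, which r.json() cannot parse.
        raise NotImplementedError("Streaming parse not implemented in this snippet.")
    payload = {"model": model, "prompt": prompt, "stream": False}
    for attempt in range(10):
        try:
            r = requests.post(LOCAL_API, json=payload, timeout=120)
        except requests.ConnectionError as e:
            # Server is not accepting connections yet: wait and try again.
            print(f"Attempt {attempt+1} failed: {e}. Retrying in 3s...")
            time.sleep(3)
            continue
        # HTTP errors (e.g., a model that was never pulled) will not fix themselves,
        # so raise immediately instead of retrying.
        r.raise_for_status()
        return r.json()["response"]
    raise RuntimeError("Could not reach local Ollama server. Check logs above.")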

print(run_prompt_local("Summarize Kaggle in one paragraph."))

## Notes & Tips

- **Why remote is recommended:** Kaggle VMs are ephemeral and have strict time/disk constraints, so Path B has to reinstall Ollama and download the model again in every session. Hosting Ollama elsewhere avoids that startup cost and gives more predictable performance.
- **Securing your endpoint:** Put Ollama behind a reverse proxy (nginx/Caddy) with HTTPS and auth; or use a tunnel (Cloudflare Tunnel) with access policies.
- **Using your 20B model:** If you must use `gpt-oss:20b`, host it **remotely** and call it from Kaggle (Path A).
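- **Streaming:** To stream, send `"stream": true` in the JSON payload, pass `stream=True` to `requests.post`, and iterate over `r.iter_lines()`. Each non-empty line is a JSON object that carries a piece of the text in its `response` field (the server must support streaming).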
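
The streaming note can be sketched as a small parser. This assumes the endpoint returns newline-delimited JSON in which each object has a `response` fragment and the last one has `"done": true`, which is the shape Ollama's `/api/generate` uses when streaming. `iter_stream_text` is an illustrative helper name. It takes any iterable of lines, so it works with `r.iter_lines()` and can be tried without a server.

```python
import json
from typing import Iterable, Iterator, Union

def iter_stream_text(lines: Iterable[Union[bytes, str]]) -> Iterator[str]:
    """Yield the text fragment from each streamed JSON line until "done" is true."""
    for line in lines:
        if not line:  # iter_lines() can yield empty keep-alive lines
            continue
        chunk = json.loads(line)
        yield chunk.get("response", "")
        if chunk.get("done"):
            break

# Example with requests (needs a reachable endpoint, so it is left commented out):
# payload = {"model": MODEL, "prompt": "Hi", "stream": True}
# with requests.post(OLLAMA_API, json=payload, stream=True, timeout=120) as r:
#     r.raise_for_status()
#     for piece in iter_stream_text(r.iter_lines()):
#         print(piece, end="", flush=True)
```

Yielding fragments rather than returning one string lets the notebook print output as it arrives. Use `"".join(iter_stream_text(...))` when only the final text is needed.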