# Evaluating Open Source Models with Ollama

This notebook shows how to evaluate open-source models locally with Ollama.

We'll use [Nemotron-3-Nano-30B-A3B](https://huggingface.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF) as the example: NVIDIA's 30B-parameter model with only 3B active parameters.

## Install Ollama

**macOS:**

In [None]:
!curl -L -o /tmp/Ollama.zip "https://ollama.com/download/Ollama-darwin.zip" && \
    unzip -o /tmp/Ollama.zip -d /Applications/ && \
    rm /tmp/Ollama.zip

**Linux:** `curl -fsSL https://ollama.com/install.sh | sh`

## Start Ollama Server

In [None]:
!open /Applications/Ollama.app  # macOS
# Linux: ollama serve

In [None]:
# Verify the server is running (give it a few seconds to start)
import time

time.sleep(5)
# -f makes curl exit non-zero on HTTP errors; a refused connection also fails
!curl -sf http://localhost:11434/api/tags > /dev/null && echo "Ollama server is running!" || echo "Server not ready"
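A fixed `time.sleep(5)` can be too short on a cold start. As an alternative, here is a minimal polling sketch using only the standard library. The function name `wait_for_server` and its defaults are illustrative assumptions, not part of Ollama or Inspect AI.

```python
import time
import urllib.request


def wait_for_server(url="http://localhost:11434/api/tags", attempts=10, delay=1.0):
    """Return True once `url` answers with HTTP 200, False after `attempts` tries."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or HTTP error: server is not ready yet.
            pass
        time.sleep(delay)
    return False
```

Calling `wait_for_server()` in a cell returns `True` as soon as the server responds, so later cells can be guarded with a simple `assert`.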

## Download GGUF Model

Choose a quantization based on your available RAM:

| Quantization | File size | RAM required |
|--------------|-----------|--------------|
| IQ4_XS | 18.2 GB | 16 GB |
| Q4_K_M | 24.6 GB | 32 GB |
| Q8_0 | 33.6 GB | 64 GB |
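The lookup in the table above can be sketched as a small helper. The figures are copied from the table as-is, and `pick_quantization` is a hypothetical name used only for illustration.

```python
# (name, file size in GB, RAM required in GB), copied from the table above.
QUANTIZATIONS = [
    ("IQ4_XS", 18.2, 16),
    ("Q4_K_M", 24.6, 32),
    ("Q8_0", 33.6, 64),
]


def pick_quantization(ram_gb):
    """Return the largest quantization whose RAM requirement fits, or None."""
    fitting = [row for row in QUANTIZATIONS if row[2] <= ram_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda row: row[2])[0]
```

For example, `pick_quantization(48)` selects `Q4_K_M`, since `Q8_0` is listed as needing 64 GB.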

In [None]:
import os

os.makedirs("/tmp/nemotron", exist_ok=True)

# Download IQ4_XS quantization (18.2 GB)
!curl -L -o /tmp/nemotron/Nemotron-3-Nano-30B-A3B-IQ4_XS.gguf \
    "https://huggingface.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF/resolve/main/Nemotron-3-Nano-30B-A3B-IQ4_XS.gguf"

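A failed or interrupted download can leave an HTML error page or a truncated file at the target path. GGUF files begin with the four ASCII bytes `GGUF`, so a quick sanity check is possible before registering the model. This is a minimal sketch; `looks_like_gguf` is a hypothetical helper name, and the check does not validate the rest of the file.

```python
def looks_like_gguf(path):
    """Return True if the file starts with the 4-byte GGUF magic."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

A call such as `looks_like_gguf("/tmp/nemotron/Nemotron-3-Nano-30B-A3B-IQ4_XS.gguf")` should return `True` for a valid download.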
## Create Modelfile and Register with Ollama

The Modelfile points Ollama at the downloaded GGUF file and defines the model's chat template and stop tokens.

In [None]:
modelfile = '''FROM /tmp/nemotron/Nemotron-3-Nano-30B-A3B-IQ4_XS.gguf

TEMPLATE """{{- if .System }}{{ .System }}
{{- end }}
{{- range .Messages }}
{{- if eq .Role "user" }}<|start_header_id|>user<|end_header_id|>
{{ .Content }}<|eot_id|>
{{- else if eq .Role "assistant" }}<|start_header_id|>assistant<|end_header_id|>
{{ .Content }}<|eot_id|>
{{- end }}
{{- end }}
<|start_header_id|>assistant<|end_header_id|>
"""

PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|end_of_text|>"
'''

with open("/tmp/nemotron/Modelfile", "w") as f:
    f.write(modelfile)

In [None]:
# Register model with Ollama
!ollama create nemotron-nano -f /tmp/nemotron/Modelfile

In [None]:
!ollama list
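The same check can be done programmatically against the `/api/tags` endpoint used earlier, which returns a JSON object with a `models` list. The sketch below assumes Ollama stored the model under the default `latest` tag; the helper names are illustrative.

```python
import json
import urllib.request


def model_is_registered(tags, name):
    """Check an /api/tags payload for `name`, assuming the default ':latest' tag."""
    wanted = name if ":" in name else f"{name}:latest"
    return any(model.get("name") == wanted for model in tags.get("models", []))


def fetch_tags(host="http://localhost:11434"):
    """Fetch the list of local models from a running Ollama server."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as response:
        return json.load(response)
```

With the server running, `model_is_registered(fetch_tags(), "nemotron-nano")` should be `True` after the `ollama create` step.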

## Configure Environment for Inspect AI

Set the Ollama base URL so Inspect AI can connect to the local server.

In [None]:
import os

os.environ["OLLAMA_BASE_URL"] = "http://localhost:11434/v1"

# Or add to your .env file:
# OLLAMA_BASE_URL=http://localhost:11434/v1
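The `/v1` suffix refers to Ollama's OpenAI-compatible API. To confirm the endpoint works before involving Inspect AI, a chat request can be sent directly. This is a hedged sketch using only the standard library; the helper names are assumptions. The model name here is the plain Ollama name, without the `ollama/` prefix that Inspect AI uses.

```python
import json
import os
import urllib.request


def build_chat_request(model, prompt, base_url=None):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    base_url = base_url or os.environ.get(
        "OLLAMA_BASE_URL", "http://localhost:11434/v1"
    )
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model, prompt):
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=300) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

For example, `chat("nemotron-nano", "Say hello")` should return a short reply once the model is registered and the server is running.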

## Load Model into Memory

Pre-load the model so the first evaluation request does not time out while the weights are loading.

In [None]:
# Warm up the model (keeps it loaded for 60 minutes)
!ollama run nemotron-nano --keepalive 60m "hi"
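The warm-up can also be done over HTTP: a request to Ollama's `/api/generate` endpoint with a model name and no prompt loads the model, and `keep_alive` sets how long it stays resident. A minimal sketch, with illustrative helper names and the default local host assumed:

```python
import json
import urllib.request


def build_preload_request(model, keep_alive="60m", host="http://localhost:11434"):
    """Build a prompt-less /api/generate request that only loads the model."""
    payload = {"model": model, "keep_alive": keep_alive}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload(model, keep_alive="60m"):
    """Ask the server to load `model`; blocks until loading finishes."""
    request = build_preload_request(model, keep_alive)
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.load(response)
```

Running `preload("nemotron-nano")` has the same intent as the shell command above, without generating a reply.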

## Run Evaluation

Run all Open Telco benchmarks using Inspect AI with the `ollama/` model prefix.

In [None]:
MODEL = "ollama/nemotron-nano"
LIMIT = 1  # samples per benchmark for testing

In [None]:
!~/.local/bin/uv run inspect eval \
    src/open_telco/teleqna/teleqna.py \
    src/open_telco/telemath/telemath.py \
    src/open_telco/telelogs/telelogs.py \
    src/open_telco/three_gpp/three_gpp.py \
    --model {MODEL} \
    --limit {LIMIT}
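Outside a notebook, where the `!` shell syntax and `{MODEL}` interpolation are unavailable, the same command can be assembled in plain Python. This is a sketch that mirrors the cell above; `build_eval_command` is a hypothetical helper, and the `uv` path is the one assumed in this notebook.

```python
import os

TASKS = [
    "src/open_telco/teleqna/teleqna.py",
    "src/open_telco/telemath/telemath.py",
    "src/open_telco/telelogs/telelogs.py",
    "src/open_telco/three_gpp/three_gpp.py",
]


def build_eval_command(tasks, model, limit=None, uv="~/.local/bin/uv"):
    """Return the argument list for `uv run inspect eval ...`."""
    command = [os.path.expanduser(uv), "run", "inspect", "eval", *tasks]
    command += ["--model", model]
    if limit is not None:
        # Limit the number of samples per benchmark, as in the cell above.
        command += ["--limit", str(limit)]
    return command
```

The result can be passed to `subprocess.run(build_eval_command(TASKS, "ollama/nemotron-nano", 1), check=True)` from the repository root.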

## View Results

In [None]:
!~/.local/bin/uv run inspect view start --log-dir logs/