# RAG Fundamentals - with Ollama

This notebook is designed for **VS Code** and uses **Haystack 2.x** with **Ollama** for local LLM + embeddings, and **Chroma** as the vector store.

**What you’ll do**
- Understand RAG and prompt engineering basics
- Run a local LLM via Ollama from Haystack
- Index small docs into Chroma and query them with embeddings
- Build a minimal RAG pipeline using `PromptBuilder` + `OllamaGenerator`
- Save your task‑specific prompt for next week’s LLM‑as‑judge evaluation


## Prerequisites

### Setting up a virtual environment with uv

- Install the **uv** Python package manager using Homebrew:
    ```bash
    brew install uv
    ```
- Create a virtual environment and install the dependencies listed in `requirements.txt`:
    ```bash
    uv venv
    uv pip install -r requirements.txt
    ```


### Installing Ollama
- Install **Ollama** from `https://ollama.com/download`, or use Homebrew:
  ```bash
  brew install ollama
  ```
- Start Ollama from Applications or by running the following in your terminal:
  ```bash
  ollama start
  ```
- Ensure it’s running on `http://localhost:11434`.
- Pull one chat model and one embedding model:
  ```bash
  ollama pull llama3.2
  ollama pull nomic-embed-text
  ```

In [2]:
# Configure models via environment variables for easy swapping
import os
os.environ.setdefault("OLLAMA_ENDPOINT", "http://localhost:11434")
os.environ.setdefault("OLLAMA_MODEL", "llama3.2")
os.environ.setdefault("EMBED_MODEL", "nomic-embed-text")
print({k: os.environ[k] for k in ["OLLAMA_ENDPOINT","OLLAMA_MODEL","EMBED_MODEL"]})


{'OLLAMA_ENDPOINT': 'http://localhost:11434', 'OLLAMA_MODEL': 'llama3.2', 'EMBED_MODEL': 'nomic-embed-text'}


In [3]:
# Quick reachability check to Ollama
import os
import requests

url = os.environ["OLLAMA_ENDPOINT"].rstrip("/") + "/api/tags"
r = requests.get(url, timeout=10)
# List up to five locally available model names
models = [m.get("name") for m in r.json().get("models", [])][:5]
print("Ollama reachable:", r.status_code, "models:", models)




Ollama reachable: 200 models: ['nomic-embed-text:latest', 'qwen2.5:14b']


## What is RAG and why teams use it
RAG (retrieval-augmented generation) retrieves relevant context from a vector store and passes it to the LLM, which answers **grounded** in that context. Teams use it to improve accuracy, transparency, and domain fit.

**Pipeline**: Index (embed docs → store) → Retrieve (top‑k by similarity) → Generate (LLM with prompt + context).
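
To make the three stages concrete before bringing in Haystack, here is a minimal, dependency-free sketch. It is not how Chroma or `nomic-embed-text` work internally: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the names `DOCS`, `INDEX`, `retrieve`, and `build_prompt` are hypothetical, chosen only for this illustration. The final "Generate" step stops at building the prompt that would be sent to the LLM.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# 1) Index: embed each document and store (text, vector) pairs.
DOCS = [
    "Rome is the capital of Italy.",
    "Chroma is a vector store for embeddings.",
    "Ollama runs large language models locally.",
]
INDEX = [(doc, embed(doc)) for doc in DOCS]


# 2) Retrieve: rank stored documents by similarity to the question, keep the top k.
def retrieve(question: str, k: int = 1) -> list[str]:
    query_vec = embed(question)
    ranked = sorted(INDEX, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


# 3) Generate: combine the retrieved context and the question into one prompt.
def build_prompt(question: str, context: list[str]) -> str:
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}\nAnswer:"
    )


question = "What is the capital of Italy?"
prompt = build_prompt(question, retrieve(question, k=1))
print(prompt)
```

In a real pipeline the `embed` step is handled by an embedding model and the ranked lookup by the vector store; only the overall shape carries over.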


In [4]:
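import importlib

# Check that each required package imports cleanly; print its version when one is exposed.
# Note: recent chroma-haystack releases are imported as
# `haystack_integrations.document_stores.chroma`, so a "missing" result for the
# top-level name `chroma_haystack` does not by itself mean the integration is absent.
for pkg in ["haystack","haystack_integrations","chroma_haystack","ollama"]:
    try:
        m = importlib.import_module(pkg)
        print(pkg, "OK", getattr(m, "__version__", ""))
    except Exception as e:
        print(pkg, "missing → pip install -r requirements.txt", e)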


haystack OK 2.21.0
haystack_integrations OK 
chroma_haystack missing → pip install -r requirements.txt No module named 'chroma_haystack'
ollama OK 


## Getting a response from a local model with Ollama

Run the following command in your terminal:
```bash
ollama run llama3.2
```

At the `>>>` prompt, enter a basic prompt, e.g.:
```bash
>>> What is the capital of Italy?
```


## Generating structured output

Now prompt your locally running Ollama model to produce a **strict JSON** object that satisfies the specification below: no prose, no markdown, no trailing commentary.

### Specification
Produce a single JSON object with the following shape:
```json
{
  "products": [
    {
      "id": <integer>,
      "name": "<string>",
      "price": <float>,
      "tags": ["<string>", "..."]
    },
    "..."
  ]
}
```


Rules:

- Include at least 3 products.
- `id` must be an integer and unique.
- `name` is a non-empty string.
- `price` is a float (not a string) and > 0.
- `tags` is a non-empty array of strings (no empty strings).
- Output must be valid JSON with no extra text before/after the JSON block.
- Do not include comments or explanations.

Rubric (10 points total)

- Valid JSON (2 pts): Parses without errors; no extra commentary.
- Shape compliance (3 pts): Keys exist (products, id, name, price, tags); correct nesting.
- Type & content (3 pts): Integer ids (unique), float price, non-empty tags strings.
- Quantity (1 pt): At least 3 products.
- Cleanliness (1 pt): No additional fields beyond spec (strict mode).
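
The rules and rubric above can be checked mechanically. The sketch below is one possible scorer, not an official grader: the function name `score_output` and the all-or-nothing scoring per rubric line are assumptions made for illustration. It also assumes a strict reading of "float", so a price written as `19` (which JSON parses as an integer) does not earn the type points; relax that check if you prefer to accept whole numbers.

```python
import json

ALLOWED_KEYS = {"id", "name", "price", "tags"}


def score_output(raw: str) -> dict:
    """Score a raw model reply against the rubric (all-or-nothing per line)."""
    scores = {"valid_json": 0, "shape": 0, "types": 0, "quantity": 0, "cleanliness": 0}

    # Valid JSON (2 pts): any surrounding prose or markdown makes parsing fail.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return scores
    scores["valid_json"] = 2

    products = data.get("products") if isinstance(data, dict) else None
    if not isinstance(products, list) or not products:
        return scores
    if not all(isinstance(p, dict) for p in products):
        return scores

    # Shape compliance (3 pts): every product carries all required keys.
    if all(ALLOWED_KEYS <= p.keys() for p in products):
        scores["shape"] = 3
        ids = [p["id"] for p in products]
        # Type & content (3 pts). `type(x) is int` also rejects booleans.
        types_ok = (
            all(type(i) is int for i in ids)
            and len(set(ids)) == len(ids)
            and all(isinstance(p["name"], str) and p["name"].strip() for p in products)
            and all(type(p["price"]) is float and p["price"] > 0 for p in products)
            and all(
                isinstance(p["tags"], list)
                and p["tags"]
                and all(isinstance(t, str) and t.strip() for t in p["tags"])
                for p in products
            )
        )
        if types_ok:
            scores["types"] = 3

    # Quantity (1 pt): at least 3 products.
    if len(products) >= 3:
        scores["quantity"] = 1

    # Cleanliness (1 pt): no fields beyond the spec at either level.
    if set(data) == {"products"} and all(p.keys() <= ALLOWED_KEYS for p in products):
        scores["cleanliness"] = 1

    return scores
```

Paste the model's reply into `score_output(...)` and sum the values with `sum(scores.values())` to get a total out of 10.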


Prompting tips:

- Use an explicit output instruction to delimit the JSON (e.g., “Return only the JSON. Do not include markdown or commentary.”).
- Set model behavior (role/tone) and format constraints explicitly.
- Consider adding few-shot exemplars (mini valid/invalid examples) inside your prompt to steer outputs.

### Generating longer pieces of text