# Session 2: RAG basics with Haystack and OpenSearch

This notebook is designed for **VS Code** and uses **Ollama** to run local LLM models.

**What you’ll do**
- Recap the concept of RAG
- Use the Haystack library to explore RAG components
- Build a vector store from public-domain texts (Project Gutenberg)
- Build a RAG pipeline with an in-memory document store
- Explore OpenSearch and hybrid retrieval


### Prerequisites
- Ollama running locally (`http://localhost:11434`) with a chat model (e.g., `llama3.2`) and an embedding model (e.g., `nomic-embed-text`).
- Recommended Python 3.11
- Install the dependencies listed in `requirements.txt`

```bash
uv venv --python 3.11
uv pip install -r requirements.txt
```



In [0]:

import os, requests
OLLAMA_ENDPOINT=os.getenv("OLLAMA_ENDPOINT","http://localhost:11434").rstrip('/')
OLLAMA_MODEL=os.getenv("OLLAMA_MODEL","llama3.2")
EMBED_MODEL=os.getenv("EMBED_MODEL","nomic-embed-text")
print({"OLLAMA_ENDPOINT":OLLAMA_ENDPOINT,"OLLAMA_MODEL":OLLAMA_MODEL,"EMBED_MODEL":EMBED_MODEL})

# Ping Ollama
try:
    r=requests.get(OLLAMA_ENDPOINT+"/api/tags", timeout=10)
    print("Ollama reachable:", r.status_code, [m.get('name') for m in r.json().get('models',[])][:5])
except Exception as e:
    print("Warning: Ollama not reachable:", e)
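The ping above only prints the first five model names. A small check like the following tells you directly whether both configured models have been pulled. It is a sketch: `missing_models` is a hypothetical helper, and it assumes `/api/tags` lists pulled models as `name:tag` (e.g. `llama3.2:latest`), with a bare name meaning the `latest` tag.

```python
def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required model names that are absent from an /api/tags payload."""

    def normalise(name: str) -> str:
        # A bare model name such as "llama3.2" implicitly refers to the ":latest" tag
        return name if ":" in name else f"{name}:latest"

    available = {normalise(m.get("name", "")) for m in tags_payload.get("models", [])}
    return [r for r in required if normalise(r) not in available]


# Made-up payload with the same shape the ping cell reads from r.json()
example = {"models": [{"name": "llama3.2:latest"}, {"name": "qwen2.5:7b"}]}
print(missing_models(example, ["llama3.2", "nomic-embed-text"]))  # ['nomic-embed-text']
```

In the notebook you could call `missing_models(r.json(), [OLLAMA_MODEL, EMBED_MODEL])` and run `ollama pull <name>` for anything it returns.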


In [0]:

from haystack import Pipeline, Document
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever, InMemoryBM25Retriever
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.components.joiners import DocumentJoiner

from haystack_integrations.components.generators.ollama import OllamaGenerator
from haystack_integrations.components.embedders.ollama import OllamaDocumentEmbedder, OllamaTextEmbedder

print("Haystack & integrations imported.")


## Part A — LLM‑as‑Judge (evaluate last week's outputs)
Paste any text you generated last week and score it against a rubric using the local model. This illustrates *prompt‑as‑program* patterns and **structured outputs**.

In [0]:

import json

# Define a rubric and ask the model to return STRICT JSON only
rubric = {
  "length": "~300 words",
  "tone": "professional, informative",
  "structure": ["intro", "2-3 body paragraphs", "conclusion"],
  "must_include": ["definition of RAG", "one concrete example"],
  "must_avoid": ["jargon without explanation", "undefined acronyms"]
}

JUDGE_PROMPT = """
You are a meticulous evaluator. Score the SUBMISSION against the RUBRIC.

Rules:
- Return ONLY valid JSON (no commentary, no markdown)
- Fields: {"score_total": int (0-10), "feedback": ["..."], "violations": ["..."], "passes": ["..."]}

RUBRIC = {rubric}
SUBMISSION = """

submission = "Paste prior text here and run this cell."

llm = OllamaGenerator(model=OLLAMA_MODEL, url=OLLAMA_ENDPOINT)
# str.replace rather than str.format: the field spec above contains literal braces.
# json.dumps gives the model real JSON instead of a Python dict repr.
judge_prompt = JUDGE_PROMPT.replace("{rubric}", json.dumps(rubric, indent=2)) + submission
out = llm.run(prompt=judge_prompt)
print(out["replies"][0])
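Asking for "ONLY valid JSON" is a request, not a guarantee: small local models sometimes wrap the object in a markdown fence or add a sentence around it. A defensive parser extracts the object and checks the required fields before you trust the score. This is a sketch; `parse_judge_reply` is a hypothetical helper, and the field names follow the judge prompt above.

```python
import json

REQUIRED_FIELDS = {"score_total", "feedback", "violations", "passes"}


def parse_judge_reply(reply: str) -> dict:
    """Extract the JSON object from a judge reply and validate its fields."""
    # Tolerate chatter or code fences around the object: take the first "{" to the last "}"
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start:end + 1])  # json.JSONDecodeError is a ValueError
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    score = data["score_total"]
    if not isinstance(score, int) or not 0 <= score <= 10:
        raise ValueError("score_total must be an integer between 0 and 10")
    return data


# Invented reply with extra text around the JSON object
sample = ('Sure! Here is the JSON: {"score_total": 7, "feedback": ["tighten the intro"], '
          '"violations": [], "passes": ["tone"]} Hope this helps.')
print(parse_judge_reply(sample)["score_total"])  # 7
```

`parse_judge_reply(out["replies"][0])` then gives you a dict, or a `ValueError` you can treat as a signal to re-prompt.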


## Part B — Build a Small Corpus (Public Domain)
We’ll load short excerpts from public‑domain texts (Project Gutenberg) to keep the demo fast. You can replace, extend, or ingest your own local files later.

In [0]:

import requests

PUBLIC_URLS = {
    "alice": "https://www.gutenberg.org/ebooks/11.txt.utf-8",
    "pride": "https://www.gutenberg.org/ebooks/1342.txt.utf-8",
    # Plain-text edition rather than the HTML page, so no markup ends up in the chunks
    "artofwar": "https://www.gutenberg.org/ebooks/17405.txt.utf-8",
}

RAW_DOCS = {}
for k, u in PUBLIC_URLS.items():
    try:
        resp = requests.get(u, timeout=20)
        resp.raise_for_status()  # treat HTTP errors (404, 5xx) as failed fetches too
        RAW_DOCS[k] = resp.text[:20000]  # first 20k chars per title for speed
    except requests.RequestException as e:
        print("Fetch failed for", k, e)
        RAW_DOCS[k] = ""

print({k: len(v) for k, v in RAW_DOCS.items()})
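Project Gutenberg plain-text files start with a licence header and end with a licence footer, so the first 20k characters include boilerplate that gets chunked and embedded along with the book. A minimal sketch that keeps only the text between the marker lines, assuming they look like `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` and `*** END OF ... ***` (`strip_gutenberg_boilerplate` is a hypothetical helper):

```python
def strip_gutenberg_boilerplate(text: str) -> str:
    """Keep only the lines between the Project Gutenberg START and END markers."""
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        upper = line.upper()
        # Marker wording varies slightly between files ("THE"/"THIS"), so match loosely
        if upper.startswith("*** START OF") and "PROJECT GUTENBERG" in upper:
            start = i + 1
        elif upper.startswith("*** END OF") and "PROJECT GUTENBERG" in upper:
            end = i
            break
    return "\n".join(lines[start:end]).strip()


sample = ("Header text\n*** START OF THE PROJECT GUTENBERG EBOOK X ***\n"
          "Body line\n*** END OF THE PROJECT GUTENBERG EBOOK X ***\nLicence text")
print(strip_gutenberg_boilerplate(sample))  # Body line
```

Apply it before truncating, e.g. `RAW_DOCS[k] = strip_gutenberg_boilerplate(resp.text)[:20000]`, since the END marker sits at the very end of the full file. Text without markers comes back unchanged apart from line endings and surrounding whitespace.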


In [0]:

# Create Haystack Documents (small chunks)
from textwrap import wrap

DOCS = []
for name, text in RAW_DOCS.items():
    if not text:
        continue
    # textwrap.wrap breaks on whitespace into pieces of at most 1200 chars;
    # newlines are replaced by spaces along the way
    chunks = wrap(text, 1200)
    for i, ch in enumerate(chunks[:30]):  # cap per title
        DOCS.append(Document(content=ch, meta={"source": name, "chunk": i}))

len(DOCS), DOCS[0].meta
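`textwrap.wrap` yields non-overlapping chunks, so a sentence that straddles a boundary is split between two documents and may retrieve poorly from either. A common remedy is a sliding window with overlap. The sketch below counts words rather than characters; `chunk_words` and its defaults (200 words, 40 overlap) are illustrative assumptions, not values from this session. Haystack's `DocumentSplitter` component offers a comparable `split_overlap` option if you prefer to stay inside the framework.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_words("a b c d e f g", size=3, overlap=1))  # ['a b c', 'c d e', 'e f g']
```

To try it, swap `wrap(text, 1200)` for `chunk_words(text)` in the cell above and keep the rest of the loop unchanged.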


## Part C — Component Demos (In‑Memory Store)
We’ll demonstrate **Ollama embeddings** + **InMemoryEmbeddingRetriever** (dense), and **InMemoryBM25Retriever** (sparse). Then we fuse their results with a **DocumentJoiner** and rerank the fused list with a cross-encoder **ranker** for a simple hybrid.
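Dense retrieval boils down to nearest-neighbour search: embed the query with the same model used for the documents, then rank documents by vector similarity. A toy version with made-up 3-dimensional vectors (real `nomic-embed-text` vectors have 768 dimensions) and cosine similarity is below; `dense_top_k` is a hypothetical helper, and Haystack's in-memory store lets you choose dot product or cosine rather than always using cosine.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dense_top_k(query_emb: list[float], doc_embs: dict[str, list[float]], k: int = 2) -> list[str]:
    """IDs of the k documents whose embeddings are most similar to the query embedding."""
    scores = {doc_id: cosine(query_emb, emb) for doc_id, emb in doc_embs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Invented embeddings, one per chunk ID
toy = {
    "alice-0": [0.9, 0.1, 0.0],
    "pride-0": [0.1, 0.9, 0.0],
    "artofwar-0": [0.0, 0.2, 0.95],
}
print(dense_top_k([0.8, 0.3, 0.0], toy))  # ['alice-0', 'pride-0']
```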

In [0]:

store = InMemoryDocumentStore()
writer = DocumentWriter(store)

# Embed documents with Ollama and write
embedder = OllamaDocumentEmbedder(model=EMBED_MODEL, url=OLLAMA_ENDPOINT)
indexing = Pipeline()
indexing.add_component("embedder", embedder)
indexing.add_component("writer", writer)
indexing.connect("embedder.documents", "writer.documents")
indexing.run({"embedder": {"documents": DOCS}})
print("Docs in store:", store.count_documents())


In [0]:

# Two retrievers
emb_retriever = InMemoryEmbeddingRetriever(document_store=store, top_k=5)
bm25_retriever = InMemoryBM25Retriever(document_store=store, top_k=5)
# One query embedder for all queries (same model as the document embeddings)
text_embedder = OllamaTextEmbedder(model=EMBED_MODEL, url=OLLAMA_ENDPOINT)

# Query examples
QUERIES = [
    "What is Retrieval-Augmented Generation?",
    "Who is Alice and what happens at the start of the story?",
    "What does Sun Tzu say about deception?",
]

for q in QUERIES:
    print("\n=== Query:", q)
    q_emb = text_embedder.run(text=q)["embedding"]
    r1 = emb_retriever.run(query_embedding=q_emb)  # dense
    r2 = bm25_retriever.run(query=q)  # sparse
    print("Dense top sources:", [d.meta["source"] for d in r1["documents"]])
    print("Sparse top sources:", [d.meta["source"] for d in r2["documents"]])
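Sparse retrieval scores exact term matches instead. A from-scratch Okapi BM25 sketch shows its two knobs: `k1` limits how much repeated occurrences of a term help, and `b` controls how strongly long documents are penalised. The tokenisation (lowercase, split on whitespace) and the defaults `k1=1.5, b=0.75` are assumptions for illustration; Haystack's in-memory store uses its own tokeniser and BM25 variant, so its scores will differ.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for the query."""
    if not docs:
        return []
    tokenised = [d.lower().split() for d in docs]  # naive tokeniser: lowercase + whitespace split
    n_docs = len(tokenised)
    avgdl = sum(len(toks) for toks in tokenised) / n_docs
    # Document frequency: how many documents contain each term at least once
    df = Counter(term for toks in tokenised for term in set(toks))
    scores = []
    for toks in tokenised:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Lucene-style IDF: the "1 +" keeps it positive even for very common terms
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalisation (b)
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "all warfare is based on deception",
    "alice was beginning to get very tired of sitting by her sister",
    "it is a truth universally acknowledged",
]
print(bm25_scores("warfare deception", docs))  # only the first document scores above zero
```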


In [0]:

# Simple hybrid: fuse dense + sparse results with reciprocal rank fusion (RRF),
# then rerank the fused list with a cross-encoder
joiner = DocumentJoiner(join_mode="reciprocal_rank_fusion")
ranker = TransformersSimilarityRanker(model="cross-encoder/ms-marco-MiniLM-L-6-v2", top_k=5)

hybrid = Pipeline()
hybrid.add_component("q_embedder", OllamaTextEmbedder(model=EMBED_MODEL, url=OLLAMA_ENDPOINT))
# Fresh retriever instances so this cell can be re-run
# (Haystack does not let one component instance join two pipelines)
hybrid.add_component("dense", InMemoryEmbeddingRetriever(document_store=store, top_k=5))
hybrid.add_component("sparse", InMemoryBM25Retriever(document_store=store, top_k=5))
hybrid.add_component("join", joiner)
hybrid.add_component("rerank", ranker)

hybrid.connect("q_embedder.embedding", "dense.query_embedding")
hybrid.connect("sparse.documents", "join.documents")
hybrid.connect("dense.documents", "join.documents")
hybrid.connect("join.documents", "rerank.documents")

for q in QUERIES:
    # The BM25 retriever and the ranker both need the raw query text as well
    out = hybrid.run({"q_embedder": {"text": q}, "sparse": {"query": q}, "rerank": {"query": q}})
    print("\n=== Hybrid top sources for:", q)
    print([d.meta["source"] for d in out["rerank"]["documents"]])
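Reciprocal rank fusion looks only at positions, not raw scores, which is why it can merge BM25 and embedding results whose scores live on different scales: each document earns 1/(k + rank) from every list it appears in. Below is a minimal sketch using the constant k = 60 that is common in the RRF literature. `reciprocal_rank_fusion` is a hypothetical helper, the document IDs are made up, and `DocumentJoiner` may scale the fused scores differently.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense = ["alice-3", "alice-0", "pride-2"]
sparse = ["alice-0", "artofwar-1", "alice-3"]
print([doc_id for doc_id, _ in reciprocal_rank_fusion([dense, sparse])])
# ['alice-0', 'alice-3', 'artofwar-1', 'pride-2']
```

Documents that both retrievers rank highly (here `alice-0` and `alice-3`) rise to the top even though the two lists disagree on first place.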


## Part D — RAG Pipeline (In‑Memory)
We wire **retrieval → prompt construction → generation** using **PromptBuilder** and **OllamaGenerator**.

In [0]:

prompt_tmpl = """
You are a precise assistant. Answer using ONLY the provided context.
Cite the source names in brackets.

Context:
{% for d in documents %}- [{{ d.meta.source }}] {{ d.content[:300] }}
{% endfor %}

Question: {{ query }}
Answer in 3-5 sentences.
"""

rag = Pipeline()
rag.add_component("q_embedder", OllamaTextEmbedder(model=EMBED_MODEL, url=OLLAMA_ENDPOINT))
rag.add_component("dense", InMemoryEmbeddingRetriever(document_store=store, top_k=5))
rag.add_component("prompt", PromptBuilder(template=prompt_tmpl))
rag.add_component("llm", OllamaGenerator(model=OLLAMA_MODEL, url=OLLAMA_ENDPOINT))

rag.connect("q_embedder.embedding", "dense.query_embedding")
rag.connect("dense.documents", "prompt.documents")
rag.connect("prompt.prompt", "llm.prompt")

question = "Summarise Sun Tzu's stance on deception."
answer = rag.run({"q_embedder": {"text": question},
                  "prompt": {"query": question}})
print(answer["llm"]["replies"][0])
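The prompt asks the model to cite source names in brackets, but nothing enforces it. A quick post-check separates citations that match a real source key from ones the model made up. This is a sketch: `check_citations` is a hypothetical helper and the example reply is invented.

```python
import re


def check_citations(answer: str, allowed_sources: set[str]) -> dict[str, list[str]]:
    """Split bracketed citations in an answer into known and unknown source names."""
    cited = re.findall(r"\[([^\[\]]+)\]", answer)  # text inside each [...] pair
    return {
        "known": sorted({c for c in cited if c in allowed_sources}),
        "unknown": sorted({c for c in cited if c not in allowed_sources}),
    }


reply = ("Sun Tzu treats deception as central to warfare [artofwar]. "
         "Alice follows a white rabbit [alice] [wikipedia].")
print(check_citations(reply, {"alice", "pride", "artofwar"}))
# {'known': ['alice', 'artofwar'], 'unknown': ['wikipedia']}
```

For this corpus the allowed set is simply `set(RAW_DOCS)`; a non-empty `"unknown"` list, or an answer with no citations at all, is worth flagging.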


## Part E — Vector DBs & OpenSearch (Docker) — Hybrid Retrieval
We’ll now use **OpenSearch** as the document store, then compare **BM25**, **dense embeddings**, and the **OpenSearchHybridRetriever**.

> Quickstart (local dev):
```bash
# Single node, security plugin disabled for local testing only (see the official docs for options)
docker run -d --name opensearch \
  -p 9200:9200 -p 9600:9600 \
  -e "discovery.type=single-node" \
  -e "DISABLE_SECURITY_PLUGIN=true" \
  opensearchproject/opensearch:latest
```
OpenSearch Haystack integration: `pip install opensearch-haystack`.


In [0]:

from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
from haystack_integrations.components.retrievers.opensearch import (
    OpenSearchBM25Retriever,
    OpenSearchEmbeddingRetriever,
    OpenSearchHybridRetriever,
)

# Adjust embedding_dim to your embedding model; nomic-embed-text -> 768
OPENSEARCH = {
    "hosts": ["http://localhost:9200"],
    "index": "demo_docs",
    "embedding_dim": 768,
}

doc_store = OpenSearchDocumentStore(**OPENSEARCH)

# Embed with Ollama and write
op_embedder = OllamaDocumentEmbedder(model=EMBED_MODEL, url=OLLAMA_ENDPOINT)
docs_emb = op_embedder.run(documents=DOCS)
# OVERWRITE keeps the cell re-runnable; the default policy may raise on IDs that already exist
n_written = doc_store.write_documents(docs_emb["documents"], policy=DuplicatePolicy.OVERWRITE)
print("OpenSearch indexed docs:", n_written)
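The OpenSearch index is created with a fixed vector dimension taken from `embedding_dim`, and OpenSearch rejects vectors of any other length. If you switch `EMBED_MODEL`, a quick check before writing catches the mismatch early; `check_embedding_dims` is a hypothetical helper.

```python
from typing import Optional, Sequence


def check_embedding_dims(embeddings: Sequence[Optional[Sequence[float]]], expected_dim: int) -> None:
    """Raise ValueError if any embedding is missing or has the wrong length."""
    bad = [i for i, emb in enumerate(embeddings) if emb is None or len(emb) != expected_dim]
    if bad:
        raise ValueError(
            f"{len(bad)} embedding(s) missing or not {expected_dim}-dimensional "
            f"(first at index {bad[0]})"
        )


check_embedding_dims([[0.1] * 768, [0.2] * 768], expected_dim=768)  # passes silently
```

A call such as `check_embedding_dims([d.embedding for d in docs_emb["documents"]], OPENSEARCH["embedding_dim"])` would sit just before `write_documents`. An index that was already created with the old dimension most likely has to be deleted, or a new index name used, before it can hold the new vectors.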


In [0]:

# Three retrievers
os_bm25 = OpenSearchBM25Retriever(document_store=doc_store, top_k=5)
os_emb = OpenSearchEmbeddingRetriever(document_store=doc_store, top_k=5)
# Hybrid retriever combines both under the hood
os_hybrid = OpenSearchHybridRetriever(document_store=doc_store,
                                      embedder=OllamaTextEmbedder(model=EMBED_MODEL, url=OLLAMA_ENDPOINT),
                                      top_k=5)
# Query embedder for the plain embedding retriever
os_text_embedder = OllamaTextEmbedder(model=EMBED_MODEL, url=OLLAMA_ENDPOINT)

query = "What is Retrieval-Augmented Generation?"
print("\nBM25:")
print([d.meta.get("source") for d in os_bm25.run(query=query)["documents"]])
print("\nEmbedding:")
query_emb = os_text_embedder.run(text=query)["embedding"]
print([d.meta.get("source") for d in os_emb.run(query_embedding=query_emb)["documents"]])
print("\nHybrid:")
print([d.meta.get("source") for d in os_hybrid.run(query=query)["documents"]])


### Compare Retrieval Quality with an LLM‑as‑Judge
We’ll ask the model to pick which result set (BM25 vs Dense vs Hybrid) best matches the query based on top‑k snippets. This is a quick, *didactic* judge; for production you’d also compute offline metrics.

In [0]:

from itertools import islice

def top_snippets(docs, n=3):
    """First ~220 chars of each of the top-n documents, flattened to one line and tagged with its source."""
    snippets = []
    for d in islice(docs, n):
        # Replace newlines outside the f-string: backslashes inside f-string
        # expressions are a SyntaxError before Python 3.12
        text = d.content[:220].replace("\n", " ")
        snippets.append(f"[{d.meta.get('source')}] {text}")
    return snippets

# Literal braces are doubled ({{ }}) because the template is filled with str.format
COMPARE_PROMPT = """
You are a fair, rigorous judge. Given a QUERY and three candidate result sets, choose which set is MOST relevant.
Return ONLY JSON: {{"winner": "bm25|dense|hybrid", "rationale": "..."}}

QUERY: {query}

BM25_TOP: {bm25}
DENSE_TOP: {dense}
HYBRID_TOP: {hybrid}
"""

text_embedder = OllamaTextEmbedder(model=EMBED_MODEL, url=OLLAMA_ENDPOINT)

q = "Who is Alice and what happens at the start of the story?"
res_bm25 = os_bm25.run(query=q)["documents"]
res_dense = os_emb.run(query_embedding=text_embedder.run(text=q)["embedding"])["documents"]
res_hybrid = os_hybrid.run(query=q)["documents"]

prompt = COMPARE_PROMPT.format(query=q,
                               bm25="\n".join(top_snippets(res_bm25)),
                               dense="\n".join(top_snippets(res_dense)),
                               hybrid="\n".join(top_snippets(res_hybrid)))
judge = OllamaGenerator(model=OLLAMA_MODEL, url=OLLAMA_ENDPOINT)
print(judge.run(prompt=prompt)["replies"][0])


## What We Learned
- How to use Haystack components individually, then compose into a RAG pipeline.
- Why vector databases matter and how OpenSearch supports BM25, vector search, and **hybrid** retrieval in Haystack.
- How an **LLM‑as‑judge** can help you compare retrieval strategies for teaching and iteration.

**Next:** Persist vector stores, add chunking/cleaning pipelines, and evaluate with more rigorous metrics (e.g., recall@k with labelled sets).
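Recall@k asks, for each labelled query, what fraction of the documents a human marked as relevant appear in the top k results. A minimal sketch follows; the `source-chunk` IDs (which could be built from `meta["source"]` and `meta["chunk"]`) and the labels are invented for illustration, and `recall_at_k` / `mean_recall_at_k` are hypothetical helpers.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant document IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("recall is undefined without at least one relevant document")
    hits = relevant.intersection(retrieved[:k])
    return len(hits) / len(relevant)


def mean_recall_at_k(runs: dict[str, list[str]], labels: dict[str, set[str]], k: int) -> float:
    """Average recall@k over every labelled query."""
    if not labels:
        raise ValueError("no labelled queries")
    return sum(recall_at_k(runs[q], labels[q], k) for q in labels) / len(labels)


labels = {"q1": {"artofwar-1", "artofwar-2"}, "q2": {"alice-0"}}
runs = {"q1": ["artofwar-1", "pride-4", "artofwar-2"], "q2": ["pride-0", "alice-0"]}
print(mean_recall_at_k(runs, labels, k=2))  # 0.75
```

Running the same labelled set through the BM25, dense, and hybrid retrievers gives a number you can compare directly, alongside the LLM judge's qualitative verdict.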