# Session 2: RAG basics with Haystack and OpenSearch

This notebook is designed for **VS Code** and uses **Ollama** to run LLMs locally.

**What you’ll do**
- Recap the concept of RAG
- Use the Haystack library to explore RAG components
- Build a vector store using open source text documents
- Build a RAG pipeline with an in-memory document store
- Explore OpenSearch and hybrid retrieval


### Prerequisites
- Ollama running locally (`http://localhost:11434`) with a chat model (e.g., `llama3.2`) and an embedding model (e.g., `nomic-embed-text`).
- Python 3.11 (recommended)
- Install the packages listed in `requirements.txt`

```bash
uv python install 3.11
uv venv --python 3.11
uv pip install -r requirements.txt
```



### Import all the packages we will need

In [0]:

from haystack import Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder

from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever, InMemoryBM25Retriever
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.components.joiners import DocumentJoiner

from haystack_integrations.components.generators.ollama import OllamaGenerator
from haystack_integrations.components.embedders.ollama import OllamaDocumentEmbedder, OllamaTextEmbedder

print("Haystack & integrations imported.")


In [None]:
EMBED_MODEL = "nomic-embed-text"
OLLAMA_MODEL = "llama3.2"
OLLAMA_ENDPOINT = "http://localhost:11434"

## Part 1 — LLM‑as‑Judge (evaluate last week's outputs)
Paste any text you generated last week and score it against a rubric using the local model. This illustrates *prompt‑as‑program* patterns and **structured outputs**.
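
As a minimal sketch of the judge pattern described above, the snippet below builds a rubric prompt and parses a JSON reply. The rubric criteria, the `build_judge_prompt` and `parse_scores` helpers, and the 1 to 5 scale are illustrative assumptions rather than part of the session material. The model call is left out so the parsing logic can be checked on its own.

```python
import json
import re

# Hypothetical rubric: criterion name -> what the judge should look for.
RUBRIC = {
    "accuracy": "Claims are factually correct.",
    "clarity": "The text is easy to follow.",
    "completeness": "The text covers the question asked.",
}


def build_judge_prompt(text: str, rubric: dict[str, str]) -> str:
    """Ask for one integer score (1-5) per criterion, returned as JSON only."""
    criteria = "\n".join(f"- {name}: {desc}" for name, desc in rubric.items())
    keys = ", ".join(f'"{name}"' for name in rubric)
    return (
        "Score the text below against each criterion from 1 (poor) to 5 (excellent).\n"
        f"Criteria:\n{criteria}\n\n"
        f"Reply with a single JSON object with the keys {keys} and nothing else.\n\n"
        f"Text:\n{text}"
    )


def parse_scores(reply: str, rubric: dict[str, str]) -> dict[str, int]:
    """Pull the JSON object out of a model reply and validate every score."""
    # Greedy match from the first "{" to the last "}" tolerates chatty preambles.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in the reply")
    data = json.loads(match.group(0))
    scores = {}
    for name in rubric:
        value = int(data[name])
        if not 1 <= value <= 5:
            raise ValueError(f"Score for {name!r} out of range: {value}")
        scores[name] = value
    return scores


# A canned reply stands in for the model output here.
canned_reply = 'Sure! {"accuracy": 4, "clarity": 5, "completeness": 3}'
print(parse_scores(canned_reply, RUBRIC))
```

In the notebook, the string returned by `build_judge_prompt` would be sent as the user message, and the reply text passed to `parse_scores`. Validating the parsed scores is what makes the output "structured": a malformed reply fails loudly instead of silently producing a wrong score.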

In [4]:
from haystack_integrations.components.generators.ollama import OllamaChatGenerator
from haystack.dataclasses import ChatMessage

generator = OllamaChatGenerator(
    model=OLLAMA_MODEL,
    url=OLLAMA_ENDPOINT,
    generation_kwargs={"temperature": 0.7},
)

# Adjacent string literals are concatenated, so no stray indentation ends up in the prompt
system_message = (
    "You are a helpful, respectful and honest assistant. Always answer as "
    "helpfully as possible, while being safe. Your answers should not include any harmful, "
    "unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your "
    "responses are socially unbiased and positive in nature."
)
user_message = "What's Natural Language Processing?"
messages = [
    ChatMessage.from_system(system_message),
    ChatMessage.from_user(user_message),
]

output = generator.run(messages=messages)
print(output)

## Can you make the output print nicely?

{'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text='Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that deals with the interaction between computers and humans in natural language. It enables computers to understand, interpret, and generate human language, such as text or speech.\n\nNLP involves a range of techniques and algorithms that allow computers to:\n\n1. **Text Analysis**: Extract insights from unstructured text data, like sentiment analysis (e.g., determining whether a piece of text is positive or negative) or named entity recognition (identifying specific entities like names or locations).\n2. **Language Understanding**: Interpret the meaning of language, including syntax, semantics, and pragmatics.\n3. **Machine Translation**: Translate text from one language to another.\n4. **Speech Recognition**: Convert spoken words into written text.\n5. **Chatbots and Virtual Assistants**: Develop conversational 
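
One possible answer to the exercise above is to pull the text out of the first reply and wrap it for the terminal. In recent Haystack 2.x releases the reply text should be available as `output["replies"][0].text`, which is an assumption about your installed version. The `wrap_reply` helper below is hypothetical and works on any string.

```python
import textwrap


def wrap_reply(text: str, width: int = 88) -> str:
    """Wrap each line separately so list items and paragraph breaks survive."""
    wrapped_lines = []
    for line in text.splitlines():
        # textwrap.fill returns "" for blank lines, which keeps paragraph breaks.
        wrapped_lines.append(textwrap.fill(line, width=width))
    return "\n".join(wrapped_lines)


# In the notebook you would pass the reply text, e.g.
#   print(wrap_reply(output["replies"][0].text))
sample = (
    "Natural Language Processing (NLP) is a subfield of AI.\n\n"
    "1. **Text Analysis**: Extract insights from text."
)
print(wrap_reply(sample, width=40))
```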

## Part 2 — Build a Small Vector Database (Public Domain)
We’ll load short excerpts from public‑domain texts (Project Gutenberg) to keep the demo fast. You can replace, extend, or ingest your own local files later.

In [9]:

import requests
# you can swap out these URLs with any public text URLs you like
PUBLIC_URLS = {
    "alice": "https://www.gutenberg.org/ebooks/11.txt.utf-8",
    "pride": "https://www.gutenberg.org/ebooks/1342.txt.utf-8",
    "artofwar": "https://www.gutenberg.org/files/17405/17405-h/17405-h.htm",
}

RAW_DOCS = {}
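for name, url in PUBLIC_URLS.items():
    try:
        # Fetch the text from the URL; raise on HTTP errors so error pages are not stored as text
        response = requests.get(url, timeout=20)
        response.raise_for_status()
        RAW_DOCS[name] = response.text[:20000]  # first 20k chars per title for speed
    except requests.RequestException as e:
        print("Fetch failed for", name, e)
        RAW_DOCS[name] = ""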

print({k: len(v) for k,v in RAW_DOCS.items()})


{'alice': 20000, 'pride': 20000, 'artofwar': 20000}
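
The first 20,000 characters of a Project Gutenberg plain-text file include the license header, so some chunks will hold boilerplate rather than story text. Below is a rough sketch for trimming it, assuming the file carries the usual `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` and `*** END OF ...` marker lines. Older files word these slightly differently, and the HTML edition will not match at all. `strip_gutenberg_boilerplate` is a hypothetical helper, not part of Haystack.

```python
import re

# Marker lines look like "*** START OF THE PROJECT GUTENBERG EBOOK <TITLE> ***".
_START = re.compile(
    r"\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*",
    re.IGNORECASE | re.DOTALL,
)
_END = re.compile(r"\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK", re.IGNORECASE)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Return the text between the START and END markers, or the input unchanged."""
    start = _START.search(text)
    if start is None:
        return text  # no marker found: leave the text as-is
    body = text[start.end():]
    end = _END.search(body)
    if end is not None:
        body = body[: end.start()]
    return body.strip()


sample = (
    "License header\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK ALICE ***\n"
    "CHAPTER I.\nDown the Rabbit-Hole\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK ALICE ***\n"
    "More license text"
)
print(strip_gutenberg_boilerplate(sample))
```

To use it in the cell above, apply it to the downloaded text before slicing to 20,000 characters, so the character budget is spent on the book itself.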


In [11]:
from haystack import Document
# Create a list of Haystack Documents, skipping any title whose download failed
DOCS = [
    Document(content=text, meta={"source": name})
    for name, text in RAW_DOCS.items()
    if text
]

print(DOCS[0])


Document(id=338b7a2313cf2dc652f635b7b78377cc7882cb441f3a53f00f76496be1f313c6, content: '﻿The Project Gutenberg eBook of Alice's Adventures in Wonderland
    
This ebook is for the use of...', meta: {'source': 'alice'})


## Part 3 — Component Demos (In‑Memory Store)
We’ll demonstrate **Ollama embeddings** + **InMemoryEmbeddingRetriever** (dense), and **InMemoryBM25Retriever** (sparse). Then we combine them with a **DocumentJoiner** and optional **ranker** for a simple hybrid.
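
Dense retrieval ranks documents by how close their embedding vectors are to the query embedding. The toy sketch below uses cosine similarity on made-up 3-dimensional vectors to show the idea. The similarity function a document store actually applies depends on its configuration, so treat this as an illustration, not as Haystack's implementation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; values near 1.0 mean same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors
    return dot / (norm_a * norm_b)


# Toy "embeddings" with made-up numbers; real ones have hundreds of dimensions.
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "doc_a": [1.0, 0.0, 1.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [1.0, 1.0, 0.0],
}
ranked = sorted(
    doc_vecs,
    key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
    reverse=True,
)
print(ranked)
```

Sparse retrieval such as BM25 instead scores documents by term overlap with the query, which is why the two approaches can disagree on the same question.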

In [None]:
from haystack.components.preprocessors import DocumentSplitter
splitter = DocumentSplitter(split_by="word", split_length=300, split_overlap=50)
chunked_docs = splitter.run(documents=DOCS)
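
To make `split_length` and `split_overlap` concrete, here is a plain-Python sketch of a sliding word window. It illustrates the idea only and is not Haystack's implementation; `sliding_word_chunks` is a hypothetical helper.

```python
def sliding_word_chunks(text: str, length: int = 300, overlap: int = 50) -> list[str]:
    """Split text into windows of `length` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    words = text.split()
    step = length - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + length]))
        if start + length >= len(words):
            break  # the last window already reached the end of the text
    return chunks


words = " ".join(f"w{i}" for i in range(10))
for chunk in sliding_word_chunks(words, length=4, overlap=1):
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some words twice.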

In [None]:

from haystack_integrations.components.embedders.ollama import OllamaDocumentEmbedder

# Embed the chunked documents with Ollama
embedder = OllamaDocumentEmbedder(model=EMBED_MODEL, url=OLLAMA_ENDPOINT)

# splitter.run() returns a dict, so pass the list stored under "documents"
embedded_docs = embedder.run(documents=chunked_docs["documents"])


In [None]:
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Initialize In-Memory Document Store (Vector Database)
document_store = InMemoryDocumentStore()

# embedder.run() returns a dict; write the list of embedded documents, not the dict itself
document_store.write_documents(embedded_docs["documents"])

### Retrieval (Dense + Sparse)
For background, see the Haystack blog post on hybrid retrieval: https://haystack.deepset.ai/blog/hybrid-retrieval

In [0]:

# Two retrievers
emb_retriever = InMemoryEmbeddingRetriever(document_store=document_store, top_k=5)  # dense retriever
bm25_retriever = InMemoryBM25Retriever(document_store=document_store, top_k=5)  # sparse retriever

# Query examples
QUERIES = [
    "What is Retrieval-Augmented Generation?",
    "Who is Alice and what happens at the start of the story?",
    "What does Sun Tzu say about deception?",
]

# Create the query embedder once and reuse it for every query
text_embedder = OllamaTextEmbedder(model=EMBED_MODEL, url=OLLAMA_ENDPOINT)

for q in QUERIES:
    print("\nQuery:", q)
    q_emb = text_embedder.run(text=q)["embedding"]
    r1 = emb_retriever.run(query_embedding=q_emb)  # dense
    r2 = bm25_retriever.run(query=q)  # sparse
    print("Dense top sources:", [d.meta["source"] for d in r1["documents"]])
    print("Sparse top sources:", [d.meta["source"] for d in r2["documents"]])


In [0]:

# Simple hybrid: join both result lists with reciprocal rank fusion, then rerank with a cross-encoder
joiner = DocumentJoiner(join_mode="reciprocal_rank_fusion")
ranker = TransformersSimilarityRanker(model="cross-encoder/ms-marco-MiniLM-L-6-v2")

hybrid = Pipeline()
hybrid.add_component("q_embedder", OllamaTextEmbedder(model=EMBED_MODEL, url=OLLAMA_ENDPOINT))
hybrid.add_component("dense", emb_retriever)
hybrid.add_component("sparse", bm25_retriever)
hybrid.add_component("join", joiner)
hybrid.add_component("rerank", ranker)

hybrid.connect("q_embedder.embedding", "dense.query_embedding")
hybrid.connect("sparse", "join.documents")
hybrid.connect("dense", "join.documents")
hybrid.connect("join.documents", "rerank.documents")

for q in QUERIES:
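    # The BM25 retriever and the ranker both need the raw query text as well
    out = hybrid.run({"q_embedder": {"text": q}, "sparse": {"query": q}, "rerank": {"query": q}})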
    print("\nHybrid top sources for:", q)
    print([d.meta["source"] for d in out["rerank"]["documents"][:5]])
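
The `reciprocal_rank_fusion` join mode merges ranked lists by rank position rather than by raw score, which sidesteps the fact that BM25 and embedding scores live on different scales. Below is a minimal sketch of the textbook formula, where each document scores the sum of `1 / (k + rank)` over the lists it appears in. The value `k = 60` is the commonly cited default. Haystack's exact constant and score scaling may differ, so treat the numbers as illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document ids; rank is 1-based within each list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up document ids standing in for retriever results.
dense = ["alice_3", "alice_1", "pride_2"]
sparse = ["alice_1", "artofwar_4", "alice_3"]
for doc_id, score in reciprocal_rank_fusion([dense, sparse]):
    print(f"{doc_id}: {score:.4f}")
```

Documents that both retrievers rank highly float to the top, while a document found by only one retriever still keeps a place in the fused list.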


## Part 4 — RAG Pipeline (In‑Memory)
We wire **retrieval → prompt construction → generation** using **PromptBuilder** and **OllamaGenerator**.

In [0]:

prompt_tmpl = """
You are a precise assistant. Answer using ONLY the provided context.
Cite the source names in brackets.

Context:
{% for d in documents %}- [{{ d.meta.source }}] {{ d.content[:300] }}
{% endfor %}

Question: {{ query }}
Answer in 3-5 sentences.
"""

rag = Pipeline()
rag.add_component("q_embedder", OllamaTextEmbedder(model=EMBED_MODEL, url=OLLAMA_ENDPOINT))
rag.add_component("dense", InMemoryEmbeddingRetriever(document_store=document_store, top_k=5))
rag.add_component("prompt", PromptBuilder(template=prompt_tmpl))
rag.add_component("llm", OllamaGenerator(model=OLLAMA_MODEL, url=OLLAMA_ENDPOINT))

rag.connect("q_embedder.embedding", "dense.query_embedding")
rag.connect("dense.documents", "prompt.documents")
rag.connect("prompt.prompt", "llm.prompt")

question = "Summarise Sun Tzu's stance on deception."
answer = rag.run({"q_embedder": {"text": question}, "prompt": {"query": question}})
print(answer["llm"]["replies"][0])
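
To see what the prompt-construction step hands to the model, the sketch below assembles the same kind of prompt as the template above in plain Python. PromptBuilder itself renders a Jinja2 template; the `build_prompt` helper and the dictionary-shaped documents here are stand-ins for illustration only.

```python
def build_prompt(query: str, documents: list[dict], max_chars: int = 300) -> str:
    """Mimic the template: one bullet per document, tagged with its source name."""
    context_lines = [
        f"- [{doc['meta']['source']}] {doc['content'][:max_chars]}" for doc in documents
    ]
    return (
        "You are a precise assistant. Answer using ONLY the provided context.\n"
        "Cite the source names in brackets.\n\n"
        "Context:\n" + "\n".join(context_lines) + "\n\n"
        f"Question: {query}\n"
        "Answer in 3-5 sentences."
    )


# Stand-in documents; in the pipeline these come from the retriever.
docs = [
    {"content": "All warfare is based on deception.", "meta": {"source": "artofwar"}},
    {"content": "Alice was beginning to get very tired.", "meta": {"source": "alice"}},
]
print(build_prompt("Summarise Sun Tzu's stance on deception.", docs))
```

Printing the rendered prompt like this is a quick way to check whether a poor answer comes from retrieval (wrong context) or from generation (right context, wrong answer).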


## Part 5 — Vector DBs & OpenSearch (Docker) — Hybrid Retrieval
We’ll now use **OpenSearch** as the document store, then compare **BM25**, **dense embeddings**, and the **OpenSearchHybridRetriever**.

> Quickstart (local dev):
```bash
# Single node, security disabled for local testing (see official docs for options)
docker run -d --name opensearch -p 9200:9200 -p 9600:9600 \
  -e "discovery.type=single-node" -e "DISABLE_SECURITY_PLUGIN=true" \
  opensearchproject/opensearch:latest
```
OpenSearch Haystack integration: `pip install opensearch-haystack`.


In [0]:

from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
from haystack_integrations.components.retrievers.opensearch import (
    OpenSearchBM25Retriever,
    OpenSearchEmbeddingRetriever,
    OpenSearchHybridRetriever,
)

# Adjust embedding_dim to your embedding model; nomic-embed-text -> 768
OPENSEARCH = {
    "hosts": ["http://localhost:9200"],
    "index": "demo_docs",
    "embedding_dim": 768,
}

doc_store = OpenSearchDocumentStore(**OPENSEARCH)

# Embed with Ollama and write
op_embedder = OllamaDocumentEmbedder(model=EMBED_MODEL, url=OLLAMA_ENDPOINT)
docs_emb = op_embedder.run(DOCS)
doc_store.write_documents(docs_emb["documents"])  # index
print("OpenSearch indexed docs.")


In [0]:

# Three retrievers
os_bm25 = OpenSearchBM25Retriever(document_store=doc_store, top_k=5)
os_emb  = OpenSearchEmbeddingRetriever(document_store=doc_store, top_k=5)
# Hybrid retriever combines both under the hood
os_hybrid = OpenSearchHybridRetriever(document_store=doc_store,
                                     embedder=OllamaTextEmbedder(model=EMBED_MODEL, url=OLLAMA_ENDPOINT),
                                     top_k=5)

query = "What is Retrieval-Augmented Generation?"
query_emb = OllamaTextEmbedder(model=EMBED_MODEL, url=OLLAMA_ENDPOINT).run(text=query)["embedding"]

print("BM25:")
print([d.meta.get("source") for d in os_bm25.run(query=query)["documents"]])
print("Embedding:")
# Index the result dict with ["documents"] before iterating, not after building the list
print([d.meta.get("source") for d in os_emb.run(query_embedding=query_emb)["documents"]])
print("Hybrid:")
print([d.meta.get("source") for d in os_hybrid.run(query=query)["documents"]])
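
When comparing retrievers it helps to quantify how much their result lists agree. The sketch below computes the Jaccard overlap of two lists of document ids. `jaccard_overlap` is a hypothetical helper; in the notebook you could feed it `[d.id for d in result["documents"]]` from any two retrievers.

```python
def jaccard_overlap(ids_a: list[str], ids_b: list[str]) -> float:
    """Share of distinct ids that appear in both lists (1.0 = identical sets)."""
    set_a, set_b = set(ids_a), set(ids_b)
    if not set_a and not set_b:
        return 1.0  # two empty result lists agree trivially
    return len(set_a & set_b) / len(set_a | set_b)


# Made-up ids standing in for retriever results.
bm25_ids = ["d1", "d2", "d3", "d4"]
dense_ids = ["d3", "d4", "d5", "d6"]
print(jaccard_overlap(bm25_ids, dense_ids))
```

A low overlap between BM25 and embedding results suggests the two retrievers are surfacing different evidence, which is the situation where a hybrid retriever has the most to gain.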
