# Retrieval Augmented Generation (RAG)

We can think of large language models (LLMs) as people who have read every book in a library up until a certain date. 📚 They've remembered the content with varying degrees of accuracy, but their knowledge base consists of everything that was in that library.

In the case of LLMs, the "library" is actually a massive collection of internet documents—heavily filtered for quality—along with some curated private datasets, such as newspaper archives and academic papers.

If we need the model to understand or use information that was not (or only barely) included in its training data, we have to provide that information in the prompt. However, there is a limit to how much information we can give at once. This limit is called the context window: the maximum number of tokens (e.g., 8,192 tokens) that the model can process in a single request.
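To get a feeling for this limit, here is a minimal sketch that estimates whether a prompt fits into a context window. The helper names `estimate_tokens` and `fits_in_context` are made up for this illustration, and the rule of thumb of about 4 characters per token is only a rough assumption for English text; the model's real tokenizer will give different counts.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate (assumption: ~4 characters per token)."""
    return max(1, round(len(text) / chars_per_token))


def fits_in_context(prompt: str, context_window: int, reserved_for_answer: int = 512) -> bool:
    """Check whether the prompt still leaves room for the answer."""
    return estimate_tokens(prompt) + reserved_for_answer <= context_window


short_prompt = "Suggest a Mother's Day gift."
long_prompt = "some background text " * 5000  # 105,000 characters

print(fits_in_context(short_prompt, context_window=8192))
print(fits_in_context(long_prompt, context_window=8192))
```

The long prompt is estimated at roughly 26,000 tokens, so it would not fit into an 8,192-token window and we would have to select which parts to send.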

## Augmented

In [5]:
from ollama import chat
from IPython.display import Markdown, display

chat_response = chat(model='gemma3:1b', messages=[
    {
    'role': 'user',
    'content': "Hey, I am looking for a mother's day gift, please suggest something in a price range from 20 to 50 Euros.",
    },
])

display(Markdown(chat_response.message.content))

Okay, let's brainstorm some Mother's Day gifts in the 20-50 Euro range! Here’s a breakdown of suggestions, categorized by interest, to help you find the perfect one:

**1. Cozy & Relaxing (Around 20-30 Euros):**

* **Luxury Bath Bomb or Bubble Bath Set:** A beautifully scented bath bomb or a set of lovely bubble bath is always appreciated. (Think Lavender, Citrus, or Rose scents).
* **Scented Candle:** A small, nicely scented candle (soy or beeswax candles are a good choice) is a classic and comforting gift.
* **Cozy Socks & Blanket:** A pair of plush, warm socks with a pretty design or a small, soft blanket.

**2. Food & Drink Focused (Around 20-40 Euros):**

* **Gourmet Tea or Coffee Gift Set:** A selection of premium teas or coffee beans and a nice mug makes a great treat.
* **Small Jar of Gourmet Jam or Honey:**  A delightful spread for toast or scones.
* **Chocolate Box:** A beautifully packaged box of artisanal chocolates - a splurge, but a thoughtful choice.
* **Specialty Biscotti:**  A box of delicious biscotti is a comforting and sophisticated treat.


**3.  Personalized & Thoughtful (Around 30-45 Euros):**

* **Photo Calendar:** A calendar featuring family photos – a lovely and sentimental gift. (You can find really affordable ones!)
* **Personalized Mug:**  A mug with a photo, a short message, or a cute design.
* **Small Succulent or Plant:**  A little greenery adds a touch of life to any space. (Consider a low-maintenance variety.)
* **Customized Stationery:**  If she loves to write, a set of personalized notecards or a notebook with her name is sweet.


**4.  Experiences (Worth considering if you have a budget):**

* **Coffee or Tea Tasting:** A voucher for a local coffee or tea shop to enjoy.
* **Small Spa Treatment Gift Voucher:**  A voucher for a manicure/pedicure or a facial. (Could be a little higher, but still within the range.)


**To help me narrow it down further and give you more specific recommendations, could you tell me:**

*   **What are her interests?** (e.g., gardening, cooking, reading, fashion, travel, etc.)
*   **What's her personality like?** (e.g., practical, sentimental, luxurious, humorous?)
*   **What’s your preferred style of gift?** (e.g., small and thoughtful, bigger and more extravagant?)

---

Now let's augment the prompt with extra instructions: a slightly more generous price range of 40 to 80 euros, and no flowers.

In [11]:
extra_content = "Set a price range of 40 to 80 Euros. Do not choose flowers as a gift."

chat_response = chat(model='gemma3:1b', messages=[
    {
    'role': 'user',
    'content': "Hey, I am looking for some mother's day gifts, please recommend some. " + extra_content,
    },
])

display(Markdown(chat_response.message.content))

Okay, let's brainstorm some Mother’s Day gifts for around €40-€80! Here's a breakdown of ideas, categorized by vibe, with estimated price ranges:

**1. Practical & Luxurious (Around €40-€60)**

* **High-Quality Tea & Gourmet Treats Basket:** A beautifully packaged selection of artisan teas (like Earl Grey, herbal blends), a nice box of chocolates or macarons, and a small, luxurious item like a scented candle or a lovely hand cream. (€40-€60)
* **Cozy Throw Blanket:** A super soft and stylish throw blanket is always appreciated. Look for something in a neutral color or with a subtle pattern. (€40-€60)
* **Portable Espresso Maker:** If your mom enjoys coffee, a small, portable espresso maker is a fantastic and practical gift. (€40-€60)
* **Leather Keychain with a Meaningful Initials:** A beautifully crafted leather keychain with her initials or a short, meaningful message is a thoughtful and durable gift. (€20-€40)


**2. Relaxation & Self-Care (Around €40-€70)**

* **Bath Bombs & Essential Oil Set:** A luxurious set of bath bombs with calming scents, along with a few essential oils (lavender, eucalyptus, etc.) – creates a spa-like experience. (€25-€45)
* **Massage Oil & a Luxurious Body Lotion:** A nice massage oil (particularly a moisturizing one) combined with a high-quality body lotion is a great way to encourage relaxation. (€30-€50)
* **Luxurious Face Mask Set:** A collection of different face masks caters to different skin types, offering a pampering experience. (€30-€50) 
* **Adult Coloring Book & Nice Colored Pencils:**  A relaxing activity can be wonderful, especially with a beautiful coloring book and some quality colored pencils. (€20-€40)


**3. Hobby-Related (Around €40-€70)**

* **Knitting/Crochet Supplies:** If she enjoys crafting, a new stitch, a small yarn, or a helpful pattern can be a welcome gift. (€20-€40)
* **Cooking/Baking Kit:** If she loves to cook, a mini kit with some new spices, or a unique baking tool could be very appreciated. (€30-€50) 
* **Puzzle Book (Her Favorite Genre):**  A challenging puzzle book related to her interests – gardening, history, or mysteries – can provide hours of entertainment. (€20-€40)


**4.  Personalized/Sentimental (Around €40-€80)**

* **Custom Photo Album or Calendar:** Fill it with family photos - this is a heartfelt and lasting gift. (€30-€60)
* **Personalized Mug with a Photo:**  A mug featuring a cherished photo creates a constant reminder of loved ones. (€20-€40)
* **Handwritten Letter:** Sometimes, a heartfelt, handwritten letter expressing your appreciation is the most meaningful gift. (€0 - €20, depending on how elaborate you want it)



**To help me narrow it down even further and give you even *better* recommendations, could you tell me:**

*   **What are her main interests/hobbies?** (e.g., gardening, reading, cooking, crafting, travel, etc.)
*   **What's her personal style like?** (e.g., minimalist, cozy, colorful, classic, bohemian?)
*   **What kind of gifts does she *usually* appreciate?** (e.g., thoughtful, practical, fun, luxurious?)

---

The notebook is named **RAG (Retrieval-Augmented Generation)**. In this context, "augmentation" refers to enriching the generation process with additional, task-specific information. However, when the volume of available data is too large to insert in full, we must **retrieve** only the most relevant parts to use as context.
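The augmentation step can be sketched as a small helper that combines the user question with extra context into one prompt. The function name `build_augmented_prompt` and the exact prompt wording are assumptions for illustration; any clear template that separates question and context should work similarly.

```python
def build_augmented_prompt(question: str, context_snippets: list[str]) -> str:
    """Combine the user question and the extra context into a single prompt."""
    context = "\n\n".join(context_snippets)
    return (
        f"User Question:\n{question}\n\n"
        "Answer the user question as best as you can based on the context below.\n\n"
        f"Context:\n\n{context}"
    )


prompt = build_augmented_prompt(
    "Please recommend some mother's day gifts.",
    ["Set a price range of 40 to 80 Euros.", "Do not choose flowers as a gift."],
)
print(prompt)
```

The resulting string can then be sent as the `content` of a user message, exactly like the hand-written prompts above.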

**Note**: A context window of, for example, 128,000 tokens does not mean that we should fill it completely. More tokens lead to more latency (i.e. it takes longer for the model to serve the user), and there can also be information overload: the information fits into the context window, but there is so much of it that the model may overlook important details. In the end, this can only be found out by testing, but personally I think in terms of information density and complexity. The German "Bild Zeitung", for example, has a very low information density and complexity, while a medical textbook has a very high one. Hence, doing Q&A over 30,000 tokens of Bild articles is definitely not the same as doing Q&A over 30,000 tokens of medical textbook pages.

## Retrieval

To actually retrieve something, we need data. For now, let us consider this simplified database of ten entries (documents) about talks at an upcoming festival. We want to build a small RAG chatbot that lets festival visitors ask which talks best fit their specific interests.

**Note**: The dataset is so small that I would personally just always put it in the context completely, but for demonstration purposes let's treat each talk as a separate document. Read through the talks below to get a grasp of the data:

##### 1. 🌿 **The Next Nature: Designing with the Future in Mind**  
*Explore how biomimicry, regenerative design, and synthetic biology are reshaping architecture, materials, and cities. Where does design end and nature begin?*

---

##### 2. 🧠 **Neural Frontiers: The Brain-Computer Interface Revolution**  
*From thought-controlled devices to memory enhancement, dive into the fast-evolving world of brain tech — and the ethical mazes it brings.*

---

##### 3. 🎨🤖 **AI & the Soul of Art: Who Really Owns Creativity?**  
*Artists, coders, and philosophers debate the rise of generative art. Can machines make meaning? And where does human intuition still reign supreme?*

---

##### 4. 💼🚫 **Post-Work: Imagining a Life Beyond Jobs**  
*As automation reshapes labor, what comes next? UBI, digital nomadism, reputation economies — a candid discussion about freedom, purpose, and survival.*

---

##### 5. 🪐 **Planet B: Terraforming Ideas for Earth 2.0**  
*Science fiction meets climate urgency. This talk blends real research with wild speculation — from Mars domes to floating cities in the clouds of Venus.*

---

##### 6. 🕶️🌐 **Reality is Optional: The Rise of Immersive Worlds**  
*VR, AR, XR — and whatever’s next. What happens when our digital spaces feel more real than reality itself? And who gets to write the rules?*

---

##### 7. 🧬⏳ **The Ethics of Immortality: Living Forever in a Mortal World**  
*Cryonics, gene editing, mind uploading — tech is chasing eternal life. But what would it mean for love, loss, and the human condition?*

---

##### 8. 💻⚖️ **Code as Culture: Programming the Future We Want**  
*Code is not neutral — it shapes societies. This talk explores how developers are becoming the new lawmakers, and how we hold them accountable.*

---

##### 9. 🕸️🛠️ **The Wild Web: Reclaiming the Internet from Algorithms**  
*Can we rebuild the web for people, not profit? Meet the rebels, hackers, and dreamers creating decentralized, community-first digital spaces.*

---

##### 10. 🕰️🚀 **Time Travelers Welcome: Building the Long Now**  
*Futurists, historians, and deep-time thinkers gather to explore projects that think in centuries — from 10,000-year clocks to interstellar archives.*


---

Okay, we see that, depending on your interests, different talks will be the most interesting ones.

First, we need to prepare the talks:

In [13]:
festival_talks = [
    "🌿 The Next Nature: Designing with the Future in Mind — Explore how biomimicry, regenerative design, and synthetic biology are reshaping architecture, materials, and cities. Where does design end and nature begin?",
    "🧠 Neural Frontiers: The Brain-Computer Interface Revolution — From thought-controlled devices to memory enhancement, dive into the fast-evolving world of brain tech — and the ethical mazes it brings.",
    "🎨🤖 AI & the Soul of Art: Who Really Owns Creativity? — Artists, coders, and philosophers debate the rise of generative art. Can machines make meaning? And where does human intuition still reign supreme?",
    "💼🚫 Post-Work: Imagining a Life Beyond Jobs — As automation reshapes labor, what comes next? UBI, digital nomadism, reputation economies — a candid discussion about freedom, purpose, and survival.",
    "🪐 Planet B: Terraforming Ideas for Earth 2.0 — Science fiction meets climate urgency. This talk blends real research with wild speculation — from Mars domes to floating cities in the clouds of Venus.",
    "🕶️🌐 Reality is Optional: The Rise of Immersive Worlds — VR, AR, XR — and whatever’s next. What happens when our digital spaces feel more real than reality itself? And who gets to write the rules?",
    "🧬⏳ The Ethics of Immortality: Living Forever in a Mortal World — Cryonics, gene editing, mind uploading — tech is chasing eternal life. But what would it mean for love, loss, and the human condition?",
    "💻⚖️ Code as Culture: Programming the Future We Want — Code is not neutral — it shapes societies. This talk explores how developers are becoming the new lawmakers, and how we hold them accountable.",
    "🕸️🛠️ The Wild Web: Reclaiming the Internet from Algorithms — Can we rebuild the web for people, not profit? Meet the rebels, hackers, and dreamers creating decentralized, community-first digital spaces.",
    "🕰️🚀 Time Travelers Welcome: Building the Long Now — Futurists, historians, and deep-time thinkers gather to explore projects that think in centuries — from 10,000-year clocks to interstellar archives."
]


For the retrieval part, we need a way to search the talks based on the question. The standard way used to be keyword search, but with the rise of AI the default has shifted to vector search. For that we need another type of model: an embedding model. An embedding model takes a text and transforms it into a vector of a fixed size (e.g. 1 × 1024); every text is mapped to the same vector size. Embedding models are trained so that similar texts are mapped close to each other in the vector space (so LLMs aren't the only cool models). Luckily, Ollama and other major providers such as OpenAI and Hugging Face also offer embedding models. We will now use nomic-embed-text, a popular embedding model available through Ollama (luckily, embedding models are usually really small, < 1 GB).
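To see why keyword search alone can fall short, here is a deliberately simple word-overlap score. This is only a toy baseline written for this illustration (the helper `keyword_score` is hypothetical); real keyword search engines use more refined scoring, but they share the basic limitation that the words have to match literally.

```python
import re


def keyword_score(query: str, document: str) -> int:
    """Count how many distinct query words also appear in the document."""
    query_words = set(re.findall(r"\w+", query.lower()))
    document_words = set(re.findall(r"\w+", document.lower()))
    return len(query_words & document_words)


talk = "AI & the Soul of Art: Who Really Owns Creativity?"

# Both query words occur literally in the talk title.
print(keyword_score("AI art", talk))

# Same topic, different wording: no literal overlap at all.
print(keyword_score("artificial intelligence paintings", talk))
```

The second query is about the same topic but scores zero, which is exactly the gap that embeddings are meant to close by comparing meaning instead of literal words.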

Run **ollama pull nomic-embed-text** in your terminal

Ollama is able to serve the LLM and the embedding model at the same time (given enough (V)RAM)

In [16]:
from ollama import embed

# Generate an embedding for a single input
response = embed(model='nomic-embed-text', input='opencampus is the best!')

# Access the embedding vector
embedding = response['embeddings']
print(embedding)
print(len(embedding[0]))

[[-0.041261844, 0.06610261, -0.17822084, -0.016629975, 0.021285325, -0.026365353, -0.011967031, 0.017026164, -0.0102254655, -0.031980272, 0.050730683, -0.019034838, 0.045592844, -0.008341599, 0.012121793, 0.020955326, -0.040141687, -0.062111855, -0.030746346, -0.0075697703, -0.016780589, -0.02938528, -0.0114756655, 0.03243297, 0.089692056, 0.053597085, -0.027188728, 0.019495934, -0.0077963886, 0.058877368, 0.07543706, -0.028498396, 0.021128139, 0.010190087, 0.04324336, 0.033847433, -0.005320718, 0.011530684, -0.005175544, 0.001702866, 0.041846916, 0.009595384, 0.031900715, -0.005481884, 0.048807073, -0.07657718, 0.020511717, 0.04466181, 0.050690647, -0.071095616, -0.020409374, -0.028782988, -0.02661789, 0.008561729, 0.06537089, -0.019210912, -0.018011179, -0.011410761, 0.007950822, 0.016235167, 0.08522264, 0.030174375, 0.0014069363, 0.06269761, 0.019721037, 0.05231696, -0.012256007, 0.07123785, -0.008217562, -0.012078117, 0.057234813, 0.0058020097, -0.0028707415, 0.012909217, -0.019607

As we can see, the text is converted into a vector of size 1 × 768.

In [18]:
# Generate an embedding for a batch input
response = embed(model='nomic-embed-text', input=festival_talks)

In [20]:
# Access the embedding vectors
embedding = response['embeddings']

# Number of vectors, length of the first vector, length of the last vector
print(len(embedding), len(embedding[0]), len(embedding[-1]))

10 768 768


So we now have a list of 10 vectors, each with 768 entries.

In [23]:
pip install numpy

Note: you may need to restart the kernel to use updated packages.


In [24]:
import numpy as np

vectors = np.array(embedding)

query = np.array(embed(model='nomic-embed-text', input='I am interested in AI talks')['embeddings'])

# Compute Euclidean distances
differences = vectors - query  # shape: (n_vectors, vector_dim)
distances = np.linalg.norm(differences, axis=1)

# Index of closest vector
closest_index = np.argmin(distances)

# Sorted indices by distance
sorted_indices = np.argsort(distances)

print("Closest vector index:", closest_index)
print("Distance:", distances[closest_index])
print("Sorted indices by distance:", sorted_indices)


Closest vector index: 2
Distance: 0.8336664842668186
Sorted indices by distance: [2 1 4 3 7 5 9 8 0 6]
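Besides Euclidean distance, cosine similarity is a common choice for comparing embeddings. The sketch below uses random stand-in vectors (not real embeddings) to show that for vectors scaled to unit length both measures produce the same ranking, because the squared distance equals 2 - 2 * cosine. Whether a given embedding model already returns unit-length vectors is an assumption you would need to check for your model.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in data: 10 random "document" vectors and one "query" vector.
documents = rng.normal(size=(10, 8))
query = rng.normal(size=8)

# Scale every vector to unit length.
documents = documents / np.linalg.norm(documents, axis=1, keepdims=True)
query = query / np.linalg.norm(query)

euclidean = np.linalg.norm(documents - query, axis=1)
cosine = documents @ query  # dot product equals cosine similarity for unit vectors

# Smallest distance first vs. largest similarity first.
order_by_distance = np.argsort(euclidean)
order_by_cosine = np.argsort(-cosine)

print(order_by_distance)
print(order_by_cosine)
```

If the vectors are not normalized, the two rankings can differ, and cosine similarity then ignores vector length while Euclidean distance does not.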


In [27]:
festival_talks[closest_index]

'🎨🤖 AI & the Soul of Art: Who Really Owns Creativity? — Artists, coders, and philosophers debate the rise of generative art. Can machines make meaning? And where does human intuition still reign supreme?'

In [29]:
# Sort festival_talks according to the sorted_indices
sorted_festival_talks = [festival_talks[i] for i in sorted_indices]

# Optionally print it
for num, talk in enumerate(sorted_festival_talks):
    print(num+1, talk)
    print()

1 🎨🤖 AI & the Soul of Art: Who Really Owns Creativity? — Artists, coders, and philosophers debate the rise of generative art. Can machines make meaning? And where does human intuition still reign supreme?

2 🧠 Neural Frontiers: The Brain-Computer Interface Revolution — From thought-controlled devices to memory enhancement, dive into the fast-evolving world of brain tech — and the ethical mazes it brings.

3 🪐 Planet B: Terraforming Ideas for Earth 2.0 — Science fiction meets climate urgency. This talk blends real research with wild speculation — from Mars domes to floating cities in the clouds of Venus.

4 💼🚫 Post-Work: Imagining a Life Beyond Jobs — As automation reshapes labor, what comes next? UBI, digital nomadism, reputation economies — a candid discussion about freedom, purpose, and survival.

5 💻⚖️ Code as Culture: Programming the Future We Want — Code is not neutral — it shapes societies. This talk explores how developers are becoming the new lawmakers, and how we hold them acc

For real retrieval, we have to set a retrieval count, i.e. how many documents we want to retrieve, e.g. **k=3**.

In [32]:
retrieved_festival_talks = sorted_festival_talks[:3]
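If we want to reuse this step, the top-k selection can be wrapped in a small helper. This is just a sketch; the name `top_k_indices` is made up here, and the example distances are invented numbers rather than real embedding distances.

```python
import numpy as np


def top_k_indices(distances, k: int = 3) -> list[int]:
    """Return the indices of the k smallest distances, closest first."""
    distances = np.asarray(distances)
    k = min(k, len(distances))  # never ask for more documents than we have
    return np.argsort(distances)[:k].tolist()


example_distances = [0.91, 0.40, 0.75, 0.62]
print(top_k_indices(example_distances, k=3))
```

The choice of k is a trade-off: a larger k makes it less likely that we miss a relevant document, but it also puts more tokens into the prompt.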

Let's now make our final call:

In [36]:
retrieved_content = '\n\n'.join(retrieved_festival_talks)

chat_response = chat(model='gemma3:1b', messages=[
    {
    'role': 'user',
    'content': 'User Question:\nI am interested in AI talks\n\n' + 'Answer the user question on festival talks as best as you can based on the retrieved festival talks\n\nRetrieved content:\n\n' + retrieved_content,
    },
])

display(Markdown(chat_response.message.content))

Okay, based on the retrieved content, here’s a breakdown of potential AI talks at a festival, categorized and with a focus on what might be engaging:

**1. Focus on AI & the Soul of Art:**

*   **“🎨🤖 AI & the Soul of Art: Who Really Owns Creativity?”** – This is a strong candidate. It directly addresses a significant and evolving topic: the role of AI in creative expression.  It probes the question of meaning and ownership in art – a really compelling conversation.

**2. Focus on Brain Tech:**

*   **🧠 Neural Frontiers: The Brain-Computer Interface Revolution** – This talk is highly likely to be relevant.  It touches on the core themes of AI and the brain, which is ripe for discussion and potentially offers fascinating insights.

**3. Focus on Climate Change & Earth Transformation:**

*   **🪐 Planet B: Terraforming Ideas for Earth 2.0** –  This talk, while more speculative, could be captivating. The combination of real science and imaginative world-building makes it potentially highly engaging for a festival audience.


---

**To help me refine this further and suggest *more* relevant talks, could you tell me:**

*   **What kind of festival is it?** (e.g., Tech, Science, Arts, etc.?)
*   **What is the overall tone of the festival?** (e.g., Serious and academic, lighthearted and exploratory?)

## Conclusion

Okay, we saw that the whole idea of retrieval-augmented generation is quite simple: it relies on augmenting the LLM with extra context in the prompt. If there is too much content to just put everything in the prompt, the context can be retrieved first. And with that, we already have a nice RAG chatbot coded by hand!
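To wrap up, the steps of this notebook can be put together in one function. This is a minimal sketch, not a production setup: `rag_answer`, `toy_embed`, and `echo_generate` are hypothetical names, and the embedding and generation steps are passed in as plain functions so that the example runs without a model server. The toy embedding just counts three hand-picked keywords and stands in for a real embedding model.

```python
import numpy as np


def rag_answer(question, documents, embed_fn, generate_fn, k=3):
    """Embed, retrieve the k closest documents, and generate an answer."""
    doc_vectors = np.array(embed_fn(documents))
    query_vector = np.array(embed_fn([question]))[0]

    # Retrieval: Euclidean distance between the query and every document.
    distances = np.linalg.norm(doc_vectors - query_vector, axis=1)
    retrieved = [documents[i] for i in np.argsort(distances)[:k]]

    # Augmentation: put the retrieved documents into the prompt.
    prompt = (
        f"User Question:\n{question}\n\n"
        "Answer the user question as best as you can based on the retrieved content\n\n"
        "Retrieved content:\n\n" + "\n\n".join(retrieved)
    )

    # Generation: hand the augmented prompt to the language model.
    return generate_fn(prompt)


def toy_embed(texts):
    """Stand-in embedding (assumption): counts of three hand-picked keywords."""
    keywords = ["brain", "art", "planet"]
    return [[text.lower().count(word) for word in keywords] for text in texts]


def echo_generate(prompt):
    """Stand-in for the LLM call: simply returns the prompt it received."""
    return prompt


talks = ["Brain tech talk", "Generative art talk", "Planet terraforming talk"]
print(rag_answer("I like art", talks, toy_embed, echo_generate, k=1))
```

With Ollama, `embed_fn` could wrap the `embed(model='nomic-embed-text', input=texts)['embeddings']` call from above, and `generate_fn` could wrap the `chat(...)` call and return `chat_response.message.content`.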