# Limitations with Large Language Models

Large Language Models are objectively great. They are flexible, surprisingly cunning, and hold a considerable amount of knowledge by themselves. They do fall short in some cases, especially when it comes to adapting to new contextual information. Let's say you want an LLM to answer all the questions you may have about BeCode rules. What does ChatGPT know about BeCode rules? Was any of it in its training data? Probably not much.

## How could we have an LLM answer BeCode questions?

LLMs can have very long context windows. At the time of writing this notebook, some models accept close to 1 million tokens (a token is roughly three quarters of an English word), which is about two Game of Thrones books, and they can still answer questions about that text. We could give the model all of the BeCode rules as text in the prompt and have it answer questions based on them. But this comes with caveats, mainly cost: every token in the prompt is processed, and billed, on every call, so sending a large document each time is expensive in compute and money.
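To get a feel for the cost argument, here is a back-of-the-envelope sketch. It assumes the common rule of thumb of roughly four characters per token for English text (the real count depends on the model's tokenizer; the `google-genai` SDK exposes `client.models.count_tokens` for exact numbers). The helper `approx_tokens` and the stand-in texts are purely illustrative.

```python
def approx_tokens(text: str) -> int:
    """Rough token estimate, ASSUMING ~4 characters per token (rule of thumb, not a real tokenizer)."""
    return max(1, round(len(text) / 4))

# Illustrative stand-ins: one paragraph a question needs vs. a much longer document
paragraph = "Check-in, up to 10 minutes before the time. " * 5
document = paragraph * 200  # pretend the full rules are 200x longer than the useful paragraph

doc_tokens = approx_tokens(document)
par_tokens = approx_tokens(paragraph)
print(f"whole document: ~{doc_tokens} tokens per call")
print(f"one paragraph:  ~{par_tokens} tokens per call")
print(f"ratio: {doc_tokens / par_tokens:.0f}x more input to process and pay for")
```

Since the whole document is re-sent with every question, this ratio applies to every single call.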

Wouldn't it be better if we could give it only the parts of the document that are useful for the prompt at hand? The LLM doesn't need to know how Moodle works in order to explain when the bootcamp holidays happen. The BeCode rules document is provided in `data/becode_rules.txt`.
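One way to do this by hand is to cut the rules into their headed sections and pass only the matching one. The sketch below assumes, based on the prompt printed further down, that each section starts with a line beginning with `#` (e.g. `#📝Moodle`). `split_sections` and `find_section` are hypothetical helpers, and the keyword matching is deliberately naive.

```python
def split_sections(text):
    """Split rules text into {heading: body}, treating lines that start with '#' as headings.
    ASSUMES the format seen in the printed prompt (e.g. '#📝Moodle')."""
    sections = {}
    heading, body = "preamble", []
    for line in text.splitlines():
        if line.startswith("#"):
            if "".join(body).strip():  # store the previous section if it has content
                sections[heading] = "\n".join(body).strip()
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    if "".join(body).strip():  # store the last section
        sections[heading] = "\n".join(body).strip()
    return sections

def find_section(sections, keyword):
    """Return the first section whose heading or body contains `keyword` (case-insensitive), else None."""
    kw = keyword.lower()
    for heading, body in sections.items():
        if kw in heading.lower() or kw in body.lower():
            return f"#{heading}\n{body}"
    return None

# Tiny stand-in for data/becode_rules.txt
sample = """#📝Moodle
You have to go on moodle every day 4 times.

#💬Discord
Tech talks happen in the main-room channel.
"""
sections = split_sections(sample)
question = "Where do tech talks happen?"
snippet = find_section(sections, "tech talks")
prompt = f"Use the following snippet:\n{snippet}\n\nTo answer this question: {question}"
print(prompt)
```

With the real file you would call `split_sections(rules_path.read_text(encoding="utf-8"))` and search for a keyword such as `"holiday"`, assuming the relevant section actually uses that word.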

Make it so that the LLM can answer the following prompt by giving it only the paragraph from becode_rules that allows it to answer; insert this in the code snippet below:

In [1]:
from dotenv import load_dotenv
import os, re, textwrap, pathlib
from google import genai

# Load environment variables from .env file
load_dotenv()

# Get the API key from the environment
API_KEY = os.getenv("GEMINI_API_KEY")

# Check if the API key is loaded properly
if not API_KEY:
    raise Exception('GEMINI_API_KEY not found. Please set it in your .env file.')

# Use the API key
client = genai.Client(api_key=API_KEY)

question = 'when the holidays of the bootcamp happen?'
# Path to the BeCode rules file (relative to the notebook folder)
rules_path = pathlib.Path("data") / "becode_rules.txt"
if not rules_path.exists():
    # fallback if you're running from repo root:
    alt = pathlib.Path("03-TheMountain/DataScience/GenAIText/data/becode_rules.txt")
    rules_path = alt if alt.exists() else rules_path
if not rules_path.exists():
    raise FileNotFoundError(f"Cannot find becode_rules.txt at {rules_path.absolute()}")
context = rules_path.read_text(encoding="utf-8", errors="ignore")
prompt = f'Use the following snippet:\n {context}\n\n To answer this question: {question}'
print("Prompt:\n",prompt)

response = client.models.generate_content(
    model="gemini-2.0-flash-lite", contents=prompt
)
print("\n\nAnswer:\n", response.text)

Prompt:
 Use the following snippet:
 #📝Moodle

You have to go on moodle every day 4 times (ideally in these slots):
9:00, morning check-in
12:30, morning check-out
13:30, afternoon check-in
17:00, afternoon check-out

Check-in, up to 10 minutes before the time. For example, morning check in can happen from 8:50 to 9:00. 
Check out up to 10 minutes after the time. For example, morning check out can happen from 12:30 to 12:40.

If you forget to check-in or check-out, warn your coach. DO NOT check-in whenever you are not on campus or in a discord room. This will be considered unjustified absence.

#💬Discord

Discord is where we have our home working days happen.

When you are in discord, you have to be on a table. You can mute yourself, but you cannot deafen yourself because people have to be able to reach you.

Tech talks happen in the main-room channel. Download: https://discord.com/download

Do not use the @everyone tag because it will notify everyone on the server (approx 500 people).

## Word2Vec makes a comeback

You had to hand the answer to the model yourself before it could tell you what to do. Not very handy: you might as well search the document yourself. But what if a program could perform that search automatically?

In the previous notebooks, you may have read that we used to turn words into vectors that encode their meaning, and that words with similar meanings get similar vectors. What if you could do this with many words at once? What if you could do it with whole paragraphs? Wouldn't that be great?
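"Similar vectors" is usually measured with cosine similarity: the dot product of two vectors divided by the product of their lengths, cos θ = (a · b) / (‖a‖ ‖b‖). It ranges from -1 to 1 and ignores how long the vectors are, only their direction. A pure-Python sketch with made-up 3-dimensional "word vectors" (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_sim(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors invented for illustration only
king = [0.9, 0.8, 0.1]
queen = [0.85, 0.75, 0.2]
apple = [0.1, 0.2, 0.95]

print(round(cosine_sim(king, queen), 3))  # close to 1: similar direction
print(round(cosine_sim(king, apple), 3))  # much lower: different direction
```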

As it turns out, you can: you can make [embeddings for paragraphs](https://ai.google.dev/gemini-api/docs/embeddings). You can embed every paragraph of a document, then use the paragraphs whose vectors are most similar to your prompt's vector to augment it!
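Put together, the idea is: embed every paragraph once, embed the question, and keep the paragraph with the highest similarity. The sketch below runs offline, so it replaces the Gemini embedding model with a deliberately crude stand-in, `toy_embed`, which only counts shared words and captures no meaning; in the notebook you would pass a function that calls `client.models.embed_content` instead. All names here are hypothetical.

```python
import math
import re
from collections import Counter

def toy_embed(text, vocab):
    """STAND-IN for a real embedding model: word counts over a fixed vocabulary.
    It only captures shared words, not meaning."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    return [counts[w] for w in vocab]

def cosine_sim(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def best_paragraph(question, paragraphs, embed):
    """Embed the question and each paragraph; return the most similar paragraph and its score."""
    q_vec = embed(question)
    scores = [cosine_sim(q_vec, embed(p)) for p in paragraphs]
    best = max(range(len(paragraphs)), key=scores.__getitem__)
    return paragraphs[best], scores[best]

paragraphs = [
    "You have to go on moodle every day 4 times.",
    "If you cannot attend class, warn your coach and upload a justification to moodle.",
    "Tech talks happen in the main-room channel on discord.",
]
question = "Where do tech talks happen?"
vocab = sorted({w for t in paragraphs + [question] for w in re.findall(r"\w+", t.lower())})

def embed(text):
    return toy_embed(text, vocab)

best, score = best_paragraph(question, paragraphs, embed)
print(f"{score:.3f}  {best}")
```

Swapping `embed` for a real embedding model is what makes the match semantic rather than word-based.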

In [2]:
prompt_embed = client.models.embed_content(
        model="gemini-embedding-001",
        contents='when the holidays of the bootcamp happen?')

print(prompt_embed.embeddings)

[ContentEmbedding(
  values=[
    -5.8070676e-05,
    0.009062129,
    -0.010935352,
    -0.06167249,
    -0.033063993,
    <... 3067 more items ...>,
  ]
)]
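The embedding above is just a list of 3072 floats. Because cosine similarity ignores vector length, a common trick is to scale every vector to unit length once (L2-normalize it): the cosine similarity of two unit vectors is then simply their dot product, which makes comparing one question against many chunks cheap. A small pure-Python sketch with 2-dimensional toy vectors:

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

a = [3.0, 4.0]
b = [6.0, 8.0]   # same direction as a, twice as long
c = [4.0, -3.0]  # perpendicular to a

print(round(dot(l2_normalize(a), l2_normalize(b)), 6))  # 1.0
print(round(dot(l2_normalize(a), l2_normalize(c)), 6))  # 0.0
```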


Edit this:

In [5]:
import re

with open('data/becode_rules.txt', 'r', encoding="utf-8") as f:
    doc_content = f.read()

chunk_size = 500  # characters per chunk
chunks = []

# Normalize whitespace: collapse newlines and repeated spaces into single spaces
text = re.sub(r"\s+", " ", doc_content).strip()
# Split the text into consecutive, non-overlapping chunks of chunk_size characters
for i in range(0, len(text), chunk_size):
    chunks.append(text[i:i + chunk_size])

print(chunks)


['#📝Moodle You have to go on moodle every day 4 times (ideally in these slots): 9:00, morning check-in 12:30, morning check-out 13:30, afternoon check-in 17:00, afternoon check-out Check-in, up to 10 minutes before the time. For example, morning check in can happen from 8:50 to 9:00. Check out up to 10 minutes after the time. For example, morning check out can happen from 12:30 to 12:40. If you forget to check-in or check-out, warn your coach. DO NOT check-in whenever you are not on campus or in a', ' discord room. This will be considered unjustified absence. #💬Discord Discord is where we have our home working days happen. When you are in discord, you have to be on a table. You can mute yourself, but you cannot deafen yourself because people have to be able to reach you. Tech talks happen in the main-room channel. Download: https://discord.com/download Do not use the @everyone tag because it will notify everyone on the server (approx 500 people). If you want to tag everyone just tag yo

Run this:

In [7]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Prompt embedding
prompt_vector = np.array(prompt_embed.embeddings[0].values).reshape(1, -1)

# Chunk 3 and 4 embeddings
chunk_3_vector = np.array(
    client.models.embed_content(
        model="gemini-embedding-001",
        contents=chunks[3]
    ).embeddings[0].values
).reshape(1, -1)

chunk_4_vector = np.array(
    client.models.embed_content(
        model="gemini-embedding-001",
        contents=chunks[4]
    ).embeddings[0].values
).reshape(1, -1)

# Print the text
print("Chunk 3:")
print(chunks[3])

print("\nChunk 4:")
print(chunks[4])

# Compute cosine similarities
sim_3 = cosine_similarity(prompt_vector, chunk_3_vector)[0][0]
sim_4 = cosine_similarity(prompt_vector, chunk_4_vector)[0][0]

# Display results
print(f"\nSimilarity between the prompt and Chunk 3: {sim_3:.4f}")
print(f"Similarity between the prompt and Chunk 4: {sim_4:.4f}")

print("Is this giving you any ideas?")

Chunk 3:
Tech Talks You can make a tech talk about any tech topic you want, example subjects: - What is AI image generation - How does AI learn - Quiz on Python - What is web development - Data Analysis vs Data Engineering (Invite a friend who works in the field to explain his job) These are just silly examples. You can do whichever one you want as long as it is related to tech. The schedule can be found in this spreadsheet which you can edit: [Insert Tech Talk SpreadSheet Link Here] #🌡️You cannot attend

Chunk 4:
 class If you cannot attend class for any reason, usually when you are sick. There are 2 steps: Warn your day coach (main coach or co-coach) and campus coordinator by email. Example emails: main.coach@becode.org co.coach@becode.org campus.coordinator@becode.org You have to have a justification paper, upload it to moodle. If it is too late to upload it, send it to your campus coordinator and your main coach. #💼Internships Internships at BeCode Wallonia last between 1 and 3 mon

Take time to understand the example above. Then try to build a system that can answer any question by selecting the chunks of the document that best match the prompt. You may experiment with different chunk sizes, some overlap between chunks, and a minimum similarity threshold. Give it a try!
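A possible starting point for the chunk-size and overlap experiments: a sketch of a chunker that counts in words rather than characters, so no chunk ends in the middle of a word (the 500-character split above ends chunk 4 with `last between 1 and 3 mon`), and lets consecutive chunks share `overlap` words so text near a boundary appears in both neighbouring chunks. `chunk_words` and its defaults are illustrative choices, not tuned values.

```python
def chunk_words(text, chunk_size=80, overlap=20):
    """Split text into chunks of `chunk_size` words; each chunk shares `overlap` words
    with the previous one. Works on whole words, so nothing is cut mid-word."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    result = []
    for start in range(0, len(words), step):
        result.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this chunk already reaches the end
            break
    return result

demo = " ".join(f"w{i}" for i in range(10))
demo_chunks = chunk_words(demo, chunk_size=4, overlap=1)
print(demo_chunks)  # ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

More overlap means more chunks (and more embedding calls), so it is a trade-off between recall at boundaries and cost.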

In [8]:
#Your code here
from google import genai
from dotenv import load_dotenv
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import os, textwrap

# ----------------- Setup -----------------
load_dotenv()
API_KEY = os.getenv("GEMINI_API_KEY")
if not API_KEY:
    raise RuntimeError("Set GEMINI_API_KEY in your .env")

client = genai.Client(api_key=API_KEY)

EMBED_MODEL = "text-embedding-004"   # or "gemini-embedding-001"; question and chunks must use the same model
GEN_MODEL   = "gemini-2.0-flash-lite"

# ----------------- Load BeCode rules -----------------
# (Retrieval below runs on the `chunks` list built in the earlier cell)
with open('data/becode_rules.txt', 'r', encoding="utf-8") as f:
    doc_content = f.read()
question = "I am sick, I sent an email to my main coach and my campus coordinator, what else should I do?"

# ----------------- Helpers -----------------
def to_vec(resp) -> np.ndarray:
    """Normalize embedding response to a 1D float32 vector."""
    if hasattr(resp, "embedding") and hasattr(resp.embedding, "values"):
        vals = resp.embedding.values
    elif hasattr(resp, "embeddings") and resp.embeddings and hasattr(resp.embeddings[0], "values"):
        vals = resp.embeddings[0].values
    else:
        raise RuntimeError(f"Unexpected embedding response: {type(resp)} -> {resp}")
    return np.asarray(vals, dtype=np.float32)

def embed_text(txt: str) -> np.ndarray:
    r = client.models.embed_content(model=EMBED_MODEL, contents=txt)
    return to_vec(r)

# ----------------- Retrieval -----------------
# Embed question
q_vec = embed_text(question).reshape(1, -1)

# Embed all chunks (use the chunks list you already built)
chunk_vecs = [embed_text(c) for c in chunks]
mat = np.stack(chunk_vecs, axis=0)          # (N, D)

# Rank chunks by cosine similarity
sims = cosine_similarity(q_vec, mat)[0]     # (N,)
top_k = 3
top_idx = sims.argsort()[::-1][:top_k]

print("Top retrieved chunks:\n")
for r, i in enumerate(top_idx, 1):
    print(f"[{r}] score={sims[i]:.4f}")
    print(textwrap.shorten(chunks[i].replace("\n", " "), width=280))
    print()

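# Optional: keep only chunks above a minimum similarity, to reduce the risk of answering from irrelevant context
MIN_SIM = 0.35
selected = [i for i in top_idx if sims[i] >= MIN_SIM]
if not selected:
    # Fall back to the single best chunk; the prompt tells the model to say it doesn't know if the answer isn't there
    print("No chunk passes the similarity threshold; falling back to the best chunk.")
    selected = top_idx[:1]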

context = "\n\n---\n\n".join(chunks[i] for i in selected)

# ----------------- Generation -----------------
grounded_prompt = f"""
Answer STRICTLY from the provided BeCode rules context. 
If the answer is not in the context, say you don't know.

# Context:
{context}

# Question:
{question}

Provide a concise, actionable answer. If relevant, cite or paraphrase the specific rule.
""".strip()

resp = client.models.generate_content(
    model=GEN_MODEL,
    contents=[{"role": "user", "parts": [{"text": grounded_prompt}]}],
)

# Robust text extraction
def extract_text(r):
    try:
        return r.candidates[0].content.parts[0].text.strip()
    except Exception:
        pass
    for attr in ("text", "output_text"):
        if hasattr(r, attr) and getattr(r, attr):
            return getattr(r, attr).strip()
    return str(r)

answer = extract_text(resp)

print("\n" + "="*80)
print("FINAL ANSWER")
print("="*80)
print(answer)

Top retrieved chunks:

[1] score=0.6469
class If you cannot attend class for any reason, usually when you are sick. There are 2 steps: Warn your day coach (main coach or co-coach) and campus coordinator by email. Example emails: main.coach@becode.org co.coach@becode.org campus.coordinator@becode.org You have to [...]

[2] score=0.5250
#📝Moodle You have to go on moodle every day 4 times (ideally in these slots): 9:00, morning check-in 12:30, morning check-out 13:30, afternoon check-in 17:00, afternoon check-out Check-in, up to 10 minutes before the time. For example, morning check in can happen from 8:50 [...]

[3] score=0.4693
discord room. This will be considered unjustified absence. #💬Discord Discord is where we have our home working days happen. When you are in discord, you have to be on a table. You can mute yourself, but you cannot deafen yourself because people have to be able to reach you. [...]


FINAL ANSWER
You should upload your justification paper to Moodle. If it's too lat

## Where is this going

I hope you'll quickly realise how powerful this approach can be for augmenting LLM prompts: you just need to find the segments with the highest similarity to the question and feed them into the prompt. This is the core idea of Retrieval Augmented Generation (RAG). There are many things to explore from here, and I invite you to look into these topics in whichever way you prefer. Here is an example list of ideas:

- You can look into different embedding models for sentences and paragraphs. Here is the current overall [leaderboard](https://huggingface.co/spaces/mteb/leaderboard)
- You can look into the ways Google offers of using sentence embeddings, which may be a little less clunky than what we did above
- You can have a peek at ways of augmenting RAG with classic keyword search
- You can try to replicate this example with something you care about
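For the keyword-search idea, one common pattern is to blend the embedding similarity with a lexical score. A minimal sketch, assuming simple word overlap as the lexical score (production systems typically use BM25 instead) and an arbitrary weight `alpha=0.7`; the chunks and vector scores below are made up, whereas in the notebook they would come from `chunks` and `sims`.

```python
import re

def keyword_score(question, chunk):
    """Fraction of the question's words that also appear in the chunk (crude stand-in for BM25)."""
    q_words = set(re.findall(r"\w+", question.lower()))
    c_words = set(re.findall(r"\w+", chunk.lower()))
    return len(q_words & c_words) / len(q_words) if q_words else 0.0

def hybrid_rank(question, chunks, vector_scores, alpha=0.7):
    """Rank chunk indices by alpha * vector_score + (1 - alpha) * keyword_score.
    alpha is an ASSUMED weight; tune it on your own questions."""
    combined = [
        alpha * v + (1 - alpha) * keyword_score(question, c)
        for c, v in zip(chunks, vector_scores)
    ]
    order = sorted(range(len(chunks)), key=lambda i: combined[i], reverse=True)
    return order, combined

chunks_demo = [
    "Internships last between 1 and 3 months.",
    "Upload your justification paper to moodle.",
]
question = "Where do I upload my justification paper?"
vector_scores = [0.50, 0.48]  # made-up cosine similarities

order, combined = hybrid_rank(question, chunks_demo, vector_scores)
print(order, [round(s, 3) for s in combined])
```

Here the exact words "upload", "justification" and "paper" lift the second chunk above the first even though its (made-up) vector score is slightly lower.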

Have fun with it!