## From RAG to Agents: Building Smart AI Assistants
An attempt to recover my lost notes after Codespaces crashed on me 😫

### What is RAG?
RAG (Retrieval-Augmented Generation) is a technique that combines retrieval of information with generation by a Large Language Model (LLM). It is particularly effective when you have a specific knowledge base and want the LLM to answer questions only using that context.

RAG consists of 3 parts:
- Search → Finds relevant docs (e.g., FAQ entries about course enrollment).
- Prompt → Combines docs + query into a structured template.
- LLM → Generates a grounded answer (e.g., "Yes, you can join late (see FAQ).").

### AI Agents
These are autonomous systems that interact with environments, make decisions, and perform actions (e.g., search, answer, modify data). 
Agents are AI systems that can:

- Make decisions about what actions to take
- Use tools to accomplish tasks
- Maintain state and context
- Learn from previous interactions
- Work towards specific goals

Agentic flow is not necessarily a completely independent agent, but it can still make some decisions during the flow execution

A typical agentic flow consists of:
- Receiving a user request
- Analyzing the request and available tools
- Deciding on the next action
- Executing the action using appropriate tools
- Evaluating the results
- Either completing the task or continuing with more actions

The key difference from basic RAG is that agents can:
- Make multiple search queries
- Combine information from different sources
- Decide when to stop searching
- Use their own knowledge when appropriate
- Chain multiple actions together

So in agentic RAG, the system
- has access to the history of previous actions
- makes decisions independently based on the current information and the previous actions

### Agentic RAG (Decision-Making)
Agentic RAG enhances basic RAG by allowing the AI assistant to decide whether to answer a question directly using its own knowledge or to perform a search in the FAQ database. This is achieved by modifying the prompt to include instructions and output templates for different actions:

- SEARCH: If the context is empty and the LLM decides it needs more information from the FAQ database
- ANSWER (source: CONTEXT): If the LLM can answer the question using the provided context from a search
- ANSWER (source: OWN_KNOWLEDGE): If the context does not contain the answer, or if the question can be answered without needing a search, the LLM uses its internal knowledge

### Difference bbetween Basic RAG and Agentic RAG

| Feature            | Basic RAG                          | Agentic RAG                          |
|--------------------|------------------------------------|--------------------------------------|
| **Decision-Making** | Always retrieves before answering  | Decides whether to retrieve (e.g., skips search for simple queries) |
| **Flexibility**    | Linear flow (search → prompt → LLM) | Dynamic loops (iterative queries, multi-source combining) |
| **Memory**         | No history of past actions         | Tracks previous searches/queries to avoid redundancy |
| **Tool Use**       | Single retrieval tool              | Chains multiple tools (search + edit + APIs) |
| **Autonomy**       | Follows fixed instructions         | Makes independent decisions (e.g., stops after max iterations) |



### Agentic search
Agentic search extends RAG with multi-query exploration and iterative refinement.

**How it works:**
- Reformulates queries (e.g., "How to excel in Module 1?" → "Docker best practices").
- Combines results from multiple searches.
- Stops when sufficient context is gathered or max iterations reached.

Advantage: Deeper topic coverage than single-query RAG.



In [1]:
# import the files and document needed

import requests 

docs_url = 'https://github.com/alexeygrigorev/llm-rag-workshop/raw/main/notebooks/documents.json'
docs_response = requests.get(docs_url)
documents_raw = docs_response.json()

documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)

In [2]:
# import minsearch, the toy search engine for this
# use AppendableIndex instead of Index such that new documents can be added
# Create the index
# declare text fields, and the keywords field for search
# fit the documents to the index

from minsearch import AppendableIndex

index = AppendableIndex(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)

index.fit(documents)

<minsearch.append.AppendableIndex at 0x7848dd4c9a90>

In [3]:
# Create the search function with a boost param to ensure question is ranked high
# Also add a filer for course to search
# Declare the number of results to display

def search(query):
    boost = {'question': 3.0, 'section': 0.5}

    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=5,
        output_ids=True
    )

    return results

In [4]:
question = 'Can I still join the course?'

In [5]:
# create our basic RAG prompt template 

prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

<QUESTION>
{question}
</QUESTION>

<CONTEXT>
{context}
</CONTEXT>
""".strip()

In [6]:
def build_prompt(query, search_results):
    context = ""

    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"
    
    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt

In [7]:
search_results = search(question)

In [8]:
prompt = build_prompt(question, search_results)

In [9]:
import os
from dotenv import load_dotenv
load_dotenv()
api_key = os.getenv("GEMINI_KEY")

In [18]:
#pip install google-genai

In [10]:
from google import genai
def llm(prompt):
    gemini_client = genai.Client(api_key=api_key)
    response = gemini_client.models.generate_content(
        model="gemini-2.0-flash",
        contents=prompt,
        config={
        "response_mime_type": "application/json"
    }
    )
    
    return response.text

In [11]:
answer = llm(prompt)

In [12]:
print(answer)

{
"answer": "Yes, even if you don't register, you're still eligible to submit the homeworks. Be aware, however, that there will be deadlines for turning in the final projects. So don't leave everything for the last minute."
}


In [26]:
def rag(query):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer

In [29]:
# Asking a known answer
rag("Can i still join the class?")

'{\n "answer": "Yes, even if you don\'t register, you\'re still eligible to submit the homeworks. Be aware, however, that there will be deadlines for turning in the final projects. So don\'t leave everything for the last minute."\n}'

In [30]:
# Asking unknown to prove the efficiency of agentic RAG
rag("How do I patch KDE under FreeBSD?")

'"I am sorry, but I cannot answer the question because the context is empty."'

### "Agentic" RAG

First, we'll take the prompt we have so far and make it a little more "agentic":

- Tell the LLM that it can answer the question directly or look up context
- Provide output templates
- Show clearly what's the source of the answer

In [48]:
# Now we are tooling the agent usign prompt which gives it multiple options
# At the onset, the context is empty since there is no history. The LLM now lookuo the answer in the FAQ db

prompt_template = """
You're a course teaching assistant.

You're given a QUESTION from a course student and that you need to answer with your own knowledge and provided CONTEXT.
At the beginning the context is EMPTY.

<QUESTION>
{question}
</QUESTION>

<CONTEXT> 
{context}
</CONTEXT>

If CONTEXT is EMPTY, you can use our FAQ database.
In this case, use the following output template:

{{
"action": "SEARCH",
"reasoning": "<add your reasoning here>"
}}

If you can answer the QUESTION using CONTEXT, use this template:

{{
"action": "ANSWER",
"answer": "<your answer>",
"source": "CONTEXT"
}}

If the context doesn't contain the answer, use your own knowledge to answer the question

{{
"action": "ANSWER",
"answer": "<your answer>",
"source": "OWN_KNOWLEDGE"
}}
""".strip()

In [49]:
question = 'Can I still join the course?'
context = 'EMPTY'

In [50]:
prompt = prompt_template.format(question=question, context=context)

In [51]:
print(prompt)

You're a course teaching assistant.

You're given a QUESTION from a course student and that you need to answer with your own knowledge and provided CONTEXT.
At the beginning the context is EMPTY.

<QUESTION>
Can I still join the course?
</QUESTION>

<CONTEXT> 
EMPTY
</CONTEXT>

If CONTEXT is EMPTY, you can use our FAQ database.
In this case, use the following output template:

{
"action": "SEARCH",
"reasoning": "<add your reasoning here>"
}

If you can answer the QUESTION using CONTEXT, use this template:

{
"action": "ANSWER",
"answer": "<your answer>",
"source": "CONTEXT"
}

If the context doesn't contain the answer, use your own knowledge to answer the question

{
"action": "ANSWER",
"answer": "<your answer>",
"source": "OWN_KNOWLEDGE"
}


In [52]:
# By default Gemini displays the normal answer.
# To get the JSOn such that I can get all the response objects, I have to add config param to the response template above.
# now i can fetch the ACTION and REASONING as JSON response

answer_json = llm(prompt)
print(answer)

{'action': 'SEARCH', 'reasoning': 'The question asks about course enrollment, and without any context, I need to consult the FAQ database to find information on enrollment deadlines and procedures.'}


In [53]:
import json
answer = json.loads(answer_json)
answer

{'action': 'SEARCH',
 'reasoning': "Since the context is empty, I don't have any information to determine if the student can join the course. I need to consult the FAQ or course information to find the answer."}

In [54]:
answer['action']

'SEARCH'

In [55]:
answer['reasoning']

"Since the context is empty, I don't have any information to determine if the student can join the course. I need to consult the FAQ or course information to find the answer."

In [56]:
# At the onset, the CONTEXT is empty. Now the LLM decides to check the FAQ DB
# To add CONTEXT now, we build a context that includes the responses from the search we executed.
# That is, Throw the question at the search engine, fetch the result and use it to build a CONTEXT

def build_context(search_results):
    context = ""

    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"
    
    return context.strip()

In [57]:
# search_results below referes to the search fxn defines above to fetch results from our search engine
# context = build a context based on the FAQ records fetched
# Then build a final prompt which updates the prompt_template we defined above with the context gotten

search_results = search(question)
context = build_context(search_results)
prompt = prompt_template.format(question=question, context=context)

In [58]:
# See the updated prompt below which now has the initial prompt + the question + the context from previous step

print(prompt)

You're a course teaching assistant.

You're given a QUESTION from a course student and that you need to answer with your own knowledge and provided CONTEXT.
At the beginning the context is EMPTY.

<QUESTION>
Can I still join the course?
</QUESTION>

<CONTEXT> 
section: General course-related questions
question: Course - Can I still join the course after the start date?
answer: Yes, even if you don't register, you're still eligible to submit the homeworks.
Be aware, however, that there will be deadlines for turning in the final projects. So don't leave everything for the last minute.

section: General course-related questions
question: Certificate - Can I follow the course in a self-paced mode and get a certificate?
answer: No, you can only get a certificate if you finish the course with a “live” cohort. We don't award certificates for the self-paced mode. The reason is you need to peer-review capstone(s) after submitting a project. You can only peer-review projects at the time the cour

In [59]:
answer_json = llm(prompt)
print(answer_json)

{
  "action": "ANSWER",
  "answer": "Yes, even if you don't register, you're still eligible to submit the homeworks.\nBe aware, however, that there will be deadlines for turning in the final projects. So don't leave everything for the last minute.",
  "source": "CONTEXT"
}


In [69]:
# Making it a function
# First attempt to answer it with our know knowledge
# If needed, do the lookup and then answer

def agentic_rag_v1(question):
    context = "EMPTY"
    prompt = prompt_template.format(question=question, context=context)
    answer_json = llm(prompt)
    answer = json.loads(answer_json)
    print(answer)

    if answer['action'] == 'SEARCH':
        print('need to perform search...')
        search_results = search(question)
        context = build_context(search_results)
        
        prompt = prompt_template.format(question=question, context=context)
        answer_json = llm(prompt)
        answer = json.loads(answer_json)
        #print(answer)

    return answer

In [70]:
# Testing it on two questions
agentic_rag_v1('how do I join the course?') #clearly in the KB, so it should fetch this from CONTEXT

{'action': 'SEARCH', 'reasoning': "The question is about joining the course. Since I don't have any context, I should consult the FAQ database to see if there's information about course enrollment or registration."}
need to perform search...


{'action': 'ANSWER',
 'answer': "Based on the provided context, here are a few ways to join the course:\n\n*   **Register before the course starts:** Use the registration link mentioned in the context.\n*   **Join the Telegram channel:** Subscribe to the course's Telegram channel for announcements.\n*   **Join the Slack channel:** Register in DataTalks.Club's Slack and join the dedicated channel for the course.\n*   **Even if you don't register:** You are still eligible to submit the homeworks.",
 'source': 'CONTEXT'}

In [67]:
# Second question
agentic_rag_v1('how patch KDE under FreeBSD?') # this isnt in the KB, so it should answer it from its OWN_KNOWLEDGE

{'action': 'ANSWER',
 'answer': "To patch KDE under FreeBSD, you'll typically need to download the patch file (usually a `.diff` or `.patch` file), navigate to the KDE source directory, and then apply the patch using the `patch` command. Here's a general outline of the steps:\n\n1.  **Obtain the Patch:** Download the patch file from the source where it's provided (e.g., a bug report, a mailing list, or a software repository).\n\n2.  **Locate the KDE Source Directory:** You need to find where KDE is installed and where its source files are located. If you installed KDE via ports or packages, the source might not be readily available. You may need to download the source separately. If you built KDE from source, you already know the directory.\n\n3.  **Apply the Patch:** Open a terminal, navigate to the root directory of the KDE source tree, and use the `patch` command:\n\n    ```sh\n    patch -p1 < /path/to/your/patchfile.patch\n    ```\n\n    *   `-p1` tells patch to remove the first di

### Agentic Search
This is like agentic RAG but then we can perform multiple iterations. This is unlike the previous one where we did just SEARCH and ANSWER.
This is like performing DEEP RESEARCH in LLMs

Let's build a prompt:
- List available actions:
    - Search in FAQ
    - Answer using own knowledge
    - Answer using information extracted from FAQ
- Provide access to the previous actions
- Have clear stop criteria (no more than X iterations)
- We also specify the output format, so it's easier to parse it

In [57]:
def dedup(seq):
    seen = set()
    result = []
    for el in seq:
        _id = el['_id']
        if _id in seen:
            continue
        seen.add(_id)
        result.append(el)
    return result

In [58]:
prompt_template = """
You're a course teaching assistant.

You're given a QUESTION from a course student and that you need to answer with your own knowledge and provided CONTEXT.

The CONTEXT is build with the documents from our FAQ database.
SEARCH_QUERIES contains the queries that were used to retrieve the documents
from FAQ to and add them to the context.
PREVIOUS_ACTIONS contains the actions you already performed.

At the beginning the CONTEXT is empty.

You can perform the following actions:

- Search in the FAQ database to get more data for the CONTEXT
- Answer the question using the CONTEXT
- Answer the question using your own knowledge

For the SEARCH action, build search requests based on the CONTEXT and the QUESTION.
Carefully analyze the CONTEXT and generate the requests to deeply explore the topic. 

Don't use search queries used at the previous iterations.

Don't repeat previously performed actions.

Don't perform more than {max_iterations} iterations for a given student question.
The current iteration number: {iteration_number}. If we exceed the allowed number 
of iterations, give the best possible answer with the provided information.

Output templates:

If you want to perform search, use this template:

{{
"action": "SEARCH",
"reasoning": "<add your reasoning here>",
"keywords": ["search query 1", "search query 2", ...]
}}

If you can answer the QUESTION using CONTEXT, use this template:

{{
"action": "ANSWER_CONTEXT",
"answer": "<your answer>",
"source": "CONTEXT"
}}

If the context doesn't contain the answer, use your own knowledge to answer the question

{{
"action": "ANSWER",
"answer": "<your answer>",
"source": "OWN_KNOWLEDGE"
}}

<QUESTION>
{question}
</QUESTION>

<SEARCH_QUERIES>
{search_queries}
</SEARCH_QUERIES>

<CONTEXT> 
{context}
</CONTEXT>

<PREVIOUS_ACTIONS>
{previous_actions}
</PREVIOUS_ACTIONS>
""".strip()

In [59]:
question = 'how do I do well on module 1'
max_iterations = 3
iteration_number = 0
search_queries = []
search_results  = []
previous_actions = []

In [60]:
context = build_context(search_results)

prompt = prompt_template.format(
    question=question,
    context=context,
    search_queries="\n".join(search_queries),
    previous_actions='\n'.join([json.dumps(a) for a in previous_actions]),
    max_iterations=max_iterations,
    iteration_number=iteration_number
)

In [61]:
answer_json = llm(prompt)
answer = json.loads(answer_json)

In [62]:
previous_actions.append(answer)

In [64]:
keywords = answer['keywords']

for kw in keywords:
    search_queries.append(kw)
    sr = search(kw)
    search_results.extend(sr)

In [65]:
search_results = dedup(search_results)

In [66]:
iteration_number = 2

context = build_context(search_results)

prompt = prompt_template.format(
    question=question,
    context=context,
    search_queries="\n".join(search_queries),
    previous_actions='\n'.join([json.dumps(a) for a in previous_actions]),
    max_iterations=max_iterations,
    iteration_number=iteration_number
)

In [67]:
answer_json = llm(prompt)

In [69]:
question = "what do I need to do to be successful at module 1?"

search_queries = []
search_results = []
previous_actions = []

iteration = 0

while True:
    print(f'ITERATION #{iteration}...')

    context = build_context(search_results)
    prompt = prompt_template.format(
        question=question,
        context=context,
        search_queries="\n".join(search_queries),
        previous_actions='\n'.join([json.dumps(a) for a in previous_actions]),
        max_iterations=3,
        iteration_number=iteration
    )

    print(prompt)

    answer_json = llm(prompt)
    answer = json.loads(answer_json)
    print(json.dumps(answer, indent=2))

    previous_actions.append(answer)

    action = answer['action']
    if action != 'SEARCH':
        break

    keywords = answer['keywords']
    search_queries = list(set(search_queries) | set(keywords))
    
    for k in keywords:
        res = search(k)
        search_results.extend(res)

    search_results = dedup(search_results)
    
    iteration = iteration + 1
    if iteration >= 4:
        break

    print()

ITERATION #0...
You're a course teaching assistant.

You're given a QUESTION from a course student and that you need to answer with your own knowledge and provided CONTEXT.

The CONTEXT is build with the documents from our FAQ database.
SEARCH_QUERIES contains the queries that were used to retrieve the documents
from FAQ to and add them to the context.
PREVIOUS_ACTIONS contains the actions you already performed.

At the beginning the CONTEXT is empty.

You can perform the following actions:

- Search in the FAQ database to get more data for the CONTEXT
- Answer the question using the CONTEXT
- Answer the question using your own knowledge

For the SEARCH action, build search requests based on the CONTEXT and the QUESTION.
Carefully analyze the CONTEXT and generate the requests to deeply explore the topic. 

Don't use search queries used at the previous iterations.

Don't repeat previously performed actions.

Don't perform more than 3 iterations for a given student question.
The current 

In [72]:
answer

{'action': 'ANSWER',
 'answer': "Unfortunately, the information I have doesn't specify what's needed to succeed specifically in Module 1. However, the course provides resources and materials that you can follow at your own pace. Generally, success involves actively engaging with the course materials, participating in exercises, and seeking help when needed. Make sure to check the course calendar and announcements for specific deadlines and requirements.",
 'source': 'OWN_KNOWLEDGE'}

In [73]:
print(answer['answer'])

Unfortunately, the information I have doesn't specify what's needed to succeed specifically in Module 1. However, the course provides resources and materials that you can follow at your own pace. Generally, success involves actively engaging with the course materials, participating in exercises, and seeking help when needed. Make sure to check the course calendar and announcements for specific deadlines and requirements.


### Function calling ("tool use")