# Agentic RAG: Agents with OpenAI API and Function Calls

## Preparation

In [1]:
from openai import OpenAI

openai_client = OpenAI()

In [2]:
import requests 

docs_url = 'https://github.com/alexeygrigorev/llm-rag-workshop/raw/main/notebooks/documents.json'
docs_response = requests.get(docs_url)
documents_raw = docs_response.json()

documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)

In [3]:
from minsearch import AppendableIndex

index = AppendableIndex(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)

index.fit(documents)


<minsearch.append.AppendableIndex at 0x142729d30>

## Agentic RAG

In [13]:
# This function will be exposed to the LLM as a tool
def search(query):
    boost = { 'question': 3.0, 'section': 0.5 }

    results = index.search(
        query=query,
        filter_dict={ 'course': 'data-engineering-zoomcamp' },
        boost_dict=boost,
        num_results=5
    )

    return results

A traditional RAG pipeline looks like this. The search step always runs, whatever the question is:

```python
def rag(query):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer
```
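The notebook does not define `build_prompt` or `llm`. As a rough illustration only, `build_prompt` could format the question and the retrieved FAQ entries into one string. The template wording below is an assumption, not the course's actual prompt:

```python
def build_prompt(query, search_results):
    """Hypothetical prompt builder: question first, retrieved FAQ entries after."""
    context_parts = []
    for doc in search_results:
        context_parts.append(
            f"section: {doc['section']}\n"
            f"question: {doc['question']}\n"
            f"answer: {doc['text']}"
        )
    context = "\n\n".join(context_parts)

    return (
        "Answer the QUESTION using only the CONTEXT from the course FAQ.\n\n"
        f"QUESTION: {query}\n\n"
        f"CONTEXT:\n{context}"
    )


# One record in the same shape as the documents indexed above
example_results = [
    {
        "section": "General course-related questions",
        "question": "Course - Can I still join the course after the start date?",
        "text": "Yes, even if you don't register, you're still eligible to submit the homeworks.",
    }
]

prompt = build_prompt("Can I still join?", example_results)
print(prompt)
```

`llm` would then send this string to the model in a single request and return the text of the reply.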

### Function calling

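First, we describe the function that the agent can call. The description tells the model the function's name, what it does, and which arguments it accepts: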

In [5]:
search_tool = {
    "type": "function",
    "name": "search",
    "description": "Search the FAQ database",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search query text to look up in the course FAQ."
            }
        },
        "required": ["query"],
        "additionalProperties": False
    }
}
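To make the `parameters` block more concrete, here is a small hand-rolled check of a JSON arguments string against it. `check_arguments` is a hypothetical helper, not part of the OpenAI SDK, and it only covers `required` and `additionalProperties`, not full JSON Schema validation:

```python
import json

# Same description as in the cell above, repeated so the sketch runs on its own
search_tool = {
    "type": "function",
    "name": "search",
    "description": "Search the FAQ database",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search query text to look up in the course FAQ."
            }
        },
        "required": ["query"],
        "additionalProperties": False
    }
}


def check_arguments(tool, arguments_json):
    """Hypothetical helper: decode the arguments and check them against the tool description."""
    params = tool["parameters"]
    args = json.loads(arguments_json)

    missing = [key for key in params.get("required", []) if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")

    # "additionalProperties": False means keys outside "properties" are not allowed
    if params.get("additionalProperties") is False:
        extra = [key for key in args if key not in params["properties"]]
        if extra:
            raise ValueError(f"unexpected arguments: {extra}")

    return args


print(check_arguments(search_tool, '{"query": "join course late"}'))
```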

In [None]:
instructions = """
You're a course teaching assistant. 
You're given a question from a course student and your task is to answer it.
""".strip()

# We pass a single tool here, but the list can contain several
tools = [search_tool]

question = 'I just discovered the course. Can I still join it?'

chat_messages = [
    {"role": "developer", "content": instructions},
    {"role": "user", "content": question}
]

response = openai_client.responses.create(
    model='gpt-4o-mini',
    input=chat_messages,
    tools=tools
)

### Processing Function Calls

Instead of answering directly, the model returns a ResponseFunctionToolCall as the first output item:

In [10]:
call = response.output[0]

In [11]:
call

ResponseFunctionToolCall(arguments='{"query":"join course late"}', call_id='call_zF0JNjTuRClQs1Nsg014uwSl', name='search', type='function_call', id='fc_0bf4dea73580b9c90068f07cf18378819ca9fc7be6322a1f18', status='completed')

In [16]:
chat_messages.append(call)

The call contains:
- The name of the function
- The type, which is function_call
- The arguments we have to pass to the function

The model only requests the call. We need to execute the function ourselves.
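Rather than retyping the query, the arguments can be decoded from the call itself, since `arguments` is a JSON string. A minimal sketch, using a `SimpleNamespace` stand-in for the real call object so it runs without the API:

```python
import json
from types import SimpleNamespace

# Stand-in with the same fields as the ResponseFunctionToolCall shown above
call = SimpleNamespace(
    type="function_call",
    name="search",
    arguments='{"query":"join course late"}',
    call_id="call_zF0JNjTuRClQs1Nsg014uwSl",
)

# `arguments` is a JSON string, so decode it into a dict of keyword arguments
args = json.loads(call.arguments)
print(call.name, args)
```

In the notebook, the decoded dict could then be passed on as `search(**args)`.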

In [17]:
search_results = search(query="join course late")

In [18]:
search_results

[{'text': 'No, late submissions are not allowed. But if the form is still not closed and it’s after the due date, you can still submit the homework. confirm your submission by the date-timestamp on the Course page.y\nOlder news:[source1] [source2]',
  'section': 'General course-related questions',
  'question': 'Homework - Are late submissions of homework allowed?',
  'course': 'data-engineering-zoomcamp'},
 {'text': "Yes, even if you don't register, you're still eligible to submit the homeworks.\nBe aware, however, that there will be deadlines for turning in the final projects. So don't leave everything for the last minute.",
  'section': 'General course-related questions',
  'question': 'Course - Can I still join the course after the start date?',
  'course': 'data-engineering-zoomcamp'},
 {'text': "The purpose of this document is to capture frequently asked technical questions\nThe exact day and hour of the course will be 15th Jan 2024 at 17h00. The course will start with the first 

In [19]:
import json

search_results_json = json.dumps(search_results)

In [20]:
function_call_output = {
    "type": "function_call_output",
    "call_id": call.call_id,
    "output": search_results_json
}

In [21]:
chat_messages.append(function_call_output)

The conversation context now contains the instructions, the question, the function call, and its output:

In [22]:
chat_messages

[{'role': 'developer',
  'content': "You're a course teaching assistant. \nYou're given a question from a course student and your task is to answer it."},
 {'role': 'user',
  'content': 'I just discovered the course. Can I still join it?'},
 ResponseFunctionToolCall(arguments='{"query":"join course late"}', call_id='call_zF0JNjTuRClQs1Nsg014uwSl', name='search', type='function_call', id='fc_0bf4dea73580b9c90068f07cf18378819ca9fc7be6322a1f18', status='completed'),
 {'type': 'function_call_output',
  'call_id': 'call_zF0JNjTuRClQs1Nsg014uwSl',
  'output': '[{"text": "No, late submissions are not allowed. But if the form is still not closed and it\\u2019s after the due date, you can still submit the homework. confirm your submission by the date-timestamp on the Course page.y\\nOlder news:[source1] [source2]", "section": "General course-related questions", "question": "Homework - Are late submissions of homework allowed?", "course": "data-engineering-zoomcamp"}, {"text": "Yes, even if you 

In [23]:
response = openai_client.responses.create(
    model="gpt-4o-mini",
    input=chat_messages,
    tools=tools
)

In [24]:
response.output_text

"Yes, you can still join the course even after the start date. It's important to note that while you can submit homework, there will be deadlines for turning in final projects. Just make sure not to leave everything until the last minute! If you have further questions, feel free to ask."

### Adding Explanations

We can instruct the LLM to explain why it wants to call a function before it makes the call:

In [25]:
instructions = """
You're a course teaching assistant. 
You're given a question from a course student and your task is to answer it.

If you want to look up the answer, explain why before making the call
""".strip()

In [26]:
tools = [search_tool]

question = 'I just discovered the course. Can I still join it?'

chat_messages = [
    {"role": "developer", "content": instructions},
    {"role": "user", "content": question}
]

response = openai_client.responses.create(
    model='gpt-4o-mini',
    input=chat_messages,
    tools=tools
)

In [27]:
response.output[0].content[0].text

"I'll look up the course FAQ to see if there are any details about late enrollment or joining the course after it has started. This will help provide you with accurate and up-to-date information. Let me check that for you."

In [28]:
response.output

[ResponseOutputMessage(id='msg_02d40262773a7a410068f080eadde481a099d03f133ad0bbf4', content=[ResponseOutputText(annotations=[], text="I'll look up the course FAQ to see if there are any details about late enrollment or joining the course after it has started. This will help provide you with accurate and up-to-date information. Let me check that for you.", type='output_text', logprobs=[])], role='assistant', status='completed', type='message'),
 ResponseFunctionToolCall(arguments='{"query":"join the course late enrollment"}', call_id='call_zOKdbDgJiUVGdI0GZIC0t15e', name='search', type='function_call', id='fc_02d40262773a7a410068f080ebc1d881a0a9a8a7c1a5279b2c', status='completed')]
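The output now has two items, and the function call is no longer at index 0, so selecting items by `type` is safer than indexing by position. A small sketch with stand-in objects (the message text is shortened here for brevity):

```python
from types import SimpleNamespace

# Stand-ins mirroring the two output items shown above
output = [
    SimpleNamespace(
        type="message",
        content=[SimpleNamespace(type="output_text", text="I'll look up the course FAQ ...")],
    ),
    SimpleNamespace(
        type="function_call",
        name="search",
        arguments='{"query":"join the course late enrollment"}',
        call_id="call_zOKdbDgJiUVGdI0GZIC0t15e",
    ),
]

# Split the items by type instead of relying on their position
messages = [item for item in output if item.type == "message"]
function_calls = [item for item in output if item.type == "function_call"]

for message in messages:
    print("Assistant:", message.content[0].text)

for function_call in function_calls:
    print("Function call:", function_call.name, function_call.arguments)
```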

### Agentic Loop

In [31]:
def make_call(call):
    args = json.loads(call.arguments)
    f_name = call.name

    if f_name == 'search':
        result = search(**args)
    else:
        raise NameError(f'unknown function {f_name}')
    
    return {
        "type": "function_call_output",
        "call_id": call.call_id,
        "output": json.dumps(result),
    }
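With more than one tool, the if/else chain in `make_call` grows with every new function. One possible alternative is a lookup table from tool name to Python function. The sketch below uses a hypothetical `TOOL_REGISTRY` dict and a stub `search` so it runs on its own:

```python
import json
from types import SimpleNamespace


def search(query):
    # Stub standing in for the notebook's search(); returns a fake result
    return [{"question": "stub question", "text": f"stub result for: {query}"}]


# Hypothetical registry: tool name -> Python function
TOOL_REGISTRY = {"search": search}


def make_call_from_registry(call, registry=TOOL_REGISTRY):
    """Look up the requested function by name, run it, and wrap the result."""
    if call.name not in registry:
        raise NameError(f"unknown function {call.name}")

    args = json.loads(call.arguments)
    result = registry[call.name](**args)

    return {
        "type": "function_call_output",
        "call_id": call.call_id,
        "output": json.dumps(result),
    }


# Stand-in for a ResponseFunctionToolCall
fake_call = SimpleNamespace(
    name="search",
    arguments='{"query": "join course late"}',
    call_id="call_example",
)

call_output = make_call_from_registry(fake_call)
print(call_output)
```

Adding a tool then means adding one entry to the registry and one description to `tools`.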

In [None]:
has_function_calls = False

chat_messages.extend(response.output)

for item in response.output:
    if item.type == 'message':
        print('Assistant:')
        print(item.content[0].text)
        print()
    
    if item.type == 'function_call':
        has_function_calls = True
        print("Function call:")
        print("  ", item.name, item.arguments)
        print()
        function_call_output = make_call(item)
        chat_messages.append(function_call_output)

Assistant:
I'll look up the course FAQ to see if there are any details about late enrollment or joining the course after it has started. This will help provide you with accurate and up-to-date information. Let me check that for you.

Function call:
   search {"query":"join the course late enrollment"}



In [37]:
response = openai_client.responses.create(
    model="gpt-4o-mini",
    input=chat_messages,
    tools=tools
)

In [41]:
for item in response.output:
    if item.type == 'message':
        print('Assistant:')
        print(item.content[0].text)
        print()
    
    if item.type == 'function_call':
        print("Function call:")
        print("  ", item.name, item.arguments)
        print()
        function_call_output = make_call(item)
        chat_messages.append(function_call_output)

Assistant:
Yes, you can still join the course even if you just discovered it after it started. While late submissions for assignments are generally not accepted, you are still eligible to submit the homework as long as the course registration is open. Just be aware that there will be deadlines for final projects, so it’s advisable not to procrastinate.

If you’re interested, make sure to register as soon as you can!



#### Implementation

In [None]:
question = 'I just discovered the course. Can I still join it?'

chat_messages = [
    {"role": "developer", "content": instructions},
    {"role": "user", "content": question}
]

# The "agent" loop
while True:
    response = openai_client.responses.create(
        model='gpt-4o-mini',
        input=chat_messages,
        tools=tools
    )

    has_function_calls = False

    # Add response to chat history for LLM's "memory"
    chat_messages.extend(response.output)

    for entry in response.output:
        if entry.type == "function_call":
            print('Function call:')
            print(entry)
            result = make_call(entry)
            print('   ', 'Output:')
            print('   ', result['output'])
            chat_messages.append(result)
            has_function_calls = True
            print()

        elif entry.type == "message":
            print('Assistant:')
            print(entry.content[0].text)
            print()

    if not has_function_calls:
        break 


Assistant:
To provide the best answer regarding whether a student can still join the course, I'll look up the relevant details in the FAQ database. This will help clarify any deadlines, requirements, or conditions for joining the course at this point in the term. I'll check that information now.

Function call:
ResponseFunctionToolCall(arguments='{"query":"join course late enrollment"}', call_id='call_uf7Defqeog1UE804SxEtHxni', name='search', type='function_call', id='fc_0750bb6c2d7f95360068f084a48c0481908087b297fbe29ea7', status='completed')
    Output:
    [{"text": "No, late submissions are not allowed. But if the form is still not closed and it\u2019s after the due date, you can still submit the homework. confirm your submission by the date-timestamp on the Course page.y\nOlder news:[source1] [source2]", "section": "General course-related questions", "question": "Homework - Are late submissions of homework allowed?", "course": "data-engineering-zoomcamp"}, {"text": "Yes, even if yo