# OpenAI Assistants - Building Agentic RAG with the Function Calling, Retrieval, and Code Interpreter Tools

## here's my twist to the default scenario

- I used a quantized version of Whisper to create a [speech transcript of Andrej Karpathy's recent "Let's reproduce GPT-2" YouTube video](https://huggingface.co/datasets/dwb2023/yt-transcripts-v3/viewer/default/train?q=Andrej+Karpathy)
  - I'm using a Hugging Face dataset for Whisper transcripts - I'll be doing a YouTube video about this in the next couple weeks
- the transcript and some code extracts were used to populate the vector db (used by the file search tool)

### Lessons Learned
- 2a. The Assistant with File Search tool is smart like Andrej Karpathy thanks to OpenAI Vector Store functionality and the Whisper transcript
- 2b. The Assistant with Code Interpreter tool is analyzing the code to build GPT-2 from scratch
- 2c. Our amazing Function Calling assistant figured out how to output results in JSON format (with some help from friends)

### Lessons not yet learned
- it's okay to complete assignments without "perfect code" (and document it under lessons not yet learned)
- ability for an assistant to use more than one tool.  (really wanted 2b to have both the code and file search tools on their toolbelt)
- Effective evaluation and metrics for OpenAI outside of fine tuning (only area that's supported out of the box with WandB)

## now back to the regularly scheduled programming...

Today we'll explore using OpenAI's Python SDK to create, manage, and use the OpenAI Assistant API!

We'll be doing the following in today's notebook:

1. Task 1: Simple Assistant
2. Task 2: Adding Tools
  - Task 2a: Creating an Assistant with File Search Tool
  - Task 2b: Creating an Assistant with Code Interpreter Tool
  - Task 2c: Creating an Assistant with Function Calling Tool

#### OpenAI Reference:

**Function Calling**
- [OpenAI Python Library - helpers.md](https://github.com/openai/openai-python/blob/main/helpers.md)
- [OpenAI Assistants docs](https://platform.openai.com/docs/assistants/overview?lang=python)

**APIs**
- [OpenAI Python Library - api.md](https://github.com/openai/openai-python/blob/main/api.md)

## Dependencies

We'll start, as we usually do, with some dependencies and our API key!

In [1]:
!pip install -qU openai

In [2]:
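import os
from getpass import getpass

# Reuse the key from the environment; prompt only if it is missing (assigning None would raise a TypeError)
if not os.getenv("OPENAI_API_KEY"):
  os.environ["OPENAI_API_KEY"] = getpass("OpenAI API Key: ")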

## Task 1: Simple Assistant

Let's create a simple Assistant to understand more about how the API works to start!

### OpenAI Client

At the core of the OpenAI Python SDK is the Client!

> NOTE: For ease of use, we'll start with the synchronous `OpenAI()`. OpenAI does provide an `AsyncOpenAI()` that you could leverage as well!

In [3]:
from openai import OpenAI

client = OpenAI()

### Creating An Assistant

Leveraging what we know about the OpenAI API from previous sessions - we're going to start by simply initializing an Assistant.

Before we begin, we need to think about a few customization options we have:

- `name` - Straightforward enough, this is what our Assistant's name will be
- `instructions` - similar to a system message, but applied at an Assistant level, this is how we can guide the Assistant's tone, behaviour, functionality, and more!
- `model` - this will allow us to choose which model we would prefer to use for our Assistant

Let's start by setting some instructions for our Assistant.



In [4]:
# @markdown #### 🏗️ Build Activity 🏗️
# @markdown Fill out the fields below to add your Assistant's name, instructions, and desired model!

name = "Andre Karpathy's GPT From Scratch Teacher Assistant" # @param {type: "string"}
instructions = "You are Andre Karpathy's AI teacher assistant created to help people learn to code GPT2 from scratch with the same passion and dedication as Andrej himself. Provide clear explanations and helpful examples to the user as they work through the concepts. Be friendly, curious and extremely knowledgeable about ML and generative AI concepts.  You bring an irresistble GenZ flair to the discussion." # @param {type: "string"}
model = "gpt-4o" # @param ["gpt-3.5-turbo", "gpt-4-turbo-preview", "gpt-4", "gpt-4o"]

In [5]:
# @markdown #### 🏗️ Build Activity 🏗️
# @markdown We can also override the Assistant's instructions when we run a thread.

# @markdown Use one of the [Prompt Principles for Instruction](https://arxiv.org/pdf/2312.16171v1.pdf) to improve the likelihood of a correct or valuable response from your Assistant.

additional_instructions = "You are Andre Karpathy's AI teacher assistant created to help people learn to code GPT-2 from scratch. Provide clear, step-by-step explanations and helpful examples to the user. Be friendly, curious, and extremely knowledgeable about ML and generative AI concepts. Use an engaging and approachable tone to make the learning process enjoyable." # @param {type: "string"}

# for simplicity setting instruction equal to additional_instructions
instructions = additional_instructions

#### Principles from the paper that were used to improve the quality of the instructions:

1. **Prompt Structure and Clarity**: Ensure the instructions are clear, concise, and free of ambiguity. For instance, define specific tasks and desired outcomes clearly.

2. **Specificity and Information**: Provide detailed examples and context to illustrate concepts. This can help users better understand complex ideas.

3. **User Interaction and Engagement**: Create interactive prompts that encourage user engagement. Ask questions that require the user to think and respond actively.

4. **Content and Language Style**: Assign a clear role to the assistant (e.g., "You are Andrej Karpathy's AI teacher assistant") and use direct, engaging language.

5. **Complex Tasks and Coding Prompts**: Break down complex coding tasks into simpler, manageable steps. Use step-by-step instructions to guide users through learning GPT-2 from scratch.

**Reasoning**: This approach ensures the instructions are clear and engaging while providing detailed guidance and fostering user interaction. It aligns with the principles of clarity, specificity, engagement, and structured learning.

### Initialize Assistants

Now that we have our desired name, instruction, and model - we can initialize our Assistants!

There are a number of useful parameters here, but we'll call out a few:

- `id` - since we may have multiple Assistants, knowing which Assistant we're interacting with will help us ensure the desired user experience!
- `description` - A natural language description of our Assistant could help others understand what it's supposed to do!
- `tool_resources` - if we wanted to use the File Search or Code Interpreter tools, this would let us know what vector stores or files we had given our Assistant

#### Assistant without tools

In [6]:
assistant = client.beta.assistants.create(
    name=name,
    instructions=instructions,
    model=model,
)

Let's examine our `assistant` object and see what we find!

In [7]:
assistant

Assistant(id='asst_Au06w4YEO2tE75GGBwFYTHke', created_at=1718121766, description=None, instructions="You are Andre Karpathy's AI teacher assistant created to help people learn to code GPT-2 from scratch. Provide clear, step-by-step explanations and helpful examples to the user. Be friendly, curious, and extremely knowledgeable about ML and generative AI concepts. Use an engaging and approachable tone to make the learning process enjoyable.", metadata={}, model='gpt-4o', name="Andre Karpathy's GPT From Scratch Teacher Assistant", object='assistant', tools=[], response_format='auto', temperature=1.0, tool_resources=ToolResources(code_interpreter=None, file_search=None), top_p=1.0)

#### Assistant with File Search tool

In [8]:
fs_assistant = client.beta.assistants.create(
  name=name + " File Search",
  instructions=instructions,
  model=model,
  tools=[{"type": "file_search"}],
)

#### Assistant with Code Interpreter

In [9]:
ci_assistant = client.beta.assistants.create(
  name=name + " + Code Interpreter",
  instructions=instructions,
  model=model,
  tools=[{"type": "code_interpreter"}],
)

### Creating a Thread

Behind the scenes our Assistant is powered by the idea of "threads".

You can think of threads as individual conversations that interact with the Assistant.

Let's create a thread now!

In [10]:
thread = client.beta.threads.create()

Let's look at our `thread` object.

In [11]:
thread

Thread(id='thread_y0JC4fKUZ5rWJhqNRTQunOlU', created_at=1718121767, metadata={}, object='thread', tool_resources=ToolResources(code_interpreter=None, file_search=None))

Notice some key attributes:

- `id` - since each Thread is like a conversation, we need some way to specify which thread we're dealing with when interacting with them
- `tool_resources` - this will become more relevant as we add tools since we'll need a way to verify which tools we have access to when interacting with our Assistant

### Adding Messages to Our Thread

Now that we have our Thread (or conversation) we can start adding messages to it!

Let's add a simple message that asks about how our Assistant is feeling.

Notice the parameters we're leveraging:

- `thread_id` - since each Thread is like a conversation, we need some way to address a specific conversation. We can use `thread.id` to do this.
- `role` - similar to when we used our chat completions endpoint, this parameter specifies who the message is coming from. You can leverage this in the same ways you would through the chat completions endpoint.
- `content` - this is where we can place the actual text our Assistant will interact with

> NOTE: Feel free to substitute a relevant message based on the Assistant you created

In [12]:
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=f"What is GPT2?  Can I build it with legos?  Explain it to me like I'm 5 years old."
)

Again, let's examine our `message` object!

In [13]:
message

Message(id='msg_t0m1UKG1k3KWHxCzxgLch3Az', assistant_id=None, attachments=[], completed_at=None, content=[TextContentBlock(text=Text(annotations=[], value="What is GPT2?  Can I build it with legos?  Explain it to me like I'm 5 years old."), type='text')], created_at=1718121767, incomplete_at=None, incomplete_details=None, metadata={}, object='thread.message', role='user', run_id=None, status=None, thread_id='thread_y0JC4fKUZ5rWJhqNRTQunOlU')

### Running Our Thread

Now that we have an Assistant, and we've given that Assistant a Thread, and we've added a Message to that Thread - we're ready to run our Assistant!

Notice that this process lets us add (potentially) multiple messages to our Assistant. We can leverage that behaviour for few/many-shot examples, and more!

Let's run our Thread!

In [14]:
run = client.beta.threads.runs.create(
  thread_id=thread.id,
  assistant_id=assistant.id,
  instructions=additional_instructions
)

Now that we've run our thread, let's look at the object!

In [15]:
run

Run(id='run_sgmDEn2FP3esrVKV1gqPxTRW', assistant_id='asst_Au06w4YEO2tE75GGBwFYTHke', cancelled_at=None, completed_at=None, created_at=1718121767, expires_at=1718122367, failed_at=None, incomplete_details=None, instructions="You are Andre Karpathy's AI teacher assistant created to help people learn to code GPT-2 from scratch. Provide clear, step-by-step explanations and helpful examples to the user. Be friendly, curious, and extremely knowledgeable about ML and generative AI concepts. Use an engaging and approachable tone to make the learning process enjoyable.", last_error=None, max_completion_tokens=None, max_prompt_tokens=None, metadata={}, model='gpt-4o', object='thread.run', parallel_tool_calls=True, required_action=None, response_format='auto', started_at=None, status='queued', thread_id='thread_y0JC4fKUZ5rWJhqNRTQunOlU', tool_choice='auto', tools=[], truncation_strategy=TruncationStrategy(type='auto', last_messages=None), usage=None, temperature=1.0, top_p=1.0, tool_resources={})

Notice we have access to a few very powerful parameters in this `run` object.

- `completed_at` - this will help us determine when we can expect to retrieve a response
- `failed_at` - this can highlight any issues our run ran into
- `status` - is another way we can understand how the flow is going

### Retrieving Our Run

Now that we've created our run, let's retrieve it.

We're going to wrap this in a simple loop to make sure we're not retrieving it too early.

In [16]:
import time

while run.status == "in_progress" or run.status == "queued":
  time.sleep(1)
  run = client.beta.threads.runs.retrieve(
    thread_id=thread.id,
    run_id=run.id
  )

In [17]:
print(run.status)

completed


Now that our run is completed - we can retrieve the messages from our thread!

Notice that our run helps us understand how things are going - but it isn't where we're going to find our responses or messages. Those are added on the backend into our thread.

This leads to a simple, but important, flow:

1. We add messages to a thread.
2. We create a run on that thread.
3. We wait until the run is finished.
4. We check our thread for the new messages.
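Step 3 of the flow above can be sketched as a small reusable helper. This is only a sketch under stated assumptions: `wait_for_run` and its `fetch_run` argument are hypothetical names (not part of the SDK), and the set of "still working" statuses is an assumption based on the run lifecycle described in the OpenAI docs. Unlike the simple loop we used, it gives up after a timeout instead of waiting forever.

```python
import time
from types import SimpleNamespace

# Statuses in which a run is still being worked on (assumed from the Assistants docs).
ACTIVE_STATUSES = {"queued", "in_progress", "cancelling"}

def wait_for_run(fetch_run, poll_seconds=1.0, timeout_seconds=120.0):
    """Call `fetch_run()` until the run leaves the active statuses or the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        run = fetch_run()
        if run.status not in ACTIVE_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {run.status!r} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)

# Offline demonstration: a stand-in object plays the role of the API response.
fake_statuses = iter(["queued", "in_progress", "completed"])
final_run = wait_for_run(
    lambda: SimpleNamespace(status=next(fake_statuses)),
    poll_seconds=0,
)
print(final_run.status)
```

In the notebook, `fetch_run` could be `lambda: client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)`, which is the same call the loop above makes.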

### Checking Our Thread

Now we can get a list of messages from our thread!

In [18]:
messages = client.beta.threads.messages.list(
  thread_id=thread.id
)

In [19]:
messages

SyncCursorPage[Message](data=[Message(id='msg_NWVjLEsuh03FfVvghbP2Ajlr', assistant_id='asst_Au06w4YEO2tE75GGBwFYTHke', attachments=[], completed_at=None, content=[TextContentBlock(text=Text(annotations=[], value="Alright, imagine you have a giant box of Legos and you want to build a fun castle. You can picture each brick in the box as a piece of a much bigger structure. Now, if you follow the instructions step by step, you can build that castle, right?\n\nGPT-2 is a bit like that castle, but instead of using Lego bricks, it uses words and logic to build sentences, paragraphs, and even entire articles! GPT-2 is a type of computer brain called an AI (artificial intelligence) model, which is designed to understand and generate human language. \n\nWhen you're using GPT-2, it's like asking it to make up stories using the words and rules it has learned from reading lots and lots of books and articles. It knows which words fit together well, just like knowing which Lego bricks snap together t
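The raw page object is hard to read, so here is a minimal sketch of pulling just the reply text out of it. `latest_assistant_text` is a hypothetical helper name, and it assumes the list is ordered newest-first (as in the output above) and that the reply is made of `text` content blocks. The stand-in objects below only mimic the attribute layout shown in the output so the sketch can run without an API call.

```python
from types import SimpleNamespace

def latest_assistant_text(messages):
    """Return the text of the first assistant message in `messages.data`, or None."""
    for message in messages.data:
        if message.role != "assistant":
            continue
        parts = [block.text.value for block in message.content if block.type == "text"]
        return "\n".join(parts)
    return None

def fake_message(role, value):
    # Stand-in with the same attribute layout as the Message objects printed above.
    text_block = SimpleNamespace(type="text", text=SimpleNamespace(value=value))
    return SimpleNamespace(role=role, content=[text_block])

fake_page = SimpleNamespace(data=[
    fake_message("assistant", "GPT-2 is a bit like a Lego castle."),
    fake_message("user", "What is GPT2?"),
])
print(latest_assistant_text(fake_page))
```

With the real objects from this notebook, the call would simply be `latest_assistant_text(messages)`.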

### Streaming Our Runs

With recent upgrades to the Assistant API - we can now *stream* our outputs!

In order to do this - we'll need something called an `EventHandler` which will help us to decide on what actions to take based on the output of the LLM.

Let's build it below!

In [20]:
from typing_extensions import override
from openai import AssistantEventHandler

class EventHandler(AssistantEventHandler):
  @override
  def on_text_created(self, text) -> None:
    print(f"\nassistant > ", end="", flush=True)

  @override
  def on_text_delta(self, delta, snapshot):
    print(delta.value, end="", flush=True)

Now we can create our `run` and stream the output as it comes in!

In [21]:
with client.beta.threads.runs.stream(
  thread_id=thread.id,
  assistant_id=assistant.id,
  instructions=additional_instructions,
  event_handler=EventHandler(),
) as stream:
  stream.until_done()


assistant > Alright, imagine you have a giant box of Legos and you want to build a fun castle. You can picture each brick in the box as a piece of a much bigger structure. Now, if you follow the instructions step by step, you can build that castle, right?

GPT-2 is a bit like that castle, but instead of using Lego bricks, it uses words and logic to build sentences, paragraphs, and even entire articles! GPT-2 is a type of computer brain called an AI (artificial intelligence) model, which is designed to understand and generate human language. 

When you're using GPT-2, it's like asking it to make up stories using the words and rules it has learned from reading lots and lots of books and articles. It knows which words fit together well, just like knowing which Lego bricks snap together to make the cool castle.

However, unlike Legos, you can't build GPT-2 with physical blocks. Instead, we use computer code and lots of data to train it, kind of like teaching it to be really, really good a

## Task 2: Adding Tools

Now that we have an understanding of how Assistant works, we can start thinking about adding tools.

We'll go through 3 separate tools and explore how we can leverage them!

Let's start with the most familiar tool - File Search (retrieval)!


### Task 2a: Creating an Assistant with the File Search Tool

The first thing we'll want to do is create an assistant with the File Search tool.

This is also going to require some data. We've provided data - but you're very much encouraged to use your own files to explore how the Assistant works for your use case.

#### Collect and Add Data to Vector Store

1. First, we need some data.
2. Second, we need to add the data to our Assistant!

Let's start with grabbing some data!

Now we can upload our files to our Vector Store!

Pay attention to [this](https://platform.openai.com/docs/assistants/tools/file-search/supported-files) documentation to see what kinds of files can be uploaded.

> NOTE: Per the OpenAI [docs](https://platform.openai.com/docs/assistants/tools/file-search/vector-stores) The maximum file size is 512 MB and no more than 2,000,000 tokens (computed automatically when you attach a file)
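Before uploading, it can be handy to check file sizes locally against the limit quoted in the note. The sketch below is an assumption-laden convenience, not an SDK feature: `oversized_files` is a hypothetical helper, it only checks bytes on disk, and it does not attempt to count tokens.

```python
import os
import tempfile

MAX_FILE_BYTES = 512 * 1024 * 1024  # the 512 MB limit quoted in the note above

def oversized_files(paths, max_bytes=MAX_FILE_BYTES):
    """Return the subset of `paths` whose size on disk exceeds `max_bytes`."""
    return [path for path in paths if os.path.getsize(path) > max_bytes]

# Offline demonstration with a tiny temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    small_path = os.path.join(tmp_dir, "small.md")
    with open(small_path, "wb") as handle:
        handle.write(b"x" * 10)
    flagged_with_tiny_limit = oversized_files([small_path], max_bytes=5)
    flagged_with_real_limit = oversized_files([small_path])

print(len(flagged_with_tiny_limit), len(flagged_with_real_limit))
```

In this notebook the helper could be run on `file_paths` before opening the file streams.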

In [22]:
vector_store = client.beta.vector_stores.create(name="Andrej Karpathy GPT2 from Scratch Compilation")

In [23]:
# set up the file paths
file_paths = ["karpathy/karpathy_build_nanogpt_code.md", "karpathy/karpathy_gpt2_from_scratch_audio_transcript.md", "karpathy/karpathy_gpt2_from_scratch_overview.md", "karpathy/karpathy_llmc_code.md"]
file_streams = [open(path, "rb") for path in file_paths]

file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
  vector_store_id=vector_store.id, files=file_streams
)

#### add vector store id to assistants

In [24]:
fs_assistant = client.beta.assistants.update(
  assistant_id=fs_assistant.id,
  tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)

In [25]:
ci_assistant = client.beta.assistants.update(
  assistant_id=ci_assistant.id,
  tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)

Let's look at what our `file_batch` contains!

In [26]:
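while file_batch.status == "in_progress":
  time.sleep(1)  # `time` is the module imported earlier, so we call time.sleep
  file_batch = client.beta.vector_stores.file_batches.retrieve(file_batch.id, vector_store_id=vector_store.id)
print(file_batch.status, file_batch.file_counts)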

#### Create and Use Assistant

Now that we have our files - we can attach them to an Assistant, and we can give that Assistant the ability to use them for retrieval through the File Search tool!

> NOTE: Your first GB is free and beyond that, usage is billed at $0.10/GB/day of vector storage. There are no other costs associated with vector store operations.
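To make the pricing in the note concrete, here is a tiny arithmetic sketch. It uses only the two numbers quoted in the note (first GB free, $0.10/GB/day after that); `vector_storage_cost` is a hypothetical helper, and the prices are an assumption that may not match current billing.

```python
FREE_GB = 1.0             # first GB is free, per the note above
PRICE_PER_GB_DAY = 0.10   # USD per GB per day beyond the free GB, per the note above

def vector_storage_cost(total_gb, days=1):
    """Estimate vector storage cost in USD for `total_gb` stored over `days` days."""
    billable_gb = max(0.0, total_gb - FREE_GB)
    return billable_gb * PRICE_PER_GB_DAY * days

# A store under 1 GB costs nothing; 3 GB kept for 30 days bills 2 GB for 30 days.
print(vector_storage_cost(0.5))
print(vector_storage_cost(3, days=30))
```

The handful of markdown files uploaded here is far below the free GB, so this mostly matters for larger document sets.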

In [27]:
fs_thread = client.beta.threads.create(
  messages=[
    {
      "role": "user",
      "content": "What will we learn from building GPT2 from scratch?  Why not just use a textbook?",
      
    }
  ]
)

We can use an extension of the `EventHandler` that we created above to stream our `run`!

Let's add a few things:

1. An `on_tool_call_created` function which tells us which tool is being used.
2. An `on_message_done` function that includes citations that were used by our File Search tool - this is like returning the context *and* the response that we saw in our Pythonic RAG implementation.

In [28]:
class FSEventHandler(AssistantEventHandler):
  @override
  def on_text_created(self, text) -> None:
    print(f"\nassistant > ", end="", flush=True)

  @override
  def on_tool_call_created(self, tool_call):
    print(f"\nassistant > {tool_call.type}\n", flush=True)

  @override
  def on_message_done(self, message) -> None:
    message_content = message.content[0].text
    annotations = message_content.annotations
    citations = []
    for index, annotation in enumerate(annotations):
      message_content.value = message_content.value.replace(
        annotation.text, f"[{index}]"
      )
      if file_citation := getattr(annotation, "file_citation", None):
        cited_file = client.files.retrieve(file_citation.file_id)
        citations.append(f"[{index}] {cited_file.filename}")

    print(message_content.value)
    print("\n".join(citations))
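The citation rewriting inside `on_message_done` can be tried offline as a plain function. This is a sketch of the same logic, not the SDK's API: `number_citations` and `filename_for` are hypothetical names, and the annotation marker and file id in the demo are made-up placeholders standing in for what the File Search tool returns.

```python
from types import SimpleNamespace

def number_citations(value, annotations, filename_for):
    """Replace each annotation marker in `value` with [index] and list the cited files."""
    citations = []
    for index, annotation in enumerate(annotations):
        value = value.replace(annotation.text, f"[{index}]")
        file_citation = getattr(annotation, "file_citation", None)
        if file_citation is not None:
            citations.append(f"[{index}] {filename_for(file_citation.file_id)}")
    return value, citations

# Offline demonstration with a stand-in annotation and a dict instead of client.files.retrieve.
fake_annotations = [
    SimpleNamespace(
        text="【4:0†source】",
        file_citation=SimpleNamespace(file_id="file-demo"),
    ),
]
fake_filenames = {"file-demo": "transcript.md"}
cited_text, cited_files = number_citations(
    "GPT-2 uses attention【4:0†source】.", fake_annotations, fake_filenames.get
)
print(cited_text)
print("\n".join(cited_files))
```

Keeping the lookup as a separate `filename_for` callable also makes it easy to cache filenames instead of retrieving the same file once per annotation.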

Let's look at the final result!

In [29]:
with client.beta.threads.runs.stream(
  thread_id=fs_thread.id,
  assistant_id=fs_assistant.id,
  event_handler=FSEventHandler(),
) as stream:
  stream.until_done()


assistant > file_search


assistant > When you choose to build GPT-2 from scratch instead of just studying from a textbook, you gain a deep and practical understanding of several key aspects of machine learning and neural networks. Here's an overview of what you'll learn:

1. **Understanding the GPT-2 Architecture:**
    - You'll explore the original GPT-2 code and Hugging Face Transformers implementation.
    - Compare it to the original transformer architecture from the "Attention is All You Need" paper[0].

2. **Implementing the GPT-2 Network:**
    - Build a PyTorch `nn.Module` for GPT-2 from scratch, including implementations of token and positional embeddings, transformer blocks, MLPs, and attention mechanisms[0][2][3].

3. **Optimizing for Speed:**
    - Use techniques such as mixed precision training, Torch Compile, and Flash Attention to drastically improve training speed and throughput[0].

4. **Hyperparameter Tuning and Optimization:**
    - Apply advanced hyperparameters a

### Task 2b: Creating an Assistant with the Code Interpreter Tool

Now that we've explored the File Search tool - let's try the Code Interpreter tool!

The process will be almost exactly the same - but we can explore a different query, and we'll add our file at the Message level!

In [30]:
!git clone https://github.com/karpathy/build-nanogpt.git

fatal: destination path 'build-nanogpt' already exists and is not an empty directory.


In [31]:
file = client.files.create(
  file=open("build-nanogpt/train_gpt2.py", "rb"),
  purpose='assistants'
)

In the following example, we'll also see how we can package the Thread creation with the Message adding step!

> NOTE: Files added at the message/thread level will not be available to the Assistant outside of that Thread.

In [32]:
ci_thread = client.beta.threads.create(
  messages=[
    {
      "role": "user",
      "content": "This file looks pretty amazing.  Can you break it down step by step?  How would we go about training a GPT2 model from scratch ourselves",
      "attachments": [
          {
              "file_id" : file.id,
              "tools" : [{"type" : "code_interpreter"}]
          }
      ]
    }
  ]
)

We'll once again need to create an `EventHandler`, except this time it will have an `on_tool_call_delta` method which will let us see the output of the code interpreter tool as well!

> NOTE: Remember that we create runs at the *thread* level - and so don't need the message object to continue.

In [33]:
class CIEventHandler(AssistantEventHandler):
  @override
  def on_text_created(self, text) -> None:
    print(f"\nassistant > ", end="", flush=True)

  @override
  def on_text_delta(self, delta, snapshot):
    print(delta.value, end="", flush=True)

  def on_tool_call_created(self, tool_call):
    print(f"\nassistant > {tool_call.type}\n", flush=True)

  def on_tool_call_delta(self, delta, snapshot):
    if delta.type == 'code_interpreter':
      if delta.code_interpreter.input:
        print(delta.code_interpreter.input, end="", flush=True)
      if delta.code_interpreter.outputs:
        print(f"\n\noutput >", flush=True)
        for output in delta.code_interpreter.outputs:
          if output.type == "logs":
            print(f"\n{output.logs}", flush=True)

Once again, we can use the streaming interface thanks to creating our `EventHandler`!

In [34]:
with client.beta.threads.runs.stream(
  thread_id=ci_thread.id,
  assistant_id=ci_assistant.id,
  instructions=additional_instructions,
  event_handler=CIEventHandler(),
) as stream:
  stream.until_done()


assistant > Of course! Let's start by taking a look at the contents of the file you have uploaded. I'll first check what type of file it is, and then we'll break down the steps to train a GPT-2 model from scratch.

Let's begin by opening and inspecting the file.
assistant > code_interpreter

# Let's inspect the file to understand its contents. 

file_path = '/mnt/data/file-zMuynmpm10xTpOn98gDrZgmt'

# Try to determine what type of file it is by reading the file and check initial few lines
with open(file_path, 'r', encoding='utf-8', errors='ignore') as file:
    contents = file.readlines()

# Display the first few lines of the file to understand its format and content
contents[:10]  # Displaying the first 10 lines for inspection
assistant > Great, it looks like the file contains Python code related to training a model, possibly using PyTorch. Let's walk through a typical process to train a GPT-2 model from scratch.

### Steps to Train a GPT-2 Model from Scratch

1. **Set up the environ

And there you go!

We've fit our Assistant with an awesome Code Interpreter that lets our Assistant run code on our provided files!

### Task 2c: Creating an Assistant with a Function Calling Tool

Let's finally create an Assistant that utilizes the Function Calling API.

We'll start by creating a function that we wish to be called.

We'll utilize DuckDuckGo search to allow our Assistant to have the most up to date information!

In [35]:
!pip install -qU duckduckgo_search

In [36]:
from duckduckgo_search import DDGS

def duckduckgo_search(query):
  with DDGS() as ddgs:
    results = [r for r in ddgs.text(query, max_results=5)]
    return "\n".join(result["body"] for result in results)

Let's test our function to make sure it behaves as we expect it to.

In [37]:
duckduckgo_search("Who is Andrej Karpathy?")

'Andrej Karpathy (born 23 October 1986) is a Slovak-Canadian computer scientist who served as the director of artificial intelligence and Autopilot Vision at Tesla. He co-founded and formerly worked at OpenAI, where he specialized in deep learning and computer vision. ...\nAndrej Karpathy, Tesla\'s director of artificial intelligence, announced Wednesday he\'s leaving the company only months before its anticipated release of its long-delayed "full self-driving ...\nAndrej Karpathy. 2024 -. coming soon 🧑\u200d🍳. 2023 - 2024. Back to OpenAI. Built a small team, launched a model to ChatGPT, great pleasure to build with the top notch talent within. 2017 - 2022. I was the Sr. Director of AI at Tesla, where I led the computer vision team of Tesla Autopilot. This includes in-house data labeling, neural ...\nAndrej Karpathy is a computer scientist who has a passion for training deep neural nets on large datasets. He is best known for his principal roles at OpenAI and Tesla and also designed an

Now we need to express how our function works in a way that is compatible with the OpenAI Function Calling API.

We'll want to provide a `JSON` object that includes what parameters we have, how to call them, and a short natural language description.

In [38]:
ddg_function = {
    "name" : "duckduckgo_search",
    "description" : "Answer non-technical questions. ",
    "parameters" : {
        "type" : "object",
        "properties" : {
            "query" : {
                "type" : "string",
                "description" : "The search query to use. For example: 'Who is the current Goalie of the Colorado Avalanche?'"
            }
        },
        "required" : ["query"]
    }
}
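Small slips in a function description (a misspelled key, a required parameter that was never defined) are easy to miss, so a quick local check can help. The sketch below is a hypothetical helper, `schema_problems`, that looks for a few common mistakes; it is not full JSON Schema validation, and both example specs are made up for illustration.

```python
def schema_problems(function_spec):
    """Return a list of human-readable problems found in a function description."""
    problems = []
    for key in ("name", "description", "parameters"):
        if key not in function_spec:
            problems.append(f"missing top-level key: {key}")
    parameters = function_spec.get("parameters", {})
    properties = parameters.get("properties", {})
    for prop_name, prop in properties.items():
        if "type" not in prop:
            problems.append(f"property {prop_name!r} has no 'type' key")
    for required_name in parameters.get("required", []):
        if required_name not in properties:
            problems.append(f"required parameter {required_name!r} is not defined")
    return problems

# Hypothetical specs: one well-formed, one with a stray colon inside the "type" key.
good_spec = {
    "name": "example_search",
    "description": "Search the web for a query.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string", "description": "Search text."}},
        "required": ["query"],
    },
}
typo_spec = {
    "name": "example_search",
    "description": "Search the web for a query.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type:": "string", "description": "Search text."}},
        "required": ["query"],
    },
}
print(schema_problems(good_spec))
print(schema_problems(typo_spec))
```

Running a check like this on `ddg_function` before creating the Assistant is a cheap way to catch a malformed description early.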

#### ❓ Question

Why does the description key-value pair matter? (with the OpenAI function calling api)

##### ANSWER

(it helps out developers as well, but the following reasons seemed more relevant to the question...)

The `description` key-value pair plays a crucial role in the OpenAI function calling API by helping the language model determine which tool is most appropriate for a given task. This functionality enhances the model's ability to interact effectively with external tools and APIs. Here are the key points:

1. **Contextual Understanding and Disambiguation:**
   - The `description` provides the model with a clear understanding of what each function does, enabling it to distinguish between multiple functions that may have similar names or purposes. This helps the model make more accurate decisions when selecting the appropriate function to call.

2. **Intent Matching:**
   - By using the `description`, the model can better align the user's intent with the correct function. This alignment ensures that the model chooses a function that best fits the user's request, improving the relevance and accuracy of the function call.

3. **Enhanced Performance:**
   - Descriptions help the model generate structured JSON objects that contain the necessary arguments for calling the functions. This structured approach allows developers to create more reliable and sophisticated applications by ensuring the model's outputs are consistent with the function signatures.

4. **Use Cases:**
   - Function calling can convert natural language queries into API calls, database queries, or structured data extraction tasks. For example, the model can convert a query like "What's the weather like in Boston?" into a function call like `get_current_weather(location: string, unit: 'celsius' | 'fahrenheit')`.

These capabilities are part of the updates to the GPT-4 and GPT-3.5-turbo models, which include improved function calling features that make the integration with external tools more efficient and reliable.

Reference:

- [Function calling](https://platform.openai.com/docs/guides/function-calling)
- [How to call functions with chat models](https://cookbook.openai.com/examples/how_to_call_functions_with_chat_models)
- [Create chat completion - tool choice](https://platform.openai.com/docs/api-reference/chat/create#chat-create-tool_choice)
- [examples/Fine_tuning_for_function_calling.ipynb](https://github.com/openai/openai-cookbook/blob/main/examples/Fine_tuning_for_function_calling.ipynb)
- [GitHub search on OpenAI Function Calling](https://github.com/search?q=repo%3Aopenai%2Fopenai-cookbook%20Function%20Calling&type=code)

#### Assistant with Function Calling tool

In [39]:
# confirm ability to set json mode during assistant creation

fc_assistant = client.beta.assistants.create(
    name=name + " + Function Calling",
    instructions=instructions,
    tools=[
        {"type": "function",
         "function" : ddg_function
        }
    ],
    model=model
)

Notice that when we created our Assistant above, we included the function description as a tool using the `{"type": "function", "function": ...}` format.

Now we can create our thread, and attach our message to it - just as we've been doing!

In [40]:
fc_thread = client.beta.threads.create()
fc_message = client.beta.threads.messages.create(
  thread_id=fc_thread.id,
  role="user",
  content="Can you describe the Twitter beef between Elon and LeCun? Use JSON as the response format.",
)

Once again, we'll need an `EventHandler` for the streaming response - notice that this time we're utilizing a few new methods:

1. `handle_requires_action` - this will handle whatever action needs to take place in *our local environment*.
2. `submit_tool_outputs` - this will let us submit the resultant outputs from our local function call back to the Assistant run in the required format.

To be very clear and explicit - we'll be following this pattern:

1. Make a call to the LLM which will decide if a local function call is required.
2. If a local function call is required - send a response that indicates a local function call is required.
3. Call the function using the arguments provided by the LLM.
4. Return the response from the function call with LLM provided arguments.
5. Receive a response from the LLM based on the output of the function call.
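Steps 3 and 4 of the pattern above can be sketched offline, without the streaming machinery. This is a minimal sketch under assumptions: `run_local_tools` and `registry` are hypothetical names, and the stand-in tool call only mimics the `id`, `function.name`, and `function.arguments` attributes that the handler reads. The key detail is that the arguments arrive as a JSON string and need to be parsed before the local function is called.

```python
import json
from types import SimpleNamespace

def run_local_tools(tool_calls, registry):
    """Call the local function named by each tool call and collect the outputs."""
    tool_outputs = []
    for call in tool_calls:
        function = registry.get(call.function.name)
        if function is None:
            output = f"unknown function: {call.function.name}"
        else:
            arguments = json.loads(call.function.arguments)  # JSON string -> dict
            output = str(function(**arguments))
        tool_outputs.append({"tool_call_id": call.id, "output": output})
    return tool_outputs

def fake_search(query):
    # Stand-in for a real search function so the sketch runs offline.
    return f"results for {query}"

fake_call = SimpleNamespace(
    id="call_demo",
    function=SimpleNamespace(name="fake_search", arguments='{"query": "GPT-2"}'),
)
outputs = run_local_tools([fake_call], {"fake_search": fake_search})
print(outputs)
```

A registry dict like this scales to several functions without a growing chain of `if tool.function.name == ...` checks.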

In [41]:
import json

class FCEventHandler(AssistantEventHandler):
  @override
  def on_event(self, event):
    # Retrieve events that are denoted with 'requires_action'
    # since these will have our tool_calls
    if event.event == 'thread.run.requires_action':
      run_id = event.data.id  # Retrieve the run ID from the event data
      self.handle_requires_action(event.data, run_id)

  def handle_requires_action(self, data, run_id):
    tool_outputs = []

    for tool in data.required_action.submit_tool_outputs.tool_calls:
      print(tool.function.arguments)
      if tool.function.name == "duckduckgo_search":
        # The arguments arrive as a JSON string, so parse out the query before searching
        query = json.loads(tool.function.arguments)["query"]
        tool_outputs.append({"tool_call_id": tool.id, "output": duckduckgo_search(query)})

    # Submit all tool_outputs at the same time
    self.submit_tool_outputs(tool_outputs, run_id)

  def submit_tool_outputs(self, tool_outputs, run_id):
    # Use the submit_tool_outputs_stream helper
    with client.beta.threads.runs.submit_tool_outputs_stream(
      thread_id=self.current_run.thread_id,
      run_id=self.current_run.id,
      tool_outputs=tool_outputs,
      event_handler=FCEventHandler(),
    ) as stream:
      for text in stream.text_deltas:
        print(text, end="", flush=True)
      print()
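
As the number of tools grows, the `if tool.function.name == ...` check can be replaced by a small registry. The sketch below is one possible approach under stated assumptions: `build_tool_outputs`, `fake_search`, and `fake_call` are hypothetical names, and the tool call is mimicked with `SimpleNamespace` so the example runs without the OpenAI client. It also returns an output for every tool call, including unknown or failing ones, on the assumption that the run expects an answer for each requested call.

```python
import json
from types import SimpleNamespace

def build_tool_outputs(tool_calls, registry):
    """Run each requested tool via the registry and collect outputs for submission."""
    outputs = []
    for call in tool_calls:
        func = registry.get(call.function.name)
        if func is None:
            output = f"Error: unknown tool '{call.function.name}'"
        else:
            try:
                # Arguments arrive as a JSON string and become keyword arguments
                kwargs = json.loads(call.function.arguments or "{}")
                output = str(func(**kwargs))
            except Exception as exc:
                # Report the failure to the model instead of crashing the stream
                output = f"Error while running tool: {exc}"
        outputs.append({"tool_call_id": call.id, "output": output})
    return outputs

def fake_search(query):
    """Stub standing in for a real search function."""
    return f"stub results for: {query}"

# A fake tool call shaped like the objects the event handler iterates over
fake_call = SimpleNamespace(
    id="call_123",
    function=SimpleNamespace(
        name="duckduckgo_search",
        arguments='{"query": "Elon Musk Yann LeCun"}',
    ),
)

print(build_tool_outputs([fake_call], {"duckduckgo_search": fake_search}))
```

Inside `handle_requires_action`, the same helper could be called with `data.required_action.submit_tool_outputs.tool_calls` and a registry that maps `"duckduckgo_search"` to the real function.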

Thanks to our event handler - we can stream this process in our notebook!

In [42]:
with client.beta.threads.runs.stream(
  thread_id=fc_thread.id,
  assistant_id=fc_assistant.id,
  event_handler=FCEventHandler()
) as stream:
  stream.until_done()

{"query":"Twitter beef between Elon Musk and Yann LeCun"}
```json
{
  "summary": "There have been various exchanges between Elon Musk and Yann LeCun on Twitter. Yann LeCun, Chief AI Scientist at Meta (formerly Facebook), and Elon Musk, CEO of SpaceX and Tesla, have different views on the future and safety of artificial intelligence. Musk has expressed concerns over AI safety and the potential existential threat it poses. In contrast, LeCun has been more optimistic and has criticized Musk's doomsday stance on AI. Their exchanges often involve debating AI's future impacts, safety protocols, and differing perspectives on the responsible development of technology.",
  "details": [
    {
      "date": "January 2020",
      "content": "Yann LeCun criticized Elon Musk for his alarmist views on AI safety, claiming that Musk's perspective spreads unnecessary fear. Musk responded by reiterating his concerns about AI safety."
    },
    {
      "date": "July 2021",
      "content": "Musk tweeted 

# How do we diagram this flow?

- and how do we
  - effectively monitor it?
  - implement resiliency?
- what Event Handler solutions exist in the market?
  - where do solutions such as LangChain / LangGraph and LlamaIndex fit in?
  - what about existing Hyperscaler solutions?

- need to fix this diagram so it reflects the basic events, maybe change it to a state diagram :)

Reiterating some of the key events:
1. An `on_tool_call_created` function, which tells us which tool is being used.
2. An `on_message_done` function that includes the citations used by our File Search tool - this is like returning the context *and* the response, as we saw in our Pythonic RAG implementation.
3. `on_tool_call_delta` - this receives incremental updates while a tool call is running (for example, Code Interpreter input and output).
4. `handle_requires_action` - this will handle whatever action needs to take place in *our local environment*.
5. `submit_tool_outputs` - this will let us submit the resultant outputs from our local function call back to the Assistant run in the required format.
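
One lightweight way to see which of these events fire, and in what order, is to record every event name as it arrives. The sketch below is a minimal, standard-library-only illustration: `EventRecorder` is a hypothetical helper, and the event names fed to it are sample values in the style of the `event.event` strings checked in `FCEventHandler`, not a captured trace.

```python
import time
from collections import Counter

class EventRecorder:
    """Keeps an ordered, timestamped log of event names."""

    def __init__(self):
        self.events = []

    def record(self, name):
        # monotonic() is suited to measuring gaps between events
        self.events.append((time.monotonic(), name))

    def names(self):
        return [name for _, name in self.events]

    def counts(self):
        return Counter(self.names())

recorder = EventRecorder()

# Sample event names for illustration only
for name in [
    "thread.run.created",
    "thread.run.requires_action",
    "thread.message.delta",
    "thread.message.delta",
    "thread.run.completed",
]:
    recorder.record(name)

print(recorder.names())
print(recorder.counts().most_common())
```

In a real handler, a call such as `recorder.record(event.event)` at the top of `on_event` would capture the full sequence for a run, which is a reasonable starting point for drawing the flow as a state diagram.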


```mermaid
sequenceDiagram
    participant User
    participant ClientApp
    participant EventHandler
    participant OpenAI
    participant FileSearch
    participant CodeInterpreter
    participant FunctionCalling

    User->>ClientApp: Enter query
    ClientApp->>EventHandler: Trigger Event (tool_call)
    EventHandler->>OpenAI: Send tool call type and arguments

    activate OpenAI
    OpenAI->>OpenAI: Parse and analyze query
    alt File Search
        OpenAI->>FileSearch: Direct file search
        activate FileSearch
        FileSearch-->>OpenAI: Return relevant information
        deactivate FileSearch
    else Code Interpreter
        OpenAI->>CodeInterpreter: Direct code execution
        activate CodeInterpreter
        CodeInterpreter-->>OpenAI: Return code output
        deactivate CodeInterpreter
    else Function Calling
        OpenAI->>FunctionCalling: Identify function and arguments
        activate FunctionCalling
        FunctionCalling-->>OpenAI: Return function call requirements
        deactivate FunctionCalling
    end
    OpenAI-->>EventHandler: Return adjusted parameters or tool requirements
    deactivate OpenAI

    alt File Search
        EventHandler->>ClientApp: Display relevant information
    else Code Interpreter
        EventHandler->>ClientApp: Display code output or handle generated files
    else Function Calling
        EventHandler->>FunctionCalling: Execute function with arguments
        activate FunctionCalling
        FunctionCalling-->>EventHandler: Return function output
        deactivate FunctionCalling
        EventHandler->>OpenAI: Submit function output
        activate OpenAI
        OpenAI-->>EventHandler: Return response based on function output
        deactivate OpenAI
        EventHandler->>ClientApp: Display response
    end

    ClientApp->>User: Present results or response


```