# Code Interpreter

Traditional LLMs are good at generating text, but they struggle with tasks that require exact counting or calculations.
<br><br>
**Example:** How many "r"s are present in the string "strawberry"?<br>
**Answer from LLM:** "strawberry" has 2 "r"s.
<br><br>
Yikes!
<br><br>
Discussions about LLMs being unable to count:
- https://community.openai.com/t/should-a-custom-gpt-be-able-to-count-the-number-of-items-in-a-json-list/575999
- https://community.openai.com/t/assistant-can-not-search-the-whole-file-using-file-search/739661/3
- https://www.reddit.com/r/OpenAI/comments/15xfcuk/how_do_i_pass_complex_and_nested_large_json_data

<br>
To solve this problem, OpenAI introduced a feature called "Code Interpreter".

- Code Interpreter allows `Assistants` to write and run Python code in a sandboxed execution environment.
- If the generated code fails to execute, the Assistant will iteratively debug and refine the code until the code executes successfully.

With Code Interpreter enabled, your Assistant can now solve code, math, and data analysis problems.
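
For the "strawberry" question above, the code an Assistant writes and runs could be as small as the following. This is an illustrative sketch, not captured Code Interpreter output, and the helper name `count_char` is made up for the example.

```python
def count_char(text: str, char: str) -> int:
    """Count case-insensitive occurrences of a single character in text."""
    return text.lower().count(char.lower())


print(count_char("strawberry", "r"))  # prints 3
```

Because the count comes from executing code rather than from predicting tokens, the answer does not depend on how the model tokenizes the word.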

### Steps:
1. Upload a file (CSV, JSON, etc.) to Azure OpenAI.
1. Create an `assistant` using the Assistants API and give it access to the file.
1. Create a `thread` that holds the user's request to analyze the file and report results based on the given instructions.
1. Run the `thread` with the `assistant`; the `assistant` will generate and run Python code to analyze the file.
    - The analysis results will be dumped to a file.
    - Once the thread execution is completed, the Assistant will return the results.
1. Print the results.
1. Delete the uploaded file from Azure OpenAI.

### References
- https://platform.openai.com/docs/assistants/tools/code-interpreter
- https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/code-interpreter?tabs=python
- https://platform.openai.com/docs/assistants/quickstart?example=without-streaming
***

## Prerequisites

1. Make sure that `python3` is installed on your system.
1. Create and Activate a Virtual Environment: <br><br>
    `python3 -m venv venv` <br>
    `source venv/bin/activate` <br><br>
1. Create a `.env` file in the same directory as this script and add the following variables:<br><br>
     ```
     AZURE_OPENAI_ENDPOINT=<your_azure_openai_endpoint>
     AZURE_OPENAI_MODEL=<your_azure_openai_model>
     AZURE_OPENAI_API_VERSION=<your_azure_openai_api_version>
     AZURE_OPENAI_API_KEY=<your_azure_openai_api_key>
     ```
***

## Install Dependencies

The required libraries are listed in the `requirements.txt` file. Use the following command to install them:

In [11]:
! pip install -r requirements.txt



***
## Import Modules

In [12]:
from openai import AzureOpenAI  # Client class used to interact with the Azure OpenAI API.
from dotenv import load_dotenv  # Loads environment variables from a .env file.
import os                       # Used to read the values of environment variables.
import json                     # Used to work with JSON data in Python.
from pprint import pprint       # Pretty-prints dictionaries and other data structures.

## Load environment variables from .env file

The `load_dotenv()` function reads the `.env` file and loads its entries as environment variables, making them accessible via `os.environ` or `os.getenv()`.

In [13]:
load_dotenv()

AZURE_OPENAI_ENDPOINT        = os.environ['AZURE_OPENAI_ENDPOINT']
AZURE_OPENAI_MODEL           = os.environ['AZURE_OPENAI_MODEL']
AZURE_OPENAI_API_VERSION     = os.environ['AZURE_OPENAI_API_VERSION']
AZURE_OPENAI_API_KEY         = os.environ['AZURE_OPENAI_API_KEY']
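
Indexing `os.environ` raises a bare `KeyError` on the first variable that is missing. A small check along the following lines can report every missing name at once. It is only a sketch: `missing_env_vars` is a hypothetical helper, not part of `python-dotenv` or `openai`, and the variable names are taken from the `.env` template in the Prerequisites section.

```python
import os

# Names taken from the .env template in the Prerequisites section.
REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_MODEL",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_API_KEY",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Passing a plain dictionary as `env` makes the helper easy to try without touching the real environment.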

## Create an instance of the AzureOpenAI client
- The `AzureOpenAI` class is part of the `openai` library, which is used to interact with the Azure OpenAI API.
- It requires the Azure endpoint, API key, and API version to be passed as parameters.

In [14]:
client = AzureOpenAI(
    azure_endpoint = AZURE_OPENAI_ENDPOINT,
    api_key = AZURE_OPENAI_API_KEY,  
    api_version = AZURE_OPENAI_API_VERSION
)

## Step 1: Upload your file to Azure OpenAI with the "assistants" purpose

What is a `purpose`?<br>
When you upload a file to Azure OpenAI, you need to specify the purpose of the file.
<br>
The following purposes are supported:
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/code-interpreter?tabs=python#supported-file-types
<br><br>
What file formats are supported for upload?<br>
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/code-interpreter?tabs=python#supported-file-types

In [15]:
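# Multipart file upload requires the file to be opened in binary mode, not text mode.
# The `with` block closes the local file handle once the upload finishes.
with open("dummy_build_data.json", "rb") as f:
    file = client.files.create(
        file=f,
        purpose='assistants'
    )
# Use file.id to refer to the file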
print(f"Uploaded file, file ID: {file.id}")

Uploaded file, file ID: assistant-THMsYwk8Tfbsj8mgr8uLcv


Note: You cannot view the content of a file uploaded to the Azure OpenAI server if the purpose is defined as `assistants`
<br><br>
The following code will not work:
```python
uploaded_file_content = client.files.content(file.id)
```
The command will throw the following error:
```python
openai.error.InvalidRequestError: The file content is not available for the purpose of "assistants".
```

## Step 2: Create an "assistant" using the Assistants API
Enable the `code_interpreter` tool and give this assistant access to the uploaded file.

In [None]:
assistant = client.beta.assistants.create(
    model=AZURE_OPENAI_MODEL,
    name="build-analyzer-agent", # name of the agent (optional)    
    instructions="You are an AI assistant that can read and analyze JSON files. "
            "The JSON file contains Jenkins build information under the key `results`. "
            "Each entry in the `results` array contains information about a build. "
            "Build status of a build can be found by checking the `build_status` key. "
            "Build duration (time build took to complete) can be found by checking the `build_duration` key. "
            "Queue time (time build spent in queue) can be found by checking the `queue_time` key. "
            "Build label can be found by checking the `build_label` key. When somebody ask about a build, make sure to provide the build label. ",
    tools=[{"type": "code_interpreter"}], # enables the Code Interpreter tool for this assistant
    tool_resources={"code_interpreter":{"file_ids":[file.id]}} # makes the file we just uploaded available to Code Interpreter
)
print(f"A new assistant {assistant.id} created:\n")
print(assistant)

A new assistant asst_ILUKBXBTwrrs0TYLNv15mHJF created:

Assistant(id='asst_ILUKBXBTwrrs0TYLNv15mHJF', created_at=1747457232, description=None, instructions='You are an AI assistant that can read and analyze JSON files. The JSON file contains Jenkins build information under the key `results`. Each entry in the `results` array contains information about a build. Build status of a build can be found by checking the `build_status` key. Build duration (time build took to complete) can be found by checking the `build_duration` key. Queue time (time build spent in queue) can be found by checking the `queue_time` key. Build label can be found by checking the `build_label` key. When somebody ask about a build, make sure to provide the build label. ', metadata={}, model='gpt-4', name='build-analyzer-agent', object='assistant', tools=[CodeInterpreterTool(type='code_interpreter')], response_format='auto', temperature=1.0, tool_resources=ToolResources(code_interpreter=ToolResourcesCodeInterpreter(f

## Step 3: Create a thread for the assistant

In [17]:
thread = client.beta.threads.create(
    messages=[
        {
            "role": "user",
            "content": "Provide Total builds and list all build statuses along their counts and percentages. "
                        "Also provide the fastest and the slowest build along with their build duration. "
                        "Also provide the build labels with the longest and shortest queue time. Provide durations too. "
                        "Also provide the average build and queue duration. "
        }
    ]
)
print(f"A new thread: {thread.id} created:\n")
print(thread)

A new thread: thread_OJG6UWDKipgOy9WQNwJm07gD created:

Thread(id='thread_OJG6UWDKipgOy9WQNwJm07gD', created_at=1747457233, metadata={}, object='thread', tool_resources=ToolResources(code_interpreter=None, file_search=None))
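
At run time, Code Interpreter writes its own Python to answer this request. The sketch below is only a guess at the shape of that code. The sample records are invented, the field names follow the assistant instructions from Step 2 (`build_status`, `build_duration`, `queue_time`, `build_label`), and the durations are unitless because the source does not state a unit.

```python
from collections import Counter
from statistics import mean

# Invented sample data shaped like the file described in the assistant instructions.
data = {
    "results": [
        {"build_label": "build-1", "build_status": "SUCCESS", "build_duration": 120, "queue_time": 5},
        {"build_label": "build-2", "build_status": "FAILURE", "build_duration": 300, "queue_time": 20},
        {"build_label": "build-3", "build_status": "SUCCESS", "build_duration": 90, "queue_time": 10},
        {"build_label": "build-4", "build_status": "SUCCESS", "build_duration": 150, "queue_time": 1},
    ]
}


def summarize(builds):
    """Compute status counts with percentages, extremes, and averages."""
    total = len(builds)
    counts = Counter(b["build_status"] for b in builds)
    return {
        "total": total,
        # status -> (count, percentage of all builds)
        "statuses": {s: (n, round(100 * n / total, 1)) for s, n in counts.items()},
        "fastest": min(builds, key=lambda b: b["build_duration"])["build_label"],
        "slowest": max(builds, key=lambda b: b["build_duration"])["build_label"],
        "longest_queue": max(builds, key=lambda b: b["queue_time"])["build_label"],
        "shortest_queue": min(builds, key=lambda b: b["queue_time"])["build_label"],
        "avg_build_duration": mean(b["build_duration"] for b in builds),
        "avg_queue_time": mean(b["queue_time"] for b in builds),
    }


summary = summarize(data["results"])
print(summary)
```

Running a comparable script locally on the real file is a cheap way to cross-check the numbers the assistant reports.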


## Step 4: Run the thread with the assistant

Once a thread is created, you can "run" it with any assistant (in real projects, you may have multiple assistants created for different purposes).
<br><br>
Also note that thread runs are asynchronous, which means you need to monitor their status by polling the Run object until a terminal status is reached.
<br><br>
If you do not need streaming, use the convenience helper method `create_and_poll`, which creates the run and then polls until it completes.

In [18]:
print(f"Running thread: {thread.id} with assistant: {assistant.id}...\n")
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id
)


Running thread: thread_OJG6UWDKipgOy9WQNwJm07gD with assistant: asst_ILUKBXBTwrrs0TYLNv15mHJF...



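Note: an Assistants `thread` is a conversation, not an operating-system thread, so the resemblance to `Thread.run()` in `Java` is only loose. The `create_and_poll` call simply blocks the caller until the run reaches a terminal status.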
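
When more control over polling is needed (a custom interval, a timeout, logging), the loop can be written by hand. The following is a minimal sketch, not the library's implementation: `poll_until_terminal` is a hypothetical helper, and the set of terminal statuses is an assumption based on the run lifecycle described in the Assistants documentation.

```python
import time

# Statuses after which a run no longer changes (assumed from the Assistants run lifecycle).
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired", "incomplete"}


def poll_until_terminal(get_status, interval=1.0, timeout=300.0, sleep=time.sleep):
    """Call get_status() repeatedly until it returns a terminal status or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still '{status}' after {timeout} seconds")
        sleep(interval)
```

With the real client, `get_status` could be `lambda: client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id).status`, where `run` was started with `client.beta.threads.runs.create(...)` instead of `create_and_poll`.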

## Step 5: Capture result

In [19]:
if run.status == 'completed':
    # https://platform.openai.com/docs/api-reference/messages/listMessages
    messages = client.beta.threads.messages.list(thread_id=thread.id, order='asc')
    print("\n---------------------\n")
    print("Thread run completed and returned the following JSON:.\n")
    print(messages.model_dump_json(indent=4))
    print("\n---------------------\n")

    # The listed messages already carry their content, so no extra retrieve call is needed.
    for message in messages:
        print("\n------ Message --------\n")
        for block in message.content:
            if block.type == "text":  # skip image and other non-text blocks
                print(block.text.value)
        print("\n---------------------\n")
else:
    # The run ended without completing (for example failed, cancelled, or expired).
    print(f"Run ended with status: {run.status}")
    print(run.last_error)


---------------------

Thread run completed and returned the following JSON:.

{
    "data": [
        {
            "id": "msg_4xT0bRGNV5terVf9NDIdsllf",
            "assistant_id": null,
            "attachments": [],
            "completed_at": null,
            "content": [
                {
                    "text": {
                        "annotations": [],
                        "value": "Provide Total builds and list all build statuses along their counts and percentages. Also provide the fastest and the slowest build along with their build duration. Also provide the build labels with the longest and shortest queue time. Provide durations too. Also provide the average build and queue duration. "
                    },
                    "type": "text"
                }
            ],
            "created_at": 1747457233,
            "incomplete_at": null,
            "incomplete_details": null,
            "metadata": {},
            "object": "thread.message",
          
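
The dump above nests each message's text under `data[i].content[j].text.value`. As a sketch, the standard `json` module can pull out just the role and the text. The sample string below is a trimmed stand-in for the real output with an invented assistant reply, and `extract_texts` is a hypothetical helper.

```python
import json

# Trimmed stand-in for messages.model_dump_json(); the assistant reply is invented.
sample_dump = json.dumps({
    "data": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": {"annotations": [], "value": "Provide Total builds."}}
            ],
        },
        {
            "role": "assistant",
            "content": [
                {"type": "image_file", "image_file": {"file_id": "file-placeholder"}},
                {"type": "text", "text": {"annotations": [], "value": "Total builds: 4"}},
            ],
        },
    ]
})


def extract_texts(dump):
    """Return (role, text) pairs for every text block in a messages JSON dump."""
    pairs = []
    for message in json.loads(dump)["data"]:
        for block in message["content"]:
            if block["type"] == "text":  # skip image_file and other block types
                pairs.append((message["role"], block["text"]["value"]))
    return pairs


for role, text in extract_texts(sample_dump):
    print(f"{role}: {text}")
```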

## Step 6: Cleanup - delete the original file from the server to free up space

In [20]:
client.files.delete(file.id)
print(f"Deleted file, file ID: {file.id}")

Deleted file, file ID: assistant-THMsYwk8Tfbsj8mgr8uLcv
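
If the run raises an exception, the cleanup cell above never executes and the uploaded file stays on the server. One way to guard against that is a `try`/`finally` block. The sketch below is written under assumptions: `run_with_cleanup` is a hypothetical wrapper, it also deletes the assistant (which the notebook does not do), and it returns only the run status.

```python
def run_with_cleanup(client, file_obj, model, question):
    """Upload a file, run one analysis, and always delete the file and the assistant."""
    uploaded = client.files.create(file=file_obj, purpose="assistants")
    assistant = None
    try:
        assistant = client.beta.assistants.create(
            model=model,
            tools=[{"type": "code_interpreter"}],
            tool_resources={"code_interpreter": {"file_ids": [uploaded.id]}},
        )
        thread = client.beta.threads.create(
            messages=[{"role": "user", "content": question}]
        )
        run = client.beta.threads.runs.create_and_poll(
            thread_id=thread.id, assistant_id=assistant.id
        )
        return run.status
    finally:
        # Runs on success and on any exception, so nothing is left behind.
        if assistant is not None:
            client.beta.assistants.delete(assistant.id)
        client.files.delete(uploaded.id)
```

With the objects from this notebook, the call would look like `run_with_cleanup(client, open("dummy_build_data.json", "rb"), AZURE_OPENAI_MODEL, "Provide total builds.")`.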
