# Accessing multimodal capabilities in GPT-4o

This notebook provides a basic demonstration of how to use the GPT-4o model through the OpenAI API to interpret both text and image prompts.

Relevant links:
- [Introduction to GPT-4o cookbook from OpenAI](https://cookbook.openai.com/examples/gpt4o/introduction_to_gpt4o)
- [API reference for chat completions](https://platform.openai.com/docs/api-reference/chat/create)
- [Documentation for function calling](https://platform.openai.com/docs/guides/function-calling)
- [JSON Schema documentation](https://json-schema.org/understanding-json-schema/reference/non_json_data#light-scheme-icon)

In [2]:
# Import necessary libraries
import os
import pandas as pd
from openai import OpenAI
from dotenv import load_dotenv
import base64
import json
from datetime import date

# Load the .env file
load_dotenv()

# Set up an OpenAI object using the OpenAI API key
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

## Create a standard chat completion using GPT-4o

In [3]:
completion = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "system", 
      "content": "You are a helpful assistant."
    },
    {
      "role": "user", 
      "content": "Write a haiku about a duck."
    } 
  ]
)

print(completion.choices[0].message.content)

NotFoundError: Error code: 404 - {'error': {'message': 'The model `gpt-4o` does not exist or you do not have access to it.', 'type': 'invalid_request_error', 'param': None, 'code': 'model_not_found'}}

### Encode images
To pass an image to the model, first turn it into a base64-encoded string.

In [None]:
# Image path
IMAGE_PATH = "data/receipt-01.png"

# Encode the image file as a base64 string
def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode("utf-8")

base64_image = encode_image(IMAGE_PATH)
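
The prompts later in this notebook hard-code `image/png` in the data URL. If receipts might arrive in other formats, the MIME type could be inferred from the file extension instead. The following is a minimal sketch, not part of the original notebook; `image_to_data_url` is a hypothetical helper name, and it relies only on the standard-library `mimetypes` and `base64` modules.

```python
import base64
import mimetypes


def image_to_data_url(image_path):
  """Return a base64 data URL for an image file (hypothetical helper)."""
  # Guess the MIME type from the file extension, e.g. ".png" -> "image/png"
  mime_type, _ = mimetypes.guess_type(image_path)
  if mime_type is None or not mime_type.startswith("image/"):
    raise ValueError(f"Not a recognized image file: {image_path}")
  # Read the raw bytes and encode them as a base64 string
  with open(image_path, "rb") as image_file:
    encoded = base64.b64encode(image_file.read()).decode("utf-8")
  return f"data:{mime_type};base64,{encoded}"
```

The returned string could then be passed as the `url` value of an `image_url` content part in place of the f-string used in this notebook.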

### Prompt GPT-4o using an image
Next, pass the image in the messages object by setting the `type` to `image_url`.

In [None]:

# Pass the image to GPT-4o along with a prompt.
response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "system",
      "content": "Answer the question based on the provided image."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What store is this receipt from?"
        },
        {
          "type": "image_url", 
          "image_url": {
            "url": f"data:image/png;base64,{base64_image}"
          }
        }
      ]
    }
  ],
  temperature=0.0,
)

# Print the response
print(response.choices[0].message.content)

### Get structured data from GPT-4o
To get the output as JSON, specify JSON output in the system message and set the `response_format` parameter to `{"type": "json_object"}`.

In [None]:
response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "system",
      "content": "If the image is a receipt, output store, purchase date, items, taxes, and total as JSON. If it's not a receipt, ask for a receipt."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/png;base64,{base64_image}"
          }
        }
      ]
    }
  ],
  temperature=0.0,
  response_format={ "type": "json_object" }
)

print(response.choices[0].message.content)
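
In JSON mode the message content is still a string, so it needs to be parsed before the fields can be used. The sketch below shows one way to do that. The sample string stands in for `response.choices[0].message.content`, and its key names (`store`, `items`, `total`) are assumptions: without a schema, the model chooses the key names itself. `parse_receipt_json` is a hypothetical helper name.

```python
import json

# Hypothetical sample standing in for response.choices[0].message.content
sample_content = '{"store": "Example Mart", "items": [{"name": "Tea", "price": 3.5}], "total": 3.5}'


def parse_receipt_json(content):
  """Parse a JSON-mode reply into a dict, or return None if that fails."""
  try:
    data = json.loads(content)
  except json.JSONDecodeError:
    return None
  # JSON mode is expected to yield an object; guard against other JSON types
  return data if isinstance(data, dict) else None


receipt = parse_receipt_json(sample_content)
# Use .get() because the key names are not guaranteed
print(receipt.get("store"), receipt.get("total"))
```

Because the keys can vary between calls, reading them with `.get()` and a default is safer than indexing directly.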

### Set up a function call
To further control the JSON output, use function calling. See the [API reference](https://platform.openai.com/docs/api-reference/chat/create#chat-create-tools) for more on function calling and the [JSON Schema reference](https://json-schema.org/understanding-json-schema/reference) for how to format the function call schema.

Below, the function call schema is broken out into a variable to make the API call easier to read.

In [None]:
function_call = [
  {
    "type": "function",
    "function": {
      "name": "itemize_receipt",
      "description": "Itemize a receipt from an image",
      "parameters": {
        "type": "object",
        "properties": {
          "vendor": {
            "type": "string",
            "description": "Name of vendor",
          },
          "date": {
            "type": "string",
            "format": "date",
            "description": "Date of purchase",
          },
          "items": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                  "name": {
                    "type": "string",
                    "description": "Name of item",
                  },
                  "price": {
                    "type": "number",
                    "description": "Price of item",
                  },
                  "quantity": {
                    "type": "integer",
                    "description": "Quantity of item",
                  },
                  "category": {
                    "type": "string",
                    "description": "Category of item",
                    "enum": ["take-out", "meal", "groceries", "clothing", "electronics", "supplies", "other"],
                  },
              },
            },
            "description": "List of items purchased",
          },
          "payment_method": {
            "type": "string",
            "description": "Payment method",
            "enum": ["cash", "credit", "debit", "mobile", "other"],
          },
        },
        "required": ["vendor","date","items","payment_method"],
      },
    }
  }
]
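
The arguments the model returns may not always match the schema, so a light check before using them can catch problems early. The following is a minimal sketch under stated assumptions: `check_arguments` is a hypothetical helper, `params` is a trimmed copy of the schema above for illustration, and only top-level `required` and `enum` rules are checked. A full validator, such as the `jsonschema` package, would also cover nested items and types.

```python
# Trimmed copy of the "parameters" schema above, for illustration only
params = {
  "type": "object",
  "properties": {
    "vendor": {"type": "string"},
    "payment_method": {
      "type": "string",
      "enum": ["cash", "credit", "debit", "mobile", "other"],
    },
  },
  "required": ["vendor", "payment_method"],
}


def check_arguments(arguments, schema):
  """Return a list of problems found in the top-level arguments."""
  problems = []
  # Every key listed under "required" must be present
  for key in schema.get("required", []):
    if key not in arguments:
      problems.append(f"missing required field: {key}")
  # Values for properties with an "enum" must be one of the listed options
  for key, spec in schema.get("properties", {}).items():
    if key in arguments and "enum" in spec and arguments[key] not in spec["enum"]:
      problems.append(f"invalid value for {key}: {arguments[key]!r}")
  return problems


print(check_arguments({"vendor": "Example Mart", "payment_method": "cheque"}, params))
```

With the schema defined in this notebook, the second argument would be `function_call[0]["function"]["parameters"]`.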

### Multimodal prompting with function calling
Combine the multimodal image prompt with a function call to capture relevant data from receipts.

Note: The system message is set up to capture any images that are not of receipts and return a regular completion instead of the function call.

In [None]:
# Use IPython.display.JSON for easier-to-read JSON output.
from IPython.display import JSON

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {"role": "system", "content": "If the image is a receipt, process the data. If it's not a receipt, ask for a receipt."},
    {"role": "user", "content": [
      {"type": "image_url", "image_url": {
        "url": f"data:image/png;base64,{base64_image}"}
      }
    ]},
  ],
  tools=function_call, # <-- Add the function_call schema from above
  tool_choice="auto",
  temperature=0.0,
)

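print(response)
message = response.choices[0].message

# A non-receipt image yields a plain-text reply with no tool call
if not message.tool_calls:
  raise ValueError(f"No function call returned: {message.content}")

# Parse the JSON data from the function call arguments
receipt_data = json.loads(message.tool_calls[0].function.arguments)

# Display the JSON data
JSON(receipt_data, expanded=True)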



### Create a DataFrame from a CSV file

In [None]:
expenses_df = pd.read_csv("expenses.csv")
expenses_df
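
`pd.read_csv` raises `FileNotFoundError` when `expenses.csv` does not exist yet, for example on a first run. One possible way to handle this is to fall back to an empty DataFrame. In this sketch, `load_expenses` is a hypothetical helper, and the column names are assumed to match the rows built in the next section of this notebook.

```python
import os
import pandas as pd

# Column names assumed to match the rows added later in this notebook
EXPENSE_COLUMNS = ["Date", "Vendor", "Name", "Quantity", "Price", "Category", "Payment method"]


def load_expenses(path):
  """Load the expenses CSV, or return an empty DataFrame if it is missing."""
  if os.path.exists(path):
    return pd.read_csv(path)
  return pd.DataFrame(columns=EXPENSE_COLUMNS)


expenses_df = load_expenses("expenses.csv")
```

An empty result is compatible with the `expenses_df.empty` check used when new rows are concatenated.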

### Add new rows to the DataFrame
Iterate through `receipt_data`, create a new row for each item, and add the data to the `expenses_df` DataFrame.

In [None]:
new_rows = []
for item in receipt_data['items']:

  print(f"Adding item: {item['name']}")
  new_row = {
    "Date": receipt_data.get("date", date.today().isoformat()),
    "Vendor": receipt_data.get("vendor", ""),
    "Name": item.get("name", ""),
    "Quantity": item.get("quantity", 1),
    "Price": item.get("price", 0),
    "Category": item.get("category", "Uncategorized"),
    "Payment method": receipt_data.get("payment_method", "Unknown"),
  }
  new_rows.append(new_row)

# Convert the list of new rows to a DataFrame
new_rows_df = pd.DataFrame(new_rows)

# Concatenate the new rows DataFrame to the existing expenses DataFrame
if expenses_df.empty:
  expenses_df = new_rows_df
else:
  expenses_df = pd.concat([expenses_df, new_rows_df], ignore_index=True)

expenses_df

### Write new rows to CSV
Save the new data in the existing CSV by overwriting it with the `expenses_df` data.

In [None]:
expenses_df.to_csv('expenses.csv', index=False)