# Introduction

In this script, we automate the conversion of menu images into a structured Excel spreadsheet using OpenAI's GPT-4o model. The script reads images from a specified directory and sends each image to the model to extract the menu data. It then compiles the extracted information into an Excel file that follows a predefined template.



# Setup

First, we mount Google Drive to access files stored in it and set up the working directory where our menu images are located.

In [None]:
# Mount Google Drive to access files
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Set the directory containing the menu images
directory = '/content/drive/MyDrive/GenAI/OpenAI/OpenAI Project'

In [None]:
# Install the OpenAI library quietly (without verbose output)
!pip install openai --quiet

In [None]:
# Retrieve the OpenAI API key from Colab's user data
from google.colab import userdata
openai_api_key = userdata.get('genai_course')

By mounting Google Drive, we can read and write files directly from our Colab notebook. The `directory` variable points to the folder where our menu images are stored. We also install the OpenAI library and retrieve our API key, which is necessary for authenticating requests to the OpenAI API.
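Colab secrets only exist inside Colab. To run the same notebook locally, one option is to fall back to an environment variable. The sketch below is an assumption and not part of the original setup. The helper name `get_openai_api_key` is hypothetical. It relies on the `OPENAI_API_KEY` environment variable, which the OpenAI Python client also reads by default when no `api_key` is passed.

```python
import os


def get_openai_api_key(secret_name="genai_course"):
    """Return the OpenAI API key from Colab secrets, falling back to the environment.

    Hypothetical helper: assumes that outside Colab the key is stored in the
    OPENAI_API_KEY environment variable.
    """
    try:
        # google.colab is only importable inside a Colab runtime
        from google.colab import userdata
        return userdata.get(secret_name)
    except Exception:
        # Not running in Colab, or the secret is missing / not shared with this notebook
        key = os.environ.get("OPENAI_API_KEY")
        if not key:
            raise RuntimeError("No API key found in Colab secrets or in OPENAI_API_KEY")
        return key
```

In the notebook, `openai_api_key = get_openai_api_key()` would then replace the direct `userdata.get` call.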



Next, we import all the libraries required for image processing, data handling, and interacting with the OpenAI API.



In [None]:
# Load the libraries
from openai import OpenAI
import os
import base64
from IPython.display import Image, display, Markdown
import pandas as pd

In [None]:
# Set up the OpenAI client and specify the model to use
MODEL = "gpt-4o"
client = OpenAI(api_key=openai_api_key)

Here, we import the modules needed to handle file operations, encode images, display outputs, and manage tabular data. We then initialize the OpenAI client with our API key and select `gpt-4o`. This multimodal model reads the menu images and returns the extracted data as text.



# Defining the System Prompt

We define a detailed system prompt that instructs the GPT model on how to convert the menu images into a structured Excel format.



In [None]:
# Define the system prompt with detailed instructions
system_prompt = """
Convert the menu image to a structured excel sheet format following the provided template and instructions.
This assistant converts restaurant or cafe menu data into a structured Excel sheet that adheres to a specific template.
The template includes categories, subcategories, item names, prices, descriptions, and more, ensuring data consistency.
This assistant helps users fill out each row correctly, following the detailed instructions provided.

Overview:
- Each row in the Excel spreadsheet represents a unique item, categorized under a category or subcategory.
- Category and subcategory names are repeated for items within the same subcategory.
- Certain columns are left blank when not applicable, such as subcategory details for items directly under a category.
- Item details, including names, prices, and descriptions, must be unique for each entry.
- Uploaded menu content will be appended to the existing menu without deleting any current entries.

Columns Guide:

Column Name                    | Description                               | Accepted Values           | Example
-------------------------------|-------------------------------------------|---------------------------|-----------------------
CategoryTitlePt (Column A)      | Category names in Portuguese              | Text, 256 characters max  | Bebidas
CategoryTitleEn (Column B) (Optional) | English translations of category titles | Text, 256 characters max  | Beverages
SubcategoryTitlePt (Column C) (Optional) | Subcategory titles in Portuguese | Text, 256 characters max or blank | Sucos
SubcategoryTitleEn (Column D) (Optional) | English translations of subcategory titles | Text, 256 characters max or blank | Juices
ItemNamePt (Column E)           | Item names in Portuguese                  | Text, 256 characters max  | Água Mineral
ItemNameEn (Column F) (Optional) | English translations of item names | Text, 256 characters max or blank | Mineral Water
ItemPrice (Column G)          | Price of each item without currency symbol  | Text                      | 2.50 or 2,50
Calories (Column H) (Optional) | Caloric content of each item              | Numeric                   | 150
PortionSize (Column I)        | Portion size for each item in units        | Text                      | 500ml, 1, 2-3
Availability (Column J) (Optional) | Current availability of the item     | Numeric: 1 for Yes, 0 for No | 1
ItemDescriptionPt (Column K) (Optional) | Detailed description in Portuguese | Text, 500 characters max  | Contains essential minerals
ItemDescriptionEn (Column L) (Optional) | Detailed description in English | Text, 500 characters max  | Contains essential minerals

Notes:
- Ensure all data entered follows the specified formats to maintain database integrity.
- Review the data for accuracy and consistency before submitting the Excel sheet.
"""


This prompt provides the model with comprehensive instructions on how to process the menu images and the exact format expected for the Excel output. It includes an overview, column descriptions, and examples to ensure consistency and accuracy in the data extraction process.
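The Columns Guide can also serve as a checklist for the model's output. Below is a minimal sketch that assumes each parsed row is held as a dict mapping column name to string. The names `validate_row`, `REQUIRED` and `MAX_LEN` are hypothetical. The rules are read off the prompt's "Accepted Values" column, and the required columns are the ones not marked "(Optional)".

```python
# Template columns in order (Columns A to L), copied from the Columns Guide
TEMPLATE_COLUMNS = [
    "CategoryTitlePt", "CategoryTitleEn", "SubcategoryTitlePt", "SubcategoryTitleEn",
    "ItemNamePt", "ItemNameEn", "ItemPrice", "Calories", "PortionSize", "Availability",
    "ItemDescriptionPt", "ItemDescriptionEn",
]
# Columns not marked "(Optional)" in the guide
REQUIRED = ("CategoryTitlePt", "ItemNamePt", "ItemPrice", "PortionSize")
# Length limits from "Accepted Values": 256 for titles and names, 500 for descriptions
MAX_LEN = {name: 256 for name in TEMPLATE_COLUMNS[:6]}
MAX_LEN.update({"ItemDescriptionPt": 500, "ItemDescriptionEn": 500})


def validate_row(row):
    """Return a list of problems found in one row (dict of column name -> string)."""
    problems = []
    for name in REQUIRED:
        if not row.get(name, "").strip():
            problems.append(f"{name} is required")
    for name, limit in MAX_LEN.items():
        if len(row.get(name, "")) > limit:
            problems.append(f"{name} exceeds {limit} characters")
    calories = row.get("Calories", "").strip()
    # Accept integers or decimals with a single '.'; blank is allowed (optional column)
    if calories and not calories.replace(".", "", 1).isdigit():
        problems.append("Calories must be numeric")
    if row.get("Availability", "").strip() not in ("", "0", "1"):
        problems.append("Availability must be 1, 0, or blank")
    return problems


example = dict(zip(TEMPLATE_COLUMNS, [
    "Bebidas", "Beverages", "Sucos", "Juices", "Água Mineral", "Mineral Water",
    "2,50", "150", "500ml", "1", "", "",
]))
print(validate_row(example))  # []
```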



We change the current working directory to the specified directory containing the menu images.

This step ensures that all file operations are performed in the correct directory, allowing the script to access the menu images and save the Excel file in the desired location.





In [None]:
# Change the current working directory to the image directory
os.chdir(directory)
IMAGE_DIR = directory

def encode_image(image_path):
    # Open the image file in binary mode and encode it in Base64
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# List the image files in the directory (the extension match is case-insensitive)
image_files = sorted([f for f in os.listdir(IMAGE_DIR) if f.lower().endswith(('.png', '.jpg', '.jpeg'))])
image_files

['DimSum Amoreiras 1.PNG',
 'DimSum Amoreiras 2.PNG',
 'DimSum Amoreiras 3.PNG',
 'DimSum Amoreiras 4.PNG',
 'DimSum Amoreiras 5.PNG']

Encoding images in Base64 allows us to include image data directly in our API requests without relying on external URLs.

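This code scans the directory for files ending with `.png`, `.jpg`, or `.jpeg`. Each file name is lowercased before the check, so uppercase extensions such as `.PNG` also match. This ensures we only process the image files relevant to our task.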
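`sorted()` compares names character by character. That works for the five files above, but with ten or more images `Menu 10.png` would be processed before `Menu 2.png`. A possible refinement is a "natural sort" key that compares the digit runs as integers. This sketch is an assumption, not part of the original notebook, and `natural_key` and `list_menu_images` are hypothetical names.

```python
import os
import re

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def natural_key(name):
    """Split a name into text and number chunks so 'Menu 2' sorts before 'Menu 10'."""
    # re.split with a capturing group puts text at even indices and digit runs at odd ones
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


def list_menu_images(image_dir):
    """Return image file names in image_dir, matched case-insensitively and naturally sorted."""
    names = [f for f in os.listdir(image_dir) if f.lower().endswith(IMAGE_EXTENSIONS)]
    return sorted(names, key=natural_key)
```

In the notebook, `image_files = list_menu_images(IMAGE_DIR)` would replace the `sorted(...)` line.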

We prompt the user to input a name for the new Excel file where the extracted data will be saved.

We loop through each image file, encode it, send it to the OpenAI API for processing, and parse the response to populate our DataFrame.

In [None]:
# Prompt the user for the Excel file name
new_excel_file_name = input("Enter the new Excel file name (without extension): ")
EXCEL_PATH = os.path.join(directory, f"{new_excel_file_name}.xlsx")

# Template columns, in order (Columns A to L of the system prompt)
COLUMNS = ['CategoryTitlePt', 'CategoryTitleEn', 'SubcategoryTitlePt', 'SubcategoryTitleEn',
           'ItemNamePt', 'ItemNameEn', 'ItemPrice', 'Calories', 'PortionSize', 'Availability',
           'ItemDescriptionPt', 'ItemDescriptionEn']

# Collect the parsed rows in a list and build the DataFrame once at the end
# (calling pd.concat for every row copies the whole DataFrame each time)
rows = []

for image in image_files:
    # Retrieve and encode the image
    image_path = os.path.join(IMAGE_DIR, image)
    image_data = encode_image(image_path)

    # Use a data URL MIME type that matches the file (PNG or JPEG)
    mime_type = 'image/png' if image.lower().endswith('.png') else 'image/jpeg'

    # Use GPT-4o to analyze and convert the image
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": [
                {'type': 'text',
                 'text': "Convert this menu image to a structured Excel Sheet Format."},
                {'type': 'image_url',
                 'image_url': {'url': f'data:{mime_type};base64,{image_data}'}}
            ]}],
        temperature=0
    )

    # Parse the Markdown table in the reply, one line at a time
    for line in response.choices[0].message.content.split('\n'):
        line = line.strip()
        if not line.startswith('|'):
            continue  # not a table line
        columns = [col.strip() for col in line.split('|')[1:-1]]
        # Skip separator lines such as |---|:---:| and the header row
        if all(set(col) <= set('-: ') for col in columns) or 'CategoryTitlePt' in columns:
            continue
        if len(columns) == len(COLUMNS):
            rows.append(columns)
        else:
            print(f"Skipping row: {line}")

df = pd.DataFrame(rows, columns=COLUMNS)
df.to_excel(EXCEL_PATH, index=False)
print(f"Excel file saved at: {EXCEL_PATH}")

In this loop:

* **Encoding the Image**: Each image is encoded in Base64 format using the `encode_image` function.
* **API Request**: We send the encoded image along with the prompt to the OpenAI API using `client.chat.completions.create`.
* **Temperature Parameter**: We set `temperature=0` to make the output as deterministic as possible, which keeps the table formatting consistent. Identical output across calls is still not guaranteed.
* **Response Parsing**: The API response is expected to be in a Markdown table format. We parse each line, checking if it's a data row.
* **Data Extraction**: We split each data row into columns and check that their count matches the expected number of DataFrame columns. Matching rows are kept for the DataFrame.
* **Error Handling**: If a row doesn't match the expected format, we print a message and skip it.
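The parsing step in the loop can be pulled out into a standalone function, which makes it testable without calling the API. The following is a sketch under stated assumptions: `parse_markdown_table` is a hypothetical name, and the example uses a shortened four-column template for brevity. Unlike the loop, it also keeps the last cell when the model omits the trailing pipe.

```python
def parse_markdown_table(text, expected_columns):
    """Extract data rows from a Markdown table in a model reply.

    Returns (rows, skipped): rows whose cell count matches expected_columns,
    and the table lines that did not match.
    """
    rows, skipped = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # prose or code-fence lines around the table
        parts = line.split("|")
        # Drop the empty strings produced by the leading (and, if present, trailing) pipe
        cells = [c.strip() for c in (parts[1:-1] if line.endswith("|") else parts[1:])]
        if all(set(c) <= set("-: ") for c in cells):
            continue  # separator line such as |---|:---:| (or a fully blank row)
        if expected_columns[0] in cells:
            continue  # header row
        if len(cells) == len(expected_columns):
            rows.append(cells)
        else:
            skipped.append(line)
    return rows, skipped


reply = """Here is the menu:

| CategoryTitlePt | CategoryTitleEn | ItemNamePt | ItemPrice |
|---|---|---|---|
| Bebidas | Beverages | Água Mineral | 2,50 |
| Bebidas | Beverages | Chá |
"""
cols = ["CategoryTitlePt", "CategoryTitleEn", "ItemNamePt", "ItemPrice"]
rows, skipped = parse_markdown_table(reply, cols)
print(rows)     # [['Bebidas', 'Beverages', 'Água Mineral', '2,50']]
print(skipped)  # ['| Bebidas | Beverages | Chá |']
```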


After processing all images, we save the populated DataFrame to an Excel file.

This script demonstrates how to automate the extraction of structured data from menu images using the OpenAI GPT model.

By converting menu images into a standardized Excel format, we facilitate easier data management and analysis for restaurant or cafe menus.

The use of the OpenAI API for image-to-text conversion streamlines the data entry process, reducing manual effort and potential errors.

# Explanation of Key Concepts

## OpenAI GPT-4o Model

**GPT-4o** is a multimodal model developed by OpenAI. It accepts both text and images as input and generates human-like text as output.

In this script, we leverage the model's ability to interpret images and generate structured text outputs that conform to our Excel template.
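Image input is expressed as a list of content parts in the user message: a `text` part with the instruction and an `image_url` part with the data URL. The sketch below isolates that structure in a hypothetical helper, `build_menu_messages`, that mirrors the request in the main loop. It only builds the payload and does not call the API.

```python
def build_menu_messages(system_prompt, image_b64, mime_type="image/png"):
    """Build the chat messages for one menu image, mirroring the request in the main loop."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": [
            {"type": "text",
             "text": "Convert this menu image to a structured Excel Sheet Format."},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime_type};base64,{image_b64}"}},
        ]},
    ]

# Usage in the notebook (requires the client and an API key):
# response = client.chat.completions.create(
#     model=MODEL, messages=build_menu_messages(system_prompt, image_data), temperature=0)
```

The `image_url` object also accepts an optional `"detail"` field (`"low"`, `"high"` or `"auto"`). Setting it to `"high"` may help with small menu text, at a higher token cost.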

---

## Base64 Encoding

**Base64 Encoding** converts binary data (like images) into ASCII characters, allowing us to include image data directly in text-based formats such as JSON or API requests.

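The Chat Completions API accepts an image either as a publicly accessible URL or as a Base64-encoded data URL (`data:image/png;base64,...`). Our menu images live in Google Drive and have no public URL, so we send them as Base64 data URLs.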
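The following sketch shows how Base64 encoding and the data URL fit together. The helper name `image_bytes_to_data_url` and the extension-to-MIME table are assumptions, limited to the PNG and JPEG files the notebook accepts. Base64 maps every 3 input bytes to 4 ASCII characters, so the encoded image is about a third larger than the file.

```python
import base64
import os

# MIME types for the extensions the notebook accepts (assumption: PNG and JPEG only)
MIME_TYPES = {".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg"}


def image_bytes_to_data_url(data, filename):
    """Encode raw image bytes as a Base64 data URL whose MIME type matches the extension."""
    ext = os.path.splitext(filename)[1].lower()
    b64 = base64.b64encode(data).decode("ascii")
    return f"data:{MIME_TYPES[ext]};base64,{b64}"


print(base64.b64encode(b"abc").decode("ascii"))     # YWJj
print(image_bytes_to_data_url(b"abc", "menu.JPG"))  # data:image/jpeg;base64,YWJj
```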

---

## Pandas DataFrame

**Pandas** is a powerful Python library for data manipulation and analysis.

A **DataFrame** is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).

Using a DataFrame allows us to store and manipulate the extracted data efficiently before exporting it to Excel.
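Every cell parsed from the model's reply is a string. The sketch below builds a DataFrame once from a list of collected rows. It then optionally converts the numeric columns for analysis, which is a post-processing step not in the original notebook. It uses a shortened, illustrative column list and assumes prices use either `.` or `,` as the decimal separator with no thousands separators.

```python
import pandas as pd

COLUMNS = ["ItemNamePt", "ItemPrice", "Calories", "Availability"]  # shortened for illustration

# Rows as parsed from the model's Markdown table: every cell is a string
rows = [
    ["Água Mineral", "2,50", "0", "1"],
    ["Chá Verde", "3.00", "", "0"],
]

# Build the DataFrame in one step instead of concatenating row by row
df = pd.DataFrame(rows, columns=COLUMNS)

# Optional clean-up for analysis: decimal commas become points, and blank or
# invalid cells become NaN (errors="coerce") instead of raising
df["ItemPrice"] = pd.to_numeric(df["ItemPrice"].str.replace(",", ".", regex=False), errors="coerce")
df["Calories"] = pd.to_numeric(df["Calories"], errors="coerce")
df["Availability"] = pd.to_numeric(df["Availability"], errors="coerce").astype("Int64")

print(df)
```

If the import template expects prices as text, keep the original `ItemPrice` strings in the exported file and convert only a copy.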

---

## Parsing API Responses

The API response is expected to be in a **Markdown table format**.

We parse the response line by line, extract the data from each row, and populate the DataFrame accordingly.

Careful parsing ensures that data aligns correctly with the specified columns.

---

## Temperature Parameter

The **temperature** parameter controls the randomness of the model's output.

Setting `temperature=0` makes the output more deterministic, which is desirable when we need consistent and predictable formatting for data extraction tasks.
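Conceptually, temperature divides the model's raw token scores before they are turned into probabilities with a softmax. Lower values concentrate probability on the top-scoring token. The sketch below is a textbook illustration with made-up scores, not OpenAI's actual sampler, whose details are not published. Division by zero is undefined, so it only accepts positive temperatures. As the temperature approaches 0, sampling approaches always picking the top token.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.5, 0.1):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```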

---

## Error Handling and Data Validation

The script includes checks to ensure that each row of data matches the expected format and number of columns.

Rows that do not conform are skipped, and a message is printed. This prevents malformed data from corrupting the DataFrame.

---

# Tips for Using This Script

- **Ensure Image Quality**: High-quality images with clear text improve the accuracy of the data extraction. Blurry or low-resolution images may lead to incorrect or incomplete data.

- **Review the Output**: Always verify the Excel output for accuracy and completeness. Manual review helps catch any discrepancies or errors introduced during the extraction process.

- **API Rate Limits**: Be mindful of the OpenAI API usage limits to avoid exceeding your quota. If processing a large number of images, consider implementing rate limiting or batching.

- **Error Handling**: Consider adding more robust error handling to manage exceptions such as API errors, network issues, or unexpected response formats.

- **Extensibility**: The script can be extended to handle additional data fields or different templates by modifying the `system_prompt` and adjusting the DataFrame columns accordingly.
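For the rate-limit and error-handling tips, one common pattern is retrying with exponential backoff. The sketch below is a generic helper under stated assumptions: `call_with_retries` is a hypothetical name and the delay values are illustrative. The OpenAI Python client also retries some failures on its own, configurable with `OpenAI(max_retries=...)`, so this wrapper is only needed for finer control.

```python
import random
import time


def call_with_retries(func, max_attempts=5, base_delay=1.0, retry_on=(Exception,)):
    """Call func(); on a listed exception, wait with exponential backoff and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # base_delay * 1, 2, 4, ... plus random jitter so parallel jobs do not retry in lockstep
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            time.sleep(delay)

# Possible use inside the image loop (openai>=1.0 exception classes):
# import openai
# response = call_with_retries(
#     lambda: client.chat.completions.create(model=MODEL, messages=messages, temperature=0),
#     retry_on=(openai.RateLimitError, openai.APIConnectionError, openai.APITimeoutError),
# )
```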
