# Batch processing with the Batch API

The new Batch API allows you to **create async batch jobs for a lower price and with higher rate limits**.

Jobs are completed within 24 hours, and may finish sooner depending on global usage.

This notebook covers how to use the Batch API with a practical example.

As an example, we will caption images from the Amazon furniture dataset with the `gpt-4-vision-preview` model.

Please note that multiple models are available through the Batch API, and that you can use several parameters in your batch API calls, as you would with the Chat Completions endpoint.

## Setup

### Imports

In [None]:
# Make sure you have the latest version of the SDK available to use the Batch API
%pip install openai --upgrade

In [1]:
import json
from openai import OpenAI
import pandas as pd
from IPython.display import Image, display

In [2]:
# Initializing OpenAI client - see https://platform.openai.com/docs/quickstart?context=python
client = OpenAI()

### Loading data

In [3]:
dataset_path = "data/amazon_furniture_dataset.csv"

In [4]:
df = pd.read_csv(dataset_path)
df

Unnamed: 0,asin,url,title,brand,price,availability,categories,primary_image,images,upc,...,color,material,style,important_information,product_overview,about_item,description,specifications,uniq_id,scraped_at
0,B0CJHKVG6P,https://www.amazon.com/dp/B0CJHKVG6P,"GOYMFK 1pc Free Standing Shoe Rack, Multi-laye...",GOYMFK,$24.99,Only 13 left in stock - order soon.,"['Home & Kitchen', 'Storage & Organization', '...",https://m.media-amazon.com/images/I/416WaLx10j...,['https://m.media-amazon.com/images/I/416WaLx1...,,...,White,Metal,Modern,[],"[{'Brand': ' GOYMFK '}, {'Color': ' White '}, ...",['Multiple layers: Provides ample storage spac...,"multiple shoes, coats, hats, and other items E...","['Brand: GOYMFK', 'Color: White', 'Material: M...",02593e81-5c09-5069-8516-b0b29f439ded,2024-02-02 15:15:08
1,B0B66QHB23,https://www.amazon.com/dp/B0B66QHB23,"subrtex Leather ding Room, Dining Chairs Set o...",subrtex,,,"['Home & Kitchen', 'Furniture', 'Dining Room F...",https://m.media-amazon.com/images/I/31SejUEWY7...,['https://m.media-amazon.com/images/I/31SejUEW...,,...,Black,Sponge,Black Rubber Wood,[],,['【Easy Assembly】: Set of 2 dining room chairs...,subrtex Dining chairs Set of 2,"['Brand: subrtex', 'Color: Black', 'Product Di...",5938d217-b8c5-5d3e-b1cf-e28e340f292e,2024-02-02 15:15:09
2,B0BXRTWLYK,https://www.amazon.com/dp/B0BXRTWLYK,Plant Repotting Mat MUYETOL Waterproof Transpl...,MUYETOL,$5.98,In Stock,"['Patio, Lawn & Garden', 'Outdoor Décor', 'Doo...",https://m.media-amazon.com/images/I/41RgefVq70...,['https://m.media-amazon.com/images/I/41RgefVq...,,...,Green,Polyethylene,Modern,[],"[{'Brand': ' MUYETOL '}, {'Size': ' 26.8*26.8 ...","['PLANT REPOTTING MAT SIZE: 26.8"" x 26.8"", squ...",,"['Brand: MUYETOL', 'Size: 26.8*26.8', 'Item We...",b2ede786-3f51-5a45-9a5b-bcf856958cd8,2024-02-02 15:15:09
3,B0C1MRB2M8,https://www.amazon.com/dp/B0C1MRB2M8,"Pickleball Doormat, Welcome Doormat Absorbent ...",VEWETOL,$13.99,Only 10 left in stock - order soon.,"['Patio, Lawn & Garden', 'Outdoor Décor', 'Doo...",https://m.media-amazon.com/images/I/61vz1Igler...,['https://m.media-amazon.com/images/I/61vz1Igl...,,...,A5589,Rubber,Modern,[],"[{'Brand': ' VEWETOL '}, {'Size': ' 16*24INCH ...","['Specifications: 16x24 Inch ', "" High-Quality...",The decorative doormat features a subtle textu...,"['Brand: VEWETOL', 'Size: 16*24INCH', 'Materia...",8fd9377b-cfa6-5f10-835c-6b8eca2816b5,2024-02-02 15:15:10
4,B0CG1N9QRC,https://www.amazon.com/dp/B0CG1N9QRC,JOIN IRON Foldable TV Trays for Eating Set of ...,JOIN IRON Store,$89.99,Usually ships within 5 to 6 weeks,"['Home & Kitchen', 'Furniture', 'Game & Recrea...",https://m.media-amazon.com/images/I/41p4d4VJnN...,['https://m.media-amazon.com/images/I/41p4d4VJ...,,...,Grey Set of 4,Iron,X Classic Style,[],,['Includes 4 Folding Tv Tray Tables And one Co...,Set of Four Folding Trays With Matching Storag...,"['Brand: JOIN IRON', 'Shape: Rectangular', 'In...",bdc9aa30-9439-50dc-8e89-213ea211d66a,2024-02-02 15:15:11
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
307,B08SLPBC36,https://www.amazon.com/dp/B08SLPBC36,Lexicon Victoria Saddle Wood Bar Stools (Set o...,Lexicon,$58.99,Only 7 left in stock (more on the way).,"['Home & Kitchen', 'Furniture', 'Game & Recrea...",https://m.media-amazon.com/images/I/41CPL03Y-W...,['https://m.media-amazon.com/images/I/41CPL03Y...,,...,Black Sand,Wood,Contemporary,[],,"['Frame Material: Wood ', ' Set includes two (...",With a country flair and a deep black sand fin...,"['Product Dimensions: 18""D x 15.5""W x 29""H', '...",d3b681ac-6195-5c9b-8125-98b6425829f4,2024-02-02 18:56:50
308,B09KN5ZTXC,https://www.amazon.com/dp/B09KN5ZTXC,ANZORG Behind Door Hanging Kids Shoes Organize...,ANZORG Store,$9.99,Only 14 left in stock - order soon.,"['Home & Kitchen', 'Storage & Organization', '...",https://m.media-amazon.com/images/I/31qQ2tZPv-...,['https://m.media-amazon.com/images/I/31qQ2tZP...,,...,12 Pockets,Non Woven Fabric,,[],,"['Non Woven Fabric ', "" Hanging organizer with...",,"['Specific Uses For Product: 鞋子', 'Material: N...",07e5e60e-953d-5512-aab8-cf83193de252,2024-02-02 18:56:51
309,B0BN7T57NK,https://www.amazon.com/dp/B0BN7T57NK,Pipishell Full-Motion TV Wall Mount for Most 3...,Pipishell Store,$35.99,In Stock,"['Electronics', 'Television & Video', 'Accesso...",https://m.media-amazon.com/images/I/41TkLI3K2-...,['https://m.media-amazon.com/images/I/41TkLI3K...,,...,Black,,,[],"[{'Mounting Type': "" Wall Mount for 16'' Wood ...",['Solid & Stable Support: This swivel wall mou...,,"['Brand Name: Pipishell', 'Item Weight: 10.83 ...",cb66eee5-7113-5568-9713-9e08e2b48a26,2024-02-02 18:56:52
310,B097FC9C27,https://www.amazon.com/dp/B097FC9C27,Noori Rug Home - Lux Collection Modern Ava Rou...,NOORI RUG,$67.60,In Stock,"['Home & Kitchen', 'Furniture', 'Living Room F...",https://m.media-amazon.com/images/I/21Uq9uJEE5...,['https://m.media-amazon.com/images/I/21Uq9uJE...,,...,Ivory/Gold Ava,Engineered Wood,Glam,[],,"['Velvet ', ' Both functional and decorative, ...","Both functional and decorative, this storage s...","['Product Dimensions: 13""D x 13""W x 15""H', 'Co...",0a3805a8-8249-55e1-a9de-ccf6c602f167,2024-02-02 18:56:53


### Processing step 

Here, we will prepare our tasks by first trying them out with the Chat Completions endpoint.

Once you're happy with the results from regular chat completions, you can move on to creating your batch job files.

In [5]:
system_prompt = '''
Your goal is to generate short, descriptive captions for images of items.
You will be provided with an item image and the name of that item and you will output a caption that captures the most important information about the item.
If there are multiple items depicted, refer to the name provided to understand which item you should describe.
Your generated caption should be short (1 sentence), and include the most relevant information about the item.
The most important information could be: the type of item, the style (if mentioned), the material or color if especially relevant and any distinctive features.
'''

def get_caption(img_url, title):
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        temperature=0.2,
        messages=[
            {
                "role": "system",
                "content": system_prompt
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": title
                    },
                    {
                        "type": "image_url",
                        # The image URL is passed inside an object
                        "image_url": {"url": img_url},
                    },
                ],
            }
        ],
        max_tokens=300,
    )

    return response.choices[0].message.content

In [6]:
# Testing on a few images
for _, row in df[:5].iterrows():
    img_url = row['primary_image']
    caption = get_caption(img_url, row['title'])
    img = Image(url=img_url)
    display(img)
    print(f"CAPTION: {caption}\n\n")

CAPTION: A white multi-layer metal shoe rack with eight double hooks, featuring shoes and accessories storage, placed against a wall next to a door.

CAPTION: A set of two black leather upholstered dining chairs with a simple, contemporary design.

CAPTION: A green, waterproof, square plant repotting mat with raised corners and gardening tools displayed on it.

CAPTION: A brown absorbent non-slip doormat with the phrase "it's a good day to play PICKLEBALL" and pickleball paddle graphics.

CAPTION: A set of four grey foldable TV trays with a matching stand, featuring a sleek and space-saving design for convenient snacking or dining.

## Creating the batch file

The batch file, in JSONL format, should contain one line per task.
Each task has the following structure:

```
{
    "custom_id": <TASK_ID>,
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": <MODEL>,
        "messages": <MESSAGES>,
        // other parameters
    }
}
```

Note: the task ID (`custom_id`) should be unique within a batch job. It is what you use to match results to the initial input file, as tasks are not necessarily returned in the same order.
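Because results are matched back to inputs by `custom_id`, it can be worth checking the task list before writing it to disk. The following is a minimal sketch; `validate_tasks` is a hypothetical helper (not part of the OpenAI SDK) that only checks for the four top-level keys shown above and for duplicate IDs.

```python
# Top-level keys every batch task line should carry (per the structure above)
REQUIRED_KEYS = {"custom_id", "method", "url", "body"}


def validate_tasks(tasks):
    """Return a list of problems: missing top-level keys or duplicate custom_ids."""
    problems = []
    seen = set()
    for i, task in enumerate(tasks):
        missing = REQUIRED_KEYS - set(task)
        if missing:
            problems.append(f"task {i}: missing keys {sorted(missing)}")
        custom_id = task.get("custom_id")
        if custom_id in seen:
            problems.append(f"task {i}: duplicate custom_id {custom_id!r}")
        seen.add(custom_id)
    return problems


# Small illustrative input: one duplicate ID and one task without a body
example_tasks = [
    {"custom_id": "task-0", "method": "POST", "url": "/v1/chat/completions", "body": {}},
    {"custom_id": "task-0", "method": "POST", "url": "/v1/chat/completions", "body": {}},
    {"custom_id": "task-1", "method": "POST", "url": "/v1/chat/completions"},
]
print(validate_tasks(example_tasks))
```

In the notebook, `assert not validate_tasks(tasks)` could be run right before the file is written.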

In [7]:
# Creating an array of json tasks

tasks = []

for index, row in df.iterrows():
    
    title = row['title']
    img_url = row['primary_image']
    
    task = {
        "custom_id": f"task-{index}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            # This is what you would have in your Chat Completions API call
            "model": "gpt-4-vision-preview",
            "messages": [
                {
                    "role": "system",
                    "content": system_prompt
                },
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": title
                        },
                        {
                            "type": "image_url",
                            "image_url": {"url": img_url},
                        },
                    ],
                }
            ],
            "temperature": 0.2,
            "max_tokens": 300
        }
    }
    
    tasks.append(task)
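The request body above repeats the arguments of `get_caption`. One option, sketched below, is to build the body in a single place and reuse it for both the test call and the batch task. `build_caption_body` and `build_task` are hypothetical helpers, the image URL is a placeholder, and the sketch assumes the object form of the image part (`{"url": ...}`) accepted by the Chat Completions API.

```python
def build_caption_body(title, img_url, system_prompt, model="gpt-4-vision-preview"):
    """Hypothetical helper: the request body shared by the test call and the batch task."""
    return {
        "model": model,
        "temperature": 0.2,
        "max_tokens": 300,
        "messages": [
            {"role": "system", "content": system_prompt},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": title},
                    # Object form of the image part
                    {"type": "image_url", "image_url": {"url": img_url}},
                ],
            },
        ],
    }


def build_task(index, title, img_url, system_prompt):
    """Hypothetical helper: wrap the shared body in one batch task line."""
    return {
        "custom_id": f"task-{index}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": build_caption_body(title, img_url, system_prompt),
    }


# Placeholder title and URL, for illustration only
demo_task = build_task(0, "Free Standing Shoe Rack", "https://example.com/rack.jpg", "Caption the item.")
print(demo_task["custom_id"], demo_task["body"]["model"])
```

With the client from earlier, the same dictionary could be unpacked into a direct call, e.g. `client.chat.completions.create(**build_caption_body(title, img_url, system_prompt))`, so the prototype and the batch job use identical parameters.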

In [8]:
# Creating the file

file_name = "data/batch_tasks.jsonl"

with open(file_name, 'w') as file:
    for obj in tasks:
        file.write(json.dumps(obj) + '\n')
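Each line of the batch file must be a complete JSON object, so `json.dumps` should be called without `indent`. A quick round-trip check, sketched here with hypothetical `write_jsonl`/`read_jsonl` helpers and a temporary path, can catch formatting problems before the file is uploaded.

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write one JSON object per line; return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # No indent: each object must stay on a single line
            f.write(json.dumps(record) + "\n")
            count += 1
    return count


def read_jsonl(path):
    """Parse a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Illustrative records written to a temporary directory
sample = [{"custom_id": f"task-{i}", "body": {"title": f"item {i}"}} for i in range(3)]
sample_path = os.path.join(tempfile.mkdtemp(), "batch_tasks.jsonl")
written = write_jsonl(sample, sample_path)
print(written, read_jsonl(sample_path) == sample)
```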

### Uploading the file

In [9]:
# Using a context manager so the file handle is closed after the upload
with open(file_name, "rb") as f:
    batch_file = client.files.create(
        file=f,
        purpose="batch"
    )

In [10]:
print(batch_file)

FileObject(id='file-kqHmhAcZM1nRcewdvT4V9Htr', bytes=350626, created_at=1713979938, filename='batch_tasks.jsonl', object='file', purpose='batch', status='processed', status_details=None)


## Creating the batch job

In [11]:
batch_job = client.batches.create(
  input_file_id=batch_file.id,
  endpoint="/v1/chat/completions",
  completion_window="24h"
)

### Checking job status

Note: this can take up to 24h, but it will usually be completed faster.

You can continue checking until the status is 'completed'.
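Rather than re-running the cell by hand, you could poll in a loop. This sketch assumes the batch statuses `completed`, `failed`, `expired` and `cancelled` are final; `wait_for_batch` is a hypothetical helper that takes the retrieval function as an argument, so it can be exercised here with a fake instead of the real API.

```python
import time
from types import SimpleNamespace

# Batch statuses after which the job will not change any further (assumed)
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(retrieve, batch_id, poll_interval=60, sleep=time.sleep):
    """Call retrieve(batch_id) until the returned object's .status is terminal."""
    while True:
        batch = retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        print(f"status: {batch.status}, checking again in {poll_interval}s")
        sleep(poll_interval)


# Demo with a fake retrieve function that walks through intermediate statuses
fake_statuses = iter(["validating", "in_progress", "finalizing", "completed"])


def fake_retrieve(batch_id):
    return SimpleNamespace(id=batch_id, status=next(fake_statuses))


sleeps = []
final = wait_for_batch(fake_retrieve, "batch_demo", poll_interval=0, sleep=sleeps.append)
print(final.status)
```

Against the real API this would be called as `batch_job = wait_for_batch(client.batches.retrieve, batch_job.id)`. Since jobs can take hours, a generous polling interval avoids unnecessary requests.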

In [20]:
batch_job = client.batches.retrieve(batch_job.id)
print(batch_job)

Batch(id='batch_8lRuRwDxKKdXm4A8S1u577zi', completion_window='24h', created_at=1713952714, endpoint='/v1/chat/completions', input_file_id='file-N9FrpHjYftSlW4zC1WkLv81a', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1713952907, error_file_id=None, errors=None, expired_at=None, expires_at=1714039114, failed_at=None, finalizing_at=1713952892, in_progress_at=1713952750, metadata=None, output_file_id='file-EuZcIJXoWgMWSnCypOEipVLQ', request_counts=BatchRequestCounts(completed=312, failed=0, total=312))


## Retrieving results

In [29]:
result_file_id = batch_job.output_file_id
result = client.files.content(result_file_id).content

In [33]:
result_file_name = "data/batch_job_results.jsonl"

with open(result_file_name, 'wb') as file:
    file.write(result)

In [39]:
# Loading data from saved file
results = []
with open(result_file_name, 'r') as file:
    for line in file:
        # Parsing the JSON string into a dict and appending to the list of results
        json_object = json.loads(line.strip())
        results.append(json_object)

### Reading results
Reminder: the results are not necessarily returned in the same order as the input file.
Make sure to check the `custom_id` to match each result to its input task.
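One way to do this matching for the whole batch is to build a dictionary keyed by `custom_id` and then look up each input row's ID. The sketch below assumes each output line has the shape seen in this notebook (`custom_id`, and `response` with `status_code` and `body`) plus an `error` field for failed requests; `captions_by_custom_id` is a hypothetical helper, and the demo lines are made up.

```python
def captions_by_custom_id(results):
    """Split batch output lines into {custom_id: caption} and {custom_id: error}."""
    captions, failures = {}, {}
    for res in results:
        custom_id = res["custom_id"]
        response = res.get("response") or {}
        # Treat an error entry or a non-200 status as a failed request
        if res.get("error") or response.get("status_code") != 200:
            failures[custom_id] = res.get("error") or response.get("status_code")
            continue
        captions[custom_id] = response["body"]["choices"][0]["message"]["content"]
    return captions, failures


# Made-up output lines, deliberately out of order, with one failure
demo_results = [
    {"custom_id": "task-1", "response": {"status_code": 200,
        "body": {"choices": [{"message": {"content": "Caption B"}}]}}, "error": None},
    {"custom_id": "task-0", "response": {"status_code": 200,
        "body": {"choices": [{"message": {"content": "Caption A"}}]}}, "error": None},
    {"custom_id": "task-2", "response": None,
     "error": {"code": "example_error", "message": "made-up failure"}},
]
captions, failures = captions_by_custom_id(demo_results)
# Re-order by the input index encoded in each custom_id
print([captions.get(f"task-{i}") for i in range(3)])
print(failures)
```

With the notebook's data, `df['caption'] = [captions.get(f'task-{i}') for i in df.index]` would attach each caption to its row, leaving `None` where a request failed.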

In [47]:
# Reading only the first results
for res in results[:5]:
    task_id = res['custom_id']
    # Getting the row index from the task id (e.g. "task-3" -> 3)
    index = int(task_id.split('-')[-1])
    # Named `caption` to avoid overwriting the raw file content stored in `result`
    caption = res['response']['body']['choices'][0]['message']['content']
    item = df.iloc[index]
    img_url = item['primary_image']
    img = Image(url=img_url)
    display(img)
    print(f"CAPTION: {caption}\n\n")

CAPTION: A brown absorbent non-slip doormat with the phrase "it's a good day to play PICKLEBALL" and pickleball paddle graphics.

CAPTION: A 30-inch LOVMOR bathroom vanity sink base cabinet with three drawers on the left and a single door, finished in a warm brown wood tone.

CAPTION: A black 4-tier freestanding bathroom organizer with adjustable shelves and baskets, designed to fit over a toilet.

CAPTION: Black full-motion TV wall mount with dual articulating arms for 37–75 inch TVs, capable of swivel and tilt, supporting up to 100 lbs and fitting 16" wood studs.

CAPTION: A colorful modular kids play couch set with a galaxy-themed pattern that glows in the dark, designed for creative play and seating in a child's playroom.