# ChatGPT Batch API 

In this notebook, we use the Batch API, which lets us send up to 50,000 requests in a single batch via a `.jsonl` file.
- [More information on Batch API here](https://platform.openai.com/docs/guides/batch/overview)

The advantage of using this over the standard API is a 50% cost discount and a separate, higher pool of rate limits, which makes large jobs much less frustrating to run.

We first write a `.jsonl` file by generating a list of JSON objects from the rows of `city_state_lat_lon.csv`.
Afterward, we pass the file to the API to start a batch, which should complete within 24 hours.
Finally, we save the results and create a new `.csv` file that can be used to upload documents to our database.

In [None]:
# !pip3 install openai
# !pip3 install pandas

In [7]:
# Setup list of cities
import pandas as pd
df = pd.read_csv('../data/final_datasets/city_state_lat_lon.csv')
len(df)

22720

In [8]:
# Generate .jsonl file

import json
requests = []
for index, row in df.iterrows():

    city = f"{row['city']}, {row['state']}"
    content = f"Create a two sentence description about {city} with a character limit of 150. Make it a single line with no line breaks. Don't include special characters"
    request = {
                "custom_id": f"request-{index}", 
                "method": "POST", 
                "url": "/v1/chat/completions", 
                "body": {
                            "model": "gpt-3.5-turbo-0125", 
                            "messages": [{
                                            "role": "system", 
                                            "content": 
                                            "You are a helpful assistant."
                                            },
                                            {
                                            "role": "user", 
                                            "content": content
                                            }],
                            "max_tokens": 1000}
              }
    requests.append(request)

# Write the content to the file
with open('batchinput.jsonl', 'w') as file:
    for req in requests:
        file.write(json.dumps(req) + '\n')
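
Before uploading, it can help to sanity-check the generated file. The sketch below is one possible check, not part of the original workflow: the helper name `validate_batch_lines` and the constant `MAX_REQUESTS_PER_BATCH` are hypothetical, and the set of required keys simply mirrors the request objects built above.

```python
import json

# Assumption: mirrors the 50k-request limit quoted at the top of this notebook
MAX_REQUESTS_PER_BATCH = 50_000
# Keys used by the request objects written to batchinput.jsonl above
REQUIRED_KEYS = {"custom_id", "method", "url", "body"}


def validate_batch_lines(lines):
    """Return the number of requests; raise ValueError on the first problem found."""
    seen_ids = set()
    count = 0
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            request = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: not valid JSON") from exc
        if not isinstance(request, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        missing = REQUIRED_KEYS - request.keys()
        if missing:
            raise ValueError(f"line {line_number}: missing keys {sorted(missing)}")
        if request["custom_id"] in seen_ids:
            raise ValueError(f"line {line_number}: duplicate custom_id")
        seen_ids.add(request["custom_id"])
        count += 1
    if count > MAX_REQUESTS_PER_BATCH:
        raise ValueError(f"{count} requests exceed the per-batch limit")
    return count
```

To run it against the file written above, open `batchinput.jsonl` in text mode and pass the file object directly, since iterating over it yields one line at a time.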

In [18]:
# Setup environment variables
import os
os.environ["OPENAI_API_KEY"] = "YourAPIKey"

In [140]:
from openai import OpenAI
client = OpenAI()

# Upload the batch input file (the context manager closes the handle afterward)
with open("batchinput.jsonl", "rb") as input_file:
    batch_input_file = client.files.create(
        file=input_file,
        purpose="batch"
    )

# Create the batch
batch_input_file_id = batch_input_file.id
batch_response = client.batches.create(
    input_file_id=batch_input_file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
      "description": "nightly eval job"
    }
)

In [171]:
# Check batch
client = OpenAI()
batch_response = client.batches.retrieve(batch_response.id)
print(batch_response)
print(batch_response.status)

# Extras!
# client.batches.cancel(batch_response.id) # Danger!
# print(client.batches.list()) # list all batches

Batch(id='batch_66f4f1533d608190b8340bcb03791f0b', completion_window='24h', created_at=1727328595, endpoint='/v1/chat/completions', input_file_id='file-OPsnb9q0c6VZTXAKubYpxiQX', object='batch', status='completed', cancelled_at=None, cancelling_at=None, completed_at=1727335420, error_file_id=None, errors=None, expired_at=None, expires_at=1727414995, failed_at=None, finalizing_at=1727332052, in_progress_at=1727328606, metadata={'description': 'nightly eval job'}, output_file_id='file-5wMrBh1F4lzc57CaiFzd9Zsg', request_counts=BatchRequestCounts(completed=22903, failed=0, total=22903))
completed
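
Rather than re-running the cell above by hand, the status check can be wrapped in a polling loop. This is a minimal sketch under a few assumptions: the helper `wait_for_batch` is hypothetical, the set of terminal statuses is taken to be `completed`, `failed`, `expired`, and `cancelled`, and the retrieve function is passed in as an argument so the loop can be exercised without calling the API.

```python
import time

# Assumption: statuses after which the batch will not change any further
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(retrieve, batch_id, poll_seconds=60, max_polls=None, sleep=time.sleep):
    """Call retrieve(batch_id) until the returned batch has a terminal status."""
    polls = 0
    while True:
        batch = retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"batch {batch_id} is still {batch.status} after {polls} polls")
        sleep(poll_seconds)  # wait before asking again
```

With the client from this notebook, a call would look like `batch_response = wait_for_batch(client.batches.retrieve, batch_response.id)`. Since a batch can take up to 24 hours, a generous `poll_seconds` keeps the number of status requests low.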


In [None]:
# Save results
if batch_response.output_file_id is not None:
    file_response = client.files.content(batch_response.output_file_id)
    print(file_response.text)

    # Write the content to the file
    with open('output.jsonl', 'w') as file:
        file.write(file_response.text)
else:
    print("output_file_id not available yet!")

In [173]:
import json

file_path = 'output.jsonl'

# Initialize a list to hold the parsed JSON objects
data = []

# Read the JSONL file
with open(file_path, 'r') as file:
    for line in file:
        # Parse each line as a JSON object and append to the list
        data.append(json.loads(line))

# Access the contents
# NOTE: The documentation states that the output order may not match the input order.
# You may need to re-order the entries by custom_id. It was not necessary to do so in this case.
descriptions = []
for entry in data:
    descriptions.append(entry['response']['body']['choices'][0]['message']['content'])

# Create pandas dataframe
final_df = pd.DataFrame()
final_df['city'] = df['city']
final_df['state'] = df['state']
final_df['descriptions'] = descriptions
final_df.to_csv('../data/final_datasets/city_descriptions.csv')
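
If the output does come back out of order, the entries can be matched to rows through their `custom_id`. The sketch below is one way to do that and is not what this notebook ran: the helper `descriptions_in_request_order` is hypothetical, it assumes the `request-{index}` naming used when the file was generated, and it assumes a failed request appears with a missing `response` or a non-200 `status_code`.

```python
def descriptions_in_request_order(entries, row_indices, prefix="request-"):
    """Return one description per row index, or None where no usable result exists."""
    by_index = {}
    for entry in entries:
        # custom_id was written as f"request-{index}", so strip the prefix
        index = int(entry["custom_id"][len(prefix):])
        response = entry.get("response")
        # Assumption: failed requests have no response or a non-200 status code
        if response is None or response.get("status_code") != 200:
            by_index[index] = None
            continue
        by_index[index] = response["body"]["choices"][0]["message"]["content"]
    # Rows without a matching entry also come back as None
    return [by_index.get(index) for index in row_indices]
```

With the variables from the cell above, `final_df['descriptions'] = descriptions_in_request_order(data, df.index)` would fill the column in row order, leaving `None` for any row whose request failed or is missing.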

## Standard API Example

The snippet below shows the first method I tried. It started to slow down after the first hour, to the point where every request was crawling.

---

```python
from openai import OpenAI
import tqdm

client = OpenAI()

descriptions = []
city_names = []
state_names = []

# Loop over the rows to query the ChatGPT API and collect descriptions
for index, row in tqdm.tqdm(df.iloc[9301:].iterrows(), total=df.iloc[9301:].shape[0], desc='Processing Rows'):

    city = f"{row['city']}, {row['state']}"

    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": f"Create a two sentence description about {city} with a character limit of 150. Make it a single line with no line breaks. Don't include special characters"
            }
        ]
    )

    # Append to lists for data frame creation later
    city_names.append(row['city'])
    state_names.append(row['state'])
    descriptions.append(completion.choices[0].message.content)

# Create a new data frame to save a CSV
new_df = pd.DataFrame()
new_df['city'] = city_names
new_df['state'] = state_names
new_df['description'] = descriptions
new_df.to_csv('../data/final_datasets/city_descriptions.csv')
```