This week (23rd Oct '24), Azure OpenAI Global Batch reached General Availability, enabling high-volume, asynchronous processing at 50% less cost than Global Standard.
A few days before GA, Microsoft Learn was updated to include additional documentation on how to use Structured Outputs with Global Batch deployments.
This allows customers and partners to combine two relatively new Azure OpenAI features, to go from something like this...
Custom ID | Model Name | Prompt | Required Fields |
---|---|---|---|
task-0 | gpt-4o-batch | Provide info about the Empire State Building | BuildingName, HeightInFeet, City, Country |
task-1 | gpt-4o-batch | Provide info about the Shard in London | BuildingName, HeightInFeet, City, Country |
task-2 | gpt-4o-batch | Provide info about the Burj Khalifa | BuildingName, HeightInFeet, City, Country |
...to something like this (reformatted as a table for human-friendly viewing):
Custom ID | BuildingName | HeightInFeet | City | Country |
---|---|---|---|---|
task-0 | Empire State Building | 1,454 | New York City | United States |
task-1 | The Shard | 1,016 | London | United Kingdom |
task-2 | Burj Khalifa | 2,717 | Dubai | United Arab Emirates |
Structured Outputs ensures model-generated outputs conform exactly to developer-provided JSON Schemas, solving challenges around generating structured data from unstructured inputs. For an example, check out my Structured Outputs demo.
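If you haven't used Structured Outputs before, here's a minimal, non-batch sketch using the openai Python SDK's Pydantic parse helper. It isn't part of the batch workflow below; the "gpt-4o" deployment name is a hypothetical placeholder, and the environment variables mirror the client setup used later in this post:

# Minimal, non-batch Structured Outputs sketch (assumes openai >= 1.40,
# python-dotenv, and a gpt-4o 2024-08-06 deployment named "gpt-4o").
import os
from dotenv import load_dotenv
from openai import AzureOpenAI
from pydantic import BaseModel

load_dotenv()

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_API_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-10-01-preview"
)

# The Pydantic model plays the role of the JSON Schema shown later.
class BuildingInfo(BaseModel):
    BuildingName: str
    HeightInFeet: str
    City: str
    Country: str

completion = client.beta.chat.completions.parse(
    model="gpt-4o",  # hypothetical non-batch deployment name
    messages=[
        {"role": "system", "content": "You are an AI assistant that provides facts about tall buildings."},
        {"role": "user", "content": "Provide info about the Empire State Building"},
    ],
    response_format=BuildingInfo,
)
print(completion.choices[0].message.parsed)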
Global Batch is a new deployment type that enables efficient, large-scale processing of asynchronous requests. You send multiple requests in a single JSONL batch file, and in return get a separate quota, a 24-hour target turnaround, and 50% lower cost than Global Standard.
If you want to use both of these features at the same time, first deploy gpt-4o in a supported region using the "Global Batch" deployment type. At the time of writing (Oct '24), only gpt-4o version 2024-08-06 supports Structured Outputs, so be sure to deploy that version or newer; gpt-4o mini support is coming soon.
In this example, I've deployed gpt-4o 2024-08-06 into Sweden Central, and have named it 'gpt-4o-batch'.
The batch requests are stored in a JSONL file, one request per line. Each line contains the deployment name, the system message, the user message, and the JSON Schema (download this sample JSONL file here):
{"custom_id": "task-0", "method": "POST", "url": "/chat/completions", "body": {"model": "gpt-4o-batch", "messages": [{"role": "system", "content": "You are an AI assistant that provides facts about tall buildings."}, {"role": "user", "content": "Provide info about the Empire State Building"}], "response_format": {"type": "json_schema", "json_schema": {"name": "building_info_schema", "schema": {"type": "object", "properties": {"BuildingName": {"type": "string"}, "HeightInFeet": {"type": "string"}, "City": {"type": "string"}, "Country": {"type": "string"}}, "required": ["BuildingName", "HeightInFeet", "City", "Country"], "additionalProperties": false}, "strict": true}}}}
{"custom_id": "task-1", "method": "POST", "url": "/chat/completions", "body": {"model": "gpt-4o-batch", "messages": [{"role": "system", "content": "You are an AI assistant that provides facts about tall buildings."}, {"role": "user", "content": "Provide info about the Shard in London"}], "response_format": {"type": "json_schema", "json_schema": {"name": "building_info_schema", "schema": {"type": "object", "properties": {"BuildingName": {"type": "string"}, "HeightInFeet": {"type": "string"}, "City": {"type": "string"}, "Country": {"type": "string"}}, "required": ["BuildingName", "HeightInFeet", "City", "Country"], "additionalProperties": false}, "strict": true}}}}
{"custom_id": "task-2", "method": "POST", "url": "/chat/completions", "body": {"model": "gpt-4o-batch", "messages": [{"role": "system", "content": "You are an AI assistant that provides facts about tall buildings."}, {"role": "user", "content": "Provide info about the Burj Khalifa"}], "response_format": {"type": "json_schema", "json_schema": {"name": "building_info_schema", "schema": {"type": "object", "properties": {"BuildingName": {"type": "string"}, "HeightInFeet": {"type": "string"}, "City": {"type": "string"}, "Country": {"type": "string"}}, "required": ["BuildingName", "HeightInFeet", "City", "Country"], "additionalProperties": false}, "strict": true}}}}
Here's what one of these requests looks like if we reformat for easier readability:
Important
In your final JSONL file, make sure each request sits on a single line so the file complies with the JSONL format
{
  "custom_id": "task-0",
  "method": "POST",
  "url": "/chat/completions",
  "body": {
    "model": "gpt-4o-batch",
    "messages": [
      {
        "role": "system",
        "content": "You are an AI assistant that provides facts about tall buildings."
      },
      {
        "role": "user",
        "content": "Provide info about the Empire State Building"
      }
    ],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "building_info_schema",
        "schema": {
          "type": "object",
          "properties": {
            "BuildingName": {
              "type": "string"
            },
            "HeightInFeet": {
              "type": "string"
            },
            "City": {
              "type": "string"
            },
            "Country": {
              "type": "string"
            }
          },
          "required": [
            "BuildingName",
            "HeightInFeet",
            "City",
            "Country"
          ],
          "additionalProperties": false
        },
        "strict": true
      }
    }
  }
}
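By the way, if you're generating the batch file in code rather than by hand, writing each request with json.dumps guarantees one object per line. Here's a minimal sketch that reuses the schema and prompts above (the file name matches the upload step later in this post):

import json

# The JSON Schema shared by every request in the batch.
schema = {
    "type": "object",
    "properties": {
        "BuildingName": {"type": "string"},
        "HeightInFeet": {"type": "string"},
        "City": {"type": "string"},
        "Country": {"type": "string"},
    },
    "required": ["BuildingName", "HeightInFeet", "City", "Country"],
    "additionalProperties": False,
}

prompts = [
    "Provide info about the Empire State Building",
    "Provide info about the Shard in London",
    "Provide info about the Burj Khalifa",
]

with open("batch - StructuredOutputs.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        request = {
            "custom_id": f"task-{i}",
            "method": "POST",
            "url": "/chat/completions",
            "body": {
                "model": "gpt-4o-batch",
                "messages": [
                    {"role": "system", "content": "You are an AI assistant that provides facts about tall buildings."},
                    {"role": "user", "content": prompt},
                ],
                "response_format": {
                    "type": "json_schema",
                    "json_schema": {
                        "name": "building_info_schema",
                        "schema": schema,
                        "strict": True,
                    },
                },
            },
        }
        # json.dumps emits no newlines by default, so each request stays on one line.
        f.write(json.dumps(request) + "\n")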
This Python notebook walks through the steps required to upload an example batch file, submit it for processing, track its progress, and retrieve structured outputs using Azure OpenAI's Batch API. It's based on the sample in the Microsoft Learn documentation.
In this section, the notebook sets up the Azure OpenAI client, using environment variables for the API endpoint and key. It then uploads a file containing batch requests, specifying its purpose as batch.
Important
Ensure the API version is set to 2024-10-01-preview or later, and that the gpt-4o model is deployed in a supported region
from dotenv import load_dotenv
import os
from openai import AzureOpenAI

load_dotenv()

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_API_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2024-10-01-preview"
)

# Upload a file with a purpose of "batch"
file = client.files.create(
    file=open("batch - StructuredOutputs.jsonl", "rb"),
    purpose="batch"
)

print(file.model_dump_json(indent=2))
file_id = file.id
- The client is initialized using environment variables to keep sensitive information secure.
- The batch file is opened and uploaded with the purpose batch, which flags it for asynchronous batch processing.
Key Output: The file_id is captured, which will be used in subsequent steps to reference the file.
After uploading the batch file, the next step is to track its upload status. This loop checks the file status every 15 seconds until the file is processed.
import time
import datetime

status = "pending"
while status != "processed":
    time.sleep(15)
    file_response = client.files.retrieve(file_id)
    status = file_response.status
    print(f"{datetime.datetime.now()} File Id: {file_id}, Status: {status}")
Explanation: The script repeatedly checks the status of the uploaded file. The loop pauses for 15 seconds between each check until the status becomes "processed", indicating that the file is ready for use.
Once the file is processed, we can submit it as a batch job. The batch request is processed asynchronously, and the notebook tracks the job's progress.
batch_response = client.batches.create(
    input_file_id=file_id,
    endpoint="/chat/completions",
    completion_window="24h",
)

# Save batch ID for later use
batch_id = batch_response.id

print(batch_response.model_dump_json(indent=2))
Explanation: This section submits the batch job, which processes the requests in the uploaded batch file asynchronously and at scale. The completion_window is set to 24 hours, meaning the job targets completion within that window.
Just like tracking the file upload, this code tracks the status of the batch job until it reaches a terminal state (completed, failed, or canceled).
status = "validating"
while status not in ("completed", "failed", "canceled"):
time.sleep(60)
batch_response = client.batches.retrieve(batch_id)
status = batch_response.status
print(f"{datetime.datetime.now()} Batch Id: {batch_id}, Status: {status}")
Explanation: The batch status is polled and printed at 60-second intervals. Once the batch job reaches the "completed" status, you can proceed to retrieve the output.
After the batch job is completed, the response contains various metadata and details about the processing status.
print(batch_response.model_dump_json(indent=2))
Explanation: This code provides a detailed view of the batch job's final status, including whether any errors occurred and how many requests were successfully completed.
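For instance, you can surface the request counts and any error file before fetching the output. A short sketch, assuming the SDK's Batch object fields request_counts and error_file_id:

# Inspect the completed batch before downloading results.
if batch_response.status == "completed":
    counts = batch_response.request_counts
    print(f"{counts.completed}/{counts.total} requests succeeded, {counts.failed} failed")
    if batch_response.error_file_id:
        # Failed requests are written to a separate error file.
        error_content = client.files.content(batch_response.error_file_id)
        print(error_content.text)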
Once the batch job completes, this step retrieves and prints the output for each request in the batch, formatting the responses into readable JSON.
import json

response = client.files.content(batch_response.output_file_id)
raw_responses = response.text.strip().split('\n')

for raw_response in raw_responses:
    json_response = json.loads(raw_response)
    formatted_json = json.dumps(json_response, indent=2)
    print(formatted_json)
Explanation: The output of the batch job is stored in a file on Azure. Each response is retrieved, parsed from JSON, and formatted for readability.
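Because each output line follows the standard batch output envelope (a custom_id plus a response wrapping the chat completion body), you can also pull the structured fields back out and rebuild the human-friendly table from the start of this post. A sketch, assuming that envelope:

# Extract the Structured Outputs payload from each batch output line.
for raw_response in raw_responses:
    result = json.loads(raw_response)
    # The chat completion body is nested under response.body.
    content = result["response"]["body"]["choices"][0]["message"]["content"]
    building = json.loads(content)  # valid JSON, guaranteed by Structured Outputs
    print(result["custom_id"], building["BuildingName"], building["HeightInFeet"],
          building["City"], building["Country"])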
The notebook also provides examples of how to cancel a batch job and list all batch jobs submitted.
client.batches.cancel("batch_abc123") # Set to your batch_id for the job you want to cancel
Explanation: If necessary, you can cancel a batch job that is still in progress.
client.batches.list()
Explanation: This command lists all batch jobs submitted to your Azure OpenAI service, showing their status and other details.
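The list call is paginated by the SDK, so you can iterate over it directly; for example (limit is an optional parameter):

# Iterate over recent batch jobs (the SDK handles cursor pagination).
for batch in client.batches.list(limit=20):
    print(batch.id, batch.status, batch.created_at)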
By following the steps in this notebook, users can automate batch processing of Azure OpenAI requests with structured outputs, track the status of their jobs, and retrieve the results in a scalable and efficient manner.