# Azure OpenAI Batch Text

The Azure OpenAI Batch API is designed to handle large-scale, high-volume processing tasks efficiently. **Process asynchronous groups of requests with separate quota, with a 24-hour target turnaround, at 50% less cost than global standard.** With batch processing, rather than sending one request at a time, you send a large number of requests in a single file. Global batch requests have a separate enqueued token quota, which avoids any disruption of your online workloads.

Key use cases include:
- Large-Scale Data Processing: Quickly analyze extensive datasets in parallel.
- Content Generation: Create large volumes of text, such as product descriptions or articles.
- Document Review and Summarization: Automate the review and summarization of lengthy documents.
- Customer Support Automation: Handle numerous queries simultaneously for faster responses.
- Data Extraction and Analysis: Extract and analyze information from vast amounts of unstructured data.
- Natural Language Processing (NLP) Tasks: Perform tasks like sentiment analysis or translation on large datasets.
- Marketing and Personalization: Generate personalized content and recommendations at scale.

https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/batch

In [1]:
import datetime
import json
import openai
import os
import sys
import time

from dotenv import load_dotenv
from openai import AzureOpenAI

In [2]:
print(f"Python version: {sys.version}")
print(f"OpenAI version: {openai.__version__}")

Python version: 3.10.11 (main, May 16 2023, 00:28:57) [GCC 11.2.0]
OpenAI version: 1.43.0


In [3]:
print(f"Today is {datetime.datetime.today().strftime('%d-%b-%Y %H:%M:%S')}")

Today is 06-Sep-2024 08:41:33


## Settings

In [4]:
load_dotenv("azure.env")

# Azure OpenAI
AZURE_OPENAI_API_ENDPOINT: str = os.getenv("AZURE_OPENAI_API_ENDPOINT")
AZURE_OPENAI_API_KEY: str = os.getenv("AZURE_OPENAI_API_KEY")

# Azure OpenAI batch model
AOAI_BATCH_MODEL: str = "gpt-4o-batch"
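`os.getenv` returns `None` when a variable is missing, and that usually only surfaces later as an authentication or connection error. A minimal sketch of a fail-fast check, assuming the same variable names as in the cell above; `require_env` is a hypothetical helper, not part of any library:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set (check azure.env)")
    return value


# Usage (assumes load_dotenv("azure.env") has already run):
# AZURE_OPENAI_API_ENDPOINT = require_env("AZURE_OPENAI_API_ENDPOINT")
# AZURE_OPENAI_API_KEY = require_env("AZURE_OPENAI_API_KEY")
```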

In [5]:
JSONL_DIR = "jsonl"
RESULTS_DIR = "results"

os.makedirs(JSONL_DIR, exist_ok=True)
os.makedirs(RESULTS_DIR, exist_ok=True)

## Creating some prompts

In [6]:
prompts = [
    "Hello. Who are you?",
    "What is Azure OpenAI?",
    "What are Arima models?",
    "Translate this from English to French: 'Welcome'",
]

prompts

['Hello. Who are you?',
 'What is Azure OpenAI?',
 'What are Arima models?',
 "Translate this from English to French: 'Welcome'"]

In [7]:
len(prompts)

4

## Creating the jsonl input file for batch

In [8]:
# File to create
jsonlfile = os.path.join(JSONL_DIR, "batch_text.jsonl")

In [9]:
# Define the structure template
template = {
    "custom_id": "",
    "method": "POST",
    "url": "/chat/completions",
    "body": {
        "model":
        AOAI_BATCH_MODEL,
        "messages": [{
            "role": "system",
            "content": "You are an AI assistant that helps people find information."
        }, {
            "role": "user", "content": ""
        }]
    }
}

# Generate the JSONL content
import copy

jsonl_content = ""

for i, prompt in enumerate(prompts):
    # Deep copy so the nested "body" and "messages" of the template are not mutated
    entry = copy.deepcopy(template)
    entry["custom_id"] = f"task-{i}"
    entry["body"]["messages"][1]["content"] = prompt
    jsonl_content += json.dumps(entry) + "\n"

# Write to a file
with open(jsonlfile, "w") as file:
    file.write(jsonl_content)

print(f"JSONL file {jsonlfile} created successfully.")

JSONL file jsonl/batch_text.jsonl created successfully.
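Because a malformed input file is only rejected after upload, a quick local check can save a round trip. The following is a sketch, not an official validator: `validate_batch_jsonl` is a hypothetical helper that only checks the keys used by the template in this notebook and that each `custom_id` is unique.

```python
import json


def validate_batch_jsonl(text: str) -> list[str]:
    """Check each non-empty line of a batch input file and return a list of problems."""
    problems = []
    seen_ids = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as e:
            problems.append(f"line {lineno}: invalid JSON ({e.msg})")
            continue
        if not isinstance(entry, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        # Keys used by the template in this notebook
        for key in ("custom_id", "method", "url", "body"):
            if key not in entry:
                problems.append(f"line {lineno}: missing key '{key}'")
        custom_id = entry.get("custom_id")
        if custom_id in seen_ids:
            problems.append(f"line {lineno}: duplicate custom_id '{custom_id}'")
        seen_ids.add(custom_id)
    return problems
```

Calling `validate_batch_jsonl(jsonl_content)` on the content generated above should return an empty list if every line is well formed.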


In [10]:
!ls $jsonlfile -lh

-rwxrwxrwx 1 root root 1.1K Sep  6 08:41 jsonl/batch_text.jsonl


In [11]:
print(jsonl_content)

{"custom_id": "task-0", "method": "POST", "url": "/chat/completions", "body": {"model": "gpt-4o-batch", "messages": [{"role": "system", "content": "You are an AI assistant that helps people find information."}, {"role": "user", "content": "Hello. Who are you?"}]}}
{"custom_id": "task-1", "method": "POST", "url": "/chat/completions", "body": {"model": "gpt-4o-batch", "messages": [{"role": "system", "content": "You are an AI assistant that helps people find information."}, {"role": "user", "content": "What is Azure OpenAI?"}]}}
{"custom_id": "task-2", "method": "POST", "url": "/chat/completions", "body": {"model": "gpt-4o-batch", "messages": [{"role": "system", "content": "You are an AI assistant that helps people find information."}, {"role": "user", "content": "What are Arima models?"}]}}
{"custom_id": "task-3", "method": "POST", "url": "/chat/completions", "body": {"model": "gpt-4o-batch", "messages": [{"role": "system", "content": "You are an AI assistant that helps people find infor

## Azure OpenAI client

In [12]:
client = AzureOpenAI(
    api_key=AZURE_OPENAI_API_KEY,
    api_version="2024-07-01-preview",
    azure_endpoint=AZURE_OPENAI_API_ENDPOINT,
)

## Upload batch file

In [13]:
# Upload a file with a purpose of "batch"; the context manager closes the file handle
with open(jsonlfile, "rb") as f:
    file = client.files.create(file=f, purpose="batch")

print("\033[1;34m")
print(file.model_dump_json(indent=2))
file_id = file.id

{
  "id": "file-8f12ac7ee99c4401acecee44f87f4de3",
  "bytes": 1094,
  "created_at": 1725612095,
  "filename": "batch_text.jsonl",
  "object": "file",
  "purpose": "batch",
  "status": "pending",
  "status_details": null
}


## Track file upload status

In [14]:
# Wait until the uploaded file is in processed state
status = "pending"

print("\033[1;34m")

# Stop on "error" as well, otherwise a failed upload would loop forever
while status not in ("processed", "error"):
    time.sleep(1)
    file_response = client.files.retrieve(file_id)
    status = file_response.status
    print(f"{datetime.datetime.now()} | File Id: {file_id} | Status: {status}")

print(f"{datetime.datetime.now()} End")

2024-09-06 08:41:36.815144 | File Id: file-8f12ac7ee99c4401acecee44f87f4de3 | Status: running
2024-09-06 08:41:37.939146 | File Id: file-8f12ac7ee99c4401acecee44f87f4de3 | Status: processed
2024-09-06 08:41:37.939399 End
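The polling loop above has no upper bound on how long it waits. A minimal sketch of a reusable poller with a timeout follows; `wait_for_status` and its parameters are hypothetical names introduced for illustration, and the status-fetching callable is supplied by the caller so the helper itself makes no API calls.

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    fetch_status: Callable[[], str],
    terminal: Iterable[str],
    interval: float = 1.0,
    timeout: float = 300.0,
) -> str:
    """Call fetch_status() until it returns a terminal state or the timeout elapses."""
    terminal = set(terminal)
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still '{status}' after {timeout} seconds")
        time.sleep(interval)


# Usage with the client from this notebook (not executed here):
# wait_for_status(lambda: client.files.retrieve(file_id).status, {"processed", "error"})
```

The same helper could be pointed at `client.batches.retrieve(batch_id).status` with a longer interval and timeout.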


## Create batch job

In [15]:
# Submit a batch job with the file
batch_response = client.batches.create(
    input_file_id=file_id,
    endpoint="/chat/completions",
    completion_window="24h",
)

# Save batch ID for later use
batch_id = batch_response.id

print("\033[1;34m")
print(batch_response.model_dump_json(indent=2))

{
  "id": "batch_4bc6fc5e-3b0f-4ddf-936a-ef1b204977da",
  "completion_window": "24h",
  "created_at": 1725612098,
  "endpoint": "/chat/completions",
  "input_file_id": "file-8f12ac7ee99c4401acecee44f87f4de3",
  "object": "batch",
  "status": "validating",
  "cancelled_at": null,
  "cancelling_at": null,
  "completed_at": null,
  "error_file_id": null,
  "errors": null,
  "expired_at": null,
  "expires_at": 1725698498,
  "failed_at": null,
  "finalizing_at": null,
  "in_progress_at": null,
  "metadata": null,
  "output_file_id": null,
  "request_counts": {
    "completed": 0,
    "failed": 0,
    "total": 0
  }
}
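The `created_at` and `expires_at` fields are Unix timestamps in seconds. As a small worked example, using the two values copied from the response above, the difference is 86,400 seconds, which matches the requested `24h` completion window:

```python
from datetime import datetime, timezone

# Values copied from the batch response above
created_at = 1725612098
expires_at = 1725698498

created = datetime.fromtimestamp(created_at, tz=timezone.utc)
expires = datetime.fromtimestamp(expires_at, tz=timezone.utc)
window_hours = (expires_at - created_at) / 3600

print(f"Created: {created.isoformat()}")
print(f"Expires: {expires.isoformat()}")
print(f"Window:  {window_hours:.0f} hours")
```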


## Track batch job progress

In [16]:
status = "validating"

start = time.time()

status_colors = {
    "validating": "\033[1;31;30m",
    "in_progress": "\033[1;31;34m",
    "finalizing": "\033[1;31;32m",
    "completed": "\033[1;31;35m",
}

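# Also stop on "cancelled" (the spelling used by the batch object fields) and "expired"
while status not in ("completed", "failed", "cancelled", "canceled", "expired"):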
    time.sleep(30)
    batch_response = client.batches.retrieve(batch_id)
    status = batch_response.status
    print(status_colors.get(status, ""), end="")
    print(f"{datetime.datetime.now()} | Batch Id: {batch_id} | Status: {status}")

elapsed = time.time() - start
minutes, seconds = divmod(elapsed, 60)
print("\033[0m")
print(f"Elapsed time = {minutes:.0f} minutes and {seconds:.0f} seconds")

2024-09-06 08:42:09.076954 | Batch Id: batch_4bc6fc5e-3b0f-4ddf-936a-ef1b204977da | Status: validating
2024-09-06 08:42:39.510865 | Batch Id: batch_4bc6fc5e-3b0f-4ddf-936a-ef1b204977da | Status: validating
2024-09-06 08:43:10.001034 | Batch Id: batch_4bc6fc5e-3b0f-4ddf-936a-ef1b204977da | Status: validating
2024-09-06 08:43:40.440202 | Batch Id: batch_4bc6fc5e-3b0f-4ddf-936a-ef1b204977da | Status: validating
2024-09-06 08:44:11.015461 | Batch Id: batch_4bc6fc5e-3b0f-4ddf-936a-ef1b204977da | Status: validating
2024-09-06 08:44:41.618230 | Batch Id: batch_4bc6fc5e-3b0f-4ddf-936a-ef1b204977da | Status: in_progress
2024-09-06 08:45:12.109175 | Batch Id: batch_4bc6fc5e-3b0f-4ddf-936a-ef1b204977da | Status: in_progress
2024-09-06 08:45:42.586080 | Batch Id: batch_4bc6fc5e-3b0f-4ddf-936a-ef1b204977da | Status: in_progress
2024-09-06 08:46:13.102200 | Batch Id: batch_4bc6fc5e-3b0f-4ddf-936a-ef1b204977da |

In [17]:
print("\033[1;34m")
print(batch_response.model_dump_json(indent=5))

{
     "id": "batch_4bc6fc5e-3b0f-4ddf-936a-ef1b204977da",
     "completion_window": "24h",
     "created_at": 1725612098,
     "endpoint": "/chat/completions",
     "input_file_id": "file-8f12ac7ee99c4401acecee44f87f4de3",
     "object": "batch",
     "status": "completed",
     "cancelled_at": null,
     "cancelling_at": null,
     "completed_at": 1725612667,
     "error_file_id": "file-f6d1e15b-7ccb-45d9-b475-e5bbb4e582aa",
     "errors": null,
     "expired_at": null,
     "expires_at": 1725698498,
     "failed_at": null,
     "finalizing_at": 1725612572,
     "in_progress_at": 1725612334,
     "metadata": null,
     "output_file_id": "file-e6b599e3-577e-46be-8e10-dc0ca12c1e5d",
     "request_counts": {
          "completed": 4,
          "failed": 0,
          "total": 4
     }
}
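The timestamps in the completed response make it possible to see where the time went. The sketch below uses the four values copied from the output above; reading the gap between `created_at` and `in_progress_at` as the validation phase is an interpretation, and `phase_durations` is a hypothetical helper name.

```python
# Timestamps copied from the completed batch response above (Unix seconds)
batch = {
    "created_at": 1725612098,
    "in_progress_at": 1725612334,
    "finalizing_at": 1725612572,
    "completed_at": 1725612667,
}


def phase_durations(b: dict) -> dict:
    """Return seconds spent in each phase, derived from the batch timestamps."""
    return {
        "validating": b["in_progress_at"] - b["created_at"],
        "in_progress": b["finalizing_at"] - b["in_progress_at"],
        "finalizing": b["completed_at"] - b["finalizing_at"],
        "total": b["completed_at"] - b["created_at"],
    }


for phase, seconds in phase_durations(batch).items():
    print(f"{phase:<12} {seconds // 60} min {seconds % 60} s")
```

For this run the job took 569 seconds end to end (about 9.5 minutes), of which 236 seconds passed before processing started.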


## Retrieve batch job output file

In [18]:
%%javascript Python
OutputArea.auto_scroll_threshold = 9999

<IPython.core.display.Javascript object>

In [19]:
# Retrieve the file content
file_response = client.files.content(batch_response.output_file_id)
raw_responses = file_response.text.strip().split('\n')

# Set text color to blue
print("\033[1;34m")

formatted_json_list = []

for raw_response in raw_responses:
    try:
        # Parse the JSON string
        parsed_json = json.loads(raw_response)
        formatted_json_list.append(parsed_json)
        # Format the JSON with indentation
        formatted_json = json.dumps(parsed_json, indent=5)
        # Print the formatted JSON
        print(formatted_json)
    
    except json.JSONDecodeError as e:
        # Handle the case where a line isn't a valid JSON
        print(f"Error decoding JSON: {e}")

{
     "custom_id": "task-3",
     "response": {
          "body": {
               "choices": [
                    {
                         "content_filter_results": {
                              "hate": {
                                   "filtered": false,
                                   "severity": "safe"
                              },
                              "protected_material_code": {
                                   "filtered": false,
                                   "detected": false
                              },
                              "protected_material_text": {
                                   "filtered": false,
                                   "detected": false
                              },
                              "self_harm": {
                                   "filtered": false,
                                   "severity": "safe"
                              },
                              "sexual": {
             

## Saving batch job output file

In [20]:
result_file = os.path.join(RESULTS_DIR, "batch_text_results.json")

with open(result_file, 'w') as output_file:
    json.dump(formatted_json_list, output_file, indent=5)

print(f"Results have been saved to {result_file}\n")

Results have been saved to results/batch_text_results.json



In [21]:
!ls $result_file -lh

-rwxrwxrwx 1 root root 20K Sep  6 08:51 results/batch_text_results.json
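Before reading answers out of the results, it can help to separate successful responses from failed ones, since a batch can complete with some requests failing. This is a sketch under assumptions: the output shown in this notebook only confirms the `custom_id` and `response.body` fields, so the `status_code` and `error` fields used below are assumed to be present, and `split_results` is a hypothetical helper.

```python
import json


def split_results(lines):
    """Separate batch output lines into successes and failures, keyed by custom_id."""
    ok, failed = {}, {}
    for line in lines:
        if not line.strip():
            continue
        item = json.loads(line)
        response = item.get("response") or {}
        # Assumption: "status_code" and "error" exist in each output line;
        # a missing status_code is treated as success if a body is present.
        status_code = response.get("status_code", 200)
        if item.get("error") or status_code != 200 or "body" not in response:
            failed[item["custom_id"]] = item.get("error") or response
        else:
            ok[item["custom_id"]] = response["body"]["choices"][0]["message"]["content"]
    return ok, failed
```

With the variables from this notebook, `ok, failed = split_results(raw_responses)` would give a dictionary of answers and a dictionary of failures to inspect or resubmit.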


In [22]:
with open(result_file, 'r') as file:
    data = json.load(file)

# Sort by the numeric suffix of custom_id so that "task-10" comes after "task-2"
sorted_data = sorted(data, key=lambda x: int(x["custom_id"].split("-")[-1]))

# Print the results in custom_id order, looking up each prompt by its index
for item in sorted_data:
    customid = item["custom_id"]
    prompt = prompts[int(customid.split("-")[-1])]
    result = item["response"]["body"]["choices"][0]["message"]["content"]
    print(f"********** Result {customid} **********\n")
    print(f"Prompt: {prompt}\n")
    print(f"\033[1;31;34mAnswer: {result}\n\033[0m")

********** Result task-0 **********

Prompt: Hello. Who are you?

Answer: Hello! I'm an AI assistant here to help you find information, answer questions, and provide assistance with various topics. How can I help you today?

********** Result task-1 **********

Prompt: What is Azure OpenAI?

Answer: Azure OpenAI is a service provided by Microsoft Azure that offers access to OpenAI's sophisticated language models, such as GPT-3, through Azure's cloud platform. This service allows developers, enterprises, and data scientists to incorporate advanced artificial intelligence capabilities into their applications more easily and securely. 

Key features of Azure OpenAI include:

1. **Integration with Azure Services**: Seamlessly integrates with other Azure services, providing robust infrastructure, easy scalability, and enhanced security features.

2. **API Accessibility**: Provides APIs that developers can use to integrate OpenAI's models into their applications, enab