[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/fw-ai/cookbook/blob/main/learn/batch-api/batch_api.ipynb)

# Get Started with Batch API

## Introduction

The Batch API is designed for asynchronous processing of long-running API tasks by offloading execution and storing results for later retrieval. This architecture is ideal for workloads such as media processing, transcription, translation, and other time-intensive operations that benefit from deferred, non-blocking execution.

> **Design Note:**  
>
> Currently, the Batch API keeps things simple: each file is submitted as a separate request, and the backend processes the requests asynchronously. To upload multiple files, submit them individually in a loop. We're open to feedback and may expand functionality, including multi-file support, based on user needs and usage patterns.

Clients submit requests synchronously over HTTP, specifying a target **endpoint_id** and **path** that together identify the backend service and API route. The server then processes each request asynchronously in the background. Clients can check the status of a request at any time and retrieve the result once processing is complete. The system is endpoint-agnostic, so the same workflow can route requests to a wide range of backend services across Fireworks AI's infrastructure.

For more information on Batch API parameters, including `endpoint_id`, `path`, and others, refer to the links below:

- [Create Batch Request – Fireworks Docs](https://docs.fireworks.ai/api-reference/create-batch-request)  
- [Check Batch Status – Fireworks Docs](https://docs.fireworks.ai/api-reference/get-batch-status)

This notebook shows how to:

*  Submit multiple files from a local directory for asynchronous batch processing;
*  Track submissions and their statuses using a CSV file;
*  Check the processing status of each submitted request and retrieve results once they are completed;
*  Parse the `body` field of the response based on its `content_type` to access the final output.

## Install dependencies

In [None]:
!pip3 install requests

## 1. Prepare Audio Samples

In this example, we'll download multiple pre-recorded audio files into a local directory.
These files will be individually submitted to the Batch API for asynchronous transcription processing.

In [None]:
!mkdir -p audio_samples
!curl -L -o "audio_samples/audio_sample_1.flac" "https://tinyurl.com/4997djsh"
!curl -L -o "audio_samples/audio_sample_2.flac" "https://tinyurl.com/4997djsh"
!curl -L -o "audio_samples/audio_sample_3.flac" "https://tinyurl.com/4997djsh"
!curl -L -o "audio_samples/audio_sample_4.flac" "https://tinyurl.com/4997djsh"
!curl -L -o "audio_samples/audio_sample_5.flac" "https://tinyurl.com/4997djsh"
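
The submission step later in this notebook lists everything in `audio_samples`, so stray files or subdirectories (for example, Jupyter's `.ipynb_checkpoints`) would be picked up as well. As a minimal sketch, a filtered listing could look like the following. The helper name `list_audio_files` and the extension set are assumptions made for illustration, not a list of formats confirmed by the API.

```python
import os

# Assumed set of audio extensions (hypothetical); adjust to what your endpoint accepts.
AUDIO_EXTENSIONS = {".flac", ".mp3", ".wav", ".m4a", ".ogg"}


def list_audio_files(folder, extensions=AUDIO_EXTENSIONS):
    """Return sorted paths of regular files in `folder` that have an audio extension."""
    paths = []
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        ext = os.path.splitext(name)[1].lower()
        # Skip directories and files whose extension is not in the allowed set
        if os.path.isfile(full_path) and ext in extensions:
            paths.append(full_path)
    return paths
```

Calling `list_audio_files("audio_samples")` after the downloads above should return the five `.flac` paths in name order.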

## 2. Set up API credentials

To use the Fireworks Batch API, you'll need your API key. For security reasons, we'll get it from environment variables.

You can set your API key in the notebook by running:

In [None]:
import os
os.environ["FIREWORKS_API_KEY"] = "your-api-key-here"
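
If the key is missing or still set to the placeholder, every later request will fail with an authorization error. One way to catch that early is a small check like the sketch below. The helper `load_api_key` is hypothetical and not part of any Fireworks SDK; it only reads the `FIREWORKS_API_KEY` variable used in this notebook.

```python
import os


def load_api_key(env=None, var_name="FIREWORKS_API_KEY"):
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = (env.get(var_name) or "").strip()
    # Treat the notebook's placeholder value as "not set"
    if not key or key == "your-api-key-here":
        raise RuntimeError(f"{var_name} is not set; set it before submitting requests.")
    return key
```

Running `api_key = load_api_key()` before the submission cell surfaces a missing key immediately instead of after the first failed upload.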

## 3. Submitting a Batch Processing Request

This section demonstrates how to submit multiple requests to the Batch API for asynchronous batch processing.

Files are loaded from a local directory, and submission results are recorded in a CSV file for later tracking.

When constructing your request, you’ll need to specify the following key parameters:

* **`endpoint_id`**: Identifies the target backend service or model to handle the request (e.g., `"audio-prod"`, `"audio-turbo"`). This must be compatible with the selected operation or model type. You can refer to the [official documentation](https://docs.fireworks.ai/api-reference/create-batch-request) for a complete list of supported `endpoint_id`s and their corresponding services.

* **`path`**: The relative route of the target API operation (e.g., `"v1/audio/transcriptions"`, `"v1/audio/translations"`). This should correspond to a valid route supported by the backend service.

* **`payload`**: Contains the input data and configuration specific to the selected API route. Its structure should match the schema expected by the corresponding synchronous API.
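
To make the role of the three parameters concrete, here is a minimal sketch that assembles the pieces of one request without sending it. The function name `build_batch_request` is hypothetical; the base URL and the example values are the ones used elsewhere in this notebook.

```python
BATCH_BASE_URL = "https://audio-batch.link.fireworks.ai/"  # base URL used in this notebook


def build_batch_request(path, endpoint_id, payload):
    """Return (url, query_params, form_data) for one batch submission."""
    # `path` selects the API route and is appended to the base URL
    url = BATCH_BASE_URL + path.lstrip("/")
    # `endpoint_id` selects the backend service and travels as a query parameter
    params = {"endpoint_id": endpoint_id}
    # `payload` is sent as form fields alongside the uploaded file
    return url, params, dict(payload)


url, params, data = build_batch_request(
    "v1/audio/transcriptions",
    "audio-prod",
    {"model": "whisper-v3", "response_format": "json"},
)
print(url)  # https://audio-batch.link.fireworks.ai/v1/audio/transcriptions
```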




The following example demonstrates how to submit multiple files individually to the transcription service for asynchronous batch processing.

In [None]:
import os
import csv
import requests


# === [Required by User] Define your input ===
audio_folder = "audio_samples"
path = "v1/audio/transcriptions"
endpoint_id = "audio-prod"
payload = {"model": "whisper-v3", "response_format": "json"}


# === [Environment and system settings] ===
api_key = os.environ.get("FIREWORKS_API_KEY")
batch_url = "https://audio-batch.link.fireworks.ai/"
url = batch_url + path
params = {"endpoint_id": endpoint_id}


# === [Helper function] Submit a single file ===
def submit_single_file(audio_file_path):
    headers = {"Authorization": api_key}
    try:
        with open(audio_file_path, "rb") as f:
            # 'files' must be a dictionary (required by the requests library) even when uploading a single file.
            # The number of files supported per request depends on the specific backend API.
            files = {"file": f}
            response = requests.post(url, files=files, data=payload, headers=headers, params=params)

        return {
            "audio_file": os.path.basename(audio_file_path),
            "status_code": response.status_code,
            "response_json": response.json(),
            "error": "",
        }
    except Exception as e:
        return {
            "audio_file": os.path.basename(audio_file_path),
            "status_code": None,
            "response_json": None,
            "error": str(e),
        }


# === [Batch submit all files] ===
def batch_submit_all_files(audio_folder):
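    # Submit regular files only; skip subdirectories such as .ipynb_checkpoints
    audio_files = [
        os.path.join(audio_folder, f)
        for f in sorted(os.listdir(audio_folder))
        if os.path.isfile(os.path.join(audio_folder, f))
    ]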

    if not audio_files:
        print(f"No audio files found in {audio_folder}")
        return

    results = []
    for audio_file in audio_files:
        res = submit_single_file(audio_file)
        results.append(res)

        if res["status_code"] is not None and 200 <= res["status_code"] < 300:
            account_id = res["response_json"].get("account_id", "")
            batch_id = res["response_json"].get("batch_id", "")
            print(f"Successfully submitted {res['audio_file']}, Account ID: {account_id}, Batch ID: {batch_id}")
        else:
            error_message = (
                res["response_json"].get("error", "Unknown error") if res["response_json"] else res["error"]
            )
            print(f"Failed to submit {res['audio_file']}: {error_message}")
    
    # === [Write Initial Batch Submission Status to CSV] ===
    with open("batch_submission_status.csv", mode="w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["audio_file", "status", "account_id", "batch_id", "content_type", "response_body"]
        )
        writer.writeheader()
        for res in results:
            if res["status_code"] is not None and 200 <= res["status_code"] < 300:
                status = "processing"
            else:
                status = "failed"

            row = {
                "audio_file": res["audio_file"],
                "status": status,
                # response_json is None when the request raised an exception
                "account_id": (res["response_json"] or {}).get("account_id", ""),
                "batch_id": (res["response_json"] or {}).get("batch_id", ""),
                "content_type": "",
                "response_body": "",
            }
            writer.writerow(row)


batch_submit_all_files("audio_samples")


View the submission statuses saved in `batch_submission_status.csv`:

In [None]:
!cat batch_submission_status.csv
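
For more than a handful of files, a per-status count is easier to read than the raw CSV. The sketch below assumes the column layout written by the submission cell (in particular the `status` column); the helper name `summarize_statuses` is hypothetical.

```python
import csv
from collections import Counter


def summarize_statuses(csv_lines):
    """Count rows per `status` value in a submission-tracking CSV.

    `csv_lines` is any iterable of lines, such as an open file object.
    """
    reader = csv.DictReader(csv_lines)
    return Counter(row["status"] for row in reader)


# Usage (after the submission cell has written the file):
# with open("batch_submission_status.csv", newline="") as f:
#     print(summarize_statuses(f))
```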

## 4. Check Batch Processing Status and Retrieve Results

After submission, you can check whether each file has completed processing by querying the Batch API using the recorded account_id and batch_id.
Completed results are updated back into the CSV file for later parsing.

- **`account_id`**: This can be found on your [Fireworks AI account homepage](https://fireworks.ai/account/home). It was also returned in the response when you initially submitted the batch request.

- **`batch_id`**: This was returned in the response when you initially submitted the batch request.

If the batch job has completed, the response includes a `body` and a `content_type`.

In [None]:
import os
import time
import csv
import requests


# === [Environment and system settings] ===
api_key = os.environ.get("FIREWORKS_API_KEY")
batch_url = "https://audio-batch.link.fireworks.ai/"  # same base URL as in the submission cell
csv_file = "batch_submission_status.csv"


# === [Helper function] Check a single batch ===
def check_single_batch(entry):
    account_id = entry.get("account_id")
    batch_id = entry.get("batch_id")
    audio_file = os.path.splitext(entry.get("audio_file", ""))[0]

    if not account_id or not batch_id:
        return entry, False

    url = f"{batch_url}v1/accounts/{account_id}/batch_job/{batch_id}"
    headers = {"Authorization": api_key}

    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()

        result = response.json()
        status = result.get("status", "")
        content_type = result.get("content_type", "")
        body = result.get("body", "")

        print(f"Audio File: {audio_file}, Batch ID: {batch_id}, Status: {status}")

        if status == "completed":
            entry["status"] = "completed"
            entry["content_type"] = content_type
            entry["response_body"] = body
            return entry, True

        return entry, False

    except Exception as e:
        print(f"Request failed for {audio_file}: {e}")
        return entry, False


# === [Check Processing Batches and Save Status Updates] ===
try:
    # Give the backend a short head start. Jobs that are not finished yet keep
    # the status "processing"; re-run this cell later to check them again.
    time.sleep(10)

    with open(csv_file, mode="r", newline="") as f:
        reader = csv.DictReader(f)
        entries = list(reader)

    if not entries:
        print("No entries to process.")
    else:
        # Filter entries to only process those with status "processing"
        processing_row_indices = [i for i, entry in enumerate(entries) if entry["status"] == "processing"]

        for idx in processing_row_indices:
            updated_entry, _ = check_single_batch(entries[idx])
            entries[idx] = updated_entry

    # Rewrite CSV: update all entries
    with open(csv_file, mode="w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["audio_file", "status", "account_id", "batch_id", "content_type", "response_body"]
        )
        writer.writeheader()
        writer.writerows(entries)

    print("Batch job statuses updated.")

except Exception as e:
    print(f"Error: {e}")


View the updated submission statuses saved in `batch_submission_status.csv`:

In [None]:
!cat batch_submission_status.csv
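
A single check after a fixed pause will often find some jobs still processing. Instead of re-running the cell by hand, the check can be wrapped in a polling loop with a growing delay. This is a minimal sketch under stated assumptions: `poll_until_done` and its default delays are hypothetical, and the right polling interval depends on your workload.

```python
import time


def poll_until_done(check_fn, max_attempts=10, initial_delay=5.0,
                    backoff=2.0, max_delay=60.0, sleep=time.sleep):
    """Call check_fn() until it returns True or the attempts run out.

    Returns True if check_fn reported completion, False otherwise.
    """
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        if check_fn():
            return True
        if attempt < max_attempts:
            # Wait before the next attempt, growing the delay up to max_delay
            sleep(delay)
            delay = min(delay * backoff, max_delay)
    return False
```

With the cell above, `check_single_batch(entry)` returns a `(entry, completed)` tuple, so one entry could be polled with `poll_until_done(lambda: check_single_batch(entry)[1])`.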

## 5. Parse and Display Completed Batch Responses

After the batch jobs have completed, you can parse the `body` field of each completed entry based on its `content_type` to reconstruct the original response from the backend service. Use `application/json` for structured data (e.g., the `json` and `verbose_json` response formats) and `text/plain; charset=utf-8` for plain-text formats (e.g., `text`, `srt`, `vtt`).

In [None]:
import csv
import json
from email.parser import Parser

csv_file = "batch_submission_status.csv"

# === [Parse and Display Responses for Completed Batch Jobs] ===
with open(csv_file, mode="r", newline="") as f:
    entries = list(csv.DictReader(f))

for idx, entry in enumerate(entries):
    if entry.get("status") != "completed":
        continue

    audio_file = entry.get("audio_file", "")
    content_type = entry.get("content_type", "")
    body = entry.get("response_body", "")

    print(f"[{idx}] Audio File: {audio_file}")

    # Parse content_type
    main_type = Parser().parsestr(f"Content-Type: {content_type}").get_content_type()

    # Parse and print body based on content_type
    if main_type == "application/json":
        try:
            parsed = json.loads(body)
            print(json.dumps(parsed, indent=2))
        except Exception as e:
            print(f"Failed to parse JSON: {e}")
            print(body)
    elif main_type == "text/plain":
        print(body)
    else:
        print("Unsupported Content-Type. Raw body:")
        print(body)

print("Done parsing completed entries.")
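
If you want to reuse the parsing step outside this loop, the same logic can be factored into a small function. The helper name `parse_body` is hypothetical; it relies only on the standard-library `email.parser` and `json` modules already used above.

```python
import json
from email.parser import Parser


def parse_body(content_type, body):
    """Return (main_type, parsed_body) for a stored batch response."""
    # get_content_type() drops parameters such as "; charset=utf-8"
    main_type = Parser().parsestr(f"Content-Type: {content_type}").get_content_type()
    if main_type == "application/json":
        return main_type, json.loads(body)
    # Plain text and unknown types are returned unchanged
    return main_type, body


print(parse_body("text/plain; charset=utf-8", "hello")[0])  # text/plain
```

Note that an empty `content_type` falls back to `text/plain`, which is the default of `get_content_type()`, so the body is returned as-is in that case.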


## Conclusion

In this notebook, you learned how to use the Fireworks AI Batch API to asynchronously process long-running requests by submitting multiple files individually.

We covered how to prepare files from a local directory, submit each file as a separate batch request, track submission statuses in a CSV file, check processing status, retrieve results once completed, and parse the response body based on its content type.

This approach is especially useful for workloads such as transcription, translation, and other media-related tasks that benefit from asynchronous, scalable processing.

For more information, see:

- [Create Batch Request – Fireworks Docs](https://docs.fireworks.ai/api-reference/create-batch-request)  
- [Check Batch Status – Fireworks Docs](https://docs.fireworks.ai/api-reference/get-batch-status)

Explore the community or reach out to us on [Discord](https://discord.gg/fireworks-ai).