# Voyage AI Batch API - OpenAI SDK compatibility and migration guide

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mongodb-developer/GenAI-Showcase/blob/main/notebooks/tooling/voyageai_batch_api_openai_compatibility.ipynb)

You can fully manage batch job lifecycles for Voyage AI using the OpenAI SDK (Python or TypeScript/JavaScript). Both the Files API and the Batch API follow OpenAI’s specification, so migrating is straightforward. Switching from an OpenAI model to a Voyage AI model requires only minor code updates.

This guide explains how to:

* Convert an existing OpenAI batch input file to a Voyage AI-compatible format
* Use the OpenAI Python SDK to interact with the Voyage AI Files and Batch APIs

In [1]:
%pip install -q openai

## Batch input files

Below is an example of requests following OpenAI's batch input format:

In [None]:
# {"custom_id": "request-1", "method": "POST", "url": "/v1/embeddings", "body": {"model": "text-embedding-3-small", "input": "This is a banana."}}
# {"custom_id": "request-2", "method": "POST", "url": "/v1/embeddings", "body": {"model": "text-embedding-3-small", "input": "Why is the sky blue?"}}

OpenAI treats each line as an independent request and requires specifying the endpoint and model for every line. In contrast, Voyage AI defines the endpoint and model at the batch level. You can use the helper function below to convert your existing OpenAI batch input files into Voyage AI–compatible ones.

In [2]:
openai_requests = """{"custom_id": "request-1", "method": "POST", "url": "/v1/embeddings", "body": {"model": "text-embedding-3-small", "input": "This is a banana."}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/embeddings", "body": {"model": "text-embedding-3-small", "input": "Why is the sky blue?"}}
"""

In [3]:
import json
from pathlib import Path

def convert_requests(source, output_file_name):
    """Convert OpenAI-style embedding JSONL to VoyageAI JSONL.
    `source` can be a file path or a JSONL string.
    `output_file_name` is the file to write the converted JSONL to.
    """
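    try:
        is_path = Path(source).is_file()
    except (OSError, ValueError):
        # A long JSONL string is not a valid path on some platforms
        is_path = False

    if is_path:
        with open(source, "r", encoding="utf-8") as f:
            lines = f.readlines()
    else:
        lines = source.strip().splitlines()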

    output = []
    for idx, line in enumerate(lines, 1):
        if not line.strip():
            continue
        data = json.loads(line)

        custom_id = data.get("custom_id", f"request-{idx}")
        inp = data["body"]["input"]

        if isinstance(inp, str):
            input_list = [inp]
        elif isinstance(inp, list):
            input_list = [str(x) for x in inp]
        else:
            raise ValueError(f"Unsupported input type for {custom_id}: {type(inp)}")

        new_data = {"custom_id": custom_id, "body": {"input": input_list}}
        output.append(json.dumps(new_data, ensure_ascii=False))

    with open(output_file_name, "w", encoding="utf-8") as f:
        f.write("\n".join(output))

    return output_file_name


Let's use the function above to convert the OpenAI-formatted batch input file to a Voyage AI-formatted input file.

In [4]:
# Convert OpenAI-formatted requests into Voyage AI-compatible ones
input_file_name = "batch_input_file.jsonl"
voyageai_requests_file = convert_requests(openai_requests, input_file_name)
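
To make the per-line change concrete, here is a minimal sketch that applies the same mapping as `convert_requests` to a single request. It is only an illustration: the variable names are hypothetical, and the mapping simply mirrors the helper above (drop `method`, `url`, and `model`, and wrap a string input in a list).

```python
import json

# One request line in OpenAI's batch input format (taken from the example above)
openai_line = (
    '{"custom_id": "request-1", "method": "POST", "url": "/v1/embeddings", '
    '"body": {"model": "text-embedding-3-small", "input": "This is a banana."}}'
)

record = json.loads(openai_line)

# Keep only the custom ID and the input; the endpoint and model are set
# later at the batch level rather than on each line.
voyage_record = {
    "custom_id": record["custom_id"],
    "body": {"input": [record["body"]["input"]]},
}

voyage_line = json.dumps(voyage_record, ensure_ascii=False)
print(voyage_line)
```

Each line of the converted file should have this shape, one JSON object per request.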

Instantiate an OpenAI client, passing your Voyage AI API key and the Voyage AI base URL:

In [7]:
from openai import OpenAI

client = OpenAI(
    api_key="your_voyage_api_key",
    base_url="https://api.voyageai.com/v1"
)
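
If you would rather not hard-code the key in the notebook, one option is to read it from an environment variable. The sketch below is an assumption-laden example, not part of the Voyage AI or OpenAI SDKs: the helper name `voyage_client_kwargs` and the variable name `VOYAGE_API_KEY` are hypothetical choices made for illustration.

```python
import os

VOYAGE_BASE_URL = "https://api.voyageai.com/v1"


def voyage_client_kwargs(env_var="VOYAGE_API_KEY"):
    """Build keyword arguments for the OpenAI client from the environment.

    `env_var` is a hypothetical variable name; use whatever name you
    store your Voyage AI key under.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"api_key": api_key, "base_url": VOYAGE_BASE_URL}
```

With the key exported in your shell, the client can then be created with `client = OpenAI(**voyage_client_kwargs())`.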

## Upload file

Let's upload the batch input file to Voyage AI's Files API.

In [8]:
# Upload file
# Open the file in a context manager so the handle is closed after the upload
with open(input_file_name, "rb") as f:
    batch_input_file = client.files.create(file=f, purpose="batch")
print(batch_input_file)

FileObject(id='file-7vT5tYiajeqrFpsqCT8fbxCp46vwf8R4Qwdh8', bytes=140, created_at='2025-12-01T19:18:23.455349+00:00', filename='batch_input_file.jsonl', object='file', purpose='batch', status=None, expires_at='2025-12-31T19:18:23.455349+00:00', status_details=None)


The file has been uploaded using the Files API. You can list all the files available with:

In [9]:
# Listing files
files = client.files.list().data
print(files)

[FileObject(id='file-7vT5tYiajeqrFpsqCT8fbxCp46vwf8R4Qwdh8', bytes=140, created_at='2025-12-01T19:18:23.455349+00:00', filename='batch_input_file.jsonl', object='file', purpose='batch', status=None, expires_at='2025-12-31T19:18:23.455349+00:00', status_details=None)]


## Create batch job

The Voyage AI Batch API requires only the model name, which you pass in request_params through the extra_body argument of client.batches.create(). Additional parameters, such as the input type, output dimension, or output data type, can be provided in the same object.

Let's create the batch by specifying the previously uploaded file ID, the endpoint (/v1/embeddings or /v1/rerank), the completion window, and any additional parameters.

In [10]:
# Create batch
batch_input_file_id = batch_input_file.id
batch = client.batches.create(
    input_file_id=batch_input_file_id,
    endpoint="/v1/embeddings",
    completion_window="12h",
    extra_body={
        "request_params": {
          "model": "voyage-3.5",
          "input_type": "document",
          "output_dimension": 512,
          "output_dtype": "uint8"}
    }
)
print(batch)

Batch(id='batch-7vT5tYo5G6MNA3N1PMKrgMqNMErVVkmBNJq9U', completion_window='12h', created_at='2025-12-01T19:18:49.309117+00:00', endpoint='/v1/embeddings', input_file_id='file-7vT5tYiajeqrFpsqCT8fbxCp46vwf8R4Qwdh8', object='batch', status='validating', cancelled_at=None, cancelling_at=None, completed_at=None, error_file_id=None, errors=None, expired_at=None, expires_at=None, failed_at=None, finalizing_at=None, in_progress_at=None, metadata=None, model='voyage-3.5', output_file_id=None, request_counts=BatchRequestCounts(completed=0, failed=0, total=0), usage=None, expected_completion_at='2025-12-02T07:18:49.309117+00:00')


## Checking batch status

Once the batch job has been submitted, you can check its status as follows:

In [11]:
batch_job = client.batches.retrieve(batch.id)
print(batch_job.status)

completed


## Retrieve results

A batch job moves through several phases, such as validating and in_progress, and eventually ends in a terminal state: completed, failed, cancelled, or expired. The output file and any error file are available only once the job reaches a terminal state. The example below checks the job's current status and, if the job completed successfully, downloads the output file with the results.

In [12]:
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}

batch_id = batch.id  # or set manually

batch_job = client.batches.retrieve(batch_id)

# Check batch job status
if batch_job.status in TERMINAL_STATUSES:
    if batch_job.status == "completed":
        output_file_id = batch_job.output_file_id
        result = client.files.content(output_file_id).content

        output_file_name = "output_file.jsonl"
        with open(output_file_name, "wb") as f:
            f.write(result)

        print(f"Output file saved: {output_file_name}")
    else:
        print(f"Batch ended with status: {batch_job.status}")
else:
    print(f"Batch job is still in progress. Current status: {batch_job.status}")

Output file saved: output_file.jsonl
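
The cell above checks the status once. If you want to block until the job finishes, a simple polling loop is one way to do it. The following is a minimal sketch, assuming only that `client.batches.retrieve(batch_id)` returns an object with a `status` attribute, as shown earlier; the helper name `wait_for_batch`, the polling interval, and the timeout are hypothetical choices, not values recommended by Voyage AI.

```python
import time

TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired"}


def wait_for_batch(client, batch_id, poll_seconds=30.0, timeout_seconds=12 * 60 * 60):
    """Poll the batch until it reaches a terminal status or the timeout elapses.

    Returns the last retrieved batch object. Raises TimeoutError if the
    batch is still running when the timeout is reached.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        batch_job = client.batches.retrieve(batch_id)
        if batch_job.status in TERMINAL_STATUSES:
            return batch_job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Batch {batch_id} is still in status {batch_job.status!r}"
            )
        # Sleep between requests to avoid hammering the API
        time.sleep(poll_seconds)
```

Calling `batch_job = wait_for_batch(client, batch.id)` returns once the job is in a terminal state, after which you can inspect `batch_job.status` and download the output file as in the cell above.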


## Check results

Let's take a look at the output file and check the results:

In [14]:
# Display the first 2 rows of the saved output file
with open("output_file.jsonl", "r", encoding="utf-8") as f:
    for i, line in enumerate(f):
        if i >= 2:
            break
        print(line.rstrip())

{"custom_id": "request-1", "response": {"status_code": 200, "body": {"object": "list", "data": [{"object": "embedding", "embedding": [114, 147, 126, 152, 131, 108, 88, 112, 119, 133, 177, 148, 116, 137, 104, 143, 132, 152, 132, 127, 119, 114, 81, 138, 153, 123, 135, 101, 104, 113, 123, 97, 150, 96, 111, 146, 120, 115, 147, 131, 142, 87, 113, 137, 118, 114, 105, 158, 99, 143, 113, 177, 131, 90, 148, 117, 116, 109, 133, 139, 115, 104, 127, 104, 160, 127, 82, 125, 106, 134, 146, 144, 112, 99, 119, 135, 125, 115, 121, 100, 138, 121, 151, 120, 161, 124, 78, 141, 130, 123, 92, 140, 126, 127, 134, 74, 147, 84, 114, 131, 133, 99, 112, 110, 84, 137, 128, 109, 101, 138, 108, 127, 123, 103, 146, 134, 108, 127, 148, 103, 130, 111, 119, 171, 102, 144, 131, 87, 146, 125, 139, 140, 150, 128, 139, 104, 121, 104, 77, 141, 119, 135, 135, 112, 127, 129, 102, 121, 120, 127, 177, 155, 133, 111, 94, 112, 106, 137, 127, 108, 96, 163, 147, 104, 111, 109, 142, 127, 100, 162, 148, 116, 148, 139, 135, 159, 118, 
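
To work with the results programmatically, you can parse each output line and index the embeddings by custom ID. The sketch below assumes every line follows the structure printed above (`custom_id`, then `response.status_code` and `response.body.data[*].embedding`); the helper name `load_embeddings` is hypothetical, and the sample vector is shortened to three values purely for illustration.

```python
import json


def load_embeddings(jsonl_text):
    """Map each custom_id to its list of embedding vectors.

    Lines whose response status code is not 200 are skipped.
    """
    results = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        response = record.get("response") or {}
        if response.get("status_code") != 200:
            continue
        data = response["body"]["data"]
        results[record["custom_id"]] = [item["embedding"] for item in data]
    return results


# Illustrative line in the output format shown above (vector shortened)
sample = (
    '{"custom_id": "request-1", "response": {"status_code": 200, "body": '
    '{"object": "list", "data": [{"object": "embedding", "embedding": [114, 147, 126]}]}}}'
)

embeddings = load_embeddings(sample)
print(embeddings["request-1"][0])
```

For the real file, pass the contents of output_file.jsonl, for example `load_embeddings(Path("output_file.jsonl").read_text(encoding="utf-8"))`.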

## Listing all batches

You can list all the batches as follows:

In [13]:
# Listing all batches
batches = client.batches.list().data
for b in batches:  # use a separate name so the earlier `batch` is not overwritten
    print(b.id)

batch-7vT5tYo5G6MNA3N1PMKrgMqNMErVVkmBNJq9U


## Not supported

Some methods from the OpenAI SDK are not available with Voyage AI, such as:
- Listing models - client.models.list()
- Retrieving a model - client.models.retrieve("xxx")