Cannot retrieve the full file content #650

@rade2020

I create batch embedding tasks through this endpoint:
https://api.openai.com/v1/batches
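
For context, a request to this endpoint might be built roughly as follows. This is only a sketch: it assumes the JDK's `java.net.http` client, and the class name `BatchRequestSketch`, the file ID `file-abc123`, and the API key are hypothetical placeholders. The body fields (`input_file_id`, `endpoint`, `completion_window`) follow the public Batch API documentation.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class BatchRequestSketch {
    // Builds (but does not send) a POST request that creates an embedding batch.
    // inputFileId and apiKey are placeholders supplied by the caller.
    public static HttpRequest buildCreateBatchRequest(String inputFileId, String apiKey) {
        String body = "{"
                + "\"input_file_id\":\"" + inputFileId + "\","
                + "\"endpoint\":\"/v1/embeddings\","
                + "\"completion_window\":\"24h\""
                + "}";
        return HttpRequest.newBuilder(URI.create("https://api.openai.com/v1/batches"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildCreateBatchRequest("file-abc123", "sk-placeholder");
        System.out.println(request.method() + " " + request.uri());
    }
}
```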

After a task completes, I download the embedding result file by its output_file_id from the following endpoint:
https://api.openai.com/v1/files/{file_id}/content

Recently the project started losing data. Investigation showed that the downloaded result files were incomplete.
The program requests 5,000 items per batch, but the result file downloaded locally contained only a few hundred items, and the download failed with the following message:
Premature end of Content-Length delimited message body (expected: 391505535; received: 113446592
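
To put those two numbers in perspective, the received byte count can be compared against the announced Content-Length. The following is a small sketch; the class and method names are hypothetical and not part of any library.

```java
public class DownloadCompleteness {
    // Returns the percentage of the expected body that actually arrived.
    public static double percentReceived(long expectedBytes, long receivedBytes) {
        if (expectedBytes <= 0) {
            throw new IllegalArgumentException("expectedBytes must be positive");
        }
        return 100.0 * receivedBytes / expectedBytes;
    }

    // A download is complete only when every byte announced by Content-Length arrived.
    public static boolean isComplete(long expectedBytes, long receivedBytes) {
        return expectedBytes == receivedBytes;
    }

    public static void main(String[] args) {
        long expected = 391_505_535L;  // "expected" value from the error message
        long received = 113_446_592L;  // "received" value from the error message
        System.out.printf("received %.1f%% of the body, complete=%b%n",
                percentReceived(expected, received), isComplete(expected, received));
    }
}
```

With the values from the message, roughly 29% of the body arrived before the connection ended.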

I also tried to retrieve the result file in chunks, but that failed as well and I could only get partial data.
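
For readers unfamiliar with chunked retrieval, it is typically done with HTTP Range requests. The sketch below only plans the `Range` header values; it assumes the server honors `Range` and answers `206 Partial Content`, which this report does not confirm for this endpoint. The class name `RangePlanner` and the 64 MiB chunk size are hypothetical choices.

```java
import java.util.ArrayList;
import java.util.List;

public class RangePlanner {
    // Splits a body of totalBytes into inclusive byte ranges of at most chunkBytes each,
    // formatted as values for the HTTP Range request header.
    public static List<String> plan(long totalBytes, long chunkBytes) {
        if (totalBytes < 0 || chunkBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be >= 0 and chunkBytes > 0");
        }
        List<String> ranges = new ArrayList<>();
        for (long start = 0; start < totalBytes; start += chunkBytes) {
            long end = Math.min(start + chunkBytes, totalBytes) - 1;
            ranges.add("bytes=" + start + "-" + end);
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 391505535 is the expected length from the error message; 64 MiB is an arbitrary chunk size.
        List<String> ranges = plan(391_505_535L, 64L * 1024 * 1024);
        System.out.println(ranges.size() + " ranges, first=" + ranges.get(0)
                + ", last=" + ranges.get(ranges.size() - 1));
    }
}
```

Each range would be sent as a `Range` header on a separate GET. A `200 OK` carrying the full length instead of a `206` would indicate that the range was ignored.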
The following is the Java code I originally used to download the file. Until recently, this method downloaded the result file completely.

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;

// httpClient is a shared CloseableHttpClient (Apache HttpClient 4.x) defined elsewhere in the class.
public static int downloadFile(String outputFilePath, String url, Map<String, String> headers) throws IOException {
    HttpGet request = new HttpGet(url);

    if (headers != null) {
        headers.forEach(request::addHeader);
    }

    // try-with-resources closes the response, the body stream, and the output file.
    try (CloseableHttpResponse response = httpClient.execute(request)) {
        int statusCode = response.getStatusLine().getStatusCode();
        HttpEntity entity = response.getEntity();
        try (InputStream content = entity.getContent();
             FileOutputStream outputStream = new FileOutputStream(outputFilePath)) {
            byte[] buffer = new byte[8192];
            int bytesRead;
            while ((bytesRead = content.read(buffer)) != -1) {
                outputStream.write(buffer, 0, bytesRead);
            }
        } catch (IOException e) {
            // Log and rethrow so that a truncated body is not reported as a successful download.
            System.err.println("Error writing to file: " + e.getMessage());
            throw e;
        }
        System.out.println("File downloaded to: " + outputFilePath);
        return statusCode;
    }
}

I also tested the download on the server with wget. Each attempt downloaded part of the file, then the connection was closed and wget retried, repeating in a loop.

[root@scripts]# wget --header="Authorization: Bearer token" \
>      -O file.jsonl \
>      https://api.openai.com/v1/files/{fileId}/content
--2025-11-13 02:59:44--  https://api.openai.com/v1/files/{fileId}/content
Resolving api.openai.com (api.openai.com)... 162.159.140.245, 172.66.0.243
Connecting to api.openai.com (api.openai.com)|162.159.140.245|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 730814739 (697M) [application/octet-stream]
Saving to: ‘file.jsonl’

file.jsonl    10%[===============>          ]  73.76M  25.1MB/s    in 2.9s

2025-11-13 03:00:33 (25.1 MB/s) - Connection closed at byte 77340672. Retrying.

--2025-11-13 03:00:36--  (try: 2)  https://api.openai.com/v1/files/{fileId}/content
Connecting to api.openai.com (api.openai.com)|162.159.140.245|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 730814739 (697M) [application/octet-stream]
Saving to: ‘file.jsonl’

file.jsonl     3%[====>                     ]  23.03M  24.7MB/s    in 0.9s

2025-11-13 03:00:51 (24.7 MB/s) - Connection closed at byte 77340672. Retrying.

--2025-11-13 03:00:55--  (try: 3)  https://api.openai.com/v1/files/{fileId}/content
Connecting to api.openai.com (api.openai.com)|162.159.140.245|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 730814739 (697M) [application/octet-stream]
Saving to: ‘file.jsonl’

file.jsonl     6%[========>                 ]  42.78M  19.3MB/s    in 2.2s

2025-11-13 03:01:10 (19.3 MB/s) - Connection closed at byte 77340672. Retrying.
HTTP request sent, awaiting response... ^C
[root@scripts]# wc -l file.jsonl 
851 file.jsonl
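
The `wc -l` check above could also be run from the Java side after each download. This is a minimal sketch under two assumptions: the output file holds one result per line, and every request in the batch succeeded, so the expected count equals the 5,000 items per batch mentioned earlier. The class name `JsonlLineCheck` is hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class JsonlLineCheck {
    // Counts newline bytes, like `wc -l`. Working on raw bytes avoids decoding errors
    // if the file was cut off in the middle of a line.
    public static long countNewlines(Path file) throws IOException {
        long count = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args.length > 0 ? args[0] : "file.jsonl");
        long expected = 5_000;  // items per batch, as stated in this report
        long actual = countNewlines(file);
        if (actual != expected) {
            System.err.println("Incomplete result file: " + actual + " of " + expected + " lines");
        }
    }
}
```

A trailing partial line without a newline is not counted, so a truncated file reports fewer lines than expected rather than a malformed extra record.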
