
Whisper V3 model can support long audio input; can you add an API to accept a large audio file as a whole? #1733

Closed
ER-EPR opened this issue Feb 21, 2024 · 8 comments · Fixed by #1846
Labels
enhancement, roadmap, up for grabs

Comments

ER-EPR commented Feb 21, 2024

Is your feature request related to a problem? Please describe.

The current Whisper API can only take an audio file of around 20 MB per request. Whisper V3 can now work with large audio files; can you support that too and remove the file-size limit for Whisper V3?
Describe the solution you'd like

Allow uploading a large audio file.
Call Whisper with the V3 model to transcribe it.
Stream the text back to the user.
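
A minimal sketch of what such a request could look like, assuming LocalAI's OpenAI-compatible /v1/audio/transcriptions endpoint; the host, file path, and model name below are placeholders, not values from this issue:

# Sketch only: send a long recording to the transcription endpoint as a single file.
# The file path and the model name "whisper-large-v3" are illustrative placeholders.
curl http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/long-recording.mp3" \
  -F model="whisper-large-v3"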
Describe alternatives you've considered

Additional context

ER-EPR added the enhancement label Feb 21, 2024
mudler added the roadmap label Feb 22, 2024
mudler (Owner) commented Feb 23, 2024

Good point, we need to update whisper.

mudler added the up for grabs label Feb 23, 2024
ER-EPR (Author) commented Apr 9, 2024

good point, we need to update whisper

But I still can't send a 17 MB MP3 audio file to the API:

ClientException
Client error: `POST http://192.168.110.70:8080/v1/audio/transcriptions` resulted in a `413 Request Entity Too Large` response: {"error":{"code":413,"message":"Request Entity Too Large","type":""}}
API request error : Client error: `POST http://192.168.110.70:8080/v1/audio/transcriptions` resulted in a `413 Request Entity Too Large` response: {"error":{"code":413,"message":"Request Entity Too Large","type":""}}

I'm using the ggml-whisper-largev3 model with localai:v2.11.0-cublas-cuda12-ffmpeg.
Is there something else that needs to be changed? @mudler

sfxworks (Contributor) commented May 4, 2024

curl http://172.16.1.193/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F file="@/tmp/audio.mp3" -F model="whisper-3"
{"error":{"code":413,"message":"Request Entity Too Large","type":""}}%

Yeah, even a direct curl against a v3 model fails. I'm using the all-in-one CUDA 12 image here.

sfxworks (Contributor) commented May 4, 2024

Downsampled audio file and saved to /tmp/audio_downsampled.mp3 with bitrate 32k
Audio file size in MB: 16.477432

Even under that size, the request still fails:

openai.APIStatusError: Error code: 413 - {'error': {'code': 413, 'message': 'Request Entity Too Large', 'type': ''}}

sfxworks (Contributor) commented May 5, 2024

Hey, so it's currently undocumented, but I found it on Discord:

UPLOAD_LIMIT needs to be set to something higher. Set it to 50 and it works.
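
For example, a minimal sketch of raising the limit when launching the container; the value 50 is just the one from this comment, and LOCALAI_UPLOAD_LIMIT is the variable name that later comments in this thread confirm:

# Sketch only: start LocalAI with a higher upload limit (the value appears to be in MB).
docker run -p 8080:8080 -e LOCALAI_UPLOAD_LIMIT=50 localai/localai:latest-aio-gpu-nvidia-cuda-12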

ER-EPR (Author) commented May 6, 2024

Set the LOCALAI_UPLOAD_LIMIT environment variable, right? Can I set it to 512?

sfxworks (Contributor) commented May 6, 2024

Dunno, but I've managed to upload with it set to at least 100; give it a shot.

fullstackwebdev commented
Just for future reference: yes, it works.

docker run -p 8080:8080 --gpus all --name local-ai2 -ti -e LOCALAI_UPLOAD_LIMIT=512 localai/localai:latest-aio-gpu-nvidia-cuda-12
