
Whisper V3 model can support long audio input; can you add an API to accept a large audio file as a whole? #1733

Closed
ER-EPR opened this issue Feb 21, 2024 · 8 comments · Fixed by #1846
Labels
enhancement, roadmap, up for grabs

Comments

ER-EPR commented Feb 21, 2024

Is your feature request related to a problem? Please describe.

The current Whisper API can only take an audio file of around 20 MB per request. Whisper V3 can now work with large audio files; can you support that too and remove the file-size limit for Whisper V3?
Describe the solution you'd like

Allow uploading a large audio file.
Call Whisper with the V3 model to transcribe it.
Stream the text back to the user.
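
A minimal sketch of what such a request could look like, assuming LocalAI's OpenAI-compatible /v1/audio/transcriptions endpoint; the host, file path, and model name below are placeholders, not values from this issue:

# Sketch only: send a long recording to the transcription endpoint as a single file.
# The file path and the model name "whisper-large-v3" are illustrative placeholders.
curl http://localhost:8080/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/long-recording.mp3" \
  -F model="whisper-large-v3"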
Describe alternatives you've considered

Additional context

ER-EPR added the enhancement label Feb 21, 2024
mudler added the roadmap label Feb 22, 2024
mudler (Owner) commented Feb 23, 2024

Good point, we need to update whisper.

mudler added the up for grabs label Feb 23, 2024
ER-EPR (Author) commented Apr 9, 2024

good point, we need to update whisper

But I still can't send a 17 MB MP3 audio file to the API:

ClientException
Client error: `POST http://192.168.110.70:8080/v1/audio/transcriptions` resulted in a `413 Request Entity Too Large` response: {"error":{"code":413,"message":"Request Entity Too Large","type":""}}
API request error : Client error: `POST http://192.168.110.70:8080/v1/audio/transcriptions` resulted in a `413 Request Entity Too Large` response: {"error":{"code":413,"message":"Request Entity Too Large","type":""}}

I'm using the ggml-whisper-largev3 model with localai:v2.11.0-cublas-cuda12-ffmpeg.
Is there something else that needs to be changed? @mudler

sfxworks (Contributor) commented May 4, 2024

curl http://172.16.1.193/v1/audio/transcriptions -H "Content-Type: multipart/form-data" -F file="@/tmp/audio.mp3" -F model="whisper-3"
{"error":{"code":413,"message":"Request Entity Too Large","type":""}}%

Yeah, even a direct curl against a v3 model fails. I'm using the all-in-one CUDA 12 image here.

sfxworks (Contributor) commented May 4, 2024

Downsampled audio file and saved to /tmp/audio_downsampled.mp3 with bitrate 32k
Audio file size in MB: 16.477432

Even under that size, the request still fails:

openai.APIStatusError: Error code: 413 - {'error': {'code': 413, 'message': 'Request Entity Too Large', 'type': ''}}

sfxworks (Contributor) commented May 5, 2024

Hey, so it's currently undocumented, but I found it on Discord:

UPLOAD_LIMIT needs to be set to something higher. Set it to 50 and it works.
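
For example, a minimal sketch of raising the limit when launching the container; the value 50 is just the one from this comment, and LOCALAI_UPLOAD_LIMIT is the variable name that later comments in this thread confirm:

# Sketch only: start LocalAI with a higher upload limit (the value appears to be in MB).
docker run -p 8080:8080 -e LOCALAI_UPLOAD_LIMIT=50 localai/localai:latest-aio-gpu-nvidia-cuda-12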

ER-EPR (Author) commented May 6, 2024

Set the LOCALAI_UPLOAD_LIMIT environment variable, right? Can I set it to 512?

sfxworks (Contributor) commented May 6, 2024

Dunno, but I've managed to upload with it set to at least 100; give it a shot.

fullstackwebdev commented
Just for future reference: yes, it works.

docker run -p 8080:8080 --gpus all --name local-ai2 -ti -e LOCALAI_UPLOAD_LIMIT=512 localai/localai:latest-aio-gpu-nvidia-cuda-12
