feat: add POST /v1/audio/transcriptions and /v1/audio/speech endpoints #44

Open

tbille wants to merge 1 commit into main from feat/audio-endpoints

Conversation

@tbille (Contributor) commented Apr 14, 2026

Summary

  • Add OpenAI-compatible audio transcription (POST /v1/audio/transcriptions) and speech/TTS (POST /v1/audio/speech) endpoints
  • Proxy requests through any_llm's atranscription() and aspeech() functions
  • Add the python-multipart dependency for file upload support

Details

Transcription endpoint (POST /v1/audio/transcriptions):

  • Accepts multipart/form-data with audio file upload via FastAPI UploadFile
  • Form fields: model (required), file (required), language, prompt, response_format, temperature, user
  • Reads file bytes, forwards to atranscription() via any_llm
  • Returns JSON transcription response

Speech endpoint (POST /v1/audio/speech):

  • Accepts a JSON body with model, input, and voice (required), plus optional instructions, response_format, speed, and user
  • Returns StreamingResponse with raw binary audio
  • Content-Type set per format: audio/mpeg (mp3), audio/opus, audio/aac, audio/flac, audio/wav, audio/L16 (pcm)

Both endpoints follow standard gateway flow: auth, rate limiting, budget validation, usage logging, error handling.

New dependency: python-multipart>=0.0.18 (required by FastAPI for UploadFile/File/Form)
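For context, this is the shape of the multipart/form-data body that python-multipart parses on the server side. A stdlib-only sketch of what a client would send: the model and file field names match the endpoint above, everything else (model string, filename, content type) is illustrative:

```python
import io
import uuid

def build_multipart(model: str, filename: str, audio: bytes) -> tuple[bytes, str]:
    """Build a minimal multipart/form-data body with a `model` form field
    and a `file` part, plus the matching Content-Type header value."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()

    def part(headers: str, payload: bytes) -> None:
        # Each part: boundary line, part headers, blank line, payload.
        buf.write(f"--{boundary}\r\n{headers}\r\n\r\n".encode())
        buf.write(payload + b"\r\n")

    part('Content-Disposition: form-data; name="model"', model.encode())
    part(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav",
        audio,
    )
    buf.write(f"--{boundary}--\r\n".encode())  # closing boundary
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"
```

FastAPI's Form/File parameters cannot decode such a body with the standard library alone, which is why the dependency is needed.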

Tests: tests/integration/test_audio_endpoint.py — 22 tests:

  • 11 transcription: auth, API key, master key, provider error, usage logging, error logging, optional fields (language, prompt, response_format, temperature)
  • 11 speech: auth, API key, content type mapping, master key, provider error, usage logging, error logging, optional fields (response_format, speed, instructions)

Dependencies: Requires mozilla-ai/any-llm#1036 for atranscription()/aspeech() support in the SDK.

Add OpenAI-compatible audio transcription and speech endpoints to the
gateway, proxying requests through any_llm's atranscription() and
aspeech() functions.

Transcription endpoint (POST /v1/audio/transcriptions):
- Accepts multipart/form-data with audio file upload
- Supports optional language, prompt, response_format, temperature fields
- Returns JSON transcription response

Speech endpoint (POST /v1/audio/speech):
- Accepts JSON body with model, input text, and voice
- Returns raw binary audio with correct Content-Type per format
- Supports optional instructions, response_format, speed fields

Both endpoints include full auth, rate limiting, budget validation,
and usage logging. Added python-multipart dependency for file uploads.

- 22 integration tests covering auth, usage logging, error handling,
  optional fields, and content type mapping
- OpenAPI spec regenerated

Depends on: mozilla-ai/any-llm#1036
@tbille had a problem deploying to integration-tests via GitHub Actions on April 14, 2026 at 20:19 (failure).