feat: add POST /v1/audio/transcriptions and /v1/audio/speech endpoints #44

Open

tbille wants to merge 1 commit into main from feat/audio-endpoints

Conversation

@tbille (Contributor) commented Apr 14, 2026

Summary

  • Add OpenAI-compatible audio transcription (POST /v1/audio/transcriptions) and speech/TTS (POST /v1/audio/speech) endpoints
  • Proxy requests through any_llm's atranscription() and aspeech() functions
  • Add the python-multipart dependency for file upload support

Details

Transcription endpoint (POST /v1/audio/transcriptions):

  • Accepts multipart/form-data with audio file upload via FastAPI UploadFile
  • Form fields: model (required), file (required), language, prompt, response_format, temperature, user
  • Reads file bytes, forwards to atranscription() via any_llm
  • Returns JSON transcription response

Speech endpoint (POST /v1/audio/speech):

  • Accepts a JSON body with model, input, and voice (required), plus optional instructions, response_format, speed, and user
  • Returns StreamingResponse with raw binary audio
  • Content-Type set per format: audio/mpeg (mp3), audio/opus, audio/aac, audio/flac, audio/wav, audio/L16 (pcm)

Both endpoints follow standard gateway flow: auth, rate limiting, budget validation, usage logging, error handling.

New dependency: python-multipart>=0.0.18 (required by FastAPI for UploadFile/File/Form)
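For context, this is the shape of the multipart/form-data body that python-multipart parses on the server side. A stdlib-only sketch of what a client would send: the model and file field names match the endpoint above, everything else (model string, filename, content type) is illustrative:

```python
import io
import uuid

def build_multipart(model: str, filename: str, audio: bytes) -> tuple[bytes, str]:
    """Build a minimal multipart/form-data body with a `model` form field
    and a `file` part, plus the matching Content-Type header value."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()

    def part(headers: str, payload: bytes) -> None:
        # Each part: boundary line, part headers, blank line, payload.
        buf.write(f"--{boundary}\r\n{headers}\r\n\r\n".encode())
        buf.write(payload + b"\r\n")

    part('Content-Disposition: form-data; name="model"', model.encode())
    part(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav",
        audio,
    )
    buf.write(f"--{boundary}--\r\n".encode())  # closing boundary
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"
```

FastAPI's Form/File parameters cannot decode such a body with the standard library alone, which is why the dependency is needed.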

Tests: tests/integration/test_audio_endpoint.py — 22 tests:

  • 11 transcription: auth, API key, master key, provider error, usage logging, error logging, optional fields (language, prompt, response_format, temperature)
  • 11 speech: auth, API key, content type mapping, master key, provider error, usage logging, error logging, optional fields (response_format, speed, instructions)

Dependencies: Requires mozilla-ai/any-llm#1036 for atranscription()/aspeech() support in the SDK.

Add OpenAI-compatible audio transcription and speech endpoints to the
gateway, proxying requests through any_llm's atranscription() and
aspeech() functions.

Transcription endpoint (POST /v1/audio/transcriptions):
- Accepts multipart/form-data with audio file upload
- Supports optional language, prompt, response_format, temperature fields
- Returns JSON transcription response

Speech endpoint (POST /v1/audio/speech):
- Accepts JSON body with model, input text, and voice
- Returns raw binary audio with correct Content-Type per format
- Supports optional instructions, response_format, speed fields

Both endpoints include full auth, rate limiting, budget validation,
and usage logging. Added python-multipart dependency for file uploads.

- 22 integration tests covering auth, usage logging, error handling,
  optional fields, and content type mapping
- OpenAPI spec regenerated

Depends on: mozilla-ai/any-llm#1036
@tbille had a problem deploying to integration-tests via GitHub Actions on April 14, 2026 at 20:19 (failure).