
Feature Request: Add FunASR / SenseVoice as STT backend #5913

@LauraGPT


Summary

It would be great to support FunASR and SenseVoice as speech-to-text backends in LiveKit Agents.

Why

  • FunASR is an open-source industrial-grade speech recognition toolkit with real-time streaming support, punctuation restoration, and inverse text normalization built-in. It supports 50+ languages and can run fully offline.
  • SenseVoice is a speech foundation model that excels at multilingual speech recognition, emotion recognition, and audio event detection, with ultra-low latency (~70ms to process 10s of audio with SenseVoice-Small).
  • Both are Apache 2.0 licensed and available on HuggingFace.

Comparison with Whisper

| Feature | FunASR/SenseVoice | Whisper |
| --- | --- | --- |
| Real-time streaming | ✅ Native | ❌ Chunked |
| Punctuation & ITN | ✅ Built-in | ❌ Requires post-processing |
| Latency | ~70ms (SenseVoice-Small) | Higher |
| Languages | 50+ | 99 |
| License | Apache 2.0 | MIT |

References

Would be happy to help with integration if there is interest!
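
As a starting point for discussion, here is a rough sketch of how a SenseVoice backend could be wrapped behind a small adapter. The `SpeechToText` protocol, `Transcript`, and `SenseVoiceBackend` names are hypothetical and are not LiveKit Agents' actual STT interface. The `make_funasr_transcriber` helper assumes the `AutoModel` / `generate` usage shown in the SenseVoice README; the exact arguments may differ between FunASR versions.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Transcript:
    """Result of a single non-streaming recognition call (hypothetical type)."""
    text: str
    language: str


class SpeechToText(Protocol):
    """Hypothetical backend interface, NOT LiveKit's real STT base class."""

    def recognize(self, audio_path: str, language: str = "auto") -> Transcript: ...


class SenseVoiceBackend:
    """Adapter that delegates to an injected transcription callable.

    Injecting the callable keeps the adapter usable without FunASR
    installed, e.g. with a stub in unit tests.
    """

    def __init__(self, transcribe: Callable[[str, str], str]) -> None:
        self._transcribe = transcribe

    def recognize(self, audio_path: str, language: str = "auto") -> Transcript:
        text = self._transcribe(audio_path, language).strip()
        return Transcript(text=text, language=language)


def make_funasr_transcriber(
    model_name: str = "iic/SenseVoiceSmall",
) -> Callable[[str, str], str]:
    """Build a transcriber backed by FunASR.

    Assumption: mirrors the usage in the SenseVoice README; requires
    `pip install funasr` and downloads the model on first use.
    """
    # Imported lazily so this module loads without FunASR installed.
    from funasr import AutoModel
    from funasr.utils.postprocess_utils import rich_transcription_postprocess

    model = AutoModel(model=model_name)

    def transcribe(audio_path: str, language: str) -> str:
        # use_itn=True asks for punctuation and inverse text normalization.
        result = model.generate(
            input=audio_path, cache={}, language=language, use_itn=True
        )
        return rich_transcription_postprocess(result[0]["text"])

    return transcribe
```

With FunASR installed, usage would look like `SenseVoiceBackend(make_funasr_transcriber()).recognize("sample.wav")`. A real plugin would also need to cover the streaming path and map onto whatever interface the maintainers prefer.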
