rtc.AudioStream(track) returns silence for SIP-originated Opus tracks in livekit 1.1.9 #690

@Zoubag92

Bug: rtc.AudioStream(track) returns silence (zeros) for SIP-originated Opus tracks in livekit 1.1.9

## Summary

Subscribing to an Opus audio track that originates from a SIP participant produces a stream of correctly-shaped but completely silent (abs_max == 0) AudioFrames in livekit==1.1.9. The exact same setup (same LiveKit server, same livekit-sip gateway, same Asterisk source, same room, same SIP track) produces non-zero audio when consumed by livekit==1.0.23.

## Versions

  • livekit==1.1.9 (PyPI, latest)
  • livekit-agents==1.5.14
  • livekit-plugins-silero==1.5.14
  • livekit-server v1.12.0 (latest)
  • livekit-sip v1.3.1 (latest)
  • Python 3.10 on Ubuntu 22.04
  • GPU: NVIDIA T4 (tested with and without CUDA_VISIBLE_DEVICES="" — same result)

## Source

Audio originates from Asterisk on the same VM, forwarded via SIP INVITE to livekit-sip (PCMA/PCMU codec on the SIP side). LiveKit SIP transcodes to Opus and publishes as a WebRTC track in a room.

## Track properties observed (from the SDK)

participant.kind = 3 (SIP)
publication.source = 2 (SOURCE_MICROPHONE)
publication.kind = 1 (AUDIO)
publication.mime_type = "audio/opus"
publication.muted = False
publication.subscribed = True
track.muted = False
track.sid = "TR_..."
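
For anyone trying to reproduce this, a small helper along these lines can dump the same fields next to the frame statistics. It is only a sketch: `describe_track` is a hypothetical name, and the stand-in objects below exist so the snippet runs without a room. In a real session the three arguments would be the ones passed to the `track_subscribed` callback.

```python
from types import SimpleNamespace


def describe_track(participant, publication, track):
    """Collect the track properties listed above into one dict for logging."""
    return {
        "participant.kind": participant.kind,
        "publication.source": publication.source,
        "publication.kind": publication.kind,
        "publication.mime_type": publication.mime_type,
        "publication.muted": publication.muted,
        "publication.subscribed": publication.subscribed,
        "track.muted": track.muted,
        "track.sid": track.sid,
    }


# Stand-in objects (not real SDK objects) carrying the values reported above.
participant = SimpleNamespace(kind=3)
publication = SimpleNamespace(source=2, kind=1, mime_type="audio/opus",
                              muted=False, subscribed=True)
track = SimpleNamespace(muted=False, sid="TR_...")

for key, value in describe_track(participant, publication, track).items():
    print(f"{key} = {value!r}")
```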

## Reproduction

import asyncio
import numpy as np
from livekit import rtc
from livekit.agents import AgentSession, Agent, WorkerOptions, cli, JobContext
from livekit.agents.voice.io import AudioInput


class TestInput(AudioInput):
    """Minimal AudioInput that reads frames straight from rtc.AudioStream."""

    def __init__(self, room):
        super().__init__(label="test")
        self._q = asyncio.Queue(maxsize=400)
        self._count = 0
        self._tasks = set()  # keep references so reader tasks are not garbage-collected
        room.on("track_subscribed", self._on_sub)
        # Pick up an audio track that was already subscribed before we attached.
        for p in room.remote_participants.values():
            for pub in p.track_publications.values():
                if pub.track and pub.track.kind == rtc.TrackKind.KIND_AUDIO:
                    self._start(pub.track)
                    return

    def _on_sub(self, track, pub, p):
        if track.kind == rtc.TrackKind.KIND_AUDIO:
            self._start(track)

    def _start(self, track):
        stream = rtc.AudioStream(track)  # simplest constructor, defaults
        task = asyncio.create_task(self._loop(stream))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def _loop(self, stream):
        async for ev in stream:
            f = ev.frame
            self._count += 1
            arr = np.frombuffer(f.data, dtype=np.int16)
            # Log the first frame, then every 250th (every 2.5 s at 10 ms per frame).
            if self._count == 1 or self._count % 250 == 0:
                # Widen to int32 so abs(-32768) does not overflow int16.
                print(f"frame#{self._count} sr={f.sample_rate} "
                      f"samples={f.samples_per_channel} "
                      f"abs_max={int(np.abs(arr.astype(np.int32)).max())}")
            try:
                self._q.put_nowait(f)
            except asyncio.QueueFull:
                pass  # drop the frame if the consumer falls behind

    async def __anext__(self):
        return await self._q.get()


async def entrypoint(ctx: JobContext):
    await ctx.connect()
    session = AgentSession()  # any minimal config
    session.input.audio = TestInput(ctx.room)
    await session.start(agent=Agent(instructions="test"), room=ctx.room,
                        room_options=...)  # audio_input=False


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))

## Observed output

frame#1 sr=48000 samples=480 abs_max=0
frame#250 sr=48000 samples=480 abs_max=0
frame#500 sr=48000 samples=480 abs_max=0
frame#750 sr=48000 samples=480 abs_max=0
...


Frames arrive at the expected cadence (100 fps, 10 ms each) and the track is reported as `audio/opus` and not muted, but **every PCM sample is zero**.
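
The cadence and silence figures can be checked independently of the SDK. The following is a minimal stdlib-only sketch, assuming the frame payload is native-endian signed 16-bit PCM (as the `np.int16` view in the reproduction implies); `frame_stats` is a hypothetical helper name, not part of `livekit`.

```python
import array
import math


def frame_stats(pcm_bytes, sample_rate, samples_per_channel):
    """Return peak, RMS and timing figures for one int16 PCM frame."""
    samples = array.array("h")           # signed 16-bit, native byte order
    samples.frombytes(bytes(pcm_bytes))
    abs_max = max((abs(s) for s in samples), default=0)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
    duration_ms = 1000 * samples_per_channel / sample_rate
    return {
        "abs_max": abs_max,
        "rms": rms,
        "duration_ms": duration_ms,
        "frames_per_second": 1000 / duration_ms,
    }


# A silent mono frame shaped like the ones in the log: 480 samples at 48 kHz.
silent = bytes(480 * 2)
print(frame_stats(silent, sample_rate=48000, samples_per_channel=480))

# A frame with some signal, for contrast.
voiced = array.array("h", [0, 1000, -8000] * 160).tobytes()
print(frame_stats(voiced, sample_rate=48000, samples_per_channel=480))
```

With 480 samples per channel at 48 kHz, each frame covers 10 ms, which matches the 100 fps cadence reported above.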

## Expected output

Non-zero amplitude when the SIP caller speaks (caller voice should be audible / decodable).

Confirmed: the exact same room/track/SIP setup with `livekit==1.0.23` produces frames with `abs_max ≈ 1000–8000` (typical telephony audio amplitude), and faster-whisper transcribes correctly downstream.
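
To compare the two SDK versions by ear or in an audio editor, the received frames can be dumped to a WAV file. This is a sketch under assumptions not stated in the report: mono audio, a little-endian host (WAV stores little-endian samples), and a hypothetical helper name `write_wav`. The chunks would be the raw `frame.data` buffers collected in the reader loop.

```python
import wave


def write_wav(path, pcm_chunks, sample_rate=48000, num_channels=1):
    """Write an iterable of raw int16 PCM chunks to a WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(num_channels)   # assumption: mono
        w.setsampwidth(2)              # 16-bit samples
        w.setframerate(sample_rate)
        for chunk in pcm_chunks:
            w.writeframes(bytes(chunk))


# Example: one second of the silent 10 ms frames described above.
write_wav("capture_1_1_9.wav", [bytes(480 * 2)] * 100)
```

Recording one file per SDK version from the same call makes the difference easy to attach to the issue.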

## What I have tried (none fix it)

| Attempt | Result |
|---|---|
| Use `RoomIO` (default audio capture via `AgentSession.start(room=...)`) | Same silence |
| Use direct `rtc.AudioStream(track)` (manual subscription, AudioInput override) | Same silence |
| `auto_gain_control=False`, `pre_connect_audio=False`, `noise_cancellation=None` in `AudioInputOptions` | No change |
| `audio_sample_rate=16000`, `24000`, `48000` | All produce zeros |
| `CUDA_VISIBLE_DEVICES=""` (force CPU-only, no NVDEC path) | No change, `Nvidia Decoder is supported.` message disappears as expected |
| Upgrade `livekit-server` 1.9.11 → 1.12.0 (includes "silent frame for pcmu/a" fix) | No change |
| Upgrade `livekit-sip` v1.2.0 → v1.3.1 (latest, accumulates 7 months of SIP fixes) | No change |
| Install `livekit` from git `main` (`subdirectory=livekit-rtc`) | Wheel built, but no FFI binary is committed; not testable without compiling from source |

## What this rules out

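- It is NOT a server-side encoding bug: the server stack is at latest, and `livekit==1.0.23` decodes the same track correctly.
- It is NOT a SIP gateway bug: the gateway is at latest, and the same gateway output decodes correctly with `livekit==1.0.23`.
- It is NOT the GPU/NVDEC decoder path: CPU-only runs give the same result.
- It is NOT a configuration / audio processing option: every option tried produces zeros.
- It is NOT a subscription / track-availability bug: the track is subscribed, not muted, and the MIME type is reported correctly.
- It IS specific to the Python `livekit` 1.1.x line (1.1.9 tested, compared with 1.0.23 working on the identical setup).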
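
Because the conclusion rests on a version comparison, it helps to log which package versions are actually installed in the environment that produced each result. A minimal sketch using only the standard library follows; `installed_versions` is a hypothetical helper, and the package list simply mirrors the Versions section plus `numpy`.

```python
from importlib import metadata

PACKAGES = ("livekit", "livekit-agents", "livekit-plugins-silero", "numpy")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}=={version}")
```

Printing this at worker start-up in both the 1.0.23 and 1.1.9 environments guards against a mixed or stale virtualenv skewing the comparison.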

## Suspected location

The Opus decoder in `liblivekit_ffi.so` (1.1.9 binary) appears to be producing silent PCM output for Opus packets originating from livekit-sip's PCMU/PCMA → Opus transcoder. The packets are received (correct cadence and frame size), but every decoded sample is 0.

This may be related to Opus DTX, FEC, or frame-size negotiation parameters that the Go-side transcoder uses but the new Rust/C++ decoder doesn't accept.

## Asks

1. Confirmation whether a regression has been introduced in the 1.0.x → 1.1.x Opus decoder path.
2. If yes: a fix, or a way to disable / configure around the affected mode.
3. If no: a pointer to what configuration could cause `rtc.AudioStream(track)` to return zero samples on a track published with MIME type `audio/opus` that the same SDK previously decoded fine.
