rtc.AudioStream(track) returns silence for SIP-originated Opus tracks in livekit 1.1.9

# Bug: `rtc.AudioStream(track)` returns silence (zeros) for SIP-originated Opus tracks in livekit 1.1.9

## Summary

Subscribing to an Opus audio track that originates from a SIP participant produces a stream of correctly-shaped but completely silent (`abs_max == 0`) AudioFrames in `livekit==1.1.9`. The exact same setup (same LiveKit server, same livekit-sip gateway, same Asterisk source, same room, same SIP track) produces non-zero audio when consumed by `livekit==1.0.23`.

## Versions

- `livekit==1.1.9` (PyPI, latest)
- `livekit-agents==1.5.14`
- `livekit-plugins-silero==1.5.14`
- `livekit-server` v1.12.0 (latest)
- `livekit-sip` v1.3.1 (latest)
- Python 3.10 on Ubuntu 22.04
- GPU: NVIDIA T4 (tested with and without `CUDA_VISIBLE_DEVICES=""` — same result)
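
For anyone trying to reproduce this, the Python package versions can be confirmed from inside the same virtualenv the agent runs in. A minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package names are the ones listed above:

```python
from importlib import metadata

# Package names taken from the Versions list above.
PACKAGES = ["livekit", "livekit-agents", "livekit-plugins-silero"]


def installed_versions(names):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running it in both the working (1.0.23) and failing (1.1.9) environments makes it easy to diff the two setups.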

## Source

Audio originates from Asterisk on the same VM, forwarded via SIP INVITE to `livekit-sip` (PCMA/PCMU codec on the SIP side). LiveKit SIP transcodes to Opus and publishes as a WebRTC track in a room.

## Track properties observed (from the SDK)

```
participant.kind = 3 (SIP)
publication.source = 2 (SOURCE_MICROPHONE)
publication.kind = 1 (AUDIO)
publication.mime_type = "audio/opus"
publication.muted = False
publication.subscribed = True
track.muted = False
track.sid = "TR_..."
```

## Reproduction

```python
import asyncio
import numpy as np
from livekit import rtc
from livekit.agents import AgentSession, Agent, WorkerOptions, cli, JobContext
from livekit.agents.voice.io import AudioInput

class TestInput(AudioInput):
    def __init__(self, room):
        super().__init__(label="test")
        self._q = asyncio.Queue(maxsize=400)
        self._count = 0
        room.on("track_subscribed", self._on_sub)
        for p in room.remote_participants.values():
            for pub in p.track_publications.values():
                if pub.track and pub.track.kind == rtc.TrackKind.KIND_AUDIO:
                    self._start(pub.track)
                    return

    def _on_sub(self, track, pub, p):
        if track.kind == rtc.TrackKind.KIND_AUDIO:
            self._start(track)

    def _start(self, track):
        stream = rtc.AudioStream(track)  # simplest constructor, defaults
        asyncio.create_task(self._loop(stream))

    async def _loop(self, stream):
        async for ev in stream:
            f = ev.frame
            self._count += 1
            arr = np.frombuffer(f.data, dtype=np.int16)
            if self._count == 1 or self._count % 250 == 0:  # log the first frame, then every 250th
                print(f"frame#{self._count} sr={f.sample_rate} "
                      f"samples={f.samples_per_channel} "
                      f"abs_max={int(np.abs(arr).max())}")
            try:
                self._q.put_nowait(f)
            except asyncio.QueueFull:
                pass  # drop the frame if the consumer falls behind

    async def __anext__(self):
        return await self._q.get()


async def entrypoint(ctx: JobContext):
    await ctx.connect()
    session = AgentSession()  # any minimal config
    session.input.audio = TestInput(ctx.room)
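    await session.start(agent=Agent(instructions="test"), room=ctx.room,
                        room_options=...)  # audio_input=False


# Worker entry point, implied by the WorkerOptions/cli imports above.
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

## Observed output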

```
frame#1   sr=48000 samples=480 abs_max=0
frame#250 sr=48000 samples=480 abs_max=0
frame#500 sr=48000 samples=480 abs_max=0
frame#750 sr=48000 samples=480 abs_max=0
...
```

Frames arrive at the expected cadence (100 frames per second, 10 ms each), and the track is reported as `audio/opus` and not muted, but **every PCM sample is zero**.
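
The cadence figure follows directly from the logged `sr` and `samples` values. A small sketch of the arithmetic (the helper name `frame_timing` is just for illustration):

```python
def frame_timing(sample_rate: int, samples_per_channel: int):
    """Return (frame duration in ms, frames per second) for one audio frame."""
    duration_ms = 1000.0 * samples_per_channel / sample_rate
    frames_per_second = sample_rate / samples_per_channel
    return duration_ms, frames_per_second


# Values from the log lines above: sr=48000, samples=480.
duration_ms, fps = frame_timing(48000, 480)
print(f"{duration_ms:g} ms per frame, {fps:g} frames/s")  # 10 ms per frame, 100 frames/s
```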

## Expected output

Non-zero amplitude when the SIP caller speaks (caller voice should be audible / decodable).

Confirmed: the exact same room/track/SIP setup with `livekit==1.0.23` produces frames with `abs_max ≈ 1000–8000` (typical telephony audio amplitude), and faster-whisper transcribes correctly downstream.
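
To compare the two SDK versions on equal terms, the per-frame check can be pulled out into a small helper that does not depend on NumPy. This is a sketch, not part of the original reproduction: `pcm16_stats` and `SilenceTracker` are hypothetical names, and it assumes the frame buffer holds native-endian signed 16-bit PCM (as the `np.int16` view in the reproduction does).

```python
import array
import math


def pcm16_stats(data) -> tuple:
    """Return (abs_max, rms) for a buffer of native-endian signed 16-bit PCM."""
    samples = array.array("h")
    samples.frombytes(bytes(data))  # bytes() also accepts a memoryview
    if not samples:
        return 0, 0.0
    # Plain Python ints, so abs(-32768) cannot overflow as it would in int16.
    abs_max = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return abs_max, rms


class SilenceTracker:
    """Counts how many consecutive frames were entirely zero."""

    def __init__(self):
        self.total = 0
        self.zero_run = 0

    def update(self, data) -> int:
        abs_max, _ = pcm16_stats(data)
        self.total += 1
        self.zero_run = self.zero_run + 1 if abs_max == 0 else 0
        return self.zero_run
```

Calling `tracker.update(f.data)` inside `_loop` would give a running count of consecutive all-zero frames; on 1.1.9 that count would be expected to grow without ever resetting, while on 1.0.23 it should reset whenever the caller speaks.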

## What I have tried (none fix it)

| Attempt | Result |
|---|---|
| Use `RoomIO` (default audio capture via `AgentSession.start(room=...)`) | Same silence |
| Use direct `rtc.AudioStream(track)` (manual subscription, AudioInput override) | Same silence |
| `auto_gain_control=False`, `pre_connect_audio=False`, `noise_cancellation=None` in `AudioInputOptions` | No change |
| `audio_sample_rate=16000`, `24000`, `48000` | All produce zeros |
| `CUDA_VISIBLE_DEVICES=""` (force CPU-only, no NVDEC path) | No change, `Nvidia Decoder is supported.` message disappears as expected |
| Upgrade `livekit-server` 1.9.11 → 1.12.0 (includes "silent frame for pcmu/a" fix) | No change |
| Upgrade `livekit-sip` v1.2.0 → v1.3.1 (latest, accumulates 7 months of SIP fixes) | No change |
| Install `livekit` from git main subdirectory=livekit-rtc | Wheel built but no FFI binary committed; not testable without source compile |

## What this rules out

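- It is NOT a server-side encoding bug: the server stack is at the latest release, and the same stack yields audio with `livekit==1.0.23`.
- It is NOT a SIP gateway bug: the gateway is at the latest release, and the same gateway yields audio with `livekit==1.0.23`.
- It is NOT the GPU/NVDEC decoder path: forcing CPU-only with `CUDA_VISIBLE_DEVICES=""` changes nothing.
- It is NOT a configuration or audio-processing option: every `AudioInputOptions` setting and sample rate tried produces zeros.
- It is NOT a subscription or track-availability bug: the track is subscribed, not muted, and its MIME type is reported correctly.
- It IS specific to Python `livekit==1.1.x` (1.0.23 works on the identical setup).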

## Suspected location

The Opus decoder in `liblivekit_ffi.so` (1.1.9 binary) appears to be producing silent PCM output for Opus packets originating from livekit-sip's PCMU/PCMA → Opus transcoder. The packets are received (correct cadence and frame size), but every decoded sample is 0.

This may be related to Opus DTX, FEC, or frame-size negotiation parameters that the Go-side transcoder uses but the new Rust/C++ decoder doesn't accept.

## Asks

1. Confirmation whether a regression has been introduced in the 1.0.x → 1.1.x Opus decoder path.
2. If yes: a fix, or a way to disable / configure around the affected mode.
3. If no: a pointer to what configuration could cause `rtc.AudioStream(track)` to return zero samples on a track advertised as `audio/opus` that the same SDK previously decoded fine.

