silero VAD silently falls back to CPU when TensorRT is absent, despite CUDA being available #5860

### Bug Description

When `silero.VAD.load(force_cpu=False)` is called on a machine with a CUDA-capable GPU
but without TensorRT installed, the ONNX Runtime session silently falls back to
CPUExecutionProvider instead of using CUDAExecutionProvider.

The issue is in `livekit/plugins/silero/onnx_model.py`, in the `new_inference_session()`
function. When `force_cpu=False`, the session is created with no explicit `providers`
argument:

    session = onnxruntime.InferenceSession(path, sess_options=opts)

ONNX Runtime's default provider priority list includes TensorrtExecutionProvider before
CUDAExecutionProvider. When TensorRT is not installed (missing `libnvinfer.so.10`),
ORT fails to load the TRT provider and silently falls all the way back to
CPUExecutionProvider — skipping CUDAExecutionProvider entirely.

Verified: explicitly passing `providers=["CUDAExecutionProvider", "CPUExecutionProvider"]`
to the same InferenceSession call works correctly and uses the GPU.


### Expected Behavior

When `force_cpu=False` and a CUDA GPU is available, `silero.VAD.load()` should use
CUDAExecutionProvider regardless of whether TensorRT is installed.

The fix is to explicitly build the provider list in `new_inference_session()` instead
of relying on ORT's default auto-detection, which does not gracefully cascade from
a failed TensorrtExecutionProvider to CUDAExecutionProvider:

    available = onnxruntime.get_available_providers()
    if "CUDAExecutionProvider" in available:
        providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    else:
        providers = ["CPUExecutionProvider"]
    session = onnxruntime.InferenceSession(path, providers=providers, sess_options=opts)


### Reproduction Steps

1. Set up a machine with a CUDA GPU and CUDA 12.x drivers (e.g. an Azure NC/NV VM with a Tesla V100).
2. Install `onnxruntime-gpu` but not TensorRT (`libnvinfer.so.10` absent).
3. Verify that the CUDA provider is listed by `onnxruntime.get_available_providers()`:

       ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']

4. Run the following:

       from livekit.plugins import silero
       from livekit.plugins.silero.onnx_model import new_inference_session

       vad = silero.VAD.load(force_cpu=False)
       # Inspect which provider the underlying ONNX session is using:
       sess = new_inference_session(force_cpu=False)
       print(sess.get_providers())  # Prints: ['CPUExecutionProvider']  ← BUG

5. Expected output: `['CUDAExecutionProvider', 'CPUExecutionProvider']`.
6. Confirmed fix: passing `providers=["CUDAExecutionProvider", "CPUExecutionProvider"]`
   explicitly to `InferenceSession` produces the correct result.

### Operating System

Ubuntu 24.04.4 LTS (Azure VM, NVIDIA Tesla V100 PCIe 16GB, CUDA 12.2, Driver 535.309.01)

### Models Used

Deepgram Nova-3 (STT), Azure OpenAI GPT (LLM), Deepgram Aura-2 (TTS), Silero VAD, LiveKit MultilingualModel (turn detection)

### Package Versions

```bash
livekit-agents==1.5.13
livekit-plugins-silero==1.5.13
onnxruntime-gpu==1.26.0
```

### Session/Room/Call IDs

_No response_

### Proposed Solution

In `livekit/plugins/silero/onnx_model.py`, replace the implicit provider
auto-detection with an explicit provider list in `new_inference_session()`.

Current code (`force_cpu=False` path):

    else:
        session = onnxruntime.InferenceSession(path, sess_options=opts)

Proposed fix:

    else:
        available = onnxruntime.get_available_providers()
        if "CUDAExecutionProvider" in available:
            providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
        else:
            providers = ["CPUExecutionProvider"]
        session = onnxruntime.InferenceSession(
            path, providers=providers, sess_options=opts
        )

This skips TensorrtExecutionProvider intentionally: TRT requires a separate
heavyweight install (libnvinfer) that most users won't have, and Silero VAD
does not benefit meaningfully from TRT over plain CUDA. Users who do have TRT
installed and want it can still pass it explicitly via a future `providers`
parameter on `VAD.load()`.
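
For discussion, here is a rough sketch of how the explicit selection and a future `providers` override could fit together. The `resolve_providers` helper and its `requested` parameter are hypothetical names and are not part of the plugin's current API. The list of available providers is passed in as an argument (in practice it would come from `onnxruntime.get_available_providers()`), so the logic can be exercised on a machine without a GPU.

```python
from __future__ import annotations

from collections.abc import Sequence

CPU = "CPUExecutionProvider"
CUDA = "CUDAExecutionProvider"


def resolve_providers(
    available: Sequence[str],
    force_cpu: bool = False,
    requested: Sequence[str] | None = None,
) -> list[str]:
    """Return an explicit provider list to hand to InferenceSession.

    Hypothetical helper: `requested` models a possible future `providers`
    argument on `VAD.load()`; it does not exist in the plugin today.
    """
    if force_cpu:
        return [CPU]

    if requested is not None:
        # Keep the caller's order, but drop providers this ORT build lacks.
        chosen = [p for p in requested if p in available]
    elif CUDA in available:
        # Default: prefer CUDA and deliberately skip TensorRT.
        chosen = [CUDA]
    else:
        chosen = []

    # Always end with the CPU provider as the last resort.
    if CPU not in chosen:
        chosen.append(CPU)
    return chosen
```

With the provider list from the reproduction steps, `resolve_providers(available)` yields `['CUDAExecutionProvider', 'CPUExecutionProvider']`. Note that, as the reproduction shows, `get_available_providers()` can list TensorrtExecutionProvider even when libnvinfer is missing, so filtering against it only removes providers that are absent from the build.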

### Additional Context

This affects any deployment running on a CUDA-capable machine without TensorRT
(common in cloud VMs, Docker containers, and CI environments). The failure is
completely silent — no warning or error is raised, and the agent runs on CPU
without the developer knowing.
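
Independently of the provider fix, the silent part of the failure could be surfaced with a small check after session creation. The sketch below is an illustration only: `fell_back_to_cpu` and the logger name are hypothetical, and the function works on plain lists so it does not need onnxruntime to run.

```python
import logging

# Logger name is an assumption chosen for this example.
logger = logging.getLogger("silero.providers")

CUDA = "CUDAExecutionProvider"


def fell_back_to_cpu(available, active, force_cpu=False):
    """Return True (and log a warning) if a CUDA-capable build ended up on CPU.

    `available` is what `onnxruntime.get_available_providers()` reports and
    `active` is what `session.get_providers()` reports after creation.
    """
    if force_cpu or CUDA not in available:
        # CPU was requested, or this build has no CUDA provider at all.
        return False
    if CUDA in active:
        return False
    logger.warning(
        "CUDAExecutionProvider is available but the session is using %s",
        list(active),
    )
    return True
```

In `new_inference_session()` this could be called as `fell_back_to_cpu(onnxruntime.get_available_providers(), session.get_providers(), force_cpu)`, which would turn the case reported here into a visible log line.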

The root cause is a known quirk in ONNX Runtime's Python bindings: when no
`providers` list is given to `InferenceSession`, ORT iterates its internal
priority list (TRT → CUDA → CPU). If TRT fails to load its shared library, the
fallback does NOT cascade to CUDA — it drops straight to CPU. This is different
from the behaviour when providers are listed explicitly, where ORT correctly
falls back through the list (confirmed with onnxruntime-gpu==1.26.0).

Workaround applied locally until this is fixed upstream:
- Patched the installed `onnx_model.py` with the proposed fix above.
- Registered the pip-installed cuDNN path system-wide via ldconfig
  (`/etc/ld.so.conf.d/nvidia-pip-cudnn.conf`) so `libcudnn.so.9` is
  discoverable without setting LD_LIBRARY_PATH manually.

After the patch, `sess.get_providers()` correctly returns
`['CUDAExecutionProvider', 'CPUExecutionProvider']` on every run.
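
For anyone repeating the ldconfig workaround, the directory holding the pip-installed cuDNN library has to be located first. The sketch below is one way to do that; it assumes the NVIDIA wheels unpack their shared libraries into `site-packages/nvidia/<component>/lib`, which should be verified on the target install. The function name `find_pip_nvidia_lib_dirs` is made up for this example.

```python
import site
from pathlib import Path


def find_pip_nvidia_lib_dirs(roots=None, pattern="libcudnn.so*"):
    """Return lib directories under <root>/nvidia/*/lib that contain `pattern`.

    Assumption: NVIDIA wheels place shared libraries in
    site-packages/nvidia/<component>/lib.
    """
    if roots is None:
        roots = site.getsitepackages()
    found = []
    for root in roots:
        for lib_dir in sorted(Path(root).glob("nvidia/*/lib")):
            # Keep only directories that actually hold a matching library.
            if lib_dir.is_dir() and any(lib_dir.glob(pattern)):
                found.append(lib_dir)
    return found


if __name__ == "__main__":
    for lib_dir in find_pip_nvidia_lib_dirs():
        print(lib_dir)
```

Each printed directory can then be written on its own line to a file such as `/etc/ld.so.conf.d/nvidia-pip-cudnn.conf`, followed by running `ldconfig`.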


### Screenshots and Recordings

_No response_
