I've discovered a significant performance regression when using torchcodec.AudioDecoder compared to the legacy torchaudio.load for a common data loading pattern with webdataset.
Use Case:
I'm loading millions of short (1-second) WAV files from .tar archives using webdataset. This means the audio data is passed to the decoder as an in-memory bytes object.
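To make the pattern concrete, here is a minimal self-contained sketch of the two call sites. The WAV payload is synthesized in memory with the stdlib `wave` module as a stand-in for one sample read out of a webdataset `.tar` shard; the actual decoder calls are shown as comments since they require torchaudio/torchcodec to be installed:

```python
import io
import wave

def make_wav_bytes(sample_rate=16000, seconds=1):
    """Build a 1-second, 16-bit mono PCM WAV entirely in memory
    (a stand-in for a sample pulled from a .tar shard)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * (sample_rate * seconds))  # silence
    return buf.getvalue()

wav_bytes = make_wav_bytes()

# Legacy path (fast in my benchmark):
#   import torchaudio
#   waveform, sr = torchaudio.load(io.BytesIO(wav_bytes), format="wav")
#
# New path (slow):
#   decoder = torchcodec.AudioDecoder(source=wav_bytes)
```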
Benchmarks:
I ran a benchmark on a dataset of 100,000 samples with 8 worker processes.
torchaudio.load(io.BytesIO(wav_bytes), format="wav"): ~3.6 iterations/second.
torchcodec.AudioDecoder(source=wav_bytes): ~0.99 iterations/second.
This is a ~3.6x slowdown, which makes torchcodec impractical for this high-throughput scenario: audio decoding becomes the bottleneck in the data pipeline and leaves the GPU starved for batches.
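The benchmark loop itself is nothing exotic; the sketch below shows its shape with a stdlib-only decode function standing in for the real calls (in the actual benchmark, `decode_fn` is the `torchaudio.load(...)` or `torchcodec.AudioDecoder(...)` call above, run across 8 workers rather than single-threaded as here):

```python
import io
import time
import wave

def bench(decode_fn, payloads, n_iters=2000):
    """Call decode_fn on in-memory WAV bytes n_iters times; return it/s."""
    start = time.perf_counter()
    for i in range(n_iters):
        decode_fn(payloads[i % len(payloads)])
    return n_iters / (time.perf_counter() - start)

def stdlib_decode(wav_bytes):
    # Stand-in decoder so the harness runs as-is; the real benchmark
    # swaps in torchaudio.load or torchcodec.AudioDecoder here.
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.readframes(w.getnframes())

# One second of 16 kHz silence as the test payload.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)

its = bench(stdlib_decode, [buf.getvalue()])
print(f"{its:.1f} it/s")
```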
Hypothesis:
The gap is likely due to the overhead of initializing a full FFmpeg decoding context for every 1-second sample, a fixed cost that dominates when the payload itself is tiny. The legacy torchaudio.load presumably dispatched this simple PCM WAV decode to a much lighter-weight backend (such as libsndfile), which amortizes to almost nothing per sample.
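If this hypothesis is right, a pragmatic workaround (and perhaps something torchcodec could do internally) is to sniff the RIFF header and take a cheap PCM fast path, falling back to the full decoder only for compressed formats. A hedged stdlib-only sketch of that dispatch; `decode` and `is_riff_wav` are hypothetical names, and a real loader would use something like soundfile on the fast path rather than `wave`:

```python
import io
import wave

def is_riff_wav(data: bytes) -> bool:
    # Canonical WAV layout: bytes 0-3 == b"RIFF", bytes 8-11 == b"WAVE".
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"

def decode(data: bytes):
    if is_riff_wav(data):
        # Cheap path: no FFmpeg context, just a header parse + memcpy.
        with wave.open(io.BytesIO(data), "rb") as w:
            return w.readframes(w.getnframes()), w.getframerate()
    # Compressed formats would fall through to torchcodec here (omitted).
    raise NotImplementedError("non-WAV payloads go to the full decoder")

# Demo payload: one second of 8 kHz 16-bit mono PCM.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x01\x00" * 8000)

frames, sr = decode(buf.getvalue())
```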
Expected Behavior:
The modern, recommended API (torchcodec) should be at least as fast as, if not faster than, the legacy API for such a fundamental use case.