
Performance Regression: AudioDecoder(source=bytes) is ~3.6x slower than torchaudio.load for in-memory WAV data. #966

@imbecility

Description

I've discovered a significant performance regression when using torchcodec.AudioDecoder compared to the legacy torchaudio.load for a common data loading pattern with webdataset.

Use Case:
I'm loading millions of short (1-second) WAV files from .tar archives using webdataset. This means the audio data is passed to the decoder as an in-memory bytes object.
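
For reference, a minimal sketch of that decode path (the shard pattern, the `torchcodec.decoders` import path, and the use of `get_all_samples()` to materialize the waveform are my assumptions, not an exact copy of my pipeline):

```python
import io

import torchaudio
import webdataset as wds
from torchcodec.decoders import AudioDecoder

# Without .decode(), webdataset yields dicts of raw bytes keyed by file
# extension, so sample["wav"] is the undecoded WAV file contents.
dataset = wds.WebDataset("shards-{000000..000999}.tar")  # placeholder shard pattern

def decode_with_torchaudio(sample):
    waveform, sample_rate = torchaudio.load(io.BytesIO(sample["wav"]), format="wav")
    return waveform, sample_rate

def decode_with_torchcodec(sample):
    decoded = AudioDecoder(source=sample["wav"]).get_all_samples()
    return decoded.data, decoded.sample_rate
```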

Benchmarks:
I ran a benchmark on a dataset of 100,000 samples with 8 worker processes.

  • torchaudio.load(io.BytesIO(wav_bytes), format="wav"): ~3.6 iterations/second.
  • torchcodec.AudioDecoder(source=wav_bytes): ~0.99 iterations/second.

This is a ~3.6x slowdown, which makes torchcodec unusable for this high-throughput scenario: audio decoding becomes the bottleneck and the GPU is left waiting for data.
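
A standalone micro-benchmark along the following lines reproduces the gap without webdataset or multiprocessing in the loop (the synthetic 1-second fixture, iteration count, and `get_all_samples()` call are my assumptions, not the original harness):

```python
import io
import time

import torch
import torchaudio
from torchcodec.decoders import AudioDecoder

# Synthetic 1-second, 16 kHz mono WAV held in memory (placeholder fixture).
sample_rate = 16_000
waveform = torch.randn(1, sample_rate)
buffer = io.BytesIO()
torchaudio.save(buffer, waveform, sample_rate, format="wav")
wav_bytes = buffer.getvalue()

def decodes_per_second(fn, n=2_000):
    # Run the decode callable n times and report throughput.
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return n / (time.perf_counter() - start)

rate_torchaudio = decodes_per_second(lambda: torchaudio.load(io.BytesIO(wav_bytes), format="wav"))
rate_torchcodec = decodes_per_second(lambda: AudioDecoder(source=wav_bytes).get_all_samples())

print(f"torchaudio.load: {rate_torchaudio:,.0f} decodes/s")
print(f"AudioDecoder   : {rate_torchcodec:,.0f} decodes/s")
```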

Hypothesis:
The performance issue is likely due to high overhead from initializing the full FFmpeg context for every single small sample. The older torchaudio.load likely used a more lightweight backend (like libsndfile) for this simple WAV decoding task.
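
If that hypothesis is right, most of the per-sample cost should show up when the `AudioDecoder` is constructed rather than when the samples are extracted. A rough way to check, reusing the `wav_bytes` fixture from the benchmark sketch above:

```python
import time

from torchcodec.decoders import AudioDecoder

def split_timing(wav_bytes: bytes) -> tuple[float, float]:
    # Time decoder construction separately from sample extraction to see
    # which side dominates for a tiny in-memory WAV file.
    t0 = time.perf_counter()
    decoder = AudioDecoder(source=wav_bytes)
    t1 = time.perf_counter()
    decoder.get_all_samples()
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1

setup_s, decode_s = split_timing(wav_bytes)  # wav_bytes from the earlier sketch
print(f"construction: {setup_s * 1e6:.0f} µs, decoding: {decode_s * 1e6:.0f} µs")
```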

Expected Behavior:
The modern, recommended API (torchcodec) should be at least as fast as, if not faster than, the legacy API for such a fundamental use case.
