I've discovered a significant performance regression when using torchcodec.AudioDecoder compared to the legacy torchaudio.load for a common data loading pattern with webdataset.
Use Case:
I'm loading millions of short (1-second) WAV files from .tar archives using webdataset. This means the audio data is passed to the decoder as an in-memory bytes object.
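To make the pattern concrete, here is a minimal self-contained sketch of the two call sites. The WAV payload is synthesized in memory with the stdlib `wave` module as a stand-in for one sample read out of a webdataset `.tar` shard; the actual decoder calls are shown as comments since they require torchaudio/torchcodec to be installed:

```python
import io
import wave

def make_wav_bytes(sample_rate=16000, seconds=1):
    """Build a 1-second, 16-bit mono PCM WAV entirely in memory
    (a stand-in for a sample pulled from a .tar shard)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * (sample_rate * seconds))  # silence
    return buf.getvalue()

wav_bytes = make_wav_bytes()

# Legacy path (fast in my benchmark):
#   import torchaudio
#   waveform, sr = torchaudio.load(io.BytesIO(wav_bytes), format="wav")
#
# New path (slow):
#   decoder = torchcodec.AudioDecoder(source=wav_bytes)
```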
Benchmarks:
I ran a benchmark on a dataset of 100,000 samples with 8 worker processes.
torchaudio.load(io.BytesIO(wav_bytes), format="wav"): ~3.6 iterations/second.
torchcodec.AudioDecoder(source=wav_bytes): ~0.99 iterations/second.
This is a ~3.6x slowdown, which makes torchcodec impractical for this high-throughput scenario: audio decoding becomes the bottleneck in the data pipeline and leaves the GPU starved for batches.
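The benchmark loop itself is nothing exotic; the sketch below shows its shape with a stdlib-only decode function standing in for the real calls (in the actual benchmark, `decode_fn` is the `torchaudio.load(...)` or `torchcodec.AudioDecoder(...)` call above, run across 8 workers rather than single-threaded as here):

```python
import io
import time
import wave

def bench(decode_fn, payloads, n_iters=2000):
    """Call decode_fn on in-memory WAV bytes n_iters times; return it/s."""
    start = time.perf_counter()
    for i in range(n_iters):
        decode_fn(payloads[i % len(payloads)])
    return n_iters / (time.perf_counter() - start)

def stdlib_decode(wav_bytes):
    # Stand-in decoder so the harness runs as-is; the real benchmark
    # swaps in torchaudio.load or torchcodec.AudioDecoder here.
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.readframes(w.getnframes())

# One second of 16 kHz silence as the test payload.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)

its = bench(stdlib_decode, [buf.getvalue()])
print(f"{its:.1f} it/s")
```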
Hypothesis:
The gap is likely due to the overhead of initializing a full FFmpeg decoding context for every 1-second sample, a fixed cost that dominates when the payload itself is tiny. The legacy torchaudio.load presumably dispatched this simple PCM WAV decode to a much lighter-weight backend (such as libsndfile), which amortizes to almost nothing per sample.
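If this hypothesis is right, a pragmatic workaround (and perhaps something torchcodec could do internally) is to sniff the RIFF header and take a cheap PCM fast path, falling back to the full decoder only for compressed formats. A hedged stdlib-only sketch of that dispatch; `decode` and `is_riff_wav` are hypothetical names, and a real loader would use something like soundfile on the fast path rather than `wave`:

```python
import io
import wave

def is_riff_wav(data: bytes) -> bool:
    # Canonical WAV layout: bytes 0-3 == b"RIFF", bytes 8-11 == b"WAVE".
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"

def decode(data: bytes):
    if is_riff_wav(data):
        # Cheap path: no FFmpeg context, just a header parse + memcpy.
        with wave.open(io.BytesIO(data), "rb") as w:
            return w.readframes(w.getnframes()), w.getframerate()
    # Compressed formats would fall through to torchcodec here (omitted).
    raise NotImplementedError("non-WAV payloads go to the full decoder")

# Demo payload: one second of 8 kHz 16-bit mono PCM.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x01\x00" * 8000)

frames, sr = decode(buf.getvalue())
```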
Expected Behavior:
The modern, recommended API (torchcodec) should be at least as fast as, if not faster than, the legacy API for such a fundamental use case.