Description
🐛 Describe the bug
Hello torchcodec team!
Thank you for the latest 0.3 release and the long-awaited support for ✨audio decoding✨! So far, `AudioDecoder` works like a charm, but I have a few remarks that would make it even more pleasant to use:
- `AudioSamples` shape format
Currently, the audio data in `AudioSamples` is stored in a tensor of shape `[num_channels, num_samples]`, whereas most audio libraries (including `sounddevice` and `soundfile`) use a `[num_samples, num_channels]` shape. It's just a transposition, but it would be nice to match that convention!
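On the caller's side, the transposition is a one-liner. A minimal sketch, using a zero tensor as a stand-in for `AudioSamples.data`; the helper name `to_samples_first` is hypothetical and not part of torchcodec:

```python
import torch


def to_samples_first(data: torch.Tensor) -> torch.Tensor:
    """Convert [num_channels, num_samples] to [num_samples, num_channels].

    Hypothetical helper, not part of torchcodec.
    """
    if data.ndim != 2:
        raise ValueError(f"expected a 2-D tensor, got {data.ndim} dimensions")
    # transpose() returns a view; contiguous() copies it into row-major
    # memory so the result can be handed to other libraries as-is.
    return data.transpose(0, 1).contiguous()


# Stand-in for AudioSamples.data: 2 channels, 1 second at 48 kHz.
channels_first = torch.zeros(2, 48_000)
samples_first = to_samples_first(channels_first)
print(samples_first.shape)  # torch.Size([48000, 2])
```

The result has the `[num_samples, num_channels]` layout that `sounddevice` and `soundfile` work with, once converted to a NumPy array.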
- Output format when `start_seconds` and `stop_seconds` are identical
When `start_seconds` and `stop_seconds` are set to the same value, the audio data stored in `AudioSamples` "loses" the `num_channels` dimension, which is set to 0. This led to a number of issues on my side, and I think it would be nice to either raise the `Invalid start seconds` error when `start_seconds` equals `stop_seconds`, or keep the first dimension equal to the number of channels.
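Whichever behaviour is chosen, a caller-side guard avoids the zero-channel tensor in the meantime. A minimal sketch, assuming `AudioDecoder.get_samples_played_in_range(start_seconds, stop_seconds)` is the decoding call in use; the wrapper `get_samples_checked` and the `FakeDecoder` stand-in are hypothetical and only there so the example runs without an audio file:

```python
def get_samples_checked(decoder, start_seconds, stop_seconds):
    """Reject empty or inverted ranges before decoding.

    Hypothetical wrapper, not part of torchcodec. `decoder` is assumed to
    expose get_samples_played_in_range(start_seconds=..., stop_seconds=...).
    """
    if stop_seconds is not None and stop_seconds <= start_seconds:
        raise ValueError(
            f"empty range: start_seconds={start_seconds} must be strictly "
            f"less than stop_seconds={stop_seconds}"
        )
    return decoder.get_samples_played_in_range(
        start_seconds=start_seconds, stop_seconds=stop_seconds
    )


class FakeDecoder:
    """Stand-in decoder that simply echoes the requested range."""

    def get_samples_played_in_range(self, start_seconds=0.0, stop_seconds=None):
        return (start_seconds, stop_seconds)


decoder = FakeDecoder()
print(get_samples_checked(decoder, 1.0, 2.0))  # (1.0, 2.0)

try:
    get_samples_checked(decoder, 1.0, 1.0)
except ValueError as exc:
    print(f"rejected: {exc}")
```

With a real `AudioDecoder` passed in place of `FakeDecoder`, the wrapper fails fast on `start_seconds == stop_seconds` instead of returning a tensor whose first dimension is 0.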
Thank you again for your work on audio decoding, and for taking my remarks into account!
Versions
PyTorch version: 2.7.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: macOS 15.4.1 (arm64)
GCC version: Could not collect
Clang version: 16.0.0 (clang-1600.0.26.6)
CMake version: version 4.0.0
Libc version: N/A
Python version: 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:54:21) [Clang 16.0.6 ] (64-bit runtime)
Python platform: macOS-15.4.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Apple M4 Pro
Versions of relevant libraries:
[pip3] mypy-extensions==1.0.0
[pip3] numpy==2.2.4
[pip3] torch==2.7.0
[pip3] torchaudio==2.7.0
[pip3] torchcodec==0.3.0
[pip3] torchvision==0.22.0
[conda] libopenvino-pytorch-frontend 2024.4.0 h5833ebf_2 conda-forge
[conda] numpy 2.2.4 pypi_0 pypi
[conda] torch 2.7.0 pypi_0 pypi
[conda] torchaudio 2.7.0 pypi_0 pypi
[conda] torchcodec 0.3.0 pypi_0 pypi