torchaudio as an extension of PyTorch
torchaudio has been redesigned to be an extension of PyTorch and part of the domain APIs (DAPI) ecosystem. Domain-specific libraries such as this one are kept separate in order to maintain a coherent environment for each of them. As such, torchaudio is an ML library that provides relevant signal processing functionality, but it is not a general signal processing library. The full rationale for this new standardization can be found in the README.md.
In light of these changes, some transforms have been removed or have different argument names and conventions. See the section on breaking changes for a migration guide.
We would like to thank our contributors and the wider community for their significant contributions to this release. We are happy to see an active community around torchaudio and are eager to further grow and support it.
In particular, we'd like to thank @keunwoochoi, @ksanjeevan, and all the other maintainers and contributors of torchaudio-contrib for their significant and valuable additions around standardization and the support of complex numbers (#131, #110, keunwoochoi/torchaudio-contrib#61, keunwoochoi/torchaudio-contrib#36).
Kaldi Compliance Interface
An implementation of basic transforms with a Kaldi-like interface.
We added the functions spectrogram, fbank, and resample_waveform (#119, #127, and #134). For more details, see the documentation on torchaudio.compliance.kaldi, which mirrors the arguments and outputs of Kaldi features.
As an example, we can look at sinc interpolation resampling, which is similar to Kaldi’s implementation. In the figure below, the blue dots are the original signal and the red dots are the signal downsampled to half the original sampling frequency. The red dots fall approximately on every other original element.
    specgram = torchaudio.compliance.kaldi.spectrogram(waveform, frame_length=...)
    fbank = torchaudio.compliance.kaldi.fbank(waveform, num_mel_bins=...)
    resampled_waveform = torchaudio.compliance.kaldi.resample_waveform(waveform, orig_freq=...)
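To make the "every other element" observation concrete, here is a minimal dependency-free sketch of downsampling by a factor of two with a windowed-sinc low-pass filter. It is not the Kaldi or torchaudio implementation: the function name `downsample_by_two`, the Hann window, and the `half_width` parameter are assumptions chosen for illustration only.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def downsample_by_two(signal, half_width=8):
    """Low-pass with a Hann-windowed sinc, then keep every other sample.

    Illustrative sketch only; Kaldi's resampler handles arbitrary rate
    ratios and chooses its filter parameters differently.
    """
    cutoff = 0.25  # cycles per input sample, i.e. the Nyquist rate of the output
    offsets = range(-half_width, half_width + 1)
    taps = []
    for k in offsets:
        window = 0.5 * (1.0 + math.cos(math.pi * k / (half_width + 1)))
        taps.append(2.0 * cutoff * sinc(2.0 * cutoff * k) * window)
    gain = sum(taps)
    taps = [t / gain for t in taps]  # normalize to unit gain at DC

    out = []
    for n in range(0, len(signal), 2):  # evaluate the filter at every other sample
        acc = 0.0
        for k, tap in zip(offsets, taps):
            idx = n - k
            if 0 <= idx < len(signal):  # treat samples outside the signal as zero
                acc += tap * signal[idx]
        out.append(acc)
    return out


# A slowly varying sine: well below the new Nyquist rate, so little is filtered out.
signal = [math.sin(2 * math.pi * 0.02 * n) for n in range(200)]
downsampled = downsample_by_two(signal)
```

For a signal whose content lies well below the new Nyquist frequency, as here, the filter leaves it almost unchanged, which is why the downsampled points land close to every other original point away from the edges.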
Inverse short time Fourier transform
Constructing a signal from a spectrogram can be used in applications like source separation or to generate audio signals to listen to. More specifically, torchaudio.functional.istft is the inverse of torch.stft. It takes the same parameters, plus an additional optional length parameter, and returns the least-squares estimate of the original signal.
    import torch
    import torchaudio

    torch.manual_seed(0)
    n_fft = 5
    waveform = torch.rand(2, 5)
    # Forward transform, then invert it; length trims the result to the original size.
    stft = torch.stft(waveform, n_fft=n_fft)
    approx_waveform = torchaudio.functional.istft(stft, n_fft=n_fft, length=waveform.size(1))

    >>> waveform
    tensor([[0.4963, 0.7682, 0.0885, 0.1320, 0.3074],
            [0.6341, 0.4901, 0.8964, 0.4556, 0.6323]])
    >>> approx_waveform
    tensor([[0.4963, 0.7682, 0.0885, 0.1320, 0.3074],
            [0.6341, 0.4901, 0.8964, 0.4556, 0.6323]])
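The "least squares" part can be illustrated with a small pure-Python sketch of windowed overlap-add: each inverse-transformed frame is weighted by the window again, and the sum is divided by the summed squared window. This is a simplified sketch under stated assumptions, not the torchaudio code: the helper names, the periodic Hann window, and the lack of edge padding are all choices made for illustration.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann window (its first value is exactly zero)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n_fft) for i in range(n_fft)]


def dft(frame):
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def stft_sketch(signal, n_fft, hop):
    """Windowed DFT of each full frame; no padding at the edges."""
    win = hann(n_fft)
    return [dft([signal[start + i] * win[i] for i in range(n_fft)])
            for start in range(0, len(signal) - n_fft + 1, hop)]


def istft_sketch(frames, n_fft, hop):
    """Least-squares overlap-add: sum(w * frame) / sum(w ** 2) per sample."""
    win = hann(n_fft)
    length = n_fft + hop * (len(frames) - 1)
    num = [0.0] * length
    den = [0.0] * length
    for m, spec in enumerate(frames):
        frame = idft(spec)
        for i in range(n_fft):
            num[m * hop + i] += win[i] * frame[i]
            den[m * hop + i] += win[i] ** 2
    # Samples never covered by a non-zero window value cannot be recovered.
    return [n / d if d > 1e-11 else 0.0 for n, d in zip(num, den)]


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
recovered = istft_sketch(stft_sketch(signal, n_fft=8, hop=2), n_fft=8, hop=2)
```

In this sketch the very first sample is lost because the periodic Hann window is zero there and no padding is applied; every other sample is recovered up to floating-point error.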
Breaking Changes
- Removed Compose: Please use core abstractions such as nn.Sequential() or a for-loop over a list of transforms.
- [...] MEL have been removed. Please use [...]
- Removed formatting transforms (LC2CL and BLC2CBL): While the LC layout might be common in signal processing, support for it is out of scope of this library and transforms such as LC2CL only aid their proliferation. Please use transpose if you need this behavior.
- Removed Scale, PadTrim, and DownmixMono: Please use division in place of Scale, torch.nn.functional.pad/trim in place of PadTrim, and torch.mean on the channel dimension in place of DownmixMono.
- torchaudio.legacy has been removed. Please use [...]
- Spectrogram used to be of dimension (channel, time, freq) and is now (channel, freq, time). Similarly for MFCC, time is the last dimension. Please see our README for an explanation of the rationale behind these changes. Please use transpose to get the previous behavior.
- MuLawExpanding was renamed to MuLawDecoding as the inverse of MuLawEncoding.
- SpectrogramToDB was renamed to AmplitudeToDB (#170). The input does not necessarily have to be a spectrogram, so the transform can be used in many more cases, as the new name reflects.
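As a rough migration sketch, the replacements suggested above can be written with core PyTorch operations only. The tensor shapes, the scale factor of 2**15, and the helper names `pad_or_trim` and `Halve` are assumptions made for illustration; they are not part of torchaudio.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stereo clip: 2 channels, 8 samples, values in an int16-like range.
waveform = torch.arange(16, dtype=torch.float32).reshape(2, 8) * 1000.0

# Scale: plain division by the desired factor (2**15 is an assumed int16 range).
scaled = waveform / 2**15

# DownmixMono: mean over the channel dimension, keeping a (1, time) shape.
mono = torch.mean(scaled, dim=0, keepdim=True)


# PadTrim: zero-pad on the right, or slice, to reach a fixed number of samples.
def pad_or_trim(x: torch.Tensor, max_len: int) -> torch.Tensor:
    if x.size(-1) < max_len:
        return F.pad(x, (0, max_len - x.size(-1)))
    return x[..., :max_len]


# LC2CL-style layout changes: transpose between (channel, length) and (length, channel).
length_first = mono.transpose(0, 1)


# Chaining transforms: nn.Sequential over modules (a for-loop over callables also works).
class Halve(nn.Module):  # hypothetical stand-in for any transform module
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x / 2


pipeline = nn.Sequential(Halve(), Halve())
quartered = pipeline(mono)

# New (channel, freq, time) layout: transpose the last two dims for the old one.
specgram = torch.rand(1, 201, 50)      # assumed (channel, freq, time) shape
old_layout = specgram.transpose(1, 2)  # (channel, time, freq)
```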
New Features
- torchaudio.compliance.kaldi.spectrogram (#119)
- torchaudio.compliance.kaldi.fbank (#127)
- torchaudio.compliance.kaldi.resample_waveform (#134)
- torchaudio.functional.istft (#135)
- torchaudio.functional.complex_norm (#131)
- torchaudio.functional.angle (#131)
- torchaudio.functional.magphase (#131)
- torchaudio.functional.phase_vocoder (#131)
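For readers unfamiliar with the complex-number helpers, the sketch below shows what a norm, an angle, and a magnitude/phase split compute when a complex value is stored as a trailing dimension of size 2 (real, imaginary). The `_sketch` functions are illustrative stand-ins written with core PyTorch; the layout and the `power` argument are assumptions, so check the torchaudio.functional documentation for the actual signatures.

```python
import math

import torch


def complex_norm_sketch(complex_tensor: torch.Tensor, power: float = 1.0) -> torch.Tensor:
    """(real**2 + imag**2) ** (power / 2), reducing the trailing size-2 dimension."""
    return complex_tensor.pow(2.0).sum(-1).pow(0.5 * power)


def angle_sketch(complex_tensor: torch.Tensor) -> torch.Tensor:
    """Phase angle atan2(imag, real) of each complex value."""
    return torch.atan2(complex_tensor[..., 1], complex_tensor[..., 0])


def magphase_sketch(complex_tensor: torch.Tensor, power: float = 1.0):
    """Return the magnitude and phase together."""
    return complex_norm_sketch(complex_tensor, power), angle_sketch(complex_tensor)


# Two complex values stored as (real, imag) pairs: 3 + 4j and 0 + 1j.
z = torch.tensor([[3.0, 4.0], [0.0, 1.0]])
mag, phase = magphase_sketch(z)

# Magnitude and phase are enough to rebuild the real and imaginary parts.
rebuilt = torch.stack([mag * torch.cos(phase), mag * torch.sin(phase)], dim=-1)
```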
JIT and CUDA
- JIT support added to
- CUDA support added to
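As a hedged illustration of what JIT and CUDA support mean in practice, the sketch below scripts a small module with torch.jit.script and runs it on a GPU when one is available. `ToDecibels` is a hypothetical stand-in written for this example, not a torchaudio transform.

```python
import torch
import torch.nn as nn


class ToDecibels(nn.Module):
    """Hypothetical stand-in transform: 10 * log10 of a clamped power value."""

    def __init__(self, amin: float = 1e-10):
        super().__init__()
        self.amin = amin

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return 10.0 * torch.log10(torch.clamp(x, min=self.amin))


transform = ToDecibels()
scripted = torch.jit.script(transform)  # compile the module to TorchScript

# Fall back to the CPU when no GPU is present.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

power = torch.rand(1, 4, 6)
eager_out = transform(power)
scripted_out = scripted.to(device)(power.to(device)).cpu()
```

The scripted module is expected to produce the same values as the eager one, up to small floating-point differences between devices.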