torchaudio.compliance.kaldi.fbank does NOT support GPU #613

Closed
csukuangfj opened this issue May 6, 2020 · 9 comments · Fixed by #619

@csukuangfj
Collaborator

The readme.md says

The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the same philosophy of providing strong GPU acceleration,

However, strong GPU acceleration is not available for torchaudio.compliance.kaldi.fbank.


The following test code

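    import torch
    import torchaudio

    filename = '/path/to/test.wav'
    waveform, sample_rate = torchaudio.load(filename)
    device = torch.device('cuda', 0)
    waveform = waveform.to(device)  # move the input to the GPU

    # Fails in torchaudio 0.5.0 when the waveform is a CUDA tensor
    fbank = torchaudio.compliance.kaldi.fbank(waveform)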

produces the following error:

Traceback (most recent call last):
  File "./fbank_test.py", line 27, in <module>
    main()
  File "./fbank_test.py", line 23, in main
    fbank = torchaudio.compliance.kaldi.fbank(waveform)
  File "/xxxxxx/py35/lib/python3.5/site-packages/torchaudio/compliance/kaldi.py", line 554, in fbank
    snip_edges, raw_energy, energy_floor, dither, remove_dc_offset, preemphasis_coefficient)
  File "/xxxxx/py35/lib/python3.5/site-packages/torchaudio/compliance/kaldi.py", line 180, in _get_window
    signal_log_energy = _get_log_energy(strided_input, EPSILON, energy_floor)  # size (m)
  File "/xxxxx/py35/lib/python3.5/site-packages/torchaudio/compliance/kaldi.py", line 113, in _get_log_energy
    log_energy = torch.max(strided_input.pow(2).sum(1), epsilon).log()  # size (m)
RuntimeError: iter.device(arg).is_cuda() INTERNAL ASSERT FAILED at /pytorch/aten/src/ATen/native/cuda/Loops.cuh:56, please report a bug to PyTorch.

torchaudio was installed using

    pip install torchaudio

Printing the versions with

    print(torch.version.__version__)
    print(torch.version.git_version)
    print(torchaudio.version.__version__)
    print(torchaudio.version.git_version)

produces

1.5.0
4ff3872a2099993bf7e8c588f7182f3df777205b
0.5.0
3305d5c2f935b1512c58faad0aa770432dea6d21
@mthrok
Collaborator

mthrok commented May 7, 2020

I could reproduce it with 'test/assets/kaldi_file_8000.wav'.

Env
$ python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.5.0
Is debug build: No
CUDA used to build PyTorch: 9.2

OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: version 3.10.2

Python version: 3.5
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Quadro GP100
GPU 1: Quadro GP100

Nvidia driver version: 418.116.00
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy==1.14.2
[pip] torch==1.5.0
[pip] torchaudio==0.5.0a0+3305d5c
[conda] blas                      1.0                         mkl
[conda] mkl                       2020.0                      166
[conda] pytorch                   1.5.0           py3.5_cuda9.2.148_cudnn7.6.3_0    pytorch
[conda] torchaudio                0.5.0                      py35    pytorch

It happens both on CUDA 9.2 and CUDA 10.1.

It also happens on torchaudio master. Something is definitely wrong in either torchaudio or PyTorch. I will take a look at the torchaudio side.

@mthrok
Collaborator

mthrok commented May 8, 2020

Hi @csukuangfj

So it turned out that fbank was never GPU-compatible. Sorry about that.
I am working on fixing this issue and so far I have a fix for fbank.
If you have a list of functions from torchaudio.compliance.kaldi you intend to use, can you give me that?
I will also fix them.

@csukuangfj
Collaborator Author

csukuangfj commented May 8, 2020

I changed torchaudio locally to support GPU.

I only need torchaudio.compliance.kaldi.fbank. It would be great if

  • the window function and the melbank matrix are NOT recomputed every time fbank is called
  • an interface of computing fbank for a batch of audio files is available
  • a similar interface FBank as the following MFCC is implemented
    class MFCC(torch.nn.Module):

I guess during an application, the sample rate is fixed, so the window function and
melbank matrix can be reused.


> So it turned out that fbank was never GPU-compatible

Yes, I found that. For example, the EPSILON tensor is always a CPU tensor.
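
The device mismatch described above can be illustrated with a minimal sketch. It assumes the floor value is passed as a plain float and turned into a tensor on the input's device; `get_log_energy` is a hypothetical, simplified stand-in for the private `_get_log_energy` seen in the traceback, and this is not necessarily how the actual fix was written.

```python
import torch


def get_log_energy(strided_input: torch.Tensor, epsilon: float) -> torch.Tensor:
    """Log energy per frame, floored at epsilon (hypothetical simplified helper)."""
    # Create the floor on the same device and dtype as the input,
    # instead of reusing a module-level tensor that always lives on the CPU.
    eps = torch.tensor(epsilon, device=strided_input.device, dtype=strided_input.dtype)
    return torch.max(strided_input.pow(2).sum(1), eps).log()  # size (m)


# Works on the CPU and, when available, on the GPU.
device = torch.device("cuda", 0) if torch.cuda.is_available() else torch.device("cpu")
frames = torch.randn(5, 400, device=device)  # (m frames, frame length)
log_energy = get_log_energy(frames, torch.finfo(torch.float).eps)
```

The point of the sketch is only that every tensor taking part in `torch.max` lives on the same device as the input.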

@mthrok
Collaborator

mthrok commented May 8, 2020

> I only need torchaudio.compliance.kaldi.fbank. It would be great if
>
>   • the window function and the melbank matrix are NOT recomputed every time fbank is called
>   • an interface of computing fbank for a batch of audio files is available
>   • a similar interface FBank as the following MFCC is implemented
>     class MFCC(torch.nn.Module):
>
> I guess during an application, the sample rate is fixed, so the window function and
> melbank matrix can be reused.

That is a very valid point. I think a simple approach for torchaudio to support that is

  • have a version of fbank where user can provide precomputed melbank and window function
  • then put them in a Transform.
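
A rough sketch of what such a Transform could look like is given below. It is not an existing torchaudio API: the class name `CachedFbank`, its arguments, and the simplified processing (no dither, pre-emphasis, or DC-offset removal, so the output is not Kaldi-compatible) are all assumptions made for illustration. The window and the mel filterbank matrix are assumed to be computed once elsewhere and passed in.

```python
import torch


class CachedFbank(torch.nn.Module):
    """Hypothetical transform: log mel filterbank using precomputed tensors."""

    def __init__(self, window: torch.Tensor, mel_banks: torch.Tensor, frame_shift: int):
        super().__init__()
        # Buffers move together with the module on .to(device) / .cuda(),
        # so nothing is recomputed or left behind on the CPU.
        self.register_buffer("window", window)        # (frame_length,)
        self.register_buffer("mel_banks", mel_banks)  # (num_bins, n_fft // 2 + 1)
        self.frame_shift = frame_shift

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, num_samples) -> frames: (batch, num_frames, frame_length)
        frames = waveform.unfold(-1, self.window.numel(), self.frame_shift)
        n_fft = 2 * (self.mel_banks.size(-1) - 1)
        spectrum = torch.fft.rfft(frames * self.window, n=n_fft)
        power = spectrum.abs().pow(2)                 # (batch, num_frames, n_fft // 2 + 1)
        mel = power @ self.mel_banks.T                # (batch, num_frames, num_bins)
        return mel.clamp(min=torch.finfo(mel.dtype).eps).log()


# Usage with placeholder values: a 400-sample Hamming window, 160-sample shift,
# and a random matrix standing in for a real mel filterbank.
window = torch.hamming_window(400, periodic=False)
mel_banks = torch.rand(23, 257)
fbank = CachedFbank(window, mel_banks, frame_shift=160)
features = fbank(torch.randn(2, 16000))  # a batch of two one-second signals at 16 kHz
```

Because the cached tensors are registered as buffers, calling `fbank.to(device)` once is enough to run every later call on that device, and the leading batch dimension addresses the request for batched computation.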

@csukuangfj
Collaborator Author

> That is a very valid point. I think a simple approach for torchaudio to support that is
>
>   • have a version of fbank where user can provide precomputed melbank and window function
>   • then put them in a Transform.

Are there any plans to implement the items listed above?

@mthrok
Collaborator

mthrok commented May 13, 2020

> > That is a very valid point. I think a simple approach for torchaudio to support that is
> >
> >   • have a version of fbank where user can provide precomputed melbank and window function
> >   • then put them in a Transform.
>
> Are there any plans to implement the items listed above?

No, we do not have the resources for that at the moment, but we welcome contributions.

@shanesyy-1992

shanesyy-1992 commented Nov 22, 2020

> Hi @csukuangfj
>
> So it turned out that fbank was never GPU-compatible. Sorry about that.
> I am working on fixing this issue and so far I have a fix for fbank.
> If you have a list of functions from torchaudio.compliance.kaldi you intend to use, can you give me that?
> I will also fix them.

Hello, I ran into the same issue. Because of some environment constraints, I need to stick with torch 1.5.1, so the corresponding torchaudio version is 0.5.1. However, I found that only versions starting from 0.6.0 have the fix. Would it be possible to backport the changes here to the 0.5.1 release? Many thanks!

@mthrok
Collaborator

mthrok commented Nov 22, 2020

Hi @shanesyy-1992

Unfortunately, we do not have a plan to update the past releases. Since the code is pure Python, the best you can do is to create a patch file, then apply it to your installation.
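
Before applying a patch, it helps to know which file the installation actually uses. A small sketch for locating it is shown below; `installed_source_path` is a hypothetical helper built on the standard `importlib`, and the module name in the commented example is taken from the traceback earlier in this thread.

```python
import importlib.util


def installed_source_path(module_name: str) -> str:
    """Return the source file of an importable module (hypothetical helper)."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate source for {module_name!r}")
    return spec.origin


# Example (requires torchaudio to be installed):
# print(installed_source_path("torchaudio.compliance.kaldi"))
```

The printed path is the file a patch would need to target in that environment.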

@shanesyy-1992

Got it. Thanks a lot for the reply!

mthrok pushed a commit to mthrok/audio that referenced this issue Feb 26, 2021
`torch::jit::RegisterOperators` is being deprecated. Switch to using `torch::RegisterOperators`