Out of Memory on Synthesis #37

Closed
dyelax opened this issue Mar 13, 2018 · 14 comments

@dyelax
Contributor

dyelax commented Mar 13, 2018

When running python synthesis.py <model_checkpoint_path> <output_dir> --conditional <mel_path> , I consistently run out of GPU memory about 4 minutes into synthesis. I have a GTX 1080Ti (11GB memory), and when I watch nvidia-smi while synthesis is running, the memory usage continually increases until it runs out. How much GPU memory is generally required to synthesize a clip?

For reference, here is the progress on ljspeech-mel-00001.npy before it failed most recently:
33249/195328 [03:57<19:18, 139.90it/s]

@imdatceleste

imdatceleste commented Mar 14, 2018

@dyelax: are you getting the OOM from CUDA, or are you running out of main RAM? I can't see anything in r9y9's code that would run out of memory on the GPU, but there is a place where it could run out of main RAM if the audio to be generated is too long.

I have just generated an audio of 240,000 frames (yours is 195,328) and I had no problems with GTX 1080 Ti (11GB). BUT: I have 128GB of RAM...

Also: what sample rate are you using? You should not go beyond 22-24 kHz.

@dyelax
Contributor Author

dyelax commented Mar 14, 2018

It's definitely a CUDA memory issue. Here's the error I'm getting:

THCudaCheck FAIL file=/tmp/pip-yxt749na-build/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "synthesis.py", line 182, in <module>
    waveform = wavegen(model, length, c=c, g=speaker_id, initial_value=initial_value, fast=True)
  File "synthesis.py", line 124, in wavegen
    log_scale_min=hparams.log_scale_min)
  File "/workspace/wavenet_vocoder/wavenet_vocoder/wavenet.py", line 335, in incremental_forward
    x, h = f.incremental_forward(x, ct, gt)
  File "/workspace/wavenet_vocoder/wavenet_vocoder/modules.py", line 125, in incremental_forward
    return self._forward(x, c, g, True)
  File "/workspace/wavenet_vocoder/wavenet_vocoder/modules.py", line 143, in _forward
    x = self.conv.incremental_forward(x)
  File "/opt/conda/lib/python3.6/site-packages/deepvoice3_pytorch/conv.py", line 40, in incremental_forward
    self.input_buffer[:, :-1, :] = self.input_buffer[:, 1:, :].clone()
RuntimeError: cuda runtime error (2) : out of memory at /tmp/pip-yxt749na-build/aten/src/THC/generic/THCStorage.cu:58

I have 16GB RAM, but again, this definitely seems like a GPU memory problem, especially since I can see the GPU memory climbing and hitting the max in nvidia-smi. I'm using --preset presets/ljspeech_mixture.json, which should have a sample rate of 22050.

I'm running inside a Docker container (built off the PyTorch image), in case that could be an issue.

@imdatceleste

Hmm, we are using Python 3.5 and no Docker images. The problem is that .clone() actually copies data, so you might be hitting a limit. I never had the problem, but you are right, you might be running OOM in CUDA because the data is cloned too many times...
Try reducing the sample rate to 16000 or even 12000 Hz, pre-process again, and try with that.

Sorry that I can't help more...
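
For reference, one way to follow that suggestion is to write a copy of the preset with a lower sample rate before re-running preprocessing. This is only a minimal sketch; it assumes the preset JSON exposes a "sample_rate" field, so check the actual preset file for the exact key name:

import json

# Load the existing preset, lower the (assumed) "sample_rate" field, and
# save it under a new name so the original preset stays untouched.
with open("presets/ljspeech_mixture.json") as f:
    preset = json.load(f)

preset["sample_rate"] = 16000  # or 12000, as suggested above

with open("presets/ljspeech_mixture_16k.json", "w") as f:
    json.dump(preset, f, indent=2)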

@npuichigo

@dyelax What about synthesizing on the CPU? I think it may run faster.

@neverjoe

neverjoe commented Mar 15, 2018

I ran into this problem a few days ago and fixed it; I will send a PR later.
@dyelax @r9y9 @imdatsolak

@r9y9
Owner

r9y9 commented Mar 17, 2018

I'm not sure I could write PyTorch code that triggers an OOM without accessing low-level CUDA APIs. Isn't it a GPU driver bug or a PyTorch bug?

@r9y9
Owner

r9y9 commented Mar 17, 2018

I'm curious to see a fix by @neverjoe.

@aleksas
Contributor

aleksas commented Apr 16, 2018

@dyelax Also try restarting the server/computer, running the same command again, and seeing if the problem persists. Sometimes I run into similar problems after killing a training process, which seems to cause memory allocation issues later.

@azraelkuan
Contributor

azraelkuan commented Apr 18, 2018

I have checked the synthesis process in wavenet.py.

The x saved in the outputs list is a Variable, which causes memory to grow during the sampling process.
So we should change this to
outputs += [x.cpu().data.numpy()]
and change
current_input = outputs[-1]

to
current_input = Variable(torch.from_numpy(current_input))
if next(self.parameters()).is_cuda:
    current_input = current_input.cuda()
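
Put together, the suggested pattern looks roughly like the sketch below. It is not the exact diff against wavenet.py: the loop, the model call, and names such as incremental_sample and num_steps are simplified stand-ins, and it uses the Variable API of that PyTorch era. The point is that each step's output is detached and moved to the CPU before being stored, so the growing outputs list does not hold GPU memory:

import numpy as np
import torch
from torch.autograd import Variable

def incremental_sample(model, initial_input, num_steps):
    # Autoregressive loop that stores per-step outputs as CPU numpy arrays,
    # so the growing list does not keep GPU Variables (and their autograd
    # history) alive for every generated sample.
    outputs = []
    current_input = initial_input
    for _ in range(num_steps):
        x = model(current_input)           # one generation step (stand-in)
        outputs += [x.cpu().data.numpy()]  # detach + move off the GPU
        # Rebuild the next input from the stored array and move it back to
        # the GPU only for the single step that needs it.
        current_input = Variable(torch.from_numpy(outputs[-1]))
        if next(model.parameters()).is_cuda:
            current_input = current_input.cuda()
    return np.concatenate(outputs, axis=0)

# Toy usage: a linear layer stands in for one WaveNet incremental step.
model = torch.nn.Linear(4, 4)
waveform = incremental_sample(model, Variable(torch.zeros(1, 4)), num_steps=10)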

@butterl

butterl commented Apr 26, 2018

@azraelkuan did you hit this error when you changed to
current_input = Variable(torch.from_numpy(current_input))

Traceback (most recent call last):
  File "synthesis.py", line 187, in <module>
    waveform = wavegen(model, length, c=c, g=speaker_id, initial_value=initial_value, fast=True)
  File "synthesis.py", line 125, in wavegen
    log_scale_min=hparams.log_scale_min)
  File "D:\code\wavenet_vocoder-master\wavenet_vocoder\wavenet.py", line 326, in incremental_forward
    current_input = Variable(torch.from_numpy(current_input))
TypeError: expected np.ndarray (got Tensor)

@azraelkuan
Contributor

@butterl I guess you forgot to convert the tensor to numpy: https://github.com/azraelkuan/wavenet_vocoder/blob/828da55c4e5dd29f05413b4ec7b9afa04bfe39a3/wavenet_vocoder/wavenet.py#L359
You can compare your incremental_forward code with mine.
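
For anyone hitting the same TypeError: torch.from_numpy() only accepts numpy arrays, so it fails when current_input is still a Tensor (for example, the very first input before any output has been stored). A minimal, hypothetical helper that handles both cases might look like this (as_model_input is not a function from the repo):

import numpy as np
import torch
from torch.autograd import Variable

def as_model_input(x, model):
    # Accept either a numpy array (a stored CPU output) or a Tensor/Variable
    # (e.g. the initial input) and return a Variable on the model's device.
    # Passing a Tensor straight to torch.from_numpy() raises the
    # "expected np.ndarray (got Tensor)" error shown above.
    if isinstance(x, np.ndarray):
        x = torch.from_numpy(x)
    elif isinstance(x, Variable):
        x = x.data  # drop autograd history, keep the raw tensor
    x = Variable(x)
    return x.cuda() if next(model.parameters()).is_cuda else x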

@butterl

butterl commented Apr 27, 2018

@azraelkuan Thanks Kuan, I checked your repo; after merging all the related code, it's OK now 😄

But on my server, evaluation after the patch (50+ it/s) is much faster than before the patch (10-13 it/s). Any pointers on which modification causes the speed-up?

@r9y9
Owner

r9y9 commented Apr 27, 2018

@azraelkuan

The x saved in the outputs list is a Variable, which causes memory to grow during the sampling process.
So we should change this to
outputs += [x.cpu().data.numpy()]
and change

Sorry for chiming in late. I understand the memory usage increases in the sampling process, but I don't think it triggers an OOM unless you are trying to synthesize very long audio. I'm wondering whether the per-sample CPU<->GPU data transfer is inefficient, though I don't care about the speed so much since it's already super slow.
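
If that per-sample CPU<->GPU transfer turns out to matter, one possible alternative (a sketch only, not necessarily what #55 does) is to keep the detached outputs on the GPU and copy them to the CPU once at the end; storing x.data instead of x already avoids holding the autograd history that makes memory grow:

import torch
from torch.autograd import Variable

def incremental_sample_gpu(model, initial_input, num_steps):
    # Store only detached tensors (x.data) so no autograd graph accumulates,
    # keep them on the GPU, and do a single device-to-host copy at the end.
    outputs = []
    current_input = initial_input
    for _ in range(num_steps):
        x = model(current_input)
        outputs.append(x.data)            # detached, stays on the GPU
        current_input = Variable(x.data)  # feed back without graph history
    return torch.cat(outputs, dim=0).cpu().numpy()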

@r9y9
Owner

r9y9 commented Apr 27, 2018

This should be fixed by #55. Feel free to reopen if the issue persists.

@r9y9 r9y9 closed this as completed Apr 27, 2018