RuntimeError: CUDA out of memory #19
I also tried:

[...]

before running, and it has no effect.
The way I replaced the noisy/clean json files is that I created them manually. So instead of having something like this:

[...]

through Python, I created ones like so:

[...]

and for the clean files I would give each file the corresponding index (for the corresponding clean audio file):

[...]

etc., where the second number would be the index of the file. Not sure if that's what's responsible for the problem or not.
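For illustration (the paths and lengths here are made up; as the replies below point out, the second field in each entry is supposed to be the size of the file, not an index), the manually created entries would have looked something like:

```json
[
    ["dataset/train/clean/file_0.wav", 0],
    ["dataset/train/clean/file_1.wav", 1]
]
```

whereas the generated manifests store each file's length in that second field:

```json
[
    ["dataset/train/clean/file_0.wav", 221184],
    ["dataset/train/clean/file_1.wav", 190464]
]
```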
To be clear: in this same session, when I run with the debug noisy and clean json files and their dataset, everything works great. But when I swap in my own dataset and its corresponding clean and noisy json files, I get this error.
I'm also curious what the test set files look like; for example, for dns and valentini, those seem to be [...]
Hi @youssefavx, try reducing the batch size and see if the error goes away.
Thanks @adiyoss! I'll try and see.
So if I understand what you mean by 'batch size': I thought you were referring to the number of files. It turns out that I'm still getting this error when I run it on only 200 files. However, I discovered something. In the config file, when I change the `segment` value down to 2, the error goes away. What does this segment value refer to? Number of seconds? My examples are around 5 seconds; is it advisable to keep it at 2?
batch_size is the number of files you give to the model at each batch iteration, not the number of files in the train directory. Even if you reduce the number of files in the directory to 200, since batch_size equals 64 (in the config.yaml file), you'll get the error on the first batch.
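For example, using the same command-line override style shown elsewhere in this thread, a smaller batch would look something like this (the exact value that fits depends on the GPU):

```bash
!python3 train.py batch_size=4 demucs.hidden=64 sample_rate=16000
```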
Thank you so much for taking the time to explain! Much clearer now.
The second number should always be the size of the file; you are going to get issues otherwise! It should be generated automatically, like in https://github.com/facebookresearch/denoiser/blob/master/make_debug.sh#L13:
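From that script, the manifest generation looks roughly like this (directory paths as in the debug example):

```bash
python -m denoiser.audio dataset/debug/noisy > egs/debug/tr/noisy.json
python -m denoiser.audio dataset/debug/clean > egs/debug/tr/clean.json
```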
Then you will indeed need to reduce the batch size. Don't forget to set the `sample_rate` as well. Good luck!
@adefossez Thank you so much! You guys are awesome. I'm gonna try this next. I actually did get the generation of the noisy and clean json files to work. Now I'm curious what [...]
By default, demucs upsamples the audio by a factor of 4 before feeding it to the model, so effectively our model handles audio at 64kHz. But because you have audio at 44kHz, it's not a great idea to upsample 4 times, as it will become very expensive.
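In other words: 16kHz × 4 = 64kHz internally, while 44.1kHz × 4 = 176.4kHz, which is what makes it so expensive. A sketch of lowering the factor, assuming the upsampling factor is exposed as `demucs.resample` in the config alongside the other `demucs.*` parameters:

```bash
!python3 train.py demucs.resample=2 sample_rate=44100
```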
Thank you! Makes sense now.
Hey guys, in trying to turn the first 'hello world' into training/fine-tuning this model, I basically replaced the debug files `noisy.json` and `clean.json` with my own json file content pointing to my own dataset. The dataset contains around 2.5K files and is around 1GB at 44kHz, and smaller at 16kHz, as expected.

The problem is that when trying to run this on Colab (which worked with the original toy dataset provided), I'm now getting this unexpected error:

[...]
I have different versions of my dataset: 16kHz, 22kHz, 32kHz, and 44.1kHz. Every time I try one, I get a variant of the same error above. For example, when I try 44.1kHz:

```bash
!python3 train.py demucs.hidden=64 sample_rate=44100
```

I get:

[...]

Whereas when I try 16kHz, I get:

[...]
The 'memory tried to allocate' figure just varies, and the free memory is always just underneath the 'available' memory. The fact that the 'free memory' varies when I change the dataset (the versions have different sizes) makes me think it's something entirely different from CUDA being out of memory, though I could be wrong.
The version of PyTorch I'm running is 1.4.0, with 0.4.0 for torchaudio, because otherwise I get an error saying the CUDA driver is out of date.
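For reference, pinning those versions in a pip-based Colab environment would look something like:

```bash
!pip install torch==1.4.0 torchaudio==0.4.0
```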
I get this error both when I try to train and when I try to fine-tune.

Am I doing something wrong in setting all this up? Should I be arranging my files differently than the debug ones? I tried to place everything in its correct directory and point everything to its correct directory.
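For reference, mirroring the debug layout from make_debug.sh, a custom setup would look something like this (the `mydataset` name is hypothetical):

```
dataset/mydataset/noisy/*.wav
dataset/mydataset/clean/*.wav
egs/mydataset/tr/noisy.json
egs/mydataset/tr/clean.json
```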