Commit 5b093d3
Showing 1 changed file with 2 additions and 0 deletions.
gwern commented on commit 5b093d3:
Does this fix the RAM-consumption bug where you can only use up to ~40% of GPU RAM, because memory consumption more than doubles at the first epoch and then stays at ~90% thereafter? I had been meaning to ask about that, since it seriously gets in the way of adding more layers, adding more filters, using bigger inputs, or running multiple DCGANs with different settings; but if you've already fixed it, great!
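For reference, the free-vs-total GPU memory usage gwern describes can be checked from Torch itself via cutorch's memory query; a minimal sketch, assuming cutorch is installed:

```lua
-- Minimal sketch: query free vs. total GPU memory from Torch, to watch
-- the epoch-1 spike described above. Assumes cutorch is installed.
require 'cutorch'

local free, total = cutorch.getMemoryUsage(cutorch.getDevice())
print(string.format('GPU memory: %.0f MB free of %.0f MB (%.0f%% used)',
                    free / 2^20, total / 2^20, 100 * (1 - free / total)))
```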
Maintainer:
@gwern it should, afaik.
gwern:
Mm, I guess not. I set up a new run (a 256x256px version, to take a look now that I thought dcgan.torch might run in constant memory), tuned to leave ~100MB free on the GPU (balancing a DCGAN that big isn't too hard; you just need to lower the learning rate and mini-batch size, and keep the usual 2:1 gen:dis ratio), but no, it errored out at epoch 1, which suggests the RAM consumption isn't fixed.
I'm not sure what's going on here... I've tried out several Torch-based applications, and dcgan.torch is the only one with such enormous spikes in RAM usage at checkpoints; e.g., char-rnn and neuraltalk2 can both be pushed to <50MB free without crashing at checkpoints.
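The knobs mentioned above correspond to fields of the opt table in dcgan.torch's main.lua. A hedged sketch of such a tuning follows; the field names mirror main.lua, but the values are illustrative, not gwern's actual settings:

```lua
-- Illustrative tuning for a 256x256 DCGAN that must fit in limited GPU
-- memory. Field names follow dcgan.torch's main.lua opt table; the
-- values are assumptions, not the settings gwern actually used.
opt = {
   loadSize  = 288,     -- load images a bit larger than the crop
   fineSize  = 256,     -- train at 256x256 instead of the default 64x64
   batchSize = 16,      -- smaller mini-batches to cut peak memory
   lr        = 0.0001,  -- lower learning rate for stability at this scale
   ngf       = 64,      -- generator filters
   ndf       = 32,      -- discriminator filters: roughly the 2:1 gen:dis
                        -- capacity ratio mentioned above (one reading of it)
}
```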
Maintainer:
@gwern OK, I am simulating a ~100MB-free-memory setting on my side, and will fix it now.
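One simple way to simulate that setting is to pre-allocate a throwaway tensor that occupies all but ~100MB of the card; whether that is what was done here is an assumption. A minimal sketch:

```lua
-- Minimal sketch: reserve GPU memory so that only ~100MB stays free,
-- simulating a nearly-full card. This is an assumed approach, not
-- necessarily how the maintainer actually tested it.
require 'cutorch'

local free = cutorch.getMemoryUsage(cutorch.getDevice())
local keepFreeMB = 100
local hogBytes = free - keepFreeMB * 2^20
-- CudaTensor elements are 4-byte floats
local hog = torch.CudaTensor(math.floor(hogBytes / 4))
print(string.format('reserved %.0f MB, leaving ~%d MB free',
                    hogBytes / 2^20, keepFreeMB))
```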
Maintainer:
Removed the spiking issues via 29b8dbc.
You need an updated cutorch (luarocks install cutorch) and the latest dcgan.torch; after that you should be all set. Training took 400MB of GPU memory (with anything less available, even 380MB, it did not work), and it never exceeds 400MB, even after checkpointing.
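A quick way to confirm that bound on your own run is to compare free memory just before and after a checkpoint is written; a minimal sketch, where netG and the checkpoint path are hypothetical stand-ins for the training loop's actual network and filename:

```lua
-- Minimal sketch: check that writing a checkpoint does not spike GPU
-- memory. 'netG' and the path below are hypothetical stand-ins.
require 'cutorch'

local function freeMB()
   local free = cutorch.getMemoryUsage(cutorch.getDevice())
   return free / 2^20
end

local before = freeMB()
torch.save('checkpoints/netG_epoch1.t7', netG)
collectgarbage(); collectgarbage()  -- release Lua-side temporaries
print(string.format('free before save: %.0f MB, after: %.0f MB',
                    before, freeMB()))
```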
gwern:
Great. I will give that a try and let you know if that fixes it.