Discussion on convergence and memory requirements using ResNet #84
Hi, I found a description on the dede website saying that resnet_50 needs 6GB+ of memory to run. Is that a hard requirement ("MUST") or just a recommendation ("PREFER")? Thanks.

Comments
Thanks for bringing this up. I believe that this is true for training, but not for prediction, so I'll get the info on the website corrected accordingly. At the moment, doing a quick image classification test on a single-image prediction task, the … If you wish to train a model, there's some more info regarding resnets in #60.
Hi @beniz and @anguoyang, if the network architecture is fixed (resnet_50, resnet_101, ...) then batch_size is the important variable that determines whether the algorithm can run on a GPU with a given amount of RAM. For quite a few prediction setups the batch size can even be set as small as 1, reducing the necessary GPU RAM to relatively small amounts, as pointed out by Emmanuel.

Training a network is another matter. However, in my experience this also depends very much on the task. For example, I have been able to finetune resnet_50, resnet_101 and even resnet_152 on certain classification tasks with batch sizes as low as 4 or 8 on a single GPU. According to nvidia-smi, that requires less than 4GB of GPU RAM for resnet_50 (batch_size=8), less than 6GB for resnet_101 (batch_size=8) and a little more than 5GB for resnet_152 (batch_size=4). Classification error was low throughout those experiments, but of course my task was much, much simpler than training ImageNet from scratch, which I do not think is possible that way.

All I would like to point out is that, depending on the complexity of the task (for finetuning I sometimes set the learning rate of all lower-level layers to small values or even zero), relatively small batch sizes can be enough, allowing transfer learning / finetuning of the really ultra-deep nets on moderately large (single) GPUs. As a consequence, even with limited GPU resources (such as a single 4GB or 6GB GPU) it is sometimes possible to use the high-quality ultra-deep nets to learn interesting tasks and later publish them for prediction on moderately priced Amazon 4GB GPUs.
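For readers who want to try this kind of low-memory finetuning with DeepDetect, the sketch below shows roughly what the two API calls could look like. It is a minimal illustration only, not something confirmed in this thread: the host, service name, model repository path, class count, solver settings and data paths are placeholders, and the field names follow the public DeepDetect documentation for the Caffe backend, so adjust them to your own setup.

```python
# Hypothetical sketch: finetuning resnet_50 through the DeepDetect JSON API
# with a small batch size so that it fits on a ~4GB GPU. Host, paths and
# hyperparameters below are assumptions, not values from this thread.
import requests

DD_HOST = "http://localhost:8080"      # assumed DeepDetect server location
SERVICE = "resnet50_finetune"          # assumed service name

# 1. Create a supervised image service from the resnet_50 template,
#    starting from pretrained weights placed in the model repository.
service_def = {
    "mllib": "caffe",
    "description": "ResNet-50 finetuning on a small GPU",
    "type": "supervised",
    "parameters": {
        "input": {"connector": "image", "width": 224, "height": 224},
        "mllib": {
            "template": "resnet_50",
            "nclasses": 10,                        # assumed number of classes
            "finetuning": True,
            "weights": "ResNet-50-model.caffemodel",
            "gpu": True,
        },
    },
    "model": {"repository": "/path/to/model_repo"},  # assumed path
}
requests.put(f"{DD_HOST}/services/{SERVICE}", json=service_def).raise_for_status()

# 2. Train with batch_size=8, which in the experiment described above kept
#    resnet_50 under ~4GB of GPU RAM (your mileage may vary).
train_call = {
    "service": SERVICE,
    "async": True,
    "parameters": {
        "mllib": {
            "gpu": True,
            "net": {"batch_size": 8},
            "solver": {"iterations": 20000, "base_lr": 0.001},
        },
        "input": {"width": 224, "height": 224},
        "output": {"measure": ["acc", "mcll"]},
    },
    "data": ["/path/to/train_data"],               # assumed training folder
}
requests.post(f"{DD_HOST}/train", json=train_call).raise_for_status()
```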
Thanks for all the detailed info. For the record, some people have reported difficulty with convergence; see KaimingHe/deep-residual-networks#6
Hi @beniz, I just found your quick analysis of memory usage. Does GPU memory usage depend on the input image size? For example, when I use resnet_50 and resize the input image to around 1200 x 4000, an out-of-memory error occurs, but when I downsize the image to around 900 x 3000, it works. Could you provide another quick analysis of the relationship between image size and memory (with the batch size fixed to a small constant)?
The ResNets are fully convolutional, i.e. any size above the initial 224x224 training size works. Of course the memory requirement increases with size; I'd expect a quadratic increase, or even more, due to the larger feature maps.
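A rough way to see why 1200x4000 overflows while 900x3000 fits is to compare the number of spatial positions at each resolution, since in a fully convolutional net activation memory scales, to first order, with the feature-map area. The snippet below is only a back-of-envelope sketch under that assumption; it ignores weights, cuDNN workspace and framework overhead.

```python
# Back-of-envelope estimate of how activation memory scales with input size
# for a fully convolutional network such as ResNet-50. Assumes activation
# memory grows linearly with the number of spatial positions; everything
# else (weights, workspace, overhead) is ignored.

def relative_activation_memory(height, width, base=224):
    """Activation memory relative to a base x base input (to first order)."""
    return (height * width) / (base * base)

for h, w in [(224, 224), (900, 3000), (1200, 4000)]:
    factor = relative_activation_memory(h, w)
    print(f"{h}x{w}: ~{factor:.0f}x the activations of a 224x224 input")

# Prints roughly:
#   224x224:   ~1x
#   900x3000:  ~54x
#   1200x4000: ~96x
# Going from 900x3000 to 1200x4000 therefore nearly doubles activation
# memory, which is consistent with the out-of-memory error reported above.
```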
Thank you.
@beniz if the input image size is larger than 224x224, the number of units output by the flatten layer increases. I think that's why the memory requirement increases with size.
I trained faster-rcnn-resnet50 successfully, but when I use the trained model for prediction on the same machine, it fails with "Check failed: out of memory". Does anyone know why?