Densenet121: Training with batch size 256 will encounter CUDA OOM on CI's Tesla T4 #598
Comments
This was referenced Nov 30, 2021
Hi @fmassa, author of the memory-efficient densenet121 in torchvision. Do you know if anyone would be interested in taking a look at implementing this? For reference, the memory-efficient PR: pytorch/vision#1003
facebook-github-bot pushed a commit that referenced this issue on Mar 10, 2022:
…city (#781)

Summary: When a test is flagged as "NotImplemented", there are actually two cases:

1. The test itself doesn't implement or handle the configs, e.g., unsupervised-learning models like pytorch_struct don't have `eval()` tests, and the pyhpc models don't have `train()` tests.
2. The test doesn't support running on our T4 CI GPU machine, but it runs totally fine on other GPUs, such as `V100` or `A100`.

This PR eliminates the second case, so that the test can still run through the `run.py` or `run_sweep.py` interfaces. Instead, we flag the test as `not_implemented` in the `metadata.yaml`, and the CI scripts `test.py` and `test_bench.py` will read the metadata and determine that the test is not suitable to run on the CI machine.

This fixes #688, #626, and #598

Pull Request resolved: #781
Reviewed By: aaronenyeshi
Differential Revision: D34786277
Pulled By: xuzhao9
fbshipit-source-id: d5d3d884839345f4fcad21ccf541a02d8e705f5f
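The metadata mechanism described in the commit message might look roughly like the sketch below. The field names here are illustrative assumptions about the schema, not copied from the repository:

```yaml
# Hypothetical sketch of a per-model metadata.yaml entry.
# Field names are assumptions for illustration, not the actual schema.
not_implemented:
  # Mark train as not implemented on the CI's cuda device (T4, 16 GB),
  # while still allowing it via run.py / run_sweep.py on V100 / A100.
  - device: cuda
    test: train
```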
Fixed by #781
In the original paper, densenet121 is trained on the ImageNet dataset with a batch size of 256, on TitanX GPUs with 12 GB of device memory. However, training runs out of memory on the CI's Tesla T4, which has 16 GB of device memory.
Paper: https://arxiv.org/pdf/1608.06993.pdf
In the original repo, DenseNet has an `-optMemory 4` option, which significantly reduces the memory footprint and makes training on a TitanX possible. TorchVision's version of densenet121 has a `memory_efficient` option, but it still fails to run on the T4 (16 GB memory).
Repo: https://github.com/liuzhuang13/DenseNet#memory-efficient-implementation-newly-added-feature-on-june-6-2017
Densenet121 Torchvision: https://github.com/pytorch/vision/blob/d367a01a18a3ae6bee13d8be3b63fd6a581ea46f/torchvision/models/densenet.py#L162
Densenet121 train/example on T4 fails with CUDA OOM:
Torchvision authors may want to implement the `-optMemory 4` optimization to allow training on a single device.