Environment memory usage problem #21

Closed

chiangandy opened this issue Aug 14, 2019 · 4 comments

chiangandy commented Aug 14, 2019

I followed the setup and configured training with small.json, but running on 4 GPUs with 8 GB each still fails with an out-of-memory error:

  File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pytorch_transformers-1.0.0-py3.6.egg/pytorch_transformers/modeling_gpt2.py", line 100, in gelu
    return 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3))))
RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 7.44 GiB total capacity; 6.88 GiB already allocated; 21.50 MiB free; 128.30 MiB cached)

I ran nvidia-smi:
(pytorch_p36) ubuntu@ip-172-31-38-29:~/GPT2$ nvidia-smi
Wed Aug 14 06:03:59 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 00000000:00:1B.0 Off |                    0 |
| N/A   34C    P8    23W / 150W |      0MiB /  7618MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           On   | 00000000:00:1C.0 Off |                    0 |
| N/A   39C    P8    22W / 150W |      0MiB /  7618MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           On   | 00000000:00:1D.0 Off |                    0 |
| N/A   36C    P8    22W / 150W |      0MiB /  7618MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   39C    P8    22W / 150W |      0MiB /  7618MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Each card clearly has 8 GB of GPU memory, so why does it look like only 8 GB in total is being used for allocation?

The command I ran is:
python3 train.py --raw --device="0,1,2,3"

What could be causing this?
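
Note: one likely explanation, assuming train.py wraps the model in torch.nn.DataParallel (the traceback does not confirm this), is that data parallelism does not pool memory across cards. Each GPU holds a full copy of the model and processes only its slice of the batch, so the limit stays at the 7.44 GiB of a single card. A rough pure-Python sketch of how a batch would be divided; `split_batch` is a hypothetical helper, not part of the project:

```python
def split_batch(batch_size, num_gpus):
    """Return per-GPU sample counts for an even, chunked split of one batch.

    A count of 0 means that GPU would receive no samples for this batch.
    """
    chunk = -(-batch_size // num_gpus)  # ceiling division
    sizes = []
    remaining = batch_size
    for _ in range(num_gpus):
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes

print(split_batch(8, 4))  # [2, 2, 2, 2]
print(split_batch(4, 4))  # [1, 1, 1, 1]
```

Under that assumption, halving batch_size halves the activations each card must hold, while the per-card cost of the model weights stays the same.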

Owner

Morizeyao commented Aug 14, 2019

Reduce the parameters in config_small.json further until the model fits. They are still too large right now.
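
To get a feel for how much those parameters matter, the weight count of a GPT-2 style model can be estimated from four config values. The sketch below is only an approximation of the fixed memory cost: the values used are the original GPT-2 small settings, not the contents of config_small.json, and the 16 bytes per parameter figure assumes fp32 weights, gradients, and an Adam-style optimizer. Activations, which scale with batch size and context length, come on top.

```python
def gpt2_param_count(vocab_size, n_ctx, n_embd, n_layer):
    """Count parameters of a GPT-2 style model with tied input/output embeddings."""
    d = n_embd
    embeddings = vocab_size * d + n_ctx * d          # token + position embeddings
    attention = (d * 3 * d + 3 * d) + (d * d + d)    # QKV projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)      # two feed-forward layers
    layer_norms = 2 * 2 * d                          # two LayerNorms per block
    per_block = attention + mlp + layer_norms
    final_norm = 2 * d
    return embeddings + n_layer * per_block + final_norm

# Illustrative values (original GPT-2 small), NOT read from config_small.json.
params = gpt2_param_count(vocab_size=50257, n_ctx=1024, n_embd=768, n_layer=12)
# Assumed 16 bytes per parameter: fp32 weights, gradients, and two Adam moments.
print(params, round(params * 16 / 2**30, 2), "GiB before activations")
```

Because the per-block term grows with the square of n_embd, lowering n_embd or n_layer shrinks the model much faster than trimming the vocabulary.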

Author

chiangandy commented Aug 14, 2019

Setting batch_size to 4 (it was 8 before) got it running. Does batch_size affect the output?

Current usage is as follows:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M60           On   | 00000000:00:1B.0 Off |                    0 |
| N/A   61C    P0   128W / 150W |   5633MiB /  7618MiB |     89%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           On   | 00000000:00:1C.0 Off |                    0 |
| N/A   74C    P0   123W / 150W |   3857MiB /  7618MiB |     32%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           On   | 00000000:00:1D.0 Off |                    0 |
| N/A   65C    P0   126W / 150W |   3857MiB /  7618MiB |     50%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   73C    P0   121W / 150W |   3857MiB /  7618MiB |     69%      Default |
+-------------------------------+----------------------+----------------------+
(With batch_size set to 4, training runs.)
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      4291      C   python3                                     5622MiB |
|    1      4291      C   python3                                     3846MiB |
|    2      4291      C   python3                                     3846MiB |
|    3      4291      C   python3                                     3846MiB |
+-----------------------------------------------------------------------------+
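
To quantify the headroom in this output, the memory column can be parsed with a few lines of Python. This is a sketch: the sample rows are taken from the report above, and `parse_memory` is a hypothetical helper name. GPU 0 using more memory than the others would be consistent with outputs and loss being gathered on the first device under DataParallel, though the thread does not confirm that setup.

```python
import re

SAMPLE = """\
| N/A 61C P0 128W / 150W | 5633MiB / 7618MiB | 89% Default |
| N/A 74C P0 123W / 150W | 3857MiB / 7618MiB | 32% Default |
| N/A 65C P0 126W / 150W | 3857MiB / 7618MiB | 50% Default |
| N/A 73C P0 121W / 150W | 3857MiB / 7618MiB | 69% Default |
"""

def parse_memory(text):
    """Return a list of (used_mib, total_mib) tuples, one per GPU row."""
    pattern = re.compile(r"(\d+)MiB\s*/\s*(\d+)MiB")
    return [(int(used), int(total)) for used, total in pattern.findall(text)]

for gpu, (used, total) in enumerate(parse_memory(SAMPLE)):
    free = total - used
    print(f"GPU {gpu}: {used} MiB used, {free} MiB free ({used / total:.0%} of capacity)")
```

The first card is the one to watch when tuning batch_size, since it hits the limit before the other three.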

Owner

Morizeyao commented Aug 14, 2019

It will have a small effect, but given your hardware, getting it to run at all is already a good result. Don't expect too much.
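
A common way to limit that effect, not discussed in this thread, is gradient accumulation: run two micro-batches of 4 and average their gradients before a single optimizer step, which for a mean loss matches the gradient of one batch of 8. The toy example below checks that arithmetic for a one-parameter linear model in plain Python. The data and function names are made up for illustration, and whether train.py exposes such an option is not established here.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

# Made-up data: 8 samples, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8, 12.1, 14.0, 15.9]
w = 0.5

full = grad_mse(w, xs, ys)  # gradient from one batch of 8

# Two micro-batches of 4, each gradient scaled by 1/2 and summed before the update.
accum = 0.0
for start in (0, 4):
    accum += grad_mse(w, xs[start:start + 4], ys[start:start + 4]) / 2

print(abs(full - accum) < 1e-9)
```

The match is for the gradient of a single step only. Training with a smaller batch and no accumulation takes more, noisier steps per epoch, which is where the small difference in results comes from.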

Author

chiangandy commented Aug 14, 2019

Understood, thanks for the guidance!

chiangandy changed the title ~~Memory usage problem~~ Environment memory usage problem

Morizeyao closed this as completed
