
Environment memory usage issue #21

Closed
chiangandy opened this issue Aug 14, 2019 · 4 comments

Comments

@chiangandy

I configured training with small.json as instructed, but running on 4 GPUs with 8 GB each still runs out of memory:

File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/pytorch_transformers-1.0.0-py3.6.egg/pytorch_transformers/modeling_gpt2.py", line 100, in gelu
return 0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3))))
RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 7.44 GiB total capacity; 6.88 GiB already allocated; 21.50 MiB free; 128.30 MiB cached)

I ran nvidia-smi:
(pytorch_p36) ubuntu@ip-172-31-38-29:~/GPT2$ nvidia-smi
Wed Aug 14 06:03:59 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 On | 00000000:00:1B.0 Off | 0 |
| N/A 34C P8 23W / 150W | 0MiB / 7618MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 On | 00000000:00:1C.0 Off | 0 |
| N/A 39C P8 22W / 150W | 0MiB / 7618MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M60 On | 00000000:00:1D.0 Off | 0 |
| N/A 36C P8 22W / 150W | 0MiB / 7618MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M60 On | 00000000:00:1E.0 Off | 0 |
| N/A 39C P8 22W / 150W | 0MiB / 7618MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

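I've confirmed each GPU has 8 GB of VRAM, so why does it look like the configuration only gets 8 GB to work with instead of all four cards combined?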

The command I ran:
python3 train.py --raw --device="0,1,2,3"

What could be causing this?
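Some context for the question above, assuming train.py wraps the model in torch.nn.DataParallel (the single python3 PID shared by all four GPUs in the later nvidia-smi listing is consistent with that): the four cards are not pooled into 32 GB. Each GPU holds a full copy of the model and only the batch is split along its first dimension, so every card has to fit the whole model plus the activations for its own slice. The sketch below is plain Python arithmetic, not PyTorch code; `dataparallel_split` and `mlp_activation_bytes` are hypothetical helpers, the split rule mimics torch.chunk-style scattering, and the config sizes are assumed values.

```python
import math

def dataparallel_split(batch_size, n_gpus):
    """Samples each GPU receives when a batch is chunked along dim 0.

    Mimics torch.chunk-style splitting: each chunk holds ceil(batch / n_gpus)
    samples except possibly the last, so some GPUs may receive nothing.
    """
    chunk = math.ceil(batch_size / n_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= chunk
    return sizes

def mlp_activation_bytes(samples, n_ctx, n_embd, bytes_per_float=4):
    """Size of one fp32 (samples, n_ctx, 4 * n_embd) tensor: the GPT-2 MLP
    hidden layer that gelu() is applied to. The tanh-based gelu builds several
    intermediates of this shape, and autograd keeps some for the backward pass."""
    return samples * n_ctx * 4 * n_embd * bytes_per_float

# Hypothetical sizes: batch_size=8 on 4 GPUs, n_ctx=1024, n_embd=768.
per_gpu = dataparallel_split(8, 4)
print(per_gpu)                                              # [2, 2, 2, 2]
print(mlp_activation_bytes(per_gpu[0], 1024, 768) / 2**20)  # 24.0 (MiB)
```

With these assumed values one such tensor is exactly 24 MiB, the same size as the allocation that failed in the traceback; that is consistent with, not proof of, those config values. Lowering batch_size shrinks each GPU's slice and its activations, but the per-GPU cost of the model copy stays the same.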

@Morizeyao
Owner

Keep reducing the parameters in config_small.json until the model fits; it is still too large for your GPUs.
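A rough sketch of how the size-related fields in config_small.json map to model size, assuming the standard GPT-2 architecture with the LM head tied to the token embedding. The argument names mirror common GPT-2 config keys (vocab_size, n_ctx, n_embd, n_layer) and the helper names are hypothetical; the 16 bytes per parameter is an assumption for fp32 training with an Adam-style optimizer and leaves out activations.

```python
def gpt2_param_count(vocab_size, n_ctx, n_embd, n_layer):
    """Parameters in a GPT-2-style model whose LM head shares the token embedding."""
    embeddings = vocab_size * n_embd + n_ctx * n_embd  # token + position embeddings
    # Per block: qkv (3d^2 + 3d) + attn proj (d^2 + d) + MLP (8d^2 + 5d) + 2 LayerNorms (4d)
    per_layer = 12 * n_embd ** 2 + 13 * n_embd
    final_ln = 2 * n_embd
    return embeddings + n_layer * per_layer + final_ln

def model_state_bytes(n_params, bytes_per_param=16):
    """fp32 weights + grads + Adam's two moment buffers = 4 * 4 bytes per parameter."""
    return n_params * bytes_per_param

# The released 124M GPT-2 sizes, used only as a reference point:
base = gpt2_param_count(50257, 1024, 768, 12)
print(base)  # 124439808
```

The per-layer term grows with the square of n_embd and linearly with n_layer, so those two fields shrink the model fastest; the embedding term scales with vocab_size and can dominate a small model.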

@chiangandy
Author

Setting batch_size to 4 (it was originally 8) gets it running. Does batch_size affect the results?

Current GPU usage:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 On | 00000000:00:1B.0 Off | 0 |
| N/A 61C P0 128W / 150W | 5633MiB / 7618MiB | 89% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M60 On | 00000000:00:1C.0 Off | 0 |
| N/A 74C P0 123W / 150W | 3857MiB / 7618MiB | 32% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M60 On | 00000000:00:1D.0 Off | 0 |
| N/A 65C P0 126W / 150W | 3857MiB / 7618MiB | 50% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M60 On | 00000000:00:1E.0 Off | 0 |
| N/A 73C P0 121W / 150W | 3857MiB / 7618MiB | 69% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 4291 C python3 5622MiB |
| 1 4291 C python3 3846MiB |
| 2 4291 C python3 3846MiB |
| 3 4291 C python3 3846MiB |
+-----------------------------------------------------------------------------+
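The listing above shows GPU 0 using about 1.7 GiB more than the other three (5622 MiB vs 3846 MiB). Assuming nn.DataParallel and an Adam-style optimizer, one plausible contributor is that the original parameters, their accumulated gradients, and the optimizer's moment buffers all live on the first device, while the other GPUs only receive a fresh copy of the weights for each forward pass; outputs are also gathered onto GPU 0. This sketch models only the persistent fp32 model state and ignores activations and gathered outputs; `dataparallel_model_bytes` is a hypothetical helper and the byte counts are assumptions.

```python
def dataparallel_model_bytes(n_params, n_gpus, bytes_per_float=4, adam_buffers=2):
    """Rough persistent model-state bytes per GPU under nn.DataParallel.

    GPU 0: master weights + gradients + Adam's exp_avg and exp_avg_sq.
    Other GPUs: only the replicated weights used during forward/backward.
    """
    device0 = n_params * bytes_per_float * (2 + adam_buffers)
    others = n_params * bytes_per_float
    return [device0] + [others] * (n_gpus - 1)

print(dataparallel_model_bytes(1_000_000, 4))  # [16000000, 4000000, 4000000, 4000000]
```

Under these assumptions GPU 0 carries about 12 extra bytes per parameter, which is why it tends to run out of memory first; treat this as an order-of-magnitude guide rather than an accounting of the observed gap.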

@Morizeyao
Owner

It has some effect, but given your hardware, getting it to run at all is already good; don't expect too much.
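On the batch_size question: GPT-2 uses LayerNorm rather than BatchNorm, so what mainly changes with a smaller batch is how many examples are averaged into each gradient step (and a learning rate tuned for batch 8 may no longer be ideal). A common workaround is gradient accumulation: run two micro-batches of 4, scale each loss by 1/2, and step the optimizer once, which reproduces the gradient of a batch of 8 up to float rounding and dropout randomness. Whether train.py offers such an option is worth checking; the toy below uses a one-variable linear model with mean squared error in plain Python instead of PyTorch, and its function names are hypothetical.

```python
import math

def grad_mse(w, b, xs, ys):
    """Gradient of mean((w*x + b - y)**2) with respect to (w, b)."""
    n = len(xs)
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return gw, gb

def accumulated_grad(w, b, xs, ys, micro_batch):
    """Sum micro-batch gradients, each scaled by 1/steps
    (the equivalent of dividing each micro-batch loss by the accumulation steps)."""
    if len(xs) % micro_batch:
        raise ValueError("batch size must be a multiple of micro_batch")
    steps = len(xs) // micro_batch
    gw_total, gb_total = 0.0, 0.0
    for i in range(steps):
        chunk = slice(i * micro_batch, (i + 1) * micro_batch)
        gw, gb = grad_mse(w, b, xs[chunk], ys[chunk])
        gw_total += gw / steps
        gb_total += gb / steps
    return gw_total, gb_total

xs = [float(i) for i in range(8)]
ys = [2.0 * x + 1.0 for x in xs]
full = grad_mse(0.5, 0.0, xs, ys)              # one batch of 8
accum = accumulated_grad(0.5, 0.0, xs, ys, 4)  # two micro-batches of 4
print(all(math.isclose(a, f) for a, f in zip(accum, full)))  # True
```

The cost is throughput rather than memory: each optimizer step now needs two forward/backward passes, and smaller micro-batches tend to use the GPU less efficiently.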

@chiangandy
Author

Got it, thanks for the help!

chiangandy changed the title from "Memory usage issue" to "Environment memory usage issue" Aug 14, 2019