
How to test the model with multi-GPU? #30

Closed
lewisyangliu opened this issue Apr 26, 2018 · 2 comments

Comments

@lewisyangliu

Thank you for your excellent code.

I encountered a problem when training with this code. As written in 'main.py':

while not t.terminate():
    t.train()
    t.test()

Here we can see that the test phase begins immediately after the training phase. However, the GPU memory from training is not released, and the test model runs on only a single GPU even though it can run on 4 GPUs in the training phase, so an out-of-memory error occurs.

Actually, I could run this code successfully on 4 GTX 1080 Ti GPUs, even though the test model runs on only a single GPU. Recently my work environment changed, and I now train these networks on 4 Titan Xp GPUs. Although the GPU memory has increased, the out-of-memory problem now occurs.

I wonder if we can test the model with multiple GPUs, just like in the training phase. By the way, setting --chop_forward doesn't work for me.

Thank you!

@sanghyun-son (Owner) commented Apr 26, 2018

Hello.

The multi-GPU evaluation problem is not related to the memory release issue you mentioned.

It happens because the test batch size is always 1, so only a single GPU takes the batch.
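
(To illustrate the point with a toy sketch, assuming PyTorch's nn.DataParallel and not this repository's model: DataParallel scatters its input along the batch dimension, so a batch of one has nothing to split.)

import torch
import torch.nn as nn

# Toy module purely to show DataParallel's batch splitting; not the EDSR model.
net = nn.DataParallel(nn.Conv2d(3, 3, 3, padding=1)).cuda()

x = torch.randn(1, 3, 64, 64).cuda()  # a test batch of size 1
y = net(x)                            # nothing to scatter: GPU 0 does all the work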

However, our forward_chop function is implemented to utilize multiple GPUs.
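
Roughly, the idea is the following (a simplified sketch of the chopping strategy, not the exact code in this repository; the function name, overlap size, and scale are illustrative, and model is assumed to be wrapped in nn.DataParallel):

import torch

def chop_forward(model, x, scale=2, overlap=10, n_gpus=4):
    # Split one large image into 4 overlapping quadrants so the
    # DataParallel-wrapped model receives a batch it can scatter over GPUs.
    b, c, h, w = x.size()
    h_half, w_half = h // 2, w // 2
    h_chop, w_chop = h_half + overlap, w_half + overlap

    patches = torch.cat([
        x[:, :, :h_chop, :w_chop],          # top-left
        x[:, :, :h_chop, w - w_chop:],      # top-right
        x[:, :, h - h_chop:, :w_chop],      # bottom-left
        x[:, :, h - h_chop:, w - w_chop:],  # bottom-right
    ], dim=0)

    # Feed n_gpus patches at a time so each GPU processes one patch.
    with torch.no_grad():
        sr = torch.cat([
            model(patches[i:i + n_gpus * b])
            for i in range(0, 4 * b, n_gpus * b)
        ], dim=0)

    # Stitch the super-resolved quadrants back together at the upscaled size.
    h, w, h_half, w_half, h_chop, w_chop = [scale * v for v in (h, w, h_half, w_half, h_chop, w_chop)]
    out = x.new_zeros((b, c, h, w))
    out[:, :, :h_half, :w_half] = sr[0:b, :, :h_half, :w_half]
    out[:, :, :h_half, w_half:] = sr[b:2 * b, :, :h_half, w_chop - (w - w_half):]
    out[:, :, h_half:, :w_half] = sr[2 * b:3 * b, :, h_chop - (h - h_half):, :w_half]
    out[:, :, h_half:, w_half:] = sr[3 * b:4 * b, :, h_chop - (h - h_half):, w_chop - (w - w_half):]
    return out

(The repository's forward_chop additionally recurses when the patches are still too large to fit in memory.)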

Since the --chop_forward argument has been renamed to --chop, you can use multiple GPUs in the evaluation stage with the --n_GPUs 4 --chop arguments.
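
For example, adding --n_GPUs 4 --chop to your usual evaluation command (python main.py --n_GPUs 4 --chop plus whatever other options you already pass) should spread the chopped patches over the four GPUs.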

Thank you!

@lewisyangliu (Author)

It works now, thanks!
