
Trouble running the code with a single GPU (RTX 2070 SUPER) and 16 GB RAM #12

Closed · happyvictor008 opened this issue Aug 25, 2020 · 11 comments

@happyvictor008

Hello! I was trying to run the code, but my process kept getting killed by the system with an 'out of memory' error.
I already set the batch size to 1, so what else can I do to run the code?

@zizhaozhang
Collaborator

Thanks for using the code.

Are you using TF 1.15? I ran into the OOM issue with that version. Please follow the environment requirements closely.
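As a quick sanity check, here is a minimal sketch (not part of the repo) for confirming which TensorFlow build and which GPUs your environment actually sees:

```python
# Minimal sketch: print the active TensorFlow version and the devices it can
# see, to confirm the environment matches the repo's stated requirements.
import tensorflow as tf
from tensorflow.python.client import device_lib  # TF 1.x utility

print("TensorFlow version:", tf.__version__)
print("Visible devices:", [d.name for d in device_lib.list_local_devices()])
```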

@happyvictor008
Author

happyvictor008 commented Aug 25, 2020 via email

@zizhaozhang
Collaborator

Which GPU are you using? The code requires 16 GB of GPU memory, and we tested on a V100.

@happyvictor008
Author

happyvictor008 commented Aug 26, 2020 via email

@happyvictor008
Author

Hi,
Is there any way to change the code to reduce GPU memory use? I already tried setting the batch size to 1, but it still didn't work.
Thanks!

@zizhaozhang
Collaborator

No, it cannot; we already use batch size = 1.

If you are experimenting on your own datasets and do not care about the COCO benchmarks, you can consider reducing the input image size.
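For intuition, a rough back-of-the-envelope sketch (not from the repo; where exactly you change the image size depends on the config you use): backbone activation memory scales roughly with the input image area, so shrinking the short edge reduces memory roughly quadratically.

```python
# Rough rule of thumb, not a precise memory model: activation memory scales
# approximately with image area (short_edge ** 2 for a fixed aspect ratio).
def relative_activation_memory(short_edge, baseline=800):
    """Approximate activation memory relative to a baseline short edge."""
    return (short_edge / baseline) ** 2

for edge in (800, 700, 600, 500):
    print(f"short edge {edge}: ~{relative_activation_memory(edge):.0%} of baseline")
```

For example, dropping the short edge from 800 to 600 pixels cuts activations to roughly 56% of the baseline under this approximation.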

@Chen-Song

@zizhaozhang
Hi, how long does it take to get an experimental result on a single V100? For example, the 1% COCO setting of STAC in Table 1.

@zizhaozhang
Collaborator

@Chen-Song I do not think you can train and get reasonable results using 1 GPU, since the batch size per GPU is only 1. Too small a total batch_size will make the network hard to converge.

@sisrfeng

sisrfeng commented Dec 1, 2020

> Which GPU are you using? The code requires 16 GB of GPU memory, and we tested on a V100.

I have two 1080 Ti GPUs, and each has 8 GB of memory free.
Will that cause any problems? (I just want to make sure my environment can run the code; I plan to train the model on more GPUs later.)

Many thx!

To help others: [screenshot attached in the original comment]

@Chen-Song

> @Chen-Song I do not think you can train and get reasonable results using 1 GPU, since the batch size per GPU is only 1. Too small a total batch_size will make the network hard to converge.

Thanks. How many GPUs do you use, and how long does it take to get an experimental result on COCO?

@zizhaozhang
Collaborator

@sisrfeng Each GPU needs to have 16 GB of memory.

@Chen-Song We train on 8 V100s, and it takes around 8-10 hours.
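For anyone checking whether their setup meets this, a small sketch (assumes nvidia-smi is on PATH; not part of the repo) that reports total and free memory per GPU before launching training:

```python
# Sketch: query per-GPU memory with nvidia-smi before launching training,
# to verify each device really has ~16 GB available.
import subprocess

out = subprocess.check_output(
    ["nvidia-smi",
     "--query-gpu=index,name,memory.total,memory.free",
     "--format=csv,noheader,nounits"],
    encoding="utf-8",
)
for line in out.strip().splitlines():
    idx, name, total_mb, free_mb = [s.strip() for s in line.split(",")]
    print(f"GPU {idx} ({name}): {free_mb} MiB free of {total_mb} MiB total")
```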
