
Trouble running the code with a single GPU (RTX 2070 SUPER) and 16 GB RAM #12

Closed · happyvictor008 opened this issue Aug 25, 2020 · 11 comments

@happyvictor008

Hello! I was trying to run the code, but my process kept getting killed by the system with an 'out of memory' error.
I already set the batch size to 1, so what else can I do to run the code?

@zizhaozhang
Collaborator

Thanks for using the code.

Are you using TF 1.15? I ran into the OOM issue with that version. Please follow the environment requirements closely.
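As a quick sanity check, here is a minimal sketch (not part of the repo) for confirming which TensorFlow build and which GPUs your environment actually sees:

```python
# Minimal sketch: print the active TensorFlow version and the devices it can
# see, to confirm the environment matches the repo's stated requirements.
import tensorflow as tf
from tensorflow.python.client import device_lib  # TF 1.x utility

print("TensorFlow version:", tf.__version__)
print("Visible devices:", [d.name for d in device_lib.list_local_devices()])
```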

@happyvictor008
Author

happyvictor008 commented Aug 25, 2020 via email

@zizhaozhang
Collaborator

Which GPU are you using? The code requires 16 GB of GPU memory, and we tested on a V100.

@happyvictor008
Author

happyvictor008 commented Aug 26, 2020 via email

@happyvictor008
Author

Hi,
Is there any way to change the code to reduce GPU memory use? I already tried setting the batch size to 1, but it still didn't work.
Thanks!

@zizhaozhang
Collaborator

No, it cannot; we already use batch size = 1.

If you are experimenting on your own datasets and do not care about the COCO benchmarks, you can consider reducing the input image size.
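For intuition, a rough back-of-the-envelope sketch (not from the repo; where exactly you change the image size depends on the config you use): backbone activation memory scales roughly with the input image area, so shrinking the short edge reduces memory roughly quadratically.

```python
# Rough rule of thumb, not a precise memory model: activation memory scales
# approximately with image area (short_edge ** 2 for a fixed aspect ratio).
def relative_activation_memory(short_edge, baseline=800):
    """Approximate activation memory relative to a baseline short edge."""
    return (short_edge / baseline) ** 2

for edge in (800, 700, 600, 500):
    print(f"short edge {edge}: ~{relative_activation_memory(edge):.0%} of baseline")
```

For example, dropping the short edge from 800 to 600 pixels cuts activations to roughly 56% of the baseline under this approximation.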

@Chen-Song

@zizhaozhang
Hi, how long does it take to get an experimental result on a single V100? For example, the 1% COCO setting of STAC in Table 1.

@zizhaozhang
Collaborator

@Chen-Song I do not think you can train and get reasonable results using 1 GPU, since the batch size per GPU is only 1. Too small a total batch_size will make the network hard to converge.

@sisrfeng

sisrfeng commented Dec 1, 2020

> Which GPU are you using? The code requires 16 GB of GPU memory, and we tested on a V100.

I have two 1080 Ti GPUs, and each has 8 GB of memory free.
Will that cause any problems? (I just want to make sure my environment can run the code; I plan to train the model on more GPUs later.)

Many thx!

To help others: [screenshot attached in the original comment]

@Chen-Song

> @Chen-Song I do not think you can train and get reasonable results using 1 GPU, since the batch size per GPU is only 1. Too small a total batch_size will make the network hard to converge.

Thanks. How many GPUs do you use, and how long does it take to get an experimental result on COCO?

@zizhaozhang
Collaborator

@sisrfeng Each GPU needs to have 16 GB of memory.

@Chen-Song We train on 8 V100s, and it takes around 8-10 hours.
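For anyone checking whether their setup meets this, a small sketch (assumes nvidia-smi is on PATH; not part of the repo) that reports total and free memory per GPU before launching training:

```python
# Sketch: query per-GPU memory with nvidia-smi before launching training,
# to verify each device really has ~16 GB available.
import subprocess

out = subprocess.check_output(
    ["nvidia-smi",
     "--query-gpu=index,name,memory.total,memory.free",
     "--format=csv,noheader,nounits"],
    encoding="utf-8",
)
for line in out.strip().splitlines():
    idx, name, total_mb, free_mb = [s.strip() for s in line.split(",")]
    print(f"GPU {idx} ({name}): {free_mb} MiB free of {total_mb} MiB total")
```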
