This repository has been archived by the owner on Apr 1, 2024. It is now read-only.

About apex and the args "--fp16" #33

Closed
yumi-cn opened this issue Dec 11, 2020 · 3 comments

Comments

@yumi-cn

yumi-cn commented Dec 11, 2020

I have already installed the nvidia/apex module in my env (which the project README says is optional).

When I add the arg "--fp16" to the train script:

python -u train.py ${DATASET} \
    ... \
    --fp16 \
    ... \
    --tensorboard-logdir ${SAVE}/tensorboard \
    | tee -a $SAVE/train.log

It raises some errors; the main error report is about c10::Error:

...
terminate called after throwing an instance of 'c10::Error'
...

Something similar to fairseq issue #1683 (closed with no response).

I tried to find ways to solve this, like adding the arg "--ddp-backend=no_c10d", but that causes the same error.

I haven't read all the main code of the project, but I guess you are more familiar with these problems, so I am posting this issue.

Thanks in advance for any reply.

BTW: training without "--fp16" always works fine, and the env is almost the same as the requirements file in the README.

@MultiPath
Contributor

Hi, I am sorry for the late reply, as I was busy with other things.
--fp16 (mixed-precision training) only works on certain GPUs, such as the Nvidia V100. It helps reduce GPU memory usage.
Maybe your GPU does not support it?

@yumi-cn
Author

yumi-cn commented Dec 12, 2020

> Hi, I am sorry for the late reply, as I was busy with other things.
> --fp16 (mixed-precision training) only works on certain GPUs, such as the Nvidia V100. It helps reduce GPU memory usage.
> Maybe your GPU does not support it?

My GPUs are RTX 2080 Ti (11 GB) x 4 in the server docker env. I checked: it is the Turing arch and has Tensor Core support.

Maybe I am using --fp16 in the wrong way in the command, or it is some other env setting problem? Confusing.
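(For context, a sketch of the Tensor Core check mentioned above, not part of the original thread: Tensor Cores require CUDA compute capability 7.0 or higher, i.e. Volta and later; the RTX 2080 Ti is Turing, capability 7.5. With PyTorch installed, the `(major, minor)` pair comes from `torch.cuda.get_device_capability()`.)

```python
# Sketch: decide Tensor Core support from a GPU's CUDA compute capability.
# Tensor Cores exist on compute capability 7.0+ (Volta, Turing, Ampere, ...).
# With PyTorch, obtain the pair via:
#   major, minor = torch.cuda.get_device_capability(device)

def has_tensor_cores(major: int, minor: int) -> bool:
    """Return True if a GPU with this compute capability has Tensor Cores."""
    return (major, minor) >= (7, 0)

# RTX 2080 Ti (Turing) reports compute capability 7.5:
print(has_tensor_cores(7, 5))   # Turing -> True
print(has_tensor_cores(6, 1))   # Pascal, e.g. GTX 1080 Ti -> False
```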

@MultiPath
Contributor

I will check --fp16 soon. I think it should work, as I always used fp16 in my early experiments. However, I am afraid it may cause inaccurate rendering results, so I usually turned it off later.
