For instance, when I use the code from @csarofeen's fp16 example, everything works fine on 1 GPU for both --fp16 and regular 32-bit training. On 2 GPUs, 32-bit training still works fine, but 16-bit training is broken.
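For reference, here is a minimal sketch (not @csarofeen's actual example) of the kind of setup this describes: an fp16 model replicated across GPUs with nn.DataParallel. The model and sizes below are placeholders.

```python
import torch
import torch.nn as nn

# Placeholder model; the real code is the fp16 example referenced above.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

model = model.cuda().half()        # fp16 parameters and activations (--fp16 case)
model = nn.DataParallel(model)     # replicate across the 2 GPUs

inputs = torch.randn(32, 512).cuda().half()
output = model(inputs)             # fine on 1 GPU; unstable on 2 GPUs per the report
```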
Training becomes unstable or produces slower learning curves, and the validation loss is often NaN.
Tested with several setups, including 1 and 2 Titan Vs with CUDA 9.1 on driver 390.xx and CUDA 9.0 on driver 384.xx.
I tried adding torch.cuda.synchronize() around the fp16-specific lines, as well as casting the half-precision output back to float before passing it to the criterion. No luck with either idea.
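A sketch of those two workarounds, with a placeholder model and criterion just to make the snippet self-contained (the real training loop is the one from the fp16 example above):

```python
import torch
import torch.nn as nn

# Placeholder model/criterion standing in for the actual training code.
model = nn.DataParallel(nn.Linear(512, 10).cuda().half())
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(32, 512).cuda().half()
targets = torch.randint(0, 10, (32,)).cuda()

output = model(inputs)

torch.cuda.synchronize()                    # workaround 1: force a sync around the fp16 steps

loss = criterion(output.float(), targets)   # workaround 2: cast the half output back to fp32
loss.backward()

torch.cuda.synchronize()
```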