fp16 (half precision) training doesn't work with 2 or more GPUs #311

Description

@tstandley

For instance, when I use the code from @csarofeen's fp16 example, everything works fine on 1 GPU for both --fp16 and regular 32-bit training. On 2 GPUs, 32-bit training still works fine, but 16-bit training is broken.
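For reference, the recipe I'm using is roughly the standard master-weights plus static loss-scaling pattern from that example. This is a minimal sketch with a placeholder model and random data, not the exact example code:

```python
import torch
import torch.nn as nn

loss_scale = 128.0  # static loss scale (illustrative value)

# fp16 model replicated across GPUs via DataParallel
model = nn.DataParallel(nn.Linear(64, 10).cuda().half())

# fp32 master copy of the parameters, used for the optimizer step
master_params = [p.detach().clone().float() for p in model.parameters()]
for p in master_params:
    p.requires_grad_(True)
optimizer = torch.optim.SGD(master_params, lr=0.1)
criterion = nn.CrossEntropyLoss()

for _ in range(10):
    inputs = torch.randn(32, 64).cuda().half()
    targets = torch.randint(0, 10, (32,)).cuda()

    output = model(inputs)
    loss = criterion(output.float(), targets)  # compute the loss in fp32

    optimizer.zero_grad()
    model.zero_grad()
    (loss * loss_scale).backward()  # scale the loss to keep fp16 grads in range

    # copy the scaled fp16 grads into the fp32 master params, then unscale
    for master, p in zip(master_params, model.parameters()):
        master.grad = p.grad.detach().float() / loss_scale
    optimizer.step()

    # copy the updated fp32 master weights back into the fp16 model
    for master, p in zip(master_params, model.parameters()):
        p.data.copy_(master.data)
```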

Training becomes unstable or produces noticeably slower learning curves, and the validation loss is often NaN.

Tested with several setups, including 1 and 2 Titan Vs with CUDA 9.1 on driver 390.xx and CUDA 9.0 on driver 384.xx.

I tried adding `torch.cuda.synchronize()` around the fp16-specific lines, as well as casting the half-precision output back to a float before sending it into the criterion. Neither idea helped.
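Concretely, the two attempts looked roughly like this (a sketch; the model and data are placeholders standing in for the example's variables):

```python
import torch
import torch.nn as nn

# Placeholder fp16 model and data standing in for the example's.
model = nn.DataParallel(nn.Linear(64, 10).cuda().half())
criterion = nn.CrossEntropyLoss()
inputs = torch.randn(32, 64).cuda().half()
targets = torch.randint(0, 10, (32,)).cuda()

# Attempt 1: force synchronization around the fp16-specific steps.
torch.cuda.synchronize()
output = model(inputs)
torch.cuda.synchronize()

# Attempt 2: cast the half-precision output back to fp32 before the criterion.
loss = criterion(output.float(), targets)
```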

Any help would be appreciated.
