
[perf] PyTorch AMP reverts the grads to fp32, meaning that reduce calls are overweight #402

Closed
blefaudeux opened this issue Feb 18, 2021 · 0 comments · Fixed by #411

@blefaudeux (Contributor)

🚀 Feature

Either through torch AMP or via a postfix .half() cast (not too elegant), make sure that we don't reduce fp32 grads when they were actually computed in fp16. A sketch of the cast idea follows below.
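A minimal sketch of the postfix-cast idea, assuming torch.distributed is already initialized; the helper name and the manual averaging are illustrative, not the project's actual API:

```python
import torch
import torch.distributed as dist

def reduce_grads_in_fp16(params, group=None):
    """Hypothetical helper: all-reduce grads in fp16 instead of fp32,
    halving the bytes on the wire, then restore the fp32 master grads."""
    world_size = dist.get_world_size(group)
    for p in params:
        if p.grad is None:
            continue
        buf = p.grad.to(torch.float16)     # 2 bytes/elem instead of 4
        dist.all_reduce(buf, group=group)  # sum across ranks in fp16
        p.grad.copy_(buf / world_size)     # average, cast back to fp32
```

The trade-off is that the cross-rank summation happens in fp16 precision, but since AMP computed these grads in fp16 in the first place, little meaningful information is lost.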

Motivation

The gradients are double the size they should be when torch AMP is used (fp32 takes 4 bytes per element vs. 2 for the fp16 values that were actually computed), which can be a bottleneck depending on inter-node communications.
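A toy illustration of the byte counts involved (not from the issue):

```python
import torch

grad_fp32 = torch.randn(1_000_000)       # grad as AMP stores it
grad_fp16 = grad_fp32.to(torch.float16)  # precision it was computed at

print(grad_fp32.nelement() * grad_fp32.element_size())  # 4000000 bytes
print(grad_fp16.nelement() * grad_fp16.element_size())  # 2000000 bytes
```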

Pitch

free speed

Alternatives

Not doing anything; this does not affect correctness, only communication volume.

Additional context

@blefaudeux blefaudeux self-assigned this Feb 18, 2021
@blefaudeux blefaudeux added the enhancement New feature or request label Feb 18, 2021