
Quantizers are not DDP/AMP compliant #10

Closed
danieltudosiu opened this issue Dec 7, 2021 · 7 comments

Comments

@danieltudosiu

Hi Lucidrains,

Thanks for the amazing work you do implementing all those papers!

Is there a plan to make the quantizers compliant with:

  • DDP - They need an all-gather before computing anything, so the updates are exactly the same across all ranks
  • AMP - In my experience, if AMP touches the quantizers it screws up the gradient magnitudes, making them NaN/overflow

If you want I can have a go at it.

@lucidrains
Owner

@danieltudosiu Hi Daniel! Not yet, but that would be great! Always welcoming contributors :)

@lucidrains
Owner

@danieltudosiu do you want to see if https://github.com/lucidrains/vector-quantize-pytorch/releases/tag/0.4.8 fixes the AMP issue?

@lucidrains
Owner

as for DDP, i'm guessing we just need an all-reduce at these two lines? https://github.com/lucidrains/vector-quantize-pytorch/blob/master/vector_quantize_pytorch/vector_quantize_pytorch.py#L153-L155

@danieltudosiu
Author

danieltudosiu commented Dec 10, 2021

> as for DDP, i'm guessing we just need an all-reduce at these two lines? https://github.com/lucidrains/vector-quantize-pytorch/blob/master/vector_quantize_pytorch/vector_quantize_pytorch.py#L153-L155

One reduction should happen here, but only for the summation of the one-hot encodings (embed_onehot.sum(0)).

And one here for the summation of the embeddings (embed_sum).

> @danieltudosiu do you want to see if https://github.com/lucidrains/vector-quantize-pytorch/releases/tag/0.4.8 fixes the AMP issue?

Regarding the AMP part, I am not actively using this codebase, since we are close to finishing the project and have a more barebones implementation of our own. I was just flagging the issues so that I can move to this library after the project ;).

But from a quick look, I would say it should work. In our case, we just used the decorator to disable AMP entirely, and given my experience with VQ logic, I would say that is a good default (maybe even without the option to enable AMP at all).
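
For concreteness, here is a minimal sketch of that decorator approach, assuming an AMP training loop around a VQ layer; the Quantizer class below is purely illustrative and not this library's implementation:

    import torch
    from torch.cuda.amp import autocast

    # illustrative VQ layer, not the library's class
    class Quantizer(torch.nn.Module):
        def __init__(self, dim, codebook_size):
            super().__init__()
            self.codebook = torch.nn.Parameter(torch.randn(codebook_size, dim))

        @autocast(enabled=False)        # run the whole quantization step outside AMP, in fp32
        def forward(self, x):
            x = x.float()               # inputs may arrive as fp16 from a surrounding autocast region
            distances = torch.cdist(x, self.codebook)   # (n, codebook_size) code distances in fp32
            indices = distances.argmin(dim=-1)          # nearest-code lookup
            quantized = self.codebook[indices]
            # straight-through estimator so gradients still reach the encoder
            return x + (quantized - x).detach()

The encoder and decoder can still run under autocast; only the codebook math is pinned to full precision.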

@lucidrains
Owner

@danieltudosiu got it! thanks for your input :)

@danieltudosiu
Author

@lucidrains just to be clear, the all-reduce should be something like this:

    import torch.distributed as dist  # public API; torch.distributed.distributed_c10d is an internal module

    # only reduce when a process group is initialized, so single-GPU runs are unaffected
    if dist.is_initialized():
        # sum the per-rank statistics so every rank applies the same EMA update
        dist.all_reduce(tensor=encodings_sum, op=dist.ReduceOp.SUM)
        dist.all_reduce(tensor=dw, op=dist.ReduceOp.SUM)

Where encodings_sum is your embed_onehot.sum(0) and dw is your embed_sum.
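
To put that in context, here is a hedged sketch of an EMA codebook update with both reductions in place; the function name and surrounding update logic are a simplified stand-in, not the library's exact code:

    import torch
    import torch.distributed as dist

    def ema_codebook_update(codebook_avg, cluster_size, flatten, embed_onehot, decay=0.99):
        # flatten: (n, dim) encoder outputs, embed_onehot: (n, codebook_size) one-hot code assignments
        encodings_sum = embed_onehot.sum(0)     # (codebook_size,) per-rank counts per code
        dw = flatten.t() @ embed_onehot         # (dim, codebook_size) per-rank sum of assigned vectors

        # make the statistics identical on every rank before the EMA step
        if dist.is_initialized():
            dist.all_reduce(encodings_sum)      # op defaults to dist.ReduceOp.SUM
            dist.all_reduce(dw)

        # in-place exponential moving averages of the codebook statistics
        cluster_size.mul_(decay).add_(encodings_sum, alpha=1 - decay)
        codebook_avg.mul_(decay).add_(dw, alpha=1 - decay)

Because the reduction happens before the EMA step, every replica applies the same update and the codebooks stay in sync without any extra gradient communication.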

@lucidrains
Owner

@danieltudosiu hey yup! i think SUM is the default anyway :)

https://github.com/lucidrains/vector-quantize-pytorch#ddp
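
As a quick sanity check that SUM is indeed the default reduction, a small self-contained example with a single-process gloo group (the address and port are just placeholders for local testing):

    import os
    import torch
    import torch.distributed as dist

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    t = torch.ones(4)
    dist.all_reduce(t)                      # same as passing op=dist.ReduceOp.SUM
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(t)                                # with world_size == 1 the values are unchanged

    dist.destroy_process_group()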
