
Extra layer encoder_output_to_decoder_dim causes issue with distributed training #20

Closed
ncoop57 opened this issue May 4, 2022 · 2 comments

ncoop57 (Contributor) commented on May 4, 2022

Hiya, hope Ice Cream is doing well, as well as you of course!

I've been trying to get distributed training working with your library, and I discovered that this additional Linear layer, encoder_output_to_decoder_dim, is not used anywhere:

https://github.com/lucidrains/RETRO-pytorch/blob/main/retro_pytorch/retro_pytorch.py#L491

It seems to be a copy of the layer defined right above it, to_decoder_model_dim, which does get used. Having this extra layer that is not part of the loss calculation causes the following error under distributed data parallelism:

[RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one.](https://github.com/pytorch/pytorch/issues/43259)

I'm not sure whether this layer is supposed to be there and just isn't hooked up yet, or whether it's there by accident, so I wanted to ask 🤓
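
For anyone else hitting this, here is a minimal sketch of the failure mode (a made-up module, not the actual RETRO-pytorch code; names and dimensions are illustrative). A parameter that never participates in the forward pass never receives a gradient, so DDP's reducer raises the error above on the next iteration unless the layer is removed or DDP is constructed with `find_unused_parameters=True`:

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self, enc_dim=256, dec_dim=512):
        super().__init__()
        self.to_decoder_model_dim = nn.Linear(enc_dim, dec_dim)           # used in forward
        self.encoder_output_to_decoder_dim = nn.Linear(enc_dim, dec_dim)  # defined but never called

    def forward(self, x):
        # only one of the two projections ever enters the autograd graph
        return self.to_decoder_model_dim(x)

# With the default find_unused_parameters=False, the unused layer's parameters
# never receive gradients during backward, and DDP's reducer errors out:
#
#   ddp = nn.parallel.DistributedDataParallel(ToyModel().cuda(), device_ids=[local_rank])
#   loss = ddp(torch.randn(8, 256).cuda()).pow(2).mean()
#   loss.backward()
```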

lucidrains (Owner) commented

[screenshot: Screen Shot 2022-05-04 at 3.45.31 PM]

thanks for asking about Ice Cream 😄 she is well!

and yes, that is indeed redundant, because the code was changed so that the encoder directly outputs to the decoder dimensions: https://github.com/lucidrains/RETRO-pytorch/blob/main/retro_pytorch/retro_pytorch.py#L518
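
roughly, the situation looks like this sketch (made-up names and dimensions, not the real class layout): the projection to the decoder dimension already lives inside the encoder, so the extra linear defined on the outside never gets called and can just be deleted

```python
import torch.nn as nn

class EncoderSketch(nn.Module):
    # stand-in for the encoder: it already projects to the decoder dimension internally
    def __init__(self, enc_dim=256, dec_dim=512):
        super().__init__()
        self.blocks = nn.Identity()                     # placeholder for the attention stack
        self.project_out = nn.Linear(enc_dim, dec_dim)  # encoder output is already dec_dim-sized

    def forward(self, x):
        return self.project_out(self.blocks(x))

class WrapperSketch(nn.Module):
    def __init__(self, enc_dim=256, dec_dim=512):
        super().__init__()
        self.encoder = EncoderSketch(enc_dim, dec_dim)
        # redundant: never called in forward, since the encoder already outputs
        # at the decoder dimension, so this layer can simply be deleted
        self.encoder_output_to_decoder_dim = nn.Linear(enc_dim, dec_dim)

    def forward(self, x):
        return self.encoder(x)
```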

let me go get that fixed! 🙏

lucidrains (Owner) commented

f166262

ncoop57 closed this as completed on May 4, 2022