Hiya, hope Ice Cream is doing well, as well as you of course!
I've been trying to get distributed training working with your library, and I discovered this additional Linear layer, `encoder_output_to_decoder_dim`, not being used anywhere: https://github.com/lucidrains/RETRO-pytorch/blob/main/retro_pytorch/retro_pytorch.py#L491
It seems to be a copy of the layer defined right above it, `to_decoder_model_dim`, which does get used. Because this extra layer never contributes to the loss, its parameters receive no gradients, which causes the following error with data parallelism:
[RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one.](https://github.com/pytorch/pytorch/issues/43259)
I'm not sure whether this layer is supposed to be there and just never got wired up, or whether it's there by accident, so I wanted to ask 🤓
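For reference, here's a minimal sketch (a toy module, not the library's actual code) that mirrors the situation: one layer never participates in `forward()`, so DDP hits the same error on the second iteration. The two usual workarounds are shown: delete the dead layer, or pass `find_unused_parameters=True` to `DistributedDataParallel`.

```python
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

class Toy(nn.Module):
    """Toy module with one layer that never runs in forward(),
    mirroring the stray encoder_output_to_decoder_dim layer."""
    def __init__(self):
        super().__init__()
        self.used = nn.Linear(8, 8)
        self.unused = nn.Linear(8, 8)  # never called -> no grad -> DDP stalls

    def forward(self, x):
        return self.used(x)

def main():
    dist.init_process_group("gloo")  # use "nccl" on GPU
    model = Toy()

    # Workaround 1: drop the dead layer so every parameter gets a gradient.
    # del model.unused

    # Workaround 2: tell DDP to tolerate unused parameters (adds some
    # per-iteration overhead while it scans the autograd graph).
    ddp = DDP(model, find_unused_parameters=True)

    for _ in range(2):  # without a workaround, the error surfaces on iteration 2
        loss = ddp(torch.randn(4, 8)).sum()
        loss.backward()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=2 this_script.py
```

`find_unused_parameters=True` is a working stopgap, but if the layer really is dead code, removing it is the cleaner fix since the flag costs an extra graph traversal every step.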