Description
While you took care of this in the tutorial on Transformers and nn.Transformer, I just used nn.TransformerEncoder and realized that it won't initialize its parameters in a sensible way on its own. One would create an encoder like this:
encoder_layer = TransformerEncoderLayer(...)
transformer_encoder = TransformerEncoder(encoder_layer, nlayers)
This will yield a Transformer that has the same initialization in all layers, which I think is rarely what you want in neural networks, so it is unexpected. You always need to re-initialize from the outside. This is not the usual case in PyTorch and is not documented, see https://pytorch.org/docs/stable/generated/torch.nn.TransformerEncoder.html#torch.nn.TransformerEncoder
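A minimal sketch of the behavior (layer sizes are just placeholder values): the stacked layers start with identical weights, and the usual workaround is to re-initialize every parameter from the outside, e.g. with Xavier/Glorot as done in the PyTorch Transformer tutorial.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, only to illustrate the point.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# All layers are copies of the same layer, so they share initial values.
w0 = transformer_encoder.layers[0].linear1.weight
w1 = transformer_encoder.layers[1].linear1.weight
print(torch.equal(w0, w1))  # True: layers 0 and 1 start out identical

# Typical external re-initialization so each layer gets its own random init.
for p in transformer_encoder.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)
```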
I am not sure how best to resolve this: through a change to the nn.TransformerEncoder API, through documentation alone, or whether you are already working on a rewrite anyway. I have therefore raised this as an issue first.
Versions
This is a problem in the current version (1.10). The bug does not need to be reproduced; it can be seen relatively easily from the source code of nn.TransformerEncoder.
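For reference, the reason is that TransformerEncoder builds its layer stack by deep-copying the single layer passed in; the sketch below paraphrases the clone helper in torch/nn/modules/transformer.py, so every clone inherits the same parameter values.

```python
import copy
from torch.nn import ModuleList

# Paraphrased clone helper: the encoder stack is built by deep-copying one
# layer N times, so all clones share that layer's initial weights.
def _get_clones(module, N):
    return ModuleList([copy.deepcopy(module) for _ in range(N)])
```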
cc @ezyang @gchanan @zou3519 @kadeng @svekars @brycebortree @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki @bhosmer @cpuhrsch @erichan1 @drisspg @brianjo @kshitij12345