Transformer Initialization #72253

@SamuelGabriel

Description

While this is taken care of in the tutorial on Transformers and in nn.Transformer, I just used nn.TransformerEncoder and realized that it won't initialize its parameters in a sensible way on its own. One would create an encoder like this:

encoder_layer = TransformerEncoderLayer(...)
transformer_encoder = TransformerEncoder(encoder_layer, nlayers) 

This yields a Transformer in which every layer has the same initialization, because TransformerEncoder deep-copies the single layer it is given. Identical initialization across layers is rarely what you want in a neural network, so this is unexpected; you always need to re-initialize the parameters from the outside. That is not the usual behavior in PyTorch and it is not documented, see https://pytorch.org/docs/stable/generated/torch.nn.TransformerEncoder.html#torch.nn.TransformerEncoder
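
For illustration, here is a minimal sketch (the d_model, nhead, and layer-count values are arbitrary assumptions) showing that the copied layers start out with identical weights, together with the kind of external re-initialization one currently has to do by hand:

import torch
import torch.nn as nn

# Arbitrary example sizes, just to make the snippet runnable.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# All layers are copies of the same module, so their weights match exactly.
print(torch.equal(transformer_encoder.layers[0].linear1.weight,
                  transformer_encoder.layers[1].linear1.weight))  # True

# The usual workaround: re-initialize every weight matrix from the outside,
# e.g. with Xavier uniform, so each layer gets its own random draw.
for p in transformer_encoder.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)

print(torch.equal(transformer_encoder.layers[0].linear1.weight,
                  transformer_encoder.layers[1].linear1.weight))  # False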

I am not sure how best to resolve this: through a changed nn.TransformerEncoder API, through documentation alone, or whether you are already working on a rewrite anyway. I have therefore raised this as an issue first.

Versions

This is a problem in the current version (1.10). The bug does not need a reproduction; it can be seen relatively easily from the source code of nn.TransformerEncoder.
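
Concretely, TransformerEncoder builds its layer stack with a clone helper along these lines (paraphrased from torch/nn/modules/transformer.py), which deep-copies the one layer it receives and therefore replicates its initialization across all layers:

import copy
from torch.nn import ModuleList

def _get_clones(module, N):
    # Every clone is a deep copy, so all N layers carry identical weights.
    return ModuleList([copy.deepcopy(module) for _ in range(N)])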

cc @ezyang @gchanan @zou3519 @kadeng @svekars @brycebortree @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki @bhosmer @cpuhrsch @erichan1 @drisspg @brianjo @kshitij12345

Metadata

Assignees

No one assigned

    Labels

    actionable, high priority, module: docs (Related to our documentation, both in docs/ and docblocks), module: nn (Related to torch.nn), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
