Transformer Initialization #72253

@SamuelGabriel

Description

While this is taken care of in the tutorial on Transformers and in nn.Transformer, I just used nn.TransformerEncoder and realized that it won't initialize its parameters in a sensible way on its own. One would create an encoder like this:

encoder_layer = TransformerEncoderLayer(...)
transformer_encoder = TransformerEncoder(encoder_layer, nlayers) 

This yields a Transformer in which every layer has the same initialization, because TransformerEncoder deep-copies the single layer it is given. Identical initialization across layers is rarely what you want in a neural network, so this is unexpected; you always need to re-initialize the parameters from the outside. That is not the usual behavior in PyTorch and it is not documented, see https://pytorch.org/docs/stable/generated/torch.nn.TransformerEncoder.html#torch.nn.TransformerEncoder
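
For illustration, here is a minimal sketch (the d_model, nhead, and layer-count values are arbitrary assumptions) showing that the copied layers start out with identical weights, together with the kind of external re-initialization one currently has to do by hand:

import torch
import torch.nn as nn

# Arbitrary example sizes, just to make the snippet runnable.
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# All layers are copies of the same module, so their weights match exactly.
print(torch.equal(transformer_encoder.layers[0].linear1.weight,
                  transformer_encoder.layers[1].linear1.weight))  # True

# The usual workaround: re-initialize every weight matrix from the outside,
# e.g. with Xavier uniform, so each layer gets its own random draw.
for p in transformer_encoder.parameters():
    if p.dim() > 1:
        nn.init.xavier_uniform_(p)

print(torch.equal(transformer_encoder.layers[0].linear1.weight,
                  transformer_encoder.layers[1].linear1.weight))  # False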

I am not sure how best to resolve this: through a changed nn.TransformerEncoder API, through documentation alone, or whether you are already working on a rewrite anyway. I have therefore raised this as an issue first.

Versions

This is a problem in the current version (1.10). The bug does not need a reproduction; it can be seen relatively easily from the source code of nn.TransformerEncoder.
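
Concretely, TransformerEncoder builds its layer stack with a clone helper along these lines (paraphrased from torch/nn/modules/transformer.py), which deep-copies the one layer it receives and therefore replicates its initialization across all layers:

import copy
from torch.nn import ModuleList

def _get_clones(module, N):
    # Every clone is a deep copy, so all N layers carry identical weights.
    return ModuleList([copy.deepcopy(module) for _ in range(N)])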

cc @ezyang @gchanan @zou3519 @kadeng @svekars @brycebortree @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki @bhosmer @cpuhrsch @erichan1 @drisspg @brianjo @kshitij12345

Metadata

Assignees

No one assigned

    Labels

    actionable, high priority, module: docs (Related to our documentation, both in docs/ and docblocks), module: nn (Related to torch.nn), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
