
Error with recurrent attention ValueError: too many values to unpack (expected 2) #25

Closed
hadaev8 opened this issue Sep 17, 2020 · 11 comments


hadaev8 (Contributor) commented Sep 17, 2020:

Colab Link:
https://colab.research.google.com/drive/1mYTh4MO_Tg6LBrhhVQUd81R92UNE56F7?authuser=1#scrollTo=cflC2xVxKb5M&line=8&uniqifier=1

Full trace:

<ipython-input-20-cd7d3f9fcf71> in forward(self, batch)
     59         src = self.encoder(batch['inp'])
     60         src = self.pos_encoder(src)
---> 61         src = self.transformer_encoder(src)
     62 
     63         trg = self.decoder(batch['out'][:,:-1])

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/content/fast-transformers/fast_transformers/recurrent/transformers.py in forward(self, x, state, memory)
    131         # Apply all the transformers
    132         for i, layer in enumerate(self.layers):
--> 133             x, s = layer(x, state[i])
    134             state[i] = s
    135 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/content/fast-transformers/fast_transformers/recurrent/transformers.py in forward(self, x, state, memory)
     77 
     78         # Run the self attention and add it to the input
---> 79         x2, state = self.attention(x, x, x, state)
     80         x = x + self.dropout(x2)
     81 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/content/fast-transformers/fast_transformers/recurrent/attention/self_attention/attention_layer.py in forward(self, query, key, value, state, memory)
     83 
     84         # Reshape them into many heads and compute the attention
---> 85         N, D = query.shape
     86         H = self.n_heads
     87         new_value, state = self.inner_attention(

ValueError: too many values to unpack (expected 2)
angeloskath (Collaborator) commented:

Hi,

The Colab link does not seem to be accessible. Regardless, it appears you are providing a full sequence to the recurrent decoder. Unlike the normal decoder, the recurrent decoder should be provided with one element at a time. If you are training, I suggest you use a normal decoder for training and the recurrent one for inference.

Cheers,
Angelos

hadaev8 (Contributor, Author) commented Sep 17, 2020:

So how should it look?

Now I'm getting this error:

TypeError: forward() got an unexpected keyword argument 'attn_mask'

with

        self.transformer_encoder = TransformerEncoder([
            RecurrentTransformerEncoderLayer(
                attention=RecurrentAttentionLayer(RecurrentLinearAttention(), hidden, nhead),
                d_model=hidden,
                n_heads=nhead,
            ) for _ in range(3)
        ])

        self.transformer_decoder = TransformerDecoder([
            RecurrentTransformerDecoderLayer(
                self_attention=RecurrentAttentionLayer(RecurrentLinearAttention(), hidden, nhead),
                cross_attention=RecurrentCrossAttentionLayer(RecurrentCrossLinearAttention(), hidden, nhead),
                d_model=hidden,
            ) for _ in range(1)
        ])

Here is the link again; it should be accessible:
https://colab.research.google.com/drive/1mYTh4MO_Tg6LBrhhVQUd81R92UNE56F7?usp=sharing

angeloskath (Collaborator) commented:

So you can check the docs for how to use the recurrent versions of the transformers.

However, the main idea is that, being recurrent models, they expect one input at a time and give you back a state that should be passed in together with the input the next time they are called. Also, there is no need for a triangular mask, because causality is enforced by default: there are no future inputs to attend to.

As a general comment, it makes more sense to use normal transformers for training and evaluation and only use recurrent models when you need to sample from a transformer. This way you can have the best of both worlds.
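
For illustration, a minimal sketch of that per-element usage with a builder-constructed recurrent encoder might look like the following. This is not code from the thread: the builder keyword names and the "linear" attention type follow my reading of the library docs, and the sizes are made up.

import torch
from fast_transformers.builders import RecurrentEncoderBuilder

d_model, n_heads = 512, 8  # made-up sizes, purely for the sketch

encoder = RecurrentEncoderBuilder.from_kwargs(
    n_layers=3,
    n_heads=n_heads,
    query_dimensions=d_model // n_heads,
    value_dimensions=d_model // n_heads,
    feed_forward_dimensions=4 * d_model,
    attention_type="linear",
).get()

x = torch.randn(2, d_model)  # one sequence element per call, shape (batch, d_model)
state = None                 # no state before the first element
for _ in range(10):
    # Each call consumes a single element and returns an updated state,
    # which has to be passed back in on the next call. No triangular
    # mask is needed; there are no future elements to attend to.
    y, state = encoder(x, state=state)
    x = y  # e.g. feed the output back in when sampling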

hadaev8 (Contributor, Author) commented Sep 17, 2020:

> As a general comment, it makes more sense to use normal transformers for training and evaluation and only use recurrent models when you need to sample from a transformer. This way you can have the best of both worlds.

Honestly, this does not seem very convenient.
I need to train a model, initialize a new one, and transfer the weights, right?

angeloskath (Collaborator) commented:

Although it might be an inconvenience, the implementations are so vastly different that it makes little sense to have them in the same model. Also keep in mind that you need the recurrent version only for sampling from the model, not for computing perplexity or anything like that.

So the idea is that you would simply train your model and then deploy the recurrent one.

In addition, using the builders makes creating recurrent or non-recurrent models super easy, for instance:

class MyLanguageModel(nn.Module):
    def __init__(self, builder_class):
        super().__init__()
        self.encoder = builder_class.from_kwargs(....).get()

    def forward(self, x, masks, whatever):
        ...

    def forward_recurrent(self, different_x, whatever_else):
        ...

hadaev8 (Contributor, Author) commented Sep 17, 2020:

@angeloskath
So while training I need to have the normal versions of all the modules?
Like encoder / encoder layer / attention?

angeloskath (Collaborator) commented Sep 18, 2020:

Yes. Or, to simplify things, use the transformer builders.

Also read the docs and the API docs. If you feel that something is missing from the docs, open an issue to suggest its addition, or better yet, fork and add it.

I will be closing this issue since it is not a bug in the library. Feel free to reopen it or open another issue if you run into further problems.

Cheers,
Angelos

hadaev8 (Contributor, Author) commented Sep 18, 2020:

Well, maybe it's easy to create the models, but what about transferring the trained weights?
Since the layers have different names, I can't just do
rnn_model.load_state_dict(model.state_dict())

angeloskath (Collaborator) commented:

Of course you can; that is exactly the point. It is supposed to be used as shown below:

model1 = TransformerEncoderBuilder().get()
model2 = RecurrentEncoderBuilder().get()

model2.load_state_dict(model1.state_dict())
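
To make that concrete, here is a hedged sketch with explicit, made-up hyperparameters. Both builders need the same settings so that the parameter names and shapes line up; the attention-type strings ("causal-linear" for the batch model, "linear" for the recurrent one) follow my reading of the docs.

from fast_transformers.builders import TransformerEncoderBuilder, RecurrentEncoderBuilder

shared = dict(
    n_layers=6,
    n_heads=8,
    query_dimensions=64,
    value_dimensions=64,
    feed_forward_dimensions=2048,
)

# Batch-parallel causal model for training...
model1 = TransformerEncoderBuilder.from_kwargs(
    attention_type="causal-linear", **shared
).get()

# ...and the recurrent model for sampling; causality is enforced by construction.
model2 = RecurrentEncoderBuilder.from_kwargs(
    attention_type="linear", **shared
).get()

# The two modules expose identical parameter names, so the trained
# weights transfer directly.
model2.load_state_dict(model1.state_dict())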

hadaev8 (Contributor, Author) commented Sep 18, 2020:

Ah, I confused the object name and the class name, my mistake.

hadaev8 (Contributor, Author) commented Sep 18, 2020:

@angeloskath
Thanks for your patience, but I again have an error I don't understand.
Link:
https://colab.research.google.com/drive/1mYTh4MO_Tg6LBrhhVQUd81R92UNE56F7?authuser=1#scrollTo=QgSZNuvrAJrJ&line=10&uniqifier=1
Trace:

TypeError                                 Traceback (most recent call last)

<ipython-input-47-35e375e87d60> in <module>()
      7 
      8         output = rnnmodel.pos_decoder(rnnmodel.decoder(out_token), i)
----> 9         output, state = rnnmodel.fc_out(rnnmodel.transformer_decoder(output.squeeze(1), memory, memory_length_mask=encoder_len_mask, state=state))
     10         out_token = output.argmax(-1)[:,-1].unsqueeze(0)
     11         trg_tensor = torch.cat([trg_tensor, out_token], axis=-1)

4 frames

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/content/fast-transformers/fast_transformers/recurrent/transformers.py in forward(self, x, memory, memory_length_mask, state)
    271         for i, layer in enumerate(self.layers):
    272             x, s = layer(x, memory, memory_length_mask=memory_length_mask,
--> 273                          state=state[i])
    274             state[i] = s
    275 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/content/fast-transformers/fast_transformers/recurrent/transformers.py in forward(self, x, memory, memory_length_mask, state)
    214         # Secondly apply the cross attention and add it to the previous output
    215         x2, cross_state = self.cross_attention(
--> 216             x, memory, memory, memory_length_mask, state=cross_state
    217         )
    218         x = self.norm2(x + self.dropout(x2))

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

TypeError: forward() got multiple values for argument 'state'
