
Error with recurrent attention ValueError: too many values to unpack (expected 2) #25

Closed
hadaev8 opened this issue Sep 17, 2020 · 11 comments


hadaev8 (Contributor) commented Sep 17, 2020:

Colab Link:
https://colab.research.google.com/drive/1mYTh4MO_Tg6LBrhhVQUd81R92UNE56F7?authuser=1#scrollTo=cflC2xVxKb5M&line=8&uniqifier=1

Full trace:

<ipython-input-20-cd7d3f9fcf71> in forward(self, batch)
     59         src = self.encoder(batch['inp'])
     60         src = self.pos_encoder(src)
---> 61         src = self.transformer_encoder(src)
     62 
     63         trg = self.decoder(batch['out'][:,:-1])

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/content/fast-transformers/fast_transformers/recurrent/transformers.py in forward(self, x, state, memory)
    131         # Apply all the transformers
    132         for i, layer in enumerate(self.layers):
--> 133             x, s = layer(x, state[i])
    134             state[i] = s
    135 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/content/fast-transformers/fast_transformers/recurrent/transformers.py in forward(self, x, state, memory)
     77 
     78         # Run the self attention and add it to the input
---> 79         x2, state = self.attention(x, x, x, state)
     80         x = x + self.dropout(x2)
     81 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/content/fast-transformers/fast_transformers/recurrent/attention/self_attention/attention_layer.py in forward(self, query, key, value, state, memory)
     83 
     84         # Reshape them into many heads and compute the attention
---> 85         N, D = query.shape
     86         H = self.n_heads
     87         new_value, state = self.inner_attention(

ValueError: too many values to unpack (expected 2)
angeloskath (Collaborator) commented:

Hi,

The Colab link does not seem to be accessible. Regardless, it appears you are providing a full sequence to the recurrent decoder. Unlike the normal decoder, the recurrent decoder should be provided with one element at a time. If you are training, I suggest you use a normal decoder for training and the recurrent one for inference.

Cheers,
Angelos

hadaev8 (Contributor, Author) commented Sep 17, 2020:

So how should it look?

Now I'm getting this error:

TypeError: forward() got an unexpected keyword argument 'attn_mask'

with

        self.transformer_encoder = TransformerEncoder([
            RecurrentTransformerEncoderLayer(
                attention=RecurrentAttentionLayer(RecurrentLinearAttention(), hidden, nhead),
                d_model=hidden,
                n_heads=nhead,
            ) for _ in range(3)
        ])

        self.transformer_decoder = TransformerDecoder([
            RecurrentTransformerDecoderLayer(
                self_attention=RecurrentAttentionLayer(RecurrentLinearAttention(), hidden, nhead),
                cross_attention=RecurrentCrossAttentionLayer(RecurrentCrossLinearAttention(), hidden, nhead),
                d_model=hidden,
            ) for _ in range(1)
        ])

Here is the link again; it should be accessible:
https://colab.research.google.com/drive/1mYTh4MO_Tg6LBrhhVQUd81R92UNE56F7?usp=sharing

angeloskath (Collaborator) commented:

So you can check the docs for how to use the recurrent versions of the transformers.

However, the main idea is that, being recurrent models, they expect one input at a time and give you back a state that should be passed in together with the input the next time they are called. Also, there is no need for a triangular mask, because causality is enforced by default: there are no future inputs to attend to.

As a general comment, it makes more sense to use normal transformers for training and evaluation and only use recurrent models when you need to sample from a transformer. This way you can have the best of both worlds.
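
For illustration, a minimal sketch of that per-element usage with a builder-constructed recurrent encoder might look like the following. This is not code from the thread: the builder keyword names and the "linear" attention type follow my reading of the library docs, and the sizes are made up.

import torch
from fast_transformers.builders import RecurrentEncoderBuilder

d_model, n_heads = 512, 8  # made-up sizes, purely for the sketch

encoder = RecurrentEncoderBuilder.from_kwargs(
    n_layers=3,
    n_heads=n_heads,
    query_dimensions=d_model // n_heads,
    value_dimensions=d_model // n_heads,
    feed_forward_dimensions=4 * d_model,
    attention_type="linear",
).get()

x = torch.randn(2, d_model)  # one sequence element per call, shape (batch, d_model)
state = None                 # no state before the first element
for _ in range(10):
    # Each call consumes a single element and returns an updated state,
    # which has to be passed back in on the next call. No triangular
    # mask is needed; there are no future elements to attend to.
    y, state = encoder(x, state=state)
    x = y  # e.g. feed the output back in when sampling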

hadaev8 (Contributor, Author) commented Sep 17, 2020:

> As a general comment, it makes more sense to use normal transformers for training and evaluation and only use recurrent models when you need to sample from a transformer. This way you can have the best of both worlds.

Honestly, this does not seem very convenient.
I need to train a model, initialize a new one, and transfer the weights, right?

angeloskath (Collaborator) commented:

Although it might be an inconvenience, the implementations are so vastly different that it makes little sense to have them in the same model. Also keep in mind that you need the recurrent version only for sampling from the model, not for computing perplexity or anything like that.

So the idea is that you would simply train your model and then deploy the recurrent one.

In addition, using the builders makes creating recurrent or non-recurrent models super easy, for instance:

class MyLanguageModel(nn.Module):
    def __init__(self, builder_class):
        super().__init__()
        self.encoder = builder_class.from_kwargs(....).get()

    def forward(self, x, masks, whatever):
        ...

    def forward_recurrent(self, different_x, whatever_else):
        ...

hadaev8 (Contributor, Author) commented Sep 17, 2020:

@angeloskath
So while training I need to have the normal versions of all the modules?
Like encoder / encoder layer / attention?

angeloskath (Collaborator) commented Sep 18, 2020:

Yes. Or, to simplify things, use the transformer builders.

Also read the docs and the API docs. If you feel that something is missing from the docs, open an issue to suggest its addition, or better yet, fork and add it.

I will be closing this issue since it is not a bug in the library. Feel free to reopen it or open another issue if you run into further problems.

Cheers,
Angelos

hadaev8 (Contributor, Author) commented Sep 18, 2020:

Well, maybe it's easy to create the models, but what about transferring the trained weights?
Since the layers have different names, I can't just do
rnn_model.load_state_dict(model.state_dict())

angeloskath (Collaborator) commented:

Of course you can; that is exactly the point. It is supposed to be used as shown below:

model1 = TransformerEncoderBuilder().get()
model2 = RecurrentEncoderBuilder().get()

model2.load_state_dict(model1.state_dict())
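
To make that concrete, here is a hedged sketch with explicit, made-up hyperparameters. Both builders need the same settings so that the parameter names and shapes line up; the attention-type strings ("causal-linear" for the batch model, "linear" for the recurrent one) follow my reading of the docs.

from fast_transformers.builders import TransformerEncoderBuilder, RecurrentEncoderBuilder

shared = dict(
    n_layers=6,
    n_heads=8,
    query_dimensions=64,
    value_dimensions=64,
    feed_forward_dimensions=2048,
)

# Batch-parallel causal model for training...
model1 = TransformerEncoderBuilder.from_kwargs(
    attention_type="causal-linear", **shared
).get()

# ...and the recurrent model for sampling; causality is enforced by construction.
model2 = RecurrentEncoderBuilder.from_kwargs(
    attention_type="linear", **shared
).get()

# The two modules expose identical parameter names, so the trained
# weights transfer directly.
model2.load_state_dict(model1.state_dict())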

hadaev8 (Contributor, Author) commented Sep 18, 2020:

Ah, I confused the object name and the class name, my mistake.

hadaev8 (Contributor, Author) commented Sep 18, 2020:

@angeloskath
Thanks for your patience, but I again have an error I don't understand.
Link:
https://colab.research.google.com/drive/1mYTh4MO_Tg6LBrhhVQUd81R92UNE56F7?authuser=1#scrollTo=QgSZNuvrAJrJ&line=10&uniqifier=1
Trace:

TypeError                                 Traceback (most recent call last)

<ipython-input-47-35e375e87d60> in <module>()
      7 
      8         output = rnnmodel.pos_decoder(rnnmodel.decoder(out_token), i)
----> 9         output, state = rnnmodel.fc_out(rnnmodel.transformer_decoder(output.squeeze(1), memory, memory_length_mask=encoder_len_mask, state=state))
     10         out_token = output.argmax(-1)[:,-1].unsqueeze(0)
     11         trg_tensor = torch.cat([trg_tensor, out_token], axis=-1)

4 frames

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/content/fast-transformers/fast_transformers/recurrent/transformers.py in forward(self, x, memory, memory_length_mask, state)
    271         for i, layer in enumerate(self.layers):
    272             x, s = layer(x, memory, memory_length_mask=memory_length_mask,
--> 273                          state=state[i])
    274             state[i] = s
    275 

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

/content/fast-transformers/fast_transformers/recurrent/transformers.py in forward(self, x, memory, memory_length_mask, state)
    214         # Secondly apply the cross attention and add it to the previous output
    215         x2, cross_state = self.cross_attention(
--> 216             x, memory, memory, memory_length_mask, state=cross_state
    217         )
    218         x = self.norm2(x + self.dropout(x2))

/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

TypeError: forward() got multiple values for argument 'state'
