Error with recurrent attention ValueError: too many values to unpack (expected 2) #25
Comments
Hi, the Colab link does not seem to be accessible. Regardless, it appears you are providing a full sequence to the recurrent decoder. Unlike the normal decoder, the recurrent decoder should be given one element at a time. If you are using it to train, I suggest training with a normal decoder and using the recurrent one for inference. Cheers,
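(For illustration, a minimal sketch of the difference, assuming the recurrent modules take one timestep of shape (batch, d_model) and return an (output, state) pair, as described in the library docs; `recurrent_decoder` is a placeholder name here:)

    # Feeding a whole sequence at once is what leads to the unpacking error.
    # Instead, feed one timestep per call and carry the returned state along:
    state = None
    for t in range(x.shape[1]):              # x: (batch, seq_len, d_model)
        y_t, state = recurrent_decoder(x[:, t], state=state)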
So what should it look like? Now I'm getting an error with:
Here is the link again; it should be accessible.
You can check the docs for how to use the recurrent versions of the transformers. The main idea is that, as recurrent models, they expect one input at a time and give you back an output together with a state that you pass to the next call. As a general comment, it makes more sense to use normal transformers for training and evaluation, and to use the recurrent models only when you need to sample from a transformer. This way you can have the best of both worlds.
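(To make the sampling use case concrete, a hedged sketch of a greedy decoding loop; `embed`, `project`, `bos_id`, and `max_len` are hypothetical names, and the exact forward signature should be checked against the docs:)

    import torch

    token = torch.full((1,), bos_id, dtype=torch.long)   # hypothetical start token
    state = None
    generated = []
    for _ in range(max_len):
        y, state = recurrent_encoder(embed(token), state=state)  # one step per call
        token = project(y).argmax(dim=-1)                        # greedy choice
        generated.append(token.item())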
Honestly, this does not seem very convenient.
Although it might be an inconvenience, the implementations are so vastly different that it makes little sense to have them in the same model. Also keep in mind that you need the recurrent version only for sampling from the model, not for computing perplexity or anything else. So the idea is that you would simply train your model and then deploy the recurrent one. In addition, using the builders makes creating recurrent or non-recurrent models super easy, for instance:

    import torch.nn as nn

    class MyLanguageModel(nn.Module):
        def __init__(self, builder_class):
            super().__init__()
            # Any encoder builder, batch or recurrent, can be passed in.
            self.encoder = builder_class.from_kwargs(...).get()

        def forward(self, x, masks, whatever):
            ...

        def forward_recurrent(self, different_x, whatever_else):
            ...
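(A possible way to instantiate it, assuming the builder classes exported by fast_transformers.builders; the constructor kwargs are elided just as in the sketch above:)

    from fast_transformers.builders import TransformerEncoderBuilder, \
        RecurrentEncoderBuilder

    train_model = MyLanguageModel(TransformerEncoderBuilder)   # training/evaluation
    sample_model = MyLanguageModel(RecurrentEncoderBuilder)    # autoregressive sampling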
@angeloskath
Yes. Or, to simplify things, use the transformer builders. Also read the docs and the API docs. If you feel that something is missing from the docs, make an issue to suggest its addition, or better yet, fork and add it. I will be closing this issue since it is not a bug in the library. Feel free to reopen it or open another issue if you experience problems. Cheers,
Well, maybe it's easy to create models, but what about transferring trained weights?
Of course you can. That is exactly the point. It is supposed to be used as below:

    from fast_transformers.builders import TransformerEncoderBuilder, \
        RecurrentEncoderBuilder

    model1 = TransformerEncoderBuilder().get()    # batch version for training
    model2 = RecurrentEncoderBuilder().get()      # recurrent version for sampling
    model2.load_state_dict(model1.state_dict())   # the parameters line up one-to-one
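(As a sanity check that the transfer worked, the two models should produce matching outputs when the batch model is run with a causal mask; a sketch under the assumption that TriangularCausalMask lives in fast_transformers.masking and that the recurrent forward accepts a state keyword:)

    import torch
    from fast_transformers.masking import TriangularCausalMask

    d_model = 256                 # must match the dimensionality of the built encoders
    x = torch.randn(1, 10, d_model)

    y_full = model1(x, attn_mask=TriangularCausalMask(10))

    state = None
    for t in range(10):
        y_t, state = model2(x[:, t], state=state)
    # y_t should match y_full[:, -1] up to numerical precision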
Ah, I confused the object name and the class name, my mistake.
@angeloskath
Colab Link:
https://colab.research.google.com/drive/1mYTh4MO_Tg6LBrhhVQUd81R92UNE56F7?authuser=1#scrollTo=cflC2xVxKb5M&line=8&uniqifier=1
Full trace: