
Longformer convert error #6465

Closed
Maybewuss opened this issue Aug 13, 2020 · 11 comments
@Maybewuss

When I install transformers from source and convert BERT to a "long version", it fails.

@Maybewuss Maybewuss changed the title Longformer conver error Longformer convert error Aug 13, 2020
@Maybewuss
Author

Error(s) in loading state_dict for RobertaLongForMaskedLM:
size mismatch for embeddings.position_ids: copying a param with shape torch.Size([1, 512]) from checkpoint, the shape in current model is torch.Size([1, 4096]).

@patrickvonplaten
Contributor

Hey @Maybewuss,

This is a community notebook, so we don't really plan on keeping it up to date with current library changes.
Regarding your question, I would suggest posting it on https://discuss.huggingface.co/ and/or contacting the author @ibeltagy - maybe he can help you.

Before that, it would be nice if you could create a notebook that reproduces your error (replacing RoBERTa with BERT in the above notebook).

@alexyalunin

alexyalunin commented Oct 2, 2020

@patrickvonplaten Is there a way of converting existing 'short' models to Longformer? The notebook above (from allenai) seems not to be useful, since you can't automatically convert their 'long' model to Hugging Face's Longformer class. The only way I see is to manually remap the parameters.

@patrickvonplaten
Contributor

Yeah, it is not straightforward to convert an arbitrary HF model to its "long" version. You will need to write some special code for this yourself, I think. The notebook serves more as an example of how it can be done for a model like RoBERTa.
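
Roughly, the recipe in the notebook has two parts: extend RoBERTa's learned position embeddings beyond 512, and replace every layer's self-attention with a LongformerSelfAttention that reuses the pretrained query/key/value projections. A compressed sketch of the second part, assuming transformers 3.x import paths and a roberta-base checkpoint; the names follow the notebook, but this is not its exact code:

import copy
from transformers import RobertaForMaskedLM
# transformers 3.x path; in 4.x this class lives under
# transformers.models.longformer.modeling_longformer
from transformers.modeling_longformer import LongformerSelfAttention

model = RobertaForMaskedLM.from_pretrained("roberta-base")
config = model.config

# one local attention window size per layer
config.attention_window = [512] * config.num_hidden_layers

for i, layer in enumerate(model.roberta.encoder.layer):
    long_attn = LongformerSelfAttention(config, layer_id=i)
    # reuse the pretrained local projections
    long_attn.query = layer.attention.self.query
    long_attn.key = layer.attention.self.key
    long_attn.value = layer.attention.self.value
    # global attention starts out as a copy of the local projections
    long_attn.query_global = copy.deepcopy(layer.attention.self.query)
    long_attn.key_global = copy.deepcopy(layer.attention.self.key)
    long_attn.value_global = copy.deepcopy(layer.attention.self.value)
    layer.attention.self = long_attn

# In practice the notebook wraps LongformerSelfAttention in a small
# RobertaLongSelfAttention subclass (shown further down in this thread)
# so the forward() signatures line up with what RobertaAttention passes in.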

@NadiaRom

NadiaRom commented Oct 29, 2020

I faced the same error with RoBERTa. The size mismatch was in the position embeddings and position ids. Adding the following lines to create_long_model helped:

model.roberta.embeddings.position_embeddings.weight.data = new_pos_embed    # add after this line
model.roberta.embeddings.position_embeddings.num_embeddings = len(new_pos_embed.data)
# first, check that model.roberta.embeddings.position_embeddings.weight.data.shape is correct: it has to be 4096 (the default) or your desired length
model.roberta.embeddings.position_ids = torch.arange(
    0, model.roberta.embeddings.position_embeddings.num_embeddings
)[None]

For some reason the number of embeddings didn't change after assigning the new weight tensor, so we fix it and also add new position ids.
I use torch==1.6.0 and transformers==3.4.0.
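
For context, new_pos_embed itself is built by tiling the original 512 trained positions until the longer table is full, keeping RoBERTa's two reserved positions at the start. A sketch, assuming model is a roberta-base RobertaForMaskedLM and the target length is 4096 tokens; it is not the notebook's exact code:

import torch

max_pos = 4096 + 2  # RoBERTa reserves positions 0 and 1 (padding offset)
current_pos_embed = model.roberta.embeddings.position_embeddings.weight.data
current_max_pos, embed_size = current_pos_embed.shape  # 514 x 768 for roberta-base

new_pos_embed = current_pos_embed.new_empty(max_pos, embed_size)
new_pos_embed[:2] = current_pos_embed[:2]  # keep the two reserved positions

# tile the 512 trained positions until the 4096 usable slots are filled
k = 2
step = current_max_pos - 2
while k < max_pos:
    chunk = min(step, max_pos - k)
    new_pos_embed[k:k + chunk] = current_pos_embed[2:2 + chunk]
    k += chunk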

@MarkusSagen
Contributor

MarkusSagen commented Nov 24, 2020

@NadiaRom I've been trying this implementation, but the forward method of RobertaLongSelfAttention receives too many inputs.

class RobertaLongSelfAttention(LongformerSelfAttention):
    def forward(
        self,
        hidden_states,
        attention_mask=None,
        head_mask=None,
        encoder_hidden_states=None,
        encoder_attention_mask=None,
        output_attentions=False,
    ):
        return super().forward(hidden_states, attention_mask=attention_mask, output_attentions=output_attentions)

And it doesn't work with the forward-pass implementation in the current transformers library.

Any thoughts on how to solve this and use the conversion script with the current transformers release (3.5.1)?
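
One workaround for this kind of signature drift (untested here, and only addressing the "too many inputs" error): make the override accept and discard whatever extra arguments the encoder passes, and hand LongformerSelfAttention only the arguments it is known to take. Depending on the version, it may still expect additional inputs such as the global-attention index masks.

# transformers 3.x path; moved under transformers.models.longformer in 4.x
from transformers.modeling_longformer import LongformerSelfAttention

class RobertaLongSelfAttention(LongformerSelfAttention):
    # Accept and ignore whatever extra arguments the encoder passes
    # (head_mask, encoder_hidden_states, past_key_value, ... depending on
    # the transformers version). If output_attentions arrives positionally
    # it is dropped and the default False is used.
    def forward(self, hidden_states, attention_mask=None, *args, **kwargs):
        return super().forward(
            hidden_states,
            attention_mask=attention_mask,
            output_attentions=kwargs.get("output_attentions", False),
        )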

@stale

stale bot commented Jan 24, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jan 24, 2021
@github-actions

github-actions bot commented Mar 6, 2021

This issue has been automatically marked as stale and been closed because it has not had recent activity. Thank you for your contributions.

If you think this still needs to be addressed please comment on this thread.

@github-actions github-actions bot closed this as completed Mar 6, 2021
@versae
Contributor

versae commented Aug 9, 2021

@MarkusSagen, were you able to solve the forward() issue?

@MarkusSagen
Contributor

@versae I only looked at it for a couple of hours and decided it was easier to roll back to an earlier version of transformers. If anyone implements a fix, I would be very interested to hear 😊👌

@versae
Contributor

versae commented Aug 10, 2021

@MarkusSagen, this PR makes it work for 4.2.0, and with a couple of changes it also works for 4.9.0.
