
AssertionError when run generate.ipynb with default parameter #120

Open
jacquesqiao opened this issue Jul 30, 2023 · 4 comments

Comments

@jacquesqiao

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[14], line 2
      1 if use_mingpt:
----> 2     model = GPT.from_pretrained(model_type)
      3 else:
      4     model = GPT2LMHeadModel.from_pretrained(model_type)

File ~/project/llm/minGPT/mingpt/model.py:200, in GPT.from_pretrained(cls, model_type)
    197 transposed = ['attn.c_attn.weight', 'attn.c_proj.weight', 'mlp.c_fc.weight', 'mlp.c_proj.weight']
    198 # basically the openai checkpoints use a "Conv1D" module, but we only want to use a vanilla nn.Linear.
    199 # this means that we have to transpose these weights when we import them
--> 200 assert len(keys) == len(sd)
    201 for k in keys:
    202     if any(k.endswith(w) for w in transposed):
    203         # special treatment for the Conv1D weights we need to transpose

AssertionError: 
@jacquesqiao jacquesqiao changed the title AssertionError when run generate.ipynb AssertionError when run generate.ipynb with default parameter Jul 31, 2023
@hjwdzh

hjwdzh commented Aug 22, 2023

Same problem here. Maybe Hugging Face updated their pretrained model? Did you find a solution?

@ydyjya

ydyjya commented Aug 31, 2023

I encountered the same problem and found it was caused by a mismatch in the number of parameters. I compared the keys of sd and sd_hf, and the problem seems to come from Hugging Face updating the GPT2Attention source code.
I added self.register_buffer("masked_bias", torch.tensor(-1e4), persistent=False) in model.py, and that solved it!
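A sketch of where this line would go, assuming it belongs in the attention module's `__init__` in `mingpt/model.py` (the exact class name in minGPT is `CausalSelfAttention`; the surrounding module here is a hypothetical stand-in). The important detail is `persistent=False`, which keeps the buffer out of `state_dict()`, so the key counts in `from_pretrained` line up again:

```python
import torch
import torch.nn as nn

class Attn(nn.Module):
    """Minimal stand-in for minGPT's attention module."""
    def __init__(self):
        super().__init__()
        # non-persistent buffer: it exists on the module but is
        # excluded from state_dict(), so it does not inflate the
        # key count checked by the assert in GPT.from_pretrained
        self.register_buffer("masked_bias", torch.tensor(-1e4), persistent=False)

m = Attn()
print("masked_bias" in m.state_dict())  # → False
```

With `persistent=True` (the default), the buffer would appear in `state_dict()` and the length check would still fail.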

@jasonwvasquez

Where did you add that line of code in model.py?

@ToddMorrill

ToddMorrill commented Dec 25, 2023

My fix was the following in model.py.

```python
# attn.bias isn't in the Hugging Face state dict, so we can't check for it
assert len(keys) == len([k for k in sd if not k.endswith('.attn.bias')])
```
