# Problem loading model with GPTNeoX architecture (weight `gpt_neox.layers.0.attention.rotary_emb.inv_freq` does not exist) #1460
## Comments
For additional context: I retrained the model, this time with flash attention, and the error remains the same.
@Narsil I think this is probably the same as #790 for Llama, which you fixed in #793 by removing the loading of position embeddings from the weights. That fix seems simple enough, but it looks like there was some concern about it breaking previous versions of the model / the integration tests (though maybe everything was fine). I think this probably broke when position embeddings were removed from the GPTNeoX weights (link to line diff from @ArthurZucker), which looks to have been released in transformers 4.35. So if you don't need Flash Attention, you might be able to downgrade transformers and re-export the model to get things working until this is fixed.
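For context on why regenerating is safe: the `inv_freq` tensor is a pure function of the model config (head size, `rotary_pct`, `rotary_emb_base`), so nothing is lost by not shipping it in the checkpoint. A minimal sketch of the computation (illustrative only, not TGI's actual helper):

```python
import torch

def gptneox_inv_freq(hidden_size: int, num_heads: int, rotary_pct: float, base: float = 10000.0) -> torch.Tensor:
    head_dim = hidden_size // num_heads
    # GPTNeoX applies rotary embeddings to only a fraction (rotary_pct) of each head's dims.
    rotary_dim = int(head_dim * rotary_pct)
    # Standard RoPE inverse frequencies: base^(-2i/d) for i in [0, d/2).
    return 1.0 / (base ** (torch.arange(0, rotary_dim, 2, dtype=torch.float32) / rotary_dim))
```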
TYVM for the response. I can see the PR is held up, so I'll attempt the downgrade/re-export you mentioned in the meantime.
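A sketch of that workaround, assuming re-saving under a pre-4.35 `transformers` restores the `inv_freq` buffers (paths below are placeholders):

```python
# First pin transformers to the last line of releases before the removal:
#   pip install "transformers<4.35"
from transformers import AutoModelForCausalLM, AutoTokenizer

src = "path/to/finetuned-model"    # placeholder: the fine-tuned checkpoint
dst = "path/to/re-exported-model"  # placeholder: directory to point TGI at

model = AutoModelForCausalLM.from_pretrained(src)
tokenizer = AutoTokenizer.from_pretrained(src)

# Re-serialize so the saved weights match what the pinned version writes out.
model.save_pretrained(dst, safe_serialization=True)
tokenizer.save_pretrained(dst)
```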
# What does this PR do?

`transformers` 4.35 removed rotary embeddings from GPTNeoX's weights ([link to line diff](huggingface/transformers@253f9a3#diff-0e2a05d86c82e96f516db8c14070ceb36f53ca44c6bc21a9cd92ad2e777b9cf1R298)). This applies the same fix as #793, which generates them on the fly using the appropriate value from the config file.

Fixes #1460

## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Did you read the [contributor guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests), Pull Request section?
- [x] Was this discussed/approved via a Github issue or the [forum](https://discuss.huggingface.co/)? Please add a link to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes? Here are the [documentation guidelines](https://github.com/huggingface/transformers/tree/main/docs), and [here are tips on formatting docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?

## Who can review?

@OlivierDehaene OR @Narsil
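For anyone checking whether a given checkpoint is affected by this change, listing its tensor names is enough; with `transformers` >= 4.35 the rotary buffers are simply absent. A quick check with `safetensors` (the single-shard file name is an assumption; sharded models have several files):

```python
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:
    rotary_keys = [k for k in f.keys() if "rotary_emb.inv_freq" in k]

# An empty list means the checkpoint was exported with transformers >= 4.35,
# which is exactly the case the unpatched loader trips over.
print(rotary_keys or "no inv_freq tensors found")
```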
## System Info

```shell
docker run --rm --entrypoint /bin/bash -itd --name "traclm-v1-3b-instruct" -v "path/to/folder":/data --gpus '"device=3"' -p 172.20.158.30:8082:80 ghcr.io/huggingface/text-generation-inference:latest
```
- Model being used: a fine-tune of togethercomputer/RedPajama-INCITE-Base-3B-v1 with teknium/GPT4-LLM-Cleaned, created using the HF Trainer without flash attention (could this be the problem?)
- Deployment specificities: N/A, using the latest TGI image as of creating this issue.
## Reproduction
N/A, I am attempting to load the model via standard usage of TGI. The problem is related to the model, which uses the GPTNeoX architecture (pasted below). Note the model was not trained with flash attention, though the provided stacktrace seems to imply it needs to be (e.g. multiple instances of `FlashNeoXAttention()`).

## Expected behavior
I would expect the TGI server to start as with any other supported model. However, for this model I get the following error: `weight gpt_neox.layers.0.attention.rotary_emb.inv_freq does not exist`