Fix Base Model Name of LlamaForQuestionAnswering #29258
Thanks for the PR! Unfortunately this is a breaking change. You could override the `base_model_prefix`
only for that class, though. What do you think?
True, I didn't think about whether renaming the variable would be a breaking change. In this case, setting |
Might be "breaking", but since it was not reported, it was presumably not used: as you mentioned, you cannot save + load.
@younesbelkada feel free to merge if it is alright with you
Looks great, thanks !
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Changes:
- HF changed parts of the Llama model implementation.
- HF added a `LlamaForQuestionAnswering`. However, this model has a wrong base model name. I added a workaround that solves this problem until this is fixed in Transformers (huggingface/transformers#29258).

Co-authored-by: calpt <calpt@mail.de>
What does this PR do?
The `LlamaForQuestionAnswering` currently has the `LlamaModel` in the `transformer` variable. This does not match the `base_model_prefix` set in `LlamaPreTrainedModel`, which is "model". This Pull Request changes the name from `transformer` to `model` in `LlamaForQuestionAnswering`.
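
To illustrate the mismatch without loading any weights, here is a minimal sketch in plain Python. It assumes the backbone is resolved through an attribute lookup keyed on `base_model_prefix`, falling back to the wrapper itself when no attribute with that name exists. All `Toy*` and `QAHead*` classes are hypothetical stand-ins, not the actual Transformers classes, and the lookup is a simplification of what the library does.

```python
class ToyBackbone:
    """Hypothetical stand-in for LlamaModel (holds no weights)."""


class ToyPreTrainedModel:
    # Mirrors the "model" prefix that the description attributes to
    # LlamaPreTrainedModel.
    base_model_prefix = "model"

    @property
    def base_model(self):
        # Assumed, simplified lookup: return the attribute named by the
        # prefix, or the object itself if no such attribute exists.
        return getattr(self, self.base_model_prefix, self)


class QAHeadMismatched(ToyPreTrainedModel):
    """Backbone stored as `transformer` while the prefix says `model`."""

    def __init__(self):
        self.transformer = ToyBackbone()


class QAHeadRenamed(ToyPreTrainedModel):
    """Fix chosen in this PR: rename the attribute to match the prefix."""

    def __init__(self):
        self.model = ToyBackbone()


class QAHeadPrefixOverride(ToyPreTrainedModel):
    """Alternative raised in review: override the prefix for this class only."""

    base_model_prefix = "transformer"

    def __init__(self):
        self.transformer = ToyBackbone()


mismatched = QAHeadMismatched()
renamed = QAHeadRenamed()
overridden = QAHeadPrefixOverride()

# With the mismatch, the lookup silently falls back to the wrapper.
print(mismatched.base_model is mismatched)
# Both fixes make the lookup resolve to the backbone.
print(renamed.base_model is renamed.model)
print(overridden.base_model is overridden.transformer)
```

Under these assumptions, both variants restore a consistent lookup. The rename changes the attribute name (and therefore the parameter names derived from it), while the prefix override leaves existing attribute names untouched, which is the trade-off discussed in the review comments.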
Who can review?