[GPTJ] enable common tests and few fixes #14190
Conversation
@@ -112,6 +112,7 @@ def __init__(
         use_cache=True,
         bos_token_id=50256,
         eos_token_id=50256,
+        tie_word_embeddings=False,
GPT-J does not tie the word embeddings with lm_head. This was a very sneaky bug, but thankfully it didn't affect the model: resize_position_embeddings was not correctly implemented in GPTJForCausalLM, so the weights were never actually tied.
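To make this concrete, here is a minimal standalone sketch (not from the PR; the tiny GPTJConfig values are purely illustrative) showing that with tie_word_embeddings=False the LM head and the input embeddings are separate parameters:

```python
from transformers import GPTJConfig, GPTJForCausalLM

# Tiny illustrative config; rotary_dim must not exceed the per-head
# dimension (n_embd // n_head = 16 here).
config = GPTJConfig(
    vocab_size=128,
    n_embd=64,
    n_layer=2,
    n_head=4,
    rotary_dim=16,
    tie_word_embeddings=False,  # GPT-J trains a separate lm_head
)
model = GPTJForCausalLM(config)

# With tying disabled, the two weight tensors use different storage.
assert model.lm_head.weight.data_ptr() != model.transformer.wte.weight.data_ptr()
```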
@@ -675,7 +675,7 @@ def custom_forward(*inputs):
     GPTJ_START_DOCSTRING,
 )
 class GPTJForCausalLM(GPTJPreTrainedModel):
-    _keys_to_ignore_on_load_missing = [r"h\.\d+\.attn\.masked_bias", r"h\.\d+\.attn\.bias", r"lm_head\.weight"]
+    _keys_to_ignore_on_load_missing = [r"h\.\d+\.attn\.masked_bias", r"h\.\d+\.attn\.bias"]
lm_head\.weight should not be ignored, since GPT-J does not tie the word embeddings.
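For context, a rough standalone sketch of the mechanism (the checkpoint key names below are made up for illustration): from_pretrained matches missing state-dict keys against these regex patterns and only warns about keys that match none of them, so keeping lm_head\.weight in the list would have silently hidden a genuinely missing head:

```python
import re

# Patterns remaining in _keys_to_ignore_on_load_missing after this PR.
patterns = [r"h\.\d+\.attn\.masked_bias", r"h\.\d+\.attn\.bias"]

# Hypothetical keys absent from a checkpoint being loaded.
missing = ["transformer.h.0.attn.masked_bias", "lm_head.weight"]

# Only keys that match no pattern end up in the "newly initialized" warning.
reported = [k for k in missing if not any(re.search(p, k) for p in patterns)]
print(reported)  # ['lm_head.weight'] -> the user now gets warned
```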
all_model_classes = (GPTJModel, GPTJForCausalLM, GPTJForSequenceClassification) if is_torch_available() else ()
all_generative_model_classes = (GPTJForCausalLM,) if is_torch_available() else ()
fx_ready_model_classes = all_model_classes
test_pruning = False
test_missing_keys = False
test_model_parallel = False
test_head_masking = False
head_masking is not implemented for GPTJ, hence test_head_masking = False.
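As a rough sketch of how these class attributes work (simplified; the real mixin lives in tests/test_modeling_common.py in the transformers repo), each common test checks its flag and returns early when a model opts out:

```python
# Simplified stand-in for transformers' common test mixin.
class ModelTesterMixin:
    test_head_masking = True  # subclasses set this to False to opt out

    def test_headmasking(self):
        if not self.test_head_masking:
            return  # head masking unsupported for this model; skip the checks
        # ... otherwise run the shared head-masking assertions ...
```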
Thanks a lot for fixing the common tests!
@patil-suraj For the specific issue of …
@patil-suraj Nevermind! It looks like you caught this and also made additional changes!
Looks good to me, thank you @patil-suraj!
Looks good to me too!
What does this PR do?
Currently, GPTJ does not run the common tests because its tester class does not subclass ModelTesterMixin and GenerationTesterMixin. This PR enables the common tests for GPTJ and fixes a few things along the way. I've run the slow tests manually and verified that they pass.
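To illustrate why the missing base classes disabled the whole suite, here is a minimal standalone sketch (the classes below are stand-ins, not the repo's real test code): unittest only collects test_* methods defined on or inherited by a TestCase, so without the mixins none of the shared tests exist on the GPT-J test class:

```python
import unittest

class ModelTesterMixin:  # stand-in for the real mixin's many shared tests
    def test_config(self):
        pass

class GPTJModelTestBefore(unittest.TestCase):  # no mixin: nothing inherited
    pass

class GPTJModelTestAfter(ModelTesterMixin, unittest.TestCase):  # the fix
    pass

loader = unittest.TestLoader()
print(loader.getTestCaseNames(GPTJModelTestBefore))  # []
print(loader.getTestCaseNames(GPTJModelTestAfter))   # ['test_config']
```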
Thanks a lot, @sgugger for spotting this!
cc @StellaAthena
Fixes #14107