
[GPTJ] enable common tests and few fixes #14190

Merged
4 commits merged into huggingface:master from fix-gptj-resize-embds on Nov 1, 2021

Conversation

@patil-suraj (Contributor) commented Oct 28, 2021

What does this PR do?

Currently, GPTJ does not run the common tests because its test class does not subclass ModelTesterMixin and GenerationTesterMixin. This PR enables the common tests for GPTJ and fixes a few issues along the way.
I've run the slow tests manually and verified that they pass.

Thanks a lot, @sgugger for spotting this!

cc @StellaAthena

Fixes #14107
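
A minimal sketch of what the change amounts to on the test side (the import paths and class name are assumptions based on the transformers test-suite layout at the time, not the literal contents of this PR):

    # Sketch only: the real tests/test_modeling_gptj.py contains much more.
    import unittest

    from transformers import is_torch_available
    from transformers.testing_utils import require_torch

    from .test_generation_utils import GenerationTesterMixin  # assumed location
    from .test_modeling_common import ModelTesterMixin        # assumed location

    if is_torch_available():
        from transformers import GPTJForCausalLM, GPTJForSequenceClassification, GPTJModel


    @require_torch
    class GPTJModelTest(ModelTesterMixin, GenerationTesterMixin, unittest.TestCase):
        # Subclassing the two mixins is what turns the shared model and
        # generation tests on; previously only locally defined tests ran.
        all_model_classes = (
            (GPTJModel, GPTJForCausalLM, GPTJForSequenceClassification)
            if is_torch_available()
            else ()
        )
        all_generative_model_classes = (GPTJForCausalLM,) if is_torch_available() else ()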

@@ -112,6 +112,7 @@ def __init__(
use_cache=True,
bos_token_id=50256,
eos_token_id=50256,
tie_word_embeddings=False,
Comment from patil-suraj (Contributor, Author):

GPT-J does not tie the word embeddings with lm_head. This was a very sneaky bug, but thankfully it didn't affect the model: get_output_embeddings was not correctly implemented in GPTJForCausalLM, so the weights were never actually tied.
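
To illustrate what the new flag controls, a small sketch (the tiny config values are placeholders, chosen only so the model builds quickly):

    # Sketch: with tie_word_embeddings=False, lm_head is an independent
    # parameter instead of sharing storage with the input embedding matrix.
    from transformers import GPTJConfig, GPTJForCausalLM

    config = GPTJConfig(
        vocab_size=100, n_positions=128, n_embd=32, n_layer=2, n_head=4, rotary_dim=8
    )
    model = GPTJForCausalLM(config)

    tied = model.lm_head.weight.data_ptr() == model.transformer.wte.weight.data_ptr()
    print(tied)  # expected: False, GPT-J does not tie the embeddings to lm_head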

@@ -675,7 +675,7 @@ def custom_forward(*inputs):
GPTJ_START_DOCSTRING,
)
class GPTJForCausalLM(GPTJPreTrainedModel):
_keys_to_ignore_on_load_missing = [r"h\.\d+\.attn\.masked_bias", r"h\.\d+\.attn\.bias", r"lm_head\.weight"]
Comment from patil-suraj (Contributor, Author):

lm_head\.weight should not be ignored, since GPTJ does not tie word embeds.
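
Why this matters, sketched below: keys listed in _keys_to_ignore_on_load_missing are suppressed from the missing-weights report, so keeping lm_head.weight there would silently hide a checkpoint that genuinely lacks the (untied) LM head. The checkpoint path is a placeholder:

    # Sketch: output_loading_info=True returns a dict describing which keys
    # the checkpoint was missing; with lm_head.weight removed from the ignore
    # list, a missing LM head now shows up here (and triggers a warning).
    from transformers import GPTJForCausalLM

    model, loading_info = GPTJForCausalLM.from_pretrained(
        "path/to/local-gptj-checkpoint",  # placeholder, not a real model id
        output_loading_info=True,
    )
    print(loading_info["missing_keys"])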


all_model_classes = (GPTJModel, GPTJForCausalLM, GPTJForSequenceClassification) if is_torch_available() else ()
all_generative_model_classes = (GPTJForCausalLM,) if is_torch_available() else ()
fx_ready_model_classes = all_model_classes
test_pruning = False
test_missing_keys = False
test_model_parallel = False
test_head_masking = False
Comment from patil-suraj (Contributor, Author):

head_masking is not implemented for GPTJ
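
Roughly how such flags work, as an illustrative sketch of the gating pattern rather than the actual ModelTesterMixin code:

    # Illustrative pattern only: class-level flags let a model's test class
    # opt out of shared tests that don't apply to it.
    class CommonTestsSketch:
        test_head_masking = True  # the GPT-J test class sets this to False

        def test_headmasking(self):
            if not self.test_head_masking:
                return  # skipped: GPT-J does not implement head masking
            ...  # otherwise build a head_mask, run the model, check outputs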

@sgugger (Collaborator) left a comment:

Thanks a lot for fixing the common tests!

@alexorona (Contributor) commented:

@patil-suraj For the specific issue of resize_token_embeddings on gptj (#14107), I got it to work by changing two methods in modeling_gptj.py below. I'm not sure this is right because the model needs to train 2 epochs to get a good result whereas I almost always need just 1 epoch with other model types (gpt-2, gpt-neo, etc.). Will this PR cover the resize_token_embeddings issue? It doesn't seem to make changes to these methods.

def get_output_embeddings(self):
    return self.lm_head

def set_output_embeddings(self, new_embeddings):
    self.lm_head = new_embeddings
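
For context, a usage sketch of the scenario from #14107 (the checkpoint name and added token are only examples, and the 6B download is large): resize_token_embeddings relies on get_output_embeddings/set_output_embeddings to resize the untied lm_head together with the input embeddings.

    # Sketch: after adding tokens, both wte and the untied lm_head must grow,
    # otherwise the resized model cannot be reloaded cleanly (the failure in #14107).
    from transformers import AutoTokenizer, GPTJForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")  # example checkpoint
    model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

    tokenizer.add_special_tokens({"pad_token": "<|pad|>"})  # example new token
    model.resize_token_embeddings(len(tokenizer))

    assert model.lm_head.out_features == len(tokenizer)  # lm_head grew with wte
    model.save_pretrained("./gptj-resized")
    reloaded = GPTJForCausalLM.from_pretrained("./gptj-resized")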

@alexorona (Contributor) commented:


@patil-suraj Nevermind! It looks like you caught this and also made additional changes!

@LysandreJik (Member) left a comment:

Looks good to me, thank you @patil-suraj!

cc @StellaAthena

@StellaAthena (Contributor) commented:

Looks good to me too!

@patil-suraj merged commit ce91bf9 into huggingface:master on Nov 1, 2021
@patil-suraj deleted the fix-gptj-resize-embds branch on November 1, 2021
@ydshieh mentioned this pull request on Jan 20, 2023

Successfully merging this pull request may close these issues.

GPT-J Models Cannot Load If Tokens Have Been Resized Using resize_token_embeddings Method