
[GPTJ] enable common tests and few fixes #14190

Merged
4 commits merged into huggingface:master from fix-gptj-resize-embds on Nov 1, 2021

Conversation

@patil-suraj (Contributor) commented Oct 28, 2021

What does this PR do?

Currently, GPTJ does not run the common tests because its test class does not subclass ModelTesterMixin and GenerationTesterMixin. This PR enables the common tests for GPTJ and fixes a few issues along the way.
I've run the slow tests manually and verified that they pass.

Thanks a lot, @sgugger for spotting this!

cc @StellaAthena

Fixes #14107
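
A minimal sketch of what the change amounts to on the test side (the import paths and class name are assumptions based on the transformers test-suite layout at the time, not the literal contents of this PR):

    # Sketch only: the real tests/test_modeling_gptj.py contains much more.
    import unittest

    from transformers import is_torch_available
    from transformers.testing_utils import require_torch

    from .test_generation_utils import GenerationTesterMixin  # assumed location
    from .test_modeling_common import ModelTesterMixin        # assumed location

    if is_torch_available():
        from transformers import GPTJForCausalLM, GPTJForSequenceClassification, GPTJModel


    @require_torch
    class GPTJModelTest(ModelTesterMixin, GenerationTesterMixin, unittest.TestCase):
        # Subclassing the two mixins is what turns the shared model and
        # generation tests on; previously only locally defined tests ran.
        all_model_classes = (
            (GPTJModel, GPTJForCausalLM, GPTJForSequenceClassification)
            if is_torch_available()
            else ()
        )
        all_generative_model_classes = (GPTJForCausalLM,) if is_torch_available() else ()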

@@ -112,6 +112,7 @@ def __init__(
use_cache=True,
bos_token_id=50256,
eos_token_id=50256,
tie_word_embeddings=False,
Comment from patil-suraj (Contributor, Author):

GPT-J does not tie the word embeddings with lm_head. This was a very sneaky bug, but thankfully it didn't affect the model: get_output_embeddings was not correctly implemented in GPTJForCausalLM, so the weights were never actually tied.
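
To illustrate what the new flag controls, a small sketch (the tiny config values are placeholders, chosen only so the model builds quickly):

    # Sketch: with tie_word_embeddings=False, lm_head is an independent
    # parameter instead of sharing storage with the input embedding matrix.
    from transformers import GPTJConfig, GPTJForCausalLM

    config = GPTJConfig(
        vocab_size=100, n_positions=128, n_embd=32, n_layer=2, n_head=4, rotary_dim=8
    )
    model = GPTJForCausalLM(config)

    tied = model.lm_head.weight.data_ptr() == model.transformer.wte.weight.data_ptr()
    print(tied)  # expected: False, GPT-J does not tie the embeddings to lm_head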

@@ -675,7 +675,7 @@ def custom_forward(*inputs):
GPTJ_START_DOCSTRING,
)
class GPTJForCausalLM(GPTJPreTrainedModel):
_keys_to_ignore_on_load_missing = [r"h\.\d+\.attn\.masked_bias", r"h\.\d+\.attn\.bias", r"lm_head\.weight"]
Comment from patil-suraj (Contributor, Author):

lm_head\.weight should not be ignored, since GPTJ does not tie word embeds.
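
Why this matters, sketched below: keys listed in _keys_to_ignore_on_load_missing are suppressed from the missing-weights report, so keeping lm_head.weight there would silently hide a checkpoint that genuinely lacks the (untied) LM head. The checkpoint path is a placeholder:

    # Sketch: output_loading_info=True returns a dict describing which keys
    # the checkpoint was missing; with lm_head.weight removed from the ignore
    # list, a missing LM head now shows up here (and triggers a warning).
    from transformers import GPTJForCausalLM

    model, loading_info = GPTJForCausalLM.from_pretrained(
        "path/to/local-gptj-checkpoint",  # placeholder, not a real model id
        output_loading_info=True,
    )
    print(loading_info["missing_keys"])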


all_model_classes = (GPTJModel, GPTJForCausalLM, GPTJForSequenceClassification) if is_torch_available() else ()
all_generative_model_classes = (GPTJForCausalLM,) if is_torch_available() else ()
fx_ready_model_classes = all_model_classes
test_pruning = False
test_missing_keys = False
test_model_parallel = False
test_head_masking = False
Comment from patil-suraj (Contributor, Author):

head_masking is not implemented for GPTJ
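
Roughly how such flags work, as an illustrative sketch of the gating pattern rather than the actual ModelTesterMixin code:

    # Illustrative pattern only: class-level flags let a model's test class
    # opt out of shared tests that don't apply to it.
    class CommonTestsSketch:
        test_head_masking = True  # the GPT-J test class sets this to False

        def test_headmasking(self):
            if not self.test_head_masking:
                return  # skipped: GPT-J does not implement head masking
            ...  # otherwise build a head_mask, run the model, check outputs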

@sgugger (Collaborator) left a comment:

Thanks a lot for fixing the common tests!

@alexorona (Contributor) commented:

@patil-suraj For the specific issue of resize_token_embeddings on gptj (#14107), I got it to work by changing two methods in modeling_gptj.py below. I'm not sure this is right because the model needs to train 2 epochs to get a good result whereas I almost always need just 1 epoch with other model types (gpt-2, gpt-neo, etc.). Will this PR cover the resize_token_embeddings issue? It doesn't seem to make changes to these methods.

def get_output_embeddings(self):
    return self.lm_head

def set_output_embeddings(self, new_embeddings):
    self.lm_head = new_embeddings
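
For context, a usage sketch of the scenario from #14107 (the checkpoint name and added token are only examples, and the 6B download is large): resize_token_embeddings relies on get_output_embeddings/set_output_embeddings to resize the untied lm_head together with the input embeddings.

    # Sketch: after adding tokens, both wte and the untied lm_head must grow,
    # otherwise the resized model cannot be reloaded cleanly (the failure in #14107).
    from transformers import AutoTokenizer, GPTJForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")  # example checkpoint
    model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

    tokenizer.add_special_tokens({"pad_token": "<|pad|>"})  # example new token
    model.resize_token_embeddings(len(tokenizer))

    assert model.lm_head.out_features == len(tokenizer)  # lm_head grew with wte
    model.save_pretrained("./gptj-resized")
    reloaded = GPTJForCausalLM.from_pretrained("./gptj-resized")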

@alexorona (Contributor) commented:


@patil-suraj Nevermind! It looks like you caught this and also made additional changes!

@LysandreJik (Member) left a comment:

Looks good to me, thank you @patil-suraj!

cc @StellaAthena

@StellaAthena (Contributor) commented:

Looks good to me too!

@patil-suraj merged commit ce91bf9 into huggingface:master on Nov 1, 2021
@patil-suraj deleted the fix-gptj-resize-embds branch on November 1, 2021
@ydshieh mentioned this pull request on Jan 20, 2023

Successfully merging this pull request may close these issues.

GPT-J Models Cannot Load If Tokens Have Been Resized Using resize_token_embeddings Method