Add TF implementation of GPT-J #15623

stancld · 2022-02-11T14:29:14Z

What does this PR do?

This PR adds a TensorFlow implementation of the GPT-J models.

Fixes #15583

Before submitting

Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed.

@LysandreJik @patrickvonplaten

HuggingFaceDocBuilder · 2022-02-11T14:29:38Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

LysandreJik · 2022-02-22T19:21:00Z

Super exciting! cc @Rocketknight1 and @gante

patil-suraj

Thanks a lot for adding the TF version!
The modeling code looks good to me! Will let @Rocketknight1 and @gante review the TF side of things :-)

src/transformers/models/gptj/modeling_tf_gptj.py

patil-suraj · 2022-02-23T10:41:09Z

tests/test_modeling_tf_gptj.py

+class TFGPTJModelLanguageGenerationTest(unittest.TestCase):
+    @tooslow
+    def test_lm_generate_gptj(self):
+        model = TFGPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", from_pt=True)


We should use TF checkpoint here. Once the PR is approved I will upload the TF checkpoint.

Rocketknight1 · 2022-02-24

+1, because cross-loading PyTorch checkpoints requires torch to be installed, so we should convert checkpoints whenever possible.
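For context, a one-off conversion of the kind discussed here might look like the sketch below. It assumes a `transformers` release that still ships the TensorFlow classes, with both `torch` and `tensorflow` installed; the function name and the output directory are illustrative and not part of the PR.

```python
def convert_pt_checkpoint_to_tf(model_id: str, output_dir: str) -> None:
    """Load PyTorch weights into the TF model class and save a TF checkpoint.

    Hypothetical helper: the name and arguments are assumptions for illustration.
    """
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import TFGPTJForCausalLM

    # from_pt=True cross-loads the PyTorch weights, which requires torch.
    model = TFGPTJForCausalLM.from_pretrained(model_id, from_pt=True)

    # Writes the config and the TF weight file(s) to output_dir.
    model.save_pretrained(output_dir)


# Example call (downloads the full 6B checkpoint, so it is left commented out):
# convert_pt_checkpoint_to_tf("EleutherAI/gpt-j-6B", "./gpt-j-6B-tf")
```

Once a TF checkpoint is uploaded, the test can drop `from_pt=True` and no longer depends on torch.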

gante · 2022-03-11T12:04:28Z

@gante Bit late but working on the PR now :]

lovely 👌 lmk if you need a hand with anything, let's push this beauty to the finish line

gante

Other than the two issues where you have tagged me, it seems good to go 🔥 I will have a look at those.

patil-suraj · 2022-03-15T15:14:06Z

I have uploaded the TF weights https://huggingface.co/EleutherAI/gpt-j-6B/blob/main/tf_model.h5

@gante Do you know how to save the fp16 weights in TF? We need those for the float16 branch.

gante · 2022-03-15T15:20:29Z

@gante Do you know how to save the fp16 weights in TF? We need those for the float16 branch.

@patil-suraj negative 😬 @Rocketknight1, do you know how to do it?

Rocketknight1 · 2022-03-15T16:49:57Z

Not easy to do in TF, unfortunately! Keras really wants you to do mixed precision, with full-precision weights. I don't think there's any "native" way to convert a Model to use float16 weights except with TFLite stuff, or just manually converting the weight arrays yourself.

patil-suraj · 2022-03-15T17:15:55Z

Okay, thanks for the answer.

or just manually converting the weight arrays yourself.

So should we manually create fp16 weights for the float16 branch? Not sure if that affects anything in TF.

Rocketknight1 · 2022-03-16T13:07:44Z

I'm not sure, basically - I don't know how you'd load them into the model class without converting them to float32. Keras has some options for forcing dtypes but I don't think they're very well-supported, so as long as we want the model in Keras and not raw TF then I don't really know how to do this. Can we just drop the float16 branch for TF for now?
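To make "manually converting the weight arrays" concrete, here is a minimal NumPy-only sketch. The list of arrays stands in for what Keras `model.get_weights()` returns; the helper name is hypothetical, and nothing here resolves the open question of loading float16 weights back into a Keras model without a cast to float32.

```python
import numpy as np


def cast_weights_to_fp16(weights):
    """Cast float32 arrays to float16 and leave every other dtype untouched."""
    return [w.astype(np.float16) if w.dtype == np.float32 else w for w in weights]


# Stand-in for `model.get_weights()`: one float32 kernel and one integer buffer.
fp32_weights = [np.ones((4, 4), dtype=np.float32), np.arange(3, dtype=np.int32)]
fp16_weights = cast_weights_to_fp16(fp32_weights)

# The float32 array now takes half the memory; the integer array is unchanged.
print(fp32_weights[0].nbytes, fp16_weights[0].nbytes)
```

The cast itself is simple; the difficulty raised in the thread is on the loading side, where Keras variables are normally created in float32.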

patil-suraj · 2022-03-22T13:41:32Z

Gently pinging everyone here :) Is this PR good to merge?

gante · 2022-03-22T17:04:19Z

There are two outstanding issues that require light changes (@stancld):

Other than that, good to go IMO 👍

stancld · 2022-03-25T12:12:43Z

There are two outstanding issues that require light changes (@stancld):

Add TF implementation of GPT-J #15623 (comment)

Add TF implementation of GPT-J #15623 (comment)

Other than that, good to go IMO 👍

@gante I'm gonna solve these issues now O:]

HuggingFaceDocBuilderDev · 2022-03-25T12:45:54Z

The documentation is not available anymore as the PR was closed or merged.

stancld · 2022-03-25T13:07:27Z

@gante All the remaining issues/comments should be resolved now :] Thanks a lot for your guidance! O:]

gante · 2022-03-25T14:55:00Z

@stancld amazing, thank you so much for this contribution! Adding these models is always a tough task, especially the last mile, so I appreciate your effort 🤗 and I'm sure the community does as well.

@patil-suraj @Rocketknight1 are you cool with me merging this PR? We won't be able to call resize_token_embeddings (and I have an action point to enable that), but other than that seems good to go!

patil-suraj · 2022-03-25T15:16:25Z

Good to merge for me! Maybe just override the resize_token_embeddings method and raise NotImplementedError with a message. But don't feel strongly about this.
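The override suggested here could be sketched as follows. The classes are hypothetical stand-ins written for illustration, not the actual `transformers` base class or the merged code, and the error message is an assumption.

```python
class BaseModelStub:
    """Hypothetical stand-in for a base class that implements resizing."""

    def resize_token_embeddings(self, new_num_tokens=None):
        return new_num_tokens


class GPTJLikeModel(BaseModelStub):
    """Hypothetical model that opts out of resizing until it is supported."""

    def resize_token_embeddings(self, new_num_tokens=None):
        # Fail loudly with an explanation instead of silently misbehaving.
        raise NotImplementedError(
            "resize_token_embeddings is not supported for this model yet."
        )


try:
    GPTJLikeModel().resize_token_embeddings(50_000)
except NotImplementedError as err:
    message = str(err)
    print(message)
```

Raising explicitly gives users a clear message at the call site rather than an obscure failure deeper in the embedding code.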

gante · 2022-03-25T19:26:13Z

(@Rocketknight1 approved on slack, merging now to get a boost on Friday afternoon good vibes)

stancld added 8 commits February 9, 2022 22:20

Initial commit

eeb2dfe

Add TFGPTJModel

daec2ee

[WIP] Fix some basic issues

148326d

Fix a forward pass

1e0cf12

Merge branch 'master' into tf_gpt-j

b8bee61

Add TFGPTJCausalLM

ff06081

Add TFGPTJForSequenceClassification

d9ea032

Add TFGPTJForQuestionAnswering

342a9dd

stancld added 16 commits February 11, 2022 16:18

Fix docs

f824b25

make fix-copies

a5b0fd4

Merge branch 'master' into tf_gpt-j

089f4ad

Add models into the auto factory

d839d10

Fix shape_list import in a test file

05556c1

make style

2866c49

Merge branch 'master' into tf_gpt-j

0eff264

Merge branch 'master' into tf_gpt-j

25ffba9

Fix - Unable to create link (name already exists)

91e409f

Fix model compilation

37ea8d8

Deal with TF dynamic shapes

8302737

Add Loss parents to models

ec87443

Fix imports

aec1a32

Update keys to ignore + fix scale_attn assignment

66c7dc8

Fix PT-TF equivalence

76b6174

Define product of list items due to python<=3.7

00b4854

stancld marked this pull request as ready for review February 19, 2022 20:46

stancld changed the title ~~[WIP] Add TF implementation of GPT-J~~ Add TF implementation of GPT-J Feb 19, 2022

patrickvonplaten requested a review from patil-suraj February 21, 2022 16:08

patil-suraj reviewed Feb 23, 2022

Apply some suggestions from code review

7aec58d

* Adjust split and merge heads to handle 4 and 5-dim tensors
* Fix use_cache according to PT implementation
* Add some missing comments
* Fix formatting of expected_output_ids in the test file

stancld added 7 commits March 11, 2022 14:00

[WIP] Add some new slow/tooslow tests

5332030

.

ad18a5c

Merge branch 'master' into tf_gpt-j

878db21

Add token_type_ids to prepare_inputs_for_generation

730dba4

Add a type hint and a test

7442fab

Move test_batch_generation among TFGPTJModelLanguageGenerationTest tests

be9ca28

Merge remote-tracking branch 'upstream/master' into tf_gpt-j

2a12082

stancld mentioned this pull request Mar 12, 2022

Add TF implementation of GPT-J model #15583

Closed

gante approved these changes Mar 14, 2022

Merge branch 'main' into tf_gpt-j

947031e

Resolve some remaining issues

25e67de

* Update set/get output embeddings method
* Update prepare_inputs_for_generation
* Skip test_resize_token_embeddings as this part of code is going to undergo a major refactor
* Update outputs for @tooslow tests

Merge remote-tracking branch 'upstream/main' into tf_gpt-j

b101109

patil-suraj approved these changes Mar 25, 2022

gante merged commit ed2ee37 into huggingface:main Mar 25, 2022

stancld deleted the tf_gpt-j branch March 26, 2022 12:34
