
[GPT-J] Use the float16 checkpoints in integration tests #13676

Merged: 8 commits merged into huggingface:master on Sep 22, 2021

Conversation

anton-l (Member) commented on Sep 21, 2021

This PR switches the GPT-J checkpoints in the integration tests to fp16 to test whether they're able to run on our daily CI.
At the moment, the fp32 checkpoints are timing out either during model download or initialization:

600.01s call     tests/test_modeling_gptj.py::GPTJModelTest::test_batch_generation
600.00s call     tests/test_modeling_gptj.py::GPTJModelLanguageGenerationTest::test_lm_generate_gptj
600.00s call     tests/test_modeling_gptj.py::GPTJModelLanguageGenerationTest::test_gptj_sample_max_time
600.00s call     tests/test_modeling_gptj.py::GPTJModelTest::test_model_from_pretrained
600.00s call     tests/test_modeling_gptj.py::GPTJModelLanguageGenerationTest::test_gptj_sample

Note that this doesn't guarantee that the old tests' outputs are reproduced exactly (some generated tokens may differ), but it could help with caching the models on the runner and avoid timeouts.

⚠️ The tests should be revisited again once #13466 is merged.
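
For context, a minimal sketch of what loading the fp16 checkpoint looks like (mirroring the snippet reviewed below; the exact test code may differ):

import torch
from transformers import AutoTokenizer, GPTJForCausalLM

# Load the weights from the "float16" revision of the checkpoint repo instead of
# the default fp32 branch; this roughly halves the download and speeds up init.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B", revision="float16")
model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", revision="float16", torch_dtype=torch.float16
)
model.to("cuda")  # the integration tests run on a GPU CI runner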

anton-l (Member, Author) commented on Sep 21, 2021

test_gptj_sample() and test_gptj_sample_max_time() were disabled due to GPU OOM when .generate() is called more than once.
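
A hedged sketch of how those two tests can be taken out of the regular CI run with the @tooslow marker mentioned in the review below (test bodies elided; assumes transformers.testing_utils.tooslow):

import unittest
from transformers.testing_utils import tooslow

class GPTJModelLanguageGenerationTest(unittest.TestCase):
    @tooslow  # skipped on CI: two back-to-back .generate() calls exhaust GPU memory
    def test_gptj_sample(self):
        ...

    @tooslow  # skipped on CI for the same reason
    def test_gptj_sample_max_time(self):
        ...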

patil-suraj (Contributor) left a comment

Thanks a lot for taking care of this.

Comment on lines 517 to 519
# Marked as @tooslow due to GPU OOM (issue #13676)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B", revision="float16")
model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", revision="float16", torch_dtype=torch.float16)
patil-suraj (Contributor):
For this test, I think we could use a smaller random model, since this test doesn't really test generation but only the max-time constraint. WDYT @LysandreJik @anton-l

Member:
Yes, sounds good to me
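
A rough sketch of the suggested change, assuming a tiny randomly initialized GPT-J checkpoint is available on the Hub (the model id below is illustrative, not taken from this PR):

import datetime
from transformers import AutoTokenizer, GPTJForCausalLM

# Hypothetical tiny random checkpoint: the max-time test only measures wall-clock
# behaviour of generate(), so output quality is irrelevant.
MODEL_ID = "hf-internal-testing/tiny-random-gptj"  # assumed name, for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = GPTJForCausalLM.from_pretrained(MODEL_ID)
input_ids = tokenizer("Today is a nice day and", return_tensors="pt").input_ids

MAX_TIME = 0.5  # seconds
start = datetime.datetime.now()
model.generate(input_ids, do_sample=True, max_time=MAX_TIME, max_length=256)
duration = datetime.datetime.now() - start
# The test only asserts that generation stays within the time budget (with some slack),
# so a small randomly initialized model is enough.
assert duration < datetime.timedelta(seconds=2 * MAX_TIME)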

LysandreJik (Member) left a comment
LGTM, thank you @anton-l


anton-l merged commit 7c7d2ec into huggingface:master on Sep 22, 2021
Narsil pushed a commit to Narsil/transformers that referenced this pull request on Sep 25, 2021

[GPT-J] Use the float16 checkpoints in integration tests (huggingface#13676)

* Use fp16 checkpoints
* Style
* Fix outputs and disable OOM tests
* Correct another output
* Use a random smaller model for generation tests
* repo quickfix
* fix gradient checkpointing
stas00 pushed a commit to stas00/transformers that referenced this pull request on Oct 12, 2021 (same commit message as above)
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request on Jan 13, 2022 (same commit message as above)
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request on Jan 27, 2022 (same commit message as above)