
[GPT-J] Use the float16 checkpoints in integration tests #13676

Merged: 8 commits merged into huggingface:master on Sep 22, 2021

Conversation

anton-l (Member) commented on Sep 21, 2021

This PR switches the GPT-J checkpoints in the integration tests to fp16 to test whether they're able to run on our daily CI.
At the moment, the fp32 checkpoints are timing out either during model download or initialization:

600.01s call     tests/test_modeling_gptj.py::GPTJModelTest::test_batch_generation
600.00s call     tests/test_modeling_gptj.py::GPTJModelLanguageGenerationTest::test_lm_generate_gptj
600.00s call     tests/test_modeling_gptj.py::GPTJModelLanguageGenerationTest::test_gptj_sample_max_time
600.00s call     tests/test_modeling_gptj.py::GPTJModelTest::test_model_from_pretrained
600.00s call     tests/test_modeling_gptj.py::GPTJModelLanguageGenerationTest::test_gptj_sample

Note that this doesn't guarantee that the old tests' outputs are reproduced exactly (some generated tokens may differ), but it could help with caching the models on the runner and avoid timeouts.

⚠️ The tests should be revisited again once #13466 is merged.
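
For context, a minimal sketch of what loading the fp16 checkpoint looks like (mirroring the snippet reviewed below; the exact test code may differ):

import torch
from transformers import AutoTokenizer, GPTJForCausalLM

# Load the weights from the "float16" revision of the checkpoint repo instead of
# the default fp32 branch; this roughly halves the download and speeds up init.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B", revision="float16")
model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", revision="float16", torch_dtype=torch.float16
)
model.to("cuda")  # the integration tests run on a GPU CI runner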

anton-l (Member, Author) commented on Sep 21, 2021

test_gptj_sample() and test_gptj_sample_max_time() were disabled due to GPU OOM when .generate() is called more than once.
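
A hedged sketch of how those two tests can be taken out of the regular CI run with the @tooslow marker mentioned in the review below (test bodies elided; assumes transformers.testing_utils.tooslow):

import unittest
from transformers.testing_utils import tooslow

class GPTJModelLanguageGenerationTest(unittest.TestCase):
    @tooslow  # skipped on CI: two back-to-back .generate() calls exhaust GPU memory
    def test_gptj_sample(self):
        ...

    @tooslow  # skipped on CI for the same reason
    def test_gptj_sample_max_time(self):
        ...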

patil-suraj (Contributor) left a comment

Thanks a lot for taking care of this.

Comment on lines 517 to 519
# Marked as @tooslow due to GPU OOM (issue #13676)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B", revision="float16")
model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", revision="float16", torch_dtype=torch.float16)
patil-suraj (Contributor):
For this test, I think we could use a smaller random model, since this test doesn't really test generation but only the max-time constraint. WDYT @LysandreJik @anton-l

Member:
Yes, sounds good to me
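
A rough sketch of the suggested change, assuming a tiny randomly initialized GPT-J checkpoint is available on the Hub (the model id below is illustrative, not taken from this PR):

import datetime
from transformers import AutoTokenizer, GPTJForCausalLM

# Hypothetical tiny random checkpoint: the max-time test only measures wall-clock
# behaviour of generate(), so output quality is irrelevant.
MODEL_ID = "hf-internal-testing/tiny-random-gptj"  # assumed name, for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = GPTJForCausalLM.from_pretrained(MODEL_ID)
input_ids = tokenizer("Today is a nice day and", return_tensors="pt").input_ids

MAX_TIME = 0.5  # seconds
start = datetime.datetime.now()
model.generate(input_ids, do_sample=True, max_time=MAX_TIME, max_length=256)
duration = datetime.datetime.now() - start
# The test only asserts that generation stays within the time budget (with some slack),
# so a small randomly initialized model is enough.
assert duration < datetime.timedelta(seconds=2 * MAX_TIME)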

LysandreJik (Member) left a comment
LGTM, thank you @anton-l


anton-l merged commit 7c7d2ec into huggingface:master on Sep 22, 2021
Narsil pushed a commit to Narsil/transformers that referenced this pull request on Sep 25, 2021

[GPT-J] Use the float16 checkpoints in integration tests (huggingface#13676)

* Use fp16 checkpoints
* Style
* Fix outputs and disable OOM tests
* Correct another output
* Use a random smaller model for generation tests
* repo quickfix
* fix gradient checkpointing
stas00 pushed a commit to stas00/transformers that referenced this pull request on Oct 12, 2021 (same commit message as above)
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request on Jan 13, 2022 (same commit message as above)
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request on Jan 27, 2022 (same commit message as above)