[GPT-J] Use the float16 checkpoints in integration tests #13676
Thanks a lot for taking care of this.
tests/test_modeling_gptj.py
Outdated
# Marked as @tooslow due to GPU OOM (issue #13676)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B", revision="float16")
model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", revision="float16", torch_dtype=torch.float16)
For this test, I think we could use a smaller random model, since it does not really test the generated text, only the max-time constraint. WDYT @LysandreJik @anton-l
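A minimal sketch of what that suggestion could look like, assuming a tiny, randomly initialized GPT-J built from a hand-written config. The config values, prompt, and time budget below are hypothetical and chosen only for illustration; they are not taken from this PR, and the test in the repository may load a small pretrained-style checkpoint instead.

```python
import time

import torch
from transformers import GPTJConfig, GPTJForCausalLM

torch.manual_seed(0)

# Hypothetical tiny config: small enough to build and run on CPU in well under a second.
config = GPTJConfig(
    vocab_size=100,
    n_positions=64,
    n_embd=32,
    n_layer=2,
    n_head=4,
    rotary_dim=4,  # must not exceed the head size (n_embd / n_head = 8)
    bos_token_id=0,
    eos_token_id=1,
)
model = GPTJForCausalLM(config).eval()  # random weights, no download needed

# Random prompt of 5 token ids; the content is irrelevant for a timing test.
input_ids = torch.randint(2, config.vocab_size, (1, 5))

MAX_TIME = 0.5    # seconds, hypothetical budget
MAX_LENGTH = 32   # hard cap on prompt + generated tokens

start = time.perf_counter()
with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        do_sample=False,
        max_time=MAX_TIME,
        max_length=MAX_LENGTH,
        pad_token_id=0,
    )
elapsed = time.perf_counter() - start

print(f"generated {output_ids.shape[1] - input_ids.shape[1]} tokens in {elapsed:.2f}s")
```

Since the weights are random, the generated tokens are meaningless; the point is only that `max_time` and `max_length` can be exercised without loading the 6B checkpoint.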
Yes, sounds good to me
LGTM, thank you @anton-l
…ion-tests # Conflicts: # tests/test_modeling_gptj.py
…ce#13676) * Use fp16 checkpoints * Style * Fix outputs and disable OOM tests * Correct another output * Use a random smaller model for generation tests * repo quickfix * fix gradient checkpointing
This PR switches the GPT-J checkpoints in the integration tests to fp16 to check whether they can run on our daily CI.
At the moment, fp32 checkpoints are timing out either during model downloads or initialization:
Note that this doesn't guarantee reproducibility of the old tests (some tokens may be different), but it could help with caching the models on the runner to avoid timeouts.
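As a rough back-of-the-envelope illustration of why the fp16 checkpoints are lighter to download and initialize, the sketch below estimates the size of the weights alone. It assumes roughly 6 billion parameters, as the checkpoint name suggests; the exact parameter count is not stated in this PR, and activations, buffers, and file-format overhead are ignored.

```python
# Assumption: ~6 billion parameters, inferred from the "gpt-j-6B" checkpoint name.
NUM_PARAMS = 6_000_000_000

BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weights_size_gib(num_params: int, dtype: str) -> float:
    """Approximate size of the raw weights in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


fp32_gib = weights_size_gib(NUM_PARAMS, "float32")
fp16_gib = weights_size_gib(NUM_PARAMS, "float16")

print(f"float32 weights: ~{fp32_gib:.1f} GiB")
print(f"float16 weights: ~{fp16_gib:.1f} GiB")
```

Under these assumptions the fp16 weights take half the space of the fp32 ones, which is consistent with the goal of faster downloads and easier caching on the runner.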