Add Falcon-7B HuggingFace model #1758
Conversation
Hi @petermcaughan! Thank you for your pull request and welcome to our community.

**Action Required**

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

**Process**

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with `CLA signed`. If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!
```python
class Model(HuggingFaceModel):
    task = NLP.LANGUAGE_MODELING
    DEFAULT_TRAIN_BSIZE = 4
```
Can we have a link to the upstream code that references the train batch size?
The Falcon README shows the training batch size to be 2304 [link]; I updated the comment to reference this. That size seems prohibitive to run in the testing suite, though, so I kept the default size at 4.
Thanks for adding the reference! We should use the per-device batch size here. The link mentions a global batch size of 2304 on 384 A100 GPUs, so the per-GPU batch size should be 6. Can you please update it to 6 and add the link (https://huggingface.co/tiiuae/falcon-7b/blob/main/README.md#:~:text=Batch%20size,tokens%20ramp%2Dup) as the reference?
Thanks for the clarification! Updated the size and the reference link.
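The per-device value discussed above follows directly from the numbers in the Falcon-7B README. A minimal sketch of the arithmetic (the constant names here are illustrative, not torchbenchmark identifiers):

```python
# Global training batch size and GPU count from the upstream
# Falcon-7B README (2304 samples across 384 A100 GPUs).
GLOBAL_BATCH_SIZE = 2304
NUM_GPUS = 384

# Per-device batch size is the global size divided evenly across devices.
per_device_bsize = GLOBAL_BATCH_SIZE // NUM_GPUS
print(per_device_bsize)  # 6
```

This is why `DEFAULT_TRAIN_BSIZE` was updated from 4 to 6 with the README link as the reference.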
This PR adds two features to torchbenchmark:

* Adds the Falcon-7B HuggingFace model.
* Passes extra keyword arguments defined in a model's `__init__.py` into the model constructor during benchmarking. Running Falcon-7B in HuggingFace requires the parameter `trust_remote_code` to be passed.
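The kwarg-forwarding feature described above can be sketched as a small merge of per-model extras into the constructor call. This is an illustrative sketch only: `build_model_kwargs` and `extra_model_kwargs` are hypothetical names, not the actual torchbenchmark API, and in the real code the merged dict would be passed to something like `AutoModelForCausalLM.from_pretrained(**kwargs)`.

```python
def build_model_kwargs(model_name, extra_model_kwargs=None):
    """Merge per-model constructor extras into the base kwargs.

    `extra_model_kwargs` stands in for options a model's __init__.py
    might declare (e.g. trust_remote_code for Falcon-7B).
    """
    base_kwargs = {"pretrained_model_name_or_path": model_name}
    # Extras override nothing here; they simply extend the call.
    base_kwargs.update(extra_model_kwargs or {})
    return base_kwargs


# Falcon-7B needs trust_remote_code=True to load its custom modeling code.
kwargs = build_model_kwargs("tiiuae/falcon-7b", {"trust_remote_code": True})
print(kwargs["trust_remote_code"])  # True
```

Models that need no extras simply omit the second argument and get the unmodified base kwargs.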