Adding TE Linear for fp8 support #271
actckpt error with amp_fp8 (with
Re. edit: I actually re-read your comments, and I guess we are still blocked on ActCkpt + amp_fp8. Will read the logs carefully now...
There seems to be a bug, referenced in NVIDIA/TransformerEngine#93, that was fixed in NVIDIA/TransformerEngine#187 and is available on Could you install TE @
Note: transformer_engine has its own activation checkpointing utility, which might need to be integrated into Composer (?) for fp8 to work with activation checkpointing.
TE @ main requires flash-attn==1.0.6. Solution: add
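Since TE @ main needs an exact flash-attn version, a quick check of the installed version can save a failed install. This is a minimal sketch: the helper name flash_attn_matches_pin is hypothetical, it assumes the package is installed under the distribution name flash-attn, and it compares version strings exactly (so a local build tag such as +cu118 would not count as a match).

```python
from importlib.metadata import PackageNotFoundError, version

# Pin from the comment above: TE @ main requires flash-attn==1.0.6.
REQUIRED_FLASH_ATTN = "1.0.6"


def flash_attn_matches_pin(required: str = REQUIRED_FLASH_ATTN) -> bool:
    """Return True only if flash-attn is installed at exactly `required`."""
    try:
        installed = version("flash-attn")  # distribution name as published on PyPI
    except PackageNotFoundError:
        # Not installed at all, so the pin cannot be satisfied.
        return False
    # Exact string comparison: "1.0.6+cu118" does not match "1.0.6".
    return installed == required


if __name__ == "__main__":
    print("flash-attn pin satisfied:", flash_attn_matches_pin())
```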
Installing TE from main (as suggested by @abhi-mosaic) makes our integrated activation checkpointing work; there is no need to integrate TE's activation checkpointing.
* Add a callback that logs generations to wandb at eval end (#265)
* updt
* add 40gb tput
* Update examples/llm/throughput/README.md

Co-authored-by: Abhi Venigalla <77638579+abhi-mosaic@users.noreply.github.com>

---------

Co-authored-by: Daniel King <43149077+dakinggg@users.noreply.github.com>
Co-authored-by: Abhi Venigalla <77638579+abhi-mosaic@users.noreply.github.com>
Note: the 3B model uses activation checkpointing, so its model TFLOPS is multiplied by 0.75. Slightly more here.
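The 0.75 factor matches the usual FLOPs accounting for full activation checkpointing. The sketch below uses the common approximation of 2N FLOPs per token for the forward pass and 4N for the backward pass (N = parameter count, attention FLOPs ignored). Recomputing the forward pass during backward adds another 2N of hardware work, so model FLOPs / hardware FLOPs = 6N / 8N = 0.75. This assumes the measured throughput counts the recomputed forward pass; the helper names are hypothetical.

```python
def flops_per_token(n_params: float, recompute_forward: bool) -> tuple[float, float]:
    """Approximate (model, hardware) FLOPs per training token.

    Uses the 2N (forward) + 4N (backward) estimate and ignores attention.
    With full activation checkpointing the forward pass runs a second time
    during backward, which the hardware executes but the model count excludes.
    """
    forward = 2.0 * n_params
    backward = 4.0 * n_params
    model = forward + backward
    hardware = model + (forward if recompute_forward else 0.0)
    return model, hardware


def model_to_hardware_ratio(n_params: float) -> float:
    """Fraction of executed FLOPs that count as model FLOPs under full recompute."""
    model, hardware = flops_per_token(n_params, recompute_forward=True)
    return model / hardware


if __name__ == "__main__":
    n = 3e9  # roughly the 3B model mentioned above
    print(model_to_hardware_ratio(n))
```

The ratio does not depend on N, which is why a single constant factor can be applied to every model that uses full activation checkpointing.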
Ran

composer train/train.py train/yamls/pretrain/mpt-3b.yaml

also with model.fc_type=te and precision=amp_fp8.
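For reference, the full invocation with both overrides can be assembled as below. This assumes train/train.py merges trailing key=value arguments into the YAML config in OmegaConf dotlist style; apply_dotlist is a hypothetical, simplified stand-in for that merge (values stay strings), and the base config values are illustrative rather than copied from mpt-3b.yaml.

```python
import copy
import shlex

# Base command and overrides from the run described above.
BASE_CMD = ["composer", "train/train.py", "train/yamls/pretrain/mpt-3b.yaml"]
OVERRIDES = ["model.fc_type=te", "precision=amp_fp8"]


def apply_dotlist(cfg: dict, overrides: list[str]) -> dict:
    """Return a copy of cfg with each 'a.b.c=value' override written into the nested dict.

    Hypothetical, simplified dotlist merge: values are kept as strings and
    missing intermediate keys are created as empty dicts.
    """
    out = copy.deepcopy(cfg)
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = out
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return out


if __name__ == "__main__":
    print(shlex.join(BASE_CMD + OVERRIDES))
    # Illustrative values only, not taken from mpt-3b.yaml.
    base = {"model": {"fc_type": "torch"}, "precision": "amp_bf16"}
    print(apply_dotlist(base, OVERRIDES))
```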
Result:
Note: there does seem to be this error when activation checkpointing is enabled with activation_checkpointing_reentrant: false. If we set activation_checkpointing_reentrant: true, activation checkpointing works fine without amp_fp8; with amp_fp8, the issue still persists. (Previously, circa summer 2022, activation_checkpointing_reentrant: true caused some difficulties, which is why we set it to false; not sure if this is still necessary.) The ActCkpt error will be added to the comments.
(The error with amp_fp8 might be an issue with Composer's implementation of fp8.)
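As a rough sketch of what the flag toggles: activation_checkpointing_reentrant selects between the reentrant and non-reentrant activation checkpointing implementations, which torch.utils.checkpoint.checkpoint exposes as its use_reentrant argument. The helper below is hypothetical and is not Composer's actual code path. It also assumes both flags live under fsdp_config and that a missing reentrant flag means reentrant.

```python
def checkpoint_kwargs(fsdp_config: dict) -> dict:
    """Hypothetical helper: map YAML activation-checkpointing flags to the
    keyword argument accepted by torch.utils.checkpoint.checkpoint.

    Returns {} when activation checkpointing is off; otherwise the value of
    activation_checkpointing_reentrant becomes use_reentrant.
    """
    if not fsdp_config.get("activation_checkpointing", False):
        return {}
    # Assumption: a missing flag is treated as reentrant.
    reentrant = bool(fsdp_config.get("activation_checkpointing_reentrant", True))
    return {"use_reentrant": reentrant}


# The two settings discussed above.
CONFIGS = {
    "non-reentrant": {"activation_checkpointing": True, "activation_checkpointing_reentrant": False},
    "reentrant": {"activation_checkpointing": True, "activation_checkpointing_reentrant": True},
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(name, checkpoint_kwargs(cfg))
```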