Can we summarize the meanings of data types like bf16_fp16? For example, what are the activation and output data types, and which compute instructions are used? #414

Open
heagoo opened this issue May 21, 2024 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments

@heagoo

heagoo commented May 21, 2024

No description provided.

@Duyi-Wang
Contributor

Sorry, our docs are still a work in progress due to ongoing code refactoring.

Mixed data types such as bf16_fp16 and bf16_int8 mean that BF16 is used for the first token, while FP16 or INT8, respectively, is used for the next tokens. The first token is compute-intensive and highly sensitive to precision, so we use half precision (BF16) together with AMX to accelerate computation. Next-token generation, however, is memory-bound, so a lower-precision type is used to speed it up.

The bf16_fp16 type was introduced because, in older versions, FP16 performed better than BF16 in some cases. After later optimizations, we now recommend using bf16 instead of bf16_fp16.
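To make the first-token / next-token split concrete, here is a minimal Python sketch. It assumes (this is not stated in the reply above) that the second part of a mixed name is the weight type used for next tokens; the helpers `split_mixed_dtype` and `linear_intensity` are hypothetical and not part of the project's API. The second helper roughly estimates FLOPs per byte of weight traffic for one linear layer, which illustrates why processing the prompt is compute-bound while generating one token at a time is memory-bound.

```python
# Hypothetical sketch, not the project's API.
# Assumption: in a mixed name like "bf16_int8", the first part is the type
# used for the first token and the second part is the weight type used for
# the next tokens.

BYTES_PER_ELEM = {"bf16": 2, "fp16": 2, "int8": 1}


def split_mixed_dtype(name: str) -> tuple[str, str]:
    """Return (first_token_dtype, next_token_dtype) for a name like "bf16_int8".

    A plain name such as "bf16" uses the same type for both phases.
    """
    parts = name.split("_")
    if len(parts) == 1:
        return parts[0], parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"unexpected data type name: {name}")


def linear_intensity(tokens: int, d_in: int, d_out: int, weight_dtype: str) -> float:
    """FLOPs per byte of weight traffic for a [tokens, d_in] x [d_in, d_out] matmul.

    Activation and KV-cache traffic are ignored, so this is a rough upper
    bound on the real arithmetic intensity.
    """
    flops = 2 * tokens * d_in * d_out  # one multiply + one add per MAC
    weight_bytes = d_in * d_out * BYTES_PER_ELEM[weight_dtype]
    return flops / weight_bytes


if __name__ == "__main__":
    first, nxt = split_mixed_dtype("bf16_int8")
    print(first, nxt)  # bf16 int8
    # First token: the whole 1024-token prompt is processed in one pass.
    print(linear_intensity(1024, 4096, 4096, first))  # 1024.0
    # Next tokens: one token per step (batch size 1).
    print(linear_intensity(1, 4096, 4096, nxt))  # 2.0
    print(linear_intensity(1, 4096, 4096, "bf16"))  # 1.0
```

Under these assumptions, a 1024-token prompt gives about 1024 FLOPs per weight byte. Single-token decoding drops to 1 FLOP/byte with BF16 weights and 2 with INT8. Both decode figures are far below the compute-to-bandwidth ratio of a typical server CPU, so decode time is dominated by reading weights. Halving the bytes per weight roughly halves that traffic.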

@Duyi-Wang added the documentation (Improvements or additions to documentation) label May 24, 2024