
Why use 22 bit quantized activations for some layer norms (except in Embeddings)? #5

Closed
bdalal opened this issue Mar 30, 2021 · 2 comments
Labels: question (Further information is requested)

Comments

bdalal commented Mar 30, 2021

Hi,
I've noticed that the QuantAct layers preceding IntLayerNorm in the IBertSelfOutput and IBertOutput modules specify a 22-bit activation width, while the QuantAct layer preceding IntLayerNorm in IBertEmbedding specifies a 16-bit activation width.

I couldn't find any mention of these bit-width choices in the paper. Could you please explain why they were made?
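
For reference, a minimal sketch of the wiring in question (hypothetical, simplified stand-ins for QuantAct and IntLayerNorm; the real modules in this repo also track activation ranges and run integer-only kernels, and the hidden size of 768 is only an example):

```python
import torch
import torch.nn as nn

class QuantAct(nn.Module):
    """Simplified stand-in: fake-quantizes activations to `activation_bit` bits."""
    def __init__(self, activation_bit):
        super().__init__()
        self.activation_bit = activation_bit

    def forward(self, x):
        # Symmetric uniform quantization using the running max as the range.
        qmax = 2 ** (self.activation_bit - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        return torch.round(x / scale) * scale

class IntLayerNorm(nn.LayerNorm):
    """Placeholder for the repo's integer-only LayerNorm."""
    pass

# Pattern described above for IBertSelfOutput / IBertOutput: 22-bit QuantAct -> IntLayerNorm.
self_output_act = QuantAct(activation_bit=22)
self_output_ln = IntLayerNorm(768)

# Pattern described above for IBertEmbedding: 16-bit QuantAct -> IntLayerNorm.
embedding_act = QuantAct(activation_bit=16)
embedding_ln = IntLayerNorm(768)

hidden_states = torch.randn(2, 8, 768)
out = self_output_ln(self_output_act(hidden_states))
```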

Thank you!

bdalal added the question label on Mar 30, 2021
kssteven418 (Owner) commented

Those numbers are manually chosen to (1) avoid overflow and (2) minimize accuracy degradation.

We find that activations in the Embedding layers are somewhat regular and contain fewer outliers, which allows 16-bit quantization without accuracy degradation. In contrast, activations in the Transformer layers contain more outliers (sometimes orders of magnitude larger), so assigning 16 bits to them could have a significant impact on accuracy. We find that 22 bits is a large enough bit width to avoid a performance drop while also avoiding overflow in the subsequent IntLayerNorm layers. It would therefore also be fine to use 22 bits in the Embedding layers, which is the more conservative choice.
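
To make the two constraints concrete, here is a small numeric illustration (my own sketch, not code from this repo; the activation values, the hidden size of 768, and the outlier magnitudes are made up). The first part shows how a few large outliers fix the quantization step size, so 16 bits leaves a much coarser grid than 22 bits; the second part gives a rough worst-case bit count for the sum of squares an integer LayerNorm has to accumulate, which stays comfortably below 64 bits for 22-bit inputs but not for 32-bit ones (whatever accumulator the real kernel uses).

```python
import numpy as np

def symmetric_quantize(x, num_bits):
    # Symmetric uniform quantization: the largest |activation| fixes the scale,
    # so a handful of outliers dictates the step size for every other value.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale) * scale, scale

rng = np.random.default_rng(0)
acts = rng.normal(0.0, 1.0, size=768)   # mostly "regular" activations
acts[:3] = [1500.0, -900.0, 2200.0]     # a few made-up outliers, orders of magnitude larger

for bits in (16, 22):
    deq, scale = symmetric_quantize(acts, bits)
    print(f"{bits:2d}-bit: step size {scale:.2e}, mean abs error {np.abs(deq - acts).mean():.2e}")

# Rough overflow bound: summing squares of signed b-bit integers over a hidden
# size H needs about 2*(b-1) + log2(H) accumulator bits.
H = 768
for bits in (16, 22, 32):
    print(f"{bits:2d}-bit inputs -> ~{2 * (bits - 1) + np.log2(H):.1f} bits for the sum of squares")
```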


bdalal commented Mar 31, 2021

That makes sense. Thank you for the clarification!

bdalal closed this as completed on Mar 31, 2021