
Is GELU operation quantized as well? #2

Closed
Kevinpsk opened this issue Dec 13, 2021 · 1 comment

Comments

@Kevinpsk

Hi there,

Thanks a lot for sharing the code.
I have a quick question: is the GELU layer fully quantized as well? I see you mention that the vision transformer is fully quantized, but I was not able to find where GELU is quantized in the code, nor any description of it in the paper. Did I miss something?
Thanks a lot for your clarification.

@linyang-zhh
Collaborator

linyang-zhh commented Dec 14, 2021

@Kevinpsk
Hi! Thanks for your interest in our work!

We quantize the output of GELU using QAct; the code is here.

This treatment is consistent with practical deployment. Since PTQ must guarantee that quantized inference stays consistent with the original full-precision inference, the non-linear GELU layers have to be preserved. Thus, to minimize hardware cost for the subsequent operations (e.g., the next linear layer), quantization is usually applied to the output of GELU, while GELU itself is evaluated on a CPU unit or implemented with a Look-Up-Table (LUT).
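To make the idea concrete, here is a minimal PyTorch sketch of this pattern: GELU stays in full precision and only its output is fake-quantized, in the spirit of QAct. The module name, the per-tensor min/max quantizer, and the bit-width handling are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn


class QuantizedGELUOutput(nn.Module):
    """Sketch: full-precision GELU followed by a QAct-style uniform
    quantizer on its output (names and quantizer are illustrative)."""

    def __init__(self, n_bits: int = 8):
        super().__init__()
        self.n_bits = n_bits
        self.gelu = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # GELU itself is kept in full precision (e.g. on a CPU unit or a LUT).
        y = self.gelu(x)
        # Asymmetric uniform fake-quantization of the GELU output so that
        # the next linear layer can consume low-bit activations.
        # In real PTQ the scale/zero-point come from calibration data,
        # not from the per-batch min/max used here for brevity.
        qmin, qmax = 0, 2 ** self.n_bits - 1
        scale = (y.max() - y.min()).clamp(min=1e-8) / (qmax - qmin)
        zero_point = torch.round(-y.min() / scale).clamp(qmin, qmax)
        q = torch.clamp(torch.round(y / scale) + zero_point, qmin, qmax)
        return (q - zero_point) * scale  # dequantized for simulated inference
```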

If you are interested in a purely fixed-point implementation of GELU inference, we recommend following the work of I-BERT, and you are also welcome to try our optimizations for LayerNorm and Softmax!
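As a side note on the LUT option mentioned above, the sketch below precomputes full-precision GELU for every possible quantized input code, so on-device inference reduces to a table lookup. The function name and interface are assumptions for illustration, not part of the repository.

```python
import torch


def build_gelu_lut(scale: float, zero_point: int, n_bits: int = 8) -> torch.Tensor:
    """Precompute GELU for every quantized input code sharing one
    (scale, zero_point); inference then only needs a table lookup."""
    codes = torch.arange(2 ** n_bits)            # all possible input codes
    real_inputs = (codes - zero_point) * scale   # dequantize the code grid
    return torch.nn.functional.gelu(real_inputs)


# Usage: q holds uint8 codes with the same (scale, zero_point) as the table.
# lut = build_gelu_lut(scale=0.05, zero_point=128)
# gelu_out = lut[q.long()]   # lookup replaces evaluating GELU on-device
```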
