Thanks a lot for sharing the code.
I have a quick question: is the GELU layer fully quantized as well? I see you mention that the vision transformer is fully quantized, but I was not able to find where GELU is quantized in the code, nor any description of it in the paper. Did I miss something?
Thanks a lot for clarifying.
@Kevinpsk
Hi! Thanks for your interest in our work!
We quantize the output of GELU using QAct; the code is here.
This matches how quantized models are actually deployed. Since PTQ must guarantee that quantized inference stays consistent with the original full-precision inference, we must keep the non-linear GELU layer itself. Thus, to minimize hardware cost for the subsequent operations (e.g., the next linear layer), quantization is usually applied to the output of GELU, while the GELU function itself is evaluated on a CPU unit or implemented with a Look-Up Table (LUT).
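The LUT approach mentioned above can be made concrete with a small sketch: when the input to GELU is already an 8-bit code, there are only 256 possible inputs, so the quantized GELU output for each code can be precomputed once and inference reduces to a table lookup. The scales and zero-points below are illustrative assumptions, not values from the repo, and this is not the repo's actual QAct implementation.

```python
import math

def gelu(x):
    # Exact GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def build_gelu_lut(in_scale, in_zp, out_scale, out_zp, n_bits=8):
    # For every possible quantized input code, precompute the quantized
    # GELU output once; at inference time GELU becomes a single lookup.
    qmax = 2 ** n_bits - 1
    lut = []
    for q in range(qmax + 1):
        x = (q - in_zp) * in_scale            # dequantize the input code
        y = gelu(x)                           # full-precision GELU
        qy = round(y / out_scale) + out_zp    # requantize the output
        lut.append(max(0, min(qmax, qy)))     # clamp to the valid range
    return lut

# Illustrative (assumed) quantization parameters:
lut = build_gelu_lut(in_scale=0.05, in_zp=128, out_scale=0.04, out_zp=10)
```

Since the table is built offline from the calibrated scales, no floating-point GELU evaluation is needed on the accelerator at inference time.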
If you are interested in a purely fixed-point implementation of GELU inference, we recommend following the work of I-BERT, and you are also welcome to try our optimizations for LayerNorm and Softmax!
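For reference, the idea behind I-BERT's i-GELU is to replace erf with a saturating second-order polynomial so that no transcendental function is needed. The sketch below uses the coefficients reported in the I-BERT paper (a = -0.2888, b = -1.769) but evaluates in floating point for clarity; I-BERT itself carries out the computation entirely in integer arithmetic, so treat this as an illustration of the approximation, not of their integer kernel.

```python
import math

def erf_poly(x, a=-0.2888, b=-1.769):
    # Second-order polynomial approximation of erf on [b, -b],
    # saturating to +/-1 outside that interval (I-BERT-style).
    sgn = 1.0 if x >= 0 else -1.0
    x_clipped = min(abs(x), -b)
    return sgn * (a * (x_clipped + b) ** 2 + 1.0)

def gelu_approx(x):
    # GELU with the polynomial erf substituted in.
    return 0.5 * x * (1.0 + erf_poly(x / math.sqrt(2.0)))

def gelu_exact(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
```

The polynomial only needs a multiply, an add, and a square, all of which map directly onto integer hardware once the scales are folded in.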