
How to quantize to INT8 / FP16? #37

Closed
Jeremy-J-J opened this issue Jul 21, 2021 · 1 comment

Jeremy-J-J commented Jul 21, 2021

How do I quantize to INT8 / FP16?

```cpp
Define_string_opt("--quantization", g_flag_quantization, "", "declare **json file** saved quantization information");
```

What does the JSON file look like?
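(The thread never shows the file itself, so the following is a purely hypothetical sketch of what a per-tensor INT8 quantization table could look like, mapping tensor names to scale and zero-point values. Every field name here is illustrative, not a confirmed OpenPPL schema.)

```json
{
  "_note": "hypothetical example, not a confirmed OpenPPL schema",
  "quant_info": {
    "conv1_output": { "scale": 0.0472, "zero_point": 0 },
    "fc1_output":   { "scale": 0.0123, "zero_point": 0 }
  }
}
```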

Si-XU (Contributor) commented Jul 21, 2021

Hi, OpenPPL for CUDA only supports FP16 right now; we will support INT8 as soon as possible.
When you feed in an FP32 data file, OpenPPL automatically converts it to FP16.
The "--quantization" option is designed for INT8, so you don't need to use it yet.

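As a side note on the FP32 → FP16 conversion described above, here is a minimal sketch (not OpenPPL source) using CUDA's half-precision conversion helpers; it shows the rounding that narrowing input data to FP16 introduces:

```cpp
// Minimal sketch, not OpenPPL code: narrowing fp32 input to fp16 and back,
// using CUDA's half-precision conversion intrinsics.
// Build (assumed toolchain): nvcc fp16_demo.cu -o fp16_demo
#include <cuda_fp16.h>
#include <cstdio>

int main() {
    float fp32_value = 0.1234567f;                 // value supplied as fp32
    __half fp16_value = __float2half(fp32_value);  // narrowed to fp16 (10-bit mantissa)
    float round_trip  = __half2float(fp16_value);  // widened back to inspect the rounding
    printf("fp32 input:      %.7f\n", fp32_value);
    printf("fp16 round-trip: %.7f\n", round_trip); // small precision loss expected
    return 0;
}
```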