
(Ask) Why does InferInheritedType convert int8 to fp16 output? #844

Open
DeepTecher opened this issue Jul 19, 2023 · 3 comments
Comments

@DeepTecher

As described in the docs:

InferInheritedType(info); // All inputs inherit the type of their previous node's outputs, and outputs are set to the data type of the first input.

But in the code, it seems that you handle int8 as a special case. Can you tell me why you do this?
https://github.com/openppl-public/ppl.nn/blob/252e7f27eec3976a3be48bb21f15c660cddec6af/src/ppl/nn/engines/cuda/optimizer/opt_kernel.h#L264

@Si-XU
Contributor

Si-XU commented Jul 20, 2023

The int8 type should be used together with a quant scale.
We strictly use the UnifyToOutputQuant and CopyQuantType functions to deal with int8 inputs.

ppl.nn/src/ppl/nn/engines/cuda/optimizer/opt_kernel.h
This statement is just in case some operations forget to handle int8 inputs.

@DeepTecher
Author

Okay. But why do we convert an int8 input to an fp16 input in the case where some operations forget to handle int8 inputs?

Also, some ops may not be quant ops yet still support int8 input and int8 output, such as the Min/Max operations.

@Si-XU
Contributor

Si-XU commented Jul 21, 2023

Yes, some ops support int8 input and int8 output directly.
However, we suggest that users specify which ops use int8 precision via "quant.json". The remaining operators will not save quant information and will use float16 by default.
