DNN: Add qgemm and squeeze op13 supported on ONNXImporter #22306
Conversation
Thank you for the contribution!
if (constBlobs.find(node_proto.input(3)) == constBlobs.end())
    CV_Error(Error::StsNotImplemented, "Variable weights is not supported");
inp_sc/inp_zp below can also be non-constant, so I think we should probably have a new overload of getBlob that accepts an error message and does this CV_Error(Error::StsNotImplemented) stuff under the hood.
That's a good idea. In the near future we do not plan to load this type of ONNX op (like QGemm with non-constant inp_sc/inp_zp). I will try to do it.
else // ninputs == 8
{
    out_sc = getBlob(node_proto, 6);
    bias = Mat::zeros(1, outCn, CV_32S);
}
Again, we assume that input C is skipped, but that might not be the case. And I think there should be "7" either way; the sixth tensor is just empty, no?
Currently, I can only support 8 or 9 inputs.
If it has 9 inputs, there are two different cases: the sixth tensor is either empty or not.
If it has 8 inputs, we assume the C is empty and get the output_scale at the sixth tensor.
This is not a complete solution, but it handles most cases.
The example has 9 inputs with an empty tensor at the sixth position, so we can only get the output_scale from the seventh tensor via getBlob().
I think we should test the "else" case, because as far as I understand this paragraph, out_sc should still be the seventh blob. Either way, if we expect something to be missing, we should add an assertion for that.
In the case of 8 inputs I think we don't have y_zero_point, and in the case of 9 it seems that we ignore it even if we have it.
Agree, now I get your points.
Thank you! I see that the C and y_scale tensors are being handled correctly now. But what about y_zero_point? I think we should assert that either ninputs == 8, or node_proto.input(8).empty(), or that the 8th blob contains a scalar zero.
Hi @rogday, I have checked the details of the quantized structure in DNN. It was my fault: we have uniformly set the output_scale and output_zeropoint here, so we don't need to set them again during the node parsing stage. And in some cases we need to calculate the outputMultiplier, which is why we need to get the output_scale.
It seems unreasonable to calculate the outputMultiplier in the parse stage; maybe we can move it to the initialization stage of every quantized node.
That function seems very fragile. It relies on the names the producer decided, not the official spec. I think your PR can be merged, but we need to fix this function in the future.
Opened #22442
Yes, I agree.
The operations are needed to support the quantized version of a text recognition model that one of the GSoC students is working on.
@zihaomu Friendly reminder
Looks good.
@zihaomu Please squash the commits before merge.
1dfe8b6 to 2d837ef
Hi @asmorkalov, done.
The Squeeze node of ONNX has moved the axes from an attribute to an input.
Test Data
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.