dnn: add gemm_layer in place of fully_connected_layer for onnx models #23897
Conversation
@fengyuentau, could you provide some performance numbers (before/after)?
@vpisarev, I just put up the preliminary benchmark results in the first comment. Please take a look.
Hello @opencv-alalek, could you help download the new model on the default CI node?
@@ -2066,7 +2066,7 @@ TEST_P(Test_ONNX_nets, Googlenet)
applyTestTag(CV_TEST_TAG_DNN_SKIP_IE_NGRAPH);
#endif

const String model = _tf("models/googlenet.onnx", false);
We should not introduce regressions for previously supported models.
You could add new test cases, but you can't just drop them.
It is simply wrong to force it to run with batch size >= 2 while it has a Reshape operator that always sets the batch size to 1 (more details here). Why should we keep a wrong thing in our tests? It does not make sense to replicate bugs in a refactored implementation.
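To make the complaint concrete, here is a small sketch (not OpenCV code; the 1024-element feature size is assumed from GoogleNet's pre-Gemm tensor) of why a Reshape node with a hardcoded batch-1 target shape cannot accept a forced batch of 2: the element counts no longer match.

```python
# Hypothetical sketch: a Reshape hardcoded to (1, 1024) at export time
# breaks when the network is forced to run with batch >= 2, because the
# total element count of the input no longer equals that of the target.

def try_reshape(in_shape, target_shape):
    """Return target_shape if element counts match, else None."""
    def numel(shape):
        n = 1
        for d in shape:
            n *= d
        return n
    return target_shape if numel(in_shape) == numel(target_shape) else None

# batch = 1: the hardcoded target (1, 1024) is consistent
print(try_reshape((1, 1024), (1, 1024)))   # (1, 1024)

# batch = 2: 2 * 1024 elements cannot fit the hardcoded (1, 1024)
print(try_reshape((2, 1024), (1, 1024)))   # None
```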
@dkurt Could you please take a look?
Am I right that original model is not available anymore? https://s3.amazonaws.com/download.onnx/models/opset_8/bvlc_googlenet.tar.gz
Model <GoogleNet (ONNX)>
expect 739732220ba2e3efa88f7c26f13badad9b7514bc
catch [Errno 2] No such file or directory: 'bvlc_googlenet.tar.gz'
hash check failed - downloading
get https://s3.amazonaws.com/download.onnx/models/opset_8/bvlc_googlenet.tar.gz
catch HTTP Error 403: Forbidden
Also, if the new model from https://drive.google.com/u/0/uc?id=1FucNLURGgdPk4nCxT0378isPigCcbZ1M&export=download is a modified file hosted on a personal Google Drive, I think it's not a good idea.
Since even the opset 12 version has a Reshape with hardcoded 1x1024 shapes, I think it does not make sense to update the model to use Flatten. However, there is still a need to update the URL/model/data, so I recommend this: https://github.com/onnx/models/tree/main/vision/classification/inception_and_googlenet/googlenet
Hold on. Although I do not fully agree with all the opinions here, I am working on adding this feature back, which needs only two additional transpositions.
@fengyuentau, sure. Here is a brief proposal so if you find it suitable, we can add a workaround: dkurt@cad21a3
Man, I found that if batch is allowed here, it is even more broken. In TEST_P(Test_ONNX_nets, Googlenet), the shape of ref is 2x1000 instead of 2x1x1000, meaning the output shape of the last Gemm layer in GoogleNet is also 2x1000. Note that the input of this Gemm layer is 2x1x1024 (2 is the batch added by blobsFromImages in the source input). This is broken shape inference with batch in the test. It may break some axis-sensitive layers, and it also affects other backends with their own graph engines and shape inference.
I can also see that the forced-batch feature is going to cause trouble in vision transformer models. Following is an example.

Above is an attention block in ONNX. The batch dimension can be blended into other dimensions: in the second-to-last Reshape, the original batch size 1 is blended into 8. If the forced-batch feature brings batch=2, the Gemm layer won't work because the inner dimensions will not match.
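The batch-blending failure can be sketched with hypothetical ViT-like numbers (8 heads, head dim 64, and an assumed sequence length of 197; only the head count 8 and export-time batch 1 come from the discussion above): the exporter hardcodes the Reshape's leading dimension to batch*heads = 8, so a forced batch of 2 makes the element counts inconsistent.

```python
# Sketch: an attention-block Reshape whose leading dimension was
# hardcoded at export time (batch=1, 8 heads -> first dim 8). Forcing
# batch=2 breaks the element-count invariant the Reshape relies on.

def check_hardcoded_reshape(batch, seq, heads, head_dim, hardcoded_first):
    """True if (batch, seq, heads*head_dim) fits (hardcoded_first, seq, head_dim)."""
    src_elems = batch * seq * heads * head_dim
    dst_elems = hardcoded_first * seq * head_dim
    return src_elems == dst_elems

print(check_hardcoded_reshape(1, 197, 8, 64, 8))  # True: export-time batch
print(check_hardcoded_reshape(2, 197, 8, 64, 8))  # False: forced batch=2
```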
@fengyuentau, will dkurt@cad21a3 solve the problem at least partially?
It should fix the GoogleNet problem, but it messes up the dimensions in the attention block: the Transpose operators there move the batch dimension to axis 1.

Could we simply follow the standards instead of going further down a wrong path? The forced-batch feature assumes the batch dimension is at axis 0, which may work for most conv nets but not for ViT.
Hello all, more perf stats are provided in the first comment. Please have a look.
Force-pushed from 3abbbaf to 3f6a499 (…dling 0d/1d cases)
Test binary crashed in OpenCL builds: #24312
@@ -9,6 +9,7 @@ ocv_add_dispatched_file_force_all("int8layers/layers_common" AVX2 AVX512_SKX LAS
ocv_add_dispatched_file_force_all("layers/cpu_kernels/conv_block" AVX AVX2)
ocv_add_dispatched_file_force_all("layers/cpu_kernels/conv_depthwise" AVX AVX2 RVV LASX)
ocv_add_dispatched_file_force_all("layers/cpu_kernels/conv_winograd_f63" AVX AVX2)
ocv_add_dispatched_file_force_all("layers/cpu_kernels/fast_gemm_kernels" AVX AVX2 NEON LASX)
NEON: we usually don't use runtime dispatching with NEON (it doesn't work due to a different ABI). The whole library is compiled with NEON instead.
Fixing via #24315
@@ -2597,6 +2597,40 @@ TEST_P(Test_ONNX_layers, where_node)
testONNXModels("where_layer");
}

TEST_P(Test_ONNX_layers, Conformance_Gemm_all_attributes) {
Conformance: that is confusing in test logs. Avoid using "conformance" wording outside of test_onnx_conformance.cpp.
Fixing via #24315
…opencv#23897)

* first commit
* turned C from input to constant; force C constant in impl; better handling 0d/1d cases
* integrate with gemm from ficus nn
* fix const inputs
* adjust threshold for int8 tryQuantize
* adjust threshold for int8 quantized 2
* support batched gemm and matmul; tune threshold for rcnn_ilsvrc13; update googlenet
* add gemm perf against innerproduct
* add perf tests for innerproduct with bias
* fix perf
* add memset
* renamings for next step
* add dedicated perf gemm
* add innerproduct in perf_gemm
* remove gemm and innerproduct perf tests from perf_layer
* add perf cases for vit sizes; prepack constants
* remove batched gemm; fix wrong trans; optimize KC
* remove prepacking for const A; several fixes for const B prepacking
* add todos and gemm expression
* add optimized branch for avx/avx2
* trigger build
* update macros and signature
* update signature
* fix macro
* fix bugs for neon aarch64 & x64
* add backends: cuda, cann, inf_ngraph and vkcom
* fix cuda backend
* test commit for cuda
* test cuda backend
* remove debug message from cuda backend
* use cpu dispatcher
* fix neon macro undef in dispatcher
* fix dispatcher
* fix inner kernel for neon aarch64
* fix compiling issue on armv7; try fixing accuracy issue on other platforms
* broadcast C with beta multiplied; improve func namings
* fix bug for avx and avx2
* put all platform-specific kernels in dispatcher
* fix typos
* attempt to fix compile issues on x64
* run old gemm when neon, avx, avx2 are all not available; add kernel for armv7 neon
* fix typo
* quick fix: add macros for pack4
* quick fix: use vmlaq_f32 for armv7
* quick fix for missing macro of fast gemm pack f32 4
* disable conformance tests when optimized branches are not supported
* disable perf tests when optimized branches are not supported
* decouple cv_try_neon and cv_neon_aarch64
* drop googlenet_2023; add fastGemmBatched
* fix step in fastGemmBatched
* cpu: fix initialization of b; gpu: support batch
* quick followup fix for cuda
* add default kernels
* quick followup fix to avoid macro redef
* optimized kernels for lasx
* resolve mis-alignment; remove comments
* tune performance for x64 platform
* tune performance for neon aarch64
* tune for armv7
* comment time consuming tests
* quick follow-up fix
Merge with opencv/opencv_extra#1073.

This PR integrates gemm from ficus nn. A Gemm layer is created to replace the InnerProduct layer for ONNX models. Once this PR is merged or proved to be working fine, all InnerProduct layer calls will be replaced by Gemm.

Checklist:

avoid re-packing for const A. Removed for simplicity.

Benchmarks:

Input scales are collected from ViTs. The first scale is A, the second is B, and the third is C if present. All cases are without transposition. Check CI reports for more tests on different scales (square, rectangular, ...).
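To make the benchmarked operation concrete, here is a minimal pure-Python reference of the core ONNX Gemm semantics (Y = alpha * A' * B' + beta * C, with optional transA/transB; for brevity this sketch only handles the 1-D bias-broadcast case for C). It is illustrative only; OpenCV's actual implementation is the optimized C++ in this PR.

```python
# Minimal reference sketch of ONNX Gemm semantics (illustrative, not
# OpenCV code). Only a 1-D C broadcast over rows is handled here; the
# full spec allows unidirectional broadcasting of any C shape.

def gemm(A, B, C=None, alpha=1.0, beta=1.0, trans_a=False, trans_b=False):
    def T(M):  # transpose a list-of-lists matrix
        return [list(col) for col in zip(*M)]
    if trans_a:
        A = T(A)
    if trans_b:
        B = T(B)
    m, k, n = len(A), len(A[0]), len(B[0])
    assert len(B) == k, "inner dimensions must match"
    Y = [[alpha * sum(A[i][p] * B[p][j] for p in range(k)) for j in range(n)]
         for i in range(m)]
    if C is not None:  # broadcast the 1-D bias over rows
        for i in range(m):
            for j in range(n):
                Y[i][j] += beta * C[j]
    return Y

print(gemm([[1, 2], [3, 4]], [[1, 0], [0, 1]], C=[10, 20]))
# [[11.0, 22.0], [13.0, 24.0]]
```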
You can see that basically all xxG columns have better speed. The xxG columns stand for Gemm layer performance on different hardware.

Notations: all data in ms (milliseconds).

Other platforms:
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.