(5.x) Merge 4.x #24254
Although acceptable on Intel CPUs, it's still undefined behaviour according to the C++ standard. It can be replaced with memcpy, which makes the code simpler and generates the same assembly code with gcc and clang at -O2 (verified with godbolt). Also expanded the test to include other little-endian CPUs by testing for __LITTLE_ENDIAN__.
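As a minimal sketch of the memcpy approach described in this commit message: the helper names `load_u32_unaligned` and `load_u32_le` below are hypothetical and are not taken from the OpenCV patch. The first reads a 32-bit value from a possibly unaligned address without a pointer cast; the second assembles a little-endian value byte by byte, which does not depend on the host byte order.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not an OpenCV function): read a 32-bit value from a
// byte buffer that may not be 4-byte aligned. Dereferencing a casted
// std::uint32_t* here would be undefined behaviour; memcpy is well defined
// for any alignment.
inline std::uint32_t load_u32_unaligned(const unsigned char* p)
{
    std::uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}

// Hypothetical helper: build the value from individual bytes, treating the
// buffer as little endian regardless of the host byte order.
inline std::uint32_t load_u32_le(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

On a little-endian host the two helpers return the same value for the same bytes; on a big-endian host only `load_u32_le` keeps the little-endian interpretation.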
… (rawMode == true)
dnn: cleanup of tengine backend opencv#24122

🚀 Cleanup for OpenCV 5.0. Tengine backend is added for convolution layer speedup on ARM CPUs, but it is not maintained and the convolution layer on our default backend has reached similar performance to that of Tengine.

Tengine backend related PRs:
- opencv#16724
- opencv#18323

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
videoio: doc: add odd width or height limitation for FFMPEG
…rtTo_copyTo_bindings `cuda`: Fix `GpuMat::copyTo` and `GpuMat::convertTo` python bindings
…tst_scene_render

Fix python sample code (tst_scene_render) opencv#24116

Fix bug of python sample code (samples/python/tst_scene_render.py) when backGr or fgr is None (opencv#24114)

1) pass shape tuple to np.zeros arguments instead of integers
2) change np.int to int

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [o] I agree to contribute to the project under Apache 2 License.
- [o] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [o] The PR is proposed to the proper branch
- [o] There is a reference to the original bug report and related work
- [o] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [o] The feature is well documented and sample code can be built with the project CMake
Fixed bug where MSMF webcam doesn't start when built with VIDEOIO_PLUGIN_ALL
It has the usual Unix filesystem operations.
Rewrite Universal Intrinsic code by using new API: Core module. opencv#23980

The goal of this PR is to match and modify all SIMD code blocks guarded by the `CV_SIMD` macro in the `opencv/modules/core` folder and rewrite them using the new Universal Intrinsic API.

The patch is almost entirely auto-generated using the [rewriter](https://github.com/hanliutong/rewriter), related PR opencv#23885.

Most of the files have been rewritten, but I marked this PR as draft because the `CV_SIMD` macro also exists in the following files. The reasons they have not been rewritten are:

1. ~~code design for fixed-size SIMD (v_int16x8, v_float32x4, etc.), need to manually rewrite.~~ Rewritten
    - ./modules/core/src/stat.simd.hpp
    - ./modules/core/src/matrix_transform.cpp
    - ./modules/core/src/matmul.simd.hpp

2. Vector types are wrapped in another class/struct, which is not supported by the compiler in variable-length backends. Cannot be rewritten directly.
    - ./modules/core/src/mathfuncs_core.simd.hpp

```cpp
struct v_atan_f32
{
    explicit v_atan_f32(const float& scale) { ... }
    v_float32 compute(const v_float32& y, const v_float32& x) { ... }
    ...
    v_float32 val90; // sizeless type cannot be used in a class
    v_float32 val180;
    v_float32 val360;
    v_float32 s;
};
```

3. The API interface does not support/does not match
    - ./modules/core/src/norm.cpp
      Use `v_popcount`, ~~waiting for opencv#23966~~ Fixed
    - ./modules/core/src/has_non_zero.simd.hpp
      Uses an illegal Universal Intrinsic API: for float types, there is no logical operation `|`. Further discussion needed

```cpp
/** @brief Bitwise OR

Only for integer types. */
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n> operator|(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n>& operator|=(v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
```

```cpp
#if CV_SIMD
    typedef v_float32 v_type;
    const v_type v_zero = vx_setzero_f32();
    constexpr const int unrollCount = 8;
    int step = v_type::nlanes * unrollCount;
    int len0 = len & -step;
    const float* srcSimdEnd = src+len0;

    int countSIMD = static_cast<int>((srcSimdEnd-src)/step);
    while(!res && countSIMD--)
    {
        v_type v0 = vx_load(src);
        src += v_type::nlanes;
        v_type v1 = vx_load(src);
        src += v_type::nlanes;
        ....
        src += v_type::nlanes;
        v0 |= v1; //Illegal ?
        ....
        //res = v_check_any(((v0 | v4) != v_zero));//beware : (NaN != 0) returns "false" since != is mapped to _CMP_NEQ_OQ and not _CMP_NEQ_UQ
        res = !v_check_all(((v0 | v4) == v_zero));
    }

    v_cleanup();
#endif
```

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
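The NaN remark in the quoted `has_non_zero` snippet can be illustrated with a scalar sketch. This is only an illustration under stated assumptions, not OpenCV code: `neq_ordered`, `has_non_zero_any_neq` and `has_non_zero_not_all_eq` are hypothetical names, and `neq_ordered` merely emulates the ordered "not equal" behaviour that the quoted comment attributes to `_CMP_NEQ_OQ` (false whenever an operand is NaN). It does not address the separate question of whether `|` is legal on float vectors.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical scalar stand-in for an ordered "not equal" lane compare:
// the result is false if either operand is NaN.
inline bool neq_ordered(float a, float b)
{
    return !std::isnan(a) && !std::isnan(b) && a != b;
}

// "any element != 0" built on the ordered compare: a NaN element is missed.
inline bool has_non_zero_any_neq(const float* src, std::size_t len)
{
    for (std::size_t i = 0; i < len; ++i)
        if (neq_ordered(src[i], 0.0f))
            return true;
    return false;
}

// "not (all elements == 0)": the C++ == operator is false for NaN,
// so a NaN element counts as non-zero.
inline bool has_non_zero_not_all_eq(const float* src, std::size_t len)
{
    bool all_zero = true;
    for (std::size_t i = 0; i < len; ++i)
        all_zero = all_zero && (src[i] == 0.0f);
    return !all_zero;
}
```

For an input that contains only zeros and a NaN, the first variant reports no non-zero element while the second reports one, which matches the reason the snippet gives for using the `!v_check_all(... == v_zero)` form.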
`VideoCapture`: remove decoder initialization when demuxing
Fix GNU/Hurd build
Fixed invalid cast and unaligned memory access
Streamlabs Desktop has the same issue as in opencv#19746. This fixes it using the method from opencv#23460.
OCL_FP16 MatMul with large batch

* Workaround FP16 MatMul with large batch
* Fix OCL reinitialization
* Higher thresholds for INT8 quantization
* Try fix gemm_buffer_NT for half (columns)
* Fix GEMM by rows
* Add batch dimension to InnerProduct layer test
* Fix Test_ONNX_conformance.Layer_Test/test_basic_conv_with_padding
* Batch 16
* Replace all vload4
* Version suffix for MobileNetSSD_deploy Caffe model
/cc @vpisarev @hanliutong Could you check whether everything merged correctly?
Force-pushed from 13289e8 to 6af4de6.
/cc @vpisarev To review merged changes with #23865
/cc @mshabunin To review merged changes with #23980
Wrong cross-repo references.
@asmorkalov, the 5.x branch does not compile for RISC-V at the moment, so I cannot check whether recent patches have been applied correctly. I'll try to fix the build first; please give me some time.
Force-pushed from 70c9e4f to 0b0fb90.
@@ -11,7 +11,7 @@
 namespace cv
 {

-#if CV_SIMD
+#if (CV_SIMD || CV_SIMD_SCALABLE)
I've fixed the compilation, but this merge still cannot be built because incompatible code has been added in 5.x (it uses operators with intrinsics, for example `vx_load_as` in this `#if` block). This means either a separate pass with the refactoring tool or manual adaptation of the code is needed.
Force-pushed from 75b55a4 to 538bd5c.
Force-pushed from 538bd5c to fdab565.
const int nlanes = v_uint64::nlanes;
double buf[v_uint64::nlanes*2];
@vpisarev Are you sure about the type and buffer size here?
@opencv-alalek The PR is ready for review.
@asmorkalov Need to trigger GHA for contrib PR. There are no build results at all.
Done.
OpenCV Contrib: #3559
OpenCV Extra: #1093
#23607 from alexander-varjo:alexander-varjo-patch-1
#23734 from seanm:unaligned-copy
#23904 from kai-waang:removing-unreachable
#23965 from fengyuentau:broadcast_to
#23980 from hanliutong:rewrite-core
#24012 from cudawarped:videocapture_raw_read
#24086 from Kumataro:fix24081
#24089 from cudawarped:cuda_gpumat_fix_convertTo_copyTo_bindings
#24098 from 0xMihir:4.x
#24116 from chaebkimm/update-samples-python-tst_scene_render
#24120 from dkurt:actualize_dnn_links
#24122 from fengyuentau:remove_tengine
#24128 from CSBVision:CSBVision-patch-1
#24133 from alexlyulkov:al/fixed-msmf-webcam
#24138 from mshabunin:fix-gst-plugin-camera
#24139 from AleksandrPanov:fix_refineDetectedMarkers
#24140 from sthibaul:4.x
#24142 from beanjoy:4.x
#24143 from seanm:sprintf4
#24150 from DeePingXian:4.x
#24153 from Ginkgo-Biloba:ipp-warp-affine
#24156 from zihaomu:fix_24041
#24157 from dkurt:gapi_ov_optional
#24160 from mshabunin:update-ade
#24167 from autoantwort:missing-include
#24172 from CSBVision:CSBVision-patch-1-1
#24176 from dkurt:correct_perf_test
#24178 from dmatveev:dm/streaming_queue
#24179 from Kumataro:fix24145
#24180 from MambaWong:4.x
#24186 from dkurt:ts_fixture_constructor_skip
#24189 from dkurt:skip_ov_max_pool_ov
#24194 from vrabaud:compilation_fix
#24196 from dkurt:ov_backend_cleanups
#24199 from Kumataro:fixlibTiffSite
#24203 from thesamesam:arm64-fp16
#24204 from georgthegreat:mser-license
#24209 from alexlyulkov:al/fixed-mjpeg
#24211 from philsc:fix-asan-crash
#24214 from dkurt:distanceTransform_big_step
#24215 from Kumataro:fix24213
#24216 from dkurt:inter_lines_less_compute
#24218 from CSBVision:patch-5
#24221 from WanliZhong:issue_24016
#24223 from asmorkalov:as/24186_revert
#24227 from georgthegreat:missing-includes
#24228 from AleksandrPanov:fix_extendDictionary
#24232 from georgthegreat:missing-qualifiers
#24244 from alexlyulkov:al/update-dnn-js-face-recognition-sample
#24245 from alexlyulkov:al/update-fast-neural-style-dnn-sample
#24246 from asmorkalov:as/merge_input_check2
#24248 from opencv-pushbot:gitee/alalek/issue_22751
#24251 from dkurt:ov_build_debug
#24252 from opencv-pushbot:gitee/alalek/refactor_24218
Previous "Merge 4.x": #24119