(5.x) Merge 4.x #24254

Merged
merged 94 commits into from
Sep 14, 2023

@asmorkalov (Contributor) commented Sep 11, 2023

OpenCV Contrib: #3559
OpenCV Extra: #1093

#23607 from alexander-varjo:alexander-varjo-patch-1
#23734 from seanm:unaligned-copy
#23904 from kai-waang:removing-unreachable
#23965 from fengyuentau:broadcast_to
#23980 from hanliutong:rewrite-core
#24012 from cudawarped:videocapture_raw_read
#24086 from Kumataro:fix24081
#24089 from cudawarped:cuda_gpumat_fix_convertTo_copyTo_bindings
#24098 from 0xMihir:4.x
#24116 from chaebkimm/update-samples-python-tst_scene_render
#24120 from dkurt:actualize_dnn_links
#24122 from fengyuentau:remove_tengine
#24128 from CSBVision:CSBVision-patch-1
#24133 from alexlyulkov:al/fixed-msmf-webcam
#24138 from mshabunin:fix-gst-plugin-camera
#24139 from AleksandrPanov:fix_refineDetectedMarkers
#24140 from sthibaul:4.x
#24142 from beanjoy:4.x
#24143 from seanm:sprintf4
#24150 from DeePingXian:4.x
#24153 from Ginkgo-Biloba:ipp-warp-affine
#24156 from zihaomu:fix_24041
#24157 from dkurt:gapi_ov_optional
#24160 from mshabunin:update-ade
#24167 from autoantwort:missing-include
#24172 from CSBVision:CSBVision-patch-1-1
#24176 from dkurt:correct_perf_test
#24178 from dmatveev:dm/streaming_queue
#24179 from Kumataro:fix24145
#24180 from MambaWong:4.x
#24186 from dkurt:ts_fixture_constructor_skip
#24189 from dkurt:skip_ov_max_pool_ov
#24194 from vrabaud:compilation_fix
#24196 from dkurt:ov_backend_cleanups
#24199 from Kumataro:fixlibTiffSite
#24203 from thesamesam:arm64-fp16
#24204 from georgthegreat:mser-license
#24209 from alexlyulkov:al/fixed-mjpeg
#24211 from philsc:fix-asan-crash
#24214 from dkurt:distanceTransform_big_step
#24215 from Kumataro:fix24213
#24216 from dkurt:inter_lines_less_compute
#24218 from CSBVision:patch-5
#24221 from WanliZhong:issue_24016
#24223 from asmorkalov:as/24186_revert
#24227 from georgthegreat:missing-includes
#24228 from AleksandrPanov:fix_extendDictionary
#24232 from georgthegreat:missing-qualifiers
#24244 from alexlyulkov:al/update-dnn-js-face-recognition-sample
#24245 from alexlyulkov:al/update-fast-neural-style-dnn-sample
#24246 from asmorkalov:as/merge_input_check2
#24248 from opencv-pushbot:gitee/alalek/issue_22751
#24251 from dkurt:ov_build_debug
#24252 from opencv-pushbot:gitee/alalek/refactor_24218

Previous "Merge 4.x": #24119

force_builders=Linux AVX2,Custom
build_image:Docs=docs-js:18.04
build_image:Custom=javascript
buildworker:Custom=linux-1,linux-4,linux-f1

seanm and others added 30 commits June 9, 2023 18:56
Although this unaligned access is acceptable to Intel CPUs, it is still undefined behaviour according to the C++ standard.

It can be replaced with memcpy, which makes the code simpler and generates the same assembly code with gcc and clang at -O2 (verified with godbolt).

Also expanded the test to include other little-endian CPUs by testing for __LITTLE_ENDIAN__.
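A minimal sketch of the memcpy pattern this commit message describes. It assumes the original code read a 32-bit value by casting a possibly unaligned byte pointer to `const uint32_t*`. The helpers `load_u32_unaligned` and `store_u32_unaligned` are hypothetical names for illustration and do not come from OpenCV. Copying through `std::memcpy` is well-defined for any alignment, and according to the commit message gcc and clang generate the same assembly for it at -O2.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Problematic pattern (undefined behaviour when p is not 4-byte aligned,
// and it also breaks strict aliasing rules):
//     std::uint32_t v = *reinterpret_cast<const std::uint32_t*>(p);

// Well-defined replacement: copy the bytes into an aligned local variable.
// The value is interpreted in host byte order, exactly like the cast-based read.
inline std::uint32_t load_u32_unaligned(const unsigned char* p)
{
    assert(p != nullptr);
    std::uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}

// Matching store: write a 32-bit value to a possibly unaligned address.
inline void store_u32_unaligned(unsigned char* p, std::uint32_t v)
{
    assert(p != nullptr);
    std::memcpy(p, &v, sizeof(v));
}
```

On a little-endian host, `load_u32_unaligned` applied to the bytes `78 56 34 12` yields `0x12345678`, which is the case the `__LITTLE_ENDIAN__` test extension covers.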
dnn: cleanup of tengine backend opencv#24122

🚀 Cleanup for OpenCV 5.0. The Tengine backend was added to speed up the convolution layer on ARM CPUs, but it is not maintained, and the convolution layer of our default backend has reached performance similar to Tengine's.

Tengine backend related PRs:
- opencv#16724
- opencv#18323

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
videoio: doc: add odd width or height limitation for FFMPEG
…rtTo_copyTo_bindings

`cuda`: Fix `GpuMat::copyTo` and `GpuMat::convertTo` python bindings
…tst_scene_render

Fix python sample code (tst_scene_render) opencv#24116

Fix a bug in the Python sample code (samples/python/tst_scene_render.py) when backGr or fgr is None (opencv#24114):

1) pass a shape tuple to np.zeros instead of separate integer arguments
2) replace np.int with int (the np.int alias was deprecated in NumPy 1.20 and removed in 1.24)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Fixed a bug where the MSMF webcam doesn't start when OpenCV is built with VIDEOIO_PLUGIN_ALL
It has the usual Unix filesystem operations.
Rewrite Universal Intrinsic code by using new API: Core module. opencv#23980

The goal of this PR is to match and modify all SIMD code blocks guarded by `CV_SIMD` macro in the `opencv/modules/core` folder and rewrite them by using the new Universal Intrinsic API.

The patch is almost entirely auto-generated using the [rewriter](https://github.com/hanliutong/rewriter); see the related PR opencv#23885.

Most of the files have been rewritten, but I marked this PR as a draft because the `CV_SIMD` macro also appears in the following files, which have not been rewritten for the reasons below:

1. ~~Code designed for fixed-size SIMD (v_int16x8, v_float32x4, etc.) needs to be rewritten manually.~~ Rewritten
- ./modules/core/src/stat.simd.hpp
- ./modules/core/src/matrix_transform.cpp
- ./modules/core/src/matmul.simd.hpp

2. Vector types are wrapped in other classes/structs, which the compiler does not support for variable-length backends, so they cannot be rewritten directly.
- ./modules/core/src/mathfuncs_core.simd.hpp 
```cpp
struct v_atan_f32
{
    explicit v_atan_f32(const float& scale)
    {
...
    }

    v_float32 compute(const v_float32& y, const v_float32& x)
    {
...
    }

...
    v_float32 val90; // a sizeless type cannot be used as a class member
    v_float32 val180;
    v_float32 val360;
    v_float32 s;
};
```
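Scalable vector types are sizeless, so they cannot be stored as data members the way `val90`, `val180`, `val360` and `s` are above. One common workaround is to keep the constants as plain scalars and broadcast them inside the method on every call. This is offered as an assumption and is not necessarily what the PR did. The sketch below is a plain C++ toy: `toy_vlanes()`, `ToyVec`, `toy_setall()` and `ScaleOffsetSketch` are hypothetical stand-ins, not OpenCV names. The lane count is returned by a function to mimic a backend whose vector length is only known at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical runtime lane count; on a scalable backend it is only known at run time.
inline std::size_t toy_vlanes() { return 4; }

// Hypothetical stand-in for a sizeless vector type: only ever used as a local variable.
using ToyVec = std::vector<float>;

// Broadcast a scalar to every lane.
inline ToyVec toy_setall(float x) { return ToyVec(toy_vlanes(), x); }

// Computes dst[i] = src[i] * scale + offset.
struct ScaleOffsetSketch
{
    // Scalars are fine as members; a sizeless vector type would not be.
    float scale;
    float offset;

    ScaleOffsetSketch(float scale_, float offset_) : scale(scale_), offset(offset_) {}

    void compute(const float* src, float* dst, std::size_t n) const
    {
        assert(src != nullptr && dst != nullptr);
        const std::size_t vl = toy_vlanes();
        // Materialize the vector constants locally, once per call.
        const ToyVec vscale = toy_setall(scale);
        const ToyVec voffset = toy_setall(offset);
        std::size_t i = 0;
        for (; i + vl <= n; i += vl)
            for (std::size_t j = 0; j < vl; ++j)  // one "vector" operation
                dst[i + j] = src[i + j] * vscale[j] + voffset[j];
        for (; i < n; ++i)  // scalar tail
            dst[i] = src[i] * scale + offset;
    }
};
```

The cost of this design is one broadcast per `compute()` call instead of once in the constructor. That is usually negligible when each call processes a whole row.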

3. The API does not support the operation, or its interface does not match

- ./modules/core/src/norm.cpp 
Uses `v_popcount`, ~~waiting for opencv#23966~~ Fixed
- ./modules/core/src/has_non_zero.simd.hpp
Uses an illegal Universal Intrinsic API: there is no bitwise `|` operation for the float type. Further discussion is needed.

```cpp
/** @brief Bitwise OR

Only for integer types. */
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n> operator|(const v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
template<typename _Tp, int n> CV_INLINE v_reg<_Tp, n>& operator|=(v_reg<_Tp, n>& a, const v_reg<_Tp, n>& b);
```

```cpp
#if CV_SIMD
    typedef v_float32 v_type;
    const v_type v_zero = vx_setzero_f32();
    constexpr const int unrollCount = 8;
    int step = v_type::nlanes * unrollCount;
    int len0 = len & -step;
    const float* srcSimdEnd = src+len0;

    int countSIMD = static_cast<int>((srcSimdEnd-src)/step);
    while(!res && countSIMD--)
    {
        v_type v0 = vx_load(src);
        src += v_type::nlanes;
        v_type v1 = vx_load(src);
        src += v_type::nlanes;
....
        src += v_type::nlanes;
        v0 |= v1; //Illegal ?
....
        //res = v_check_any(((v0 | v4) != v_zero));//beware : (NaN != 0) returns "false" since != is mapped to _CMP_NEQ_OQ and not _CMP_NEQ_UQ
        res = !v_check_all(((v0 | v4) == v_zero));
    }

    v_cleanup();
#endif
```
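Because `|` is only defined for integer lanes, one way to keep the OR-accumulation trick legal is to reinterpret the float lanes as 32-bit integers, OR them, and reinterpret the result back to float before comparing with zero. This is an assumption on our part. In the vector code it would presumably map to `v_reinterpret_as_u32` / `v_reinterpret_as_f32`, but the PR text does not say so. The scalar C++ sketch below shows why the final test should still be a floating-point comparison: `-0.0f` has a non-zero bit pattern but compares equal to zero. Negating `== 0` also makes NaN count as non-zero, which is what the `!v_check_all(... == v_zero)` line relies on. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reinterpret a float's bits as an integer (well-defined via memcpy).
inline std::uint32_t float_bits(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return u;
}

// Reinterpret integer bits as a float.
inline float bits_to_float(std::uint32_t u)
{
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Returns true if any element is non-zero (NaN counts as non-zero).
// Elements are combined with an integer OR, as an unrolled vector loop would
// combine several loaded registers; the result is tested as a float.
inline bool any_nonzero_or_trick(const float* src, std::size_t len)
{
    assert(src != nullptr || len == 0);
    std::uint32_t acc = 0;
    for (std::size_t i = 0; i < len; ++i)
        acc |= float_bits(src[i]);
    // If every input is +0.0f or -0.0f, acc is 0 or just the sign bit,
    // and both compare equal to 0.0f. Any other input leaves exponent or
    // mantissa bits set, so the float is non-zero, infinite or NaN, and
    // "== 0.0f" is false in all three cases.
    return !(bits_to_float(acc) == 0.0f);
}
```

Comparing `acc != 0` as an integer would be simpler but would wrongly report an array containing only `-0.0f` as non-zero.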

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
`VideoCapture`: remove decoder initialization when demuxing
Fixed invalid cast and unaligned memory access
Streamlabs Desktop has the same issue as in opencv#19746.
This fixes it using the method from opencv#23460.
OCL_FP16 MatMul with large batch

* Workaround FP16 MatMul with large batch

* Fix OCL reinitialization

* Higher thresholds for INT8 quantization

* Try to fix gemm_buffer_NT for half (columns)

* Fix GEMM by rows

* Add batch dimension to InnerProduct layer test

* Fix Test_ONNX_conformance.Layer_Test/test_basic_conv_with_padding

* Batch 16

* Replace all vload4

* Version suffix for MobileNetSSD_deploy Caffe model
@asmorkalov (Contributor, Author) commented:

/cc @vpisarev @hanliutong Could you check whether everything was merged correctly?

@asmorkalov added this to the 4.9.0 milestone Sep 11, 2023

@opencv-alalek (Contributor) left a comment:

/cc @vpisarev To review merged changes with #23865
/cc @mshabunin To review merged changes with #23980

@opencv-alalek (Contributor) commented:

OpenCV Contrib: https://github.com/opencv/opencv/pull/3559
OpenCV Extra: https://github.com/opencv/opencv/pull/1093

Wrong cross-repo references: these URLs point to opencv/opencv instead of opencv/opencv_contrib and opencv/opencv_extra.

@asmorkalov changed the title from "(5.x) Merge 4.x" to "WIP: (5.x) Merge 4.x" on Sep 12, 2023
@mshabunin (Contributor) commented:

@asmorkalov, the 5.x branch does not compile for RISC-V at the moment, so I cannot check whether the recent patches have been applied correctly. I'll try to fix the build first; please give me some time.

@asmorkalov force-pushed the 5.x-merge-4.x branch 2 times, most recently from 70c9e4f to 0b0fb90, September 12, 2023 16:22
Review comment (Contributor) on the diff hunk `@@ -11,7 +11,7 @@` in `namespace cv`, which changes `#if CV_SIMD` to `#if (CV_SIMD || CV_SIMD_SCALABLE)`:

I've fixed the compilation, but this merge still cannot be built because incompatible code is being added to 5.x: it uses operators with intrinsics, for example vx_load_as in this #if block. This means that either a separate pass with the refactoring tool or manual adaptation of the code is needed.

@asmorkalov force-pushed the 5.x-merge-4.x branch 5 times, most recently from 75b55a4 to 538bd5c, September 13, 2023 08:52
@asmorkalov changed the title from "WIP: (5.x) Merge 4.x" to "(5.x) Merge 4.x" on Sep 13, 2023
Comment (Contributor, Author) on removed lines 598-599, `const int nlanes = v_uint64::nlanes;` and `double buf[v_uint64::nlanes*2];`:

@vpisarev Are you sure about the type and buffer size here?

@asmorkalov (Contributor, Author) commented:

@opencv-alalek The PR is ready for review.

@opencv-alalek (Contributor) commented:

@asmorkalov GHA (GitHub Actions) needs to be triggered for the contrib PR. There are no build results at all.

@asmorkalov (Contributor, Author) commented Sep 14, 2023:

Done.

@asmorkalov merged commit fdab565 into opencv:5.x on Sep 14, 2023
@asmorkalov mentioned this pull request on Sep 28, 2023