Dev add SVE optimizations. #357

Merged
ermig1979 merged 38 commits into master from dev on May 26, 2026

@ermig1979 (Owner)

No description provided.

ermig1979 and others added 30 commits May 4, 2026 16:48
Add new Copy() overloads (no args and with Rectangle) returning a new
View/Frame by value, alongside the existing pointer-returning Clone().
Add View::Copy and Frame::Copy returning by value
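
The difference between a pointer-returning Clone() and a value-returning Copy() can be sketched as follows. This is a minimal illustration only: the `Image` struct and its members are hypothetical stand-ins, not the actual Simd::View or Simd::Frame classes, and the rectangle is passed as four plain coordinates rather than a Rectangle type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an image container; NOT the Simd::View class.
struct Image
{
    size_t width = 0, height = 0;
    std::vector<uint8_t> data; // owns its pixels, so copying the struct is a deep copy

    Image(size_t w, size_t h, uint8_t fill = 0)
        : width(w), height(h), data(w * h, fill) {}

    // Pointer-returning style: the caller is responsible for deleting the result.
    Image* Clone() const { return new Image(*this); }

    // Value-returning style: the copy's lifetime is managed automatically.
    Image Copy() const { return *this; }

    // Value-returning deep copy of the region [left, right) x [top, bottom).
    Image Copy(size_t left, size_t top, size_t right, size_t bottom) const
    {
        assert(left <= right && right <= width && top <= bottom && bottom <= height);
        Image region(right - left, bottom - top);
        for (size_t y = top; y < bottom; ++y)
            for (size_t x = left; x < right; ++x)
                region.data[(y - top) * region.width + (x - left)] = data[y * width + x];
        return region;
    }
};
```

With the value-returning form a caller can write `Image roi = image.Copy(1, 1, 3, 3);` and let the destructor release the pixels, whereas the result of `Clone()` has to be deleted explicitly or wrapped in a smart pointer.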
Copilot AI review requested due to automatic review settings May 26, 2026 08:14
Copilot AI (Contributor) left a comment

Pull request overview

This PR expands ARM support by adding SVE-optimized implementations for several image/statistics routines, introduces value-returning copy helpers for View/Frame, and adds AArch64 NEON CRC32/CRC32c paths with corresponding test and build-system updates.

Changes:

  • Add SVE implementations and test coverage for statistic/difference/background routines (plus alignment reporting updates for SVE).
  • Add View::Copy() / Frame::Copy() convenience methods returning deep-copied objects by value.
  • Add AArch64 NEON CRC32/CRC32c implementation and wire it into dispatch + tests; update CMake/VS projects and docs accordingly.
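
For readers comparing the two checksums mentioned above, here is a small bitwise reference sketch. It assumes the common reflected conventions (initial value 0xFFFFFFFF, final inversion) with the CRC-32 polynomial 0xEDB88320 and the CRC-32C (Castagnoli) polynomial 0x82F63B78. The function names are hypothetical, and whether the library's SimdCrc32/SimdCrc32c follow exactly these conventions is not confirmed by this page.

```cpp
#include <cstddef>
#include <cstdint>

// Reflected bitwise CRC over a byte buffer; 'poly' is the bit-reversed polynomial.
inline uint32_t CrcReflected(const void* src, size_t size, uint32_t poly)
{
    const uint8_t* bytes = static_cast<const uint8_t*>(src);
    uint32_t crc = 0xFFFFFFFFu; // conventional initial value
    for (size_t i = 0; i < size; ++i)
    {
        crc ^= bytes[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (poly & (0u - (crc & 1u))); // conditional XOR without a branch
    }
    return ~crc; // conventional final inversion
}

// Hypothetical reference helpers (not library functions).
inline uint32_t Crc32Reference(const void* src, size_t size)
{
    return CrcReflected(src, size, 0xEDB88320u);
}

inline uint32_t Crc32cReference(const void* src, size_t size)
{
    return CrcReflected(src, size, 0x82F63B78u);
}
```

A slow reference like this is mainly useful in tests, where an accelerated path can be compared against it on buffers of varying length and alignment.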

Reviewed changes

Copilot reviewed 59 out of 87 changed files in this pull request and generated 7 comments.

File Description
src/Test/TestSynetConvolution16b.cpp Adjusts convolution16b test cases/coverage selection.
src/Test/TestStatistic.cpp Fixes statistic test comparison logging; adds SVE test dispatch.
src/Test/TestDifferenceSum.cpp Adds SVE coverage for AbsDifferenceSum variants in tests.
src/Test/TestCrc32.cpp Adds NEON (AArch64) CRC32/CRC32c tests.
src/Test/TestCompare.h Adds TEST_CHECK_VALUE_AS_INT for correct uint8 logging/comparison.
src/Test/TestBackground.cpp Adds SVE test dispatch for background grow-range routines.
src/Simd/SimdView.hpp Adds View::Copy() overloads returning deep copies by value.
src/Simd/SimdSynetConvolution16b.h Declares new SynetConvolution16bNhwcSpecV3 and AMX-BF16 specialization.
src/Simd/SimdSve1Statistic.cpp Adds SVE GetStatistic implementation.
src/Simd/SimdSve1Background.cpp Adds SVE BackgroundGrowRangeSlow/Fast implementations.
src/Simd/SimdSve1.h Declares new SVE APIs (difference sums, background grow-range, statistic).
src/Simd/SimdNeonCrc32.cpp Adds AArch64 NEON CRC32/CRC32c implementations.
src/Simd/SimdNeonAbsDifferenceSum.cpp Minor formatting/guard cleanup.
src/Simd/SimdNeon.h Declares NEON CRC32/CRC32c on AArch64.
src/Simd/SimdLib.h Adds C++20/C++23 feature macros; updates alignment documentation for SVE.
src/Simd/SimdLib.cpp Adds dispatch to SVE routines; adds NEON CRC32/CRC32c dispatch.
src/Simd/SimdFrame.hpp Adds Frame::Copy() overloads returning deep copies by value.
src/Simd/SimdAmxBf16SynetConvolution16b.cpp Wires in AMX-BF16 SynetConvolution16bNhwcSpecV3 creation.
src/Simd/SimdAlignment.h Reports SVE vector size as alignment when SVE is enabled.
README.md Documents SVE support and SIMD_SVE build option.
prj/vs2022/Sve1.vcxproj.filters Adds filters/entries for new SVE sources.
prj/vs2022/Sve1.vcxproj Adds new SVE sources to the VS project.
prj/vs2022/Neon.vcxproj.filters Adds NEON CRC32 source to VS filters.
prj/vs2022/Neon.vcxproj Adds NEON CRC32 source to the VS project.
prj/vs2022/Base.vcxproj.filters Adds Base NHWC SpecV3 source to VS filters.
prj/vs2022/Base.vcxproj Adds Base NHWC SpecV3 source to the VS project.
prj/vs2022/AmxBf16.vcxproj.filters Adds AMX-BF16 NHWC SpecV3 source to VS filters.
prj/vs2022/AmxBf16.vcxproj Adds AMX-BF16 NHWC SpecV3 source to the VS project.
prj/txt/DoxygenOverview.txt Updates overview/build options to mention SVE.
prj/cmake/arm.cmake Updates ARM processor detection; adds AArch64 -march=armv8-a+crc.
docs/index.html Updates docs homepage to mention SVE and adds a contributor entry.
docs/help/struct_simd_1_1_view.html Documents new View::Copy() APIs and related doc tweaks.
docs/help/struct_simd_1_1_frame.html Documents new Frame::Copy() APIs.
docs/help/struct_simd_1_1_detection.html Updates line references due to header shifts.
docs/help/index.html Updates doc index to mention SVE and build option.
docs/help/group__warp__affine.html Updates SimdRelease description wording.
docs/help/group__thread.html Updates wording for SimdGetThreadNumber documentation.
docs/help/group__synet__scale.html Updates SimdRelease description wording.
docs/help/group__synet__quantized__merged__convolution.html Updates SimdRelease description wording.
docs/help/group__synet__quantized__convolution.html Updates SimdRelease description wording.
docs/help/group__synet__quantized__add.html Updates SimdRelease description wording.
docs/help/group__synet__permute.html Updates SimdRelease description wording.
docs/help/group__synet__inner__product.html Updates SimdRelease description wording.
docs/help/group__synet__inner__product__bf16.html Updates SimdRelease description wording.
docs/help/group__synet__grid__sample.html Updates SimdRelease description wording.
docs/help/group__synet__gather__elements.html Updates SimdRelease description wording.
docs/help/group__synet__deconvolution__fp32.html Updates SimdRelease description wording.
docs/help/group__synet__deconvolution__bf16.html Updates SimdRelease description wording.
docs/help/group__synet__convolution__int8.html Updates SimdRelease description wording.
docs/help/group__synet__convolution__fp32.html Updates SimdRelease description wording.
docs/help/group__synet__convolution__bf16.html Updates SimdRelease description wording.
docs/help/group__synet__add.html Updates SimdRelease description wording.
docs/help/group__shifting.html Updates SimdRelease description wording.
docs/help/group__resizing.html Updates SimdRelease description wording.
docs/help/group__recursive__bilateral__filter.html Updates SimdRelease description wording.
docs/help/group__other__filter.html Improves wording for AbsGradientSaturatedSum description.
docs/help/group__object__detection.html Updates SimdRelease description wording.
docs/help/group__matrix.html Updates thread-note wording to match updated thread docs.
docs/help/group__info.html Expands documentation for version/CPU info APIs and adds SVE mentions.
docs/help/group__image__io.html Updates SimdFree description wording.
docs/help/group__hash.html Improves CRC32/CRC32c documentation wording/details.
docs/help/group__gaussian__filter.html Updates SimdRelease description wording.
docs/help/group__difference__estimation.html Clarifies AddFeatureDifference documentation/behavior description.
docs/help/group__c__types.html Updates references to updated SimdCpuDesc documentation title.
docs/help/functions_u.html Reorders/adjusts symbol index entries.
docs/help/functions_t.html Updates symbol index grouping for Top() entries.
docs/help/functions_r.html Reorders right/Right() entries in symbol index.
docs/help/functions_l.html Reorders lossType/LossType entries in symbol index.
docs/help/functions_i.html Reorders initType/InitType entries in symbol index.
docs/help/functions_func_c.html Adds Copy() entries for new View/Frame methods in symbol index.
docs/help/functions_f.html Updates symbol index grouping for Flipped()/Format() entries.
docs/help/functions_c.html Adds Copy() entries for new View/Frame methods in symbol index.
docs/help/functions_b.html Reorders bottom/Bottom() entries in symbol index.
docs/2026.html Adds release notes for upcoming version including SVE/NEON updates.
.gitignore Ignores IDE and cmake build directories.
Files not reviewed (27)
  • docs/help/functions_b.html: Language not supported
  • docs/help/functions_c.html: Language not supported
  • docs/help/functions_f.html: Language not supported
  • docs/help/functions_func_c.html: Language not supported
  • docs/help/functions_i.html: Language not supported
  • docs/help/functions_l.html: Language not supported
  • docs/help/functions_r.html: Language not supported
  • docs/help/functions_t.html: Language not supported
  • docs/help/functions_u.html: Language not supported
  • docs/help/group__c__types.html: Language not supported
  • docs/help/group__correlation.html: Language not supported
  • docs/help/group__cpu__flags.html: Language not supported
  • docs/help/group__descrint.html: Language not supported
  • docs/help/group__difference__estimation.html: Language not supported
  • docs/help/group__drawing.html: Language not supported
  • docs/help/group__gaussian__filter.html: Language not supported
  • docs/help/group__hash.html: Language not supported
  • docs/help/group__image__io.html: Language not supported
  • docs/help/group__info.html: Language not supported
  • docs/help/group__matrix.html: Language not supported
  • docs/help/group__memory.html: Language not supported
  • docs/help/group__object__detection.html: Language not supported
  • docs/help/group__other__filter.html: Language not supported
  • docs/help/group__recursive__bilateral__filter.html: Language not supported
  • docs/help/group__resizing.html: Language not supported
  • docs/help/group__shifting.html: Language not supported
  • docs/help/group__synet__add.html: Language not supported

Comment on lines +50 to +55
uint8_t* nose = (uint8_t*)src;
uint64_t* body = (uint64_t*)AlignHi(nose, sizeof(uint64_t));
uint64_t* tail = (uint64_t*)AlignLo(nose + size, sizeof(uint64_t));

uint32_t crc = 0xFFFFFFFF;
Crc32(crc, nose, (uint8_t*)body);
Comment on lines +77 to +82
uint8_t* nose = (uint8_t*)src;
uint64_t* body = (uint64_t*)AlignHi(nose, sizeof(uint64_t));
uint64_t* tail = (uint64_t*)AlignLo(nose + size, sizeof(uint64_t));

uint32_t crc = 0xFFFFFFFF;
Crc32c(crc, nose, (uint8_t*)body);
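
The two snippets above split the input into an unaligned prefix (nose), an 8-byte-aligned middle (body), and a remainder (tail). One point worth checking in such a split is a buffer shorter than the distance to the next 8-byte boundary, where the rounded-up body pointer would lie past the end of the data; whether the PR guards against this elsewhere is not visible here. The sketch below shows one way to clamp the prefix. The names `Split`, `SplitForWords`, and `ByteSum` are hypothetical, and a byte sum stands in for the CRC update.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Byte counts of the three parts; 'body' is always a multiple of 8.
struct Split { size_t head, body, tail; };

inline Split SplitForWords(const void* src, size_t size)
{
    const uintptr_t begin = reinterpret_cast<uintptr_t>(src);
    const size_t misalign = static_cast<size_t>(begin % sizeof(uint64_t));
    size_t head = misalign ? sizeof(uint64_t) - misalign : 0;
    if (head > size)
        head = size; // short buffer: everything is handled bytewise
    const size_t body = (size - head) / sizeof(uint64_t) * sizeof(uint64_t);
    return { head, body, size - head - body };
}

// Sums all bytes, reading the aligned middle part one 64-bit word at a time.
inline uint64_t ByteSum(const void* src, size_t size)
{
    const uint8_t* p = static_cast<const uint8_t*>(src);
    const Split s = SplitForWords(src, size);
    uint64_t sum = 0;
    for (size_t i = 0; i < s.head; ++i)
        sum += p[i];
    for (size_t i = 0; i < s.body; i += sizeof(uint64_t))
    {
        uint64_t word;
        std::memcpy(&word, p + s.head + i, sizeof(word)); // avoids aliasing issues
        for (int b = 0; b < 8; ++b)
            sum += (word >> (8 * b)) & 0xFF;
    }
    for (size_t i = 0; i < s.tail; ++i)
        sum += p[s.head + s.body + i];
    return sum;
}
```

Working with byte counts rather than raw pointers keeps every range inside the buffer by construction, at the cost of a few extra additions.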
Comment thread src/Simd/SimdLib.cpp
Comment on lines 264 to +271
 SIMD_API uint32_t SimdCrc32(const void* src, size_t size)
 {
-    return Base::Crc32(src, size);
+#if defined(SIMD_NEON_ENABLE) && defined(SIMD_ARM64_ENABLE)
+    if (Neon::Enable)
+        return Neon::Crc32(src, size);
+    else
+#endif
+        return Base::Crc32(src, size);
Comment thread src/Simd/SimdLib.cpp
Comment on lines 274 to 286
@@ -273,6 +278,11 @@ SIMD_API uint32_t SimdCrc32c(const void * src, size_t size)
     if(Sse41::Enable)
         return Sse41::Crc32c(src, size);
     else
 #endif
+#if defined(SIMD_NEON_ENABLE) && defined(SIMD_ARM64_ENABLE)
+    if (Neon::Enable)
+        return Neon::Crc32c(src, size);
+    else
+#endif
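
The dispatch shape in the two SimdLib.cpp snippets above (a compile-time guard around a runtime flag, with a dangling `else` that falls through to the base implementation) can be reproduced in isolation. In this sketch `DEMO_FAST_ENABLE`, the `Fast` namespace, and `DemoChecksum` are hypothetical names chosen for illustration; only the control-flow pattern mirrors the quoted code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical build-time switch standing in for options such as SIMD_NEON_ENABLE.
#define DEMO_FAST_ENABLE 1

namespace Base
{
    // Portable fallback that is always compiled.
    inline uint32_t Checksum(const uint8_t* src, size_t size)
    {
        uint32_t sum = 0;
        for (size_t i = 0; i < size; ++i)
            sum += src[i];
        return sum;
    }
}

#if DEMO_FAST_ENABLE
namespace Fast
{
    // Runtime flag; a real library would set it from CPU feature detection.
    static bool Enable = true;

    // "Optimized" variant, here simply unrolled by four.
    inline uint32_t Checksum(const uint8_t* src, size_t size)
    {
        uint32_t sum = 0;
        size_t i = 0;
        for (; i + 4 <= size; i += 4)
            sum += src[i] + src[i + 1] + src[i + 2] + src[i + 3];
        for (; i < size; ++i)
            sum += src[i];
        return sum;
    }
}
#endif

inline uint32_t DemoChecksum(const uint8_t* src, size_t size)
{
#if DEMO_FAST_ENABLE
    if (Fast::Enable)
        return Fast::Checksum(src, size);
    else
#endif
        return Base::Checksum(src, size); // reached when the guard is off or the flag is false
}
```

When the guard is compiled out, the final `return` stands alone; when it is compiled in, the same line becomes the `else` branch, so each additional instruction set can be chained in front of the fallback without restructuring the function.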
Comment on lines +67 to +70

*min = svminv_u8(svptrue_b32(), _min);
*max = svmaxv_u8(svptrue_b32(), _max);
*average = (uint8_t)((sum + width*height / 2) / (width*height));
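
The last quoted line computes a rounded average by adding half the pixel count before dividing. A scalar reference for the three outputs might look like the sketch below; `GetStatisticReference` and its parameter list are assumptions made for illustration, not the library's signature. As a side note on the reductions, in Arm's ACLE a predicate from `svptrue_b32()` activates one lane per 32-bit element, so byte-wise reductions are normally paired with `svptrue_b8()`; whether that matters for the quoted lines is not confirmed by this page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference: minimum, maximum, and rounded mean of an 8-bit gray image.
// 'stride' is the distance in bytes between the starts of consecutive rows.
inline void GetStatisticReference(const uint8_t* src, size_t stride, size_t width, size_t height,
    uint8_t* min, uint8_t* max, uint8_t* average)
{
    assert(width > 0 && height > 0); // the division below needs a nonzero pixel count
    uint8_t lo = 255, hi = 0;
    uint64_t sum = 0;
    for (size_t row = 0; row < height; ++row)
    {
        const uint8_t* p = src + row * stride;
        for (size_t col = 0; col < width; ++col)
        {
            if (p[col] < lo) lo = p[col];
            if (p[col] > hi) hi = p[col];
            sum += p[col];
        }
    }
    const uint64_t count = uint64_t(width) * height;
    *min = lo;
    *max = hi;
    *average = uint8_t((sum + count / 2) / count); // adding count / 2 rounds to nearest
}
```

A reference of this kind gives vectorized variants something to be compared against, including widths that are not a multiple of the vector length.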
Comment on lines +79 to +81
<p>Gets current global thread number configured for <a class="el" href="namespace_simd.html">Simd</a> Library parallel algorithms. </p>
<p >Returns the value set by <a class="el" href="group__thread.html#ga604c2f8bafd54c63bf7734c6d3da085f" title="Sets number of threads used by Simd Library to parallelize some algorithms.">SimdSetThreadNumber</a>. By default this value is <code>1</code>. When set, it is restricted to the range <code></code>[1, std::thread::hardware_concurrency()].</p>
<dl class="section return"><dt>Returns</dt><dd>current configured thread number. </dd></dl>
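
The documented range [1, std::thread::hardware_concurrency()] can be illustrated with a small clamping helper. `ClampThreadNumber` is a hypothetical name, and this sketch is not the library's implementation. It also assumes a lower bound of 1 when `hardware_concurrency()` returns 0, which the C++ standard permits when the value cannot be determined.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// Restricts a requested thread count to [1, hardware_concurrency()].
inline size_t ClampThreadNumber(size_t requested)
{
    // hardware_concurrency() may return 0 if the value is not computable.
    const size_t upper = std::max<size_t>(1, std::thread::hardware_concurrency());
    return std::min(std::max<size_t>(1, requested), upper);
}
```

With this helper a request of 0 yields 1, and an oversized request yields the number of hardware threads reported by the runtime.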
Comment thread docs/2026.html Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@ermig1979 ermig1979 merged commit c881abf into master May 26, 2026
25 checks passed

3 participants