NEON : Complex operations from Armv8.3-a #1077

wewe5215 · 2023-10-16T09:54:07Z

This pull request adds initial implementations of the intrinsic families listed below, together with their test cases:

cadd_rot
cmla_lane
cmla_rot_lane
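
For readers unfamiliar with these intrinsics, here is a minimal scalar sketch of what the three families compute, following the Armv8.3-A FCADD/FCMLA semantics on interleaved (real, imaginary) pairs. It is not the code from this PR: ref_cadd_rot, ref_cmla_rot_lane, and rot_t are hypothetical names chosen for illustration, and plain float arrays stand in for the NEON vector types.

```c
#include <assert.h>
#include <stddef.h>

/* Rotation of the second operand, in multiples of 90 degrees (hypothetical enum). */
typedef enum { ROT0 = 0, ROT90 = 1, ROT180 = 2, ROT270 = 3 } rot_t;

/* Complex add with rotation, as in the cadd_rot90 / cadd_rot270 family:
 *   ROT90:  r = a + i*b      ROT270: r = a - i*b
 * Each array holds n_pairs complex values stored as (re, im) pairs. */
void ref_cadd_rot(float *r, const float *a, const float *b,
                  size_t n_pairs, rot_t rot) {
  assert(rot == ROT90 || rot == ROT270);
  for (size_t i = 0; i < n_pairs; i++) {
    float a_re = a[2 * i], a_im = a[2 * i + 1];
    float b_re = b[2 * i], b_im = b[2 * i + 1];
    if (rot == ROT90) {
      r[2 * i]     = a_re - b_im;
      r[2 * i + 1] = a_im + b_re;
    } else {
      r[2 * i]     = a_re + b_im;
      r[2 * i + 1] = a_im - b_re;
    }
  }
}

/* Complex multiply-accumulate with rotation, as in the cmla_lane and
 * cmla_rot{90,180,270}_lane families. `lane` selects one complex pair of b
 * (b holds b_pairs pairs), which is used for every element of a.
 * Each call adds only half of a complex product to acc. */
void ref_cmla_rot_lane(float *acc, const float *a, const float *b,
                       size_t n_pairs, size_t lane, size_t b_pairs, rot_t rot) {
  assert(lane < b_pairs);
  float b_re = b[2 * lane], b_im = b[2 * lane + 1];
  for (size_t i = 0; i < n_pairs; i++) {
    float a_re = a[2 * i], a_im = a[2 * i + 1];
    switch (rot) {
      case ROT0:   acc[2 * i] += a_re * b_re; acc[2 * i + 1] += a_re * b_im; break;
      case ROT90:  acc[2 * i] -= a_im * b_im; acc[2 * i + 1] += a_im * b_re; break;
      case ROT180: acc[2 * i] -= a_re * b_re; acc[2 * i + 1] -= a_re * b_im; break;
      case ROT270: acc[2 * i] += a_im * b_im; acc[2 * i + 1] -= a_im * b_re; break;
    }
  }
}
```

Accumulating ROT0 and then ROT90 with the same operands gives a full complex multiply-accumulate: with a = 1+2i, b = 3+4i and a zeroed accumulator, the result is -5+10i.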

Sorry for the typo in the message of commit fa9a14d.
It should read: [Neon] Add vcmla_rot270_lane_f{16/32} and vcmla_rot270_laneq_f{16/32} and vcmlaq_rot270_lane_f{16/32} and vcmlaq_rot270_laneq_f{16/32}

mr-c · 2023-10-16T13:48:57Z

Regarding the binary operators applied to 16-bit floating-point values: they might work when SIMDE_FLOAT16_API is SIMDE_FLOAT16_API_FLOAT16 or SIMDE_FLOAT16_API_FP16.
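
To illustrate the point, here is a minimal sketch of the portable pattern when the half-precision type is not a native arithmetic type: widen to float, operate, then narrow. Everything here is an assumption made for illustration. half_t, half_to_float, float_to_half, and half_add are hypothetical names, not SIMDe's types or helpers; in the PR itself the fix commits mention simde_vaddh_f16 for the scalar addition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical non-native half type: raw IEEE 754 binary16 bits in a struct,
 * so `a + b` does not compile and helper functions are required. */
typedef struct { uint16_t bits; } half_t;

/* Widen binary16 to binary32. */
float half_to_float(half_t h) {
  uint32_t sign = (uint32_t)(h.bits & 0x8000u) << 16;
  uint32_t exp  = (h.bits >> 10) & 0x1Fu;
  uint32_t man  = h.bits & 0x3FFu;
  uint32_t out;
  if (exp == 0x1Fu) {                 /* infinity or NaN */
    out = sign | 0x7F800000u | (man << 13);
  } else if (exp != 0) {              /* normal: rebias exponent 15 -> 127 */
    out = sign | ((exp + 112u) << 23) | (man << 13);
  } else if (man == 0) {              /* signed zero */
    out = sign;
  } else {                            /* subnormal: renormalize */
    uint32_t e = 113u;
    while ((man & 0x400u) == 0) { man <<= 1; e--; }
    out = sign | (e << 23) | ((man & 0x3FFu) << 13);
  }
  float f;
  memcpy(&f, &out, sizeof f);
  return f;
}

/* Narrow binary32 to binary16 with round-to-nearest-even. */
half_t float_to_half(float f) {
  uint32_t x;
  memcpy(&x, &f, sizeof x);
  uint32_t sign = (x >> 16) & 0x8000u;
  uint32_t mag  = x & 0x7FFFFFFFu;
  uint32_t bits;
  if (mag > 0x7F800000u) {            /* NaN: return a quiet NaN */
    bits = 0x7E00u;
  } else if (mag >= 0x477FF000u) {    /* infinity, or rounds up past 65504 */
    bits = 0x7C00u;
  } else if (mag >= 0x38800000u) {    /* normal half range */
    uint32_t m = mag - 0x38000000u;   /* rebias exponent 127 -> 15 */
    bits = (m + 0x0FFFu + ((m >> 13) & 1u)) >> 13;
  } else if (mag >= 0x33000000u) {    /* subnormal half range */
    uint32_t shift = 126u - (mag >> 23);
    uint32_t man = (mag & 0x7FFFFFu) | 0x800000u;
    uint32_t half_ulp = 1u << (shift - 1u);
    uint32_t rem = man & ((1u << shift) - 1u);
    bits = man >> shift;
    if (rem > half_ulp || (rem == half_ulp && (bits & 1u))) bits++;
  } else {                            /* too small: rounds to zero */
    bits = 0;
  }
  half_t h = { (uint16_t)(sign | bits) };
  return h;
}

/* Portable replacement for the binary operator: widen, add, narrow. */
half_t half_add(half_t a, half_t b) {
  return float_to_half(half_to_float(a) + half_to_float(b));
}

/* Sanity check: 1.0 (0x3C00) + 2.0 (0x4000) == 3.0 (0x4200). */
void half_add_example(void) {
  half_t one = { 0x3C00u }, two = { 0x4000u };
  assert(half_add(one, two).bits == 0x4200u);
}
```

With a native half type the helper would reduce to a plain `a + b`; the struct wrapper is what makes an explicit function necessary in this sketch.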

wewe5215 · 2023-10-16T15:54:29Z

@mr-c

Thanks for your help. I have fixed it and pushed the code again!

mr-c · 2023-10-16T16:31:11Z

simde/arm/neon/cmla_lane.h

+  result = simde_float16x4_from_private(r_);
+  return result;


Combine these two lines into a single return statement.

mr-c · 2023-10-16T16:31:29Z

simde/arm/neon/cmla_lane.h

+#if defined(SIMDE_ARM_NEON_A32V8_ENABLE_NATIVE_ALIASES)
+#undef vcmla_lane_f16
+#define vcmla_lane_f16(r, a, b, lane) simde_vcmla_lane_f16(r, a, b, lane)
+#endif


Suggested change

#if defined(SIMDE_ARM_NEON_A32V8_ENABLE_NATIVE_ALIASES)
#undef vcmla_lane_f16
#define vcmla_lane_f16(r, a, b, lane) simde_vcmla_lane_f16(r, a, b, lane)
#endif

mr-c · 2023-10-16T16:36:04Z

simde/arm/neon/cmla_rot90_lane.h

+  simde_float32x4_private r_ =
+                              simde_float32x4_to_private(simde_vcvt_f32_f16(r)),


This style is more readable

Suggested change

-  simde_float32x4_private r_ =
-  simde_float32x4_to_private(simde_vcvt_f32_f16(r)),
+  simde_float32x4_private r_ = simde_float32x4_to_private(
+  simde_vcvt_f32_f16(r)),

mr-c · 2023-10-16T16:38:10Z

test/arm/neon/cadd_rot270.c

+
+

In general, there are several places with too many consecutive blank lines.

Suggested change: delete the extra blank lines.

mr-c · 2023-10-16T16:40:12Z

test/arm/neon/cmla_rot180_lane.c

+
+    // simde_float32x4_t r = simde_vcmlaq_rot180_laneq_f32(r_, a, b,
+    // test_vec[i].lane); simde_test_arm_neon_write_f32x4(2, r,
+    // SIMDE_TEST_VEC_POS_LAST);


Please tidy up your test code as well; thank you!

Suggested change

-    // simde_float32x4_t r = simde_vcmlaq_rot180_laneq_f32(r_, a, b,
-    // test_vec[i].lane); simde_test_arm_neon_write_f32x4(2, r,
-    // SIMDE_TEST_VEC_POS_LAST);

wewe5215 · 2023-10-17

Hello! I've cleaned up the code and reformatted it using clang-format with the LLVM style. Apologies for the coding style and the redundant comments.

mr-c · 2023-10-17

Thanks for the cleanup. SIMDe does not (yet) have an official clang-format style. I see that the LLVM style still uses ColumnLimit: 80, which I find too narrow.

I will add that guidance to https://github.com/simd-everywhere/simde/wiki/Coding-Style for future contributors.

mr-c

It looks like all the compiler issues are fixed (I think the rpm-build:fedora-rawhide-i386 failure is not your fault). Please fix the formatting errors and merge/re-base on the latest commits in https://github.com/simd-everywhere/simde/tree/master ; I'll then merge this PR! Thank you @wewe5215 !

wewe5215 · 2023-10-17T11:27:47Z

@mr-c Hello! I have reformatted the code and rebased it on the latest commit. I really appreciate your patience!

mr-c reviewed Oct 16, 2023: simde/arm/neon/cmla_lane.h (outdated)

mr-c reviewed Oct 16, 2023: simde/arm/neon/cmla_rot180_lane.h (outdated)

mr-c reviewed Oct 16, 2023: simde/arm/neon/cmla_rot90_lane.h (outdated)

mr-c reviewed Oct 16, 2023

mr-c requested changes Oct 17, 2023

朱季葳 added 18 commits October 17, 2023 13:29

[Neon] Add vcadd_rot270_f{16/32} and vcaddq_rot270_f{16/32/64}

78593be

[Neon] Add vcadd_rot90_f{16/32} and vcaddq_rot90_f{16/32/64}

617cf05

[Neon] Add vcmla_lane_f{16/32} and vcmla_laneq_f{16/32} and vcmlaq_lane_f{16/32} and vcmlaq_laneq_f{16/32}

94cc32b

[Neon] Add vcmla_rot90_lane_f{16/32} and vcmla_rot90_laneq_f{16/32} and vcmlaq_rot90_lane_f{16/32} and vcmlaq_rot90_laneq_f{16/32}

596cafc

[Neon] Add vcmla_rot180_lane_f{16/32} and vcmla_rot180_laneq_f{16/32} and vcmlaq_rot180_lane_f{16/32} and vcmlaq_rot180_laneq_f{16/32}

4b6c6c7

[Neon] Add vcadd_rot270_f{16/32} and vcaddq_rot270_f{16/32/64}

560be04

[Neon] : add meson.build and simde/arm/neon.h

1818df3

[Fix] : add newline

6fbd8c2

[Fix] : formatting the code

72f8a1c

[Fix] : add newline

318c72b

[Fix] : invalid operands to binary expression for f16

5866e62

[Fix] : operation for f16

7a7cc03

[Fix] : simde_vaddh_f16 missed

21fb644

[Fix] : invalid argument type 'simde_float16' to unary expression

bf58e4d

[Fix] : not using static const struct for f16

856b4a7

[Fix] : f16 intrinsic of cadd_rot270 and cadd_rot90

9f6cace

[Fix] : f16 intrinsics

dfc4481

[Fix] : format the code

d262180

朱季葳 and others added 11 commits October 17, 2023 13:29

[Fix] : add newline in test/arm/neon/cadd_rot270.c

0880dac

[Fix] : remove comment for test code

d3fab5b

[Fix] : coding style

e42ff32

[Fix] : warning of unused variable

609c46f

[Fix] : use another way to implement f16 functions

3eac558

[Fix] : use implementation of f16 functions

e966085

[Fix] : delete conflicting type

20131b3

[Fix] : another implementation for vcmla{/q}_rot{180/270/90}_lane{/q}_f16 and vcmla{/q}_lane{/q}_f16

0c531c3

[Fix] : implementation for vcmla{/q}_rot{180/270/90}_lane{/q}_f16 and vcmla{/q}_lane{/q}_f16

788e06e

[Fix] : elements in shuffle vector

790d471

[Fix] : formatting with ColumnLimit = 125, IdentWidth = 2, TabWidth = 4

19ed113

mr-c force-pushed the complex_operation branch from c51131a to 19ed113 on October 17, 2023 11:29

mr-c approved these changes Oct 17, 2023


mr-c enabled auto-merge (squash) October 17, 2023 11:34

mr-c merged commit d08d67c into simd-everywhere:master Oct 17, 2023
80 of 81 checks passed
