Fix `vec128_half_neon.h` compilation with GCC #139235

malfet · 2024-10-29T23:40:20Z

`mask` is already defined as `uint16x8_t`, so there is no need to reinterpret it:

pytorch/aten/src/ATen/cpu/vec/vec128/vec128_half_neon.h

Line 220 in bd369bb

uint16x8_t mask = vld1q_u16(pre_mask);

Fixes

/var/lib/jenkins/workspace/aten/src/ATen/cpu/vec/vec128/vec128_half_neon.h: In static member function 'static at::vec::DEFAULT::Vectorized<c10::Half> at::vec::DEFAULT::Vectorized<c10::Half>::set(const at::vec::DEFAULT::Vectorized<c10::Half>&, const at::vec::DEFAULT::Vectorized<c10::Half>&, int64_t)':
/var/lib/jenkins/workspace/aten/src/ATen/cpu/vec/vec128/vec128_half_neon.h:227:39: error: cannot convert 'uint16x8_t' to 'float16x8_t'
  227 |                 vreinterpretq_u16_f16(mask),
      |                                       ^~~~
      |                                       |
      |                                       uint16x8_t
In file included from /var/lib/jenkins/workspace/aten/src/ATen/cpu/vec/intrinsics.h:23,
                 from /var/lib/jenkins/workspace/aten/src/ATen/cpu/vec/vec128/vec128.h:4,
                 from /var/lib/jenkins/workspace/aten/src/ATen/cpu/vec/vec.h:6,
                 from /var/lib/jenkins/workspace/aten/src/ATen/test/vec_test_all_types.h:2,
                 from /var/lib/jenkins/workspace/aten/src/ATen/test/vec_test_all_types.cpp:1:
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h:5841:36: note:   initializing argument 1 of 'uint16x8_t vreinterpretq_u16_f16(float16x8_t)'
 5841 | vreinterpretq_u16_f16 (float16x8_t __a)
      |                        ~~~~~~~~~~~~^~~

The error was introduced by #137911.
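As a rough illustration of the blend pattern involved, here is a minimal sketch, not the actual PyTorch code. It works on plain `uint16_t` lanes and uses `vbslq_u16` instead of the half-precision intrinsic so that it does not depend on FP16 vector support. The helper name `blend_first_n` and the constant `kLanes` are hypothetical. The point it shows is that a mask loaded with `vld1q_u16` is already a `uint16x8_t` and can be passed to the bit-select intrinsic as is. On non-NEON targets the sketch falls back to a scalar loop.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical constant: a 128-bit register holds eight 16-bit lanes.
constexpr int kLanes = 8;

// Hypothetical helper: take the first `count` lanes from `b` and the
// remaining lanes from `a`, writing the result to `out`.
inline void blend_first_n(const uint16_t* a, const uint16_t* b,
                          int64_t count, uint16_t* out) {
  assert(count >= 0 && count <= kLanes);

  // Build an all-ones mask for the first `count` lanes.
  uint16_t pre_mask[kLanes] = {0};
  for (int64_t i = 0; i < count; ++i) {
    pre_mask[i] = 0xFFFF;
  }

#if defined(__ARM_NEON)
  // The load already yields uint16x8_t, which is the mask type the
  // bit-select intrinsic expects, so no vreinterpretq_* call is needed.
  uint16x8_t mask = vld1q_u16(pre_mask);
  uint16x8_t blended = vbslq_u16(mask, vld1q_u16(b), vld1q_u16(a));
  vst1q_u16(out, blended);
#else
  // Scalar fallback so the sketch also builds on non-NEON targets.
  for (int i = 0; i < kLanes; ++i) {
    out[i] = pre_mask[i] ? b[i] : a[i];
  }
#endif
}
```

Calling `blend_first_n(a, b, 3, out)` copies lanes 0 to 2 from `b` and lanes 3 to 7 from `a`.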

Also, guard every use of NEON intrinsics in `ReducedPrecisionFloatGemvFastPathKernel.cpp` with `!defined(CPU_CAPABILITY_SVE)`; otherwise, compilation of the SVE256 variant fails with

/var/lib/jenkins/workspace/aten/src/ATen/native/cpu/ReducedPrecisionFloatGemvFastPathKernel.cpp: In function 'float at::native::SVE256::reduce(at::vec::SVE256::VectorizedN<c10::Half, 16>&)':
/var/lib/jenkins/workspace/aten/src/ATen/native/cpu/ReducedPrecisionFloatGemvFastPathKernel.cpp:77:24: error: cannot convert 'at::vec::SVE256::Vectorized<float>' to 'float32x4_t'
   77 |   return vaddvq_f32(t0 + t1);
      |                     ~~~^~~~
      |                        |
      |                        at::vec::SVE256::Vectorized<float>
In file included from /var/lib/jenkins/workspace/c10/util/Half.h:51,
                 from /var/lib/jenkins/workspace/c10/util/Float8_e5m2.h:17,
                 from /var/lib/jenkins/workspace/c10/core/ScalarType.h:8,
                 from /var/lib/jenkins/workspace/c10/core/TensorImpl.h:11,
                 from /var/lib/jenkins/workspace/c10/core/GeneratorImpl.h:8,
                 from /var/lib/jenkins/workspace/aten/src/ATen/core/Generator.h:18,
                 from /var/lib/jenkins/workspace/aten/src/ATen/CPUGeneratorImpl.h:3,
                 from /var/lib/jenkins/workspace/aten/src/ATen/Context.h:4,
                 from /var/lib/jenkins/workspace/aten/src/ATen/native/cpu/ReducedPrecisionFloatGemvFastPathKernel.cpp:2,
                 from /var/lib/jenkins/workspace/build/aten/src/ATen/native/cpu/ReducedPrecisionFloatGemvFastPathKernel.cpp.SVE256.cpp:1:
/usr/lib/gcc/aarch64-linux-gnu/11/include/arm_neon.h:10423:25: note:   initializing argument 1 of 'float32_t vaddvq_f32(float32x4_t)'
10423 | vaddvq_f32 (float32x4_t __a)
      |             ~~~~~~~~~~~~^~~
In file included from /var/lib/jenkins/workspace/build/aten/src/ATen/native/cpu/ReducedPrecisionFloatGemvFastPathKernel.cpp.SVE256.cpp:1:
/var/lib/jenkins/workspace/aten/src/ATen/native/cpu/ReducedPrecisionFloatGemvFastPathKernel.cpp: In function 'float at::native::SVE256::reduce(at::vec::SVE256::Vectorized<float>)':
/var/lib/jenkins/workspace/aten/src/ATen/native/cpu/ReducedPrecisionFloatGemvFastPathKernel.cpp:119:21: error: cannot convert 'at::vec::SVE256::Vectorized<float>' to 'float32x4_t'
  119 |   return vaddvq_f32(x);
      |                     ^
      |                     |
      |                     at::vec::SVE256::Vectorized<float>
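The guard described above can be sketched roughly as follows. This is an illustrative example under stated assumptions rather than the code from the PR: the helper `reduce4` and the macro `SKETCH_USE_NEON_REDUCE` are hypothetical, and the function takes a plain `float` array instead of a `Vectorized<float>`. Only `CPU_CAPABILITY_SVE` and `vaddvq_f32` come from the discussion. The NEON path is taken only on AArch64 builds where `CPU_CAPABILITY_SVE` is not defined, and every other build uses a scalar loop.

```cpp
#include <cassert>
#include <cstddef>

// Use the NEON horizontal add only when this translation unit is not
// being compiled as an SVE variant.
#if defined(__aarch64__) && !defined(CPU_CAPABILITY_SVE)
#include <arm_neon.h>
#define SKETCH_USE_NEON_REDUCE 1
#endif

// Hypothetical helper: sum four consecutive floats.
inline float reduce4(const float* x) {
  assert(x != nullptr);
#if defined(SKETCH_USE_NEON_REDUCE)
  // vaddvq_f32 takes a float32x4_t, so it must only see NEON vectors.
  return vaddvq_f32(vld1q_f32(x));
#else
  // Portable fallback for SVE builds and non-ARM targets.
  float sum = 0.0f;
  for (std::size_t i = 0; i < 4; ++i) {
    sum += x[i];
  }
  return sum;
#endif
}
```

Keeping the guard at the point where the intrinsic is used means the same source file can be compiled once per CPU capability without the SVE build ever seeing a NEON-only call.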

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

pytorch-bot · 2024-10-29T23:40:23Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139235

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6d9c71a with merge base bd369bb ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

malfet · 2024-10-30T00:46:30Z

@pytorchbot merge -f "Lint + relevant builds have passed"

pytorchmergebot · 2024-10-30T00:48:45Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Pull Request resolved: pytorch#139235
Approved by: https://github.com/huydhn

Fix vec128_half_neon.h compilation with GCC

958182d

`mask` is already `uint16x8_t` no need to reinterpret it

malfet requested a review from swolchok October 29, 2024 23:40

pytorch-bot bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Oct 29, 2024

malfet added ciflow/linux-aarch64 linux aarch64 CI workflow topic: not user facing topic category labels Oct 29, 2024

huydhn approved these changes Oct 29, 2024

malfet added 2 commits October 29, 2024 17:23

And fix those

784a2b4

ALL SVE

6d9c71a

pytorchmergebot added the merging label Oct 30, 2024

pytorchmergebot added the Merged label Oct 30, 2024

pytorchmergebot closed this in f643499 Oct 30, 2024

pytorchmergebot removed the merging label Oct 30, 2024

github-actions bot deleted the malfet-patch-24 branch November 30, 2024 02:08
