wasm: recognize any_true and all_true
#155885
Conversation
hi @folkertdev, I didn't spend much time on the PR, but it seems like you're failing at the isel stage because it can't find a pattern for these nodes. If you add the corresponding patterns to the tablegen file, it shouldn't fail anymore. However, you should be careful about how these work. I don't know yet where in the codebase this happens (which file specifically; my LLDB got stuck for 5 minutes trying to load the program), but if you reproduce it you'll find that t3, t4, and t5 are turned into t8 before we have proved that the bits of t2 other than the LSB are zero. If that transform goes through while some bit other than the LSB is 1 and the LSB is 0, it produces the wrong result. I'm still a bit new, so I'm not sure whether the transformation I pointed out is correct by design, but it seems like a bigger issue if you were to mark the VECREDUCE nodes as legal.
I wasn't sure that would be accurate, based on https://llvm.org/docs/LangRef.html#llvm-vector-reduce-and-intrinsic: the intrinsic performs a bitwise AND reduction over all lanes and returns the scalar result, while the wasm spec says that all_true returns 1 only if every lane is non-zero. Maybe I'm missing something though? I'm open to other suggestions, of course, if marking those operations as legal is not a good approach. Marking them as Custom might be an option too.
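To make the gap concrete, here is a small scalar model of the two operations (plain C++ written just for illustration; it is not code from LLVM or from this PR):

#include <array>
#include <cstdint>

// llvm.vector.reduce.and per the LangRef: bitwise AND across all lanes,
// returned as a scalar of the element type.
uint32_t reduceAnd(const std::array<uint32_t, 4> &Lanes) {
  uint32_t Result = ~0u;
  for (uint32_t Lane : Lanes)
    Result &= Lane;
  return Result;
}

// wasm i32x4.all_true: 1 if every lane is non-zero, 0 otherwise.
uint32_t allTrue(const std::array<uint32_t, 4> &Lanes) {
  for (uint32_t Lane : Lanes)
    if (Lane == 0)
      return 0;
  return 1;
}

// For lanes {2, 4, 8, 16}, reduceAnd gives 0 but allTrue gives 1, so the two
// only agree when every lane is known to be all-zeros or all-ones, e.g. the
// result of a vector comparison.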
I've now implemented a manual combine for these reduction nodes. This seems acceptable to me:

; SIMD128-LABEL: pairwise_or_v2i64:
; SIMD128: .functype pairwise_or_v2i64 (v128) -> (i64)
; SIMD128-NEXT: # %bb.0:
-; SIMD128-NEXT: i8x16.shuffle $push0=, $0, $0, 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7
-; SIMD128-NEXT: v128.or $push1=, $0, $pop0
-; SIMD128-NEXT: i64x2.extract_lane $push2=, $pop1, 0
+; SIMD128-NEXT: i64x2.extract_lane $push1=, $0, 0
+; SIMD128-NEXT: i64x2.extract_lane $push0=, $0, 1
+; SIMD128-NEXT: i64.or $push2=, $pop1, $pop0
; SIMD128-NEXT: return $pop2
%res = tail call i64 @llvm.vector.reduce.or.v2i64(<2 x i64> %arg)
ret i64 %res

But in some cases it looks like the previous implementation was a lot smarter about the vector reduction than the default lowering:

; SIMD128-LABEL: pairwise_or_v8i16:
; SIMD128: .functype pairwise_or_v8i16 (v128) -> (i32)
; SIMD128-NEXT: # %bb.0:
-; SIMD128-NEXT: i8x16.shuffle $push0=, $0, $0, 8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 0, 1, 0, 1, 0, 1
-; SIMD128-NEXT: v128.or $push8=, $0, $pop0
-; SIMD128-NEXT: local.tee $push7=, $0=, $pop8
-; SIMD128-NEXT: i8x16.shuffle $push1=, $0, $0, 4, 5, 6, 7, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1
-; SIMD128-NEXT: v128.or $push6=, $pop7, $pop1
-; SIMD128-NEXT: local.tee $push5=, $0=, $pop6
-; SIMD128-NEXT: i8x16.shuffle $push2=, $0, $0, 2, 3, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1
-; SIMD128-NEXT: v128.or $push3=, $pop5, $pop2
-; SIMD128-NEXT: i16x8.extract_lane_u $push4=, $pop3, 0
-; SIMD128-NEXT: return $pop4
+; SIMD128-NEXT: i16x8.extract_lane_u $push1=, $0, 0
+; SIMD128-NEXT: i16x8.extract_lane_u $push0=, $0, 1
+; SIMD128-NEXT: i32.or $push2=, $pop1, $pop0
+; SIMD128-NEXT: i16x8.extract_lane_u $push3=, $0, 2
+; SIMD128-NEXT: i32.or $push4=, $pop2, $pop3
+; SIMD128-NEXT: i16x8.extract_lane_u $push5=, $0, 3
+; SIMD128-NEXT: i32.or $push6=, $pop4, $pop5
+; SIMD128-NEXT: i16x8.extract_lane_u $push7=, $0, 4
+; SIMD128-NEXT: i32.or $push8=, $pop6, $pop7
+; SIMD128-NEXT: i16x8.extract_lane_u $push9=, $0, 5
+; SIMD128-NEXT: i32.or $push10=, $pop8, $pop9
+; SIMD128-NEXT: i16x8.extract_lane_u $push11=, $0, 6
+; SIMD128-NEXT: i32.or $push12=, $pop10, $pop11
+; SIMD128-NEXT: i16x8.extract_lane_u $push13=, $0, 7
+; SIMD128-NEXT: i32.or $push14=, $pop12, $pop13
+; SIMD128-NEXT: return $pop14
%res = tail call i16 @llvm.vector.reduce.or.v8i16(<8 x i16> %arg)
ret i16 %res
}

What is the best way forward here?
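For reference, the kind of combine I mean is roughly shaped like the sketch below. This is a simplified illustration rather than the exact code in this PR; performVecReduceBoolCombine is a made-up name, and the SETCC guard plus the 0/1 vs 0/-1 fix-up reflect my understanding of what is needed, assuming vector setcc lanes are all-zeros/all-ones on this target.

// Turn a VECREDUCE_AND/OR whose input is a vector comparison into
// all_true/any_true; anything else falls back to the generic expansion.
static SDValue performVecReduceBoolCombine(SDNode *N, SelectionDAG &DAG) {
  SDValue Src = N->getOperand(0);
  if (Src.getOpcode() != ISD::SETCC)
    return SDValue(); // not provably boolean: keep the default lowering

  SDLoc DL(N);
  EVT ResVT = N->getValueType(0);
  unsigned Intrin = N->getOpcode() == ISD::VECREDUCE_AND
                        ? Intrinsic::wasm_alltrue
                        : Intrinsic::wasm_anytrue;

  // all_true/any_true produce an i32 0/1, while reducing all-ones lanes
  // produces 0 or -1 in the element type, so adjust the width and negate.
  SDValue Bool = DAG.getNode(ISD::INTRINSIC_WO_CHAIN, DL, MVT::i32,
                             DAG.getConstant(Intrin, DL, MVT::i32), Src);
  SDValue Ext = DAG.getZExtOrTrunc(Bool, DL, ResVT);
  return DAG.getNode(ISD::SUB, DL, ResVT, DAG.getConstant(0, DL, ResVT), Ext);
}

Something with this shape would be called from WebAssemblyTargetLowering::PerformDAGCombine for the VECREDUCE_AND/OR opcodes; the important part is the early return SDValue(), which leaves anything that isn't provably boolean to the generic expansion shown above.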
cc @lukel97 @badumbatish #145108
I've been learning a bit about LLVM, trying to make progress on some of these issues. The code below is based on #145108 (comment), by implementing shouldExpandReduction. The implementation works for the test cases I added, but (obviously) fails for any existing cases.

ISD::VECREDUCE_AND and ISD::VECREDUCE_OR are now marked as legal, which is required for the Pats to fire, but when they don't fire that causes a selection failure.

So I'm wondering what the right approach is here. Should I mark these intrinsics as Custom instead and manually perform the transformation in C++? Or is there some trick to still get the default lowering (a series of lane extracts and scalar bitwise operations) when the patterns don't fire?
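For reference, a minimal sketch of such a shouldExpandReduction override is below. This is a simplified illustration rather than necessarily the exact code in this PR, and it assumes the override lives in WebAssemblyTargetTransformInfo:

// Keep llvm.vector.reduce.{and,or} intact through the ExpandReductions pass
// so they reach instruction selection as ISD::VECREDUCE_AND/OR nodes instead
// of being expanded at the IR level.
bool WebAssemblyTTIImpl::shouldExpandReduction(const IntrinsicInst *II) const {
  switch (II->getIntrinsicID()) {
  case Intrinsic::vector_reduce_and:
  case Intrinsic::vector_reduce_or:
    return false; // let the backend see a VECREDUCE_* node
  default:
    return true; // everything else keeps the generic expansion
  }
}

The switch is deliberately conservative: only the two reductions this PR cares about are kept, and everything else keeps its existing behavior.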