Use the custom implementation of multipliedFullWidth on arm64_32 #37905

stephentyrone · 2021-06-14T20:27:02Z

Previously, on arm64_32 we were falling back on the generic implementation for 64-bit integers, which resulted in the following codegen:

00000008	asr	x8, x0, #32
0000000c	asr	x9, x0, #63
00000010	cmp	x0, #0x0
00000014	cinv	w10, w0, lt
00000018	eor	w9, w10, w9
0000001c	asr	x10, x1, #32
00000020	asr	x11, x1, #63
00000024	cmp	x1, #0x0
00000028	cinv	w12, w1, lt
0000002c	eor	w11, w12, w11
00000030	umull	x12, w11, w9
00000034	mul	x11, x11, x8
00000038	add	x11, x11, x12, lsr #32
0000003c	asr	x12, x11, #63
00000040	cmp	x11, #0x0
00000044	cinv	w13, w11, lt
00000048	eor	w12, w13, w12
0000004c	madd	x9, x9, x10, x12
00000050	mul	x8, x10, x8
00000054	add	x8, x8, x11, asr #32
00000058	add	x0, x8, x9, asr #32
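
For readers who want the idea in source form, here is a minimal sketch of a half-width decomposition that yields the high word of a signed 64x64 multiply. It is an illustration only, not the standard library's generic code: the function name `highWordViaHalves` is hypothetical, and the formulation is the textbook split into 32-bit halves rather than whatever the generic path actually emits.

```swift
// Hypothetical helper, for illustration only: computes the high 64 bits of the
// signed 128-bit product u * v by splitting each operand into 32-bit halves.
func highWordViaHalves(_ u: Int64, _ v: Int64) -> Int64 {
    let mask: Int64 = 0xFFFF_FFFF
    let u0 = u & mask        // low half, treated as unsigned (0 ..< 2^32)
    let u1 = u >> 32         // high half, sign-extended
    let v0 = v & mask
    let v1 = v >> 32

    // low x low is an unsigned product that can exceed Int64.max, so use UInt64.
    let w0 = UInt64(u0) * UInt64(v0)

    // Cross terms; each intermediate stays within Int64 range.
    let t = u1 * v0 + Int64(w0 >> 32)
    let w1 = u0 * v1 + (t & mask)

    // high x high plus the carries out of the cross terms.
    return u1 * v1 + (t >> 32) + (w1 >> 32)
}
```

Even in this compact form there are four multiplies plus shifts, masks, and adds, which gives a sense of why the fallback path is so much longer than a single instruction.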

Instead, we should use the 64-bit implementation when targeting arm64_32, which allows us to generate:

00000008	smulh	x0, x1, x0

Unsurprisingly, this is considerably faster, though I don't think it will show up in any existing benchmarks.
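
As a point of reference, the operation in question is the high half of `multipliedFullWidth(by:)` on `Int64`. A small sketch of what a caller sees (the wrapper name `fullProduct` is hypothetical and exists only to make the example self-contained):

```swift
// Hypothetical wrapper around the standard library API, for illustration.
// `high` is the signed upper 64 bits of the 128-bit product; `low` is the
// unsigned lower 64 bits.
func fullProduct(_ a: Int64, _ b: Int64) -> (high: Int64, low: UInt64) {
    return a.multipliedFullWidth(by: b)
}

// Int64.max squared is 2^126 - 2^64 + 1, i.e. high = 2^62 - 1 and low = 1.
let squared = fullProduct(Int64.max, Int64.max)
print(squared.high == 0x3FFF_FFFF_FFFF_FFFF, squared.low == 1)

// A product that fits in 64 bits: the high word is just the sign extension.
let small = fullProduct(0x0123_4567_89AB_CDEF, -3)
print(small.high == -1)
```

The `high` component is the value that the single `smulh` above computes.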

stephentyrone · 2021-06-14T20:27:36Z

@swift-ci please test

stephentyrone merged commit 9956097 into swiftlang:main Jun 15, 2021

stephentyrone deleted the arm64_32-double-width-multiply branch June 15, 2021 01:38
