Use the custom implementation of multipliedFullWidth on arm64_32 #37905

Merged

@stephentyrone stephentyrone commented Jun 14, 2021

Previously, on arm64_32 we were falling back on the generic implementation of multipliedFullWidth for 64b integers, which resulted in the following codegen:

00000008	asr	x8, x0, #32
0000000c	asr	x9, x0, #63
00000010	cmp	x0, #0x0
00000014	cinv	w10, w0, lt
00000018	eor	w9, w10, w9
0000001c	asr	x10, x1, #32
00000020	asr	x11, x1, #63
00000024	cmp	x1, #0x0
00000028	cinv	w12, w1, lt
0000002c	eor	w11, w12, w11
00000030	umull	x12, w11, w9
00000034	mul	x11, x11, x8
00000038	add	x11, x11, x12, lsr #32
0000003c	asr	x12, x11, #63
00000040	cmp	x11, #0x0
00000044	cinv	w13, w11, lt
00000048	eor	w12, w13, w12
0000004c	madd	x9, x9, x10, x12
00000050	mul	x8, x10, x8
00000054	add	x8, x8, x11, asr #32
00000058	add	x0, x8, x9, asr #32
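
For readers who have not seen the half-width approach, here is a minimal sketch of how a generic double-width multiply can be built from 32-bit pieces. It is the unsigned schoolbook version, not the standard library's actual code, and `fullWidthProduct` is a hypothetical name used only for illustration. The signed variant needs extra sign handling on top of this, which is roughly where the additional `asr`/`cinv`/`eor` work in the listing above comes from.

```swift
// Hypothetical helper: the 128-bit product of two UInt64 values, computed
// from four 32x32 -> 64-bit partial products (schoolbook multiplication).
func fullWidthProduct(_ a: UInt64, _ b: UInt64) -> (high: UInt64, low: UInt64) {
    let mask: UInt64 = 0xFFFF_FFFF
    let (a1, a0) = (a >> 32, a & mask)   // high and low halves of a
    let (b1, b0) = (b >> 32, b & mask)   // high and low halves of b

    // Each partial product fits in 64 bits because both factors are < 2^32.
    let p00 = a0 * b0
    let p01 = a0 * b1
    let p10 = a1 * b0
    let p11 = a1 * b1

    // Middle column: three terms, each < 2^32, so the sum cannot overflow.
    let mid = (p00 >> 32) + (p01 & mask) + (p10 & mask)

    // The shift discards the carry bits of `mid`; they are added to `high`.
    let low = (mid << 32) | (p00 & mask)
    let high = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32)
    return (high, low)
}
```

The result can be checked against the built-in `multipliedFullWidth(by:)` on any platform where that takes the fast path.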

Instead, we should use the 64b implementation when targeting arm64_32, which allows us to generate:

00000008	smulh	x0, x1, x0

Unsurprisingly, this is considerably faster, though I don't think it will show up in any existing benchmarks.
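
The function that was disassembled is not shown in this PR. A minimal sketch of the kind of code that should reduce to a single `smulh` once the 64b path is used, assuming an optimized build for a 64-bit Arm target (`highProduct` is a hypothetical name):

```swift
// Hypothetical helper: the high 64 bits of the signed 128-bit product a * b.
// On AArch64 this is exactly what the `smulh` instruction computes.
func highProduct(_ a: Int64, _ b: Int64) -> Int64 {
    return a.multipliedFullWidth(by: b).high
}
```

Whether the compiler actually emits `smulh` depends on the target and optimization level, so treat the expected instruction as an assumption to confirm in the disassembly.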

@stephentyrone
Contributor Author

@swift-ci please test

@stephentyrone stephentyrone merged commit 9956097 into swiftlang:main Jun 15, 2021
@stephentyrone stephentyrone deleted the arm64_32-double-width-multiply branch June 15, 2021 01:38