runtime: memequal missing short circuit for equal pointers on arm64 #64381

josharian · 2023-11-25T01:26:32Z

Go version

tip as of Nov 24, 2023

Reproducibility

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (`go env`)?

not relevant

What did you do?

amd64 memequal starts by comparing pointers and returning true if they are true.

go/src/internal/bytealg/equal_amd64.s

Lines 14 to 17 in 0c7e5d3

    
           CMPQ	AX, BX 
        
           JNE	neq 
        
           MOVQ	$1, AX	// return 1 
        
           RET

arm64 memequal is missing this optimization.

go/src/internal/bytealg/equal_arm64.s

Line 12 in 0c7e5d3

B memeqbody<>(SB)

(Other architectures may also be missing it. I did not check.)

Initially reported as josharian/intern#3

What did you expect to see?

Fast string comparisons of identical strings (same string headers).

What did you see instead?

Comparing their contents on arm64.

The text was updated successfully, but these errors were encountered:

josharian · 2023-11-25T01:27:08Z

cc @cherrymui @randall77

robpike · 2023-11-25T07:13:09Z

Does it matter? How often do we ask if memequal returns true when both arguments point to the same spot? Can it be worthwhile, amortized over all program executions?

josharian · 2023-11-25T14:04:50Z

There are some cases in which it matters a lot.

For example, if you’re interning strings, and then doing lots of string comparisons (e.g. as part of a diff algorithm), the cost savings of having string comparisons be mostly O(1) instead of O(length) is pretty significant.

(I say mostly because this leaves string comparisons not O(1) when the strings are different but have equal lengths.)

Anyway, this is one of those things that triggers more than you think it would, because it helps any time you test whether a string or struct is equal to itself.

josharian · 2023-11-25T14:39:01Z

…and if despiteallobjections it is found not to matter, we should get rid of it on amd64 et al, to avoid drastic performance swings on some programs across architectures.

cherrymui · 2023-11-27T15:01:33Z

@josharian feel free to send a CL. Thanks.

mauri870 · 2023-11-27T17:14:29Z

I did some testing on this for the go1 benchmarks, on average a 0.5% performance improvement.

This would certainly benefit a lot strings and bytes.Equal if you are doing a lot of comparisons, struct and direct pointer comparisons I believe would not be affected, following the generated code https://go.godbolt.org/z/nncY76M6x

diff --git a/src/internal/bytealg/equal_arm64.s b/src/internal/bytealg/equal_arm64.s
index d3aabba587..13eb5fb1bf 100644
--- a/src/internal/bytealg/equal_arm64.s
+++ b/src/internal/bytealg/equal_arm64.s
@@ -7,6 +7,9 @@
 
 // memequal(a, b unsafe.Pointer, size uintptr) bool
 TEXT runtime·memequal<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-25
+	// short path to handle equal pointers
+	CMP	R0, R1
+	BEQ	equal
 	// short path to handle 0-byte case
 	CBZ	R2, equal
 	B	memeqbody<>(SB)

mauri870 · 2023-11-27T18:03:21Z

FYI I've been looking around at the other architectures. From what I can tell, it seems like arm64 is the only one where this optimization hasn't been applied yet.

randall77 · 2023-11-27T18:12:20Z

The short circuit is already present in memequal_varlen on arm64, just not memequal.

josharian · 2023-11-28T04:11:31Z

@mauri870 I won’t be sending a CL. Feel free to send yours.

If memequal is invoked with the same pointers as arguments it ends up comparing the whole contents, instead of just comparing the pointers. This effectively makes an operation that could be O(1) into O(n). All the other architectures already have this optimization inplace. For instance, arm64 also have it, in memequal_varlen. Such optimization is very specific, one case that it will probably benefit is programs that rely heavily on interning of strings. ``` goos: darwin goarch: arm64 pkg: bytes │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ Equal/same/1-8 2.678n ± ∞ ¹ 2.400n ± ∞ ¹ -10.38% (p=0.008 n=5) Equal/same/6-8 3.267n ± ∞ ¹ 2.431n ± ∞ ¹ -25.59% (p=0.008 n=5) Equal/same/9-8 2.981n ± ∞ ¹ 2.385n ± ∞ ¹ -19.99% (p=0.008 n=5) Equal/same/15-8 2.974n ± ∞ ¹ 2.390n ± ∞ ¹ -19.64% (p=0.008 n=5) Equal/same/16-8 2.983n ± ∞ ¹ 2.380n ± ∞ ¹ -20.21% (p=0.008 n=5) Equal/same/20-8 3.567n ± ∞ ¹ 2.384n ± ∞ ¹ -33.17% (p=0.008 n=5) Equal/same/32-8 3.568n ± ∞ ¹ 2.385n ± ∞ ¹ -33.16% (p=0.008 n=5) Equal/same/4K-8 78.040n ± ∞ ¹ 2.378n ± ∞ ¹ -96.95% (p=0.008 n=5) Equal/same/4M-8 78713.000n ± ∞ ¹ 2.385n ± ∞ ¹ -100.00% (p=0.008 n=5) Equal/same/64M-8 1348095.000n ± ∞ ¹ 2.381n ± ∞ ¹ -100.00% (p=0.008 n=5) geomean 43.52n 2.390n -94.51% ¹ need >= 6 samples for confidence interval at level 0.95 │ old.txt │ new.txt │ │ B/s │ B/s vs base │ Equal/same/1-8 356.1Mi ± ∞ ¹ 397.3Mi ± ∞ ¹ +11.57% (p=0.008 n=5) Equal/same/6-8 1.711Gi ± ∞ ¹ 2.298Gi ± ∞ ¹ +34.35% (p=0.008 n=5) Equal/same/9-8 2.812Gi ± ∞ ¹ 3.515Gi ± ∞ ¹ +24.99% (p=0.008 n=5) Equal/same/15-8 4.698Gi ± ∞ ¹ 5.844Gi ± ∞ ¹ +24.41% (p=0.008 n=5) Equal/same/16-8 4.995Gi ± ∞ ¹ 6.260Gi ± ∞ ¹ +25.34% (p=0.008 n=5) Equal/same/20-8 5.222Gi ± ∞ ¹ 7.814Gi ± ∞ ¹ +49.63% (p=0.008 n=5) Equal/same/32-8 8.353Gi ± ∞ ¹ 12.496Gi ± ∞ ¹ +49.59% (p=0.008 n=5) Equal/same/4K-8 48.88Gi ± ∞ ¹ 1603.96Gi ± ∞ ¹ +3181.17% (p=0.008 n=5) Equal/same/4M-8 49.63Gi ± ∞ ¹ 1637911.85Gi ± ∞ ¹ +3300381.91% (p=0.008 n=5) Equal/same/64M-8 46.36Gi ± ∞ ¹ 26253069.97Gi ± ∞ ¹ +56626517.99% (p=0.008 n=5) geomean 6.737Gi 122.7Gi +1721.01% ¹ need >= 6 samples for confidence interval at level 0.95 ``` Fixes golang#64381

gopherbot · 2023-11-28T10:25:15Z

Change https://go.dev/cl/545416 mentions this issue: runtime: short path for equal pointers in arm64 memequal

If memequal is invoked with the same pointers as arguments it ends up comparing the whole memory contents, instead of just comparing the pointers. This effectively makes an operation that could be O(1) into O(n). All the other architectures already have this optimization in place. For instance, arm64 also have it, in memequal_varlen. Such optimization is very specific, one case that it will probably benefit is programs that rely heavily on interning of strings. goos: darwin goarch: arm64 pkg: bytes │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ Equal/same/1-8 2.678n ± ∞ ¹ 2.400n ± ∞ ¹ -10.38% (p=0.008 n=5) Equal/same/6-8 3.267n ± ∞ ¹ 2.431n ± ∞ ¹ -25.59% (p=0.008 n=5) Equal/same/9-8 2.981n ± ∞ ¹ 2.385n ± ∞ ¹ -19.99% (p=0.008 n=5) Equal/same/15-8 2.974n ± ∞ ¹ 2.390n ± ∞ ¹ -19.64% (p=0.008 n=5) Equal/same/16-8 2.983n ± ∞ ¹ 2.380n ± ∞ ¹ -20.21% (p=0.008 n=5) Equal/same/20-8 3.567n ± ∞ ¹ 2.384n ± ∞ ¹ -33.17% (p=0.008 n=5) Equal/same/32-8 3.568n ± ∞ ¹ 2.385n ± ∞ ¹ -33.16% (p=0.008 n=5) Equal/same/4K-8 78.040n ± ∞ ¹ 2.378n ± ∞ ¹ -96.95% (p=0.008 n=5) Equal/same/4M-8 78713.000n ± ∞ ¹ 2.385n ± ∞ ¹ -100.00% (p=0.008 n=5) Equal/same/64M-8 1348095.000n ± ∞ ¹ 2.381n ± ∞ ¹ -100.00% (p=0.008 n=5) geomean 43.52n 2.390n -94.51% ¹ need >= 6 samples for confidence interval at level 0.95 │ old.txt │ new.txt │ │ B/s │ B/s vs base │ Equal/same/1-8 356.1Mi ± ∞ ¹ 397.3Mi ± ∞ ¹ +11.57% (p=0.008 n=5) Equal/same/6-8 1.711Gi ± ∞ ¹ 2.298Gi ± ∞ ¹ +34.35% (p=0.008 n=5) Equal/same/9-8 2.812Gi ± ∞ ¹ 3.515Gi ± ∞ ¹ +24.99% (p=0.008 n=5) Equal/same/15-8 4.698Gi ± ∞ ¹ 5.844Gi ± ∞ ¹ +24.41% (p=0.008 n=5) Equal/same/16-8 4.995Gi ± ∞ ¹ 6.260Gi ± ∞ ¹ +25.34% (p=0.008 n=5) Equal/same/20-8 5.222Gi ± ∞ ¹ 7.814Gi ± ∞ ¹ +49.63% (p=0.008 n=5) Equal/same/32-8 8.353Gi ± ∞ ¹ 12.496Gi ± ∞ ¹ +49.59% (p=0.008 n=5) Equal/same/4K-8 48.88Gi ± ∞ ¹ 1603.96Gi ± ∞ ¹ +3181.17% (p=0.008 n=5) Equal/same/4M-8 49.63Gi ± ∞ ¹ 1637911.85Gi ± ∞ ¹ +3300381.91% (p=0.008 n=5) Equal/same/64M-8 46.36Gi ± ∞ ¹ 26253069.97Gi ± ∞ ¹ +56626517.99% (p=0.008 n=5) geomean 6.737Gi 122.7Gi +1721.01% ¹ need >= 6 samples for confidence interval at level 0.95 Fixes golang#64381 Change-Id: I7d423930a688edd88c4ba60d45e097296d9be852 GitHub-Last-Rev: ae8189f GitHub-Pull-Request: golang#64419 Reviewed-on: https://go-review.googlesource.com/c/go/+/545416 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com>

gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 25, 2023

josharian mentioned this issue Nov 25, 2023

add Equal func josharian/intern#3

Closed

bcmills added Performance arch-arm64 labels Nov 27, 2023

dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 27, 2023

mauri870 mentioned this issue Nov 28, 2023

runtime: short path for equal pointers in arm64 memequal #64419

Closed

mauri870 added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Nov 28, 2023

dmitshur added this to the Go1.23 milestone Nov 28, 2023

gopherbot closed this as completed in 66b776b Jan 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: memequal missing short circuit for equal pointers on arm64 #64381

runtime: memequal missing short circuit for equal pointers on arm64 #64381

josharian commented Nov 25, 2023 •

edited

josharian commented Nov 25, 2023

robpike commented Nov 25, 2023

josharian commented Nov 25, 2023

josharian commented Nov 25, 2023

cherrymui commented Nov 27, 2023

mauri870 commented Nov 27, 2023 •

edited

mauri870 commented Nov 27, 2023

randall77 commented Nov 27, 2023

josharian commented Nov 28, 2023

gopherbot commented Nov 28, 2023

runtime: memequal missing short circuit for equal pointers on arm64 #64381

runtime: memequal missing short circuit for equal pointers on arm64 #64381

Comments

josharian commented Nov 25, 2023 • edited

Go version

Reproducibility

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

josharian commented Nov 25, 2023

robpike commented Nov 25, 2023

josharian commented Nov 25, 2023

josharian commented Nov 25, 2023

cherrymui commented Nov 27, 2023

mauri870 commented Nov 27, 2023 • edited

mauri870 commented Nov 27, 2023

randall77 commented Nov 27, 2023

josharian commented Nov 28, 2023

gopherbot commented Nov 28, 2023

josharian commented Nov 25, 2023 •

edited

What operating system and processor architecture are you using (`go env`)?

mauri870 commented Nov 27, 2023 •

edited