Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: memequal missing short circuit for equal pointers on arm64 #64381

Closed
1 task done
josharian opened this issue Nov 25, 2023 · 10 comments
Closed
1 task done

runtime: memequal missing short circuit for equal pointers on arm64 #64381

josharian opened this issue Nov 25, 2023 · 10 comments
Labels
arch-arm64 compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@josharian
Copy link
Contributor

josharian commented Nov 25, 2023

Go version

tip as of Nov 24, 2023

Reproducibility

  • Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

not relevant

What did you do?

amd64 memequal starts by comparing pointers and returning true if they are true.

CMPQ AX, BX
JNE neq
MOVQ $1, AX // return 1
RET

arm64 memequal is missing this optimization.

B memeqbody<>(SB)

(Other architectures may also be missing it. I did not check.)

Initially reported as josharian/intern#3

What did you expect to see?

Fast string comparisons of identical strings (same string headers).

What did you see instead?

Comparing their contents on arm64.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 25, 2023
@josharian
Copy link
Contributor Author

cc @cherrymui @randall77

@robpike
Copy link
Contributor

robpike commented Nov 25, 2023

Does it matter? How often do we ask if memequal returns true when both arguments point to the same spot? Can it be worthwhile, amortized over all program executions?

@josharian
Copy link
Contributor Author

There are some cases in which it matters a lot.

For example, if you’re interning strings, and then doing lots of string comparisons (e.g. as part of a diff algorithm), the cost savings of having string comparisons be mostly O(1) instead of O(length) is pretty significant.

(I say mostly because this leaves string comparisons not O(1) when the strings are different but have equal lengths.)

Anyway, this is one of those things that triggers more than you think it would, because it helps any time you test whether a string or struct is equal to itself.

@josharian
Copy link
Contributor Author

…and if despiteallobjections it is found not to matter, we should get rid of it on amd64 et al, to avoid drastic performance swings on some programs across architectures.

@cherrymui
Copy link
Member

@josharian feel free to send a CL. Thanks.

@mauri870
Copy link
Member

mauri870 commented Nov 27, 2023

I did some testing on this for the go1 benchmarks, on average a 0.5% performance improvement.

This would certainly benefit a lot strings and bytes.Equal if you are doing a lot of comparisons, struct and direct pointer comparisons I believe would not be affected, following the generated code https://go.godbolt.org/z/nncY76M6x

diff --git a/src/internal/bytealg/equal_arm64.s b/src/internal/bytealg/equal_arm64.s
index d3aabba587..13eb5fb1bf 100644
--- a/src/internal/bytealg/equal_arm64.s
+++ b/src/internal/bytealg/equal_arm64.s
@@ -7,6 +7,9 @@
 
 // memequal(a, b unsafe.Pointer, size uintptr) bool
 TEXT runtime·memequal<ABIInternal>(SB),NOSPLIT|NOFRAME,$0-25
+	// short path to handle equal pointers
+	CMP	R0, R1
+	BEQ	equal
 	// short path to handle 0-byte case
 	CBZ	R2, equal
 	B	memeqbody<>(SB)

@mauri870
Copy link
Member

FYI I've been looking around at the other architectures. From what I can tell, it seems like arm64 is the only one where this optimization hasn't been applied yet.

@randall77
Copy link
Contributor

The short circuit is already present in memequal_varlen on arm64, just not memequal.

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 27, 2023
@josharian
Copy link
Contributor Author

@mauri870 I won’t be sending a CL. Feel free to send yours.

mauri870 added a commit to mauri870/go that referenced this issue Nov 28, 2023
If memequal is invoked with the same pointers as arguments it ends up
comparing the whole contents, instead of just comparing the pointers.

This effectively makes an operation that could be O(1) into O(n). All the
other architectures already have this optimization inplace. For
instance, arm64 also have it, in memequal_varlen.

Such optimization is very specific, one case that it will probably benefit is
programs that rely heavily on interning of strings.

```
goos: darwin
goarch: arm64
pkg: bytes
                 │      old.txt       │               new.txt                │
                 │       sec/op       │    sec/op     vs base                │
Equal/same/1-8           2.678n ± ∞ ¹   2.400n ± ∞ ¹   -10.38% (p=0.008 n=5)
Equal/same/6-8           3.267n ± ∞ ¹   2.431n ± ∞ ¹   -25.59% (p=0.008 n=5)
Equal/same/9-8           2.981n ± ∞ ¹   2.385n ± ∞ ¹   -19.99% (p=0.008 n=5)
Equal/same/15-8          2.974n ± ∞ ¹   2.390n ± ∞ ¹   -19.64% (p=0.008 n=5)
Equal/same/16-8          2.983n ± ∞ ¹   2.380n ± ∞ ¹   -20.21% (p=0.008 n=5)
Equal/same/20-8          3.567n ± ∞ ¹   2.384n ± ∞ ¹   -33.17% (p=0.008 n=5)
Equal/same/32-8          3.568n ± ∞ ¹   2.385n ± ∞ ¹   -33.16% (p=0.008 n=5)
Equal/same/4K-8         78.040n ± ∞ ¹   2.378n ± ∞ ¹   -96.95% (p=0.008 n=5)
Equal/same/4M-8      78713.000n ± ∞ ¹   2.385n ± ∞ ¹  -100.00% (p=0.008 n=5)
Equal/same/64M-8   1348095.000n ± ∞ ¹   2.381n ± ∞ ¹  -100.00% (p=0.008 n=5)
geomean                  43.52n         2.390n         -94.51%
¹ need >= 6 samples for confidence interval at level 0.95

                 │    old.txt    │                     new.txt                      │
                 │      B/s      │         B/s          vs base                     │
Equal/same/1-8     356.1Mi ± ∞ ¹         397.3Mi ± ∞ ¹        +11.57% (p=0.008 n=5)
Equal/same/6-8     1.711Gi ± ∞ ¹         2.298Gi ± ∞ ¹        +34.35% (p=0.008 n=5)
Equal/same/9-8     2.812Gi ± ∞ ¹         3.515Gi ± ∞ ¹        +24.99% (p=0.008 n=5)
Equal/same/15-8    4.698Gi ± ∞ ¹         5.844Gi ± ∞ ¹        +24.41% (p=0.008 n=5)
Equal/same/16-8    4.995Gi ± ∞ ¹         6.260Gi ± ∞ ¹        +25.34% (p=0.008 n=5)
Equal/same/20-8    5.222Gi ± ∞ ¹         7.814Gi ± ∞ ¹        +49.63% (p=0.008 n=5)
Equal/same/32-8    8.353Gi ± ∞ ¹        12.496Gi ± ∞ ¹        +49.59% (p=0.008 n=5)
Equal/same/4K-8    48.88Gi ± ∞ ¹       1603.96Gi ± ∞ ¹      +3181.17% (p=0.008 n=5)
Equal/same/4M-8    49.63Gi ± ∞ ¹    1637911.85Gi ± ∞ ¹   +3300381.91% (p=0.008 n=5)
Equal/same/64M-8   46.36Gi ± ∞ ¹   26253069.97Gi ± ∞ ¹  +56626517.99% (p=0.008 n=5)
geomean            6.737Gi               122.7Gi            +1721.01%
¹ need >= 6 samples for confidence interval at level 0.95
```

Fixes golang#64381
@mauri870 mauri870 added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Nov 28, 2023
@gopherbot
Copy link

Change https://go.dev/cl/545416 mentions this issue: runtime: short path for equal pointers in arm64 memequal

@dmitshur dmitshur added this to the Go1.23 milestone Nov 28, 2023
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
If memequal is invoked with the same pointers as arguments it ends up
comparing the whole memory contents, instead of just comparing the pointers.

This effectively makes an operation that could be O(1) into O(n). All the
other architectures already have this optimization in place. For
instance, arm64 also have it, in memequal_varlen.

Such optimization is very specific, one case that it will probably benefit is
programs that rely heavily on interning of strings.

goos: darwin
goarch: arm64
pkg: bytes
                 │      old.txt       │               new.txt                │
                 │       sec/op       │    sec/op     vs base                │
Equal/same/1-8           2.678n ± ∞ ¹   2.400n ± ∞ ¹   -10.38% (p=0.008 n=5)
Equal/same/6-8           3.267n ± ∞ ¹   2.431n ± ∞ ¹   -25.59% (p=0.008 n=5)
Equal/same/9-8           2.981n ± ∞ ¹   2.385n ± ∞ ¹   -19.99% (p=0.008 n=5)
Equal/same/15-8          2.974n ± ∞ ¹   2.390n ± ∞ ¹   -19.64% (p=0.008 n=5)
Equal/same/16-8          2.983n ± ∞ ¹   2.380n ± ∞ ¹   -20.21% (p=0.008 n=5)
Equal/same/20-8          3.567n ± ∞ ¹   2.384n ± ∞ ¹   -33.17% (p=0.008 n=5)
Equal/same/32-8          3.568n ± ∞ ¹   2.385n ± ∞ ¹   -33.16% (p=0.008 n=5)
Equal/same/4K-8         78.040n ± ∞ ¹   2.378n ± ∞ ¹   -96.95% (p=0.008 n=5)
Equal/same/4M-8      78713.000n ± ∞ ¹   2.385n ± ∞ ¹  -100.00% (p=0.008 n=5)
Equal/same/64M-8   1348095.000n ± ∞ ¹   2.381n ± ∞ ¹  -100.00% (p=0.008 n=5)
geomean                  43.52n         2.390n         -94.51%
¹ need >= 6 samples for confidence interval at level 0.95

                 │    old.txt    │                     new.txt                      │
                 │      B/s      │         B/s          vs base                     │
Equal/same/1-8     356.1Mi ± ∞ ¹         397.3Mi ± ∞ ¹        +11.57% (p=0.008 n=5)
Equal/same/6-8     1.711Gi ± ∞ ¹         2.298Gi ± ∞ ¹        +34.35% (p=0.008 n=5)
Equal/same/9-8     2.812Gi ± ∞ ¹         3.515Gi ± ∞ ¹        +24.99% (p=0.008 n=5)
Equal/same/15-8    4.698Gi ± ∞ ¹         5.844Gi ± ∞ ¹        +24.41% (p=0.008 n=5)
Equal/same/16-8    4.995Gi ± ∞ ¹         6.260Gi ± ∞ ¹        +25.34% (p=0.008 n=5)
Equal/same/20-8    5.222Gi ± ∞ ¹         7.814Gi ± ∞ ¹        +49.63% (p=0.008 n=5)
Equal/same/32-8    8.353Gi ± ∞ ¹        12.496Gi ± ∞ ¹        +49.59% (p=0.008 n=5)
Equal/same/4K-8    48.88Gi ± ∞ ¹       1603.96Gi ± ∞ ¹      +3181.17% (p=0.008 n=5)
Equal/same/4M-8    49.63Gi ± ∞ ¹    1637911.85Gi ± ∞ ¹   +3300381.91% (p=0.008 n=5)
Equal/same/64M-8   46.36Gi ± ∞ ¹   26253069.97Gi ± ∞ ¹  +56626517.99% (p=0.008 n=5)
geomean            6.737Gi               122.7Gi            +1721.01%
¹ need >= 6 samples for confidence interval at level 0.95

Fixes golang#64381

Change-Id: I7d423930a688edd88c4ba60d45e097296d9be852
GitHub-Last-Rev: ae8189f
GitHub-Pull-Request: golang#64419
Reviewed-on: https://go-review.googlesource.com/c/go/+/545416
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-arm64 compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants