cmd/compile: opt: generate conditional comparisons for || and && conditions #71268
Labels: compiler/runtime, Implementation, NeedsInvestigation, Performance
Go version
go1.23.4 linux/arm64

Output of `go env` in your module/workspace:
GOARCH='arm64'
What did you do?
While investigating the performance of the `encoding/json` package, I found that `unquoteBytes()` spends extra time on the `if c == '\\' || c == '"' || c < ' '` condition (src/encoding/json/decode.go#L1209), which could be improved by conditional comparisons (such as the ARM64 `CCMP` instruction).

What did you see happen?
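The hot condition can be isolated in a small scan loop. The sketch below uses a hypothetical helper, `indexSpecial`, which is not code from `encoding/json`; it only mirrors the shape of the check in `unquoteBytes`. Per this report, the Go compiler currently lowers each `||` operand to its own compare-and-branch (`CMP; B.cond`) pair rather than a `CMP; CCMP` chain.

```go
package main

import "fmt"

// indexSpecial is a hypothetical helper (not taken from encoding/json) that
// mirrors the hot check in unquoteBytes: it returns the index of the first
// byte that is a backslash, a double quote, or an ASCII control character
// (below ' '), or -1 if there is none.
func indexSpecial(s []byte) int {
	for i, c := range s {
		// Three comparisons joined by ||. With short-circuit evaluation,
		// each operand can become its own compare-and-branch pair; a
		// CMP/CCMP chain could instead fold them into one final branch.
		if c == '\\' || c == '"' || c < ' ' {
			return i
		}
	}
	return -1
}

func main() {
	fmt.Println(indexSpecial([]byte("hello, world")))  // -1: no special byte
	fmt.Println(indexSpecial([]byte(`\hello, world`))) // 0: leading backslash
	fmt.Println(indexSpecial([]byte("say \"hi\"")))    // 4: first double quote
}
```

The `\hello, world` input matches the report's second experiment, where the loop exits at the first comparison against `\`.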
Currently, the Go compiler can generate conditional assignments (e.g., `CSET`, `CSEL`, `CSINC`), but it cannot generate conditional comparisons (i.e., `CCMP`), which let you combine the results of multiple comparisons so that a single test is performed at the end (for more details, see "The AArch64 processor (aka arm64), part 16: Conditional execution" on The Old New Thing).

For example, here are Go tests (https://godbolt.org/z/q6YMT7Msa) and the corresponding C tests (https://godbolt.org/z/4qshde78o, compiled by GCC -O1). For `test2()`, the Go compiler generates `CMP; BEQ` instead of `CMP; CCMP`:

The Go tests:

The C tests:
Measure performance
Conditional comparisons should generally improve the performance of conditions combined with the `&&`/`||` operators on ARM64 machines.

The following cases are simplified from `unquoteBytes`. I tested on an ARM64 Neoverse-N1 machine (Ampere Computing Altra; AWS Graviton2 behaves similarly). The C case (GCC -O3 generates `CCMP`) is much faster than the Go case (go1.23.4 generates `CMP`): 5.69s vs. 9.23s.

The Go Test:

The C Test:
As C may have less overhead than Go for function calls and `main`, let's compare only the linux-perf samples of the loop:

Go results:

C results:
The assembly instructions are very similar except for `CMP` vs. `CCMP`. If we count only the samples related to the loop, the C case (`CCMP`) vs. the Go case (`CMP`) is 21976 vs. 35035 (+47%).

Since the input data may affect performance, I also tested input like `"\hello, world"`, where the loop breaks at the first comparison (against `\`); `CCMP` is still faster than `CMP` (1.00s vs. 1.29s).

What did you expect to see?
Could we enhance the Go compiler to generate `CCMP`?

BTW, I searched and didn't find any issue about conditional instructions (there is only an old issue, #6011, about failing to generate conditional moves).
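For illustration, here is a minimal Go model of how a `CMP`/`CCMP` chain can evaluate the `unquoteBytes` condition with a single branch at the end. It assumes the sequence `CMP w, #'\\'; CCMP w, #'"', #0b0100, ne; CCMP w, #' ', #0b0000, ne; B.LO`, which is one plausible lowering, not copied from GCC's output for the linked tests; the helper names (`cmp`, `ccmp`, `specialCCMP`) are hypothetical.

```go
package main

import "fmt"

// flags models the AArch64 NZCV condition flags.
type flags struct{ n, z, c, v bool }

// cmp models CMP on 32-bit operands: flags are set from a - b.
func cmp(a, b uint32) flags {
	r := a - b
	return flags{
		n: r>>31 == 1,
		z: r == 0,
		c: a >= b, // carry set means no borrow (unsigned a >= b)
		v: ((a^b)&(a^r))>>31 == 1,
	}
}

// fromNZCV decodes the 4-bit immediate that CCMP loads when its condition fails.
func fromNZCV(nzcv uint8) flags {
	return flags{n: nzcv&8 != 0, z: nzcv&4 != 0, c: nzcv&2 != 0, v: nzcv&1 != 0}
}

// ccmp models CCMP a, b, #nzcv, cond: if cond holds on the current flags,
// compare a and b; otherwise load the flags from the immediate.
func ccmp(f flags, cond func(flags) bool, a, b uint32, nzcv uint8) flags {
	if cond(f) {
		return cmp(a, b)
	}
	return fromNZCV(nzcv)
}

func ne(f flags) bool { return !f.z } // NE: Z clear
func lo(f flags) bool { return !f.c } // LO: unsigned lower, C clear

// specialCCMP evaluates c == '\\' || c == '"' || c < ' ' as the assumed
// CMP/CCMP/CCMP sequence would: no branch until the final LO test.
func specialCCMP(c byte) bool {
	x := uint32(c)
	f := cmp(x, '\\')
	f = ccmp(f, ne, x, '"', 0b0100) // already matched: force Z=1 (EQ)
	f = ccmp(f, ne, x, ' ', 0b0000) // already matched: force C=0 (LO)
	return lo(f)
}

func main() {
	mismatches := 0
	for i := 0; i < 256; i++ {
		c := byte(i)
		want := c == '\\' || c == '"' || c < ' '
		if specialCCMP(c) != want {
			mismatches++
		}
	}
	fmt.Println("mismatches:", mismatches) // prints "mismatches: 0"
}
```

The immediate NZCV values are chosen so that once an earlier comparison has matched, every later `CCMP` skips its comparison and forces flags that satisfy the final condition, so only one conditional branch is needed.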