cmd/compile/internal/gc/alg.go func geneq contains this code:
    // Find maximal length run of memory-only fields.
    size, next := memrun(t, i)
    if s := fields[i:next]; len(s) <= 2 {
        // Two or fewer fields: use plain field equality.
        for _, f := range s {
            and(eqfield(np, nq, f.Sym))
        }
    } else {
        // More than two fields: use memequal.
        and(eqmem(np, nq, f.Sym, size))
    }
The idea here is to inline small, simple memory comparisons instead of calling runtime.memequal.
However, using the number of fields isn't a great heuristic. If one of those fields is (say) [64]uint64, inlining it turns into a runtime call anyway. If one of those fields is (say) struct { x, y, z, w uint64 }, we end up inlining four comparisons instead of the intended one. The heuristic fails in the other direction too: If there are eight fields, but each has type byte, we'd be better off doing a single uint64 comparison than calling runtime.memequal.
A better plan would be to calculate how much work we actually have to do and then do it optimally, either with the minimal number of inlined comparisons or with a single runtime call.
Roughly, if we need to compare n bytes, figure out how many comparisons are required to accomplish that, and then use that as a threshold. (We already do something like that in walk.go's walkCompareString when comparing a non-constant string to a constant string.) One complication is alignment. On architectures with alignment requirements, comparing 8 bytes could require (say) a 2-byte comparison + a 4-byte comparison + a 2-byte comparison. The other thing that requires care is pointers: Given struct { f float32; p *int; u uint32 }, we don't want to pull half of p into one register and the other half of p into another register.
Ideally, we would also use this work calculation in our decision about whether to inline equality comparisons in the first place or call a generated routine (walk.go func walkcompare); we currently count fields there, too.
I believe that this would be challenging but possible for someone with relatively little compiler experience, so marking as help wanted (and spelling things out in more detail than usual).