-
Notifications
You must be signed in to change notification settings - Fork 18.8k
Description
A common recurring operation on strings is to remove or advance past the common prefix of two strings. Here are some examples from the main Go repository and the golang.org/x/tools repo:
https://github.com/golang/tools/blob/master/go/internal/gcimporter/bexport.go#L264
https://github.com/golang/tools/blob/master/internal/lsp/source/comment.go#L229
https://github.com/golang/go/blob/master/src/cmd/compile/internal/base/timings.go#L122
https://github.com/golang/go/blob/master/src/cmd/compile/internal/ir/dump.go#L129
https://github.com/golang/go/blob/master/src/strings/replace.go#L170
https://github.com/golang/go/blob/master/src/net/addrselect.go#L199
https://github.com/golang/go/blob/master/src/go/printer/printer.go#L485
https://github.com/golang/go/blob/master/src/go/doc/comment/parse.go#L397
This function is easy enough to implement by hand:
commonLen := min(len(a), len(b))
for i = 0; i < commonLen && a[i] == b[i]; i++ {}but it's possible to make it significantly faster than the naive version. This CL contains a version that runs 6x faster yet is still safe and portable. The code describes ways that it might be made much faster still at the cost of unsafe or assembly:
https://go-review.googlesource.com/c/go/+/408116
While benchmarking and optimizing a diff algorithm this week, I noticed that one of the main profiling hotspots was, in essence, a CommonPrefixLen operation:
https://go-review.googlesource.com/c/tools/+/396855/11/internal/lsp/diff/lcs/old.go#219
(It's a little hard to see in this form, but in a private branch I refactored the code so that the dynamic dispatch of eq(x, y byte) was moved outside the loop. In essence the primitive operation over which this algorithm is abstracted is CommonPrefixLen.)
I propose that we add func CommonPrefixLen(x, y string) int to the standard strings package. An analogous function should be added to the bytes package too.
We could also add a CommonPrefix(x, y string) string function, which actually strips off the prefix rather than merely returning its length, but I don't think this is necessary as it is trivial to implement. Also, whereas CommonPrefixLen makes sense for bytes, it's less clear what bytes.CommonPrefix should do, since its result must alias one of the arguments. The first one presumably? Better to avoid the question.