Skip to content

proposal: strings: add func CommonPrefixLen(x, y string) int #53120

@adonovan

Description

@adonovan

A common recurring operation on strings is to remove or advance past the common prefix of two strings. Here are some examples from the main Go repository and the golang.org/x/tools repo:

https://github.com/golang/tools/blob/master/go/internal/gcimporter/bexport.go#L264
https://github.com/golang/tools/blob/master/internal/lsp/source/comment.go#L229
https://github.com/golang/go/blob/master/src/cmd/compile/internal/base/timings.go#L122
https://github.com/golang/go/blob/master/src/cmd/compile/internal/ir/dump.go#L129
https://github.com/golang/go/blob/master/src/strings/replace.go#L170
https://github.com/golang/go/blob/master/src/net/addrselect.go#L199
https://github.com/golang/go/blob/master/src/go/printer/printer.go#L485
https://github.com/golang/go/blob/master/src/go/doc/comment/parse.go#L397

This function is easy enough to implement by hand:

commonLen :=  min(len(a), len(b))
for i = 0; i < commonLen && a[i] == b[i]; i++ {}

but it's possible to make it significantly faster than the naive version. This CL contains a version that runs 6x faster yet is still safe and portable. The code describes ways that it might be made much faster still at the cost of unsafe or assembly:
https://go-review.googlesource.com/c/go/+/408116

While benchmarking and optimizing a diff algorithm this week, I noticed that one of the main profiling hotspots was, in essence, a CommonPrefixLen operation:
https://go-review.googlesource.com/c/tools/+/396855/11/internal/lsp/diff/lcs/old.go#219
(It's a little hard to see in this form, but in a private branch I refactored the code so that the dynamic dispatch of eq(x, y byte) was moved outside the loop. In essence the primitive operation over which this algorithm is abstracted is CommonPrefixLen.)

I propose that we add func CommonPrefixLen(x, y string) int to the standard strings package. An analogous function should be added to the bytes package too.

We could also add a CommonPrefix(x, y string) string function, which actually strips off the prefix rather than merely returning its length, but I don't think this is necessary as it is trivial to implement. Also, whereas CommonPrefixLen makes sense for bytes, it's less clear what bytes.CommonPrefix should do, since its result must alias one of the arguments. The first one presumably? Better to avoid the question.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions