x/text/collate: Key and KeyFromString silently ignore the collate.Force option #68379

Open
danderson opened this issue Jul 11, 2024 · 3 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@danderson
Contributor

Go version

go version go1.22.5 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/dave/.cache/go-build'
GOENV='/home/dave/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/dave/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/dave/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/nix/store/3v17dij8rvg7q99009swxg52995r7s22-go-1.22.5/share/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/nix/store/3v17dij8rvg7q99009swxg52995r7s22-go-1.22.5/share/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.5'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2184709228=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Attempted to sort strings using x/text/collate's collate.Force option, which implements deterministic comparison per Unicode TR10 (https://www.unicode.org/reports/tr10/#Forcing_Deterministic_Comparisons). Specifically, it forces strings to only compare as equal if they are bit-identical. Strings that are equivalent but not bit-equal are forced into a total order by using bytewise comparison of the raw byte sequence as a tie-breaker.

What did you see happen?

Key and KeyFromString ignore collate.Force: the sort keys they return compare as equal (0, a == b) for strings that are equivalent but not bit-identical. Demo: https://go.dev/play/p/s_fwVG8pBD5 compares "Québécois" in normalization forms C and D (equivalent but different character sequences), and gets:

Byte compare : 1
CompareString: 1
KeyFromString: 0

What did you expect to see?

All public APIs of Collator that are specified to compare things should obey the provided collation settings, or at least fail loudly rather than silently producing a different collation order.

@gopherbot gopherbot added this to the Unreleased milestone Jul 11, 2024
@danderson
Contributor Author

I looked through all the above bugs prior to filing this one. None are relevant to this issue.

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 11, 2024
@cagedmantis
Contributor

cc @mpvl
