Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unicode: SimpleFold panics if r is negative #16690

Closed
hirochachacha opened this issue Aug 14, 2016 · 16 comments
Closed

unicode: SimpleFold panics if r is negative #16690

hirochachacha opened this issue Aug 14, 2016 · 16 comments

Comments

@hirochachacha
Copy link
Contributor

@hirochachacha hirochachacha commented Aug 14, 2016

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?
    go version devel +2d33f62 Sun Aug 14 10:06:51 2016 +0900 darwin/amd64
  2. What operating system and processor architecture are you using (go env)?
    GOARCH="amd64"
    GOBIN=""
    GOEXE=""
    GOHOSTARCH="amd64"
    GOHOSTOS="darwin"
    GOOS="darwin"
    GOPATH="/Users/hiro/.go"
    GORACE=""
    GOROOT="/Users/hiro/work/go"
    GOTOOLDIR="/Users/hiro/work/go/pkg/tool/darwin_amd64"
    CC="clang"
    GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/wq/dwn8hs0x7njbzty9f68y61700000gn/T/go-build329740802=/tmp/go-build -gno-record-gcc-switches -fno-common"
    CXX="clang++"
    CGO_ENABLED="1"
  3. What did you do?
    If possible, provide a recipe for reproducing the error.
    A complete runnable program is good.
    A link on play.golang.org is best.
    https://play.golang.org/p/Tj8rqp7o13 (playground doesn't panic)
  4. What did you expect to see?
    no panics.
  5. What did you see instead?
    panics.

This error doesn't exist before 1.7 (e607abb)

@gopherbot
Copy link

@gopherbot gopherbot commented Aug 14, 2016

CL https://golang.org/cl/27033 mentions this issue.

@odeke-em
Copy link
Member

@odeke-em odeke-em commented Aug 14, 2016

I think for your repro you meant to include something like this https://play.golang.org/p/EqumNpoEJT or inlined

package main

import (
    "fmt"
    "unicode"
)

func main() {
    r := unicode.SimpleFold(-1)
    fmt.Printf("r: %v\n", r)
}

instead of unicode.SimpleFold('a')

I can confirm that it panics on tip 31ad583. This is a regression from Go1.6. /cc @bradfitz, @broady, @quentinmit on how this is going to affect the Go1.7 release.

@broady broady added this to the Go1.7 milestone Aug 14, 2016
@broady
Copy link
Member

@broady broady commented Aug 14, 2016

What's the use case for passing negative values to this function?

Is there real-world code that does this?

@odeke-em
Copy link
Member

@odeke-em odeke-em commented Aug 15, 2016

What's the use case for passing negative values to this function?

I don't know the answer for this, perhaps @hirochachacha could chime in on their usage that might have led them to catching this bug.
Maybe we could also ask @campoy for help querying the Github corpus in BigQuery as was done here #14595 (comment)?

However:
Searching in the stdlib for users of SimpleFold, there are 29 uses of it across 11 unique files

$ git grep -E 'unicode\.SimpleFold\(' * | cut -d":" -f1 | sort | uniq | wc -l
      11
$ git grep -nE 'unicode\.SimpleFold\(' *
src/bytes/bytes.go:712:     r := unicode.SimpleFold(sr)
src/bytes/bytes.go:714:         r = unicode.SimpleFold(r)
src/cmd/doc/pkg.go:640:     r1 := unicode.SimpleFold(r)
src/cmd/go/main.go:763:         r = unicode.SimpleFold(r0)
src/regexp/onepass.go:387:              for r1 := unicode.SimpleFold(r0); r1 != r0; r1 = unicode.SimpleFold(r1) {
src/regexp/onepass.go:411:              for r1 := unicode.SimpleFold(r0); r1 != r0; r1 = unicode.SimpleFold(r1) {
src/regexp/syntax/compile.go:271:   if len(r) != 1 || unicode.SimpleFold(r[0]) == r[0] {
src/regexp/syntax/parse.go:119:     unicode.SimpleFold(re.Rune[0]) == re.Rune[2] &&
src/regexp/syntax/parse.go:120:     unicode.SimpleFold(re.Rune[2]) == re.Rune[0] ||
src/regexp/syntax/parse.go:123:         unicode.SimpleFold(re.Rune[0]) == re.Rune[1] &&
src/regexp/syntax/parse.go:124:         unicode.SimpleFold(re.Rune[1]) == re.Rune[0] {
src/regexp/syntax/parse.go:199: for r = unicode.SimpleFold(r); r != r0; r = unicode.SimpleFold(r) {
src/regexp/syntax/parse.go:1724:        f := unicode.SimpleFold(c)
src/regexp/syntax/parse.go:1727:            f = unicode.SimpleFold(f)
src/regexp/syntax/parse_test.go:335:                    if unicode.SimpleFold(r) != r {
src/regexp/syntax/parse_test.go:409:    c := unicode.SimpleFold(r)
src/regexp/syntax/parse_test.go:414:        c = unicode.SimpleFold(c)
src/regexp/syntax/parse_test.go:422:        if unicode.SimpleFold(i) == i {
src/regexp/syntax/prog.go:213:          for r1 := unicode.SimpleFold(r0); r1 != r0; r1 = unicode.SimpleFold(r1) {
src/strings/strings.go:712:     r := unicode.SimpleFold(sr)
src/strings/strings.go:714:         r = unicode.SimpleFold(r)
src/unicode/example_test.go:117:    fmt.Printf("%#U\n", unicode.SimpleFold('A'))      // 'a'
src/unicode/example_test.go:118:    fmt.Printf("%#U\n", unicode.SimpleFold('a'))      // 'A'
src/unicode/example_test.go:119:    fmt.Printf("%#U\n", unicode.SimpleFold('K'))      // 'k'
src/unicode/example_test.go:120:    fmt.Printf("%#U\n", unicode.SimpleFold('k'))      // '\u212A' (Kelvin symbol, K)
src/unicode/example_test.go:121:    fmt.Printf("%#U\n", unicode.SimpleFold('\u212A')) // 'K'
src/unicode/example_test.go:122:    fmt.Printf("%#U\n", unicode.SimpleFold('1'))      // '1'
src/unicode/maketables.go:1320:         if g := unicode.SimpleFold(i); g != f {
src/unicode/maketables.go:1321:             fmt.Fprintf(os.Stderr, "unicode.SimpleFold(%#U) = %#U, want %#U\n", i, g, f)
@broady broady modified the milestones: Go1.7Maybe, Go1.7 Aug 15, 2016
@broady
Copy link
Member

@broady broady commented Aug 15, 2016

SELECT repo_name, content
FROM (
  SELECT id, content
  FROM [campoy-github:go_files.contents]
  WHERE REGEXP_MATCH(content, r'unicode.SimpleFold\([^)]*-')
) as contents JOIN [campoy-github:go_files.files] as files
ON contents.id = files.id

Zero results. It's much more difficult (via BigQuery) to determine if a call to SimpleFold refers to a variable pointing to a negative value.

@odeke-em
Copy link
Member

@odeke-em odeke-em commented Aug 15, 2016

Thank you for the search and breakdown @broady.

I've commented on @hirochachacha's CL https://go-review.googlesource.com/#/c/27033. The bug was that a negative index was being passed in to the precomputed Folds table(thereby causing the index out of bounds panic) look up due to new condition

if int(r) < len(asciiFold) {
   return rune(asciiFold[r])
}

that should be us first checking if negative

if r >= 0 && int(r) < len(asciiFold) {
   return rune(asciiFold[r])
}

and this way only the precomputed values will benefit from this cache hit, other values(negative values included) will take the same code path as they were before that change/preGo1.7.

If we deem that the CL needs to go out ASAP since a new release is coming up, perhaps anyone of us can spin up/take over it and give the credit to @hirochachacha for the find/patch?

@minux
Copy link
Member

@minux minux commented Aug 15, 2016

I think panic is ok. There shouldn't be any guarantee when SimpleFold is
passed an invalid rune. Actually, according to its docs, it cannot return
anything for negative input.

  1. There is no equivalent code points, so the first clause doesn't apply.
  2. The smallest rune >= 0 clause also doesn't apply because r itself < 0.

Therefore, the only logical result is to panic.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Aug 15, 2016

I think it's OK to either panic or to return the argument for 1.6 compatibility. I don't think this needs to be changed for 1.7.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.8, Go1.7Maybe Aug 15, 2016
@hirochachacha
Copy link
Contributor Author

@hirochachacha hirochachacha commented Aug 15, 2016

Actually, I don't have real usages, sorry. Perhaps as a sentinel value?
I've found this because I saw

https://github.com/golang/go/blob/31ad583ab24263f9dbcb5cbcce849eed64e74040/src/unicode/graphic.go#L37-#L38

So I assumed unicode package API accepts negative rune.

@josharian
Copy link
Contributor

@josharian josharian commented Aug 15, 2016

Aside: I suspect that that code can be made much more obvious in 1.8. I have an as yet unmailed compiler CL to do the unsigned comparison trick more generally, and the SSA backend should handle the bounds checks elimination.

@hirochachacha
Copy link
Contributor Author

@hirochachacha hirochachacha commented Aug 15, 2016

@josharian Maybe you mean #16697?

@hirochachacha
Copy link
Contributor Author

@hirochachacha hirochachacha commented Aug 15, 2016

@minux Even if panic is OK, I think it should be done by intention, not by accident.

For example:

if r < 0 {
  panic("negative rune is disallowed")
} 

So, I can understand that panic is intentional.
And how do you think about other functions like unicode.IsGraphic?

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Aug 15, 2016

@hirochachacha, yes, an explicit panic would be better if we decide to make these panic.

It's just not worth delaying Go 1.7.

@hirochachacha
Copy link
Contributor Author

@hirochachacha hirochachacha commented Aug 15, 2016

It's just not worth delaying Go 1.7.

I have no objection.

@quentinmit quentinmit added the NeedsFix label Oct 10, 2016
@gopherbot
Copy link

@gopherbot gopherbot commented Oct 12, 2016

CL https://golang.org/cl/30892 mentions this issue.

@gopherbot gopherbot closed this in 94f49fd Oct 12, 2016
@gopherbot
Copy link

@gopherbot gopherbot commented Oct 12, 2016

CL https://golang.org/cl/30935 mentions this issue.

gopherbot pushed a commit that referenced this issue Oct 12, 2016
Functions like ToLower and ToUpper return the invalid rune back,
so we might as well do the same here.

I changed my mind about panicking when I tried to document the behavior.

Fixes #16690 (again).

Change-Id: If1c68bfcd66daea160fd19948e7672b0e1add106
Reviewed-on: https://go-review.googlesource.com/30935
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
@golang golang locked and limited conversation to collaborators Oct 12, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
9 participants
You can’t perform that action at this time.