Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unicode: SimpleFold panics if r is negative #16690

Closed
hirochachacha opened this issue Aug 14, 2016 · 16 comments
Closed

unicode: SimpleFold panics if r is negative #16690

hirochachacha opened this issue Aug 14, 2016 · 16 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@hirochachacha
Copy link
Contributor

hirochachacha commented Aug 14, 2016

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?
    go version devel +2d33f62 Sun Aug 14 10:06:51 2016 +0900 darwin/amd64
  2. What operating system and processor architecture are you using (go env)?
    GOARCH="amd64"
    GOBIN=""
    GOEXE=""
    GOHOSTARCH="amd64"
    GOHOSTOS="darwin"
    GOOS="darwin"
    GOPATH="/Users/hiro/.go"
    GORACE=""
    GOROOT="/Users/hiro/work/go"
    GOTOOLDIR="/Users/hiro/work/go/pkg/tool/darwin_amd64"
    CC="clang"
    GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/wq/dwn8hs0x7njbzty9f68y61700000gn/T/go-build329740802=/tmp/go-build -gno-record-gcc-switches -fno-common"
    CXX="clang++"
    CGO_ENABLED="1"
  3. What did you do?
    If possible, provide a recipe for reproducing the error.
    A complete runnable program is good.
    A link on play.golang.org is best.
    https://play.golang.org/p/Tj8rqp7o13 (playground doesn't panic)
  4. What did you expect to see?
    no panics.
  5. What did you see instead?
    panics.

This error doesn't exist before 1.7 (e607abb)

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/27033 mentions this issue.

@odeke-em
Copy link
Member

I think for your repro you meant to include something like this https://play.golang.org/p/EqumNpoEJT or inlined

package main

import (
    "fmt"
    "unicode"
)

func main() {
    r := unicode.SimpleFold(-1)
    fmt.Printf("r: %v\n", r)
}

instead of unicode.SimpleFold('a')

I can confirm that it panics on tip 31ad583. This is a regression from Go1.6. /cc @bradfitz, @broady, @quentinmit on how this is going to affect the Go1.7 release.

@broady broady added this to the Go1.7 milestone Aug 14, 2016
@broady
Copy link
Contributor

broady commented Aug 14, 2016

What's the use case for passing negative values to this function?

Is there real-world code that does this?

@odeke-em
Copy link
Member

What's the use case for passing negative values to this function?

I don't know the answer for this, perhaps @hirochachacha could chime in on their usage that might have led them to catching this bug.
Maybe we could also ask @campoy for help querying the Github corpus in BigQuery as was done here #14595 (comment)?

However:
Searching in the stdlib for users of SimpleFold, there are 29 uses of it across 11 unique files

$ git grep -E 'unicode\.SimpleFold\(' * | cut -d":" -f1 | sort | uniq | wc -l
      11
$ git grep -nE 'unicode\.SimpleFold\(' *
src/bytes/bytes.go:712:     r := unicode.SimpleFold(sr)
src/bytes/bytes.go:714:         r = unicode.SimpleFold(r)
src/cmd/doc/pkg.go:640:     r1 := unicode.SimpleFold(r)
src/cmd/go/main.go:763:         r = unicode.SimpleFold(r0)
src/regexp/onepass.go:387:              for r1 := unicode.SimpleFold(r0); r1 != r0; r1 = unicode.SimpleFold(r1) {
src/regexp/onepass.go:411:              for r1 := unicode.SimpleFold(r0); r1 != r0; r1 = unicode.SimpleFold(r1) {
src/regexp/syntax/compile.go:271:   if len(r) != 1 || unicode.SimpleFold(r[0]) == r[0] {
src/regexp/syntax/parse.go:119:     unicode.SimpleFold(re.Rune[0]) == re.Rune[2] &&
src/regexp/syntax/parse.go:120:     unicode.SimpleFold(re.Rune[2]) == re.Rune[0] ||
src/regexp/syntax/parse.go:123:         unicode.SimpleFold(re.Rune[0]) == re.Rune[1] &&
src/regexp/syntax/parse.go:124:         unicode.SimpleFold(re.Rune[1]) == re.Rune[0] {
src/regexp/syntax/parse.go:199: for r = unicode.SimpleFold(r); r != r0; r = unicode.SimpleFold(r) {
src/regexp/syntax/parse.go:1724:        f := unicode.SimpleFold(c)
src/regexp/syntax/parse.go:1727:            f = unicode.SimpleFold(f)
src/regexp/syntax/parse_test.go:335:                    if unicode.SimpleFold(r) != r {
src/regexp/syntax/parse_test.go:409:    c := unicode.SimpleFold(r)
src/regexp/syntax/parse_test.go:414:        c = unicode.SimpleFold(c)
src/regexp/syntax/parse_test.go:422:        if unicode.SimpleFold(i) == i {
src/regexp/syntax/prog.go:213:          for r1 := unicode.SimpleFold(r0); r1 != r0; r1 = unicode.SimpleFold(r1) {
src/strings/strings.go:712:     r := unicode.SimpleFold(sr)
src/strings/strings.go:714:         r = unicode.SimpleFold(r)
src/unicode/example_test.go:117:    fmt.Printf("%#U\n", unicode.SimpleFold('A'))      // 'a'
src/unicode/example_test.go:118:    fmt.Printf("%#U\n", unicode.SimpleFold('a'))      // 'A'
src/unicode/example_test.go:119:    fmt.Printf("%#U\n", unicode.SimpleFold('K'))      // 'k'
src/unicode/example_test.go:120:    fmt.Printf("%#U\n", unicode.SimpleFold('k'))      // '\u212A' (Kelvin symbol, K)
src/unicode/example_test.go:121:    fmt.Printf("%#U\n", unicode.SimpleFold('\u212A')) // 'K'
src/unicode/example_test.go:122:    fmt.Printf("%#U\n", unicode.SimpleFold('1'))      // '1'
src/unicode/maketables.go:1320:         if g := unicode.SimpleFold(i); g != f {
src/unicode/maketables.go:1321:             fmt.Fprintf(os.Stderr, "unicode.SimpleFold(%#U) = %#U, want %#U\n", i, g, f)

@broady broady modified the milestones: Go1.7Maybe, Go1.7 Aug 15, 2016
@broady
Copy link
Contributor

broady commented Aug 15, 2016

SELECT repo_name, content
FROM (
  SELECT id, content
  FROM [campoy-github:go_files.contents]
  WHERE REGEXP_MATCH(content, r'unicode.SimpleFold\([^)]*-')
) as contents JOIN [campoy-github:go_files.files] as files
ON contents.id = files.id

Zero results. It's much more difficult (via BigQuery) to determine if a call to SimpleFold refers to a variable pointing to a negative value.

@odeke-em
Copy link
Member

Thank you for the search and breakdown @broady.

I've commented on @hirochachacha's CL https://go-review.googlesource.com/#/c/27033. The bug was that a negative index was being passed in to the precomputed Folds table(thereby causing the index out of bounds panic) look up due to new condition

if int(r) < len(asciiFold) {
   return rune(asciiFold[r])
}

that should be us first checking if negative

if r >= 0 && int(r) < len(asciiFold) {
   return rune(asciiFold[r])
}

and this way only the precomputed values will benefit from this cache hit, other values(negative values included) will take the same code path as they were before that change/preGo1.7.

If we deem that the CL needs to go out ASAP since a new release is coming up, perhaps anyone of us can spin up/take over it and give the credit to @hirochachacha for the find/patch?

@minux
Copy link
Member

minux commented Aug 15, 2016

I think panic is ok. There shouldn't be any guarantee when SimpleFold is
passed an invalid rune. Actually, according to its docs, it cannot return
anything for negative input.

  1. There is no equivalent code points, so the first clause doesn't apply.
  2. The smallest rune >= 0 clause also doesn't apply because r itself < 0.

Therefore, the only logical result is to panic.

@ianlancetaylor
Copy link
Contributor

I think it's OK to either panic or to return the argument for 1.6 compatibility. I don't think this needs to be changed for 1.7.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.8, Go1.7Maybe Aug 15, 2016
@hirochachacha
Copy link
Contributor Author

Actually, I don't have real usages, sorry. Perhaps as a sentinel value?
I've found this because I saw

https://github.com/golang/go/blob/31ad583ab24263f9dbcb5cbcce849eed64e74040/src/unicode/graphic.go#L37-#L38

So I assumed unicode package API accepts negative rune.

@josharian
Copy link
Contributor

Aside: I suspect that that code can be made much more obvious in 1.8. I have an as yet unmailed compiler CL to do the unsigned comparison trick more generally, and the SSA backend should handle the bounds checks elimination.

@hirochachacha
Copy link
Contributor Author

@josharian Maybe you mean #16697?

@hirochachacha
Copy link
Contributor Author

@minux Even if panic is OK, I think it should be done by intention, not by accident.

For example:

if r < 0 {
  panic("negative rune is disallowed")
} 

So, I can understand that panic is intentional.
And how do you think about other functions like unicode.IsGraphic?

@bradfitz
Copy link
Contributor

@hirochachacha, yes, an explicit panic would be better if we decide to make these panic.

It's just not worth delaying Go 1.7.

@hirochachacha
Copy link
Contributor Author

It's just not worth delaying Go 1.7.

I have no objection.

@quentinmit quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 10, 2016
@gopherbot
Copy link
Contributor

CL https://golang.org/cl/30892 mentions this issue.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/30935 mentions this issue.

gopherbot pushed a commit that referenced this issue Oct 12, 2016
Functions like ToLower and ToUpper return the invalid rune back,
so we might as well do the same here.

I changed my mind about panicking when I tried to document the behavior.

Fixes #16690 (again).

Change-Id: If1c68bfcd66daea160fd19948e7672b0e1add106
Reviewed-on: https://go-review.googlesource.com/30935
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
@golang golang locked and limited conversation to collaborators Oct 12, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

9 participants