Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/text/language: index out of range #41617

Open
hongrich opened this issue Sep 24, 2020 · 10 comments
Open

x/text/language: index out of range #41617

hongrich opened this issue Sep 24, 2020 · 10 comments

Comments

@hongrich
Copy link

@hongrich hongrich commented Sep 24, 2020

This following program on the latest stable release, go1.14.8 and golang.org/x/text@v0.3.3:

https://play.golang.org/p/0MCiLN3ojeZ

package main

import (
	"golang.org/x/text/language"
)

func main() {
	language.ParseExtension("t_pt_MLt")
}

crashes as follows:

panic: runtime error: index out of range [4] with length 4

goroutine 1 [running]:
golang.org/x/text/internal/language.(*scanner).toLower(0xc00009eeb8, 0x0, 0x5)
	/tmp/gopath963636377/pkg/mod/golang.org/x/text@v0.3.3/internal/language/parse.go:116 +0x65
golang.org/x/text/internal/language.parseExtension(0xc00009eeb8, 0x0)
	/tmp/gopath963636377/pkg/mod/golang.org/x/text@v0.3.3/internal/language/parse.go:553 +0xea5
golang.org/x/text/internal/language.ParseExtension(0x4ce884, 0x8, 0x0, 0x0, 0x4b3560, 0xc000094058)
	/tmp/gopath963636377/pkg/mod/golang.org/x/text@v0.3.3/internal/language/language.go:260 +0xbe
golang.org/x/text/language.ParseExtension(...)
	/tmp/gopath963636377/pkg/mod/golang.org/x/text@v0.3.3/language/language.go:370
main.main()
	/tmp/sandbox787883426/prog.go:8 +0x36

This is found with go-fuzz

@gopherbot gopherbot added this to the Unreleased milestone Sep 24, 2020
@bcmills
Copy link
Member

@bcmills bcmills commented Sep 25, 2020

Is this also reproducible with the current head commit (v0.3.4-0.20200826142016-a8b467125457)?

@hongrich
Copy link
Author

@hongrich hongrich commented Sep 25, 2020

Yeah. Here's a playground using the current head commit:
https://play.golang.org/p/R7-w1-xWBa1

panic: runtime error: index out of range [4] with length 4

goroutine 1 [running]:
golang.org/x/text/internal/language.(*scanner).toLower(0xc000066eb8, 0x0, 0x5)
	/tmp/gopath897727467/pkg/mod/golang.org/x/text@v0.3.4-0.20200826142016-a8b467125457/internal/language/parse.go:116 +0x65
golang.org/x/text/internal/language.parseExtension(0xc000066eb8, 0x0)
	/tmp/gopath897727467/pkg/mod/golang.org/x/text@v0.3.4-0.20200826142016-a8b467125457/internal/language/parse.go:553 +0xea5
golang.org/x/text/internal/language.ParseExtension(0x4ce884, 0x8, 0x0, 0x0, 0x4b3560, 0xc00005e058)
	/tmp/gopath897727467/pkg/mod/golang.org/x/text@v0.3.4-0.20200826142016-a8b467125457/internal/language/language.go:260 +0xbe
golang.org/x/text/language.ParseExtension(...)
	/tmp/gopath897727467/pkg/mod/golang.org/x/text@v0.3.4-0.20200826142016-a8b467125457/language/language.go:370
main.main()
	/tmp/sandbox940662716/prog.go:8 +0x36
@darkLord19
Copy link
Contributor

@darkLord19 darkLord19 commented Sep 25, 2020

"t_pt_MLt" ends up being t-mt in scan.b for some reason and so s.b[4] is throwing index out of bound. I tried testing for other strings with similar pattern as given string but they were working fine.

@justplesh
Copy link
Contributor

@justplesh justplesh commented Sep 28, 2020

I'll work on this.

@andresag01
Copy link

@andresag01 andresag01 commented Oct 7, 2020

I had some spare time so decided to take pick up a Go issue for the first time. But I am not terribly familiar with the Go source

I agree with @darkLord19. Internally, the call to ParseExtension("t_pt_MLt") ends up in call to parseTag(scan) with the following arguments: scan.b:"t-pt-mlt" scan.token:"pt" scan.start:2 scan.end:4. This means that the call getLangID("pt") from parseTag() returns (t.LangID:"mt", e:dont_care).

Later on, parseTag(scan) calls scan.scan() , so scan ends up with the following internal state: scan.b:"t-pt-mlt" scan.token:"mlt" scan.start:5 scan.end:8. Then this block of code is executed and scan.b:"t-mt-" and the other call to scan.scan() modifies that buffer to scan.b:"t-mb" and returns end = 5 which happens to be out-of-bounds. Therefore, this call ends up as scan.toLower(start:0, end:5) which is clearly an out-of-bounds range.

Again, I am not very familiar with this code (so please let me know if I am wrong), but I think there are a few issues here:

  • The return value of the scan() function here is a bit confusing. It states that the function returns the end position of the last token consumed, but what is end when there is no more data in the buffer? what is end when there is an empty tag (e.g. the last character is -)?
    • I suggest to document these corner cases and possibly tidy up the return value
  • It is not clear to me why the scan.toLower() call here that triggers the failure is needed at all. Note that the call to parseTag() already calls scan.toLower() here before returning, so the second call (the one that fails) seems redundant to me (I might be wrong though).
  • There probably are a few missing test cases in this list. In particular, it would be good to add things that trigger corner cases for scan.scan() when the input buffer is modified.

Anyways, please let me know if this is helpful and if you would like me to submit a patch to make the changes. I am aware that someone else already said that they are working on it...

@justplesh
Copy link
Contributor

@justplesh justplesh commented Oct 7, 2020

@andresag01 I'm currently a little bit busy, so you are absolutely welcome to contribute a patch for this issue.

@hongrich
Copy link
Author

@hongrich hongrich commented Oct 7, 2020

I have a patch that I can submit for this if no one else has submitted one yet. As @andresag01 mentioned, the relevant part is inside parseTag where it will try to normalize <lang>-<extlang> into <extlang>. In this case, it tries to normalize pt-mlt into mt becuase getLangID("mlt").String() return mt. Then it tries to adjust the positions after doing this replacement, but it currently assumes that getLangID(token).String() returns a 3-character string which is not always the case. I believe we can just change the hardcoded numbers in this part to be based on the length of getLangID(token).String().

https://github.com/golang/text/blob/a8b4671254579a87fadf9f7fa577dc7368e9d009/internal/language/parse.go#L300-L308

@andresag01
Copy link

@andresag01 andresag01 commented Oct 7, 2020

@hongrich: I do not have a patch and I won't work on it if you plan to submit yours. If it helps, I am happy to review your change once you've raised the PR

@gopherbot
Copy link

@gopherbot gopherbot commented Oct 7, 2020

Change https://golang.org/cl/260177 mentions this issue: internal/language: fix canonicalization of extlang

@mpvl
Copy link
Member

@mpvl mpvl commented Nov 25, 2020

The problem is that it should not do the extlang normalization within a -t extension.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants