Check for invalid UTF-8 encoded strings in cutsets #252

dsnet · 2016-10-26T08:28:30Z

The (strings|bytes).(IndexAny|LastIndexAny|ContainsAny|Trim|TrimLeft|TrimRight) functions take an argument for a set of UTF-8 encoded characters to match on. In production code, I have seen incorrect uses of these functions where the user treats chars as simply a list of bytes, rather than as UTF-8 encoded characters.

Most likely, any invalid chars input is due to user error. The following is probably not what a user expects:

fmt.Printf("%q\n", strings.Trim("\x80test\xff", "\xff"))

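This currently prints "test", while the user probably expects "\x80test". Invalid bytes in both the input and the cutset decode to utf8.RuneError (U+FFFD), so a cutset of "\xff" matches any invalid byte, including the leading \x80.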
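For callers who really do mean "a list of bytes", a minimal sketch of a byte-oriented trim is shown below. `trimBytes` is a hypothetical helper written for this illustration, not a standard library function; it compares raw bytes and never decodes UTF-8. The `utf8.ValidString` call shows one way to detect an invalid cutset before passing it to `strings.Trim`.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// trimBytes removes every leading and trailing byte of s that appears in
// cutset, treating cutset as raw bytes rather than UTF-8 encoded runes.
// Hypothetical helper for illustration; not part of the standard library.
func trimBytes(s, cutset string) string {
	start, end := 0, len(s)
	for start < end && strings.IndexByte(cutset, s[start]) >= 0 {
		start++
	}
	for end > start && strings.IndexByte(cutset, s[end-1]) >= 0 {
		end--
	}
	return s[start:end]
}

func main() {
	cutset := "\xff"

	// An invalid cutset is a hint that the caller is thinking in bytes.
	fmt.Println(utf8.ValidString(cutset)) // false

	// Only the literal 0xff byte is removed; the leading 0x80 survives.
	fmt.Printf("%q\n", trimBytes("\x80test\xff", cutset)) // "\x80test"
}
```

When only a single known prefix or suffix has to go, `strings.TrimPrefix` and `strings.TrimSuffix` also compare bytes and may be the simpler choice.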

dominikh · 2016-11-11T11:01:02Z

@dsnet Is golint the right place for this check? AFAIK, golint is about style, not correctness. I'm currently trying to decide between sending a PR to golint and adding the check to my own staticcheck.

dsnet · 2016-11-12T22:35:36Z

I'm actually not sure. You are right that this is an issue about correctness, so it doesn't quite fit as a lint check either. According to golang/go#17780, the criteria for a vet check are correctness, frequency, and precision.

When I looked over a large amount of Go code inside Google, this mishap only occurred a few dozen times, so it probably doesn't meet the frequency requirement. If you want to add it to staticcheck, that sounds great.

dominikh · 2016-11-13T00:25:08Z

I've added the check to staticcheck. Running it against my corpus also yielded a very low number of results, so most likely not suitable for vet.
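As a rough illustration of what a check like this could look like, here is a sketch built on `go/ast`. It is not staticcheck's actual implementation: `findInvalidCutsets` and `cutsetFuncs` are hypothetical names, the match is purely syntactic (it assumes the packages are imported under their default names `strings` and `bytes`), and only string literal arguments are inspected.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
	"unicode/utf8"
)

// cutsetFuncs lists the functions named in this issue whose second
// argument is interpreted as a set of UTF-8 encoded characters.
var cutsetFuncs = map[string]bool{
	"IndexAny": true, "LastIndexAny": true, "ContainsAny": true,
	"Trim": true, "TrimLeft": true, "TrimRight": true,
}

// findInvalidCutsets parses src and returns the position of every call to
// strings.X or bytes.X (X in cutsetFuncs) whose second argument is a string
// literal that is not valid UTF-8. Hypothetical sketch: it matches the
// package identifier by name only, so renamed imports are missed.
func findInvalidCutsets(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var found []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok || len(call.Args) != 2 {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok || !cutsetFuncs[sel.Sel.Name] {
			return true
		}
		pkg, ok := sel.X.(*ast.Ident)
		if !ok || (pkg.Name != "strings" && pkg.Name != "bytes") {
			return true
		}
		lit, ok := call.Args[1].(*ast.BasicLit)
		if !ok || lit.Kind != token.STRING {
			return true
		}
		// Unquote resolves escapes such as \xff into the actual bytes.
		val, err := strconv.Unquote(lit.Value)
		if err == nil && !utf8.ValidString(val) {
			found = append(found, fset.Position(lit.Pos()).String())
		}
		return true
	})
	return found, nil
}

func main() {
	// The raw string keeps \xff as source text for the parser to read.
	const src = `package p

import "strings"

var a = strings.Trim("\x80test\xff", "\xff")
var b = strings.Trim(" test ", " ")
`
	found, err := findInvalidCutsets(src)
	if err != nil {
		panic(err)
	}
	for _, pos := range found {
		fmt.Println(pos, "cutset is not valid UTF-8")
	}
}
```

A production check would presumably resolve the callee through type information rather than by identifier name, and might also follow constants; this sketch skips both to stay short.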

dsnet · 2016-11-14T18:43:44Z

I'm going to close this issue since the feature request fits neither vet nor lint. Thanks @dominikh for adding it to staticcheck.

dsnet mentioned this issue Oct 26, 2016

strings: regression in Trim functionality golang/go#17611

Closed

dominikh added a commit to dominikh/go-staticcheck that referenced this issue Nov 13, 2016

Detect invalid cutset/chars for various strings functions

860f5c2

Inspired by golang/lint#252 Idea-By: Joe Tsai <joetsai@digital-static.net>

dsnet closed this as completed Nov 14, 2016

dominikh added a commit to dominikh/go-tools that referenced this issue Jan 24, 2017

Detect invalid cutset/chars for various strings functions

1f9f791

Inspired by golang/lint#252 Idea-By: Joe Tsai <joetsai@digital-static.net>

suntong mentioned this issue Jul 25, 2017

How to deal with "Invalid rune in input data" spakin/awk#12

Closed
