cmd/vet: report strings.Trim/TrimLeft/TrimRight with duplicate runes in cutset #46533

Open
CAFxX opened this issue Jun 3, 2021 · 22 comments

Comments

@CAFxX
Contributor

@CAFxX CAFxX commented Jun 3, 2021

go vet should ideally report likely misuses of standard library functions especially when it comes to providing constant(-ish) string arguments, e.g. (note: the following list is absolutely not exhaustive)

  • strings.(Trim|TrimLeft|TrimRight) with a cutset containing duplicated runes
  • net/http.NewRequest with an invalid HTTP method or a non-HTTP url
  • net/http.(Do|Get|Post) with a non-HTTP url
  • regexp.(Must)?Compile with an invalid regexp
  • (text|html)/template.Template.Parse with an invalid template text
  • (and so on, for all functions/structs where it makes sense; not all of them need to be added right away, coverage will likely improve over time)
@jfesler

@jfesler jfesler commented Jun 3, 2021

I especially like the regexp.MustCompile example, considering that it panics on an invalid pattern. I see people using this function in code paths that may or may not be exercised in testing. Code reviews help guard against this, but the reviewers are unfortunately human.
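
A minimal sketch of the difference, assuming a deliberately invalid pattern ("a(b", an unclosed group) and a hypothetical helper named compileSafely that is not part of the regexp package: regexp.Compile reports the problem as an error, while regexp.MustCompile panics at run time, which only shows up if the code path is actually executed.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileSafely is a hypothetical helper for this illustration. It calls
// regexp.MustCompile and converts the panic raised for an invalid pattern
// into an ordinary error.
func compileSafely(pattern string) (re *regexp.Regexp, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("MustCompile panicked: %v", r)
		}
	}()
	return regexp.MustCompile(pattern), nil
}

func main() {
	// regexp.Compile returns the parse failure as an error value.
	if _, err := regexp.Compile("a(b"); err != nil {
		fmt.Println("Compile returned an error:", err)
	}

	// regexp.MustCompile panics instead; the helper recovers from it.
	if _, err := compileSafely("a(b"); err != nil {
		fmt.Println(err)
	}
}
```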

@seankhliao seankhliao changed the title from "vet: report suspicious constant string arguments" to "proposal: cmd/vet: report suspicious constant string arguments" Jun 3, 2021
@gopherbot gopherbot added this to the Proposal milestone Jun 3, 2021
@mvdan
Member

@mvdan mvdan commented Jun 3, 2021

Would this qualify for the "frequency" factor of vet checks? How often do these issues occur?

It's also worth noting that @dominikh's staticcheck has had some of these for a while, like https://staticcheck.io/docs/checks#SA1000.

@dominikh
Member

@dominikh dominikh commented Jun 3, 2021

net/http.NewRequest with an invalid HTTP method

I believe there's nothing stopping a server from offering non-standard methods, so this could cause false positives.

or a non-HTTP url

With custom transports that could also lead to false positives.

@guodongli-google

@guodongli-google guodongli-google commented Jun 6, 2021

One question is whether we should use one checker to check all these cases or one checker for each API.
StaticCheck has some related or similar checks, e.g.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Jun 8, 2021
@rsc
Contributor

@rsc rsc commented Jun 9, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals Jun 9, 2021
@rsc
Contributor

@rsc rsc commented Jun 16, 2021

Vet mostly stays away from the standard library. It does check Printf (in fact that's what vet was created for).
But Printf is far more core to the use of Go than, say, templates, or even regular expressions.
It also seems weird to put more of these library-specific checks in that are not really available to third-party libraries.
(Again, Printf is an exception, but we may not want to add more.)

@CAFxX
Contributor Author

@CAFxX CAFxX commented Jun 20, 2021

If this is the direction of the vet tool, then sure.

In this case though, just a few notes:

  • The vet tool description should really be clarified. As it stands, it makes vet sound like the correct place for this kind of check, because it describes the tool through an example, and that example is precisely validating arguments to standard library functions: "Vet examines Go source code and reports suspicious constructs, such as Printf calls whose arguments do not align with the format string." and "Vet is a tool that checks correctness of Go programs. It runs a suite of tests, each tailored to check for a particular class of errors. Examples include incorrect Printf format verbs and malformed build tags."
  • The vet README contains helpful criteria for inclusion, but they do not include the ones you mentioned (vet is not really supposed to cover the standard library, checks should be available to third-party libraries). Maybe the criteria would benefit from an update?
  • Specifically about the checks not being available to third-party libraries... isn't this an implementation detail? (in that it would seem to be enough to have the analysis live in https://pkg.go.dev/golang.org/x/tools/go/analysis, where again it sounds like it would belong). (Or am I misunderstanding what you meant there?)
  • As a personal opinion though, let me share that I do not really follow the rationale for not considering vet the correct place for checks involving (mis)uses of the standard library. It is definitely not intuitive why that would be the case, given that the standard library cannot really be considered "not part of Go" (from the point of view of the spec, maybe, but I would argue that the fraction of Go programs and developers that rely exclusively on the spec and not on the standard library is rather small).
    Indeed, this criterion is somewhat undermined by the impression that a significant number of the existing vet checks (almost half?) are actually about various types of standard library misuse (atomic, copylocks, httpresponse, lostcancel, printf, stdmethods, structtag, tests, unmarshal, unsafeptr). While it's undoubtedly true that the kind of each one of these tests may differ, AFAICT they share in their nature of being about some semantic aspect of the standard library.

@timothy-king
Contributor

@timothy-king timothy-king commented Jun 25, 2021

Indeed, this criterion is somewhat undermined by the impression that a significant number of the existing vet checks (almost half?) are actually about various types of standard library misuse (atomic, copylocks, httpresponse, lostcancel, printf, stdmethods, structtag, tests, unmarshal, unsafeptr).

You can also include atomicalign, deepequalerrors, errorsas, and sortslice.

Vet mostly stays away from the standard library.

I am not following this claim either given the precedent for including checks against the standard library within vet.

But Printf is far more core to the use of Go than, say, templates, or even regular expressions.

Does "core"-ness come down to just the "frequency" requirement of vet?

Anyhow, my preference would be to have 4 discussions: duplicates in string cutsets, regexp.MustCompile, non-HTTP URLs, and invalid template text. I think combining these 4 cases is distracting. Each case will have a different frequency concern. @dominikh brought up false positives for one of the cases that do not apply to the others. I am also not sure if constant-ish is interesting for all of these cases (or what constant-ish means). For example, it is not clear whether we should check regexp.MustCompile for invalid constant prefixes with variable suffixes. (That seems complicated for not much gain compared to an inferred constant value.) There might be enough underlying similarities that grouping them would help justify creating a new checker? Say in aggregate they are frequent enough, but not individually?

@rsc
Contributor

@rsc rsc commented Jul 14, 2021

It's hard to tell exactly where the line is, but http.Do/Get/Post, regexp.Compile, text/template.Parse all return errors that users are expected to check. It doesn't seem like vet needs to repeat these, and it really doesn't seem like vet should link in the template parser.

The http.NewRequest example is an incorrect check, since non-HTTP methods are permitted (and it returns an error anyway).
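
A small sketch of this behavior, assuming a hypothetical helper named methodAccepted and the placeholder URL http://example.com/: http.NewRequest accepts a non-standard method such as PURGE as long as it is a syntactically valid token, and reports a malformed method through its error result.

```go
package main

import (
	"fmt"
	"net/http"
)

// methodAccepted is a hypothetical helper for this illustration. It reports
// whether http.NewRequest accepts the given method string.
func methodAccepted(method string) bool {
	_, err := http.NewRequest(method, "http://example.com/", nil)
	return err == nil
}

func main() {
	// PURGE is not a standard HTTP method, but it is a valid token.
	fmt.Println("PURGE accepted:", methodAccepted("PURGE"))

	// A method containing a space is not a valid token, so NewRequest
	// returns an error that the caller is expected to check.
	fmt.Println("BAD METHOD accepted:", methodAccepted("BAD METHOD"))
}
```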

The strings.Trim examples may be worth adding. They are clear errors, and it comes up more often than we'd like because people confuse Trim/TrimLeft for TrimPrefix and TrimRight for TrimSuffix. The check wouldn't catch all such cases, but it could catch some. If someone has any data on how often that happens, we could narrow the issue to that check.

@rsc rsc changed the title from "proposal: cmd/vet: report suspicious constant string arguments" to "proposal: cmd/vet: report strings.Trim/TrimLeft/TrimRight with duplicate runes in cutset" Jul 28, 2021
@rsc
Contributor

@rsc rsc commented Jul 28, 2021

Based on the discussion, retitled to be only about strings.Trim/TrimLeft/TrimRight, detecting something like strings.TrimLeft(s, "http://") that really wants strings.TrimPrefix.
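
To make the confusion concrete, here is a minimal sketch. The helper names and the sample inputs ("http://httpbin.org", "report.txt") are assumptions chosen for illustration. TrimLeft and TrimRight treat their second argument as a set of runes to strip, whereas TrimPrefix and TrimSuffix remove one exact substring.

```go
package main

import (
	"fmt"
	"strings"
)

// stripSchemeCutset shows the mistaken call: "http://" is interpreted as the
// rune set {h, t, p, :, /}, and every leading rune in that set is removed.
func stripSchemeCutset(s string) string { return strings.TrimLeft(s, "http://") }

// stripSchemePrefix shows the intended call: only the literal prefix goes.
func stripSchemePrefix(s string) string { return strings.TrimPrefix(s, "http://") }

// stripExtCutset and stripExtSuffix show the same confusion at the other end.
func stripExtCutset(s string) string { return strings.TrimRight(s, ".txt") }
func stripExtSuffix(s string) string { return strings.TrimSuffix(s, ".txt") }

func main() {
	u := "http://httpbin.org"
	fmt.Println(stripSchemeCutset(u)) // bin.org
	fmt.Println(stripSchemePrefix(u)) // httpbin.org

	f := "report.txt"
	fmt.Println(stripExtCutset(f)) // repor
	fmt.Println(stripExtSuffix(f)) // report
}
```

Note that the mistaken call often works by accident (for example on "http://example.com"), which is part of why the bug survives testing.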

@timothy-king
Contributor

@timothy-king timothy-king commented Jul 28, 2021

In case @rsc's example is not obvious enough, strings.TrimLeft(s, "http://") has a cutset with duplicate runes {"t", "/"}. So a duplicate rune detector would catch this. I find this example fairly persuasive.
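
A rough sketch of what such a detector has to compute, assuming a hypothetical function named duplicateRunes. This is not the analyzer from the CL, only the core idea applied to a cutset string.

```go
package main

import "fmt"

// duplicateRunes is a hypothetical helper for this illustration. It returns
// each rune that occurs more than once in cutset, in the order in which the
// second occurrence is seen.
func duplicateRunes(cutset string) []rune {
	seen := make(map[rune]int)
	var dups []rune
	for _, r := range cutset {
		seen[r]++
		if seen[r] == 2 {
			dups = append(dups, r)
		}
	}
	return dups
}

func main() {
	for _, cutset := range []string{"http://", "/var/", " \t\r\n"} {
		fmt.Printf("%q -> %q\n", cutset, duplicateRunes(cutset))
	}
}
```

A real vet check would additionally need to find calls to strings.Trim, TrimLeft, and TrimRight and resolve the cutset argument to a constant string before applying this test.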

@rsc rsc moved this from Active to Likely Accept in Proposals Aug 4, 2021
@rsc
Contributor

@rsc rsc commented Aug 4, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Contributor

@rsc rsc commented Aug 11, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc moved this from Likely Accept to Accepted in Proposals Aug 11, 2021
@rsc rsc changed the title from "proposal: cmd/vet: report strings.Trim/TrimLeft/TrimRight with duplicate runes in cutset" to "cmd/vet: report strings.Trim/TrimLeft/TrimRight with duplicate runes in cutset" Aug 11, 2021
@rsc rsc removed this from the Proposal milestone Aug 11, 2021
@rsc rsc added this to the Backlog milestone Aug 11, 2021
@nightlyone
Contributor

@nightlyone nightlyone commented Aug 19, 2021

Could the implemented analysis be extended to suggest the two possible fixes (remove the duplicate rune, or replace TrimLeft/TrimRight with TrimPrefix/TrimSuffix)?

Complaining is cheap, but offering solutions is better 😉

@cespare
Contributor

@cespare cespare commented Aug 19, 2021

@nightlyone this bug strongly implies a misunderstanding of what TrimLeft/TrimRight do, so suggesting a "fix" where we remove the duplicate runes seems like a mistake.

@timothy-king
Contributor

@timothy-king timothy-king commented Aug 19, 2021

@nightlyone We first need to decide that we want to. @cespare's point is a good one, and I am not confident about if/when we should be suggesting fixes in this case. The proposal is for this checker to make it into cmd/vet, which has cautious "Precision" requirements.

IMO extending the Analyzer to propose these fixes in the future given the current CL would not be that big of a deal.

FWIW there also seems to be some controversy about suggesting multiple fixes: https://cs.opensource.google/go/x/tools/+/refs/tags/v0.1.5:go/analysis/diagnostic.go;l=23-28 . But this discussion is probably better in a different venue (such as a new bug).

@robpike
Contributor

@robpike robpike commented Aug 20, 2021

I'm not comfortable with this one. It's a lucky accident that a duplicated rune catches a string where TrimPrefix was the right thing to call. I'd rather put this on hold until a more predictable method is discovered.

According to the vet README, this one fails the "precision" rule.

@timothy-king
Contributor

@timothy-king timothy-king commented Aug 20, 2021

I ran the Go checker in the CL across a large % of the Google monorepo Go code.

I took a look at the reports. IMO there were not really any false positives. Each report indicated some confusion about what Trim, TrimLeft, or TrimRight does. Here is roughly the breakdown:

  1. The bulk of these were url processing examples that I think are clearly mistaking TrimLeft/TrimRight for TrimPrefix/TrimSuffix, e.g. "/var/", "http://", "__foo_bar", "git::", etc. This was about ~80% of the reports in this experiment.
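  2. strings.Trim(s, `\n\t`) - this is reported because of the '`': inside a raw string literal the cutset is the four runes \, n, \, t, so the backslash is duplicated. This is quite likely to be a bug, as they are not capturing the runes they intended.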
  3. strings.Trim(s, "//") - given the context these were probably also just variants of url processing, but I am less confident.
  4. Using strings.Trim(s, "''") to transform strings like "'GET'" into "GET". There might be enough domain in this case that they were doing the right thing.

That final case was maybe a style issue. The duplicate does suggest that they are probably confused about the API though.
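
Cases 2 and 4 can be reproduced with a short sketch. The helper names and sample inputs are assumptions for illustration. With a raw string literal the cutset holds a backslash and the letters n and t rather than newline and tab, while the doubled quote in "''" is redundant but harmless.

```go
package main

import (
	"fmt"
	"strings"
)

// trimRaw uses a raw string literal: the cutset is the runes \, n, \, t.
func trimRaw(s string) string { return strings.Trim(s, `\n\t`) }

// trimInterpreted uses an interpreted literal: the cutset is newline and tab.
func trimInterpreted(s string) string { return strings.Trim(s, "\n\t") }

func main() {
	s := "\tvalue\n"
	fmt.Printf("%q\n", trimRaw(s))         // whitespace is left in place
	fmt.Printf("%q\n", trimInterpreted(s)) // whitespace is removed

	// The raw cutset strips the letters n and t instead.
	fmt.Printf("%q\n", trimRaw("not"))

	// A duplicated quote behaves exactly like a single one.
	fmt.Println(strings.Trim("'GET'", "''") == strings.Trim("'GET'", "'"))
}
```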

@timothy-king
Contributor

@timothy-king timothy-king commented Aug 20, 2021

According to the vet README, this one fails the "precision" rule.

@robpike I am not sure I understand what part of the precision rule this fails. Can you elaborate?

@robpike
Contributor

@robpike robpike commented Aug 21, 2021

@timothy-king The existence of multiple characters does not necessarily mean that TrimPrefix is needed. It may be a mistake, but it may be harmless.

The original idea to check ordering would be more precise, although still not perfect. I feel what you have proposed feels good primarily because it catches "http". But it won't catch other things that are just as incorrect, such as "such as".

The goal is imprecise, the suggested fix imprecise. This is not a top-level change to vet.

@timothy-king
Contributor

@timothy-king timothy-king commented Aug 27, 2021

@robpike I am not sure there is ever a good reason to have duplicates in the cutset, whether it is benign or not. It is a fairly strong indication that the user does not understand the function. So far I have not seen good in-the-wild evidence of false positives or bad advice.

If the concern is false negatives (not catching "/etc"), I think it is reasonable to accept having some false negatives as long as we are clear about what is happening so users do not use this as a crutch.

If the concern is that the documentation for the analyzer is being a bit too suggestive that the problem is TrimPrefix/TrimSuffix, I am happy to adjust this.

Maybe @rsc can give advice on the path forward if you feel the proposal should not have been accepted.

FWIW duplicates would alert on "such as". There are two "s" runes. It would also alert on "exempli gratia" and "e.g.", but not on "eg".

@rsc
Contributor

@rsc rsc commented Sep 8, 2021

The concern here seems to be not as much about precision (too many false positives) as recall (too many false negatives). Talked to @taking and he is going to try to get some data about the false negative rate. This also has a bearing on #47822. It may be that duplicates have too high a false negative rate but that 'duplicates or limited unsorted' is OK, where 'limited unsorted' means checking just that ABC..Z, abc..z, and 123..9 appear sorted, and not worrying about whether " \t\r\n" is sorted, whatever that means.
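
One possible reading of 'limited unsorted', sketched under assumptions: the function names class and limitedUnsorted are hypothetical, and the rule implemented here (within each of the three classes, runes must appear in non-decreasing order, and all other runes are ignored) is only an interpretation of the description above, not an agreed design.

```go
package main

import "fmt"

// class maps a rune to one of the three checked classes, or 0 for runes
// whose relative order is ignored (punctuation, whitespace, and so on).
func class(r rune) int {
	switch {
	case 'A' <= r && r <= 'Z':
		return 1
	case 'a' <= r && r <= 'z':
		return 2
	case '0' <= r && r <= '9':
		return 3
	}
	return 0
}

// limitedUnsorted reports whether the upper-case letters, lower-case letters,
// or digits of cutset appear out of order within their own class.
func limitedUnsorted(cutset string) bool {
	var last [4]rune
	for _, r := range cutset {
		c := class(r)
		if c == 0 {
			continue
		}
		if r < last[c] {
			return true
		}
		last[c] = r
	}
	return false
}

func main() {
	for _, cutset := range []string{"/etc", "abc", " \t\r\n", "eg"} {
		fmt.Printf("%q unsorted: %v\n", cutset, limitedUnsorted(cutset))
	}
}
```

Under this reading "/etc" is caught even though it has no duplicate runes, while an already sorted cutset such as "eg" still goes unreported.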
