-
Notifications
You must be signed in to change notification settings - Fork 17.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x/tools/go/analysis/checker: an importable analysis driver based on go/packages #61324
Comments
Change https://go.dev/cl/411907 mentions this issue: |
I like the proposal and have two questions about this.
|
The go/analysis/checker driver uses defer + recover around Analyzer.Run, so if the analyzer is single-threaded (as ~all are), it will catch the panic and treat it as a failure of that Action. Like compilation, all downstream actions will not be executed, but independent actions will continue as usual. All this information can be gleaned from the action graph returned by checker.Analyze.
The Fact data structures are part of the output: see the
If you want to distribute units of analysis, you should be using unitchecker, or something like it (it's not a lot of code), as it assumes that intermediate results between units are serialized. Perhaps I should add an Example in go/analysis (but not |
I see. I think I missed this part:
Thanks for clarifying (I should learn to read comment 😄 ) |
I think Since this is a non-main package, I don't think it should log anything. Consider instead providing that functionality as callbacks. For instance, replace |
Good point. On closer inspection, it only logs one message, a constant, at the very beginning. We can simply get rid of Verbose. Thanks for pointing this out. |
This proposal doesn't include a "convenience" API for flags, but it does allow you to control all the relevant pieces of the driver and the analyzers from the outside:
for _, a := range analyzers {
a.Flags.VisitAll(func(f *flag.Flag) {
flag.Var(f.Value, a.Name + "." + f.Name, f.Usage) // --analyzer.flag=...
})
}
flag.Parse() We can always add more convenience functions later as clear needs arise, but the primary goal here is to make the tricky logic of the driver available in a form other than a complete application |
Thanks @Antonboom, I think you are saying that the proposal addresses the four issues you listed and that you are in favor of accepting it. I hope we can get it approved soon. (Please do correct me if I have misunderstood or if you think we should change the proposed API in some way.) |
No, no. I'm waiting this change a lot. I attached this example just with the hope that maybe it will push you to new thoughts. For example:
|
Great, thanks. Yes, in all three cases the new API makes these problems the client responsibility:
The amount of code in the load function is very small, and client applications are likely going to want fine-grained control over it, so I'd rather not introduce a helper function just yet. (The proposed API does state its requirements for the Mode flag, but that's all it needs to say.)
This logic is unrelated to analysis; it's there because singlechecker.Main is a main function, and any significant application has logic like this in its main function.
We may want to add some helper functions for flags, but it's surprisingly fiddly to come up with a good API for it, so I thought I would leave it out for now and see what needs arise. The important thing is that clients are unstuck, even if they have to copy a little code. |
This proposal has been added to the active column of the proposals project |
I am in favor of this proposal. However, I think we should consider having this package operate on an fs.FS, rather than the I suppose we can add In general, I'm not sure about adding all the helpers in this package. They seem more related to a particular command-line interface than they do to the code of the analysis algorithm. But I don't feel strongly. |
I assume you are primarily referring to ApplyFixes. It does seem reasonable to parameterize it over the file system in some way. An FS would be suitable for reading, but we still need a means to write the results. (There's no writable equivalent of FS, and proposal #45757 was declined.) I'll have a think. As for Analyze, the client is expected to have already obtained the packages.Packages, so it's too late to affect how files are read. That said, some analyzers do call os.OpenFile and I recently filed a proposal #62292 to add an OpenFile operation to analysis.Pass, in which case this interface would want an FS field so that reads done within the analyzer are consistent with whatever overlays were passed to go/packages.
The three Print functions aren't fundamental. They could be left for a later proposal, but I've included them to make it easy to build something like unitchecker without having to copy tons of code. The PrintDiagnostics function could also be usefully parameterized over an FS since it prints lines of context which it must read from somewhere. |
FWIW go/packages accepts overlays -- that's how gopls uses it. |
Understood, but my point is that the client does the packages.Load call before the call to Analyze, so Analyze has no need for the file system (at least until #62292 is accepted). |
Ack, sorry I misinterpreted your statement as saying that go/packages would have necessarily read from the OS. In that case, yes it does appear that this is only relevant to ApplyFixes. Should we accept a ReadFile/WriteFile interface? |
Yes, those two would be almost sufficient, but the implementation of ApplyFixes currently uses the robustio.FileID type (as a map key) to make it safe in the presence of file-system level aliasing: symbolic links, hard links, case insensitivity, path uncleanness. The algorithm could be expressed in terms of os.SameFile but only at the expense of a quadratic number of calls. So in addition to |
The quadratic term can be reduced to a trivial coefficient by using the content as the primary distinguishing key, calling SameFile only when the contents match, which should be rare for Go source files. The FS interfaces (even StatFS) don't seem to have an abstraction of os.SameFile, so we would need to users to provide it. Or we could say that, if users provide a custom FS, then file-system level aliasing becomes their problem. |
After discussion with Russ, I have removed a number of functions and simplified some of the names and signatures. See revised first note in this issue, or updated CL. |
ApplyFixes has been dropped from the proposal for now, in part because of the confusion of abstraction levels: the input is provided as a data structure produced from files read by the client calling packages.Load, but the ApplyFixes operation interacts with the file system directly. We should probably consider all these overlay and FS-related issues holistically in #62292. |
Dropping the comments to make the API easier to skim, I think we are down to:
The JSONDiagnostics and TextDiagnostics names seem wrong to me. I had suggested PrintJSON and PrintText, which make the action clearer and sort next to each other. For Options, I am still not sure about mentioning the flags. When we spoke last week I thought I understood you to say that unitchecker or multichecker define those flags, but I can't find any evidence of that in the x/tools repo:
|
I can change the names easily enough. The fact that they don't call fmt.Print, but write to a writer, made me think Print wasn't quite the right verb, but I don't feel strongly. More importantly, they write to an io.Writer but don't return an error. Should I make them return an error, or return their result as a []byte? I am inclined toward the former.
I misspoke, and misremembered when I wrote the doc comments "-p: ..." etc above: they are not regular flags, but single-letter arguments to a single |
Returning an error is fine. Better not to assume one giant []byte. |
@dominikh's feedback may be useful here. |
Here is what the standard library has to say about writing to an
|
scanner.PrintError seems like sufficient justification for PrintJSON and PrintText. |
Ok, Print it is.
|
Based on the discussion above, this proposal seems like a likely accept.
|
I'm slightly concerned about the proposed In Staticcheck's analysis runner, we instead construct the package graph from packages loaded with
and then compute ASTs and type information on demand when a package is ready to be analyzed, i.e. all of its dependencies have been analyzed. Once we're done, we discard the AST and type information. This results in overall much lower memory usage. One downside of our approach is that we load type information for dependencies from export data repeatedly, leading to more garbage and more CPU time spent on parsing export data and GC. (We do the same discarding and reloading for facts, but that part probably contributes much less to overall memory usage.) |
Also, are there any plans to add callbacks to support caching? That is, avoid recomputing facts and diagnostics for packages that haven't changed. |
Hi @dominikh.
You're right that it isn't the most asymptotically efficient solution. It would be more efficient to have Analyze produce (and then evict) type-annotated syntax as it goes, and for {single,multi}checker to take advantage of this optimization too. I suppose nothing stops us from retrofitting this optimization into the API proposed here: if syntax and/or types are missing, then Analyze could parse and type-check as it goes, using a parallel structure to avoid mutating the go/packages graph. But the motivation for this new API is for users that want to load and analyze a large program, re-using the tricky bits of {single,multi}checker.Main while inserting their own logic in between package.Load and Analyze and between Analyze and Exit, which the existing API makes impossible. So I expect that for them, the cost of holding everything in memory at once is quite acceptable. As always, asymptotically efficient separate analysis with persistent memoization of subproblems is best done using the unitchecker approach ( |
Are we sure we want to keep the "and if IsRoot" part? Keeping it does encourage compatibility with unitchecker and facts, but it will make the If we have analyzers Why is |
Correct.
Because Results are potentially very large data structures, such as the SSA representation of all the function bodies, and we want to evict them from memory. By contrast, Diagnostics are typically small and few in number. |
No change in consensus, so accepted. 🎉
|
We propose to create a new package, golang.org/x/tools/go/analysis/checker, to provide a function, Analyze, that applies a set of analyzers to a set of packages and returns a data structure providing access to the complete results of the operation.
Background: the go/analysis framework defines an interface for static checkers that inspect a parsed, type-checked Go package and report mistakes or suggest fixes. The interface allows checkers (e.g. printf, nilness) to be written independent of the driver, and drivers (e.g. go vet) to be written independent of the checkers. The x/tools repo provides two drivers. The first, unitchecker, used by go vet, performs "separate analysis", of one package at a time, using serialized types and facts for dependencies. The second, internal/checker, uses go/packages to load packages and all their dependencies from source. Currently it has two public interfaces, singlechecker and multichecker, that differ only trivially: a singlechecker has only one analyzer, whereas a multichecker contains many and has a slightly more complicated command-line interface. However, both of these public interfaces consist only of a Main function that does the entire job and then calls os.Exit.
Users reported that this was not a very useful building block, and that they wanted to incorporate analysis into larger applications, or add some extra steps after package loading but before analysis, or some extra postprocessing of the analysis results. See:
We propose to create a new package with the following public API.
The primary entry point is Analyze; it accepts a set of packages and analyzers and runs a unit of analysis (an "Action") for each element of their cross product, plus all necessary horizontal and vertical dependencies. The result is exposed as a graph, only the roots of which are provided in the Result struct; this approach is similar to the one used in
go/packages.Load
.To reduce churn and skew, the above API code is just the declarations but not all of the comments. You can see the complete API and implementation in https://go.dev/cl/411907.
Questions:
checker
, so as not to suggest commonality with unitchecker? "srcchecker", perhaps? (If we could start over, I would use "driver" for {single,multi,unit}checker and this package, and use "checker" for analysis.Analyzer.)The text was updated successfully, but these errors were encountered: