Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/parser: allow continuing parsing after package only / imports only #24273

Closed
josharian opened this issue Mar 6, 2018 · 14 comments
Closed

go/parser: allow continuing parsing after package only / imports only #24273

josharian opened this issue Mar 6, 2018 · 14 comments

Comments

@josharian
Copy link
Contributor

@josharian josharian commented Mar 6, 2018

I sometimes find myself wanting to parse only the imports (or only the package) from a file and then, if it is interesting, continue parsing the rest of the file.

go/parser doesn't support this right now. Instead you have to choose a non-ideal workaround:

  • read the entire file up front: memory inefficient)
  • seek to the beginning: only works with an io.Seeker, but e.g. go/build only guarantees an io.ReadCloser
  • use io.TeeReader and bytes.Buffer and io.MultiReader to cache the read bytes and then replay them: inefficient, fiddly
  • use io.TeeReader and io.Pipe to simultaneously start multiple parsers: wasteful in the early exit case

It'd be nice if instead we could just start the parser in package-only or imports-only mode and then ask the parser to keep going.

If we decide not to fix this for go/parser, it might still be worth addressing in the new parser's API.

cc @griesemer @mvdan

@mvdan
Copy link
Member

@mvdan mvdan commented Mar 6, 2018

This is an interesting idea. What happens at the moment with other cases like this one, such as when go build has to figure out whether files are excluded or not from a build depending on their build tags?

It'd be nice if instead we could just start the parser in package-only or imports-only mode and then ask the parser to keep going.

Have you given thought to a lower-level API? For example, besides the func that parses an entire file, funcs that parse a package clause, import declarations, and a top-level declaration. That could have other exit-early uses, for example if you were only interested in a const declared near the top of a file. And it could potentially mean less knobs like ImportsOnly.

@josharian
Copy link
Contributor Author

@josharian josharian commented Mar 6, 2018

Yeah, a "step by step" API would work nicely. First step package clause, then each subsequent step is one top level declaration. (Note that it might take several such steps to get through all imports.)

One downside is that the object resolution that go/parser does couldn't be completed for file-scoped and package-scoped objects until everything is parsed, which might be confusing.

@josharian
Copy link
Contributor Author

@josharian josharian commented Mar 6, 2018

Also would have to think about how this interacts with parsing and association of comments.

@mvdan
Copy link
Member

@mvdan mvdan commented Mar 6, 2018

Also would have to think about how this interacts with parsing and association of comments.

I was thinking in the happy land where comments are all attached to nodes and never floating around :) With that in mind, it's a similar case to object resolution. You aren't guaranteed to have all the comments in their right places until the entire file has been parsed (for example, for trailing comments right after a declaration).

It would also be neat if the func parsing a file was written in terms of those lower-level funcs. Then, if one needed some custom logic, they would just have to copy the ten-line func body and change it.

@mvdan
Copy link
Member

@mvdan mvdan commented Mar 6, 2018

Going back to go/parser vs syntax - I personally think the lower-level API is a bit too invasive to include in the existing go/parser. It would require introducing a type and a bunch of methods. And the new signatures would likely not match ParseFile much at all.

But I don't see an easier way to fit this into the existing API. Another parser mode can't fix it, as there's still no way to tell the parser to stop or continue. And you'd always have to deal with the free-floating comments, which at least would make the API more verbose.

@mvdan
Copy link
Member

@mvdan mvdan commented Mar 6, 2018

Here's a somewhat related idea - it would be nice if the parser also allowed parsing other kinds of nodes alone. At the moment, only an entire file or a single expression are possible. I would find it very useful if it also allowed parsing of a single declaration (as discussed above), a single type expression, and a single statement, among others.

At the moment, the only workaround is to use templating to form a full file, which is what a package of mine does. However, that results in lots of boilerplate code, confusing errors if something goes wrong, and position information that is very wrong. That's what gofmt -r does too, I think. See: https://github.com/mvdan/gogrep/blob/fb1920eae0dfe92b189205768dc459e0e5ad21d8/expr_parse.go#L86-L219

@josharian
Copy link
Contributor Author

@josharian josharian commented Mar 6, 2018

I believe that github.com/josharian/impl does some of this as well.

There is https://golang.org/pkg/go/parser/#ParseExpr and https://golang.org/pkg/go/parser/#ParseExprFrom.

I can imagine that decl parsing API might be one building block of an incremental parser, though.

@griesemer
Copy link
Contributor

@griesemer griesemer commented Mar 7, 2018

@josharian The go/parser has has Mode argument, and one of the modes is ImportsOnly. Is that what you are looking for?

@josharian
Copy link
Contributor Author

@josharian josharian commented Mar 7, 2018

No. I want to parse ImportsOnly and then, based on the results, decide whether or not to continue parsing the rest of the file.

@griesemer
Copy link
Contributor

@griesemer griesemer commented Mar 7, 2018

@josharian It may be cheap enough to use ImportsOnly first, and based on the result parse the entire file again. Another option could be to provide some kind of callback (but that would change the API).

@griesemer
Copy link
Contributor

@griesemer griesemer commented Mar 7, 2018

An "incremental" parser makes the API more complex and requires more consistency checks; it's also harder to guarantee the result. One point of the new syntax package is that it should have a simpler API (excluding the AST nodes for a moment). I was envisioning that a single Parse function should be able to determine what it parses and do the right thing (package, declaration, statement, expression). The result will tell what it parsed.

Regarding the option to parse up to a certain point and continue on demand: I'd do this with a callback where the callback has a chance to inspect the current AST and (probably) the next token. Either way, it has to be essentially zero cost when not needed.

But unless you have severe time constraints, I think there's a work-around right now as I suggested in #24273 (comment) .

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 12, 2018

ImportsOnly ought to be fast enough, and if not we should make it faster. It doesn't seem necessary to add more API here.

Decision: We won't do this.

@josharian
Copy link
Contributor Author

@josharian josharian commented Mar 12, 2018

I'm not going to appeal this--it seems like a reasonable decision--but just to set the record straight, the issue is not really around how fast/slow ImportsOnly is. It's about the awkwardness of managing the input, if you don't want to read the entire file into memory. See the list of workarounds in the original bug report.

@griesemer
Copy link
Contributor

@griesemer griesemer commented Mar 12, 2018

@josharian Understood. The parser (the scanner, really) requires the file to be in memory after all (at the bottom it doesn't use a reader, it uses a []byte). There's no easy way to change that so that not all of the file is loaded into memory if not needed; besides doing the work outside somehow. Which is another reason we don't want to change this API which is about 10 years old at this point.

The new syntax package uses an io.Reader all the way down and thus only reads what is needed (i.e., buffered).

@golang golang locked and limited conversation to collaborators Mar 12, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.