proposal: go/ast: add Start token.Pos fields to BlockStmt and FieldList #73584

@adonovan

Description

Background: The go/ast package was designed with batch-oriented compiler-like applications in mind, where well-formed ASTs are overwhelmingly the common case. However, it is now the basis of many tools, including gopls, an interactive IDE backend, where ill-formed trees are the norm. We have observed that a major source of crashes in gopls is out-of-bounds accesses of various kinds caused by the parser creating syntax trees in which a node's Pos/End range is not a subrange of its parent's. This usually happens because the End value is either (a) zero, because the final token was missing, or (b) greater than File.FileEnd, because the value was derived by adding an offset to the position of a prior token under the assumption that the file is well formed. For example, a file containing only "package" has an invented name token, _, whose end position is beyond EOF.
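To illustrate the kind of out-of-bounds access described above, here is a minimal sketch of a defensive client-side helper. The names sourceText and demo are hypothetical and are not part of gopls or the standard library; the positions are chosen by hand to mimic cases (a) and (b) rather than taken from real parser output.

```go
package main

import (
	"fmt"
	"go/token"
)

// sourceText returns the bytes of src covered by [start, end), or false if
// that range is not a well-formed subrange of file f.
// It assumes src holds the content of f, so len(src) == f.Size().
func sourceText(src []byte, f *token.File, start, end token.Pos) ([]byte, bool) {
	lo := token.Pos(f.Base())
	hi := token.Pos(f.Base() + f.Size())
	if start < lo || end > hi || start > end {
		return nil, false
	}
	i, j := int(start-lo), int(end-lo)
	if j > len(src) {
		return nil, false
	}
	return src[i:j], true
}

// demo applies sourceText to three hand-picked ranges over a file whose
// entire content is the keyword "package".
func demo() []bool {
	src := []byte("package")
	fset := token.NewFileSet()
	f := fset.AddFile("a.go", -1, len(src))
	base := token.Pos(f.Base())
	eof := base + token.Pos(len(src))

	_, okWhole := sourceText(src, f, base, eof)           // the whole file
	_, okPastEOF := sourceText(src, f, base, eof+2)       // End beyond EOF, as in case (b)
	_, okZeroEnd := sourceText(src, f, base, token.NoPos) // End is zero, as in case (a)
	return []bool{okWhole, okPastEOF, okZeroEnd}
}

func main() {
	fmt.Println(demo())
}
```

Without such a check, slicing src with offsets derived from either bad End value would panic, which is the class of crash the proposal aims to remove at the source.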

We would like to establish the invariant that every node returned by the parser has a Pos/End range that includes the ranges of its children. (For Files, pretend that Pos/End refers to FileStart/FileEnd for this discussion.) This change is mostly straightforward and compatible. However, due to unfortunate constraints imposed by doc comments, some syntax nodes must be created even when their source text is entirely missing. They are BlockStmt and FieldList. Consider an "if" token at EOF. This produces an IfStmt, and every IfStmt must (regrettably) have a BlockStmt, even though there are neither braces nor statements. Consequently, the BlockStmt exists but is a zero value, and its Pos and End methods have nothing to go on. Its End method returns zero, and so does the End method of the parent IfStmt. Similarly, a "func" token at EOF yields a FuncDecl, and every FuncDecl must (regrettably) have a Params, even though there are neither parentheses nor parameter fields. Again, a zero FieldList is created, and its Pos and End are zero.
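The invariant can be expressed as a small checker. The following is a sketch only: violations is a hypothetical helper, and the two IfStmt trees are built by hand to resemble a well-formed "if x {}" and an "if x" whose body is a zero BlockStmt; they are not actual parser output.

```go
package main

import (
	"fmt"
	"go/ast"
)

// violations walks the tree rooted at root and returns one message for each
// node whose Pos/End range is not contained in the range of its parent.
func violations(root ast.Node) []string {
	var out []string
	var stack []ast.Node // ancestors of the node being visited
	ast.Inspect(root, func(n ast.Node) bool {
		if n == nil {
			stack = stack[:len(stack)-1] // finished the node on top of the stack
			return true
		}
		if len(stack) > 0 {
			p := stack[len(stack)-1]
			if n.Pos() < p.Pos() || n.End() > p.End() {
				out = append(out, fmt.Sprintf("%T [%d,%d) is not within %T [%d,%d)",
					n, n.Pos(), n.End(), p, p.Pos(), p.End()))
			}
		}
		stack = append(stack, n)
		return true
	})
	return out
}

// wellFormed mimics the tree for "if x {}" starting at position 1.
func wellFormed() ast.Node {
	return &ast.IfStmt{
		If:   1,
		Cond: &ast.Ident{NamePos: 4, Name: "x"},
		Body: &ast.BlockStmt{Lbrace: 6, Rbrace: 7},
	}
}

// missingBody mimics an "if x" with no braces: the body is a zero BlockStmt.
func missingBody() ast.Node {
	return &ast.IfStmt{
		If:   10,
		Cond: &ast.Ident{NamePos: 13, Name: "x"},
		Body: &ast.BlockStmt{},
	}
}

func main() {
	fmt.Println(len(violations(wellFormed())), "violations in the well-formed tree")
	for _, v := range violations(missingBody()) {
		fmt.Println(v)
	}
}
```

In the second tree the zero BlockStmt drags the parent's End below its Pos, so the children no longer fit inside the parent, which is the situation the text describes.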

This problem can be fixed by having the parser record the start position of these missing nodes. Adding a non-exported field to BlockStmt and FieldList would permit their Pos and End methods to do the right thing. However, a lot of client code assumes that all fields of ast.Node struct types are public, and misbehaves otherwise. Therefore, these new fields must be public.

Proposal: We propose to add two new fields, both Start token.Pos, to FieldList and BlockStmt. They would be unconditionally set to the start position of the syntax node.

package ast

type BlockStmt struct {
+       Start token.Pos // start position of the block (even when missing, as in IfStmt.Body for "if" at EOF)
        ...
}
type FieldList struct {
+       Start token.Pos // start position of the field list (even when missing, as in FuncDecl.Params for "func" at EOF)
        ...
}

Neither addition increases the size class of its struct.
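As an illustration of how Pos and End could use such a field, here is a sketch on a stand-in type. The type blockStmt and its method bodies are assumptions made for this example; they are not the code in the CLs listed below, and the real ast.BlockStmt has no Start field today.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/token"
)

// blockStmt is a hypothetical stand-in for ast.BlockStmt with the proposed
// Start field added.
type blockStmt struct {
	Start  token.Pos // assumed to be set unconditionally by the parser
	Lbrace token.Pos // position of "{", or NoPos if missing
	List   []ast.Stmt
	Rbrace token.Pos // position of "}", or NoPos if missing
}

// Pos reports the recorded start, which is valid even when "{" is missing.
func (b *blockStmt) Pos() token.Pos { return b.Start }

// End prefers the closing brace, then the last statement, then the opening
// brace, and finally falls back to Start, giving an empty range at Start.
func (b *blockStmt) End() token.Pos {
	if b.Rbrace.IsValid() {
		return b.Rbrace + 1
	}
	if n := len(b.List); n > 0 {
		return b.List[n-1].End()
	}
	if b.Lbrace.IsValid() {
		return b.Lbrace + 1
	}
	return b.Start
}

func main() {
	missing := &blockStmt{Start: 12} // e.g. the body of "if" at EOF
	present := &blockStmt{Start: 6, Lbrace: 6, Rbrace: 7}
	fmt.Println(missing.Pos(), missing.End())
	fmt.Println(present.Pos(), present.End())
}
```

With this shape, a missing block yields an empty but valid range located where the block would have begun, so a parent whose End is derived from it stays within the file.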

Sketch implementation:

  • https://go.dev/cl/668238 - go/ast and go/parser changes. Patchset 5 shows limitations of using a nonexported field.
  • https://go.dev/cl/668677 - gopls changes, mostly to the abstraction-breaking completion logic, and workarounds for non-exported fields.

Metadata

Labels

LibraryProposal: Issues describing a requested change to the Go standard library or x/ libraries, but not to a tool
Proposal
gopls/parsing: Issues related to parsing / poor parser recovery.

Projects

Status

Hold
