Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/go/packages: Load in NeedName|NeedFiles mode up to 1000+ times slower than go/build #31893

Open
dmitshur opened this issue May 7, 2019 · 0 comments

Comments

Projects
None yet
1 participant
@dmitshur
Copy link
Member

commented May 7, 2019

This issue is related to #29452 and #31087, but is more narrowly focused on the performance of the packages.NeedName | packages.NeedFiles mode.

Background

The Go specification defines Go source file organization that is very conducive of extracting partial information with high performance and efficiency.

Specifically, each .go file always has 3 categories of content in well-specified order. The package clause is always first, followed by a possibly empty set of import declarations, and then finally the rest of its content.

This means that to find the package name in a .go file, a Go parser can stop after it parses the package clause. To find all imported packages in a .go file, the parser can stop after it reaches the first non-import declaration.

The Go parser provided by the go/parser package provides fine control over when to stop parsing a .go file via a Mode type:

const (
	PackageClauseOnly Mode = 1 << iota // stop parsing after package clause
	ImportsOnly                        // stop parsing after import declarations
	...
)

The Go specification also describes the concept of a Go package as follows:

Go programs are constructed by linking together packages. A package in turn is constructed from one or more source files that together declare constants, types, variables and functions belonging to the package and which are accessible in all files of the same package.

The go command defines a 4th interesting category of information, build tags or constraints. They are also highly constrained, in that they can appear at the top of a .go source file only. More precisely, from go/build documentation:

Constraints may appear in any kind of source file (not just Go), but they must appear near the top of the file, preceded only by blank lines and other line comments. These rules mean that in Go files a build constraint must appear before the package clause.

The go/build package makes good use of these properties when loading a Go package. The resolved package provides information about the package name, Go and other source files, and imports. To acquire that information, it invokes the Go parser in the following mode:

parser.ParseFile([...], parser.ImportsOnly|parser.ParseComments)

It stops parsing after import declarations, and it needs to parse comments in order to find the build constraint comments. At no point is it necessary to use the internet, since all the necessary information is already available in the provided .go files.

API of go/build

The go/build API does not offer a lot of control over the process of loading a package. It has a ImportMode type which can either be 0 to load Go packages normally (almost all information), or FindOnly:

const (
	// If FindOnly is set, Import stops after locating the directory
	// that should contain the sources for a package. It does not
	// read any files in the directory.
	FindOnly ImportMode = 1 << iota

	...
)

API of go/packages

In contrast to that, the go/packages API provides a much greater degree of freedom about requesting information about the loaded package. See the LoadMode type:

const (
	// NeedName adds Name and PkgPath.
	NeedName LoadMode = 1 << iota

	// NeedFiles adds GoFiles and OtherFiles.
	NeedFiles

	// NeedCompiledGoFiles adds CompiledGoFiles.
	NeedCompiledGoFiles

	// NeedImports adds Imports. If NeedDeps is not set, the Imports field will contain
	// "placeholder" Packages with only the ID set.
	NeedImports

	// NeedDeps adds the fields requested by the LoadMode in the packages in Imports. If NeedImports
	// is not set NeedDeps has no effect.
	NeedDeps

	...
)

As a result, it should be possible to achieve equal or better performance with go/packages API when the user requests as little information as just NeedName or NeedFiles (or both) without NeedImports, since it wouldn't be necessary to parse the .go files beyond the package clause.

Environment

$ go version
go version go1.12.5 darwin/amd64

All timing information was collected on a 2017 MacBook Pro with 256 GB flash storage.

Issue

The performance when using the go/packages package in packages.NeedName | packages.NeedFiles mode is significantly worse compared to the go/build package.

To reproduce the issue, create a target package to be imported in a temporary directory /tmp/target with the following files:

-- go.mod --
module m

go 1.12

require (
	github.com/google/go-github v17.0.0+incompatible
	github.com/google/go-querystring v1.0.0 // indirect
)

-- go.sum --
github.com/google/go-github v17.0.0+incompatible h1:N0LgJ1j65A7kfXrZnUDaYCs/Sf4rEjNlfyDHW9dolSY=
github.com/google/go-github v17.0.0+incompatible/go.mod h1:zLgOLi98H3fifZn+44m+umXrS52loVEgC2AApnigrVQ=
github.com/google/go-querystring v1.0.0 h1:Xkwi/a1rcvNg1PPYe5vI8GbeBY/jrVuDX5ASuANWTrk=
github.com/google/go-querystring v1.0.0/go.mod h1:odCYkC5MyYFN7vkCjXpyrEuKhc/BUO6wN/zVPAxq5ck=

-- main.go --
// +build ignore

package main

func main() {}

-- p.go --
package p

import "github.com/google/go-github/github"

var _ *github.Client

Then, inside another directory, create the following test command:

-- go.mod --
module m

go 1.12

require golang.org/x/tools v0.0.0-20190506145303-2d16b83fe98c

-- main.go --
package main

import (
	"fmt"
	"go/build"
	"time"

	"golang.org/x/tools/go/packages"
)

func main() {
	for driver, f := range map[string]func(dir string) (name string, goFiles []string, _ error){
		"go/build":    useGoBuild,
		"go/packages": useGoPackages,
	} {
		t0 := time.Now()
		name, goFiles, error := f("/tmp/target")
		t1 := time.Now()

		fmt.Println(driver)
		fmt.Println("  time:", t1.Sub(t0))
		fmt.Println("  error:", error)
		if error == nil {
			fmt.Println("  name:", name)
			fmt.Println("  goFiles:", goFiles)
		}
		fmt.Println()
	}
}

func useGoBuild(dir string) (name string, goFiles []string, _ error) {
	p, err := build.ImportDir(dir, 0)
	if err != nil {
		return "", nil, fmt.Errorf("build.ImportDir: %v", err)
	}
	return p.Name, p.GoFiles, nil
}

func useGoPackages(dir string) (name string, goFiles []string, _ error) {
	cfg := &packages.Config{
		Mode: packages.NeedName | packages.NeedFiles,
		Dir:  dir,
	}
	pkgs, err := packages.Load(cfg, ".")
	if err != nil {
		return "", nil, fmt.Errorf("packages.Load: %v", err)
	}
	if len(pkgs) != 1 {
		return "", nil, fmt.Errorf("got %d packages, want exactly 1 package", len(pkgs))
	}
	p := pkgs[0]
	if len(p.Errors) > 0 {
		return "", nil, fmt.Errorf("p.Errors: %v", p.Errors)
	}
	return p.Name, p.GoFiles, nil
}

Run it as follows:

$ export GO111MODULE=on
$ export GOPROXY=direct
$ export GOPATH=$(mktemp -d)
$ go build -o /tmp/loadpkg
$ /tmp/loadpkg
go/build
  time: 1.743083ms
  error: <nil>
  name: p
  goFiles: [p.go]

go/packages
  time: 1.822050861s
  error: <nil>
  name: p
  goFiles: [/tmp/target/p.go]

$ /tmp/loadpkg
go/build
  time: 1.441372ms
  error: <nil>
  name: p
  goFiles: [p.go]

go/packages
  time: 322.397ms
  error: <nil>
  name: p
  goFiles: [/tmp/target/p.go]

Notice that during the first invocation, go/build took 1.7 milliseconds to produce correct results, while go/packages took 1.8 seconds.

On the second invocation, go/build took 1.5 milliseconds (perhaps due to warmer disk cache), and go/packages took 322 milliseconds. Successive runs produce similar results.

go/packages became much faster on second run compared to first because the github.com/google/go-github and github.com/google/go-querystring modules were downloaded and extracted into the local module cache from the internet the first time, which doesn't need to happen the second time.

However, 300~ milliseconds is still significantly slower than 2~ milliseconds that go/build takes to produce the same correct results for this query. Additionally, all the necessary information to answer a query in NeedName|NeedFiles mode is available in the .go files on disk, so nothing needs to be downloaded from internet to return the correct result.

/cc @matloob per owners.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.