gopls/internal/lsp/cache: make type-checking incremental
In this CL, type-checked packages are made entirely independent of each
other, and package export data and indexes are stored in a file cache.
As a result, gopls uses significantly less memory, and (with a warm
cache) starts significantly faster. Other benchmarks have regressed
slightly due to the additional I/O and export data loading, but not
significantly so, and we have some ideas for how to further narrow or
even close the performance gap.

In the benchmarks below, based on the x/tools repository, we can see
that in-use memory was reduced by 88%, and startup time with a warm
cache by 65% (this is the best case where nothing has changed). Other
benchmarks regressed by 10-50%, much of which can be addressed by
improvements to the objectpath package (#51017), and by making
package data serialization asynchronous to type-checking.

Notably, we observe larger regressions in implementations, references,
and rename because the index implementations (by Alan Donovan) preceded
this change to type-checking, and so these benchmark statistics compare
in-memory index performance to on-disk index performance. Again, we can
optimize these if necessary by keeping certain index information in
memory, or by decoding more selectively.

name                              old in_use_bytes  new in_use_bytes  delta
InitialWorkspaceLoad/tools-12            432M ± 2%          50M ± 2%    -88.54%  (p=0.000 n=10+10)

name                              old time/op       new time/op       delta
StructCompletion/tools-12              27.2ms ± 5%       31.8ms ± 9%    +16.99%  (p=0.000 n=9+9)
ImportCompletion/tools-12              2.07ms ± 8%       2.21ms ± 6%     +6.64%  (p=0.004 n=9+9)
SliceCompletion/tools-12               29.0ms ± 5%       32.7ms ± 5%    +12.78%  (p=0.000 n=10+9)
FuncDeepCompletion/tools-12            39.6ms ± 6%       39.3ms ± 3%       ~     (p=0.853 n=10+10)
CompletionFollowingEdit/tools-12       72.7ms ± 7%      108.1ms ± 7%    +48.59%  (p=0.000 n=9+9)
Definition/tools-12                     525µs ± 6%        601µs ± 2%    +14.33%  (p=0.000 n=9+10)
DidChange/tools-12                     6.17ms ± 7%       6.77ms ± 2%     +9.64%  (p=0.000 n=10+10)
Hover/tools-12                         2.11ms ± 5%       2.61ms ± 3%    +23.87%  (p=0.000 n=10+10)
Implementations/tools-12               4.04ms ± 3%      60.19ms ± 3%  +1389.77%  (p=0.000 n=9+10)
InitialWorkspaceLoad/tools-12           3.84s ± 4%        1.33s ± 2%    -65.47%  (p=0.000 n=10+9)
References/tools-12                    9.72ms ± 6%      24.28ms ± 6%   +149.83%  (p=0.000 n=10+10)
Rename/tools-12                         121ms ± 8%        168ms ±12%    +38.92%  (p=0.000 n=10+10)
WorkspaceSymbols/tools-12              14.4ms ± 6%       15.6ms ± 3%     +8.76%  (p=0.000 n=9+10)

This CL is one step closer to the end* of a long journey to reduce
memory usage and statefulness in gopls, so that it can be more
performant and reliable.

Specifically, this CL implements a new type-checking pass that loads and
stores export data, cross references, serialized diagnostics, and method
set indexes in the file system. Concurrent type-checking passes may
share in-progress work, but after type-checking only active packages are
kept in memory. Consequently, there can be no global relationship
between type-checked packages. The work to break any dependence on
global relationships was done over a long time leading up to this CL.

In order to approach the previous type-checking performance, the
following new optimizations are made:
 - the global FileSet is completely removed: repeatedly importing from
   export data resulted in a tremendous amount of unnecessary token.File
   information, and so FileSets had to be scoped to packages
 - files are parsed as a batch and stored in the LRU cache implemented
   in the preceding CL
 - type-checking is also turned into a batch process, so that
   overlapping nodes in the package graph may be shared during large
   type-checking operations such as the initial workspace load

This new execution model enables several simplifications:
 - We no longer need to trim the AST before type-checking:
   TypeCheckMode and ParseExported are gone.
 - We no longer need to do careful bookkeeping around parsed files: all
   parsing uses the LRU parse cache.
 - It is no longer necessary to estimate cache heap usage in debug
   information.

There is still much more to do. This new model for gopls's execution
requires significant testing and experimentation. There may be new bugs
in the complicated new algorithms that enable this change, or bugs
related to the new reliance on export data (this may be the first time
export data for packages with type errors is significantly exercised).
There may be new environments where the new execution model does not
have the same beneficial effect. (On the other hand, there may be
some where it has an even more beneficial effect, such as resource
limited environments like dev containers.) There are also a lot of new
opportunities for optimization now that we are no longer tied to a rigid
structure of in-memory data.

*Furthermore, the following planned work is simply not done yet:
 - Implement precise pruning based on "deep" hash of imports.
 - Rewrite unimported completion, now that we no longer have cached
   import paths.

For #57987

Change-Id: Iedfc16656f79e314be448b892b710b9e63f72551
Reviewed-on: https://go-review.googlesource.com/c/tools/+/466975
Run-TryBot: Robert Findley <rfindley@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopls-CI: kokoro <noreply+kokoro@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
findleyr committed Mar 3, 2023
1 parent ae05609 commit 21d2256
Showing 57 changed files with 1,218 additions and 1,652 deletions.
33 changes: 26 additions & 7 deletions gopls/internal/lsp/cache/analysis.go
@@ -11,6 +11,7 @@ import (
"context"
"crypto/sha256"
"encoding/gob"
"encoding/json"
"errors"
"fmt"
"go/ast"
@@ -1155,14 +1156,18 @@ func mustDecode(data []byte, ptr interface{}) {
}
}

// -- data types for serialization of analysis.Diagnostic --
// -- data types for serialization of analysis.Diagnostic and source.Diagnostic --

type gobDiagnostic struct {
Location protocol.Location
Category string
Severity protocol.DiagnosticSeverity
Code string
CodeHref string
Source string
Message string
SuggestedFixes []gobSuggestedFix
Related []gobRelatedInformation
Tags []protocol.DiagnosticTag
}

type gobRelatedInformation struct {
@@ -1171,8 +1176,16 @@ type gobRelatedInformation struct {
}

type gobSuggestedFix struct {
Message string
TextEdits []gobTextEdit
Message string
TextEdits []gobTextEdit
Command *gobCommand
ActionKind protocol.CodeActionKind
}

type gobCommand struct {
Title string
Command string
Arguments []json.RawMessage
}

type gobTextEdit struct {
@@ -1218,11 +1231,17 @@ func toGobDiagnostic(posToLocation func(start, end token.Pos) (protocol.Location
if err != nil {
return gobDiagnostic{}, err
}

return gobDiagnostic{
Location: loc,
Category: diag.Category,
Location: loc,
// Severity for analysis diagnostics is dynamic, based on user
// configuration per analyzer.
// Code and CodeHref are unset for Analysis diagnostics,
// TODO(rfindley): set Code fields if/when golang/go#57906 is accepted.
Source: diag.Category,
Message: diag.Message,
Related: related,
SuggestedFixes: fixes,
Related: related,
// Analysis diagnostics do not contain tags.
}, nil
}
113 changes: 2 additions & 111 deletions gopls/internal/lsp/cache/cache.go
@@ -6,13 +6,7 @@ package cache

import (
"context"
"fmt"
"go/ast"
"go/token"
"go/types"
"html/template"
"reflect"
"sort"
"strconv"
"sync/atomic"

@@ -29,22 +23,15 @@ import (
// Both the fset and store may be nil, but if store is non-nil so must be fset
// (and they must always be used together), otherwise it may be possible to get
// cached data referencing token.Pos values not mapped by the FileSet.
func New(fset *token.FileSet, store *memoize.Store) *Cache {
func New(store *memoize.Store) *Cache {
index := atomic.AddInt64(&cacheIndex, 1)

if store != nil && fset == nil {
panic("non-nil store with nil fset")
}
if fset == nil {
fset = token.NewFileSet()
}
if store == nil {
store = &memoize.Store{}
}

c := &Cache{
id: strconv.FormatInt(index, 10),
fset: fset,
store: store,
memoizedFS: &memoizedFS{filesByID: map[robustio.FileID][]*DiskFile{}},
}
@@ -56,8 +43,7 @@ func New(fset *token.FileSet, store *memoize.Store) *Cache {
// TODO(rfindley): once fset and store need not be bundled together, the Cache
// type can be eliminated.
type Cache struct {
id string
fset *token.FileSet
id string

store *memoize.Store

@@ -90,98 +76,3 @@ var cacheIndex, sessionIndex, viewIndex int64

func (c *Cache) ID() string { return c.id }
func (c *Cache) MemStats() map[reflect.Type]int { return c.store.Stats() }

type packageStat struct {
id PackageID
mode source.ParseMode
file int64
ast int64
types int64
typesInfo int64
total int64
}

func (c *Cache) PackageStats(withNames bool) template.HTML {
var packageStats []packageStat
c.store.DebugOnlyIterate(func(k, v interface{}) {
switch k.(type) {
case packageHandleKey:
v := v.(typeCheckResult)
if v.pkg == nil {
break
}
typsCost := typesCost(v.pkg.types.Scope())
typInfoCost := typesInfoCost(v.pkg.typesInfo)
stat := packageStat{
id: v.pkg.id,
mode: v.pkg.mode,
types: typsCost,
typesInfo: typInfoCost,
}
for _, f := range v.pkg.compiledGoFiles {
stat.file += int64(len(f.Src))
stat.ast += astCost(f.File)
}
stat.total = stat.file + stat.ast + stat.types + stat.typesInfo
packageStats = append(packageStats, stat)
}
})
var totalCost int64
for _, stat := range packageStats {
totalCost += stat.total
}
sort.Slice(packageStats, func(i, j int) bool {
return packageStats[i].total > packageStats[j].total
})
html := "<table><thead><td>Name</td><td>total = file + ast + types + types info</td></thead>\n"
human := func(n int64) string {
return fmt.Sprintf("%.2f", float64(n)/(1024*1024))
}
var printedCost int64
for _, stat := range packageStats {
name := stat.id
if !withNames {
name = "-"
}
html += fmt.Sprintf("<tr><td>%v (%v)</td><td>%v = %v + %v + %v + %v</td></tr>\n", name, stat.mode,
human(stat.total), human(stat.file), human(stat.ast), human(stat.types), human(stat.typesInfo))
printedCost += stat.total
if float64(printedCost) > float64(totalCost)*.9 {
break
}
}
html += "</table>\n"
return template.HTML(html)
}

func astCost(f *ast.File) int64 {
if f == nil {
return 0
}
var count int64
ast.Inspect(f, func(_ ast.Node) bool {
count += 32 // nodes are pretty small.
return true
})
return count
}

func typesCost(scope *types.Scope) int64 {
cost := 64 + int64(scope.Len())*128 // types.object looks pretty big
for i := 0; i < scope.NumChildren(); i++ {
cost += typesCost(scope.Child(i))
}
return cost
}

func typesInfoCost(info *types.Info) int64 {
// Most of these refer to existing objects, with the exception of InitOrder, Selections, and Types.
cost := 24*len(info.Defs) +
32*len(info.Implicits) +
256*len(info.InitOrder) + // these are big, but there aren't many of them.
32*len(info.Scopes) +
128*len(info.Selections) + // wild guess
128*len(info.Types) + // wild guess
32*len(info.Uses)
return int64(cost)
}
