gopls/internal/lsp/cache: make type-checking incremental
In this CL, type-checked packages are made entirely independent of each
other, and package export data and indexes are stored in a file cache.
As a result, gopls uses significantly less memory, and (with a warm
cache) starts significantly faster. Other benchmarks have regressed
slightly due to the additional I/O and export data loading, but not
significantly so, and we have some ideas for how to further narrow or
even close the performance gap.

In the benchmarks below, based on the x/tools repository, we can see
that in-use memory was reduced by 88%, and startup time with a warm
cache by 65% (this is the best case where nothing has changed). Other
benchmarks regressed by 10-50%, much of which can be addressed by
improvements to the objectpath package (#51017), and by making
package data serialization asynchronous to type-checking.

Notably, we observe larger regressions in implementations, references,
and rename because the index implementations (by Alan Donovan) preceded
this change to type-checking, and so these benchmark statistics compare
in-memory index performance to on-disk index performance. Again, we can
optimize these if necessary by keeping certain index information in
memory, or by decoding more selectively.

name                              old in_use_bytes  new in_use_bytes  delta
InitialWorkspaceLoad/tools-12            432M ± 2%          50M ± 2%    -88.54%  (p=0.000 n=10+10)

name                              old time/op       new time/op       delta
StructCompletion/tools-12              27.2ms ± 5%       31.8ms ± 9%    +16.99%  (p=0.000 n=9+9)
ImportCompletion/tools-12              2.07ms ± 8%       2.21ms ± 6%     +6.64%  (p=0.004 n=9+9)
SliceCompletion/tools-12               29.0ms ± 5%       32.7ms ± 5%    +12.78%  (p=0.000 n=10+9)
FuncDeepCompletion/tools-12            39.6ms ± 6%       39.3ms ± 3%       ~     (p=0.853 n=10+10)
CompletionFollowingEdit/tools-12       72.7ms ± 7%      108.1ms ± 7%    +48.59%  (p=0.000 n=9+9)
Definition/tools-12                     525µs ± 6%        601µs ± 2%    +14.33%  (p=0.000 n=9+10)
DidChange/tools-12                     6.17ms ± 7%       6.77ms ± 2%     +9.64%  (p=0.000 n=10+10)
Hover/tools-12                         2.11ms ± 5%       2.61ms ± 3%    +23.87%  (p=0.000 n=10+10)
Implementations/tools-12               4.04ms ± 3%      60.19ms ± 3%  +1389.77%  (p=0.000 n=9+10)
InitialWorkspaceLoad/tools-12           3.84s ± 4%        1.33s ± 2%    -65.47%  (p=0.000 n=10+9)
References/tools-12                    9.72ms ± 6%      24.28ms ± 6%   +149.83%  (p=0.000 n=10+10)
Rename/tools-12                         121ms ± 8%        168ms ±12%    +38.92%  (p=0.000 n=10+10)
WorkspaceSymbols/tools-12              14.4ms ± 6%       15.6ms ± 3%     +8.76%  (p=0.000 n=9+10)

This CL is one step closer to the end* of a long journey to reduce
memory usage and statefulness in gopls, so that it can be more
performant and reliable.

Specifically, this CL implements a new type-checking pass that loads and
stores export data, cross references, serialized diagnostics, and method
set indexes in the file system. Concurrent type-checking passes may
share in-progress work, but after type-checking only active packages are
kept in memory. Consequently, there can be no global relationship
between type-checked packages. The work to break any dependence on
global relationships was done over a long time leading up to this CL.

In order to approach the previous type-checking performance, the
following new optimizations are made:
 - the global FileSet is completely removed: repeatedly importing from
   export data resulted in a tremendous amount of unnecessary token.File
   information, and so FileSets had to be scoped to packages
 - files are parsed as a batch and stored in the LRU cache implemented
   in the preceding CL
 - type-checking is also turned into a batch process, so that
   overlapping nodes in the package graph may be shared during large
   type-checking operations such as the initial workspace load

This new execution model enables several simplifications:
 - We no longer need to trim the AST before type-checking:
   TypeCheckMode and ParseExported are gone.
 - We no longer need to do careful bookkeeping around parsed files: all
   parsing uses the LRU parse cache.
 - It is no longer necessary to estimate cache heap usage in debug
   information.

There is still much more to do. This new model for gopls's execution
requires significant testing and experimentation. There may be new bugs
in the complicated new algorithms that enable this change, or bugs
related to the new reliance on export data (this may be the first time
export data for packages with type errors is significantly exercised).
There may be new environments where the new execution model does not
have the same beneficial effect. (On the other hand, there may be
some where it has an even more beneficial effect, such as resource
limited environments like dev containers.) There are also a lot of new
opportunities for optimization now that we are no longer tied to a rigid
structure of in-memory data.

*Furthermore, the following planned work is simply not done yet:
 - Implement precise pruning based on "deep" hash of imports.
 - Rewrite unimported completion, now that we no longer have cached
   import paths.

For #57987

Change-Id: Iedfc16656f79e314be448b892b710b9e63f72551
Reviewed-on: https://go-review.googlesource.com/c/tools/+/466975
Run-TryBot: Robert Findley <rfindley@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopls-CI: kokoro <noreply+kokoro@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
findleyr committed Mar 3, 2023
1 parent ae05609 commit 21d2256
Showing 57 changed files with 1,218 additions and 1,652 deletions.
33 changes: 26 additions & 7 deletions gopls/internal/lsp/cache/analysis.go
@@ -11,6 +11,7 @@ import (
"context"
"crypto/sha256"
"encoding/gob"
"encoding/json"
"errors"
"fmt"
"go/ast"
@@ -1155,14 +1156,18 @@ func mustDecode(data []byte, ptr interface{}) {
}
}

// -- data types for serialization of analysis.Diagnostic --
// -- data types for serialization of analysis.Diagnostic and source.Diagnostic --

type gobDiagnostic struct {
Location protocol.Location
Category string
Severity protocol.DiagnosticSeverity
Code string
CodeHref string
Source string
Message string
SuggestedFixes []gobSuggestedFix
Related []gobRelatedInformation
Tags []protocol.DiagnosticTag
}

type gobRelatedInformation struct {
@@ -1171,8 +1176,16 @@ type gobRelatedInformation struct {
}

type gobSuggestedFix struct {
Message string
TextEdits []gobTextEdit
Message string
TextEdits []gobTextEdit
Command *gobCommand
ActionKind protocol.CodeActionKind
}

type gobCommand struct {
Title string
Command string
Arguments []json.RawMessage
}

type gobTextEdit struct {
@@ -1218,11 +1231,17 @@ func toGobDiagnostic(posToLocation func(start, end token.Pos) (protocol.Location
if err != nil {
return gobDiagnostic{}, err
}

return gobDiagnostic{
Location: loc,
Category: diag.Category,
Location: loc,
// Severity for analysis diagnostics is dynamic, based on user
// configuration per analyzer.
// Code and CodeHref are unset for Analysis diagnostics,
// TODO(rfindley): set Code fields if/when golang/go#57906 is accepted.
Source: diag.Category,
Message: diag.Message,
Related: related,
SuggestedFixes: fixes,
Related: related,
// Analysis diagnostics do not contain tags.
}, nil
}
113 changes: 2 additions & 111 deletions gopls/internal/lsp/cache/cache.go
@@ -6,13 +6,7 @@ package cache

import (
"context"
"fmt"
"go/ast"
"go/token"
"go/types"
"html/template"
"reflect"
"sort"
"strconv"
"sync/atomic"

@@ -29,22 +23,15 @@ import (
// Both the fset and store may be nil, but if store is non-nil so must be fset
// (and they must always be used together), otherwise it may be possible to get
// cached data referencing token.Pos values not mapped by the FileSet.
func New(fset *token.FileSet, store *memoize.Store) *Cache {
func New(store *memoize.Store) *Cache {
index := atomic.AddInt64(&cacheIndex, 1)

if store != nil && fset == nil {
panic("non-nil store with nil fset")
}
if fset == nil {
fset = token.NewFileSet()
}
if store == nil {
store = &memoize.Store{}
}

c := &Cache{
id: strconv.FormatInt(index, 10),
fset: fset,
store: store,
memoizedFS: &memoizedFS{filesByID: map[robustio.FileID][]*DiskFile{}},
}
@@ -56,8 +43,7 @@ func New(fset *token.FileSet, store *memoize.Store) *Cache {
// TODO(rfindley): once fset and store need not be bundled together, the Cache
// type can be eliminated.
type Cache struct {
id string
fset *token.FileSet
id string

store *memoize.Store

@@ -90,98 +76,3 @@ var cacheIndex, sessionIndex, viewIndex int64

func (c *Cache) ID() string { return c.id }
func (c *Cache) MemStats() map[reflect.Type]int { return c.store.Stats() }

type packageStat struct {
id PackageID
mode source.ParseMode
file int64
ast int64
types int64
typesInfo int64
total int64
}

func (c *Cache) PackageStats(withNames bool) template.HTML {
var packageStats []packageStat
c.store.DebugOnlyIterate(func(k, v interface{}) {
switch k.(type) {
case packageHandleKey:
v := v.(typeCheckResult)
if v.pkg == nil {
break
}
typsCost := typesCost(v.pkg.types.Scope())
typInfoCost := typesInfoCost(v.pkg.typesInfo)
stat := packageStat{
id: v.pkg.id,
mode: v.pkg.mode,
types: typsCost,
typesInfo: typInfoCost,
}
for _, f := range v.pkg.compiledGoFiles {
stat.file += int64(len(f.Src))
stat.ast += astCost(f.File)
}
stat.total = stat.file + stat.ast + stat.types + stat.typesInfo
packageStats = append(packageStats, stat)
}
})
var totalCost int64
for _, stat := range packageStats {
totalCost += stat.total
}
sort.Slice(packageStats, func(i, j int) bool {
return packageStats[i].total > packageStats[j].total
})
html := "<table><thead><td>Name</td><td>total = file + ast + types + types info</td></thead>\n"
human := func(n int64) string {
return fmt.Sprintf("%.2f", float64(n)/(1024*1024))
}
var printedCost int64
for _, stat := range packageStats {
name := stat.id
if !withNames {
name = "-"
}
html += fmt.Sprintf("<tr><td>%v (%v)</td><td>%v = %v + %v + %v + %v</td></tr>\n", name, stat.mode,
human(stat.total), human(stat.file), human(stat.ast), human(stat.types), human(stat.typesInfo))
printedCost += stat.total
if float64(printedCost) > float64(totalCost)*.9 {
break
}
}
html += "</table>\n"
return template.HTML(html)
}

func astCost(f *ast.File) int64 {
if f == nil {
return 0
}
var count int64
ast.Inspect(f, func(_ ast.Node) bool {
count += 32 // nodes are pretty small.
return true
})
return count
}

func typesCost(scope *types.Scope) int64 {
cost := 64 + int64(scope.Len())*128 // types.object looks pretty big
for i := 0; i < scope.NumChildren(); i++ {
cost += typesCost(scope.Child(i))
}
return cost
}

func typesInfoCost(info *types.Info) int64 {
// Most of these refer to existing objects, with the exception of InitOrder, Selections, and Types.
cost := 24*len(info.Defs) +
32*len(info.Implicits) +
256*len(info.InitOrder) + // these are big, but there aren't many of them.
32*len(info.Scopes) +
128*len(info.Selections) + // wild guess
128*len(info.Types) + // wild guess
32*len(info.Uses)
return int64(cost)
}
