Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/internal/gcimporter: too much CPU wasted in token.(*File).SetLines called from fakeFileSet #46586

Closed
g7r opened this issue Jun 5, 2021 · 13 comments
Assignees

Comments

@g7r
Copy link
Contributor

@g7r g7r commented Jun 5, 2021

What version of Go are you using (go version)?

$ go version
go version devel go1.17-ba64592e0b Sat Jun 5 10:00:51 2021 +0300 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/sz/.cache/go-build"
GOENV="/home/sz/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/sz/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/sz/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/sz/src/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/sz/src/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="devel go1.17-ba64592e0b Sat Jun 5 10:00:51 2021 +0300"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2414470627=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I'm trying to deal with long go vet execution times for our codebase. My previous issue was #46564. Hopefully, its PR will be merged soon. The next CPU hotspot I've hit into is token.(*File).SetLines function. It is being called from fakeFileSet code https://github.com/golang/go/blob/master/src/cmd/compile/internal/importer/support.go#L39-L47 :

		// Allocate the fake linebreak indices on first use.
		// TODO(adonovan): opt: save ~512KB using a more complex scheme?
		fakeLinesOnce.Do(func() {
			fakeLines = make([]int, maxlines)
			for i := range fakeLines {
				fakeLines[i] = i
			}
		})
		f.SetLines(fakeLines)

We see that fakeLines int slice with contents like {0, 1, 2, ...} is being fed into SetLines. Here's SetLines implementation https://github.com/golang/go/blob/master/src/go/token/position.go#L178-L192 :

// SetLines sets the line offsets for a file and reports whether it succeeded.
// The line offsets are the offsets of the first character of each line;
// for instance for the content "ab\nc\n" the line offsets are {0, 3}.
// An empty file has an empty line offset table.
// Each line offset must be larger than the offset for the previous line
// and smaller than the file size; otherwise SetLines fails and returns
// false.
// Callers must not mutate the provided slice after SetLines returns.
//
func (f *File) SetLines(lines []int) bool {
	// verify validity of lines table
	size := f.size
	for i, offset := range lines {
		if i > 0 && offset <= lines[i-1] || size <= offset {
			return false
		}
	}

	// set lines table
	f.mutex.Lock()
	f.lines = lines
	f.mutex.Unlock()
	return true
}

So we have a rather large constant slice with 65536 elements being validated again and again in a hot code path.

SetLines function takes up to 60% of total CPU time in my benchmarks. I'm ready to implement a fix for that problem but I'm kind of limited with ideas. Is it possible to add a separate function like SetFakeLines that will use internal known to be valid line slice? Or maybe we don't need to call SetLines in fakeFileSet at all?

See attached profile: cpu.prof.gz
Screenshot: 2021-06-05-142104_1221x458_scrot

@seankhliao
Copy link
Contributor

@seankhliao seankhliao commented Jun 5, 2021

@cuonglm
Copy link
Member

@cuonglm cuonglm commented Jun 5, 2021

I'm surprise that cmd/compile involves here. IIRC, it should be go/internal/gcimporter instead https://github.com/golang/go/blob/master/src/go/internal/gcimporter/support.go#L43-L47

@g7r
Copy link
Contributor Author

@g7r g7r commented Jun 5, 2021

@cuonglm, my bad, you are right. I haven't realized that there are two very similar types with the same name in different packages. If you look at the attached CPU profile, you'll see my mistake. Sorry.

But it doesn't really matter. The hotspot looks exactly the same, just resides in a different file.

@josharian
Copy link
Contributor

@josharian josharian commented Jun 6, 2021

@g7r
Copy link
Contributor Author

@g7r g7r commented Aug 17, 2021

The problem still persists on Go 1.17. The profile taken on Go 1.17 shows that SetLines still eats too much CPU.

@g7r
Copy link
Contributor Author

@g7r g7r commented Aug 17, 2021

I can (and want) to implement a fix. The thing is that I don't know all the design considerations so I don't see the big picture.

I propose two local solutions for the problem both changing public API of token package:

  1. Add (*File).SetFakeLines() method that should assign [65536]int{0,1,2,...} to lines field without validation.
  2. Add (*FileSet).AddFakeFile() method that will initialize lines field to [65536]int{0,1,2,...}.

What do you think?

@dominikh
Copy link
Member

@dominikh dominikh commented Oct 19, 2021

I don't think go/token should inherit the notion of "fake files". If we want to add new API, it should probably just be a faster SetLines that skips the sanity check.

As a somewhat dirty alternative, gcimporter could break out the unsafe hammer and set the field directly. Staticcheck has been doing that for over a year (dominikh/go-tools@941932f) at no ill effect.

@findleyr
Copy link
Contributor

@findleyr findleyr commented Oct 20, 2021

Another alternative might be to not set lines that we don't need

  • fake the calculation of token.Pos during import, since it's trivial
  • keep track of the largest line value seen for each file, and then only set fakeLines[:maxline] at the end of importing.

Not the cleanest fix but I think it would reduce the CPU cost significantly without requiring any API changes or use of unsafe.

@findleyr findleyr changed the title cmd/compile: too much CPU wasted in token.(*File).SetLines called from fakeFileSet go/internal/gcimporter: too much CPU wasted in token.(*File).SetLines called from fakeFileSet Oct 20, 2021
@findleyr findleyr self-assigned this Oct 20, 2021
@gopherbot
Copy link

@gopherbot gopherbot commented Oct 20, 2021

Change https://golang.org/cl/357291 mentions this issue: go/internal/gcimporter: avoid setting unnecessary lines in fakeFileSet

@findleyr
Copy link
Contributor

@findleyr findleyr commented Oct 20, 2021

CL 357291 reduces the execution time of TestImportStdLib by around 50% in testing, which roughly aligns with the pprof results above. Removing the safety checks in token.File.SetLines doesn't show a significant additional improvement, so I think this CL is probably good enough.

(note that this CL is in x/tools, by convenience, but I'll update the standard library importer as well).

@gopherbot gopherbot closed this in cbf1d01 Oct 20, 2021
@gopherbot
Copy link

@gopherbot gopherbot commented Oct 20, 2021

Change https://golang.org/cl/357429 mentions this issue: go/internal/gcimporter: avoid setting unnecessary lines in fakeFileSet

gopherbot pushed a commit that referenced this issue Oct 20, 2021
This is a clean port of CL 357291 from x/tools.

For #46586

Change-Id: Ib22087ae7fe8477d368acd230b263b83cdad4d36
Reviewed-on: https://go-review.googlesource.com/c/go/+/357429
Trust: Robert Findley <rfindley@google.com>
Run-TryBot: Robert Findley <rfindley@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
@findleyr
Copy link
Contributor

@findleyr findleyr commented Oct 20, 2021

@g7r thank you for the detailed report, and for offering to fix this.

Also let me acknowledge that getting little feedback for months and then having someone send a fix is probably an unsatisfying interaction. Sorry about that. Our attentions are sliced pretty thin for this cycle and while I saw this issue go by, I didn't really have time to think about it until it popped up this week in another context. If there didn't happen to be a workaround, I think we'd have welcomed a proposal.

Thanks again for your help getting this fixed.

@g7r
Copy link
Contributor Author

@g7r g7r commented Oct 20, 2021

@findleyr, the problem is going to be fixed after all. That's the reason I've made this report. Thanks for the effort!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
7 participants