Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/gopls: test failures with bad file number on solaris-amd64-oraclerel #58782

Open
gopherbot opened this issue Feb 28, 2023 · 16 comments
Open
Labels
gopls Issues related to the Go language server, gopls. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Solaris Tools This label describes issues relating to any tools in the x/tools repository.
Milestone

Comments

@gopherbot
Copy link

gopherbot commented Feb 28, 2023

#!watchflakes
post <- pkg ~ `^golang\.org/x/tools` && log ~ `bad file number`

Issue created automatically to collect these failures.

Example (log):

--- FAIL: TestDefinition (22.48s)
    integration_test.go:193: gopls definition a.go:7:2: exited with code 2, want success: true (gopls definition a.go:7:2: exit=2 stdout=<<./a.go:3:6-7: defined here as func f()>> stderr=<<panic: /opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/gopath/src/golang.org/x/tools/internal/bug/bug.go:56: internal error reading shared cache: RLock /opt/golang/.cache/gopls/18b62982/analysis/6b/6b316992556c8430def264891a46ec93579ccbc5189907222dcd2b789e667b3c: bad file number

        goroutine 2162 [running]:
        golang.org/x/tools/internal/bug.Report({0xc000048160, 0xaa}, 0x0)
        	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/gopath/src/golang.org/x/tools/internal/bug/bug.go:71 +0x468
        golang.org/x/tools/internal/bug.Errorf({0xf2a0f8?, 0x93ec461a8964f2de?}, {0xc002927c70?, 0x3c7b669e782bcd2d?, 0xee80be?})
        	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/gopath/src/golang.org/x/tools/internal/bug/bug.go:56 +0x3f
        golang.org/x/tools/gopls/internal/lsp/cache.analyzeImpl({0x116c950?, 0xc003282190}, 0xc000cf2280, {0xc001d3c350, 0x2, 0x2}, {0xc0007b0790, 0xc})
        	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/gopath/src/golang.org/x/tools/gopls/internal/lsp/cache/analysis.go:454 +0x774
        golang.org/x/tools/gopls/internal/lsp/cache.(*snapshot).analyze.func1({0x116c950?, 0xc003282190?}, {0xeda6a0?, 0xc000cf2280?})
        	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/gopath/src/golang.org/x/tools/gopls/internal/lsp/cache/analysis.go:339 +0x5a
        golang.org/x/tools/internal/memoize.(*Promise).run.func2.1()
        	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/gopath/src/golang.org/x/tools/internal/memoize/memoize.go:187 +0xa9
        runtime/trace.WithRegion({0x116c950?, 0xc003282190?}, {0xc0024f4180, 0x14}, 0xc0005e4f90)
        	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/go/src/runtime/trace/annotation.go:141 +0xe3
        golang.org/x/tools/internal/memoize.(*Promise).run.func2()
        	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/gopath/src/golang.org/x/tools/internal/memoize/memoize.go:180 +0x145
        created by golang.org/x/tools/internal/memoize.(*Promise).run
        	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/gopath/src/golang.org/x/tools/internal/memoize/memoize.go:179 +0x1ea
        >>)

watchflakes

@gopherbot gopherbot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 28, 2023
@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "golang.org/x/tools/gopls/internal/lsp/cmd/test" && test == "TestDefinition"
2023-02-27 20:01 solaris-amd64-oraclerel tools@b0fcf2a0 go@1362737f x/tools/gopls/internal/lsp/cmd/test.TestDefinition (log)
--- FAIL: TestDefinition (22.48s)
    integration_test.go:193: gopls definition a.go:7:2: exited with code 2, want success: true (gopls definition a.go:7:2: exit=2 stdout=<<./a.go:3:6-7: defined here as func f()>> stderr=<<panic: /opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/gopath/src/golang.org/x/tools/internal/bug/bug.go:56: internal error reading shared cache: RLock /opt/golang/.cache/gopls/18b62982/analysis/6b/6b316992556c8430def264891a46ec93579ccbc5189907222dcd2b789e667b3c: bad file number

        goroutine 2162 [running]:
        golang.org/x/tools/internal/bug.Report({0xc000048160, 0xaa}, 0x0)
        	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/gopath/src/golang.org/x/tools/internal/bug/bug.go:71 +0x468
        golang.org/x/tools/internal/bug.Errorf({0xf2a0f8?, 0x93ec461a8964f2de?}, {0xc002927c70?, 0x3c7b669e782bcd2d?, 0xee80be?})
        	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/gopath/src/golang.org/x/tools/internal/bug/bug.go:56 +0x3f
        golang.org/x/tools/gopls/internal/lsp/cache.analyzeImpl({0x116c950?, 0xc003282190}, 0xc000cf2280, {0xc001d3c350, 0x2, 0x2}, {0xc0007b0790, 0xc})
        	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/gopath/src/golang.org/x/tools/gopls/internal/lsp/cache/analysis.go:454 +0x774
        golang.org/x/tools/gopls/internal/lsp/cache.(*snapshot).analyze.func1({0x116c950?, 0xc003282190?}, {0xeda6a0?, 0xc000cf2280?})
        	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/gopath/src/golang.org/x/tools/gopls/internal/lsp/cache/analysis.go:339 +0x5a
        golang.org/x/tools/internal/memoize.(*Promise).run.func2.1()
        	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/gopath/src/golang.org/x/tools/internal/memoize/memoize.go:187 +0xa9
        runtime/trace.WithRegion({0x116c950?, 0xc003282190?}, {0xc0024f4180, 0x14}, 0xc0005e4f90)
        	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/go/src/runtime/trace/annotation.go:141 +0xe3
        golang.org/x/tools/internal/memoize.(*Promise).run.func2()
        	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/gopath/src/golang.org/x/tools/internal/memoize/memoize.go:180 +0x145
        created by golang.org/x/tools/internal/memoize.(*Promise).run
        	/opt/golang/tmp/workdir-host-solaris-oracle-amd64-oraclerel/gopath/src/golang.org/x/tools/internal/memoize/memoize.go:179 +0x1ea
        >>)

watchflakes

@gopherbot gopherbot added Tools This label describes issues relating to any tools in the x/tools repository. gopls Issues related to the Go language server, gopls. labels Feb 28, 2023
@gopherbot gopherbot added this to the Unreleased milestone Feb 28, 2023
@findleyr
Copy link
Contributor

CC @adonovan

@findleyr
Copy link
Contributor

findleyr commented Mar 2, 2023

I think we need to make a decision about on which OS we are going to support the new filecache logic.

I'd rather we use an in-memory fallback than completely stop supporting several OS that have unreliable FS behavior.

@adonovan
Copy link
Member

adonovan commented Mar 2, 2023

I think we need to make a decision about on which OS we are going to support the new filecache logic.

I argue for supporting POSIX (aix darwin linux {free,open,net}bsd illumos solaris) and Windows, but not Plan9. That means we disable any test that flakes on those OSs.

What about Android and iOS? Does even a single person use gopls on those platforms?

I'd rather we use an in-memory fallback than completely stop supporting several OS that have unreliable FS behavior.

I think we can rely on POSIX and Windows for the operations gopls needs. netbsd has been a little flaky on the builders recently, but I can't tell yet whether that's a problem with the kernel or just our builder. Let's keep an eye out for it. Ditto solaris.

I'd really rather not make a memory-based fallback.

@hyangah
Copy link
Contributor

hyangah commented Mar 8, 2023

I wonder if it makes more sense to disable gopls testing completely from the platforms we don't think gopls would work well, and make it clear what platforms are supported in the user documentation. (So, no false promise). My complaint about the platform-based skipped tests is it's hard to know from the builder state whether the functionality is actually working or not.

If there are users who want to use gopls from such excluded platforms, we need to ask to open issues (or proposal) with support plan/active contribution.

@adonovan
Copy link
Member

I wonder if it makes more sense to disable gopls testing completely from the platforms we don't think gopls would work well, and make it clear what platforms are supported in the user documentation. (So, no false promise). My complaint about the platform-based skipped tests is it's hard to know from the builder state whether the functionality is actually working or not.

Yes, that sounds like the right way to do it.

I couldn't tell from a glance at at https://cs.opensource.google/go/x/build/+/master:cmd/coordinator/buildstatus.go;l=813?q=gopls&ss=go%2Fx%2Fbuild how the set of (os, arch, package) test combinations is computed. Perhaps someone who knows this code better could point me in the right direction.

@findleyr
Copy link
Contributor

CC @golang/release

I'm afraid you may need to update x/build/dashboard.BuildConfig.buildsRepo to specialize to module (not just repo).
https://cs.opensource.google/go/x/build/+/master:dashboard/builders.go;l=783;drc=c5b0bb2e56eceb3121e2c3e66185e97f704733d6

Perhaps someone on the release team knows a better way.

@bcmills
Copy link
Member

bcmills commented Mar 14, 2023

FWIW, this particular test failure is on solaris, which is a POSIX platform and probably ought to be supported.

The EBADF error from the RLock operation comes from fcntl, which is used on aix and solaris and no other platforms (the rest all support flock, which has a simpler API).

The Solaris documentation for fcntl says:

EBADF
The fildes argument is not a valid open file descriptor; or the cmd argument is F_SETLK, F_SETLK64, F_SETLKW, or F_SETLKW64, the type of lock, l_type, is a shared lock (F_RDLCK), and fildes is not a valid file descriptor open for reading; or the type of lock l_type is an exclusive lock (F_WRLCK) and fildes is not a valid file descriptor open for writing.

Since fcntl is used on only a couple of platforms, it is possible that this failure is exposing a bug in our use of it.

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg ~ `^golang\.org/x/tools` && log ~ `bad file number`
2023-03-06 20:14 solaris-amd64-oraclerel tools@b72edd12 go@4a305be9 x/tools/gopls/internal/lsp/cmd/test.TestReferences (log)
--- FAIL: TestReferences (11.95s)
    integration_test.go:490: gopls references a.go:4:10: exited with code 2, want success: true (gopls references a.go:4:10: exit=2 stdout=<<>> stderr=<<2023/03/06 21:47:31 Error:2023/03/06 21:47:31 storing export data for math: Lock /opt/golang/.cache/gopls/4a243052/export/8d/8da63d17e89929b19e4cafc1feba1dff33fd8cd659c2da17cfeb787fc86c6f00: bad file number
        gopls: RLock /opt/golang/.cache/gopls/4a243052/export/aa/aafe5da2a89ea707ff749d5217c3db6b6f7ac8a5974a591c675c40711109ce83: bad file number
        >>)
    integration_test.go:491: gopls references a.go:4:10: stdout does not match [a.go:4:6-13]; got <<>>
    integration_test.go:492: gopls references a.go:4:10: stdout does not match [b.go:4:6-13]; got <<>>
2023-03-13 18:43 solaris-amd64-oraclerel tools@243a9484 go@b852f395 x/tools/gopls/internal/lsp/cmd/test.TestReferences (log)
--- FAIL: TestReferences (10.31s)
    integration_test.go:490: gopls references a.go:4:10: exited with code 2, want success: true (gopls references a.go:4:10: exit=2 stdout=<<>> stderr=<<gopls: RLock /opt/golang/.cache/gopls/0d935b09/export/5c/5c19d490bd139935740bf392e86bbde5fc4f527c4689a1481a4fc93b522e0516: bad file number
        >>)
    integration_test.go:491: gopls references a.go:4:10: stdout does not match [a.go:4:6-13]; got <<>>
    integration_test.go:492: gopls references a.go:4:10: stdout does not match [b.go:4:6-13]; got <<>>

watchflakes

@adonovan
Copy link
Member

I agree with @bcmills that the log proves this chain of calls:

lockedfile.Read
lockedfile.Open
lockedfile.OpenFile(RDONLY, 0)
openFile
filelock.RLock
lock (in filelock_fcntl.go)
setlkw (since the error is an RLock PathError with a message other than "inode for file changed")
syscall.FcntlFlock
libc_fcntl

It's not possible that f's finalizer is run during the the call into libc, as the setlkw call is followed by f.Name(), and also f was a key in the global inodes map.

But it's possible the syscall.Flock_t is getting garbage collected since the only reference to it is a uintptr passed to the syscall.

@bcmills
Copy link
Member

bcmills commented Mar 14, 2023

But it's possible the syscall.Flock_t is getting garbage collected since the only reference to it is a uintptr passed to the syscall.

syscall.FcntlFlock accepts the *Flock_t as a pointer, and only converts it to uintptr directly in the call to sysvicall6, which is implemented in assembly.

If I understand the discussion on #34684 correctly, the fact that sysvicall6 is implemented in assembly should cause the compiler and runtime to consider its uintptr arguments reachable for the duration of the call.

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg ~ `^golang\.org/x/tools` && log ~ `bad file number`
2023-03-14 19:39 solaris-amd64-oraclerel tools@537c4aa6 go@fbf4c04f x/tools/gopls/internal/lsp/cmd/test.TestReferences (log)
--- FAIL: TestReferences (9.65s)
    integration_test.go:490: gopls references a.go:4:10: exited with code 2, want success: true (gopls references a.go:4:10: exit=2 stdout=<<>> stderr=<<gopls: RLock /opt/golang/.cache/gopls/13b3e2cd/export/07/07477a18c5bba2369a1c976e137c269813b882f46ff3017436a0bbd1cfbfcc4c: bad file number
        >>)
    integration_test.go:491: gopls references a.go:4:10: stdout does not match [a.go:4:6-13]; got <<>>
    integration_test.go:492: gopls references a.go:4:10: stdout does not match [b.go:4:6-13]; got <<>>

watchflakes

@bcmills bcmills changed the title x/tools/gopls/internal/lsp/cmd/test: TestDefinition failures x/tools/gopls: test failures with bad file number on solaris-amd64-oraclerel Mar 14, 2023
@bcmills
Copy link
Member

bcmills commented Mar 14, 2023

If this were due to a bug in our use of FcntlFlock, I would expect the same failure mode to show up on the aix builder (which uses the same implementation). I don't see anything in gopls that specifically excludes that platform, so I'm more inclined to suspect a bug in either the Solaris filesystem or the Go standard library on the solaris platform.

(attn @golang/solaris)

@bcmills
Copy link
Member

bcmills commented Mar 14, 2023

On the other hand, it seems odd that we see this failure mode when testing x/tools/gopls but not cmd/go.

(Maybe the bug in the filesystem has to do with deleting a file concurrently with locking it? We almost never delete files in the Go module cache during tests.)

@suzmue suzmue modified the milestones: Unreleased, gopls/v0.12.0 Mar 16, 2023
@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg ~ `^golang\.org/x/tools` && log ~ `bad file number`
2023-03-16 15:43 solaris-amd64-oraclerel tools@6e5f4255 go@006b35c0 x/tools/gopls/internal/lsp/cmd/test.TestReferences (log)
--- FAIL: TestReferences (9.77s)
    integration_test.go:490: gopls references a.go:4:10: exited with code 2, want success: true (gopls references a.go:4:10: exit=2 stdout=<<>> stderr=<<gopls: RLock /opt/golang/.cache/gopls/fa8d3aab/export/da/da6f91af61aae44ebc294a2f70e6ee39e07fd7245959a8e872d6e832156c8ed1: bad file number
        >>)
    integration_test.go:491: gopls references a.go:4:10: stdout does not match [a.go:4:6-13]; got <<>>
    integration_test.go:492: gopls references a.go:4:10: stdout does not match [b.go:4:6-13]; got <<>>

watchflakes

@gopherbot
Copy link
Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg ~ `^golang\.org/x/tools` && log ~ `bad file number`
2023-03-16 15:43 solaris-amd64-oraclerel tools@6e5f4255 go@fa42da15 x/tools/gopls/internal/lsp/cmd/test.TestReferences (log)
--- FAIL: TestReferences (13.16s)
    integration_test.go:490: gopls references a.go:4:10: exited with code 2, want success: true (gopls references a.go:4:10: exit=2 stdout=<<>> stderr=<<gopls: RLock /opt/golang/.cache/gopls/7e80f41e/export/b2/b2aa9aca13567c4713d965eeb7be1ef33d79f2f034b0339067690f3953c63200: bad file number
        >>)
    integration_test.go:491: gopls references a.go:4:10: stdout does not match [a.go:4:6-13]; got <<>>
    integration_test.go:492: gopls references a.go:4:10: stdout does not match [b.go:4:6-13]; got <<>>

watchflakes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gopls Issues related to the Go language server, gopls. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Solaris Tools This label describes issues relating to any tools in the x/tools repository.
Projects
Status: No status
Development

No branches or pull requests

6 participants