Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic: unable to type check #416

Closed
martin-sucha opened this issue May 11, 2020 · 16 comments · Fixed by #425
Closed

panic: unable to type check #416

martin-sucha opened this issue May 11, 2020 · 16 comments · Fixed by #425

Comments

@martin-sucha
Copy link

Describe the bug
We've seen a transient panic in revive in our CI job (it did not occur when the job was retried).

revive -exclude vendor/... -formatter friendly -config .revive.toml ./...
 panic: unable to type check [redacted file A]:[redacted file A]:3:8: could not import [redacted pkg B] (can't find import: [redacted pkg B])
 goroutine 123 [running]:
 github.com/mgechev/revive/rule.(*EmptyBlockRule).Apply(0x9e06d0, 0xc000f70b80, 0x0, 0x0, 0x0, 0x9e06d0, 0x0, 0x0)
    /go/src/github.com/mgechev/revive/rule/empty-block.go:23 +0x20e
 github.com/mgechev/revive/lint.(*File).lint(0xc000f70b80, 0xc0003b6800, 0x27, 0x40, 0x0, 0x3fd0000000000000, 0xc000030a40, 0x7, 0xc00007dfb0, 0x1, ...)
    /go/src/github.com/mgechev/revive/lint/file.go:108 +0x3b6
 github.com/mgechev/revive/lint.(*Package).lint.func1(0xc0003b6800, 0x27, 0x40, 0x0, 0x3fd0000000000000, 0xc000030a40, 0x7, 0xc00007dfb0, 0x1, 0x1, ...)
    /go/src/github.com/mgechev/revive/lint/package.go:173 +0xb5
 created by github.com/mgechev/revive/lint.(*Package).lint
    /go/src/github.com/mgechev/revive/lint/package.go:172 +0x179

panic("unable to type check " + file.Name + ":" + err.Error())

To Reproduce

I couldn't reproduce the issue - it worked when I retried the job. Could it have been caused by some network issue when getting the package from our internal git repo?

Expected behavior
Expected to see better error message describing why it couldn't find the import.

Desktop (please complete the following information):

  • Version of Go: 1.14.2
@chavacava
Copy link
Collaborator

Hi @martin-sucha, thanks for filling the issue.
Since yesterday, the rule empty-block needs to type check the code before analyse it (see #386)
When type check fails, we do not have other information than the one we print in the error message.

To alleviate the problem, we could:

  1. avoid the panic, add a failure in the output of revive (see Run revive over a project with an invalid go source file #364) and do not analyse the file
  2. same of above but revive analyses the file (without type information)
  3. ignore the error (no related failure) and analyse the file

Personally, I prefer options 2 and 3. The last one been iso-functional with the previous version of the rule.

@martin-sucha
Copy link
Author

Okay, so I've seen another instance of the panic, indeed seems to be related to network since an unrelated job failed at the same time that clearly shown network error.

Could you please confirm whether the type check actually downloads data from network or not? If it doesn't I need to check whether we have a bug in other parts of the CI job (which should have reported it instead). If it does, I'd like to pinpoint why the error message does not state that - I see the can't find import error is returned in Go sources at https://github.com/golang/go/blob/810c27e9be647ce4da8930ff3625a856041ae5b2/src/go/internal/gcimporter/gcimporter.go#L115, but it looks like it just searches some directories.

As for the changes here, we could do option 3, but since we need the type information to skip empty block for channel draining, missing package could cause the rule to emit wrong message (empty block) instead of the real one (missing package), which would be even harder to debug - so I would prefer an option that reports the type check failure. If we could provide more information about why a package could not be found in the error, even better.

I've noticed that other call sites of TypeCheck always ignore the error - is there a particular reason why the other rules don't handle the error?

@mgechev
Copy link
Owner

mgechev commented May 13, 2020

Okay, so I've seen another instance of the panic, indeed seems to be related to network since an unrelated job failed at the same time that clearly shown network error.

I'd be surprised if the type checker uses network resources, this would introduce large latency.

I'm thinking if the completeness we get with type checking is worth it the performance penalty. Maybe instead of solving #386, we can just reduce the confidence level.

@chavacava @martin-sucha what do you think?

@chavacava
Copy link
Collaborator

@mgechev, I was thinking on something like that

If type-checking is not OK we can go on but reducing the confidence level of failures at for-range statements

@chavacava
Copy link
Collaborator

@martin-sucha wrote

I've noticed that other call sites of TypeCheck always ignore the error - is there a particular reason why the other rules don't handle the error?

It is a coherence problem we need to address

@mrVanboy
Copy link

Some more information from the investigation:

  1. I pulled a docker image of the job from the CI (it has only go mod download).
  2. Installed revive with GO111MODULE=off go get -u -v github.com/mgechev/revive.
  3. Run revive -exclude vendor/... -formatter friendly -config .revive.toml ./...

I've done 3rd step with and without network connection by applying docker network disconnect bridge CONTAINER_NAME. The results were the same and unstable:

After ~10 (this number changes randomly) successful runs, I got the similar panic mentioned by @martin-sucha. Packages in the panic messages were different (publicly available from github and private from our internal VCS).

Also, I tried to reproduce a similar flow with github.com/mgechev/revive repo without success, unfortunately.

Dockerfile where I tried to reproduce the error
FROM  golang:1.14.2@sha256:09b04534495af5148e4cc67c8ac55408307c2d7b9e6ce70f6e05f7f02e427f68

ADD https://github.com/mgechev/revive/archive/3119f5881c5dd864f346426752eb745d3eddb17d.tar.gz /tmp/revive.tar.gz

RUN     mkdir -p /revive &&\
        tar xf /tmp/revive.tar.gz -C /revive --strip 1 &&\
        rm -f /tmp/revive.tar.gz

WORKDIR /revive

RUN GO111MODULE=off go get -u -v github.com/mgechev/revive

@mgechev
Copy link
Owner

mgechev commented May 18, 2020

@chavacava I'd suggest to revert the fix which introduced type checking and reduce the confidence level. Looks like there were fewer cases when folks had issues with channel draining compared to the regression. WDYT?

@chavacava
Copy link
Collaborator

@mgechev I'm OK with reverting and reduce confidence (to how much?)
Another option is: try type checking, if fails apply the rule by lowering the confidence only on issues related to for-range (the single case where type info is used)
Let me know what do you prefer, I'll try to fix it tomorrow.

@mgechev
Copy link
Owner

mgechev commented May 20, 2020

The first option seems more deterministic. Alternatively, we can also match the most common channel training AST pattern, and still reduce with 0.1.

@martin-sucha
Copy link
Author

martin-sucha commented May 21, 2020

I see you removed the type check, which will fix the issue with panic.

However, the type checking might still be failing in the other rules as well, the error is just suppressed, so it would be good to find the root cause of this.

I managed to reproduce the panic (on version of code before you removed it from the empty-block rule) with just revive repo and it seems to happen more often than on the private repo we encountered the issue on. To reproduce, just checkout https://github.com/kiwicom/revive/commits/ms/issue-416-repro branch, build revive with make build and then run ./repro.sh. Alternately there is a version that records the run with Mozilla RR (go build -gcflags "all=-N -l" && ./repro-rr.sh) and a version that runs it in Docker (./repro-docker.sh)

Ultimately, I think we should find out whether the can't find import is caused by an issue in revive (e.g. not calling types.Config.Check correctly in some cases) or whether this is Go issue (and report it there, possibly with even smaller reproducer).

@martin-sucha
Copy link
Author

Also, I found out that there is a code path in https://github.com/golang/go/blob/dfd613e0e4fd93ef945e9fbd6d42b79bcaf73905/src/go/build/build.go#L1017 that spawns go subprocess, so I tried to add a global mutex around config.Check call (package.go:104), but that did not prevent the panic from happening.

@chavacava
Copy link
Collaborator

chavacava commented May 21, 2020

@martin-sucha thanks for your analysis and the tooling.
I've checked-out your branch and made some tests and, I have both, good and bad news. The good one: I think I've found (accidentally) a fix for the problem. The bad news: I do not yet understand the "fix".

Trying to get more information about the error, I've modified the configuration of the type-checker to:

config := &types.Config{
		// By setting a no-op error reporter, the type checker does as much work as possible.
		Error: func(err error) { panic(err.Error()) },
		//WAS Error:    func(error) {},
		Importer: newImporter(p.fset),
	}

Then I've used your script to reproduce the problem but... it works, no more panics
I've tested against K8S code-base and now revive is able to type-check it.

@chavacava
Copy link
Collaborator

Well further investigations show that setting Config with an Error function that does not include a call to panic will make type-checking to fail, contrary if the function calls panic the type-checking goes on.

@martin-sucha
Copy link
Author

Hmm, the behavior you described above is caused by

	defer func() {
		if r := recover(); r != nil {
			err, _ = r.(error)
			p = nil
			return
		}
	}()

in revive/lint/package.go check() - if the value that was recovered is anything else than an error, err is set to nil and the panic is recovered. (you called panic with err.Error(), which is a string, not an error)

If I change it to:

	defer func() {
		if r := recover(); r != nil {
			if r2, ok := r.(error); ok {
				err = r2
			} else {
				panic(r)
			}
			p = nil
			return
		}
	}()

then the panic is reported correctly.

@martin-sucha
Copy link
Author

Based on git history, the recover was added there to fix #59

@chavacava
Copy link
Collaborator

Based on git history, the recover was added there to fix #59

A good example on why it is never a good idea to recover from panics :)

I continue trying to find the root of the problem.
This issue golang/go#31821 suggest that this kind of problems could be related with the lack of support of modules in go/build.

The importer we use in the checker configuration (from gcexportdata.NewImporter) is based on go/build, its doc says:

Export data files are located using "go build" workspace conventions and the build.Default context.

I'll follow the recommendation of using go/packages to see if we can fix this pb
(side note: mgechev/dots also uses go/build)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants