Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

report suspicious paths through call-graph #31

Closed
kalexmills opened this issue Nov 29, 2020 · 9 comments
Closed

report suspicious paths through call-graph #31

kalexmills opened this issue Nov 29, 2020 · 9 comments
Labels
enhancement New feature or request vet-bot the issue is part of VetBot's responsibility
Milestone

Comments

@kalexmills
Copy link
Contributor

kalexmills commented Nov 29, 2020

There's no reason we cannot report the suspicious call-graph paths the tool is using to make its assessment(s). This would require storing those traces at the BFS.

As a first step, I intend to get a minimal amount of callgraph information in place -- just reporting what currently exists on the stack of the callgraph whenever the BFS hits a triggering node.

I'll link source information in as a second step.

@kalexmills kalexmills added enhancement New feature or request vet-bot the issue is part of VetBot's responsibility labels Nov 29, 2020
@aclements
Copy link

I think this would be really helpful. I was just looking at github-vet/rangeloop-pointer-findings#4957, and it's quite difficult to trace down the code that supposedly starts a goroutine. I'd looked at another one earlier that was similar and I eventually gave up.

Could the tool report if there's only a single path or multiple paths to avoid false assurance? Or maybe in the case of a function starting a goroutine, it could just point to where the go statement is. In general, if there are multiple reasons why a range capture is suspect, does it report all of them?

@kalexmills
Copy link
Contributor Author

kalexmills commented Dec 21, 2020

I think this would be really helpful. I was just looking at github-vet/rangeloop-pointer-findings#4957, and it's quite difficult to trace down the code that supposedly starts a goroutine. I'd looked at another one earlier that was similar and I eventually gave up.

I completely agree.

In general, if there are multiple reasons why a range capture is suspect, does it report all of them?

Currently, it stops at the first issue found. There's no reason this is required, though. I'm going to open a separate ticket for it (EDIT: see #56).

Could the tool report if there's only a single path or multiple paths to avoid false assurance? Or maybe in the case of a function starting a goroutine, it could just point to where the go statement is.

This would be a little tricky because we're only relying on syntactic information in the approximate call graph. It is totally doable, provided the community adjusts their expectations slightly. There are at least two ways to do this.

Option 1

Just report a bad path taken through the call graph. The call graph only stores a function's name and arity (i.e. {foo 1} for a function of one argument named 'foo'), so your output will look a bit like this.

 {foo 2} -> {doSomething 4} -> {bar 1} -> {startAGoRoutine 1}

This could be implemented as a first step, but it's not very user-friendly, and it doesn't do enough to report where in the source the user should be looking.

Option 2

Enrich the call graph with the source code positions of the function declarations (if they are not already there). Report each 'path' through the callgraph as a DAG by mapping each node from the above onto its list of possible source positions.

Suppose there are multiple bar functions with arity 1 defined in the source code. Your output with Option 2 might look something like this.

 foo (a.go:32)  -> doSomething (util.go:612)  -> [ bar (bar.go:232),         ->   startAGoRoutine (widget.go:23)
                                                   bar (other-bar.go:124) ] 

There are two possible places where the bar call could be coming from. Right now VetBot has no way to know which path is taken, but it can tell you where to look.

The output would be slightly more Byzantine (the above certainly not the best presentation) but at least it makes the tool explain its reasoning, and would function as a vital sanity check.

Committing to a DAG representation would also allow us to report the presence of multiple possible paths through the callgraph that may lead to a Goroutine.

On the other hand, it is quite possible that in large, highly-coupled codebases the output would provide us with large DAGs to sift through. At the very least it would tell us which issues are going to take much more time to analyze.

In the case of these larger DAGs I don't think there is a way around the potential complexity without using a more accurate callgraph (the best way would be to use type information but I've found that challenging for reasons we have already discussed).

@kalexmills
Copy link
Contributor Author

kalexmills commented Dec 21, 2020

@aclements one other thing to be aware of is that the issue that you're looking at may be a false-positive (we accept false-positives, it is part of why we're using crowdsourcing).

Any of the calls into third-party packages that VetBot finds along the way through the callgraph are flagged as suspicious, both as 'starting a goroutine' and 'possibly writing a variable'.

As I am writing this, I've realized that it's incredibly misleading to label these as 'function starts a goroutine'. It would be much, much better to have VetBot report that the function has called into a third-party library whose source VetBot couldn't view.

EDIT: I've opened #57 to address this.

EDIT#2: After looking at the nogofunc analyzer again, I'm no longer certain the above is necessarily the case. I don't think what you're seeing in github-vet/rangeloop-pointer-findings#4957 is a false-positive. If everything is right with the world, it should be the case that there's a path from the rangeloop to something that calls a goroutine. I'll work on getting VetBot to explain itself so folks don't have to search through potentially huge callgraphs on their own.

@kalexmills
Copy link
Contributor Author

kalexmills commented Dec 22, 2020

The hotfix I'm deploying tonight may be... er... less than helpful. It's outputting a lot of basic data in some cases which I will need to present differently in a later change.

I still want to get it running so it can surface other bugs (if they exist).

@kalexmills
Copy link
Contributor Author

kalexmills commented Dec 23, 2020

@aclements you may well have found a bug. So thank you for your help in testing! Also, sorry. I hope you did not waste too much time.

In adding this hotfix, I've included a sanity check where we abort and log in case we can't find a path back through the callgraph to a function we know starts a goroutine.

I am seeing it triggering in the logs. It's entirely possible that the issue you picked up was a false-positive due to a defect (see #63).

EDIT: you can see an example issue here with the (nearly useless) paths found through the callgraph being logged below. There is more that can (and should) be done to clean up the output, but hopefully it will be useful to further verify the tool's correctness.

EDIT 2: We might be learning how badly the approximate callgraph approximates the actual callgraph in some codebases.

EDIT 3: OTOH, the feature seems really helpful for this issue here.

@kalexmills
Copy link
Contributor Author

kalexmills commented Dec 23, 2020

Issues seen in this repo after #6556 (i.e. #6557 and later) should now contain evidence of the path taken through the callgraph to find the goroutine.

Doing the same thing for the 'function may a store reference to a pointer' type findings could be a bit more involved.

@kalexmills
Copy link
Contributor Author

kalexmills commented Dec 23, 2020

🤔 integrating graphviz may not be a completely terrible idea.

EDIT: #72

@kalexmills
Copy link
Contributor Author

kalexmills commented Dec 25, 2020

Even once this lands, the output can still be very confusing, which points to the necessity of #64 (and perhaps type-checking, at least for these instances).

Just look what happens when the kubernetes repository finds a pointer passed to the generically-named Add function.
github-vet/rangeloop-pointer-findings#8056

The resulting callgraph is monstrous -- even viewed as a graph! (EDIT: see #93 for one reason why; it's not just the size of the repo 😅).

My next plan is to get a table from signatures to links to function definitions found in the source, to assist in the hunt. I don't think it will help in cases like this, but could still be very helpful.

@kalexmills
Copy link
Contributor Author

kalexmills commented Dec 26, 2020

The basic version of this should be done and merged. If we decide later that something like Option 2 is needed, we can use a separate ticket (see #111). So far I haven't needed it; but I have only checked a few.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request vet-bot the issue is part of VetBot's responsibility
Projects
None yet
Development

No branches or pull requests

2 participants