New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement basic cross function analysis #57
Conversation
263e557
to
d2d4c30
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure any approach which reaches across pass.Pkg
boundaries is going to be viable.
internal/pkg/cfa/analyzer.go
Outdated
FactTypes: []analysis.Fact{new(funcFact)}, | ||
} | ||
|
||
var pkgDenylist = []string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment here, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Some info in addition to the comment: most of these packages are unlikely to contain functions that most code is going to need. But the big issue is that for most of them I ran into weird bugs. I think investigating these bugs is out of the scope of the current PR, and that this is an acceptable workaround in the meantime.
(I think this info does not belong in the comment, so I am putting it here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handling cycles in the call graph + Some other fixes = No longer need the denylist! 🎉
internal/pkg/cfa/analyzer.go
Outdated
) | ||
|
||
// ResultType is a mapping from types.Object to cfa.Function | ||
type ResultType = Functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the purpose of this alias. What does this do that using Functions
directly doesn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ResultType
and Functions
serve different, but related purposes. On its own, ResultType
doesn't tell us much. But for the user of the analyzer, it is convenient to be able to get the result by using pass.ResultOf[cfa.Analyzer].(*cfa.ResultType)
. However, once the result has been obtained, Functions
is a more helpful type name.
For context, we are using this pattern in the fieldpropagators
and fieldtags
analyzers as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are no longer using that pattern anywhere else, and it doesn't buy us much, I've removed the alias.
internal/pkg/cfa/analyzer.go
Outdated
// this function is in a denylisted package; exporting a fact on another package's objects is an error | ||
// some functions do not have objects | ||
if fn.Pkg.Pkg != pass.Pkg || fn.Object() == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has no apparent relationship to the denylist.
The real problem here is that analysis pass is not meant to extend past the boarder of a given package. From the documentation:
A Pass describes a single unit of work: the application of a particular Analyzer to a particular package of Go code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the relationship with the denylist is not clear. Additional comments may be needed?
This check is here to avoid precisely what you are warning against.
When we encounter a function call, we want to know how it behaves, so we need to see if it has been analyzed in the past. If it hasn't been analyzed, we need to analyze it. At this point, if we realize that the function 1. has not been analyzed, and 2. is in a different package, then in fact we should not analyze it, and the only way that this situation can happen (with the current implementation) is that the function we are interested in was defined in a denylisted package.
Should I expand the comment to give this additional context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No more denylist ⇒ this cannot happen anymore, so I've removed the check.
internal/pkg/cfa/cfa.go
Outdated
// Does this function's nth argument reach a sink? | ||
Sinks(arg int) bool | ||
|
||
// If this argument is tainted, which return values are tainted? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reference arguments could also be tainted, yes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. The current implementation does not support that.
"example.com/core" | ||
) | ||
|
||
func OneParamSinkWrapper(a interface{}) { // want OneParamSinkWrapper:"genericFunc{ sinks: \\[0\\], taints: \\[\\[\\]\\] }" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it's not how Go represents arrays, if the String()
on the fact reported something like sinks: a, b, c; taints: d, e, f
, we wouldn't have to escape so much in the regexp here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True.
The current approach sacrifices readability of the want
comment for simplicity of the implementation. While the comment would be much more readable if done in the way you describe, we would need to carry around some extra data in the struct: the names of the parameters. While it's not a big extra load to carry, ultimately it would only be used for testing and I'm not sure it's worth it. For the taints
field I actually don't see how we could use what you suggest. We need some kind of collection for each return value, and return values often do not have names.
Would genericFunc{ sinks: {0}, taints: {{}} }
be any better? 🤔
(Or maybe even genericFunc{ sinks: (0), taints: (()) }
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it turns out neither of the options I propose are any more practical, because ()
is interpreted as a regex group and {}
is interpreted as a regex quantifier.
I've changed the []
to <>
(angle brackets). WDYT?
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package cfa |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The visited
should track it, but we'll want tests that cover a cycle in the callgraph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added such a test (a recursive function) and to my surprise, it failed (stack overflow)! In retrospect, it is obvious why the test is failing: when we try to analyze the function, we end up on a call to itself, discover that it has not been analyzed, and start a new analysis...
Simple recursive functions should work now.
Corecursive functions still trip us up, though. I have an idea for how to deal with this, but this will likely require broader changes. Should I hold off until a future PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All kinds of cycles are handled now, though I suspect there are cases where the analysis may be incomplete or too liberal (biased towards false positives). I think ultimately handling cycles properly may require us to discover cycles in advance and perform several rounds of analysis on each cycle, e.g. in a first pass you do not traverse through calls to a function in the cycle, then you re-do the analysis using the knowledge acquired in the first pass, and you repeat until you reach a fixed point. This may not be the case though, a future PR will be needed to clear this up.
It may not be obvious, but it doesn't. It imports facts from
Good question. At the moment, no. If we reach a consensus that this is a better way forward, then I will close #50. But until then, this is just an alternative approach. |
I found a way to rewrite the function-visiting code so that it is much more straightforward and no longer requires a I don't understand why the tests are failing in the CI. Everything works fine locally. Currently investigating. |
The PR has been merged and the failures are gone. |
Closing because of staleness. This will likely be revisited in the future, but right now isn't a priority. |
This pull request implements cross-function analysis. Note that the analyzer that it introduces is not integrated with levee at the moment.
I believe that this new functionality will eventually allow us to handle all the test cases in #50, and more.
What do I mean by cross-function analysis?
Cross-function analysis is the ability to follow the flow of values through function calls. In order to do this, my implementation analyzes the bodies of functions in order to determine what they do with their arguments. When analyzing the body of a function, the effect of a call to a previously analyzed function can therefore be understood in detail. For example, if we call a function with a tainted value in its 1st argument, we can ask the function whether its 1st argument reaches a sink. If it does, then we could e.g. produce a report. We may also be interested in knowing whether calling a function with a tainted value means that any of its return values are tainted. If that is the case, then we will likely want to investigate those return values.