Implement basic cross function analysis #57

mlevesquedion · 2020-08-21T15:16:59Z

This pull request implements cross-function analysis. Note that the analyzer that it introduces is not integrated with levee at the moment.

I believe that this new functionality will eventually allow us to handle all the test cases in #50, and more.

What do I mean by cross-function analysis?

Cross-function analysis is the ability to follow the flow of values through function calls. In order to do this, my implementation analyzes the bodies of functions in order to determine what they do with their arguments. When analyzing the body of a function, the effect of a call to a previously analyzed function can therefore be understood in detail. For example, if we call a function with a tainted value in its 1st argument, we can ask the function whether its 1st argument reaches a sink. If it does, then we could e.g. produce a report. We may also be interested in knowing whether calling a function with a tainted value means that any of its return values are tainted. If that is the case, then we will likely want to investigate those return values.

Tests pass
Appropriate changes to README are included in PR

PurelyApplied

I'm not sure any approach which reaches across pass.Pkg boundaries is going to be viable.

PurelyApplied · 2020-08-21T16:25:23Z

internal/pkg/cfa/analyzer.go

+	FactTypes:  []analysis.Fact{new(funcFact)},
+}
+
+var pkgDenylist = []string{


A comment here, please.

Done.

Some info in addition to the comment: most of these packages are unlikely to contain functions that most code is going to need. But the big issue is that for most of them I ran into weird bugs. I think investigating these bugs is out of the scope of the current PR, and that this is an acceptable workaround in the meantime.

(I think this info does not belong in the comment, so I am putting it here).

Handling cycles in the call graph + Some other fixes = No longer need the denylist! 🎉

PurelyApplied · 2020-08-21T16:26:32Z

internal/pkg/cfa/analyzer.go

+)
+
+// ResultType is a mapping from types.Object to cfa.Function
+type ResultType = Functions


I don't see the purpose of this alias. What does this do that using Functions directly doesn't?

I think ResultType and Functions serve different, but related purposes. On its own, ResultType doesn't tell us much. But for the user of the analyzer, it is convenient to be able to get the result by using pass.ResultOf[cfa.Analyzer].(*cfa.ResultType). However, once the result has been obtained, Functions is a more helpful type name.

For context, we are using this pattern in the fieldpropagators and fieldtags analyzers as well.

Since we are no longer using that pattern anywhere else, and it doesn't buy us much, I've removed the alias.

PurelyApplied · 2020-08-21T16:44:08Z

internal/pkg/cfa/analyzer.go

+	// this function is in a denylisted package; exporting a fact on another package's objects is an error
+	// some functions do not have objects
+	if fn.Pkg.Pkg != pass.Pkg || fn.Object() == nil {


This has no apparent relationship to the denylist.

The real problem here is that analysis pass is not meant to extend past the boarder of a given package. From the documentation:

A Pass describes a single unit of work: the application of a particular Analyzer to a particular package of Go code.

I agree that the relationship with the denylist is not clear. Additional comments may be needed?

This check is here to avoid precisely what you are warning against.

When we encounter a function call, we want to know how it behaves, so we need to see if it has been analyzed in the past. If it hasn't been analyzed, we need to analyze it. At this point, if we realize that the function 1. has not been analyzed, and 2. is in a different package, then in fact we should not analyze it, and the only way that this situation can happen (with the current implementation) is that the function we are interested in was defined in a denylisted package.

Should I expand the comment to give this additional context?

No more denylist ⇒ this cannot happen anymore, so I've removed the check.

PurelyApplied · 2020-08-21T16:46:46Z

internal/pkg/cfa/cfa.go

+	// Does this function's nth argument reach a sink?
+	Sinks(arg int) bool
+
+	// If this argument is tainted, which return values are tainted?


Reference arguments could also be tainted, yes?

Good point. The current implementation does not support that.

PurelyApplied · 2020-08-21T16:50:11Z

internal/pkg/cfa/testdata/src/example.com/tests/tests.go

+	"example.com/core"
+)
+
+func OneParamSinkWrapper(a interface{}) { // want OneParamSinkWrapper:"genericFunc{ sinks: \\[0\\], taints: \\[\\[\\]\\] }"


While it's not how Go represents arrays, if the String() on the fact reported something like sinks: a, b, c; taints: d, e, f, we wouldn't have to escape so much in the regexp here.

True.

The current approach sacrifices readability of the want comment for simplicity of the implementation. While the comment would be much more readable if done in the way you describe, we would need to carry around some extra data in the struct: the names of the parameters. While it's not a big extra load to carry, ultimately it would only be used for testing and I'm not sure it's worth it. For the taints field I actually don't see how we could use what you suggest. We need some kind of collection for each return value, and return values often do not have names.

Would genericFunc{ sinks: {0}, taints: {{}} } be any better? 🤔

(Or maybe even genericFunc{ sinks: (0), taints: (()) }.)

As it turns out neither of the options I propose are any more practical, because () is interpreted as a regex group and {} is interpreted as a regex quantifier.

I've changed the [] to <> (angle brackets). WDYT?

PurelyApplied · 2020-08-21T16:51:59Z

internal/pkg/cfa/testdata/src/example.com/tests/tests.go

+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package cfa


The visited should track it, but we'll want tests that cover a cycle in the callgraph.

I have added such a test (a recursive function) and to my surprise, it failed (stack overflow)! In retrospect, it is obvious why the test is failing: when we try to analyze the function, we end up on a call to itself, discover that it has not been analyzed, and start a new analysis...

Simple recursive functions should work now.

Corecursive functions still trip us up, though. I have an idea for how to deal with this, but this will likely require broader changes. Should I hold off until a future PR?

All kinds of cycles are handled now, though I suspect there are cases where the analysis may be incomplete or too liberal (biased towards false positives). I think ultimately handling cycles properly may require us to discover cycles in advance and perform several rounds of analysis on each cycle, e.g. in a first pass you do not traverse through calls to a function in the cycle, then you re-do the analysis using the knowledge acquired in the first pass, and you repeat until you reach a fixed point. This may not be the case though, a future PR will be needed to clear this up.

PurelyApplied · 2020-08-21T16:55:45Z

I believe that this new functionality will eventually allow us to handle all the test cases in #50 with ease, and more.

Is this meant to subsume #50, then? You should close that PR if that is the case.

mlevesquedion · 2020-08-21T17:21:15Z

I'm not sure any approach which reaches across pass.Pkg boundaries is going to be viable.

It may not be obvious, but it doesn't. It imports facts from Objects defined in other Pkgs, which is ok and what facts are intended for.

Is this meant to subsume #50, then? You should close that PR if that is the case.

Good question. At the moment, no. If we reach a consensus that this is a better way forward, then I will close #50. But until then, this is just an alternative approach.

mlevesquedion · 2020-08-27T16:13:21Z

I found a way to rewrite the function-visiting code so that it is much more straightforward and no longer requires a Results method on genericFunc.

I don't understand why the tests are failing in the CI. Everything works fine locally. Currently investigating.

mlevesquedion · 2020-08-27T16:44:51Z

~~The failure is due to a bug in config.IsSinkFunction that was merged yesterday. The failures will go away once the fix is merged: #69.~~

The PR has been merged and the failures are gone.

mlevesquedion · 2020-11-19T22:37:47Z

Closing because of staleness. This will likely be revisited in the future, but right now isn't a priority.

mlevesquedion force-pushed the cfa branch 8 times, most recently from 263e557 to d2d4c30 Compare August 21, 2020 15:45

implement cfa

e8303a1

mlevesquedion force-pushed the cfa branch from d2d4c30 to e8303a1 Compare August 21, 2020 16:01

mlevesquedion changed the title ~~WIP: Implement cross function analysis~~ Implement cross function analysis Aug 21, 2020

mlevesquedion requested review from PurelyApplied and immutableT August 21, 2020 16:05

PurelyApplied requested changes Aug 21, 2020

View reviewed changes

mlevesquedion added 11 commits August 21, 2020 13:56

address review comments

910aa01

change a word for clarity

cf6e42d

remove extraneous comma

7ebcf57

use angle brackets to avoid having to escape in want regex

5ff402b

handle cycles

4bcb713

add cross package test

1eae20e

remove denylist

f956dc8

split tests for clarity

99fefd6

rename file

e7c7f6f

no longer need to guard against external pkg exports

28f18fd

add missing license header

b9abcd8

This was referenced Aug 24, 2020

Introduce function abstraction for cross-function analysis #58

Closed

Add tests for cross-function analysis #59

Merged

refactor, add tests

63b701e

mlevesquedion added 2 commits August 27, 2020 11:45

streamline func visiting code

e81239c

change variable name

ccc0c0b

Merge remote-tracking branch 'upstream/master' into cfa

2190a91

mlevesquedion added 5 commits August 27, 2020 12:59

remove unnecessary part of config

324cc16

Merge remote-tracking branch 'upstream/master' into cfa

1b0fedc

resolve conflicts

af762c6

Merge branch 'master' into cfa

efb9c60

remove extraneous type alias

4246112

mlevesquedion changed the title ~~Implement cross function analysis~~ Implement basic cross function analysis Sep 10, 2020

PurelyApplied mentioned this pull request Sep 10, 2020

Adopt some manner of cross-function labeling to identify interprocedural propagation into non-source types. #99

Open

mlevesquedion mentioned this pull request Sep 11, 2020

Adopt pointer or other package to identify passing sources as interface parameters. #98

Open

mlevesquedion marked this pull request as draft October 25, 2020 15:17

mlevesquedion closed this Nov 19, 2020

mlevesquedion mentioned this pull request Dec 29, 2020

Cleanup: Delete unused "CFA" package #244

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement basic cross function analysis #57

Implement basic cross function analysis #57

mlevesquedion commented Aug 21, 2020 •

edited

PurelyApplied left a comment

PurelyApplied Aug 21, 2020

mlevesquedion Aug 21, 2020

mlevesquedion Aug 24, 2020 •

edited

PurelyApplied Aug 21, 2020

mlevesquedion Aug 21, 2020

mlevesquedion Sep 8, 2020

PurelyApplied Aug 21, 2020

mlevesquedion Aug 21, 2020 •

edited

mlevesquedion Aug 24, 2020

PurelyApplied Aug 21, 2020

mlevesquedion Aug 21, 2020

PurelyApplied Aug 21, 2020

mlevesquedion Aug 21, 2020

mlevesquedion Aug 24, 2020

PurelyApplied Aug 21, 2020

mlevesquedion Aug 21, 2020 •

edited

mlevesquedion Aug 24, 2020

PurelyApplied commented Aug 21, 2020

mlevesquedion commented Aug 21, 2020

mlevesquedion commented Aug 27, 2020 •

edited

mlevesquedion commented Aug 27, 2020 •

edited

mlevesquedion commented Nov 19, 2020

Implement basic cross function analysis #57

Implement basic cross function analysis #57

Conversation

mlevesquedion commented Aug 21, 2020 • edited

What do I mean by cross-function analysis?

PurelyApplied left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mlevesquedion Aug 24, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mlevesquedion Aug 21, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mlevesquedion Aug 21, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PurelyApplied commented Aug 21, 2020

mlevesquedion commented Aug 21, 2020

mlevesquedion commented Aug 27, 2020 • edited

mlevesquedion commented Aug 27, 2020 • edited

mlevesquedion commented Nov 19, 2020

mlevesquedion commented Aug 21, 2020 •

edited

mlevesquedion Aug 24, 2020 •

edited

mlevesquedion Aug 21, 2020 •

edited

mlevesquedion Aug 21, 2020 •

edited

mlevesquedion commented Aug 27, 2020 •

edited

mlevesquedion commented Aug 27, 2020 •

edited