Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement basic cross function analysis #57

Closed
wants to merge 21 commits into from

Conversation

mlevesquedion
Copy link
Contributor

@mlevesquedion mlevesquedion commented Aug 21, 2020

This pull request implements cross-function analysis. Note that the analyzer that it introduces is not integrated with levee at the moment.

I believe that this new functionality will eventually allow us to handle all the test cases in #50, and more.

What do I mean by cross-function analysis?

Cross-function analysis is the ability to follow the flow of values through function calls. In order to do this, my implementation analyzes the bodies of functions in order to determine what they do with their arguments. When analyzing the body of a function, the effect of a call to a previously analyzed function can therefore be understood in detail. For example, if we call a function with a tainted value in its 1st argument, we can ask the function whether its 1st argument reaches a sink. If it does, then we could e.g. produce a report. We may also be interested in knowing whether calling a function with a tainted value means that any of its return values are tainted. If that is the case, then we will likely want to investigate those return values.

  • Tests pass
  • Appropriate changes to README are included in PR

@mlevesquedion mlevesquedion changed the title WIP: Implement cross function analysis Implement cross function analysis Aug 21, 2020
Copy link
Collaborator

@PurelyApplied PurelyApplied left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure any approach which reaches across pass.Pkg boundaries is going to be viable.

FactTypes: []analysis.Fact{new(funcFact)},
}

var pkgDenylist = []string{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment here, please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Some info in addition to the comment: most of these packages are unlikely to contain functions that most code is going to need. But the big issue is that for most of them I ran into weird bugs. I think investigating these bugs is out of the scope of the current PR, and that this is an acceptable workaround in the meantime.

(I think this info does not belong in the comment, so I am putting it here).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handling cycles in the call graph + Some other fixes = No longer need the denylist! 🎉

)

// ResultType is a mapping from types.Object to cfa.Function
type ResultType = Functions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the purpose of this alias. What does this do that using Functions directly doesn't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ResultType and Functions serve different, but related purposes. On its own, ResultType doesn't tell us much. But for the user of the analyzer, it is convenient to be able to get the result by using pass.ResultOf[cfa.Analyzer].(*cfa.ResultType). However, once the result has been obtained, Functions is a more helpful type name.

For context, we are using this pattern in the fieldpropagators and fieldtags analyzers as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are no longer using that pattern anywhere else, and it doesn't buy us much, I've removed the alias.

Comment on lines 102 to 104
// this function is in a denylisted package; exporting a fact on another package's objects is an error
// some functions do not have objects
if fn.Pkg.Pkg != pass.Pkg || fn.Object() == nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has no apparent relationship to the denylist.

The real problem here is that analysis pass is not meant to extend past the boarder of a given package. From the documentation:

A Pass describes a single unit of work: the application of a particular Analyzer to a particular package of Go code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the relationship with the denylist is not clear. Additional comments may be needed?

This check is here to avoid precisely what you are warning against.

When we encounter a function call, we want to know how it behaves, so we need to see if it has been analyzed in the past. If it hasn't been analyzed, we need to analyze it. At this point, if we realize that the function 1. has not been analyzed, and 2. is in a different package, then in fact we should not analyze it, and the only way that this situation can happen (with the current implementation) is that the function we are interested in was defined in a denylisted package.

Should I expand the comment to give this additional context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more denylist ⇒ this cannot happen anymore, so I've removed the check.

// Does this function's nth argument reach a sink?
Sinks(arg int) bool

// If this argument is tainted, which return values are tainted?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference arguments could also be tainted, yes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The current implementation does not support that.

"example.com/core"
)

func OneParamSinkWrapper(a interface{}) { // want OneParamSinkWrapper:"genericFunc{ sinks: \\[0\\], taints: \\[\\[\\]\\] }"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it's not how Go represents arrays, if the String() on the fact reported something like sinks: a, b, c; taints: d, e, f, we wouldn't have to escape so much in the regexp here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True.

The current approach sacrifices readability of the want comment for simplicity of the implementation. While the comment would be much more readable if done in the way you describe, we would need to carry around some extra data in the struct: the names of the parameters. While it's not a big extra load to carry, ultimately it would only be used for testing and I'm not sure it's worth it. For the taints field I actually don't see how we could use what you suggest. We need some kind of collection for each return value, and return values often do not have names.

Would genericFunc{ sinks: {0}, taints: {{}} } be any better? 🤔

(Or maybe even genericFunc{ sinks: (0), taints: (()) }.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it turns out neither of the options I propose are any more practical, because () is interpreted as a regex group and {} is interpreted as a regex quantifier.

I've changed the [] to <> (angle brackets). WDYT?

// See the License for the specific language governing permissions and
// limitations under the License.

package cfa
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The visited should track it, but we'll want tests that cover a cycle in the callgraph.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added such a test (a recursive function) and to my surprise, it failed (stack overflow)! In retrospect, it is obvious why the test is failing: when we try to analyze the function, we end up on a call to itself, discover that it has not been analyzed, and start a new analysis...

Simple recursive functions should work now.

Corecursive functions still trip us up, though. I have an idea for how to deal with this, but this will likely require broader changes. Should I hold off until a future PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All kinds of cycles are handled now, though I suspect there are cases where the analysis may be incomplete or too liberal (biased towards false positives). I think ultimately handling cycles properly may require us to discover cycles in advance and perform several rounds of analysis on each cycle, e.g. in a first pass you do not traverse through calls to a function in the cycle, then you re-do the analysis using the knowledge acquired in the first pass, and you repeat until you reach a fixed point. This may not be the case though, a future PR will be needed to clear this up.

@PurelyApplied
Copy link
Collaborator

I believe that this new functionality will eventually allow us to handle all the test cases in #50 with ease, and more.

Is this meant to subsume #50, then? You should close that PR if that is the case.

@mlevesquedion
Copy link
Contributor Author

I'm not sure any approach which reaches across pass.Pkg boundaries is going to be viable.

It may not be obvious, but it doesn't. It imports facts from Objects defined in other Pkgs, which is ok and what facts are intended for.

Is this meant to subsume #50, then? You should close that PR if that is the case.

Good question. At the moment, no. If we reach a consensus that this is a better way forward, then I will close #50. But until then, this is just an alternative approach.

@mlevesquedion
Copy link
Contributor Author

mlevesquedion commented Aug 27, 2020

I found a way to rewrite the function-visiting code so that it is much more straightforward and no longer requires a Results method on genericFunc.

I don't understand why the tests are failing in the CI. Everything works fine locally. Currently investigating.

@mlevesquedion
Copy link
Contributor Author

mlevesquedion commented Aug 27, 2020

The failure is due to a bug in config.IsSinkFunction that was merged yesterday. The failures will go away once the fix is merged: #69.

The PR has been merged and the failures are gone.

@mlevesquedion
Copy link
Contributor Author

Closing because of staleness. This will likely be revisited in the future, but right now isn't a priority.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants