Skip to content

Conversation

hvitved
Copy link
Contributor

@hvitved hvitved commented May 6, 2019

Finally, the PR that brings C# in line with Java and C/C++ on the data flow implementation. The PR is structured into several commits, which are best reviewed one-by-one:

  • 26debb8: A simple toString() change.
  • a6fa6df: Copies the shared data flow files from Java/C++. All changes required for C# were done on Generalize data-flow library in preparation for C# adoption #1226, so DataFlowImpl(Common).qll need not be reviewed.
  • c6a471e: This is the commit where all the work is done. Highlights:
    • General refactoring to fit with the shared data flow implementation.
    • Move CFG splitting logic into ControlFlowReachability.qll.
    • Replace isAdditionalFlowStepIntoCall() with TaintedParameterNode.
    • Redefine ReturnNode to be the actual values that are returned, which should yield better path information.
  • 7b7a1ec: A simple file move.

Here are some performance numbers (full report here):

Project              23faa942ced@639d715d0  23faa942ced@949b3601d
pythonnet            101                    115                    1.13861
OneOf                773                    908                    1.17464
corefx               2195                   2621                   1.19408
nhibernate-core      2841                   3451                   1.21471
roslyn               2653                   3282                   1.23709
monodevelop          3487                   4388                   1.25839
PowerShell           838                    1094                   1.30549
OrchardCore          1000                   1307                   1.307
EntityFrameworkCore  2023                   2745                   1.3569
coreclr              287                    393                    1.36934
ql                   285                    417                    1.46316
runuo                670                    1040                   1.55224
mono                 1893                   3584                   1.89329

I already have a few performance improvements ready for the shared implementation, which should bring performance back at par, but I would rather do those in a follow-up PR.

hvitved added 4 commits May 6, 2019 14:54
- General refactoring to fit with the shared data flow implementation.
- Move CFG splitting logic into `ControlFlowReachability.qll`.
- Replace `isAdditionalFlowStepIntoCall()` with `TaintedParameterNode`.
- Redefine `ReturnNode` to be the actual values that are returned, which should
  yield better path information.
- No longer consider overrides in CIL calls.
@hvitved hvitved added the C# label May 6, 2019
@hvitved hvitved requested a review from calumgrant May 6, 2019 13:35
@hvitved hvitved requested a review from a team as a code owner May 6, 2019 13:35
@hvitved
Copy link
Contributor Author

hvitved commented May 9, 2019

I also ran a performance test on a huge customer snapshot; here performance went from 17695s to 84240s, but I have verified that my upcoming improvements roughly restores performance (19518s).

Copy link
Contributor

@calumgrant calumgrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great stuff! You have done a lot of work.

@@ -32,6 +33,8 @@ class CallContext extends TCallContext {
/** An empty call context. */
class EmptyCallContext extends CallContext, TEmptyCallContext {
override string toString() { result = "<empty>" }

override Location getLocation() { result instanceof EmptyLocation }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

override EmptyLocation getLocation() { any() }
?

private import DataFlowPrivate
private import DataFlowPublic

private ControlFlowElement getAScope(boolean exactScope) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like there could be a class ControlFlowScope here instead

}

cached
DotNet::Callable getEnclosingCallable(Node node) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we use classes here? There seems to be a lot of synthetic dispatch here that would be easier to maintain using QL classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is simply to make sure that the cached predicates are evaluated in the same stage (in the cached module DataFlowPrivateCached).

@@ -378,7 +379,9 @@ class ArrayCreation extends Expr, @array_creation_expr {
class AnonymousFunctionExpr extends Expr, Callable, @anonymous_function_expr {
override string getName() { result = "<anonymous>" }

override Type getReturnType() { result = getType().(DelegateType).getReturnType() }
override Type getReturnType() {
result = getType().(SystemLinqExpressions::DelegateExtType).getDelegateType().getReturnType()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.

@@ -1,21 +1,13 @@
edges
| ExposureInTransmittedData.cs:16:32:16:39 | access to local variable password | ExposureInTransmittedData.cs:16:32:16:39 | access to local variable password |
| ExposureInTransmittedData.cs:20:32:20:44 | call to method ToString | ExposureInTransmittedData.cs:20:32:20:44 | call to method ToString |
| ExposureInTransmittedData.cs:24:32:24:41 | access to property Message | ExposureInTransmittedData.cs:24:32:24:41 | access to property Message |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does all of this self-flow come from? It does not seem to be helpful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because the shared implementation represents paths of length 0 as paths of length 1 by adding a synthetic sink node. This is also the reason why the nodes predicate is no longer needed. (ping @aschackmull)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note that these are therefore only additions to the edges query predicate, not the select predicate.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Although we might need to add a nodes predicate in order to save the PathNode.toString as labels, but that's a separate concern.

#select
| HardcodedCredentials.cs:17:25:17:36 | "myPa55word" | HardcodedCredentials.cs:17:25:17:36 | "myPa55word" | HardcodedCredentials.cs:17:25:17:36 | "myPa55word" | The hard-coded value "myPa55word" flows to $@ which is compared against $@. | HardcodedCredentials.cs:17:25:17:36 | "myPa55word" | "myPa55word" | HardcodedCredentials.cs:17:13:17:20 | access to local variable password | access to local variable password |
| HardcodedCredentials.cs:17:25:17:36 | "myPa55word" | HardcodedCredentials.cs:17:25:17:36 | "myPa55word" | HardcodedCredentials.cs:17:25:17:36 | "myPa55word" | The hard-coded value "myPa55word" flows to $@ which is compared against $@. | HardcodedCredentials.cs:17:25:17:36 | "myPa55word" | "myPa55word" | HardcodedCredentials.cs:17:13:17:20 | access to local variable password | access to local variable password |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be some unnecessary duplication of the results. I think these duplicated results must be removed (also in other queries).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this is actually because of a bug in the query; it uses hasFlow, but should have used hasFlowPath.

@@ -46,25 +45,22 @@ abstract class ArgumentCallContext extends CallContext {
* Holds if this call context represents the argument at position `i` of the
* call expression `call`.
*/
abstract predicate isArgument(DotNet::Expr call, int i);
abstract predicate isArgument(Expr call, int i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we don't bother with CIL::Expr because we don't analyse virtual dispatch in CIL anyway. So it makes sense to remove these from the call context for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is because these call contexts are now only used when resolving delegate targets in DelegateDataFlow.qll, and there flow is restricted to C# code. The shared DataFlowImplCommon.qll has its own notion of a call context.

@calumgrant calumgrant merged commit 6a198ff into github:master May 15, 2019
@hvitved hvitved deleted the csharp/dataflow/shared-implementation branch May 16, 2019 07:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants