Skip to content

Conversation

erik-krogh
Copy link
Contributor

@erik-krogh erik-krogh commented Mar 9, 2020

I moved both the taint-steps and the new data-flow-steps for arrays into an Arrays.qll file.

Gets us a TP for CVE-2018-3726.

A slightly outdated evaluation shows reasonable performance.

The new result from the evaluation is a TP.

I tried to see how our analysis worked if I removed the taint-steps and only relied on the new data-flow steps.
The result was that we missed a lot of TPs due to taint-sources being arrays, and the data-flow steps need a write/read pair.

Here are examples of the new data-flow edges that are added.

@erik-krogh erik-krogh added JS Awaiting evaluation Do not merge yet, this PR is waiting for an evaluation to finish labels Mar 9, 2020
@erik-krogh erik-krogh requested a review from a team as a code owner March 9, 2020 09:36
@erik-krogh
Copy link
Contributor Author

I'm fixing up the tests (also found a bug from that).
And I'll look at using DynamicPropertyAccess.qll for the array reads.

Copy link
Contributor

@asgerf asgerf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments after a cursory read through - I'll give it a more thorough look later this week

@@ -598,18 +598,23 @@ class ArrayConstructorInvokeNode extends DataFlow::InvokeNode {
* new Array('apple', 'orange')
* Array(16)
* new Array(16)
* Array.from(1,2,3);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not how Array.from works. Was this addition needed for something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it was not, that was a mistake from reading the taint steps on arrays a little too quickly.

@@ -52,7 +52,7 @@ private DataFlow::SourceNode argumentList(SystemCommandExecution sys, DataFlow::
result = pred.backtrack(t2, t)
or
t = t2.continue() and
TaintTracking::arrayFunctionTaintStep(result, pred, _)
ArrayTaintTracking::arrayFunctionTaintStep(result, pred, _)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I think the API would be nicer if we expose all our taint steps as TaintTracking::xxxStep (we could be better at doing this, but let's not regress from what we already have).

If moving the code into Arrays.qll is mainly for internal organization, it should not be reflected in the public API.


/**
* Holds if `pred` should be stored in the object `succ` under the property `prop`.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

override predicates inherit the qldoc and should rarely have a qldoc comment of their own.

@erik-krogh
Copy link
Contributor Author

It seems like EnumeratedPropName from DynamicPropertyAccess.qll doesn't match how I would need to use it in this PR, unless EnumeratedPropName is refactored.

In an ordinary for(var i = 0; i < arr.length; i++){...} loop there is no EnumeratedPropName, as the i is an index and not a property name.
If we expand the EnumeratedPropName to include those kinds of loops, then it could work for this case, but otherwise I don't think it will.

Also, I think I would need some more refactor, getting more focus on the enumerated object and the accesses on the object, rather than the property-name.
Because there is not always an EnumeratedPropName to refer to. E.g. in the below example there is no EnumeratedPropName, but there is both a source-object and a source-property, which is what I'm interested in.

lodash.forEach(arr, (e) => sink(e))

@asgerf
Copy link
Contributor

asgerf commented Mar 9, 2020

Sorry, I was thinking about the getAnEnumeratedArrayElement predicate in the same file, and whether it would make sense to share that code.

@erik-krogh
Copy link
Contributor Author

erik-krogh commented Mar 9, 2020

Sorry, I was thinking about the getAnEnumeratedArrayElement predicate in the same file, and whether it would make sense to share that code.

Yes it would, but I would have preferred to support a EnumeratedPropName refactorization, as it would give me support for libraries like lodash.forEach, Object.keys(..) and the others EnumeratedPropName implementations basically for free.

I'll use the getAnEnumeratedArrayElement, and maybe get back to the refactor at a later time.

@erik-krogh
Copy link
Contributor Author

I'll use the getAnEnumeratedArrayElement

And that destroyed performance in bwip-js, I've reverted back to my previous approach where I search for an index declared as part of a for loop.

@erik-krogh
Copy link
Contributor Author

erik-krogh commented Mar 11, 2020

A quick evaluation looks good.

@erik-krogh
Copy link
Contributor Author

A more detailed evaluation shows a less clear picture.

Specifically the sqlteaching benchmark is bad.
It looks to be the js/insecure-randomness query in particular that blows up. The isAdditionalTaintStep in that query blows up in combination with the new steps added by this PR.
I'm looking into it.

@asgerf
Copy link
Contributor

asgerf commented Mar 18, 2020

Since #3070 landed I guess the performance should be good now? Are you running a final evaluation?

@erik-krogh
Copy link
Contributor Author

Since #3070 landed I guess the performance should be good now? Are you running a final evaluation?

Yes, it should be ready tonight.
But the preliminary results doesn't look all that positive.

@erik-krogh
Copy link
Contributor Author

Since #3070 landed I guess the performance should be good now? Are you running a final evaluation?

This is the new evaluation.

I think the sqlteaching teaching benchmark can be ignored, the performance is way worse, but there is a good reason for that (the exploratory flow being about 4 orders of magnitude bigger).

But even if we ignore sqlteaching, it doesn't look all that good.

@asgerf
Copy link
Contributor

asgerf commented Mar 19, 2020

I have a branch that makes exploratory flow more precise. I've confirmed that merging it with your PR eliminates the overhead you observed in sqlteaching (it becomes the same as current master).

As for the rest of the evaluation, there's obviously a biased noise in the wall clock timings, but it could be hiding a genuine slowdown. Could you pick one of the slowest ones from that report other than sqlteaching and test it in isolation to see the actual overhead?

@erik-krogh
Copy link
Contributor Author

erik-krogh commented Mar 21, 2020

As for the rest of the evaluation, there's obviously a biased noise in the wall clock timings, but it could be hiding a genuine slowdown. Could you pick one of the slowest ones from that report other than sqlteaching and test it in isolation to see the actual overhead?

Here is an evaluation of just one of the slowest ones: https://git.semmle.com/erik/dist-compare-reports/tree/profiling-js-esben.northeurope.cloudapp.azure.com_1584809774080

It looks like the wallclock was biased in the previous evaluation.

@erik-krogh erik-krogh removed the Awaiting evaluation Do not merge yet, this PR is waiting for an evaluation to finish label Mar 23, 2020
@semmle-qlci semmle-qlci merged commit cf5b1f0 into github:master Mar 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants