JS: Add data-flow steps for arrays using a pseudo-property #3019

erik-krogh · 2020-03-09T09:36:50Z

I moved both the taint steps and the new data-flow steps for arrays into an Arrays.qll file.

Gets us a TP for CVE-2018-3726.

A slightly outdated evaluation shows reasonable performance.

The new result from the evaluation is a TP.

I tried removing the taint steps and relying only on the new data-flow steps.
The result was that we missed a lot of TPs, because many taint sources are themselves arrays, and the data-flow steps only produce flow when there is a matching write/read pair.
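To make the write/read pairing concrete, here is a toy model in plain JavaScript. It is not QL and not the PR's implementation. It assumes that all array elements are tracked under one pseudo-property with a hypothetical name (`ARRAY_ELEMENT`), so a read such as `arr[i]` only yields values that some earlier write into the same array stored there. An array that comes straight from a taint source has no such write, which illustrates why the taint steps are still needed.

```javascript
// Hypothetical pseudo-property name standing in for "any element of this array".
const ARRAY_ELEMENT = "$arrayElement$";

// Maps "<array>.<pseudo-property>" to the set of values written into it.
function makeStore() {
  return new Map();
}

// Record a write such as `arr.push(v)` or `arr[i] = v`.
function storeElement(store, arrayId, value) {
  const key = `${arrayId}.${ARRAY_ELEMENT}`;
  if (!store.has(key)) store.set(key, new Set());
  store.get(key).add(value);
}

// Resolve a read such as `arr[i]` or `arr.pop()`: only values previously
// stored under the pseudo-property can flow out.
function readElement(store, arrayId) {
  const key = `${arrayId}.${ARRAY_ELEMENT}`;
  return store.has(key) ? [...store.get(key)] : [];
}

// `arr.push(source); sink(arr[0])` forms a write/read pair, so flow is found.
const store = makeStore();
storeElement(store, "arr", "source");
const flowsToSink = readElement(store, "arr");

// `taintedArr` is itself the taint source and was never written to,
// so a pure write/read model finds nothing to read.
const fromTaintedSource = readElement(store, "taintedArr");
```

In this toy model, `flowsToSink` contains the stored value, while `fromTaintedSource` is empty.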

Here are examples of the new data-flow edges that are added.

erik-krogh · 2020-03-09T14:49:27Z

I'm fixing up the tests (and found a bug while doing so).
And I'll look at using DynamicPropertyAccess.qll for the array reads.

asgerf

Just a few comments after a cursory read-through. I'll give it a more thorough look later this week.

asgerf · 2020-03-09T14:55:49Z

javascript/ql/src/semmle/javascript/dataflow/Nodes.qll

@@ -598,18 +598,23 @@ class ArrayConstructorInvokeNode extends DataFlow::InvokeNode {
 * new Array('apple', 'orange')
 * Array(16)
 * new Array(16)
+ * Array.from(1,2,3);


That's not how Array.from works. Was this addition needed for something?

No, it was not; that was a mistake from reading the array taint steps a little too quickly.
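For reference on the point raised in the review: `Array.from` takes a single iterable or array-like value plus an optional mapping function, while the variadic form is `Array.of`. A quick check in plain JavaScript:

```javascript
// Array.from converts one iterable or array-like object into an array.
const fromArray = Array.from([1, 2, 3]);         // [1, 2, 3]
const fromArrayLike = Array.from({ length: 2 }); // [undefined, undefined]

// The variadic form that the doc line under review seemed to describe is Array.of.
const viaOf = Array.of(1, 2, 3);                 // [1, 2, 3]

// Array.from(1, 2, 3) treats 2 as the mapping function; since 2 is not
// callable, it throws a TypeError instead of creating [1, 2, 3].
let fromCallError = null;
try {
  Array.from(1, 2, 3);
} catch (e) {
  fromCallError = e;
}
```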

asgerf · 2020-03-09T14:59:38Z

javascript/ql/src/semmle/javascript/security/dataflow/IndirectCommandArgument.qll

@@ -52,7 +52,7 @@ private DataFlow::SourceNode argumentList(SystemCommandExecution sys, DataFlow::
    result = pred.backtrack(t2, t)
    or
    t = t2.continue() and
-    TaintTracking::arrayFunctionTaintStep(result, pred, _)
+    ArrayTaintTracking::arrayFunctionTaintStep(result, pred, _)


Hm, I think the API would be nicer if we expose all our taint steps as TaintTracking::xxxStep (we could be better at doing this, but let's not regress from what we already have).

If moving the code into Arrays.qll is mainly for internal organization, it should not be reflected in the public API.

asgerf · 2020-03-09T15:09:31Z

javascript/ql/src/semmle/javascript/Arrays.qll

+
+    /**
+     * Holds if `pred` should be stored in the object `succ` under the property `prop`.
+     */


Override predicates inherit the qldoc and should rarely have a qldoc comment of their own.


erik-krogh · 2020-03-09T16:50:11Z

It seems like EnumeratedPropName from DynamicPropertyAccess.qll doesn't match how I would need to use it in this PR, unless EnumeratedPropName is refactored.

In an ordinary for(var i = 0; i < arr.length; i++){...} loop there is no EnumeratedPropName, as i is an index and not a property name.
If we expand the EnumeratedPropName to include those kinds of loops, then it could work for this case, but otherwise I don't think it will.
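The runtime difference behind this distinction can be seen directly in JavaScript: a `for...in` loop enumerates property names (strings), whereas an index-based `for` loop computes numeric indices that never come from enumerating the object. This sketch only illustrates that JavaScript behavior; it says nothing about how EnumeratedPropName is implemented.

```javascript
const arr = ["a", "b", "c"];

// for...in enumerates property *names*; for an array these are strings.
const enumeratedNames = [];
for (const name in arr) {
  enumeratedNames.push(name);
}

// An ordinary indexed loop: `i` is a number computed by the loop itself,
// not a name produced by enumerating `arr`.
const indexValues = [];
const elements = [];
for (let i = 0; i < arr.length; i++) {
  indexValues.push(i);
  elements.push(arr[i]); // the read whose value comes from arr's elements
}
```

Here `enumeratedNames` holds the strings `"0"`, `"1"`, `"2"`, while `indexValues` holds the numbers `0`, `1`, `2`.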

Also, I think it would need some more refactoring, putting more focus on the enumerated object and the accesses on that object rather than on the property name, because there is not always an EnumeratedPropName to refer to.
For example, in the snippet below there is no EnumeratedPropName, but there is both a source object and a source property, which is what I'm interested in.

lodash.forEach(arr, (e) => sink(e))
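In this `lodash.forEach` case, each element reaches the callback parameter `e` without any property-name variable appearing in user code, because the iteration happens inside the library. The sketch below uses a hypothetical stand-in, `forEachLike` (not lodash itself), and a hypothetical `sink` that records its arguments, to show that flow from array elements to the callback.

```javascript
// Hypothetical stand-in for lodash.forEach: iterates internally and passes each
// element to the callback, so no property name is visible at the call site.
function forEachLike(collection, callback) {
  for (let i = 0; i < collection.length; i++) {
    callback(collection[i], i, collection);
  }
  return collection;
}

// Hypothetical sink that records every value reaching it.
const sinkCalls = [];
function sink(value) {
  sinkCalls.push(value);
}

const arr = ["tainted", "clean"];
forEachLike(arr, (e) => sink(e));
```

After the call, `sinkCalls` contains both elements of `arr`, even though the call site never names an index or property.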

asgerf · 2020-03-09T17:04:13Z

Sorry, I was thinking about the getAnEnumeratedArrayElement predicate in the same file, and whether it would make sense to share that code.

erik-krogh · 2020-03-09T18:43:18Z

> Sorry, I was thinking about the getAnEnumeratedArrayElement predicate in the same file, and whether it would make sense to share that code.

Yes, it would, but I would have preferred an EnumeratedPropName refactoring, as it would give me support for libraries like lodash.forEach, Object.keys(..), and the other EnumeratedPropName implementations basically for free.

I'll use the getAnEnumeratedArrayElement, and maybe get back to the refactor at a later time.

erik-krogh · 2020-03-10T13:50:18Z

> I'll use the getAnEnumeratedArrayElement

And that destroyed performance on bwip-js, so I've reverted to my previous approach, where I search for an index declared as part of a for loop.

erik-krogh · 2020-03-11T08:38:36Z

A quick evaluation looks good.

erik-krogh · 2020-03-12T20:05:28Z

A more detailed evaluation shows a less clear picture.

Specifically, the sqlteaching benchmark is bad.
It looks like the js/insecure-randomness query in particular blows up: the isAdditionalTaintStep in that query blows up in combination with the new steps added by this PR.
I'm looking into it.

asgerf · 2020-03-18T11:33:42Z

Since #3070 landed I guess the performance should be good now? Are you running a final evaluation?

erik-krogh · 2020-03-18T12:14:32Z

> Since #3070 landed I guess the performance should be good now? Are you running a final evaluation?

Yes, it should be ready tonight.
But the preliminary results don't look all that positive.

erik-krogh · 2020-03-18T20:37:29Z

> Since #3070 landed I guess the performance should be good now? Are you running a final evaluation?

This is the new evaluation.

I think the sqlteaching benchmark can be ignored: the performance is much worse, but there is a good reason for that (the exploratory flow is about 4 orders of magnitude bigger).

But even if we ignore sqlteaching, it doesn't look all that good.

asgerf · 2020-03-19T01:39:29Z

I have a branch that makes exploratory flow more precise. I've confirmed that merging it with your PR eliminates the overhead you observed in sqlteaching (it becomes the same as current master).

As for the rest of the evaluation, there's obviously biased noise in the wall-clock timings, but it could be hiding a genuine slowdown. Could you pick one of the slowest ones from that report other than sqlteaching and test it in isolation to see the actual overhead?

erik-krogh · 2020-03-21T21:24:09Z

> As for the rest of the evaluation, there's obviously biased noise in the wall-clock timings, but it could be hiding a genuine slowdown. Could you pick one of the slowest ones from that report other than sqlteaching and test it in isolation to see the actual overhead?

Here is an evaluation of just one of the slowest ones: https://git.semmle.com/erik/dist-compare-reports/tree/profiling-js-esben.northeurope.cloudapp.azure.com_1584809774080

It looks like the wall-clock timings were biased in the previous evaluation.

erik-krogh added 4 commits March 9, 2020 09:20

move existing array taint tracking into Arrays.qll

14740d4

add test for data-flow on arrays

8e3cf5c

add data-flow steps for arrays

dc4e361

move Array.from to ArrayCreationNode

0f0187d

erik-krogh added the JS and Awaiting evaluation (Do not merge yet, this PR is waiting for an evaluation to finish) labels Mar 9, 2020

erik-krogh requested a review from a team as a code owner March 9, 2020 09:36

asgerf reviewed Mar 9, 2020


erik-krogh added 5 commits March 9, 2020 16:45

two bugfixes

b4b0569

update expected output

68ffd52

revert Array.from change

a476fc5

remove redundant qldoc, and change parameter names to better reflect behavior

5099416

expose arrayFunctionTaintStep in TaintTracking.qll

981eef2

erik-krogh added 2 commits March 9, 2020 19:47

add test case for tuple-like use

ad52d64

autoformat and update expected output

62ae484

erik-krogh force-pushed the ArrayStep branch from 0753a6e to 62ae484 March 10, 2020 13:49

update expected output

fa26ce9

autoformat

91bc124

erik-krogh force-pushed the ArrayStep branch from 1ccdaee to 91bc124 March 12, 2020 09:45

erik-krogh mentioned this pull request Mar 16, 2020

JS: add isRelevant(succ) to flowStep predicate #3070

Merged

erik-krogh removed the Awaiting evaluation (Do not merge yet, this PR is waiting for an evaluation to finish) label Mar 23, 2020

asgerf approved these changes Mar 25, 2020


semmle-qlci merged commit cf5b1f0 into github:master Mar 25, 2020

erik-krogh mentioned this pull request Mar 25, 2020

JS: add more isRelevant() calls #3095

Merged
