JS: using pseudo-properties to model URL parsing #2761

erik-krogh · 2020-02-04T15:40:42Z

Now the property searchParams is modeled using a pseudo-property.

Additionally, I use another pseudo-property to model that calling a getter on a URLSearchParams object retrieves the parsed parameters (see hiddenUrlPseudoProperty and its uses).

This means that the previous behavior is preserved, and tracking of the properties now works interprocedurally.

E.g. with the previous local handling of searchParams we didn't track flow out of this return (found using this query).
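
For illustration, here is a hypothetical JavaScript snippet (not taken from the PR) with the kind of interprocedural flow this is meant to cover: the URL is parsed in one function, its searchParams object escapes through a return, and a getter is called in another function. The function names and the URL are made up for the example.

```javascript
// Hypothetical example: searchParams leaves its function through a return,
// which a purely local model of searchParams would not follow.
function getParams(href) {
  return new URL(href).searchParams;
}

// Calling a getter on the returned URLSearchParams retrieves the parsed values.
function readParam(href, name) {
  return getParams(href).get(name);
}

function readAllParams(href, name) {
  return getParams(href).getAll(name);
}

const href = "https://example.com/page?name=alice&tag=a&tag=b";
console.log(readParam(href, "name"));    // alice
console.log(readAllParams(href, "tag")); // [ 'a', 'b' ]
console.log(readParam(href, "missing")); // null
```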

asgerf

I found the code a bit difficult to read, so for now I'll just try to relay back my understanding of the code, so we can catch any misunderstandings.

The pseudo-property represents two things lumped into a single property (which is fine):

1. the search parameters of a URL object
2. the keys and values of a Map-like object (URLSearchParams)

So a property read x.searchParams is modelled as a load-store step of the pseudo-property in order to transition from case 1 to case 2 when accessing the search parameters of a URL object.

There's then a load step out of the pseudo-property when calling get().

Assuming I got that right, I like the approach. I'd like it if you could make the code a little more accessible, though. Perhaps also add support for handling fragment data while you're at it 👍

asgerf · 2020-02-04T15:54:28Z

javascript/ql/src/semmle/javascript/dataflow/TaintTracking.qll

   */
-  predicate isUrlSearchParams(DataFlow::SourceNode params, DataFlow::Node input) {
+  private predicate isUrlSearchParams(DataFlow::SourceNode params, DataFlow::Node input) {


We should add a deprecated alias for this for backwards compatibility (even if it doesn't have the original behavior anymore, deprecation warnings are better than compilation errors).

I think I'll just remove the private modifier.
The name is still fitting for the behavior, so I don't feel like adding yet another alias.

asgerf · 2020-02-04T15:57:19Z

javascript/ql/src/semmle/javascript/dataflow/TaintTracking.qll

    override predicate step(DataFlow::Node pred, DataFlow::Node succ) {
-      pred = source and succ = this
+      isUrlSearchParams(succ, pred)


The `this` variable is not bound here (and likewise in the other member predicates).

I see two ways of solving this.

1) Do the same as StringManipulationTaintStep: extend DataFlow::ValueNode and just bind succ = this.

2) Split the taint steps out into multiple classes, and add a characteristic predicate to each of the new classes. (This will become even more verbose.)

I went with 1) for now.

asgerf · 2020-02-04T15:59:24Z

javascript/ql/src/semmle/javascript/dataflow/TaintTracking.qll

+    /**
+     * Holds if the property `prop` should be copied from the object `pred` to the object `succ`.
+     * 
+     * This step is used to copy a value the value of our pseudo-property that can later be accessed using a `get` or `getAll` call. 


Suggested change

* This step is used to copy a value the value of our pseudo-property that can later be accessed using a `get` or `getAll` call.

* This step is used to copy the value of our pseudo-property that can later be accessed using a `get` or `getAll` call.

asgerf · 2020-02-04T16:09:47Z

javascript/ql/src/semmle/javascript/dataflow/TaintTracking.qll

+     */
+    override predicate loadStoreStep(DataFlow::Node pred, DataFlow::Node succ, string prop) {
+      prop = hiddenUrlPseudoProperty() and
+      exists(DataFlow::PropRead write | write = succ | 


Why is this PropRead in a variable called write? 😕

erik-krogh · 2020-02-05T10:04:20Z

Your understanding is correct.

I had the two cases lumped into the same pseudo-property because a load-store step only supported loading and storing the same property name.
I changed it such that a load-store step can load one property and store another.
I'm thereby better able to distinguish the two cases in the code.
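
To make the two-stage process concrete, here is a toy model in JavaScript. It is only a sketch of the idea, not CodeQL's implementation: an abstract object is a Map from (pseudo-)property names to sets of taint labels, and the names $UrlParams and $SearchParams are hypothetical stand-ins for the two pseudo-properties. A load-store step reads one property and writes another, so a get() load only finds taint on the URLSearchParams object and not on the URL object itself.

```javascript
// Toy model only: abstract objects map property names to sets of taint labels.
// The two pseudo-property names below are hypothetical.
const URL_PARAMS = "$UrlParams";       // search parameters of a URL object
const SEARCH_PARAMS = "$SearchParams"; // contents of a URLSearchParams object

// Store step: a fresh abstract object holding `labels` in property `prop`.
function store(prop, labels) {
  return new Map([[prop, new Set(labels)]]);
}

// Load step: read the labels held in property `prop` (none if absent).
function load(obj, prop) {
  return new Set(obj.has(prop) ? obj.get(prop) : []);
}

// Load-store step: load `loadProp` from `obj` and store it as `storeProp`
// on a new object. Before the change, loadProp and storeProp had to be equal.
function loadStore(obj, loadProp, storeProp) {
  return store(storeProp, load(obj, loadProp));
}

const url = store(URL_PARAMS, ["taint"]);                 // new URL(source)
const params = loadStore(url, URL_PARAMS, SEARCH_PARAMS); // url.searchParams
const value = load(params, SEARCH_PARAMS);                // params.get("q")

console.log(value.has("taint"));            // true
console.log(load(url, SEARCH_PARAMS).size); // 0: get() on the URL itself finds nothing
```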

I'll look into fragment data.

erik-krogh · 2020-02-06T12:16:42Z

An evaluation was uneventful.

asgerf · 2020-02-20T13:28:16Z

Sorry for not following up on this earlier.

I like the solution, but there are a few issues that mean we're not realizing the full benefit of this change. Overall I'd like us to be able to flag this sample vuln:

function getUrl() {
    return new URL(document.location);
}
$(getUrl().hash.substring(1)); // NOT OK
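
A quick check in Node.js (where URL is a global) of the values involved. The href is a made-up example, and getUrl takes it as a parameter here instead of reading document.location. The hash property includes the leading "#", and substring(1) strips it; that difference is what separates this sample from the $(location.hash) pattern that the Xss query deliberately does not flag.

```javascript
// Stand-in for document.location; the href is a hypothetical example.
function getUrl(href) {
  return new URL(href);
}

const hash = getUrl("https://example.com/page#section-2").hash;
console.log(hash);              // #section-2
console.log(hash.substring(1)); // section-2
```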

There are two issues with this at the moment:

1. The .hash property is a sanitizer in the Xss query, which is very restrictive, but was necessary for avoiding FPs due to variations of the $(location.hash) pattern. This was one of the main motivations to do more precise tracking of URLs. Due to this sanitizer, we still don't flag this vuln:
2. Barriers/sanitizers currently block flow even when the tracked value is inside a property. Depending on how we resolve the above issue, this may or may not become a problem. I've discussed it with @max-schaefer and I'll experiment with changing this.

The PR would be good to land IMO, but I'd like to have some more data to verify that we're doing the right thing. I'll experiment a little bit to see what the best solution is so we can get this PR landed.

erik-krogh · 2020-03-21T16:06:45Z

Overall I'd like us to be able to flag this sample vuln:
function getUrl() {
    return new URL(document.location);
}
$(getUrl().hash.substring(1)); // NOT OK

If I merge in #2919 the above example will be flagged, and e.g. $(window.location.hash) is still not flagged.

But only if the source has flowlabel "data" (the source in the Xss query has flowlabel "taint").

Here are some examples of new flow edges; they are not very interesting. (The results might be better once #2919 hits LGTM.)

I'll fix the merge conflict after #2919 has been merged, and I might also do another evaluation at that point.

erik-krogh · 2020-03-25T08:38:46Z

I did a new evaluation.
Performance looks ok, but no new results.

asgerf

If we add this to DomBasedXss::Configuration we should be able to handle the hash example:

override predicate isAdditionalLoadStoreStep(
  DataFlow::Node pred, DataFlow::Node succ, string predProp, string succProp
) {
  exists(DataFlow::PropRead read |
    pred = read.getBase() and
    succ = read and
    read.getPropertyName() = "hash" and
    predProp = "hash" and
    succProp = "$UrlSuffix"
  )
}

override predicate isAdditionalLoadStep(DataFlow::Node pred, DataFlow::Node succ, string prop) {
  exists(DataFlow::MethodCallNode call, string name |
    name = "substr" or name = "substring" or name = "slice"
  |
    call.getMethodName() = name and
    not call.getArgument(0).getIntValue() = 0 and
    pred = call.getReceiver() and
    succ = call and
    prop = "$UrlSuffix"
  )
}

(this is why I wanted sanitizers to not block objects)
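
The condition in isAdditionalLoadStep can be mirrored as a small JavaScript predicate to show which calls count as reading the URL suffix. This is a hypothetical helper for illustration, not code from the PR. Note that in QL, `not call.getArgument(0).getIntValue() = 0` also holds when the first argument is missing or has no known integer value, so only a literal 0 is excluded; the helper models such an argument as undefined.

```javascript
// Hypothetical mirror of the QL condition in isAdditionalLoadStep.
const SUFFIX_METHODS = new Set(["substr", "substring", "slice"]);

// `firstArg` is the first argument's known integer value, or undefined when
// the argument is missing or not a known constant. As with QL's
// `not ... getIntValue() = 0`, only a known 0 is excluded.
function readsUrlSuffix(methodName, firstArg) {
  return SUFFIX_METHODS.has(methodName) && firstArg !== 0;
}

console.log(readsUrlSuffix("substring", 1));      // true
console.log(readsUrlSuffix("slice", 0));          // false
console.log(readsUrlSuffix("substr", undefined)); // true
console.log(readsUrlSuffix("toLowerCase", 1));    // false

// Concretely, a non-zero start drops the leading "#", while slice(0) keeps it.
console.log("#payload".slice(1)); // payload
console.log("#payload".slice(0)); // #payload
```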

But only if the source has flowlabel "data"

This would make it ignore all sanitizers, due to #2919, so it's kind of a no-go.

I'm a little sad that we haven't been able to find any concrete results from this (not for lack of trying), but it seems reasonably safe. We should probably avoid sinking more time into this until we have some motivating (real) examples on hand.

asgerf · 2020-03-25T14:51:28Z

javascript/ql/src/Security/CWE-079/Xss.actual

@@ -0,0 +1,3 @@
+nodes
+edges
+#select


Remove .actual file

erik-krogh · 2020-03-26T14:47:19Z

If we add this to DomBasedXss::Configuration we should be able to handle the hash example:

I'll get that in there, then I'll run one last evaluation based on that.

erik-krogh · 2020-03-28T09:35:22Z

Here is an evaluation on many benchmarks just with Xss.ql.
Still no new results, but performance is good.

using pseudo-properties to model URL parsing

8d37c03

erik-krogh added the JS and Awaiting evaluation labels Feb 4, 2020

asgerf reviewed Feb 4, 2020

erik-krogh added 7 commits February 5, 2020 09:40

generalize isAdditionalLoadStoreStep such that it loads and stores different properties

e525cf0

change the pseudo-property on URL to a two-stage process

76aca02

remove private modifer on isUrlSearchParams

35a7e15

address review feedback

ec9c370

update expected test output

ffc6fdd

update docstrings

30d5eb5

bind this in each of the step methods of UrlSearchParamsTaintStep

88bb1dc

add "hash" and "search" to URL taint step

da28d3b

erik-krogh marked this pull request as ready for review February 6, 2020 12:16

erik-krogh requested a review from a team as a code owner February 6, 2020 12:16

erik-krogh removed the Awaiting evaluation label Feb 6, 2020

erik-krogh added 2 commits March 24, 2020 00:23

Merge remote-tracking branch 'upstream/master' into UrlSearch

fa710c5

autoformat

1d8e103

asgerf reviewed Mar 26, 2020

delete Xss.actual

a850616

erik-krogh added 2 commits March 26, 2020 15:47

add urlSuffix support to DomBasedXSS

6b507c6

autoformat

d3e1a25

erik-krogh force-pushed the UrlSearch branch from 3d0030a to d3e1a25 March 27, 2020 08:35

erik-krogh added 2 commits March 27, 2020 10:02

add test case for XSS on url suffix

58af63d

autoformat

0ebbd80

Merge branch 'master' of git.semmle.com:Semmle/ql into UrlSearch

4864e77

asgerf approved these changes Mar 31, 2020

semmle-qlci merged commit 0feb7f8 into github:master Mar 31, 2020
