JS: Add UntrustedDataToExternalAPI query #4694

asgerf · 2020-11-19T14:13:41Z

Adds a JS version of the query @lcartey added in #3938. The public-facing aspects of the query (naming, API, etc.) are based on that query.

We use API graphs to find calls to functions that come from an external API. This isn't a hard guarantee that the call is in fact external, and indeed we have to filter out common built-in method names such as substring and indexOf, as they generate too much noise.
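As a rough illustration of the name-based filtering step, here is a TypeScript sketch (not the QL implementation) that models a call site as a method name plus a flag saying whether the callee was traced back to an imported package. Only `substring` and `indexOf` come from the description above; the other denylisted names, the `CallSite` shape, and `isCandidateExternalCall` are hypothetical.

```typescript
// Hypothetical denylist: only `substring` and `indexOf` are taken from the
// PR description; the remaining names are illustrative assumptions.
const NOISY_METHOD_NAMES: ReadonlySet<string> = new Set([
  "substring",
  "indexOf",
  "toString",
  "slice",
  "split",
  "replace",
]);

// Hypothetical model of a call site: the invoked method name, plus whether
// an API-graph-style lookup traced the callee back to an imported package.
interface CallSite {
  methodName: string;
  reachedViaImport: boolean;
}

// Keep calls that appear to come from an external API, unless the method
// name is a common built-in that would otherwise flood the results.
function isCandidateExternalCall(call: CallSite): boolean {
  return call.reachedViaImport && !NOISY_METHOD_NAMES.has(call.methodName);
}

const calls: CallSite[] = [
  { methodName: "query", reachedViaImport: true },
  { methodName: "indexOf", reachedViaImport: true },
  { methodName: "render", reachedViaImport: false },
];

console.log(calls.filter(isCandidateExternalCall).map((c) => c.methodName));
```

The flag is a heuristic, just as in the query: reaching a call through an import makes it likely, but not certain, that the callee is external.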

Since JS doesn't have nice canonical names for everything, a large part of the problem is coming up with readable names for things. Because the query can have thousands of results, I opted for readability over raw precision: we must be able to scan the sink names and find the interesting sinks in a reasonable amount of time.

I found it was important to place the sink very close to the external call mentioned in its name. An earlier formulation of the query used API graphs to generate sinks at the right-hand side of a property on an object escaping into an external library, but such a sink could be arbitrarily far away from the external call, and it was often too hard to see what was happening based on the sink location alone.

To avoid losing that flow, however, we use a flow label to track flow arbitrarily deep into an object, and then report sinks whose value contains a user-controlled value.
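The effect of the flow label can be approximated at runtime: the argument of an external call is interesting if a user-controlled value appears anywhere inside it, at any depth. The TypeScript sketch below is only an analogy (the query does this statically with a flow label, not by walking live objects); the `Tainted` marker class and `containsTainted` function are hypothetical.

```typescript
// Hypothetical marker for a user-controlled value (e.g. data read from an
// HTTP request). In the query itself, taint tracking plays this role.
class Tainted {
  constructor(readonly value: string) {}
}

// Returns true if `value` is tainted, or contains a tainted value at any
// depth in its properties or array elements. `seen` guards against cycles.
function containsTainted(value: unknown, seen: Set<object> = new Set()): boolean {
  if (value instanceof Tainted) return true;
  if (typeof value !== "object" || value === null) return false;
  if (seen.has(value)) return false;
  seen.add(value);
  // Object.values covers both plain objects and arrays.
  return Object.values(value).some((v) => containsTainted(v, seen));
}

// An options object that might be passed to an external call: the tainted
// string sits two levels deep, yet the whole argument is flagged, so the
// report stays right next to the external call.
const options = { headers: { "x-user": new Tainted("from request") }, retries: 3 };
console.log(containsTainted(options)); // true
```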

Mainly to verify that the query doesn't blow up, I ran an evaluation against an earlier commit, with this query rebased on top of it. (The query has changed a bit since that run, so I might want to run another evaluation.)

max-schaefer

A few comments from a superficial review. Impressively sophisticated; it's a shame you had to fight the API-graphs library in so many places, I hope we can find the time to go back and improve its interface to make this sort of thing easier in future.

...l/src/semmle/javascript/security/dataflow/ExternalAPIUsedWithUntrustedDataCustomizations.qll

max-schaefer · 2020-11-19T15:13:24Z


+      // getParameter(i) requires a bindingset for i, so use the raw label
+      param = base.getASuccessor("parameter " + lbl) and
+      lbl != "-1" // ignore receiver


Is this the same as param = base.getAParameter() and not param = base.getReceiver(), which you've also used above? If so, should we have a predicate for that?

Changed to param = base.getAParameter() and not param = base.getReceiver() for now. Looks like I needed the parameter index at some earlier version of the predicate and forgot to clean it up.

It might make sense to look at the API for this. Generally I'd be in favor of being consistent with the data-flow API here (where the receiver isn't considered to be an argument or parameter) and just introduce the verbose getParameterOrReceiver(i) and getAParameterOrReceiver for the cases where you want both. We could introduce similar predicates in the data-flow API.

That makes sense.

Maybe in another PR. The refactoring would probably warrant more evaluations than I'd care to spend on this query.
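To make the proposed API split concrete, here is a small TypeScript model (not CodeQL) of a function's inputs, using the same convention as the diff excerpt in this thread, where label `-1` denotes the receiver. `getAParameter` and `getReceiver` mirror the predicates discussed above, and `getAParameterOrReceiver` is the verbose variant suggested for cases that want both; the `FunctionModel` class and its array-returning methods are hypothetical stand-ins for QL predicates.

```typescript
// Index -1 denotes the receiver; 0, 1, ... are ordinary parameters.
type ParamIndex = number;

// Hypothetical model of a function's inputs, for illustration only.
class FunctionModel {
  constructor(private readonly arity: number) {}

  // Every input position, receiver included.
  private allIndices(): ParamIndex[] {
    return [-1, ...Array.from({ length: this.arity }, (_, i) => i)];
  }

  // Consistent with the data-flow API: the receiver is not a parameter.
  getAParameter(): ParamIndex[] {
    return this.allIndices().filter((i) => i !== -1);
  }

  getReceiver(): ParamIndex {
    return -1;
  }

  // Verbose variant for callers that explicitly want the receiver too.
  getAParameterOrReceiver(): ParamIndex[] {
    return this.allIndices();
  }
}

const fn = new FunctionModel(2);
console.log(fn.getAParameter(), fn.getAParameterOrReceiver());
```

With this split, code that forgets about receivers gets the data-flow API's behavior by default, and including the receiver requires an explicitly named call.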

mchammer01 · 2020-11-19T16:17:22Z

Thanks for the ping. I'll review this tomorrow (had a bit of a disastrous day today and I am really behind). Hope it's ok!

Co-authored-by: Max Schaefer <54907921+max-schaefer@users.noreply.github.com>

asgerf · 2020-11-20T10:33:39Z

@mchammer01 no worries. It was GH who pinged you due to code ownership of the qhelp file. I should mention that the qhelp is nearly identical to the Java version from #3938.

mchammer01 · 2020-11-20T10:37:53Z

Oh I didn't know that (feel free to ignore some of my comments then) - just in the middle of reviewing this for you 😉
Waiting for the preview so I can see how it renders.

mchammer01

@asgerf - this LGTM ✨
As mentioned to you, feel free to ignore some of my comments if the content within the qhelp file is nearly identical to that of the equivalent query for Java, which is already live.
Would be good if you could reword the query description (ql file) as it reads a bit oddly.

Where do we stand with regards to release notes? I know we've changed the way we generate these so I am not sure when these get written and by whom (and whether I need to review anything).

mchammer01 · 2020-11-20T10:16:39Z

javascript/ql/src/Security/CWE-020/ExternalAPIsUsedWithUntrustedData.ql

@@ -0,0 +1,17 @@
+/**
+ * @name Frequency counts for external APIs that are used with untrusted data
+ * @description This reports the external APIs that are used with untrusted data, along with how


This description doesn't follow the format specified in https://github.com/github/codeql/blob/master/docs/query-metadata-style-guide.md#query-descriptions-description.
Alternatively, what do you think of something like "Use this query to return the external APIs...."?

mchammer01 · 2020-11-20T10:22:10Z