Skip to content

Conversation

asgerf
Copy link
Contributor

@asgerf asgerf commented Nov 19, 2020

Adds a JS version of the query @lcartey added in #3938. The public-facing aspects of the query (naming, API, etc) are based on that query.

We use API graphs to find calls to functions that came from an external API. This isn't a hard guarantee that the call is in fact external, and indeeed we have to filter out common built-ins method names like substring and indexOf as they generate too much noise.

Since JS doesn't have nice canonical names for everything, a large part of the problem is coming up with readable names for things. Since the query can have thousands of results I opted for readability over raw precision. We must be able to scan over the sink names to find the interesting sinks in a reasonable amount of time.

I found it was important to place the sink very close to the external call mentioned in the name. An earlier formulation of the query used API graphs to generate sinks based on a property RHS of an object escaping into an external library, but the sink could be arbitrarily far way from the external call and it was often too hard to see what was happening based on that sink location.

To avoid losing that flow, however, we use a flow label to track arbitrarily deep into an object and then report sinks where the value contains a user controlled value.

Mainly to verify that the query doesn't blow up, I ran an evaluation compared to an earlier commit with this query rebased on top. (The query has changed a bit since that run; I might want to run another evaluation.)

@asgerf asgerf added the JS label Nov 19, 2020
@asgerf asgerf requested review from mchammer01 and a team as code owners November 19, 2020 14:13
Copy link
Contributor

@max-schaefer max-schaefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments from a superficial review. Impressively sophisticated; it's a shame you had to fight the API-graphs library in so many places, I hope we can find the time to go back and improve its interface to make this sort of thing easier in future.

Comment on lines 162 to 164
// getParameter(i) requires a bindingset for i, so use the raw label
param = base.getASuccessor("parameter " + lbl) and
lbl != "-1" // ignore receiver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same as param = base.getAParameter() and not param = base.getReceiver(), which you've also used above? If so, should we have a predicate for that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to param = base.getAParameter() and not param = base.getReceiver() for now. Looks like I needed the parameter index at some earlier version of the predicate and forgot to clean it up.

It might make sense to look at the API for this. Generally I'd be in favor of being consistent with the data-flow API here (where the receiver isn't considered to be an argument or parameter) and just introduce the verbose getParameterOrReceiver(i) and getAParameterOrReceiver for the cases where you want both. We could introduce similar predicates in the data-flow API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in another PR. The refactoring would probably warrant more evaluations than I'd care to spend on this query.

@mchammer01
Copy link
Contributor

Thanks for the ping. I'll review this tomorrow (had a bit of a disastrous day today and I am really behind). Hope it's ok!

asgerf and others added 2 commits November 20, 2020 10:15
Co-authored-by: Max Schaefer <54907921+max-schaefer@users.noreply.github.com>
@asgerf
Copy link
Contributor Author

asgerf commented Nov 20, 2020

@mchammer01 no worries. It was GH who pinged you due to code ownership of the qhelp file. I should mention that the qhelp is nearly identical to the Java version from #3938.

@mchammer01
Copy link
Contributor

Oh I didn't know that (feel free to ignore some of my comments then) - just in the middle of reviewing this for you 😉
Waiting for the preview so I can see how it renders.

mchammer01
mchammer01 previously approved these changes Nov 20, 2020
Copy link
Contributor

@mchammer01 mchammer01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asgerf - this LGTM ✨
As mentioned to you, feel free to ignore some of my comments if the content within the qhelp file is nearly identical to that of the equivalent query for Java, which is already live.
Would be good if you could reword the query description (ql file) as it reads a bit oddly.

Where do we stand with regards to release notes? I know we've changed the way we generate these so I am not sure when these get written and by whom (and whether I need to review anything).

@@ -0,0 +1,17 @@
/**
* @name Frequency counts for external APIs that are used with untrusted data
* @description This reports the external APIs that are used with untrusted data, along with how
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description doesn't follow the format specified in https://github.com/github/codeql/blob/master/docs/query-metadata-style-guide.md#query-descriptions-description.
Alternatively, what do you think of something like "Use this query to return the external APIs...."?

<overview>
<p>Using unsanitized untrusted data in an external API can cause a variety of security issues. This query reports
all external APIs that are used with untrusted data, along with how frequently the API is used, and how many
unique sources of untrusted data flow to this API. This query is designed primarily to help identify which APIs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unique sources of untrusted data flow to this API. This query is designed primarily to help identify which APIs
unique sources of untrusted data flow to this API. This query is primarily designed to help identify the APIs

<p>Using unsanitized untrusted data in an external API can cause a variety of security issues. This query reports
all external APIs that are used with untrusted data, along with how frequently the API is used, and how many
unique sources of untrusted data flow to this API. This query is designed primarily to help identify which APIs
may be relevant for security analysis of this application.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I am not sure I understand what you mean here, so let me know if my suggestion doesn't make any sense

Suggested change
may be relevant for security analysis of this application.</p>
that may be relevant for the analysis of security within this application.</p>

may be relevant for security analysis of this application.</p>

<p>An external API is defined as a call to a function that is not defined in the source code, not overridden
in the source code, and is not modeled as a taint step in the default taint library. External APIs may be from the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
in the source code, and is not modeled as a taint step in the default taint library. External APIs may be from the
in the source code, and not modeled as a taint step in the default taint library. External APIs may come from the

<p>An external API is defined as a call to a function that is not defined in the source code, not overridden
in the source code, and is not modeled as a taint step in the default taint library. External APIs may be from the
third party dependencies or from internal dependencies. The query will report the external package name, followed
by an access path leading to the function, followed <code>[param x]</code> where <code>x</code>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand here. Have we got two successive "followed" or is the second one an example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a typo in the second one, it was supposed to say followed by [param x], but yes, followed by occurs twice.


<p>If the query were to return the API <code>express().get.[callback].[param 'res'].send() [param 0]</code>,
this could correspond to the <code>X</code> in <code>express().get('/foo', (req, res) => res.send(X))</code>.
First we should consider whether this a security relevant sink. In this case, this is writing to a HTTP response, so we should
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
First we should consider whether this a security relevant sink. In this case, this is writing to a HTTP response, so we should
First we should consider whether this a security-relevant sink. In this case, this is writing to a HTTP response, so we should

@asgerf
Copy link
Contributor Author

asgerf commented Nov 20, 2020

@mchammer01 Sorry to put you through the review unnecessarily. I'd rather not try to mobilise a coordinated change to multiple queries while they are in flight, so I'd prefer to leave it as-is. I'll just mention that these are not ordinary queries, they're not run by default, mainly for use by specialists, and so probably shouldn't be held to the same standards as those in a query suite.

@max-schaefer can I ask you for another review? (you're requesting changes)

@asgerf asgerf merged commit adc7bbf into github:main Nov 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants