Skip to content

JS: add query for detecting ignored calls to Array.prototype.concat#2203

Merged
semmle-qlci merged 11 commits intogithub:masterfrom
erik-krogh:ignorePureFunction
Nov 6, 2019
Merged

JS: add query for detecting ignored calls to Array.prototype.concat#2203
semmle-qlci merged 11 commits intogithub:masterfrom
erik-krogh:ignorePureFunction

Conversation

@erik-krogh
Copy link
Contributor

A simple and precise query for detecting ignored calls to Array.prototype.concat.

These are the current results: https://lgtm.com/query/6200487140988633440/

It started out as a query for detecting ignored calls to pure functions, but that query had too many weird results.

Type-tracking is used to detect if the receiver is an array, otherwise the query would flag various concat methods from different libraries.

Type-tracking is similarly used to detect if the argument to Array.prototype.concat is an array.
This might be a bit too restrictive, and I'm considering removing that restriction.
But it by keeping it the query doesn't flag a log of unit-tests of the concat method itself.
These are the results we remove from restricting the argument to be an array:
https://lgtm.com/query/1009789502976493454/
(An example of why I'm considering removing that restriction: Someone called concat instead of push.)

Performance is good.
There is only a slight outlier with closure-library, where the type-tracking of arrays take a while.

@erik-krogh erik-krogh added the JS label Oct 25, 2019
@erik-krogh erik-krogh requested review from a team and mchammer01 as code owners October 25, 2019 09:31
@asger-semmle
Copy link
Contributor

Have you tried backtracking from the receiver of a concat instead of forward-tracking all arrays? That might perform better on closure-library.

Copy link
Contributor

@max-schaefer max-schaefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a step back, would it perhaps make sense to generalise this query to look for any array method that is annotated with @nosideeffects in our externs definitions? I imagine, for instance, that developers might occasionally use array.slice where they meant array.splice (the former, of course, not having any side effects).

call.getCalleeName() = "concat" and
call.getNumArgument() = 1 and
(call.getArgument(0).getALocalSource() = array() or isIncomplete(call.getArgument(0))) and
not call.getArgument(0).asExpr().(ArrayExpr).getSize() = 0 and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the receiver being the empty array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call does nothing if the input is literally the empty array, and I got tired of looking at tests like these.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand your answer. I do agree with this whitelisting. I was just wondering about the symmetric case of [].concat(xs), which I saw in a few tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. I misread your comment, sorry. I was thinking about the way I wrote the query and thought you were talking about the argument not the receiver.

You have a good point with the receiver being the empty array. I think we should remove those.

Comment on lines +36 to +37
isArrayMethod(call) and
call.getCalleeName() = "concat" and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think those two lines can be simplified to

Suggested change
isArrayMethod(call) and
call.getCalleeName() = "concat" and
call = array().getAMethodCall("concat")

(Though that won't work if you do type backtracking instead, which I think is a good idea.)

@mchammer01
Copy link
Contributor

Is this PR ready for an editorial review or shall I hold off until all the technical discussions have taken place?

@erik-krogh
Copy link
Contributor Author

Is this PR ready for an editorial review or shall I hold off until all the technical discussions have taken place?

Hold off for now.

@erik-krogh
Copy link
Contributor Author

I implemented the backtracking instead, and it did indeed fix the performance of closure-library.
However, I have to relate the array-creation with the call to the array method, and I therefore ended up with the call as an argument through the type-backtracking.
Its not pretty, but it works.

We risk a performance-blowup if the isIgnoredPureArrayCall predicate is large, but it should be small in practice, so I don't think there is an issue.

With the backtracking there was not pretty way to restrict the concat calls to only the ones that accepted Arrays, so I removed that part of the query. (As mentioned before, I considered removing it anyway).

I'm doing a new performance evaluation on defaults right now.

@erik-krogh
Copy link
Contributor Author

erik-krogh commented Oct 25, 2019

Taking a step back, would it perhaps make sense to generalise this query to look for any array method that is annotated with @nosideeffects in our externs definitions? I imagine, for instance, that developers might occasionally use array.slice where they meant array.splice (the former, of course, not having any side effects).

I added the join/slice methods.
I haven't seen how/if they are misused, so I don't know the precision of the query for those methods.

I don't think methods like map/filter should be added to the query. I've seen too many of them being used as basically forEach.

The remaining Array methods are indexOf and lastIndexOf but I highly doubt that anyone will be confused as to whether those have side-effects.

Copy link
Contributor

@max-schaefer max-schaefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, I should have explained by suggestion about @nosideeffects a little more clearly: we include the externs definitions in every snapshot, so instead of hard-coding a list of methods that are known to be side-effect free in the query, you can check for a method name m whether there is an externs declaration for a method Array.prototype.m that is annotated as @nosideeffects.

import Expressions.ExprHasNoEffect

DataFlow::SourceNode callsArray(DataFlow::TypeBackTracker t, DataFlow::MethodCallNode call) {
isIgnoredPureArrayCall(call) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be better to distribute this into the first disjunct. It is implied by the recursive call in the second disjunct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely makes it more readable, as it now follows the usual type-tracking pattern.

@erik-krogh
Copy link
Contributor Author

Apologies, I should have explained by suggestion about @nosideeffects a little more clearly

I did find that that part of the code, and decided not to use it.
E.g. toString is marked as @nosideeffects. But when I looked at pure functions in general I found plenty of benign uses of exactly toString. So I prefer having an explicit white-list in the query.

erik-krogh and others added 2 commits October 29, 2019 12:10
Co-Authored-By: Max Schaefer <54907921+max-schaefer@users.noreply.github.com>
Copy link
Contributor

@max-schaefer max-schaefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo one minor niggle and adding the query to a legacy suite.

@mchammer01, I think this is ready for doc review.

| Use of returnless function (`js/use-of-returnless-function`) | maintainability, correctness | Highlights calls where the return value is used, but the callee never returns a value. Results are shown on LGTM by default. |
| Useless regular expression character escape (`js/useless-regexp-character-escape`) | correctness, security, external/cwe/cwe-20 | Highlights regular expression strings with useless character escapes, indicating a possible violation of [CWE-20](https://cwe.mitre.org/data/definitions/20.html). Results are shown on LGTM by default. |
| Unreachable method overloads (`js/unreachable-method-overloads`) | correctness, typescript | Highlights method overloads that are impossible to use from client code. Results are shown on LGTM by default. |
| Ignoring return from concat (`js/ignore-return-from-concat`) | maintainability, correctness | Highlights calls to the concat method on array where the return value is ignored. Results are shown on LGTM by default. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs updating now that the query is more general.

@mchammer01
Copy link
Contributor

Thanks @max-schaefer - I'll review this first thing on Thursday (not in tomorrow).

Copy link
Contributor

@mchammer01 mchammer01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erik-krogh - editorial eview completed.
Lookgs good, I've made a few minor comments/suggestions.

<overview>
<p>
The <code>concat</code>, <code>join</code> and <code>slice</code> methods are
pure and do not modify any of the inputs or the array the method was called
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question here. Are you deliberately using the past here? ….was called...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no..
I'm changing it to is called.

| Use of returnless function (`js/use-of-returnless-function`) | maintainability, correctness | Highlights calls where the return value is used, but the callee never returns a value. Results are shown on LGTM by default. |
| Useless regular expression character escape (`js/useless-regexp-character-escape`) | correctness, security, external/cwe/cwe-20 | Highlights regular expression strings with useless character escapes, indicating a possible violation of [CWE-20](https://cwe.mitre.org/data/definitions/20.html). Results are shown on LGTM by default. |
| Unreachable method overloads (`js/unreachable-method-overloads`) | correctness, typescript | Highlights method overloads that are impossible to use from client code. Results are shown on LGTM by default. |
| Ignoring return from concat (`js/ignore-return-from-concat`) | maintainability, correctness | Highlights calls to the concat method on array where the return value is ignored. Results are shown on LGTM by default. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the name be Ignoring result from pure array method, and shouldn't the description mention the other two methods?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I had forgot to change that when the scope of the query changed.

erik-krogh and others added 3 commits November 4, 2019 18:54
@max-schaefer
Copy link
Contributor

Almost there! Could you add the query to https://github.com/Semmle/ql/blob/master/javascript/config/suites/javascript/correctness-core, please?

Copy link
Contributor

@max-schaefer max-schaefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM now!

@erik-krogh
Copy link
Contributor Author

@mchammer01 a final review?

@erik-krogh
Copy link
Contributor Author

I did a final evaluation. Performance and results are still looking good.

@mchammer01
Copy link
Contributor

Thanks for the ping @erik-krogh - I have now approved this PR.

@semmle-qlci semmle-qlci merged commit 04f0c22 into github:master Nov 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants