Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coercing conditions with array-like objects into arrays #1143

Merged

Conversation

doublemarked
Copy link

Fixes strongloop/loopback#2824

Please let me know if this is aligned with your goals for DataAccessObject._coerce.

@slnode
Copy link

slnode commented Oct 18, 2016

Can one of the admins verify this patch? To accept patch and trigger a build add comment ".ok\W+to\W+test."

@slnode
Copy link

slnode commented Oct 18, 2016

Can one of the admins verify this patch?

2 similar comments
@slnode
Copy link

slnode commented Oct 18, 2016

Can one of the admins verify this patch?

@slnode
Copy link

slnode commented Oct 18, 2016

Can one of the admins verify this patch?

@davidcheung
Copy link
Contributor

davidcheung commented Oct 20, 2016

in the light of this would it be better if we handle this earlier in the cycle where the query object is built, then we can make it configurable whether user wants their array-like-objects to be converted to arrays

going to move this to a new issue in strong-remoting instead, so it doesnt side track fixing the original issue

@davidcheung
Copy link
Contributor

@slnode ok to test

@doublemarked doublemarked force-pushed the hm/coerce-array-like-objects-as-arrays branch from b948624 to a341f76 Compare October 24, 2016 11:50
@davidcheung
Copy link
Contributor

thanks @doublemarked! the changes LGTM, can you fix the commit title to be under 50chars?

@Amir-61 can you please take a look too? some line seems quite long, but I believe juggler has more lenient linting conventions, can you confirm?

@doublemarked
Copy link
Author

@davidcheung thanks for your feedback. Yes, happy to fix the conflicts and improve linting compliance. Let's see if @Amir-61 has anything to add.

throw new Error(g.f('Value is not an {{array}} or {{array-like object}}'));
}

var arrayVal = new Array(Object.keys(val).length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this correctly, then this is opening a security vulnerability, where the attacker can send an array-like object { 4000000000: true } and we will create a 4GB array containing a single item at a large index. This is reverting the prevention measures implemented by qs.

strongloop/loopback#2824 (comment)

Also in that Issue they reveal that arrayLength exists to prevent params like key[999999] auto-vivifying an array of that length, which I'll keep in mind to avoid as well.

I think we need to find a different solution here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bajtos Are you sure? I tried to implement this in a way that specifically avoids such an issue.

Scallop:Projects heath$ node -v
v6.9.1
Scallop:Projects heath$ node
> keys = Object.keys({ 4000000000: true })
[ '4000000000' ]
> a = new Array(keys.length)
[  ]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@doublemarked

$ node
> const arg = { 4000000000: true };
undefined
> var result = new Array(Object.keys(arg).length)
undefined
> result[4000000000]=true;
true
> result
[ ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ,
  ... 3999999901 more items ]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fortunately, V8 seems smart enough to allocate a sparse array where empty items do not take any (or very little) memory.

However, notice that the maximum array size is limited:

> new Array(5000000000)
RangeError: Invalid array length

In general, I am not comfortable overriding qs decisions about security in LoopBack. I mean if the team working on qs believe that automatically converting indexed query parameters into an array is a security vulnerability, then I trust them and don't want to undo their security-hardening measurements in loopback. If you believe they are wrong and it's ok to allow arbitrarily large arrays, then please talk to them and convince them to remove this restriction.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @bajtos :) So, there are two aspects here, the PR code's behavior and parity with qs, both of which I'll address.

Regarding the code from this PR - this code aims to specifically avoids the issue you're concerned about, but in the very least, the fact that this is not clear means that something needs to change here. That said, let me decompose the code a bit to explain it.

The code operates on an assumption that a well-formed "array-like object" contains a series of sequential numeric indices starting from zero. It initializes the arrayVal array with the length of the keys from the source content (Object.keys(arg).length, which is not an arbitrarily large number), and is very careful to never perform an assignment operation like that from your example (result[4000000000]=true).

To avoid assignments like that, it iterates from 0 up until the keys length, and hard fails in case of any missing or unexpected keys. In addition to protecting against the arbitrary-sized key, it also rejects any objects that are cluttered with unexpected non-sequential non-numeric keys. Does this make sense?

Now, regarding qs's approach - yeah generally I agree that it's better to avoid the potential for subtle discrepancies. However, they seem to have decided to entirely disable this array parsing by default in a future release, meaning even single element array-like parameters would be represented as objects. The premise behind this PR was to anticipate that upcoming change, and my implementation aims to provide a more conservative approach to array processing than qs is currently performing.

So it seems like there are several routes to go here regarding qs's arrayLimit,

  1. Do nothing, however this will be a decision to (in the future) disable by default support for array-like parameters (filter[where][someprop][inq][0]=...).
  2. Pick more a reasonable default for arrayLimit as part of the default Loopback config. Currently it defaults to 20, which is easy to unexpectedly stumble into (potentially disastrously). A substantially larger value (1000?) would create no practical risk and likely avoid the problem in almost all cases.
  3. A patch like the one from this PR, which seeks to provide support for array-like objects in the future.

Thoughts?

EDIT: Sorry, quick additional comment:

A substantially larger value (1000?) would create no practical risk and likely avoid the problem in almost all cases.

So, 1000-element arrays shouldn't generally be a problem, but this is an assumption that may be wrong in this context. If a larger default seems like a viable solution I can poke into the qs thread on this topic to see if they agree with this risk assessment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(face palm) I misunderstood what your code was doing, thank you for detailed explanation! Your proposal looks very reasonable now.

I'll be on vacation in the next few days, I'll try to review this in more details later next week, so that we can move your pull request forward. Sorry for the delay!

@bajtos bajtos self-assigned this Nov 30, 2016
var arrayVal = new Array(Object.keys(val).length);
for (var i = 0; i < arrayVal.length; ++i) {
if (!val.hasOwnProperty(i)) {
throw new Error(g.f('Value is not an {{array}} or {{array-like object}}'));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this code (and the test cases) correctly, when processing my example result[4000000000]=true, you will only access result[0] and bail out with an error.

I am wondering whether we should iterate over all keys in the input val, thus converting result[4000000000]=true to a single-item array [true]?

var arrayVal = [];
for (var key in val) {
  arrayVal.push(val[key]);
}
return arrayVal;

If we want to reject such requests, then I would like to see a better error message reported back to the user. Right now, coerceArray({ 40000: true }) fails with Value is not an array or array-like object, which is misleading to me, because the input (result[40000]=true) is an array-like object.

@doublemarked Thoughts?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this code (and the test cases) correctly, when processing my example result[4000000000]=true, you will only access result[0] and bail out with an error.

Yes, correct!

I am wondering whether we should iterate over all keys in the input val, thus converting result[4000000000]=true to a single-item array [true]?

My objective when writing this was to maintain a strict understanding of an "array-like object" to limit the opportunity for subtle bugs, as I didn't see any circumstance where that structure of input data is not the result of some mistake. In particular I was concerned that the presence of non-index keys was an indication that the object was not intended to be interpreted as an array.

But honestly, looking at this again, the rather few places that arrays are permitted (inq, nin,between, and, or) leaves little room for unintended consequences. One ambiguity I see is with between, which depends on correct element order, but a key sort would probably produce intended behavior.

So anyways, in summary - I'd be happy to refactor the patch to loosen the parsing a bit if you feel that would be an improvement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't have a strong opinion. Considering that it's usually easier to loosen a strict implementation, than to make a loose implementation more strict, I think we should proceed with your current proposal.

} else {
try {
clauses = coerceArray(clauses);
} catch (e) {
err = new Error(g.f('The %s operator has invalid clauses %j', p, clauses));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please log the e error (at least via debug), so that users have means how to investigate what exactly is wrong with their clauses.

case 'between':
try {
val = coerceArray(val);
} catch (e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto - please log e. I think it may be best to include it in the error message?

debug('Cannot coerce array clause: ', e);
err = new Error(g.f('The %s property has invalid clause %j: %s', p, where[p], e.message));

it('should be able to coerce where clauses with array-like objects', function() {
where = model._coerce({scores: {0: '10', 1: '20'}});
assert.deepEqual(where, {scores: [10, 20]});

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One assert per test please. Split this test into multiple smaller tests, each test calling _coerce + assert only once.

(I do realise the existing code does not follow this best practice.)

{scores: {between: {0: '10', 1: '20', 2: '30'}}},
];

invalid.forEach(function(where) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, please split this into multiple tests. You can write a parameterised test case, e.g.

context('malformed array-like objects for array-only operators', function() {
  var INVALID_WHERES = [
    {scores: {inq: {0: '10', 1: '20', 4: '30'}}},
    {scores: {inq: {0: '10', 1: '20', bogus: 'true'}}},
    {scores: {between: {0: '10', 1: '20', 2: '30'}}},
  ];

  INVALID_WHERE.forEach(function(where) {
    it('should reject ' + JSON.stringify(where), function() {
      expect(function() { model._coerce(where) })
        .to.throw(/*...*/);
    });
  });
});

@bajtos
Copy link
Member

bajtos commented Dec 21, 2016

@doublemarked what's the status of this pull request? Are you waiting for any information from me? I'll be on vacation from Friday to the end of the year, I would like to resolve any blockers before I leave.

@doublemarked
Copy link
Author

doublemarked commented Dec 21, 2016

@bajtos yes, I was waiting on your feedback before cracking the code back open to fix the more minor things. I don't think I need anything else. I should be able to wrap things up by EOD tomorrow, so we can try to get it closed out before your leave. Thanks!

The query-string parser used by express
https://github.com/ljharb/qs#parsing-arrays
limits the size of arrays that are created from query strings to 20
items. Arrays larger than that are converted to objects using numeric
indices.

This commit fixes the coercion algorithm used by queries to
treat number-indexed objects as arrays. We still maintain a strict
understanding of an "array-like object" to limit the opportunity for
subtle bugs. In particular, the presence of non-index keys is an
indication that the object was not intended to be interpreted as
an array.
@bajtos bajtos force-pushed the hm/coerce-array-like-objects-as-arrays branch from c685763 to 2377792 Compare January 10, 2017 13:50
Copy link
Member

@bajtos bajtos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @doublemarked, sorry for the delay. I have applied few last improvements to the code, rebased on top of the current master and git push -f to your feature branch.

Let's wait for CI results before landing.

@bajtos bajtos merged commit b530ec2 into loopbackio:master Jan 10, 2017
@bajtos
Copy link
Member

bajtos commented Jan 10, 2017

Landed, thank you for the contribution! 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants