Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement object argument type #77

Merged
merged 8 commits into from
Feb 11, 2021
Merged

Conversation

devinivy
Copy link
Member

This PR implements the the ability to have an object-type argument so that the user can build-up an object value using dot-separated paths and JSON. For example, an object argument named pet might be built from --pet '{ "type": "dog" }' --pet.name Maddie, resulting in the parsing output { pet: { type: 'dog', name: 'Maddie' } }. Comparable functionality can be found in other arg parsers such as yargs.

I am not sure if this may be applicable within lab, but if this lands I believe yargs can be removed from confidence, and will simplify some code in hpal-debug as well, which would be useful to hapi pal. I believe this could also open-up some possibilities, e.g. to build request payloads from the command line.

@devinivy devinivy added the feature New functionality or improvement label Jan 24, 2021
Copy link
Contributor

@kanongil kanongil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a powerful feature. As such it warrants careful review, especially from a security point-of-view.

This seems well thought out, but I am concerned about the unchecked ability to modify the object structure from what could very well be user-supplied data. Eg. for an object x, where a postcode property with a number is expected, this input could potentially cause unexpected issues:

--x.postcode '{ "toString": null }'`

I don't see a way to safely implement this feature without requiring a schema along with a hookable validator (eg. for full Joi validation).

test/index.js Outdated Show resolved Hide resolved
@kanongil
Copy link
Contributor

For example, an object argument named pet might be built from --pet '{ "type": "dog" }' --pet.name Maddie, resulting in the parsing output { pet: { type: 'dog', name: 'Maddie' } }.

The --pet.name needs quotes to be valid JSON.

@devinivy
Copy link
Member Author

devinivy commented Jan 24, 2021

Thanks for the review!

I don't see a way to safely implement this feature without requiring a schema along with a hookable validator (eg. for full Joi validation).

This crossed my mind, but I figured that the user should perform application-level validation once they obtain the parsed arguments. Even the basic arguments could have this issue: perhaps your calculator CLI needs --divisor to be a non-zero number in order to have well-defined behavior: bossy is leaving it up to the app to detect this and provide a useful error message that makes sense in the context of the application. As I see it bossy currently performs validation (e.g. required, valid) to the extent that it needs to do its job of parsing or document the argument in Bossy.usage(), and nothing is stopping the user from applying good security practices after that process is complete. Perhaps you see things differently, though.

The --pet.name needs quotes to be valid JSON.

If the input does not look like JSON then the value will be taken as a string. I will note this in a code comment in the catch block when we parse the input and ignore SyntaxErrors 👍

@kanongil
Copy link
Contributor

Ah, I missed the parsing fall-through. I would definitely oppose that part of the feature.

The convenience is by far outweighed by the unexpected side-effects, like the names true, false & null not resolving as strings. Of course, requiring quotes for strings means escaping issues and potential injection attacks if passed through a shell command line.

This brings me back to the only way to do this safely, is full schema support. Using Validate/Joi, would also allow for nice error messages. The only part that I guess could work without it, is non-fallthrough plain JSON object support without any merging. But then it seems like a non-feature, as the app could just do the parse.

@kanongil
Copy link
Contributor

I would just like to add, that I have previously used Joi for validating command lines with minimist doing the basic parsing, which worked quite well.

@devinivy
Copy link
Member Author

devinivy commented Jan 25, 2021

The convenience [of the parsing pass-through] is by far outweighed by the unexpected side-effects, like the names true, false & null not resolving as strings.

Those particular keywords wont resolve as strings, but not because of the pass-through: it's the parsing itself that makes them not strings. Do you have an issue with the pass-through (if so, any thoughts on what the behavior should be for non-JSON? An error?) or these keywords not being treated as strings (if so, would you recommend we parse objects and arrays but not simple values? how would a user set a value to null?). For the record I am happy with the current behavior and think it is reasonable for object-type args.

The only part that I guess could work without it, is non-fallthrough plain JSON object support without any merging.

What is it about merging that is special? I think this feature would still be plenty useful if it performed deep assignments rather than deep merges and I would be cool making that change if it quells your concerns.

This brings me back to the only way to do this safely, is full schema support.

I am having trouble following exactly why this feature necessitates full schema support in bossy— could you break it down a little bit more? I do follow that the app consuming bossy output cannot make any assumptions about the contents of an object arg (aside from the fact that it is an array or object), which seems acceptable to me.

I would just like to add, that I have previously used Joi for validating command lines with minimist doing the basic parsing, which worked quite well.

Totally! I would expect users of bossy to do the exact same thing: use bossy for the basic parsing and then validate the result with something like Joi. Minimist allows unstructured input (including deep objects, though no JSON parsing), so applications consuming minimist output should perform validation on the result, and the same goes for bossy.

I genuinely look forward to hearing your perspective on these questions, but I realize we might not end in full agreement and could need to solicit some additional review. Perhaps someone will have some new ideas for us ☺️

@kanongil
Copy link
Contributor

kanongil commented Jan 26, 2021

I see the value of what you are trying to do, and appreciate that you want to accommodate my thoughts.

While I am by no means a security researcher, I have a keen interest in security, and have found and reported several issues. As such I see potential problems if someone decides to invoke a tool with this feature as part of some automated process. Specifically, it is possible that someone will pass a user-supplied value to a specific field (eg. --x.name), and that it will cause issues when what it expected was a simple string, is suddenly null or an object.

One compromise that might work without a schema, is to only support strings for the path-based merging. Ie. --x.name null results in x: { name: 'null' }, and any further parsing / validation would need to be done by the app, eg. using Joi. That way the arg parsing would not provide sudden surprises depending on the exact string value.

@Nargonath
Copy link
Member

I definitely don't have as much security knowledge as you @kanongil but I feel with that compromise we'll make both the feature and the code more complicated for what I think should be the developer responsibility. IMO that's not because someone uses bossy in an automated process that he's exempted of validating the inputs he receives from it. Also bossy is not an executable binary itself so it will most likely be used as part of another CLI project where the user can validate bossy outputs. I understand we want to be as much secured as possible but we have to draw a line somewhere. We can still make a future enhancement though and bring full schema validation to bossy definitions to feel the gap.

I'd be more inclined to add the feature as @devinivy intended.

@kanongil
Copy link
Contributor

kanongil commented Jan 29, 2021

@Nargonath What I propose is actually aligned with (at least one likely interpretation of) the included docs. How do you expect developers to know that passing null to --pet.name will not result in a string?

Also, where is the extra complexity? My proposal would require one if statement, and less test logic than the PR one.

@devinivy
Copy link
Member Author

I consulted @nlf for feedback due to his background in security. I will try to represent his thoughts accurately.

  • He also flagged the ambiguity in non-string primitives, e.g. null vs 'null' and mentioned the qs module as having gone through similar conversations.
  • He likes the idea of validation being involved, but doesn't think it's necessarily required here.
  • We talked about how better messaging and documentation could be a big improvement.
  • Numbers are awkward because intuitively we want to parse those despite not wanting to parse other JSON primitives.

I thought through all the feedback so far, and came-up with a plan that I think at least speaks to the issues that have come-up, including validation:

  • The name of this type is changing to 'json' to represent the input rather than the output. Hopefully this will better signal that inputs are handled as JSON.
  • By default anything that parses to a primitive is going to be treated as a string: numbers, null, true, false.
  • The above behavior throws away information about the representation of the user's input for primitives (e.g. --obj.num 1.2 versus --obj.num '"1.2"') which may indicate different meanings in JSON, so a flag parsePrimitives will be introduced to control this.
  • The docs will state clearly that when parsePrimitives is true that users should avoid ambiguity by specifying string values as JSON (i.e. wrapped in quotes --pet.name '"Maddie"'), and that it's the responsibility of applications to pass this information along to users.
  • The merging behavior will be better described in the docs.
  • For bossy users who want structured, documented, validated object inputs I will add a utility to roll-up output from Bossy.parse() into an object for flags that use dot-separation. This may make more sense once you see it.

I hope this may do the trick for us!

@devinivy
Copy link
Member Author

devinivy commented Feb 4, 2021

Sorry for the poke, but @nlf @kanongil @geek @Nargonath if any of you have time to take a peek at the latest updates that would be awesome and appreciated.

Copy link
Contributor

@kanongil kanongil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still feel there are security gotchas with parsePrimitives: false, but it is better than before.

@@ -50,33 +50,82 @@ include the usage value formatted at the top of the message.
Options accepts the following keys:
* `colors` - Determines if colors are enabled when formatting usage. Defaults to whatever TTY supports.

### `object(name, parsed)`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new method is nice and explicit.

API.md Outdated
* `valid`: A value or array of values that the argument is allowed to equal.
* `valid`: A value or array of values that the argument is allowed to equal. Does not apply to `json` type arguments.

* `parsePrimitives`: When `true`, arguments of the `json` type will parse JSON primitives rather than treat them as strings. For example, `--pet.name null` will result in the output `{ pet: { name: 'null' } }` by default. However, when `parsePrimitives` is `true`, the same input would result in the output `{ pet: { name: null } }`. The same applies for other JSON primitives too, i.e. booleans and numbers. When this option is `true`, users may represent string values as JSON in order to avoid ambiguity, e.g. `--pet.name '"null"'`. It's recommended that applications using this option document the behavior for their users.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this an option means that it will be much safer and there will be less surprises. I don't feel that it properly describes that the value will fallback to the string value if the parsing fails.

Maybe this could have another parsePrimitives: 'strict' mode, where there is no fallback?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the docs to make it clearer 👍 and I will look into parsePrimitives: 'strict'. If I do implement 'strict' then it would parse primitives strictly and only allow primitives, otherwise return an error.

LICENSE.md Outdated
@@ -1,5 +1,5 @@
Copyright (c) 2014-2020, Sideway Inc, and project contributors
Copyright (c) 2014, Walmart.
Copyright (c) 2014-2021, Sideway Inc, and project contributors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outside the scope of this patch. And should it really extend "Sideway Inc" copyright like that? Hasn't it been transferred away?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to extend the copyright to project contributors (i.e. myself with this patch) into 2021. I will add it on a new line to represent the fact that Sideway hasn't contributed in 2021.

let jsonDef;

if (opt.includes('.')) {
const maybeDef = keys[opt.split('.')[0]];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not work with 'json' objects defined at any depth but 0. Eg. for a definition like this:

'pet.owner': { type: 'json'  }

I don't know if that should supported, but I don't see any docs, tests, or checks against it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting this causes some usability and predictability issues (esp. with Bossy.object()), so it is not supported at this time. It's enforced by Joi, and Bossy.object() has an assertion that relates to this issue. The test for it is here: https://github.com/hapijs/bossy/pull/77/files#diff-5bb8db779819ddef5956a5d9d5949c05ef7445237656ca37bf2f02720271440bR465-R474

lib/index.js Outdated
Bounce.ignore(err, SyntaxError);
}

if (!last.def.parsePrimitives && (!value || typeof value !== 'object')) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want to always parse objects & arrays? It means that even with parsePrimitives: false, someone that expects a simple string could crash the app with {"toString":null}, possibly producing unexpected results.

Copy link
Member Author

@devinivy devinivy Feb 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone is looking to guarantee they receive a string input rather than json input then they would want to use a dot-separated string-type arg, then optionally use Bossy.object() to roll it up into an object. This can even be combined with a json-type arg to get the best of both worlds and sidestep any unpredictability if the application would like to guarantee certain deep properties be strings. If the application wants a string but doesn't know the deep property at which they require that string, then I think it's fair that they will need to validate this themselves.


flags[name].push(value);
}
else if (last.def.type === 'json') {
Hoek.merge(flags[name], value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a default value was provided, the merge will modify it. I suspect the default value will need to be cloned to avoid surprises.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each definition is cloned near the top of Bossy.parse(), and I do have a test confirming that defaults aren't mutated, so I think we're good 👍 https://github.com/hapijs/bossy/pull/77/files#diff-5bb8db779819ddef5956a5d9d5949c05ef7445237656ca37bf2f02720271440bR362

Hoek.assert(!opt.includes('.'), `Cannot build an object at a deep path: ${opt} (contains a dot)`);

const initial = parsed.hasOwnProperty(opt) ? parsed[opt] : {};
const depth = (path) => path.split('.').length;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an internal method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that I understand why this would need to live on internals. There is no styleguide rule about this that I am aware of, and in my opinion colocating this utility near where it's used is going to be a bit clearer (at least for me).

lib/index.js Show resolved Hide resolved
@devinivy devinivy added this to the 5.1.0 milestone Feb 11, 2021
@devinivy devinivy self-assigned this Feb 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New functionality or improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants