Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

crypto.randomBytes throws in async mode #4583

Closed
hueniverse opened this Issue Jan 13, 2013 · 31 comments

Comments

Projects
None yet
6 participants

The following code does not check if args[1]->IsFunction():

if (!args[0]->IsUint32()) {
Local s = String::New("Argument #1 must be number > 0");
return ThrowException(Exception::TypeError(s));
}

Same problem with PBKDF2...

Is it part of the convention to both return a callback error and throw?

Owner

bnoordhuis commented Jan 13, 2013

The following code does not check if args[1]->IsFunction():

It does, a few lines below. args[1]->IsFunction() is used to differentiate between synchronous and asynchronous invocation. In hindsight, I should've created two functions, randomBytes() and randomBytesSync(). Oh well.

Is it part of the convention to both return a callback error and throw?

Yes. Exceptions are thrown on application errors (bad input) while runtime errors (errors not under your control) are passed on to the callback.

@bnoordhuis bnoordhuis closed this Jan 13, 2013

The documentation examples are misleading then.

Owner

bnoordhuis commented Jan 13, 2013

How's that?

The async code example doesn't show the need to wrap the method in a try...catch while the sync does. This implies all errors come through the callback (which really is the way it should work in a well-formed API, but that's just my 2c).

Owner

bnoordhuis commented Jan 14, 2013

crypto.randomBytes() is no special case, that's how most functions in node.js work. Note that when I say 'application errors', what I really mean is 'program bugs': you shouldn't catch them, you should fix them.

That's a very narrow view of how APIs are used.

You are assuming it is called with static arguments that can and should be fixed during development. Otherwise, you are expecting the API called to duplicate the input validation effort which is wasteful.

Also, combining input errors with execution errors has value in helping produce 100% test coverage in the JS code. For example, I don't know how to test the past of my code that deals with randomBytes errors via a callback because I can never make it happen. On the other hand, the sync flavor is easy to test. In this case, I just gave up and switched to use the sync version which made me sad (but solved my test coverage and ugly code issues).

The bottom line is, some errors are returned via a callback and some are thrown. This makes for a messy interface. You seems to be assuming that the API user has checked the C++ code to see what errors can be received via each channel.

The effort in passing ALL errors via the callback is trivial. Unless there is an architecture value in the current design, it is just broken. I'm happy to do all the work of going through every single API and fixing this, but was unable to figure out how to return the above error via the API. If you can provide one example for the randomBytes code, I will do a PR for everything else... So I'm willing to put my money where my mouth is. This is not just me complaining...

Owner

bnoordhuis commented Jan 14, 2013

You are assuming it is called with static arguments that can and should be fixed during development. Otherwise, you are expecting the API called to duplicate the input validation effort which is wasteful.

I'm assuming that you check your inputs before passing them on, yes. If you feel that's redundant, I don't know what to tell you.

Unless there is an architecture value in the current design, it is just broken.

There is value and it's not broken. Application bugs should be punished, swiftly and at the call site. Passing on the error to the callback a) makes debugging harder, and b) convolutes bugs with runtime errors.

The simple outcome of this design is that no one can use this API without either:

Reading the the FULL source code in JS and C++ for every function they use, map out all the input validation tests that are expected (for example, no where does it mention that randomBytes is limited to an unsigned 32bit integer!), duplicate that entire effort in the application code, and they keep track of any changes to the code,

OR,

Wrap every function call with a try...catch which makes for a complex code given the multiple exit points (both a callback with a potential error, and an exception exit).

How is this easier to write or debug:

var randomBitsString = function (bits, callback) {

    if (!bits ||
        typeof bit !== 'string' ||
        bits < 0) {

        return callback(new Error('Invalid random bits count'));
    }

    if (!callback ||
        typeof callback !== 'function') {

        return callback(new Error('Invalid callback'));
    }

    var bytes = Math.ceil(bits / 8);
    if (bytes >= 4294967296) {              // unsigned int 32
        return callback(new Error('random bits count too big'));
    }

    try {
        Crypto.randomBytes(bytes, function (err, buffer) {

            if (err) {
                return callback(err);
            }

            return callback(null, buffer.toString('hex'));
        });
    }
    catch (err) {
        return callback(err);
    }
};

Instead of:

var randomBitsString = function (bits, callback) {

Crypto.randomBytes(Math.ceil(bits / 8), function (err, buffer) {

    if (err) {
        return callback(err);
    }

    return callback(null, buffer.toString('hex'));
});

};

And this is the super simple case. Things get much messier with PBKDF2.

@hueniverse I'm inclined to agree with @bnoordhuis on this; If you pass something as an argument to an API which is of not the right type, then you should expect it to throw an error; heck, if this were a compiled language, you wouldn't even get it to compile.

I don't think you need all of that code there; however, perhaps the documentation needs to better communicate things like Argument types? i.e., that this function (and potentially others) can only take UInt32's?

I don't want to write code that constantly looks for errors coming from different channels. So who should test for the size being a UInt32? The current design requires both!

If you are taking arbitrary user input, and passing it through to a function that expects some pretty explicit input types, then you should be either validating that input, or catching the error that it produces. I don't think this is a problem with node.

You are missing the point. We all write code that takes inputs from somewhere else and calls node APIs. Those APIs already do the heavy lifting of input validation. No need to do it twice. The point is only how these errors are delivered.

In a callback architecture, it is much better to return all errors in a consistent manner which is through the first error argument of the callback. The fact that some errors are thrown and some are returned via the callback means that the application code has to handle both cases which leads to: ugly, unreadable code and to the fact that now callbacks are returned within a try...catch scope that has no business handling potential callback throws...

The only practical implication of this is that some people will write ugly code that deals with every error scenario, and others will just assume it doesn't throw and ignore it. That's bad design.

The sync version of these functions return all errors via an exception. Nice and clean.

isaacs commented Jan 14, 2013

I do not agree with @bnoordhuis on this. We need to have consistency in how we report errors to the user. Programs are typically made of many interconnected parts, and throwing from an async function is very surprising, because now you have to catch errors in two places.

We had this same discussion regarding \0 bytes in fs functions. Here's the pattern we established there:

  1. Async API: All errors reported to callback
  2. Sync API: All errors reported via throw
  3. Bindings: assert() and abort() to kill the program if the internal API is used incorrectly. (Users should never see this happen, indicates a bug in node.)

@isaacs isaacs reopened this Jan 14, 2013

If someone can point me to an existing place were a validation error is returned via the callback in C++, I am happy to submit a PR and even review other places for the same pattern.

isaacs commented Jan 15, 2013

This is an issue that will require more discussion with the broader community.

There are clear "doing it wrong" application errors. Consider something like fs.open({not:"a string"}, 'r', cb). Node can know at the start of the function that this is clearly wrong.

Should we throw there, because you passed in bad arguments? Or pass the error to the callback?

One approach says, "Application errors and run-time errors are fundamentally different. In the first, the programmer is doing something that is clearly wrong, and the best way to debug it would be to crash asap. In the second, the system/environment/etc is in an unexpected state, and the best way to handle that is to use the normal mechanisms (passing the error to the callback). Deferring detectable application errors violates the principle of raising errors as soon as possible, and loses information about the call site."

The other says, "We already have an error-handling mechanism for async functions: the callback. The cost of having two mechanisms for raising errors (in additional code cruft and program instability) is not worth the value of having errors reported sooner, even if they are detectable. When multiple parts are connected, the difference between 'application error' and 'run-tme error' is much more vague, especially in a weakly typed language. Throwing from async functions violates the principle of least surprise."

Clearly, regardless of which approach wins out, it must be documented and clearly explained which functions can throw, what situations they will throw in, etc.

There are vocal people on both sides of this debate. Both have good points, and clearly want what's best for Node. The question is: which is the vocal minority?

We're not going to change this for v0.8. I'll write up a doc page about error handling to cover how things work there, and we can use that as a jumping off point for perhaps changing the policy.

Can I help?

Jumping in a little late here, but feel a little adamant about this one. (as it sounds many are =)

It seems to me that throwing on bad programmer input is the fundamental way js works, and honestly shouldn't even need to be mentioned in the documentation (though if the documentation doesn't state exactly what the input should be, then it needs to be fixed). Since js doesn't care how many or what arguments you pass, often they'll need to be parsed anyways so the function knows what to do. Then if the wrong arguments are passed it'll leave the function in an undefined state.

@hueniverse there are problematic assumptions you're making in your examples. First is that you assume ease of writing is more important than easily identifying bugs while using the api, and that it wouldn't be easier to debug. The first example, with corrections to your incomplete sanitizations, is going to be far easier to debug since it fails the moment I try to use it improperly. And the point of throwing is not to improve test coverage, it's to smite you for using it improperly.

The second is that you assume a callback was actually passed. So if it's not, and you try to call it, the application will throw regardless. Also for some reason you still felt the need to wrap the call in a try catch. If you've properly sanitized your inputs this should not be necessary, and if it is then they're not being sanitized properly.

As both you and @isaacs have pointed out consistency is key, and I completely agree. Where I disagree is in the definition of "consistent". I've pointed out one scenario where the async call must throw from bad input. So making an exception for that case breaks consistency.

@trevnorris First, node is not JS. Or at least this has nothing to do with JS and everything to do with the node API.

I agree that we can't make an absolute rule about callback vs. throwing because the callback itself can be the problem. But that's also an obvious edge case exception. My preference is to pass ALL errors via the callback whenever possible. This leaves the 'bad callback' error as the only one throwing (or when the API code itself is broken). I think the cost of returning errors via a callback is insignificant on the API side vs the overhead of catching alongside a callback on the client.

My examples are meant as a strawman. The point is that to make them identify all the possible issues (which you consider the better way to catch bugs at the expense of developer ease and readability), I had to read the API source code in C++. That cannot possibly be a requirement or even a suggestion.

Expecting the documentation to identify every possible error that can be thrown or returned isn't practical either. The API code keeps changing, along with new errors and error conditions. You can't possibly expect developers to read the C++ or internal modules JS code diff on every release to replicate those checks.

And without full replication of all the tests for error conditions that throw, I still must use a try...catch. And then of course there is the sin of repeating the exact same validation code twice (or more with nested calls). A quality API checks inputs for you, not forces you to check everything ahead only to be repeated.

But I'm in no way an absolutist and think there is an easy, pragmatic middle ground. After all, making the changes I am asking for will take some effort and time, and so this can be viewed as a step.

The key here is how we define 'coding error' vs 'runtime error'. I think we can easily agree on:

Application errors:

  • Bad data types
  • Incorrect number of arguments

Runtime errors:

  • Out of memory
  • Hardware issues
  • Network errors

Which leaves out what this thread was originally about:

  • Bad input (with the right data type)

One view is that this is an application error because it's predictable and can be tested for and prevented. Another is that most function inputs are generated at runtime in a working system, and represent a valid error situation.

From a purely pragmatic perspective, it's safer and more efficient to treat bad input as a runtime error. The API call is going to validate the input anyway (if it doesn't, it will throw and that's a bug in the API code) - no need to duplicate effort on the calling side. Bad input errors are much easier to handle in the callback because it usually means notifying someone else about it.

If we treated bad input as runtime error, it will address 99% of my concerns. I can live with true application errors throwing and will not need to try...catch them.

One last note - when I say bad input with the right data type I mean the right JS data type, not C++. This means that if I pass a number, but the number must be a C++ uint32, that's a runtime error, not an application error. We cannot require developers to manually check the boundaries of their numbers on every call (along with reading the C++ code to understand those boundaries). Expecting a number and returning a callback with bad range error is the right approach.

@hueniverse Thanks for the clarifications. Allow me to clarify mine.

  • If an error is going to be thrown, it should be thrown early
  • Even in the case of an async call, an error should be thrown if essential inputs are found incorrect
  • An error should never be thrown once a method has entered the async queue (these are a huge pain to find and reproduce)
  • Errors should be thrown on necessary parts. For example:
    • If a user tries to access out of bounds memory, and this can be detected immediately (e.g. not after a buffer has been received from the server and found to be out of bounds)
    • If the user doesn't pass an essential argument, or it is incorrect

The word " essential " needs a clear definition.

All my js experience has shown me that numbers, booleans and strings will all be coerced out of the argument given to them. The timers api is a good example, and since it's locked I think a pretty good possible standard. setTimeout can be called with any value as the second argument, and it will be evaluated as a number. There's no need to check this value. Just coerce the sucker into the type it needs to be.

There are some good examples of overzealous type checking and lack of coercion that currently exist. Good example:

var buf = new Buffer(4);
buf.writeFloatLE('6', '0');
// TypeError: value not a number

I mean, seriously? While there is a performance advantage to the programmer being consistent with passing Smi's, it shouldn't throw despite them. (@isaacs, mind giving me your two cents here?)

Essential arguments are, for example, an object with specific properties, a callback function or a buffer of a minimum length. imho, if these are passed in to either an async or sync call they should be easily detectable from the beginning, and should throw and error.

@trevnorris I completely disagree that setTimeout should be used a model of allowed arguments, or that much from JS's early days should. Automatically casting types is a terrible practice. In fact, most quality code in C and C++ will go out of its way to block the compiler from casting implicitly and that's the role model I follow.

However, the examples presented are all fine by me as application errors. What I'd like to know is what you think about the "bad input" example I listed earlier. Can you live with randomBytes() returning "number too big" in the callback instead of an exception (as one example)? Or do you think everyone using it should check ahead for bytes < 4294967296.

@hueniverse Sorry, disagree. Auto type casting is a feature of js, sort of how v8 works (e.g. BooleanValue() isn't going to throw because the argument isn't true/false) and what js developers are going to expect. And the hole model of coding js like c/c++ is just going to annoy js developers who've only worked in type-less languages.

Anyways. that's besides the point.

As for an answer to your question, that will completely depend on how the internals work. Take the two super simplistic examples:

// going to assume fn is always passed
function myRandomBytes(size, fn) {
  var rand = crypto.randomBytes(size);
  setTimeout(function() {
    fn(null, rand)
  }, 0);
}

Since the random bytes are generated before entering into the async queue, yes it should throw.

// going to assume fn is always passed
function myRandomBytes(size, fn) {
  setTimeout(function() {
    var rand, err = null;
    try {
      rand = crypto.randomBytes(size);
    } catch(e) { err = e }
    fn(err, rand);
  }, 0);
}

Since the random bytes are generated after leaving the code flow, then the error should carry into the callback.

w/o actually looking at the code, I'd guess that randomBytes doesn't block immediately. So all errors, except say missing the callback or a negative size, should flow. But again I'll state that the vast majority of js developers are going to expect that null, undefined or false will evaluate to 0, and any non uint values will be floored.

No no no. That's the whole point. I should not need to care about how the internals work. Your approach seems to offer me either going over protective or prayer. When the cost of always returning these errors in the callback is practically nothing.

Time to let the code team pick a path and adapt either way.

isaacs commented Jan 29, 2013

@hueniverse The problem here is that the core team is split on this issue. We all can agree that fs.open({nonstring:'path'}) should throw, but are somewhat split on the more subtle errors, where typeof returns the correct value, but it's out of range, or has an invalid character. (See prior discussions about the handling of \0 characters in fs paths.)

Incidentally, in this case, long before you get an error about the number overflowing, you'll see an uncatchable FATAL ERROR: v8::Object::SetIndexedPropertiesToExternalArrayData() length exceeds max acceptable value from V8.

I think that the appropriate compromise in this specific case is to throw if the value is non-numeric, but return a more helpful error to the callback if it's out of range. In particular, these two results are just excessively confusing, and indefensible:

> crypto.randomBytes(1.2)
TypeError: Argument #1 must be number > 0

> crypto.randomBytes(4294967296)
TypeError: Argument #1 must be number > 0

It IS a number!  It IS > 0!!  WTF, NODE?!?

Also:

> crypto.randomBytes(Math.pow(2,31))
FATAL ERROR: v8::Object::SetIndexedPropertiesToExternalArrayData() length exceeds max acceptable value

We should detect that and handle it more sanely.

@bnoordhuis @trevnorris @piscisaureus Thoughts on this?

A callback, if provided, is must to be called exactly once. This guarantee should not be broken, and I've filed bug reports with and fixed modules when it is, even for the case of throwing an Error instead of passing it to a callback.

@bnoordhuis is correct that errors should be reported quickly and swiftly. But there is little difference between throwing an error and passing an Error object to a callback, the two are identical in semantics. Even if there was a difference, you still need to call the callback (assuming the program doesn't exit due to the throw): Both calling the callback without an error, and then throwing an error, would also be acceptable (some people may expect the callback to be called after the current tick and not in-line, but that's a different issue), though I wouldn't recommend that.

isaacs commented Jan 29, 2013

Thank you @acubed for providing the other reasonable and defensible, yet completely opposite, view on this issue. (That being said, no, we cannot ever call an async callback on the current tick.)

@isaacs it seems the best approach at this point is along the lines you suggested, to address this on a per-issue basis, discuss the specifics, want wait until we see a more coherent pattern emerging based on individual consensus calls...

Which is mostly just fancy talk for I'm happy to get this one resolved and open issues in the future when I find more of these that annoy me... :-)

Eh? I'm reporting this is is a bug for me too. That is, Node.js must call the callback if provided, and if so it mustn't pass an Error that's also being thrown.

(And I'm not proposing that a Node.js callback be called in-line, I'm not sure why I brought that up.)

@isaacs isaacs added a commit to isaacs/node-v0.x-archive that referenced this issue Jan 30, 2013

@isaacs isaacs crypto: Only throw if randomBytes arg is a non-number
If it is out of the acceptable range, then return a more sensible
error.  (Throw for sync, call the callback for async.)

Fixes GH-4583
b7198f7

@trevnorris trevnorris was assigned May 4, 2013

@isaacs you referenced this issue as fixed in your commit above. Close?

Assumed fixed. Comment if there are more issues.

@trevnorris trevnorris closed this May 21, 2013

isaacs commented May 21, 2013

@trevnorris The pull req for that commit is still open. But it seems that this is on the very edge of our rules, and there's not much strong agreement on how it should go. Crypto typically falls on the side of being extra strict, though, so I think we should just probably say it's staying how it is.

Oy. Sorry. Totally missed the commit is on isaacs/node. :-P

@spion spion referenced this issue in winterland1989/Action.js Feb 20, 2016

Closed

Discuss some design detail #7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment