Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

assert: improve simple assert #17581

Closed
wants to merge 5 commits into from

Conversation

BridgeAR
Copy link
Member

@BridgeAR BridgeAR commented Dec 9, 2017

This improves the error message in simple asserts by using the
real call information instead of the already evaluated part.

CI https://ci.nodejs.org/job/node-test-pull-request/12010/

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

assert

@nodejs-github-bot nodejs-github-bot added the assert Issues and PRs related to the assert subsystem. label Dec 9, 2017
@Trott
Copy link
Member

Trott commented Dec 10, 2017

Seems like there's an edge case here where the source file is changed after execution begins, in which case it's possible to alter the assertion message or perhaps link to something that streams endless data and thus providing a vector for a DoS. Might be a bit far-fetched, but perhaps worth thinking on a bit in the interest of caution?

lib/assert.js Outdated
try {
const src = readFileSync(call.getFileName(), 'utf8');
const line = src.split('\n')[call.getLineNumber() - 1];
const potentialMsg = line.match(/assert(\.ok)?\((.*)\)/);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: .* -> .+

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: (\.ok) -> (?:\.ok) (which changes it to a non-capturing group) and then below potentialMsg[2] -> potentialMsg[1]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regular expression is going to be necessarily buggy. In its current form, it can return an incorrect value if there is more than just the assert.ok() on the line. For example:

assert.ok(someIdentifier); someFunc();

...will have someIdentifier); someFunc( as the potential message.

If we switch from greedy matching to lazy matching, it will fix that issue but then it might return the wrong thing in other situations (like assert.ok(someFunc() > someNum)).

Unfortunately, the right tool for the job here isn't a regular expression but a parser. And I'm pretty sure we don't want to go down that route.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the first nit assert().
The second one is indeed nicer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the first nit assert().

Right, but if you do that, then the captured message is an empty string. Is that what you want? If so, ignore me. :-D I assumed an empty value for message was undesirable, but thinking about it now, I'm less sure. :-D

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Trott we already have acorn and could use that. A simple alternative would be to do a multiple step thing like e.g.

  1. if that line contains only a single opening and closing bracket, match the content.
  2. count all opening and closing brackets that are not surrounded by a string.
  3. if that is not the case, print the whole line

The assert could span over multiple lines as well though... but I doubt that this would be a problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about the probable brittleness and the likelihood of bug reports. (Even if we never pull out a wrong message, there will still be the potential for "why do I get different assertion messages based on how I format my code" issues.) I'm not fervently opposed to this but I'm -0 on it, at least for now. I do admire the creativity here, though. I would have never thought to try to open the source code file.

@BridgeAR
Copy link
Member Author

@Trott that is indeed correct. I also thought about streaming lots of data. I can not think of a good solution against this though. The question though is if this is really a thing (especially concerning DoS because you must already have file access to those files and in that case you probably can also do worse things).

Changing the message... I can not think of anything right now where this could be harmful.

@BridgeAR
Copy link
Member Author

So after thinking more about it I see a solution to improve this further

  1. I can change readFileSync to readSync. This is going to require more code but the memory usage will be low at all times in that case and we only have to read the file up to the necessary line.
  2. I switch to acorn to parse the code from the correct line on and provide up to five more lines. I can not even think of any assert that would span over more than five lines and using acorn that way is super easy. We would immediately know exactly where the assert begins and where it ends.

This would mitigate almost all concerns as far as I see it. @Trott would you be fine with such a implementation?

@Trott
Copy link
Member

Trott commented Dec 10, 2017

This would mitigate almost all concerns as far as I see it. @Trott would you be fine with such a implementation?

@BridgeAR Yeah, I think that would be fine with me. I guess the thing to think about there is if it's possible to exploit a bug in acorn or if it's possible to spin acorn into an infinite loop or something. I have no idea about any of that stuff, so I would probably refrain from a full +1, but I definitely wouldn't stop it if a few people who felt competent to evaluate it thought it was 👍.

@BridgeAR
Copy link
Member Author

I went ahead and implemented it as I suggested. I decided to always print assert(...) instead of using the "real" function name because I feel like it makes it clearer what it is about. We could further limit the data that may be read to e.g. 128 kb instead of 512 kb as it is right now but I doubt that this is going to be a issue. I also decided to print the "regular" error in case of call / apply - everything else would be confusing.

I also went ahead and used a special escaping to prevent multiple lines as output and similar things but I felt like using the regular inspect escaping slashes and quotation marks is to much. What do others think?

I know this is going to slow down the error case but that should not happen in live code anyway. If it does though, I believe this increases the productivity of assert a lot.

@BridgeAR
Copy link
Member Author

It would be nice if someone could have a look at this again and to get some further thoughts about it.

@BridgeAR
Copy link
Member Author

@nodejs/collaborators PTAL

Copy link
Member

@apapirovski apapirovski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really my area of expertise unfortunately (have barely looked at assert code before) but changes seems good.

The only concern I have is that prior to this acorn was only used in repl, right?

@@ -665,6 +665,9 @@ assert.ok(true);
// OK
assert.ok(1);
// OK
assert.ok(typeof 123 === 'string');
// throws "AssertionError: assert(typeof 123 === 'string')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should either be updated to be assert() in the code or assert.ok in the comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that is a tricky one. The example is actually in line with the current implementation. I decided to just always show assert no matter what the real function name was. The reason is that it is more work to find out if the called function was name, obj.name, obj.prop.name and so on. It is easy to print the name but finding out about the potential part before the name is more work.

As a user might actually also import assert as any name (e.g. foobar), I think it is better to have assert as function name. But someone might of course also argue against this because of the very same reason...

@ronkorving
Copy link
Contributor

power-assert built into Node directly? Yes please! :) This would be huge for testing. I can imagine that because of the nature of this PR, we may have some people who would be against this landing at all. But in my excitement, let me just say that I'm in favor of seeing this in Node core.

lib/assert.js Outdated
// Do not try to get the error if there is no code that we can read.
var fd;
try {
fd = openSync(call.getFileName(), 'r', 0o666);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Open question: is there any chance that the string containing the file contents is actually already in memory somewhere, due to the module having been loaded?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I am aware of. The code is imported once and discarded right afterwards again when loading a module. That is also logical as the code might be quite big and could potentially block a big memory chunk. It would of course be nice to prevent reading the files but that is not possible currently (as far as I see it).

What I could do is adding a cache for errors that occurred before. That way each assert that triggers will only result in a single read instead of one per call.

@BridgeAR
Copy link
Member Author

@apapirovski yes, I think acorn is only used in the repl so far. In what way is that critical though?

@ronkorving power-assert does not work out of the box as the code has to be transformed first. That is a advantage and disadvantage at the same time. The disadvantage is that it does not work out of the box and you have to add a build step first. That is not necessary here. The advantage is that the transformed code can actually be even more powerful than this implementation and no extra read is necessary in case a error occurs.

@BridgeAR
Copy link
Member Author

While thinking about power-assert - a alternative implementation would actually be to do a similar transformation as "hook" before the code is compiled to a module. That way the extra read is not necessary anymore but it would also be a overhead while loading modules.

@apapirovski
Copy link
Member

@BridgeAR I don't mind it but it does make the core dependent on a third-party module to a greater extent. I'm just flagging it because maybe someone else will have an issue with it.

@BridgeAR BridgeAR added notable-change PRs with changes that should be highlighted in changelogs. semver-minor PRs that contain new features and should be released in the next minor version. labels Dec 15, 2017
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m very -1 with this change.

I hve seen several people using assert to validate user input, and then catch the errors with try-catch (or promises/async await). This is going to enable them for a DoS vector because at high load this is going to stall the event loop.

A crude solution is to do it for the ~100 assertions.

@BridgeAR
Copy link
Member Author

@mcollina there is a easy way to prevent any recalculation by caching the already read part. This will not result in a DoS vector. This was also already discussed earlier if you read everything. I am aware that this is "not conventional" but I think it is actually quite nice to add this. But I am also thinking about going the way that I described about just hooking into the module loading. That might be a nicer solution, what do you think?

@BridgeAR
Copy link
Member Author

I updated the code to include a caching for the already known error cases.

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will produce incorrect data if the underlining module is reloaded. Is there a way to wipe the cache in such occasion?

@BridgeAR
Copy link
Member Author

@mcollina I did not implement a way to further invalidate / wipe the current cache. It should be simple to add that as well. It is a bit unusual though that someone invalidates the modules cache and then reloads the module with different code.

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@BridgeAR
Copy link
Member Author

I rebased and updated some small things. I am still wondering if it makes sense to implement a further cache invalidation. If a user empties the module cache and loads a changed file, the source code should also have changed and the error should not be triggered at that spot anymore. So I decided against implementing that as it is a bit more complicated.

I also decided against refactoring this to transform the code while loading the module as it would be a overhead for the startup time.

So PTAL. I think this is actually ready.

@BridgeAR BridgeAR added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Dec 28, 2017
@BridgeAR
Copy link
Member Author

lib/assert.js Outdated
const line = call.getLineNumber() - 1;
const column = call.getColumnNumber() - 1;

if (codeCache.has(`${filename}${line}${column}`)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe nitpicky, but this exact same concatenation happens in 4 places. Perhaps put it in a variable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@BridgeAR
Copy link
Member Author

@BridgeAR
Copy link
Member Author

@ronkorving @mcollina @jasnell PTAL. After looking at it again, I found some minor issues that I fixed now (e.g. setting the stackTraceLimit to <= 0, some more problems with .call and .apply and the EOL under Windows).

On top of that I also changed the output to always include a short description. I think it is just better to read that way.

@BridgeAR
Copy link
Member Author

It now also properly applies to strict and legacy mode.

@BridgeAR BridgeAR added author ready PRs that have at least one approval, no pending requests for changes, and a CI started. assert Issues and PRs related to the assert subsystem. and removed assert Issues and PRs related to the assert subsystem. author ready PRs that have at least one approval, no pending requests for changes, and a CI started. labels Jan 15, 2018
lib/assert.js Outdated
.replace(escapeSequencesRegExp, escapeFn);
message = 'The expression evaluated to a falsy value:' +
`${EOL}${EOL} assert${ok}(${message})${EOL}`;
codeCache.set(identifier, message);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to myself: this should move one line down.

@BridgeAR
Copy link
Member Author

BridgeAR commented Jan 16, 2018

@BridgeAR
Copy link
Member Author

Landed in 27925c4...4319780

@BridgeAR BridgeAR closed this Jan 16, 2018
BridgeAR added a commit to BridgeAR/node that referenced this pull request Jan 16, 2018
This improves the error message in simple asserts by using the
real call information instead of the already evaluated part.

PR-URL: nodejs#17581
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ron Korving <ron@ronkorving.nl>
BridgeAR added a commit to BridgeAR/node that referenced this pull request Jan 16, 2018
`assert.ok()` should always receive a value. Otherwise there
might be a bug or it was intended to use `assert.fail()`.

PR-URL: nodejs#17581
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ron Korving <ron@ronkorving.nl>
BridgeAR added a commit to BridgeAR/node that referenced this pull request Jan 16, 2018
This is the only double line break in the file and for
consistency one is removed.

PR-URL: nodejs#17581
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ron Korving <ron@ronkorving.nl>
@BridgeAR BridgeAR added semver-major PRs that contain breaking changes and should be released in the next major version. and removed semver-minor PRs that contain new features and should be released in the next minor version. labels Jan 16, 2018
@BridgeAR
Copy link
Member Author

I just changed this to be semver-major to be on the safe side. Maybe some people rely on the error message.

@addaleax addaleax removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Jan 18, 2018
@ChALkeR ChALkeR mentioned this pull request Apr 7, 2018
4 tasks
@BridgeAR BridgeAR deleted the improve-simple-assert branch April 1, 2019 23:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
assert Issues and PRs related to the assert subsystem. notable-change PRs with changes that should be highlighted in changelogs. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants