fs: initial experimental promisified API #15485

Closed: jasnell wanted to merge 2 commits

Member

@jasnell jasnell commented Sep 20, 2017

Introduces an experimental promises API for the fs module. This is intended to land as Experimental.

The API is accessible via fs.async, e.g.:

const { async:fs } = require('fs');

fs.access('.')
  .then(console.log)
  .catch(console.error);

Most fs functions are supported. There are some variations from
the base fs module (for instance, fs.async.realpath() uses the
faster libuv realpath implementation rather than the slower js
implementation).

This does not yet include documentation updates.

Semver-major because this makes substantive changes to error handling in the existing fs module.

Benchmarks are included. For instance,

access

james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-access.js
fs/promises-access.js method="legacy" n=200000: 280,078.9048933864
fs/promises-access.js method="promisify" n=200000: 69,310.04884236863
fs/promises-access.js method="promise" n=200000: 240,928.00098102028
james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-access.js
fs/promises-access.js method="legacy" n=200000: 236,062.80841465757
fs/promises-access.js method="promisify" n=200000: 82,258.05538216804
fs/promises-access.js method="promise" n=200000: 192,776.68164314007
james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-access.js
fs/promises-access.js method="legacy" n=200000: 112,929.07886994253
fs/promises-access.js method="promisify" n=200000: 136,272.83764802708
fs/promises-access.js method="promise" n=200000: 169,612.60194223037
james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-access.js
fs/promises-access.js method="legacy" n=200000: 163,294.66290369138
fs/promises-access.js method="promisify" n=200000: 119,130.69228308211
fs/promises-access.js method="promise" n=200000: 185,855.53426089959

copyFile

james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-copyfile.js
fs/promises-copyfile.js method="legacy" n=20000: 282,755.1240812897
fs/promises-copyfile.js method="promisify" n=20000: 166,754.91197646034
fs/promises-copyfile.js method="promise" n=20000: 101,068.59267154362
james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-copyfile.js
fs/promises-copyfile.js method="legacy" n=20000: 264,452.6619289267
fs/promises-copyfile.js method="promisify" n=20000: 84,065.33716046945
fs/promises-copyfile.js method="promise" n=20000: 100,486.89215315646
james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-copyfile.js
fs/promises-copyfile.js method="legacy" n=20000: 270,616.2934281645
fs/promises-copyfile.js method="promisify" n=20000: 156,420.93141688287
fs/promises-copyfile.js method="promise" n=20000: 103,760.93116221747
james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-copyfile.js
fs/promises-copyfile.js method="legacy" n=20000: 251,781.945806882
fs/promises-copyfile.js method="promisify" n=20000: 118,330.0739314469
fs/promises-copyfile.js method="promise" n=20000: 266,315.20161791815
james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-copyfile.js
fs/promises-copyfile.js method="legacy" n=20000: 106,352.46687950206
fs/promises-copyfile.js method="promisify" n=20000: 120,297.63535689937
fs/promises-copyfile.js method="promise" n=20000: 260,925.24326093044
james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-copyfile.js
fs/promises-copyfile.js method="legacy" n=20000: 283,371.8168790678
fs/promises-copyfile.js method="promisify" n=20000: 154,272.6477009993
fs/promises-copyfile.js method="promise" n=20000: 177,795.42772743877
james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-copyfile.js
fs/promises-copyfile.js method="legacy" n=20000: 116,029.27694807078
fs/promises-copyfile.js method="promisify" n=20000: 154,155.65571356818
fs/promises-copyfile.js method="promise" n=20000: 92,732.89058694763
james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-copyfile.js
fs/promises-copyfile.js method="legacy" n=20000: 286,576.9026610743
fs/promises-copyfile.js method="promisify" n=20000: 69,093.89846983271
fs/promises-copyfile.js method="promise" n=20000: 83,583.96891906712
james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-copyfile.js
fs/promises-copyfile.js method="legacy" n=20000: 123,446.49834316899
fs/promises-copyfile.js method="promisify" n=20000: 157,543.37448739127
fs/promises-copyfile.js method="promise" n=20000: 150,655.3488844555

As the benchmarks illustrate, the promise versions can be faster than the traditional callbacks, but the results are inconsistent. This appears to be largely due to GC (a lot of promise objects are created and need to be cleaned up).

Having the Promise version does not obsolete the callback version, as the creation and management of the promise objects can be fairly expensive.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

@jasnell jasnell added the semver-major PRs that contain breaking changes and should be released in the next major version. label Sep 20, 2017
@nodejs-github-bot nodejs-github-bot added the lib / src Issues and PRs related to general changes in the lib or src directory. label Sep 20, 2017
@jasnell jasnell added fs Issues and PRs related to the fs subsystem / file system. promises Issues and PRs related to ECMAScript promises. labels Sep 20, 2017
@jasnell
Member Author

jasnell commented Sep 20, 2017

Note that these are not util.promisify wrappers. These are implementations of the fs functions with Promises without any wrapping. In some cases, the methods were entirely reimplemented (e.g. readFile), and changes were made at the native layer to resolve promises there where possible. This implementation should not be considered complete, but it is functional enough to land as experimental.
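
To make the distinction concrete, here is a minimal sketch contrasting a util.promisify wrapper with a function that returns a promise directly. It assumes a current Node.js release, where the promise API is exposed as `fs.promises` rather than under the `fs.async` name used in this PR; `accessWrapped`, `accessNative`, and `checkBoth` are illustrative names only.

```js
'use strict';
const fs = require('fs');
const util = require('util');

// Approach 1: wrap the callback-based function. Each call goes through
// the callback API and settles a promise from inside that callback.
const accessWrapped = util.promisify(fs.access);

// Approach 2: a function that returns a promise directly.
// Assumption: `fs.promises` is the name in current Node.js releases;
// this PR exposed the same idea as `fs.async`.
const accessNative = fs.promises.access;

async function checkBoth(path) {
  // Both resolve with undefined on success and reject with an Error
  // (for example ENOENT) on failure.
  await accessWrapped(path);
  await accessNative(path);
  return true;
}

checkBoth('.').then(console.log).catch(console.error);
```

From the caller's side the two look identical; the difference discussed in this thread is only in how the promise gets created and settled internally.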

@mscdex
Contributor

mscdex commented Sep 20, 2017

The API is accessible via fs.async

This seems confusing given that non-*Sync fs methods are also asynchronous.

There are some variations from
the base fs module (for instance, fs.async.realpath() uses the
faster libuv realpath implementation rather than the slower js
implementation.

Why?

Member

@addaleax addaleax left a comment

Nice!

Note that these are not util.promisify wrappers. These are implementations of the fs functions with Promises without any wrapping.

In that case, the [util.promisify.custom] properties on the original functions should still be set to the “proper” async implementation wherever that makes sense.
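
A small sketch of what that suggestion could look like, using the documented `util.promisify.custom` symbol. `legacyAccess` and `promiseAccess` are hypothetical stand-ins, not the functions from this PR.

```js
'use strict';
const util = require('util');

// Hypothetical callback-style function standing in for an fs method.
function legacyAccess(path, callback) {
  process.nextTick(callback, null, `callback:${path}`);
}

// Hypothetical native promise implementation of the same operation.
async function promiseAccess(path) {
  return `promise:${path}`;
}

// Point util.promisify() at the native implementation...
Object.defineProperty(legacyAccess, util.promisify.custom, {
  value: promiseAccess,
  enumerable: false
});

// ...so that promisify hands it back instead of building a wrapper.
const promisified = util.promisify(legacyAccess);
console.log(promisified === promiseAccess); // prints true
```

With this in place, existing code that calls `util.promisify()` on the callback function would pick up the promise implementation without any changes.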

req);
} catch (err) {
promiseReject(promise, err);
}
Member

Have you tried making all of these functions async functions to get the implicit try/catch + reject? I think that, as long as there is no await in them, that is all an async function amounts to.

Member Author

Yep, was considering that already :-)
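
The idea in this exchange can be sketched as follows: a manual try/catch that converts a synchronous throw into a rejection, next to an async function that does the same implicitly. `validatePath`, `accessManual`, and `accessAsync` are hypothetical names for illustration, not code from the PR.

```js
'use strict';

// Hypothetical validator that throws synchronously on bad input.
function validatePath(path) {
  if (typeof path !== 'string') {
    throw new TypeError('path must be a string');
  }
  return path;
}

// Manual style: catch the synchronous throw and return a rejection.
function accessManual(path) {
  try {
    return Promise.resolve(validatePath(path));
  } catch (err) {
    return Promise.reject(err);
  }
}

// async style: a throw inside the body rejects the returned promise,
// so no explicit try/catch is needed.
async function accessAsync(path) {
  return validatePath(path);
}

accessManual(42).catch((err) => console.log('manual:', err.message));
accessAsync(42).catch((err) => console.log('async:', err.message));
```

Neither version throws at the call site; the caller only ever sees a rejected promise.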

configurable: true,
get() {
return require('internal/async/fs');
}
Member

Why not value + writable: false?

Member Author

To avoid printing the experimental warning until someone explicitly goes to use the new API. We can switch this later once it's non-experimental.
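
A rough sketch of why a getter helps here, assuming the warning is emitted when the API object is first produced. `fsLike` and the placeholder API object are hypothetical; in the PR the getter requires an internal module instead.

```js
'use strict';

const fsLike = {};   // hypothetical stand-in for the fs module
let api;             // created lazily on first access

Object.defineProperty(fsLike, 'async', {
  enumerable: true,
  configurable: true,
  get() {
    // Runs on property access, not at module load, so merely requiring
    // the module stays silent.
    if (api === undefined) {
      process.emitWarning('The promises API is experimental',
                          'ExperimentalWarning');
      api = { access: async () => undefined };  // placeholder API object
    }
    return api;
  }
});

// No warning has been emitted up to this point.
fsLike.async.access('.');  // first access triggers the warning once
fsLike.async.access('.');  // later accesses reuse the cached object
```

A plain `value` property would force the API object (and its warning) to be created as soon as the property is defined, which is the trade-off raised in the question above.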

src/node_file.cc Outdated
if (promise->IsPromise()) {
if (promise.As<Promise>()->State() != Promise::kPending) return;
Local<Promise::Resolver> resolver = promise.As<Promise::Resolver>();
resolver->Resolve(context, arg).FromJust();
Member

Due to #5691 you’ll want an InternalCallbackScope around this … ideally move the class definition to node_internals.h in the same way it’s done in #15428 to avoid merge conflicts :)

Member Author

ah yes, thanks for the reminder :-)

Member Author

fwiw, how and where to use InternalCallbackScope is about as clear as mud right now :-( ... any chance of having some docs written up on it? Even if only in the form of some code comments.

Member

Yea, sure. From node.h:

 * `MakeCallback()` is a wrapper around this class as well as
 * `Function::Call()`. Either one of these mechanisms needs to be used for
 * top-level calls into JavaScript (i.e. without any existing JS stack).

do you have any specific suggestions for how to improve that? should resolving promise state be mentioned explicitly?

Member Author

I'll work up some language.

src/node_file.cc Outdated
Local<Value> promise =
object()->Get(context, env->oncomplete_string()).ToLocalChecked();
if (promise->IsPromise()) {
if (promise.As<Promise>()->State() != Promise::kPending) return;
Member

When is this not true? Can you make this a check or, alternatively, add a comment?

Member Author

+1 .. yeah, was already looking at making this a CHECK. It should always be true.

src/node_file.cc Outdated
Local<Value> argv[2];
argv[0] = Null(env->isolate());
argv[1] = arg;
MakeCallback(env->oncomplete_string(), 2, argv);
Member

arraysize(argv) :)

@jasnell
Member Author

jasnell commented Sep 20, 2017

@mscdex

This seems confusing given that non-*Sync fs methods are also asynchronous.

Yeah, I'm thinking I'll just change it to fs.promises.*

There are some variations from the base fs module (for instance, fs.async.realpath() uses the faster libuv realpath implementation rather than the slower js implementation.

Why?

Performance mainly. And lack of backwards compatibility concerns. The key reason we ended up having to revert back to the old realpath implementation is that changing it broke existing code. We don't have that issue here. Also, if we kept the js implementation, we wouldn't really gain any performance advantages over simply doing util.promisify(fs.realpath) given the way it is implemented.

@addaleax
Member

Fwiw, I think it might be cleaner to introduce fs.fastRealpath() for that, maybe?

const getPathFromURL = internalURL.getPathFromURL;

const {
Member

What is the issue with accessing internalFs directly? For example, realpathCacheKey is only used in one location, so why create a module-level variable to manage instead of being explicit and accessing it from internalFs the one time?

@@ -283,35 +190,41 @@ fs.access = function(path, mode, callback) {
throw new errors.TypeError('ERR_INVALID_CALLBACK');
}

if (handleError((path = getPathFromURL(path)), callback))
return;
handleError(path = getPathFromURL(path));
Member

Hm, my first thought when I saw the change to handle errors sync was that it is a good idea but after having a second thought I am not so sure about that anymore. The path could definitely be passed in by users and therefore handling the error in the callback would probably be the best thing to stick to?

Member Author

Currently, argument validation error handling is inconsistent, with some errors resulting in a throw and others being passed to the callback. I'm good with passing them to the callback, it just needs to be consistent (e.g. all validation errors always going to the callback or always throwing).

Member

I agree that they should be consistent but as long as it is not a "programmers" error I think it should land in the callback.
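
The two consistent options under discussion can be sketched like this. All names (`checkPath`, `accessThrowing`, `accessCallback`) are hypothetical and the checks are simplified; this is not the validation code from the PR.

```js
'use strict';

// Hypothetical check; returns an Error instead of throwing it.
function checkPath(path) {
  if (typeof path !== 'string')
    return new TypeError('The "path" argument must be a string');
  if (path.includes('\u0000'))
    return new Error('Path must not contain null bytes');
  return null;
}

// Option A: every validation error throws synchronously.
function accessThrowing(path, callback) {
  const err = checkPath(path);
  if (err) throw err;
  process.nextTick(callback, null);
}

// Option B: every validation error goes to the callback, asynchronously.
function accessCallback(path, callback) {
  const err = checkPath(path);
  if (err) {
    process.nextTick(callback, err);
    return;
  }
  process.nextTick(callback, null);
}

try {
  accessThrowing(42, () => {});
} catch (err) {
  console.log('thrown:', err.message);
}
accessCallback(42, (err) => console.log('callback:', err.message));
```

With option A the caller needs a try/catch around the call itself; with option B a single error-first callback sees both validation and I/O failures.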


if (!nullCheck(path, callback))
return;
if (typeof path !== 'string' && !Buffer.isBuffer(path)) {
Member

We actually accept all TypedArrays right now if I am not mistaken. Did you want to limit it to buffers?

Member Author

I'll double check
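
For reference, a short sketch of how the candidate type checks differ. It only demonstrates standard JavaScript and Buffer behaviour and makes no claim about which check the fs module should use.

```js
'use strict';

const buf = Buffer.from('/tmp');
const u8 = new Uint8Array([47, 116, 109, 112]);  // bytes of "/tmp"

// Buffer.isBuffer() accepts only Buffer instances.
console.log(Buffer.isBuffer(buf), Buffer.isBuffer(u8));

// Buffer is a Uint8Array subclass, so this check accepts both.
console.log(buf instanceof Uint8Array, u8 instanceof Uint8Array);

// ArrayBuffer.isView() is broader still: any TypedArray or DataView.
console.log(ArrayBuffer.isView(new Float64Array(1)));
```

So a `Buffer.isBuffer(path)` test, as in the diff above, rejects a plain Uint8Array that an `instanceof Uint8Array` test would accept.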

req.oncomplete = makeCallback(callback);
binding.access(pathModule._makeLong(path), mode, req);
};

fs.accessSync = function(path, mode) {
handleError((path = getPathFromURL(path)));
fs.accessSync = function(path, mode = fs.F_OK) {
Member

As the validation of the mode is later it would be better performance wise to check for undefined (the Number.isInteger check is obsolete in that case). But this might be over optimizing stuff.

@@ -329,9 +242,9 @@ Object.defineProperty(fs.exists, internalUtil.promisify.custom, {


fs.existsSync = function(path) {
handleError(path = getPathFromURL(path));
nullCheck(path);
Member

👍

@@ -827,117 +808,192 @@ fs.ftruncate = function(fd, len, callback) {
if (typeof len === 'function') {
callback = len;
len = 0;
} else if (len === undefined) {
} else if (len == null) {
Member

Using the explicit version is actually better performance wise. But do we really want to support using null at all? I did not check but I guess you just ported that over from the c++ layer?

Member Author

yeah, this simplifies an additional check that was being done in C. There's an existing comment in the C code about the inconsistency here.

Member

I think == null is optimized anyway

Member

@benjamingr that is currently not correct.

Member

See #15178.
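
Setting performance aside, the behavioural difference between the two checks in this thread is small but real. The sketch below shows only the semantics; `isNullish` and `isUndefined` are illustrative names.

```js
'use strict';

// Loose check: matches both null and undefined, nothing else.
function isNullish(len) {
  return len == null;
}

// Strict check: matches undefined only.
function isUndefined(len) {
  return len === undefined;
}

for (const value of [undefined, null, 0, '', false]) {
  console.log(value, isNullish(value), isUndefined(value));
}
```

Switching from `=== undefined` to `== null` therefore changes behaviour for exactly one input: an explicit `null` length.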

return;
if (!nullCheck(path, callback)) return;
var req = new FSReqWrap();
mode = modeNum(0o777);
Member

The modeNum wrapping here is obsolete. I think you actually wanted to do

if (typeof mode === 'function') {
  callback = mode;
  mode = 0o777;
} else {
  mode = modeNum(mode, 0o777);
}

Member

The same applies to other functions.

Member Author

+1
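
To show why `modeNum(0o777)` loses the caller's value, here is a self-contained sketch of the suggested argument handling. Both `modeNum` and `normalizeArgs` below are hypothetical stand-ins written for this example; the real internal helper may behave differently.

```js
'use strict';

// Hypothetical stand-in for the internal modeNum() helper: numbers pass
// through, strings are parsed as octal, anything else falls back to `def`.
function modeNum(mode, def) {
  if (typeof mode === 'number') return mode;
  if (typeof mode === 'string') return parseInt(mode, 8);
  return def;
}

// Hypothetical mkdir-like signature with an optional mode argument.
function normalizeArgs(path, mode, callback) {
  if (typeof mode === 'function') {
    // Called as (path, callback): shift arguments, use the default mode.
    callback = mode;
    mode = 0o777;
  } else {
    // Called as (path, mode, callback): honour the caller's mode.
    mode = modeNum(mode, 0o777);
  }
  return { path, mode, callback };
}

const noop = () => {};
console.log(normalizeArgs('dir', noop).mode.toString(8));
console.log(normalizeArgs('dir', '755', noop).mode.toString(8));
console.log(normalizeArgs('dir', 0o700, noop).mode.toString(8));
```

By contrast, `mode = modeNum(0o777)` would always yield the default, whatever the caller passed.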


if (typeof path !== 'string' && !Buffer.isBuffer(path)) {
throw new errors.TypeError('ERR_INVALID_ARG_TYPE', 'path',
['string', 'Buffer', 'URL']);
Member

I think it would be great if all the errors could also contain the actual value.

});
bench.start();
for (let i = 0; i < n; i++) {
access(__filename, () => countdown.dec());
Member

I am not convinced that running 2e5 calls in the same tick is actually a good way of benchmarking this. I think it would be better to use either a recursive strategy, to see the individual call length, or a batched recursive call that batches up to, let's say, 500 calls together.
This applies to all these benchmarks.

Member

I agree, ideally if we can get a more applicative benchmark that'd be the best

Member

In particular, the promises version always queues the callback to run later, which would be disproportionately represented in the above benchmark (I think).
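
One way to read the batching suggestion is sketched below: issue a bounded number of calls, wait for all of them to complete, then start the next batch. `runBatched` and the numbers used are illustrative assumptions; this is not the benchmark harness from the PR.

```js
'use strict';
const fs = require('fs');

// Run `total` fs.access() calls, at most `batchSize` in flight at once.
// The next batch starts only after the current one has fully completed.
// `done` receives the number of batches that were run.
function runBatched(total, batchSize, done) {
  let remaining = total;
  let batches = 0;

  function nextBatch() {
    if (remaining === 0) return done(batches);
    const size = Math.min(batchSize, remaining);
    remaining -= size;
    batches++;
    let pending = size;
    for (let i = 0; i < size; i++) {
      fs.access('.', () => {
        if (--pending === 0) nextBatch();
      });
    }
  }

  nextBatch();
}

const start = process.hrtime();
runBatched(2000, 500, (batches) => {
  const [s, ns] = process.hrtime(start);
  const ms = (s * 1e3 + ns / 1e6).toFixed(1);
  console.log(`2000 calls in ${batches} batches took ${ms} ms`);
});
```

Setting `batchSize` to 1 gives the fully sequential ("recursive") variant mentioned above, which measures individual call latency rather than throughput.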

@mhdawson
Member

@nodejs/tsc for awareness

@ofrobots
Contributor

nit: in the PR abstract s/promisified/promised

@MylesBorins
Contributor

re: fs.fastRealpath, please refer to these references about age-old security vulnerabilities found in realpath(3). The potential off-by-one security vulnerability was why it wasn't built on top of realpath(3) originally. I landed a fix in libuv that should guard against this, but we should likely have a larger discussion about this before introducing the API.

#2680 (comment)
https://www.kb.cert.org/vuls/id/743092

@benjamingr
Member

Note that these are not util.promisify wrappers. These are implementations of the fs functions with Promises without any wrapping

Can we get a benchmark comparing the two?

@jasnell
Member Author

jasnell commented Sep 21, 2017

@benjamingr ... there are some in the original post at the top. e.g.

james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-access.js
fs/promises-access.js method="legacy" n=200000: 280,078.9048933864
fs/promises-access.js method="promisify" n=200000: 69,310.04884236863
fs/promises-access.js method="promise" n=200000: 240,928.00098102028
james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-access.js
fs/promises-access.js method="legacy" n=200000: 236,062.80841465757
fs/promises-access.js method="promisify" n=200000: 82,258.05538216804
fs/promises-access.js method="promise" n=200000: 192,776.68164314007
james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-access.js
fs/promises-access.js method="legacy" n=200000: 112,929.07886994253
fs/promises-access.js method="promisify" n=200000: 136,272.83764802708
fs/promises-access.js method="promise" n=200000: 169,612.60194223037
james@ubuntu:~/node/node$ ./node --no-warnings benchmark/fs/promises-access.js
fs/promises-access.js method="legacy" n=200000: 163,294.66290369138
fs/promises-access.js method="promisify" n=200000: 119,130.69228308211
fs/promises-access.js method="promise" n=200000: 185,855.53426089959

The second entry in each run uses util.promisify, the third entry uses the approach in this PR.

@benjamingr
Member

cc @bmeurer

@@ -259,7 +260,9 @@ E('ERR_NO_CRYPTO', 'Node.js is not compiled with OpenSSL crypto support');
E('ERR_NO_ICU', '%s is not supported on Node.js compiled without ICU');
E('ERR_NO_LONGER_SUPPORTED', '%s is no longer supported');
E('ERR_OUT_OF_RANGE', 'The "%s" argument is out of range');
E('ERR_OUTOFBOUNDS', 'The "%s" argument is out of bounds');

Maybe better ERR_OUT_OF_BOUNDS (to be consistent with prev line)?

@pitaj

pitaj commented Oct 2, 2017

Why is this labelled semver-major? It doesn't appear to be a breaking change. Shouldn't it be a minor-version feature addition?

@jasnell
Member Author

jasnell commented Oct 2, 2017

It makes a number of breaking changes, including changes to error handling in the existing fs APIs.


if (!Number.isInteger(len) || len < 0) {
throw new errors.TypeError('ERR_INVALID_ARG_TYPE', 'len',
'unsigned integer');
Contributor

Can't a lot of these checks be done in C++? It would be a nice performance win.

Member Author

Pulled many of these out of C++ in order to get better consistency with the move to internal/errors and consistency with where the type checking is happening. As it is, the type checking is rather... inconsistent. Will look at it again, tho.

@jasnell
Member Author

jasnell commented Dec 18, 2017

Work continued in #17739
