esm: implement the getFileSystem hook #41076

Draft: wants to merge 2 commits into base: main

1 change: 1 addition & 0 deletions demo/foo/index.mjs
@@ -0,0 +1 @@
console.log(`foo`);
5 changes: 5 additions & 0 deletions demo/index.mjs
@@ -0,0 +1,5 @@
import './foo/index.mjs';
import hash from './foo/index-sha512.mjs';

console.log(`demo`);
console.log(`demo hash:`, hash);
50 changes: 50 additions & 0 deletions demo/loader-sha512.mjs
@@ -0,0 +1,50 @@
import crypto from 'crypto';
import path from 'path';

const shaRegExp = /-sha512(\.mjs)$/;

function getSourcePath(p) {
  if (p.protocol !== `file:`)
    return p;

  const pString = p.toString();
  const pFixed = pString.replace(shaRegExp, `$1`);
  if (pFixed === pString)
    return p;

  return new URL(pFixed);
}

export function getFileSystem(defaultGetFileSystem) {
  const fileSystem = defaultGetFileSystem();

  return {
    readFileSync(p) {
      const fixedP = getSourcePath(p);
      if (fixedP === p)
        return fileSystem.readFileSync(p);

      const content = fileSystem.readFileSync(fixedP);
      const hash = crypto.createHash(`sha512`).update(content).digest(`hex`);

      return Buffer.from(`export default ${JSON.stringify(hash)};`);
    },

    statEntrySync(p) {
      const fixedP = getSourcePath(p);
      return fileSystem.statEntrySync(fixedP);
    },

    realpathSync(p) {
      const fixedP = getSourcePath(p);
      if (fixedP === p)
        return fileSystem.realpathSync(p);

      const realpath = fileSystem.realpathSync(fixedP);
      if (path.extname(realpath) !== `.mjs`)
        throw new Error(`Paths must be .mjs extension to go through the sha512 loader`);

      return realpath.replace(/\.mjs$/, `-sha512.mjs`);
    },
  };
}
1 change: 1 addition & 0 deletions lib/internal/errors.js
@@ -1342,6 +1342,7 @@ E('ERR_IPC_CHANNEL_CLOSED', 'Channel closed', Error);
E('ERR_IPC_DISCONNECTED', 'IPC channel is already disconnected', Error);
E('ERR_IPC_ONE_PIPE', 'Child process can have only one IPC pipe', Error);
E('ERR_IPC_SYNC_FORK', 'IPC cannot be used with synchronous forks', Error);
E('ERR_LOADER_MISSING_SYNC_FS', 'Missing synchronous filesystem implementation of a loader', Error);
E('ERR_MANIFEST_ASSERT_INTEGRITY',
(moduleURL, realIntegrities) => {
let msg = `The content of "${
62 changes: 62 additions & 0 deletions lib/internal/modules/esm/get_file_system.js
@@ -0,0 +1,62 @@
'use strict';

const {
ObjectAssign,
ObjectCreate,
SafeMap,
} = primordials;

const realpathCache = new SafeMap();

const internalFS = require('internal/fs/utils');

Contributor:
I think fsUtils might be a better name for this. Otherwise it sounds like internal vs external/public versions, but I think the important distinction is that these are utilities.

const fs = require('fs');
const fsPromises = require('internal/fs/promises').exports;
Comment on lines +12 to +13
Contributor:
I think we need to use destructuring here; otherwise monkey-patching of the fs module would impact this module.
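
The concern can be reproduced outside of Node.js core with a stand-in object. This is only a sketch: `fakeFs` and both helper functions are hypothetical names, not part of the PR, and the real `fs` module is not touched.

```javascript
// Hypothetical stand-in for a shared, mutable module such as `fs`.
const fakeFs = {
  readFileSync(p) { return `original:${p}`; },
};

// Property looked up on every call: later monkey-patches are visible.
function readViaNamespace(p) {
  return fakeFs.readFileSync(p);
}

// Function captured once via destructuring: the original is kept.
const { readFileSync } = fakeFs;
function readViaCapturedReference(p) {
  return readFileSync(p);
}

// Simulate userland code patching the module after load.
fakeFs.readFileSync = (p) => `patched:${p}`;

console.log(readViaNamespace('a.mjs'));         // patched:a.mjs
console.log(readViaCapturedReference('a.mjs')); // original:a.mjs
```

Capturing the reference at load time is what isolates the module from patches applied afterwards; patches applied before the module loads would still be picked up.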

const packageJsonReader = require('internal/modules/package_json_reader');
const { fileURLToPath } = require('url');
const { internalModuleStat } = internalBinding('fs');

const defaultFileSystem = {
  async readFile(p) {
    return fsPromises.readFile(p);
  },

  async statEntry(p) {
    return internalModuleStat(fileURLToPath(p));
  },

  async readJson(p) {
    return packageJsonReader.read(fileURLToPath(p));
  },
Comment on lines +27 to +29
Contributor:
Can/should this be used to read json files that are not a package.json? If not, I think the name should be more explicit:

Suggested change
-  async readJson(p) {
-    return packageJsonReader.read(fileURLToPath(p));
-  },
+  async readPackageJson(p) {
+    return packageJsonReader.read(fileURLToPath(p));
+  },


  async realpath(p) {
    return fsPromises.realpath(p, {
      [internalFS.realpathCacheKey]: realpathCache
    });
  },

  readFileSync(p) {
    return fs.readFileSync(p);
  },

  statEntrySync(p) {
    return internalModuleStat(fileURLToPath(p));
  },

  readJsonSync(p) {
    return packageJsonReader.read(fileURLToPath(p));
  },

Contributor:
Is there a reason we can't just have readJson as a well-defined abstraction on top of readFile? Package virtualization can still work by returning serialized JSON - which is likely needed anyway with threading workflows.

Contributor Author:
I didn't run a perf analysis on internalReadJson, but I'd intuitively agree that a JS abstraction over readFile (that would just apply a regex to get the containsKey value) would be fast enough without requiring a dedicated native function. I can experiment with that. Or do you have something else in mind?
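
A minimal sketch of the idea discussed in this thread, assuming a hypothetical `makeReadJsonSync` helper that derives the JSON reader from whatever `readFileSync` hook is in place. The return shape here is plain parsed JSON, which is an assumption; the actual `packageJsonReader.read` result may differ.

```javascript
// Hypothetical helper: derive readJsonSync from any readFileSync hook.
function makeReadJsonSync(readFileSync) {
  return function readJsonSync(p) {
    let text;
    try {
      text = String(readFileSync(p));
    } catch (err) {
      // Assumption: a missing file means "no package.json here".
      if (err && err.code === 'ENOENT') return undefined;
      throw err;
    }
    return JSON.parse(text);
  };
}

// A virtual file system only needs to serve serialized JSON.
const virtualFiles = new Map([
  ['file:///pkg/package.json', '{"name":"pkg","type":"module"}'],
]);

function virtualReadFileSync(p) {
  const key = String(p);
  if (!virtualFiles.has(key)) {
    throw Object.assign(new Error(`ENOENT: ${key}`), { code: 'ENOENT' });
  }
  return Buffer.from(virtualFiles.get(key));
}

const readJsonSync = makeReadJsonSync(virtualReadFileSync);
console.log(readJsonSync('file:///pkg/package.json').type); // module
console.log(readJsonSync('file:///missing/package.json'));  // undefined
```

With this layering, a loader that virtualizes packages would only have to override the file-reading hook.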


  realpathSync(p) {
    return fs.realpathSync(p, {
      [internalFS.realpathCacheKey]: realpathCache
    });
  },

Contributor:
In theory realpath can be implemented on top of just an lstat and stat implementation, although that would be a slightly more complex refactoring of the realpath algorithm. It might well simplify the model, though, by leaving just three base-level FS primitive hooks to virtualize.

Contributor Author (@arcanis, Dec 3, 2021):
I'm a little worried about that - it'd likely be much slower, and the semantic loss could bring unforeseen issues. I'd not feel confident dropping it in this PR 🤔

};

function defaultGetFileSystem(defaultGetFileSystem) {
  return ObjectAssign(ObjectCreate(null), defaultFileSystem);
}

module.exports = {
  defaultGetFileSystem,
};
6 changes: 3 additions & 3 deletions lib/internal/modules/esm/get_source.js
@@ -5,28 +5,28 @@ const {
decodeURIComponent,
} = primordials;
const { getOptionValue } = require('internal/options');
const esmLoader = require('internal/process/esm_loader');

// Do not eagerly grab .manifest, it may be in TDZ
const policy = getOptionValue('--experimental-policy') ?
require('internal/process/policy') :
null;

const { Buffer } = require('buffer');

const fs = require('internal/fs/promises').exports;
const { URL } = require('internal/url');
const {
ERR_INVALID_URL,
ERR_INVALID_URL_SCHEME,
} = require('internal/errors').codes;
const readFileAsync = fs.readFile;

const DATA_URL_PATTERN = /^[^/]+\/[^,;]+(?:[^,]*?)(;base64)?,([\s\S]*)$/;

async function defaultGetSource(url, { format } = {}, defaultGetSource) {
const parsed = new URL(url);
let source;
if (parsed.protocol === 'file:') {
-    source = await readFileAsync(parsed);
+    source = await esmLoader.getFileSystem().readFile(parsed);

Member:
it would be preferable to have this passed in as a param instead of grabbing off of the application context esmLoader (there can be multiple Loaders).

Contributor Author:
In which case does that happen? It could be a blocker for the node:resolution accessors I mentioned earlier, as it relies on loader utils being singletons.

If that's the case of the "loader loader" that @JakobJingleheimer mentions there, since from what I understand they aren't used at the same time, perhaps the global access would be fine?

Contributor:
I THINK their sequence is fully serial: the node-land esmLoaderNodeland finishes its work and disappears into the void, and then esmLoaderUserland steps in and starts its work.

Aside from Bradley's concern of bloating params, I think it would be better for this to be passed (and also makes it easier to test). One of our Loader team's side/wish-list projects is to make ESMLoader pure, and this goes against that. Not necessarily a show-stopper, but something to mention: If we do decide to make it global and preclude fully pure, we should do it consciously.
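
To make the trade-off concrete, here is a sketch of the "pass it in" alternative. `getSourceWith` and `stubFileSystem` are hypothetical names used for illustration only; the PR itself reads the file system from the shared `esmLoader`.

```javascript
// Hypothetical variant of a source getter that receives its file system
// as a parameter instead of reading it from a global loader instance.
async function getSourceWith(fileSystem, url) {
  const parsed = new URL(url);
  if (parsed.protocol !== 'file:') {
    throw new Error(`Unsupported protocol in this sketch: ${parsed.protocol}`);
  }
  return fileSystem.readFile(parsed);
}

// In a test, a stub replaces the real file system without touching globals.
const stubFileSystem = {
  async readFile(p) {
    return Buffer.from(`// source of ${p.pathname}`);
  },
};

getSourceWith(stubFileSystem, 'file:///demo/index.mjs')
  .then((source) => console.log(source.toString()));
// prints: // source of /demo/index.mjs
```

The function depends only on its arguments, which is the purity and testability property mentioned above, at the cost of one more parameter to thread through.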

} else if (parsed.protocol === 'data:') {
const match = RegExpPrototypeExec(DATA_URL_PATTERN, parsed.pathname);
if (!match) {
83 changes: 82 additions & 1 deletion lib/internal/modules/esm/loader.js
@@ -6,17 +6,22 @@ require('internal/modules/cjs/loader');
const {
Array,
ArrayIsArray,
ArrayPrototypeFilter,
ArrayPrototypeJoin,
ArrayPrototypePush,
ArrayPrototypeSlice,
FunctionPrototypeBind,
FunctionPrototypeCall,
ObjectAssign,
ObjectCreate,
ObjectKeys,
ObjectPrototypeHasOwnProperty,
ObjectSetPrototypeOf,
PromiseAll,
RegExpPrototypeExec,
SafeArrayIterator,
SafeWeakMap,
StringPrototypeEndsWith,
globalThis,
} = primordials;
const { MessageChannel } = require('internal/worker/io');
@@ -27,7 +32,8 @@ const {
ERR_INVALID_MODULE_SPECIFIER,
ERR_INVALID_RETURN_PROPERTY_VALUE,
ERR_INVALID_RETURN_VALUE,
-  ERR_UNKNOWN_MODULE_FORMAT
+  ERR_UNKNOWN_MODULE_FORMAT,
+  ERR_LOADER_MISSING_SYNC_FS
Comment on lines +35 to +36
Contributor:
nit: ASCII order

Suggested change
-  ERR_UNKNOWN_MODULE_FORMAT,
-  ERR_LOADER_MISSING_SYNC_FS
+  ERR_LOADER_MISSING_SYNC_FS,
+  ERR_UNKNOWN_MODULE_FORMAT,

} = require('internal/errors').codes;
const { pathToFileURL, isURLInstance } = require('internal/url');
const {
@@ -45,6 +51,9 @@ const {
initializeImportMeta
} = require('internal/modules/esm/initialize_import_meta');
const { defaultLoad } = require('internal/modules/esm/load');
const {
defaultGetFileSystem
} = require('internal/modules/esm/get_file_system');

Contributor:
Is the plan to only use this in the esm/loader.js and not in the CJS loader? That seems like it might limit the applicability of these hooks, especially when CJS interfaces with eg exports resolutions.

Contributor Author:
At the moment the other hooks are only used in the esm code path so I didn't want to change that since it's not required strictly speaking. A followup PR however will be to consider getFileSystem for cjs.

const { translators } = require(
'internal/modules/esm/translators');
const { getOptionValue } = require('internal/options');
@@ -81,6 +90,14 @@ class ESMLoader {
defaultResolve,
];

/**
* @private
* @property {Function[]} fileSystemBuilders First-in-first-out list of file system utilities compositors
*/
#fileSystemBuilders = [
defaultGetFileSystem,
];

#importMetaInitializer = initializeImportMeta;

/**
@@ -104,6 +121,7 @@ class ESMLoader {
translators = translators;

static pluckHooks({
getFileSystem,
globalPreload,
resolve,
load,
@@ -159,10 +177,17 @@ class ESMLoader {
if (load) {
acceptedHooks.loader = FunctionPrototypeBind(load, null);
}
if (getFileSystem) {
acceptedHooks.getFileSystem = FunctionPrototypeBind(getFileSystem, null);
}

return acceptedHooks;
}

constructor() {
this.buildFileSystem();
}

/**
* Collect custom/user-defined hook(s). After all hooks have been collected,
* calls global preload hook(s).
@@ -180,6 +205,7 @@ class ESMLoader {
globalPreloader,
resolver,
loader,
getFileSystem,
} = ESMLoader.pluckHooks(exports);

if (globalPreloader) ArrayPrototypePush(
@@ -194,13 +220,68 @@ class ESMLoader {
this.#loaders,
FunctionPrototypeBind(loader, null), // [1]
);
if (getFileSystem) ArrayPrototypePush(
this.#fileSystemBuilders,
FunctionPrototypeBind(getFileSystem, null), // [1]
);
}

// [1] ensure hook function is not bound to ESMLoader instance

this.buildFileSystem();
this.preload();
}

  buildFileSystem() {
    // Note: makes assumptions as to how chaining will work to demonstrate
    // the capability; subject to change once chaining's API is finalized.
    const fileSystemFactories = ArrayPrototypeSlice(this.#fileSystemBuilders);

    const defaultFileSystemFactory = fileSystemFactories[0];
    let finalFileSystem =
      defaultFileSystemFactory();

    const asyncKeys = ArrayPrototypeFilter(
      ObjectKeys(finalFileSystem),
      (name) => !StringPrototypeEndsWith(name, 'Sync'),
    );
Comment on lines +240 to +247
Contributor:
This is temporary til chaining is finalised, right?

Contributor Author:
Yep, I left a comment about that right before this snippet; I mostly wanted to show one working example


for (let i = 1; i < fileSystemFactories.length; ++i) {

Contributor:
nit: Prefixing the increment operator is not the same as postfixing it and can lead to unexpected results that are rarely intentional. It doesn't make a difference with the code as currently written, but it easily could with a number of small changes, and the cause is very easy to miss. So the postfix is generally preferred unless the prefix is specifically needed.

Suggested change
-    for (let i = 1; i < fileSystemFactories.length; ++i) {
+    for (let i = 1; i < fileSystemFactories.length; i++) {

Contributor Author (@arcanis, Dec 8, 2021):
tbh that's the first time I hear this - pre-increments are usually "preferred" in for loops since they don't require a temporary variable, although any decent compiler like v8 will surely treat them equally in this instance.

Do you have a particular case in mind where preincrementation leads to problems in a for loop expression?
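
For reference, a small sketch of where the two forms agree and where they differ. The variable names are illustrative only.

```javascript
// In the update clause of a `for` loop the value of the expression is
// discarded, so both forms visit the same indices.
const visitedPre = [];
for (let i = 0; i < 3; ++i) visitedPre.push(i);

const visitedPost = [];
for (let i = 0; i < 3; i++) visitedPost.push(i);

console.log(visitedPre, visitedPost); // [ 0, 1, 2 ] [ 0, 1, 2 ]

// The forms differ only when the value of the expression is used.
let a = 0;
const fromPre = ++a;  // increments first, then yields the new value
let b = 0;
const fromPost = b++; // yields the old value, then increments

console.log(fromPre, fromPost, a, b); // 1 0 1 1
```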

      const currentFileSystem = finalFileSystem;
      const fileSystem = fileSystemFactories[i](() => currentFileSystem);

Contributor:
Why make this a function that returns the current FS utils rather than just passing the current utils? (For example, this doesn't protect against mutating currentFileSystem.)

Contributor Author (@arcanis, Dec 8, 2021):
Mostly to match the signature of other hooks, which provide the next hook function rather than their results (granted, they accept inputs whereas this one doesn't, so it could be possible, but for consistency I feel more comfortable keeping the same pattern).

Member:
Eventually this should hand out a copy rather than a direct reference; otherwise people may start relying on mutation on purpose, and changing that later would become a breaking-change concern.
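
A sketch of what handing out a copy could look like. `chainFileSystems` is a hypothetical simplification of the chaining loop, and the shallow copy is an assumption about what "copy operation" would mean here.

```javascript
// Hypothetical chaining step: each builder receives a getter for the
// previous file system, and the getter hands out a shallow copy.
function chainFileSystems(builders) {
  let current = builders[0]();
  for (let i = 1; i < builders.length; i++) {
    const previous = current;
    const getPrevious = () => Object.assign(Object.create(null), previous);
    const next = builders[i](getPrevious);
    current = Object.assign(Object.create(null), previous, next);
  }
  return current;
}

const base = () => ({ readFileSync: (p) => `base:${p}` });

// A misbehaving builder that mutates the object it was given.
const mutating = (getPrevious) => {
  const previous = getPrevious();
  previous.readFileSync = () => 'clobbered';
  return {};
};

const fileSystem = chainFileSystems([base, mutating]);
console.log(fileSystem.readFileSync('a.mjs')); // base:a.mjs
```

Because the builder only ever sees a copy, its mutation never reaches the object that later builders and the loader use.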


      // If the loader specifies a sync hook but omits the async one we
      // leverage the sync version by default, so that most hook authors
      // don't have to write their implementations twice.

Contributor:
Can we make sure to throw though if a specific loader implements an async hook but not a sync hook for the same method?
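
The requested check can be sketched in isolation as follows. `assertSyncCounterparts` and `ASYNC_KEYS` are hypothetical names, and a plain `Error` stands in for `ERR_LOADER_MISSING_SYNC_FS` from the diff.

```javascript
// Hypothetical validator; method names follow the async/sync pairs in the diff.
const ASYNC_KEYS = ['readFile', 'statEntry', 'readJson', 'realpath'];

function assertSyncCounterparts(fileSystem) {
  for (const asyncKey of ASYNC_KEYS) {
    const syncKey = `${asyncKey}Sync`;
    const hasAsync = Object.prototype.hasOwnProperty.call(fileSystem, asyncKey);
    const hasSync = Object.prototype.hasOwnProperty.call(fileSystem, syncKey);
    if (hasAsync && !hasSync) {
      // Stand-in for ERR_LOADER_MISSING_SYNC_FS.
      throw new Error(`Missing synchronous implementation: ${syncKey}`);
    }
  }
}

assertSyncCounterparts({ readFileSync() {} });                // sync only: accepted
assertSyncCounterparts({ readFile() {}, readFileSync() {} }); // both: accepted

try {
  assertSyncCounterparts({ async readFile() {} });
} catch (err) {
  console.log(err.message); // Missing synchronous implementation: readFileSync
}
```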

for (let j = 0; j < asyncKeys.length; ++j) {

Contributor:
I woulda chosen k for "key" 🙂

Contributor Author:
It's not a key, though, but the index of a key. I looked at the existing codebase to pick the iterator name and saw many references to i,j

Contributor:
Merely a side comment—I wasn't suggesting a change. Sorry for the confusion. I use k as an index in a set of keys

for (const key of keys) // key = 'foo'

vs

for (let k = 0; k < keys.length; k++) // key = keys[k] = 'foo'

        const asyncKey = asyncKeys[j];
        const syncKey = `${asyncKey}Sync`;

        const hasAsync = ObjectPrototypeHasOwnProperty(fileSystem, asyncKey);
        const hasSync = ObjectPrototypeHasOwnProperty(fileSystem, syncKey);

        if (
          !hasAsync &&
          hasSync
        ) {
          fileSystem[asyncKey] = async (...args) => {
            return fileSystem[syncKey](...args);
          };
Comment on lines +267 to +269
Contributor (@aduh95, Dec 19, 2021):
I think we should not make our users expect a promise here. If/when we end up getting rid of the sync methods in this hook, loader hook authors should expect that the previous hook may have passed a synchronous function:

Suggested change
-          fileSystem[asyncKey] = async (...args) => {
-            return fileSystem[syncKey](...args);
-          };
+          fileSystem[asyncKey] = fileSystem[syncKey];

and if we want it to return a promise, we still need to avoid the spread operator on arrays (which relies on globally mutable Array.prototype[Symbol.iterator]):

Suggested change
-          fileSystem[asyncKey] = async (...args) => {
-            return fileSystem[syncKey](...args);
-          };
+          fileSystem[asyncKey] =
+            async (...args) => ReflectApply(fileSystem[syncKey], fileSystem, args);

Contributor Author:
Imo it'd be best to attempt to keep a consistent API - otherwise loader authors would be effectively forced to do:

const content = await Promise.resolve(fs.readFile(path));

The extra Promise.resolve wrapper wouldn't look very idiomatic.

Contributor Author:
Another solution is perhaps to just drop this "automatically bind sync functions into async slots", since it comes with its own drawbacks (it can be seen as a footgun, since it makes all async filesystem operations silently turn sync).

I'm not that attached to it, so dropping it would be fine by me.

Contributor (@aduh95, Dec 20, 2021):
otherwise loader authors would be effectively forced to do:

I don't think that this is correct, you can await non-Promise objects:

console.log({}); // {}
console.log(await {}); // {}
console.log(await Promise.resolve({})); // {}

I don't feel strongly about this, but imho it wouldn't be too bad as an API.

Contributor:
Yes: being able to await a non-promise was explicitly for this scenario (can find the citation if needed).
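
Applied to the hook under discussion, the point can be sketched like this. The `fileSystem` object and `load` function are hypothetical; the async slot deliberately holds a synchronous function, as in the first suggestion above.

```javascript
// Hypothetical file system whose "async" slot holds a synchronous function.
const fileSystem = {
  readFileSync: (p) => `contents of ${p}`,
};
fileSystem.readFile = fileSystem.readFileSync;

async function load(p) {
  // `await` accepts plain values, so this works whether or not
  // readFile returns a promise.
  return await fileSystem.readFile(p);
}

// Direct `.then()` chaining, by contrast, requires an actual promise.
console.log(typeof fileSystem.readFile('a.mjs').then); // undefined

load('a.mjs').then((content) => console.log(content)); // contents of a.mjs
```

Callers that always use `await` are unaffected by the difference; only callers that chain `.then()` directly on the return value would break.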

        } else if (!hasSync && hasAsync) {
          throw new ERR_LOADER_MISSING_SYNC_FS();
        }
      }

      finalFileSystem = ObjectAssign(
        ObjectCreate(null),
        currentFileSystem,
        fileSystem,
      );
    }

    this.fileSystem = finalFileSystem;
  }

async eval(
source,
url = pathToFileURL(`${process.cwd()}/[eval${++this.evalIndex}]`).href