Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compile to SystemJS on the fly #79

Closed
Jamesernator opened this issue Mar 31, 2020 · 21 comments
Closed

Compile to SystemJS on the fly #79

Jamesernator opened this issue Mar 31, 2020 · 21 comments

Comments

@Jamesernator
Copy link

My interest for this is primarily for module workers in browsers that already support modules, but it would apply to running ES modules in browsers that don't even support ES modules.

@guybedford
Copy link
Owner

A fast in-browser SystemJS transform would be a great project.

Unfortunately this project is not it, since it is not a parser, and SystemJS transformation requires a full parser.

A wasm-based fast compiler is definitely the future though after Babel for SystemJS transforms. The problem is the lack of funding and companies building this work. I have a few side projects here too, but without funding it's a very hard thing to get off the ground given the number of developer hours compilers take.

@Jamesernator
Copy link
Author

Jamesernator commented Apr 6, 2020

Something I've been considering the last few days is how to make a module format that only requires es-module-lexer and I think I might've come up with one. (I'll write an implementation later this week).

Basically it works like this:

Suppose we have the following ES module:

import { foo } from 'foo';
import bar from 'bar';
import * as baz from 'baz';

function localFunction() {

}

export default class SomeClass {
  constructor() {
    console.log(foo);
  }
}

export function hositedFunction() {
  console.log(bar);
}

export const value = 10 + baz.boz;

Now to preserve hoisting without a full parse and search, we can simply wrap this in a generator function (this is effectively v8's internal implementation and what inspired this approach):

defineModule('qux', function*() {
  const [foo$random1, bar$random2, baz$random3] = yield {
    // Because we can determine imports with es-module-lexer we can
    // easily get this list
    imports: ['foo', 'bar', 'baz'],
    namespace: Object.freeze({
      __proto__: null,
      [Symbol.toStringTag]: 'Module',
      // We wrap each export with a getter, this preserves hoisting
      // because generators support hoisting
      get default() { return SomeClass; },
      get hoistedFunction() { return hoistedFunction; },
      // Notice this will pass along a ReferenceError if called
      // before the following body is evaluated as is expected
      get value() { return value; }
    }),
  };

  class SomeClass {
    constructor() {
      console.log(foo);
    }
  }

  function hositedFunction() {
    console.log(bar);
  }

  const value = 10 + baz.boz;
});

So this gets us most of the way, however we still need to be able to access imports values, which again seems like it needs a full parse. But again a solution: the with statement!

Because we know all names that'll be imported thanks to es-module-lexer, we can simply wrap everything with a with statement:

defineModule('qux', function*() {
  const imports$random1 = {
    __proto__: null,
    get foo() { return foo$random1.foo },
    get bar() { return bar$random2.default },
    get baz() { return baz$random3 },
  }
  with (imports$random1) {
    const [foo$random1, bar$random2, baz$random3] = yield {
      // Because we can determine imports with es-module-lexer we can
      // easily get this list
      imports: ['foo', 'bar', 'baz'],
      namespace: Object.freeze({
        __proto__: null,
        [Symbol.toStringTag]: 'Module',
        // We wrap each export with a getter, this preserves hoisting
        // because generators support hoisting
        get default() { return SomeClass; },
        get hoistedFunction() { return hoistedFunction; },
        // Notice this will pass along a ReferenceError if called
        // before the following body is evaluated as is expected
        get value() { return value; }
      }),
    };

    class SomeClass {
      constructor() {
        console.log(foo);
      }
    }

    function hositedFunction() {
      console.log(bar);
    }

    const value = 10 + baz.boz;
  }
});

Other than a couple easily fixable problems (e.g. no strict mode), this strategy should work as a module format that requires no full parsing (just what es-module-lexer supports).

I haven't tried this out yet, so I'll create an implementation this week, but I feel like it's promising.

@guybedford
Copy link
Owner

Another aspect of the format that requires a parser is that all assignments in modules need to update their live bindings and that circular references must hoist function definitions across module boundaries. These are edge cases few people rely on yes, but edge cases that need to be handled none the less (at least in terms of what this project and SystemJS aim to support).

You are welcome to fork and build your own paths though - this is open source!

And if you have a comprehensive approach that handles all the edge cases and you want to collaborate I'm all ears.

@Jamesernator
Copy link
Author

Another aspect of the format that requires a parser is that all assignments in modules need to update their live bindings and that circular references must hoist function definitions across module boundaries.

Yep that's the trick of using generators + getters, basically functions are hoisted inside generators as per usual, so if I do yield { namespace: { get foo() { return foo; } }, the getter with return the hoisted function definition.

To perform hoisting stage of module evaluation I'll call next on all modules at once, this ensures the namespaces are all available before evaluation of the body is performed.

For live bindings, because removing the import statements effectively turns the lookups into global variable lookups, I can use the with() statement to intercept them (and because the set of imported names is known, there's no risk of collisions with actual globals).

@guybedford
Copy link
Owner

How would this wrapper approach work with reexports? Where eg foo$random1.exportName itself is a reexported live binding?

The other concern is the strict mode handling - although maybe you can embed a 'use strict' function wrapper within the with block.

@guybedford
Copy link
Owner

Also you want to look into reexport chaining with export * etc.

@guybedford
Copy link
Owner

By the way, I think these ideas are very interesting... please do share your demo, as it would be interesting to do the perf and edge case checks. If it can provide a path forward for module workers I'm all for it.

@guybedford guybedford reopened this Apr 7, 2020
@Jamesernator
Copy link
Author

Jamesernator commented Oct 6, 2020

I'll write an implementation later this week

... Six months later

I now have a working implementation of what I described. I basically impelemented Module Records in the spec almost to the letter, using my generator technique above to actually evaluate the module.

It's definitely not production ready yet and isn't bundled or minified or anything, however it's ready enough to try playing around with.

I wrote a short README with build steps if you wanna try it out.

The library can be found here: https://github.com/Jamesernator/module-shim

@guybedford
Copy link
Owner

Nice to see! If you ever want to collaborate on a version 2 of the SystemJS module format specification incorporating these ideas do let me know.

@guybedford
Copy link
Owner

If you want to put your project to the test, see if you can run the SystemJS test suite execution cases here - https://github.com/systemjs/systemjs/blob/master/test/system-core.js#L278.

@Jamesernator
Copy link
Author

If you ever want to collaborate on a version 2 of the SystemJS module format specification incorporating these ideas do let me know.

I would be interested, yes.

I should note the internal format I'm using is slightly different than what is described above.

The new format is fairly simple, the following ES Module is converted like so:

import SomeImport from "./mod1.js";
import { SomeImport2 } from "./mod2.js";

const someResource = new URL("./foo.yaml", import.meta.url);

export default class FooBar {
    load(module) {
         return import(module);
    }
}

export function foo() {
    return 12;
}

becomes

// Imported bindings are attached to this object, these were detected by the lexer
// so it's guaranteed these are all known ahead-of-time, this acts as the module scope
// Also attached is a generated name for import() and import.meta
// This allows us to avoid a complicated transform that involves analysing scopes
// for import references
with (arguments[0]) {
    // Because the with() statement is sloppy-only we need an inner
    // wrapper to set strict mode for the module body
    yield* function*() {
        "use strict";
        // These getters are generated by the exports detected by the lexer
        // thanks to the behaviour of generators we can access hoisted bindings
        // such as foo even before the statements after this yield are executed
        // this allows us to match the hoisting behaviour of ES modules without
        // a complex transform
        yield {
             get default() {
                 return FooBar;
             },
             get foo() {
                 return foo;
             },
        };

        // The body is inserted pretty much as is except with import/export
        // statements stripped and import()/import.meta replaced to refer to
        // a name on the module scope object
        const someResource = new URL("./foo.yaml", importMeta$$0.url);
        
        class FooBar {
           load(module) {
               return dynamicImport$$0(module);
           }
        }

        function foo() {
            return 12;
        }
    }();
}

Adapating to an ahead-of-time format would probably just involve skipping the with() wrapper entirely and just remap imported names to refer to arguments[0].<importedName> directly.

If you want to put your project to the test, see if you can run the SystemJS test suite execution cases here -

I should note that my module-shim doesn't actually implement SystemJS, it just implements Module/CyclicModule/SourceTextModule, it's very similar to Node's vm.*Module classes. Any kind of loader could be implemented on top of them though.

I do intend to run the module tests from test262 though to ensure it is a spec-compliant implementation of ES Module Records (except for parse errors, as that's just the limitation of the lexer).

@guybedford
Copy link
Owner

Each of those SystemJS tests I pointed to are just ES module cases. Switching those same tests to run against https://github.com/systemjs/systemjs/tree/master/test/fixtures/es-modules would do the same.

Put it this way, if you can cover all the test262 tests, that's the bar.

@Jamesernator
Copy link
Author

Jamesernator commented Oct 14, 2020

I've run the library against test262 and have all tests passing except for a couple minor things that depend on full parse (so similar to export let { destructuring: [here] } it won't be supported).

I've published an initial version to npm here: https://www.npmjs.com/package/module-shim

Next up is to add support for Top-Level await and arbitrary string import/export names.

@guybedford
Copy link
Owner

@Jamesernator nice to see this, but it doesn't seem to handle the cases I mentioned. The following test just failed for me:

import SourceTextModule from 'module-shim/SourceTextModule';

const m1 = await SourceTextModule.create({
  source: `
    import { f2, v2 } from 'm2';
    export function f1 () {
      v2++;
    }
    export var v1 = 10;
    console.log(v1, v2);
    f2();
    console.log(v1, v2);
  `,
  resolveModule
});

const m2 = await SourceTextModule.create({
  source: `
    import { f1, v1 } from 'm1';
    f1();
    export function f2 () {
      v1++;
    }
    export var v2 = 20;
  `,
  resolveModule
});

function resolveModule (specifier) {
  if (specifier === 'm1') return m1;
  if (specifier === 'm2') return m2;
}

await m1.link();
m1.evaluate();

I would suggest using a simpler linking API of link(module, resolve) as opposed to overloading the classes.

@guybedford
Copy link
Owner

That is:

function resolve(specifier, parentModule)

that would match the HOST_RESOLVE_IMPORTED_MODULE hook in ECMA-262 better.

@Jamesernator
Copy link
Author

nice to see this, but it doesn't seem to handle the cases I mentioned. The following test just failed for me:

I'm a bit confused, the behaviour of the test case you gave above is the same as running those modules as ESM in Node. Is there some other behaviour you were expecting?

module-shim
undefined:14
      v2++;
        ^

TypeError: Cannot set property v2 of [object Object] which has only a getter
    at Object.f1 (eval at createModuleEvaluationGenerator (file:///home/jamesernator/Projects/playground/node_modules/module-shim/dist/ModuleEvaluator.js:58:25), <anonymous>:14:9)
    at eval (eval at createModuleEvaluationGenerator (file:///home/jamesernator/Projects/playground/node_modules/module-shim/dist/ModuleEvaluator.js:58:25), <anonymous>:13:5)
    at Generator.next (<anonymous>)
    at ModuleEvaluator.evaluate (file:///home/jamesernator/Projects/playground/node_modules/module-shim/dist/ModuleEvaluator.js:151:51)
    at SourceTextModule.#executeModule (file:///home/jamesernator/Projects/playground/node_modules/module-shim/dist/SourceTextModule.js:209:31)
    at SourceTextModule.executeModule (file:///home/jamesernator/Projects/playground/node_modules/module-shim/dist/SourceTextModule.js:36:43)
    at SourceTextModule.#innerModuleEvaluation (file:///home/jamesernator/Projects/playground/node_modules/module-shim/dist/CyclicModule.js:163:30)
    at SourceTextModule.#innerModuleEvaluation (file:///home/jamesernator/Projects/playground/node_modules/module-shim/dist/CyclicModule.js:157:48)
    at SourceTextModule.#evaluate (file:///home/jamesernator/Projects/playground/node_modules/module-shim/dist/CyclicModule.js:189:40)
    at SourceTextModule.evaluate (file:///home/jamesernator/Projects/playground/node_modules/module-shim/dist/CyclicModule.js:22:43)
Node
file:///home/jamesernator/Projects/playground/mod1.js:3
  v2++;
    ^

TypeError: Assignment to constant variable.
    at f1 (file:///home/jamesernator/Projects/playground/mod1.js:3:5)
    at file:///home/jamesernator/Projects/playground/mod2.js:2:1
    at ModuleJob.run (internal/modules/esm/module_job.js:146:23)
    at async Loader.import (internal/modules/esm/loader.js:165:24)
    at async Object.loadESM (internal/process/esm_loader.js:68:5)

I would suggest using a simpler linking API of link(module, resolve) as opposed to overloading the classes.

Currently the API is just a mirror of Abstract Module Records by providing the four methods: GetExportedNames(exportStarSet)/ResolveExport(exportName, resolveSet)/Link()/Evaluate(). .link(linker) might be slightly more convenient but I don't imagine people would really want to use this without wrapping it a small loader anyway.

that would match the HOST_RESOLVE_IMPORTED_MODULE hook in ECMA-262 better.

It should be noted resolveModule is already of the form resolveModule(specifier, parentModule), it's just the example didn't need it for anything. I should probably add an example to the README to show some form of loader being implemented on top of the module-shim.

@guybedford
Copy link
Owner

Ok, well that is just incredible work then!

The only remaining concern I have is if the with / getter approach has a performance cost over setters.

Consider eg:

import fn from 'dep';
for (let i = 0; i < 1000; i++)
  fn(i);

if on every tick of the loop it is running a getter it seems like that will be slower than the setter approach used in the System module format currently.

Have you done any checks on these types of cases?

@guybedford
Copy link
Owner

Also if there are reexports in the getter chain does that also add to the slowdown?

@Jamesernator
Copy link
Author

The only remaining concern I have is if the with / getter approach has a performance cost over setters.

Have you done any checks on these types of cases?

I haven't done any checks on the performance yet, but yes I do expect it to be slower due to the with() statement usage. This is unavoiable with the on-the-fly compilation simply because there's no way to identify all locations the exported name is set without a full parse.

An ahead-of-time format could use setters however. I'll have to try writing an evaluator to check, but I think it would easy enough to support both formats (including mixing) without performance loss if the on-the-fly isn't used.

Also if there are reexports in the getter chain does that also add to the slowdown?

No, all recursive lookups happen only once during the creation of the namespace object. The getters themselves don't perform any recursive lookup.

@guybedford
Copy link
Owner

Yes I think that a replacement for the SystemJS module format itself would need to use a AOT setter approach still, but that would still bring the format simplification of the generator style to the output.

I like that you construct the full namespace upfront as well.

In terms of eg supporting workers in this project it really depends on how long it takes for dynamic import support in workers to land across the browser baseline. If we get that then this project can rely on that fine. But if that continues to be delayed then your approach might be better. It's still a little difficult to tell which way things will go here so I'd suggest separate projects for the time being, but I'm all for a merge / supporting your project in future.

@ExE-Boss
Copy link

Well, if the object that’s passed to the with statement has a null prototype and is sealed, then engines might be able to somewhat optimise it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants