Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Architecture: Modify the AST directly #44

Open
iccir opened this issue Feb 3, 2015 · 17 comments
Open

Architecture: Modify the AST directly #44

iccir opened this issue Feb 3, 2015 · 17 comments
Assignees

Comments

@iccir
Copy link
Member

iccir commented Feb 3, 2015

The oj compiler currently modifies lines of code directly (via the Modifier class), rather than operating on the AST. This is due to historical reasons: before source maps became widely supported, I relied on preserving line numbers between oj source and the generated js source to aid debugging.

Now that source maps are common, and escodegen supports them, the compiler should modify the oj AST directly: converting the oj AST into an ECMAScript 5 AST. We can then pass that AST into escodegen to generate the resulting js source.

We can also pass that AST into ESLint (see #40) or 6to5 (http://6to5.org) without having to go through a codegen/reparse pass. More importantly, we can expose the AST to clients and allow them to write custom modifiers before passing into escodegen.

The oj->TypeScript code generator will still be string-based, as the TypeScript compiler doesn't allow AST input, and there is no real specification for a TypeScript AST.

The long term plan:

  1. modifier.js and generator.js are only used by the typechecker.
  2. A new replacer.js (or transformer.js ?) is responsible for replacing/transforming the oj AST to a js AST.

Strategy:

  1. Don't touch modifier.js and generator.js
  2. Clone generator.js to replacer.js. Add debug option to use the replacer rather than the generator
  3. Command line tool to spin up ojc with the generator, then with the replacer, and then compare the output
  4. Ensure the outputs of our source base (and AL's source base) are functionally identical.
  5. Remove ES5 output from generator
@iccir iccir self-assigned this Feb 3, 2015
@iccir
Copy link
Member Author

iccir commented Feb 3, 2015

This may take awhile, development will occur in the ast branch

@iccir
Copy link
Member Author

iccir commented Oct 21, 2015

One major issue here is that the TypeScript AST is not the same as the Esprima AST. Hence, we still would have to have generator.js around and maintain two very different code paths.

I think this is out of scope for 2.0.

@iccir iccir removed this from the 2.0 milestone Oct 21, 2015
@IngwiePhoenix
Copy link

What does the TS AST have to do with the Esprima JS AST? just to make sure I get this right.

@iccir
Copy link
Member Author

iccir commented Oct 22, 2015

There was a lot of frustration that occurred in the hours leading up to that comment :) None of this is infeasible from an implementation standpoint, it's an architectural question of "what's the simplest thing to do that accomplishes our goals" and "how much work should ojc do, how much can we leverage existing ES tools".

I'm hesitant on the long term plan of having two very different paths (one AST based for ES5/6 generation, one string based for type-checking). Right now, generator.js is used by both paths. Although, over time, the number of if (language === LanguageTypechecker) checks inside of generator has grown. Ideally, everything would be AST-based and use the ESTree specification. TypeScript would use an superset of the ESTree specification. transformer.js would be used for both paths.

There's nothing wrong with the current method of generation. It just results in string output, and that string output needs to be reparsed by Babel/Uglify/ESLint/JSHint/etc.

My original comment of: "More importantly, we can expose the AST to clients and allow them to write custom modifiers before passing into escodegen." is still true, but it's also possible to accomplish the same by sending the string output of ojc into Babel and writing a Babel transformer.

@IngwiePhoenix
Copy link

I see what you are getting at. In fact, many people and modules seem to use a similar technique. For instance, WebPack 2 knows that Babel emits ES6 and does some more reading to do dead-code stripping - but it doesn't really pick up the AST. So, the concept is similar.

So you used TypeScript for the Type Checker?

@iccir
Copy link
Member Author

iccir commented Oct 22, 2015

Yep, generator.js will spit out TypeScript for the Type Checker. Note that this TypeScript doesn't actually work (don't run the output of tsc), it's just used to get warnings/type hints.

@iccir
Copy link
Member Author

iccir commented Oct 22, 2015

Another issue with the AST approach: JSHint integration relies on its input being a string, as JSHint uses it's own JS parser.

@iccir
Copy link
Member Author

iccir commented Oct 24, 2015

Modified strategy:

  1. transformer.js is used for both the ES5 and TypeScript generation.
  2. The output TypeScript AST will have a few non-ESTree-standard node types, for dealing with type annotations and casts.
  3. The TypeScript AST will be handed into escodegen, which will output a JS string and a sourcemap.
  4. We'll use this sourcemap to map errors from tsc back to the original line number.

@IngwiePhoenix
Copy link

Sounds solid to me. But have you taken ES6 into account, as in, how big will the efford be to change the ES5 handling to be ES6 capable? Otherwise there doesn't seem to be any problem.

@iccir
Copy link
Member Author

iccir commented Oct 24, 2015

The beauty of this (combined with procrastination) is that TypeScript/escodegen/ESTree/estraverse all handle ES6 now :)

@IngwiePhoenix
Copy link

Okay, that super-long name-chain convinced me ;)
But hey, that’s quite the good news! Can’t wait to get started using OJ with ES6.

@iccir
Copy link
Member Author

iccir commented Oct 27, 2015

Sadly, escodegen with source maps enabled is incredibly slow (~2 seconds to generate our source bases, compared to ~200ms with 1.1's generator/modifier). That's going to be a deal-breaker for us (at least for now).

The problem may be with source-map rather than escodegen (or specifically, escodegen's approach of wrapping the Array IR with source-map's SourceNode and then having source-map generate the string). I should recheck this once escodegen implements their Dumper (estools/escodegen#214).

@iccir
Copy link
Member Author

iccir commented Oct 27, 2015

(Note: this shouldn't prevent you from using ES6 with OJ. Using the AST vs. using the modifier/compiler has always been about avoiding additional parse/generate steps. Right now you can still use ES6 in the 2.0-wip branch, and you can also pass the result to Babel to transpile back to ES5).

@IngwiePhoenix
Copy link

I see. Well, I'd rather wait for a stable 2.0 release so I can update my oj-loader or the extended version, OhSoJuicy with ES6 support.

But I will get to test oj-2.0-alpha soon to see how it'll go. :)

@iccir
Copy link
Member Author

iccir commented Oct 27, 2015

Without this, I can't think of any reason why I would need to bump semver major. So the next release will probably be 1.2 (with Esprima 2.7 et al.)

@IngwiePhoenix
Copy link

That works for me. :) Ill I wish is to just make sure that when I update my modules with the new support, that I don't run into too many bugs :)

@iccir
Copy link
Member Author

iccir commented Aug 7, 2018

I've been pondering this issue again recently. There have been speed improvements to the source-map project. We could also use babel-generator rather than escodegen.

However, one glaring issue is that oj has to output Typescript in addition to JS. The current string-based solution of Modifier.js allows us to easily do this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants