Redesign #485

Merged
merged 512 commits into master from redesign on Mar 27, 2016


@disnet
Contributor
disnet commented Aug 2, 2015

Redesign

So I've been working on a rewrite/redesign of sweet.js. Turns out that what we currently have is kind of a rolling collection of hacks in desperate need of some rethinking. Things are slow and just getting slower in the branch with ES6 module support, so something has to give.

What I'm pushing right now is definitely a work in progress but I think it shows promise. In particular I think it's much more comprehensible for people coming to the codebase for the first time. Read on if you're interested in helping out or just curious as to what might be changing.

Comments/opinions requested.

The current todo status looks something like this:

  • add support for all forms (right now I have some of the obvious ones but switch, yield and lots more aren't supported)
  • add hygiene (actually should be straightforward, hooks are already in the right places)
  • add declarative macros (rule and case, currently just primitive macros "work")
  • add module support (not straightforward but doable)
  • add infix macros
  • add custom operators
  • spec out multi-token equivalent
  • add line number and sourcemap support
  • test each syntactic form
  • port old tests
  • add perf benchmarks
  • add descriptive error messages

No more destructuring

One of the big areas of slowdown came from the fact that we weren't doing real parsing. Expansion worked by building up a partial AST (TermTree) and then throwing all that work away by destructuring the partial AST back into an array of tokens and feeding that to esprima to actually produce a real AST.

Now, instead of doing all the parsing work twice, we just build the complete AST. This is handled by two data structures: a Term (roughly equivalent to the current TermTree) that acts as a partial AST (some terms hold syntax objects and some hold other terms), and a Node, which is just an ESTree node representing the complete AST.

A Term has two methods, parse and expand. The parse method returns a new corresponding Node, while expand is roughly equivalent to expandToTermTree in the current expander (i.e. it handles some hygiene details and walks down partially expanded Terms).
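
To make that concrete, here's a rough sketch of what a simple term might look like (illustrative only; the class shape, the expander argument, and the .val() accessor are assumptions, not the actual implementation):

class IdentifierExpressionTerm {
  constructor(nameStx) {
    this.name = nameStx; // a syntax object
  }

  // roughly expandToTermTree's job: handle hygiene details and walk any
  // partially expanded sub-terms (nothing left to do for a bare identifier)
  expand(expander) {
    return this;
  }

  // produce the corresponding ESTree node
  parse() {
    return { type: 'Identifier', name: this.name.val() };
  }
}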

The final Node is shipped directly to babel (via transform.fromAST) because ES6. All sweet.js code you write is now ES6 (or at least as much as babel can support).

Recursive descent enforest

The old enforest was weird and complicated and basically a giant if block. The new enforest is still weird and complicated but at least it's a bit more modular.

Immutable.js

Currently the expansion algorithm is written as if we were using lists when in fact it's arrays all the way down. Lots of calls to concat that should not be happening if we care about performance.

Now we're using immutable.js lists of syntax objects which should be better.
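
For illustration, a minimal sketch of the difference, assuming the immutable.js List API (the token objects here are just stand-ins for syntax objects):

import { List } from 'immutable';

// hypothetical stand-ins for syntax objects
let stxl = List([{ val: 'let' }, { val: 'x' }, { val: '=' }, { val: '42' }]);

let head = stxl.first(); // peek at the next token
let rest = stxl.rest();  // remaining tokens; shares structure, no copying
let more = rest.concat(List([{ val: ';' }])); // cheap append, unlike Array#concat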

New syntax transformer type

There's a new syntax declaration form (analogous to var/let/const for compiletime values) that looks like syntax <id> = <expr>. Previously sweet supported a couple of different primitive macro forms, but you could only really put macros into the compiletime environment. This new form allows you to put whatever you want into the env.

Normally (e.g. Racket, old sweet) syntax transformers (aka macros) are just functions but I'm changing things up. Macros are actually objects with two methods match and transform.

syntax m = {
    // List[Syntax] -> {subst: Substitution, 
    //                  rest: List[Syntax] }
    match: function(stxl) {
        return {
            subst: [],
            rest: stxl.rest()
        };
    },
    // Substitution -> List[Syntax]
    transform: function(subst) {
        return syntaxQuote { 42 };
    }
};

The reason for breaking matching and transforming out into two functions is hygiene. Currently sweet has to pass some hygiene information to primitive macros so that they can mark the syntax they match and their result syntax. This is gross and dangerous; badly behaved primitive macros can mess up hygiene in various ways. By splitting macros into two functions we can pull the hygiene manipulation code back into the expander.

More details to work out here but I think this is the right factoring.

This of course is just the primitive form; the declarative rule and case macro forms will be built on top of this.

No multi-tokens

Right now you can do things like macro (number?) { ... } to create multi-token macros. This massively complicates the expander for not much gain. Hacking the lexical structure of your language can be done with readtables (thanks @jlongster!) so let's do that instead.

Limit infix macros

Infix macros are cool but maybe too cool. The enforestation of operators is massively complicated because we want to allow infix macros to be very flexible. Some heroic work was done by @natefaubion here but I think we are both of the opinion it's not actually worth it.

I think we can still have them but just in a restricted state. My proposal is that they can only match on previously seen Terms and that operators create implicit delimiters that infix macros can't "see" out of (so in 2 + inf 42 the inf macro sees an empty prefix list). Just my initial intuition, more details to work through.

@natefaubion
Contributor

I think overall, this is a much needed change, and things are looking a lot better from what I've glanced at.

Right now you can do things like macro (number?) { ... } to create multi-token macros. This massively complicates the expander for not much gain. Hacking the lexical structure of your language can be done with readtables (thanks @jlongster!) so let's do that instead.

I don't agree with this. Without multi-token macros (or some equivalent) it becomes impossible to create (importable) custom symbolic operators. There may be a better way of doing it than before, but I think if we are going to keep custom operators, we need a way to keep this around. I don't think read tables are a good solution for this. Read tables are neither modular nor composable. If we want read tables to be used, then I would like to see them used more along the lines of Racket's #lang declaration with an expanded API for controlling the expansion of a module.

I think we can still have them but just in a restricted state. My proposal is that they can only match on previously seen Terms and that operators create implicit delimiters that infix macros can't "see" out of (so in 2 + inf 42 the inf macro sees an empty prefix list). Just my initial intuition, more details to work through.

I strongly agree with this. If you make it a single term look behind only (enough to implement arrows), you don't need to keep the nasty zipper-like structure we were using at all.

Macros are actually objects with two methods match and transform.

I don't really like the ad-hoc record. I think we could still keep the single function definition if we had a safe API for extracting syntax from the context rather than manipulating it directly, but I don't know what that would look like. In general, I think it would be good to think about the low-level APIs we need to build things like declarative macros as libraries rather than hard wiring them into our own opinionated system. I think it would be great if sweet.js was just a pure low-level core, but that's just me.

If we did really want to go this route, I think it would be better to go all-in with ES6 classes, and have something like a Macro base class that you can import from some magic-module, where appropriate hooks are exposed. This also makes it easier to pick up which compile-time values are actually macros. Maybe you could also use this to expose AST-transformers rather than just pure syntax transformers. I don't know what that would look like though, just a thought.

@disnet
Contributor
disnet commented Aug 3, 2015

Read tables are neither modular nor composable. If we want read tables to be used, then I would like to see them used more along the lines of Racket's #lang declaration with an expanded API for controlling the expansion of a module.

Good points. I'm adding a todo item to spec out what we really need for this. I'd really like to keep multi-token code out of the main expander so my hope is that modules are sufficient. We'll see.

If you make it a single term look behind only (enough to implement arrows), you don't need to keep the nasty zipper-like structure we were using at all.

Is there any reason to cap it at a single term?

I think it would be great if sweet.js was just a pure low-level core, but that's just me.

Totally agree. My thinking is that syntax transformers should be as low-level as possible but not lower than hygiene. Hygiene should be completely handled by the expander, which means that the expander needs to (at minimum) distinguish between syntax provided to the transformer and syntax returned by the transformer.

You're right though, we could still accomplish this with just a single function if the syntax extraction API is "safe".

Hmm...actually the match/transform approach has another problem which is that the rest syntax returned from match is not guaranteed to be correlated with the syntax it put into the substitution (e.g. match could grab all the syntax but return exactly the list provided to it). A buggy match could do some damage.

Ok, so maybe like you said we just need a good and safe extraction API, which would obviate the need to break things out into a pair of functions.

Here's some first thoughts.

Have macros take as their only argument an iterator-like object (so macros don't get direct access to the syntax list, just the iterator). Each time next is called, the returned value can be appropriately marked by the hygiene algorithm, and when the macro returns, the expander knows exactly where the "rest" of the syntax is (whatever was not consumed by the iterator).

So the type of a transformer is something like Iterator -> [Syntax].

// an identity macro: consumes its name plus one token and returns that token
syntax id = function(ctx) {
  let macro_name = ctx.next();
  let arg = ctx.next();
  return [arg];
}

A critical part of the API of course is matching against bits of the grammar (like :expr) so the iterator could also have methods like nextExpression that call back into enforest to build up an expression.

// { mylet $id:ident = $init:expr }
// =>
// { var $id = $init }
syntax mylet = function(ctx) {
  let result = [makeKeyword("var")];
  let macro_name = ctx.next();

  let id = ctx.nextIdentifier();
  result.push(id);

  let eq = ctx.nextPunctuator();
  result.push(eq);

  let init = ctx.nextExpression();
  result.push(init);
  return result;
}
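
For example (illustratively), mylet x = 5 would expand to var x = 5: the macro name is consumed, the identifier and the = punctuator are passed through, and the initializer is enforested as an expression.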

This raises the question of what the type of next* should be. Currently when sweet sees :expr it builds up a term via enforest, then destructs it down to an array of syntax, and then wraps that syntax in a () to keep precedence right (this is confusing to macro authors and leads to bugs). It would probably be better if we could skip the destruct and just have the next* functions have the type () -> Term.

Doing this means that a syntax transformer would now have the type Iterator -> [Syntax or Term] and so the expander would need to handle both syntax objects and terms in the result of a macro.
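
As a rough sketch of what that mixed result might look like (the swap macro and the exact ctx methods here are hypothetical):

// swap $a:expr , $b:expr  =>  $b , $a
syntax swap = function(ctx) {
  ctx.next();                    // consume the macro name
  let a = ctx.nextExpression();  // a Term
  let comma = ctx.next();        // a Syntax object (the `,` punctuator)
  let b = ctx.nextExpression();  // another Term
  return [b, comma, a];          // the returned list mixes Terms and Syntax
}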

Does this seem reasonable?

@zackp30
zackp30 commented Aug 8, 2015

Damn I'm looking forward to this. 😄

@jlongster
Contributor

I saw the refactor branch before but somehow missed this PR. This is huge! My favorite part is no more destructuring and building on top of babel. I always hated how much unnecessary CPU time we were using up with all that.

I don't have strong opinions about the new feature-set, as I'm currently not as involved, but man I can't wait to start playing with this again.

@m1sta
m1sta commented Dec 14, 2015

Very excited. In the emerging re-design will the ability to create whitespace-sensitive macros be more accessible?

@disnet
Contributor
disnet commented Dec 14, 2015

Not directly part of the redesign but it will definitely set us up in a good position to start working on that.

disnet added some commits Dec 23, 2015
@disnet disnet Dehydrate scopeset too abb391b
@disnet disnet Start work on resolve f77f069
@disnet disnet Make compile give a string to babel instead of an AST (will need to do a bunch more work to convert the shift AST to a babel compatible AST) e72bfb0
@disnet disnet Add full resolve f7122a1
@disnet disnet Add basic support for AssignmentExpression 55e3d37
@disnet disnet Start working on adding scopes during expansion 3022b59
@disnet disnet Start redesign of the reader 8a1b415
@disnet disnet Add serialization via transit-js b68a19c
@disnet disnet Reformat 58765fc
@disnet disnet Remove flow 7335f62
@disnet disnet Add serialize/deserialize support for symbols add8dc6
@disnet disnet Make hygiene tests pass 5291570
@disnet disnet Refactor 8a99bea
@disnet disnet Add hygiene to function declaration fa0c25b
@disnet disnet Using the compiletime environment for identifier expressions ca1869e
@disnet disnet Add more scopes 875fd23
@disnet disnet Apply flipped introduced scope to macro result 9ef7d1e
@disnet disnet Add basic support for hygienic var bindings c502933
@disnet disnet Make tests to handle hygiene better 633ae1a
@disnet disnet Clean 4e5553d
@disnet disnet Remove the use scope from declarations e7fba3b
@disnet disnet Refactor 5b6dca0
@disnet disnet Add some regex disambig 8630835
@disnet disnet Add more cases to the reader b347374
@disnet disnet Add more cases for reader ec2a245
@disnet disnet Add more reader cases 753cc06
@disnet disnet Add more reader cases 59a4707
@disnet disnet Handle / in nested delimiters 4a48c00
@disnet disnet Remove old reader c67587f
@disnet disnet Turn back on more reader tests 9a5e4fb
@disnet disnet Handle operators and punctuators better b0c738c
@disnet disnet Use gulp 60b2e2b
@disnet disnet Add basic parsing of imports ae6b4ed
@disnet disnet Add basic support for export declarations 8589f75
@disnet disnet Add very basic module support facc6c9
@disnet disnet Refactor term expander 8518682
@disnet disnet referenced this pull request Jan 11, 2016
Closed

Comment Bug #464

@vendethiel

console.log("Hello, Racket"); :). nice!

@disnet
Contributor
disnet commented Mar 20, 2016

So in super-cool news the redesign is now self-hosting! Sweet.js is once again building sweet.js!

This means that I'm calling the redesign officially close enough for merging into master. I'm planning on waiting a week to see if any blocking bugs show up but after that I think we can start iterating on a much better foundation.

The pre-release version is available under the pre npm tag (i.e. install it with npm install sweet.js@1.0.0-pre.0). The documentation is available here.

@Alexsey
Alexsey commented Mar 22, 2016

The 'here' link from the previous post doesn't work :(
But generally it's great news, congratulations! 👍

@disnet
Contributor
disnet commented Mar 22, 2016

@Alexsey I split the documentation out into the tutorial and the reference.

@disnet disnet merged commit a3c302d into master Mar 27, 2016

1 check was pending

continuous-integration/travis-ci/push The Travis CI build is in progress
@vendethiel

\o/

@disnet disnet deleted the redesign branch Jan 14, 2017