initial readtable implementation #340

Merged
merged 10 commits into from Jun 27, 2014

Projects

None yet

2 participants

@jlongster
Contributor

This is my initial implementation of readtables. It's very early and not ready to be merged yet, but I wanted to open it up as early as possible for feedback. There's definitely sloppy code in some places, which will be cleaned up. I'm mostly interested in feedback about the APIs and how we handle custom reader extensions.

I went ahead and included an example reader in the source, which will be removed before this is merged. The reader implements the JSX extension to embed HTML in JS. This has been a fantastic example of a reader and has uncovered practical needs that our APIs need to meet.

Theoretically, with this code you can expand this file:

let DOM = macro {
    rule { { $el $attrs } } => {
        $el($attrs)
    }

    rule { { $elStart $attrs $elEnd } } => {
        $elStart($attrs)
    }

    rule { { $elStart $attrs $children:expr ... $elEnd } } => {
        $elStart($attrs, $children (,) ...)
    }

    rule { } => { DOM }
}

<a foo="bar"> {value} <b><c /></b></a>

with the command:

./build/bin/sjs --load-readtable ./example-readtable.js jsx-macro.js

The "example-readtable.js" file will be looked up relative to the sweet files, not your working directory, so that should work. Just a stop gap for now. That should output:

React.DOM.a({ foo: 'bar' }, ' ', value, ' ', React.DOM.b(null, c(null)));                                               

There are 3 integration points that I'd like feedback on:

  1. Hooking the readtable into the parser. This is in parser.js. Esprima doesn't have a parser object, it all functions within the same file and assumes closures. This makes is hard to call an external module and give it the parser. I had to create a fake parser object with APIs that can access the closure. This fake object is the parser API provided to a reader, and we need to think through what it should be.
  2. readtables.js implements a base reader object that can be extended with CustomReader.create. This provides higher-level methods for a reader. It will scan for tokens and automatically queue them in a buffer, and provides methods like reset which will bail out and reset the parser to the state before the reader was invoked.
  3. How we actually load readtables is up for debate. I think this is where we'll need to do the most research on. Racket has a powerful mechanism for specifying a language at the top of the file. JSX actually requires you to put an annotation at the top of the file. The nice thing about this is that you can run the expander across a whole bunch of JS files and know that it will only touch yours. We could provide an option for this mode, and have some sort of language file that specifies reader extensions and macros to load. Or we can just provide CLI options.

EDIT: A minor point to #3 above, the React compiler not only looks for the JSX annotation at the top of the file, but also parses an argument off of it. The annotation is /** @jsx React.DOM */, so it only prefixes the DOM elements with that you pass. (a is turned into React.DOM.a)

Racket only allows reader extensions to be invoked on expressions. I think that makes a lot of sense. In fact, for JSX I have go through more work to make sure that the < character is in place of an expression. The problem is that at the readToken stage we know nothing about the shape of the code or if we are at an expression or not. Not sure how Racket does this, but it's something to think about.

I made it so that you cannot override delimiters. Reader extensions are only checked after delimiters are parsed in readToken. It seems like too much power to override delimiters and I don't think we should allow it; the chances for something to go wrong are high.

That is my braindump of all of my work. It's probably somewhat scattered, but it feels nice to get it out there so that feedback can help shape it early on.

@jlongster jlongster referenced this pull request Jun 11, 2014
Closed

reader extensions #334

@jlongster
Contributor

I've got some updates coming to this, with a much clearer API and explanation which will be easier to discuss, so don't worry too much about going through this yet.

@jlongster
Contributor

This is a far better and cleaner implementation of readtables. I've rebased it so the diff is clean and you can see only what I've changed.

@disnet and @natefaubion can you look this over and see if there are any major problems with it? I'd like to land it soon else I get distracted/involved with something else and let this bitrot.

The general idea is to expose a parser API to the reader. This API is pretty minimal, and you have to take care to handle things correctly in the reader. The API includes the scan* functions, and access to the internal parser state (index, source, etc). There are few other convenience functions, but nothing that should make it hard to integrate a different parser if we ever wanted to.

The reader is a simple object that has an extend method for creating new readtables. It also has characters as keys and functions as values that are called from parser when that character is hit. You can call extend with a new set of keys/functions to make a new readtable, as seen here: https://github.com/jlongster/jsx-reader/blob/master/jsx-reader.js#L930. Currently the default readtable does nothing, but in the future we could potentially even implement things like string templates as a reader extension instead of builtin.

You change the current reader by calling sweet.setReadtable which internally calls parser.setReadtable.

Right now you are only able to hook into punctuators. Racket allows you to specify whether or not you only want to be a punctuator or literally anything else. I think only allowing punctuators right now is fine..

I feel like this is a pretty good first pass, and we can tweak it or add to it if needed later.

@jlongster
Contributor

Additionally, I should probably write some tests,

@disnet
Contributor
disnet commented Jun 18, 2014

This is looking awesome! I'll take a deeper look later today.

In addition to a few tests would it also be possible to add some basic documentation here?

@jlongster
Contributor

Sure! I can add docs and tests. I want to figure out one last thing first: how to easily load readtables. Right now there's a --load-readtable switch, and we can figure out a short switch (let's say -l). My concern is that it's confusing from the user's perspective to have to do -l jsx-reader -m foo1,foo2, instead of just -m jsx-reader,foo1,foo2. It's nice to just say "load this module" and we install everything from it.

However, as Nate pointed out, eventually -m will go away when we have real modules. So we do need a different way to load readtables. Additionally, it's nice if readtables can be parameterized, though I haven't seen this in other systems. JSX expands something like a to React.DOM.a, but it knows to insert React.DOM from an annotation at the top of each file (/** @jsx React.DOM */). Maybe it is best to just have a separate switch like -l, which takes a module and any string parameters to it (so it would be sjs -l jsx-reader React.DOM. It would be kind of neat to have something like Racket where you indicate the "language" for each file, but I don't know how that would look and would require all of the current JSX users to change all their files.

The JSX reader depends on a macro (takes away a lot of token mangling). Instead of forcing the user to always do -l jsx-reader React.DOM -m jsx-reader, I'd like for the readtable to install the macro behind-the-scenes. I noticed loadMacro in sweet.js, which adds macros to a loadedMacros array, but that array doesn't seem to be used anywhere. I fixed this in the latest commit so that it passes it to expand. Is that right? Now my jsx-reader module can install macros that it can use.

One final thought: if we can't specify the language in the file itself, maybe we could have a .sjsconfig file that specifies readtables (and possibly global macros? are we still going to have global macros when we get modules?) I know people probably won't like this. I just think it's great to just do sjs for projects that don't need big build systems.

@jlongster
Contributor

Actually, in regards to the config file, if you can imperatively import macros with loadMacro and import readtables with setReadtable, you totally could just have a lang.js file and run sjs -m ./lang.js. That would work very well for projects without a build system.

@jlongster
Contributor

I vote that for now, we keep readtables and macros separate when it comes to loading them. For now let's just add another switch for loading a readtable. If you have a package that provides readtable extensions and macros for the user, they need to load them separately. We'll figure out how they interact better when we work on modules.

Additionally, to help with the use case of using helper macros inside readers, I added a new API expandSyntax. This seems useful anyway if you want to do controlled, programmatic expansion. It takes syntax objects instead of a string, and returns a token tree instead of a flat list of tokens. With this, I can internally invoke my helper macro easily: https://github.com/jlongster/jsx-reader/blob/master/jsx-reader.js#L182.

Now there are no macro dependencies, you just need to load the readtable. I think this is a good starting point, and we can improve this over time.

I'll start writing tests/docs next, probably not for a few days though, have some other stuff to do.

@disnet
Contributor
disnet commented Jun 19, 2014

I vote that for now, we keep readtables and macros separate when it comes to loading them.

Agreed. A more friendly API/syntax can come later.

@jlongster
Contributor

Added some tests. The API is pretty minimal, but the tests cover the basic cases and some of the fine details. I will write some docs later this week.

@disnet

Does it make sense to add some token constructor functions? Something like makeStringToken, makeIdentToken etc.

Owner

That would be a good idea! It had occurred to me but I forgot. It would be easy to do that.

Does the API handle the lineNumber/range etc. properties on tokens you create and return from here?

Actually it looks like it doesn't. If you try to return a token with a missing range property you get an error:

/Users/tim/Dropbox/dev/sweet-js/lib/parser.js:5086
            throw e;
                  ^
TypeError: Cannot read property '0' of undefined
    at lex (/Users/tim/Dropbox/dev/sweet-js/lib/parser.js:1410:32)

Is it easy enough to patch up the line/range info on the readtable side or do we need to push that back on the user (with a better error message)?

Owner

Yeah, whatever token you return, it uses. You can return multiple tokens so it can't track what you are doing. In jsx-reader I have a helper function to attach the location: https://github.com/jlongster/jsx-reader/blob/master/jsx-reader.js#L13.

This is actually a great use case for adding the "make token" functions like makeStringToken. These could automatically attach location information with them. The only thing is the range: sometimes you want to manually specify the start/end range, but that could be part of the make* functions api, and that's rare.

Owner

Check out the make* functions I added in this commit: 3137aed

Did I handle strings and regexs right? It looks like the string regex has an "octal" option, which you can pass in. My main worry is regexs; sometimes that token type has a literal field which is a string representing the regex, while value is a RegExp instance. Sometimes it doesn't have the literal field though. Do I need to add it?

Ah, of course I remember to check your comment after I merge :)

Yes we do need the literal field. I think the only time that field doesn't exist is when vanilla esprima is run with the --tokens flag. Not sure what might go wrong with out it (maybe escodegen requires it?) but at the very least we can be consistent with the syntax object version of makeRegExp.

Anyway, it's simple to add so I just did it in commit 79c0648

Owner

Ok, thanks! I was expecting the user to pass a real RegExp object, not a string though. Should we assume in all the make* functions that they are passing strings and do the string->value conversion for them? (like in parseNumericLiteral, needs parseFloat)

Ah! This is a good point, we should keep the function signatures consistent (either all passing in values or doing string conversion). I think your choice of values over string conversion here is the better one but this means that the existing makeRegex in syntax.js is wrong.

I've switched this one back ad430ec but we should update the syntax version to be consistent. It's a breaking change so maybe we'll hold off until we can queue up a few other breakers.

@disnet

Why are we passing parser to the callback? Won't you always be defining the callback in a context where you have access to parser like here?

Owner

It's a little confusing, but the parser object you get here is a different parser object then you get from require('./parser'). We are providing a different API, one that has access to the internal state and a lot of internal functions: https://github.com/jlongster/sweet.js/blob/reader-extensions/src/parser.js#L5560.

We could add all of these APIs to the normal parser exports, which would work, but I didn't know if we want to open all of that up or not.

Ah right! That makes sense. We definitely don't want to give internal state on the normal export.

Owner

It does feel a little weird to access the API that way rather than just importing a module. If you like how it currently is, maybe I could change calling the object from parser to reader. Calling it reader gives it a clear purpose and separates it from the parser module.

Owner

Should we also change all the scan* methods to read* in the API provided by the reader object? That seems to make sense to me, but maybe we want to stay consistent with the parser.

The name scan* only makes sense in the context of esprima and there's no reason users of the readtable API should care about that so yeah changing to read* is a great idea.

parser->reader is good. Are there any exports on the "normal" parser that aren't needed for the readtable? If so we could restrict reader to just the internal state exports needed for readtable.

@jlongster
Contributor

Added a bunch of docs. Feel free to look over and suggest changes.

@jlongster
Contributor

I think I am done here, tests and docs included. Please let me know if something looks wrong. I would love to get this merged soon!

@disnet
Contributor
disnet commented Jun 26, 2014

Wahoo! I'll take a look and merge it tonight.

@disnet disnet merged commit 2bed05c into sweet-js:master Jun 27, 2014

1 check failed

continuous-integration/travis-ci The Travis CI build failed
Details
@disnet
Contributor
disnet commented Jun 27, 2014

This is looking great! I'll do a release next week assuming nobody finds a showstopper over the weekend.

@jlongster
Contributor

Awesome, thanks for merging!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment