-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pug-code-gen] Rewrite using babel #2708
Comments
for info I am keen on trying to see how by 'Rewrite using babel' you mean that |
From @ForbesLindesay on November 6, 2015 10:39 Yes, that's what I mean by re-write using babel. In the short term, then-jade should fully fork the code-gen module, but will benefit from a more stable and defined AST structure. In the long term I hope we will have a plugin architecture that allows overriding just the mixin parts of code-gen. react-jade is fairly irrelevant here because it's such a total fork of code-gen anyway (it doesn't share any code-gen code with jade core). |
I'll look into forking this for regading |
From @ForbesLindesay on November 6, 2015 11:56 I suppose all you would need from a plugin API is the ability to override |
I'll look into that ; do you have a sample repository that shows how to derive a new compiler from jade 2.0 ? currently, the inheritance in would you copy/paste as a side question, I guess i would be interested in trying to rewrite |
From @ForbesLindesay on November 6, 2015 13:32 You can create a new No, unfortunately I'm not aware of any projects that do this already, but I've asked Sebastian McKenzie what he thinks and he felt it was a pretty sensible approach. I'll ask if he knows of any projects currently doing this/has any advice on how to get started. |
From @TimothyGu on November 13, 2015 2:36
Have you done so yet? Frankly, the notion of generating a fully qualified JavaScript AST from I also wonder how one can generate source maps with a different language, but I guess adding location information to the Babel AST should suffice. |
I looked a bit at the The basic node construction feature of the AST seem to be given by https://github.com/babel/babel/blob/master/packages/babylon/src/parser/node.js It might be feasible with https://github.com/benjamn/ast-types but I did not check how compatible the ASTs are. |
From @ForbesLindesay on November 17, 2015 16:24 He wasn't aware of anything, but gave me a few pointers on where to get started. We should use:
|
I have vaguely started to look at this (very low priority on my side). While trying to see what I could do with the buffer() method, I wonder whether every new buffer addition should lead to a clearly, a babel plugin could take care of such concatenation afterwards but I wondered what you think because every babel plugin that we will add will have its own running cost. |
From @ForbesLindesay on September 12, 2016 15:3 I think we need to at least make sure that consecutive strings result in I don't have a strong opinion on whether this should be done as part of generating the initial code or in a babel plugin. I think it depends a lot on how difficult the plugin would be to write. |
I worked a bit on this again. I convinced myself that at this stage, it is better (if possible) to try and generate the closest javascript to the current generated javascript (it seems easier to me to compare the 2 generated javascript in parallel). It should be easy to refactor once we have test-parity. Now I hit a problem because of how js strings are generated. In pug we use With babel, declaring literals with
instead of strings like those pug currently generates so we lose the escaping, Can you tell me more about why we do this escaping and if when generating the whole code from AST we would still need it ? I currently don't see a way to have babel generator generate u-escaped brackets from literals the way js-stringify does; I would appreciate some feedback on this. The closest issue I found is babel/babel#4478 note: maybe this has to do with incompatibilities when the generated js is written inside an html script tags
Extract found here by the way it is weird that in this extract that choose \u003c, \u003e, \u0026, \u2028, \u2029 while it could maybe be a babel plugin used when generating literals (?) |
From @ForbesLindesay on October 27, 2016 9:23 This is an interesting point, that I had not considered. The reason we generate those characters, instead of
Option 1 should produce much more efficient code. This may be something babel is keen to offer, but I don't know. I think long term we should avoid escaping everything like js-stringify does. I took the approach of escaping everything in js-stringify to be as safe as possible since I (and many other people) was using it to embed user generated info in inline scripts in JavaScript, i.e. script.
var CSRF_TOKEN = !{stringify(csrfToken)}; The overhead of this extra escaping probably does slow pug down and certainly makes for bigger/less readable output. We should try and minimise this overhead. |
ok. At this stage I think that I will not take this escaping into account. Since babel generator uses https://github.com/mathiasbynens/jsesc for literal escaping and there is a
currently |
for information i could force the escaping using an undocumented feature of stringLiterals via
not sure if we want to keep this is what babylon does when it parses escaped strings. |
From @ForbesLindesay on October 27, 2016 13:44 That's a neat option. We should probably update js-stringify to just do what jsesc does providing that is properly secure when faced with user generated content. I'm tempted to say lets use that hack for now so we don't forget to resolve this later. Up to you though. |
for info i hit into other difficulties with statements like
where code blocks have to be babylon'ed while not parsable in their own right. the freeform javascript + 2 options we currently have
make it not so easy handle all cases. |
I did a custom implementation for if/else if/else code blocks but there are more interesting things with
in order for babylon to parse cases like these (or more complex cases) , we need a strategy. Maybe balance all opened '(' and '{' to make the thing parsable, with a fake statement inside like but I don't know what level of javascript hacks inside the templates we want to support + there could be hairier things like maybe
whatever sense that may have (not tested). If you have ideas I am interested ! |
@ForbesLindesay the "freeform" js that some people may currently inject into pug templates is I think going to be difficult to keep as a promise. Currently, to handle
To keep the current
of course, detection
There could be another option, like buffering all code tokens (while addind markers) until we have a buffer that is balylon parsable (does not throw). While doing this, we would capture the AST of the blocks in their corresponding marker variable. In the second example, that would be something like That would probably be more bulletproof but do you think that detecting a "parsable" block is a correct end-marker for the buffering algorithm ? Clearly I fear that we may create hard to fix backward incompatibilies for some people who have created monster-templates for their own good reasons. I'll do a pause on this now. Right now my code is not in a state of commit I would need to clean up non-generic code on this balancing issue. 212 tests passing, 29 errors (including these balancing problems + mixins that I have not started) |
I did a bit more work on this (hard to pause ;-) The second algorithm kind of works and is the best I can come up with at this stage. I had misunderstood something in now the problem I have is that in
the algorithm works well, since the first parsable snippet is but
breaks because |
From @TimothyGu on October 28, 2016 19:53 I'm not sure how applicable this is, but can you detect if there is a block attached to a specific |
OK. I managed to rewrite the whole thing and have all 241 pug tests pass :-) I realize that in your first message @ForbesLindesay you said
The current algorithm I choose to avoid this is to buffer all the code blocks of the current level (I modified the current impl is on https://github.com/jeromew/pug-code-gen, on the babel branch. I'd appreciate @ForbesLindesay and @TimothyGu If you have some time to take a look at this and tell me what you think. At this stage, I mainly tried to make the tests pass while keeping the code acceptable, but this could use some improvements. Things that are left to do :
The code is less error prone (no dangling brackets) and more clear once you get used to reading ast meta code. The compiler seems slower probably because the structures are heavier to instanciate + we babylon.parse each and every expression that is considered javascript code. The code is then generated and reparsed via Feedbacks welcome ! |
From @TimothyGu on October 30, 2016 1:17 For the |
From @ForbesLindesay on October 30, 2016 16:3 Thanks for doing all this. It may be a few weeks before I get time to look at it properly. I do think eventually it would be good to restrict unbuffered code to valid statements, but that transition is definitely going to be hard on users of pug, so we need to think about it carefully. We could start warning now, we could also support some kind of white-list of the most commonly used non-statements like |
@TimothyGu thanks for the info on the babel based @ForbesLindesay yes I think that we will be able to report line number/file of errors at compilation time + hopefully it should be possible to reach a jade->js sourcemap with close enough mapping. Regarding unbuffered code, I think the implementation i found should give us the choice & an upgrade path. Currently it accepts locally invalid statements just as pug does but a boolean could be added to restrict compilation to locally valid statements only. I believe the main culprits in the wild are I'll try to see how source maps can be generated and tested programmatically. If you have experience on this I am interested. I'll probably fork pug to modify its package.json to point to my forked pug-code-gen so that it can be tested more easily. |
From @ForbesLindesay on October 30, 2016 17:7 I think babel-code-gen will need forking to add support for multiple files. re forking pug. That makes sense. I think we should move most of pug into a mono-repo as it will make testing/releasing much safer and easier. I'd like to keep react-pug, then-pug, pug-cli, plugins etc. separate as it's very useful to have their issue & pull requests separate, but the pug-lexer, pug-parser etc. would probably be better in the same repo but separate npm packages. |
You can install a version of pug using babel AST using https://github.com/jeromew/pug With the help of @TimothyGu I used I could not make I am a bit worried and clueless at this stage regarding performance. I did not measure it precisely, but I do have more red and yellow tests with AST than with the previous method. before:
after
In my opinion, the perf difference should not be significant since I also thought that parsing small expressions would be faster than parsing the final javascript because there is less context to parse. The problem I have is that at this stage, there is no code to "fix" the line/pos of all the AST nodes and that I suspect that this code is going to add more time penalties. We will need this nevertheless to generate source maps. I add @alubbe here since you seem to know a lot about node optimisations and that maybe I am doing something wrong when I build the AST tree (a lot of array push for example). |
From @TimothyGu on October 31, 2016 15:59 babel-traverse is inherently slow because of all the features it supports -- paths, scopes, etc., while babylon-walk is just a bare-bones node-to-node walker. I don't think speed should be that much of a concern currently, but if needed we can profile it in the future. |
From @TimothyGu on October 31, 2016 16:6 And I was wondering what you meant by "fixing" line/pos of AST nodes. If you meant attaching line/pos of Pug source code so that sourcemaps could be generated, we need to add code column support to the parser first. I'm not terribly sure, but we might need to add support for tracking code spans in lexers (as currently only the beginning position is tracked). |
Yes I haven't looked into sourcemaps. I saw that we have a line/pos of the beginning position of pug tokens and was thinking about using that.
would build an AST out of expr with line/pos in the expr string and I thought I would offset this with the line/pos given in the pug node (at least for js code) but I haven't started looking into this. maybe this would not be sufficient for a useful sourcemap. |
From @ForbesLindesay on November 1, 2016 11:12 I think we will need end locations for tokens in the lexer. Once we have that it should be moderately simple, but I'm not sure. |
From @jbsulli on January 17, 2017 6:0 I would be interested in adding the end locations for for tokens in the lexer. Could be handy for some of the stuff I'm playing with and I would like to start getting more familiar with the pug internals. Would that be helpful? Or have you already started working on it? |
From @ForbesLindesay on January 17, 2017 8:47 That would be hugely helpful. Ideally it would be best to update the pug lexer and parser to match the Babylon, |
From @jbsulli on January 17, 2017 9:29 I like that idea. I've been working with Babylon some and would've suggested it. I'll keep you posted! |
From @jbsulli on January 17, 2017 16:13 I should clarify: What would be considered the end of tags/mixins/etc?
My guess is just the end of the name as properties and content would be more in the domain of the parser. Just thought I'd check though. |
From @ForbesLindesay on January 17, 2017 16:13 This is an interesting point. I guess what we possibly ought to do is add In the lexer, the tag ends at position |
Copied from original issue: pugjs/pug-code-gen#21 |
for info I migrated the babel AST version of pug-code-gen to monorepo on https://github.com/jeromew/pug |
For information, I started working on At this stage I use https://github.com/jeromew/pug for both the currently I created a I am not yet satisfied with the I am also waiting on a new babylon version now that |
@ForbesLindesay what is your current thinking on the rewrite of pug-code-gen using babel and direct AST generation ? I have seen your work on the react generator which takes a different approach (I did not really dig into it yet) and was wondering how this all fits together in your view. Is it worth maintaining my AST branch ? is there a target for switching the current pug-code-gen with an AST based solution ? Should I try and rewrite then-pug (which currently uses the AST branch) using an approach similar to pug-react ?). How does this fits with sourcemaps and the work that was done on token lineno/pos ? Thanks for you help |
@ForbesLindesay, @jonathanong can you help me see what is the status of this issue ? tldr:
how can I present you this work to see if this has any chance to reach upstream ? I would understand if you changed your mind and do not want to pursue this AST path. |
Hi @jeromew sorry I've been a bit lax about this. I think the time is probably right to attempt the switch. Can you create a "babel" branch on this repository and apply your changes to pug-code-gen on that branch, then create a pull request. I will then review it. If you can do this by 7th June, I can review it on the 8th June, I might have time before then, but I'm not sure. |
@ForbesLindesay I pushed the PR as #3019 |
@ForbesLindesay Hello this is a ping on this old issue to know if you have any new direction / insights in mind for this. |
From @ForbesLindesay on November 6, 2015 10:1
babel does a really good job of generating source maps. Generating ASTs will also help us only generate proper, syntactically valid javascript. As a bonus we can rapidly run any babel plugins we want on the output for optimisation or implementation of advanced features.
One of the restrictions this will impose is that unbuffered code blocks (e.g.
- var x = 10;
) will now need to be valid statements. This forces people to use our built in constructs forif
,else
,each
etc. I don't think that's a bad thing though, and we can make up for it with much better error messages for syntax errors.I think this is too big a feature to block the 2.0.0 release, which I really want to roll out this weekend. I do think we should hold off on plugins for code-gen until this is complete though.
Copied from original issue: pugjs/pug-code-gen#21
The text was updated successfully, but these errors were encountered: