Concrete Syntax in tree #41

Open
getify opened this Issue Mar 4, 2015 · 87 comments

@getify
Contributor

getify commented Mar 4, 2015

I'd like to open up the discussion of what our various options are for preserving so called "Concrete Syntax" elements in the parse tree format.

What are Concrete Syntax elements?

A simplified definition of "concrete syntax" is anything that can appear in a program's source code but is discarded during parsing, because the AST (as it stands now) does not represent it.

That specifically means that things which are reliably inferable from an AST's nodes are not concrete syntax. For example:

var a = 2,
    b = (a + 2) * 3;

Here, the ( ) around a + 2 is not represented in the AST, because the structure of the tree, combined with operator precedence rules, absolutely implies that it must exist, and moreover a + 2 * 3 would have been a different tree structure. Thus, the ( ) in the original program can be reconstructed from the AST reliably, so it need not be stored separately. It is not concrete syntax.

However:

var a = 2,
    b = a + (2 * 3);

This program includes a pair of ( ) that are redundant: operator precedence alone already produces the same tree structure, so they too would not be stored.

But, critically, they also would not be re-generated when the tree was reconstituted. Why? Because it's impossible to know from the tree alone if the ( ) was really there, or just implied. And code generation takes the conservative path and doesn't make up ( ) where it's not sure they were there, so it leaves them out.

Herein we see that the original ( ) are not abstract syntax, but concrete, and to preserve them is going to require some other solution.
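
To make the two cases concrete, here is a minimal sketch using hand-built, ESTree-style nodes and a toy generator. Nothing here is a real parser or code generator; the helper names (`id`, `num`, `bin`, `generate`) and the tiny precedence table are assumptions for illustration only. It shows that required ( ) come back from the tree shape, while redundant ( ) have nowhere to live:

```javascript
// Hand-built, ESTree-style nodes; no parser involved. Helper names are hypothetical.
const id = (name) => ({ type: "Identifier", name });
const num = (value) => ({ type: "Literal", value });
const bin = (operator, left, right) => ({ type: "BinaryExpression", operator, left, right });

const PRECEDENCE = { "+": 1, "-": 1, "*": 2, "/": 2 };

// Toy generator: emits ( ) only where the tree shape requires them.
function generate(node, parentPrec = 0, isRight = false) {
  if (node.type === "Identifier") return node.name;
  if (node.type === "Literal") return String(node.value);
  const prec = PRECEDENCE[node.operator];
  const code =
    generate(node.left, prec, false) + " " + node.operator + " " + generate(node.right, prec, true);
  const needsParens = prec < parentPrec || (isRight && prec === parentPrec);
  return needsParens ? "(" + code + ")" : code;
}

// (a + 2) * 3 : the ( ) are implied by the tree, so they are regenerated.
const required = bin("*", bin("+", id("a"), num(2)), num(3));

// a + (2 * 3) and a + 2 * 3 both correspond to this one tree.
const redundant = bin("+", id("a"), bin("*", num(2), num(3)));

console.log(generate(required));  // (a + 2) * 3
console.log(generate(redundant)); // a + 2 * 3
```

The second output is the conservative result described above: the generator has no way to know the original author wrote `a + (2 * 3)`.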

Examples of concrete syntax:

  • extraneous whitespace (the non-significant kind)
  • comments
  • extraneous ( ) (used primarily for readability more than functionality)
  • ... other?

The Rabbit Hole

Just how deep does this rabbit hole go? In the following snippet, every single /*x*/ comment represents a location where whitespace and/or comments can optionally appear as concrete syntax, in addition of course to the fact that some of those locations require significant whitespace, such as /*4*/:

/*1*/class /*2*/ Foo /*3*/ extends /*4*/ Bar /*5*/ {
   /*6*/constructor /*7*/ (/*8*/x /*9*/ = /*10*/ "hello" /*11*/) /*12*/ {
      // ..
   }

   /*13*/ static /*14*/ bam /*15*/ ( .. ) {
      // ..
   }
}

var /*16*/{ /*17*/ x /*18*/: /*19*/ { /*20*/ y /*21*/: /*22*/ z /*23*/ = /*24*/ 2 /*25*/ } } =
   /*26*/new /*27*/ Foo /*28*/ ( (/*29*/) /*30*/ => /*31*/ { .. } /*32*/ ) /*33*/;

Your imagination can probably take it from there. There's a whole slew of complex ES6 syntax forms which implies a deep rabbit hole of nooks and crannies where we need to be able to preserve information (in some way) that our normal approach to AST doesn't currently preserve.

Consider the tree structure for the arrow function expression, for example... how and where could we represent /*29*/ and /*30*/? /*31*/ and /*32*/ are a little clearer.

Why Concrete Syntax?

Why would we want to preserve all these pieces of concrete syntax? There are several use cases:

  • Any tool which performs localized transformations on a source code file, which doesn't want to change everything, but only a targeted specific thing. For example, a tool that does nothing but rewrite all variable names to uppercase versions (for whatever silly reason).

    The goal of this tool is not to affect anything else about the program, such as formatting and comments, as those might still be important to the author of the file.

  • Fully configurable automated code formatting: not just rule based, like "always put a space between an ( ) on a call" or stuff like that, but more fine-grained rules, where a tool might parse a source program and produce a tree structure with very specific information in it about how the resultant code should be re-generated.

    For example, it may automatically insert comments for each parameter in a function declaration with some sort of annotations about how and where the param is lexically used, etc. Or a code-style painter may "repaint" a file with spaces for indentation vs tabs, or may insert spaces for alignment and indentation with tabs, etc etc.

I could go on speculating, but I'll just leave it that there are definitely cases for tools which want to be able to preserve concrete syntax wherever it appears. Since concrete syntax by definition cannot be inferred, the parser and the data structures that come out of it must be able to preserve it. Moreover, this information must be something a code generator can receive and use.

How?

Here's the part where all the bikeshedding is going to happen.

To date, conversations around this topic have happened many times that I've been privy to, and there's never been any kind of consensus on how to approach solving it. I have my ideas, but I'm only going to suggest them as a possible starting point proposal, not that it has to be this way.

CST-as-AST Proposal

I believe we should have one unified tree structure, which has optional -- what I call "extras" -- annotations (and in a few limited cases, nodes) in it which represent the necessary hooks for preserving these concrete syntax elements. In other words, I propose that there be no difference between an AST and a CST (concrete syntax tree), other than the absence or presence of CS elements in the tree.

Any tool which currently produces ASTs is a tool that's producing CSTs by default, but which just happens to not actually be keeping any of the CS elements yet. These tools can start keeping the CS elements, but still have the same style of tree they're producing, just with extra info in them.

Any tool which consumes ASTs is a tool that's already consuming CSTs by default, but which is just ignoring any CS elements which may be present. These tools can start using the CS elements they find.

It turns out that most of the places where we need to preserve CS elements can be added as additional properties (again, I call it "extras", with extra sub-names like "before", "inside", and "after" for positioning), which means that there would be zero impact to the existing tools that use such a tree format.
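
A rough sketch of what such annotations might look like. Only the names "extras", "before", "inside", and "after" come from the proposal; the exact shape of the values (plain strings here) and the two generator functions are assumptions made for illustration:

```javascript
// Hypothetical node shape: "extras" holds the concrete syntax around a node.
const declarator = {
  type: "VariableDeclarator",
  id: {
    type: "Identifier",
    name: "a",
    extras: { before: " ", after: " /* count */ " },
  },
  init: { type: "Literal", value: 2, extras: { before: " " } },
};

// An existing consumer that knows nothing about extras keeps working.
function generatePlain(node) {
  return "var " + node.id.name + " = " + node.init.value;
}

// An extras-aware consumer wraps each piece with whatever was preserved.
function wrap(node, text) {
  const extras = node.extras || {};
  return (extras.before || "") + text + (extras.after || "");
}

function generateWithExtras(node) {
  return "var" + wrap(node.id, node.id.name) + "=" + wrap(node.init, String(node.init.value));
}

console.log(generatePlain(declarator));      // var a = 2
console.log(generateWithExtras(declarator)); // var a /* count */ = 2
```

The point of the sketch is only that the annotations are additive: a tool that never reads `extras` sees the same tree it sees today.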

Tree producing tools (parsers, transpilers, etc) could just not produce these annotations, but things still work fine. Tree consuming tools continue to consume the trees as they currently do, and just ignore these extras as they currently do.

I believe this has the most minimal impact to the existing tool ecosystem, and thus the easiest path to wider adoption by more tools.

Downside

There will be some minor places where the node structure will have to be slightly different to accommodate some of the trickier cases of CS positioning.

For example, anonymous function expressions that have an id of null means that we don't have an object value in that node to attach any extras annotations to in the function/*1*/() position.

If we could represent an anonymous name entry instead of id: null as an object like id: { extras: .. }, this will mean we have a hook to annotate those extras.
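
A small sketch of why this counts as a breaking change, under the assumption that the placeholder object carries only `extras` and no `type` (the proposal does not pin this down). The helper names are hypothetical:

```javascript
// Models: (function /*1*/ () {})
const current = {
  type: "FunctionExpression",
  id: null,
  params: [],
  body: { type: "BlockStatement", body: [] },
};

// Assumed placeholder shape: an object with extras but no name.
const proposed = { ...current, id: { extras: { before: " /*1*/ " } } };

const named = { ...current, id: { type: "Identifier", name: "f" } };

// A check many existing tools might use today.
const isAnonymousOld = (fn) => fn.id === null;

// A check that tolerates both the current and the proposed shape.
const isAnonymous = (fn) => fn.id === null || fn.id.type !== "Identifier";

console.log(isAnonymousOld(proposed)); // false (the old check is fooled)
console.log(isAnonymous(proposed));    // true
console.log(isAnonymous(current));     // true
console.log(isAnonymous(named));       // false
```

Consumers that test `id === null` directly would need this kind of small adjustment, which is the "extra handling care" mentioned below.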

This does represent a slight breaking change to the format, but it's not a major sweeping new tree format, and will require on the whole just a small bit of extra handling care. These necessary node structure breaking changes are minor and very few for the pre-ES6 tree structure (aka SpiderMonkey).

The new ES6 forms definitely add more places where we should consider tree structure from the beginning which are amenable to attaching these annotations. Since there's not already an established standardized ES6 tree format, I think it's not too late for us to consider these concerns as we specify the ES6 node forms.

@michaelficarra

michaelficarra Mar 4, 2015

Member

... other?

Off the top of my head:

  • escape sequences
  • string delimiters
  • number formatting
  • high precision or very large (magnitude) numbers
  • almost all semicolons
  • others depending on AST structure
  • elisions

@getify

getify Mar 4, 2015

Contributor

@michaelficarra great points, knew I was forgetting some of them. :)

@michaelficarra

michaelficarra Mar 4, 2015

Member

Any tool which performs localized transformations on a source code file, which doesn't want to change everything

I disagree that this would require full concrete syntax information. Localised replacements can be done with source position information alone. But I still agree that concrete syntax information has its uses.
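
A minimal sketch of a position-based replacement, applied to the "uppercase every variable name" example from the original post. The `range` pairs are hand-computed here and stand in for what a parser with range tracking would report; `replaceRanges` is a hypothetical helper, not part of any library:

```javascript
const source = "var a = 2, // keep me\n    b = a + (2 * 3);";

// [start, end) offsets of each identifier, as a parser with range info might report them.
const edits = [
  { range: [4, 5], replacement: "A" },
  { range: [26, 27], replacement: "B" },
  { range: [30, 31], replacement: "A" },
];

// Apply edits from the end of the file backwards so earlier offsets stay valid.
function replaceRanges(text, list) {
  return [...list]
    .sort((x, y) => y.range[0] - x.range[0])
    .reduce(
      (out, { range: [start, end], replacement }) =>
        out.slice(0, start) + replacement + out.slice(end),
      text
    );
}

console.log(replaceRanges(source, edits));
// var A = 2, // keep me
//     B = A + (2 * 3);
```

The comment, the indentation, and the redundant ( ) all survive because the untouched text is never regenerated.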

I believe we should have one unified tree structure

I completely agree. This is the evolutionary route forward.

For example, anonymous function expressions that have an id of null means that we don't have an object value in that node to attach any extras annotations to in the function/*1*/() position.

Exactly. This is one of the many reasons I have argued in the past that each piece of concrete syntax information should have its own named property on each node. For example, each position whitespace/comments are allowed would have a named field in the CST. Additionally, each of the concrete syntax elements I mentioned in my comment above need to be represented on the appropriate node. This is incompatible with a simple string for "before/after/in-between" concrete syntax.

Finally, are we sure we want to have this discussion here? It seems a bit out of scope for this project. Also, readers should check out previous CST discussions/proposals:

@kittens

kittens Mar 4, 2015

Contributor

@michaelficarra

Finally, are we sure we want to have this discussion here? It seems a bit out of scope for this project.

Completely agreed. We haven't even agreed on the entire ES6 AST spec and there's still a lot of work to do so getting sidetracked into CST discussion so early is only going to be harmful IMO.

@mikesherov

mikesherov Mar 4, 2015

Contributor

I agree we should defer discussion of CST until we get past thorny bits of ES6, however, we need to develop the ES6 bits with the CST in mind as a future concern. That is to say, when considering ES6 features, we should consider whether the specced feature is harmful to future CST implementation.

@mikesherov

mikesherov Mar 4, 2015

Contributor

There is also a ton of prior art here in JSCS, ESLint, ESFormatter, and friends to how a CST may work. In terms of a specific proposal, I have ideas based off what both JSCS and esformatter have arrived at.

The basic concept is to formalize the token list, and leave the AST as is, except have a formal link between nodes in the tree and the position of the first and last tokens of that node in the token list.

The AST portion needs to then formalize what constitutes the tokens of a node (which includes questions like "what about grouping parens and semicolons?"). Each node in the AST would have a firstToken and lastToken property that points to the tokens in the token list.

Once you have a link between the tree and the tokens, you augment the token list to be a doubly linked list (for ease of traversal or insertion) that also includes comment tokens, introduces an EOF token, and adds a whitespaceBefore property to each token representing the non comment whitespace between tokens.

Rerendering the source file is as simple as iterating over all tokens and concatenating whitespaceBefore with value of each token.

Augmentation of nodes remains trivial because you can just do a linked list replacement on the replaced node's first and last tokens.

This separation allows abstract consumers to not really think about concrete, and vice versa.
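
The steps above can be sketched roughly as follows. The names firstToken, lastToken, and whitespaceBefore come from the description; everything else (the `prev`/`next` links, the token `type` strings, and the helpers `linkTokens`, `render`, `replaceNodeTokens`) is an assumption for illustration and not how JSCS or esformatter actually implement it:

```javascript
// Build a doubly linked token list from [whitespaceBefore, type, value] triples.
function linkTokens(triples) {
  let prev = null;
  return triples.map(([whitespaceBefore, type, value]) => {
    const token = { type, value, whitespaceBefore, prev, next: null };
    if (prev) prev.next = token;
    prev = token;
    return token;
  });
}

// Tokens for "a = 1; // done\n", including a comment token and an EOF token.
const tokens = linkTokens([
  ["", "Identifier", "a"],
  [" ", "Punctuator", "="],
  [" ", "Numeric", "1"],
  ["", "Punctuator", ";"],
  [" ", "LineComment", "// done"],
  ["\n", "EOF", ""],
]);

// An AST node linked to the tokens it covers.
const literal = { type: "Literal", value: 1, firstToken: tokens[2], lastToken: tokens[2] };

// Re-rendering: concatenate whitespaceBefore and value of every token.
function render(first) {
  let out = "";
  for (let token = first; token; token = token.next) out += token.whitespaceBefore + token.value;
  return out;
}

// Swap a node's token run for a new run, keeping the original leading whitespace.
function replaceNodeTokens(node, newFirst, newLast) {
  const before = node.firstToken.prev;
  const after = node.lastToken.next;
  newFirst.whitespaceBefore = node.firstToken.whitespaceBefore;
  newFirst.prev = before;
  if (before) before.next = newFirst;
  newLast.next = after;
  if (after) after.prev = newLast;
  node.firstToken = newFirst;
  node.lastToken = newLast;
}

console.log(JSON.stringify(render(tokens[0]))); // "a = 1; // done\n"

const [replacement] = linkTokens([["", "Numeric", "42"]]);
literal.value = 42;
replaceNodeTokens(literal, replacement, replacement);

console.log(JSON.stringify(render(tokens[0]))); // "a = 42; // done\n"
```

Note that the sketch still has to update both the node (`literal.value`) and the token list, which is the maintenance question debated later in the thread.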

@mikesherov

mikesherov Mar 4, 2015

Contributor

Note that the approach above is essentially how both popular AST based formatters JSCS and esformatter augment the existing AST to generalize their approach.

@millermedeiros

millermedeiros Mar 11, 2015

I would really like a format that makes it easier to add/remove nodes and tokens. Augmenting the AST won't make it easy to manipulate: each node type will require its own unique logic, and reconstructing the code will be hard (can't do a simple string concatenation). And it won't be a Concrete Syntax Tree since it won't contain all the tokens that compose the Program; you need to infer that an IfStatement contains {} by looking at the consequent.type...

The linked list (or array of tokens) have some advantages:

But also some drawbacks:

  • main drawback is that you don't have a link between the tokens and the nodes, so:
    • if you add a new element it won't be part of the AST (which might negate some of your logic and/or make things way more complex)
    • adding new tokens at the edges of the node might cause some undesired side effects (reference to startToken and endToken on the AST will point to wrong tokens..)
    • hard/expensive to detect all the nodes that contains that token (all tokens are part of Program)
    • you lose reference of what really an Identifier, Literal or Keyword are part of (ie. you don't know if function is an expression or declaration and what surrounds it)
  • hard to find the matching (), {}, [] (need some sort of state machine that counts how many opening/closing parenthesis in between..)
  • can't be serialized as JSON given the circular references
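
As an illustration of the bracket-matching drawback, here is a minimal sketch of the kind of depth counting a flat token sequence needs. It works on a plain array of token values; `findMatchingClose` is a hypothetical helper, and it only tracks one bracket kind at a time:

```javascript
const CLOSERS = { "(": ")", "{": "}", "[": "]" };

// Scan forward from an opening bracket, counting nesting depth,
// and return the index of the bracket that closes it (or -1).
function findMatchingClose(values, openIndex) {
  const open = values[openIndex];
  const close = CLOSERS[open];
  if (!close) throw new Error("token at openIndex is not an opening bracket");
  let depth = 0;
  for (let i = openIndex; i < values.length; i++) {
    if (values[i] === open) depth++;
    else if (values[i] === close && --depth === 0) return i;
  }
  return -1;
}

// Token values for: f((a), b);
const values = ["f", "(", "(", "a", ")", ",", "b", ")", ";"];

console.log(findMatchingClose(values, 1)); // 7
console.log(findMatchingClose(values, 2)); // 4
```

In a tree-shaped structure the same question is answered by the nesting itself, with no scan.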

In my opinion the best format for code manipulation would be structured like a tree but would be more similar to nested arrays (sequential structure from left to right) than nested object (need to know object structure to be able to infer the position of tokens and which tokens exists in that range), it would have the following characteristics:

  • easy to insert new nodes and tokens before/after/around/inside any node
  • easy to remove/replace nodes and tokens
  • easy to convert back into a string
  • easy to find matching (), {} and []
  • easy to find if expression is surrounded by () (even if multiple nested ())
  • easy to find , in between arguments/parameters/values
  • easy to find ; in between expressions
  • easy to identify if node ends with ; or LineBreak (specially for things like ReturnStatement)
  • straightforward logic to loop through all the tokens inside a node/program (changes can be made without full context)
  • can be serialized as JSON
    • means that we would need helper libs or maybe even enhance the CST with extra properties (eg. reference to parent node) to be able to traverse it easily (imagine you need to get a reference to parent node and/or next/previous token)

The CST wouldn't need any range/loc info, specially since these get outdated after each manipulation and can be easily computed by looping through all the tokens that came before.
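
One possible reading of the "nested arrays" idea, as a sketch only. The layout (a node is `[type, ...children]`, where each child is either a nested node or a token string in source order) is an assumption, as is the `toSource` helper; the comment above does not specify a concrete format:

```javascript
// Hypothetical layout: [type, ...children]; children appear left to right as in the source.
const cst =
  ["Program",
    ["ExpressionStatement",
      ["CallExpression",
        ["Identifier", "foo"],
        "(", "(", ["Identifier", "a"], ")", ",", " ", ["Identifier", "b"], ")"],
      ";"],
    "\n"];

// Converting back to a string is a depth-first concatenation of the tokens.
function toSource(node) {
  return node
    .slice(1)
    .map((child) => (Array.isArray(child) ? toSource(child) : child))
    .join("");
}

console.log(JSON.stringify(toSource(cst))); // "foo((a), b);\n"

// No circular references, so the structure survives a JSON round trip.
const copy = JSON.parse(JSON.stringify(cst));

// Inserting tokens before a node is an array splice on its parent.
const call = copy[1][1];
call.splice(8, 0, "/* second */", " ");

console.log(JSON.stringify(toSource(copy))); // "foo((a), /* second */ b);\n"
```

As the list above notes, a plain structure like this has no parent or sibling links, so real tools would likely need helper functions for traversal.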

@getify

getify Mar 11, 2015

Contributor

main drawback is

I'd say the main drawback to having anything more than just one complete data structure is that you make it all but impossible/impractical for different tools in the flow of parsing->transformation->codegen to compose through typical methods, such as command line I/O piping, etc.

The secondary drawback, as you allude to, is that you've doubled the work to do any transformations, since you have to update both if there's any hope of keeping a sane flow of transformations between tools.

I cannot even fathom it being a good idea to have multiple structures. To me that is a complete show stopper design.

@millermedeiros

millermedeiros Mar 11, 2015

impossible/impractical for different tools in the flow of parsing->transformation->codegen to compose through typical methods, such as command line I/O piping, etc

not totally true. depending on the kind of manipulation you don't need to touch the AST structure - on esformatter we just manipulate the tokens (mainly WhiteSpace, LineBreak, Indent and comments), so there is no need to update both structures. - that's why I decided to implement the linked list, it's the simplest structure that allow us to add/remove/search tokens, but it's less than ideal for complex manipulations (that's why I still want a real CST).

you always have the option to parse the code multiple times (command line I/O piping usually works over strings).. and that's really the path we are going to take on esformatter moving forward for plugins/tools that needs to change the code structure (see: millermedeiros/esformatter#168) - if all tools receive a string as input and output a string it's easier to mix and match, they can use whatever format is better for the job. - parsing JS code is rarely the bottleneck.

there is no need for complex codegen when the data structure already contains all the tokens in the proper order (codegen becomes just a basic string concatenation) - toString() on rocambole is incredibly fast.

@getify

getify Mar 11, 2015

Contributor

you're only talking about one specific view of your use-case. i'm talking about making it more difficult in the general case for all potential tools in the ecosystem.

@millermedeiros

millermedeiros Mar 11, 2015

you're only talking about one specific view of your use-case. i'm talking about making it more difficult in the general case for all potential tools in the ecosystem.

yes, I'm looking through the eye of someone who needs to insert WhiteSpace/LineBreak around tokens, WhiteSpace/Indent/LineBreak inside elements, change between comma-first and comma-last, re-order variable declarations, convert between single variable declaration and multiple, normalize quotes around strings, align the content of multiple lines, and so many other things that are currently handled by esformatter plugins or are on our wishlist...

my point of view is that your proposal of shoehorning the AST is not going to make my work any simpler, in fact it would make it harder.. - maybe it would make your work simpler, I have no idea what are your use cases..

adding any extra property to the AST might cause undesired side effects; for instance esprima@2.1.0 added the handler property to every TryStatement together with the handlers array.. which made rocambole loop over the same CatchClause twice, breaking the esformatter indentation logic (it was adding 2 indents instead of 1). even tho you currently think that keeping the same structure will be easier, it won't be backwards compatible and current tools won't magically work out of the box, there is always a lot of work to be done to adapt things and make sure we all play by the same rules.
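
A sketch of how that kind of double visit can happen with a generic "walk every property" traversal. This is not rocambole's actual code; `countVisits` and the hand-built nodes are assumptions used only to reproduce the effect of one CatchClause being reachable through both `handler` and `handlers`:

```javascript
// Generic walker: follows every property whose value looks like a node or an array of nodes.
function countVisits(root, type, dedupe = false) {
  const seen = new Set();
  let count = 0;
  (function visit(node) {
    if (dedupe) {
      if (seen.has(node)) return;
      seen.add(node);
    }
    if (node.type === type) count++;
    for (const key of Object.keys(node)) {
      const value = node[key];
      const children = Array.isArray(value) ? value : [value];
      for (const child of children) {
        if (child && typeof child.type === "string") visit(child);
      }
    }
  })(root);
  return count;
}

const clause = {
  type: "CatchClause",
  param: { type: "Identifier", name: "e" },
  body: { type: "BlockStatement", body: [] },
};

// The same clause object is exposed under two properties.
const tryNode = {
  type: "TryStatement",
  block: { type: "BlockStatement", body: [] },
  handlers: [clause],
  handler: clause,
  finalizer: null,
};

console.log(countVisits(tryNode, "CatchClause"));       // 2
console.log(countVisits(tryNode, "CatchClause", true)); // 1
```

Any tool that applies a side effect per visit (such as adding an indent) would apply it twice unless it tracks which nodes it has already seen.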

@getify

getify Mar 11, 2015

Contributor

OK, so let's for now grant that no one solution is perfect for all use-cases. Whatever solution we pick will cause some use-cases to be relatively easier than today's ad hoc options, or relatively harder, and some use-cases will be flipped.


But what is true, and was my earlier point, is this: suppose two CLI utilities want to cooperate, and the sharing (via I/O piping) is currently just tool A outputting a string on the stream (say the JSON of the tree), which tool B then consumes (not re-parses from scratch), makes more changes to, and re-outputs as another tree, and so on. THAT process will be more complicated if each tool needs to share two separate structures.

If we take CLI I/O piping out of the concern, and think only about streams in node (think gulp), the same concern applies. And if we take stream-based I/O entirely off the table, every API method in every tool which currently takes one argument (the tree or whatever) now has its arity/complexity doubled to accept two data structures.

THAT is all I meant by suggesting that across the ecosystem, passing around two structures is twice as complicated as passing around one structure.

@millermedeiros

millermedeiros Mar 11, 2015

totally agree, better to have a single structure that contains everything you need. less moving parts. I just don't think that adding extra properties to an AST is enough, it would still be hard to do some of the manipulations that I described on my previous comment. I'd rather have a new format that makes it easier to rearrange nodes while also making it simple to edit the tokens that compose the program.

@mikesherov

mikesherov Mar 11, 2015

Contributor

I'd rather have a new format that makes it easier to rearrange nodes while also making it simple to edit the tokens that compose the program.

You can have that. What you won't have though is collaboration with a bunch of other tools that need the existing tree structure.

Adding the tokens to the structure, with the ability to reference between the tree and the token list, solves the majority of use cases, is immediately backwards compatible, works with both structure and whitespace modification, and has the only downside of some theoretical difficulty in maintaining both, which is just not proven to be true.

If you'd like to continue having string output and have every tool in the pipeline pay the parse penalty, that's also fine, but we're talking about passing around a single structure that is non-lossy and only gets printed back to a string at the end.

Not paying the multi parse penalty is the key here.

@getify

getify Mar 11, 2015

Contributor

FWIW, to some of the earlier questions about whether this is a useful/necessary discussion to be having in scope of ESTree, I think this present line of discussion is precisely why some "standards body" (used in a very loose sense here, obviously) definitely needs to be tackling this.

To me it would be folly for some other external group to come along later and try to define a separate standard for how these token lists or annotations or whatever were somehow integrated into the tree we specify here. I think we precisely have to consider these issues and work through them, as the greater tooling ecosystem clearly needs something more than one tool's ad hoc solution can provide. The lack of such a coordinated broad effort is why this problem remains partially hacked around on a per-tool basis after years of efforts and complaints.

If ESTree doesn't standardize concrete syntax as part of the overall process here, then some other tree standard (entirely duplicative of ESTree) will have to do so, and that standard will have to be what folks like me hope most tools in the ecosystem adopt eventually.

IOW, if ESTree doesn't agree to the notion of standardizing concrete syntax (in some fashion, and at some point eventually) as part of the overall tree standard, then I think my next appropriate step will be to fork ESTree and continue that effort in parallel, and then hope to convince tools that the forked wider standard is what they consider, either now or sometime down the road. It'd be a lot less waste of effort to just define that as in scope for ESTree itself, and keep the effort centralized.

Anything short of that accomplishes precisely zero of what I've been on-and-off banging a drum about for the better part of two years.

@mikesherov

mikesherov Mar 12, 2015

Contributor

@getify take it easy, man. No reason to get so worked up and go talking about a fork already! We're here trying to unite the community, not divide it.

No one said not to consider a CST, we're just talking about prioritizing the work here. First is finishing ES6, and then figure out how to proceed with CST.

There is no need to discuss this in such an adversarial tone. The group here is rightfully asking whether CST is part of this discussion. That doesn't mean we won't tackle it, together.

@getify

getify Mar 12, 2015

Contributor
  1. I deliberately didn't respond right away several days ago, so as to not just be "worked up". I am not worked up nor emotional about this. I just feel strongly, as this topic is something I've been trying to address for years now.

    I mentioned "fork" not as a threat but as a point about the proper scope, and specifically as something I would like to do everything in my power to avoid needing to do. A fork would be worst-case outcome IMO. For the same reason, years ago when I needed this concrete syntax stuff, I didn't just fork a parser and do all my own ad hoc work. My project esre has just been sitting waiting hoping for a solution. :/

  2. "No one said not to consider a CST". well...

    Finally, are we sure we want to have this discussion here? It seems a bit out of scope for this project.

    Completely agreed.

    I'm only reacting to those sentiments. Nothing more.

  3. "prioritizing the work here" I don't mind prioritizing. I didn't start this thread demanding that we tackle all of CST now. That having been said, I strongly agree with...

    we need to develop the ES6 bits with the CST in mind as a future concern

    And so I was merely saying, it only makes sense that we have a general idea of the overall direction (ie, that we don't consider a completely separate data structure) while we're thinking about the other tree node definitions.

    Specifically, I would very much like to avoid the "it's too late to do that" arguments that have actually specifically thwarted some of my earlier efforts.

Sorry for the alarming tone of my previous message. That's not actually my intent at all.

@mikesherov

mikesherov Mar 12, 2015

Contributor

Thanks for clearing it up!

@nzakas

nzakas Mar 12, 2015

Contributor

Can we agree on a goal for this thread? Right now it doesn't seem like the conversation is heading towards any sort of conclusion. So far it seems people are agreed that:

  1. A CST is desirable
  2. It would be nice to have a CST built on top of ESTree rather than something separate

What are the next steps here?

@getify

getify Mar 12, 2015

Contributor

My goal was purely to get consensus on:

  1. That concrete syntax is an important problem to solve
  2. That ESTree should handle it in some form or fashion during the process of defining this new (aka updated) cross-tool standard.
  3. That in some fashion this be a single data structure instead of separate meta-data.

The bikeshedding of how to do it, or even precisely when, is not in scope for this thread IMO. I probably shouldn't have even included my own proposal as that invites such.

@nzakas

nzakas Mar 12, 2015

Contributor

Based on those goals, have we concluded this thread at this point? If so, what would the next steps be?

@michaelficarra

michaelficarra Mar 12, 2015

Member

I can get behind 1 and 3 but disagree on 2. This project, as I understand it, is simply tasked with unifying and documenting the SpiderMonkey-based ASTs produced and consumed by popular tooling. Not making it better or fixing it, and especially not expanding its scope. As long as this isn't getting in the way, it should be a separate effort.

@mikesherov

mikesherov Mar 12, 2015

Contributor

@michaelficarra I disagree with that. There's no reason this group can't and shouldn't tackle CST once we have ironed out the inconsistencies and ES6. It is precisely the major parsers and the major parser consumers who should be defining the CST IMO. This would have benefits for esformatter, JSCS, ESLint, Recast, Babel, etc... all of whom consume, not surprisingly, Esprima, Acorn, or SpiderMonkey.

Saying the goal of this group is not to make the SM AST better is a non-starter conversation for me. As long as the changes are additive and BC is preserved, we should not limit ourselves to being done once the wrinkles are ironed out.

@kittens

kittens Mar 12, 2015

Contributor

@michaelficarra The SpiderMonkey AST isn't going away (in fact some people such as me prefer it). Try not to let your bias interfere too much with the project's direction.

@mikesherov

mikesherov Mar 12, 2015

Contributor

@RReverser @sebmck thoughts? Once we have concluded ironing out inconsistencies and ES6/7, are you interested in pursuing defining the CST portion of the ESTree?

@kittens

kittens Mar 12, 2015

Contributor

@mikesherov Sure I don't see why not. Would make it much easier for linters and for transformers that want to retain input whitespace etc.

@millermedeiros

millermedeiros Mar 12, 2015

My next step is to try to convince you that keeping the same structure as the AST is not a good idea. If that doesn't work I'll let you guys move forward and keep doing my own thing. Maybe that will motivate me to create a competing standard; the only reason I haven't done it already is that rocambole was good enough for basic token manipulation (like we need for esformatter) and time is a very limited resource...

@mikesherov

mikesherov Mar 12, 2015

Contributor

@millermedeiros if the conversation you want to have is changing the AST structure as you already described, then yes, you should create a competing standard. Anything that is non-additive, or that breaks BC at all, is not currently on the table.

@kittens

kittens Mar 12, 2015

Contributor

@millermedeiros

My next step is try to convince that keeping the same structure as the AST is not a good idea. If that doesn't work I'll let you guys move forward and keep doing my own thing..

Feel free to suggest specific changes or raise specific pain points. Discussion about changing the entire AST won't be taken into consideration.

@millermedeiros

millermedeiros Mar 12, 2015

@mikesherov

mikesherov Mar 12, 2015

Contributor

@nzakas to address your point, I think we need to lay down some basic principles for a CST so we can close this thread. I'll submit a PR that adds a page serving as an informative note about the CST, and that PR will close this issue.

  1. Must be an additive change.
  2. Must promote the concept of mutability, that is, augmenting the CST should not invalidate non local nodes.
  3. Must be JSON serializable, so that tools can eventually output a parsed JSON string as input to other tools (to avoid the double parse penalty)
  4. Must promote traversability. Should be able to get to the "next" or "prev" set of tokens from any node.
  5. Must promote printability. Should be easy to generate the src from the tree.
  6. Must be lossless. Seems obvious, but worth mentioning.

OK?
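
One way the principles above could fit together, sketched under loud assumptions: the `tokens` and `tokenRange` properties are hypothetical names invented here (nothing in this thread fixes a shape), and the tree is hand-written for a statement with redundant parentheses like the one discussed at the start of this issue.

```javascript
// Hypothetical shape: `tokens` holds every token, whitespace included, and
// each node adds `tokenRange`, an inclusive [first, last] index pair into it.
// All pre-existing node properties are left untouched (additive change).
const source = "b = a + (2 * 3);";
const cst = {
  type: "Program",
  tokens: ["b", " ", "=", " ", "a", " ", "+", " ", "(", "2", " ", "*", " ", "3", ")", ";"],
  body: [{
    type: "ExpressionStatement",
    tokenRange: [0, 15],
    expression: {
      type: "AssignmentExpression", operator: "=", tokenRange: [0, 14],
      left: { type: "Identifier", name: "b", tokenRange: [0, 0] },
      right: {
        type: "BinaryExpression", operator: "+", tokenRange: [4, 14],
        left: { type: "Identifier", name: "a", tokenRange: [4, 4] },
        right: {
          // The range covers the redundant parentheses around `2 * 3`.
          type: "BinaryExpression", operator: "*", tokenRange: [8, 14],
          left: { type: "Literal", value: 2, tokenRange: [9, 9] },
          right: { type: "Literal", value: 3, tokenRange: [13, 13] }
        }
      }
    }
  }]
};

// Printability: the source is the concatenation of the tokens.
function print(tree) {
  return tree.tokens.join("");
}

// Traversability: neighbouring tokens are plain index arithmetic.
function prevToken(tree, node) {
  const token = tree.tokens[node.tokenRange[0] - 1];
  return token === undefined ? null : token;
}
function nextToken(tree, node) {
  const token = tree.tokens[node.tokenRange[1] + 1];
  return token === undefined ? null : token;
}

// JSON serializability and losslessness: a round trip prints the same text.
const roundTripped = JSON.parse(JSON.stringify(cst));
const product = roundTripped.body[0].expression.right.right;
console.log(print(roundTripped) === source); // prints: true
console.log(JSON.stringify(nextToken(roundTripped, product))); // prints: ";"
```

This sketch does not address principle 2: inserting a token shifts every later index, so index ranges would need updating after each change, a trade-off it makes no attempt to solve.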

@getify

getify Mar 12, 2015

Contributor

+1

@nzakas

nzakas Mar 12, 2015

Contributor

Sounds good. I really just wanted to move on from the bike shedding to something more goal oriented. This sounds like a good way to do that.

@RReverser

RReverser Mar 12, 2015

Member

Once we have concluded ironing out inconsistencies and ES6/7, are you interested in pursuing defining the CST portion of the ESTree?

Of course (I suppose everyone here is interested in that).

@jzaefferer

jzaefferer Mar 12, 2015

@mikesherov that list seems to exclude many architectural approaches prematurely. This...

Should be able to get to the "next" or "prev" set of tokens from any node.

...even prescribes a specific approach, one that Miller has argued against for a CST. I find that quite compelling, considering that he wrote rocambole, which implements exactly what you're suggesting. If the author of such a tool talks about the architecture shortcomings of said tool and suggests discussing alternatives, I'd hope for such a discussion to be welcomed, not shut down immediately.

and has the only downside of some theoretic difficulty in maintaining both, which is just not proven to be true.

Then give Miller a chance to prove those difficulties. Considering the rather open timeline for CST proposals, that is unlikely to hurt progress, since he could do that while the ESTree team continues working on ES6 details.

@kittens

kittens Mar 12, 2015

Contributor

@jzaefferer Yes it's true that it may not be ideal for a CST but ESTree is an AST first and foremost and there are a lot of different types of consumers. Having a balance between all of our needs is hard and we sometimes need to make compromises. We don't live in a perfect world where everyone gets what they want. Yes, a CST spec designed from the start may be suitable for certain tools but it'd likely be completely useless for a category of others.

Backwards compatibility will not be broken. ESTree was created as a collaborative effort between all the consumers so we have one spec that increases interoperability between the various tools. It's not really a good idea to lecture us on the shortcomings of the current representation or even a proposed one since there are a lot of people involved in this conversation that have a lot of experience doing this sort of stuff. Please don't be under the impression that we're disregarding input into this process, we're not. This conversation is just premature until everything else is out of the way so everyone's effort can be focussed on creating an interoperable and agreeable CST spec for all.

It doesn't make much sense to complain about a process that hasn't even happened yet. As @nzakas said, can we please refrain from bikeshedding until the appropriate time, since this conversation is very unproductive on its own. Some patience and composure is more than appreciated in a standards process like this, especially one with so many interested parties involved.

@millermedeiros

millermedeiros Mar 12, 2015

Just to make things clear:

My suggestion is to have 2 discrete formats.. one for the ABSTRACT code structure like we have right now (used by transpilers, type inference, code completion, etc..) and another format for the CONCRETE code structure (used by linters, code formatters, any code transformation) - AST !== CST

They don't need to be linked in any way, we could have parsers that only generate the AST, and parsers that only generate the CST. The same way that we would have tools that only work over the CST and tools that only handle AST.

I strongly believe that adding more properties to the AST is a bad idea. Maybe one day I'll have enough time/patience to write code that shows the kinds of operations that are too hard to do with an AST, what a different format could look like, and how it would simplify things.

Some flaws of the CST principles listed above:

  • Linked lists can't be serialized into JSON (how would you describe cross/circular references?) so the whole idea of "next" and "prev" tokens doesn't work out.
  • AST is not easily printable, see escodegen source code for an example. You need to infer a lot of things based on the tree structure since it doesn't contain all the tokens (if it doesn't contain all the parentheses, colons, braces it is not a concrete syntax tree). Adding more properties to the tree will increase the codegen complexity even more...
  • if the AST contains range/loc info, it is not easy/cheap to mutate it (you need to update the loc info on all the nodes of the tree after each change).
  • additive changes are not always backwards compatible since tools might rely on the current tree structure to work.

If you all consider that a new format is not a valid solution, then this will eventually fail because the correct answer to this problem is definitely elsewhere.
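
To illustrate the first bullet above: a small sketch showing that `JSON.stringify` rejects a doubly linked token list, next to an index-based layout that does survive serialization. The helper names (`linkTokens`, `neighbours`) and both data shapes are hypothetical and only meant to make the serialization concern concrete.

```javascript
// Doubly linked token list: each token object points at its neighbours.
function linkTokens(values) {
  const tokens = values.map(value => ({ value, prev: null, next: null }));
  tokens.forEach((token, i) => {
    token.prev = tokens[i - 1] || null;
    token.next = tokens[i + 1] || null;
  });
  return tokens;
}

const linked = linkTokens(["a", "+", "b"]);

// token0.next.prev === token0, so the structure is circular and
// JSON.stringify throws a TypeError instead of producing text.
let error = null;
try {
  JSON.stringify(linked[0]);
} catch (e) {
  error = e;
}
console.log(error !== null && error.name === "TypeError"); // prints: true

// Index-based alternative: neighbours are computed from array positions,
// so no object references need to be stored in the serialized form.
const flat = { tokens: ["a", "+", "b"] };
function neighbours(list, index) {
  return {
    prev: index > 0 ? list.tokens[index - 1] : null,
    next: index < list.tokens.length - 1 ? list.tokens[index + 1] : null
  };
}

const restored = JSON.parse(JSON.stringify(flat));
const around = neighbours(restored, 1);
console.log(around.prev, around.next); // prints: a b
```

The index-based form serializes cleanly, but positions shift whenever a token is inserted or removed, which is the same mutation cost the range/loc bullet raises.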

kittens Mar 12, 2015

Contributor

@millermedeiros

Linked lists can't be serialized into JSON (how would you describe cross/circular references?) so the whole idea of "next" and "prev" tokens doesn't work out.

Linked lists were never implied, only that you can get the surrounding tokens easily.

AST is not easily printable

Sure it is. This looks pretty simple and easy to me 😄

additive changes are not always backwards compatible since tools might rely on the current tree structure to work.

Which is why it's designed in such a way that it's not backwards incompatible.

If you all consider that a new format is not a valid solution, then this will eventually fail because the correct answer to this problem is definitely elsewhere.

Dude, come on. There's no easy solution and pessimism won't get us anywhere.


As I said in my previous comment, these are changes that would benefit you. ESTree has to take more into consideration than just you and your needs, an entire ecosystem relies on this stuff and interoperability is a big deal which is why we want to collaborate and compromise on something that works well for most rather than a few.

markelog Jul 8, 2015

I will try to enrich @mdevils answer a bit.

Imagine HTML as JavaScript in this context: then AST + tokens would be the HTML specification, and CST would play the role of the DOM implementation.

In other words, we think CST shouldn't be a format that defines a static document but an API specification.

This is a whole different way of looking at the CST problem, but it is exactly what we need in jscs. Refactoring jscs with the cst module has been pretty enlightening, and judging by what is listed under "Why Concrete Syntax?", it should work just as well for other use cases too.

nzakas Jul 8, 2015

Contributor

ESLint has been investigating how to augment the AST to allow fixing of errors as well. I really like the direction of @mdevils's work.

gibson042 Jul 9, 2015

I find @mdevils's work to be valuable, but don't think it replaces the need for a JSON-compatible CST extension of estree. The latter need not be much, but should at least capture the content and sequence of all lexical tokens, including comments, whitespace, and line terminators (for e.g. #94 and babel/babel/issues/497), and connect AST nodes to them (for e.g. #90 and #92).

getify Jul 9, 2015

Contributor

I'd be very curious how two separate CST-enabled tools could "share" (aka "pipe") a CST data structure from one to the other? Think *nix CLI file descriptor pipes, not JS streams.

gibson042 Jul 9, 2015

There are several options; here's a nonexhaustive list:

  • a top-level list of lexical tokens, referenced from syntax nodes by numerical index
  • a top level hash of (autogenerated) identifiers to lexical tokens, referenced from syntax nodes by identifier
  • a list of (lexical token|property name of associated subnode) associated with each syntax node, disjoint between them and complete over a Program.

Let me know if you'd like some code-level examples.
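
As a rough illustration of the three options above, here is a minimal JavaScript sketch. All property names (`tokens`, `tokenRefs`, `tokenIds`, `parts`, `child`, `token`) are made up for this example and are not part of ESTree or of any proposal in this thread; tokens are reduced to plain strings to keep it short.

```javascript
// Hypothetical shapes only: none of these property names are standardized.
const source = "a /*sum*/ + 1";

// Option 1: top-level token list, referenced from nodes by numeric index.
const byIndex = {
  type: "Program",
  tokens: ["a", " ", "/*sum*/", " ", "+", " ", "1"],
  body: [{ type: "BinaryExpression", operator: "+", tokenRefs: [0, 1, 2, 3, 4, 5, 6] }]
};

// Option 2: top-level hash of generated identifiers to tokens.
const byId = {
  type: "Program",
  tokens: { t0: "a", t1: " ", t2: "/*sum*/", t3: " ", t4: "+", t5: " ", t6: "1" },
  body: [{
    type: "BinaryExpression",
    operator: "+",
    tokenIds: ["t0", "t1", "t2", "t3", "t4", "t5", "t6"]
  }]
};

// Option 3: each node carries its own list of tokens and child property names.
const nodeLocal = {
  type: "BinaryExpression",
  operator: "+",
  left: { type: "Identifier", name: "a", parts: [{ token: "a" }] },
  right: { type: "Literal", value: 1, parts: [{ token: "1" }] },
  parts: [
    { child: "left" }, { token: " " }, { token: "/*sum*/" }, { token: " " },
    { token: "+" }, { token: " " }, { child: "right" }
  ]
};

// Each shape can reproduce the original text.
const fromIndex = byIndex.body[0].tokenRefs.map(i => byIndex.tokens[i]).join("");
const fromId = byId.body[0].tokenIds.map(id => byId.tokens[id]).join("");
const renderLocal = node =>
  node.parts.map(p => ("child" in p ? renderLocal(node[p.child]) : p.token)).join("");
const fromLocal = renderLocal(nodeLocal);
```

All three are plain acyclic JSON, so any of them could be written to a pipe or a file. The main difference is where the token data lives when a node is moved: in one shared table (options 1 and 2) or with the node itself (option 3).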

getify Jul 9, 2015

Contributor

It's not clear to me why the notion of CST as a format needs to change to CST as a library, since one of the base passing cases is that it must be serializable for persistence and transfer?

Each of those options you suggested for serialization sounds reasonable at first glance, but why wouldn't we just decide on one of them as the CST standard, and then also release a CST helper library to do the extra tasks you're suggesting?

gibson042 Jul 9, 2015

why wouldn't we just decide on one of them as the CST standard, and then also release a CST helper library to do the extra tasks you're suggesting?

Are you confusing me with @mdevils? That's exactly what I'm suggesting.

getify Jul 9, 2015

Contributor

Are you confusing me with @mdevils?

I am not confusing you. I wasn't really intentionally directing my questions/comments at any particular person, just asking in general. Sorry if it came off too directed at you.

markelog Jul 9, 2015

I'm not sure why you would need to pipe anything: other tools could use mdevils/cst as a dependency, implement the same API, or use other tools with the same API as a dependency.

But let's say there is a use case for that, and you need a static document from it. Then serialization would be the opposite process: since you get a CST from AST + tokens, serializing a CST to a static document would yield the same AST + tokens.

mikesherov Jul 9, 2015

Contributor

Other tools having to use mdevils/cst is a non starter. The point is that not only javascript should be consuming the format.

markelog Jul 9, 2015

Other tools having to use mdevils/cst is a non starter

Not suggesting that; other tools have a choice to use it, or to choose another library... or another language with an implementation of the standard API.

We are talking about a specification here, right? Not about choosing one library that we all like. Again, I will give the same example I gave before: look at how different libraries in different languages interact with an XML document through the fully agnostic standard DOM API.

An analog of that is proposed here, only for JavaScript.

gibson042 Aug 16, 2015

Concrete proposal (pun intended) matching my third suggestion: an optional Node#sourceElements array allowing original source to be recovered by depth-first concatenation. This maintains JSON-friendly acyclicity and keeps all information both non-duplicated and node-local (for less complex transformation/manipulation). Source rendering should prefer sourceElements when present, but verify that all child nodes are referenced in order (or alternatively insert them automatically along with necessary punctuators).

An example demonstrating representation of comments/whitespace and Directive Prologue edge cases:

0,function/*anonymous*/( ) {
  // Not a Use Strict Directive
  "use\
 strict";

  // Not a directive at all
  ("use strict")
} // last character is a <LF>

yields the following:

{
  "type": "Program",

  // new optional property: Node#sourceElements: [ SourceElement ]
  "sourceElements": [

    /* new interface
      Nonterminal <: SourceElement {
        reference: string (JSON Pointer relative to the parent AST node);
      }
    */
    // JSON Pointer could be replaced with a different suitably expressive language
    { "reference": "body/0" }
  ],
  "body": [
    {
      "type": "ExpressionStatement",
      "sourceElements": [
        { "reference": "expression" },

        /* new interface
          NonTokenTerminal <: SourceElement {
            element: string (WhiteSpace|LineTerminator|Comment*);
            value: string;
          }
        */
        { "element": "WhiteSpace", "value": " " },

        // Comment decomposition:
        // * `//`- or `/*`-prefixed head
        // * (possibly multi-line) body
        // * `*/`-suffixed tail, when applicable
        // Not strictly necessary, but may make manipulation & verification easier
        { "element": "CommentHead", "value": "//" },
        { "element": "CommentBody", "value": " last character is a <LF>" },
        { "element": "LineTerminator", "value": "\n" }
      ],
      "expression": {
        "type": "SequenceExpression",
        "sourceElements": [
          { "reference": "expressions/0" },

          /* new interface
            TokenTerminal <: SourceElement {
              element: string (Keyword|Identifier|Punctuator|Template*|*Literal);
              value: string;
            }
          */
          { "element": "Punctuator", "value": "," },
          { "reference": "expressions/1" }
        ],
        "expressions": [
          {
            "type": "Literal",
            "value": 0,

            // Note lack of `Literal#raw`—`sourceElements` subsumes such functionality
            "sourceElements": [
              { "element": "NumericLiteral", "value": "0" }
            ]
          },
          {
            "type": "FunctionExpression",
            "id": null,
            "params": [],
            "sourceElements": [
              { "element": "Keyword", "value": "function" },

              // Comments/whitespace can precede or follow any source element
              { "element": "CommentHead", "value": "/*" },
              { "element": "CommentBody", "value": "anonymous" },
              { "element": "CommentTail", "value": "*/" },
              { "element": "Punctuator", "value": "(" },
              { "element": "WhiteSpace", "value": " " },
              { "element": "Punctuator", "value": ")" },
              { "element": "WhiteSpace", "value": " " },
              { "reference": "body" }
            ],
            "body": {
              "type": "BlockStatement",
              "sourceElements": [
                { "element": "Punctuator", "value": "{" },
                { "reference": "body/0" },
                { "reference": "body/1" },
                { "element": "Punctuator", "value": "}" }
              ],
              "body": [
                {
                  "type": "ExpressionStatement",
                  "sourceElements": [
                    { "element": "LineTerminator", "value": "\n" },
                    { "element": "WhiteSpace", "value": "  " },
                    { "element": "CommentHead", "value": "//" },
                    { "element": "CommentBody", "value": " Not a Use Strict Directive" },
                    { "element": "LineTerminator", "value": "\n" },
                    { "element": "WhiteSpace", "value": "  " },
                    { "reference": "expression" },
                    { "element": "Punctuator", "value": ";" }
                  ],
                  "expression": {
                    "type": "Literal",

                    // `value` alone is not enough to detect the trickery
                    "value": "use strict",

                    // …but the corresponding StringLiteral is
                    "sourceElements": [
                      { "element": "StringLiteral", "value": "\"use\\\n strict\"" }
                    ]
                  }
                },
                {
                  "type": "ExpressionStatement",
                  "sourceElements": [
                    { "element": "LineTerminator", "value": "\n" },
                    { "element": "LineTerminator", "value": "\n" },
                    { "element": "WhiteSpace", "value": "  " },
                    { "element": "CommentHead", "value": "//" },
                    { "element": "CommentBody", "value": " Not a directive at all" },
                    { "element": "LineTerminator", "value": "\n" },
                    { "element": "WhiteSpace", "value": "  " },
                    { "reference": "expression" },
                    { "element": "LineTerminator", "value": "\n" }
                  ],
                  "expression": {
                    "type": "Literal",
                    "value": "use strict",
                    "sourceElements": [
                      // Even simple AST nodes can have multiple source elements
                      // Any non-";" Punctuator terminates a Directive Prologue,
                      // as would a non-ExpressionStatement or non-Literal or non-string
                      { "element": "Punctuator", "value": "(" },
                      { "element": "StringLiteral", "value": "\"use strict\"" },
                      { "element": "Punctuator", "value": ")" }
                    ]
                  }
                }
              ]
            }
          }
        ]
      }
    }
  ]
}
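
To make "recovered by depth-first concatenation" concrete, here is a minimal rendering sketch over trees shaped like the one above. The names `resolve` and `render` are hypothetical, and the reference syntax is treated as a simple slash-separated property path; real JSON Pointer (RFC 6901) has a leading `/` and `~0`/`~1` escapes, which this sketch ignores.

```javascript
// Follow a slash-separated path such as "body/0" starting from `node`.
// Assumption: a simplified stand-in for JSON Pointer resolution.
function resolve(node, reference) {
  return reference.split("/").reduce((target, key) => target[key], node);
}

// Depth-first concatenation: terminals contribute their `value`,
// references recurse into the child node they point at.
function render(node) {
  if (!Array.isArray(node.sourceElements)) {
    throw new Error("no sourceElements on " + node.type);
  }
  return node.sourceElements
    .map(el => ("reference" in el ? render(resolve(node, el.reference)) : el.value))
    .join("");
}

// A cut-down tree for the source text "0,0x1 // done\n".
const tree = {
  type: "Program",
  sourceElements: [{ reference: "body/0" }],
  body: [{
    type: "ExpressionStatement",
    sourceElements: [
      { reference: "expression" },
      { element: "WhiteSpace", value: " " },
      { element: "CommentHead", value: "//" },
      { element: "CommentBody", value: " done" },
      { element: "LineTerminator", value: "\n" }
    ],
    expression: {
      type: "SequenceExpression",
      sourceElements: [
        { reference: "expressions/0" },
        { element: "Punctuator", value: "," },
        { reference: "expressions/1" }
      ],
      expressions: [
        { type: "Literal", value: 0, sourceElements: [{ element: "NumericLiteral", value: "0" }] },
        { type: "Literal", value: 1, sourceElements: [{ element: "NumericLiteral", value: "0x1" }] }
      ]
    }
  }]
};

const rendered = render(tree);
```

Note how the second literal keeps its original spelling `0x1` even though its AST `value` is just `1`; that is the sense in which `sourceElements` subsumes `Literal#raw`.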

RReverser Aug 16, 2015

Member

@gibson042 Basically the same question as for any other proposed format: how are you going to keep AST and CST information in sync? Remember that any new structure should be backward-compatible, and tools that handle AST only should somehow be resynced with CST data. The exact structure is not the problem (otherwise we would have agreed on one a long time ago), but keeping data in sync is really a pain in... you know, head.

gibson042 Aug 17, 2015

how are you going to keep AST and CST information in sync?

As I see it, that breaks down into more than one problem:

  • parsing: Any node without sourceElements is still valid, so existing parsers are already compliant. And as long as sourceElements (where it appears, which can be a subset of a full Program tree) always includes at least the proper tokens (e.g., keywords and punctuators), it's essentially got an AST fallback—although the whole point is obviously inclusion of whitespace, newlines, and comments as well.
  • node additions:
    • Either omit sourceElements from the incoming node, or generate it with reasonable contents.
    • Either delete sourceElements on the parent, or add a reference to the incoming node.
  • node moves:
    • Either delete sourceElements on the old parent, or remove the reference to the node and update all following references (side note: this suggests that a selection language like <property>#next might be superior to JSON Pointer's <property>/<N>).
    • Either delete sourceElements on the new parent, or add a reference to the incoming node.
    • If desired, update whitespace/newlines associated with the migrating node.
  • literal manipulation: Either delete sourceElements, or update its *Literal.
  • other node manipulation: It's probably best to delete sourceElements, but obviously possible to sync it manually (e.g., by reusing comments and reusing/replacing whitespace and newlines).
  • rendering: Where sourceElements is defined, some sanity checking as described above would be prudent.
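
The rendering-time sanity check from the last bullet could be sketched like this. It is a minimal version under stated assumptions: `isNode`, `childPaths`, and `sourceElementsInSync` are hypothetical helper names, a "child node" is taken to be any object with a string `type`, and the check only confirms that every child is referenced exactly once (it does not verify ordering or punctuators).

```javascript
// Assumption: anything with a string `type` counts as an AST node.
const isNode = v => v !== null && typeof v === "object" && typeof v.type === "string";

// List the reference paths ("expression", "body/0", ...) of a node's direct children.
function childPaths(node) {
  const paths = [];
  for (const [key, value] of Object.entries(node)) {
    if (key === "sourceElements") continue;
    if (isNode(value)) {
      paths.push(key);
    } else if (Array.isArray(value)) {
      value.forEach((item, i) => { if (isNode(item)) paths.push(key + "/" + i); });
    }
  }
  return paths;
}

// True when sourceElements is absent (plain AST fallback) or when its
// references match the node's actual children one-to-one.
function sourceElementsInSync(node) {
  if (!Array.isArray(node.sourceElements)) return true;
  const refs = node.sourceElements.filter(el => "reference" in el).map(el => el.reference);
  const expected = childPaths(node);
  return refs.length === expected.length &&
    [...refs].sort().join("\n") === [...expected].sort().join("\n");
}

const block = {
  type: "BlockStatement",
  body: [{ type: "EmptyStatement" }],
  sourceElements: [
    { element: "Punctuator", value: "{" },
    { reference: "body/0" },
    { element: "Punctuator", value: "}" }
  ]
};
const before = sourceElementsInSync(block);      // children and references agree
block.body.push({ type: "EmptyStatement" });     // an AST-only tool adds a node
const afterPush = sourceElementsInSync(block);   // body/1 is not referenced
delete block.sourceElements;                     // the "either delete" branch
const afterDelete = sourceElementsInSync(block); // back to plain AST
```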

any new structure should be backward-compatible

👍

tools that handle AST only should be somehow resynced with CST data

delete node.sourceElements and sometimes delete parent.sourceElements, or update them along with other changes. Rendering operations can also be smart enough to detect mismatches, and in such cases might ignore sourceElements and generate a warning. Addition of these not-originally-planned elements to syntax trees certainly introduces complexity, but this approach attempts to minimize it by allowing for graceful fallback and gradual adoption.
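
A sketch of that graceful fallback, under assumptions: `appendStatement` and `renderWithFallback` are hypothetical helpers, `generate` stands in for whatever AST-only code generator a tool already uses, and references are again treated as simple slash-separated paths.

```javascript
// What an AST-only transformation would do: mutate, then drop the stale CST data.
function appendStatement(block, statement) {
  block.body.push(statement);
  delete block.sourceElements; // no longer describes this node's children
  return block;
}

// Prefer sourceElements; on a missing list or a dangling reference,
// warn (if applicable) and fall back to `generate` for that node.
function renderWithFallback(node, generate, warn = () => {}) {
  if (!Array.isArray(node.sourceElements)) return generate(node);
  let out = "";
  for (const el of node.sourceElements) {
    if ("reference" in el) {
      const child = el.reference
        .split("/")
        .reduce((target, key) => (target == null ? undefined : target[key]), node);
      if (child == null) {
        warn("stale reference '" + el.reference + "' on " + node.type);
        return generate(node);
      }
      out += renderWithFallback(child, generate, warn);
    } else {
      out += el.value;
    }
  }
  return out;
}

// Placeholder generator for the example; a real tool would emit code here.
const generate = node => "<" + node.type + ">";

const literal = {
  type: "Literal",
  value: 1,
  sourceElements: [{ element: "NumericLiteral", value: "0x1" }]
};
const statement = {
  type: "ExpressionStatement",
  expression: literal,
  sourceElements: [{ reference: "expression" }, { element: "Punctuator", value: ";" }]
};
const emptyBlock = {
  type: "BlockStatement",
  body: [],
  sourceElements: [{ element: "Punctuator", value: "{" }, { element: "Punctuator", value: "}" }]
};

const exact = renderWithFallback(statement, generate);
const regenerated = renderWithFallback(appendStatement(emptyBlock, statement), generate);
```

The fallback is per node, so untouched subtrees elsewhere in the program can still render from their original source elements.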

hzoo Oct 7, 2015

Contributor

Just an update that we've finished up es6 nodes in https://github.com/cst/cst (tracking other proposals here) and planning on jscs integration soon to test it out

donaldpipowitch Oct 7, 2015

Nice! Thank you for the update. Didn't know about cst before.

gibson042 Oct 7, 2015

@hzoo How compatible is your internal representation with my proposal? Or more to the point, how much effort would it take to input and output documents in such an updated ESTree format?

markelog Oct 8, 2015

@gibson042 CST module works with AST and tokens only, so if we change format, we would need to change everything

gibson042 added a commit to gibson042/estree that referenced this issue Oct 31, 2015

@gibson042 gibson042 referenced a pull request that will close this issue Oct 31, 2015

Open

Define CST elements #107

nzakas Nov 3, 2015

Contributor

Why are comments separated into parts? That seems like a good way to create individual syntax accidentally (alter CommentHead to change from block to line, then forget to remove CommentTail).

Contributor

nzakas commented Nov 3, 2015

Why are comments separated into parts? That seems like a good way to create invalid syntax accidentally (alter CommentHead to change from block to line, then forget to remove CommentTail).

Contributor

mikesherov commented Nov 3, 2015

Seeing as CST is a WIP in JSCS, the ideal is to have consensus formed and documented here. Lots of smart folks in this repo.

Ideally, even when CST is "finished" in JSCS, we should be striving to ultimately make that work unnecessary by moving it into parsers and generators anyway.

Nice work @hzoo and @markelog on getting a real POC out there!

gibson042 commented Nov 4, 2015

"Why are comments separated into parts? That seems like a good way to create invalid syntax accidentally (alter CommentHead to change from block to line, then forget to remove CommentTail)."

@nzakas This is probably better answered on #107, but I would point out the analogous risk of modifying an atomic Comment in such a way that either the head or a necessary tail is removed. Making the head and tail discrete is just an effort to capture the structure. But I'm not married to the idea, and would be happy to change it.

forivall commented Nov 4, 2015

My WIP names have been BlockCommentStart, BlockCommentBody, BlockCommentEnd, LineCommentStart and LineCommentBody. But I'm not married to these specific names either. However, I will need to make the head and body discrete, and it would be nice for that to be in this spec.

Or, the CommentTail of line comments could have a value of either "" or "\n".
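
To make the trade-off discussed above concrete, here is a small sketch of a comment split into head, body, and tail, together with a check that catches the mismatch @nzakas describes. The element names CommentHead, CommentBody, and CommentTail come from this thread; the `{ type, value }` object shape, the `TAILS_FOR_HEAD` table, and the helper names are assumptions made for illustration only.

```javascript
// Which tails are acceptable for each head. The "" / "\n" pair for line
// comments follows the suggestion made above.
const TAILS_FOR_HEAD = {
  "/*": ["*/"],
  "//": ["", "\n"],
};

// Concatenating the parts reproduces the comment's source text.
const renderComment = (parts) => parts.map((p) => p.value).join("");

// A comment is well formed when its head and tail belong together and the
// body cannot terminate the comment early.
function isWellFormed(parts) {
  if (parts.length !== 3) return false;
  const [head, body, tail] = parts;
  if (head.type !== "CommentHead" || body.type !== "CommentBody" ||
      tail.type !== "CommentTail") {
    return false;
  }
  const allowed = TAILS_FOR_HEAD[head.value];
  if (!allowed || !allowed.includes(tail.value)) return false;
  return head.value === "/*"
    ? !body.value.includes("*/")
    : !/[\n\r\u2028\u2029]/.test(body.value);
}

const block = [
  { type: "CommentHead", value: "/*" },
  { type: "CommentBody", value: " note " },
  { type: "CommentTail", value: "*/" },
];

// The accident: switch the head to a line comment but leave the tail alone.
const broken = block.map((p) =>
  p.type === "CommentHead" ? { ...p, value: "//" } : p
);

// The complete edit: change head and tail together.
const fixed = broken.map((p) =>
  p.type === "CommentTail" ? { ...p, value: "\n" } : p
);

console.log(isWellFormed(block), isWellFormed(broken), isWellFormed(fixed));
```

A check of this kind would apply equally to an atomic Comment whose text is edited directly, so it does not settle the question of which representation is preferable.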

Contributor

nzakas commented Nov 4, 2015

Some other use cases I think it's worthwhile to think through:

  1. Given a node, determine if it is surrounded by parentheses.
  2. Insert a new argument between two existing arguments of a CallExpression.
  3. Comment out a function.
  4. Combine multiple var statements into a single var statement.
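
As a rough illustration of the first two use cases, the sketch below assumes a hypothetical layout in which each node carries a `sourceElements` list of `{ type, value }` tokens and `{ reference }` entries, where successive references to an array-valued property consume successive items of that array, and where a node's surrounding parentheses are stored on the node itself. None of this is taken from the actual proposal or from the cst package; the helper names are invented for the example.

```javascript
// Small constructors for the hypothetical element shapes.
const id = (name) => ({ type: "Identifier", name });
const punct = (value) => ({ type: "Punctuator", value });
const space = (value) => ({ type: "WhiteSpace", value });
const ref = (reference) => ({ reference });

const isPunct = (el, value) => el.type === "Punctuator" && el.value === value;

// Use case 1: is the node wrapped in its own pair of parentheses?
function isParenthesized(node) {
  const els = (node.sourceElements || []).filter((el) => el.type !== "WhiteSpace");
  return els.length >= 2 && isPunct(els[0], "(") && isPunct(els[els.length - 1], ")");
}

// Use case 2: insert `arg` before the existing argument at `index`,
// keeping the tree and its sourceElements in step.
function insertArgument(call, index, arg) {
  call.arguments.splice(index, 0, arg);
  const els = call.sourceElements;
  if (!Array.isArray(els)) return;
  let seen = 0;
  const pos = els.findIndex((el) => el.reference === "arguments" && seen++ === index);
  if (pos === -1) {
    delete call.sourceElements; // cannot patch reliably, so fall back to regeneration
    return;
  }
  els.splice(pos, 0, ref("arguments"), punct(","), space(" "));
}

// Minimal renderer for this sketch; nodes without sourceElements must be Identifiers.
function render(node) {
  if (!Array.isArray(node.sourceElements)) return node.name;
  const cursors = {};
  return node.sourceElements.map((el) => {
    if (!("reference" in el)) return el.value;
    const target = node[el.reference];
    if (!Array.isArray(target)) return render(target);
    const i = cursors[el.reference] || 0;
    cursors[el.reference] = i + 1;
    return render(target[i]);
  }).join("");
}

// Source text: f(a, (c))
const parenC = {
  ...id("c"),
  sourceElements: [punct("("), { type: "Identifier", value: "c" }, punct(")")],
};
const call = {
  type: "CallExpression",
  callee: id("f"),
  arguments: [id("a"), parenC],
  sourceElements: [
    ref("callee"), punct("("), ref("arguments"), punct(","), space(" "),
    ref("arguments"), punct(")"),
  ],
};

insertArgument(call, 1, id("b"));
const output = render(call);
console.log(output, isParenthesized(parenC), isParenthesized(call));
```

Use cases 3 and 4 are not covered here; they involve replacing or merging whole statements and depend more heavily on how the format attaches whitespace and comments to neighbouring nodes.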

gibson042 commented Nov 6, 2015

@nzakas: Answered on the PR.

mdevils commented Jan 25, 2016

Hello guys. I would say the cst package is reaching maturity: https://github.com/cst/cst

ES6 is completely implemented, along with the JSX extensions. We have also managed to implement scopes in a live tree (scopes are updated as the CST is changed). We have almost finished integrating CST into JSCS.

DanielSWolf commented Jun 19, 2017

Has there been any progress on this? I'm using Babel to migrate some in-house code. It's working fine, but creates all kinds of whitespace changes.

danez commented Jun 19, 2017

@DanielSWolf You might be better off using https://github.com/square/babel-codemod. By default, Babel uses babel-generator, which for performance reasons is meant to output generated source code, not necessarily readable code. babel-codemod uses recast to print the AST, which tries to preserve the original code as well as possible.

DanielSWolf commented Jun 19, 2017

Thanks @danez, I didn't know about that one! The only downside seems to be that it's not meant to be used as a library, but as a CLI. But that shouldn't be much of a problem.
