Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.raw property of literal nodes, or something similar #14

Open
dherman opened this issue Feb 13, 2015 · 20 comments
Open

.raw property of literal nodes, or something similar #14

dherman opened this issue Feb 13, 2015 · 20 comments

Comments

@dherman
Copy link

dherman commented Feb 13, 2015

Discussed some in #6. The raw property of literal nodes is an Esprima extension.

I have a specific use case for the raw property, or at least for more information than is provided by value: in asm.js, the distinction between an int literal and float literal is significant and distinguished by the type system. Writing an asm.js validator in JS requires the ability to distinguish between e.g. 17.0 and 17, but the value property does not distinguish these.

OTOH, @michaelficarra points out that AST nodes might be produced by tools that don't want to have to specify the raw source. A few options I can see:

  • just leave raw unspecified and treat asm.js as a special case that is adequately served by Esprima's extended behavior
  • add a spec for additional node data to be produced by parsers but optional for other tools, and include raw in that
  • specify just the coarse-grained "type" of the lexeme, like type: "int" | "float" | "string" | "boolean" | "null" | "RegExp"
  • specify finer-grained lexical class information, something like type: "hex" | "octal" | "decimal" | "float" | "boolean" | "null" | "RegExp"

I am hesitant to generalize based on the one use case of asm.js. Without knowing of other use cases, I think the most conservative step is just to document the existing practice of raw in an optional spec.

Thoughts?

@RReverser
Copy link
Member

Acorn also produces raw property, but I'm not sure if it's something that should be standardized as a "must" for all the parsers as raw representation can be easily retrieved using original code string + Literal's range property.

@ariya
Copy link
Contributor

ariya commented Feb 13, 2015

@RReverser That requires the node location to be specified. A code generator that takes an AST may not need the location at all, but it can benefit from raw.

@RReverser
Copy link
Member

@ariya Makes sense, thanks for a good example.

@michaelficarra
Copy link
Member

Remember, SourceLocation (which all Nodes may have)

interface SourceLocation {
    source: string | null;
    start: Position;
    end: Position;
}

has an optional source. Unfortunately, it would require you to have a start/end position listed as well. I think specifying esprima's raw member is fine.

Also notable, @RReverser: range is not part of SourceLocation. It is yet another esprima extension.

@RReverser
Copy link
Member

@michaelficarra Yes, I already agreed that possible benefits of raw are clear :)

@mikesherov
Copy link
Contributor

This delves into the realm of a CST, which has lots of interested parties... Perhaps we can defer this discussion until we have a comprehensive discussion of what a concrete syntax tree would look like?

@getify
Copy link
Contributor

getify commented Mar 4, 2015

This delves into the realm of a CST, which has lots of interested parties...

#41

@abraidwood
Copy link

I'm not sure if this comment belongs to this issue, or should be a new one, but here goes.

The type of the 'value' property (and 'raw' if added) of Literal cannot be determined. Type switching is bad for performance in JITs, so I've been experimenting with multiple Literals which have type assistance.

So rather than
function Literal() {
this.type = 'Literal';
this.value = null;
this.loc = null;
}

You have
function LiteralNumber() {
this.type = 'Literal';
this.value = 0;
this.loc = null;
}
function LiteralTrue() {
this.type = 'Literal';
this.value = true;
this.loc = null;
}
etc

They are all forms of Literal, but have some minor perf assistance

@sebmck
Copy link

sebmck commented Mar 9, 2015

@abraidwood Yep, Literal is overloaded with a lot of different type representations. Nothing can be done about it due to backwards compatibility.

@abraidwood
Copy link

@sebmck I'm not sure that backwards compatibility is such a problem - the node created is the same except for the actually class that created it, so any instanceof like operations would fail, but anything checking Node.type would be fine.

I'm doing this with my experimental acorn derived parser here https://github.com/abraidwood/overture and specifically in this file https://github.com/abraidwood/overture/blob/master/overture-pre.js

@sebmck
Copy link

sebmck commented Mar 9, 2015

Oh right, you're suggesting different constructors for the nodes not different node types. Not relevant to ESTree since it only defines the AST representation and no other parser behaviour.

@abraidwood
Copy link

Ok, if the 'Literal' on the node is just referring to the type within it, then you're right, it doesn't matter for ESTree.

@dead-claudia
Copy link
Contributor

I would also like to mention that V8 can still pick out all of those
possible types in function arguments without a full deopt (I figured out
through experimentation), and I suspect other parsers are the same. Now, in
terms of parsing, I highly doubt that such a small optimization would
actually matter in the realm of parsers.

Now, adding a kind field with the type (number, boolean, null, undefined,
regex, etc.) would be insanely convenient, as I find myself very frequently
type checking the field itself, and I find that way too crude and hackish.
I rarely actually need the actual value of the node itself in practice, and
the literal could just as easily be constructed.

If I could redesign the node, without worries about anything like dependent
tooling, etc. (which is admittedly utopian), it would be something like
this:

interface Literal <: Node {
  kind: "string" |
    "boolean" |
    "null" |
    "undefined" |
    "number" |
    "regexp";
  raw: string;
}

// Examples
kind: "string"
raw: "\"foo\"

kind: "number"
raw: "4.2"

kind: "null"
raw: "null"

kind: "undefined"
raw: "undefined"

kind: "regexp"
raw: "/foo/"

kind: "regexp"
raw: "/foo/i"

kind: "boolean"
raw: "true"

// Usage
let isType = (node, type) =>
  node.type === "Literal" &&
  node.kind === type;

function assertType(node, type) {
  if (!isType(node, type)) {
    throw new TypeError();
  }
}

function isHex(node) {
  assertType(node, "number");
  return /0x/i.test(node.raw);
}

function getValue(node) {
  if (node.type !== "Literal") {
    throw new TypeError();
  }

  switch (node.kind) {
  case "string":
    let {raw} = node;
    // lint for JSON inconsistencies
    if (!/(["'])[^\u2028\u2029]\1/.test(raw)) {
      throw new SyntaxError();
    }
    return JSON.parse(raw);

  case "number": return Number(node.raw);
  case "null": return null;
  case "undefined": return undefined;
  case "boolean": return Boolean(node.raw);

  case "regexp":
    let parts = node.raw.split("/").slice(1);
    let flags = parts.pop();
    let source = parts.join("/");
    return RegExp(source, flags);

  default: throw new TypeError();
}

I am well aware that such a breaking change wouldn't work in practice, but
a simple kind property addition would be enormously useful, especially
with code refactoring and basic type checking (the latter would be far
simpler).

@RReverser
Copy link
Member

@IMPinball kind proposal sounds good at first glance, but better to move it to separate issue for relevant discussion. Could you please create one?

@dead-claudia
Copy link
Contributor

Done: #61
On Mar 12, 2015 12:52 PM, "Ingvar Stepanyan" notifications@github.com
wrote:

@IMPinball https://github.com/impinball kind proposal sounds good at
first glance, but better to move it to separate issue for relevant
discussion. Could you please create one?


Reply to this email directly or view it on GitHub
#14 (comment).

@gibson042
Copy link

Would introduction of raw come with an abandonment of RegexLiteral from #27? regex is entirely redundant when coexisting with raw.

@dead-claudia
Copy link
Contributor

@gibson042 Doubt it. Granted, from a raw property, it would be relatively easy to get each part from it:

let [, source, flags] = /^\/(.*)/([a-z]*)$/.exec(regexNode.raw);

I like my regex/destructuring-based one-liners. ;)

@RReverser
Copy link
Member

@IMPinball

< /^\/(.*)/([a-z]*)$/
> SyntaxError: expected expression, got ')'

I guess you missed at least one more escape character for /.

@dead-claudia
Copy link
Contributor

Oops. It should've been this:

let [, source, flags] = /^\/(.*)\/([a-z]*)$/.exec(regexNode.raw);

@cpcallen
Copy link

I don't think I'm qualified to comment on whether .raw should be part of the ESTree spec, but I will note that it has in practice proven to be very useful when consuming ESTree parse trees converted to JSON, because not all non-JS JSON libraries are able to reliably deal with the .value field having variable type (e.g., when unmarshaling into a statically-typed Go struct).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests