.raw property of literal nodes, or something similar #14

dherman · 2015-02-13T08:37:40Z

Discussed some in #6. The raw property of literal nodes is an Esprima extension.

I have a specific use case for the raw property, or at least for more information than is provided by value: in asm.js, the distinction between an int literal and float literal is significant and distinguished by the type system. Writing an asm.js validator in JS requires the ability to distinguish between e.g. 17.0 and 17, but the value property does not distinguish these.
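To make the asm.js problem concrete: `17.0` and `17` parse to the same JS number, so only the source text can tell them apart. This is a minimal sketch. It assumes Esprima-style Literal nodes that carry `raw`, and it uses a simplified rule where a decimal point marks a float literal. It is not the full asm.js validation logic. The node objects and the `isFloatLiteral` helper are hypothetical.

```javascript
// Hypothetical Esprima-style Literal nodes for the source text "17.0" and "17".
const floatNode = { type: "Literal", value: 17.0, raw: "17.0" };
const intNode = { type: "Literal", value: 17, raw: "17" };

// `value` cannot distinguish them: 17.0 and 17 are the same JS number.
console.log(floatNode.value === intNode.value); // true

// Simplified rule (assumption for illustration): a numeric literal counts as
// a float if its raw source text contains a decimal point.
function isFloatLiteral(node) {
  return node.type === "Literal" &&
    typeof node.value === "number" &&
    typeof node.raw === "string" &&
    node.raw.includes(".");
}

console.log(isFloatLiteral(floatNode)); // true
console.log(isFloatLiteral(intNode)); // false
```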

OTOH, @michaelficarra points out that AST nodes might be produced by tools that don't want to have to specify the raw source. A few options I can see:

- Just leave raw unspecified and treat asm.js as a special case that is adequately served by Esprima's extended behavior.
- Add a spec for additional node data that parsers produce but other tools may omit, and include raw in it.
- Specify just the coarse-grained "type" of the lexeme, like type: "int" | "float" | "string" | "boolean" | "null" | "RegExp".
- Specify finer-grained lexical class information, something like type: "hex" | "octal" | "decimal" | "float" | "boolean" | "null" | "RegExp".
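For the finer-grained option, a consumer could derive a lexical class for numeric literals from `raw` alone. Here is a rough sketch. The `numericLexicalClass` helper is hypothetical. It adds a "binary" class that the list above does not include, to cover ES2015 `0b` literals, and it ignores numeric separators and BigInt suffixes. It also treats exponent forms such as `1e3` as "float", which is an assumption.

```javascript
// Hypothetical helper: classify a numeric literal by its raw source text.
function numericLexicalClass(raw) {
  if (/^0[xX]/.test(raw)) return "hex";
  // ES2015 0o17 or legacy (sloppy-mode) 017
  if (/^0[oO]/.test(raw) || /^0[0-7]+$/.test(raw)) return "octal";
  if (/^0[bB]/.test(raw)) return "binary";
  // a decimal point or exponent (hex was already handled above)
  if (/[.eE]/.test(raw)) return "float";
  return "decimal";
}

console.log(["0x1F", "017", "0o17", "0b101", "17.0", "1e3", "17"].map(numericLexicalClass).join(", "));
// hex, octal, octal, binary, float, float, decimal
```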

I am hesitant to generalize based on the one use case of asm.js. Without knowing of other use cases, I think the most conservative step is just to document the existing practice of raw in an optional spec.

Thoughts?


RReverser · 2015-02-13T09:11:16Z

Acorn also produces a raw property, but I'm not sure it should be standardized as a "must" for all parsers, since the raw representation can easily be retrieved from the original code string plus the Literal's range property.
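Here is a short sketch of what "original code string + range" means in practice. The node below is written by hand in the Esprima-style shape, where `range` holds `[start, end)` character offsets into the source. `rawFromRange` is a hypothetical helper, not a parser API.

```javascript
const code = "var x = 0x1F;";
// Hand-written Esprima-style node; range covers the characters "0x1F".
const literal = { type: "Literal", value: 31, range: [8, 12] };

// Recover the raw text from the source string and the node's range.
function rawFromRange(source, node) {
  const [start, end] = node.range;
  return source.slice(start, end);
}

console.log(rawFromRange(code, literal)); // 0x1F
```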

ariya · 2015-02-13T14:03:02Z

@RReverser That requires the node location to be specified. A code generator that takes an AST may not need the location at all, but it can benefit from raw.
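As an illustration of that code-generator case, here is a sketch of a `Literal` printer that prefers `raw` when it is present. Otherwise it falls back to regenerating text from `value`, or from ESTree's `regex: { pattern, flags }` for regular expressions. `generateLiteral` is a hypothetical function, not taken from any particular generator.

```javascript
// Hypothetical Literal case of a code generator.
function generateLiteral(node) {
  if (typeof node.raw === "string") {
    return node.raw; // keeps the original spelling: 0x1F, 17.0, 'single quotes'
  }
  if (node.regex) {
    return `/${node.regex.pattern}/${node.regex.flags}`;
  }
  if (typeof node.value === "string") {
    return JSON.stringify(node.value); // always re-quoted with double quotes
  }
  return String(node.value); // numbers, booleans, null; 17.0 becomes "17"
}

console.log(generateLiteral({ type: "Literal", value: 31, raw: "0x1F" })); // 0x1F
console.log(generateLiteral({ type: "Literal", value: 31 })); // 31
```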

RReverser · 2015-02-13T14:25:07Z

@ariya Makes sense, thanks for a good example.

michaelficarra · 2015-02-13T17:08:56Z

Remember that SourceLocation (which every Node may have)

interface SourceLocation {
    source: string | null;
    start: Position;
    end: Position;
}

has an optional source field. Unfortunately, using it would also require listing start/end positions. I think specifying Esprima's raw member is fine.

Also notable, @RReverser: range is not part of SourceLocation; it is yet another Esprima extension.
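To illustrate the difference: with only a standard SourceLocation (no `range`), a tool has to convert line/column positions into string offsets before it can slice out the raw text. In ESTree positions, `line` is 1-based and `column` is 0-based. This sketch assumes `"\n"`-only line endings and columns counted in UTF-16 code units. `offsetOf` and `rawFromLoc` are hypothetical helpers.

```javascript
// Convert an ESTree-style { line, column } position into a string offset.
function offsetOf(source, position) {
  const lines = source.split("\n");
  let offset = 0;
  for (let i = 0; i < position.line - 1; i++) {
    offset += lines[i].length + 1; // +1 for the "\n" that split() removed
  }
  return offset + position.column;
}

function rawFromLoc(source, node) {
  return source.slice(offsetOf(source, node.loc.start), offsetOf(source, node.loc.end));
}

const code = "var a = 1;\nvar b = 17.0;";
// Hand-written node whose loc covers "17.0" on line 2.
const literal = {
  type: "Literal",
  value: 17,
  loc: { source: null, start: { line: 2, column: 8 }, end: { line: 2, column: 12 } },
};

console.log(rawFromLoc(code, literal)); // 17.0
```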

RReverser · 2015-02-13T17:15:18Z

@michaelficarra Yes, I already agreed that possible benefits of raw are clear :)

mikesherov · 2015-02-14T13:33:56Z

This delves into the realm of a CST, which has lots of interested parties... Perhaps we can defer this discussion until we have a comprehensive discussion of what a concrete syntax tree would look like?

getify · 2015-03-04T05:53:50Z

> This delves into the realm of a CST, which has lots of interested parties...

#41

abraidwood · 2015-03-09T08:20:53Z

I'm not sure if this comment belongs to this issue, or should be a new one, but here goes.

The type of Literal's 'value' property (and of 'raw', if added) cannot be known in advance. Type switching is bad for performance in JITs, so I've been experimenting with multiple Literal constructors that give the engine type hints.

So rather than

function Literal() {
  this.type = 'Literal';
  this.value = null;
  this.loc = null;
}

you have

function LiteralNumber() {
  this.type = 'Literal';
  this.value = 0;
  this.loc = null;
}
function LiteralTrue() {
  this.type = 'Literal';
  this.value = true;
  this.loc = null;
}
etc.

They are all forms of Literal, but they give the engine some minor performance assistance.

sebmck · 2015-03-09T08:21:46Z

@abraidwood Yep, Literal is overloaded with a lot of different type representations. Nothing can be done about it due to backwards compatibility.

abraidwood · 2015-03-09T08:28:11Z

@sebmck I'm not sure that backwards compatibility is such a problem: the node created is the same except for the actual class that created it, so any instanceof-like operations would fail, but anything checking Node.type would be fine.
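Here is a minimal sketch of the constructor-per-type idea and of the compatibility claim. Consumers that check `node.type` see no difference, but `instanceof` checks depend on which constructor was used. The `createLiteral` factory is hypothetical. The sketch uses one `LiteralBoolean` constructor instead of splitting by individual values (`LiteralTrue`), and it omits regular expressions. Any performance benefit depends on the engine and is not measured here.

```javascript
// One constructor per value type, so each always stores the same type in `value`.
function LiteralNumber(value, loc) {
  this.type = "Literal";
  this.value = value;
  this.loc = loc;
}
function LiteralString(value, loc) {
  this.type = "Literal";
  this.value = value;
  this.loc = loc;
}
function LiteralBoolean(value, loc) {
  this.type = "Literal";
  this.value = value;
  this.loc = loc;
}
function LiteralNull(loc) {
  this.type = "Literal";
  this.value = null;
  this.loc = loc;
}

// Hypothetical factory a parser could call once it knows the token's value.
function createLiteral(value, loc = null) {
  if (value === null) return new LiteralNull(loc);
  switch (typeof value) {
    case "number": return new LiteralNumber(value, loc);
    case "string": return new LiteralString(value, loc);
    case "boolean": return new LiteralBoolean(value, loc);
    default: throw new TypeError("unsupported literal value"); // e.g. RegExp, omitted here
  }
}

const nodes = [createLiteral(42), createLiteral("x"), createLiteral(true), createLiteral(null)];
// Code that checks node.type is unaffected...
console.log(nodes.every((n) => n.type === "Literal")); // true
// ...but instanceof results depend on the concrete constructor.
console.log(nodes[0] instanceof LiteralString); // false
```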

I'm doing this with my experimental acorn derived parser here https://github.com/abraidwood/overture and specifically in this file https://github.com/abraidwood/overture/blob/master/overture-pre.js

sebmck · 2015-03-09T08:30:50Z

Oh right, you're suggesting different constructors for the nodes not different node types. Not relevant to ESTree since it only defines the AST representation and no other parser behaviour.

abraidwood · 2015-03-09T08:36:14Z

Ok, if the 'Literal' on the node is just referring to the type within it, then you're right, it doesn't matter for ESTree.

dead-claudia · 2015-03-09T09:57:12Z

I would also like to mention that V8 can still pick out all of those possible types in function arguments without a full deopt (I found this out through experimentation), and I suspect other engines behave the same. In terms of parsing, though, I highly doubt that such a small optimization would actually matter.

Now, adding a kind field with the type (number, boolean, null, undefined, regex, etc.) would be insanely convenient, since I very frequently find myself type-checking the value field itself, which feels way too crude and hackish. In practice I rarely need the actual value of the node, and the value could just as easily be reconstructed when needed.
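For comparison, here is the kind of by-hand type check described above, wrapped in a hypothetical `literalKind` helper that works on current ESTree Literal nodes. It relies on ESTree's `regex` property for regular expression literals and its later `bigint` property for BigInt literals. It checks those first because their `value` may be `null` in environments that cannot construct the value. Note that `undefined` is an Identifier in ESTree, not a Literal, so it never appears here.

```javascript
// Hypothetical helper: derive a "kind" string from an existing ESTree Literal.
function literalKind(node) {
  if (node.type !== "Literal") throw new TypeError("not a Literal");
  if (node.regex) return "regexp";
  if (node.bigint !== undefined) return "bigint";
  if (node.value === null) return "null";
  return typeof node.value; // "string" | "boolean" | "number"
}

console.log(literalKind({ type: "Literal", value: "foo" })); // string
console.log(literalKind({ type: "Literal", value: null, regex: { pattern: "foo", flags: "i" } })); // regexp
```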

If I could redesign the node without worrying about dependent tooling and the like (which is admittedly utopian), it would look something like this:

interface Literal <: Node {
  kind: "string" |
    "boolean" |
    "null" |
    "undefined" |
    "number" |
    "regexp";
  raw: string;
}

// Examples
kind: "string"
raw: "\"foo\""

kind: "number"
raw: "4.2"

kind: "null"
raw: "null"

kind: "undefined"
raw: "undefined"

kind: "regexp"
raw: "/foo/"

kind: "regexp"
raw: "/foo/i"

kind: "boolean"
raw: "true"

// Usage
let isType = (node, type) =>
  node.type === "Literal" &&
  node.kind === type;

function assertType(node, type) {
  if (!isType(node, type)) {
    throw new TypeError();
  }
}

function isHex(node) {
  assertType(node, "number");
  // hex literals start with 0x or 0X
  return /^0x/i.test(node.raw);
}

function getValue(node) {
  if (node.type !== "Literal") {
    throw new TypeError();
  }

  switch (node.kind) {
  case "string": {
    let {raw} = node;
    // lint for JSON inconsistencies: JSON.parse only accepts
    // double-quoted strings; other JS-only escapes make it throw
    if (!/^"[^\u2028\u2029]*"$/.test(raw)) {
      throw new SyntaxError();
    }
    return JSON.parse(raw);
  }

  // note: Number() reads a legacy octal like 017 as decimal 17
  case "number": return Number(node.raw);
  case "null": return null;
  case "undefined": return undefined;
  // Boolean("false") would be true, so compare the text instead
  case "boolean": return node.raw === "true";

  case "regexp": {
    // raw looks like /source/flags; flags never contain "/"
    let end = node.raw.lastIndexOf("/");
    let source = node.raw.slice(1, end);
    let flags = node.raw.slice(end + 1);
    return new RegExp(source, flags);
  }

  default: throw new TypeError();
  }
}

I am well aware that such a breaking change wouldn't work in practice, but simply adding a kind property would be enormously useful, especially for code refactoring and basic type checking (the latter would become far simpler).

RReverser · 2015-03-12T16:52:24Z

@IMPinball The kind proposal sounds good at first glance, but it would be better to move it to a separate issue for focused discussion. Could you please create one?

dead-claudia · 2015-03-13T02:49:14Z

Done: #61

gibson042 · 2015-03-16T13:29:32Z

Would introducing raw mean abandoning the RegexLiteral from #27? regex is entirely redundant alongside raw.

dead-claudia · 2015-03-17T21:57:46Z

@gibson042 Doubt it. Granted, it would be relatively easy to extract each part from a raw property:

let [, source, flags] = /^\/(.*)/([a-z]*)$/.exec(regexNode.raw);

I like my regex/destructuring-based one-liners. ;)

RReverser · 2015-03-17T22:00:14Z

@IMPinball

< /^\/(.*)/([a-z]*)$/
> SyntaxError: expected expression, got ')'

I guess you missed at least one more escape character for /.

dead-claudia · 2015-03-17T22:08:56Z

Oops. It should've been this:

let [, source, flags] = /^\/(.*)\/([a-z]*)$/.exec(regexNode.raw);
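To connect this back to gibson042's question, the corrected one-liner recovers the same information that ESTree's `regex: { pattern, flags }` object carries. The node below is written by hand for illustration. The greedy `(.*)` runs to the last `/`, which is safe because flags never contain `/`. This also handles an escaped slash inside the pattern.

```javascript
// Hand-written node for the source text /a\/b/gi.
const regexNode = {
  type: "Literal",
  value: /a\/b/gi,
  raw: "/a\\/b/gi",
  regex: { pattern: "a\\/b", flags: "gi" },
};

let [, source, flags] = /^\/(.*)\/([a-z]*)$/.exec(regexNode.raw);

console.log(source === regexNode.regex.pattern); // true
console.log(flags === regexNode.regex.flags); // true
```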

cpcallen · 2017-03-22T11:03:18Z

I don't think I'm qualified to comment on whether .raw should be part of the ESTree spec, but I will note that in practice it has proven very useful when consuming ESTree parse trees converted to JSON, because not all non-JS JSON libraries can reliably handle the .value field having a variable type (e.g., when unmarshaling into a statically typed Go struct).
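Here is a small illustration of the JSON point. The nodes are hand-written in the Esprima-style shape, with `raw`, and are round-tripped through JSON as a non-JS consumer would receive them. `value` changes JSON type from node to node, and a RegExp value serializes to an empty object. `raw` stays a string throughout.

```javascript
const literals = [
  { type: "Literal", value: "hi", raw: '"hi"' },
  { type: "Literal", value: 17, raw: "17.0" },
  { type: "Literal", value: true, raw: "true" },
  { type: "Literal", value: null, raw: "null" },
  { type: "Literal", value: /foo/i, raw: "/foo/i", regex: { pattern: "foo", flags: "i" } },
];

const roundTripped = JSON.parse(JSON.stringify(literals));

// The JS type of `value` varies per node after the round trip.
console.log(roundTripped.map((n) => typeof n.value).join(", "));
// string, number, boolean, object, object
console.log(JSON.stringify(roundTripped[4].value)); // {}
// `raw` is always a string, which maps cleanly onto a statically typed field.
console.log(roundTripped.every((n) => typeof n.raw === "string")); // true
```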

sebmck added the extension and interopability labels Feb 16, 2015

abraidwood mentioned this issue Mar 11, 2015 in "SourceLocation and byte position" #53 (open)

gibson042 mentioned this issue Mar 15, 2015 in "Add kind" #63 (closed)

sebmck mentioned this issue Mar 15, 2015 in "ASTs should be JSON compatible" #64 (open)

jasonLaster mentioned this issue Aug 13, 2015 in "Closes estree/estree#6" #99 (closed)

adrianheine mentioned this issue Dec 20, 2017 in "Add BigInt" #179 (closed)

adrianheine added the CST label Feb 28, 2019

This was referenced Mar 16, 2021 in "feat!: experimentally support remark-mdx@2" mdx-js/eslint-mdx#284 (merged) and "check node.raw before checking because it's not standard and may break custom parser" eslint/eslint#14219 (closed)

fisker mentioned this issue Oct 30, 2022 in "Add raw to Identifier?" #291 (open)
