New: Add ranges option to parse() and tokenize()
nzakas committed Jun 18, 2020
1 parent 4ba561a commit 5715b35
Showing 39 changed files with 1,536 additions and 311 deletions.
33 changes: 11 additions & 22 deletions README.md
@@ -6,7 +6,7 @@ If you find this useful, please consider supporting my work with a [donation](ht

## About

Momoa is an **experimental** general purpose JSON utility toolkit, containing:
Momoa is a general purpose JSON utility toolkit, containing:

* A **tokenizer** that allows you to separate a JSON string into its component parts.
* An ECMA-404 compliant **parser** that produces an abstract syntax tree (AST) representing everything inside of a JSON string.
@@ -41,7 +41,13 @@

```js
const { parse } = require("@humanwhocodes/momoa");

const ast = parse(some_json_string);
```

If you want the tokens from the parsing operation returned as a property of the AST root, pass `tokens:true` as part of the second argument:
The `parse()` function accepts a second argument, which is an options object that may contain one or more of the following properties:

* `comments` - set to `true` if you want to parse C-style line and block comments inside of JSON.
* `ranges` - set to `true` if you want each node to also have a `range` property, which is an array containing the start and stop index for the syntax. If `tokens` is also `true`, then the tokens will also have `range` properties.
* `tokens` - set to `true` to return a `tokens` property on the root node containing all of the tokens used to parse the code. If `comments` is also `true`, then the tokens include comment tokens.

Here's an example of passing options:

```js
const { parse } = require("@humanwhocodes/momoa");
const ast = parse(some_json_string, { tokens: true });
console.dir(ast.tokens);
```
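
Here's a sketch of what the new `ranges` option adds. The input string is illustrative only, and the root node is assumed to expose the parsed value as `body`:

```js
const { parse } = require("@humanwhocodes/momoa");

// hypothetical input, used only for illustration
const ast = parse('{"answer": 42}', { ranges: true });

// with `ranges: true`, each node carries a `range` array of [start index, end index]
console.log(ast.range);       // range of the whole document
console.log(ast.body.range);  // range of the top-level object (assuming `body` holds the parsed value)
```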

If you want to parse such that C-style comments are allowed in the JSON code, then pass `comments: true` as part of the second argument:

```js
const { parse } = require("@humanwhocodes/momoa");

const ast = parse(some_json_string_with_comments, { comments: true });
```

**Note:** If you use both `tokens:true` and `comments:true`, the returned tokens array will contain the comments along with the other syntax tokens.

### Tokenizing

To produce JSON tokens from a string, use the `tokenize()` function:
@@ -75,13 +71,10 @@

```js
const { tokenize } = require("@humanwhocodes/momoa");

for (const token of tokenize(some_json_string)) {
    console.log(token);
}
```

If you want to tokenize C-style comments, then pass `comments:true` as part of the second argument:
The `tokenize()` function accepts a second argument, which is an options object that may contain one or more of the following properties:

```js
const { tokenize } = require("@humanwhocodes/momoa");

const tokens = tokenize(some_json_string, { comments: true });
```
* `comments` - set to `true` if you want to tokenize C-style line and block comments inside of JSON.
* `ranges` - set to `true` if you want each token to also have a `range` property, which is an array containing the start and stop index for the syntax (see the example after this list).
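
For example, here's a sketch of tokenizing with ranges enabled (the input string is just for illustration):

```js
const { tokenize } = require("@humanwhocodes/momoa");

// hypothetical input, used only for illustration
for (const token of tokenize('{"answer": 42}', { ranges: true })) {
    // each token now has a `range` array: [start index, end index]
    console.log(token.type, token.range);
}
```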

### Traversing

@@ -228,7 +221,3 @@ There are a number of programs that allow C-style comments in JSON files, most n
### Why are the source files in ESM and the test files in CommonJS?

Unfortunately, Node.js still doesn't natively support ECMAScript Modules (ESM) and everyone generally expects npm packages to export things via CommonJS. As such, the source files are built (using Rollup) into a CommonJS package before publishing. To ensure that the published API is working correctly, it makes sense to write the tests in CommonJS and to pull in what would be the published package API.
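
As a rough illustration of that build step, a minimal Rollup config along these lines would bundle the ESM source into a CommonJS file (the file names here are assumptions, not the project's actual configuration):

```js
// rollup.config.js - minimal sketch, not the project's actual config
export default {
    input: "src/index.js",        // assumed ESM entry point
    output: {
        file: "dist/index.js",    // assumed CommonJS build output
        format: "cjs"
    }
};
```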

### Is it safe to use this package in production?

No. Absolutely not. This package is still very much experimental and won't be undergoing a lot of maintenance and development until my health improves. I'm sharing it primarily as an educational tool rather than something to depend on in your production environment.
78 changes: 58 additions & 20 deletions src/parse.js
@@ -17,6 +17,12 @@ import { UnexpectedToken, ErrorWithLocation } from "./errors.js";
// Helpers
//-----------------------------------------------------------------------------

const DEFAULT_OPTIONS = {
tokens: false,
comments: false,
ranges: false
};

/**
* Converts a JSON-encoded string into a JavaScript string, interpreting each
* escape sequence.
@@ -101,21 +107,6 @@ function getLiteralValue(token) {
}
}

function createLiteralNode(token) {
return {
type: token.type,
value: getLiteralValue(token),
loc: {
start: {
...token.loc.start
},
end: {
...token.loc.end
}
}
};
}

//-----------------------------------------------------------------------------
// Main Function
//-----------------------------------------------------------------------------
@@ -127,12 +118,22 @@ function createLiteralNode(token) {
* the AST.
* @param {boolean} [options.comments=false] Determines if comments are allowed
* in the JSON.
* @param {boolean} [options.ranges=false] Determines if ranges will be returned
* in addition to `loc` properties.
* @returns {Object} The AST representing the parsed JSON.
* @throws {Error} When there is a parsing error.
*/
export function parse(text, options = { tokens:false, comments:false }) {
export function parse(text, options) {

options = Object.freeze({
...DEFAULT_OPTIONS,
...options
});

const tokens = tokenize(text, { comments: !!options.comments });
const tokens = tokenize(text, {
comments: !!options.comments,
ranges: !!options.ranges
});
let tokenIndex = 0;

function nextNoComments() {
@@ -164,13 +165,39 @@ export function parse(text, options = { tokens:false, comments:false }) {
}
}

function createRange(start, end) {
return options.ranges ? {
range: [start.offset, end.offset]
} : undefined;
}

function createLiteralNode(token) {
const range = createRange(token.loc.start, token.loc.end);

return {
type: token.type,
value: getLiteralValue(token),
loc: {
start: {
...token.loc.start
},
end: {
...token.loc.end
}
},
...range
};
}


function parseProperty(token) {
assertTokenType(token, "String");
const name = createLiteralNode(token);

token = next();
assertTokenValue(token, ":");
const value = parseValue();
const range = createRange(name.loc.start, value.loc.end);

return t.member(name, value, {
loc: {
@@ -180,7 +207,8 @@ export function parse(text, options = { tokens:false, comments:false }) {
end: {
...value.loc.end
}
}
},
...range
});
}

@@ -207,6 +235,7 @@ export function parse(text, options = { tokens:false, comments:false }) {
}

assertTokenValue(token, "}");
const range = createRange(firstToken.loc.start, token.loc.end);

return t.object(members, {
loc: {
@@ -216,7 +245,8 @@ export function parse(text, options = { tokens:false, comments:false }) {
end: {
...token.loc.end
}
}
},
...range
});

}
@@ -244,6 +274,7 @@ export function parse(text, options = { tokens:false, comments:false }) {
}

assertTokenValue(token, "]");
const range = createRange(firstToken.loc.start, token.loc.end);

return t.array(elements, {
type: "Array",
@@ -255,7 +286,8 @@ export function parse(text, options = { tokens:false, comments:false }) {
end: {
...token.loc.end
}
}
},
...range
});

}
@@ -295,6 +327,7 @@ export function parse(text, options = { tokens:false, comments:false }) {
throw new UnexpectedToken(unexpectedToken);
}


const docParts = {
loc: {
start: {
@@ -307,11 +340,16 @@ export function parse(text, options = { tokens:false, comments:false }) {
}
}
};


if (options.tokens) {
docParts.tokens = tokens;
}

if (options.ranges) {
docParts.range = [docParts.loc.start.offset, docParts.loc.end.offset];
}

return t.document(docBody, docParts);

}
22 changes: 19 additions & 3 deletions src/tokens.js
@@ -18,6 +18,10 @@ const QUOTE = "\"";
const SLASH = "/";
const STAR = "*";

const DEFAULT_OPTIONS = {
comments: false,
ranges: false
};

function isWhitespace(c) {
return /[\s\n]/.test(c);
@@ -52,7 +56,12 @@ function isNumberStart(c) {
* @param {string} text The source text to tokenize.
* @returns {Iterator} An iterator over the tokens.
*/
export function tokenize(text, options = { comments: false }) {
export function tokenize(text, options) {

options = Object.freeze({
...DEFAULT_OPTIONS,
...options
});

// normalize line endings
text = text.replace(/\n\r?/g, "\n");
Expand All @@ -66,6 +75,12 @@ export function tokenize(text, options = { comments: false }) {


function createToken(tokenType, value, startLoc, endLoc) {

const endOffset = startLoc.offset + value.length;
let range = options.ranges ? {
range: [startLoc.offset, endOffset]
} : undefined;

return {
type: tokenType,
value,
@@ -74,9 +89,10 @@ export function tokenize(text, options = { comments: false }) {
end: endLoc || {
line: startLoc.line,
column: startLoc.column + value.length,
offset: startLoc.offset + value.length
offset: endOffset
}
}
},
...range
};
}

