Optionally compute lineOffsets, refs #67 #68

MikeRalphson · 2019-01-05T02:36:50Z

As discussed in #67 this PR adds the optional ability to compute line offsets and provides a helper function charPosToLineCol.

Both the AST and CST parse(Document) functions can now accept a computeLineOffsets option. Tests added to ensure default behaviour is unchanged and for the new functionality.

README.md has been minimally updated, but not yet the docs. I'd appreciate some guidance on doing that.

eemeli

I think your proposed API is not the right solution here, mainly for two reasons:

It adds a new options argument for parseCST(), which we've managed to avoid so far.
It requires knowing that you may want to access the line offsets when calling parseCST(), rather than optionally doing so later.

I don't have a complete solution envisioned here, but the user story that I'd like the line offset API to solve is something like, "I've parsed this string, and now I have this offset that I'd like to identify as a line/col position." How to do that efficiently is a separate problem, and definitely less important than having a clean API. cst.lineOffsets does sound like a reasonable place for caching, but that doesn't mean that we need to populate the cache pre-emptively.

What do you think? Sorry to leave this response a bit incomplete; I'll need to return to this a bit later.

eemeli · 2019-01-05T16:22:14Z

src/cst/parse.js

+    return { line: undefined, col: undefined }
+  const lineIndex = lineOffsets.indexOf(offset)
+  if (lineIndex >= 0)
+    return { line: lineIndex, col: offset - lineOffsets[lineIndex] }


This is special-case handling for when offset points to a \n character, yes? Why couldn't it be handled by the following for loop?

MikeRalphson · 2019-01-06

I may have optimised pre-emptively here (which is a mistake without measuring it). It just avoids searching in JavaScript and gives the native, optimised indexOf method a chance. As the array is always sorted, perhaps a binary search would be better than this plus the loop, especially for large documents?

eemeli · 2019-01-07

I'd recommend first going with just a simple for loop, and then seeing if there's a real-world case that would show a need for a more complicated search pattern.
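For comparison, here is a minimal sketch of the simple loop recommended above next to the binary-search alternative raised in the reply before it. The function names are hypothetical, and both assume (an assumption, since the diff does not show how the array is initialised) that the array holds the start offset of every line, beginning with 0. Positions are zero-indexed, as in the PR at this stage.

```javascript
// Sketch: `lineStarts` is assumed to be a sorted array holding the offset at
// which each line begins, starting with 0, e.g. [0, 4, 9] for "abc\ndefg\nhi".
// Both functions return zero-indexed { line, col }, or undefined for a negative
// offset. Offsets past the end of the source are not detected here.

// Simple linear scan, as recommended in the review.
function charPosToLineColLinear(offset, lineStarts) {
  if (offset < 0) return undefined
  let line = 0
  // Advance while the next line starts at or before `offset`.
  while (line + 1 < lineStarts.length && lineStarts[line + 1] <= offset) line += 1
  return { line, col: offset - lineStarts[line] }
}

// Binary search: finds the last line start <= offset in O(log n) steps.
function charPosToLineColBinary(offset, lineStarts) {
  if (offset < 0) return undefined
  let lo = 0
  let hi = lineStarts.length - 1
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1 // round up so `lo = mid` always makes progress
    if (lineStarts[mid] <= offset) lo = mid
    else hi = mid - 1
  }
  return { line: lo, col: offset - lineStarts[lo] }
}

const starts = [0, 4, 9] // "abc\ndefg\nhi"
console.log(charPosToLineColLinear(5, starts)) // { line: 1, col: 1 }
console.log(charPosToLineColBinary(5, starts)) // { line: 1, col: 1 }
```

Both return the same results; the binary search would only pay off if lookups in very large documents turned out to matter, which is exactly the measurement the review asks for first.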

eemeli · 2019-01-05T16:25:52Z

src/cst/parse.js

+    src = src.replace(/\n/g, (match, offset) => {
+      lf.push(offset + 1)
+      return '\n'
+    })


src.replace() is far too heavy an operation for this; use src.indexOf('\n', fromIndex) instead. There's no reason to mutate the string here.

MikeRalphson · 2019-01-06

Cool. Will amend in the next revision.
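A minimal sketch of the indexOf-based scan suggested in this thread. findLineStarts is a hypothetical name; unlike the diff, it also records 0 as the start of the first line, an assumption made so that the result can be used directly for lookups.

```javascript
// Hypothetical helper: offsets of the first character of every line.
// Uses String.prototype.indexOf() rather than src.replace(), so no new string
// is built and no callback runs per match.
function findLineStarts(src) {
  const starts = [0] // the first line always starts at offset 0
  let i = src.indexOf('\n')
  while (i !== -1) {
    starts.push(i + 1) // the next line begins just after the newline
    i = src.indexOf('\n', i + 1)
  }
  return starts
}

console.log(findLineStarts('abc\ndefg\nhi')) // [ 0, 4, 9 ]
```

A trailing newline yields a final entry equal to src.length, e.g. findLineStarts('a\n') returns [0, 2].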

eemeli · 2019-01-05T16:34:39Z

src/index.js

-  return parseCST(src).map(cstDoc => new Document(options).parse(cstDoc))
+  return parseCST(src, {
+    computeLineOffsets: options ? options.computeLineOffsets : false
+  }).map(cstDoc => new Document(options).parse(cstDoc))


The result of the cst.map() operation will not contain the non-iterable properties of cst, which means that setting computeLineOffsets will not influence the output of this function.

MikeRalphson · 2019-01-06

OK, will take another look.
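A stand-alone illustration of the point about map(): Array.prototype.map() returns a new array built only from the elements, so any extra property attached to the source array is lost. The array and values here are stand-ins, not the real parseCST() output.

```javascript
// Stand-in for the array returned by parseCST(): an array of documents
// with an extra, non-element property attached to it.
const cst = ['doc1', 'doc2']
cst.lineOffsets = [0, 4, 9]

// map() builds a brand-new array from the elements only, so properties set
// on the original array object are not carried over.
const docs = cst.map(doc => doc.toUpperCase())

console.log(docs) // [ 'DOC1', 'DOC2' ]
console.log(docs.lineOffsets) // undefined
console.log(cst.lineOffsets) // [ 0, 4, 9 ]
```

Any such property would have to be copied onto the mapped result explicitly, which is part of why an extra parse option does not fit well here.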

eemeli · 2019-01-05T16:37:32Z

src/cst/parse.js

  return documents
 }
+
+export function charPosToLineCol(offset, lineOffsets) {


Is there a particular reason why this function is being exported from cst/parse.js, rather than its own file?

MikeRalphson · 2019-01-06

No, I just wasn't sure of your convention on this.

MikeRalphson · 2019-01-06T05:17:03Z

I don't have a complete solution envisioned here ... What do you think? Sorry to leave this response a bit incomplete; I'll need to return to this a bit later.

I've got no problem with computing the lineOffsets on the first call to charPosToLineCol. We can use the presence or absence of cst.lineOffsets to decide whether to compute them. Are you happy with a property appearing on the CST after it has been constructed?

Would that (plus the above changes) be enough for me to do the next revision of this PR?

eemeli · 2019-01-07T07:04:46Z

I'm fine with lineOffsets getting added later. Did you have something in mind yet for locating errors, when encountered during YAML.parseDocument() or YAML.parse(), which don't (at least for now) provide easy access to the root cst object?

MikeRalphson · 2019-01-07T09:57:36Z

Did you have something in mind yet for locating errors, when encountered during YAML.parseDocument() or YAML.parse(), which don't (at least for now) provide easy access to the root cst object?

I think the likely (initial) recommendation would be to reparse the document(s) at the CST level. I don't know how acceptable you find that... We might be able to do something for the AST where keepCstNodes has been specified. When you say "for now", is access to the CST from higher levels likely to change?

I plan to follow this up with another PR (or a plugin, or a separate module) which can decorate the CST with JSON pointers, allowing a lookup from the JS object to the CST and from there to the line/column position. JSON pointers are widely used in the JSON Schema and OpenAPI worlds.

Just to make you aware, the other PR/plugin/module I have in mind is an optional ability to preserve comments, in a configurable property key, when parsing to and stringifying from a JS object.

eemeli · 2019-01-07T10:53:43Z

If there's a good reason to extend the API, then we can do that -- but it does require a good reason. If your application is usually interested in the CST, then it makes sense to explicitly build the cst, and then use new YAML.Document(options).parse(cst) rather than YAML.parseDocument() to build the document -- and thereby maintain your own reference to the CST.

The main use case that I'd expect for line/col references outside of those cases would be errors, when such are not expected. It would be useful, when encountering one, to be able to use the error's context to build a line/col reference to the source. Then, if the offsets are cached as cst.lineOffsets, it may make sense to add a context.root ref to each CST Node in order to access that.

I've never really worked with JSON pointers, but I'd suspect that it'd be relatively simple to write a getter method for the AST Document that would help with that. Maybe something like Document#get(path: Iterable): Node?, so you could (separately) parse a JSON pointer like "/foo/0" into ['foo', 0] and use that to fetch the matching Node?
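A rough sketch of the path handling suggested here, assuming nothing about the yaml API: parseJsonPointer and getIn are hypothetical helpers that walk plain JS values rather than AST nodes. The unescaping order (~1 before ~0) follows RFC 6901; turning digit-only tokens into numbers is an assumption made so that "/foo/0" yields ['foo', 0] as in the comment above.

```javascript
// Hypothetical: split a JSON Pointer (RFC 6901) into path segments.
function parseJsonPointer(pointer) {
  if (pointer === '') return [] // '' refers to the whole document
  if (pointer[0] !== '/') throw new SyntaxError(`Invalid JSON pointer: ${pointer}`)
  return pointer
    .slice(1)
    .split('/')
    // Unescape per RFC 6901: '~1' -> '/' first, then '~0' -> '~'.
    .map(token => token.replace(/~1/g, '/').replace(/~0/g, '~'))
    // Assumption: digit-only tokens are treated as array indices.
    .map(token => (/^(0|[1-9][0-9]*)$/.test(token) ? Number(token) : token))
}

// Hypothetical: walk a plain JS value along a path; undefined if it is missing.
function getIn(value, path) {
  for (const key of path) {
    if (value === null || typeof value !== 'object') return undefined
    value = value[key]
  }
  return value
}

const data = { foo: ['bar', 'baz'], 'a/b': { 'm~n': 42 } }
console.log(parseJsonPointer('/foo/0')) // [ 'foo', 0 ]
console.log(getIn(data, parseJsonPointer('/foo/0'))) // bar
console.log(getIn(data, parseJsonPointer('/a~1b/m~0n'))) // 42
```

A Document#get(path) along the lines proposed would replace getIn with a walk over collection nodes, but the pointer parsing could stay separate as suggested.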

Regarding comments, what use case did you have in mind for the object with comments? My suspicion is that once you account for all the comments that YAML allows for, you'd end up with something very similar to the existing Document object.

MikeRalphson · 2019-01-07T11:11:27Z

Regarding comments, what use case did you have in mind for the object with comments?

Simply for passing into existing code which can only handle native JS objects in a fixed representation, and receiving it back (possibly mutated) and then writing it back out as YAML.

For example, a routine which converts an OpenAPI 2.0 document to OpenAPI 3.x. The comments could be converted into x-comment properties (and I understand there's complexity around commentBefore etc.) and then back into YAML comments on stringification. Hence this would possibly be a plugin or separate module rather than an addition to core.

eemeli · 2019-01-26T19:19:42Z

@MikeRalphson I updated your branch with the changes I'd requested, so that this can be merged and released along with other recent changes.

Essentially, the public API is now a new getter rangeAsLinePos that's available on all CST nodes, along with a context.root reference for the current document. The latter is used by the former in order to cache the line start indices at cstdoc.lineStarts.

The getLinePos(offset, cst) function itself will be available at yaml/dist/cst/getLinePos, though that probably doesn't need to be documented.
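To make the final design concrete, here is a sketch under stated assumptions; it is not the library's code. CstDocument and CstNode are hypothetical stand-ins for the CST classes, the document is assumed to keep its source in a src property, and getLinePos accepts either a string or such a document, caches line starts on lineStarts, returns undefined for out-of-range offsets, and reports one-indexed positions as adopted in 1.3.1.

```javascript
// Hypothetical sketch of the design described above; names and shapes are
// assumptions, not the yaml library's actual source.

// Offsets at which each line starts; the first line always starts at 0.
function findLineStarts(src) {
  const starts = [0]
  for (let i = src.indexOf('\n'); i !== -1; i = src.indexOf('\n', i + 1)) {
    starts.push(i + 1)
  }
  return starts
}

// Accepts either a source string or a document-like object with a `src` string.
// Returns one-indexed { line, col }, or undefined if the offset is out of range.
function getLinePos(offset, cst) {
  let src
  let lineStarts
  if (typeof cst === 'string') {
    src = cst
    lineStarts = findLineStarts(src)
  } else if (cst && typeof cst.src === 'string') {
    src = cst.src
    // Compute once, then cache on the document for later lookups.
    if (!cst.lineStarts) cst.lineStarts = findLineStarts(src)
    lineStarts = cst.lineStarts
  } else {
    return undefined
  }
  // offset === src.length is allowed so that a range may end at end of input,
  // including on a last line with no trailing newline.
  if (!Number.isInteger(offset) || offset < 0 || offset > src.length) return undefined
  let line = 0
  while (line + 1 < lineStarts.length && lineStarts[line + 1] <= offset) line += 1
  return { line: line + 1, col: offset - lineStarts[line] + 1 }
}

class CstDocument {
  constructor(src) {
    this.src = src
  }
}

class CstNode {
  constructor(root, range) {
    this.context = { root } // each node keeps a reference to its document
    this.range = range // { start, end } character offsets into root.src
  }

  get rangeAsLinePos() {
    const { root } = this.context
    const start = getLinePos(this.range.start, root)
    if (!start) return undefined
    const end = getLinePos(this.range.end, root)
    return end ? { start, end } : { start }
  }
}

// Usage: the flow sequence "[x, y]" on the second line.
const doc = new CstDocument('a: 1\nb: [x, y]\n')
const node = new CstNode(doc, { start: 8, end: 14 })
console.log(node.rangeAsLinePos)
// start { line: 2, col: 4 }, end { line: 2, col: 10 }
```

Caching on the document rather than on each node means every node that shares a context.root reuses one array of line starts, and nothing is computed until a position is actually requested.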

For errors, the line positions are now available from err.source.rangeAsLinePos.

eemeli · 2019-01-27T12:03:22Z

Just as a heads-up, I realised after releasing 1.3.0 that zero-indexing the line/col positions was surprising, so pushed out a patch release 1.3.1 that fixes our behaviour to follow the norm of one-indexing these positions.

Optionally compute lineOffsets, refs eemeli#67

82a3644

eemeli requested changes Jan 5, 2019


MikeRalphson mentioned this pull request Jan 6, 2019

Consider moving from js-yaml to yaml module wework/speccy#243

Closed

eemeli mentioned this pull request Jan 16, 2019

Add collection accessor methods: add/delete/get/has/set #74

Merged

eemeli added 15 commits January 26, 2019 14:09

Merge branch 'master' into lineOffsets

a556e60

Drop lineOffsets from YAML options

9d22c92

In tests, split cst/getLinePos.js from cst/parse.js

ee12ab9

In src, split cst/getLinePos.js from cst/parse.js

cd8998d

Refactor findLineOffsets(src)

33dffde

Refactor charPosToLineCol(offset, lineOffsets)

69ebde8

Refactor charPosToLineCol() to require CST or string as 2nd arg; cache line starts

9ea5e8b

Stop exposing findLineOffsets(), rename export as default getLinePos()

c8ce905

Return undefined from getLinePos() if not found

7c6c4c5

Add JSDoc API comment for getLinePos()

0ede782

cst: Add a root reference to the parent document to each node context

06d0885

tests/doc: Split errors.js from corner-cases.js

58d75f5

Fix getLinePos() for positions on last line of non-newline-terminated string

79ffc81

Add rangeAsLinePos getter to CST nodes

7e2d86e

Revert changes to README

a7576df

Update docs

2a244a0

eemeli merged commit 8dcc118 into eemeli:master Jan 27, 2019
