Optionally compute lineOffsets, refs #67 #68

Merged
merged 17 commits into from
Jan 27, 2019

Conversation

MikeRalphson
Contributor

As discussed in #67 this PR adds the optional ability to compute line offsets and provides a helper function charPosToLineCol.

Both the AST and CST parse(Document) functions can now accept a computeLineOffsets option. Tests added to ensure default behaviour is unchanged and for the new functionality.

README.md has been minimally updated, but not yet the docs. I'd appreciate some guidance on doing that.

Owner

eemeli left a comment

I think your proposed API is not the right solution here, mainly for two reasons:

  • It adds a new options argument for parseCST(), which we've managed to avoid so far.
  • It requires knowing that you may want to access the line offsets when calling parseCST(), rather than optionally doing so later.

I don't have a complete solution envisioned here, but the user story that I'd like the line offset API to solve is something like, "I've parsed this string, and now I have this offset that I'd like to identify as a line/col position." How to do that efficiently is a further problem, and definitely of less importance than having a clean API. cst.lineOffsets does sound like a relatively right sort of place for caching, but that doesn't mean that we need to populate the cache pre-emptively.

What do you think? Sorry to leave this response a bit incomplete; I'll need to return to this a bit later.

src/cst/parse.js Outdated
    return { line: undefined, col: undefined }
  const lineIndex = lineOffsets.indexOf(offset)
  if (lineIndex >= 0)
    return { line: lineIndex, col: offset - lineOffsets[lineIndex] }
Owner

This is special-case handling for when offset points to a \n character, yes? Why couldn't it be handled by the following for loop?

Contributor Author

I may have optimised prematurely here (which is a mistake without measuring it). It just avoids a search loop written in JavaScript and gives the native indexOf method a chance. As the array is always sorted, perhaps a binary-search approach would be better than this plus the loop, especially for large documents?
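
For illustration only, a binary search over the sorted offsets could look like the sketch below. The name findLineCol is hypothetical, and the sketch assumes lineOffsets holds the start offset of every line with lineOffsets[0] === 0 and returns zero-based positions; it is not the code from this PR.

```javascript
// Hypothetical helper, not part of the library.
// Assumes lineOffsets is sorted ascending and lineOffsets[0] === 0.
function findLineCol(offset, lineOffsets) {
  if (typeof offset !== 'number' || offset < 0 || lineOffsets.length === 0) {
    return { line: undefined, col: undefined }
  }
  let lo = 0
  let hi = lineOffsets.length - 1
  // Narrow down to the last index whose line start is <= offset.
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1
    if (lineOffsets[mid] <= offset) lo = mid
    else hi = mid - 1
  }
  return { line: lo, col: offset - lineOffsets[lo] }
}

// Line starts for the source 'a: 1\nb: 2\nc: 3'
const sampleOffsets = [0, 5, 10]
console.log(findLineCol(7, sampleOffsets))
```

Whether this beats a plain loop on real documents would still need measuring.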

Owner

I'd recommend first going with just a simple for loop, and then seeing if there's a real-world case that would show a need for a more complicated search pattern.
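
A minimal sketch of the simple loop, under the assumption that lineOffsets is a sorted array of line-start offsets beginning with 0 and that positions are zero-based. The function name lineColByLoop is made up for this example. An offset that coincides with a line start needs no separate indexOf branch here.

```javascript
// Hypothetical helper, not part of the library.
// Walks the line starts from the end and stops at the first one <= offset.
function lineColByLoop(offset, lineOffsets) {
  if (typeof offset !== 'number' || offset < 0) {
    return { line: undefined, col: undefined }
  }
  for (let i = lineOffsets.length - 1; i >= 0; --i) {
    const start = lineOffsets[i]
    if (start <= offset) return { line: i, col: offset - start }
  }
  return { line: undefined, col: undefined }
}

// Offset 5 is exactly the start of the second line of 'a: 1\nb: 2'
console.log(lineColByLoop(5, [0, 5]))
```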

src/cst/parse.js Outdated
src = src.replace(/\n/g, (match, offset) => {
  lf.push(offset + 1)
  return '\n'
})
Owner

src.replace() is far too heavy an operation for this; use src.indexOf('\n', fromIndex) instead. There's no reason to mutate the string here.
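
Roughly, the indexOf-based scan could look like this. The function name computeLineStarts is hypothetical, and seeding the array with 0 for the first line is an assumption of this sketch rather than something the snippet under review does.

```javascript
// Hypothetical helper, not part of the library.
// Collects the offset at which each line starts, without touching src.
function computeLineStarts(src) {
  const starts = [0] // assumption: the first line starts at offset 0
  let pos = src.indexOf('\n')
  while (pos !== -1) {
    starts.push(pos + 1) // the next line begins right after the newline
    pos = src.indexOf('\n', pos + 1)
  }
  return starts
}

console.log(computeLineStarts('a: 1\nb: 2\nc: 3'))
```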

Contributor Author

Cool. Will amend in the next revision.

src/index.js Outdated
return parseCST(src).map(cstDoc => new Document(options).parse(cstDoc))
return parseCST(src, {
computeLineOffsets: options ? options.computeLineOffsets : false
}).map(cstDoc => new Document(options).parse(cstDoc))
Owner

The result of the cst.map() operation will not contain the non-iterable properties of cst, which means that setting computeLineOffsets will not influence the output of this function.
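
The effect is easy to reproduce with a plain array; the values below are stand-ins, not real CST documents.

```javascript
// Stand-in for the array returned by parseCST(), with an extra property attached.
const cstDocs = ['doc-1', 'doc-2']
cstDocs.lineOffsets = [0, 5, 10]

// Array.prototype.map builds a new array from the indexed elements only.
const docs = cstDocs.map(d => d.toUpperCase())

const carried = 'lineOffsets' in docs
console.log(carried) // false
```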

Contributor Author

Ok, will take another look.

src/cst/parse.js Outdated
return documents
}

export function charPosToLineCol(offset, lineOffsets) {
Owner

Is there a particular reason why this function is being exported from cst/parse.js, rather than its own file?

Contributor Author

No, I was just unsure of your convention on this.

@MikeRalphson
Contributor Author

I don't have a complete solution envisioned here ... What do you think? Sorry to leave this response a bit incomplete; I'll need to return to this a bit later.

I've got no problems with computing the lineOffsets on the first call to charPosToLineCol, we can use the presence or absence of cst.lineOffsets to do that if you're happy with a property appearing post-construction of the CST?

Would that (plus the above changes) be enough for me to do the next revision of this PR?
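
As a rough sketch of that idea: the cst argument below is a stand-in object carrying a src string, which is an assumption made for this example and not the real CST shape, and lineColLazy is a hypothetical name. Positions are zero-based.

```javascript
// Hypothetical helper, not part of the library.
// Computes the line starts on first use and caches them on the cst object.
function lineColLazy(offset, cst) {
  if (!cst.lineOffsets) {
    const starts = [0]
    let pos = cst.src.indexOf('\n')
    while (pos !== -1) {
      starts.push(pos + 1)
      pos = cst.src.indexOf('\n', pos + 1)
    }
    cst.lineOffsets = starts // the property appears after construction
  }
  if (typeof offset !== 'number' || offset < 0) {
    return { line: undefined, col: undefined }
  }
  for (let i = cst.lineOffsets.length - 1; i >= 0; --i) {
    const start = cst.lineOffsets[i]
    if (start <= offset) return { line: i, col: offset - start }
  }
  return { line: undefined, col: undefined }
}

const fakeCst = { src: 'a: 1\nb: 2' } // stand-in, not a real CST
console.log(lineColLazy(8, fakeCst))
```

Callers that never ask for a position never pay for the scan, and repeated lookups reuse the cached array.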

@eemeli
Owner

eemeli commented Jan 7, 2019

I'm fine with lineOffsets getting added later. Did you have something in mind yet for locating errors, when encountered during YAML.parseDocument() or YAML.parse(), which don't (at least for now) provide easy access to the root cst object?

@MikeRalphson
Contributor Author

Did you have something in mind yet for locating errors, when encountered during YAML.parseDocument() or YAML.parse(), which don't (at least for now) provide easy access to the root cst object?

The likely (initial) recommendation I think would be to reparse the document(s) at the CST level. I don't know how acceptable you find that... We might be able to do something for the AST where keepCstNodes has been specified. When you say "for now", is access to the CST from higher levels likely to change?

I plan to follow this up with another PR (or a plugin, or a separate module) which can decorate the CST with JSON pointers, allowing a lookup from the JS object to the CST, and then to the line/column position. JSON pointers are widely used in the JSON Schema and OpenAPI worlds.

Just to make you aware, the other PR/plugin/module I have in mind is an optional ability to preserve comments when parsing to, and stringifying from, a JS object, in a configurable property key.

@eemeli
Owner

eemeli commented Jan 7, 2019

If there's a good reason to extend the API, then we can do that -- but it does require a good reason. If your application is usually interested in the CST, then it makes sense to explicitly build the cst, and then use new YAML.Document(options).parse(cst) rather than YAML.parseDocument() to build the document -- and thereby maintain your own reference to the CST.

The main use case that I'd expect for line/col references outside of those cases would be errors, when such are not expected. It would be useful, when encountering one, to be able to use the error's context to build a line/col reference to the source. Then, if the offsets are cached as cst.lineOffsets, it may make sense to add a context.root ref to each CST Node in order to access that.

I've never really worked with JSON pointers, but I'd suspect that it'd be relatively simple to write a getter method for the AST Document that would help with that. Maybe something like Document#get(path: Iterable): Node?, so you could (separately) parse a JSON pointer like "/foo/0" into ['foo', 0] and use that to fetch the matching Node?
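
A sketch of the pointer-parsing half, following the escaping rules of RFC 6901 (~1 for "/", ~0 for "~"). The function names are hypothetical, turning digit-only tokens into numbers is a heuristic assumed for this example, and the lookup walks a plain JS object rather than a Document, since Document#get is only a proposal at this point.

```javascript
// Hypothetical helper: splits a JSON pointer into path segments.
function parseJsonPointer(pointer) {
  if (pointer === '') return [] // the empty pointer refers to the whole document
  if (pointer[0] !== '/') throw new Error(`Invalid JSON pointer: ${pointer}`)
  return pointer
    .slice(1)
    .split('/')
    .map(token => {
      // Unescape ~1 before ~0, as RFC 6901 requires.
      const key = token.replace(/~1/g, '/').replace(/~0/g, '~')
      // Assumption: digit-only tokens are meant as array indices.
      return /^(0|[1-9]\d*)$/.test(key) ? Number(key) : key
    })
}

// Hypothetical lookup over plain JS data, standing in for a Document getter.
function getIn(data, path) {
  let node = data
  for (const key of path) {
    if (node == null) return undefined
    node = node[key]
  }
  return node
}

const path = parseJsonPointer('/foo/0')
console.log(path, getIn({ foo: ['first', 'second'] }, path))
```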

Regarding comments, what use case did you have in mind for the object with comments? My suspicion is that once you account for all the comments that YAML allows for, you'd end up with something very similar to the existing Document object.

@MikeRalphson
Contributor Author

Regarding comments, what use case did you have in mind for the object with comments?

Simply for passing into existing code which can only handle native JS objects in a fixed representation, and receiving it back (possibly mutated) and then writing it back out as YAML.

For example, a routine which converts an OpenAPI 2.0 document to OpenAPI 3.x. The comments could be converted into x-comment properties (and I understand there's complexity around commentBefore etc.) and then back into YAML comments on stringification. Hence possibly being a plugin or separate module rather than adding it to core.

@eemeli
Owner

eemeli commented Jan 26, 2019

@MikeRalphson I updated your branch with the changes I'd requested, so that this can be merged and released along with other recent changes.

Essentially, the public API is now a new getter rangeAsLinePos that's available on all CST nodes, along with a context.root reference for the current document. The latter is used by the former in order to cache the line start indices at cstdoc.lineStarts.

The getLinePos(offset, cst) function itself will be available at yaml/dist/cst/getLinePos, though that probably doesn't need to be documented.

For errors, the line positions are now available from err.source.rangeAsLinePos.

@eemeli eemeli merged commit 8dcc118 into eemeli:master Jan 27, 2019
@eemeli
Owner

eemeli commented Jan 27, 2019

Just as a heads-up, I realised after releasing 1.3.0 that zero-indexing the line/col positions was surprising, so pushed out a patch release 1.3.1 that fixes our behaviour to follow the norm of one-indexing these positions.
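
To make the shape of this concrete, here is a loose sketch of a node getter that caches line starts on the root and reports one-indexed positions. SketchNode, its constructor, and the { start, end } return shape are assumptions for illustration; only the names rangeAsLinePos, context.root and lineStarts come from the discussion above, and the library's actual implementation may differ.

```javascript
// Illustrative stand-in for a CST node; not the library's class.
class SketchNode {
  constructor(root, start, end) {
    this.context = { root } // root is assumed to be { src, lineStarts? }
    this.range = { start, end }
  }

  get rangeAsLinePos() {
    const root = this.context.root
    if (!root.lineStarts) {
      // Cache the line start offsets on the root the first time they are needed.
      const starts = [0]
      let pos = root.src.indexOf('\n')
      while (pos !== -1) {
        starts.push(pos + 1)
        pos = root.src.indexOf('\n', pos + 1)
      }
      root.lineStarts = starts
    }
    const toLinePos = offset => {
      for (let i = root.lineStarts.length - 1; i >= 0; --i) {
        const lineStart = root.lineStarts[i]
        // One-indexed line and column.
        if (lineStart <= offset) return { line: i + 1, col: offset - lineStart + 1 }
      }
      return { line: undefined, col: undefined }
    }
    return { start: toLinePos(this.range.start), end: toLinePos(this.range.end) }
  }
}

const root = { src: 'a: 1\nb: 2\n' }
const node = new SketchNode(root, 5, 9) // covers 'b: 2' on the second line
console.log(node.rangeAsLinePos)
```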
