Parsimmon.index yields incorrect position for tokens right before a newline ('\n') #331
Comments
Hi @hillin, this is actually working as intended. Since Parsimmon has already finished parsing
After successfully parsing a string, Parsimmon advances to the next character. I understand why this is confusing, but this is both internally easier and what a lot of APIs expect for source code ranges. The second parse is actually past the end of the document for this reason as well.
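To make the "advances to the next character" point concrete, here is a minimal sketch of an offset to line/column conversion. It is not Parsimmon's implementation; `lineColumnAt` is a hypothetical helper, and it assumes 1-based lines and columns and that `\n` counts as the last character of its line.

```javascript
// Hypothetical helper, not part of Parsimmon: convert a 0-based offset into
// a 1-based line and column, treating '\n' as the last character of its line.
function lineColumnAt(input, offset) {
  // Everything before the offset, split into the lines seen so far.
  const lines = input.slice(0, offset).split("\n");
  return {
    offset: offset,
    line: lines.length,
    column: lines[lines.length - 1].length + 1,
  };
}

const input = "1234\n";
const start = lineColumnAt(input, 0);  // before "1"
const middle = lineColumnAt(input, 4); // after "1234", before "\n"
const end = lineColumnAt(input, 5);    // after "\n", past the end of the input
console.log(start, middle, end);
```

Under these assumptions the position after `1234` is still on line 1, while the position after the newline is already line 2, column 1, even though nothing exists there.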
Thanks @wavebeem.
const parser = P.seqMap(P.index, P.digits, P.index, P.newline, P.index, function (start, value, middle, newline, end) {
  console.log(start, value, middle, newline, end);
  return value;
});
parser.parse('1234\n');
The output is:
This shows that Parsimmon treats the newline character
In reality, we are using Parsimmon to create a parser for a DSL, which will then be used in the Monaco editor. Parsimmon's interpretation of newlines has created some difficulty for us when working with Monaco's text range API (e.g. https://microsoft.github.io/monaco-editor/api/interfaces/monaco.editor.IMarkerData.html), which does not work as expected if the end of a range is, as
I can see the argument both ways and I'm pretty sure I've used software that expects the end of the token to be exclusive rather than inclusive.
// Existing behavior: report the line/column of the current offset i.
var index = Parsimmon(function(input, i) {
  return makeSuccess(i, makeLineColumnIndex(input, i));
});
///////////////////////////////////////////////////////////////////////////
// Suggested addition: report the line/column of the previous character (i - 1).
var lastIndex = Parsimmon(function(input, i) {
  return makeSuccess(i, makeLineColumnIndex(input, i - 1));
});
///////////////////////////////////////////////////////////////////////////
At a glance this seems to do what you want. You'll have to add it directly into parsimmon since
Due to the ridiculous complexity of this function, it's not something I'm interested in exposing at this time, but I could maybe see the case for a
Strictly speaking my suggestion still won't use your preferred logic for newline line/column numbering, but it will skip the issue as long as you don't actually need to include the newline in your range. Otherwise, if you really really want the
Well, I think this is the misunderstanding: I'm totally OK with using an exclusive end index for a token, and I agree it's the right way to do this. My point is that the end index should not span into the next line if the token does not. So for the example in the original post, for both inputs I'd expect the end index of
Here is another example: for the input
This still seems to me like the ideal solution is to use inclusive ranges.
I see your point more and more. In UNIX, a line is terminated with a newline, not separated by it. It definitely feels like it's part of the line, like you said. This change won't happen within Parsimmon v1 since it would be a breaking change, but I'll definitely think about this going forward. I'm going to reference this issue in #230 because it's important to think about for v2. I may end up implementing a change for this in my sister library wavebeem/bread-n-butter#34 first to see how I feel about it. But I don't plan on working on it any time soon.
To me, an exclusive ending index is still the legitimate way to go. We just need some kind of virtual index, which does not actually map to a character in the source. This is the standard design of almost all range constructs: the Range in the JavaScript Selection API, System.Range in .NET and range() in Python, to name a few. One thought is, we don't have to change the behavior of
function mark<T>(parser: Parser<T>): Parser<{value: T; begin: Index; end: Index}>;
// in which the end index should be the ideal result mentioned above
Actually this could be very useful (in my use cases) because I find myself always using the
The mark method already does this. But .mark doesn't take any options right now. You could add an options object to that method that implements the alternative line number behavior you suggested. I would certainly review the PR if you want to work on it. https://github.com/jneen/parsimmon/blob/master/API.md#parsermark
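For illustration, the alternative line number behavior could be sketched outside Parsimmon as below. This is only a sketch under assumptions: `positionAt`, `endOnSameLine` and `toMarkerRange` are hypothetical helpers rather than Parsimmon or Monaco APIs, lines and columns are 1-based, and only the four range fields of Monaco's IMarkerData are produced.

```javascript
// Hypothetical helpers for illustration; none of these are part of Parsimmon.

// Convert a 0-based offset into a 1-based line and column.
function positionAt(source, offset) {
  const lines = source.slice(0, offset).split("\n");
  return { offset, line: lines.length, column: lines[lines.length - 1].length + 1 };
}

// Exclusive end position that stays on the line of the last consumed
// character: take the position of that character and step one column right.
function endOnSameLine(source, endOffset) {
  if (endOffset === 0) return positionAt(source, 0);
  const last = positionAt(source, endOffset - 1);
  return { offset: endOffset, line: last.line, column: last.column + 1 };
}

// Build the range fields of a Monaco marker (IMarkerData) from two offsets.
function toMarkerRange(source, beginOffset, endOffset) {
  const begin = positionAt(source, beginOffset);
  const end = endOnSameLine(source, endOffset);
  return {
    startLineNumber: begin.line,
    startColumn: begin.column,
    endLineNumber: end.line,
    endColumn: end.column,
  };
}

const source = "1234\nabcd";
const digitsOnly = toMarkerRange(source, 0, 4);  // covers "1234"
const withNewline = toMarkerRange(source, 0, 5); // covers "1234\n"
console.log(digitsOnly, withNewline);
```

With this approach the end of a range that consumes the trailing newline is a virtual column one past the newline on the same line, so it never spills onto the following line. Whether that column is acceptable to a given editor API would need to be checked.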
Oops, I missed that one.
I'll see what I can do with it. BTW, for now we've switched to line-based parsing as a workaround for this issue, and also to better support partial tokenization (as expected by the Monaco editor).
In a recent commit, various packages were updated to more recent versions. Parsimmon, in particular, was bumped from 1.16.0 to 1.18.1. However, starting from version 1.17, the semantics of getting the position of the parser's head after reading a newline character (\n) seem to have changed, as suggested by jneen/parsimmon#331: instead of returning a position at the end of line N, where \n was read, it returns a position at the beginning of line N+1. This was a breaking change, given how i-LaTeX's parser is implemented. For this reason, the package was reverted to version 1.16.0.
Closing this since the project is currently not maintained and it doesn't seem like that will change soon.
outputs:
I suppose these two outputs should be identical.