Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syntactic predicates cause nonsensical error messages #194

Closed
KamilaBorowska opened this issue Jul 31, 2013 · 6 comments
Closed

Syntactic predicates cause nonsensical error messages #194

KamilaBorowska opened this issue Jul 31, 2013 · 6 comments
Labels
Milestone

Comments

@KamilaBorowska
Copy link

With the following grammar (that matches stuff like "abc", "aabbcc", "aaabbbccc"...), empty test code shows 'Line 1, column 1: Expected end of input but end of input found'. I think it should say 'Expected "a"' instead, as lookahead is looking for 'a'.

S = &(A !'b') 'a'+ B
A = 'a' A? 'b'
B = 'b' B? 'c'
@dmajda
Copy link
Contributor

dmajda commented Aug 16, 2013

Thanks for the report. I plan to look into polishing error messages in 0.9 timeframe. I'll also have a look at this problem at that time.

@dmajda dmajda changed the title Lookahead makes nonsensical error messages Syntactic predicates cause nonsensical error messages Aug 14, 2015
@dmajda dmajda modified the milestones: 0.10.0, 0.9.0 Aug 14, 2015
@dmajda
Copy link
Contributor

dmajda commented Jun 28, 2016

In 0.9.0 and current master, the message changed to “Expected undefined but end of input found.”.

Minimal grammar reproducing the problem (with empty input):

start = &"a"

The bug is caused by a combination of two issues:

  1. & and ! suppress error reporting, which means they don’t produce any expectations on match failure.
  2. The message-building code doesn’t expect an empty expectation list and can’t deal with it correctly.

I’m not sure about the right solution yet.

@Mingun
Copy link
Contributor

Mingun commented Jun 28, 2016

I assume that it can be fixed if in &-predicates the expectations will be produced, but they will still not generate an error. For ! predicates, perhaps, it is necessary to negative captured expectations (!"a" must produce expectation expected not "a"), but I can't invent an example in which this behavior would be required.

@dmajda
Copy link
Contributor

dmajda commented Jul 3, 2016

I assume that it can be fixed if in &-predicates the expectations will be produced, but they will still not generate an error.

Yes.

For ! predicates, perhaps, it is necessary to negative captured expectations (!"a" must produce expectation expected not "a"), but I can't invent an example in which this behavior would be required.

I’m thinking along similar lines — adding a negative reporting flag flipped by the ! predicate. It would cause the expectations generated by matching inside the ! predicate to be reported negated, except that instead of “not X” I’d use “something else than X”, which works better grammatically. Using ! inside ! would flip the flag again, so that e.g. a failed match in !(!"a")) would get reported positively, which corresponds to the semantics of this expression.

One thing I’m not sure about is how to express negative expectations in the expected property of syntax errors. For example, what expectations should matching an empty input with this grammar produce?

start = "a" / !"b" . / "c"

One approach is to embed the value of the negative reporting flag into the expectations:

[
  { type: "literal", text: "a", ignoreCase: false, negative: false },
  { type: "literal", text: "b", ignoreCase: false, negative: true },
  { type: "literal", text: "c", ignoreCase: false, negative: false }
]

Another approach is to introduce a new expectation type to express negation and use expectation nesting:

[
  { type: "literal", text: "a", ignoreCase: false },
  {
    type: "not",
    expectation: { type: "literal", text: "b", ignoreCase: false }
  },
  { type: "literal", text: "c", ignoreCase: false }
]

The first approach is simpler but it complicates the common case of positive expectations and it would probably lead to more complicated and slower code. The second approach introduces the complexity of nested expectations but keeps the common case as it is now and the code implementing it would likely be smaller and faster.

As for the usefulness, negative reporting will make some error messages more accurate. For example, consider this grammar:

string = "'" (!"'" .)* "'"

With input ’a it currently reports “Expected "'" or any character but end of input found.”, which is slightly incorrect. Reporting “Expected "'" or something else than "'" but end of input found.” is more precise (although it sounds a bit like Vulcan logic).

In any case, if we enable reporting for &, some kind of reporting for ! is needed for consistency across both predicates.

@dmajda
Copy link
Contributor

dmajda commented Jul 15, 2016

There is a bigger problem with negative reporting than how to express negative expectations in the expected property of syntax errors. Consider this trivial grammar:

start = !"a"

Now let’s use it on input a. Given current architecture with the negative reporting flag added (toggled on the ! boundary) and silent failures for predicates discarded, "a" will be matched and the match will succeed, producing no expectations at all. But we’d like the match to produce a negative expectation, otherwise the error message and the expected property won’t be right. Reversed situation will occur with input not matching "a" — an expectation will be produced while we don’t want to have any.

To implement the desired behavior, we have to do one of two things:

  1. Check for the value of the negative reporting flag in the successful path of matching code (implementation of ACCEPT_N and ACCEPT_STRING bytecode instructions) and produce an expectation if the flag is set. Similarly, check for the flag value in the unsuccessful path (implementation of FAIL) and don’t produce an expectation if the flag is set.
  2. Track the value of the negative reporting flag statically during compilation and compile the code accordingly. Note this means two versions of the code would need to exist for any expression used both in and out of !.

Option 1 is relatively straightforward but will likely result in severe performance degradation (a need to check a flag on every match, successful or not), which makes it unviable in my eyes. Option 2 is somewhat complex to implement, but it won’t result in performance degradation compared to the current state (except when inside the ! predicate), which makes it a preferred option.

Given the relative complexity of implementation of option 2 and given that I plan to implement other static tracking code as part of performance optimization work going into 1.0.0, I’m moving this issue to 1.0.0 as well.

@dmajda dmajda modified the milestones: 1.0.0, 0.10.0 Jul 15, 2016
@Mingun
Copy link
Contributor

Mingun commented Nov 7, 2017

The problem is more difficult, than it seems. The fact that the current test set contains the wrong tests hints at it at least and this error still remained unnoticed:

  • the positive predicate test only by chance returns the correct set of expectations. However absence of any expectations from &'b' shall come not because they are discarded, and just because expression &'b' in principle does not match any input, therefore, and you should not expect from it any expectations. See more below.
  • the negative predicate test shall return following 3 expectations, however the second variant is missing, though as it is easily possible to be convinced, it is accepted by a parser. The fact that in the generator there is a bug because of which the input c is not accepted, despite assurances is unexpected however:
    • { type: "literal", text: "a", ignoreCase: false }
    • { type: "end" }
    • { type: "literal", text: "c", ignoreCase: false }

To get such results, it is necessary to remember that predicates do not move a parse position. It means that all expectations received from expression under a predicate shall be executed along with expectations which turn out from further expressions. Below analysis of test cases.


Case &'b' actually means &'b' EOF and generates 2 expectations required at the same time:

  • { type: "literal", text: "b", ignoreCase: false }
  • { type: "end" }

It is obvious that they are contradictory -- expression cannot be at the same time to correspond to the end of stream and the b character. Thus, it just is deleted from a result array of expectations.


Case !'b' turns in !'b' EOF, that gives the following expectations:

  • { not: true, type: "literal", text: "b", ignoreCase: false } (assume that we use not key to invert expected value)
  • { type: "end" }

Their combining and simplification of expression gives that the end of stream just expected. From this, in particular, follows that grammar

start = 'a' / !'b' / 'c'

is equivalent to

start = 'a' / !'b' EOF / 'c'
EOF = !.

and even grammar

start = 'a' / EOF / 'c'
EOF = !.

however only last 2 parse an input c as well as is expected.

Mingun added a commit to Mingun/pegjs that referenced this issue Dec 20, 2017
…me FAIL to EXPECT.

FAILED constant now pushed to the stack by the regular PUSH_FAILED opcode.
Mingun added a commit to Mingun/pegjs that referenced this issue Dec 20, 2017
Mingun added a commit to Mingun/pegjs that referenced this issue Dec 20, 2017
Mingun added a commit to Mingun/pegjs that referenced this issue Dec 20, 2017
Mingun added a commit to Mingun/pegjs that referenced this issue Dec 20, 2017
Mingun added a commit to Mingun/pegjs that referenced this issue Dec 20, 2017
Mingun added a commit to Mingun/pegjs that referenced this issue Dec 20, 2017
Mingun added a commit to Mingun/pegjs that referenced this issue Dec 20, 2017
Mingun added a commit to Mingun/pegjs that referenced this issue Jan 5, 2018
…me FAIL to EXPECT.

FAILED constant now pushed to the stack by the regular PUSH_FAILED opcode.

Conflicts:
	lib/compiler/passes/generate-bytecode.js
	lib/compiler/passes/generate-js.js
	test/spec/unit/compiler/passes/generate-bytecode.spec.js
Mingun added a commit to Mingun/pegjs that referenced this issue Jan 5, 2018
…tch result of expression

Conflicts:
	lib/compiler/passes/generate-bytecode.js
	test/spec/unit/compiler/passes/generate-bytecode.spec.js
Mingun added a commit to Mingun/pegjs that referenced this issue Jan 5, 2018
…med` node

Conflicts:
	lib/compiler/passes/generate-bytecode.js
	test/spec/unit/compiler/passes/generate-bytecode.spec.js
Mingun added a commit to Mingun/pegjs that referenced this issue Jan 5, 2018
Conflicts:
	lib/compiler/passes/generate-js.js
	test/spec/unit/compiler/passes/generate-bytecode.spec.js
Mingun added a commit to Mingun/pegjs that referenced this issue Jan 5, 2018
Conflicts:
	test/spec/unit/compiler/passes/generate-bytecode.spec.js
Mingun added a commit to Mingun/pegjs that referenced this issue Jan 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants