Tweak existing Unicode tests, add a Unicode version test #212
There seem to be a fair number of discrepancies between what the tests think the files are called and what the files are actually called:
- There seems to be a general problem that the tests refer to unicode-v... where the files are named unicode.v...
- Additionally, some (but not all) of the versions less than 10 expect the filename to have a leading zero where they do not (e.g., unicode-v06... vs. unicode-v6...).
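One way to cross-check both kinds of mismatch is to compare the names the tests reference with the names on disk, using a normalization that ignores the separator before the version and any leading zero. The following is a minimal Python sketch, not part of the test suite or of href-check.xsl; the function names and the file names in the usage lines are made up for illustration.

```python
import re


def normalize(name):
    """Map a file name to a key that ignores the two differences above:
    '-' versus '.' before the 'v', and leading zeros in the version number."""
    return re.sub(r"[-.]v0*(\d)", r".v\1", name)


def find_mismatches(referenced, actual):
    """Return (referenced_name, closest_actual_name_or_None) for every
    referenced name that does not exist verbatim among the actual names."""
    actual_set = set(actual)
    by_key = {normalize(a): a for a in actual}
    report = []
    for ref in referenced:
        if ref in actual_set:
            continue  # exact match, nothing to report
        report.append((ref, by_key.get(normalize(ref))))
    return report


# Hypothetical names, for illustration only.
referenced = ["unicode-v06.example", "unicode-v15.example", "other.example"]
actual = ["unicode.v6.example", "unicode.v15.example", "other.example"]

for ref, suggestion in find_mismatches(referenced, actual):
    print(f"{ref}: not found; closest actual name: {suggestion}")
```

A suggestion of None means no actual file matches even after normalization, which would point to a missing file rather than a naming slip.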
OK, I've tried to sync them correctly, but at this point of the evening I don't trust my eyes, so I'm asking for another review from you. I'm also troubled by the fact that href-check.xsl is reporting that it cannot find unicode-classes.inp, which seems to contradict the evidence of |
ixampl returns an empty result.
This turned out to be due to a power failure that left ixampl in a dubious
state: it appeared to be running, but because of a file that should have
been deleted but wasn't, was failing to include the results in the output.
That notwithstanding, the testcase also surfaced a bug in ixampl that ought
to be a separate test case for Earley parsers:
If a number of alternatives all start with the same nonterminal at the
same input position, only one of the alternatives actually processes the
nonterminal, and then signals the other alternatives to restart if the
nonterminal was successful.
Because Earley treats alternatives in input-position order, normally all
alternatives with the same leading nonterminal will have started by the
time the nonterminal succeeds.
However, if the leading nonterminal succeeds without consuming any input
(i.e. it has an empty alternative), it can be the case that (some of) the
other alternatives with that leading nonterminal have not yet started, and
so miss the signal to restart, and thus hang for ever.
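For illustration, a similar miss can be reproduced in a textbook Earley recognizer. The Python sketch below is not ixampl's code; the grammar encoding (a dict of tuples, single-character terminals) and the example grammar are assumptions made for the example. In this naive form the symptom is a missed parse rather than a hang: an item that starts waiting for a nullable nonterminal after that nonterminal has already completed at the same position never gets advanced. The `fix_nullable` flag applies the usual remedy of advancing over a nullable nonterminal at prediction time.

```python
def nullable_set(grammar):
    """Return the nonterminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs not in nullable and any(
                all(sym in nullable for sym in alt) for alt in alternatives
            ):
                nullable.add(lhs)
                changed = True
    return nullable


def recognize(grammar, start, text, fix_nullable=True):
    """Earley recognizer. Items are (lhs, rhs, dot, origin) tuples."""
    nullable = nullable_set(grammar)
    chart = [[] for _ in range(len(text) + 1)]
    seen = [set() for _ in range(len(text) + 1)]

    def add(pos, item):
        if item not in seen[pos]:
            seen[pos].add(item)
            chart[pos].append(item)

    for alt in grammar[start]:
        add(0, (start, alt, 0, 0))

    for pos in range(len(text) + 1):
        i = 0
        while i < len(chart[pos]):  # chart[pos] may grow while it is scanned
            lhs, rhs, dot, origin = chart[pos][i]
            i += 1
            if dot == len(rhs):
                # Completer: advance the items that were waiting for lhs.
                # When origin == pos, items added later are not seen here.
                for plhs, prhs, pdot, porigin in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        add(pos, (plhs, prhs, pdot + 1, porigin))
            elif rhs[dot] in grammar:
                # Predictor: start every alternative of the nonterminal.
                sym = rhs[dot]
                for alt in grammar[sym]:
                    add(pos, (sym, alt, 0, pos))
                if fix_nullable and sym in nullable:
                    # Do not rely on a completion that may already have happened.
                    add(pos, (lhs, rhs, dot + 1, origin))
            elif pos < len(text) and text[pos] == rhs[dot]:
                # Scanner: consume one matching input character.
                add(pos + 1, (lhs, rhs, dot + 1, origin))

    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[len(text)]
    )


# Example grammar (an assumption for this sketch); terminals are the
# characters "a", "b" and " ", which must not clash with nonterminal names.
#   S: s, "a"; C.    C: B.    B: s, "b".    s: ; " ", s.
GRAMMAR = {
    "S": [("s", "a"), ("C",)],
    "C": [("B",)],
    "B": [("s", "b")],
    "s": [(), (" ", "s")],
}

print(recognize(GRAMMAR, "S", "b", fix_nullable=False))
print(recognize(GRAMMAR, "S", "b", fix_nullable=True))
```

With this grammar, B starts waiting for s only after the empty alternative of s has completed at position 0, so the naive run rejects "b" while the run with the flag set accepts it.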
I have to admit, I have long stared at that bit of code, and wondered if
there was an edge case that would fail, and promised myself I would analyse
it at some point. Well, this test case forced me to do it.
Anyway, after changing the ixml to divert around this bug, ixampl returns
15.0
So, my suggestion would be to factor out the s's so that this test only
tests Unicode, and add another test for the bug.
By the way, I edited my earlier version of the Unicode version test in this
way, but hadn't published it yet:
Unicode: version.
@version: v15; v14; v13; v12; pre-v12.
-v15: Lo, Lo, Lo, Lo, +"15".
-v14: Cn, Lo, Lo, Lo, +"14".
-v13: Cn, Cn, Lo, Lo, +"13".
-v12: Cn, Cn, Cn, Lo, +"12".
-pre-v12: Cn, Cn, Cn, Cn, +"pre 12".
-Lo: -[Lo], -#a.
-Cn: -[Cn], -#a.
Thus producing the output:
<Unicode version='15'/>
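The decision table that this grammar encodes can be restated outside ixml. The Python sketch below is an illustration only: it assumes the four probe characters are supplied in the order v15, v14, v13, v12, as in the grammar, and the function and constant names are hypothetical. Each probe is either an assigned letter (Lo) or unassigned (Cn) in the processor's Unicode data, and the number of leading Cn probes selects the version label.

```python
# Labels in the same order as the alternatives of the grammar above.
VERSION_LABELS = ["15", "14", "13", "12", "pre 12"]


def unicode_version(categories):
    """Return the version label for four probe categories.

    categories: general-category names ("Lo" or "Cn") of the probes for
    v15, v14, v13 and v12, in that order.
    """
    if len(categories) != 4 or any(c not in ("Lo", "Cn") for c in categories):
        raise ValueError("expected four categories, each 'Lo' or 'Cn'")
    # Count the leading probes that are still unassigned.
    unassigned = 0
    while unassigned < 4 and categories[unassigned] == "Cn":
        unassigned += 1
    # As in the grammar, every remaining probe must be an assigned letter;
    # any other pattern matches none of the five alternatives.
    if any(c != "Lo" for c in categories[unassigned:]):
        raise ValueError("no alternative matches this pattern")
    return VERSION_LABELS[unassigned]


print(unicode_version(["Lo", "Lo", "Lo", "Lo"]))
print(unicode_version(["Cn", "Cn", "Lo", "Lo"]))
```

In Python the categories for real probe characters could be obtained with `unicodedata.category`, which reports them according to the Unicode database bundled with the interpreter.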
Steven
|
The commit labeled "Add tests for shared nullable prefixes, adjust unicode version test" adds a separate test set for shared nullable prefixes and changes s in the unicode-version-diagnostic grammar to be non-nullable.
Steven Pemberton ***@***.***> writes:
That notwithstanding, the testcase also surfaced a bug in ixampl that
ought to be a separate test case for Earley parsers:
...
So, my suggestion would be to factor out the s's so that this test
only tests Unicode, and add another test for the bug.
I'm happy to add a test for multiple right-hand sides with the same
leading nullable nonterminal.
I'm not so sure about reformulating the Unicode version test to
eliminate its use of whitespace. To be honest, I'm not very happy that
there is no whitespace between the probe characters -- mixing writing
systems within blank-delimited tokens bothers me.
But perhaps just changing the * to + in the definition of 's' will do
the trick?
Michael
--
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
http://blackmesatech.com
|
This pull request
Currently MarkupBlitz passes the new test, as does jwiXML (if the grammar is modified to work around its lack of support for . as a name character); both return a result of Unicode 15.0. Coffeepot rejects the input in a way I don't understand (which may reflect an issue with the encoding of the input); ixampl returns an empty result. Given this mixture of successful runs and failures, there may be an issue with the test input or the test grammar, but I think the easiest way to find such an issue is to check the test in so that more people can run it.