Tweak existing Unicode tests, add a Unicode version test #212

cmsmcq · 2023-11-17T17:58:59Z

This pull request:

- Adds a dependency flag to the existing 'unicode-classes' test case, specifying that it requires Unicode 14.0 or higher.
- Adds a unicode-version-check test set with a grammar and input that should produce output identifying the version of Unicode used by the ixml processor. This test set is intended to cover Unicode versions 6.0 through 15.1.
- Renames the existing unicode-range1 test case in the unicode-range2 test set as unicode-range2. The existing name is not an error (test case names only have to be unique within a test set) but it seemed unnecessarily confusing.

Currently MarkupBlitz passes the new test, as does jwiXML (if the grammar is modified to work around lack of support for . as a name character); both return a result of Unicode 15.0. Coffeepot rejects the input in a way I don't understand (which may reflect an issue with the encoding of the input); ixampl returns an empty result. Given this mixture of successful runs and failures, there may be an issue with the test input or test grammar, but I think the easiest way to find such an issue is to check the test in so more people can run it.

ndw

There seem to be a fair number of discrepancies between what the tests think the files are called and what the files are actually called:

There seems to be a general problem that the tests refer to unicode-v... where the files are named unicode.v....
Additionally, some (but not all) of the versions less than 10 expect the filename to have a leading zero where the actual filenames do not (e.g., unicode-v06... vs. unicode-v6...).
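
A check along the following lines can surface this kind of mismatch mechanically. This is only a minimal sketch, not the repository's href-check.xsl: the function names and the file names used in the usage note are hypothetical, and it assumes the only differences of interest are the separator characters and leading zeros in version numbers.

```python
import re


def normalize(name):
    """Reduce a file name to a comparison key (hypothetical helper).

    Assumption: names that differ only in '-', '.', '_' separators or in
    leading zeros of a number are meant to be the same file.
    """
    key = re.sub(r"[-._]", "", name.lower())
    # Drop zeros that start a run of digits, e.g. 'v06' -> 'v6'.
    return re.sub(r"(?<!\d)0+(?=\d)", "", key)


def check_references(referenced, actual):
    """Map each referenced name that is missing to its likely real files."""
    actual_set = set(actual)
    by_key = {}
    for name in actual:
        by_key.setdefault(normalize(name), []).append(name)
    report = {}
    for name in referenced:
        if name not in actual_set:
            report[name] = by_key.get(normalize(name), [])
    return report
```

For example, with the made-up names `unicode-v06.inp` (referenced) and `unicode.v6.inp` (on disk), `check_references` reports the first as missing and offers the second as the probable intended file.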

cmsmcq · 2023-11-22T03:08:39Z

OK, I've tried to sync them correctly, but at this point of the evening I don't trust my eyes, so I'm asking for another review from you.

I'm also troubled by the fact that href-check.xsl is reporting that it cannot find unicode-classes.inp, which seems to contradict the evidence of ls.

spemberton · 2023-11-22T12:16:10Z

"ixampl returns an empty result."

This turned out to be due to a power failure that left ixampl in a dubious state: it appeared to be running, but because a file that should have been deleted was still present, it was failing to include the results in the output.

That notwithstanding, the test case also surfaced a bug in ixampl that ought to be a separate test case for Earley parsers. If a number of alternatives all start with the same nonterminal at the same input position, only one of the alternatives actually processes the nonterminal, and then signals the other alternatives to restart if the nonterminal was successful. Because Earley treats alternatives in input-position order, normally all alternatives with the same leading nonterminal will have started by the time the nonterminal succeeds. However, if the leading nonterminal succeeds without consuming any input (i.e. it has an empty alternative), it can be the case that (some of) the other alternatives with that leading nonterminal have not yet started, and so miss the signal to restart, and thus hang for ever.

I have to admit, I have long stared at that bit of code, and wondered if there was an edge case that would fail, and promised myself I would analyse it at some point. Well, this test case forced me to do it.

Anyway, after changing the ixml to divert around this bug, ixampl returns 15.0. So, my suggestion would be to factor out the s's so that this test only tests Unicode, and add another test for the bug.

By the way, I edited my earlier version of the Unicode version test in this way, but hadn't published it yet:

Unicode: version.
@version: v15; v14; v13; v12; pre-v12.

…

-v15: Lo, Lo, Lo, Lo, +"15".
-v14: Cn, Lo, Lo, Lo, +"14".
-v13: Cn, Cn, Lo, Lo, +"13".
-v12: Cn, Cn, Cn, Lo, +"12".
-pre-v12: Cn, Cn, Cn, Cn, +"pre 12".

-Lo: -[Lo], -#a.
-Cn: -[Cn], -#a.

Thus producing the output:

<Unicode version='15'/>

Steven
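
The decision logic in the grammar above can be restated outside ixml. The following is a minimal sketch in Python, assuming the caller already knows the general category ("Lo" or "Cn") that the processor assigns to each of the four probe characters, in the order the input lists them. The function name `infer_version` is hypothetical, and the probe characters themselves are not reproduced here.

```python
def infer_version(categories):
    """Infer a Unicode version label from four probe-character categories.

    `categories` holds one general category per probe line, in input
    order. A probe the processor does not know yet is unassigned ("Cn");
    a known one is a letter ("Lo"). Mirrors the v15 .. pre-v12 rules.
    """
    if len(categories) != 4:
        return None
    labels = ["15", "14", "13", "12"]
    for unknown, label in enumerate(labels):
        expected = ["Cn"] * unknown + ["Lo"] * (4 - unknown)
        if categories == expected:
            return label
    if categories == ["Cn"] * 4:
        return "pre 12"
    # Any other pattern matches no rule, so the parse would fail.
    return None
```

Each rule accepts exactly one pattern of known and unknown probes, so at most one alternative can match a given processor, and a pattern such as Lo, Cn, Lo, Lo matches nothing.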

cmsmcq · 2023-11-22T15:50:47Z

The commit labeled "Add tests for shared nullable prefixes, adjust unicode version test" adds a separate test set for shared nullable prefixes and changes s in the unicode-version-diagnostic grammar to be non-nullable.
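
To illustrate what "shared nullable prefix" means here, the sketch below finds rules in which two or more alternatives begin with the same nullable nonterminal. It is not taken from the test suite or from any ixml processor: the grammar representation (a dict from rule name to a list of alternatives, each a list of symbols) and both function names are assumptions made for the example, and it only looks at alternatives within a single rule.

```python
def nullable_nonterminals(grammar):
    """Return the set of nonterminals that can derive the empty string.

    Standard fixpoint: a rule is nullable if some alternative consists
    only of nullable nonterminals (an empty alternative qualifies).
    Symbols that are not keys of `grammar` are treated as terminals.
    """
    nullable = set()
    changed = True
    while changed:
        changed = False
        for name, alternatives in grammar.items():
            if name in nullable:
                continue
            if any(all(sym in nullable for sym in alt) for alt in alternatives):
                nullable.add(name)
                changed = True
    return nullable


def shared_nullable_prefixes(grammar):
    """Map each rule to the nullable nonterminals that lead 2+ alternatives."""
    nullable = nullable_nonterminals(grammar)
    found = {}
    for name, alternatives in grammar.items():
        counts = {}
        for alt in alternatives:
            if alt and alt[0] in nullable:
                counts[alt[0]] = counts.get(alt[0], 0) + 1
        shared = sorted(sym for sym, n in counts.items() if n > 1)
        if shared:
            found[name] = shared
    return found
```

With a toy grammar in which `S` has the alternatives `s, A` and `s, B` and `s` has an empty alternative, the function reports `{"S": ["s"]}`. Making `s` require at least one character, as the commit does, yields an empty report.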

spemberton · 2023-11-22T16:06:05Z

Steven Pemberton writes:

That notwithstanding, the testcase also surfaced a bug in ixampl that ought to be a separate test case for Earley parsers: ... So, my suggestion would be to factor out the s's so that this test only tests Unicode, and add another test for the bug.

I'm happy to add a test for multiple right-hand sides with the same leading nullable nonterminal.

I'm not so sure about reformulating the Unicode version test to eliminate its use of whitespace. To be honest, I'm not very happy that there is no whitespace between the probe characters -- mixing writing systems within blank-delimited tokens bothers me. But perhaps just changing the * to + in the definition of 's' will do the trick?

Michael

…

-- C. M. Sperberg-McQueen Black Mesa Technologies LLC http://blackmesatech.com

Tweak existing Unicode tests, add a Unicode version test

64714d2

cmsmcq requested review from ndw, johnlumley and spemberton November 17, 2023 17:58

ndw requested changes Nov 21, 2023

Correct file names for unicode version diagnostic test

8c5ceac

cmsmcq requested a review from ndw November 22, 2023 03:06

Add tests for shared nullable prefixes, adjust unicode version test

bef5796

Merge branch 'master' into more-metadata

643d240

cmsmcq merged commit 942c016 into invisibleXML:master Nov 28, 2023
2 checks passed

cmsmcq deleted the more-metadata branch November 28, 2023 16:11
