[unicode-grant] Commit New version of GraphemeBreakTest.t #267

Merged
samcv merged 3 commits into perl6:master from samcv:gcb-- on May 11, 2017

samcv commented May 10, 2017

The new script tests the contents of each grapheme individually, using
the GraphemeBreakTest.txt file from the Unicode 9.0 test suite.

Previously we only checked the total number of ‘.chars’ for the
string as a whole. Here we check the string length and also that
each grapheme contains exactly the correct codepoints, in the
correct order and in the correct grapheme.
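
As a rough illustration of the difference between the two checks (a Python sketch, not the Perl 6 test itself; `check_graphemes` and its arguments are hypothetical names), a per-grapheme comparison might look like this:

```python
def check_graphemes(expected, actual):
    """Compare two segmentations, each a list of graphemes given as lists of codepoints.

    Returns a list of failure messages; an empty list means the check passed.
    """
    failures = []
    # The older style of check: only the number of graphemes (what `.chars` reports).
    if len(expected) != len(actual):
        failures.append(f"expected {len(expected)} graphemes, got {len(actual)}")
    # The stricter check: each grapheme must hold the same codepoints in the same order.
    for index, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            failures.append(f"grapheme {index}: expected {want}, got {got}")
    return failures
```

A segmentation such as [[10084], [776, 9757]] has the right count for an expected [[10084, 776], [9757]], so a count-only check would pass it, while the per-grapheme comparison reports both clusters as wrong.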

This new test uses a grammar to parse the file and generally is much more
robust than the previous script.

Running the parse class generates an array of arrays, where the index
into the outer array indicates which grapheme is meant and the inner
array lists that grapheme's codepoints.

[[10084, 776], [9757]] would indicate that the 0th grapheme is made up of
cp's 10084 and 776 and the 1st grapheme is made up of cp 9757.
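
To make the data shape concrete, here is a minimal Python sketch of turning one line of GraphemeBreakTest.txt into that array of arrays. It is not the Perl 6 grammar used in the test; `parse_break_test_line` is a hypothetical name, and the sketch assumes the usual layout of the file, where `÷` marks a break, `×` marks no break, codepoints are hexadecimal, and `#` starts a comment.

```python
def parse_break_test_line(line):
    """Parse one GraphemeBreakTest.txt line into a list of graphemes.

    Each grapheme is a list of integer codepoints. Returns None for
    blank lines and comment-only lines.
    """
    # Everything after '#' is a human-readable comment.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None

    graphemes = []
    current = []
    for token in data.split():
        if token == "÷":
            # A break closes the grapheme collected so far, if any.
            if current:
                graphemes.append(current)
                current = []
        elif token == "×":
            # No break: the next codepoint joins the current grapheme.
            continue
        else:
            current.append(int(token, 16))
    if current:
        graphemes.append(current)
    return graphemes
```

For example, the line `÷ 2764 × 0308 ÷ 261D ÷` parses to `[[10084, 776], [9757]]`, matching the example above.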


samcv commented May 10, 2017

Looks like GitHub is trying to show a diff against a file that I deleted and replaced with the new one. To see the new test file on its own, rather than a diff against a really long deleted file, go here: https://github.com/samcv/roast/blob/ce6eb28b17f3722cf93724fa768f42614d9b4d2e/S15-nfg/GraphemeBreakTest.t

Reworked it into multiple commits. Can view here: be6b376

@samcv samcv force-pushed the samcv:gcb-- branch from ce6eb28 to be6b376 May 10, 2017

samcv added some commits May 10, 2017

[unicode-grant] Commit New version of GraphemeBreakTest.t

* Add in UCD 9.0's GraphemeBreakTest.txt to
  3rdparty/Unicode/9.0.0/ucd/auxiliary/GraphemeBreakTest.txt

* Add Unicode license to 3rdparty/Unicode/LICENSE

@samcv samcv force-pushed the samcv:gcb-- branch from 829f88e to b4b72e8 May 11, 2017

@samcv samcv merged commit ad4ee6c into perl6:master May 11, 2017
