Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

suggested changes for the collator based sort #2

Merged
merged 3 commits into from Aug 16, 2020

Conversation

splitbrain
Copy link

second try, after I messed up the branch for #1

The proper use of data providers now make it much easier to add
addtional languages to the test
Now the use of the intl extension can be turned off, allowing for easy
testing of the fallback. The test now inherits from the collator test so
we avoid too much duplicate code
Copy link
Owner

@moisesbr-dw moisesbr-dw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done! Only some missing details...

inc/Utf8/Sort.php Outdated Show resolved Hide resolved
inc/Utf8/Sort.php Show resolved Hide resolved
inc/Utf8/Sort.php Show resolved Hide resolved
inc/Utf8/Sort.php Show resolved Hide resolved
inc/Utf8/Sort.php Show resolved Hide resolved
inc/Utf8/Sort.php Show resolved Hide resolved
inc/Utf8/Sort.php Show resolved Hide resolved
_test/tests/inc/sort_with_collator.test.php Outdated Show resolved Hide resolved
inc/Utf8/Sort.php Show resolved Hide resolved
Copy link
Owner

@moisesbr-dw moisesbr-dw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent job! Please keep the comments as indicated, they will guide anyone interested in providing new examples in other languages.

inc/Utf8/Sort.php Show resolved Hide resolved
inc/Utf8/Sort.php Show resolved Hide resolved
@moisesbr-dw
Copy link
Owner

@splitbrain thanks for your suggestions. All in all, everything is OK except minor details. The most important thing missing is to substitute Sort::xxx for intl_xxx in files outside the inc folder.

I have revised everything. This way, if you are busy with other things, I can accept the changes and make the appointed corrections later.

By the way, it would be interesting to use the intl extension in other contexts -- date format, for example.

@splitbrain
Copy link
Author

splitbrain commented Aug 15, 2020

If you have all the changes ready to be applied, please go ahead. Thanks for double checking my stuff. Once this is done, we can merge it into code.

@moisesbr-dw moisesbr-dw merged commit ec16ab5 into moisesbr-dw:sort-with-collator Aug 16, 2020
@moisesbr-dw
Copy link
Owner

@splitbrain I think you'll like to know about the Co-official statuses of German in Brazil. Just found that on Wikipedia.

@moisesbr-dw
Copy link
Owner

moisesbr-dw commented Aug 16, 2020

@splitbrain I tried the collation "an äh b c d e f g h i j k l m n os öl p q r so ßa t um üb v w x y z" but the tests failed for the umlaut pairs.
It seems that a = ä, o = ö, u = ü and s < ß in the German collation. Is it true?

@moisesbr-dw
Copy link
Owner

It seems that a = ä, o = ö, u = ü and s < ß in the German collation. Is it true?

I have commented the test code so it is utterly clear:

        // German
        // vowels with umlaut come after (thus "a" < "ä"), but are equivalent
        // in collation (thus "äpfel" < "arzt"), so a 2nd letter breaks the test
        'de' => 'a ä b c d e f g h i j k l m n o ö p q r so ßa t u ü v w x y z',

@splitbrain
Copy link
Author

https://de.wikipedia.org/wiki/Deutsches_Alphabet#Alphabetische_Sortierung has some Details (I didn't know about): According to DIN 5007:1991, there are two variants:

  1. Dictionary-Sorting (used for common words): Ä,Ö,Ü are treated exactly as A,O,U. ß is treated as ss
  2. Phonbook-Sorting (used for people's names): Ä, Ö, Ü are treated as Ae, Oe. Ue

I guess variant 1 makes sense for us. In general Germans will not care at all and are fine even with the words sorted at the end.

@moisesbr-dw
Copy link
Owner

https://de.wikipedia.org/wiki/Deutsches_Alphabet#Alphabetische_Sortierung has some Details (I didn't know about): According to DIN 5007:1991, there are two variants:

English Wikipedia says the same.

ß is treated as ss

Yes, "so ßa" works, "sz ßa" fails. I'll update the comment in the test code.

I guess variant 1 makes sense for us.

The collator uses this variant. Actually, the default collator is used for German.

In general Germans will not care at all and are fine even with the words sorted at the end.

Yeah, but Spaniards would cry if their beloved "ñ" is treated like a normal "n" or put after "z" :-)

@moisesbr-dw
Copy link
Owner

OK, I've updated the comment and the collation.

        // German
        // vowels with umlaut come after (thus "a" < "ä"), but are equivalent
        // in collation (thus "äpfel" < "arzt"), so a 2nd letter breaks the test;
        // "ß" comes after "s", but is equivalent to "ss" in collation
        'de' => 'a ä b c d e f g h i j k l m n o ö p q r sr ß st t u ü v w x y z',

moisesbr-dw added a commit that referenced this pull request Aug 16, 2020
Many minor details and use of Sort::xyz() instead of intl_xyz() in files outside the "inc" folder.
splitbrain added a commit to ssahara/dokuwiki that referenced this pull request Sep 10, 2020
* master: (111 commits)
  Update translation
  translation update
  don't crush tables too narrow. fixes dokuwiki#3250
  translation update
  Thorough tests for EO, DE, PT and ES
  translation update
  Optimized pageRestoreConfirm function
  Tests for Portuguese and Spanish
  Changes according to revisions in moisesbr-dw#2
  adjust callstack depth for deprecation message further
  better deprecation messages for self required plugin base files
  don't test on old PHP releases anymore
  increase minimum PHP version to 7.2
  fixed tests for cleanID and romanization for Greeklish
  Improved the transliteration from greek to latin.
  extension cli: do not try to upgrade bundled plugins
  Public access to patterns in external link parser
  test the collator fallback always
  cleanup for collator tests
  wrap sorting functions into their own class
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants