Falsified "CI Text" test #7

Open
felixonmars opened this issue Feb 25, 2020 · 11 comments

Comments

@felixonmars
Contributor

  CI Text:            FAIL
    *** Failed! Falsified (after 77 tests and 6 shrinks):
    "\43929"
    "\43929" /= "\5065"
    Use --quickcheck-replay=93264 to reproduce.
@phadej
Collaborator

phadej commented Feb 25, 2020

Which GHC is this? (toUpper and friends change with the Unicode spec, and different base versions use different ones.)

@phadej
Collaborator

phadej commented Feb 25, 2020

Ok, I see:

-- on GHC-8.6
Prelude Data.Char> generalCategory '\43929'
NotAssigned

-- on GHC-8.8
Prelude Data.Char> generalCategory '\43929'
LowercaseLetter
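A minimal program to see which Unicode tables the local base ships with: it prints the general category of both code points from the failure. The answer depends on the GHC version (as the two sessions above show), so no expected output is given. The helper names category and describe are just for illustration.

```haskell
import Data.Char (GeneralCategory (..), chr, generalCategory, toUpper)
import Numeric (showHex)

-- Look up a code point's general category in the Unicode tables
-- compiled into this GHC's base library.
category :: Int -> GeneralCategory
category = generalCategory . chr

-- Render a code point as "U+XXXX (decimal): Category".
describe :: Int -> String
describe n =
  "U+" ++ pad (map toUpper (showHex n "")) ++ " (" ++ show n ++ "): " ++ show (category n)
  where
    -- Left-pad the hex digits to at least four characters.
    pad s = replicate (4 - length s) '0' ++ s

main :: IO ()
main = mapM_ (putStrLn . describe) [43929, 5065]
```

Running it under GHC 8.6 and GHC 8.8 should reproduce the NotAssigned / LowercaseLetter difference for U+AB99.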

This will be fixed when the change in https://github.com/nick8325/quickcheck/pull/274/files makes it to Hackage.

@felixonmars
Contributor Author

It's GHC 8.8.2 in case it still matters :P

@phadej
Collaborator

phadej commented Feb 25, 2020

And it seems that behavior is "wrong" in Text:

Data.Text.Lazy> toCaseFold  "\43929" :: Text
"\5065"

I guess what happens to unassigned characters is undefined, so I don't know what should or shouldn't happen (one could argue that doing nothing is what should happen).

@phadej
Collaborator

phadej commented Feb 25, 2020

@phadej
Collaborator

phadej commented Feb 25, 2020

Ok, found an issue:

Cherokee letters should fold to uppercase, but now they don't converge:

Prelude Data.Char Data.Text> toCaseFold "\43929" 
"\5065"
Prelude Data.Char Data.Text> toCaseFold "\5065"
"\43929"
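The documented caseless-match contract only holds if folding is idempotent, i.e. toCaseFold (toCaseFold t) == toCaseFold t. A small sketch that scans the Cherokee (U+13A0..U+13FF) and Cherokee Supplement (U+AB70..U+ABBF) blocks and lists every code point where a second fold changes the result. The block ranges and helper names are my additions; the list it prints depends on the installed text version.

```haskell
import Data.Char (chr)
import qualified Data.Text as T

-- True when folding a second time leaves the result unchanged.
foldIsIdempotent :: T.Text -> Bool
foldIsIdempotent t = T.toCaseFold folded == folded
  where
    folded = T.toCaseFold t

-- The Cherokee and Cherokee Supplement blocks.
cherokeeCodePoints :: [Int]
cherokeeCodePoints = [0x13A0 .. 0x13FF] ++ [0xAB70 .. 0xABBF]

-- Code points whose single-character string does not fold idempotently.
nonIdempotent :: [Int]
nonIdempotent =
  [n | n <- cherokeeCodePoints, not (foldIsIdempotent (T.singleton (chr n)))]

main :: IO ()
main = do
  putStrLn ("non-idempotent code points: " ++ show (length nonIdempotent))
  print nonIdempotent
```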

The docs say:

toCaseFold :: Text -> Text

O(n) Convert a string to folded case. Subject to fusion.

This function is mainly useful for performing caseless (also known as case insensitive) string comparisons.

A string x is a caseless match for a string y if and only if:

toCaseFold x == toCaseFold y
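Read literally, this definition implies that every string should be a caseless match for its own folded form. A sketch of that check; caselessEq and matchesOwnFold are hypothetical helper names, not part of text:

```haskell
import qualified Data.Text as T

-- Caseless comparison exactly as the toCaseFold documentation defines it.
caselessEq :: T.Text -> T.Text -> Bool
caselessEq x y = T.toCaseFold x == T.toCaseFold y

-- Any string should be a caseless match for its own case fold.
matchesOwnFold :: T.Text -> Bool
matchesOwnFold x = caselessEq x (T.toCaseFold x)

main :: IO ()
main = do
  -- Full case folding maps the sharp s to "ss", so these match.
  print (caselessEq (T.pack "Stra\223e") (T.pack "STRASSE"))
  -- With the folding shown above, the Cherokee inputs would fail this check.
  print (map (matchesOwnFold . T.pack) ["\43929", "\5065"])
```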

The Unicode FAQ at https://unicode.org/faq/casemap_charprop.html says:

Q: What happens if the uppercase letter is the one that is already encoded?
A: That situation is more complicated. When the existing encoded letter is an uppercase letter and the proposal is to encode a new lowercase letter case pair for it, that is normally disallowed. The case folding for the existing uppercase letter would change, and that is blocked by the requirement for case folding stability. In exceptional situations, if a lowercase letter must be added, it would need to be case-folded to the existing uppercase letter, rather than changing the case folding for that existing letter. Such an exceptional situation did, in fact, apply for the addition of Cherokee lowercase syllables in Version 8.0. Cherokee case folding rules were specified to map to the old uppercase syllables, to preserve case folding stability for them.


Could you report this to the text repository? Something is wrong with the Cherokee handling: folding of Cherokee letters is not correct.

@felixonmars
Contributor Author

@phadej Sure. Thanks for investigating!

@felixonmars
Contributor Author

  CI Text:            FAIL
    *** Failed! Falsified (after 72 tests and 6 shrinks):
    "\5115"
    "\5115" /= "\5107"
    Use --quickcheck-replay=960636 to reproduce.

There is yet another CI Text failure with tasty-golden 2.3.3.2; it should be a different issue, though.

@phadej
Collaborator

phadej commented Apr 28, 2020

It is the same issue:

Unicode Character 'CHEROKEE SMALL LETTER YU' (U+13FB) -- 5115

@felixonmars
Contributor Author

Hrm, weird that it's only triggered after a tasty-golden update though. Sorry for the noise.

lyokha added a commit to lyokha/binary-instances that referenced this issue Jul 28, 2020
The foldedCase is derived from the original string, while the latter
is unique. So putting foldedCase instead of the original string loses
data, whereas getting from the original string always restores the
CI object correctly, since it was built from the original string.

Probably it should also fix weird cases like in issue haskellari#7.

My own case was restoring serialized ResponseHeaders from
Network.HTTP.Types. Say, putting Custom-Header and restoring it later
yields custom-header (when I read it with original), whereas putting
the original string restores it correctly as Custom-Header (when I
read it with original).
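A sketch of the difference the commit describes, assuming the failing property is an encode/decode round trip. CIText, mk, roundTripOriginal and roundTripFolded are hypothetical stand-ins, not the case-insensitive or binary-instances API; the sketch only relies on Data.Binary and Data.Text.

```haskell
import Data.Binary (Binary (..), decode, encode)
import qualified Data.Text as T

-- Hypothetical stand-in for case-insensitive's CI Text: keeps both forms.
data CIText = CIText {original :: T.Text, foldedCase :: T.Text}

instance Show CIText where
  show = show . original

-- Build a value; the folded form is always derived from the original.
mk :: T.Text -> CIText
mk t = CIText t (T.toCaseFold t)

-- Equality ignores case: only the folded forms are compared.
instance Eq CIText where
  a == b = foldedCase a == foldedCase b

-- Serialize the original string; decoding recomputes the folded form via mk.
instance Binary CIText where
  put = put . T.unpack . original
  get = mk . T.pack <$> get

-- Round trip through the instance above: lossless.
roundTripOriginal :: CIText -> CIText
roundTripOriginal = decode . encode

-- Round trip that stores only the folded form: the original casing is lost.
roundTripFolded :: CIText -> CIText
roundTripFolded = mk . T.pack . decode . encode . T.unpack . foldedCase

main :: IO ()
main = do
  let header = mk (T.pack "Custom-Header")
  print (original (roundTripFolded header))   -- prints "custom-header"
  print (original (roundTripOriginal header)) -- prints "Custom-Header"
```

Storing only the folded form also ties equality to fold idempotence. Given the folding shown earlier in the thread ("\43929" folds to "\5065", which folds back to "\43929"), roundTripFolded (mk "\43929") would compare unequal to its input. This matches the shape of the original failure.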
@trofi

trofi commented Aug 20, 2020

This text bug makes binary-instances tests non-deterministic: sometimes they pass, sometimes they fail.

I suggest making the tests either always fail, by adding the counterexample explicitly, or always pass, by filtering out the known problematic cases.
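Both options can be sketched without QuickCheck, using fold idempotence as a stand-in for the real property. regressionCases, isCherokee and foldIdempotentFiltered are hypothetical names, and treating the whole Cherokee blocks as "known problematic" is an assumption. In a QuickCheck suite the filter would more naturally use ==>, which discards such inputs rather than counting them as passes.

```haskell
import Data.Char (ord)
import qualified Data.Text as T

-- Stand-in property: folding twice gives the same result as folding once.
foldIdempotent :: T.Text -> Bool
foldIdempotent t = T.toCaseFold (T.toCaseFold t) == T.toCaseFold t

-- Option 1: inputs already known to fail, checked on every run so the
-- outcome no longer depends on the random seed.
regressionCases :: [T.Text]
regressionCases = map T.pack ["\43929", "\5115"]

-- Assumed problematic range: the Cherokee and Cherokee Supplement blocks.
isCherokee :: Char -> Bool
isCherokee c = (n >= 0x13A0 && n <= 0x13FF) || (n >= 0xAB70 && n <= 0xABBF)
  where
    n = ord c

-- Option 2: skip generated inputs that contain Cherokee.
foldIdempotentFiltered :: T.Text -> Bool
foldIdempotentFiltered t
  | T.any isCherokee t = True -- treated as passing: known text bug
  | otherwise = foldIdempotent t

main :: IO ()
main = do
  print (map foldIdempotent regressionCases)
  print (map (foldIdempotentFiltered . T.pack) ["Hello", "\43929"])
```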

DanBurton added a commit to commercialhaskell/stackage that referenced this issue Feb 15, 2021