Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error message when comparing same string in NFC and NFD forms is not clear on Python 3 #3426

Closed
pekkaklarck opened this issue Apr 24, 2018 · 8 comments · Fixed by #10355
Closed
Labels
status: easy easy issue that is friendly to new contributor topic: reporting related to terminal output and user-facing messages and errors topic: rewrite related to the assertion rewrite mechanism

Comments

@pekkaklarck
Copy link

If I compare same string in different Unicode normal forms the error message doesn't show any differences between them when using Python 3. For example, if I have a test like

def test_nfc_nfd():
    nfc = 'hyv\xe4'
    nfd = 'hyva\u0308'
    assert nfc == nfd

the result I got is

>       assert nfc == nfd
E       AssertionError: assert 'hyvä' == 'hyvä'
E         - hyvä
E         + hyvä

which isn't very informative.

The problem is caused by repr('hyva\u0308') being 'hyva\u0308' which is rendered as 'hyvä'. I submitted an issue about repr() escaping combining characters (i.e. turning the result to 'hyva\\u0308') but it was closed as invalid.

In PyTest it would be possible to use ascii() instead of repr(), but that would make all non-ASCII strings unreadable and that would be worse. I guess the best solution would be using ascii() if strings look the same, but I have no idea how to detect that.

@pytestbot pytestbot added the platform: mac mac platform-specific problem label Apr 24, 2018
@pekkaklarck
Copy link
Author

Cool bot but I don't think any of the referenced issues is related. os: mac label is somewhat appropriate as the OSX file system uses NFD and is probably the most common source of Unicode strings in NFD form. The problem itself is OS agnostic.

@RonnyPfannschmidt RonnyPfannschmidt added topic: reporting related to terminal output and user-facing messages and errors status: needs information reporter needs to provide more information; can be closed after 2 or more weeks of inactivity and removed platform: mac mac platform-specific problem labels Apr 24, 2018
@pytest-dev pytest-dev deleted a comment from pytestbot Apr 24, 2018
@RonnyPfannschmidt
Copy link
Member

its not clear what to do here, but i do indeed wonder if we should have an heuristics that adds a diff comment line when the difference character is equal when normalized

@Zac-HD Zac-HD removed the status: needs information reporter needs to provide more information; can be closed after 2 or more weeks of inactivity label Oct 18, 2018
@Zac-HD
Copy link
Member

Zac-HD commented May 10, 2020

I guess the best solution would be using ascii() if strings look the same, but I have no idea how to detect that.

Check if the strings are different, but the NFKD forms are the same - see wikipedia on Unicode equivalence for details and unicodedata.normalize() for the function to use.

@Zac-HD Zac-HD added the topic: rewrite related to the assertion rewrite mechanism label Dec 28, 2020
@dnstone
Copy link

dnstone commented May 16, 2021

Hi @Zac-HD, I talked with you about this issue as part of pycon mentored sprint. Is it okay if I take over this issue?

@Zac-HD
Copy link
Member

Zac-HD commented May 16, 2021

Go for it!

@Zac-HD Zac-HD added the status: easy easy issue that is friendly to new contributor label Sep 27, 2022
@itxasos23
Copy link
Contributor

Hey all!
Is it ok if I try to tackle this as part of Hacktoberfest?

@Zac-HD
Copy link
Member

Zac-HD commented Oct 6, 2022

Absolutely!

@pekkaklarck
Copy link
Author

Great to see this fixed! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: easy easy issue that is friendly to new contributor topic: reporting related to terminal output and user-facing messages and errors topic: rewrite related to the assertion rewrite mechanism
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants