Misleading reflective behaviour due to PEP 3131 NFKC identifiers normalization. #76664

Closed
Iago-lito mannequin opened this issue Jan 2, 2018 · 3 comments
Labels
topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@Iago-lito
Mannequin

Iago-lito mannequin commented Jan 2, 2018

BPO 32483
Nosy @vstinner, @benjaminp, @ezio-melotti
Superseder
  • bpo-13793: hasattr, delattr, getattr fail with unnormalized names
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2018-01-02.17:52:57.249>
    created_at = <Date 2018-01-02.16:07:22.961>
    labels = ['type-bug', 'expert-unicode']
    title = 'Misleading reflective behaviour due to PEP 3131 NFKC identifiers normalization.'
    updated_at = <Date 2018-01-02.17:52:57.247>
    user = 'https://bugs.python.org/Iago-lito'

    bugs.python.org fields:

    activity = <Date 2018-01-02.17:52:57.247>
    actor = 'benjamin.peterson'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-01-02.17:52:57.249>
    closer = 'benjamin.peterson'
    components = ['Unicode']
    creation = <Date 2018-01-02.16:07:22.961>
    creator = 'Iago-lito -'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 32483
    keywords = []
    message_count = 3.0
    messages = ['309382', '309383', '309389']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'benjamin.peterson', 'ezio.melotti', 'Iago-lito -']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '13793'
    type = 'behavior'
    url = 'https://bugs.python.org/issue32483'
    versions = ['Python 3.5']

    @Iago-lito
    Mannequin Author

    Iago-lito mannequin commented Jan 2, 2018

    Consistent with PEP-3131 and NFKC normalization of identifiers, the last two lines below fail: the identifier 𝜏 (U+1D70F) is automatically converted to τ (U+03C4) by the compiler, whereas string arguments and dictionary keys are not normalized.

        class Base(object):
            def __init__(self):
                self.𝜏 = 5 # defined with U+1D70F
    
        a = Base()
        print(a.𝜏)     # 5             # (U+1D70F) expected and intuitive
        print(a.τ)     # 5 as well     # (U+03C4)  normalized version, okay.
        d = a.__dict__ # {'τ':  5}     # (U+03C4)  still normalized version
        print(d['τ'])  # 5             # (U+03C4)  consistent with normalization
        assert hasattr(a, 'τ')         # (U+03C4)  consistent with normalization
        # But if I want to retrieve it the way I entered it because I can type (U+1D70F)
        print(d['𝜏'])  # KeyError: '𝜏' # (U+1D70F) counterintuitive
        assert hasattr(a, '𝜏') # Fails # (U+1D70F) counterintuitive

    I've described and understood the problem in this post.
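
The normalization itself can be checked with the standard `unicodedata` module. This is only a small sketch of the NFKC mapping discussed above; the variable names are illustrative and not part of the original report.

```python
import unicodedata

math_tau = "\U0001D70F"   # the character typed in the class body (U+1D70F)
greek_tau = "\u03C4"      # the plain Greek letter (U+03C4)

# NFKC folds the mathematical variant into the plain Greek letter.
normalized = unicodedata.normalize("NFKC", math_tau)

print(normalized == greek_tau)   # True
print(math_tau == greek_tau)     # False: as plain strings they differ
```

The compiler applies this mapping to identifiers only, which is why string arguments to `hasattr` or keys of `__dict__` keep the original code point.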

    Nothing is inconsistent here. However, I am worried that:

    • this behaviour might be counterintuitive and misleading, especially when the character that the user can easily type (e.g. U+1D70F) is not identical to its NFKC normalization (e.g. U+03C4)

    • this behaviour makes it more difficult to enjoy Python's reflective __dict__, hasattr and getattr features in this particular case.
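
A possible user-side workaround, sketched under the assumption that normalizing the name by hand is acceptable: `nfkc` below is a hypothetical helper, not an existing Python API.

```python
import unicodedata

def nfkc(name):
    """Hypothetical helper: return the NFKC form of an attribute name."""
    return unicodedata.normalize("NFKC", name)

class Base:
    def __init__(self):
        self.𝜏 = 5  # written with U+1D70F, stored under U+03C4

a = Base()
typed = "\U0001D70F"  # the name exactly as the user typed it

print(hasattr(a, typed))         # False: the raw string is not normalized
print(hasattr(a, nfkc(typed)))   # True
print(getattr(a, nfkc(typed)))   # 5
print(vars(a)[nfkc(typed)])      # 5
```

Normalizing at the call site keeps the reflective functions untouched, at the cost of having to remember to do it everywhere a name arrives as a string.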

    Maybe it is the user's responsibility to be aware of this limitation, and to keep considering non-ASCII (utf-8) identifiers a bad practice. In this case, maybe this particular reflective limitation could be made explicit in PEP-3131.

    Or maybe it is Python's responsibility to ensure intuitive and consistent behaviour even in tricky Unicode cases. Reflective features like __dict__.__getitem__, hasattr or getattr would then NFKC-convert their arguments before searching, just like a.𝜏 does, so that:

    getattr(a, '𝜏') is getattr(a, 'τ')
    

    always yields True.
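
To make the proposal concrete, here is a rough sketch of how a single class could opt into that behaviour today. `NFKCAttrs` is a hypothetical mixin invented for illustration; it is not how CPython behaves and not a suggested patch.

```python
import unicodedata

class NFKCAttrs:
    """Hypothetical mixin: retry failed attribute lookups with the NFKC name."""

    def __getattr__(self, name):
        # __getattr__ only runs after the normal lookup has failed.
        normalized = unicodedata.normalize("NFKC", name)
        if normalized != name:
            return getattr(self, normalized)
        raise AttributeError(name)

class Base(NFKCAttrs):
    def __init__(self):
        self.𝜏 = 5  # stored under U+03C4 by the compiler

a = Base()
print(getattr(a, "\U0001D70F") is getattr(a, "\u03C4"))  # True
print(hasattr(a, "\U0001D70F"))                          # True
```

This only covers lookups through getattr and hasattr. Direct access to `a.__dict__` and calls to setattr with an unnormalized name are left unchanged in this sketch.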

    I actually have no idea of the philosophy to stick to. And the only purpose of this post is to inform the community about this particular, low-priority case.

    Thank you for supporting Python anyway, cheers for your patience.. and happy 2018 to everyone :)

    --
    Iago-lito

    @Iago-lito Iago-lito mannequin added topic-unicode type-bug An unexpected behavior, bug, or error labels Jan 2, 2018
    @Iago-lito
    Mannequin Author

    Iago-lito mannequin commented Jan 2, 2018

    I just found out about this very close issue. Much of the philosophy has been made very clear there.

    Since the solution to bpo-13793 is to *document* this NFKC normalization, I think it'd be a good thing to make an explicit statement about these particular reflective limitations in PEP-3131 :)

    @benjaminp
    Contributor

    We don't generally update finalized PEPs. The official documentation for a feature is in the Python docs. Feel free to propose a PR if you think it could be improved.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022