Definition of a "character" is wrong #44150

Closed
Rhamphoryncus mannequin opened this issue Oct 20, 2006 · 9 comments
Labels
docs Documentation in the Doc dir topic-unicode type-feature A feature request or enhancement

Comments

@Rhamphoryncus
Mannequin

Rhamphoryncus mannequin commented Oct 20, 2006

BPO 1581182
Nosy @malemburg, @loewis, @birkenfeld, @devdanzin, @ezio-melotti
Superseder
  • bpo-20906: Issues in Unicode HOWTO
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-07-06.19:35:50.292>
    created_at = <Date 2006-10-20.10:13:07.000>
    labels = ['type-feature', 'expert-unicode', 'docs']
    title = 'Definition of a "character" is wrong'
    updated_at = <Date 2014-07-06.19:35:50.291>
    user = 'https://bugs.python.org/Rhamphoryncus'

    bugs.python.org fields:

    activity = <Date 2014-07-06.19:35:50.291>
    actor = 'ezio.melotti'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2014-07-06.19:35:50.292>
    closer = 'ezio.melotti'
    components = ['Documentation', 'Unicode']
    creation = <Date 2006-10-20.10:13:07.000>
    creator = 'Rhamphoryncus'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 1581182
    keywords = []
    message_count = 9.0
    messages = ['61023', '61024', '61025', '61026', '61027', '84524', '84554', '112466', '214532']
    nosy_count = 8.0
    nosy_names = ['lemburg', 'loewis', 'georg.brandl', 'Rhamphoryncus', 'ajaksu2', 'ezio.melotti', 'docs@python', 'BreamoreBoy']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '20906'
    type = 'enhancement'
    url = 'https://bugs.python.org/issue1581182'
    versions = ['Python 2.6', 'Python 3.1', 'Python 2.7', 'Python 3.2']

    Python's definition of a character does not match that
    of Unicode. Python's documentation should, at a
    minimum, explain how Python's definition compares to
    Unicode's definitions of a code unit, code point, glyph,
    grapheme cluster, and character.

    Unicode's definition of a character can be found here:
    http://unicode.org/reports/tr17/

    Python seems to use the Code Units option given here:
    http://www.unicode.org/faq/char_combmark.html#7
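
To make the distinction concrete, here is a minimal sketch that counts the same text as code points, UTF-16 code units, and UTF-8 code units. It assumes Python 3.3 or later, where `len()` on a `str` counts code points; on the narrow Python 2 builds this report was written against, `len()` on a `unicode` object counted UTF-16 code units instead. The helper name `count_units` is illustrative and not part of any Python API.

```python
import unicodedata


def count_units(text):
    """Count one string three ways (illustrative helper, not a Python API)."""
    return {
        # In Python 3.3+, len() on a str counts code points.
        "code_points": len(text),
        # Each UTF-16 code unit is two bytes; "utf-16-le" adds no BOM.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # Each UTF-8 code unit is one byte.
        "utf8_code_units": len(text.encode("utf-8")),
    }


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9
astral = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane

for label, text in [("decomposed", decomposed), ("composed", composed), ("astral", astral)]:
    print(label, count_units(text))
```

Both `decomposed` and `composed` render as one user-perceived character, yet they differ in every count, and `astral` is one code point but two UTF-16 code units. None of the three counts corresponds to a grapheme cluster, which the standard library does not segment.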

    @rhamphoryncushistoric rhamphoryncushistoric mannequin added docs Documentation in the Doc dir labels Oct 20, 2006
    @loewis
    Mannequin

    loewis mannequin commented Oct 20, 2006

    Logged In: YES
    user_id=21627

    The Python string type is not at all Unicode compliant, so I
    don't see a need to use Unicode terminology to explain it.

    @Rhamphoryncus
    Mannequin Author

    Rhamphoryncus mannequin commented Oct 20, 2006

    Logged In: YES
    user_id=12364

    Sorry, I wasn't clear. I only intended this to be about the
    unicode type.

    @loewis
    Mannequin

    loewis mannequin commented Oct 21, 2006

    Logged In: YES
    user_id=21627

    Ok. Can you come up with a patch?

    @Rhamphoryncus
    Mannequin Author

    Rhamphoryncus mannequin commented Oct 21, 2006

    Logged In: YES
    user_id=12364

    Not at the moment.

    @devdanzin
    Mannequin

    devdanzin mannequin commented Mar 30, 2009

    Anyone brave enough can find the mentioned definitions in the thread
    below. Reading all of it is necessary, as there are some contradictory
    quotes and interpretations before an agreement is (sort of) achieved.

    http://mail.python.org/pipermail/python-dev/2008-July/080886.html

    @devdanzin devdanzin mannequin assigned birkenfeld Mar 30, 2009
    @devdanzin devdanzin mannequin added type-feature A feature request or enhancement topic-unicode labels Mar 30, 2009
    @malemburg
    Member

    See this talk for an explanation of the various Unicode terms and how
    they map to Python's implementation:

    http://www.egenix.com/library/presentations/#PythonAndUnicode

    Also note that the Unicode standard has evolved a lot since Unicode
    support was added to Python in late 1999. Some terms used in Python
    differ from those used in Unicode 5.0 or have been defined in more
    strict ways than were common at the time.

    And finally: don't forget that Python provides ways of *working* with
    Unicode, i.e. it does not guarantee that a Python Unicode string is
    always well-formed, e.g. that it could be encoded as valid UTF-16. It
    is entirely possible to store lone surrogates and invalid or
    unassigned code points in a Python Unicode string.
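
As a rough illustration of that last point, the sketch below stores a lone surrogate and a noncharacter in a string and checks what happens on encoding. It assumes Python 3; `is_encodable` is a hypothetical helper written for this example, not a standard-library function.

```python
import unicodedata


def is_encodable(text, encoding="utf-8"):
    """Return True if text survives strict encoding (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


lone_surrogate = "\ud800"  # high surrogate with no low surrogate after it
noncharacter = "\uffff"    # permanently reserved noncharacter

# Both are accepted and stored as ordinary single code points.
print(len(lone_surrogate), len(noncharacter))

# The Unicode database classifies them as surrogate (Cs) and unassigned (Cn).
print(unicodedata.category(lone_surrogate), unicodedata.category(noncharacter))

# Strict UTF-8 encoding rejects the lone surrogate but accepts the noncharacter.
print(is_encodable(lone_surrogate), is_encodable(noncharacter))

# The "surrogatepass" error handler encodes the surrogate anyway.
print(lone_surrogate.encode("utf-8", "surrogatepass"))
```

The string type itself never complains; the problem only surfaces when the data crosses an encoding boundary.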

    @malemburg
    Member

    Without a patch, I don't see how this issue can be moved forward.

    Adding a list of such Unicode term definitions would at best cause additional confusion and would only address people knowledgeable in the Unicode field.

    Note that Python's use of code units and code points matches that of the Unicode standard in most respects. Glyphs and all higher-level definitions are out of scope for Python.

    @malemburg malemburg added stale Stale PR or inactive for long period of time. labels Aug 2, 2010
    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Mar 23, 2014

    Can this be tied in with the work being done on the unicode howto bpo-20906?

    @ezio-melotti ezio-melotti removed the stale Stale PR or inactive for long period of time. label Jul 6, 2014
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022