Python and Turkish Locale #41929

Closed
caglar mannequin opened this issue Apr 30, 2005 · 6 comments

@caglar
Mannequin

caglar mannequin commented Apr 30, 2005

BPO 1193061
Nosy @malemburg, @birkenfeld
Superseder
  • bpo-1528802: Turkish Character
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/malemburg'
    closed_at = <Date 2007-08-30.10:14:40.410>
    created_at = <Date 2005-04-30.17:37:22.000>
    labels = ['expert-unicode']
    title = 'Python and Turkish Locale'
    updated_at = <Date 2007-08-30.10:14:40.408>
    user = 'https://bugs.python.org/caglar'

    bugs.python.org fields:

    activity = <Date 2007-08-30.10:14:40.408>
    actor = 'georg.brandl'
    assignee = 'lemburg'
    closed = True
    closed_date = <Date 2007-08-30.10:14:40.410>
    closer = 'georg.brandl'
    components = ['Unicode']
    creation = <Date 2005-04-30.17:37:22.000>
    creator = 'caglar'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 1193061
    keywords = []
    message_count = 6.0
    messages = ['25185', '25186', '25187', '25188', '25189', '55471']
    nosy_count = 5.0
    nosy_names = ['lemburg', 'georg.brandl', 'exa', 'caglar', 'usta']
    pr_nums = []
    priority = 'high'
    resolution = 'duplicate'
    stage = None
    status = 'closed'
    superseder = '1528802'
    type = None
    url = 'https://bugs.python.org/issue1193061'
    versions = []

    @caglar
    Mannequin Author

    caglar mannequin commented Apr 30, 2005

    Following up on this thread:

    http://mail.python.org/pipermail/python-dev/2005-April/052968.html

    As described in "How Applications Fail With Turkish Language"
    (http://www.i18nguy.com/unicode/turkish-i18n.html), the Turkish
    alphabet has four "i" letters: dotless ı/I and dotted i/İ.

    Without --with-wctype-functions support, Python converts these
    characters in a locale-independent manner, even in the
    tr_TR.UTF-8 locale. So all conversions map to "i" or "I",
    which is wrong in a Turkish locale.

    So if the Python developers remove the wctype functions from
    Python, there must be a locale-dependent upper/lower function
    to handle these characters properly.
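
To make the four letters concrete, here is a minimal sketch of what the built-in, locale-independent case mapping does with them. It assumes Python 3's `str` methods (which postdate this report); the constant names are only illustrative.

```python
# The four Turkish "i" letters, by code point.
DOTLESS_LOWER = "\u0131"  # ı
DOTTED_LOWER = "i"        # i
DOTLESS_UPPER = "I"       # I
DOTTED_UPPER = "\u0130"   # İ

# Turkish pairs the letters as ı <-> I and i <-> İ, but the default
# Unicode mapping behind str.upper()/str.lower() pairs i <-> I instead.
builtin_upper = {c: c.upper() for c in (DOTLESS_LOWER, DOTTED_LOWER)}
builtin_lower = {c: c.lower() for c in (DOTLESS_UPPER, DOTTED_UPPER)}

print(builtin_upper)  # both lowercase letters become plain "I"
print(builtin_lower)  # "I" becomes "i"; "İ" becomes "i" + U+0307
```

Both lowercase letters collapse to "I", so the dotted/dotless distinction is lost on upper-casing, which is the behaviour described above.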

    @caglar caglar mannequin assigned malemburg Apr 30, 2005
    @caglar caglar mannequin added the topic-unicode label Apr 30, 2005
    @malemburg
    Member

    Logged In: YES
    user_id=38388

    I'm not sure I understand: are you saying that the Unicode
    mappings for upper and lower case are wrong in the standard?

    Note that removing the wctype functions will only remove the
    possibility of using these functions for case mapping of
    Unicode characters instead of using the builtin Unicode
    character database. This was originally meant as an
    optimization to avoid having to load the Unicode database;
    nowadays the database is always included, so the
    optimization is no longer needed. Even worse: the wctype
    functions sometimes behave differently than the mappings in
    the Unicode database (due to differences in the Unicode
    database version or implementations).

    Now, since the string .lower() and .upper() methods are
    locale dependent (due to their reliance on the C functions
    toupper() and tolower(), not by intent), while the Unicode
    versions are not, we have a rather annoying situation where
    switching from strings to Unicode causes semantic differences.

    Ideally, both string and Unicode methods should do case
    mapping in a locale-independent way. The support for
    differences in locale-dependent case mapping, collation,
    etc. should be moved to an external module, e.g. the locale
    module.
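
As a rough check of the locale-independent behaviour described as the ideal here, the sketch below upper-cases a string while LC_CTYPE is switched to a requested locale. It assumes Python 3, where `str.upper()` does not consult the C locale; `upper_under_locale` is a hypothetical helper name, and the Turkish locale may not be installed on a given system.

```python
import locale


def upper_under_locale(text, locale_name):
    """Return text.upper() evaluated while LC_CTYPE is set to locale_name.

    If the requested locale is not installed, the current one stays active.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        try:
            locale.setlocale(locale.LC_CTYPE, locale_name)
        except locale.Error:
            pass  # locale not available on this system
        return text.upper()
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore


print(upper_under_locale("quit", "tr_TR.UTF-8"))  # QUIT
```

The result is the same whether or not the locale switch succeeds, which is what makes the built-in method safe for comparing keywords and identifiers.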

    @caglar
    Mannequin Author

    caglar mannequin commented May 2, 2005

    Logged In: YES
    user_id=858447

    No, I'm not. These rules are defined in
    http://www.unicode.org/Public/UNIDATA/CaseFolding.txt and
    http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt.
    Note that there is a comment that says:

    # T: special case for uppercase I and dotted uppercase I
    # - For non-Turkic languages, this mapping is normally not used.
    # - For Turkic languages (tr, az), this mapping can be used
    #   instead of the normal mapping for these characters.
    # Note that the Turkic mappings do not maintain canonical
    # equivalence without additional processing.
    # See the discussions of case mapping in the Unicode Standard
    # for more information.

    So without wctype function support, Python can't convert
    these. This _is_ the problem. As a side effect, another
    huge problem occurs: keywords can't be locale dependent.
    If Python is compiled with wctype function support,
    every "i".upper() turns into "İ", which is wrong for keyword
    comparison (e.g. quit vs. QUİT).

    So I suggest implementing two new functions like
    localeAwareLower()/localeAwareUpper() for Python and leaving
    lower()/upper() locale independent. And as you wrote, the locale
    module may be a perfect home for these :)
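
A minimal sketch of what such a pair of functions could look like, assuming Python 3 and handling only the Turkic i mappings. The names `locale_aware_upper` and `locale_aware_lower` are hypothetical and not part of the locale module.

```python
# Turkic exceptions applied before the default mapping:
# i -> İ (U+0130) and ı (U+0131) -> I when upper-casing,
# I -> ı and İ -> i when lower-casing.
_TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})
_TR_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})


def locale_aware_upper(text):
    """Upper-case text using the Turkic i mappings (precomposed letters only)."""
    return text.translate(_TR_UPPER).upper()


def locale_aware_lower(text):
    """Lower-case text using the Turkic i mappings (precomposed letters only)."""
    return text.translate(_TR_LOWER).lower()


print(locale_aware_upper("quit"))  # QUİT
print("quit".upper())              # QUIT, the built-in stays locale independent
```

Keeping the built-in methods untouched means keyword comparison is unaffected, while user-facing text can opt in to the Turkic rules.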

    @exa
    Mannequin

    exa mannequin commented Oct 11, 2005

    Logged In: YES
    user_id=1454

    The better solution is to use an optional locale argument for
    upper/lower functions and other language-dependent text
    processing functions.
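
One way the optional-argument idea could be sketched, assuming Python 3. The functions `upper` and `lower`, their `locale` parameter and the locale-tag parsing are all hypothetical, and only the Turkic i rules are covered.

```python
import re

_TURKIC_LANGUAGES = {"tr", "az"}
_TR_UPPER = str.maketrans({"i": "\u0130", "\u0131": "I"})
_TR_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})


def _is_turkic(locale):
    """True if a tag such as 'tr_TR.UTF-8' or 'az' names a Turkic language."""
    if not locale:
        return False
    language = re.split(r"[_.@-]", locale)[0].lower()
    return language in _TURKIC_LANGUAGES


def upper(text, locale=None):
    """Upper-case text; apply the Turkic i rules only when locale asks for them."""
    if _is_turkic(locale):
        text = text.translate(_TR_UPPER)
    return text.upper()


def lower(text, locale=None):
    """Lower-case text; apply the Turkic i rules only when locale asks for them."""
    if _is_turkic(locale):
        text = text.translate(_TR_LOWER)
    return text.lower()


print(upper("quit"))                 # QUIT
print(upper("quit", "tr_TR.UTF-8"))  # QUİT
```

With no argument the call behaves like the locale-independent default, so existing callers would not change behaviour.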

    @usta
    Mannequin

    usta mannequin commented Sep 30, 2006

    Logged In: YES
    user_id=278064

    http://img147.imageshack.us/img147/3717/pythonte4.jpg
    I think this photo summarizes the bug, which is related to
    upper() in Turkish encoding.

    @birkenfeld
    Member

    Dupe of bpo-1528802.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022