Python and Turkish Locale #41929

caglar · 2005-04-30T17:37:22Z

BPO	1193061
Nosy	@malemburg, @birkenfeld
Superseder	bpo-1528802: Turkish Character

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = 'https://github.com/malemburg'
closed_at = <Date 2007-08-30.10:14:40.410>
created_at = <Date 2005-04-30.17:37:22.000>
labels = ['expert-unicode']
title = 'Python and Turkish Locale'
updated_at = <Date 2007-08-30.10:14:40.408>
user = 'https://bugs.python.org/caglar'

bugs.python.org fields:

activity = <Date 2007-08-30.10:14:40.408>
actor = 'georg.brandl'
assignee = 'lemburg'
closed = True
closed_date = <Date 2007-08-30.10:14:40.410>
closer = 'georg.brandl'
components = ['Unicode']
creation = <Date 2005-04-30.17:37:22.000>
creator = 'caglar'
dependencies = []
files = []
hgrepos = []
issue_num = 1193061
keywords = []
message_count = 6.0
messages = ['25185', '25186', '25187', '25188', '25189', '55471']
nosy_count = 5.0
nosy_names = ['lemburg', 'georg.brandl', 'exa', 'caglar', 'usta']
pr_nums = []
priority = 'high'
resolution = 'duplicate'
stage = None
status = 'closed'
superseder = '1528802'
type = None
url = 'https://bugs.python.org/issue1193061'
versions = []

caglar · 2005-04-30T17:37:22Z

Regarding this python-dev thread:

http://mail.python.org/pipermail/python-dev/2005-April/052968.html

As described in
http://www.i18nguy.com/unicode/turkish-i18n.html ("How
Applications Fail With Turkish Language"), the Turkish
alphabet has four "i" letters: i, ı, I and İ.

Without --with-wctype-functions support, Python converts
these characters in a locale-independent manner, even in the
tr_TR.UTF-8 locale. So all conversions map to "i" or
"I", which is wrong in the Turkish locale.

So if the Python developers remove the wctype
functions from Python, there must be a
locale-dependent upper/lower function to handle these
characters properly.
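For reference, here is a minimal sketch of how current Python 3 `str` methods map the four letters. These methods use the built-in Unicode database, not wctype or the C locale. The sketch shows Python 3 behavior rather than the 2005-era interpreter this report concerns. The `code_points` helper is hypothetical and only formats the output.

```python
import unicodedata

# The four Turkish "i" letters: dotted and dotless, lower and upper case.
TURKISH_IS = ["i", "ı", "I", "İ"]


def code_points(s: str) -> str:
    """Hypothetical helper: render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


for ch in TURKISH_IS:
    # str.upper()/str.lower() consult Python's own Unicode database,
    # so the result does not depend on the active locale.
    print(unicodedata.name(ch))
    print("  upper ->", code_points(ch.upper()))
    print("  lower ->", code_points(ch.lower()))
```

The output shows that the Turkish pairs i/İ and ı/I are not preserved:

- Both "i" and "ı" uppercase to "I" (U+0049).
- "I" lowercases to "i" rather than "ı".
- "İ" lowercases to "i" followed by U+0307 COMBINING DOT ABOVE.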

malemburg · 2005-05-02T08:00:58Z

Logged In: YES
user_id=38388

I'm not sure I understand: are you saying that the Unicode
mappings for upper and lower case are wrong in the standard?

Note that removing the wctype functions will only remove the
option of using these functions for case mapping of
Unicode characters instead of the built-in Unicode
character database. This was originally meant as an
optimization to avoid having to load the Unicode database -
nowadays the database is always included, so the
optimization is no longer needed. Even worse: the wctype
functions sometimes behave differently than the mappings in
the Unicode database (due to differences in the Unicode
database version or implementations).

Now, since the string .lower() and .upper() methods are
locale-dependent (due to their reliance on the C functions
toupper() and tolower() - not by intent), while the Unicode
versions are not, we have a rather annoying situation where
switching from strings to Unicode causes semantic differences.

Ideally, both string and Unicode methods should do case
mapping in a locale-independent way. Support for
locale-dependent differences in case mapping, collation,
etc. should be moved to an external module, e.g. the locale
module.

caglar · 2005-05-02T08:45:12Z

Logged In: YES
user_id=858447

No, I'm not. These rules are defined in
http://www.unicode.org/Public/UNIDATA/CaseFolding.txt and
http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt.
Note the comment that says:

# T: special case for uppercase I and dotted uppercase I
# - For non-Turkic languages, this mapping is normally not used.
# - For Turkic languages (tr, az), this mapping can be used instead of the normal mapping for these characters.
# Note that the Turkic mappings do not maintain canonical equivalence without additional processing.
# See the discussions of case mapping in the Unicode Standard for more information.

So without wctype function support, Python can't convert
these correctly. This _is_ the problem. As a side effect,
another serious problem occurs: keyword handling must not be
locale-dependent. If Python is compiled with wctype function
support, every "i".upper() turns into "İ", which is wrong for
keyword comparison (e.g. quit vs. QUİT).

So I suggest implementing two new functions, such as
localeAwareLower()/localeAwareUpper(), for Python and keeping
lower()/upper() locale-independent. And as you wrote, the locale
module may be a perfect home for these :)
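Here is a minimal sketch of Turkish-tailored case mapping in the spirit of the proposed localeAwareUpper()/localeAwareLower(). It makes three simplifying assumptions:

- The names `turkish_upper` and `turkish_lower` are hypothetical.
- The mapping is hard-coded for Turkish/Azeri rather than looked up from a locale.
- It ignores the context-sensitive SpecialCasing.txt rules for "I" followed by U+0307 COMBINING DOT ABOVE.

```python
# Hypothetical helpers, hard-coded for Turkish/Azeri only.
# Turkic case pairs are i <-> İ and ı <-> I; only the two letters whose
# default mapping differs need pre-translation.
_TR_UPPER = str.maketrans({"i": "İ"})
_TR_LOWER = str.maketrans({"I": "ı", "İ": "i"})


def turkish_upper(s: str) -> str:
    # Map dotted i to dotted capital İ first; the default str.upper()
    # then handles everything else, including dotless ı -> I.
    return s.translate(_TR_UPPER).upper()


def turkish_lower(s: str) -> str:
    # Map I -> ı and İ -> i first; the default str.lower() handles the rest.
    return s.translate(_TR_LOWER).lower()


print(turkish_upper("istanbul"))  # dotted capital İ at the start
print(turkish_lower("ISPARTA"))   # dotless ı at the start
print(turkish_upper("quit"), "quit".upper())
```

The last line illustrates the keyword concern. Keeping the built-in str.upper() locale-independent keeps "quit".upper() equal to "QUIT". A tailored function yields "QUİT" and so belongs in a separate, explicitly locale-aware API.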

exa · 2005-10-11T21:36:55Z

Logged In: YES
user_id=1454

The better solution is to use an optional locale argument for
upper/lower functions and other language-dependent text
processing functions.
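Here is a sketch of the optional-argument idea applied to case folding. It uses the two status "T" entries in CaseFolding.txt that the comment quoted earlier in this thread describes: 0049 maps to 0131, and 0130 maps to 0069. The `fold_case(s, lang=None)` function is hypothetical, not an existing Python API. When `lang` is "tr" or "az", it applies the Turkic mappings before falling back to the built-in str.casefold().

```python
from typing import Optional

# Status "T" (Turkic) entries from CaseFolding.txt:
#   0049; T; 0131   LATIN CAPITAL LETTER I -> dotless ı
#   0130; T; 0069   CAPITAL I WITH DOT ABOVE -> i
_TURKIC_FOLD = str.maketrans({"\u0049": "\u0131", "\u0130": "\u0069"})
_TURKIC_LANGS = {"tr", "az"}


def fold_case(s: str, lang: Optional[str] = None) -> str:
    """Hypothetical: case-fold s, using the Turkic mappings for lang 'tr'/'az'."""
    if lang in _TURKIC_LANGS:
        s = s.translate(_TURKIC_FOLD)
    # str.casefold() applies the default (non-Turkic) full case folding.
    return s.casefold()


# Caseless comparison of a Turkish word, with and without the language hint.
print(fold_case("ISPARTA", "tr") == fold_case("ısparta", "tr"))
print(fold_case("ISPARTA") == fold_case("ısparta"))
```

As the quoted CaseFolding.txt comment warns, this simple pre-translation does not preserve canonical equivalence. For example, "I" followed by U+0307 is not treated like "İ".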

usta · 2006-09-30T15:58:01Z

Logged In: YES
user_id=278064

http://img147.imageshack.us/img147/3717/pythonte4.jpg
I think this photo summarizes the bug related to
upper() with the Turkish encoding.

birkenfeld · 2007-08-30T10:14:40Z

Dupe of bpo-1528802.

caglar mannequin assigned malemburg Apr 30, 2005

caglar mannequin added the topic-unicode label Apr 30, 2005

birkenfeld closed this as completed Aug 30, 2007

ezio-melotti transferred this issue from another repository Apr 9, 2022
