test_re is failing when local is set for `en_IN` #73757

ultimatecoder · 2017-02-15T17:29:11Z

BPO	29571
Nosy	@warsaw, @doko42, @pfmoore, @ncoghlan, @vstinner, @tjguk, @benjaminp, @ezio-melotti, @zware, @serhiy-storchaka, @zooba, @ultimatecoder, @Naman-Bhalla, @tirkarthi
PRs	bpo-29571: Use correct locale encoding in test_re #149; [3.6] bpo-29571: Use correct locale encoding in test_re (#149) #153; [3.5] bpo-29571: Use correct locale encoding in test_re (#149) #154; update locale aliases for glibc 2.24 #422; Revert "bpo-29571: Use correct locale encoding in test_re (#149)" #554; Revert "bpo-29571: Use correct locale encoding in test_re (#149)" (#554) #555; Revert "bpo-29571: Use correct locale encoding in test_re (#149)" (#554) #556; bpo-29571 - test_re needs en_US.iso88591 locale check in test_locale_flag() #2686; bpo-29571: Fix test_re.test_locale_flag() #12099; [3.7] bpo-29571: Fix test_re.test_locale_flag() (GH-12099) #12108; [3.7] bpo-29571: Fix test_re.test_locale_flag() #12178
Files	test_re_locale_flag.patch loc.py loc.log _testcapi.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2019-03-05.15:26:32.126>
created_at = <Date 2017-02-15.17:29:11.116>
labels = ['expert-regex', 'type-bug', '3.8', '3.7', 'tests', 'OS-windows']
title = 'test_re is failing when local is set for `en_IN`'
updated_at = <Date 2019-03-05.15:26:32.125>
user = 'https://github.com/ultimatecoder'

bugs.python.org fields:

activity = <Date 2019-03-05.15:26:32.125>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2019-03-05.15:26:32.126>
closer = 'vstinner'
components = ['Regular Expressions', 'Tests', 'Windows']
creation = <Date 2017-02-15.17:29:11.116>
creator = 'jaysinh.shukla'
dependencies = []
files = ['46641', '48187', '48188', '48189']
hgrepos = []
issue_num = 29571
keywords = ['patch']
message_count = 37.0
messages = ['287867', '287879', '287880', '287882', '287893', '287894', '287933', '288056', '288065', '288068', '288069', '288101', '288102', '289054', '289071', '289072', '289073', '289074', '289075', '289076', '289077', '289118', '290268', '290269', '290272', '310947', '310948', '311074', '311076', '336826', '336855', '336877', '336878', '336883', '337185', '337200', '337204']
nosy_count = 15.0
nosy_names = ['barry', 'doko', 'paul.moore', 'ncoghlan', 'vstinner', 'tim.golden', 'benjamin.peterson', 'ezio.melotti', 'mrabarnett', 'zach.ware', 'serhiy.storchaka', 'steve.dower', 'jaysinh.shukla', 'Naman-Bhalla', 'xtreak']
pr_nums = ['149', '153', '154', '422', '554', '555', '556', '2686', '12099', '12108', '12178']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue29571'
versions = ['Python 3.7', 'Python 3.8']

ultimatecoder · 2017-02-15T17:29:10Z

Description:
A test case is failing while running ./python -m test -v test_re.

Traceback:
$>./python -m test -v test_re
== CPython 3.7.0a0 (default, Feb 15 2017, 22:28:32) [GCC 5.4.0 20160609]
== Linux-4.4.0-62-generic-x86_64-with-debian-stretch-sid little-endian
== hash algorithm: siphash24 64bit
== cwd: /home/bigj/Jaysinh/cpython_git/cpython/build/test_python_613
== encodings: locale=UTF-8, FS=utf-8
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0)
Run tests sequentially
0:00:00 [1/1] test_re
test_re_benchmarks (test.test_re.ExternalTests)
re_tests benchmarks ... ok
test_re_tests (test.test_re.ExternalTests)
re_tests test suite ... ok
test_overlap_table (test.test_re.ImplementationTest) ... ok
test_bytes (test.test_re.PatternReprTests) ... ok
test_inline_flags (test.test_re.PatternReprTests) ... ok
test_locale (test.test_re.PatternReprTests) ... ok
test_long_pattern (test.test_re.PatternReprTests) ... ok
test_multiple_flags (test.test_re.PatternReprTests) ... ok
test_quotes (test.test_re.PatternReprTests) ... ok
test_single_flag (test.test_re.PatternReprTests) ... ok
test_unicode_flag (test.test_re.PatternReprTests) ... ok
test_unknown_flags (test.test_re.PatternReprTests) ... ok
test_without_flags (test.test_re.PatternReprTests) ... ok
test_anyall (test.test_re.ReTests) ... ok
test_ascii_and_unicode_flag (test.test_re.ReTests) ... ok
test_backref_group_name_in_exception (test.test_re.ReTests) ... ok
test_basic_re_sub (test.test_re.ReTests) ... ok
test_big_codesize (test.test_re.ReTests) ... ok
test_bigcharset (test.test_re.ReTests) ... ok
test_bug_113254 (test.test_re.ReTests) ... ok
test_bug_114660 (test.test_re.ReTests) ... ok
test_bug_117612 (test.test_re.ReTests) ... ok
test_bug_1661 (test.test_re.ReTests) ... ok
test_bug_16688 (test.test_re.ReTests) ... ok
test_bug_20998 (test.test_re.ReTests) ... ok
test_bug_2537 (test.test_re.ReTests) ... ok
test_bug_29444 (test.test_re.ReTests) ... ok
test_bug_3629 (test.test_re.ReTests) ... ok
test_bug_418626 (test.test_re.ReTests) ... ok
test_bug_448951 (test.test_re.ReTests) ... ok
test_bug_449000 (test.test_re.ReTests) ... ok
test_bug_449964 (test.test_re.ReTests) ... ok
test_bug_462270 (test.test_re.ReTests) ... ok
test_bug_527371 (test.test_re.ReTests) ... ok
test_bug_581080 (test.test_re.ReTests) ... ok
test_bug_612074 (test.test_re.ReTests) ... ok
test_bug_6509 (test.test_re.ReTests) ... ok
test_bug_6561 (test.test_re.ReTests) ... ok
test_bug_725106 (test.test_re.ReTests) ... ok
test_bug_725149 (test.test_re.ReTests) ... ok
test_bug_764548 (test.test_re.ReTests) ... ok
test_bug_817234 (test.test_re.ReTests) ... ok
test_bug_926075 (test.test_re.ReTests) ... ok
test_bug_931848 (test.test_re.ReTests) ... ok
test_bytes_str_mixing (test.test_re.ReTests) ... ok
test_category (test.test_re.ReTests) ... ok
test_character_set_errors (test.test_re.ReTests) ... ok
test_compile (test.test_re.ReTests) ... ok
test_constants (test.test_re.ReTests) ... ok
test_dealloc (test.test_re.ReTests) ... ok
test_debug_flag (test.test_re.ReTests) ... ok
test_dollar_matches_twice (test.test_re.ReTests)
$ matches the end of string, and just before the terminating ... ok
test_empty_array (test.test_re.ReTests) ... ok
test_enum (test.test_re.ReTests) ... ok
test_error (test.test_re.ReTests) ... ok
test_expand (test.test_re.ReTests) ... ok
test_finditer (test.test_re.ReTests) ... ok
test_flags (test.test_re.ReTests) ... ok
test_getattr (test.test_re.ReTests) ... ok
test_getlower (test.test_re.ReTests) ... ok
test_group (test.test_re.ReTests) ... ok
test_group_name_in_exception (test.test_re.ReTests) ... ok
test_groupdict (test.test_re.ReTests) ... ok
test_ignore_case (test.test_re.ReTests) ... ok
test_ignore_case_range (test.test_re.ReTests) ... ok
test_ignore_case_set (test.test_re.ReTests) ... ok
test_inline_flags (test.test_re.ReTests) ... ok
test_issue17998 (test.test_re.ReTests) ... ok
test_keep_buffer (test.test_re.ReTests) ... ok
test_keyword_parameters (test.test_re.ReTests) ... ok
test_large_search (test.test_re.ReTests) ... ok
test_large_subn (test.test_re.ReTests) ... ok
test_locale_caching (test.test_re.ReTests) ... skipped 'test needs en_US.iso88591 locale'
test_locale_flag (test.test_re.ReTests) ... FAIL
test_lookahead (test.test_re.ReTests) ... ok
test_lookbehind (test.test_re.ReTests) ... ok
test_match_getitem (test.test_re.ReTests) ... ok
test_match_repr (test.test_re.ReTests) ... ok
test_misc_errors (test.test_re.ReTests) ... ok
test_multiple_repeat (test.test_re.ReTests) ... ok
test_not_literal (test.test_re.ReTests) ... ok
test_nothing_to_repeat (test.test_re.ReTests) ... ok
test_other_escapes (test.test_re.ReTests) ... ok
test_pattern_compare (test.test_re.ReTests) ... ok
test_pattern_compare_bytes (test.test_re.ReTests) ... ok
test_pickling (test.test_re.ReTests) ... ok
test_qualified_re_split (test.test_re.ReTests) ... ok
test_qualified_re_sub (test.test_re.ReTests) ... ok
test_re_escape (test.test_re.ReTests) ... ok
test_re_escape_byte (test.test_re.ReTests) ... ok
test_re_escape_non_ascii (test.test_re.ReTests) ... ok
test_re_escape_non_ascii_bytes (test.test_re.ReTests) ... ok
test_re_findall (test.test_re.ReTests) ... ok
test_re_fullmatch (test.test_re.ReTests) ... ok
test_re_groupref (test.test_re.ReTests) ... ok
test_re_groupref_exists (test.test_re.ReTests) ... ok
test_re_groupref_overflow (test.test_re.ReTests) ... ok
test_re_match (test.test_re.ReTests) ... ok
test_re_split (test.test_re.ReTests) ... ok
test_re_subn (test.test_re.ReTests) ... ok
test_repeat_minmax (test.test_re.ReTests) ... ok
test_repeat_minmax_overflow (test.test_re.ReTests) ... ok
test_repeat_minmax_overflow_maxrepeat (test.test_re.ReTests) ... ok
test_scanner (test.test_re.ReTests) ... ok
test_scoped_flags (test.test_re.ReTests) ... ok
test_search_coverage (test.test_re.ReTests) ... ok
test_search_dot_unicode (test.test_re.ReTests) ... ok
test_search_star_plus (test.test_re.ReTests) ... ok
test_special_escapes (test.test_re.ReTests) ... ok
test_sre_byte_class_literals (test.test_re.ReTests) ... ok
test_sre_byte_literals (test.test_re.ReTests) ... ok
test_sre_character_class_literals (test.test_re.ReTests) ... ok
test_sre_character_literals (test.test_re.ReTests) ... ok
test_stack_overflow (test.test_re.ReTests) ... ok
test_string_boundaries (test.test_re.ReTests) ... ok
test_sub_template_numeric_escape (test.test_re.ReTests) ... ok
test_symbolic_groups (test.test_re.ReTests) ... ok
test_symbolic_refs (test.test_re.ReTests) ... ok
test_unlimited_zero_width_repeat (test.test_re.ReTests) ... ok
test_weakref (test.test_re.ReTests) ... ok

======================================================================
FAIL: test_locale_flag (test.test_re.ReTests)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "/home/bigj/Jaysinh/cpython_git/cpython/Lib/test/test_re.py", line 1422, in test_locale_flag
    self.assertTrue(pat.match(bletter))
AssertionError: None is not true

Ran 120 tests in 2.079s

FAILED (failures=1, skipped=1)
test test_re failed
test_re failed

1 test failed:
test_re

Total duration: 2 sec
Tests result: FAILURE

Locale value:
$>locale
LANG=en_IN
LANGUAGE=en_IN:en
LC_CTYPE="en_IN"
LC_NUMERIC="en_IN"
LC_TIME="en_IN"
LC_COLLATE="en_IN"
LC_MONETARY="en_IN"
LC_MESSAGES="en_IN"
LC_PAPER="en_IN"
LC_NAME="en_IN"
LC_ADDRESS="en_IN"
LC_TELEPHONE="en_IN"
LC_MEASUREMENT="en_IN"
LC_IDENTIFICATION="en_IN"
LC_ALL=

Operating system: Ubuntu 16.04 LTS(64 bit)

mrabarnett · 2017-02-15T18:56:17Z

I'm just wondering whether the problem is just due to the locale's encoding being UTF-8. The locale support in re really only works with encodings that use 1 byte/character.
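To make the 1 byte/character point concrete, here is a minimal sketch. The letter used (code point 224) is only an illustrative choice; any non-ASCII Latin-1 letter would behave the same way.

```python
# Compare how one non-ASCII letter is stored in a single-byte encoding
# and in a multi-byte encoding.
letter = chr(224)  # LATIN SMALL LETTER A WITH GRAVE

latin1_bytes = letter.encode("iso8859-1")
utf8_bytes = letter.encode("utf-8")

# One byte in ISO8859-1, so a per-byte case mapping can handle it.
print(latin1_bytes, len(latin1_bytes))
# Two bytes in UTF-8, so no single byte stands for the whole letter.
print(utf8_bytes, len(utf8_bytes))
```

A byte-oriented case mapping only ever sees one byte at a time, which is presumably why locale-aware matching cannot work for such letters under a UTF-8 locale.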

serhiy-storchaka · 2017-02-15T19:03:28Z

Locale encoding is ISO8859-1. This test is skipped on non 8-bit locale.

This is a problem with tests, not with the re module. I don't have a solution.

mrabarnett · 2017-02-15T19:25:47Z

The report says "== encodings: locale=UTF-8, FS=utf-8".

It says that "test_locale_caching" was skipped, but also that "test_locale_flag" failed.

serhiy-storchaka · 2017-02-15T22:57:27Z

Good point. The test used locale.getlocale() and it returned ('en_IN', 'ISO8859-1').

The following patch makes the test use locale.getpreferredencoding(False), the same encoding that was reported in the header of the test report.

vstinner · 2017-02-15T22:58:47Z

Following patch ...

Seriously? Not a GitHub pull request? ;-) (old habit?)

serhiy-storchaka · 2017-02-16T12:05:05Z

Seriously? Not a GitHub pull request? ;-) (old habit?)

I'm not experienced with git, and the devguide still doesn't look ready.

ncoghlan · 2017-02-18T05:04:16Z

I have a few folks hitting this at the PyCon Pune sprints, so I'm going to apply Serhiy's patch :)

ncoghlan · 2017-02-18T08:54:01Z

Looking into this at the PyCon Pune sprints, the problem appears to arise from the following difference in behaviour when the unqualified en_IN locale is set:

$ LANG=en_IN.UTF-8 python3 -c "import locale; print(locale.getlocale(locale.LC_CTYPE), locale.getpreferredencoding(False), sep='\n')"
('en_IN', 'UTF-8')
UTF-8

$ LANG=en_IN python3 -c "import locale; print(locale.getlocale(locale.LC_CTYPE), locale.getpreferredencoding(False), sep='\n')"
('en_IN', 'ISO8859-1')
UTF-8

re.LOCALE is presumably picking up the "UTF-8" rather than the "ISO8859-1", and hence the test is failing.
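The same comparison can be wrapped in a small script for checking a given environment. This is only a sketch: the helper names `locale_encodings` and `same_codec` are hypothetical and are not part of the `locale` module or of any patch on this issue.

```python
import codecs
import locale


def locale_encodings():
    """Return the LC_CTYPE encoding as reported by two different locale APIs."""
    try:
        # getlocale() derives the encoding from the locale *name*.
        _, from_name = locale.getlocale(locale.LC_CTYPE)
    except ValueError:
        # Raised when the locale name cannot be parsed.
        from_name = None
    # With do_setlocale=False this call does not change the current locale.
    preferred = locale.getpreferredencoding(False)
    return from_name, preferred


def same_codec(a, b):
    """Compare two encoding names after resolving codec aliases."""
    if a is None or b is None:
        return False
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        return False


from_name, preferred = locale_encodings()
print("getlocale():", from_name)
print("getpreferredencoding(False):", preferred)
print("agree:", same_codec(from_name, preferred))
```

Running it once with `LANG=en_IN` and once with `LANG=en_IN.UTF-8` should show whether the two APIs disagree on a particular system.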

serhiy-storchaka · 2017-02-18T09:25:59Z

Yes, please push it Nick.

ncoghlan · 2017-02-18T09:31:25Z

New changeset ace5c0f by GitHub in branch 'master':
bpo-29571: Use correct locale encoding in test_re (#149)
ace5c0f

ncoghlan · 2017-02-19T04:33:37Z

New changeset 0683d68 by GitHub in branch '3.6':
[3.6] bpo-29571: Use correct locale encoding in test_re (#149) (#153)
0683d68

ncoghlan · 2017-02-19T04:33:52Z

New changeset 760f596 by GitHub in branch '3.5':
[3.5] bpo-29571: Use correct locale encoding in test_re (#149) (#154)
760f596

zware · 2017-03-06T01:34:31Z

This seems to have broken test_re on Windows, see https://ci.appveyor.com/project/python/cpython/build/3.7.0a0.1

I found this change to be the culprit via git bisect, unfortunately we didn't have any working CI on Windows (buildbots were otherwise broken) at the time this was merged.

benjaminp · 2017-03-06T07:52:43Z

Yep, I think we should merge #422 and revert ncoghlan's change.

serhiy-storchaka · 2017-03-06T07:54:36Z

I'm not sure this will help on Windows.

serhiy-storchaka · 2017-03-06T07:55:15Z

And I don't understand why my fix doesn't work on Windows.

benjaminp · 2017-03-06T07:55:22Z

But the test was never broken on windows.

benjaminp · 2017-03-06T07:55:56Z

getpreferredencoding() takes a completely different path on windows
(returns a codepage) and isn't related to the C locale.

ncoghlan · 2017-03-06T07:58:37Z

I'm with Serhiy on this one: if the "re" module isn't using locale.getpreferredencoding(), then there's something odd going on.

It just sounds like the disconnect on Windows is the opposite of the one we hit on Linux without Benjamin's patch, perhaps due to the UTF-8 mode changes - it wouldn't surprise me to learn that the re module is still using mbcs there instead of utf-8.

benjaminp · 2017-03-06T08:01:39Z

I don't see what's odd about it. re.LOCALE uses the C locale, which one
obtains from locale.getlocale(). getpreferredencoding() is not
documented to have anything to do with the C locale, and indeed on
Windows it may be completely different.
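For readers less familiar with the flag, here is a minimal sketch of how re.LOCALE is used. The pattern and input are arbitrary examples, and only ASCII letters are used so that the result does not depend on which C locale is active.

```python
import re

# re.LOCALE is only accepted for bytes patterns; case-insensitive matching
# then follows the current C locale rather than Unicode rules.
pat = re.compile(b"abc", re.LOCALE | re.IGNORECASE)
print(pat.match(b"ABC"))

# A str pattern combined with re.LOCALE is rejected at compile time.
rejected = False
try:
    re.compile("abc", re.LOCALE)
except ValueError as exc:
    rejected = True
    print("str pattern rejected:", exc)
```

For bytes above 127 the outcome depends on the C locale's case tables, which is exactly the part the failing test exercises.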

ncoghlan · 2017-03-06T15:52:28Z

Thanks for the explanation - given that, I agree that simply reverting the attempted test-based fix and instead relying on the bpo-20087 updates is the way to go.

benjaminp · 2017-03-24T22:42:04Z

New changeset 6a4b04c by Benjamin Peterson in branch '3.6':
Revert "bpo-29571: Use correct locale encoding in test_re (#149)" (#554) (#555)
6a4b04c

benjaminp · 2017-03-24T22:42:11Z

New changeset 312f7df by Benjamin Peterson in branch '3.5':
Revert "bpo-29571: Use correct locale encoding in test_re (#149)" (#554) (#556)
312f7df

benjaminp · 2017-03-24T22:42:26Z

New changeset 21a7431 by Benjamin Peterson in branch 'master':
Revert "bpo-29571: Use correct locale encoding in test_re (#149)" (#554)
21a7431

ncoghlan · 2018-01-28T14:01:11Z

Hmm, even though we reverted the original test_re based change, and the initial attempted fix for bpo-20087 was also reverted, I'm still not currently seeing the failure for:

    LANG=en_IN.utf8 ./python -m test -v test_re

I do have the locale installed, so it's not a result of falling back to the C locale and that getting coerced to C.UTF-8:

    $ LANG=en_IN.utf8 locale -k currency_symbol
    currency_symbol="₹"

Jaysinh, are you still seeing this test failure on a fresh checkout?

ncoghlan · 2018-01-28T14:15:38Z

Hmm, this actually works for me on Fedora 27 even if I go back to 1b3d88e, the commit just before the initially merged (and subsequently reverted) test change above.

Unassigning, since I can't readily reproduce it myself.

ultimatecoder · 2018-01-29T07:00:04Z

Hello Nick,

At the devsprints of PyCon India 2017, a few participants were facing this bug. They were all Ubuntu users. I have switched to Gentoo and am not facing this bug myself, but let me confirm with an Ubuntu user. Thanks

ncoghlan · 2018-01-29T07:21:07Z

I've also added Matthias and Barry to the cc list, in case this does turn out to be a Debian or Ubuntu specific quirk.

Restating the problem, the issue is that test_locale_flag in test_re may fail for at least the en_IN locale, and we're not sure yet whether that's a test bug, a locale module bug, or a distro bug:

    LANG=en_IN ./python -m test -v test_re

We've only confirmed it on Ubuntu so far though - I haven't been able to reproduce it on Fedora, and Jaysinh hasn't been able to reproduce it since switching to Gentoo.

tirkarthi · 2019-02-28T11:23:45Z

A similar issue was reported on Debian 9.8 (stretch) with Python 3.7.2 and en_IN: bpo-36134

vstinner · 2019-02-28T17:34:31Z

Ah, I can reproduce the bug on Fedora 29 using "LANG=en_IN ./python -m test -v test_re".

The problem is that locale.getlocale() is not reliable: it pretends that the locale encoding is ISO8859-1, whereas the real encoding is UTF-8:

$ LANG=en_IN ./python 
Python 3.8.0a2+ (heads/master:4cbea518a0, Feb 28 2019, 18:19:44) 
>>> chr(224).encode('ISO8859-1')
b'\xe0'
>>> import _testcapi
>>> _testcapi.DecodeLocaleEx(b'\xe0', 0, 'strict')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: decode error: pos=0, reason=decoding error

>>> import locale

# Wrong encoding
>>> locale.getlocale(locale.LC_CTYPE)
('en_IN', 'ISO8859-1')
>>> locale.setlocale(locale.LC_CTYPE, None)
'en_IN'
>>> locale._parse_localename('en_IN')
('en_IN', 'ISO8859-1')

# Real encoding
>>> locale.getpreferredencoding()
'UTF-8'
>>> locale.nl_langinfo(locale.CODESET)
'UTF-8'

The attached PR 12099 fixes the issue.
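As a side note on where the 'ISO8859-1' in the session above comes from: as far as I can tell, locale._parse_localename() goes through locale.normalize(), which consults Python's own alias table rather than asking the C library. A minimal sketch (the exact strings returned depend on the alias table shipped with the Python version in use):

```python
import locale

# normalize() maps a bare locale name through Python's built-in alias table.
normalized = locale.normalize("en_IN")
print(normalized)

# When the name already carries an encoding, normalize() keeps that encoding.
qualified = locale.normalize("en_IN.UTF-8")
print(qualified)
```

Neither call queries the running system, so the encoding reported this way can differ from what nl_langinfo(CODESET) returns.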

vstinner · 2019-02-28T23:05:44Z

This seems to have broken test_re on Windows, see https://ci.appveyor.com/project/python/cpython/build/3.7.0a0.1

It seems like the ANSI code page is 1252 ("cp1252").

== CPython 3.7.0a0 (master:d31b28e16a2387d0251df948ef5d1b33d4357652, Mar 5 2017, 21:47:06) [MSC v.1900 32 bit (Intel)]
== Windows-2012ServerR2-6.3.9600-SP0 little-endian
== hash algorithm: siphash24 32bit
== cwd: C:\projects\cpython\build\test_python_1844
== encodings: locale=cp1252, FS=utf-8
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=1, verbose=0, bytes_warning=2, quiet=0, hash_randomization=1, isolated=0)
Using random seed 5949816

...

FAIL: test_locale_flag (test.test_re.ReTests)
----------------------------------------------------------------------

Traceback (most recent call last):
  File "C:\projects\cpython\lib\test\test_re.py", line 1422, in test_locale_flag
    self.assertTrue(pat.match(bletter))
AssertionError: None is not true

getpreferredencoding() takes a completely different path on windows
(returns a codepage) and isn't related to the C locale.

On my Windows 10 with Python 3.8, getpreferredencoding() (and getpreferredencoding(False)) returns "cp1252", getlocale(LC_CTYPE)[1] returns "1252". Python has an alias "1252" for "cp1252".
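The alias can be checked on any platform with the codecs module. This is a minimal sketch; the byte 0xE0 is only an example value.

```python
import codecs

# "1252" is registered as an alias of the "cp1252" codec.
info = codecs.lookup("1252")
print(info.name)

# Both spellings therefore decode the same byte to the same character.
print(b"\xe0".decode("1252") == b"\xe0".decode("cp1252"))
```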

On Windows, getpreferredencoding() is implemented as _locale._getdefaultlocale()[1]. _getdefaultlocale()[1] is implemented with:

    PyOS_snprintf(encoding, sizeof(encoding), "cp%d", GetACP());

At the end, it's the ANSI code page (1252).

--

I don't understand how the change ace5c0f introduced a regression. And so I don't understand how commit 21a7431 (revert) could fix anything.

--

On my PR 12099, two Windows CI jobs ran and both succeeded:

AppVeyor: pythoninfo says "locale.encoding: cp1252"
https://ci.appveyor.com/project/python/cpython/builds/22726025
Windows PR Tests on Azure Pipeline: pythoninfo also says "locale.encoding: cp1252"

When the change ace5c0f was merged, Python had no working Windows CI. Things have evolved a lot in the meantime.

I also manually tested my PR 12099 on my Windows 10 VM, which also uses cp1252: test_re passes.

--

re.LOCALE flag of re.compile() for a bytes pattern uses the following function of Modules/_sre.c:

LOCAL(int)
char_loc_ignore(SRE_CODE pattern, SRE_CODE ch)
{
    return ch == pattern
        || (SRE_CODE) sre_lower_locale(ch) == pattern
        || (SRE_CODE) sre_upper_locale(ch) == pattern;
}
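A rough Python model of that helper may help show why the test is sensitive to the encoding. This is a sketch under loud assumptions: `str.lower()` and `str.upper()` applied after decoding stand in for the C library's per-byte case mapping, and the `encoding` parameter is hypothetical (the C function takes no such argument and relies on the current locale instead).

```python
def char_loc_ignore(pattern: int, ch: int, encoding: str) -> bool:
    """Rough model of the C helper; byte values are ints in range(256)."""

    def convert(code: int, func) -> int:
        # Assumption: emulate a per-byte case mapping by decoding one byte,
        # changing its case, and encoding it back.
        try:
            out = func(bytes([code]).decode(encoding)).encode(encoding)
        except UnicodeError:
            return code  # the byte is not a complete character here
        return out[0] if len(out) == 1 else code

    return (ch == pattern
            or convert(ch, str.lower) == pattern
            or convert(ch, str.upper) == pattern)


# In a single-byte encoding, 0xC0 and 0xE0 are the two cases of one letter.
print(char_loc_ignore(0xE0, 0xC0, "iso8859-1"))
# In UTF-8 a lone 0xC0 byte is not a character, so only equal bytes match.
print(char_loc_ignore(0xE0, 0xC0, "utf-8"))
```

In this model, a test that encodes its letter with ISO8859-1 while the matching side behaves as UTF-8 would see no case-insensitive match, which is consistent with the `AssertionError: None is not true` reported above.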

vstinner · 2019-02-28T23:08:09Z

New changeset ab71f8b by Victor Stinner in branch 'master':
bpo-29571: Fix test_re.test_locale_flag() (GH-12099)
ab71f8b

vstinner · 2019-03-01T01:13:07Z

AppVeyor failed on the backport to Python 3.7 of my fix: PR 12108.

OK, now I understand the bug in Python 3.7. locale.getlocale(locale.LC_CTYPE)[1] returns None because Python doesn't set LC_CTYPE to the user preferred locale. I'm not sure which locale is used in practice in that case, but at least I can say that None is not the expected encoding name... str.encode() and bytes.decode() use UTF-8 when None is passed as the encoding. locale.getpreferredencoding() returns 'cp1252', which is the ANSI code page.

Python 3.8 is different. In bpo-34485, I modified Python 3.8 to set LC_CTYPE locale to the user preference (ANSI code page):
---
commit 177d921
Author: Victor Stinner <vstinner@redhat.com>
Date: Wed Aug 29 11:25:15 2018 +0200

bpo-34485, Windows: LC_CTYPE set to user preference (GH-8988)

On Windows, the LC_CTYPE is now set to the user preferred locale at
startup: _Py_SetLocaleFromEnv(LC_CTYPE) is now called during the
Python initialization. Previously, the LC_CTYPE locale was "C" at
startup, but changed when calling setlocale(LC_CTYPE, "") or
setlocale(LC_ALL, "").

pymain_read_conf() now also calls _Py_SetLocaleFromEnv(LC_CTYPE) to
behave as _Py_InitializeCore(). Moreover, it doesn't save/restore the
LC_ALL anymore.

On Windows, standard streams like sys.stdout now always use
surrogateescape error handler by default (ignore the locale).

---

vstinner · 2019-03-05T12:34:20Z

I wrote C and Python code to check which encoding the LC_CTYPE locale effectively uses before setlocale(LC_CTYPE, "") is called on Python 3.7. Result: Windows uses the Latin1 encoding.

See attached files: _testcapi.patch + loc.py produced loc.log (output).

vstinner · 2019-03-05T15:17:46Z

New changeset 279657b by Victor Stinner in branch '3.7':
[3.7] bpo-29571: Fix test_re.test_locale_flag() (GH-12178)
279657b

vstinner · 2019-03-05T15:26:32Z

I don't understand the relationship with bpo-20087, so I removed the dependency.

I fixed test_re in the 3.7 and master branches. I'm closing the issue.

ultimatecoder added the 3.7 and topic-regex labels Feb 15, 2017

serhiy-storchaka self-assigned this Feb 15, 2017

serhiy-storchaka added the type-bug label Feb 15, 2017

serhiy-storchaka added the tests label Feb 15, 2017

ncoghlan assigned ncoghlan and unassigned serhiy-storchaka Feb 18, 2017

ncoghlan closed this as completed Feb 19, 2017

serhiy-storchaka added the OS-windows label Mar 6, 2017

serhiy-storchaka reopened this Mar 6, 2017

ncoghlan removed their assignment Jan 28, 2018

vstinner added the 3.8 label Mar 5, 2019

vstinner closed this as completed Mar 5, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022
