CODESET error on AMD64 FreeBSD 10.x Shared 3.x caused by the PEP 538 #74832
Regression caused by commit 6ea4186, bpo-28180: Implementation for PEP-538.

Python detected LC_CTYPE=C: LC_CTYPE coerced to UTF-8 (set another locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behavior).
Current thread 0x0000000802006400 (most recent call first):
On my FreeBSD 11 VM, I only have the "C" locale, not a "UTF-8 C" locale:

[haypo@freebsd ~/prog/python/master]$ locale -a|grep ^C

But CPython still asks me to use a nonexistent locale (newlines added for readability):

[haypo@freebsd ~/prog/python/master]$ ./python
Python runtime initialized with LC_CTYPE=C (a locale with default ASCII encoding),
which may cause Unicode compatibility problems. Using C.UTF-8, C.utf8, or UTF-8
(if available) as alternative Unicode-compatible locales is recommended.
Python 3.7.0a0 (heads/master:d79c1d4a94, Jun 13 2017, 10:59:23)
[GCC 4.2.1 Compatible FreeBSD Clang 3.8.0 (tags/RELEASE_380/final 262564)] on freebsd11
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_CTYPE, None)
'C'
Per POSIX, the C locale is only expected to be ASCII. C.UTF-8 is a Linux-only thing (actually I thought it was a Debian-only thing, but maybe not). I was thinking about creating a C.utf8 locale on FreeBSD, but it is not that simple to do (still doable, and an interesting idea).

Note that if it fails here, it probably also fails on other OSes. At minimum: Dragonfly and Illumos for sure, maybe NetBSD and OpenBSD as well.

haypo, do not hesitate to ping me on IRC as usual if you want to discuss the issue.
macOS is also BSD-like with regard to locales: it does not have any C.* locales other than plain C. See, for example, the discussion at bpo-18378.
More details here: The closest thing to a C locale with Unicode would be to set every category to the C locale except LC_CTYPE, which would be set to en_US.UTF-8. The problem is that if your LC_CTYPE data comes from CLDR, it differs per locale. On FreeBSD, Dragonfly and Illumos, we have extended it so that LC_CTYPE is the same in all locales.
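The "C everywhere except LC_CTYPE" arrangement described above can be sketched with Python's `locale` module. This is only an illustration: the helper name `use_c_locale_with_utf8_ctype` is hypothetical, and whether `en_US.UTF-8` is installed depends on the system, so the sketch reports failure instead of assuming it exists.

```python
import locale


def use_c_locale_with_utf8_ctype(ctype_name="en_US.UTF-8"):
    """Set every category to "C", then override only LC_CTYPE.

    Hypothetical helper for illustration. Returns True if the LC_CTYPE
    override succeeded, False if that locale is not available (in which
    case every category, LC_CTYPE included, stays "C").
    """
    # Start from the plain C locale for all categories.
    locale.setlocale(locale.LC_ALL, "C")
    try:
        # Override only the character-classification/encoding category.
        locale.setlocale(locale.LC_CTYPE, ctype_name)
    except locale.Error:
        return False
    return True
```

After a successful call, categories such as LC_COLLATE and LC_NUMERIC still report "C", while LC_CTYPE reports the UTF-8 locale.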
Note that the coercion logic includes a runtime check to see if 'setlocale(LC_CTYPE, "<locale_name>")' succeeds. That's how we skip over the non-existent C.UTF-8 and C.utf8 to get to "LC_CTYPE=UTF-8" on Mac OS X and FreeBSD.

That appears to work (and really does work on Mac OS X as far as CPython's test suite is concerned), but on FreeBSD we subsequently get the CODESET failure when we try to call [...]

Victor's suggestion, which seems reasonable to me, is that we could also add the [...] That way, instead of the interpreter failing to start, we'd just skip the locale coercion logic in that case (and update the test suite's expectations accordingly).
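As a rough Python-level illustration of the idea discussed above (the real logic lives in CPython's C startup code, which is not reproduced here), a probe can try each candidate with `setlocale()`, additionally require that `nl_langinfo(CODESET)` returns a non-empty value, and otherwise skip the candidate. The candidate names come from the warning text quoted earlier in this thread; the helper name `find_usable_utf8_ctype` is hypothetical, and treating an empty CODESET string as the failure signal is an assumption of this sketch.

```python
import locale

# Candidate names taken from the warning message quoted in this thread.
CANDIDATES = ("C.UTF-8", "C.utf8", "UTF-8")


def find_usable_utf8_ctype(candidates=CANDIDATES):
    """Return the first candidate that passes both checks, else None.

    Hypothetical helper: it only probes, and always restores the
    LC_CTYPE setting that was active before the call.
    """
    initial = locale.setlocale(locale.LC_CTYPE, None)  # query only
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # locale not installed, try the next one
            # nl_langinfo is not available on every platform.
            nl_langinfo = getattr(locale, "nl_langinfo", None)
            if nl_langinfo is None:
                continue
            if nl_langinfo(locale.CODESET):
                return name
            # setlocale() succeeded but CODESET is empty: skip this one.
        return None
    finally:
        # Put LC_CTYPE back to its initial value in every case.
        locale.setlocale(locale.LC_CTYPE, initial)
```

A result of `None` corresponds to the "skip coercion instead of failing to start" outcome. Restoring to the saved initial value is the simplest choice for a sketch and is not necessarily how the final fix restores the locale.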
Current status of the PR:
Accordingly, I've revised the tests as follows:
At the locale coercion level, I've added an extra check where we save the initial locale (i.e. before we change anything), and if setlocale() succeeds, but nl_langinfo(CODESET) fails, we do setlocale(LC_CTYPE, initial_locale) to try to get things back to their original state. This seems to *mostly* work on FreeBSD, but doesn't quite get readline back to where it is by default, so test_non_ascii in test_readline fails with the error:
My two current guesses as to what may be going wrong there are:
I'm leaning towards the former, as if it was the latter, I'd expect to have already seen the same error *without* locale coercion.
I was able to fix the test_readline failure by restoring the locale based on the environment settings with [...]

That means we can leave the runtime coercion checks enabled on *BSD systems, and if/when any given BSD variant adds working Linux-style C.UTF-8 or OS-X-style UTF-8 locales, we'll automatically start using them.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.