Interpreter fails in initialize on systems where HAVE_LANGINFO_H is undefined #66936

Closed
WanderingLogic mannequin opened this issue Oct 27, 2014 · 11 comments
Labels
interpreter-core: Interpreter core (Objects, Python, Grammar, and Parser dirs)
type-crash: A hard crash of the interpreter, possibly with a core dump

Comments

@WanderingLogic
Mannequin

WanderingLogic mannequin commented Oct 27, 2014

BPO 22747
Nosy @malemburg, @loewis, @pitrou, @vstinner, @skrah, @xdegaye, @Fak3
Files
  • no_langinfo_during_init.patch
  • locale.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2019-05-15.02:56:56.877>
    created_at = <Date 2014-10-27.21:30:08.131>
    labels = ['interpreter-core', 'type-crash']
    title = 'Interpreter fails in initialize on systems where HAVE_LANGINFO_H is undefined'
    updated_at = <Date 2019-05-15.02:56:56.876>
    user = 'https://bugs.python.org/WanderingLogic'

    bugs.python.org fields:

    activity = <Date 2019-05-15.02:56:56.876>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-05-15.02:56:56.877>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2014-10-27.21:30:08.131>
    creator = 'WanderingLogic'
    dependencies = []
    files = ['37046', '42585']
    hgrepos = []
    issue_num = 22747
    keywords = ['patch']
    message_count = 11.0
    messages = ['230106', '230111', '230385', '230391', '230393', '230394', '230407', '264160', '264202', '264203', '342542']
    nosy_count = 10.0
    nosy_names = ['lemburg', 'loewis', 'pitrou', 'vstinner', 'Arfrever', 'skrah', 'xdegaye', 'python-dev', 'Roman.Evstifeev', 'WanderingLogic']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue22747'
    versions = ['Python 3.4']


    On systems where configure cannot find langinfo.h (or where nl_langinfo() is not defined), configure leaves HAVE_LANGINFO_H undefined in pyconfig.h. In pythonrun.c:get_locale_encoding() the call to nl_langinfo() is wrapped in an #ifdef, but the #else path calls PyErr_SetNone(PyExc_NotImplementedError) and returns NULL. That causes initfsencoding() to fail with the message "Py_Initialize: Unable to get the locale encoding", and the interpreter aborts.

    I'm confused because http://bugs.python.org/issue8610 (from 2010) seems to have come down on the side of deciding that nl_langinfo() failures should be treated as implicitly returning either "ASCII" or "UTF-8" (I'm not sure which). But maybe that was for a different part of the interpreter?

    In any case there are 4 choices here, all of which are preferable to what we are doing now.

    1. Fail during configure. If we can't even start the interpreter, then why waste the user's time with the build?
    2. Fail during compilation. The #else path could contain #error "Python only works on systems where nl_langinfo() is correctly implemented." Again, this would be far preferable to failing only once the user has finished the install and tries to get the interpreter prompt.
    3. Implement our own python_nl_langinfo() that we fall back on when the system one doesn't exist. (It could, for example, return "ASCII" (or "ANSI_X3.4-1968") to start with, and "UTF-8" after we see a call to setlocale(LC_CTYPE, "") or setlocale(LC_ALL, "").)
    4. Just return the string "ASCII".

    The attached patch does the last. I'm willing to try to write the patch for choice (3) if that's what you'd prefer. (I have an implementation that does (3) for systems that also don't have setlocale() implemented, but I don't yet know how to do it if nl_langinfo() doesn't exist but setlocale() does.)
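Choice (4) can be illustrated at the Python level. This is only a sketch, not the attached C patch: it relies on the fact that the standard `locale` module exposes `nl_langinfo()` and `CODESET` only on platforms that provide them, which roughly mirrors the HAVE_LANGINFO_H check. The function name `locale_encoding` and its `locale_module` parameter are made up for this example.

```python
import locale


def locale_encoding(locale_module=locale, default="ASCII"):
    """Return the locale codeset, or `default` when nl_langinfo() is unavailable.

    `locale_module` is a hypothetical parameter that exists only so the
    fallback path can be exercised on platforms that do have nl_langinfo().
    """
    nl_langinfo = getattr(locale_module, "nl_langinfo", None)
    codeset = getattr(locale_module, "CODESET", None)
    if nl_langinfo is None or codeset is None:
        # Counterpart of the #else branch: report a fixed encoding
        # instead of raising NotImplementedError.
        return default
    # Treat an empty answer from the C library like a missing one.
    return nl_langinfo(codeset) or default
```

Calling `locale_encoding()` returns whatever the platform reports, while passing an object without `nl_langinfo` (for example `object()`) takes the fallback path and returns "ASCII".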

    @WanderingLogic WanderingLogic mannequin added interpreter-core Interpreter core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Oct 27, 2014
    @vstinner
    Member

    vstinner commented Oct 27, 2014

    I'm confused because http://bugs.python.org/issue8610 (from 2010) seems
    to have come down on the side of deciding that nl_langinfo() failures
    should be treated as implicitly returning either "ASCII" or "UTF-8"

    It's very important that Py_DecodeLocale and Py_EncodeLocale use the same
    encoding as sys.getfilesystemencoding().

    What is your platform? Which encoding is used by these functions?
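The consistency requirement can be shown with plain byte strings. The sketch below is an illustration only and does not call the C functions: it decodes and encodes with explicitly named codecs and the `surrogateescape` error handler (the handler Py_DecodeLocale is documented to use), and checks whether the original bytes survive. The helper name `round_trips` is hypothetical.

```python
def round_trips(raw, decode_encoding, encode_encoding):
    """Return True if `raw` survives decoding and re-encoding unchanged."""
    try:
        text = raw.decode(decode_encoding, "surrogateescape")
        return text.encode(encode_encoding, "surrogateescape") == raw
    except UnicodeError:
        # The two sides disagree so badly that conversion fails outright.
        return False


filename = "caf\u00e9".encode("utf-8")  # UTF-8 bytes as stored by the kernel
```

When both directions use the same encoding, even "ascii", the undecodable bytes are carried through as lone surrogates and the file name is preserved. When the two sides disagree, the name is either corrupted or cannot be encoded at all.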

    @WanderingLogic
    Mannequin Author

    WanderingLogic mannequin commented Oct 31, 2014

    My platform is the Android command-line shell. Essentially it is like an embedded Linux platform with a very quirky, partially implemented libc (not glibc). It has no langinfo.h, and while it has locale.h, the implementations of setlocale() and localeconv() do nothing (and return NULL). The wcstombs() and mbstowcs() functions are both mapped to strncpy().

    As was the original intent of UTF-8, since the Linux kernel (and most supported file systems) stores filenames as null-terminated byte strings, UTF-8 encoded file names "work" with software that assumes that the encoding is UTF-8. Examples are the xterm program that I'm using to "ssh" into the machine and the Dalvik JVM that runs user apps.

    My intent with this tracker is to make it slightly easier for people who have a libc like Android's, where the locale support is completely broken and really only 8-bit "ascii" is supported, to get something reasonable to compile and run, while simultaneously not breaking the supported platforms.

    If you look at what Kivy and Py4A have done, they basically have patches all over the main interpreter that, once applied, make the interpreter not work on any supported platform. I'm trying to avoid that approach. Two possibilities for this particular part of the interpreter are to implement option (3) above, or to implement option (4) above. Option (3) is preferable in the long run, but option (4) is a much smaller change (as long as it is consistent with the decision of tracker 8610).

    @skrah
    Mannequin

    skrah mannequin commented Oct 31, 2014

    Has anyone made an effort to get this fixed in Android? I find it strange that hundreds of projects now work around Android bugs instead of putting (friendly) pressure on the Android maintainers.

    Minimal langinfo.h and locale.h support should be trivial to implement.

    @WanderingLogic
    Mannequin Author

    WanderingLogic mannequin commented Oct 31, 2014

    I am working on using my resources at Intel to put some pressure on Google to fix some of the (many) problems in the Bionic libc.

    I have a sort of "polyfill" library that implements locale.h and langinfo.h, as well as the structure definitions for wchar.h, and it borrows the UTF-8 mbs*towcs() and wcs*tombs() implementations from FreeBSD. It implements a setlocale() and an nl_langinfo() that start in locale "C" and behave as though the user's environment variables are set to "C.UTF-8" (so if you call setlocale(LC_ALL, "") the encoding is changed to UTF-8).
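The state machine behind such a polyfill is small. The following is a Python model of the behaviour described above, not the actual C library: the class name `FakeLocale`, its methods, and the set of accepted locale names are all assumptions made for illustration.

```python
class FakeLocale:
    """Hypothetical model of a polyfilled setlocale()/nl_langinfo() pair."""

    SUPPORTED = ("C", "POSIX", "C.UTF-8")  # assumed set of accepted names

    def __init__(self, env_locale="C.UTF-8"):
        # Pretend the user's environment variables select `env_locale`.
        self._env_locale = env_locale
        # Like a real C program, start in the "C" locale.
        self._current = "C"

    def setlocale(self, name=None):
        """Mimic setlocale(): None queries, "" means "use the environment"."""
        if name is None:
            return self._current
        if name == "":
            name = self._env_locale
        if name not in self.SUPPORTED:
            return None  # the C function returns NULL on failure
        self._current = name
        return name

    def codeset(self):
        """Mimic nl_langinfo(CODESET) for the current locale."""
        return "UTF-8" if self._current.endswith(".UTF-8") else "ASCII"
```

A fresh `FakeLocale()` reports "ASCII" until `setlocale("")` is called, after which it reports "UTF-8". A request for an unsupported locale returns None and leaves the current state unchanged.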

    But Bionic has been broken for many years, and it will most likely take many more years before I (or somebody) can arrange the right set of things to get it fixed. It is not really in Google's interest to have people writing non-JVM code, so they seem to support it only grudgingly. Their JVM APIs are the "walled garden" that keeps apps sticky to their platform, while allowing them to quickly switch to new processor architectures if they need to.

    But all of that is not really germane to this bug. The fact is that cpython, when compiled for a system with no langinfo.h, creates an executable that does nothing but crash.

    What other systems (other than Android) have no langinfo.h? (Alternatively, why has this feature-test been in configure.ac for many years?) If the solution for Android is "it's android's bug and they should fix it" then shouldn't we remove all the #ifdef HAVE_LANGINFO_H tests from the code and just let compilation fail on systems that don't have langinfo.h? That is option (1) or (2) that I suggested above.

    @skrah
    Mannequin

    skrah mannequin commented Oct 31, 2014

    To expand a little, here ...

    https://code.google.com/p/android/issues/list

    ... I cannot find either a localeconv() or an nl_langinfo() issue.

    Perhaps the maintainers would be willing to add minimal versions?

    @vstinner
    Member

    vstinner commented Oct 31, 2014

    If the platform doesn't provide anything, we can maybe adopt the same
    approach as Mac OS X: force the encoding to UTF-8 and just don't use
    the C library.

    @xdegaye
    Mannequin

    xdegaye mannequin commented Apr 25, 2016

    Android's default system encoding is UTF-8, as specified at http://developer.android.com/reference/java/nio/charset/Charset.html

    <quote>The platform's default charset is UTF-8. (This is in contrast to some older implementations, where the default charset depended on the user's locale.) </quote>

    If the platform doesn't provide anything, we can maybe adopt the same
    approach as Mac OS X: force the encoding to UTF-8 and just don't use
    the C library.

    The attached patch does the same thing as proposed by Victor, but emphasizes that Android defines neither HAVE_LANGINFO_H nor CODESET. The fact that HAVE_LANGINFO_H and CODESET are not defined causes other problems as well (possibly on Mac OS X too). In that case, PyCursesWindow_New() in _cursesmodule.c falls back nicely to "utf-8", but _Py_device_encoding() in fileutils.c instead does a Py_RETURN_NONE. It seems that this impacts _io_TextIOWrapper___init___impl() in textio.c and os_device_encoding_impl() in posixmodule.c. And indeed, os.device_encoding(0) returns None on Android.
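For callers, the difference between the two fallbacks looks roughly like this. The wrapper below is a sketch under the assumption that "utf-8" is an acceptable default, as in the curses case; the name `device_encoding_or_utf8` is made up for this example. Note that `os.device_encoding()` also returns None on any platform when the descriptor is not connected to a terminal.

```python
import os


def device_encoding_or_utf8(fd, default="utf-8"):
    """Return os.device_encoding(fd), or `default` when it reports None."""
    encoding = os.device_encoding(fd)
    # None means "not a terminal" or "the platform cannot tell"
    # (for example, no CODESET available at build time).
    return encoding if encoding is not None else default
```

With this wrapper, `device_encoding_or_utf8(0)` yields a usable codec name even where `os.device_encoding(0)` returns None.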

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 25, 2016

    New changeset ad6be34ce8c9 by Stefan Krah in branch 'default':
    Issue bpo-22747: Workaround for systems without langinfo.h.
    https://hg.python.org/cpython/rev/ad6be34ce8c9

    @skrah
    Mannequin

    skrah mannequin commented Apr 26, 2016

    We don't support Android officially yet, but I think until bpo-8610
    is resolved something must be done here.

    @vstinner
    Member

    vstinner commented May 15, 2019

    Python 3 (I don't recall which version exactly) has been fixed to always use UTF-8 on Android for the filesystem encoding and even for the locale encoding in most places. I'm closing the issue.
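To check what a given build actually uses, the two encodings can be read from the standard library. This is a small sketch; the function name `encoding_report` is made up, and the codec names are normalized with `codecs.lookup()` so that spellings such as "UTF-8" and "utf-8" compare equal.

```python
import codecs
import locale
import sys


def encoding_report():
    """Return the normalized filesystem and locale encodings of this interpreter."""
    return {
        "filesystem": codecs.lookup(sys.getfilesystemencoding()).name,
        "locale": codecs.lookup(locale.getpreferredencoding(False)).name,
    }
```

On a platform with the behaviour described above, both entries would be expected to read "utf-8".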

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022