OS X: python3 from python-3.1.dmg crashes at startup #50642

Closed
mdickinson opened this issue Jul 1, 2009 · 21 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-mac type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@mdickinson
Member

BPO 6393
Nosy @ronaldoussoren, @mdickinson, @pitrou, @vstinner, @benjaminp, @ned-deily
PRs
  • [2.7] bpo-6393: Fix locale.getprerredencoding() on macOS #1555
Files
  • issue6393-fix.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/ronaldoussoren'
    closed_at = <Date 2009-11-24.16:37:44.359>
    created_at = <Date 2009-07-01.10:51:55.936>
    labels = ['OS-mac', 'interpreter-core', 'type-crash']
    title = 'OS X: python3 from python-3.1.dmg crashes at startup'
    updated_at = <Date 2017-05-12.09:51:40.558>
    user = 'https://github.com/mdickinson'

    bugs.python.org fields:

    activity = <Date 2017-05-12.09:51:40.558>
    actor = 'vstinner'
    assignee = 'ronaldoussoren'
    closed = True
    closed_date = <Date 2009-11-24.16:37:44.359>
    closer = 'ronaldoussoren'
    components = ['Interpreter Core', 'macOS']
    creation = <Date 2009-07-01.10:51:55.936>
    creator = 'mark.dickinson'
    dependencies = []
    files = ['14476']
    hgrepos = []
    issue_num = 6393
    keywords = ['patch', 'needs review']
    message_count = 21.0
    messages = ['89972', '90285', '90302', '90303', '90308', '90310', '90312', '90314', '90320', '90323', '90373', '90445', '90447', '90608', '90609', '90610', '90617', '92322', '93174', '95124', '293537']
    nosy_count = 10.0
    nosy_names = ['ronaldoussoren', 'mark.dickinson', 'pitrou', 'vstinner', 'benjamin.peterson', 'ned.deily', 'grahamd', 'srid', 'Phil', 'slavi']
    pr_nums = ['1555']
    priority = 'critical'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue6393'
    versions = ['Python 3.1', 'Python 3.2']

    @mdickinson
    Member Author

    There was a report[1] on c.l.p. that python3 from the OS X Python 3.1
    dmg download at www.python.org/download/releases/3.1/ crashes on
    startup. I can reproduce this with the python.org download (using the
    OS X Terminal) only with a bad locale setting:

    newton:~ dickinsm$ LANG=utf-8 python3
    Fatal Python error: Py_Initialize: can't initialize sys standard streams
    LookupError: unknown encoding:
    Abort trap (core dumped)

    The core dump isn't useful: just lots of 'No symbol table info
    available.'

    This is on OS X 10.5.7/Intel.

    I can't reproduce it with either the py3k branch or the release31-maint
    branch, built from scratch.

    I suspect that this has to do with the behaviour of nl_langinfo(CODESET)
    on OS X: namely, after doing (in C) setlocale(LC_CTYPE, ""), the result
    of nl_langinfo(CODESET) appears to be "UTF-8" for well-defined utf-8
    locales (e.g., 'en_US.UTF-8'), "US-ASCII" for meaningless locales (e.g.,
    'invalid'), but one just gets "" for locales like 'utf-8' or 'en_US'.
    This in turn affects Python's locale.getpreferredencoding function.
    See also bpo-2173, which may be related.
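
A rough way to observe this behaviour from Python is sketched below. It assumes a POSIX platform where `locale.nl_langinfo` is available; `current_codeset` is a hypothetical helper name, not part of any library. What it returns depends on the environment and the C library, so no particular output is implied.

```python
import locale


def current_codeset():
    """Return nl_langinfo(CODESET) after adopting the environment's LC_CTYPE.

    Hypothetical helper for illustration; POSIX only.
    """
    previous = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        try:
            # Equivalent of the C call setlocale(LC_CTYPE, "").
            locale.setlocale(locale.LC_CTYPE, "")
        except locale.Error:
            # The environment names a locale this C library rejects;
            # the previous LC_CTYPE setting stays in effect.
            pass
        return locale.nl_langinfo(locale.CODESET)
    finally:
        # Put the process back the way it was found.
        locale.setlocale(locale.LC_CTYPE, previous)


if __name__ == "__main__":
    print(repr(current_codeset()))
```

Saved under a name of your choice and run with different LANG values (for example 'en_US.UTF-8' and 'utf-8'), it would show which codeset a given system reports for each setting.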

    Ronald, any ideas?

    [1] http://mail.python.org/pipermail/python-list/2009-June/718255.html

    @mdickinson mdickinson added interpreter-core (Objects, Python, Grammar, and Parser dirs) OS-mac type-crash A hard crash of the interpreter, possibly with a core dump labels Jul 1, 2009
    @ned-deily
    Member

    This is a side effect of the fix for bpo-6202. Prior to r73268,
    locale.getpreferredencoding always returned "mac-roman" regardless of the
    setting of LANG, so this wasn't a problem in py3k (or 3.0.x builds) up
    through 3.1rc1. I can reproduce it on current py3k and release31-maint.

    @ned-deily
    Member

    Note, you can produce the same error on OS X or Linux by setting
    PYTHONIOENCODING="", which effectively overrides the value returned by
    nl_langinfo(CODESET). In pythonrun.c, create_stdio passes
    PYTHONENCODING, if set, on as the "encoding" value to TextIOWrapper. If
    no encoding was specified, TextIOWrapper uses the value returned by
    locale.getpreferredencoding(). It then calls PyCodec_IncrementalDecoder
    and the unknown (or empty) encoding is finally detected.
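
The final step can be reproduced in isolation with the `codecs` module, whose lookup functions are the Python-level counterpart of the C codec API. This is only a sketch: `is_known_encoding` is a hypothetical helper, and it shows the lookup failure rather than the interpreter startup path itself.

```python
import codecs


def is_known_encoding(name):
    """Return True if the codec registry can resolve `name` (hypothetical helper)."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# An empty name, as in the failing case described above, cannot be resolved;
# a well-formed name can.
print(is_known_encoding(""))       # False
print(is_known_encoding("utf-8"))  # True
```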

    That raises the question of how far python should go in protecting the
    user. One *could* add a check in pythonrun.c to substitute some
    suitable default (UTF-8) if nl_langinfo(CODESET) returns an empty value.
    Or perhaps just abort there with a more meaningful error message.

    @ned-deily
    Member

    "... create_stdio passes PYTHONIOENCODING ..."

    @mdickinson
    Member Author

    > One *could* add a check in pythonrun.c to substitute some suitable
    > default (UTF-8) if nl_langinfo(CODESET) returns an empty value.

    While googling for the source of this problem, I found other software
    projects that take this approach. It doesn't seem totally unreasonable.

    I just wish I understood *why* nl_langinfo(CODESET) is returning "" in
    these cases. I've looked for the source at
    http://www.opensource.apple.com, but can't find it; maybe that part of
    Darwin isn't open source.

    It seems that a lot of people end up with an OS X Terminal setup such that
    LC_CTYPE is 'UTF-8' (perhaps this is a 10.4 thing---I haven't encountered
    this myself); I don't think these people should have to deal with a
    confusing error on startup; defaulting to UTF-8 on OS X seems like a
    reasonable compromise.

    @ronaldoussoren
    Contributor

    The manpage says that nl_langinfo returns an empty string when there is
    an invalid setting.

    There is validity in saying that 'LANG=utf-8' is an invalid setting: the
    LANG variable is supposed to be a locale name, which would be a language
    setting (possibly combined with a codeset definition). "utf-8" is not a
    language.

    I wouldn't mind falling back to utf-8 as the default codeset when
    nl_langinfo returns an empty string because utf-8 is the default
    character set on OSX, and furthermore defaulting to some value is way
    better than crashing.

    I do wonder how the user ended up with LANG=utf-8 in the first place.

    @mdickinson
    Member Author

    > There is validity in saying that 'LANG=utf-8' is an invalid setting

    Agreed. But that doesn't really explain why e.g. LANG=en_US also
    produces "", while LANG=invalid produces "US-ASCII".

    > I do wonder how the user ended up with LANG=utf-8 in the first place.

    Me too. As far as I can gather, it's a result of setting the Terminal
    preferences (particularly the character encoding and 'Set LANG
    environment variable on startup' checkbox) in some particular way, on
    some versions of OS X, for users in some countries, at some particular
    phases of the moon, etc...

    @ronaldoussoren
    Contributor

    The attached patch (issue6393-fix.patch) seems to fix the issue.

    Could you please test and have a look at the patch? It basically tests if
    the output of nl_langinfo(CODESET) is the empty string and defaults to
    'UTF-8' in that case (but only on OSX).

    I intend to apply this patch unless someone objects to that.
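
The fallback described here could look roughly like the following. This is a sketch under assumptions, not the contents of the attached patch: `codeset_with_fallback` is a hypothetical name, and the platform is passed in as a parameter so the logic can be exercised on any system.

```python
import sys


def codeset_with_fallback(codeset, platform=sys.platform):
    """Substitute UTF-8 for an empty codeset, but only on OS X.

    `codeset` stands in for the result of nl_langinfo(CODESET).
    """
    result = codeset
    if not result and platform == "darwin":
        # nl_langinfo(CODESET) can return "" on OS X for locales such as
        # 'utf-8' or 'en_US'; UTF-8 is assumed as the default there.
        result = "UTF-8"
    return result
```

With this shape, `codeset_with_fallback("", "darwin")` yields 'UTF-8', while a non-empty codeset, or an empty one on another platform, is passed through unchanged.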

    @mdickinson
    Member Author

    Thanks, Ronald! The patch fixes the problem for me.
    (I directly patched the locale.py file installed from
    the Python dmg, since I still haven't figured out how
    to build a python executable that exhibits this
    problem.)

    The patch doesn't look quite right, though: in the else clause,
    it looks as though you're testing 'result' before it exists.
    Shouldn't the 'result = nl_langinfo(CODESET)' line come
    before the 'if not result and ....' line?

    On the subject of Terminal and LANG, LC_CTYPE settings, I found an
    interesting link:

    http://pastie.textmate.org/111807

    Indeed, after setting my region to 'South Africa' in Preferences ->
    International -> Formats, a newly opened Terminal window gives me:

    newton:~ dickinsm$ locale
    LANG=
    LC_COLLATE="C"
    LC_CTYPE="UTF-8"
    LC_MESSAGES="C"
    LC_MONETARY="C"
    LC_NUMERIC="C"
    LC_TIME="C"
    LC_ALL=

    And then python3 crashes on startup as above. This is on a newborn
    (3-week-old) MacBook Pro that's been barely changed from default settings
    (and no transfer of files and settings from an old Mac, either).

    @ronaldoussoren
    Contributor

    Good catch, the code in the else is indeed in the wrong order.

    @ned-deily
    Member

    Looks good and the "patched" patch also works in a py3k installer build.

    BTW, Mark, I was curious as to why you were unable to reproduce the
    problem with your own build. I should have mentioned that my testing
    was with complete installer (framework) builds. I subsequently
    experimented with a non-framework build and found that I could not
    reproduce the problem running from the ./python in the build directory.
    Stepping through gdb showed that, during the calls from create_stdio,
    the import of locale fails in textio.c, so it falls back to using
    "ascii" as the default encoding (~line 899) and avoids the crash. If I
    do a make install, the unpatched installed bin/python3 does crash in the
    same way as with the installer python3.
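
In Python terms, the fallback observed in that non-framework build behaves roughly like the sketch below. It is an illustration only, not the code in textio.c; `default_stream_encoding` is a hypothetical name.

```python
def default_stream_encoding():
    """Pick a default text encoding, falling back to ASCII if locale is unavailable."""
    try:
        import locale
    except ImportError:
        # Mirrors the build-directory case: the locale module cannot be
        # imported, so "ascii" is used and the empty codeset is never seen.
        return "ascii"
    return locale.getpreferredencoding()
```

This would explain why the crash only shows up once the standard library is importable, as in an installed or framework build.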

    @pitrou
    Member

    pitrou commented Jul 12, 2009

    Once this patch is checked in, should we do an emergency 3.1.1 release?

    @mdickinson
    Member Author

    I don't know whether this is really worth a 3.1.1, all by itself.
    There's an easy workaround, which is for affected users to set their
    locale properly.

    @grahamd
    Mannequin

    grahamd mannequin commented Jul 17, 2009

    I see this problem on both MacOS X 10.5 and on Windows. This is when using
    Python embedded inside of Apache/mod_wsgi.

    On MacOS X the error is:

    Fatal Python error: Py_Initialize: can't initialize sys standard streams
    ImportError: No module named encodings.utf_8

    On Windows the error is:

    Fatal Python error: Py_Initialize: can't initialize sys standard streams
    LookupError: unknown encoding: cp0

    The talk about the fix mentioned it only addressing MacOS X. What about
    the Windows case I am seeing? Will it help with that at all?

    @grahamd
    Mannequin

    grahamd mannequin commented Jul 17, 2009

    Hmmm, actually my MacOS X error is different, although the Windows one is
    the same, except that an encoding is listed and isn't empty.

    @grahamd
    Mannequin

    grahamd mannequin commented Jul 17, 2009

    You can ignore my MacOS X example as that was caused by something else.

    My question still stands as to whether the fix will address the similar
    problem I saw on Windows.

    @grahamd
    Mannequin

    grahamd mannequin commented Jul 17, 2009

    I have created bpo-6501 for my Windows variant of this problem, given that
    it appears to be subtly different: there is an encoding, whereas
    the MacOS X variant doesn't have one.

    Seeing that the fix for the MacOS X issue is in Python code, I will, when I
    have a chance, look at whether I can work out a fix for the Windows
    variant. I'm not sure I have the right tools to compile Python from C code
    on Windows, so if it is a C code problem, I'm not sure I can really
    investigate.

    @ronaldoussoren
    Contributor

    I've applied the fixed version of my patch in r74687 (3.x) and r74688
    (3.1).

    @slavi
    Mannequin

    slavi mannequin commented Sep 27, 2009

    There is an error in the r74687 (3.x) and r74688 (3.1) fixes: in the 'else'
    clause there should be a 'return result' at the end.

    @ned-deily
    Member

    The missing return result in the else case has been subsequently fixed in
    r75539 (py3k) and r75541 (3.0) so this issue should be re-closed.

    @vstinner
    Member

    New changeset 94a3694 by Victor Stinner in branch '2.7':
    bpo-6393: Fix locale.getprerredencoding() on macOS (bpo-1555)
    94a3694

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022