Solaris/Oracle Studio: Fatal Python error: PyThreadState_Get when built --with-pymalloc #65611

Closed
jbeck mannequin opened this issue May 1, 2014 · 11 comments
Labels
build The build process and cross-build type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@jbeck
Mannequin

jbeck mannequin commented May 1, 2014

BPO 21412
Nosy @jcea, @vstinner, @ned-deily
Superseder
  • bpo-21166: Bus error in pybuilddir.txt 'python -m sysconfigure --generate-posix-vars' build step
Files
  • where.out: output (260 frames) of 'where' in gdb
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2014-08-12.03:30:29.104>
    created_at = <Date 2014-05-01.20:37:58.716>
    labels = ['build', 'type-crash']
    title = 'Solaris/Oracle Studio: Fatal Python error: PyThreadState_Get when built --with-pymalloc'
    updated_at = <Date 2014-08-18.23:11:28.323>
    user = 'https://bugs.python.org/jbeck'

    bugs.python.org fields:

    activity = <Date 2014-08-18.23:11:28.323>
    actor = 'jbeck'
    assignee = 'none'
    closed = True
    closed_date = <Date 2014-08-12.03:30:29.104>
    closer = 'ned.deily'
    components = ['Build']
    creation = <Date 2014-05-01.20:37:58.716>
    creator = 'jbeck'
    dependencies = []
    files = ['35140']
    hgrepos = []
    issue_num = 21412
    keywords = []
    message_count = 11.0
    messages = ['217723', '217724', '217733', '217735', '217782', '217804', '218386', '218396', '218410', '225218', '225511']
    nosy_count = 5.0
    nosy_names = ['jcea', 'vstinner', 'ned.deily', 'swalker', 'jbeck']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '21166'
    type = 'crash'
    url = 'https://bugs.python.org/issue21412'
    versions = ['Python 3.4']

    @jbeck
    Mannequin Author

    jbeck mannequin commented May 1, 2014

    I am porting Python 3.4.0 to Solaris 12. The Makefile I inherited from my predecessor had --without-pymalloc as an option to be passed to configure. Curious why, I removed this line, only to find that after python was built, it core dumped:

    LD_LIBRARY_PATH=/builds/jbeck/ul-python-3/components/python/python34/build/sparcv9 ./python -E -S -m sysconfig --generate-posix-vars
    Fatal Python error: PyThreadState_Get: no current thread
    make[3]: *** [pybuilddir.txt] Abort (core dumped)

    But if I add the --without-pymalloc line back to my Makefile, everything works fine.

    Note that although this example was on sparc, the exact same thing occurred on x86.

    I searched for a similar bug but did not find one; please feel free to close this as a duplicate if there is one that I missed. I also suspect I have not provided enough information, out of a desire not to trigger information overload. But I would be happy to provide whatever specifics might be requested.

    @jbeck jbeck mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels May 1, 2014
    @skrah
    Mannequin

    skrah mannequin commented May 1, 2014

    On SPARC/suncc the flags in http://bugs.python.org/issue15963#msg170661
    appear to work.

    Also, we have several Solaris build slaves that don't core dump.
    Some are offline, but you can click through to the ./configure
    steps of past builds to see the build flags.

    http://buildbot.python.org/all/waterfall?category=3.x.stable&category=3.x.unstable

    @jcea
    Member

    jcea commented May 1, 2014

    What compiler are you using?

    I compile fine on Solaris with GCC.

    @jbeck
    Mannequin Author

    jbeck mannequin commented May 1, 2014

    Using Oracle Studio 12.3, same as mentioned in http://bugs.python.org/issue15963#msg170661 (as Stefan pointed out). I am using some of those flags but not all of them. I will try the others when I have a chance, then report back.

    @vstinner
    Member

    vstinner commented May 2, 2014

    "LD_LIBRARY_PATH=/builds/jbeck/ul-python-3/components/python/python34/build/sparcv9 ./python -E -S -m sysconfig --generate-posix-vars
    Fatal Python error: PyThreadState_Get: no current thread"

    Could you please run this command in gdb and copy/paste the C traceback (gdb command "where") where the fatal error occurs?

    @jbeck
    Mannequin Author

    jbeck mannequin commented May 2, 2014

    Victor: sure; see attached.

    @vstinner
    Member

    > Victor: sure; see attached.

    Ok, so the error occurs when Python tries to import the _heapq dynamic module: PyModule_Create2() calls PyThreadState_Get() to retrieve the current thread, but it fails. There should be a current thread, because PyModule_Create2() is called indirectly by PyEval_EvalFrameExReal() (and I don't see where the GIL would be released in the call stack).

    It looks like a bug in PyThreadState_Get(). This function relies on _Py_atomic_load_relaxed() which is defined in Include/pyatomic.h. This file has an implementation of atomic functions for Intel processors and contains an interesting comment:

    ...
    #else /* !gcc x86 */
    /* Fall back to other compilers and processors by assuming that simple
    volatile accesses are atomic. This is false, so people should port
    this. */
    ...

    It looks like John is building Python on SPARC, which may explain the issue.

    This is just a theory. It also looks like we had SPARC buildbots running on Solaris with system C compiler ("/opt/solarisstudio12.3/bin/cc") and it was able to run tests.

    I don't understand the link with pymalloc.

    @john: Did you try to build Python 3.3? Did it work?

    @jbeck
    Mannequin Author

    jbeck mannequin commented May 13, 2014

    Victor:

    • This is not a SPARC-specific issue; the exact same failure occurs
      on x86.

    • I had built Python 3.3 (some time ago) but only --without-pymalloc.
      But I just now rebuilt Python 3.3 --with-pymalloc, and it
      failed in the exact same way.

    @vstinner vstinner changed the title from "core dump in PyThreadState_Get when built --with-pymalloc" to "Solaris/Oracle Studio: Fatal Python error: PyThreadState_Get when built --with-pymalloc" May 13, 2014
    @vstinner
    Member

    "This is not a SPARC-specific issue; the exact same failure occurs on x86."

    Ah ok, good to know. To me, it looks like a compiler issue. Did you try Stefan's advice in issue bpo-15963?

    You may try to disable compiler optimizations to see if you get the same behaviour.

    @ned-deily
    Member

    This appears to be another variation on the problem recently identified in bpo-21166, namely that the pybuilddir.txt Makefile rule can incorrectly import a shared library module from a previously installed Python instance and, if the ABIs of the installed and being-built Pythons differ, the newly-built interpreter can fail in various ways. From your supplied trace, one can see that _heapq.so has incorrectly been imported from the installed system Python 3.4, which was probably built with --without-pymalloc:

    #7 0x00007ff2f9ee2a6d in PyInit__heapq ()
    from /usr/lib/python3.4/lib-dynload/64/_heapq.so
    #8 0x00007ff2f94c7c78 in _PyImport_LoadDynamicModule ()
    from /builds/jbeck/ul-python-3/components/python/python34/build/amd64/libpython3.4m.so.1.0

    The fixes for bpo-21166, when applied, should prevent this problem.
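
A minimal sketch of how one might check for this kind of cross-contamination, assuming it is run with the interpreter under suspicion from the top of its build tree. The helper names `module_origin` and `is_outside_tree` are hypothetical and not part of CPython or of the bpo-21166 fix; the sketch only reports where an extension module would be loaded from, without importing it, and flags paths that fall outside the build directory.

```python
import importlib.util
import os
import sys


def module_origin(name):
    """Return the location an import of `name` would load, without importing it.

    The result is a file path, a marker such as "built-in" or "frozen",
    or None if the module cannot be found.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_outside_tree(origin, build_dir):
    """Return True if `origin` is an absolute file path outside `build_dir`."""
    if origin is None or not os.path.isabs(origin):
        # Missing, "built-in", or "frozen": there is no file on disk to check.
        return False
    build_dir = os.path.realpath(build_dir)
    origin = os.path.realpath(origin)
    return os.path.commonpath([origin, build_dir]) != build_dir


if __name__ == "__main__":
    # Assumption: the current directory is the top of the build tree.
    build_dir = os.getcwd()
    for name in ("_heapq", "_struct"):
        origin = module_origin(name)
        flag = "OUTSIDE BUILD TREE" if is_outside_tree(origin, build_dir) else "ok"
        print(f"{name}: {origin} [{flag}]")
    # On POSIX builds of this era, an "m" in abiflags indicates pymalloc.
    print("abiflags:", getattr(sys, "abiflags", "n/a"))
```

In the trace above, `_heapq.so` resolves under `/usr/lib/python3.4/lib-dynload/64/` while the interpreter library lives under the build directory, which is the kind of mismatch this check is meant to surface.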

    @ned-deily ned-deily added build The build process and cross-build and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Aug 12, 2014
    @jbeck
    Mannequin Author

    jbeck mannequin commented Aug 18, 2014

    Ned: yes, I can confirm that the patch from http://bugs.python.org/issue21166 does indeed fix the problem. Thank you very much!
