curses crash on FreeBSD #51633

Closed
mdickinson opened this issue Nov 23, 2009 · 52 comments
Labels
extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@mdickinson
Member

BPO 7384
Nosy @akuchling, @mdickinson, @vstinner, @ashemedai, @bitdancer, @skrah
Files
  • freebsd-curses.diff: Possible fix
  • issue7384.patch
  • issue7384-2.patch
  • issue7384-3-py3k.patch
  • issue7384-4-py3k.patch
  • issue7384-5-py3k.patch
  • issue7384-5-trunk.patch
  • ldd-retval-py3k.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2010-07-17.12:41:55.048>
    created_at = <Date 2009-11-23.20:03:46.996>
    labels = ['extension-modules', 'type-bug']
    title = 'curses crash on FreeBSD'
    updated_at = <Date 2010-07-17.12:41:55.047>
    user = 'https://github.com/mdickinson'

    bugs.python.org fields:

    activity = <Date 2010-07-17.12:41:55.047>
    actor = 'skrah'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-07-17.12:41:55.048>
    closer = 'skrah'
    components = ['Extension Modules']
    creation = <Date 2009-11-23.20:03:46.996>
    creator = 'mark.dickinson'
    dependencies = []
    files = ['16935', '16963', '16973', '17023', '17050', '17064', '17528', '17997']
    hgrepos = []
    issue_num = 7384
    keywords = ['patch', 'buildbot']
    message_count = 52.0
    messages = ['95652', '97709', '97722', '99657', '99658', '99659', '103231', '103256', '103261', '103263', '103264', '103265', '103267', '103295', '103307', '103308', '103393', '103394', '103395', '103429', '103432', '103497', '103503', '103828', '103838', '103980', '103996', '103997', '104000', '104002', '104054', '104057', '104070', '104071', '104074', '104283', '104302', '104311', '104315', '106199', '106939', '106940', '106948', '107323', '107999', '110222', '110224', '110225', '110238', '110271', '110378', '110550']
    nosy_count = 8.0
    nosy_names = ['akuchling', 'mark.dickinson', 'vstinner', 'asmodai', 'rpetrov', 'Arfrever', 'r.david.murray', 'skrah']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue7384'
    versions = ['Python 2.6', 'Python 3.1', 'Python 2.7', 'Python 3.2']

    @mdickinson
    Member Author

    test_curses is currently causing the test runs to abort on the FreeBSD 6.4
    and 7.2 buildbots.

    I can reproduce this on a FreeBSD 7.2 /amd64 machine by doing

    ./python Lib/test/regrtest.py -uall test___all__ test_curses

    This dumps core, and the traceback points at the call to delwin() in
    PyCursesWindow_Dealloc, but it's far from obvious (to me) what's going
    wrong. wo->win is not NULL here, and appears to point to a valid WINDOW.
    However, stdscr is NULL! As far as I can tell, this shouldn't happen.

    test_curses by itself doesn't crash, unless I add an 'import readline' or
    'import rlcompleter' to the top of test_curses.py.

    I expect to have access to the FreeBSD machine for a few more days. Any
    hints about what to try next would be appreciated.

    @mdickinson mdickinson added the extension-modules C modules in the Modules dir label Nov 23, 2009
    @mdickinson
    Member Author

    I've not had any success tracking the cause of this failure down, and no longer have the resources to do so. It does appear that curses itself is broken on FreeBSD: it's not just a problem with the tests.

    Adding Andrew Kuchling to the nosy in case he has any ideas what's wrong here.

    Since the test_curses crash is currently aborting the test run, and so preventing us from getting feedback from the other tests on the FreeBSD buildbots, I propose that test_curses be skipped with a "the curses module is broken on FreeBSD" message.
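A skip of this kind might look like the minimal sketch below. It assumes a `unittest`-based test and a hypothetical helper `curses_broken_on`; the change actually applied to test_curses may well be written differently.

```python
import sys
import unittest


def curses_broken_on(platform):
    """Hypothetical helper: True for platforms where test_curses is known to crash."""
    return platform.startswith("freebsd")


@unittest.skipIf(curses_broken_on(sys.platform),
                 "the curses module is broken on FreeBSD")
class CursesSmokeTest(unittest.TestCase):
    def test_has_initscr(self):
        # Only runs on platforms that are not skipped above.
        import curses
        self.assertTrue(hasattr(curses, "initscr"))
```

Skipping at class level keeps the rest of the test run alive, which is the point of the proposal; it does nothing about the underlying crash.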

    @mdickinson mdickinson self-assigned this Jan 13, 2010
    @mdickinson mdickinson changed the title test_curses crash on FreeBSD buildbots curses crash on FreeBSD Jan 13, 2010
    @bitdancer
    Member

    Given your diagnosis so far, +1 on the skip.

    @mdickinson
    Member Author

    > It does appear that curses itself is broken on FreeBSD

    Rereading this, it doesn't say what I meant it to say: I meant that the Python curses module seems to be broken, not that the system-level curses library is broken (though that seems possible too).

    @mdickinson
    Member Author

    Applied the test_curses skip in r78281 (trunk); will merge to the other branches.

    Leaving this issue open, since the root cause isn't fixed.

    @mdickinson mdickinson removed their assignment Feb 21, 2010
    @mdickinson mdickinson added the type-bug An unexpected behavior, bug, or error label Feb 21, 2010
    @mdickinson
    Member Author

    Merged to the other 3 branches in revisions r78282 (release26-maint), r78283 (py3k), r78284 (release31-maint).

    @mdickinson
    Member Author

    I'm looking at this again, after installing FreeBSD 8.0/amd64 in a VM.

    I've reduced Lib/test/test_curses.py to the following 9 lines:

    import rlcompleter
    import curses
    f = open('mytempfile', 'w+b')
    stdscr = curses.initscr()
    stdscr.putwin(f)
    f.seek(0)
    curses.getwin(f)
    f.close()
    curses.endwin()

    I then get:

    $ ./python Lib/test/regrtest.py test_curses
    test_curses
    Bus error (core dumped)

    From looking at the core dump, and tracing through with gdb, the core dump occurs when delwin is called (from PyCursesWindow_Dealloc) on the result of curses.getwin(f), as a result of garbage collection.

    The 'import rlcompleter' line appears to be necessary to cause this; I've no idea why.
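Because the reduced script takes the whole interpreter down, one way to experiment with it without losing the calling process is to run it in a child interpreter and look at how the child died. This is only a sketch: `run_isolated` is a hypothetical helper, not part of regrtest or of any patch attached to this issue, and the curses script itself still needs a real terminal.

```python
import signal
import subprocess
import sys


def run_isolated(source):
    """Run Python source in a child interpreter.

    Returns (returncode, signal_name); signal_name is None unless the
    child was killed by a signal.
    """
    proc = subprocess.run([sys.executable, "-c", source])
    if proc.returncode < 0:
        # On POSIX a negative return code means "killed by that signal".
        return proc.returncode, signal.Signals(-proc.returncode).name
    return proc.returncode, None
```

Passing the nine-line script above as `source` would then be expected to report a signal name on an affected system rather than killing the caller.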

    @mdickinson
    Member Author

    Here's the top of the backtrace. (Thanks asmodai for helping me out with working out how to build a FreeBSD system ncurses with debugging information.)

    #0 0x0000000801460714 in cannot_delete (win=0x80116b1d0)
    at /usr/src/lib/ncurses/ncursesw/../../../contrib/ncurses/ncurses/base/lib_delwin.c:54
    p = (struct _win_list *) 0xdbdbdbdbdbdbdbdb
    result = false
    #1 0x0000000801460773 in delwin (win=0x80116b1d0)
    at /usr/src/lib/ncurses/ncursesw/../../../contrib/ncurses/ncurses/base/lib_delwin.c:71
    result = -1
    #2 0x000000080170d140 in PyCursesWindow_Dealloc (wo=0x800eb74c0)
    at /usr/home/dickinsm/python/svn/trunk/Modules/cursesmodule.c:357
    No locals.
    #3 0x000000000046325f in _Py_Dealloc (op=0x800eb74c0) at Objects/object.c:2211
    dealloc = 0x80170d110 <PyCursesWindow_Dealloc>
    #4 0x00000000004578d8 in PyDict_DelItem (op=0x800f121b0, key=0x8011062e0)
    at Objects/dictobject.c:829
    mp = (PyDictObject *) 0x800f121b0
    hash = -3668919459648339544
    ep = (PyDictEntry *) 0x8010cb5a8
    old_value = (PyObject *) 0x800eb74c0
    old_key = (PyObject *) 0x8011062e0
    __func__ = "PyDict_DelItem"
    #5 0x0000000000458a48 in dict_ass_sub (mp=0x800f121b0, v=0x8011062e0, w=0x0)
    at Objects/dictobject.c:1184
    No locals.
    #6 0x000000000041aadd in PyObject_DelItem (o=0x800f121b0, key=0x8011062e0)
    at Objects/abstract.c:205
    m = (PyMappingMethods *) 0x6c2960

    @akuchling
    Member

    Could I get a login on the buildbot to make a fix?

    I bet the problem is with the stdscr object. PyCurses_InitScr()
    does 'return (PyObject *)PyCursesWindow_New(stdscr);'.

    PyCursesWindow_Dealloc() does:
    if (wo->win != stdscr) delwin(wo->win);

    I bet FreeBSD is clearing contents of the stdscr global variable. The condition in PyCursesWindow_Dealloc() is then true, and it tries to delwin() the old value, which is in wo->win.

    One fix might be to keep a reference to that PyCursesWindow object holding stdscr, and change dealloc to 'if (wo != saved_stdscr_object)'. Or maybe, since multiple calls to initscr() will create multiple window objects holding the value of stdscr, window objects should have a 'do_not_delwin' flag.

    @akuchling
    Member

    Here's a possible patch; it at least doesn't seem to break the module on MacOS, though MacOS doesn't crash with the current code either.

    @mdickinson
    Member Author

    > Could I get a login on the buildbot to make a fix?

    I think David Bolen (db3l) is the maintainer. David?

    @mdickinson
    Member Author

    > Here's a possible patch

    Thanks. I'll give it a try on my FreeBSD VM and report back.
    BTW, did you mean to include the threading change in that patch?

    @mdickinson
    Member Author

    With that patch, I'm still getting the core dump (with the traceback looking pretty much as it did before).

    When I traced through this with gdb, I didn't see stdscr getting set to 0 at any point. Unless I missed any, the only curses library calls made (in sequence) were:

    1. initscr() -> new window win (=stdscr, presumably)
    2. putwin(file, win)
    3. getwin(file) -> new window win2, with win2 != win
    4. delwin(win2) -> segfault

    Presumably, without the segfault, there would also have been calls
    to delwin(win) and endwin().

    And I'm at a complete loss to explain why importing rlcompleter makes a difference. (importing readline also causes the segfault). I don't think it's just to do with random memory changes, since if I replace the readline or rlcompleter import by any other randomly chosen python module then there's no segfault.

    @ashemedai
    Mannequin

    ashemedai mannequin commented Apr 16, 2010

    For the record, this happens on FreeBSD 8 as well.

    It seems it is still the same bug as what I reported back in March 2009 on the Python-dev list.

    If you run the test stand-alone with ./python Lib/test/regrtest.py -uall test_curses it passes and prints "1 test OK".

    If you add something like test___all__ before it, it will crash with a SIGSEGV: segmentation fault (core dumped).

    Mark's condensed test case switches to a SIGBUS, which is a bit different.

    Mark, did your initial backtrace look like this:

    #0 0x282e115e in memcpy () from /lib/libc.so.7
    #1 0x282de375 in fwrite () from /lib/libc.so.7
    #2 0x282de132 in fwrite () from /lib/libc.so.7
    #3 0x28b7a1ca in putwin (win=0x28409640, filep=0x282f39f8)
    at /newusr/src/lib/ncurses/ncursesw/../../../contrib/ncurses/ncurses/base/lib_screen.c:132
    #4 0x28d9b361 in PyCursesWindow_PutWin (self=0x28442ef0, args=0x2867f80c)
    at /home/asmodai/projects/python/Modules/_cursesmodule.c:1351
    #5 0x080da60d in PyEval_EvalFrameEx (f=0x296d760c, throwflag=0)
    at Python/ceval.c:4013
    #6 0x080db10e in PyEval_EvalFrameEx (f=0x296a948c, throwflag=0)
    at Python/ceval.c:4099
    #7 0x080db10e in PyEval_EvalFrameEx (f=0x29692d8c, throwflag=0)
    at Python/ceval.c:4099
    #8 0x080dc68b in PyEval_EvalCodeEx (co=0x297675c0, globals=0x2866bbdc,
    locals=0x2866bbdc, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0,
    defcount=0, closure=0x0) at Python/ceval.c:3253
    #9 0x080dc7d7 in PyEval_EvalCode (co=0x297675c0, globals=0x2866bbdc,
    locals=0x2866bbdc) at Python/ceval.c:666
    #10 0x080ef70c in PyImport_ExecCodeModuleEx (
    name=0xbfbfd683 "test.test_curses", co=0x297675c0,
    pathname=0xbfbfd223 "/home/asmodai/projects/python/Lib/test/test_curses.py")

    @mdickinson
    Member Author

    > Mark, did your initial backtrace look like this:

    No; the segfault was definitely happening in delwin rather than putwin. But I did see something like your backtrace when I tried to use ncurses from ports (installed in /usr/local) rather than the system ncurses. This was all on FreeBSD 8.0/amd64, by the way, running in a VM on Parallels. I got the same results both when working directly within the VM terminal, and when ssh'ing to the VM from an OS X Terminal.

    Maybe running this through Valgrind or something similar might show what's going on. (Though it's not clear from a quick google whether Valgrind works on FreeBSD.)

    @skrah
    Mannequin

    skrah mannequin commented Apr 16, 2010

    Valgrind can be installed by:

    cd /usr/ports/devel/valgrind && make install

    Then you can do (curses_test.py is your short test program):

    1. valgrind --db-attach=yes --suppressions=Misc/valgrind-python.supp ./python curses_test.py

    2. valgrind --suppressions=Misc/valgrind-python.supp ./python curses_test.py

    Valgrind finds invalid writes. The problem with 1) is that the
    terminal is in an unusable state, so controlling gdb isn't possible.

    The best thing is probably to use 2) and wade through the unformatted
    output starting here:

    ==12043== Invalid write of size 8
    ==12043== at 0x27A71B7: getwin (in /lib/libncursesw.so.8)
    ==12043== by 0x2A3EAAB: PyCurses_GetWin (_cursesmodule.c:1902)
    ==12043== by 0x4573FB: PyEval_EvalFrameEx (ceval.c:3833)
    ==12043== by 0x457DF9: PyEval_EvalCodeEx (ceval.c:3282)

    (I don't have time to do that right now, I might do it later.)

    @skrah
    Mannequin

    skrah mannequin commented Apr 17, 2010

    One oddity: In Mark's test case, the error only shows if readline
    is imported _before_ curses. The other way around it's fine.

    On FreeBSD 8.0 amd64, with the _default_ libcurses, the Valgrind output
    for py3k looks like this:

    [...]
    ==31089== Invalid write of size 8
    ==31089== at 0x284F1AE: getwin (in /lib/libncursesw.so.8)
    ==31089== by 0x2AE8532: PyCurses_GetWin (_cursesmodule.c:1903)
    ==31089== by 0x47FBC7: call_function (ceval.c:3833)
    ==31089== by 0x47AAC0: PyEval_EvalFrameEx (ceval.c:2645)
    ==31089== by 0x47DF41: PyEval_EvalCodeEx (ceval.c:3282)
    ==31089== by 0x47189F: PyEval_EvalCode (ceval.c:721)
    ==31089== by 0x4B31AA: run_mod (pythonrun.c:1692)
    ==31089== by 0x4B2FC3: PyRun_FileExFlags (pythonrun.c:1649)
    ==31089== by 0x4B1734: PyRun_SimpleFileExFlags (pythonrun.c:1177)
    ==31089== by 0x4B0C75: PyRun_AnyFileExFlags (pythonrun.c:963)
    ==31089== by 0x4CB029: Py_Main (main.c:650)
    ==31089== by 0x4150E4: main (python.c:152)
    ==31089== Address 0x25c71e0 is 0 bytes after a block of size 112 alloc'd
    ==31089== at 0x25A8AE: calloc (in /usr/local/lib/valgrind/vgpreload_memcheck-amd64-freebsd.so)
    ==31089== by 0x29C518A: _nc_makenew (in /lib/libncurses.so.8)
    ==31089== by 0x29C569F: newwin (in /lib/libncurses.so.8)
    ==31089== by 0x284F2EE: getwin (in /lib/libncursesw.so.8)
    ==31089== by 0x2AE8532: PyCurses_GetWin (_cursesmodule.c:1903)
    ==31089== by 0x47FBC7: call_function (ceval.c:3833)
    ==31089== by 0x47AAC0: PyEval_EvalFrameEx (ceval.c:2645)
    ==31089== by 0x47DF41: PyEval_EvalCodeEx (ceval.c:3282)
    ==31089== by 0x47189F: PyEval_EvalCode (ceval.c:721)
    ==31089== by 0x4B31AA: run_mod (pythonrun.c:1692)
    ==31089== by 0x4B2FC3: PyRun_FileExFlags (pythonrun.c:1649)
    ==31089== by 0x4B1734: PyRun_SimpleFileExFlags (pythonrun.c:1177)
    ==31089==
    [...]

    Then I installed the curses from /usr/ports/devel/ncurses, and the
    error didn't show up any more. I'm inclined to think that the bug is
    in the system ncurses. Still, it would be nice to know why the import
    order matters.

    @skrah
    Mannequin

    skrah mannequin commented Apr 17, 2010

    I take that back. With the curses from /usr/ports/devel/ncurses,
    Mark's test case is fine, but

    ./python Lib/test/regrtest.py -uall test_curses

    fails again.

    @skrah
    Mannequin

    skrah mannequin commented Apr 17, 2010

    Alas, after installing curses from /usr/ports/devel/ncurses I did not
    recompile Modules/_curses_panel.c.

    So, after a proper build

    ./python Lib/test/regrtest.py -uall test_curses

    shows no errors.

    @skrah
    Mannequin

    skrah mannequin commented Apr 17, 2010

    It seems that FreeBSD has problems with the fact that readline.so is
    linked with -lreadline and -lncursesw (why?).

    With bpo-7384.patch I get no more errors using either Mark's test case
    or test_curses.py.

    @mdickinson
    Member Author

    That patch works for me, too. Nice!

    > It seems that FreeBSD has problems with the fact that readline.so is
    > linked with -lreadline and -lncursesw (why?).

    Good question...

    @skrah
    Mannequin

    skrah mannequin commented Apr 18, 2010

    To clarify a couple of things:

    On some systems (Redhat?) readline is not linked against ncurses in order to give the user the possibility to choose. This is why setup.py
    has to select an ncurses version.

    However, things can go wrong if readline is already linked against
    a specific ncurses version. On FreeBSD-8.0 this version is ncurses,
    but setup.py selects ncursesw:

    stefan@freebsd-amd64:> ldd /lib/libreadline.so.8
    /lib/libreadline.so.8:
    libncurses.so.8 => /lib/libncurses.so.8 (0x800b3e000)
    libc.so.7 => /lib/libc.so.7 (0x800648000)
    stefan@freebsd-amd64:> ls /lib/libncurses*
    /lib/libncurses.so.8 /lib/libncursesw.so.8

    bpo-7384.patch suppresses the selection, but is a little primitive.

    I've created a new patch, which does the following:

    1. Detect if readline is already linked against ncurses and
      if so, skip any further selection. This must be done.

    2. Use the same version of ncurses for readline.so and _curses.so.

    I'm not sure if 2) is necessary. With the previous patch, readline.so
    was linked against ncurses and _curses.so against ncursesw. All tests
    passed, though.

    Any thoughts whether readline.so and _curses.so should link against
    the same curses library?
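Step 1 of the patch description could be sketched roughly as follows. This is an illustration of the idea, not the code in the attached patches: the function name `readline_termcap_library` and the regular expression are assumptions, and the sample text is the FreeBSD `ldd` output quoted in this comment.

```python
import re


def readline_termcap_library(ldd_output):
    """Return the curses/terminfo library named in ldd output, or "" if none."""
    for line in ldd_output.splitlines():
        # Matches e.g. "libncurses.so.8", "libncursesw.so.8" or "libtinfo.so.5".
        match = re.search(r"lib(n?cursesw?|tinfo)\.so", line)
        if match:
            return match.group(1)
    return ""


FREEBSD_LDD = """\
/lib/libreadline.so.8:
        libncurses.so.8 => /lib/libncurses.so.8 (0x800b3e000)
        libc.so.7 => /lib/libc.so.7 (0x800648000)
"""

print(readline_termcap_library(FREEBSD_LDD))  # prints "ncurses"
```

An empty result would mean readline is not linked against any curses library, in which case the existing selection logic in setup.py could proceed unchanged.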

    @ashemedai
    Mannequin

    ashemedai mannequin commented Apr 18, 2010

    Just to state the obvious: ncursesw is needed for wide character support (i.e. Unicode).

    Also, have you tried asking Thomas Dickey (dickey@invisible-island.net) about this? He might be able to give some clue about it since he's the main curses maintainer.

    @skrah
    Mannequin

    skrah mannequin commented Apr 21, 2010

    Jeroen, thanks for the idea. I asked Thomas Dickey and he said that
    one should not load both libncurses.so and libncursesw.so.

    I think this means that if libreadline.so is already linked against
    libncurses.so, we are stuck with libncurses.so for the curses module.

    If this affects users who want the wide character version, they could
    file a bug report with their distro:

    Thomas Dickey pointed out that there are two ways for a distro to
    deal with this problem:

    1. Link libreadline against ncursesw.

    2. Split out the termcap interface (which readline uses) as
      libtinfo. This is a configure option for ncurses and SuSE
      and Redhat are doing this.

    I'm attaching a new patch against py3k that makes sure that the
    readline and curses modules use the same curses library.

    (This does not apply to Darwin, but I don't want to touch that logic.)

    I'm going to test the patch on py3k-cdecimal to see if it works on
    the buildbots.
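The selection rule described here (never load libncurses.so and libncursesw.so together) might be expressed as in the sketch below. The function `choose_curses_library`, its arguments, and the preference order are assumptions made for illustration; the attached patch may organise this differently.

```python
def choose_curses_library(readline_lib, available):
    """Pick the library to link the curses module against.

    readline_lib: what readline is linked against ("ncurses", "ncursesw",
                  "tinfo", or "" when it is not linked against any of them).
    available:    names of the curses libraries found on the system.
    """
    if readline_lib in ("ncurses", "ncursesw"):
        # readline already pulls in a full curses library: reuse it so that
        # the two variants never end up in the same process.
        return readline_lib
    # readline uses only libtinfo (or nothing), so any variant is safe;
    # prefer the wide-character one.
    for candidate in ("ncursesw", "ncurses", "curses"):
        if candidate in available:
            return candidate
    return None


print(choose_curses_library("ncurses", {"ncurses", "ncursesw"}))  # prints "ncurses"
```

On a system like the FreeBSD 8.0 machine above this gives up wide-character support in exchange for not crashing, which matches the trade-off accepted in the discussion.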

    @mdickinson
    Member Author

    This patch looks good to me, assuming that the buildbots are happy. I agree that this seems like a sensible solution for now, even if it means limiting users to ncurses rather than ncursesw.

    I was initially a bit surprised that it works on OS X, since OS X doesn't have 'ldd'; but in that case the os.system call simply outputs "sh: ldd: command not found" to stderr and (presumably) nothing to stdout; no Python exception is raised, so it's all okay. It might be worth adding code to avoid the os.system('ldd ...') call on OS X, just to avoid the unnecessary error message on the console. Apart from this, I say +1 to applying the patch.
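One way to avoid the stray "command not found" message would be to check for the tool before running it. The sketch below uses `shutil.which` and `subprocess.run` from current Python rather than the `os.system` call mentioned above; the helper name `ldd_output` and its `tool` parameter are hypothetical.

```python
import shutil
import subprocess


def ldd_output(library_path, tool="ldd"):
    """Return the tool's stdout for library_path, or None if it cannot be run."""
    if shutil.which(tool) is None:
        # No such command on this platform (e.g. OS X has no ldd):
        # skip quietly instead of letting the shell print an error.
        return None
    proc = subprocess.run([tool, library_path],
                          stdout=subprocess.PIPE,
                          stderr=subprocess.DEVNULL,
                          universal_newlines=True)
    return proc.stdout if proc.returncode == 0 else None
```

Returning `None` for both "no ldd" and "ldd failed" lets the caller fall back to the default library selection in either case.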

    Many thanks for all the detective work!

    @rpetrov
    Mannequin

    rpetrov mannequin commented Apr 22, 2010

    Instead of testing in setup.py, we could use the result from the configure script: just uncomment the line and use it.

    @skrah
    Mannequin

    skrah mannequin commented Apr 23, 2010

    Mark, thanks for reviewing the patch. In the new patch, I added a skip
    for OS X.

    Buildbot testing looks good. In particular, one FreeBSD bot passes
    test_curses now (the other one is hanging in multiprocessing).

    For most bots nothing changes. The solaris bot has the same unrelated
    failures as before. Ubuntu sparc previously did the same weird linking
    (readline already linked with ncurses, but using -lncursesw) and now
    uses ncurses throughout. Tests pass. Debian sparc did the same, tests
    give the same failures as before ("getmouse returned ERR", almost certainly
    unrelated.)

    Roumen, I do not see a line in configure.in that tests for the
    libraries that readline is linked against.

    @ashemedai
    Mannequin

    ashemedai mannequin commented Apr 23, 2010

    I did some digging on my side: the reason you see ncurses referenced from readline is that the build links readline to libtermcap:

    cc -fstack-protector -shared -Wl,-x -o libreadline.so.8 -Wl,-soname,libreadline.so.8 `lorder readline.So vi_mode.So funmap.So keymaps.So parens.So search.So rltty.So complete.So bind.So isearch.So display.So signals.So util.So kill.So undo.So macro.So input.So callback.So terminal.So text.So nls.So misc.So compat.So xmalloc.So history.So histexpand.So histfile.So histsearch.So shell.So mbutil.So tilde.So | tsort -q` -ltermcap

    And libtermcap is:

    % ll /usr/lib/libtermcap.so*
    0 lrwxr-xr-x 1 root wheel - 13B 18 apr 08:29 /usr/lib/libtermcap.so@ -> libncurses.so

    That configuration option you referenced, Stefan, is that --with-termlib (generate separate terminfo library)?
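The symlink relationship shown by `ll` can also be checked programmatically. The sketch below is a generic illustration: `resolves_to` is a hypothetical helper, and the demo builds its own throwaway symlink in a temporary directory instead of touching /usr/lib.

```python
import os
import tempfile


def resolves_to(path):
    """Return the base name of the file that path ultimately points to."""
    return os.path.basename(os.path.realpath(path))


with tempfile.TemporaryDirectory() as tmp:
    # Recreate the layout from the listing above: libtermcap.so -> libncurses.so
    open(os.path.join(tmp, "libncurses.so"), "w").close()
    link = os.path.join(tmp, "libtermcap.so")
    os.symlink("libncurses.so", link)
    print(resolves_to(link))  # prints "libncurses.so"
```

On the FreeBSD system described here, calling `resolves_to("/usr/lib/libtermcap.so")` would be expected to name a libncurses file, which is why `ldd` reports ncurses for readline.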

    @skrah
    Mannequin

    skrah mannequin commented Apr 23, 2010

    Yes, readline uses only the termcap part of ncurses. I think that
    --with-termlib is the correct option, see:

    http://www.mail-archive.com/util-linux-ng@vger.kernel.org/msg00273.html

    @skrah
    Mannequin

    skrah mannequin commented Apr 23, 2010

    Actually this means that we should also look for -ltinfo in the ldd
    check (A Redhat buildbot would be nice).

    @rpetrov
    Mannequin

    rpetrov mannequin commented Apr 23, 2010

    > Roumen, I do not see a line in configure.in that tests for the
    > libraries that readline is linked against.

    The test in configure is for how to link an application against the readline libs.

    Platforms that support linking shared libraries with unresolved
    symbols should not link readline to a termcap-compatible library if they offer
    more than one. I think this is a bug in the package build on those
    systems, as it prevents applications from using other termcap libraries.

    Not all Linux distributions link readline to a termcap-compatible library:

    • SuSE (checked on 11.0): linked to ncurses :(
    • Fedora (verified on v12) and Slackware: not linked. So there was no issue
      (before) on those platforms, as an application can link to any
      termcap-compatible library and Python will select ncursesw. On those
      platforms I expect Stefan's patch to return an empty string and Python to
      fail to build the readline module.

    As configure detects how to link readline, we could uncomment
    READLINE_LIBS, add it as a makefile macro, and use it from setup.py. If
    READLINE_LIBS contains only -lreadline, then on this platform readline is
    already linked to a termcap-compatible library.

    Also, detecting dependent libraries with ldd is limited to
    platforms that have this command, i.e. it is not portable.
    If distutils supported a method that returns dependency libraries, we could
    use it.

    I'm not familiar enough with the Python curses module to propose a patch.
    Is it possible to run a sample program to detect readline's curses library?

    Or maybe try to link a sample "int main() { readline(); }" and ask the
    compiler/linker to warn about duplicate symbols. Something like:
    $ gcc -Wl,--warn-common test-readline.c -lreadline -lncursesw -lncursesw
    $ gcc -Wl,--warn-common test-readline.c -lreadline -ltermcap -lncurses
    .../libncurses.so: warning: common of `ospeed' overridden by larger common
    .../libtermcap.so: warning: larger common is here
    $ gcc -Wl,--warn-common test-readline.c -lreadline -ltermcap -lncursesw
    ..../libncursesw.so: warning: common of `ospeed' overridden by larger common
    ..../../libtermcap.so: warning: larger common is here
    FIXME with a more portable and more correct command.

    Roumen

    @rpetrov
    Mannequin

    rpetrov mannequin commented Apr 23, 2010

    Stefan Krah wrote:

    > Stefan Krah<stefan-usenet@bytereef.org> added the comment:
    >
    > Actually this means that we should also look for -ltinfo in the ldd
    > check (A Redhat buildbot would be nice).

    Or maybe this means that configure should add a test with -ltinfo, and if
    the readline link succeeds then it is safe to link the Python curses module with
    the first curses library found.

    ldd - what about platforms without GNU libc?

    Roumen

    @skrah
    Mannequin

    skrah mannequin commented Apr 24, 2010

    I included the test for libtinfo in the latest patch. The patch is tested
    on Fedora and correctly links the curses module with -lncursesw.

    This means that the ldd method works on all buildbots, OpenBSD, OpenSolaris
    and Fedora.

    @skrah
    Mannequin

    skrah mannequin commented Apr 24, 2010

    I'm not against sorting things out in configure.in, but I'm not quite
    sure that it will be more portable than ldd:

    On FreeBSD (the problem system!) I can't get this to work:

    [stefan@freebsd-i386 ~]$ echo 'int main() { readline(); }' > test_readline.c
    [stefan@freebsd-i386 ~]$ gcc -Wl,--warn-common xxx.c -lreadline -ltermcap -lncurses -lncursesw
    [stefan@freebsd-i386 ~]$ gcc -Wl,--warn-common xxx.c -lreadline -lncurses -lncursesw
    [stefan@freebsd-i386 ~]$ gcc -Wl,--warn-common xxx.c -lreadline -lncursesw

    On OpenSolaris with suncc, ld does not have -warn-common.

    @skrah
    Mannequin

    skrah mannequin commented Apr 24, 2010

    Sigh. xxx.c == test_readline.c in the previous comment.

    @rpetrov
    Mannequin

    rpetrov mannequin commented Apr 26, 2010

    Yes, I understand.
    For the record, did gcc on FreeBSD warn if the library order is -lncursesw
    -lreadline?
    Forget for

    Also, I'm not able to write a C test case, similar to the Python one in msg103231 by
    Mark Dickinson, that fails on a system where the readline library is not linked
    to ncurses. The program always works and doesn't core dump (= bus error),
    regardless of the order of the ncurses (with and without the w suffix) and readline
    libraries.

    So if there is no way to write a C test program that fails, I cannot see
    any other way to detect the issue except to parse the output of programs that
    print library dependencies. Also, I expect this to fail for static builds
    (--disable-shared).
    I'm not sure that the readline library works well with static builds, but
    this is another issue and my time machine has stopped working :) .

    Writing a script that checks the platform (if it is FreeBSD or SuSE, link with a,
    b, c; if the OS is XX, link with d, e, f) would work with shared and static
    builds, but it is not a reasonable solution :(

    P.S. The issue of the readline library being linked to a termcap-compatible library on
    systems that distribute more than one termcap-compatible library is about
    10 years old.

    Roumen

    @skrah
    Mannequin

    skrah mannequin commented Apr 27, 2010

    Roumen Petrov <report@bugs.python.org> wrote:

    > Yes, I understand.
    > For the record, did gcc on FreeBSD warn if the library order is -lncursesw
    > -lreadline?

    No.

    > P.S. The issue of the readline library being linked to a termcap-compatible library on
    > systems that distribute more than one termcap-compatible library is about
    > 10 years old.

    I didn't want to touch the termcap logic. There's potential for breakage,
    and a real investigation would be time consuming.

    (There's a needless warning on Tiger about /usr/lib/termcap that could
    be fixed in another issue.)

    @ashemedai
    Mannequin

    ashemedai mannequin commented Apr 27, 2010

    Stefan, I was emailing with Rong-En Fan, a FreeBSD committer, about this issue and he asked:

    "Basically, this is caused by

    a) our readline.so is linked against ncurses.so (via -ltermcap which is the same lib)
    b) wide-character enabled ncurses, ncursesw.so, is also loaded in the same process

    To solve that, we need to have a separate termcap.so, do I understand the issue correctly?"

    He also mentioned that "[a]nother more aggressive way is to make only ncursesw installed into the system which requires a recompilation of all ports that use ncurses (ncurses and ncursesw are source compatible, but in most cases they are binary compatible as long as application don't assume size of ncurses structures)."

    Which I fully support, it's something that I did on DragonFly BSD a long time ago already (for all I can remember).

    Your opinion?

    @skrah
    Mannequin

    skrah mannequin commented Apr 27, 2010

    Jeroen Ruigrok van der Werven <report@bugs.python.org> wrote:

    > Stefan, I was emailing with Rong-En Fan, a FreeBSD committer, about this issue and he asked:
    >
    > "Basically, this is caused by
    >
    > a) our readline.so is linked against ncurses.so (via -ltermcap which is the same lib)
    > b) wide-character enabled ncurses, ncursesw.so, is also loaded in the same process
    >
    > To solve that, we need to have a separate termcap.so, do I understand the issue correctly?"

    Yes, only that the separate termcap is called libtinfo.so. The approach of
    splitting out libtinfo from ncurses (used by Fedora) is the most flexible
    and allows the user to choose ncurses or ncursesw.

    [stefan@fedora-amd64 ~]$ ldd /lib64/libreadline.so.6.0
    linux-vdso.so.1 => (0x00007fff725ff000)
    libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00000036e4a00000)
    libc.so.6 => /lib64/libc.so.6 (0x00000036d9600000)
    /lib64/ld-linux-x86-64.so.2 (0x00000036d9200000)

    > He also mentioned that "[a]nother more aggressive way is to make only ncursesw installed into the system which requires a recompilation of all ports that use ncurses (ncurses and ncursesw are source compatible, but in most cases they are binary compatible as long as application don't assume size of ncurses structures)."
    >
    > Which I fully support, it's something that I did on DragonFly BSD a long time ago already (for all I can remember).
    >
    > Your opinion?

    I think the libtinfo approach is more flexible, and I'm not aware of any drawbacks.
    So, for FreeBSD, I'd use it.

    Stefan Krah

    @vstinner
    Member

    I tested bpo-7384-5-py3k.patch on FreeBSD 8.0: it fixes the crash.

    @skrah
    Mannequin

    skrah mannequin commented Jun 3, 2010

    I think it would be nice to get this into 2.7. I don't expect buildbot
    failures, since the 2.7 patch is essentially the same as the py3k version,
    which has been tested extensively.

    @mdickinson
    Member Author

    I think it would be nice to get this into 2.7.

    Agreed. I think you should go ahead and commit it.

    @skrah
    Mannequin

    skrah mannequin commented Jun 3, 2010

    Mark, thanks. Committed in r81669; I'll keep an eye on the buildbots.

    @skrah
    Mannequin

    skrah mannequin commented Jun 8, 2010

    Committed in r81669, r81672, r81683 (trunk) and r81830, r81831 (py3k).

    What to do with the releases? To recap, the fix is:

    1. Detect if readline is already linked against ncurses and
      if so, skip any further selection. This must be done.

    2. Use the same version of ncurses for readline.so and _curses.so.

    Part 1) should be done in any case. Part 2) could change the behavior for
    users who previously had readline/ncurses, cursesmodule/ncursesw,
    but only use the cursesmodule in an application.
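
    The detection step in part 1) can be sketched roughly as follows. This is only an illustration, not the committed setup.py code: the function name `readline_termcap_library` is hypothetical, and the sample input is the Fedora ldd output quoted earlier in this thread.

```python
import re


def readline_termcap_library(ldd_output):
    """Return the curses/termcap library named in ldd output, or None.

    Hypothetical helper for illustration; not the code from the patch.
    """
    for line in ldd_output.splitlines():
        # Matches libcurses, libncurses, libcursesw and libncursesw.
        match = re.search(r"lib(n?cursesw?)\.so", line)
        if match:
            return match.group(1)
        # Some systems split the termcap interface out into libtinfo.
        if "libtinfo.so" in line:
            return "tinfo"
    return None


# ldd output for libreadline on Fedora, as quoted earlier in the thread.
FEDORA_LDD = """\
linux-vdso.so.1 => (0x00007fff725ff000)
libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00000036e4a00000)
libc.so.6 => /lib64/libc.so.6 (0x00000036d9600000)
/lib64/ld-linux-x86-64.so.2 (0x00000036d9200000)
"""

print(readline_termcap_library(FEDORA_LDD))  # prints: tinfo
```

    If the helper returns a name, the build would link against that library and skip any further selection; if it returns None, the usual library search would apply.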

    @skrah
    Mannequin

    skrah mannequin commented Jun 17, 2010

    Committed a conservative version implementing part 1) in r82017 (2.6) and
    r82019 (3.1). Part 2) can be enabled by uncommenting a couple of lines in
    setup.py.

    The buildbots look good, but I'm setting this to 'pending' in case
    someone would like part 2) of the fix in the releases.

    @Arfrever
    Mannequin

    Arfrever mannequin commented Jul 13, 2010

    These changes break building of Python 3.* in some locales in Gentoo.

    running build
    running build_ext
    Traceback (most recent call last):
      File "./setup.py", line 1812, in <module>
        main()
      File "./setup.py", line 1807, in main
        "Tools/scripts/2to3"]
      File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/core.py", line 152, in setup
        dist.run_commands()
      File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/dist.py", line 946, in run_commands
        self.run_command(cmd)
      File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/dist.py", line 965, in run_command
        cmd_obj.run()
      File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/command/build.py", line 127, in run
        self.run_command(cmd_name)
      File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/cmd.py", line 315, in run_command
        self.distribution.run_command(command)
      File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/dist.py", line 965, in run_command
        cmd_obj.run()
      File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/command/build_ext.py", line 393, in run
        self.build_extensions()
      File "./setup.py", line 151, in build_extensions
        missing = self.detect_modules()
      File "./setup.py", line 539, in detect_modules
        for ln in fp:
      File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 20: ordinal not in range(128)
    make: *** [sharedmods] Error 1

    In lt_LT.UTF-8 locale, readline_termcap_lib file contains:
    ne dinaminis paleidžiamasis failas

    In en_US.UTF-8 locale, this file would contain:
    not a dynamic executable

    do_readline is "/usr/lib64/libreadline.so".

    /usr/lib64/libreadline.so is a linker script with the following content:
    /* GNU ld script
    Since Gentoo has critical dynamic libraries in /lib, and the static versions
    in /usr/lib, we need to have a "fake" dynamic lib in /usr/lib, otherwise we
    run into linking problems. This "fake" dynamic lib is a linker script that
    redirects the linker to the real lib. And yes, this works in the cross-
    compiling scenario as the sysroot-ed linker will prepend the real path.

    See bug http://bugs.gentoo.org/4411 for more info.
    */
    OUTPUT_FORMAT ( elf64-x86-64 )
    GROUP ( /lib64/libreadline.so.6 )

    I think that using ldd is a wrong idea.
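
    To illustrate why ldd reports "not a dynamic executable" here: the file is plain text, not an ELF object. A minimal sketch of telling the two apart and extracting the redirect target is below. The helper names are hypothetical, and the regular expression only handles a simple GROUP ( ... ) command like the one above, not nested forms such as AS_NEEDED.

```python
import re

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def is_elf(header):
    """True if the given leading bytes look like an ELF object."""
    return header.startswith(ELF_MAGIC)


def group_members(script_text):
    """Return the paths listed in the first simple GROUP ( ... ) command."""
    match = re.search(r"GROUP\s*\(([^)]*)\)", script_text)
    return match.group(1).split() if match else []


# Tail of the Gentoo linker script quoted above.
SCRIPT = """\
OUTPUT_FORMAT ( elf64-x86-64 )
GROUP ( /lib64/libreadline.so.6 )
"""

print(is_elf(SCRIPT.encode("ascii")[:4]))  # prints: False
print(group_members(SCRIPT))               # prints: ['/lib64/libreadline.so.6']
```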

    @skrah
    Mannequin

    skrah mannequin commented Jul 13, 2010

    On Ubuntu I can build just fine with lt_LT.UTF-8. So perhaps this problem
    should be addressed in Gentoo.

    @Arfrever
    Mannequin

    Arfrever mannequin commented Jul 13, 2010

    You shouldn't use ldd. I suggest that setup.py try to link a small executable, which would use a function from libcurses and would be linked against libreadline, but not libcurses. If linking succeeds, then your libreadline is linked against libcurses. If linking fails, then repeat this procedure with libcursesw, libncurses, libncursesw, libtinfo.
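
    A rough sketch of such a link probe is shown below. It is an assumption-laden illustration, not a tested build check: the helper names are hypothetical, `tgetent` is picked here only as an example of a termcap function, and the compiler is assumed to be available as `cc`. Note also that some linkers do not resolve symbols through a library's own dependencies, so a failed link is not conclusive.

```python
import os
import subprocess
import tempfile


def probe_source(symbol):
    """C source that references `symbol` without including any header."""
    return "char %s();\nint main(void) { return %s(); }\n" % (symbol, symbol)


def link_command(cc, source, output, libs):
    """Build the compiler command line linking only against `libs`."""
    return [cc, source, "-o", output] + ["-l" + lib for lib in libs]


def links_with(symbol, libs, cc="cc"):
    """Return True if a program using `symbol` links against `libs` alone."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "probe.c")
        with open(source, "w") as fp:
            fp.write(probe_source(symbol))
        cmd = link_command(cc, source, os.path.join(tmp, "probe"), libs)
        try:
            result = subprocess.run(
                cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
            )
        except OSError:
            return False  # compiler not found or not executable
        return result.returncode == 0


if __name__ == "__main__":
    # Does libreadline alone provide (directly or indirectly) tgetent?
    print(links_with("tgetent", ["readline"]))
```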

    @vstinner
    Member

    "In lt_LT.UTF-8 locale, readline_termcap_lib file contains:
    ne dinaminis paleidžiamasis failas"

    You can run ldd without the LANG variable to get the original (English, ASCII-only) message.
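
    One way to do that from Python is to pass a modified environment to the child process. The sketch below is only an illustration with hypothetical helper names; it forces the C locale via LC_ALL as well, since LC_ALL takes precedence over LANG when both are set.

```python
import os
import subprocess


def c_locale_env(base=None):
    """Return a copy of the environment that forces the C locale."""
    env = dict(os.environ if base is None else base)
    env.pop("LANGUAGE", None)  # drop the gettext language override, to be safe
    env["LC_ALL"] = "C"        # overrides every other LC_* variable and LANG
    env["LANG"] = "C"
    return env


def run_ldd(path):
    """Run ldd on `path` in the C locale; output is left as bytes."""
    return subprocess.run(
        ["ldd", path],
        env=c_locale_env(),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
```

    With this, `run_ldd("/usr/lib64/libreadline.so")` would produce the untranslated message regardless of the locale the build was started in.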

    @skrah
    Mannequin

    skrah mannequin commented Jul 14, 2010

    So you have garbage from stderr in readline_termcap_lib. Since that's
    useless anyway (no matter what locale is set), let's check the return
    value of os.system().

    The attached patch skips readline linkage detection if ldd fails. In
    that case, linking will be done in the same manner as before r81830.

    Please report if the patch allows you to build py3k in the problematic
    locale.
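
    The idea of the return value check, sketched with subprocess instead of the os.system() call used in setup.py (so this is not the attached patch, and the function name is hypothetical):

```python
import subprocess
import sys


def output_lines_if_ok(cmd):
    """Run cmd; return its stdout lines on exit status 0, else None."""
    try:
        result = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL
        )
    except OSError:
        return None  # command missing: treat like a failed run
    if result.returncode != 0:
        return None  # e.g. ldd on a file that is not a dynamic executable
    # Non-ASCII bytes are replaced rather than raising UnicodeDecodeError.
    return result.stdout.decode("ascii", "replace").splitlines()


# Stand-in commands so the example runs without ldd being installed.
ok = [sys.executable, "-c", "print('libncurses.so.5')"]
failing = [sys.executable, "-c", "import sys; print('noise'); sys.exit(1)"]

print(output_lines_if_ok(ok))       # prints: ['libncurses.so.5']
print(output_lines_if_ok(failing))  # prints: None
```

    A None result means detection is skipped and linking falls back to the previous behavior.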

    Your method of detecting readline linkage looks interesting, but I
    doubt that I'm going to implement it: These cross platform issues
    take an *immense* amount of time, since you have to test on all
    buildbot platforms (+ OpenBSD and OpenSolaris), with different
    compilers (icc, suncc).

    If you want that done, the best way is to open another issue, submit a
    patch (probably for configure.in) _and_ do all the testing.

    @Arfrever
    Mannequin

    Arfrever mannequin commented Jul 15, 2010

    This patch allows building Python 3.* in this locale.

    It might be safer to open tmpfile in binary mode to avoid potential problems with non-ASCII characters in paths to libraries.
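
    A minimal sketch of what binary-mode parsing could look like, with hypothetical function names: the file is read as bytes and matched with a bytes pattern, so non-ASCII characters in messages or paths never go through a decoder.

```python
import re


def termcap_library_from_bytes(data):
    """Find the curses/termcap library name in raw ldd output bytes."""
    for line in data.splitlines():
        match = re.search(rb"lib(n?cursesw?)\.so", line)
        if match:
            # The captured name is pure ASCII, so decoding it is safe.
            return match.group(1).decode("ascii")
        if b"libtinfo.so" in line:
            return "tinfo"
    return None


def termcap_library_from_file(path):
    """Read a saved ldd output file in binary mode and parse it."""
    with open(path, "rb") as fp:
        return termcap_library_from_bytes(fp.read())


# The translated message from this thread no longer causes a decode error.
message = "ne dinaminis paleidžiamasis failas\n".encode("utf-8")
print(termcap_library_from_bytes(message))  # prints: None
```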

    @skrah
    Mannequin

    skrah mannequin commented Jul 17, 2010

    ldd return value check committed in r82927, r82928, r82929 and r82930.

    Thanks for reporting this!

    @skrah skrah mannequin closed this as completed Jul 17, 2010
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022