curses crash on FreeBSD #51633
Comments
test_curses is currently causing the test runs to abort on the FreeBSD 6.4 I can reproduce this on a FreeBSD 7.2 /amd64 machine by doing ./python Lib/test/regrtest.py -uall test___all__ test_curses This dumps core, and the traceback points at the call to delwin() in test_curses by itself doesn't crash, unless I add an 'import readline' or I expect to have access to the FreeBSD machine for a few more days. Any |
I've not had any success tracking down the cause of this failure, and no longer have the resources to do so. It does appear that curses itself is broken on FreeBSD: it's not just a problem with the tests. Adding Andrew Kuchling to the nosy list in case he has any ideas what's wrong here. Since the test_curses crash is currently aborting the test run, and so preventing us from getting feedback from the other tests on the FreeBSD buildbots, I propose that test_curses be skipped with a "the curses module is broken on FreeBSD" message. |
Given your diagnosis so far, +1 on the skip. |
Rereading this, it doesn't say what I meant it to say: I meant that the Python curses module seems to be broken, not that the system-level curses library is broken (though that seems possible too). |
Applied the test_curses skip in r78281 (trunk); will merge to the other branches. Leaving this issue open, since the root cause isn't fixed. |
Merged to the other 3 branches in revisions r78282 (release26-maint), r78283 (py3k), r78284 (release31-maint). |
I'm looking at this again, after installing FreeBSD 8.0/amd64 in a VM. I've reduced Lib/test/test_curses.py to the following 9 lines:

import rlcompleter
import curses
f = open('mytempfile', 'w+b')
stdscr = curses.initscr()
stdscr.putwin(f)
f.seek(0)
curses.getwin(f)
f.close()
curses.endwin()

I then get:

$ ./python Lib/test/regrtest.py test_curses
test_curses
Bus error (core dumped)

From looking at the core dump, and tracing through with gdb, the core dump occurs when delwin is called (from PyCursesWindow_Dealloc) on the result of curses.getwin(f), as a result of garbage collection. The 'import rlcompleter' line appears to be necessary to cause this; I've no idea why. |
Here's the top of the backtrace. (Thanks asmodai for helping me out with working out how to build a FreeBSD system ncurses with debugging information.) #0 0x0000000801460714 in cannot_delete (win=0x80116b1d0) |
Could I get a login on the buildbot to make a fix? I bet the problem is with the stdscr object. PyCurses_InitScr() PyCursesWindow_Dealloc() does: I bet FreeBSD is clearing contents of the stdscr global variable. The condition in PyCursesWindow_Dealloc() is then true, and it tries to delwin() the old value, which is in wo->win. One fix might be to keep a reference to that PyCursesWindow object holding stdscr, and change dealloc to 'if (wo != saved_stdscr_object)'. Or maybe, since multiple calls to initscr() will create multiple window objects holding the value of stdscr, window objects should have a 'do_not_delwin' flag. |
Here's a possible patch; it at least doesn't seem to break the module on MacOS, though MacOS doesn't crash with the current code either. |
I think David Bolen (db3l) is the maintainer. David? |
Thanks. I'll give it a try on my FreeBSD VM and report back. |
With that patch, I'm still getting the core dump (with the traceback looking pretty much as it did before). When I traced through this with gdb, I didn't see stdscr getting set to 0 at any point. Unless I missed any, the only curses library calls made (in sequence) were:
And I'm at a complete loss to explain why importing rlcompleter makes a difference. (Importing readline also causes the segfault.) I don't think it's just to do with random memory changes, since if I replace the readline or rlcompleter import with any other randomly chosen Python module then there's no segfault. |
For the record, this happens on FreeBSD 8 as well. It seems to be the same bug that I reported back in March 2009 on the Python-dev list. If you run the test stand-alone with ./python Lib/test/regrtest.py -uall test_curses it passes and prints "1 test OK". If you add something like test___all__ before it, it crashes with a SIGSEGV: segmentation fault (core dumped). Mark's condensed test case switches to a SIGBUS, which is a bit different. Mark, did your initial backtrace look like this: #0 0x282e115e in memcpy () from /lib/libc.so.7 |
No; the segfault was definitely happening in delwin rather than putwin. But I did see something like your backtrace when I tried to use ncurses from ports (installed in /usr/local) rather than the system ncurses. This was all on FreeBSD 8.0/amd64, by the way, running in a VM on Parallels. I got the same results both when working directly within the VM terminal, and when ssh'ing to the VM from an OS X Terminal. Maybe running this through Valgrind or something similar might show what's going on. (Though it's not clear from a quick google whether Valgrind works on FreeBSD.) |
Valgrind can be installed by: cd /usr/ports/devel/valgrind && make install Then you can do (curses_test.py is your short test program):
Valgrind finds invalid writes. The problem with 1) is that the The best thing is probably to use 2) and wade through the unformatted ==12043== Invalid write of size 8 (I don't have time to do that right now, I might do it later.) |
One oddity: In Mark's test case, the error only shows if readline On FreeBSD 8.0 amd64, with the _default_ libcurses, the Valgrind output [...] Then I installed the curses from /usr/ports/devel/ncurses, and the |
I take that back. With the curses from /usr/ports/devel/ncurses, ./python Lib/test/regrtest.py -uall test_curses fails again. |
Alas, after installing curses from /usr/ports/devel/ncurses I did not So, after a proper build ./python Lib/test/regrtest.py -uall test_curses shows no errors. |
It seems that FreeBSD has problems with the fact that readline.so is With bpo-7384.patch I get no more errors using either Mark's test case |
That patch works for me, too. Nice!
Good question... |
To clarify a couple of things: On some systems (Redhat?) readline is not linked against ncurses in order to give the user the possibility to choose. This is why setup.py However, things can go wrong if readline is already linked against stefan@freebsd-amd64: bpo-7384.patch suppresses the selection, but is a little primitive. I've created a new patch, which does the following:
I'm not sure if 2) is necessary. With the previous patch, readline.so Any thoughts whether readline.so and _curses.so should link against |
Just to state the obvious: ncursesw is needed for wide character support (i.e. Unicode). Also, have you tried asking Thomas Dickey (dickey@invisible-island.net) about this? He might be able to give some clue about it since he's the main curses maintainer. |
Jeroen, thanks for the idea. I asked Thomas Dickey and he said that I think this means that if libreadline.so is already linked against If this affects users who want the wide character version, they could Thomas Dickey pointed out that there are two ways for a distro to
I'm attaching a new patch against py3k that makes sure that the (This does not apply to Darwin, but I don't want to touch that logic.) I'm going to test the patch on py3k-cdecimal to see if it works on |
This patch looks good to me, assuming that the buildbots are happy. I agree that this seems like a sensible solution for now, even if it means limiting users to ncurses rather than ncursesw. I was initially a bit surprised that it works on OS X, since OS X doesn't have 'ldd'; but in that case the os.system call simply outputs "sh: ldd: command not found" to stderr and (presumably) nothing to stdout; no Python exception is raised, so it's all okay. It might be worth adding code to avoid the os.system('ldd ...') call on OS X, just to avoid the unnecessary error message on the console. Apart from this, I say +1 to applying the patch. Many thanks for all the detective work! |
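The idea reviewed above (look at what `ldd` reports for the readline library, then build `_curses` against the same curses variant) can be sketched roughly as follows. This is only an illustration, not the committed patch: the names `termcap_lib_from_ldd`, `pick_curses_library` and `SAMPLE_LDD_OUTPUT` are hypothetical, the sample text merely imitates typical `ldd` output, and the rule that a separate tinfo library does not pin the choice is an assumption of the sketch.

```python
import re

# Hypothetical sample, imitating the usual "name => path (address)" ldd layout.
SAMPLE_LDD_OUTPUT = (
    "\tlibncurses.so.8 => /lib/libncurses.so.8 (0x800c4f000)\n"
    "\tlibc.so.7 => /lib/libc.so.7 (0x800d9b000)\n"
)

# Matches libcurses, libncurses and their wide-character "w" variants.
_CURSES_RE = re.compile(r"lib(n?cursesw?)\.so")


def termcap_lib_from_ldd(ldd_output):
    """Return the curses-family name found (e.g. 'ncurses'), 'tinfo', or None."""
    for line in ldd_output.splitlines():
        match = _CURSES_RE.search(line)
        if match:
            return match.group(1)
        if "libtinfo" in line:
            # termcap interface built as a library separate from ncurses
            return "tinfo"
    return None


def pick_curses_library(readline_lib, available):
    """Choose the library for _curses, given what readline already uses.

    `available` is the set of curses libraries found on the system.
    """
    # If readline already pulls in a curses library, reuse exactly that one,
    # so that two different curses variants are never loaded together.
    if readline_lib and "curses" in readline_lib:
        return readline_lib
    # Otherwise prefer the wide-character build (assumption of this sketch).
    for name in ("ncursesw", "ncurses", "curses"):
        if name in available:
            return name
    return None


readline_lib = termcap_lib_from_ldd(SAMPLE_LDD_OUTPUT)
print(readline_lib, pick_curses_library(readline_lib, {"ncursesw", "ncurses"}))
```

Keeping the parsing separate from the `ldd` invocation makes it easy to skip the call entirely on platforms without `ldd`, such as OS X, and simply pass `None` to the selection step.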
Instead of testing in setup.py, we could use the result from the configure script: just uncomment the line and use it |
Mark, thanks for reviewing the patch. In the new patch, I added a skip Buildbot testing looks good. In particular, one FreeBSD bot passes For most bots nothing changes. The solaris bot has the same unrelated Roumen, I do not see a line in configure.in that tests for the |
I did some digging on my side, the fact you see ncurses referenced from readline is due to the build linking readline to libtermcap: cc -fstack-protector -shared -Wl,-x -o libreadline.so.8 -Wl,-soname,libreadline.so.8 And libtermcap is: % ll /usr/lib/libtermcap.so* That configuration option you referenced, Stefan, is that --with-termlib (generate separate terminfo library)? |
Yes, readline uses only the termcap part of ncurses. I think that http://www.mail-archive.com/util-linux-ng@vger.kernel.org/msg00273.html |
Actually this means that we should also look for -ltinfo in the ldd |
The test in configure is how to link application to readline libs. Platforms that support linking of shared libraries with unresolved Not all linux link readline to termcap compatible library:
As configure detect how to link readline we could uncomment Also detection of dependent libraries that use ldd is limited to I'm not familiar with python curses module to propose a patch . Or may be to try to link sample "int main() { readline(); }" and to ask Roumen |
Stefan Krah wrote:
Or may be this mean that in configure to add test with -ltinfo and if ldd - what about platforms without GNU libc ? Roumen |
I included the test for libtinfo in the latest patch. The patch is tested This means that the ldd method works on all buildbots, OpenBSD, OpenSolaris |
I'm not against sorting things out in configure.in, but I'm not quite On FreeBSD (the problem system!) I can't get this to work: [stefan@freebsd-i386 ~]$ echo 'int main() { readline(); }' > test_readline.c On OpenSolaris with suncc, ld does not have -warn-common. |
Sigh. xxx.c == test_readline.c in the previous comment. |
Yes , I understand . Also I'm not able to write C test case similar to python msg103231 by So if there is no way to write C test program that fail I could not see To write script that check platform and if is freebsd, suse link with a, P.S. Issue with readline library linked to termcap compatible library on Roumen |
Roumen Petrov <report@bugs.python.org> wrote:
No.
I didn't want to touch the termcap logic. There's potential for breakage, (There's a needless warning on Tiger about /usr/lib/termcap that could |
Stefan, I was emailing with Rong-En Fan, a FreeBSD committer, about this issue and he asked: "Basically, this is caused by a) our readline.so is linked against ncurses.so (via -ltermcap which is the same lib) To solve that, we need to have a separate termcap.so, do I understand the issue correctly?" He also mentioned that "[a]nother more aggressive way is to make only ncursesw installed into the system which requires a recompilation of all ports that use ncurses (ncurses and ncursesw are source compatible, but in most cases they are binary compatible as long as application don't assume size of ncurses structures)." Which I fully support, it's something that I did on DragonFly BSD a long time ago already (for all I can remember). Your opinion? |
Jeroen Ruigrok van der Werven <report@bugs.python.org> wrote:
Yes, only that the separate termcap is called libtinfo.so. The approach of [stefan@fedora-amd64 ~]$ ldd /lib64/libreadline.so.6.0 +ports that use ncurses (ncurses and ncursesw are source compatible, but in most cases they are binary compatible as long as application don't
I think the libtinfo approach is more flexible, and I'm not aware of any drawbacks. Stefan Krah |
I tested bpo-7384-5-py3k.patch on FreeBSD 8.0: it fixes the crash. |
I think it would be nice to get this into 2.7. I don't expect buildbot |
Agreed. I think you should go ahead and commit it. |
Mark, thanks. Committed in r81669; I'll keep an eye on the buildbots. |
Committed in r81669, r81672, r81683 (trunk) and r81830, r81831 (py3k). What to do with the releases? To recap, the fix is:
|
Committed a conservative version implementing part 1) in r82017 (2.6) and The buildbots look good, but I'm setting this to 'pending' in case |
These changes break building of Python 3.* in some locales in Gentoo.

running build
running build_ext
Traceback (most recent call last):
  File "./setup.py", line 1812, in <module>
    main()
  File "./setup.py", line 1807, in main
    "Tools/scripts/2to3"]
  File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/core.py", line 152, in setup
    dist.run_commands()
  File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/dist.py", line 946, in run_commands
    self.run_command(cmd)
  File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/dist.py", line 965, in run_command
    cmd_obj.run()
  File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/command/build.py", line 127, in run
    self.run_command(cmd_name)
  File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/cmd.py", line 315, in run_command
    self.distribution.run_command(command)
  File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/dist.py", line 965, in run_command
    cmd_obj.run()
  File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/distutils/command/build_ext.py", line 393, in run
    self.build_extensions()
  File "./setup.py", line 151, in build_extensions
    missing = self.detect_modules()
  File "./setup.py", line 539, in detect_modules
    for ln in fp:
  File "/var/tmp/portage/dev-lang/python-3.2_pre20100711/work/Python-3.2_pre20100711/Lib/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 20: ordinal not in range(128)
make: *** [sharedmods] Error 1

In lt_LT.UTF-8 locale, readline_termcap_lib file contains: In en_US.UTF-8 locale, this file would contain: do_readline is "/usr/lib64/libreadline.so". /usr/lib64/libreadline.so is a linker script with the following content: See bug http://bugs.gentoo.org/4411 for more info. I think that using ldd is a wrong idea. |
In Ubuntu I can build just fine with lt_LT.UTF-8. So perhaps this problem |
You shouldn't use ldd. I suggest that setup.py try to link a small executable, which would use a function from libcurses and would be linked against libreadline, but not libcurses. If linking succeeds, then libreadline is linked against libcurses. If linking fails, then repeat this procedure with libcursesw, libncurses, libncursesw, libtinfo. |
You can run ldd without the LANG variable set to get the original (English, ASCII-only) message. |
So you have garbage from stderr in readline_termcap_lib. Since that's The attached patch skips readline linkage detection if ldd fails. In Please report if the patch allows you to build py3k in the problematic Your method of detecting readline linkage looks interesting, but I If you want that done, the best way is to open another issue, submit a |
This patch allows Python 3.* to be built in this locale. It might be safer to open tmpfile in binary mode to avoid potential problems with non-ASCII characters in paths to libraries. |
ldd return value check committed in r82927, r82928, r82929 and r82930. Thanks for reporting this! |
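The three precautions raised in the last few comments (force an untranslated locale, ignore the output when ldd fails, and treat the output as bytes until it is safely decoded) can be combined in a small helper. The following is a minimal sketch under stated assumptions, not the code that was committed: the function name `run_ldd` is hypothetical, and it uses `subprocess.run` where the setup.py of that era used other means.

```python
import os
import subprocess


def run_ldd(path):
    """Return ldd's stdout for `path` as text, or None if ldd is missing or fails."""
    # Force the C locale so that messages are untranslated ASCII.
    env = dict(os.environ, LC_ALL="C", LANG="C")
    try:
        proc = subprocess.run(
            ["ldd", path],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,  # keep error text out of the result
            env=env,
        )
    except OSError:
        # ldd is not installed (for example on OS X) or cannot be executed.
        return None
    if proc.returncode != 0:
        # For example, `path` is a linker script rather than a shared object.
        return None
    # Read bytes and decode leniently, so unusual paths cannot raise
    # UnicodeDecodeError during the build.
    return proc.stdout.decode("ascii", errors="replace")


# A path that does not exist makes ldd fail, so the helper reports None.
print(run_ldd("/nonexistent/libreadline.so"))
```

Returning `None` for every failure mode lets the caller skip readline linkage detection with a single check instead of parsing an error message.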