New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Segfault in _zval_dtor_func when PHP built with ZTS #155
Conversation
A bunch of test in the PHP testsuite fail with an error message "tsrm_ls undefined", if PHP is compiled with zts enabled. The proposed change allows those tests to pass.
@ojwb I presume this is okay to merge in |
I had already seen this, but wanted to take a look into it rather than just applying it (unless 3.0.1 is imminent). The patch looks OK (aside from the unrelated whitespace change) and should work, but I suspect it's easy to just pass the ZTS context info through from the caller, which should be more efficient than using I'm slightly curious why this wasn't spotted before - ZTS is rarely used except on Windows (which is why we get issues with missing things like this), but we had a ZTS fix not that long ago I think. I guess that person didn't run the testsuite, and so only fixed the issues they saw in their own wrappers. |
Le 30/03/2014 05:48, Olly Betts a écrit :
You may not have seen my original post to swig-devel (sorry if I used the Some info: Regards |
The issue with TSRMLS_FETCH() is that it's relatively expensive, so when we already have the information it returns, it's much better to instead use TSRMLS_CC, TSRMLS_DC, etc. In a ZTS build, this just results in passing an extra void*** parameter to the function (and in a non-ZTS build, these macros expand to nothing). I've committed a fix (and also eliminated most of the existing places which call TSRMLS_FETCH()) - please can you retry with the latest git master? |
Here's a thread about why you want to avoid calling http://thread.gmane.org/gmane.comp.php.devel/59197 The only remaining uses in SWIG are in the |
Le 02/04/2014 12:43, Olly Betts a écrit :
There are quite a few other issues: some with the director tests, which I have not looked at:checking php testcase director_abstract (with run test) make[1]: *** [director_basic.cpptest] Error 2And similar errors for director_classic, director_enum Then there are segfaults for several tests. I have much studied the Pierre |
At least part of that is that just adding the TSRMLS_DC parameter to the end of the parameter list, which doesn't work when a constructor of a director class has default parameters - working on a fix for that. |
OK, I've pushed a fix for that, which should fix many if not all of the compilation errors above - can you please retest? I see a segfault for director_basic on Debian unstable with a non-ZTS build, and have for some time but haven't managed to get to the bottom of it. It only seems to happen with PHP >= 5.4. With the fix above, can you let me know which tests are segfaulting? |
Le 02/04/2014 23:43, Olly Betts a écrit :
The "test-suite/director_thread" has a more explicit error (and ends up
Coming to the variables test. If I suppress the 'char *strvar' variable,
and I run:
I get the segfault. Here is the gdb backtrace (not that PHP was compiled
Regards |
I'm dubious that director_thread is necessarily valid for all supported languages, as it assumes it's OK to call back into the interpreter from a different thread. It looks like it was originally Python only, where I assume this is OK, but I suspect those errors are because this isn't supported by PHP. I don't know enough about PHP internals to know what's going on with the segfault in Thanks for the reduced testcase though. |
Bump... Is anyone going to fix this patch? |
I can disable director_thread for PHP if you want. Or make it Python-only again - it seems dubious to expect it to work in general. The other issue only seems to affect a ZTS build of PHP, which is only really used on Windows. I don't use Windows, so it's not something I can easily investigate. It might be related to the segfault we see with recent PHP (I'd expect someone would have reported this before if it had been broken since director support was added). |
The director_thread looks generic enough to me for all languages. Peraps only run the test if you are sure threading works (such as when ZTS is not enabled). From what I can make out here though, director_thread isn't the main problem, it is the seg faults in the 20 odd testcases that @pierre-labastie mentioned and the 'char *' variables. BTW, I'd like to change this from a patch to an issue, but I can't see how to do this in Github :( |
@pierre-labastie Do you still see these segfaults with SWIG trunk? There was a fix for segfaults with directors in PHP >= 5.4 in 0dd7b61, and you reported using 5.5 in an earlier comment, so I wonder if that fix has also solved the remainder of this issue. |
BTW, you should only need to apply the single character change to Lib/php/phprun.swg from that commit to test, if that's simpler than building the latest version from git. |
Le 16/10/2014 02:08, Olly Betts a écrit :
Pierre |
@pierre-labastie Did you get a chance to re-test? |
Le 09/01/2015 01:50, Olly Betts a écrit :
Pierre |
It would be better to report those as separate issues, especially as others maintain those parts of SWIG. |
Le 11/01/2015 12:20, Olly Betts a écrit :
I have reported to swig-devel list. Do you think there is a better place? Pierre |
Opening a ticket is better for reporting a bug. |
Le 09/01/2015 01:50, Olly Betts a écrit :
While I was with SWIG-3.0.3, I recompiled php with zts support. PHP version is Here is what I get: That is, only the "extend" examples, and a few "director" tests in the test Pierre |
Thanks for retesting. Comparing with the list of failing tests from above, these are now passing (so some progress):
Leaving the following failures:
I notice that all the tests which still fail make use of subclassing via directors (though some passing tests do too). Rereading the discussion, it seems the originally proposed patch made all the test pass, or is that no longer true with newer versions of PHP? While avoiding calling |
Le 11/01/2015 22:04, Olly Betts a écrit :
What I have been able to investigate with gdb, is that segfaulting occurs Pierre |
I think I've found the problem - the upcall check is currently being performed on a Can you test with SWIG commit 682b4dd? |
682b4dd breaks the PHP Travis test case called 'director_finalizer' - https://travis-ci.org/swig/swig/jobs/46731933 |
I added some tracing code and this isn't a bug in my change, but a bug uncovered by my change.
So that last I'll see if I can work out what's up. Reverting seems a very unappealing option. |
Le 12/01/2015 01:57, Olly Betts a écrit :
checking php testcase director_finalizer (with run test) make[2]: *** [php_run] Segmentation faultWith ZTS: $ make check-php-test-suite [...] make[2]: *** [php_run] Segmentation faultThe good new is that director_abstract, director_nested and director_string now pass. Pierre |
The regression with testcase default_args has been introduced by:
|
@pierre-labastie Can you check you have a stable build as my local testing and Travis tests at https://travis-ci.org/swig/swig/jobs/46824614 don't show the extend example segfault, nor the director_xxxx tests. Travis does show the default_args breakage (which is down to additional tests for issue #294) and so isn't a regression, but I'd like to see it fixed so the test-suite passes for the release due out now. The most concerning thing is the segfault in director_finalizer. @ojwb, could you look at that please and I'll figure out a fix for the default_args testcase. |
Mmm, seems the new default arg tests aimed at testing Python default arg handling have highlighted shortcomings in the PHP default arg handling. There are quite a few problems, I'm wondering if the approach taken for Python coudl be re-used for PHP. @ojwb, I have you don't mind, but I'm going to leave this for you and disable the test for PHP. |
@wsfulton The extend example segfault, and the director_xxxx tests segfaults all occur when using PHP built with ZTS. I do not know whether this may be called a "stable" build, but this is how this thread started. The reasons why we used ZTS are in my above comment of March 30, 2014. Some comments above seem to imply that the ZTS is not the main target, but @ojwb asked to retest, and this is what I am doing now. |
SWIG ought to work with PHP built with ZTS enabled, but generally ZTS is only enabled on Windows which isn't a platform I can test on, and we have had the occasional regression, such as this. The PHP API has these macros which expand to pass the ZTS pointer around in a ZTS build, and expand to nothing otherwise, so if the use of these macros in a patch is wrong, you don't spot it until you build for ZTS. |
@wsfulton I've fixed the default_args issue, so no need to disable that testcase. |
@wsfulton And I'm already looking at the director_finalizer testcase. I'd like to better understand what's going on before we decide to just revert it, as I think we've uncovered a deeper problem from what I've seen so far. |
@pierre-labastie sorry, I read your post too quickly, when ZTS is not enabled, you have the same results as Travis. Is it worth finding or building an Ubuntu PPA with ZTS enabled and using this in the future on Travis? @ojwb Thanks for the default fix. @ojwb I'm more interested in releasing 3.0.4 asap without regressions than fixing bugs. If you can't fix in the next 18 hours or so, then I suggest we revert for the release and then you can apply a fixed patch at your leisure for 3.0.5. |
If you can find a ZTS build for automatically testing with, that'd be very handy. I've contemplated building a custom version to test with, but PHP is a bit of a beast to build, and getting a version on all the machines I work on SWIG on isn't an appealing task. The issue with reverting is really what we revert to that's actually a better state than where we are now. It's not as simple as that 682b4dd regressed us and pre-682b4dd8 is unregressed - there are different problems at different points in the increasinly epic saga that is this ticket. Just reverting 682b4dd leaves us invoking undefined behaviour every time we call a director method. It seems in a non-ZTS build we may get away with this, but that's at best brittle, and could be causing problems that just haven't been reported yet. We could revert to an earlier state, for example before any of the fixes related to this ticket, since the ZTS build is already broken with 682b4dd reverted anyway. I'd really like to understand what's going on to inform the decision about what is most sensible to do for 3.0.5, as the fix could easily be a change as small as the revert, and it would be a shame to release 3.0.4 in 18 hours and in 19 hours time to discover this is a serious problem than needs us to release 3.0.5 next week. |
OK, 0dd685b fixes the crash in director_finalizer in a simple way (just make that method static, since it only needs the object context for the ZTS pointer, and we can just pass that in from the caller). I think there's something screwy going on, as the object destruction seems to happen too early (compared to when I'd expect it to happen, and to when it happens under Python). I'm dubious about how disowning is implemented for PHP (it's nothing like the implementation for Python or Perl, but instead does stuff with the "newobject" flag), but attempting to reimplement that machinery for 3.0.4 seems extremely foolish. With this patch, we aren't calling a method on a NULL pointer or a deleted object, making it better than the situation either before or after 682b4dd. I'm hopeful this will pass the testsuite for a ZTS build too, but that has been broken for some time, so I don't think holding up 3.0.4 to test that is worthwhile. |
About building PHP, with or without ZTS, I give below the instructions I use to build it in /usr/local (so no interference with system PHP). You then just have to change your PATH so that it begins with /usr/local/bin.
If you do not want ZTS, just suppress the last line. I you want debugging symbols for using gdb, add --enable-debug |
@ojwb Great job: the last commits you have done allow all the tests to pass with ZTS enabled, except the director_thread one (all the tests pass without ZTS). |
How to build it isn't the issue - it's that it's a slow build (https://buildd.debian.org/status/logs.php?pkg=php5&arch=amd64 suggests ~30-90 minutes, though I bet it's longer on my netbook), plus I commonly work on SWIG on at least 4 machines and would need to rebuild a newer version periodically. I don't really want most of my SWIG hacking time to be spent building custom versions of PHP. I take it that the director_thread testcase still fail like this:
This error reads to me that PHP has noted down the current thread id when creating a data structure and then fails when it isn't the same on a later access. As I said above already, we're assuming that one can call back into the target language interpreter from a different thread, which isn't necessarily going to work for all target languages. I don't see that we can fix this - it just seems it inherently isn't supported by PHP (but in a non-ZTS build, it presumably doesn't bother to check thread_id in this way). Having a known failure like this isn't useful, so I've disabled this runme in ecf3ab5. Which means we can at long last close this ticket. Thanks to everyone for your help and patience. |
@ojwb : With a core i5 at 2.6 GHz, It takes a couple of minutes to build PHP with the instructions above (using make -j5). I do not know what the Debian timings mean. I guess they build several flavours of PHP, and maybe a lot of modules, which are not built (and not useful for swig tests) with the intructions above. Anyway, it is up to you, and I congratulate you again for resolving the bug. |
A bunch of test in the PHP testsuite fail with an error message "tsrm_ls undefined", if PHP is compiled with zts enabled.
The proposed change allows those tests to pass.