-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PDOException thrown whenever SIGALARM occurs during a query #885
Comments
@mathieuk Thank you for this info - we will test out the repro you provided and get backto you shortly. |
Ok we can reproduce this behaviour. Do you see something similar happening with sqlsrv? |
It yields similar, yet subtly different, behaviour:
Results in a few variations of:
A few remarks/observations:
HTH |
@mathieuk please check your unixodbc version |
@mathieuk never mind. I could reproduce this by re-running the same php script in a different terminal. Will start investigating. |
@mathieuk looks like whenever SIGALRM arrives, the connection is broken, as indicated by SQLSTATE 08S01 as well as all the error messages you have listed in the sqlsrv example. When it fails to connect, the query fails to execute, so an exception is thrown. As you said, there is no exception after the signal handler is done and yields control. This is apparent when we keep trying to run the same php script again and again. Maybe it will help us understand the problem if you can provide more details? |
The first thing is about the signal handler. When SIGALRM comes in the connection is often immediately broken and an exception is thrown before the alarm handler can be called. Even if the exception is handled, the alarm handler is never called. This is confusing for the surrounding code (yay semi-concurrency). Things would be clearer if the signal handler would still be called; before the exception. Then the second thing is that the connection is completely terminated from the second SIGALRM comes in. That strikes me as odd; it seems to me that yes, the query may be interrupted and that might be unrecoverable for that query but it seems odd that the connection is completely broken at that point. Would it be possible to recover and issue new queries on the connection? |
I see your points @mathieuk |
@david-puglielli For what it's worth, I have experienced the same symptoms (random exceptions of 0x2714) when using Laravel's Horizon (Job Worker) stack. I cannot confirm it is specifically SIGALRM (or any other signal), but something is definitely at odds with the PDO SQLSrv connection inside a Laravel Worker. If I run the same code outside a worker, it works as expected and without any TCP errors. This is running with update: tagged repo dev instead :) |
After some investigation, disabling MARS seems to have reduced the likelihood of broken connections. To disable MARS, which is true by default, you need to set it to false, as shown below:
@mathieuk: we are still testing (with the latest ODBC drivers, post CTP preview), and please shed some light on how to reproduce this issue. If I see "I should shutdown" when running in a command terminal, I leave it running and launch another terminal to run the same php script. Yet @david-puglielli did it by canceling (CTRL+C) and re-ran the script in the same terminal window. |
@yitam unfortunately, adding that setting does not reduce the likelihood of receiving |
@ralphschindler when I tested with ODBC driver 17.2 the problem frequently occurred indeed but with an internal build (post CTP preview) it does seem to have improved. In the meantime, your patience is appreciated while odbc team is looking into this. |
I've updated my repo script to hopefully make things a bit clearer. It's attached below (test-pdo.php). When you run it it will eventually run into the error situation, every time you run the script. The repro script simulates a typical worker daemon like you'd see in Laravel. test-pdo.php
FWIW, it differs in that a worker in Laravel would set the alarm at the beginning of each loop instead of whenever the alarm went off. That way the last alarm would be replaced with a new one so it doesnt go off unnecessarily. In my case, I -want- it to go off unnecessarily as I want to see the effect on queries. When I run this I see two different outcomes:
In this case the alarm was successfully handled 2 times (read: we were not executing the query at that time) and then the third time it was executing the query and the exception was thrown. Note how the message "Received SIGALRM!" is absent that time. This is the worst case as now I wasn't able to clean-up after something went wrong. This can result in 'run-away' workers in the case of Laravel workers. In my case it also messes up some job housekeeping that I have to do. If I run the script a few times I'll also run into this variant:
In this case the signal handler WAS called but then the connection was completely broken. Does this help? (edited the output a bit to make it clearer in the github issue) |
Like @ralphschindler, I did not see any difference in behaviour when using |
Thanks @mathieuk for a more detailed repro scenario. It certainly helps with the investigation.
I should have mentioned that this makes a small difference but only with the internal ODBC build. Thanks for trying nonetheless. |
After some more investigation with ODBC team, we figure the culprit lies with pcntl's omission of SA_RESTART. Essentially, it is preventing SIGALRM from specifying SA_RESTART, which is necessary in order not to interrupt the ODBC driver operation. Specifying SA_RESTART (or using default from signal() which does) in a native ODBC application does not reproduce this bug, even running it overnight. FYI, please find it attached and this is a similar explanation of the bug I've modified this line (git diff shown below) and been running the pdo_sqlsrv repro script above with the latest stable version ODBC 17.2. There is no need to turn off MARS, and as of now it is still running (more than half an hour already) without any exception. I've also filed a bug report accordingly.
|
@yitam do you feel comfortable making a PR to the php project, or shall I to usher this along? |
hi @ralphschindler thanks for your reminder. Yes maybe I should make a pull request. |
👍 - let me know if you need any assistance :) |
Thanks @ralphschindler ! Done, please check PR 3717 |
Yes, can you think of a test case without using our PDO driver? |
@yitam I've tried the change shown in php/php-src#3717 and I have the basis for a test that you could use. For this comment I'm using http://man7.org/linux/man-pages/man7/signal.7.html as a reference. So I think what you're saying is that SIGALRM is triggering a EINTR return value for a It seemed to me that With your change, the function call is interrupted but it appears my signal handler is never called. This means that, while there were no exceptions from PDO_SQLSRV, other functionality may now be broken. If the restarted That behaviour seems to come from So, with None of these things are happening with SA_RESTART enabled because as soon as the signal handler returns (that is: when ext/pcntl/pcntl.c So now we have two issues. One might say the PCTNL way of calling signal handlers is flawed as they get called outside of the actual signal handler rendering SA_RESTART pretty useless? OTOH one might say the ODBC driver should just handle EINTR properly (retrying for EINTR instead of erroring out). What do you think ? Here is the flock-based test script I used:
|
Thanks @mathieuk for your detailed test scenario and script. I haven't had time to try it out yet, but as you can see in my earlier reply, there wasn't any issue using signal / alarm in the native ODBC program.
Yes that is exactly the case. My patch was merely an attempt to show that PCTNL way of calling signal handlers is flawed. The third argument does not make any difference with SIGALRM. For this reason we quoted the pgsql fix to justify why we should consider the following:
According to the above, I was tempted to make a drastic change like this (which I haven't tested):
That being said, to make the 3rd argument 'restart_syscalls' to pcntl_signal() false by default, that is, changing the API, might break others' existing scripts. As soon as I have time I will try your test case and/or continue some more testing. |
I also tried with async_signals=false and using The bigger fix indeed seems more appropriate but I totally understand your hesitation. I think it should be less of an issue when the argument That leaves us with PCNTL potentially not being able to (timely) call the userland signal handler if the primitive used is taking a long time or just blocks forever when SA_RESTART is enabled. I'm not sure what options there are for changing/improving that. I think we should either update your bug report to include the above; or maybe report a new one regarding the things I described above? |
hi @mathieuk
I checked your script and not exactly sure what you're trying to do:
I modified your script slightly as follows,
And I got the following output:
Makes no difference whether I made the new change or not:
|
@yitam you had to run 2 instances (one with 'alarm' as argument) at the same time for that to work. I've changed it into a new, single, script that tests 3 scenarios. It builds on the fact that the signal handler actually doesn't get called with SA_RESTART and flock(). You can find it here: https://gist.github.com/mathieuk/9495b76da54ead09297deb62915d4f46 Output is as follows:
Atleast it tests expected behaviour. |
@yitam judging by the comments of Nikic Popov it appears that PCNTL won't be able to work in such a way that SA_RESTART will be actually useful in my initial stated problem. With the SA_RESTART flag I cannot trust on my signal handler being called. In that situation not using SA_RESTART would actually be better as I can atleast catch the exception, reconnect and do my housekeeping then. It's not pretty but it works. It seems that while PGSQL (like you referenced) uses SA_RESTART MySQL actually fully implemented EINTR. Would it be possible to post handling EINTR as a feature request with the ODBC team? |
Thanks @mathieuk your latest script makes sense. However, when running a quick test without my patch (reverted to the original php way of handling alarm signal), my output is different from yours:
|
Can't seem to reproduce @yitam. What distro/versions are you using?
|
I think my env was a bit messed up @mathieuk |
@mathieuk we will leave this issue open and continue monitoring PHP issue 3742 and bug 77335 |
Tested the fix (see PHP issue 3742 for the detailed discussions) in PHP 7.4.0RC4 today. No more exception thrown from ODBC driver if the optional argument
to
Thus, I'm closing this issue now. |
+## PHP Driver version or file name
+## SQL Server version
SQL2016
+## Client operating system
CentOS 7
+## PHP version
PHP 7.2.11
+## Microsoft ODBC Driver version
5.3.0
+## Problem description
When you use PCNTL to protect execution time of a worker like with Laravel's Job Workers and a signal arrives (for instance SIGALRM) queries currently running will throw PDOExceptions:
This prevents the alarm handler from actually being called.
+## Expected behavior and actual behavior
I think I understand why this throws an exception. The whole call is ending up in an inconsistent state as you don't know what will happen in the signal handler function. It doesn't make sense to think the current query will successfully finish even if the signal handler doesn't do anything bad and yields control quickly enough.
I'm not entirely sure what should happen besides that I would expect the signal handler to be called; perhaps before the exception is thrown.
+## Repro code or steps to reproduce
Minimized example, obviously not real life code :)
The text was updated successfully, but these errors were encountered: