pconnect: Operation now in progress #1881
I've never seen that error before. Generally …
Is there anything I can provide to help you investigate? We are still seeing this issue every day.
Here is some information about the execution context:
An exact call stack and the output for the Redis extension via either … If the error is coming from PhpRedis (and not a user-land PHP wrapper) it must be coming from Line 2283 in a09b65b.
That said, I'm not sure how this is possible given that …
Here it is:
```php
$client->pconnect('xxx.redis.cache.windows.net', '6379', 0.0, null, 0, 0.0, [
    'auth' => [0 => "***"],
]);
```

Exception:
"CLIENT LIST" command (sample):
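For anyone who wants to capture the same diagnostic themselves, a minimal sketch, assuming an already-connected phpredis instance in `$client` as in the snippet above:

```php
<?php
// Fetch the server's CLIENT LIST via phpredis' generic command passthrough.
// Assumes $client is a connected \Redis instance, as in the snippet above.
$list = $client->rawCommand('CLIENT', 'LIST');

// The reply is one text line per connected client (id, addr, age, idle, ...).
foreach (explode("\n", trim($list)) as $line) {
    echo $line, PHP_EOL;
}
```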
I have done some investigation and there is no issue with PHP-FPM; the connections persist until the child process is killed. The issue is with the PHP CLI, which our workers use and which is launched from the command line.
@kanidjar try using non-zero timeout values when connecting.
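For reference, a sketch of that suggestion applied to the call from the original report; the concrete timeout values below are illustrative assumptions, not tested recommendations:

```php
<?php
$client = new Redis();
$client->pconnect(
    'xxx.redis.cache.windows.net', // host from the original report
    6379,
    2.5,                           // connect timeout in seconds (non-zero)
    null,                          // persistent_id
    100,                           // retry interval in milliseconds
    2.5,                           // read timeout in seconds (non-zero)
    ['auth' => ['***']]
);
```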
Did you manage to fix this issue?
Sorry, I forgot to give you feedback. We are still having the issue with the PHP CLI. We tried setting a non-zero timeout value, with no effect. We are still running PHP 7.4.
Hi @kanidjar, nice to meet you. We have the same issue on our services: AWS infrastructure + PHP-FPM 7.1. We have the problem with Redis but also with MySQL. We opened a ticket with AWS support, but no solution. Did you fix the problem on your side?
I've fixed the issue by not using Redis for queues. Using SQS for queues eliminated this particular error, while Redis still handles sessions.
+1, seeing this issue; it happens only a few times a day out of hundreds of thousands of jobs.
I am seeing this while running on Heroku, and it is starting to happen more frequently each day. UPDATE (9/22/2021): still happening, much more frequently now. Does anyone have any solutions or ideas on this?
This has popped up out of nowhere for me and appears to be happening more frequently. Also on Heroku.
I've been investigating this a bit (no way of reproducing, so just walking through the code) and, in the process, came across someone claiming this may also happen with very low timeout values. The Laravel Phpredis connector explicitly sets timeouts to … Lines 2336 to 2339 in 7c7f2fa.
However, I'd imagine that multiplications of … Any other ideas, @michael-grunder? I haven't been able to dig into the Linux sources enough yet to determine if there are blocking-socket cases where …
Same issue with AWS ECS. We have ECS tasks connecting to Redis on EC2. The client timeout was 5 seconds (the default); I have tried 10 with no success.
Ohhhh, that's interesting @Tarasovych; is that an … Do you have a reproduction case? The problem there is that it opens a blocking IPv6 socket, but a non-blocking IPv4 socket. I wonder if that's a bug somewhere in PHP or the OS, given that ext-redis definitely does not set the socket to non-blocking... Maybe it's intermittent because sometimes hostnames resolve as IPv6 first and sometimes as IPv4 first, and it opens a socket for both? Can you share the hostname, please?
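As a quick, hedged way to test that dual-stack theory from PHP itself (the hostname below is a placeholder, not the reporter's):

```php
<?php
// Check whether the Redis hostname resolves to both A (IPv4) and AAAA (IPv6)
// records; if both exist, the resolver's answer order can vary per lookup.
$host = 'redis.example.com'; // placeholder hostname

var_dump(dns_get_record($host, DNS_A));    // IPv4 records
var_dump(dns_get_record($host, DNS_AAAA)); // IPv6 records
var_dump(gethostbynamel($host));           // IPv4 resolution order
```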
Yes.
No :( It occurs from time to time.
The subnet used for AWS ECS doesn't have an IPv6 CIDR, and AWS ECS tasks don't receive an IPv6 address on startup. However, the VPC has an IPv6 CIDR, so the VPC subnet route tables have an IPv6 route.
I can't share the real hostname; it's the Public IPv4 DNS name of an EC2 instance, if you're familiar with AWS EC2. Something like ec2-11-22-33-44.eu-west-1.compute.amazonaws.com
Ah, so you're running Redis yourself in EC2, right? I meant the EC2 hostname. I have seen this happen in the wild, with the error being thrown for an IPv6 address connection attempt, so I think we're getting closer. Can you …
You're right.
Yes, I'll take a look.
Oh, never mind. It looks like the first two connects are just DNS queries (to …). It doesn't close the DNS socket, and re-connects to …
@dzuelke yes, that is the reason for …
Can you …
So, a bug in PHP then, @yatsukhnenko? I've been digging through the PHP sources a bit to see where it does the name resolution during connects, but haven't found anything... or is it happening inside …? The streams subsystem is royally confusing; I think setting a particular option on a socket actually causes a connect, if I'm understanding that right, so it's a bit of a mess to untangle.
(but why would all these …
Those traces I posted above are from different ECS containers, by the way. Each run is executed in a "clean" environment.
I've created a gist with a readable strace diff between successful and failed connections; take a look at the 2nd revision.
Does anyone have any suggestions for this issue?
@kanidjar did you find a solution to this problem in the end?
Hm, in my case I use IP addresses to connect instead of hostnames, so there shouldn't be a DNS lookup, AFAIK. This error has only happened on my servers at Vultr, not on Hetzner (though the Redis server is hosted at Hetzner too), so I might give DigitalOcean a shot for the next worker to see if that one throws this error. Just to try it out.
Small update: even with DigitalOcean it happens. However, it appears to be latency-related. I have a server in Sydney while the Redis server is in Germany. Every other location (London, Amsterdam, and NYC) is fine. Latency to Sydney is the highest, so I believe this is what might be causing the issue for me personally. Not sure about others.
I've seen "Operation now in progress" errors suddenly being returned after we moved from a locally installed Redis server to a cloud-managed solution like Google Memorystore. No changes in code other than the connection configuration. The framework/client we use does not use phpredis, though. It makes a connection via stream_socket_client(), which now (very intermittently, probably once every other day) can return false with that error description.
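A minimal sketch of that failure mode, assuming a plain stream_socket_client() connect like the one described (host and timeout are placeholders):

```php
<?php
$errno = 0;
$errstr = '';
$stream = stream_socket_client('tcp://redis.example.com:6379', $errno, $errstr, 2.5);

if ($stream === false) {
    // On the affected setups this intermittently reports errno 115
    // (EINPROGRESS on Linux) with "Operation now in progress".
    error_log(sprintf('connect failed: [%d] %s', $errno, $errstr));
}
```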
We are using Laravel Horizon and an AWS ElastiCache cache.t4g.medium. The situation over the last 6 months has been as shown in the attached image. Despite trying numerous different configurations, I still haven't found a fix.
In my case, when we migrated from AWS to Azure, the errors appeared only occasionally, and it was something related to our infrastructure architecture combined with a phpredis issue. It now happens just a few times per week, but at this point we are 100% sure this is an issue with phpredis.
Same issue here. Also Laravel Horizon + Redis :(
So in our case, we realized that the issue occurs whenever letsencrypt/certbot runs a renew on an affected server. We have one such server with a cron job that runs this once a day at midnight, and that is also the exact time I start seeing "operation now in progress" errors. It might be related to certbot restarting/reloading Apache or something else, but it seems to affect the server's ability to make outside connections (e.g. to a MySQL server or to Redis) for a second or two.
We don't use Let's Encrypt on the affected servers. Our theory is that this may be a memory issue: when available memory is low on the server or pod, the issue occurs, but we are not entirely certain. If you have any ideas, please share your experiences. P.S. We want to move to Predis in production to see if there will be any issues.
It appears to be a latency issue in my case. I'm connecting my worker to a Redis server hosted on another continent over a Tailscale network. My ping time to the server is 170 ms, and I'm consistently getting this error.
I am seeing this just on localhost (macOS) with a local Redis server and PHP 8.3. I have added a retry if I get that message, which gets by the issue for now.
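A sketch of that kind of retry workaround; the helper name and the retry/backoff parameters are made up for illustration:

```php
<?php
function connectWithRetry(Redis $client, string $host, int $port, int $attempts = 3): void
{
    for ($i = 1; $i <= $attempts; $i++) {
        try {
            $client->pconnect($host, $port, 2.5); // 2.5s connect timeout (arbitrary)
            return;
        } catch (RedisException $e) {
            // Only retry the intermittent failure discussed in this thread.
            if ($i === $attempts || !str_contains($e->getMessage(), 'Operation now in progress')) {
                throw $e;
            }
            usleep(100_000 * $i); // simple linear backoff
        }
    }
}
```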
The bug is in PhpiredisSocketConnection.php. This line is wrong: `$selected = socket_select($selectable, $selectable, $null, $timeoutSecs, $timeoutUSecs);` You can't pass the same array twice; it needs to be a copy.
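A sketch of the aliasing problem and fix being described, assuming `$socket` is a connected ext-sockets handle: socket_select() takes its sets by reference and overwrites them with the descriptors that are ready, so the read and write sets must be distinct copies rather than the same variable passed twice.

```php
<?php
// Wrong: socket_select($selectable, $selectable, ...) lets the write-set
// result clobber the read-set result, since both names alias one variable.
// Right: give each set its own copy.
$read  = [$socket];
$write = [$socket]; // separate copy, not an alias of $read
$null  = null;

$selected = socket_select($read, $write, $null, 5, 0); // 5s timeout
```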
@jeffdafoe that can't be it, because at least the OP is using the Redis extension, not Predis.
PHPRedis is more complicated. If it's OK for the PHP streams calls to emit EINPROGRESS, then PHPRedis needs to handle that, as "EINPROGRESS is not an error". Predis wouldn't have that issue, as they do handle EINPROGRESS properly, but the issue is caused by a different bug. It's not uncommon for people implementing sockets to make mistakes that only show up under certain conditions, usually when the sockets can't be handled as fast as their state changes. There's a lot of room for footguns in the underlying Unix implementation.
@jeffdafoe I think you're missing a crucial detail here: …
EINPROGRESS occurs on synchronous or async sockets; it's a byproduct of the socket connect process. Regardless, both PHPRedis and Predis use async sockets.
There is no documentation anywhere that indicates … And no, PHPRedis doesn't use non-blocking sockets (also see the very first reply on this ticket from @michael-grunder).
Non-blocking connect, then the socket is put in blocking mode for normal tx/rx. In any case, my earlier Predis evaluation was wrong, as that code is deprecated; it now uses sockets in a way much more similar to PHPRedis. I've encountered the issue using Predis, which is why I ended up here. In the end, it may turn out to be a PHP bug; there's a ticket open, but it doesn't have much traction. Your statement about EINPROGRESS only happening on non-blocking connects is correct. And that's the issue: you have to conclude that the socket generating that error was opened non-blocking and then switched to blocking. It's just a (painful) matter of figuring out where.
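The connect pattern under discussion, sketched in userland PHP streams (the host is a placeholder): a non-blocking (async) connect, a wait for writability, then a switch back to blocking mode for normal tx/rx.

```php
<?php
$errno = 0;
$errstr = '';
$stream = stream_socket_client(
    'tcp://redis.example.com:6379',
    $errno,
    $errstr,
    2.5,
    STREAM_CLIENT_CONNECT | STREAM_CLIENT_ASYNC_CONNECT // non-blocking connect
);

if ($stream !== false) {
    $read = null;
    $except = null;
    $write = [$stream];
    stream_select($read, $write, $except, 2, 500000); // writable == connected
    stream_set_blocking($stream, true);               // blocking for tx/rx
}
```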
This is right. PHP sets … (sets non-block here: …). I see an early return (when …). I'll take a look at the pconnect logic and see if there is a simple sanity check I can make.
I'm not familiar with PHP innards, but is this also the reason why I can get this error under the different circumstances I mentioned in this previous comment of mine?
Possibly, but I'd need to see it happen to be sure. It could happen if there is an edge case where the socket is not restored to blocking after the connect loop.
I keep looking at the php_network_connect code, trying to figure out what's preventing this: …
We might be able to mitigate it by calling php_pollfd_for again; if it's just a wrapper around select, it will return the same results as long as the socket's ready-to-read state hasn't been drained.
Shouldn't the … https://github.com/php/php-src/blob/65f885738defe8c723e3d1131c0eb007cb71866d/main/network.c#L365
Yes, it sure looks like it would. Now I really have no idea how the EINPROGRESS is getting out of that function. I wonder whether or not the socket is connected when that happens.
When connecting to a socket, it is possible to get EINTR. In that case, there should be another attempt to connect if we are not over the timeout, and the timeout should be adjusted accordingly. This fixes phpredis/phpredis#1881. Closes GH-16606
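A userland sketch of the retry logic that fix adds inside php-src's connect path; the ext-sockets calls, address, and timeout below are illustrative, not the actual patch:

```php
<?php
$sock = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
$deadline = microtime(true) + 2.5; // overall connect timeout budget

do {
    if (@socket_connect($sock, '192.0.2.10', 6379)) {
        break; // connected
    }
    if (socket_last_error($sock) !== SOCKET_EINTR) {
        break; // a real error, not an interrupted syscall
    }
    socket_clear_error($sock);
    // EINTR just means a signal interrupted connect(); retry as long as we
    // are still within the (adjusted) timeout, as the fix above describes.
} while (microtime(true) < $deadline);
```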
Hi,
We are using PHPRedis with Laravel. We push jobs into Redis and our workers pop jobs from it.
Our workers are long-running processes which are not supposed to be restarted.
99% of our jobs are popped successfully. However, we sometimes face the following issue when trying to establish the connection using pconnect:
I'm seeing this behaviour on …
Steps to reproduce, backtrace or example script
It's very hard to reproduce since it happens only a few times per day (we process around 50K jobs per day).
Any idea what could be going wrong, or a way to fix it?