Connection problems with 6.0.2 (Broken pipe, Redis server went away...) #2437

potsky opened this issue Jan 21, 2024 · 2 comments

potsky commented Jan 21, 2024

Expected behaviour

We would like to use version 6.0.2 the same way we used 5.3.7, without connection problems.

Actual behaviour

After about 10 hours of operation, we start getting these kinds of errors when hundreds of writes are performed in a foreach loop on a single backend server:

  • Redis server went away
  • Redis::exec(): Send of 2504 bytes failed with errno=32 Broken pipe
  • Redis::hMset(): Send of 1431 bytes failed with errno=32 Broken pipe
  • ...
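
For context, a reconnect-and-retry wrapper along these lines would be one possible stopgap (a sketch only; the helper name and host are hypothetical, not our production code):

```php
<?php
// Sketch of a possible stopgap: retry a failed write once after
// reconnecting. The helper name and host/port are hypothetical.
function writeWithRetry(Redis $redis, callable $write)
{
    try {
        return $write($redis);
    } catch (RedisException $e) {
        // "Broken pipe" / "went away": drop the socket and reconnect once.
        $redis->close();
        $redis->connect('redis.internal', 6379);
        return $write($redis);
    }
}

// Usage inside the foreach loop of writes:
// writeWithRetry($redis, fn (Redis $r) => $r->hMSet("job:$id", $fields));
```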

We run about 30,000 jobs per day with Laravel Horizon, so we accumulated hundreds of thousands of exceptions with 6.0.2 in just a few days.

Found workaround

  1. Restarting our web servers temporarily resolves the problem.
  2. We rolled back to phpredis 5.3.7 on Scalingo PaaS and everything is fine.

Current investigations

  • There are no errors in the Redis server logs
  • We have checked the number of connections from our backend to the Redis cluster and we stay under 100, including cluster connections, which is far below the default limit of 10,000 clients (see the monitoring sketch after this list)
  • The Redis servers' metrics are fine: no memory leak, no CPU spikes, ...
  • This is not a network problem in the datacenter, given that everything works fine with 5.3.7
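
A minimal sketch of how the connection count could be sampled from the application side (plain phpredis; the host is a placeholder, not our real configuration):

```php
<?php
// Sketch: log the Redis-side connection count from the application so it
// can be correlated with the "Broken pipe" errors over time.
$redis = new Redis();
$redis->connect('redis.internal', 6379); // placeholder host

$clients = $redis->info('clients');
printf(
    "connected_clients=%d blocked_clients=%d\n",
    $clients['connected_clients'] ?? -1,
    $clients['blocked_clients'] ?? -1
);
```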

I'm seeing this behaviour on

  • Infrastructure: Scalingo PaaS
  • OS: Linux app-sirenergies-one-off-8185 5.4.0-121-generic #137-Ubuntu SMP Wed Jun 15 13:33:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Redis: 7.2.3 (Image 7.2.3-1)
  • PHP: 8.2.14
  • phpredis: 6.0.2
  • Cluster configuration:
    (screenshot of the cluster configuration attached)

Steps to reproduce, backtrace or example script

I have no idea how to reproduce this behaviour because it is not systematic. For example, when we send SMS messages to our customers at 8 am and have rebooted the servers one hour earlier at 7 am, we have no problems at all. On the other hand, when the servers were last rebooted at 8 pm the previous day, they crash at 8 am the next morning, even though the code and the volume are exactly the same.
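
That pattern (fine shortly after a reboot, failing after a long idle night) makes me wonder about idle connections dying silently. A sketch of one thing to try, assuming persistent connections (host and timeout are placeholders):

```php
<?php
// Sketch: enable TCP keepalive so long-idle persistent connections get
// probed by the kernel instead of dying silently overnight.
// Host and timeout are placeholders, not the real configuration.
$redis = new Redis();
$redis->pconnect('redis.internal', 6379, 2.5);
$redis->setOption(Redis::OPT_TCP_KEEPALIVE, 1);
```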

phpinfo on 5.3.7

(screenshot of the 5.3.7 phpinfo output attached)

phpinfo on 6.0.2

(screenshot of the 6.0.2 phpinfo output attached)

Differences between the two:
  • The Redis sentinel version has changed from 0.1 to 1.0
  • redis.session.early_refresh is new in 6.0.2
  • The default value of redis.session.lock_retries has changed from 10 to 100
  • The default value of redis.session.lock_wait_time has changed from 2000 to 20000 (see the php.ini sketch after this list)
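
If the changed session defaults turn out to matter, one experiment (a sketch; the values simply restore the 5.3.7 defaults listed above) would be to pin them in php.ini:

```ini
; Sketch: pin the phpredis session settings back to their 5.3.7 defaults
; to rule them out (values taken from the phpinfo comparison above).
redis.session.lock_retries = 10
redis.session.lock_wait_time = 2000
; early_refresh is new in 6.x; 0 is assumed to be "off".
redis.session.early_refresh = 0
```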

I've checked

  • There is no similar issue from other users
  • The issue isn't fixed in the develop branch
michael-grunder (Member) commented

The broken pipe (errno 32) error often happens when sending huge payloads, such as massive MULTI/EXEC or pipeline blocks, which hit kernel limits.
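
If large payloads do turn out to be involved, chunking the pipelined writes is a common workaround (a sketch; the host, data shape, and batch size of 500 are arbitrary placeholders):

```php
<?php
// Sketch: flush pipelined writes in small batches so that no single
// send() exceeds kernel socket-buffer limits.
$redis = new Redis();
$redis->connect('redis.internal', 6379); // placeholder host

$items = []; // placeholder: map of hash key => field array
foreach (array_chunk($items, 500, true) as $chunk) {
    $pipe = $redis->multi(Redis::PIPELINE);
    foreach ($chunk as $key => $fields) {
        $pipe->hMSet($key, $fields);
    }
    $pipe->exec();
}
```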

However, your situation appears slightly different. The trick is going to be reproducing the behavior.

Laravel Horizon is a long-running job queue, right? Maybe we can simulate similar activity with runners that execute the same Redis commands at the same general volume. Perhaps PhpRedis 6.0.2 has a bug where we keep trying to use a socket even after it has failed.
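
Something like this soak-test runner might do it, assuming the commands and volume roughly match one of your jobs (host, key names, payload size, and rate are all placeholders):

```php
<?php
// Sketch of a soak-test runner: issue a steady stream of HMSET traffic
// for hours, the way a Horizon worker would, and log any RedisException
// so we can see when the socket first goes bad.
$redis = new Redis();
$redis->connect('redis.internal', 6379); // placeholder host

for ($i = 0; ; $i++) {
    try {
        $redis->hMSet("soak:$i", ['n' => $i, 'payload' => str_repeat('x', 1024)]);
        if ($i % 1000 === 0) {
            echo date('c'), " iteration $i OK\n";
        }
    } catch (RedisException $e) {
        echo date('c'), " iteration $i FAILED: {$e->getMessage()}\n";
        break;
    }
    usleep(10000); // ~100 writes/second
}
```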

Another option would be to run one of your jobs under something like rr. If you can replicate the problem that way, it would almost certainly pinpoint exactly what's going wrong. The debugging would need to happen on your side, though, because rr would record everything, including all of the payloads to and from Redis.

potsky commented Jan 22, 2024

Hi @michael-grunder,

Yep, Laravel Horizon is a long-running job queue tool, like Sidekiq for example. A dead socket that keeps being reused is an interesting idea, but how do we test it? Complicated, I think.
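
One crude idea (a sketch, and I'm not sure it exercises the same code path as a real network failure) would be to kill our own connection server-side and watch how phpredis 6.0.2 reacts:

```php
<?php
// Sketch: kill our own connection server-side (closer to a real network
// drop than calling close() ourselves) and watch whether subsequent
// commands reconnect or keep failing. Host is a placeholder.
$victim = new Redis();
$victim->connect('redis.internal', 6379);
$admin = new Redis();
$admin->connect('redis.internal', 6379);

$id = $victim->rawCommand('client', 'id');        // our connection's ID
$admin->rawCommand('client', 'kill', 'id', (string) $id);

for ($i = 1; $i <= 3; $i++) {
    try {
        var_dump($victim->ping());                // went away? reconnected?
    } catch (RedisException $e) {
        echo "attempt $i: {$e->getMessage()}\n";
    }
}
```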

Using rr is a very good idea. We need to check this with the Scalingo team:

  • what kind of VM they use, because it seems to me that rr does not work on all VM hosts
  • how to retrieve debug logs, because we run on ephemeral servers and the local disk is not accessible afterwards

To be continued...
