
3.2.0 Error while reading from socket: ('Connection closed by server.',) #1140

Closed
LucyWengCSS opened this issue Feb 26, 2019 · 55 comments

@LucyWengCSS

Version:
Python: 3.6.7
Redis: 3.2.7 (Azure Redis)
Redis-py: 3.2.0
Django: 2.1.1

Description:
Hi Experts,

Our service hit an issue similar to #1127 (3.1.0 causing intermittent "connection closed by server" errors). After reviewing the whole discussion in #1127, we upgraded redis-py to 3.2.0; the issue has been mitigated but still happens. Azure Redis closes connections that have been idle for more than 10 minutes, while redis-py's default behavior is to keep connections open and recycle them when possible. Could you please suggest how to avoid the exception "Redis ConnectionError: Error while reading from socket: ('Connection closed by server.',)" in our product?

Configuration settings:
{'default': {'BACKEND': 'django_redis.cache.RedisCache',
'LOCATION': 'redis://xx.x.x.xxx:6379/0',
'TIMEOUT': 60,
'OPTIONS': {'DB': 0,
'SOCKET_TIMEOUT': 120,
'SOCKET_CONNECT_TIMEOUT': 30,
'COMPRESSOR': 'django_redis.compressors.zlib.ZlibCompressor',
'IGNORE_EXCEPTIONS': True,
'REDIS_CLIENT_KWARGS': {'socket_keepalive': True},
'PASSWORD': 'xxxxxxxxxxxxxxx='}},
'cachalot': {'BACKEND': 'django_redis.cache.RedisCache',
'LOCATION': 'redis://xx.x.x.xxx:6379/1',
'TIMEOUT': 60,
'OPTIONS': {'DB': 1,
'SOCKET_TIMEOUT': 120,
'SOCKET_CONNECT_TIMEOUT': 30,
'COMPRESSOR': 'django_redis.compressors.zlib.ZlibCompressor',
'IGNORE_EXCEPTIONS': True,
'REDIS_CLIENT_KWARGS': {'socket_keepalive': True},
'PASSWORD': 'xxxxxxxxxxxxxxxxxxxxxx='}},
}

Error Message:
Redis ConnectionError: Error while reading from socket: ('Connection
closed by server.',)

Thanks a lot.

@andymccurdy
Contributor

Hi @LucyWengCSS, thanks for the report. I'd be interested to know what selector implementation redis-py chose within your environment. The selectors are responsible for determining the health of a connection. redis-py attempts to choose a selector implementation that's most performant based on what's available in your environment.

It looks like you're running this in a web context. Are you running gunicorn or uwsgi? Do you know what worker type you're using? If you're using eventlet, there's a known issue #1136 that seems to be a problem with the eventlet implementation of select.

If you're not sure what worker type you're using or you want to dive deeper, we'd need to figure out what selector type redis-py has chosen for your environment. Running the following one-liner within your web process should tell us:

# assumes you have a redis client instantiated as `r`
>>> r.connection_pool.get_connection('_')._selector
<redis.selector.PollSelector at 0x10f2ca1d0>

The classname above (in my case, PollSelector) is what's important. Could you let me know what selector is used on your system? Please make sure to run the one-liner above in the same context as your webserver.

@jagguli

jagguli commented Mar 7, 2019

In [88]: r.connection_pool.get_connection('_')._selector
    ...:
Out[88]: <redis.selector.PollSelector at 0x7fc6f1d73390>

we are using uwsgi==2.0.18 with gevent==1.4.0

This is happening in our celery workers as well, which are using eventlet. I'll try switching to gevent to see if it's related.

@LucyWengCSS
Author

Hi Andy,

Thanks for working on the issue.

May I ask whether there are any tests or information we can provide for this issue at present? Thanks again.

@alexandre-paroissien

alexandre-paroissien commented Mar 14, 2019

I encountered the following two issues, celery/kombu#1018 and celery/kombu#1019, which brought me here:

I am using:
Ubuntu 18 (Heroku-18)
Python 3.6.8 / Django 2.1.7
Celery 4.2.1
Gunicorn 19.9.0
Redis 3.2.12 (Redis To Go)
You can see more details on celery/kombu#1019
(I am not using eventlet)

I have similar exceptions happening (Connection timeout and Broken pipe) when switching from kombu 4.3.0 and redis 2.10.6 to kombu 4.4.0 and redis 3.2.0 (The environment and the other libraries remaining unchanged)

On the new redis-py 3.2.0 version here is what I get:
import os
import redis
r = redis.from_url(os.environ.get("REDISTOGO_URL"))
r.connection_pool.get_connection('_')._selector

<redis.selector.PollSelector object at 0x7f18fce04d30>

On the previous version 2.10.6 I get AttributeError: 'Connection' object has no attribute '_selector'

@thedrow

thedrow commented Mar 18, 2019

@andymccurdy This issue is currently blocking Celery 4.3 from hitting GA.
Is there anything we can do to help?

@andymccurdy
Contributor

@thedrow What would really help is creating a way to easily reproduce this issue :)

@thedrow

thedrow commented Mar 19, 2019

I don't have an environment where this happens.
According to the original issue we should configure redis in the following fashion:

# Close the connection after a client is idle for N seconds (0 to disable)
timeout 5

We should connect to Redis, sleep for more than 5 seconds and attempt to set some key.

If this is an issue with Redis disconnecting idle clients, it should be exposed that way.

@jagguli

jagguli commented Mar 19, 2019

BTW, this issue seems to be resolved for us by switching to gevent workers on Celery.

@thedrow

thedrow commented Mar 20, 2019

We support eventlet and it was reproduced without it.

@3ddi

3ddi commented Mar 24, 2019

I am experiencing this too, using gevent + direct use of redis-py 3.2.1. This is a stripped-down version of the logic I'm running:

pubsub = redis_client.pubsub()
pubsub.subscribe(**{KEY: new_task_callback})
while True:
    for message in pubsub.listen():
        ...

The channel rarely gets triggered and days can pass before the callback should be called. It worked fine with redis-py 2, but now, exactly every hour, I get the exception:

  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 398, in read_response
    raise socket.error(SERVER_CLOSED_CONNECTION_ERROR)
OSError: Connection closed by server.

@harrybiddle

harrybiddle commented Mar 25, 2019

We are also seeing "Error 110 while writing to socket. Connection timed out." when trying to dispatch our Celery tasks. We are not using eventlet. We downgraded to Redis 2.10.6 / Kombu 4.3.0 / Celery 4.2 and our problems went away...

@DecisionSystems

I'm having the same issue connecting with Python from Visual Studio Code on Windows 10 to Redis running in a Docker container.

The code:

import redis

try:
    conn = redis.StrictRedis(
        host='127.0.0.1',
        port=6379)
    print(conn)
    conn.ping()
    print('Connected!')
except Exception as ex:
    print('Error:', ex)
    exit('Failed to connect, terminating.')

The error:
Redis<ConnectionPool<Connection<host=127.0.0.1,port=6379,db=0>>>
Error: Error while reading from socket: ('Connection closed by server.',)
Failed to connect, terminating.

@alexandre-paroissien

Hi @3ddi and @harrybiddle I still have the issue, how about you? Any updates on your side?

@harrybiddle

Hey @alexandre-paroissien, I'm sorry, I gave up and downgraded to Redis 2.10.6 / Kombu 4.3.0 / a forked Celery 4.2 with Python 3.7 support...!

@3ddi

3ddi commented May 7, 2019

Hi @alexandre-paroissien, I caught the exception and reconnected. Not very elegant, but it works for me until a proper fix is released.

@alexandre-paroissien

Ok I confirm I still encounter this issue in the most recent versions of the libraries
celery 4.3.0, kombu 4.5.0, redis 3.2.1

I tested in a test app with no traffic apart from me. I launched a simple task manually; the first time it worked, the second time it gave the following output (and eventually ended up working):


2019-05-21T07:51:02.022118+00:00 app[worker.1]: [2019-05-21 07:51:02,021: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (0/20) now.
2019-05-21T07:51:02.024750+00:00 app[worker.1]: [2019-05-21 07:51:02,024: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (1/20) in 1.00 second.
2019-05-21T07:51:03.028332+00:00 app[worker.1]: [2019-05-21 07:51:03,028: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (2/20) in 1.00 second.
2019-05-21T07:51:04.032513+00:00 app[worker.1]: [2019-05-21 07:51:04,032: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (3/20) in 1.00 second.
2019-05-21T07:51:05.037741+00:00 app[worker.1]: [2019-05-21 07:51:05,037: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (4/20) in 1.00 second.
2019-05-21T07:51:06.041513+00:00 app[worker.1]: [2019-05-21 07:51:06,041: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (5/20) in 1.00 second.
2019-05-21T07:51:07.045367+00:00 app[worker.1]: [2019-05-21 07:51:07,045: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (6/20) in 1.00 second.
2019-05-21T07:51:08.048339+00:00 app[worker.1]: [2019-05-21 07:51:08,048: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (7/20) in 1.00 second.
2019-05-21T07:51:09.052390+00:00 app[worker.1]: [2019-05-21 07:51:09,052: ERROR/ForkPoolWorker-5] Connection to Redis lost: Retry (8/20) in 1.00 second.

@andymccurdy
Contributor

@alexandre-paroissien Hey, this is great. Do you happen to have the code for your test app published somewhere? If not, could you publish it along with whatever other requirements you have installed (eventlet/gevent/etc.)?

@andymccurdy
Contributor

@alexandre-paroissien I created a simple Celery app to hopefully track down what's going on. You can view it here: https://github.com/andymccurdy/celery-test

I'm installing it within a virtualenv with only the dependencies listed in requirements.txt.

Thus far I haven't seen any "Connection to Redis lost" type messages in the Celery logs. I adjusted my Redis server's timeout to 1 second in hopes of seeing connections break but everything seemed to work just fine.

Can you help figure out what's different in your test environment?

@alexandre-paroissien

I'm not using eventlet nor gevent

Ubuntu 18 (Heroku-18)
Python 3.6 / Django 2.2 / Django Rest framework
Celery 4.2.1 or 4.3 and Django Celery Beat
Gunicorn 19.9.0
Redis 3.2.12 (Redis To Go)

I'll try to reproduce the issue in a test app this weekend

@JustinhoCHN

same problem here, redis ver 3.2.1

import time

import redis

# `args` comes from argparse elsewhere in the script
def connect_to_redis():
    pool = redis.ConnectionPool(max_connections=100, host=args.ip, port=args.port, db=args.db, password=args.pwd)
    r = redis.Redis(connection_pool=pool)
    return r

redis_client = connect_to_redis()
sub_client = redis_client.pubsub(ignore_subscribe_messages=True)
sub_client.subscribe(args.channel)
while True:
    res = sub_client.get_message()
    if res:
        pass  # do something with res
    time.sleep(0.001)

After 20 hours, this error was raised:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 398, in read_response
    raise socket.error(SERVER_CLOSED_CONNECTION_ERROR)
OSError: Connection closed by server.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/server/faiss_add.py", line 75, in <module>
    add_message = sub_client.get_message()
  File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 3135, in get_message
    response = self.parse_response(block=False, timeout=timeout)
  File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 3036, in parse_response
    return self._execute(connection, connection.read_response)
  File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 3013, in _execute
    return command(*args)
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 637, in read_response
    response = self._parser.read_response()
  File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 409, in read_response
    (e.args,))
redis.exceptions.ConnectionError: Error while reading from socket: ('Connection closed by server.',)

@andymccurdy
Contributor

@3ddi @JustinhoCHN

Both of your errors look like the TCP connection between the client and server was disconnected. This can happen for a variety of reasons outside the control of redis-py or the Redis server. Enabling TCP keepalive may help. You could also catch the error within your Python code and reconnect to the server.

If you're only seeing these errors after upgrading to redis-py 3.1 or later: there was a bug in redis-py 2.x and 3.0.x that attempted to auto-reconnect when a ConnectionError was encountered. This hid these network errors from users and could occasionally lead to data loss (missed pubsub messages, etc.).

@andymccurdy
Contributor

All:

I've put together a patch that uses nonblocking sockets to test the health of connections. This patch completely removes the usage of selectors. I'm hoping this works better with gevent, eventlet and other async selectors.

I'd appreciate any help in testing this patch in different environments. The patch is in the "nonblocking" branch here: https://github.com/andymccurdy/redis-py/tree/nonblocking

@NullYing

I still have the issue

celery: 4.2.1
redis: 3.2.1

ConnectionError: Error 104 while writing to socket. Connection reset by peer.
  File "kombu/connection.py", line 431, in _reraise_as_library_errors
    yield
  File "celery/app/base.py", line 755, in send_task
    self.backend.on_task_call(P, task_id)
  File "celery/backends/redis.py", line 294, in on_task_call
    self.result_consumer.consume_from(task_id)
  File "celery/backends/redis.py", line 136, in consume_from
    self._consume_from(task_id)
  File "celery/backends/redis.py", line 142, in _consume_from
    self._pubsub.subscribe(key)
  File "redis/client.py", line 3096, in subscribe
    ret_val = self.execute_command('SUBSCRIBE', *iterkeys(new_channels))
  File "redis/client.py", line 3009, in execute_command
    self._execute(connection, connection.send_command, *args)
  File "redis/client.py", line 3013, in _execute
    return command(*args)
  File "redis/connection.py", line 620, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "redis/connection.py", line 613, in send_packed_command
    (errno, errmsg))
ConnectionResetError: [Errno 104] Connection reset by peer
  File "redis/connection.py", line 600, in send_packed_command
    self._sock.sendall(item)

@harmant

harmant commented Jun 16, 2019

We have the same issue with:

redis-py: 3.2.1
redis: 5.0.3
retry_on_timeout: True

redis-py 3.0.1 works without errors.

Stacktrace:

File "redis/client.py", line 1264, in get
    return self.execute_command('GET', name)
  File "redis/client.py", line 775, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "redis/client.py", line 789, in parse_response
    response = connection.read_response()
  File "redis/connection.py", line 637, in read_response
    response = self._parser.read_response()
  File "redis/connection.py", line 409, in read_response
    (e.args,))

These errors are intermittent; we can't reproduce them in a test case.

Additional data from stack trace (Sentry):

'_buffer_cutoff': 6000
'_parser': <redis.connection.HiredisParser object at 0x7f167c418790>
'_selector': <redis.selector.PollSelector object at 0x7f167c425310>
'_sock': None
'encoder': <redis.connection.Encoder object at 0x7f167c418810>
'retry_on_timeout': True
'socket_connect_timeout': None
'socket_keepalive': None
'socket_keepalive_options': 
'socket_timeout': None
'socket_type': 0

@jmc-rival

Adding another voice to the mix here: we upgraded to redis-py 3.2.1 yesterday and ran into this issue, with lots of ConnectionErrors showing up in our logs. We need the ZPOP functionality added in 3.x, so we downgraded to 3.0.1 and are no longer seeing the issue. I think the change mentioned above in 3.1 is what broke this.

FWIW, we aren't using pubsub at all, we were experiencing the error on normal redis commands.

We are running in AWS Lambda against ElastiCache redis (through GhostTunnel) using SSL.

@andymccurdy
Contributor

@marcomezzaro This is great info, thanks. It furthers my suspicion that these errors are the result of network services dropping connections, such as when they are idle for some period.

Do you happen to have a docker-compose.yml file for the celery-test/haproxy/redis-server setup? If you do, could you post it? I'd like to experiment a bit more.

@marcomezzaro

Hi,
@andymccurdy I've forked your repo and I've added docker-compose file.
https://github.com/marcomezzaro/celery-test/tree/docker-compose

Run docker-compose up and just wait 30 seconds; you will see the error stacktrace.

If you change the broker_url/backend in "celeryconfig.py" from haproxy to redis, you will see no errors.

Let me know if you have any idea.

@andymccurdy
Contributor

andymccurdy commented Jul 24, 2019

@marcomezzaro Thanks! This is very helpful. I can finally reproduce the issue.

I'm working on a fix for this here: https://github.com/andymccurdy/redis-py/tree/ping-health-checks

The good news is that I believe I have this fixed for workloads that don't need pubsub. Extending this concept to pubsub requires a little more code, but I think I'm close and should have something tomorrow.

The bad news is that celery's implementation bypasses a lot of the pubsub flow. They've created their own socket poller that looks for activity on the socket rather than asking the redis-py API if a message is available. This means that even once the pubsub health check works, celery won't be regularly invoking it.

Once our implementation is in place perhaps we can get a patch into celery to take advantage of the health check.

@thedrow

thedrow commented Jul 24, 2019

Wonderful!
In Celery 5 we unfortunately won't be using redis-py anymore since it's a blocking client.
It's still very useful to everyone who's using Celery 4 and everyone else who is using redis-py.

@okomarov

@thedrow Could you elaborate on "...it's a blocking client."? The default ConnectionPool is non-blocking.

@thedrow

thedrow commented Jul 24, 2019

It does not use trio or asyncio, which means we'll have to do something to add it ourselves or switch to a different client.

@andymccurdy
Contributor

I just finished the code and tests for redis-py health checks. My intent is to merge this over the weekend or early next week. A new redis-py release will be made at that time. In the meantime you can find the branch here: https://github.com/andymccurdy/redis-py/tree/ping-health-checks

This patch introduces a new option: health_check_interval. By default, health_check_interval=0, which disables health checks. To enable health checks, set health_check_interval to a positive integer indicating the number of seconds that a connection can be idle before a health check is performed. For example, health_check_interval=30 will ensure that a health check is run on any connection that has been idle for 30 or more seconds, just before a command is executed on that connection.

I recommend setting this option to a value less than the idle connection timeout value in the target system. For example, if you know that idle TCP connections are killed after 30 seconds in your environment then set the health_check_interval to 20-25 seconds.

This option also works on any PubSub connection that is created from a client with health_check_interval enabled. PubSub users just need to ensure that get_message() or listen() are called more frequently than health_check_interval seconds. I assume most workloads are already doing this.

Some advanced PubSub use cases don't regularly call get_message() or listen(). In these cases, the user must call pubsub.check_health() explicitly.

For Celery users, this change won't automatically fix ConnectionErrors encountered by Celery. Celery uses PubSub in a non-standard way which can not take advantage of the automatic health checks at this time. Once this code is released, we should be able to create a PR for Celery to regularly call pubsub.check_health().

If anyone has time to help test this in their own systems I would greatly appreciate it.

CC @thedrow

@andymccurdy
Contributor

Version 3.3.0 has been released and is available on PyPI. The health_check_interval option is included.

@mlissner

mlissner commented Aug 5, 2019

Any reason this issue is still open? The issue on Celery where I think this will be discussed is: celery/kombu#1019.

(I'm not savvy enough to do the fix, but I can at least help connect dots.)

@andymccurdy
Contributor

I've kept this open in case anyone wanted to report back success/failure trying out the 3.3.x health checks.

@dtran320

dtran320 commented Aug 8, 2019

It looks like _selector was removed in a later commit, in case anyone else tried to run

r.connection_pool.get_connection('_')._selector

and was confused why it wasn't working in 3.3.6.

@shimk52

shimk52 commented Jun 30, 2020

I've kept this open in case anyone wanted to report back success/failure trying out the 3.3.x health checks.

I'm getting the following with version 3.5.3 and kombu == 4.2.1:

2020-06-27 11:54:37,625 p8-t140655154120448 ERROR : Error 110 connecting to Redis-EU2.redis.cache.windows.net:6379. Connection timed out.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 559, in connect
    sock = self._connect()
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 851, in _connect
    sock = super(SSLConnection, self)._connect()
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 615, in _connect
    raise err
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 603, in _connect
    sock.connect(socket_address)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 1801, in set
    return self.execute_command('SET', *pieces)
  File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 898, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 1192, in get_connection
    connection.connect()
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 563, in connect
    raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 110 connecting to Redis-EU2.redis.cache.windows.net:6379. Connection timed out.

Downgrading to 3.3.0 gave the following error:

2020-06-30 15:02:26,351 p6-t140649877714688 ERROR: The operation did not complete (read) (_ssl.c:2309)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 1519, in set
    return self.execute_command('SET', *pieces)
  File "/usr/local/lib/python3.6/site-packages/redis/client.py", line 836, in execute_command
    conn = self.connection or pool.get_connection(command_name, **options)
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 1049, in get_connection
    if connection.can_read():
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 666, in can_read
    return self._parser.can_read(timeout)
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 280, in can_read
    return self._buffer and self._buffer.can_read(timeout)
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 190, in can_read
    raise_on_timeout=False)
  File "/usr/local/lib/python3.6/site-packages/redis/connection.py", line 159, in _read_from_socket
    data = recv(self._sock, socket_read_size)
  File "/usr/local/lib/python3.6/site-packages/redis/_compat.py", line 58, in recv
    return sock.recv(*args, **kwargs)
  File "/usr/local/lib/python3.6/ssl.py", line 997, in recv
    return self.read(buflen)
  File "/usr/local/lib/python3.6/ssl.py", line 874, in read
    return self._sslobj.read(len, buffer)
  File "/usr/local/lib/python3.6/ssl.py", line 633, in read
    v = self._sslobj.read(len)
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:2309)

I'm trying to avoid downgrading to 2.10.6 as I lose functionality like using this as a context manager.
I will probably add reconnect logic to avoid this issue.

@andymccurdy
Contributor

@shimk52 If you look at the traceback for both exceptions when running 3.5.3 you'll see that the client timed out attempting to connect to the server. This seems more like an issue with your client machine's connectivity to the server or the server itself.

@shimk52

shimk52 commented Jul 2, 2020

@andymccurdy thank you for your reply!
I don't think this is related to the Redis server itself, as it happens in different environments, each using a different Redis instance.

Suppose the problem is with my client, which only does get and set against Redis, after of course initializing a Redis instance with only host and password params.
How would one add reconnect logic?
From what I read in the docs, redis-py uses a connection pool and a connection is checked out per request, meaning that calling ping() and then re-initializing Redis is redundant, as far as I understand.
If there is a known issue or something I can contribute to, please let me know.

@andymccurdy
Contributor

@shimk52 Have you tried the health_check_interval option? Try setting health_check_interval=N, where N is the maximum number of seconds that a connection can remain idle without checking its own health. A health check includes a round-trip PING/PONG. If that check fails, redis-py attempts to reestablish the connection exactly once; if the health check still fails, an error is raised. Otherwise things proceed as expected.

@shimk52

shimk52 commented Jul 13, 2020

@andymccurdy Thank you for helping.
I have finally found the issue: it was a bad port when connecting to Azure Redis (AWS works fine with the default port).
All looks good now, using 3.5.3 with kombu == 4.2.1.

@andymccurdy
Contributor

Great, closing this as it has gone through several iterations of various issues. If anyone is still having issues that have to do with any part of this thread, please open a new issue. Thanks!

@mohit-chawla

mohit-chawla commented Aug 12, 2020

@marcomezzaro I have a setup of the form: [celery worker, redis-py-client] --> [haproxy] --> [redis-master].
And I am sporadically facing this error:

 File "/home/mohit/.local/lib/python3.7/site-packages/celery/result.py", line 387, in __del__
    self.backend.remove_pending_result(self)
  File "/home/mohit/.local/lib/python3.7/site-packages/celery/backends/asynchronous.py", line 175, in remove_pending_result
    self.on_result_fulfilled(result)
  File "/home/mohit/.local/lib/python3.7/site-packages/celery/backends/asynchronous.py", line 183, in on_result_fulfilled
    self.result_consumer.cancel_for(result.id)
  File "/home/mohit/.local/lib/python3.7/site-packages/celery/backends/redis.py", line 148, in cancel_for
    self._pubsub.unsubscribe(key)
  File "/home/mohit/.local/lib/python3.7/site-packages/redis/client.py", line 3280, in unsubscribe
    return self.execute_command('UNSUBSCRIBE', *args)
  File "/home/mohit/.local/lib/python3.7/site-packages/redis/client.py", line 3155, in execute_command
    self._execute(connection, connection.send_command, *args, **kwargs)
  File "/home/mohit/.local/lib/python3.7/site-packages/redis/client.py", line 3159, in _execute
    return command(*args, **kwargs)
  File "/home/mohit/.local/lib/python3.7/site-packages/redis/connection.py", line 687, in send_command
    check_health=kwargs.get('check_health', True))
  File "/home/mohit/.local/lib/python3.7/site-packages/redis/connection.py", line 679, in send_packed_command
    (errno, errmsg))
redis.exceptions.ConnectionError: Error 32 while writing to socket. Broken pipe.

Any suggestions will be appreciated. cc : @andymccurdy

@Shivakumar2602

Hello Everyone,

I am getting the error below while performing an insert operation against Azure Cache for Redis.

File "/home/fmlstream/lsh/lshmodelpipeline/pipelines/lsh_pipeline.py", line 438, in _save_lsh\n lsh_name.insert(name_dict, batch)\n File "/home/fmlstream/lsh/lshmodelpipeline/lsh/lsh_insert.py", line 27, in insert\n logging.warn('{}: {}'.format(str(e), key))\n File "/home/fmlstream/lsh/lshmodelpipeline/datasketch/lsh.py", line 317, in exit\n self.close()\n File "/home/fmlstream/lsh/lshmodelpipeline/datasketch/lsh.py", line 320, in close\n self.lsh.keys.empty_buffer()\n File "/home/fmlstream/lsh/lshmodelpipeline/datasketch/storage.py", line 1010, in empty_buffer\n self._buffer.execute()\n File "/home/fmlstream/lshmodelvenv/lib/python3.6/site-packages/redis/client.py", line 3437, in execute\n self.shard_hint)\n File "/home/fmlstream/lshmodelvenv/lib/python3.6/site-packages/rediscluster/connection.py", line 196, in get_connection\n raise RedisClusterException("Only 'pubsub' commands can be used by get_connection()")\nrediscluster.exceptions.RedisClusterException: Only 'pubsub' commands can be used by get_connection()

any help will be much appreciated. @andymccurdy

thanks,
Shiva

@paramite

Noting that with redis-py 3.5.3, health_check_interval=30, and the haproxy timeout set to 60m, I still see the issue (see the following traceback) every hour in the log files:

2022-01-12 16:16:03,305 [42] ERROR gnocchi.cli.metricd: Error while listening for new measures notification, retrying
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/gnocchi/cli/metricd.py", line 186, in _fill_sacks_to_process
    for sack in self.incoming.iter_on_sacks_to_process():
  File "/usr/lib/python3.6/site-packages/gnocchi/incoming/redis.py", line 199, in iter_on_sacks_to_process
    for message in p.listen():
  File "/usr/lib/python3.6/site-packages/redis/client.py", line 3605, in listen
    response = self.handle_message(self.parse_response(block=True))
  File "/usr/lib/python3.6/site-packages/redis/client.py", line 3505, in parse_response
    response = self._execute(conn, conn.read_response)
  File "/usr/lib/python3.6/site-packages/redis/client.py", line 3479, in _execute
    return command(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/redis/connection.py", line 739, in read_response
    response = self._parser.read_response()
  File "/usr/lib/python3.6/site-packages/redis/connection.py", line 324, in read_response
    raw = self._buffer.readline()
  File "/usr/lib/python3.6/site-packages/redis/connection.py", line 256, in readline
    self._read_from_socket()
  File "/usr/lib/python3.6/site-packages/redis/connection.py", line 201, in _read_from_socket
    raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
redis.exceptions.ConnectionError: Connection closed by server.

openstack-mirroring pushed a commit to openstack-archive/puppet-tripleo that referenced this issue Feb 14, 2022
When using py-redis for connecting to Redis via HAProxy the connection
is being closed even when alive by HAProxy. Unfortunately this is a know
issue on py-redis side (see [1]). This patch increases connection timeouts
to not pollute (for example) Gnocchi [2] logs with reconnect tracebacks every
2 minutes.

[1] redis/redis-py#1140
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1924373

Change-Id: Ie7ee7c90107cfe5bff08f5c778a6273ae9ffcc76
openstack-mirroring pushed a commit to openstack/openstack that referenced this issue Feb 14, 2022
openstack-mirroring pushed a commit to openstack-archive/puppet-tripleo that referenced this issue Feb 17, 2022 (cherry picked from commit 209e954)
openstack-mirroring pushed a commit to openstack-archive/puppet-tripleo that referenced this issue Mar 7, 2022 (cherry picked from commit 209e954)
openstack-mirroring pushed a commit to openstack-archive/puppet-tripleo that referenced this issue May 4, 2022 (cherry picked from commit 209e954)