Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hiredis hangs on redisBufferRead #263

Closed
saravanan-elangovan opened this issue Aug 13, 2014 · 23 comments
Closed

hiredis hangs on redisBufferRead #263

saravanan-elangovan opened this issue Aug 13, 2014 · 23 comments

Comments

@saravanan-elangovan
Copy link

We use hiredis to connect to redis server in block mode. Our application runs in RHEL 4 32 bit. It works fine and after some time while trying to do an operation, it hangs. On analysis, found it got hanged in the redisBufferRead. This application act like a switch which gets the data from TCP Socket and store the same in the Redis database for recoverability.

On netstat output, We found the SendQ of the our client has non-zero value which indicates that request has not reached redis server. This behaviour happens with INCR & INCRBY mostly.

We opened a redis-client and did the same operation and it works fine.

gdb output
#0 0x008787a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x001de69b in __read_nocancel () from /lib/tls/libpthread.so.0
#2 0x08073a42 in redisBufferRead (c=0x89c2fb8) at hiredis.c:1142
#3 0x08073c59 in redisGetReply (c=0x89c2fb8, reply=0x590dcbc) at hiredis.c:1228
#4 0x08073e99 in redisvCommand (c=0x89c2fb8, format=0x8077f40 "INCRBY %s %ld", ap=0x590dcf8 "t\021\225\b\001") at hiredis.c:1332
#5 0x08073ed2 in redisCommand (c=0x89c2fb8, format=0x8077f40 "INCRBY %s %ld") at hiredis.c:1349

@pietern
Copy link
Contributor

pietern commented Aug 13, 2014

It is blocked on a system call. Looks like a kernel issue to me. Have you tried replicating the issue on machines with newer kernel versions?

@saravanan-elangovan
Copy link
Author

I believe the client has sent the message which has not reached or not handled by redis server. As the client is in the block mode, it waits indefintely for the response from the redis server. Even netstat output indicates SendQ of the tcp socket having 43 bytes ( in our case ).

This behaviour is observed only between hiredis & redis communication. We have couple of program which uses tcp & http and its works fine.

@michael-grunder
Copy link
Collaborator

@saravanan-elangovan Are you connecting with a timeout? This just looks like it's blocking on the socket which doesn't mean there is any actual issue with hiredis, per se. Now, it's possible that there is some weird edge case where hiredis fails to send the complete command, but we use hiredis to send billions of commands to Redis over several hours and/or days and don't see this behavior.

The only obvious difference is that we're running 64 bit hiredis.

@shoutrain
Copy link

I met the same issue in 2012 and 2013. And in these days I met them again. Maybe it's due to the old lib, but why this issue is still keeping open status.

enviroment:

  1. Linux tesla001 2.6.32-504.12.2.el6.x86_64 memory leak when dealing with integer response ? #1 SMP Wed Mar 11 22:03:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux.
  2. libhiredis.so.0.10 (yum updated from default centos 6 repo)

thread stacks:

[Switching to thread 3 (Thread 0x7fdbee57b700 (LWP 7795))]#0 0x000000348c40e75d in read () fro m /lib64/libpthread.so.0
(gdb) bt
#0 0x000000348c40e75d in read () from /lib64/libpthread.so.0
#1 0x00007fdbeeddd7bb in redisBufferRead () from /usr/lib64/libhiredis.so.0.10
#2 0x00007fdbeeddefcf in redisGetReply () from /usr/lib64/libhiredis.so.0.10
#3 0x00007fdbeeddf085 in redisvCommand () from /usr/lib64/libhiredis.so.0.10
#4 0x00007fdbeeddf118 in redisCommand () from /usr/lib64/libhiredis.so.0.10
#5 0x0000000000405843 in CRedisOperator::handleGeneratorInfo (this=0x7e5a30, ownerId=3, data=0 x7e59b8) at src/database/CRedisOperator.cpp:146
#6 0x00000000004038a2 in CTransaction::onGeneratorInfo (this=0x7e4d40, sequence=5112, data=0x7 e59b8) at src/transaction/CTransaction.cpp:161
#7 0x0000000000403367 in CTransaction::onMessage (this=0x7e4d40, msg=0x7e59a8) at src/transact ion/CTransaction.cpp:58
#8 0x000000000040c802 in CNodeGroup::working (this=0x7e5858) at src/traffic/CNodeGroup.cpp:70
#9 0x000000000040b5b2 in CWorker::run (object=0x7e5860) at src/common/CWorker.cpp:68
#10 0x000000348c4079d1 in start_thread () from /lib64/libpthread.so.0
#11 0x000000348c0e88fd in clone () from /lib64/libc.so.6

@badboy
Copy link
Contributor

badboy commented Mar 27, 2015

No one investigated this in more detail yet, but as per the comments above this might be an issue with the network or the Redis contacting and not a bug in hiredis itself.
Are you using timeouts on the connection? Blocking commands? What's the network status, what does netstat say about the Send-Q buffer?

@shoutrain
Copy link

  1. I used timeout on the connection, it's 1 second. But I think the timeout only works with connect, not for redisCommand().
  2. Yes, it's a blocking command.
  3. Actually the redis-server and the application using hiredis are in the same machine. netstat outputs related to redis-server below:

[rafael@tesla001 softwares]$ netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 localhost:38836 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38839 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38838 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38842 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38845 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38852 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38854 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38851 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38847 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38849 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38837 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38848 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38840 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38841 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38844 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38843 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38855 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38850 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38853 localhost:6379 TIME_WAIT

@badboy
Copy link
Contributor

badboy commented Mar 27, 2015

So all connections to the Redis server are already half-closed, which might indicate a completely different problem.

@shoutrain
Copy link

sorry bad boy, I run netstat again, it looks I deleted some established connections last time. actually there are active connections. paste them all:

[rafael@tesla001 softwares]$ netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 localhost:38836 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38839 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38838 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38842 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38845 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38852 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38854 localhost:6379 TIME_WAIT
tcp 0 0 120.24.169.81:10506 49.80.219.76:10174 ESTABLISHED
tcp 0 0 localhost:38851 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38847 localhost:6379 TIME_WAIT
tcp 0 0 localhost:6379 localhost:37590 ESTABLISHED
tcp 0 0 120.24.169.81:ssh 49.80.221.174:28720 ESTABLISHED
tcp 0 0 120.24.169.81:ssh 49.80.219.76:9326 ESTABLISHED
tcp 0 0 localhost:38849 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38837 localhost:6379 TIME_WAIT
tcp 0 0 localhost:35097 localhost:6379 ESTABLISHED
tcp 0 0 localhost:38848 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38840 localhost:6379 TIME_WAIT
tcp 0 0 localhost:6379 localhost:35097 ESTABLISHED
tcp 0 0 localhost:38841 localhost:6379 TIME_WAIT
tcp 0 0 localhost:37590 localhost:6379 ESTABLISHED
tcp 0 0 localhost:38844 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38843 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38855 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38850 localhost:6379 TIME_WAIT
tcp 0 0 localhost:38853 localhost:6379 TIME_WAIT
tcp 0 0 120.24.169.81:53500 42.156.166.25:http ESTABLISHED
tcp 0 0 localhost:38846 localhost:6379 TIME_WAIT
tcp 0 0 120.24.169.81:ssh 49.80.219.76:10273 ESTABLISHED

@echoma
Copy link

echoma commented Apr 26, 2015

your netstat output shows there is no none-zero sendq.

btw. you can use redisSetTimeout() to set read/write timeout value.

@shoutrain
Copy link

@echoma Thanks.

Actually I used redisSetTimeout() and I found the reason finally. I used multi-threads with the api, but it doesn't support.

@badboy
Copy link
Contributor

badboy commented May 4, 2015

So this can be closed?

@shoutrain
Copy link

@badboy Yes, I have solved the problem by move all api calls in the same thread. But @saravanan-elangovan opened this issue, I am not sure he met the same situation like me.

@abigbigbird
Copy link

shoutrain commented on 4 May 2015
@echoma Thanks.

Actually I used redisSetTimeout() and I found the reason finally. I used multi-threads with the api, but it doesn't support.

Question:hiredis doesn't support multi-threads? where i can see it? I used multi-threads also!

@badboy
Copy link
Contributor

badboy commented Jul 6, 2016

@abigbigbird Can see what?
I added a short note a while back, see the note under Connecting.
The context object has no synchronization of its internals, using it from multiple threads at once will most likely result in race conditions.

@abigbigbird
Copy link

Oh,I saw the note ! thanks

I use hiredis in my project,It works fine in most of the time,but there are several times ,it hangs。

## the gdb core stack as below:
#0 redisBufferRead (c=0x7f26c4013a10) at hiredis.c:800
#1 0x00007f27479e9cd0 in redisGetReply (c=0x7f26c4013a10, reply=0x7f26ea4e7ea0) at hiredis.c:903
#2 0x00007f27479ea02b in __redisBlockForReply (c=0x7f26c4013a10, argc=, argv=,
argvlen=) at hiredis.c:1010
#3 redisCommandArgv (c=0x7f26c4013a10, argc=, argv=, argvlen=)
at hiredis.c:1035
#4 0x00007f27479d31cc in bfd::codis::CodisClient::RedisCommand (this=0x200ceb0, command=..., tt=0) at src/CodisClient.cpp:995
#5 0x00007f27479d3662 in bfd::codis::CodisClient::RedisCommand (this=0x200ceb0, command=..., tt=0) at src/CodisClient.cpp:1035
#6 0x00007f27479c9f64 in bfd::codis::CodisClient::get (this=0x200ceb0, key=..., tt=0) at src/CodisClient.cpp:152
#7 0x00007f27479e5a18 in bfd::codis::BfdCodis::get (this=0x2130120, key=..., tt=0) at src/BfdCodis.cpp:60
#8 0x000000000050d89e in RedisClientProxy::get (this=0x20e34f0, key=...)
(gdb) p c
$1 = {err = 0, errstr = '\000' <repeats 127 times>, fd = 72, flags = 3, obuf = 0x7f26b40087c8 "", reader = 0x7f26c4013af0,
connection_type = REDIS_CONN_TCP, timeout = 0x0, tcp = {host = 0x7f26c4013d50 "192.168.
*", source_addr = 0x0, port = 19000},
unix_sock = {path = 0x0}}
(gdb) p *buf
Cannot access memory at address 0x7f26ea4e3d80

os:Red Hat Enterprise Linux Server release 6.4 (Santiago)
gcc version:4.6.3
hiredis version:0.13.3

@badboy
Copy link
Contributor

badboy commented Jul 6, 2016

@abigbigbird Is this the same as #442? If so, please re-open there. This issue here seems to be another problem.

@abigbigbird
Copy link

oh,sorry,i saw it core at the same code。my fault!

@naugaria7
Copy link

I am running into a similar backtrace on gdb. I am using hiredis for mainly get/set queries. It works fine most of the times but once in a while it hangs on RedisBufferRead().
I am using a connection pool which has Redis Contexts being created and given to threads as and when required. Could this be an issue?

@ryancheung
Copy link

ryancheung commented Sep 6, 2017

The same issue here:

(gdb) bt
#0  0x00007f84e681a51d in read () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007f84e5432f8e in redisBufferRead () from /usr/lib/x86_64-linux-gnu/libhiredis.so.0.13
#2  0x00007f84e5433280 in redisGetReply () from /usr/lib/x86_64-linux-gnu/libhiredis.so.0.13
#3  0x00007f84e54332b5 in ?? () from /usr/lib/x86_64-linux-gnu/libhiredis.so.0.13
#4  0x00007f84e5433634 in redisCommand () from /usr/lib/x86_64-linux-gnu/libhiredis.so.0.13
#5  0x00000000006bd86f in DisqueClient::AddJob (this=0x7f84dc0008c0, queueName="umbrella_api", job="device_connected:862670225013751", retry=5, ttl=2592000) at /root/device_manager/src/managerd/DisqueClient.cpp:17
#6  0x0000000000666c79 in WorldSocket::HandleAuth (this=0x7f84c0112f40, recvPacket=...) at /root/device_manager/src/managerd/WorldSocket.cpp:235

The server is not redis, but similar as redis: https://github.com/antirez/disque

EDIT: I use a single instance for redisContext. But I just read Note: A redisContext is not thread-safe. in README of hiredis repo. So this should be a multithreading issue.

@xinquanatcolezhu
Copy link

same issue here if i run it in a multi- threads

@mfmain
Copy link

mfmain commented Apr 19, 2018

sometimes we are even unaware of threading usage, like being in an callback routine, why dont we introduce a mutex inside redisCommand? I believe there are many products in the market shipped with this bug and the authors have not noticed

@michael-grunder
Copy link
Collaborator

Making hiredis thread safe would require a lot more than just adding a mutex to redisCommand. Besides, if we added thread safety to hiredis proper it would reduce flexibility.

The solution, I think, is to handle synchronization in each individual application in the best way for that application. There are also thread-safe wrappers to hiredis available but I have not used any of them personally.

@michael-grunder
Copy link
Collaborator

Closing this old issue, since it's not clear there is actually a bug being described.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests