
Redis Connections on constant increase #754

Closed
cchatham opened this issue Sep 21, 2015 · 18 comments

@cchatham
Contributor

Hey guys!

We are seeing a constant increase in connections to Redis that eventually causes our app servers to fail (in AWS). I wanted to see if anyone else had seen this issue.

We are looking into a fix but any help would be appreciated. Thanks!

@brianhyder
Member

Thanks for reporting. It is possible to create your own cache connections; however, the way it is set up, the connection should be shared: https://github.com/pencilblue/pencilblue/blob/0.5.0/include/dao/cache.js#L47

I would look to see where the "getInstance" and "createInstance" functions are being called. The code that uses them should be isolated to the instances of cache_entity_service, session storage, and the server registry. That all depends on your configuration, but I think for you guys that is probably close to accurate.
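
For reference, the pattern there is a get-or-create singleton around a single client, roughly this shape (a simplified sketch, not the actual cache.js source):

```js
// Simplified sketch of the shared-client pattern (not the actual cache.js source).
// Repeated calls to getInstance() should reuse the same connection; only code that
// calls createInstance() directly ends up opening additional connections.
var redis = require('redis');

var CLIENT = null;

function createInstance(config) {
    // Always opens a brand new connection
    return redis.createClient(config.port, config.host, config.options);
}

function getInstance(config) {
    // Lazily create the client once, then hand back the cached instance
    if (CLIENT === null) {
        CLIENT = createInstance(config);
    }
    return CLIENT;
}

module.exports = {
    getInstance: getInstance,
    createInstance: createInstance
};
```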

It also appears that the Redis driver has been updated to a stable "1.0.0" version. As a plan of attack, I would suggest:

- Check for obvious places where connections are being created instead of reused.
- Look for a configuration option on the driver to ensure that connections auto-reconnect (see the sketch below).
- Look for stack traces in the logs to see if connections hang around after workers die off.
- Update the driver and see if the behavior is the same.
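
For the auto-reconnect piece, these are the knobs and lifecycle events I would start with (a sketch; the option names are from the node_redis 1.x/2.x docs, so double-check them against the version you land on, and the host is a placeholder):

```js
// Sketch: make reconnect behavior explicit and log connection lifecycle events
// so stray connects/disconnects show up in the application logs.
var redis = require('redis');

var client = redis.createClient(6379, 'my-redis-host', {  // placeholder host
    connect_timeout: 15000,  // give up on a single connect attempt after 15s
    retry_max_delay: 5000    // cap the backoff between reconnect attempts
});

client.on('connect',      function ()     { console.log('redis: connected'); });
client.on('reconnecting', function (info) { console.log('redis: reconnecting', info); });
client.on('end',          function ()     { console.log('redis: connection closed'); });
client.on('error',        function (err)  { console.error('redis: error', err); });
```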

I can also help take a look at this tonight.

@brianhyder brianhyder self-assigned this Sep 21, 2015
@cchatham
Contributor Author

Hey Brian!

As far as we know, we only leverage Redis for sessions, and we call it by using pb.cache in our custom code.

We managed to get a list of all the clients connected to Redis at a moment in time. I removed the IPs for security reasons, but if you look at the idle seconds, it's very curious. I am also including our average connections over the past day to show the increase we are seeing.

Redis: [screenshot: Redis CLIENT LIST output]

Connections: [graph: average connection count over the past day]
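
For anyone else chasing this, the same client list can be pulled from Node and filtered by idle time, something like this sketch (the host is a placeholder):

```js
// Sketch: dump CLIENT LIST and flag connections that have been idle for a long time.
// Helps correlate leaked connections with app instances that no longer exist.
var redis = require('redis');

var client = redis.createClient(6379, 'my-redis-host');  // placeholder host

client.send_command('client', ['list'], function (err, reply) {
    if (err) { throw err; }

    reply.split('\n').forEach(function (line) {
        if (!line) { return; }

        // Each line looks like: "addr=10.0.0.1:53422 ... idle=3600 ... cmd=subscribe"
        var fields = {};
        line.trim().split(' ').forEach(function (pair) {
            var parts = pair.split('=');
            fields[parts[0]] = parts[1];
        });

        if (parseInt(fields.idle, 10) > 600) {
            console.log('idle > 10 min:', fields.addr, 'cmd=' + fields.cmd);
        }
    });

    client.quit();
});
```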

@brianhyder
Member

Thanks for the additional info. Can you confirm what command broker implementation you are using? I'd also like to get the time frame for the log snippet you sent. I wouldn't expect that many "subscribes" unless you have that many workers. The subscribe command is used to listen for commands and jobs from other members of the cluster.
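
For context on the subscribes: a Redis client that has issued SUBSCRIBE cannot run other commands, so each worker's broker needs its own dedicated connection alongside the cache/session client. A minimal sketch (the channel name and host are placeholders, not the broker's actual values):

```js
// Sketch: why each worker holds a long-lived "subscribe" connection.
// A subscribed client is locked into pub/sub mode, so the command broker
// uses a dedicated connection separate from the one used for cache/sessions.
var redis = require('redis');

var subscriber = redis.createClient(6379, 'my-redis-host');  // placeholder host
var publisher  = redis.createClient(6379, 'my-redis-host');

subscriber.on('message', function (channel, message) {
    console.log('command received on', channel, ':', message);
});

subscriber.on('subscribe', function (channel, count) {
    // Once subscribed, other members of the cluster can reach this worker
    publisher.publish(channel, JSON.stringify({ type: 'ping' }));
});

subscriber.subscribe('cluster-commands');  // placeholder channel name
```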

@cchatham
Contributor Author

We are definitely using the default RedisCommandBroker. We have 2 workers running according to our global home view. The time frame for the client list was around 11:30am this morning EST.

@brianhyder
Member

It is weird that it would be spawning so many new connections. Coincidentally enough, the node redis package was updated today; v2 was released. I'll update it tonight and play with it to see if I can get the connection count to rise. There are also a few options we can tweak to optimize the connection.

@cchatham
Contributor Author

So it looks like the connections survive when AWS Beanstalk spins down servers. I ran the client list command again and it returned a lot of IPs that no longer belong to any running server. So either quit is not getting called when it is supposed to, or Beanstalk isn't letting us clean up...

@brianhyder
Member

Interesting. There should be log statements (not sure what log level has to be active) that say XYZ shutting down. It could either be that the instance isn't signaling properly or that PB isn't catching signals appropriately. I'll try and double check that tonight. Now that we know it isn't the driver I'll hold off on the upgrade so we don't introduce another variable.
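
If it turns out PB isn't catching the signals, the shape of the fix would be roughly this sketch (host is a placeholder):

```js
// Sketch: close Redis connections when the instance is told to stop.
// Beanstalk sends SIGTERM when it spins an instance down; if the process never
// catches it, quit() is never called and the server-side connection lingers.
var redis = require('redis');

var client = redis.createClient(6379, 'my-redis-host');  // placeholder host

function shutdown(signal) {
    console.log('received ' + signal + ', closing redis connection');
    // quit() sends QUIT and closes the socket cleanly once pending replies finish
    client.quit(function () {
        process.exit(0);
    });
}

process.on('SIGTERM', function () { shutdown('SIGTERM'); });
process.on('SIGINT',  function () { shutdown('SIGINT');  });
```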

@brianhyder
Member

Thanks again for additional information. It is extremely helpful.

@cchatham
Contributor Author

Still not 100% sure it isn't the driver...

Let me know if you want more information. We are racking our brains over here. The code looks solid. We may need a timeout on the redis server side? Lots of possibilities!!!

@brianhyder
Member

OK, I took a look at the code tonight. There was an issue with the platform responding appropriately to process signals (#755). That has been resolved and merged into 0.5.0. Hopefully, at a minimum, it will eliminate one variable.

Are y'all using ElastiCache or true redis instances? I've seen instances where TCP connections are kept alive in an ELB but are actually dead. It just seems weird that the redis server is the one holding onto the connection. My thought is that the connection would be dropped unless a heartbeat was received.
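
On the server-side angle: Redis only drops idle clients on its own if the `timeout` config is non-zero, and `tcp-keepalive` governs dead-peer detection. On ElastiCache those are set through the parameter group (the CONFIG command is restricted there), but against a plain Redis you can check them with something like this sketch (host is a placeholder):

```js
// Sketch: check whether the server will ever drop idle clients on its own.
// "timeout" is the number of seconds an idle client is kept before being closed
// (0 = never); "tcp-keepalive" controls dead-peer detection.
var redis = require('redis');

var client = redis.createClient(6379, 'my-redis-host');  // placeholder host

client.send_command('config', ['get', 'timeout'], function (err, reply) {
    if (err) { throw err; }
    console.log(reply);  // e.g. [ 'timeout', '0' ] -> idle clients are never dropped

    client.send_command('config', ['get', 'tcp-keepalive'], function (err2, reply2) {
        if (err2) { throw err2; }
        console.log(reply2);
        client.quit();
    });
});
```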

@cchatham
Contributor Author

Yeah, we are using Redis via ElastiCache. We will update our fork and see if that helps. The next step we want to try is updating the driver, because we are running out of ideas and I don't want to put a timeout on our connections.

@brianhyder
Member

Sounds like a plan. The release notes for the latest version of the driver are pretty good. The maintainer outlines the breaking changes and other modifications to defaults. One of which is connection timeout iirc.

@btidwell
Contributor

FYI, I just merged the latest from 0.5.0 and noticed that all of the localizations specific to site management and global plugins are missing.

@brianhyder
Member

Yup, you are correct. I botched the merge. I'll get that corrected this evening. My apologies.

@brianhyder
Member

@btidwell Just to confirm, y'all only added en-US translations for multi-site correct? I have added the missing translations for English back into the 0.5.0 branch.

@brianhyder
Member

@cchatham I updated the driver on a local branch to v2.0.0. It appeared to work as desired. I will test a bit more tomorrow but it would most likely be safe to update the driver and test to see if that fixes the connection issue.

@cchatham
Contributor Author

cchatham commented Oct 5, 2015

@brianhyder We updated our driver and pulled in your bug fix. It seems to have helped the severity of the incline, but connections have still been steadily increasing over the past 2 weeks. We will continue to keep an eye on it.

[graph: AWS Redis connection count, Oct 5]

@cchatham
Contributor Author

Since we upgraded we haven't had any further issues. We could probably close out this issue.
