Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

Page times out when dalli can't connect #293

Closed
BRMatt opened this Issue · 10 comments

5 participants

@BRMatt

We're hosting an app on heroku, and when our memcache provider (memcachier) goes down we often end up with lots of logs that looks like this:

Nov 21 18:41:01 app/web.1:  dev2.ec2.memcachier.com:11211 failed (count: 0)
Nov 21 18:41:01 app/web.1:  Dalli/SASL authenticating as mUA9
Nov 21 18:41:01 app/web.1:  Dalli/SASL: mUA9
Nov 21 18:41:02 app/web.1:  dev2.ec2.memcachier.com:11211 failed (count: 0)
Nov 21 18:41:02 app/web.1:  Dalli/SASL authenticating as mUA9
Nov 21 18:41:02 app/web.1:  Dalli/SASL: mUA9
Nov 21 18:41:02 app/web.1:  dev2.ec2.memcachier.com:11211 failed (count: 0)
Nov 21 18:41:03 app/web.1:  Dalli/SASL authenticating as mUA9
Nov 21 18:41:03 app/web.1:  Dalli/SASL: mUA9
Nov 21 18:41:03 app/web.1:  dev2.ec2.memcachier.com:11211 failed (count: 0)
Nov 21 18:41:03 app/web.1:  Dalli/SASL authenticating as mUA9
Nov 21 18:41:03 app/web.1:  Dalli/SASL: mUA9
Nov 21 18:41:04 app/web.1:  dev2.ec2.memcachier.com:11211 failed (count: 0)
Nov 21 18:41:04 app/web.1:  Dalli/SASL authenticating as mUA9
Nov 21 18:41:04 app/web.1:  Dalli/SASL: mUA9
Nov 21 18:41:04 app/web.1:  dev2.ec2.memcachier.com:11211 failed (count: 0)
Nov 21 18:41:04 app/web.1:  Dalli/SASL authenticating as mUA9
Nov 21 18:41:04 app/web.1:  Dalli/SASL: mUA9
....

Which go on for a good 20-30 odd seconds until the request times out. Every so often a block of no server available errors:

Nov 21 18:41:17 app/web.2:  DalliError: No server available
Nov 21 18:41:17 app/web.2:  DalliError: No server available
Nov 21 18:41:17 app/web.2:  DalliError: No server available
Nov 21 18:41:17 app/web.2:  DalliError: No server available
Nov 21 18:41:17 app/web.2:  DalliError: No server available
Nov 21 18:41:17 app/web.2:  DalliError: No server available
Nov 21 18:41:17 app/web.2:  DalliError: No server available
Nov 21 18:41:17 app/web.2:  DalliError: No server available
Nov 21 18:41:17 app/web.2:  DalliError: No server available

Which I'm guessing occur every time rails tries to use the gem?

One thing I noticed is that the failed count doesn't appear to be increasing, from having a quick look through the source is that the failed count should be increasing, but it rarely does so.

Nov 21 18:41:44 app/web.2:  dev2.ec2.memcachier.com:11211 failed (count: 0)
Nov 21 18:41:44 app/web.2:  dev2.ec2.memcachier.com:11211 failed (count: 1)
Nov 21 18:41:44 app/web.2:  dev2.ec2.memcachier.com:11211 is down

Has anyone seen this kind of behaviour before, and is there any extra data I can provide?

dalli 2.5.0
ruby 1.9.3p286 (2012-10-12 revision 37165) [x86_64-linux]
rails 3.2.9
thin 1.5.0
@tibbon

Did you ever figure this out? I'm having some not all that dissimilar problems.

2013-02-07T23:54:31+00:00 app[web.1]: mc10.ec2.memcachier.com:11211 failed (count: 0)
2013-02-07T23:54:31+00:00 app[web.1]: Dalli/SASL authenticating as b8dfd6
2013-02-07T23:54:31+00:00 app[web.1]: Dalli/SASL: b8dfd6
2013-02-07T23:54:31+00:00 app[web.1]: mc10.ec2.memcachier.com:11211 failed (count: 0)
2013-02-07T23:54:31+00:00 app[web.2]: Started GET "/" for 174.129.174.36 at 2013-02-07 23:54:31 +0000
@BRMatt

Nope, seems we also had a splurge of errors on the 7th, albeit from 00:15 through 00:42

@mperham
Owner

That looks totally normal - when the network dies, you will see lots of logging like that.

@mperham mperham closed this
@BRMatt

True, it's down to the network dying, but when this occurs rails blocks trying to connect to the cache, causing page timeouts. As far as I understand it dalli's timeout/retry count should prevent that from happening?

@mperham
Owner

The timeout defaults to 0.5 sec and one retry, I think. That means you'll see one second of latency so page timeouts shouldn't be caused by Dalli unless your web server settings are very aggressive.

@BRMatt

Our server's setup to timeout after 30 seconds, often when memcache is down the app stalls for longer than that.
Looking at the logs posted above it looks like dalli might be stuck in some sort of reconnect loop, as when this bug occurs the count rarely goes above 0.

@lukewendling

any resolution to this? on occasion we get similar 'DalliError: No server available' log messages corresponding to H12 Request Timeout responses on Heroku.

the dalli configuration options related to timeouts don't seem to handle this problem. you can see below the request lasted for 54 secs.

sample log entries:

code=H12 desc="Request timeout" method=GET path="/locations/new-york" host=mysite.com connect=507ms service=54965ms status=503
us-east-3.heroku.prod.memcachier.com:11211 failed (count: 8) Timeout::Error: execution expired
DalliError: No server available
@petergoldstein
Collaborator

@lukewendling I'm not sure that your problem is related. If you provide the configuration for your Dalli client or ActiveSupport::Cache::DalliStore, including your ttl values and server list, that would help clarify your issue.

@lukewendling

@petergoldstein - Thanks, we use basic Rails config for the cache store, save for turning on compression. memcachier server names are taken from env vars (heroku).

config.cache_store = :dalli_store, nil, { compress: true }
@petergoldstein
Collaborator

@lukewendling Some followup questions:

  1. Are you using the latest Dalli release? If not, what are you using?
  2. How many servers are in your server list in the env var?
  3. Why are you expecting a timeout behavior at the cache layer when you're not setting the ttl configuration? Default ttl is 0, which is forever. What happens if you set an explicit ttl in your cache_store config?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.