This repository has been archived by the owner. It is now read-only.

handle memcached connection events #83

Closed
jrgm opened this issue Jun 2, 2015 · 3 comments

Comments

@jrgm
Contributor

@jrgm jrgm commented Jun 2, 2015

In production, the customs server on a single instance of fxa-auth-server got into a state where it was failing every request with this error:

{"name":"customs-server","hostname":"ip-172-31-0-155","pid":2612,
 "level":50,"op":"memcachedError",
 "err":{
   "name":"RejectionError",
   "message":"Server not available",
   "cause":{},"stack":"Error: Server not available
    at Client.memcachedCommand [as command] (/data/fxa-customs-server/node_modules/memcached/lib/memcached.js:297:70)
    at Client.setters (/data/fxa-customs-server/node_modules/memcached/lib/memcached.js:916:10)
    at Client.bowlofcurry [as set] (/data/fxa-customs-server/node_modules/memcached/lib/utils.js:126:15)
    at Client.setAsync (eval at makeNodePromisifiedEval (/data/fxa-customs-server/node_modules/bluebird/js/main/promisify.js:195:12), <anonymous>:2:321)
    at setRecords (/data/fxa-customs-server/bin/customs_server.js:78:10)
    at fetchRecords.spread.then.log.info.op (/data/fxa-customs-server/bin/customs_server.js:118:18)
    at tryCatchApply (/data/fxa-customs-server/node_modules/bluebird/js/main/util.js:83:19)
    at Promise$_callSpread [as _callSpread] (/data/fxa-customs-server/node_modules/bluebird/js/main/promise.js:683:12)
    at Promise$_callHandler [as _callHandler] (/data/fxa-customs-server/node_modules/bluebird/js/main/promise.js:691:18)
    at Promise$_settlePromiseFromHandler [as _settlePromiseFromHandler] (/data/fxa-customs-server/node_modules/bluebird/js/main/promise.js:711:18)
    at Promise$_settlePromiseAt [as _settlePromiseAt] (/data/fxa-customs-server/node_modules/bluebird/js/main/promise.js:868:14)
    at Promise$_settlePromises [as _settlePromises] (/data/fxa-customs-server/node_modules/bluebird/js/main/promise.js:1006:14)
"},"msg":"","time":"2015-05-28T02:55:35.054Z","v":0}

There is no hint in the logs as to what triggered this server to enter this state. The log was completely normal up to the point where every connection began to fail. However, the AWS memcached service was up and healthy, and it could be reached over TCP from the same box that was reporting this error. Inspecting the customs server process with lsof showed nothing unusual, except that there were no TCP sockets in any state for the memcached port, 11211. An strace showed that, on receiving a /check HTTP request from the fxa-auth-server, the process made no attempt to initiate a TCP connection to the memcached service. Quite strange.

I was able to get a customs server in stage to produce the same stack trace by doing sudo iptables -A OUTPUT -d <memcached IP> -j DROP and sending a steady stream of /auth/login requests to the auth-server. But in my tests, once I removed the iptables block, the server would eventually return to normal.

However, I wonder if we need to directly handle some of the connection events in https://github.com/3rd-Eden/memcached#events, for post-mortem analysis, if nothing else. Or maybe add an internal self-check that forces a restart if we can't connect to memcached.
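A minimal sketch of the self-check idea, not the customs server's actual code. It assumes a hypothetical `probe` function that returns a promise and rejects when memcached cannot be reached (for example, a small promisified `set`), and a hypothetical `onUnhealthy` callback that decides what to do after `maxFailures` consecutive failures. All names here are illustrative.

```javascript
// Hypothetical helper: tracks consecutive probe failures and reports when a
// threshold is reached. `probe` must return a promise (or throw).
function createSelfCheck(probe, options) {
  const maxFailures = options.maxFailures;
  const onUnhealthy = options.onUnhealthy;
  let consecutiveFailures = 0;

  // Run the probe once. Resolves to true on success, false on failure.
  function checkOnce() {
    return Promise.resolve()
      .then(probe)
      .then(
        function () {
          // Any success resets the counter.
          consecutiveFailures = 0;
          return true;
        },
        function (err) {
          consecutiveFailures += 1;
          // Called on every failure at or beyond the threshold.
          if (consecutiveFailures >= maxFailures) {
            onUnhealthy(err, consecutiveFailures);
          }
          return false;
        }
      );
  }

  // Run the probe on a timer; returns a function that stops the timer.
  function start(intervalMs) {
    const timer = setInterval(checkOnce, intervalMs);
    timer.unref(); // do not keep the process alive just for the check
    return function stop() {
      clearInterval(timer);
    };
  }

  return { checkOnce: checkOnce, start: start };
}
```

In a deployment that runs under a process supervisor, `onUnhealthy` could log the error and exit so that the supervisor restarts the process. Whether a forced restart is preferable to waiting for the driver to reconnect is a judgment call this sketch does not settle.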

@dannycoates
Member

@dannycoates dannycoates commented Jun 2, 2015

Thanks @jrgm, I'll take a look

@dannycoates dannycoates self-assigned this Jun 2, 2015
@rfk rfk modified the milestones: train-39, train-40 Jun 3, 2015
@rfk rfk modified the milestones: train-41, train-40 Jun 24, 2015
@rfk rfk removed this from the train-41 milestone Aug 18, 2015
@rfk
Member

@rfk rfk commented Nov 25, 2015

This has not recurred, but we should keep the bug open and follow up on the suggestion to handle connection events from the memcached driver. We can at least log them for debugging purposes.
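Logging the driver's connection events could look roughly like the sketch below. The event names (`issue`, `failure`, `reconnecting`, `reconnect`, `remove`) are the ones listed in the memcached driver's README. The `logMemcachedEvents` helper, the `op` naming scheme, the reliance on `server` and `messages` fields in the event details, and a logger exposing `info`, `warn` and `error` methods are all assumptions made for illustration.

```javascript
// Event names as listed in the memcached driver's README.
const MEMCACHED_EVENTS = ['issue', 'failure', 'reconnecting', 'reconnect', 'remove'];

// Hypothetical helper: attach one logging listener per connection event.
// `client` is any object with an EventEmitter-style `on` method.
function logMemcachedEvents(client, log) {
  MEMCACHED_EVENTS.forEach(function (eventName) {
    client.on(eventName, function (details) {
      const d = details || {};
      const entry = {
        op: 'memcached.' + eventName,
        server: d.server,     // assumed field: address of the affected server
        messages: d.messages  // assumed field: error messages from the driver
      };
      if (eventName === 'failure' || eventName === 'remove') {
        // The server was marked dead or dropped from the pool.
        log.error(entry);
      } else if (eventName === 'reconnect') {
        // The connection came back.
        log.info(entry);
      } else {
        // 'issue' and 'reconnecting' are transitional states.
        log.warn(entry);
      }
    });
  });
  return client;
}
```

Because the helper only depends on `on`, it can be exercised with a plain `EventEmitter` in tests, without a running memcached instance.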

@rfk rfk changed the title from "production server failing to connect to memcached service" to "handle memcached connection events" Apr 12, 2016
@rfk
Member

@rfk rfk commented May 31, 2016

We're going to try to move away from memcached for this service, so I'm closing memcached-related bugs in the backlog.

@rfk rfk closed this May 31, 2016