Handle an outage

Brian Clozel edited this page Jul 10, 2015 · 5 revisions

Getting notifications

The site and certain key legacy web properties are monitored by Pingdom, which sends alerts after 3 consecutive minutes of downtime.

Alerts are sent:

  • via email, iOS or Twitter for configured users
  • to the Sagan Slack room

When an outage occurs

  • Post a message in the Sagan Slack room that lets folks know you're aware of the problem and looking into it. Specifically @-mention any other admins you know may be on duty (especially those that may be getting woken up).
  • Check whether other properties are down as well. If multiple properties are down, this may indicate a problem with the platform itself rather than with a single site. See "Contacting the CF team" below.
  • Verify that the site is actually down:
    curl -I <site-url>    # 404? 200?
  • Check CloudFlare's status, since some outages can be caused by our CDN.
  • If the outage was significant in length, post a message from the @SpringOps account letting folks know we're back up after a bit of downtime.
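
The "is the site actually down" and "are other properties down too" checks above can be combined into one small script. The following is a minimal sketch, not an official tool: the function names (classify_status, http_status, check_properties) are made up for illustration, and the URLs you pass in are up to you. It relies only on curl's -I, -s, -o, -m and -w '%{http_code}' options.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical and not part of any existing tooling.

# Map an HTTP status code to a coarse state.
# curl's %{http_code} prints 000 when no HTTP response was received at all.
classify_status() {
  case "$1" in
    2??|3??) echo "up" ;;
    000|"")  echo "unreachable" ;;
    *)       echo "error" ;;   # e.g. 404 or 5xx
  esac
}

# Request headers only (like `curl -I`) and print just the status code.
# -m 10 gives up after 10 seconds so a hung connection does not block the check.
http_status() {
  curl -s -o /dev/null -I -m 10 -w '%{http_code}' "$1"
}

# Print "<state> <code> <url>" for each URL given.
# Returns non-zero if any URL is not "up".
check_properties() {
  local failures=0 url code state
  for url in "$@"; do
    code=$(http_status "$url")
    state=$(classify_status "$code")
    printf '%s %s %s\n' "$state" "$code" "$url"
    [ "$state" = "up" ] || failures=$((failures + 1))
  done
  [ "$failures" -eq 0 ]
}
```

To use it, source the file and call check_properties with the URLs of the properties you want to compare. If several of them report "error" or "unreachable" at once, that points toward a shared cause (platform or CDN) rather than a single application.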