#30

Book: Effective Monitoring and Alerting

Recommended in this reading list.

#29

Book: The Field Guide to Understanding Human Error

Recommended in this reading list. It may turn out to be a better fit than #4.

#27

Presentation: Practicalities of Productionizing Distributed Systems

The video is here; I still need to find the slides.

  • Opened by mmcgrana Dec 29, 2013
  • 1 comment

#25

Added microservices presentation

This presentation about designing and building microservices is fantastic. It is one of the best explanations I've come across.

  • Opened by kyleboon Dec 15, 2013
  • 1 comment

#23

Post: How to lose $172,222 a second for 45 minutes

Hello, here is an educational article and post-mortem document: http://pythonsweetness.tumblr.com/post/64740079543/how-to-lose-172-222-a-second-for-45-minutes

#22

Book: Managing the Unexpected

Managing the Unexpected (Weick and Sutcliffe)

#21

Add 'The Art of Capacity Planning'

This was recommended to me by @chooper and is a tremendous help getting started with capacity planning.

  • Opened by naaman Oct 18, 2013
  • 3 comments

#19

Paper: Crew Resource Management

Crew Resource Management: A Positive Change for the Fire Service. This is the best article-length resource I've been able to find so far, and it can probably replace the current Wikipedia link.

#17

Paper: Highly Available Transactions: Virtues and Limitations

Highly Available Transactions: Virtues and Limitations (Bailis et al.). A very recent but excellent paper.

#16

Book: Security Engineering

Security Engineering (Anderson)

  • Opened by mmcgrana Oct 9, 2013
  • 2 comments