#22

Book: Managing the Unexpected

Managing the Unexpected (Weick and Sutcliffe)

#17

Paper: Highly Available Transactions: Virtues and Limitations

Highly Available Transactions: Virtues and Limitations (Bailis et al.) A very recent but excellent paper.

#19

Paper: Crew Resource Management

Crew Resource Management: a Positive Change for the Fire Service. Best article-length resource I've been able to find so far; it can probably replace the current Wikipedia link.

#30

Book: Effective Monitoring and Alerting

Recommended in this reading list.

#29

Book: The Field Guide to Understanding Human Error

Recommended in this reading list. Maybe this turns out to be a better fit than #4.

#37

Kafka, The Log

Some good URLs on this that I know of: Kafka: A Distributed Messaging System for Log Processing (Kreps et al.); The Log: What every software engineer should know about real-time data's unifying ...

#10

Dynamo database paper

The Dynamo paper is easy to understand and a good intro to distributed, eventually consistent databases. http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf

#51

Paper: Automatic Management of Partitioned, Replicated Search Services

Automatic Management of Partitioned, Replicated Search Services (Leibert et al.) A nice, short, practical paper on managing a replicated search service in production.

#66

High Performance Browser Networking

Small typo in "High Performance Browser Networking".

#23

Post: How to lose $172,222 a second for 45 minutes

Hello, here is an educational article and post-mortem document: http://pythonsweetness.tumblr.com/post/64740079543/how-to-lose-172-222-a-second-for-45-minutes