Book: Service Availability Principles (by the SA Forum)
http://www.amazon.co.uk/Service-Availability-Principles-Maria-Toeroe-ebook/dp/B007KGE02G
Paper: Automatic Management of Partitioned, Replicated Search Services
Automatic Management of Partitioned, Replicated Search Services (Leibert et al.). A nice, short, practical paper on managing a replicated search service in production.
Paper: A brief history of Consensus, 2PC and Transaction Commit
A brief history of Consensus, 2PC and Transaction Commit (Mc Keown). From #40.
Book: The Field Guide to Understanding Human Error
Recommended in this reading list. Maybe this turns out to be a better fit than #4.
Book: Inviting Disaster (James R. Chiles)
http://www.amazon.com/Inviting-Disaster-James-R-Chiles-ebook/dp/B0018ND83Y
This book was in one of Nygard's footnotes in Release It! It's an awesome collection of research into catastrophic failures ...