Could be a good survey / practical resource on operable apps. I think I've read this but it was a while ago, so I need to review.
http://en.wikipedia.org/wiki/CAP_theorem I don't have any recommendations for specific papers currently, but I think it's an important concept for engineers to learn!
http://odbms.org/download/dean-keynote-ladis2009.pdf (Design, Lessons, and Advice from Building Distributed Systems at Google) leads to a 404 page.
https://zvzzt.wordpress.com/2012/08/16/a-note-on-uptime/ (candidate resource)
Book: Service Availability Principles (by the SA Forum)
http://www.amazon.co.uk/Service-Availability-Principles-Maria-Toeroe-ebook/dp/B007KGE02G
Raft is the consensus algorithm implemented in etcd and, by extension, used by CoreOS. I think it's a great addition to this list. Nice list, BTW!
The Dotscale conference, which closed just yesterday, is about exactly that. In my view it is a nice addition to the conference list, and the talks there are independently curated (not sponsored). http://dotscale.eu ...
Scaling Big Data Mining Infrastructure: The Twitter Experience (Lin and Ryaboy)
The Unified Logging Infrastructure for Data Analytics at Twitter (Lee et al.)