Firedrills and failure simulations
Firedrills, failure simulations, chaos monkeys, and before/after failure testing.
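As a rough illustration of the chaos-monkey idea, the sketch below wraps a call so that it fails at a configurable rate. This lets you compare behaviour before and after failures are introduced. The names `InjectedFailure` and `with_fault_injection` are hypothetical and do not come from any particular tool.

```python
import random


class InjectedFailure(Exception):
    """Raised by the wrapper to simulate a dependency failing."""


def with_fault_injection(func, failure_rate, rng=None):
    """Return a wrapper around func that fails with probability failure_rate.

    failure_rate is a float in [0.0, 1.0]; rng is an optional random.Random
    instance so that a drill can be made reproducible with a fixed seed.
    """
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        # random() returns a value in [0.0, 1.0), so a rate of 0.0 never
        # fails and a rate of 1.0 always fails.
        if rng.random() < failure_rate:
            raise InjectedFailure("simulated failure")
        return func(*args, **kwargs)

    return wrapper
```

In a drill, the same caller can be run against `with_fault_injection(fetch, 0.0)` and then against a non-zero rate, to check that retries, timeouts, and alerts behave as expected.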
Everything should have an explicit limit, even if it is very high; apply backpressure everywhere, etc.
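A minimal sketch of what an explicit limit with backpressure can look like, using the standard library's bounded `queue.Queue`. The limit value, the timeout, and the helper names `make_bounded_queue` and `submit` are assumptions chosen for illustration.

```python
import queue

MAX_PENDING = 10_000  # hypothetical limit: high, but explicit


def make_bounded_queue(limit=MAX_PENDING):
    """Create a work queue that refuses to grow past limit items."""
    return queue.Queue(maxsize=limit)


def submit(q, item, timeout=0.1):
    """Try to enqueue item, waiting up to timeout seconds for space.

    Returns True if the item was accepted. Returns False if the queue
    stayed full, which signals the caller to slow down or shed load
    instead of letting pending work pile up without bound.
    """
    try:
        q.put(item, timeout=timeout)
        return True
    except queue.Full:
        return False
```

The point of returning `False` rather than blocking forever is that the pressure propagates to the producer, which can then reject, retry later, or degrade.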
Paper: Your Server as a Function
By Marius Eriksen (Twitter).
Available from: http://monkey.org/~marius/funsrv.pdf
Abstract: Building server software in a large-scale setting, where systems exhibit a high degree of concurrency and ...
Basic material on capacity planning. Best suggestion so far is 'The Art of Capacity Planning' as discussed in #21.
Courseware: Computer and Network Security
https://engineering.purdue.edu/kak/compsec/Lectures.html
The Spark work is really interesting and there are some good papers on it: http://people.csail.mit.edu/matei/papers/2010/hotcloud_spark.pdf https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf ...
Some good URLs around this that I know of:
Kafka: A Distributed Messaging System for Log Processing (Kreps et al.)
The Log: What every software engineer should know about real-time data's unifying ...
General paper- or book-length resources on security engineering practices.