Some good URLs around this that I know of:
Kafka: A Distributed Messaging System for Log Processing (Kreps et al.)
The Log: What every software engineer should know about real-time data's unifying ...
The Spark work is really interesting and there are some good papers on it:
http://people.csail.mit.edu/matei/papers/2010/hotcloud_spark.pdf
https://www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf
...
Courseware: Computer and Network Security
https://engineering.purdue.edu/kak/compsec/Lectures.html
Basic material on capacity planning. Best suggestion so far is 'The Art of Capacity Planning' as discussed in #21.
Paper: Your Server as a Function
By Marius Eriksen, from Twitter. Available from: http://monkey.org/~marius/funsrv.pdf
Abstract: Building server software in a large-scale setting, where systems exhibit a high degree of concurrency and ...
Everything should have an explicit limit, even if it is very high; apply backpressure everywhere, etc.
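A minimal sketch of what an explicit limit with backpressure could look like, using Python's standard `queue` module. The class name `BoundedInbox` and its `offer`/`take` methods are hypothetical names chosen for illustration and do not come from any of the linked material. Rejecting work when full is only one possible policy; blocking the producer is another.

```python
import queue


class BoundedInbox:
    """Work queue with an explicit capacity (hypothetical example class).

    When the queue is full, new work is rejected rather than buffered,
    so the caller learns about the overload and can slow down or shed load.
    """

    def __init__(self, max_items: int) -> None:
        # queue.Queue treats maxsize <= 0 as "unbounded", so refuse it here.
        if max_items <= 0:
            raise ValueError("max_items must be positive")
        self._q = queue.Queue(maxsize=max_items)
        self.rejected = 0  # count of items turned away because the queue was full

    def offer(self, item) -> bool:
        """Try to enqueue without blocking; return False if at capacity."""
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def take(self):
        """Remove and return the oldest item; raises queue.Empty if none."""
        return self._q.get_nowait()

    def __len__(self) -> int:
        return self._q.qsize()


if __name__ == "__main__":
    inbox = BoundedInbox(max_items=2)
    accepted = [inbox.offer(n) for n in range(3)]
    print(accepted, inbox.rejected)
```

The `False` return value is the backpressure signal here: the producer has to decide whether to retry later, drop the item, or propagate the rejection upstream.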
Firedrills and failure simulations
Firedrills, failure simulations, chaos monkeys, and before/after failure testing.
Book: The Field Guide to Understanding Human Error
Recommended in this reading list. Maybe this turns out to be a better fit than #4.
How to know where to start? Knowledge map?
How well does this content fit a knowledge map structure? The list is great but it's also large-ish and growing. Having a logical starting point (perhaps per high-level topic) might be interesting. ...