| @@ -0,0 +1,65 @@ | |||
| +## Services Engineering Reading List | |||
| + | |||
| +#### Overview | |||
| + | |||
| +* A reading list for services engineering, with a focus on cloud infrastructure services | |||
| +* Most content is on applied distributed systems and systems operations | |||
| +* Goals | |||
| + * Collect relevant information in a single place | |||
| + * Provide insight into how organizations running services think about systems engineering | |||
| +* Work in progress; please send suggestions to [@mmcgrana](https://twitter.com/mmcgrana) | |||
| + | |||
| +#### Posts, Pages, and Decks | |||
| + | |||
| +* [Resilience Engineering: Part I](http://www.kitchensoap.com/2011/04/07/resilience-engineering-part-i/), [Part II](http://www.kitchensoap.com/2012/06/18/resilience-engineering-part-ii-lenses/) (Allspaw) | |||
| +* [Systems Engineering: a Great Definition](http://www.kitchensoap.com/2011/07/18/systems-engineering-great-definition/) (Allspaw) | |||
| +* [Some Rules for Engineering and Operations](http://blog.b3k.us/2012/01/24/some-rules.html) (Black) | |||
| +* [Service Level Disagreements Part I](http://blog.b3k.us/2009/07/15/service-level-disagreements.html), [Part II](http://blog.b3k.us/2009/07/16/service-level-disagreements-2.html) (Black) | |||
| +* [Design, Lessons, and Advice from Building Distributed Systems at Google](http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf) (Dean) | |||
| +* [You Can’t Sacrifice Partition Tolerance](http://codahale.com/you-cant-sacrifice-partition-tolerance/) (Hale) | |||
| +* [Customer Trust](http://perspectives.mvdirona.com/2013/01/15/CustomerTrust.aspx) (Hamilton) | |||
| +* [On Designing and Deploying Internet Scale Services](http://mvdirona.com/jrh/talksAndPapers/JamesRH_Lisa.pdf) (Hamilton) | |||
| +* [Observations on Errors, Corrections, & Trust of Dependent Systems](http://perspectives.mvdirona.com/2012/02/26/ObservationsOnErrorsCorrectionsTrustOfDependentSystems.aspx) (Hamilton) | |||
| +* [Service Design Best Practices](http://www.mvdirona.com/jrh/TalksAndPapers/JamesHamilton_POA20090226.pdf) (Hamilton) | |||
| +* [Life Beyond Distributed Transactions: An Apostate’s Opinion](http://cs.brown.edu/courses/cs227/archives/2012/papers/weaker/cidr07p15.pdf) (Helland) | |||
| +* [Notes on Distributed Systems for Young Bloods](http://www.somethingsimilar.com/2013/01/14/notes-on-distributed-systems-for-young-bloods/) (Hodges) | |||
| +* [The Network is Reliable](http://aphyr.com/posts/288-the-network-is-reliable) (Kingsbury) | |||
| +* [Call Me Maybe: Final Thoughts](http://aphyr.com/posts/286-call-me-maybe-final-thoughts) (Kingsbury) | |||
| +* [Getting Real About Distributed System Reliability](http://blog.empathybox.com/post/19574936361/getting-real-about-distributed-system-reliability) (Kreps) | |||
| +* [On HTTP Load Testing](http://www.mnot.net/blog/2011/05/18/http_benchmark_rules) (Nottingham) | |||
| +* [Observability at Twitter](https://blog.twitter.com/2013/observability-at-twitter) (Watson) | |||
| +* [Crew Resource Management](http://en.wikipedia.org/wiki/Crew_resource_management) (Wikipedia) | |||
| +* [Stevey’s Google Platforms Rant](https://plus.google.com/112678702228711889851/posts/eVeouesvaVX) (Yegge) | |||
| + | |||
| +#### Books | |||
| + | |||
| +* [Agile Retrospectives: Making Good Teams Great](http://www.amazon.com/Agile-Retrospectives-Making-Teams-Great/dp/0977616649) (Derby et al.) | |||
| +* [Better: A Surgeon’s Notes on Performance](http://www.amazon.com/dp/0312427654) (Gawande) | |||
| +* [The Checklist Manifesto: How to Get Things Right](http://www.amazon.com/The-Checklist-Manifesto-ebook/dp/B0030V0PEW) (Gawande) | |||
| + | |||
| +#### Papers | |||
| + | |||
| +* [MapReduce: Simplified Data Processing on Large Clusters](http://research.google.com/archive/mapreduce-osdi04.pdf) (Dean and Ghemawat) | |||
| +* [The Chubby Lock Service for Loosely Coupled Distributed Systems](http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/chubby-osdi06.pdf) (Burrows) | |||
| +* [Dapper, a Large-Scale Distributed Systems Tracing Infrastructure](http://research.google.com/pubs/archive/36356.pdf) (Sigelman et al.) | |||
| +* [The Google File System](http://research.google.com/archive/gfs-sosp2003.pdf) (Ghemawat et al.) | |||
| +* [Spanner: Google’s Globally-Distributed Database](http://research.google.com/archive/spanner-osdi2012.pdf) (Corbett et al.) | |||
| +* [Making Reliable Distributed Systems in the Presence of Software Errors](http://www.erlang.org/download/armstrong_thesis_2003.pdf) (Armstrong) | |||
| +* [Failure Trends in a Large Disk Drive Population](http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/disk_failures.pdf) (Pinheiro et al.) | |||
| +* [Fallacies of Distributed Computing Explained](http://www.rgoarchitects.com/Files/fallacies.pdf) (Rotem-Gal-Oz) | |||
| +* [F1 - The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business](http://research.google.com/pubs/archive/38125.pdf) (Shute et al.) | |||
| + | |||
| +#### Research Groups | |||
| + | |||
| +* [Berkeley Database Group](http://db.cs.berkeley.edu/w/) | |||
| +* [Google Research](http://research.google.com/) | |||
| +* [Microsoft Systems Research](http://research.microsoft.com/en-US/groups/sr/default.aspx) | |||
| + | |||
| +#### Conferences | |||
| + | |||
| +* [Ricon](http://ricon.io/) | |||
| +* [Velocity](http://velocityconf.com/) | |||
| + | |||
| +#### Courseware | |||
| + | |||
| +* [University of Illinois CS 525: Advanced Distributed Systems](http://courses.engr.illinois.edu/cs525/sp2011/sched.htm) | |||
Commit 7dec478