Papers
- Making Reliable Distributed Systems in the Presence of Software Errors (Armstrong)
- Highly Available Transactions: Virtues and Limitations (Bailis et al.)
- The Chubby Lock Service for Loosely Coupled Distributed Systems (Burrows)
- Bigtable: a Distributed Storage System for Structured Data (Chang et al.)
- Spanner: Google’s Globally-Distributed Database (Corbett et al.)
- Dynamo: Amazon’s Highly Available Key-Value Store (DeCandia et al.)
- MapReduce: Simplified Data Processing on Large Clusters (Dean and Ghemawat)
- The Google File System (Ghemawat et al.)
- Out of the Tar Pit (Moseley and Marks)
- In Search of an Understandable Consensus Algorithm (Ongaro and Ousterhout)
- Failure Trends in a Large Disk Drive Population (Pinheiro et al.)
- Fallacies of Distributed Computing Explained (Rotem-Gal-Oz)
- F1 - The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business (Shute et al.)
- Dapper, A Large Scale Distributed Systems Tracing Infrastructure (Sigelman et al.)
- Crew Resource Management: a Positive Change for the Fire Service
Posts
- Resilience Engineering: Part I, Part II (Allspaw)
- Systems Engineering: a Great Definition (Allspaw)
- Some Rules for Engineering and Operations (Black)
- Service Level Disagreements Part I, Part II (Black)
- Design, Lessons, and Advice from Building Distributed Systems at Google (Dean)
- My Philosophy on Alerting (Ewaschuk)
- You Can’t Sacrifice Partition Tolerance (Hale)
- Customer Trust (Hamilton)
- Observations on Errors, Corrections, & Trust of Dependent Systems (Hamilton)
- Life Beyond Distributed Transactions: An Apostate’s Opinion (Helland)
- Notes on Distributed Systems for Young Bloods (Hodges)
- The Network is Reliable (Kingsbury)
- Call Me Maybe: Final Thoughts (Kingsbury)
- Getting Real About Distributed Systems Reliability (Kreps)
- On HTTP Load Testing (Nottingham)
- Observability at Twitter (Watson)
- Stevey’s Google Platforms Rant (Yegge)
Presentations
- On Designing and Deploying Internet Scale Services (Hamilton)
- Service Design Best Practices (Hamilton)