cap followup article response
atbarker committed Oct 3, 2018
1 parent 1fc9079 commit e9bd08e
responses/9711-response-2018-10-03.md

This isn't quite a paper but a magazine article: a retrospective on the CAP theorem written in 2012. The article first defines CAP and credits it as one of the factors behind the rise of NoSQL database systems over the preceding decade. It then argues that the "2 of 3" reading of the theorem is an oversimplification. That reading says a system can have only two of the three CAP properties, with no grey area of stronger or weaker forms of consistency and availability. The article contrasts the strong consistency of ACID with the high availability of BASE (basically available, soft state, eventually consistent, a term I hadn't heard of until now) and describes how that contrast led to a more formal exploration of different system design options. It goes on to argue that CAP cannot be discussed without addressing latency, because latency determines how the system perceives network partitions: a system could decide that a partition exists simply because latency is abnormally high. Formally, the author describes a partition as a time bound on communication. If an operation cannot reach consistency within that bound, the system treats the situation as a partition and must choose between consistency and availability for that operation.

The article makes a good point about offline mode and how it relates to the CAP theorem. I wonder how browser login sessions relate to the CAP theorem. If offline functionality is present, they clearly favor availability, but that can create consistency problems later, once the network partition is resolved. This seems similar to the web caching mentioned in the Lynch paper.
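To make the "partition as a time bound" idea concrete, here is a minimal sketch (my own illustration, not code from the article). It assumes a single replica that must hear back from one peer within a hypothetical `timeout_ms`; the names `Replica`, `Choice`, and `peer_ack_ms` are made up for this example, and the peer's latency is simulated rather than measured. When the bound is missed, the node cannot tell a broken link from a slow one, so it simply has to pick C or A for that operation.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Choice(Enum):
    """What to do when the time bound on communication is missed."""
    CONSISTENCY = "C"   # refuse the operation
    AVAILABILITY = "A"  # accept locally, reconcile after the partition heals


@dataclass
class Replica:
    choice: Choice
    timeout_ms: float  # hypothetical time bound on communication
    value: int = 0
    pending: List[int] = field(default_factory=list)  # writes accepted while "partitioned"

    def write(self, new_value: int, peer_ack_ms: Optional[float]) -> bool:
        """Try to replicate a write to one peer; return True if it was accepted.

        peer_ack_ms simulates how long the peer took to acknowledge,
        with None meaning no acknowledgement ever arrived.
        """
        if peer_ack_ms is not None and peer_ack_ms <= self.timeout_ms:
            # Peer answered within the bound: treated as no partition.
            self.value = new_value
            return True
        # Bound exceeded: from this node's point of view a partition exists,
        # whether the cause is a broken link or just abnormally high latency.
        if self.choice is Choice.CONSISTENCY:
            return False
        self.value = new_value
        self.pending.append(new_value)  # must be reconciled during recovery
        return True


cp = Replica(Choice.CONSISTENCY, timeout_ms=100)
cp.write(5, peer_ack_ms=20)    # accepted
cp.write(7, peer_ack_ms=500)   # refused: a slow peer is treated as a partition

ap = Replica(Choice.AVAILABILITY, timeout_ms=100)
ap.write(7, peer_ack_ms=None)  # accepted, but queued in ap.pending for recovery
```

The `pending` list is where the offline-mode concern shows up: everything accepted on the availability side is work that has to be reconciled once the partition ends.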

The article presents a basic process for managing partitions in a system that stays available during a partition and recovers consistency once the partition is resolved. Having just read the Chord paper, I notice that Chord instead sacrifices availability to better maintain consistency when a node joins or leaves the cluster. The author explores how to limit a system's operations during a partition in order to preserve its invariants. Partition recovery seems harder because it involves two problems: making the state on both sides of the partition consistent again, and compensating for mistakes made during the partition. The article mentions how systems such as Bayou and the CVS version control system handle recovery by replaying events from the point where the partition began. I wonder why the article didn't mention Git. Other conflict-resolution examples it gives are Google Docs and commutative replicated data types (CRDTs). Finally, the author uses an ATM as a framework for exploring partition recovery in a system that seems to favor availability over consistency.
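The ATM example ties the pieces together: limit operations during the partition, then merge both sides and compensate for any broken invariant. Below is a minimal sketch loosely modeled on that idea (my own illustration, not the article's code). It assumes two ATMs that are partitioned from each other for the whole run. The per-side cap `partition_limit`, the `ATMReplica` and `recover` names, and the compensation message are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ATMReplica:
    """One side of a partition; every operation here happens while partitioned."""
    name: str
    partition_limit: int  # hypothetical cap on total withdrawals during a partition
    log: List[int] = field(default_factory=list)  # +amount = deposit, -amount = withdrawal

    def deposit(self, amount: int) -> None:
        self.log.append(amount)

    def withdraw(self, amount: int) -> bool:
        withdrawn = -sum(op for op in self.log if op < 0)
        if withdrawn + amount > self.partition_limit:
            return False  # limit the operation to bound how far the invariant can break
        self.log.append(-amount)
        return True


def recover(start_balance: int, *replicas: ATMReplica) -> Tuple[int, List[str]]:
    """Merge the logs from every side of the partition and compensate if needed."""
    # Deposits and withdrawals are just additions, which commute, so the
    # merged balance does not depend on the order the logs are replayed in.
    balance = start_balance + sum(sum(r.log) for r in replicas)
    actions: List[str] = []
    if balance < 0:
        # The invariant "balance >= 0" was violated during the partition;
        # compensate (e.g. notify the customer or charge a fee) instead of undoing.
        actions.append(f"compensate: overdraft of {-balance}")
    return balance, actions


a = ATMReplica("a", partition_limit=100)
b = ATMReplica("b", partition_limit=100)
a.withdraw(80)  # True
a.withdraw(30)  # False: would exceed the partition limit on this side
b.withdraw(90)  # True
b.deposit(20)
print(recover(100, a, b))  # (-50, ['compensate: overdraft of 50'])
```

Capping withdrawals on each side, rather than refusing them outright, is the availability-favoring choice; a tighter cap gives up some availability in exchange for a smaller worst-case overdraft that compensation has to cover.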

Overall, I learned more from this article than from the previous paper. It clears up many of the ambiguities and easy pitfalls of the original CAP paper, for instance where exactly network latency comes into play, and it provides stronger real-world examples of CAP trade-offs. The article was very understandable, and I don't think there was anything major that I could not comprehend.

My research question is: how many systems in use today had their design influenced by the CAP theorem? More interestingly, are there any that were not? How do their designers reason about what appears to be a relatively obvious set of trade-offs in distributed system design? A first concrete step would be to read through distributed systems papers and see which ones don't mention the CAP theorem.
