
Reading List in Data Systems

A list of papers, articles, and online resources I have found essential to understanding data-intensive systems and building new data systems. The list is curated and maintained by Sujith Jay Nair (@sujithjay). If you think a paper should be part of this list, please submit a pull request; I will add it once I have read the paper. Please make sure the subject matter of the paper falls within either i) understanding data systems, or ii) building data systems.

Data systems are defined to include:

  • Database systems
  • Data processing systems

This list is inspired by Reynold Xin's list on Database Readings, and is a work in progress.

Table of Contents

  1. Consistency and Consensus
  2. Query Processing
  3. State and Stream
  4. Database Design

Consistency and Consensus

Query Processing

State and Stream

  • Data in Flight (2010): Introduces a model of streams as a superset of the relational model, in which streams add a notion of time (processing-time, IMO) to relations. I explore a similar idea in this post. In a relational table, the data is persistent and the query is transient; in a stream, the query is persistent and the data is transient.
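The persistent/transient inversion can be illustrated with a minimal Python sketch. This is not the paper's formal model; `one_shot_query`, `ContinuousQuery`, and `on_arrival` are hypothetical names, and "time" is assumed to be a simple logical processing-time clock that counts arrivals.

```python
from typing import Any, Callable

Row = dict[str, Any]


def one_shot_query(table: list[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """Relational style: the data persists in `table`; the query runs once and is gone."""
    return [row for row in table if predicate(row)]


class ContinuousQuery:
    """Stream style: the query is registered once and persists; each tuple is
    evaluated when it arrives and is then dropped rather than stored."""

    def __init__(self, predicate: Callable[[Row], bool],
                 sink: Callable[[tuple[int, Row]], None]) -> None:
        self.predicate = predicate
        self.sink = sink
        self.clock = 0  # logical processing time: incremented on every arrival

    def on_arrival(self, row: Row) -> None:
        self.clock += 1  # stamp the tuple with its arrival order
        if self.predicate(row):
            self.sink((self.clock, row))
        # The row is not retained; after this call the query no longer sees it.
```

The same predicate can be run once over a stored list with `one_shot_query`, or registered with `ContinuousQuery` and fed tuples one at a time; only the second attaches a (processing) time to each result.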

Database Design

  • Dynamo: Amazon’s Highly Available Key-value Store (2007): This paper on Dynamo (not to be confused with DynamoDB, which is 'built on the principles of Dynamo') is an excellent primer on the concepts behind high-availability storage systems, such as Consistent Hashing, Sloppy Quorum, Anti-entropy processes, and Gossip.

  • Cassandra - A Decentralized Structured Storage System (2009): Cassandra is one of many data storage systems heavily influenced by Dynamo. However, the two differ in important ways; I have written about this in this post.
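Consistent hashing, one of the Dynamo concepts listed above, can be sketched in a few lines of Python. This is a simplified illustration, not Dynamo's implementation: it hashes keys with MD5 onto a ring, gives each physical node several virtual positions (`vnodes_per_node` is a hypothetical parameter), and builds a preference list by walking clockwise until `n` distinct physical nodes are found. Sloppy quorums, hinted handoff, and anti-entropy are not modelled.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on the ring (MD5 digest as a 128-bit integer)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


class ConsistentHashRing:
    def __init__(self, nodes: list[str], vnodes_per_node: int = 8) -> None:
        self._ring: list[tuple[int, str]] = []  # sorted (position, physical node)
        for node in nodes:
            self.add_node(node, vnodes_per_node)

    def add_node(self, node: str, vnodes: int) -> None:
        # Each virtual node is a separate position owned by the same physical node.
        for i in range(vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(pos, owner) for pos, owner in self._ring if owner != node]

    def preference_list(self, key: str, n: int) -> list[str]:
        """First n distinct physical nodes found walking clockwise from the key."""
        if not self._ring:
            return []
        # (h,) sorts before any (h, node), so this finds the first position >= h.
        start = bisect.bisect_left(self._ring, (ring_position(key),))
        result: list[str] = []
        for i in range(len(self._ring)):
            _, node = self._ring[(start + i) % len(self._ring)]  # wrap around
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result
```

The property that makes the scheme attractive: removing a node only reassigns the keys that node owned, because every other virtual position on the ring stays where it was.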

