Skip to content

sujithjay/data-readings

master
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 

Reading List in Data Systems

A list of papers, articles, and online resources I have found essential to understanding data-intensive systems and building new data systems. The list is curated and maintained by Sujith Jay Nair (@sujithjay). If you think a paper should be part of this list, please submit a pull request. I will add it to the list once I peruse the paper. Please make sure the subject-matter of the paper is within the realm of either i) understanding data systems, or ii) building data systems.

Data systems are defined to include:

  • Database systems
  • Data processing systems

This list is inspired by Reynold Xin's list on Database Readings, and is a work in progress.

Table of Contents

  1. Consistency and Consensus
  2. Query Processing
  3. State and Stream
  4. Database Design

Consistency and Consensus

Query Processing

State and Stream

  • Data in Flight (2010): Introduces a model of streams as a superset of the relational model. Streams introduce a notion of time (processing-time, IMO) to the relational model. I explore a similar idea in this post. In a relational table, data is persistent and query is transient; in a stream, query is persistent and data is transient.

Database Design

  • Dynamo: Amazon’s Highly Available Key-value Store (2007): This paper on Dynamo (not to be confused with DynamoDB, which is 'built on the principles of Dynamo') is an excellent primer on understanding concepts behind high-availability storage systems; concepts such as Consistent Hashing, Sloppy Quorum, Anti-entropy processes, and Gossip.

  • Cassandra - A Decentralized Structured Storage System (2009): Cassandra is one of many data storage systems heavily influenced by Dynamo. However, important differences exist. I have written about it in this post.

Releases

No releases published

Packages

No packages published