This git repository collects papers on distributed systems. Reading a selection of these papers is an important step toward understanding distributed systems.
- Batch Computing
- Streaming Computing
- Distributed File System
- Distributed Memory System
- Key-Value System
- Resource Sharing and Scheduling System
- Distributed Machine Learning
- Deep Learning
- Other Resources
- MapReduce: Simplified Data Processing on Large Clusters(2004 OSDI): A classic distributed computing paper from Google. MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
Authors: Jeffrey Dean, Sanjay Ghemawat
- MapReduce Online(2010 NSDI): A modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing, and can reduce completion times and improve system utilization for batch jobs as well. It also supports online aggregation, which allows users to see "early returns" from a job as it is being computed.
Authors: Tyson Condie, Neil Conway, Peter Alvaro et al.
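
The map/reduce model described in the MapReduce entry above can be illustrated with the word-count example from the paper. This is a minimal single-process sketch, not Google's implementation: `run_mapreduce` and its in-memory shuffle are hypothetical helpers introduced here, and a real system would distribute each phase across many machines.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: for a (document name, text) pair, emit (word, 1) per word."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Reduce: merge all counts that share the same intermediate key."""
    return key, sum(values)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical single-process driver (an assumption, not the paper's API)."""
    # Shuffle phase: group intermediate values by intermediate key.
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)
    # Reduce phase: one reduce call per distinct intermediate key.
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
word_counts = run_mapreduce(documents, map_fn, reduce_fn)
```

Here `word_counts` maps each distinct word to its total count across both documents; only `map_fn` and `reduce_fn` are user-supplied, which is the point of the model.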
- Discretized Streams: Fault-Tolerant Streaming Computation at Scale(2013 SOSP): A programming model for stream processing. Its implementation, Spark Streaming, is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant processing of live data streams.
Authors: Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, Ion Stoica
- Storm @Twitter(2014 SIGMOD): Storm is a real-time, fault-tolerant, distributed stream data processing system. The basic execution unit is called a topology, which consists of spouts and bolts. It can return results immediately, which distinguishes it from other distributed data processing systems.
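
To make the spout and bolt terminology from the Storm entry concrete, here is a rough sketch of a topology as a chain of Python generators. It does not use Storm's API; the function names are hypothetical, the "stream" is a fixed list rather than an unbounded source, and everything runs in one process.

```python
from collections import Counter


def sentence_spout():
    """Spout: the source of tuples (a fixed list stands in for a live stream)."""
    for sentence in ["storm processes streams", "streams of tuples"]:
        yield sentence


def split_bolt(sentences):
    """Bolt: split each incoming sentence into words."""
    for sentence in sentences:
        for word in sentence.split():
            yield word


def count_bolt(words):
    """Bolt: keep running counts and emit an update for every word seen."""
    counts = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


def run_topology():
    """Wire spout -> split bolt -> count bolt and collect the emitted updates."""
    return list(count_bolt(split_bolt(sentence_spout())))


updates = run_topology()
```

Each element of `updates` is a running count emitted as soon as its word is processed, instead of a single result at the end of a batch job.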
- The Google File System(2003 SOSP): Google File System is a scalable distributed file system for data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. The main differences among others
Authors: Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing(2012 NSDI): Resilient Distributed Datasets (RDDs) are the distributed memory abstraction underlying the Spark distributed computing system. They are useful for iterative computation because intermediate results can be cached in memory. The main difference between RDDs and distributed shared memory (DSM) is that RDDs are created through coarse-grained transformations. RDDs also provide fault tolerance and scalability.
Authors: Matei Zaharia, Mosharaf Chowdhury, Tathagata Das et al.
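
The idea of building datasets through coarse-grained transformations, as described in the RDD entry above, can be sketched with a toy class. `ToyRDD` is a hypothetical name and is not Spark's API; it only assumes that each dataset remembers its parent and the transformation that produced it, so a lost cached result can be recomputed from that lineage.

```python
class ToyRDD:
    """Toy single-machine dataset that records how it was derived."""

    def __init__(self, source=None, parent=None, transform=None):
        self._source = source        # base data (only for the root dataset)
        self._parent = parent        # dataset this one was derived from
        self._transform = transform  # whole-dataset function applied to parent
        self._cache = None           # in-memory copy of the computed items

    def map(self, fn):
        return ToyRDD(parent=self, transform=lambda items: [fn(x) for x in items])

    def filter(self, pred):
        return ToyRDD(parent=self, transform=lambda items: [x for x in items if pred(x)])

    def cache(self):
        """Materialize the dataset and keep it in memory."""
        self._cache = self.collect()
        return self

    def lose_cache(self):
        """Simulate losing the in-memory copy, e.g. after a node failure."""
        self._cache = None

    def collect(self):
        if self._cache is not None:
            return list(self._cache)
        if self._parent is None:
            return list(self._source)
        # Recompute from lineage: apply the recorded transformation to the parent.
        return self._transform(self._parent.collect())


numbers = ToyRDD(source=range(1, 6))
squares = numbers.map(lambda x: x * x).cache()
even_squares = squares.filter(lambda x: x % 2 == 0)

before = even_squares.collect()
squares.lose_cache()
after = even_squares.collect()  # rebuilt from lineage, not from the cache
```

Because every transformation applies to the whole dataset, logging the transformation is enough for recovery; no per-element update log is needed.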
- Succinct: Enabling Queries on Compressed Data(2015 NSDI): Succinct is a data store that enables queries directly on a compressed representation of the input data. Its compression technique, which is built on compressed suffix arrays, makes queries faster. In addition, Succinct supports a range of queries, including count and search of arbitrary strings. What differentiates Succinct from previous storage systems is that it doesn't store indexes at all.
Authors: Rachit Agarwal, Anurag Khandelwal, Ion Stoica
- Bigtable: A Distributed Storage System for Structured Data(2006 OSDI): Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. A Bigtable is a sparse, distributed, persistent multi-dimensional sorted map. This map is indexed by row key, column key and timestamp. Each value is an uninterpreted array of bytes. It's a single-master distributed storage system. The underlying data structure for data storage is the Log-Structured Merge Tree.
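
The Bigtable data model, a sorted map from (row key, column key, timestamp) to uninterpreted bytes, can be sketched as follows. `ToyBigtable` and its methods are hypothetical and are not Bigtable's API; the sketch keeps one sorted in-memory list instead of a distributed Log-Structured Merge Tree, and it assumes the newest timestamp should be returned first.

```python
import bisect


class ToyBigtable:
    """Toy sorted map: (row, column, timestamp) -> bytes, newest version first."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key tuple -> bytes

    def put(self, row, column, timestamp, value):
        # Negating the timestamp makes newer versions sort before older ones.
        key = (row, column, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column):
        """Return the most recent value for a cell, or None if it is absent."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan_rows(self, start_row, end_row):
        """Return all cells with start_row <= row < end_row, in sorted order."""
        i = bisect.bisect_left(self._keys, (start_row,))
        cells = []
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            cells.append((row, column, -neg_ts, self._values[self._keys[i]]))
            i += 1
        return cells


table = ToyBigtable()
table.put("com.cnn.www", "contents:", 3, b"<html>v1")
table.put("com.cnn.www", "contents:", 5, b"<html>v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
table.put("org.example", "contents:", 1, b"<html>")

latest = table.get("com.cnn.www", "contents:")
com_cells = table.scan_rows("com", "org")
```

Keeping the keys sorted is what makes row-range scans cheap: all cells of a row, and all rows sharing a prefix, sit next to each other.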
- Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center(2011 NSDI): Mesos is a platform for sharing commodity clusters between multiple diverse cluster computing frameworks, such as Hadoop and MPI. Sharing improves cluster utilization and avoids per-framework data replication. Mesos shares resources in a fine-grained manner, allowing frameworks to achieve data locality by taking turns reading data stored on each machine.
- Operating Systems Reading Group: The Operating Systems reading group from the University of Cambridge. It lists many interesting papers, covering both traditional operating systems and newer trends in operating systems, such as library OSes and Plan 9. You can pick whatever interests you and dive in.
- Readings in Databases: A list of papers essential to understanding databases and building new data processing systems.
- Stream Processing Papers: A collection covering stream processing systems and streaming algorithms. You can find many interesting papers focused on stream processing.