
qzhong0605/distributed-computing


Readings in Distributed Systems

This Git repository collects papers on distributed systems. Reading these papers is one of the best ways to understand how distributed systems work.

  1. Batch Computing
  2. Streaming Computing
  3. Distributed File System
  4. Distributed Memory System
  5. Key-Value System
  6. Resource Sharing and Scheduling System
  7. Distributed Machine Learning
  8. Deep Learning
  9. Other Resources

Batch Computing

  • MapReduce: Simplified Data Processing on Large Clusters (2004 OSDI): A classic paper on distributed computing from Google. MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
    Authors: Jeffrey Dean, Sanjay Ghemawat

  • MapReduce Online (2010 NSDI): A modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing, and it can also reduce completion times and improve system utilization for batch jobs. It also supports online aggregation, which lets users see "early returns" from a job while it is still being computed.
    Authors: Tyson Condie, Neil Conway, Peter Alvaro, et al.
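The map/reduce contract described in the MapReduce entry can be sketched in a few lines of Python. This is a minimal single-process illustration under stated assumptions, not Google's implementation: `run_mapreduce`, `word_count_map`, and `word_count_reduce` are hypothetical names, and the "shuffle" is an in-memory group-by with no partitioning, distribution, or fault tolerance.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def run_mapreduce(
    inputs: Iterable[Tuple[str, str]],
    map_fn: Callable[[str, str], Iterable[Tuple[str, int]]],
    reduce_fn: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Hypothetical toy runner: map -> group by key -> reduce, all in one process."""
    intermediate: Dict[str, List[int]] = defaultdict(list)
    # Map phase: every input key/value pair yields intermediate key/value pairs.
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            # "Shuffle": collect all intermediate values under their intermediate key.
            intermediate[ikey].append(ivalue)
    # Reduce phase: merge all values sharing a key, visiting keys in sorted order.
    return {ikey: reduce_fn(ikey, values) for ikey, values in sorted(intermediate.items())}


def word_count_map(doc_name: str, contents: str) -> Iterable[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word; doc_name is unused here.
    for word in contents.split():
        yield word, 1


def word_count_reduce(word: str, counts: List[int]) -> int:
    # Merge all partial counts for one word.
    return sum(counts)


if __name__ == "__main__":
    docs = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog the end")]
    print(run_mapreduce(docs, word_count_map, word_count_reduce))
    # {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

Visiting keys in sorted order loosely mirrors the per-partition ordering guarantee the paper describes; a real system would instead hash-partition keys across many reduce workers.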

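To give a feel for the "early returns" idea in MapReduce Online, here is a hypothetical sketch in which a reducer consumes map output records as they are pipelined in and publishes snapshots of its partial result at fixed progress points. The function name `pipelined_count`, the snapshot schedule, and the assumption that the total number of records is known up front are illustrative choices, not the paper's API.

```python
from typing import Dict, Iterable, List, Tuple


def pipelined_count(
    map_outputs: Iterable[Tuple[str, int]],
    total: int,
    snapshot_points: Tuple[float, ...] = (0.25, 0.5, 0.75, 1.0),
) -> List[Tuple[float, Dict[str, int]]]:
    """Hypothetical reducer that aggregates records as they stream in.

    Assumes `total` (the number of map output records) is known in advance, so
    progress can be measured; a copy of the running aggregate is published each
    time progress reaches one of `snapshot_points`.
    """
    counts: Dict[str, int] = {}
    snapshots: List[Tuple[float, Dict[str, int]]] = []
    next_point = 0
    for i, (key, value) in enumerate(map_outputs, start=1):
        # Fold each record into the running aggregate as soon as it arrives.
        counts[key] = counts.get(key, 0) + value
        progress = i / total
        # Publish an "early return" for every snapshot point progress has reached.
        while next_point < len(snapshot_points) and progress >= snapshot_points[next_point]:
            snapshots.append((snapshot_points[next_point], dict(counts)))
            next_point += 1
    return snapshots


if __name__ == "__main__":
    records = [("a", 1), ("b", 1), ("a", 1), ("a", 1)]
    for progress, partial in pipelined_count(records, total=len(records)):
        print(f"{progress:.0%}: {partial}")
```

Each snapshot is the reduce output over the data seen so far, so a user can inspect approximate answers long before the job finishes; only the 100% snapshot is the final result.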
Distributed File System

  • The Google File System (2003 SOSP): The Google File System is a scalable distributed file system for large, data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. The main differences among others
    Authors: Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung

Key-Value System

  • Succinct: Enabling Queries on Compressed Data (2015 NSDI): Succinct is a data store that enables queries directly on a compressed representation of the input data. It achieves this with a compression technique based on compressed suffix arrays, which still allows fast queries. In addition, Succinct supports a range of queries, including counting and searching for arbitrary strings. What differentiates Succinct from previous storage systems is that it does not store indexes at all.
    Authors: Rachit Agarwal, Anurag Khandelwal, Ion Stoica
  • Bigtable: A Distributed Storage System for Structured Data (2006 OSDI): Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. A Bigtable is a sparse, distributed, persistent, multi-dimensional sorted map. The map is indexed by a row key, a column key, and a timestamp, and each value is an uninterpreted array of bytes. Bigtable is a single-master distributed storage system, and its underlying storage structure is a log-structured merge tree.
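The query side of the Succinct entry rests on suffix arrays: once all suffixes of the input are sorted, counting or locating an arbitrary string reduces to two binary searches. The sketch below uses a plain, uncompressed suffix array, so it shows only that search idea; the compression that is the point of Succinct is deliberately omitted, and the function names are hypothetical.

```python
from typing import List, Tuple


def build_suffix_array(text: str) -> List[int]:
    """Start offsets of all suffixes of `text`, ordered lexicographically by suffix.

    Naive construction that materializes every suffix: fine for a toy example,
    far too slow and memory-hungry for real data.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def suffix_range(text: str, sa: List[int], pattern: str) -> Tuple[int, int]:
    """Half-open range [lo, hi) of suffix-array slots whose suffix starts with `pattern`."""
    m = len(pattern)
    # Truncating sorted suffixes to m characters keeps them sorted, so binary search works.
    lo, hi = 0, len(sa)
    while lo < hi:  # first slot whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first slot whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def count(text: str, sa: List[int], pattern: str) -> int:
    # Number of occurrences equals the size of the matching suffix range.
    start, end = suffix_range(text, sa, pattern)
    return end - start


def search(text: str, sa: List[int], pattern: str) -> List[int]:
    # Offsets of all occurrences, returned in text order.
    start, end = suffix_range(text, sa, pattern)
    return sorted(sa[start:end])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa, count(text, sa, "ana"), search(text, sa, "ana"))
```

Both queries cost O(m log n) string comparisons on top of the array; Succinct's contribution is answering them without keeping the raw text or an uncompressed array like this one in memory.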

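The Bigtable data model, a sparse sorted map from (row key, column key, timestamp) to uninterpreted bytes, can be made concrete with a small in-memory class. This is a hypothetical sketch of the data model only: `ToyTable` and its methods are illustrative names, and it models none of the storage layer (log-structured merge tree, tablets, the single master).

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ToyTable:
    """Hypothetical in-memory model of (row, column, timestamp) -> bytes.

    Only cells that were written are stored, so the map is sparse.
    """

    def __init__(self) -> None:
        # row key -> column key -> versions as (timestamp, value), newest first
        self._rows: Dict[str, Dict[str, List[Tuple[int, bytes]]]] = {}
        # Row keys kept sorted so range scans return rows in lexicographic order.
        self._sorted_rows: List[str] = []

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        if row not in self._rows:
            self._rows[row] = {}
            bisect.insort(self._sorted_rows, row)
        versions = self._rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest version first

    def get(self, row: str, column: str, timestamp: Optional[int] = None) -> Optional[bytes]:
        """Newest value written at or before `timestamp` (or the newest overall)."""
        for ts, value in self._rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row: str, end_row: str) -> List[str]:
        """Row keys in [start_row, end_row), in sorted order."""
        lo = bisect.bisect_left(self._sorted_rows, start_row)
        hi = bisect.bisect_left(self._sorted_rows, end_row)
        return self._sorted_rows[lo:hi]


if __name__ == "__main__":
    t = ToyTable()
    # Illustrative keys: reversed-domain row keys, "family:qualifier" column keys.
    t.put("com.cnn.www", "contents:", 3, b"<html>v3")
    t.put("com.cnn.www", "contents:", 5, b"<html>v5")
    t.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
    print(t.get("com.cnn.www", "contents:"), t.get("com.cnn.www", "contents:", timestamp=4))
    print(t.scan("com.", "com.zzz"))
```

Keeping rows sorted is what makes range scans over adjacent row keys cheap, which is why choosing row keys (such as reversed domain names) matters in Bigtable schemas.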
Resource Sharing and Scheduling System

  • Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center (2011 NSDI): Mesos is a platform for sharing commodity clusters between multiple diverse cluster computing frameworks, such as Hadoop and MPI. Sharing improves cluster utilization and avoids per-framework data replication. Mesos shares resources in a fine-grained manner, allowing frameworks to achieve data locality by taking turns reading data stored on each machine.

Other Resources

  • Operating Systems Reading Group: The Operating Systems reading group from the University of Cambridge. It lists many interesting papers, covering both traditional operating systems and newer trends in operating systems, such as libOS and Plan 9. Pick the topics you are interested in and dive in.
  • Readings in Databases: A list of papers essential to understanding databases and building new data processing systems.
  • Stream Processing Papers: A collection of papers on stream processing systems and streaming algorithms.
