HDFS_replication

HDFS partitions large files into blocks so that storage is spread across many workers, and it replicates those blocks so that data is not lost even if some workers die.

In this project, we deploy a small HDFS cluster and upload a large file to it under different replication settings. We then write Python code to read the file back; when data is partially lost (because a node has failed), the code recovers as much of the damaged file as possible.
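A minimal sketch of such a reader is below. It assumes a NameNode serving WebHDFS at localhost:9870, the default 128 MB block size, and no authentication; the endpoint, block size, and the zero-padding recovery strategy are illustrative assumptions, not the project's exact implementation.

```python
import requests

NAMENODE = "http://localhost:9870"        # assumed NameNode WebHDFS address
BLOCK_SIZE = 128 * 1024 * 1024            # assumed HDFS block size (128 MB default)

def file_length(path):
    """Ask the NameNode for the file's length via GETFILESTATUS."""
    r = requests.get(f"{NAMENODE}/webhdfs/v1{path}", params={"op": "GETFILESTATUS"})
    r.raise_for_status()
    return r.json()["FileStatus"]["length"]

def read_with_recovery(path):
    """Read one block at a time, substituting zeros for blocks whose replicas are gone."""
    size = file_length(path)
    data = bytearray()
    offset = 0
    while offset < size:
        want = min(BLOCK_SIZE, size - offset)
        r = requests.get(
            f"{NAMENODE}/webhdfs/v1{path}",
            params={"op": "OPEN", "offset": offset, "length": want},
        )
        if r.status_code == 200:
            data.extend(r.content)          # healthy block: keep its bytes
        else:
            data.extend(b"\x00" * want)     # all replicas lost: pad and move on
        offset += want
    return bytes(data)
```

Because the reads are block-aligned, a single unreadable block costs at most one block's worth of bytes, and everything before and after it is still recovered.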

Learning objectives:

- use the HDFS command line client to upload files;
- use the webhdfs API (https://hadoop.apache.org/docs/r1.0.4/webhdfs.html) to read files, as in the sketch above;
- measure the impact buffering has on read performance (see the timing sketch after this list);
- relate replication count to space efficiency and fault tolerance (see the worked example after this list).
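To make the buffering objective concrete, here is a small timing sketch. It compares many small ranged reads against fewer large ones over the same number of bytes; the file path, chunk sizes, and the upload command in the comment are illustrative assumptions, and the file is assumed to be at least 16 MB.

```python
import time
import requests

NAMENODE = "http://localhost:9870"   # assumed NameNode WebHDFS address
PATH = "/bigfile.txt"                # hypothetical file, e.g. uploaded with:
                                     #   hdfs dfs -D dfs.replication=2 -put bigfile.txt /bigfile.txt

def timed_read(chunk_size, total=16 * 1024 * 1024):
    """Fetch `total` bytes in `chunk_size` ranged requests; return elapsed seconds."""
    start = time.time()
    offset = 0
    while offset < total:
        r = requests.get(
            f"{NAMENODE}/webhdfs/v1{PATH}",
            params={"op": "OPEN", "offset": offset, "length": chunk_size},
        )
        r.raise_for_status()
        offset += chunk_size
    return time.time() - start

for size in (4 * 1024, 64 * 1024, 1024 * 1024):
    print(f"{size:>8}-byte chunks: {timed_read(size):.2f} s")
```

Larger chunks amortize the per-request HTTP and redirect overhead, which is typically where most of the difference shows up.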
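For the last objective, the arithmetic is a worked example rather than project output: with replication 3, a 1 GB file consumes roughly 3 GB of raw cluster storage, but every block survives the loss of any two DataNodes holding its replicas; with replication 1, the same file costs only 1 GB, yet a single failed worker can make some blocks permanently unreadable.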
