HDFS can partition large files into blocks to spread storage across many workers, and it can replicate those blocks so that data is not lost even if some workers die.
In this project, we deploy a small HDFS cluster and upload a large file to it under different replication settings. We then write Python code to read the file; when data is partially lost (because a node has failed), the code recovers as much of the file as possible from the damaged copy.
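A minimal sketch of that recovery idea is below. It reads the file one block-sized range at a time through the WebHDFS OPEN operation and, when every replica of a range is unreachable, zero-fills the gap and keeps going. The NameNode address, file path, and block size are hypothetical placeholders, not values from this write-up; adjust them to match the actual cluster.

```python
import requests

# Hypothetical values, not from the write-up: change these to match the cluster.
NAMENODE = "http://localhost:9870"   # WebHDFS port for Hadoop 3.x; 2.x uses 50070
PATH = "/data/large.txt"
BLOCK_SIZE = 128 * 1024 * 1024       # HDFS default block size

def file_length(path):
    """Ask the NameNode for the file's length (WebHDFS GETFILESTATUS)."""
    r = requests.get(f"{NAMENODE}/webhdfs/v1{path}?op=GETFILESTATUS")
    r.raise_for_status()
    return r.json()["FileStatus"]["length"]

def read_with_recovery(path):
    """Read the file one block-sized range at a time, zero-filling lost ranges."""
    total = file_length(path)
    data = bytearray()
    lost = 0
    for offset in range(0, total, BLOCK_SIZE):
        length = min(BLOCK_SIZE, total - offset)
        url = (f"{NAMENODE}/webhdfs/v1{path}"
               f"?op=OPEN&offset={offset}&length={length}")
        try:
            r = requests.get(url)      # the NameNode redirects us to a DataNode
            r.raise_for_status()
            data.extend(r.content)
        except requests.RequestException:
            # Every replica of this range is unreachable: record the gap and
            # keep going so the rest of the file is still recovered.
            lost += length
            data.extend(b"\x00" * length)
    print(f"recovered {total - lost} of {total} bytes")
    return bytes(data)

if __name__ == "__main__":
    read_with_recovery(PATH)
```

Reading in block-sized ranges matters here: a single full-file OPEN fails as soon as it hits the first missing block, while per-range requests let the client salvage every range that still has a live replica.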
Learning objectives:
use the HDFS command line client to upload files;
use the WebHDFS API (https://hadoop.apache.org/docs/r1.0.4/webhdfs.html) to read files;
measure the impact buffering has on read performance (see the sketch after this list);
relate replication count to space efficiency and fault tolerance.
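For the buffering objective, one simple way to see the effect (a sketch, again using the hypothetical endpoint and path from above) is to stream the same file over WebHDFS and time the read with different client-side chunk sizes; the OPEN operation also accepts a buffersize parameter on the server side, but here only the client buffer is varied.

```python
import time
import requests

# Same hypothetical NameNode endpoint and file path as the sketch above.
NAMENODE = "http://localhost:9870"
PATH = "/data/large.txt"

def timed_read(chunk_size):
    """Stream the whole file over WebHDFS, pulling chunk_size bytes at a time."""
    url = f"{NAMENODE}/webhdfs/v1{PATH}?op=OPEN"
    start = time.time()
    nbytes = 0
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        for chunk in r.iter_content(chunk_size=chunk_size):
            nbytes += len(chunk)
    return nbytes, time.time() - start

if __name__ == "__main__":
    # A large buffer means far fewer reads per byte, so it should finish
    # noticeably faster than the tiny one.
    for size in (64, 4096, 1024 * 1024):
        nbytes, secs = timed_read(size)
        print(f"chunk_size={size:>8}  read {nbytes} bytes in {secs:.2f}s")
```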