Project Learnings:
-
Ran analytics on a 1.5 GB dataset by writing queries against HBase.
-
Learned the Hadoop ecosystem and its components.
-
Set up HBase on the Cloudera platform.
-
Performed CRUD operations (create, read, update, and delete) on HBase.
-
Investigated several distributed-systems aspects:
  a. Scalability: by adding multiple HBase RegionServers
  b. Fault tolerance: by examining Hadoop's fault-tolerance and replication mechanisms
  c. Data replication: by investigating the HDFS data replication mechanism
  d. Data consistency: by performing concurrent CRUD operations
-
Set up HBase on an Amazon EMR instance and ran analytics on the dataset by writing queries against that AWS-hosted HBase.
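
The CRUD operations listed above map onto HBase's data model in a way that differs from a relational database: a cell is addressed by row key and "family:qualifier" column, an update is simply a put that adds a newer timestamped version, and a delete writes a tombstone marker instead of erasing data in place. The following is a minimal in-memory sketch of those semantics in Python. It is not the HBase client API and not the project's actual code; the class name ToyTable, the logical clock, and the sample rows are all hypothetical and exist only for illustration.

```python
import itertools
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for one HBase table (hypothetical, not the HBase API)."""

    def __init__(self):
        # row key -> "family:qualifier" -> list of (timestamp, value) cells
        self._rows = defaultdict(lambda: defaultdict(list))
        # Assumption: a logical counter replaces HBase's millisecond timestamps.
        self._clock = itertools.count(1)

    def put(self, row, column, value):
        """Create or update: both append a new timestamped version."""
        ts = next(self._clock)
        self._rows[row][column].append((ts, value))
        return ts

    def get(self, row, column):
        """Read the newest version; None if the cell is absent or deleted."""
        versions = self._rows.get(row, {}).get(column, [])
        if not versions:
            return None
        _, value = max(versions, key=lambda cell: cell[0])
        return value  # None means the newest cell is a tombstone

    def delete(self, row, column):
        """Delete by writing a tombstone (None) rather than erasing history."""
        return self.put(row, column, None)

    def scan(self, start_row, stop_row):
        """Yield live cells with start_row <= row < stop_row, in row-key order."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                for column in sorted(self._rows[row]):
                    value = self.get(row, column)
                    if value is not None:
                        yield row, column, value


# Hypothetical sample data, for illustration only.
table = ToyTable()
table.put("user#001", "info:city", "Pune")     # create
table.put("user#001", "info:city", "Mumbai")   # update = newer version
table.put("user#002", "info:city", "Delhi")    # create
table.delete("user#002", "info:city")          # delete = tombstone

print(table.get("user#001", "info:city"))      # Mumbai
print(list(table.scan("user#001", "user#999")))
# [('user#001', 'info:city', 'Mumbai')]
```

The scan walks row keys in sorted order with an inclusive start and exclusive stop, which is why row-key design matters for range queries over a dataset like this one.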