rsgaikwad/big_data

Simplify - Big Data Analytics

This repository contains the code for the Simplify Big Data Analytics workshop.

Workshop Outline

Day 1 - Big Data and Hadoop

Section 1: Understanding - Big Data

        1.1 Introduction to Big Data

        1.2 Characteristics of Big Data

        1.3 Big Data Technology

        1.4 Summary

Section 2: Building - Hadoop

        2.1 Introduction to Hadoop

        2.2 Application of Hadoop

        2.3 HDFS Features

        2.4 Architecture

        2.5 Read-Write Operations

        2.6 Set Up and Configure Hadoop

Section 3: HDFS

        3.1 Command Line Utilities 

        3.2 Hadoop Commands

        3.3 Practicals

        3.4 Summary

Section 4: MapReduce

        4.1 Case Study

        4.2 Introduction to MapReduce

        4.3 Practicals

        4.4 Summary           

Day 2 - Hadoop Ecosystem

Section 5: Hadoop Ecosystem - Hive

        5.1 Introduction to Hive

        5.2 Why Hive?

        5.3 Case Study

        5.4 Summary

Section 6: Hadoop Ecosystem - Pig

        6.1 Introduction to Pig

        6.2 Pig Components

        6.3 Case Study

        6.4 Summary

Section 7: Hadoop Ecosystem - Sqoop

        7.1 Introduction to Sqoop

        7.2 Sqoop Workflow

        7.3 Commands

        7.4 Summary  

Section 8: Hadoop Ecosystem - Impala

        8.1 Introduction to Impala

        8.2 Components of Impala

        8.3 Impala Integrations

        8.4 Summary  

Day 3 - Analytics Tools

Section 9: Hadoop Ecosystem - Spark

        9.1 Introduction to Spark

        9.2 Understanding Real-Time Analytics

        9.3 Spark Features

        9.4 Summary  

Section 10: Python

        10.1 Introduction to Python

        10.2 Programming Basics

        10.3 Application of Python

        10.4 Summary   

Section 11: NoSQL Database - Elasticsearch

        11.1 Introduction to Elasticsearch

        11.2 ES Key Concepts

        11.3 ES Operations

        11.4 Analytics using ES            

Installation Instructions

System Requirements:

OS : CentOS 6 / macOS

RAM : 8+ GB

HDD : 50 GB (non-root)
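
The RAM and disk figures above can be checked before downloading the VMs. The following is a minimal sketch, not part of the workshop material: the function names `meets_requirements` and `free_disk_gb` are hypothetical, the RAM value is passed in by hand (look it up in your OS settings), and it assumes the VM images will be stored under the path you give it.

```python
import shutil
import sys

MIN_RAM_GB = 8     # from "RAM : 8+ GB"
MIN_DISK_GB = 50   # from "HDD : 50 GB"


def free_disk_gb(path):
    """Free space, in GB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 1024 ** 3


def meets_requirements(ram_gb, disk_gb):
    """Return a list of problems; an empty list means both checks pass."""
    problems = []
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB available, {MIN_RAM_GB} GB needed")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb:.1f} GB free, {MIN_DISK_GB} GB needed")
    return problems


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_requirements.py <ram_gb> [path]
    ram = float(sys.argv[1]) if len(sys.argv) > 1 else 0.0
    target = sys.argv[2] if len(sys.argv) > 2 else "."
    issues = meets_requirements(ram, free_disk_gb(target))
    print("OK" if not issues else "\n".join(issues))
```

Run it with the installed RAM in GB and the directory where the VMs will live; any line it prints other than `OK` names a requirement that is not met.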

Software Requirements:

Please download and install the following software and VMs:

Latest VirtualBox : https://www.virtualbox.org/wiki/Downloads

Latest VMware Workstation Player : https://my.vmware.com/en/web/vmware/free#desktop_end_user_computing/vmware_workstation_player/15_0

VMware Fusion (for Mac) : https://my.vmware.com/en/web/vmware/info/slug/desktop_end_user_computing/vmware_fusion/11_0

PuTTY (for Windows) / Terminal : https://www.putty.org/

Browser : Firefox / Chrome

Cloudera QuickStart VM for VirtualBox : https://www.cloudera.com/downloads/quickstart_vms/5-13.html

Download the Elasticsearch virtual machine (Bitnami):

https://bitnami.com/stack/elasticsearch/virtual-machine

https://docs.bitnami.com/virtual-machine/faq/get-started/connect-ssh/

https://docs.bitnami.com/virtual-machine/faq/get-started/find-credentials/

Create a GitHub account:

https://github.com/

Set up Elasticsearch on Elastic Cloud:

https://www.elastic.co/cloud/elasticsearch-service/signup
