Course Content
Mathan raj edited this page Oct 19, 2016 · 1 revision
# Big Data Course Syllabus #
This syllabus was compiled from sources on the internet to serve as a skeleton for what we are going to learn (no copyright infringement is intended).
Introduction to Hadoop and Big Data:
- What is Big Data?
- What are the challenges for processing big data?
- What technologies support big data?
- What is Hadoop?
- Why Hadoop?
- History of Hadoop
- Use cases of Hadoop
- RDBMS vs Hadoop
- When to use and when not to use Hadoop
- Ecosystem tour
- Vendor comparison
- Hardware Recommendations & Statistics
HDFS (Hadoop Distributed File System):
- Significance of HDFS in Hadoop
- Features of HDFS
- 5 daemons of Hadoop
- Name Node and its functionality
- Data Node and its functionality
- Secondary Name Node and its functionality
- Job Tracker and its functionality
- Task Tracker and its functionality
 
- Data Storage in HDFS
- Introduction about Blocks
- Data replication
 
- Accessing HDFS
- CLI (Command line Interface) and admin commands
- Java Based Approach
 
- Fault tolerance
- Download Hadoop
- Installation and set-up of Hadoop
- Start-up & Shut down process
 
- HDFS Federation
YARN
Map Reduce:
- Map Reduce Story
- Map Reduce Architecture
- How Map Reduce works
- Developing Map Reduce
- Map Reduce Programming Model
- Different phases of the Map Reduce algorithm
- Different data types in Map Reduce
- How to write a basic Map Reduce program
- Driver Code
- Mapper
- Reducer
 
 
- Creating Input and Output Formats in Map Reduce Jobs
- Text Input Format
- Key Value Input Format
- Sequence File Input Format
1. Data localization in Map Reduce
2. Combiner (Mini Reducer) and Partitioner
3. Hadoop I/O
4. Distributed cache
 
PIG
- Introduction to Apache Pig
- Map Reduce Vs. Apache Pig
- SQL vs. Apache Pig
- Different data types in Pig
- Modes of Execution in Pig
- Grunt shell
- Loading data
- Exploring Pig Latin commands
HIVE
- Hive introduction
- Hive architecture
- Hive vs RDBMS
- HiveQL and the shell
- Managing tables (external vs managed)
- Data types and schemas
- Partitions and buckets
HBASE
- Architecture and schema design
- HBase vs. RDBMS
- HMaster and Region Servers
- Column Families and Regions
- Write pipeline
- Read pipeline
- HBase commands
FLUME
SQOOP