
A secondary sort comparison between MapReduce and Spark


stdatalabs/secondarysort


MapReduce VS Spark - Secondary Sort Example

Compares MapReduce and Spark using a secondary sort example.

Requirements

  • An IDE
  • Apache Maven 3.x
  • JVM 6 or 7

General Info

The repository contains two projects: MRSecondarySort (MapReduce) and SparkSecondarySort (Spark).

  • com/stdatalabs/SparkSecondarySort
    • Driver.scala -- Spark code to perform Secondary Sorting
  • com/stdatalabs/MRSecondarySort
    • PersonMapper.java -- Reads lastname and firstname and outputs (Person, firstname) as a key-value pair
    • PersonReducer.java -- Receives the (Person, firstname) key-value pairs and outputs a sorted list of (lastname, firstname) pairs in 2 output files
    • PersonPartitioner.java -- Partitions the Person composite key based on lastname
    • PersonSortingComparator.java -- Sorts the mapper output based on lastname and then firstname
    • PersonGroupingComparator.java -- Groups keys, together with their lists of values, before they are sent to the reducer
    • Driver -- Driver program for MapReduce jobs
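To show how the partitioner, sorting comparator and grouping comparator listed above fit together, here is a minimal plain-Java simulation of the shuffle. It does not use the Hadoop API and is not this repository's code. The class name `SecondarySortSketch`, the helper `run`, the comparators `SORT` and `GROUP`, and the sample names are all hypothetical. It assumes two things. First, partitioning uses a hash-modulo scheme on lastname, in the style of Hadoop's `HashPartitioner`. Second, grouping is by lastname only. For brevity it targets Java 8+, not the JVM 6/7 listed in the requirements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical plain-Java simulation of the MapReduce secondary sort pattern (no Hadoop API).
public class SecondarySortSketch {

    // Composite key, analogous to the Person key emitted by PersonMapper.
    static final class Person {
        final String lastName;
        final String firstName;
        Person(String lastName, String firstName) {
            this.lastName = lastName;
            this.firstName = firstName;
        }
    }

    // Analogue of PersonPartitioner (assumed hash-modulo scheme): only lastName is used,
    // so every record with the same last name lands in the same partition (reducer).
    static int partition(Person key, int numPartitions) {
        return (key.lastName.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Analogue of PersonSortingComparator: order by lastName, then by firstName.
    static final Comparator<Person> SORT =
        Comparator.comparing((Person p) -> p.lastName).thenComparing((Person p) -> p.firstName);

    // Analogue of PersonGroupingComparator (assumed): keys with equal lastName form one group.
    static final Comparator<Person> GROUP = Comparator.comparing((Person p) -> p.lastName);

    // Simulates partition -> sort -> group -> reduce and returns one "lastName<TAB>firstName"
    // line per input record, listed partition by partition.
    static List<String> run(List<Person> records, int numPartitions) {
        List<List<Person>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) partitions.add(new ArrayList<>());
        for (Person p : records) partitions.get(partition(p, numPartitions)).add(p);

        List<String> output = new ArrayList<>();
        for (List<Person> part : partitions) {
            part.sort(SORT);                       // sort phase inside each partition
            int i = 0;
            while (i < part.size()) {              // one simulated reduce call per group
                Person groupKey = part.get(i);
                List<String> firstNames = new ArrayList<>();
                while (i < part.size() && GROUP.compare(groupKey, part.get(i)) == 0) {
                    firstNames.add(part.get(i).firstName);
                    i++;
                }
                // The "reducer" receives first names already in sorted order.
                for (String first : firstNames) output.add(groupKey.lastName + "\t" + first);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Person> input = Arrays.asList(
            new Person("Smith", "John"), new Person("Doe", "Jane"),
            new Person("Smith", "Adam"), new Person("Doe", "Alice"));
        // Two partitions, loosely mirroring the 2 output files mentioned above.
        run(input, 2).forEach(System.out::println);
    }
}
```

The sort comparator must order records by both fields. The partitioner and the grouping comparator look only at the natural key. This split is what lets each reduce call receive its values already sorted, without buffering them in memory.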

Description

More articles on the Hadoop technology stack are available at stdatalabs.
