Working:
Compilation:
Compile the program to get the class file and create a JAR to run on top of HDFS in Hadoop
Preferred JAR creation - Eclipse IDE
Extract input.zip to get input.csv
Execution:
A sample input file is provided from input/input.csv
Upload the JAR and the input file to HDFS
[Load the JAR to either the home directory or the bin directory in HDFS]
At the terminal, run the following
{
hadoop -jar [jar name] input/input.csv output/output.csv
}
An output directory is automatically created
When the job is completed, the output directory will have output.csv
Analysis:
Based on the need, a program like Microsoft Excel may be used to do analysis.
Future:
Other Hadoop ecosystems will be added for analysis and the program itself will be more generic and abstract such that any input can create any output efficiently on any given data set.
Nice to know:
Hadoop version : 1.5.x [stable]
Map Reduce version : 2.7
Eclipse IDE version : Kepler
OS Developed On : Mac OSX Yosemite
OS Tested On : RHEL
Cluster Config :
Hadoop-
Name Node : 1
Data Nodes : 2
Map Reduce-
Job Trackers : 1
Task Trackers : 2
Replication Factor : 2
Number of Mappers : 10
Number of Reducers : 2
-
Notifications
You must be signed in to change notification settings - Fork 0
Group input splits based on city and calculate total wages using Hadoop Map Reduce
License
gangodu/mapreduce
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
Group input splits based on city and calculate total wages using Hadoop Map Reduce
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published