
Java program that uses Hadoop Map-Reduce for calculating the number of products and sales by country


nikopetr/Hadoop-MapReduce-Calculating-Sales-by-Country


Hadoop-MapReduce-Calculating-Sales-by-Country

The implementation is a Java program that uses the Hadoop Map-Reduce framework to calculate the number of products and the sales by country.

Author: Nikolas Petrou, MSc in Data Science

Task and Data

Specifically, the task is to find the number of products and the sum of sales per country, given the input file SalesJan2009.csv.

Download the aforementioned CSV file and upload it to your own HDFS filesystem before running the job.

Implementation

Output file example (part-r-00000):

  • Argentina 1 1200
  • Australia 38 64800
  • Austria 7 10800

The main idea of the solution is to use the country name as the Key, so that every row with the same country shares the same Key. The value emitted by each mapper is the price (sales) of that row.
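
The grouping idea above can be sketched in plain Java, without the Hadoop classes, to show what the map and reduce steps compute. This is only an illustration and not the code in Sales.java: the class name, the column positions (`PRICE_COLUMN`, `COUNTRY_COLUMN`) and the sample rows are assumptions made for the example, and prices are assumed to be plain integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the map/shuffle/reduce flow (no Hadoop dependency).
class SalesByCountrySketch {
    // Assumed column positions; adjust them to the real CSV header.
    static final int PRICE_COLUMN = 2;
    static final int COUNTRY_COLUMN = 7;

    // "Map" step: emit (country, price) for one CSV row,
    // or null when the row cannot be parsed (for example the header line).
    static Map.Entry<String, Long> map(String row) {
        String[] fields = row.split(",");
        if (fields.length <= COUNTRY_COLUMN) {
            return null;
        }
        try {
            long price = Long.parseLong(fields[PRICE_COLUMN].trim());
            return Map.entry(fields[COUNTRY_COLUMN].trim(), price);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // "Reduce" step: all prices of one country become {count, sum}.
    static long[] reduce(List<Long> prices) {
        long count = 0;
        long sum = 0;
        for (long price : prices) {
            count++;
            sum += price;
        }
        return new long[] {count, sum};
    }

    // Stand-in for the framework: group mapper output by key, then reduce each group.
    static Map<String, long[]> run(List<String> rows) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String row : rows) {
            Map.Entry<String, Long> pair = map(row);
            if (pair != null) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, long[]> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only; they are not taken from SalesJan2009.csv.
        List<String> rows = List.of(
            "Transaction_date,Product,Price,Payment_Type,Name,City,State,Country",
            "1/2/09 6:17,Product1,1200,Visa,Alice,Sydney,NSW,Australia",
            "1/3/09 9:40,Product2,3600,Visa,Bob,Perth,WA,Australia",
            "1/4/09 8:05,Product1,1200,Visa,Carol,Vienna,Vienna,Austria");
        run(rows).forEach((country, stats) ->
            System.out.println(country + " " + stats[0] + " " + stats[1]));
    }
}
```

In a real Hadoop job the grouping done by `run` is performed by the framework's shuffle phase, so only the map and reduce logic would need to be written.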

In addition, since this task needs to output multiple values for each key, the code uses a custom class that implements the Writable interface. Any custom Hadoop data type that is used as a value field in MapReduce programs must implement the org.apache.hadoop.io.Writable interface.
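
As a rough sketch of what such a value class could look like, the example below holds a count and a sum and serializes them with the same two method signatures that org.apache.hadoop.io.Writable declares (`write(DataOutput)` and `readFields(DataInput)`). The class name `CountAndSum`, its fields and the `roundTrip` helper are hypothetical and not taken from Sales.java. To keep the example runnable without Hadoop on the classpath it does not declare `implements Writable`; in an actual job that clause would be added.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical value type carrying two numbers per key.
// In a Hadoop job this class would declare "implements org.apache.hadoop.io.Writable".
class CountAndSum {
    private long count;
    private long sum;

    // No-argument constructor: Hadoop instantiates Writables reflectively
    // and then fills them through readFields.
    CountAndSum() {
    }

    CountAndSum(long count, long sum) {
        this.count = count;
        this.sum = sum;
    }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeLong(count);
        out.writeLong(sum);
    }

    // Deserialize the fields in the same order they were written.
    public void readFields(DataInput in) throws IOException {
        count = in.readLong();
        sum = in.readLong();
    }

    long getCount() {
        return count;
    }

    long getSum() {
        return sum;
    }

    // Text output formats print values through toString.
    @Override
    public String toString() {
        return count + " " + sum;
    }

    // Helper for this example only: write a value to bytes and read it back.
    static CountAndSum roundTrip(CountAndSum value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        value.write(new DataOutputStream(buffer));
        CountAndSum copy = new CountAndSum();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        CountAndSum copy = roundTrip(new CountAndSum(38, 64800));
        System.out.println("Australia " + copy);
    }
}
```

The important design point is that `write` and `readFields` must handle the fields in the same order, since the serialized form carries no field names.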

The expected output of the program is in the part-r-00000 file, and the source code is in the Sales.java file. The code is thoroughly commented to explain the whole implementation.

Helpful Material-Links

If you are not very familiar with the Hadoop Map-Reduce framework, the following sites provide useful information for understanding some basic concepts, as well as some of the ideas of this task:

Fundamentals of MapReduce with MapReduce Example

Creating Custom Hadoop Writable Data Type

MSc in Data Science Programme
