Suppose we have the following two datasets:
- Users (id, email, language, location)
- Transactions (transaction-id, product-id, user-id, purchase-amount, item-description)
Users (id, email, language, location)
| user_id | email | language | location |
|---|---|---|---|
| 1001 | A@gmail.com | English | Germany |
| 1002 | B@gmail.com | Arabic | Egypt |
| 1003 | C@yahoo.com | French | Canada |
| 1004 | D@fcih.edu.eg | English | Egypt |
Transactions (transaction-id, product-id, user-id, purchase-amount, item-description)
| t_id | prod_id | user_id | pur_amount | item_desc |
|---|---|---|---|---|
| 1001 | 5 | 1003 | 500 | Description |
| 1002 | 6 | 1004 | 700 | Description |
| 1003 | 2 | 1004 | 400 | Description |
| 1004 | 2 | 1001 | 100 | Description |
The goal is to find the number of unique locations in which each product has been sold. To do that, we need to join the two datasets and then extract the information we want.
- Create a new dataset that contains only three columns (user_id, prod_id, location). But how? This is where our first JOB comes in: its mapper transforms records from both tables into the following shape (a sketch of the mapper follows the table).
| user_id | prod_id | location |
|---|---|---|
| 1001 | - | Germany |
| 1001 | 2 | - |
| 1002 | - | Egypt |
| 1003 | - | Canada |
| 1003 | 5 | - |
| 1004 | - | Egypt |
| 1004 | 6 | - |
| 1004 | 2 | - |
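Here is a minimal Hadoop Streaming sketch of that mapper, assuming comma-separated input lines and that the two record types can be told apart by their field count; the file name `job1_mapper.py` is illustrative.

```python
#!/usr/bin/env python
# job1_mapper.py (illustrative name): key every record by user_id so that
# a user's location and purchases meet at the same reducer.
import sys

for line in sys.stdin:
    fields = line.strip().split(',')
    if len(fields) == 4:
        # Users record: id,email,language,location
        user_id, _email, _language, location = fields
        # '-' stands in for the missing prod_id column, as in the table above
        print '%s\t%s\t%s' % (user_id, '-', location)
    elif len(fields) == 5:
        # Transactions record: t_id,prod_id,user_id,pur_amount,item_desc
        _t_id, prod_id, user_id, _amount, _desc = fields
        print '%s\t%s\t%s' % (user_id, prod_id, '-')
```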
The reducer then joins each user's purchases with that user's location, reducing the data to each prod_id with its location, as follows (a sketch of the reducer is given after the table):
| prod_id | location |
|---|---|
| 5 | Canada |
| 6 | Egypt |
| 2 | Germany |
| 2 | Egypt |
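A matching reducer sketch, relying on Hadoop Streaming delivering the mapper output sorted by user_id (again, the file name is illustrative):

```python
#!/usr/bin/env python
# job1_reducer.py (illustrative name): the shuffle phase hands us the mapper
# output sorted by user_id, so each user's rows arrive as one group.
import sys

current_user, location, products = None, None, []

def flush():
    # emit one (prod_id, location) pair for every product this user bought
    if location is not None:
        for prod_id in products:
            print '%s\t%s' % (prod_id, location)

for line in sys.stdin:
    user_id, prod_id, loc = line.strip().split('\t')
    if user_id != current_user:
        flush()                     # finish the previous user's group
        current_user, location, products = user_id, None, []
    if prod_id == '-':
        location = loc              # row came from Users
    else:
        products.append(prod_id)    # row came from Transactions
flush()                             # finish the last group
```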
Now that JOB ONE is finished, what remains? We start the SECOND JOB to count the locations in which each product has been purchased. Here the mapper does nothing except read the data and pass it on to the reducer, and the reducer does the counting to produce the following result (both scripts are sketched after the table).
| prod_id | number of unique locations |
|---|---|
| 5 | 1 |
| 6 | 1 |
| 2 | 2 |
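Minimal sketches of both JOB TWO scripts, again with illustrative file names; the reducer assumes Hadoop has already sorted the (prod_id, location) pairs by prod_id.

```python
#!/usr/bin/env python
# job2_mapper.py (illustrative name): identity mapper -- the shuffle phase
# does the real work of grouping the pairs by prod_id.
import sys

for line in sys.stdin:
    print line.strip()
```

```python
#!/usr/bin/env python
# job2_reducer.py (illustrative name): count distinct locations per product.
import sys

current_prod, locations = None, set()

for line in sys.stdin:
    prod_id, location = line.strip().split('\t')
    if prod_id != current_prod:
        if current_prod is not None:
            print '%s\t%d' % (current_prod, len(locations))
        current_prod, locations = prod_id, set()
    locations.add(location)        # set() keeps each location only once

if current_prod is not None:
    print '%s\t%d' % (current_prod, len(locations))
```

With Hadoop Streaming, each job is submitted by pointing the streaming jar at these scripts via `-mapper` and `-reducer` (shipping them with `-file`); since the second job's mapper is a pure pass-through, the Unix `cat` command could serve as its mapper instead.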
The examples in this walkthrough assume the following environment:
- Python 2.7
- Hadoop 2.8.1