GitHub - gajrajgchouhan/Hadoop-Assignment: Hadoop Assignment of CS309 course (2021) at IIT Mandi.

Assignment

A dataset contains a transaction id followed by the prices of the items purchased, e.g. 1: 100 200 400 500. The syntax is txid: p1 p2 p3, where txid is the transaction id and pi denotes the price of the i-th item. Each transaction can have a variable number of items.

Write a sequential program to generate the input data: create 10 million transaction records, each containing a variable number of items randomly generated between 1 and 50. The price of each item is another random variable whose range is 100 to 5000.
Store the text file in HDFS.
Write a MapReduce program that partitions the text file into 5 classes as follows: class 1 contains transactions whose item count is between 1 and 10, class 2 those with 11-20 items, and so on up to class 5 with 41-50 items. The program then computes the total amount obtained for each class.
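The first task, generating the input data, can be sketched as follows. This is a minimal sketch and not the repository's generator (which is a cpp file): the class name `TxGenerator`, the output file name `transactions.txt`, and the choice of uniform sampling are all assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Random;

// Hypothetical generator sketch; names and the uniform sampling are assumptions.
class TxGenerator {
    static final int MIN_ITEMS = 1, MAX_ITEMS = 50;
    static final int MIN_PRICE = 100, MAX_PRICE = 5000;

    // Builds one record in the "txid: p1 p2 ... pn" format.
    static String record(long txid, Random rng) {
        // nextInt(n) returns a value in [0, n), so both bounds are inclusive here.
        int items = MIN_ITEMS + rng.nextInt(MAX_ITEMS - MIN_ITEMS + 1);
        StringBuilder sb = new StringBuilder().append(txid).append(':');
        for (int i = 0; i < items; i++) {
            int price = MIN_PRICE + rng.nextInt(MAX_PRICE - MIN_PRICE + 1);
            sb.append(' ').append(price);
        }
        return sb.toString();
    }

    // Writes `count` records, one per line, with ids starting at 1.
    static void write(Writer out, long count, Random rng) throws IOException {
        for (long txid = 1; txid <= count; txid++) {
            out.write(record(txid, rng));
            out.write('\n');
        }
    }

    public static void main(String[] args) throws IOException {
        // Small default so the sketch runs quickly; pass 10000000 for the full dataset.
        long count = args.length > 0 ? Long.parseLong(args[0]) : 1000;
        try (BufferedWriter out = Files.newBufferedWriter(Paths.get("transactions.txt"))) {
            write(out, count, new Random());
        }
    }
}
```

The resulting file could then be copied into HDFS, for example with `hdfs dfs -put transactions.txt` followed by a target directory of your choice.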

input:
1: 100 200 400 500
2: 10 50 5 25 89 20 35 91 78 82 150 125
3: 100 300

Here tx 1 and tx 3 belong to class 1, as the item count for each is between 1 and 10, while tx 2, with 12 items, belongs to class 2. The total amount from the sales for class 1 is the sum of the costs of all the items from tx 1 and tx 3.

e.g. output:
class1, 1600
class2, 760
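The classification and summation logic of the example can be sketched in plain Java, without the Hadoop API. This is an illustration of the map and reduce steps only, not the repository's Lab7.java: the class name `ClassTotals` and its method names are hypothetical, and the sketch assumes every record has at least one item, as the assignment specifies.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch of the map and reduce logic (no Hadoop types).
class ClassTotals {
    // Class 1 covers 1-10 items, class 2 covers 11-20, ..., class 5 covers 41-50.
    static int classOf(int itemCount) {
        return (itemCount - 1) / 10 + 1;
    }

    // "Map" step: parse one record and emit (class label, sum of its prices).
    static Map.Entry<String, Long> map(String line) {
        String[] prices = line.substring(line.indexOf(':') + 1).trim().split("\\s+");
        long sum = 0; // long, since class totals over millions of records overflow int
        for (String p : prices) {
            sum += Long.parseLong(p);
        }
        return new AbstractMap.SimpleEntry<>("class" + classOf(prices.length), sum);
    }

    // "Reduce" step: add up the per-transaction sums for each class label.
    static Map<String, Long> totals(List<String> lines) {
        Map<String, Long> out = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Long> kv = map(line);
            out.merge(kv.getKey(), kv.getValue(), Long::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "1: 100 200 400 500",
            "2: 10 50 5 25 89 20 35 91 78 82 150 125",
            "3: 100 300");
        // Prints "class1, 1600" and "class2, 760" for the example input.
        totals(input).forEach((k, v) -> System.out.println(k + ", " + v));
    }
}
```

In a real Hadoop job the same two steps would live in a mapper and a reducer class, with the framework handling the grouping by key that `totals` does here with a `TreeMap`.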

Output

file - part-r-00000

Class1  28037128510
Class2  79033028706
Class3  130212783758
Class4  180928360907
Class5  232005308376
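As a rough sanity check on these totals, the expected value for each class can be computed by hand. The sketch below assumes that item counts are uniform over 1-50 and prices are uniform over 100-5000; the assignment only says "randomly generated", so uniformity is an assumption, and the class name `ExpectedTotals` is hypothetical.

```java
// Hypothetical sanity check: expected class totals under a uniform-sampling assumption.
class ExpectedTotals {
    static final long TRANSACTIONS = 10_000_000L;
    static final double MEAN_PRICE = (100 + 5000) / 2.0; // 2550

    // Expected total for class k (1..5).
    static double expected(int k) {
        double txInClass = TRANSACTIONS * 10 / 50.0;          // 10 of 50 counts fall in each class
        double meanItems = (10 * (k - 1) + 1 + 10 * k) / 2.0; // 5.5, 15.5, ..., 45.5
        return txInClass * meanItems * MEAN_PRICE;
    }

    public static void main(String[] args) {
        // Totals reported in part-r-00000; they exceed the int range, hence long.
        long[] reported = {
            28037128510L, 79033028706L, 130212783758L, 180928360907L, 232005308376L};
        for (int k = 1; k <= 5; k++) {
            double e = expected(k);
            double offPercent = 100.0 * (reported[k - 1] - e) / e;
            System.out.printf("Class%d expected %.0f reported %d (%.3f%% off)%n",
                k, e, reported[k - 1], offPercent);
        }
    }
}
```

Under these assumptions the expected total for class 1 is 2,000,000 × 5.5 × 2550 = 28,050,000,000, and every reported total lies within roughly 0.2% of its expected value.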

File Tree

java_classes/org/myorg
LICENSE
Lab7.java
README.md
a.out
lab7.jar
part-r-00000
random_Data.cpp
run_hadoop.sh
run_lab7.sh
stop_hadoop.sh

Random Data Generation - cpp file.

Hadoop Files - .java file.
