Big Data Analysis Project 1

First project for Big Data Analysis course CIIC5995 @ UPRM. The project consists of analysing data from a .json file composed of tweets (which is not provided in this repository) with hadoop map-reduce. There are 5 Map Reduce programs:

MR1 - Find Specific Key words in the tweets. In our case the words chosen were : {Trump, Sika, Ebola, Headache, Measles, diarrhea, flu}
MR2 - Find all different words
MR3 - Find all the unique @s
MR4 - Find all RTs, left unimplemented because of no RT data
MR5 - Find all message IDs with their replies
MR6 - Find all messages by User ID

Compilation

Make sure maven is installed in your machine and run

mvn clean install

in the project folder where the pom.xml is located

Running the Map-Reduce

Make sure hadoop is installed in your machine. For the purpose of testing, we ran it locally and is the set of instructions we will provide. Go to the /target file and write the folowing command in the terminal in the target folder where the .jar file is located:

hadoop <JARNAME>.jar <MR_#> <INPUT_FILE_LOCATION> <OUTPUT_FILE_LOCATION>

where <MR_#> is one of the map-reduce listed above. Make sure to have the .json file stored in <INPUT_FILE_LOCATION>. The output file must not exist in the system.

Visualization of pre-provided data

For the purpose of this project we have provided some .tsv files located in the MRVisuals folder with information obtained from our map-reduce program. Alongside these .tsv files, we provided some .html files for visualization. To view, an http server must be setup. We utilized:

python3 -m http.server

and simply opened the .html files on our favorite browser.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
MRVisuals		MRVisuals
src		src
target		target
README.md		README.md
dependency-reduced-pom.xml		dependency-reduced-pom.xml
hs_err_pid3730.log		hs_err_pid3730.log
mr01.iml		mr01.iml
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Big Data Analysis Project 1

Compilation

Running the Map-Reduce

Visualization of pre-provided data

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Big Data Analysis Project 1

Compilation

Running the Map-Reduce

Visualization of pre-provided data

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages