eliseobao/MapReduce

MapReduce

Exercises

Exercise 0: Word count.

Tutorial from: https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/.
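As a rough illustration of the word-count pattern that the tutorial builds on, the sketch below runs the map and reduce steps in a single process. The function names `map_words`, `reduce_counts` and `word_count` are hypothetical and are not taken from this repository; `sorted()` stands in for the shuffle-and-sort phase that Hadoop performs between the two steps.

```python
from itertools import groupby


def map_words(lines):
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(pairs):
    """Reducer: sum the counts of each word; pairs must arrive sorted by key."""
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, sort and reduce locally and return a {word: count} dict."""
    # sorted() is a local stand-in for Hadoop's shuffle-and-sort phase.
    return dict(reduce_counts(sorted(map_words(lines))))


if __name__ == "__main__":
    for word, count in word_count(["foo bar foo", "bar baz"]).items():
        print(f"{word}\t{count}")
```

In a real streaming job the mapper and reducer would be separate scripts reading from standard input, as the tutorial describes.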

Exercise 1: Search for maximum and minimum temperature.
  • Find the places where it was hottest and coldest in 2017. Report the name of the city along with the temperature.

  • We will use data from daily measurements provided by the NCDC (National Climatic Data Center, NOAA).

  • Use the files in the following directory: data/exercise_1 (source ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/daily01/2017/).

  • It is considered hot if the temperature is above 27 °C and cold if the temperature is below -1 °C.

  • The following file explains the organization of the files: docs/exercise_1.txt (source ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/daily01/README.txt).

  • Run the exercise with 2 or more reducers.

  • A simple program must be implemented to obtain the final result from the files generated by the reducers.
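The final merging step could look roughly like the sketch below. It assumes, purely for illustration, that every reducer writes lines of the form `city<TAB>temperature` and that the output files follow Hadoop's usual `part-*` naming; the actual output layout of this repository's reducers may differ. The names `merge_extremes` and `read_parts` are hypothetical.

```python
from pathlib import Path


def merge_extremes(lines):
    """Return ((city, max_temp), (city, min_temp)) over all reducer output lines.

    Assumed line format (not confirmed by the repository): city<TAB>temperature.
    """
    hottest = coldest = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        city, value = line.rsplit("\t", 1)
        temp = float(value)
        if hottest is None or temp > hottest[1]:
            hottest = (city, temp)
        if coldest is None or temp < coldest[1]:
            coldest = (city, temp)
    return hottest, coldest


def read_parts(directory):
    """Yield every line of every part-* file found in the given directory."""
    for path in sorted(Path(directory).glob("part-*")):
        with path.open(encoding="utf-8") as handle:
            yield from handle
```

With two or more reducers each file only holds a partial answer, so the global maximum and minimum have to be taken across all of them, for example with `merge_extremes(read_parts(output_dir))`, where `output_dir` is whichever directory holds the job output.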

Exercise 2: Web Client Logs.
  • The files contain HTTP request logs.

  • The organization of the data is detailed in the document: docs/exercise_2.txt (source ftp://gaia.cs.umass.edu/pub/zhzhang/Traces-More/html/BU-Web-Client.html).

  • Use the files in the folder: data/exercise_2 (source ftp://ftp.town.hall.org/pub/ITA/traces/BU-www-client-traces.tar.gz).

    1. Extract the user who accessed the most files in .ps format. Show user and number of files accessed (in .ps format).

    2. Determine the most visited URL, indicating the total number of visits received.
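The two questions above reduce to counting by key. The sketch below assumes the log lines have already been parsed into `(user, url)` pairs, since the parsing itself depends on the layout described in docs/exercise_2.txt. The function names are hypothetical, and the first function counts accesses rather than distinct files, which is one possible reading of the task.

```python
from collections import Counter


def top_ps_user(records):
    """Return (user, count) for the user with the most accesses to .ps URLs.

    records is an iterable of (user, url) pairs; returns None if no .ps
    URL was accessed at all.
    """
    counts = Counter(
        user for user, url in records if url.lower().endswith(".ps")
    )
    return counts.most_common(1)[0] if counts else None


def top_url(records):
    """Return (url, visits) for the most visited URL, or None if empty."""
    counts = Counter(url for _, url in records)
    return counts.most_common(1)[0] if counts else None
```

To count distinct `.ps` files per user instead, the pairs could be deduplicated first, for example by passing `set(records)`.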

Exercise 3: Wine quality.

Usage

Docker container up:

make up

Run exercise n, where n is one of 0, 1, 2 or 3 and i is the number of reduce tasks per job (default 1):

make run exercise=n [reduce_tasks=i]

Attach to Docker container:

make attach

Detach from Docker container:

CTRL-p CTRL-q

Docker container down:

make down

Demo and results

The following video demonstrates how the exercises are carried out. It shows the commands needed to run each exercise, together with the corresponding results. Some parts have been sped up to reduce the overall duration.

demo.webm

Note: For completeness, the results obtained are available in docs/results.txt.

License

GNU General Public License v3.0