Exercise 0: Word count.
Tutorial from: https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/.
Exercise 1: Search for maximum and minimum temperature.
- Find the places where it was the hottest and the coldest in 2017. Indicate the name of each city along with its temperature.
- We will use data from daily measurements provided by the NCDC (National Climatic Data Center, NOAA).
- Use the files in the following directory: data/exercise_1 (source ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/daily01/2017/).
- A temperature is considered hot if it is above 27 ºC and cold if it is below -1 ºC.
- The following file explains the organization of the files: docs/exercise_1.txt (source ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/daily01/README.txt).
- Perform the exercise forcing 2 or more reducers.
- A simple program must be implemented to obtain the final result from the files generated by the reducers.
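With 2 or more reducers, each output file holds only partial extremes, so a final pass has to combine them. The following is a minimal sketch of that pass, assuming each reducer writes tab-separated lines of the form station, maximum temperature, minimum temperature. Both the line layout and the function name merge_reducer_outputs are hypothetical choices made for this illustration, not the repository's actual output format.

```python
def merge_reducer_outputs(lines):
    """Combine partial results from several reducers into global extremes.

    Assumes each line is 'station<TAB>max_temp<TAB>min_temp'
    (a hypothetical layout chosen for this sketch).
    Returns ((station, max_temp), (station, min_temp)),
    or (None, None) when there is no input.
    """
    hottest = None  # (station, temperature) with the highest maximum
    coldest = None  # (station, temperature) with the lowest minimum
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between concatenated files
        station, max_temp, min_temp = line.split("\t")
        max_temp, min_temp = float(max_temp), float(min_temp)
        if hottest is None or max_temp > hottest[1]:
            hottest = (station, max_temp)
        if coldest is None or min_temp < coldest[1]:
            coldest = (station, min_temp)
    return hottest, coldest
```

The lines could come from the concatenated reducer output files, for example by iterating over them with the standard fileinput module.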
Exercise 2: Web Client Logs.
- The files contain HTTP request logs.
- The organization of the data is detailed in the document: docs/exercise_2.txt (source ftp://gaia.cs.umass.edu/pub/zhzhang/Traces-More/html/BU-Web-Client.html).
- Use the files in the folder: data/exercise_2 (source ftp://ftp.town.hall.org/pub/ITA/traces/BU-www-client-traces.tar.gz).
- Extract the user who accessed the most files in .ps format. Show the user and the number of .ps files accessed.
- Determine the most visited URL, indicating the total number of visits received.
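Both questions in this exercise follow the word count pattern: count occurrences per key, then take the key with the largest count. The sketch below illustrates that logic outside Hadoop, assuming the log lines have already been parsed into (user, url) pairs. The parsing step depends on the layout described in docs/exercise_2.txt and is deliberately left out; the function names are hypothetical. It counts every request, so deduplicating the pairs first would be needed if "files accessed" is meant as distinct files.

```python
from collections import Counter


def top_ps_user(records):
    """Return (user, count) for the user with the most .ps requests.

    `records` is an iterable of (user, url) pairs; how they are parsed
    from the raw logs is outside the scope of this sketch.
    Returns None when no .ps request is present.
    """
    counts = Counter(
        user for user, url in records if url.lower().endswith(".ps")
    )
    return counts.most_common(1)[0] if counts else None


def most_visited_url(records):
    """Return (url, visits) for the most requested URL, or None."""
    counts = Counter(url for _user, url in records)
    return counts.most_common(1)[0] if counts else None
```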
Exercise 3: Wine quality.
- Use the data for the two types of wine: winequality-white.csv and winequality-red.csv from data/exercise_3 (source http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/).
- The organization of the data is detailed in the document: docs/exercise_3.txt (source https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality.names).
- The attributes of the columns are:
  - Fixed acidity.
  - Volatile acidity.
  - Citric acid.
  - Residual sugar.
  - Chlorides.
  - Free sulfur dioxide.
  - Total sulfur dioxide.
  - Density.
  - pH.
  - Sulfates.
  - Alcohol.
  - Quality.
- For each type of wine, extract the average of all the attributes collected in the files.
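The per-attribute average can be computed by accumulating one sum per column plus a row count, which is also what a reducer would do for each wine type. Here is a minimal sketch of that computation, assuming the files are semicolon-separated with a quoted header row, as in the UCI distribution. The function name column_averages is hypothetical, and the delimiter should be checked against the actual files.

```python
import csv


def column_averages(lines):
    """Average every column of a ';'-separated file with a header row.

    `lines` is any iterable of text lines (an open file works).
    Returns a dict mapping column name to its mean, or {} if there
    are no data rows.
    """
    reader = csv.reader(lines, delimiter=";")
    header = next(reader)          # first row holds the attribute names
    sums = [0.0] * len(header)     # running sum per column
    rows = 0
    for row in reader:
        if not row:
            continue               # ignore blank lines
        for i, value in enumerate(row):
            sums[i] += float(value)
        rows += 1
    if rows == 0:
        return {}
    return {name: total / rows for name, total in zip(header, sums)}
```

Because sums and counts are additive, partial results from several reducers can be merged by adding them before dividing.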
Docker container up:
make up
Run exercise n, where n is in [0, 1, 2, 3] and i denotes the number of reduce tasks per job (default 1):
make run exercise=n [reduce_tasks=i]
Attach to Docker container:
make attach
Detach from Docker container:
CTRL-p CTRL-q
Docker container down:
make down
The following video walks through the exercises. It shows the commands needed to run them, as well as the corresponding results. Note that some parts have been sped up to reduce the overall duration.
demo.webm
Note: For the sake of completeness, the obtained results have been pushed to docs/results.txt.
GNU General Public License v3.0