# Data processing with Hive

* *30 min* | Last modified: June 22, 2019

This tutorial is based on https://es.hortonworks.com/tutorial/how-to-process-data-with-apache-hive/

The goal of this tutorial is to implement Hive queries that analyze, process, and filter the data stored in a data warehouse, using HiveQL, Hive's SQL-like query language.


## Setup

This tutorial uses the `bigdata` magic to work with Hive interactively from a Jupyter notebook. The `timeout` parameter sets the maximum time to wait for a command to finish before an error is reported.

In [1]:
%load_ext bigdata
%timeout 300

The data files `drivers.csv` and `timesheet.csv` are downloaded into the current directory with `wget`. The folder `/tmp/drivers` is then created in the Hadoop file system (HDFS) and the files are copied into it.

In [3]:
!apt-get update && apt-get install wget

Get:1 https://deb.nodesource.com/node_13.x bionic InRelease [4584 B]
Get:2 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]    
Get:3 http://archive.ubuntu.com/ubuntu bionic InRelease [242 kB]               
Get:4 https://deb.nodesource.com/node_13.x bionic/main amd64 Packages [764 B]  
Get:5 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [12.6 kB]
Get:6 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [782 kB]
Get:7 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]      
Get:8 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [700 kB]
Get:9 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]    
Get:10 http://archive.ubuntu.com/ubuntu bionic/multiverse amd64 Packages [186 kB]
Get:11 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [5944 B]
Get:12 http://archive.ubuntu.com/ubuntu bionic/universe amd64 Packages [11.3 MB]
Get:13 htt

In [4]:
!wget https://raw.githubusercontent.com/jdvelasq/datalabs/master/datasets/drivers/drivers.csv
!wget https://raw.githubusercontent.com/jdvelasq/datalabs/master/datasets/drivers/timesheet.csv

--2019-11-08 10:12:22--  https://raw.githubusercontent.com/jdvelasq/datalabs/master/datasets/drivers/drivers.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.48.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2043 (2.0K) [text/plain]
Saving to: 'drivers.csv'


2019-11-08 10:12:22 (1.32 MB/s) - 'drivers.csv' saved [2043/2043]

--2019-11-08 10:12:23--  https://raw.githubusercontent.com/jdvelasq/datalabs/master/datasets/drivers/timesheet.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 199.232.48.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|199.232.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 26205 (26K) [text/plain]
Saving to: 'timesheet.csv'


2019-11-08 10:12:24 (934 KB/s) - 'timesheet.csv' saved [26205/26205]



In [6]:
##
## Create the drivers folder in HDFS
##
!hdfs dfs -mkdir /tmp/drivers

##
## Copy the files to HDFS
##
!hdfs dfs -copyFromLocal drivers.csv  /tmp/drivers/
!hdfs dfs -copyFromLocal timesheet.csv  /tmp/drivers/

##
## List the files in HDFS to verify
## that they were copied correctly.
##
!hdfs dfs -ls /tmp/drivers/*

mkdir: `/tmp/drivers': File exists
-rw-r--r--   1 root supergroup       2043 2019-11-08 10:13 /tmp/drivers/drivers.csv
-rw-r--r--   1 root supergroup      26205 2019-11-08 10:13 /tmp/drivers/timesheet.csv
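
The `mkdir: File exists` message in the output above indicates that `/tmp/drivers` already existed when the cell was run. As a minimal sketch, the same steps could be scripted from Python. The helper names `hdfs_upload_commands` and `run_all` are hypothetical, and the sketch only prints the commands unless `dry_run=False` is passed on a machine where the `hdfs` client is installed. It uses `-mkdir -p`, which does not fail when the directory is already there.

```python
import subprocess


def hdfs_upload_commands(local_files, hdfs_dir):
    """Build the `hdfs dfs` commands that create hdfs_dir and upload local_files."""
    # -p creates missing parent directories and succeeds if the directory exists.
    commands = [["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]]
    for name in local_files:
        commands.append(["hdfs", "dfs", "-copyFromLocal", name, hdfs_dir + "/"])
    # List the directory to verify that the files were copied.
    commands.append(["hdfs", "dfs", "-ls", hdfs_dir])
    return commands


def run_all(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


commands = hdfs_upload_commands(["drivers.csv", "timesheet.csv"], "/tmp/drivers")
run_all(commands)
```

Building the argument lists separately from running them keeps the sketch easy to inspect before anything touches the cluster.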


Part of a file's contents can be displayed with the `tail` command, which prints the last kilobyte of the file. It is useful for a quick inspection of the files.

In [7]:
##
## Print the end of the drivers file
##
!hdfs dfs -tail /tmp/drivers/drivers.csv

Box 213- 8948 Nec Ave,Y,hours
27,Mark Lochbihler,392603159,8355 Ipsum St.,Y,hours
28,Olivier Renault,959908181,P.O. Box 243- 6509 Erat. Avenue,Y,hours
29,Teddy Choi,185502192,P.O. Box 106- 7003 Amet Rd.,Y,hours
30,Dan Rice,282307061,Ap #881-9267 Mollis Avenue,Y,hours
31,Rommel Garcia,858912101,P.O. Box 945- 6015 Sociis St.,Y,hours
32,Ryan Templeton,290304287,765-6599 Egestas. Av.,Y,hours
33,Sridhara Sabbella,967409015,Ap #477-2507 Sagittis Avenue,Y,hours
34,Frank Romano,391407216,Ap #753-6814 Quis Ave,Y,hours
35,Emil Siemes,971401151,321-2976 Felis Rd.,Y,hours
36,Andrew Grande,245303216,Ap #685-9598 Egestas Rd.,Y,hours
37,Wes Floyd,190504074,P.O. Box 269- 9611 Nulla Street,Y,hours
38,Scott Shaw,386411175,276 Lobortis Road,Y,hours
39,David Kaiser,967706052,9185 At Street,Y,hours
40,Nicolas Maillard,208510217,1027 Quis Rd.,Y,hours
41,Greg Phillips,308103116,P.O. Box 847- 5961 Arcu. Road,Y,hours
42,Randy Gelhausen,853302254,145-4200 In- Avenue,Y,hours
43,Dave Patton,977706052,3028 A- St.,
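
Because `hdfs dfs -tail` prints the last kilobyte of the file rather than a number of lines, the first line of the output above starts in the middle of a record. The sketch below mimics that behavior on a local file. The function `tail_bytes` and the generated records are hypothetical and serve only as an illustration.

```python
import os
import tempfile


def tail_bytes(path, num_bytes=1024):
    """Return the last num_bytes bytes of a file, decoded as text."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        # Start num_bytes before the end, or at the beginning for small files.
        handle.seek(max(size - num_bytes, 0))
        return handle.read().decode("utf-8", errors="replace")


# Demonstration on a temporary local file with 200 short, made-up records.
with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="", delete=False) as tmp:
    for i in range(200):
        tmp.write(f"{i},record-{i}\n")
    demo_path = tmp.name

chunk = tail_bytes(demo_path)
print(chunk.splitlines()[0])   # first line of the chunk, possibly a partial record
print(chunk.splitlines()[-1])  # last record of the file
os.remove(demo_path)
```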

In [8]:
!hdfs dfs -tail /tmp/drivers/timesheet.csv

42,36,56,2612
42,37,48,2550
42,38,55,2527
42,39,57,2723
42,40,55,2728
42,41,50,2557
42,42,53,2773
42,43,55,2786
42,44,54,2638
42,45,57,2542
42,46,48,2526
42,47,50,2795
42,48,53,2609
42,49,58,2584
42,50,48,2692
42,51,50,2566
42,52,48,2735
43,1,46,2622
43,2,47,2688
43,3,50,2544
43,4,56,2573
43,5,54,2691
43,6,52,2796
43,7,53,2564
43,8,58,2624
43,9,50,2528
43,10,57,2721
43,11,51,2722
43,12,59,2681
43,13,52,2683
43,14,46,2663
43,15,53,2579
43,16,56,2519
43,17,54,2584
43,18,47,2665
43,19,55,2511
43,20,60,2677
43,21,52,2585
43,22,60,2719
43,23,48,2655
43,24,48,2641
43,25,53,2512
43,26,48,2612
43,27,58,2614
43,28,60,2551
43,29,55,2682
43,30,49,2504
43,31,51,2701
43,32,57,2554
43,33,52,2730
43,34,54,2783
43,35,51,2681
43,36,51,2655
43,37,46,2629
43,38,58,2739
43,39,47,2535
43,40,50,2512
43,41,51,2701
43,42,55,2538
43,43,58,2775
43,44,56,2545
43,45,46,2671
43,46,57,2680
43,47,50,2572
43,48,52,2517
43,49,56,2743
43,50,59,2665
43,51,58,2593
43,52,48,2764

## Creating the `temp_drivers` table

The `temp_drivers` table is created next. It is stored on disk as a text file and holds each line of the drivers file in a single string column, `col_value`.

In [9]:
%%hive
DROP TABLE IF EXISTS temp_drivers;
CREATE TABLE temp_drivers (col_value STRING) STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1");

DROP TABLE IF EXISTS temp_drivers;
OK
Time taken: 6.917 seconds
CREATE TABLE temp_drivers (col_value STRING) STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1");
OK
Time taken: 1.094 seconds


Next, the tables in the current database whose names start with `t` are listed to verify that the table was created.

In [10]:
%%hive
SHOW TABLES LIKE 't*';

SHOW TABLES LIKE 't*';
OK
temp_drivers
Time taken: 0.155 seconds, Fetched: 1 row(s)


## Loading data into the `temp_drivers` table

The following statement loads the contents of the file `drivers.csv` into the table `temp_drivers`.

In [11]:
%%hive
LOAD DATA INPATH '/tmp/drivers/drivers.csv' OVERWRITE INTO TABLE temp_drivers;

LOAD DATA INPATH '/tmp/drivers/drivers.csv' OVERWRITE INTO TABLE temp_drivers;
Loading data to table default.temp_drivers
OK
Time taken: 1.093 seconds


Hive consumes the data: `LOAD DATA INPATH` moves the file into the data warehouse, so `drivers.csv` no longer appears in the `/tmp/drivers` folder.

In [12]:
!hdfs dfs -ls /tmp/drivers

Found 1 items
-rw-r--r--   1 root supergroup      26205 2019-11-08 10:13 /tmp/drivers/timesheet.csv


The first 10 records of the table are retrieved for a quick inspection and to verify that the data was loaded correctly.

In [13]:
%%hive
SELECT * FROM temp_drivers LIMIT 10;

SELECT * FROM temp_drivers LIMIT 10;
OK
10,George Vetticaden,621011971,244-4532 Nulla Rd.,N,miles
11,Jamie Engesser,262112338,366-4125 Ac Street,N,miles
12,Paul Coddin,198041975,Ap #622-957 Risus. Street,Y,hours
13,Joe Niemiec,139907145,2071 Hendrerit. Ave,Y,hours
14,Adis Cesir,820812209,Ap #810-1228 In St.,Y,hours
15,Rohit Bakshi,239005227,648-5681 Dui- Rd.,Y,hours
16,Tom McCuch,363303105,P.O. Box 313- 962 Parturient Rd.,Y,hours
17,Eric Mizell,123808238,P.O. Box 579- 2191 Gravida. Street,Y,hours
18,Grant Liu,171010151,Ap #928-3159 Vestibulum Av.,Y,hours
19,Ajay Singh,160005158,592-9430 Nonummy Avenue,Y,hours
Time taken: 2.294 seconds, Fetched: 10 row(s)
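
The listing above contains no header row because of the `skip.header.line.count` table property. A rough Python analogue of that behavior follows. The header text in the sample is an assumption, since the actual header of `drivers.csv` is not shown in this tutorial. The two data rows are copied from the output above, and `read_rows` is a hypothetical helper.

```python
from itertools import islice

# The header line is assumed for illustration; the data rows come from the output above.
raw_lines = [
    "driverId,name,ssn,location,certified,wage-plan",
    "10,George Vetticaden,621011971,244-4532 Nulla Rd.,N,miles",
    "11,Jamie Engesser,262112338,366-4125 Ac Street,N,miles",
]


def read_rows(lines, skip_header_line_count=0):
    """Return the lines of a text file without the first N of them."""
    return list(islice(lines, skip_header_line_count, None))


rows = read_rows(raw_lines, skip_header_line_count=1)
print(rows[0])

# Applying the same setting to rows that have no header drops a real record.
rows_again = read_rows(rows, skip_header_line_count=1)
print(rows_again[0])
```

The second call illustrates why the property should only be set on tables whose underlying files really start with a header line.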


## Creating the `drivers` table

Next, the `drivers` table is created; it will hold the information extracted from the `temp_drivers` table.

In [14]:
%%hive

DROP TABLE IF EXISTS drivers;

CREATE TABLE drivers (driverId  INT, 
                      name      STRING, 
                      ssn       BIGINT,
                      location  STRING, 
                      certified STRING, 
                      wageplan  STRING)

TBLPROPERTIES ("skip.header.line.count"="1");

DROP TABLE IF EXISTS drivers;
OK
Time taken: 0.008 seconds
CREATE TABLE drivers (driverId  INT, 
                      name      STRING, 
                      ssn       BIGINT,
                      location  STRING, 
                      certified STRING, 
                      wageplan  STRING)
TBLPROPERTIES ("skip.header.line.count"="1");
OK
Time taken: 0.074 seconds


Because each record in `temp_drivers` is a single line of text, a regular expression is applied with `regexp_extract` to split the text at the commas. In the pattern, the group `(?:([^,]*),?)` matches one field and its optional trailing comma. The quantifier `{1}`, `{2}`, and so on repeats that group, so the capture group ends up holding the first field, the second field, and so on. The name that follows each `regexp_extract` call is the corresponding column in the `drivers` table.
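
Hive evaluates this pattern with Java regular expressions, and Python's `re` module treats this particular pattern the same way, so the sketch below can be used to experiment with it outside Hive. The helper `nth_field` is hypothetical, and the sample line is copied from the `temp_drivers` output shown earlier.

```python
import re

# One record from temp_drivers, copied from the output shown earlier.
line = "10,George Vetticaden,621011971,244-4532 Nulla Rd.,N,miles"


def nth_field(text, n):
    """Return the n-th comma-separated field (1-based), or '' if there is no match."""
    # (?:([^,]*),?) consumes one field plus an optional trailing comma;
    # {n} repeats it n times, and group 1 keeps the last repetition.
    match = re.match(r"^(?:([^,]*),?){%d}" % n, text)
    return match.group(1) if match else ""


names = ["driverId", "name", "ssn", "location", "certified", "wageplan"]
record = {name: nth_field(line, i) for i, name in enumerate(names, start=1)}
print(record)
```

The approach assumes that no field contains a comma; a quoted field with an embedded comma would be split in the wrong place.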

In [15]:
%%hive
INSERT OVERWRITE TABLE drivers
SELECT
    regexp_extract(col_value, '^(?:([^,]*),?){1}', 1) driverId,
    regexp_extract(col_value, '^(?:([^,]*),?){2}', 1) name,
    regexp_extract(col_value, '^(?:([^,]*),?){3}', 1) ssn,
    regexp_extract(col_value, '^(?:([^,]*),?){4}', 1) location,
    regexp_extract(col_value, '^(?:([^,]*),?){5}', 1) certified,
    regexp_extract(col_value, '^(?:([^,]*),?){6}', 1) wageplan
FROM 
    temp_drivers;

INSERT OVERWRITE TABLE drivers
SELECT
    regexp_extract(col_value, '^(?:([^,]*),?){1}', 1) driverId,
    regexp_extract(col_value, '^(?:([^,]*),?){2}', 1) name,
    regexp_extract(col_value, '^(?:([^,]*),?){3}', 1) ssn,
    regexp_extract(col_value, '^(?:([^,]*),?){4}', 1) location,
    regexp_extract(col_value, '^(?:([^,]*),?){5}', 1) certified,
    regexp_extract(col_value, '^(?:([^,]*),?){6}', 1) wageplan
FROM 
    temp_drivers;
Query ID = root_20191108101459_e2544eb9-0944-4166-b56a-1a6d34c91757
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1573207742881_0001, Tracking URL = http://dd8f0caea87b:8088/proxy/application_1573207742881_0001/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1573207742881_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-11-08 10:15:15,481 Stage-1 map = 0%,  reduce = 0%
2019-11-08 10:15:23,934 Stage-1 map = 100%,  reduce = 0%,

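A `SELECT` statement is used to check the result of loading the data. Note that the listing starts at driver 11 rather than driver 10, most likely because `drivers` was also created with `skip.header.line.count` set to 1, so the first stored row is skipped when the table is read.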

In [16]:
%%hive
SELECT * FROM drivers LIMIT 10;

SELECT * FROM drivers LIMIT 10;
OK
11	Jamie Engesser	262112338	366-4125 Ac Street	N	miles
12	Paul Coddin	198041975	Ap #622-957 Risus. Street	Y	hours
13	Joe Niemiec	139907145	2071 Hendrerit. Ave	Y	hours
14	Adis Cesir	820812209	Ap #810-1228 In St.	Y	hours
15	Rohit Bakshi	239005227	648-5681 Dui- Rd.	Y	hours
16	Tom McCuch	363303105	P.O. Box 313- 962 Parturient Rd.	Y	hours
17	Eric Mizell	123808238	P.O. Box 579- 2191 Gravida. Street	Y	hours
18	Grant Liu	171010151	Ap #928-3159 Vestibulum Av.	Y	hours
19	Ajay Singh	160005158	592-9430 Nonummy Avenue	Y	hours
20	Chris Harris	921812303	883-2691 Proin Avenue	Y	hours
Time taken: 0.144 seconds, Fetched: 10 row(s)


## Creating the `temp_timesheet` table

The table is created and the data from the file `timesheet.csv` is loaded into it.

In [17]:
%%hive

DROP TABLE IF EXISTS temp_timesheet;

CREATE TABLE temp_timesheet (col_value string) 
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1");

LOAD DATA INPATH '/tmp/drivers/timesheet.csv' OVERWRITE INTO TABLE temp_timesheet;

SELECT * FROM temp_timesheet LIMIT 10;

DROP TABLE IF EXISTS temp_timesheet;
OK
Time taken: 0.009 seconds
CREATE TABLE temp_timesheet (col_value string) 
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1");
OK
Time taken: 0.06 seconds
LOAD DATA INPATH '/tmp/drivers/timesheet.csv' OVERWRITE INTO TABLE temp_timesheet;
Loading data to table default.temp_timesheet
OK
Time taken: 0.414 seconds
SELECT * FROM temp_timesheet LIMIT 10;
OK
10,1,70,3300
10,2,70,3300
10,3,60,2800
10,4,70,3100
10,5,70,3200
10,6,70,3300
10,7,70,3000
10,8,70,3300
10,9,70,3200
10,10,50,2500
Time taken: 0.131 seconds, Fetched: 10 row(s)


## Creating the `timesheet` table

The procedure is the same as for the previous tables.

In [18]:
%%hive

DROP TABLE IF EXISTS timesheet;

CREATE TABLE timesheet (driverId INT, week INT, hours_logged INT , miles_logged INT)
TBLPROPERTIES ("skip.header.line.count"="1");

INSERT OVERWRITE TABLE timesheet
SELECT
    regexp_extract(col_value, '^(?:([^,]*),?){1}', 1) driverId,
    regexp_extract(col_value, '^(?:([^,]*),?){2}', 1) week,
    regexp_extract(col_value, '^(?:([^,]*),?){3}', 1) hours_logged,
    regexp_extract(col_value, '^(?:([^,]*),?){4}', 1) miles_logged
FROM 
    temp_timesheet;

SELECT * FROM timesheet LIMIT 10;

DROP TABLE IF EXISTS timesheet;
OK
Time taken: 0.008 seconds
CREATE TABLE timesheet (driverId INT, week INT, hours_logged INT , miles_logged INT)
TBLPROPERTIES ("skip.header.line.count"="1");
OK
Time taken: 0.065 seconds
INSERT OVERWRITE TABLE timesheet
SELECT
    regexp_extract(col_value, '^(?:([^,]*),?){1}', 1) driverId,
    regexp_extract(col_value, '^(?:([^,]*),?){2}', 1) week,
    regexp_extract(col_value, '^(?:([^,]*),?){3}', 1) hours_logged,
    regexp_extract(col_value, '^(?:([^,]*),?){4}', 1) miles_logged
FROM 
    temp_timesheet;
Query ID = root_20191108101549_2497d5ff-edf4-4076-84de-aa5034d5ebf2
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1573207742881_0002, Tracking URL = http://dd8f0caea87b:8088/proxy/application_1573207742881_0002/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1573207742881_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-11-08 1

## Hours and miles per driver per year

The following query computes, for each driver, the total number of hours and miles logged in the year.

In [19]:
%%hive 
SELECT 
    driverId, 
    sum(hours_logged), 
    sum(miles_logged) 
FROM 
    timesheet 
GROUP BY 
    driverId;

SELECT 
    driverId, 
    sum(hours_logged), 
    sum(miles_logged) 
FROM 
    timesheet 
GROUP BY 
    driverId;
Query ID = root_20191108101620_a7136990-cc28-4594-b59e-bdce97b50c0f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1573207742881_0003, Tracking URL = http://dd8f0caea87b:8088/proxy/application_1573207742881_0003/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1573207742881_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-11-08 10:16:31,957 Stage-1 map = 0%,  reduce = 0%
2019-11-08 10:16:39,327 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.97 sec
2019-11
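
Since the timesheet holds one row per driver and week (weeks 1 to 52 in the listings above), summing per `driverId` gives yearly totals. As a small illustration of what `GROUP BY` with `sum` computes, the sketch below aggregates a handful of rows copied from the earlier outputs. It is only an in-memory analogue, not how Hive executes the query.

```python
from collections import defaultdict

# (driverId, week, hours_logged, miles_logged) rows copied from the outputs above.
timesheet = [
    (10, 1, 70, 3300),
    (10, 2, 70, 3300),
    (10, 3, 60, 2800),
    (43, 1, 46, 2622),
    (43, 2, 47, 2688),
]

# One [total_hours, total_miles] accumulator per driverId.
totals = defaultdict(lambda: [0, 0])
for driver_id, _week, hours, miles in timesheet:
    totals[driver_id][0] += hours
    totals[driver_id][1] += miles

for driver_id in sorted(totals):
    total_hours, total_miles = totals[driver_id]
    print(driver_id, total_hours, total_miles)
```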

## Query to join the tables

The final step is a query that combines the driver's name from the `drivers` table with the total hours and miles per year.

In [20]:
%%hive
SELECT 
    d.driverId, 
    d.name, 
    t.total_hours, 
    t.total_miles 
FROM 
    drivers d
JOIN (
    SELECT 
        driverId, 
        sum(hours_logged)total_hours, 
        sum(miles_logged)total_miles 
    FROM 
        timesheet 
    GROUP BY 
        driverId 
    ) t
ON 
    (d.driverId = t.driverId);

SELECT 
    d.driverId, 
    d.name, 
    t.total_hours, 
    t.total_miles 
FROM 
    drivers d
JOIN (
    SELECT 
        driverId, 
        sum(hours_logged)total_hours, 
        sum(miles_logged)total_miles 
    FROM 
        timesheet 
    GROUP BY 
        driverId 
    ) t
ON 
    (d.driverId = t.driverId);
Query ID = root_20191108101653_b0d3d8c5-7207-4371-bffe-3056c25e3a12
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1573207742881_0004, Tracking URL = http://dd8f0caea87b:8088/proxy/application_1573207742881_0004/
Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1573207742881_0004
Hadoop job information 
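
The join keeps only the drivers that appear both in `drivers` and in the aggregated subquery. A minimal in-memory sketch of that inner join is shown below. The names and totals for drivers 42 and 43 are copied from outputs elsewhere in this tutorial, while the entry with id 99 is hypothetical and only illustrates a row without a match.

```python
# Driver names and yearly totals copied from outputs elsewhere in this tutorial.
drivers = {42: "Randy Gelhausen", 43: "Dave Patton"}
totals = {
    42: (2697, 136673),
    43: (2750, 136993),
    99: (0, 0),  # hypothetical driverId with no row in drivers
}

# Inner join: keep only the ids that appear on both sides.
joined = [
    (driver_id, drivers[driver_id], hours, miles)
    for driver_id, (hours, miles) in sorted(totals.items())
    if driver_id in drivers
]

for row in joined:
    print(*row, sep="\t")
```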

## Storing the results

Finally, an additional clause is added to the previous query to store the resulting table in the `/tmp/drivers/summary` folder in HDFS so that other applications can use the results.

In [21]:
%%hive
INSERT OVERWRITE DIRECTORY '/tmp/drivers/summary' 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
SELECT 
    d.driverId, 
    d.name, 
    t.total_hours, 
    t.total_miles 
FROM 
    drivers d
JOIN (
    SELECT 
        driverId, 
        sum(hours_logged)total_hours, 
        sum(miles_logged)total_miles 
    FROM 
        timesheet 
    GROUP BY 
        driverId 
    ) t
ON 
    (d.driverId = t.driverId);

INSERT OVERWRITE DIRECTORY '/tmp/drivers/summary' 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
SELECT 
    d.driverId, 
    d.name, 
    t.total_hours, 
    t.total_miles 
FROM 
    drivers d
JOIN (
    SELECT 
        driverId, 
        sum(hours_logged)total_hours, 
        sum(miles_logged)total_miles 
    FROM 
        timesheet 
    GROUP BY 
        driverId 
    ) t
ON 
    (d.driverId = t.driverId);
Query ID = root_20191108101759_fcf02464-be59-4e0b-bea8-c46ec3eb05f0
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1573207742881_0006, Tracking URL = http://dd8f0caea87b:8088/proxy/application_1573207742881_0006/
Kil

In [22]:
!hdfs dfs -ls /tmp/drivers/summary/

Found 1 items
-rwxr-xr-x   1 root supergroup        928 2019-11-08 10:18 /tmp/drivers/summary/000000_0


In [23]:
!hdfs dfs -tail /tmp/drivers/summary/000000_0

11,Jamie Engesser,3642,179300
12,Paul Coddin,2639,135962
13,Joe Niemiec,2727,134126
14,Adis Cesir,2781,136624
15,Rohit Bakshi,2734,138750
16,Tom McCuch,2746,137205
17,Eric Mizell,2701,135992
18,Grant Liu,2654,137834
19,Ajay Singh,2738,137968
20,Chris Harris,2644,134564
21,Jeff Markham,2751,138719
22,Nadeem Asghar,2733,137550
23,Adam Diaz,2750,137980
24,Don Hilborn,2647,134461
25,Jean-Philippe Playe,2723,139180
26,Michael Aube,2730,137530
27,Mark Lochbihler,2771,137922
28,Olivier Renault,2723,137469
29,Teddy Choi,2760,138255
30,Dan Rice,2773,137473
31,Rommel Garcia,2704,137057
32,Ryan Templeton,2736,137422
33,Sridhara Sabbella,2759,139285
34,Frank Romano,2811,137728
35,Emil Siemes,2728,138727
36,Andrew Grande,2795,138025
37,Wes Floyd,2694,137223
38,Scott Shaw,2760,137464
39,David Kaiser,2745,138788
40,Nicolas Maillard,2700,136931
41,Greg Phillips,2723,138407
42,Randy Gelhausen,2697,136673
43,Dave Patton,2750,136993
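
As a sketch of how another application might consume this comma-delimited file, the snippet below parses three lines copied from the listing above with Python's `csv` module. It assumes the file has already been fetched from HDFS; the in-memory `sample` string stands in for it. The miles-per-hour figure is only an example of a derived metric.

```python
import csv
import io

# Three lines copied from the listing above; a real consumer would read the
# file fetched from HDFS instead of this in-memory sample.
sample = """11,Jamie Engesser,3642,179300
12,Paul Coddin,2639,135962
13,Joe Niemiec,2727,134126
"""

rows = []
for driver_id, name, total_hours, total_miles in csv.reader(io.StringIO(sample)):
    # The file has no header and no type information, so cast explicitly.
    rows.append((int(driver_id), name, int(total_hours), int(total_miles)))

# Example of a derived metric: average miles per logged hour.
for driver_id, name, hours, miles in rows:
    print(f"{driver_id}\t{name}\t{miles / hours:.1f}")
```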


In [24]:
!rm *.csv *.log