# Project Title
### Data Engineering Capstone Project

#### Project Summary
--describe your project at a high level--

The project follows these steps:
* Step 1: Scope the Project and Gather Data
* Step 2: Explore and Assess the Data
* Step 3: Define the Data Model
* Step 4: Run ETL to Model the Data
* Step 5: Complete Project Write Up


In [1]:
## NOTES
# Install a pip package in the current Jupyter kernel

#import sys
#!{sys.executable} -m pip install s3fs
#!{sys.executable} -m pip install boto
#!{sys.executable} -m pip install boto3
#!{sys.executable} -m pip install pyspark

In [1]:
# IMPORTS AND INSTALLS

import pandas as pd

from datetime import datetime

from s3_local_io import *
from create_parquet_tables import *

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col, count, lit, when, max


In [2]:
# Global names

##This is now in the s3_local_io file
##config = configparser.ConfigParser()
##config.read('dl.cfg')
##os.environ['AWS_ACCESS_KEY_ID']=config['KEYS']['AWS_ACCESS_KEY_ID']
##os.environ['AWS_SECRET_ACCESS_KEY']=config['KEYS']['AWS_SECRET_ACCESS_KEY']


# URL and PATHS to data
bucket_name = 'raul-udacity'
bucket_parquet_path ='/parquet/'
bucket_path = 's3a://'+bucket_name+'/'
local_path = './input_files/'
local_parquet_path = './input_files/parquet_files/'
#S3_URI = "s3a://raul-udacity/"
#s3a vs s3 explanation https://stackoverflow.com/questions/33356041/technically-what-is-the-difference-between-s3n-s3a-and-s3


# Filenames
data_bares = 'bares.csv'
data_restaurantes = 'restaurantes.csv'
data_cafeterias = 'cafeterias.csv'

data_asociaciones = 'AsociacionesJCyL.csv'
data_clubes_deportivos = 'Clubes deportivos.csv'

data_bibliotecas = 'Directorio de Bibliotecas de Castilla y León.json'
data_museos = 'Directorio de Museos de Castilla y León.json'

data_poblacion = 'Cities population per gender age.csv'

# Other available data/filenames we decided not to use
# Poblacion municipio sexo relacion nacimiento residencia.json
# Municipios Origen Nacimiento.csv
# 
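The commented-out lines above say the AWS credentials are now loaded inside `s3_local_io`, which is not shown here. As a rough sketch of what that loading might look like, assuming `dl.cfg` has a `[KEYS]` section as in the commented code (the function name `load_aws_credentials` is hypothetical):

```python
import configparser
import os


def load_aws_credentials(cfg_path="dl.cfg"):
    """Read AWS keys from the [KEYS] section of cfg_path and export them.

    Hypothetical helper: the real s3_local_io module may do this differently.
    """
    config = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it managed to parse,
    # so an empty list means the file was missing or unreadable.
    if not config.read(cfg_path):
        raise FileNotFoundError(f"config file not found: {cfg_path}")
    for key in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
        os.environ[key] = config["KEYS"][key]
```

Failing loudly on a missing file avoids a less obvious `KeyError` later, when the `[KEYS]` section is looked up.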

# Step 1: Scope the Project and Gather Data

## Scope 
Explain what you plan to do in the project in more detail. What data do you use? What does your end solution look like? What tools did you use?
See the Scope.md file.

## Describe and Gather Data 
Describe the data sets you're using. Where did they come from? What type of information is included?
See the Datasources Description.md file: https://github.com/rantoncuadrado/udacity_capstone_project/blob/main/Datasources%20Description.md

### COPY FILES FROM s3 TO LOCAL

In [None]:
## Copy files from S3 to the local filesystem
## This step is not needed when working directly with the S3 files

# Commented out because the files do not need to be copied every time the process runs
# copy_files_s3_to_local(bucket_name, local_path)

# Step 2: Explore and Assess the Data
## Explore the Data 
Identify data quality issues, like missing values, duplicate data, etc.

## Cleaning Steps
Once the files are on the local filesystem, I'll use pandas DataFrames to clean the data and then Spark to manipulate it.

In [3]:
# Create a Spark session

spark_session = SparkSession \
        .builder \
        .appName("Castilla y Leon -> Fact Tables") \
        .getOrCreate()


# Only needed when Spark accesses S3
#spark_session.sparkContext._jsc.hadoopConfiguration().set("fs.s3a.aws.credentials.provider","org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider")
#spark_session.sparkContext._jsc.hadoopConfiguration().set("fs.s3a.access.key",os.environ['AWS_ACCESS_KEY_ID'])
#spark_session.sparkContext._jsc.hadoopConfiguration().set("fs.s3a.secret.key",os.environ['AWS_SECRET_ACCESS_KEY'])


### CLEANING BAR, RESTAURANT, CAFE and CREATING GARITOS TABLE
These three files share the same schema.

In [4]:
sparkdf_garitos=create_garitos(spark_session,local_path,[data_bares,data_restaurantes,data_cafeterias])

### CREATION OF CITY / POSTALCODE TABLE


In [5]:
sparkdf_postal_codes=create_postal_code(sparkdf_garitos)

In [6]:
df=sparkdf_postal_codes.toPandas()
df.describe(include='all')

       county    city postal_code
count    2881    2881        2881
unique      9    1730        2025
top      León  Zamora       24000
freq      519      33          23


In [7]:
toparquet_postal_codes(spark_session,local_parquet_path,sparkdf_postal_codes)

### GARITOS CLEANUP

In [65]:
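# I want to practice with both pandas DataFrames and Spark DataFrames.
# describe() shows that some garitos (garito = bar | restaurant | cafe)
# have no address or no postal_code, but none lack a county or a city.
# The cells below use the name Sparkdf_garitos (capital S), so bind it
# to the DataFrame created above as sparkdf_garitos.
Sparkdf_garitos = sparkdf_garitos
df=Sparkdf_garitos.toPandas()
df.describe(include='all')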

            name         address county        city postal_code garito_kind
count      22487           22426  22487       22487       22472       22487
unique     16100           21087      9        1730        2025           3
top     LA PLAZA  PLAZA MAYOR, 2   León  Valladolid       24003         bar
freq          87              20   4942        2668         369       15080


In [69]:
# Playing with Spark Data Frames. Most repeated names.
garitos_name_top = Sparkdf_garitos \
    .select("name",'address') \
    .groupBy("name") \
    .agg(count("address").alias("Total")) \
    .orderBy("Total", ascending=False)
garitos_name_top.head(30)


[Row(name='LA PLAZA', Total=87),
 Row(name='AVENIDA', Total=55),
 Row(name='PLAZA', Total=47),
 Row(name='CENTRAL', Total=43),
 Row(name='TELEPIZZA', Total=41),
 Row(name='EL PASO', Total=36),
 Row(name='PISCINAS MUNICIPALES', Total=35),
 Row(name='LA TABERNA', Total=32),
 Row(name='EL RINCON', Total=32),
 Row(name='BURGER KING', Total=31),
 Row(name='LA TERRAZA', Total=31),
 Row(name='CASTILLA', Total=31),
 Row(name='LOS ARCOS', Total=30),
 Row(name='EL CRUCE', Total=30),
 Row(name='LA PARADA', Total=29),
 Row(name='EL PUENTE', Total=28),
 Row(name='LA BODEGUILLA', Total=27),
 Row(name='LA FUENTE', Total=27),
 Row(name='LOS ANGELES', Total=23),
 Row(name='EL MOLINO', Total=23),
 Row(name='EL PARQUE', Total=22),
 Row(name='MANOLO', Total=22),
 Row(name='EL CASTILLO', Total=21),
 Row(name='LA BODEGA', Total=21),
 Row(name='LAS PISCINAS', Total=21),
 Row(name='PISCINA MUNICIPAL', Total=21),
 Row(name='LA CASONA', Total=21),
 Row(name='LA POSADA', Total=20),
 Row(name='EL REFUGIO', Total=

In [86]:
# Playing with Spark DataFrames. Most repeated restaurant names.
restaurante_name_top = Sparkdf_garitos \
    .select("name",'address','garito_kind') \
    .where("garito_kind='restaurante'") \
    .groupBy("name") \
    .agg(count("address").alias("Total")) \
    .orderBy("Total", ascending=False)

print(restaurante_name_top.head(30))



[Row(name='BURGER KING', Total=27), Row(name='TELEPIZZA', Total=18), Row(name='LA POSADA', Total=12), Row(name='LA CASONA', Total=10), Row(name='EL MOLINO', Total=10), Row(name='LA TABERNA', Total=10), Row(name='AVENIDA', Total=10), Row(name="FOSTER'S HOLLYWOOD", Total=9), Row(name='EL CRUCE', Total=8), Row(name='EL CASTILLO', Total=7), Row(name='PLAZA', Total=7), Row(name='LOS ARCOS', Total=7), Row(name="DOMINO'S PIZZA", Total=7), Row(name='CASTILLA', Total=7), Row(name="MC DONALD'S", Total=7), Row(name='LA PARADA', Total=7), Row(name='BURGUER KING', Total=7), Row(name='LA TERRAZA', Total=7), Row(name='EL JARDIN', Total=6), Row(name='CENTRAL', Total=6), Row(name='LAS NIEVES', Total=6), Row(name='LA GRAN MURALLA', Total=6), Row(name='LA ENCINA', Total=6), Row(name='EL CAPRICHO', Total=6), Row(name='CASA PACO', Total=6), Row(name='LA MURALLA', Total=6), Row(name='EL MESON', Total=6), Row(name='EL MIRADOR', Total=6), Row(name='EL PASO', Total=6), Row(name='EL REFUGIO', Total=5)]


In [87]:
# Playing with Spark DataFrames. Number of cafes per county.
cafe_name_top = Sparkdf_garitos \
    .select("county",'address') \
    .where("garito_kind='cafeteria'") \
    .groupBy("county") \
    .agg(count("address").alias("Total")) \
    .orderBy("Total", ascending=False)

print(cafe_name_top.head(30))

[Row(county='León', Total=317), Row(county='Salamanca', Total=315), Row(county='Valladolid', Total=230), Row(county='Burgos', Total=179), Row(county='Ávila', Total=141), Row(county='Zamora', Total=91), Row(county='Segovia', Total=59), Row(county='Soria', Total=59), Row(county='Palencia', Total=54)]


In [109]:
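# Playing with Spark DataFrames. Burgos county: cafes, bars and restaurants per city.
# when(...) without otherwise() yields null for non-matching rows, and count()
# skips nulls, so each count() below only counts garitos of that kind.
# count("address") also skips rows with a null address, so "total" can be
# lower than cafes + bars + restaurants.
burgos_top = Sparkdf_garitos \
    .select("city",'address',
           when(Sparkdf_garitos['garito_kind'] == 'cafeteria', 1).alias("is_cafe"),
           when(Sparkdf_garitos['garito_kind'] == 'bar', 1).alias("is_bar"),
           when(Sparkdf_garitos['garito_kind'] == 'restaurante', 1).alias("is_restaurante")
           ) \
    .where("county='Burgos'") \
    .groupBy("city") \
    .agg(count("is_cafe").alias("cafes"), 
         count("is_bar").alias("bars"),
         count("is_restaurante").alias("restaurants"),
         count("address").alias("total"),
        ) \
    .orderBy("total", ascending=False)


burgos_top.head(30)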

[Row(city='Burgos', cafes=102, bars=765, restaurants=244, total=1110),
 Row(city='Aranda de Duero', cafes=19, bars=168, restaurants=62, total=249),
 Row(city='Miranda de Ebro', cafes=14, bars=177, restaurants=42, total=233),
 Row(city='Medina de Pomar', cafes=3, bars=67, restaurants=20, total=90),
 Row(city='Villarcayo de Merindad de Castilla la Vieja', cafes=7, bars=42, restaurants=16, total=65),
 Row(city='Briviesca', cafes=3, bars=42, restaurants=13, total=58),
 Row(city='Lerma', cafes=2, bars=19, restaurants=22, total=43),
 Row(city='Valle de Mena', cafes=2, bars=29, restaurants=13, total=41),
 Row(city='Espinosa de los Monteros', cafes=0, bars=25, restaurants=11, total=36),
 Row(city='Salas de los Infantes', cafes=0, bars=20, restaurants=9, total=29),
 Row(city='Belorado', cafes=3, bars=12, restaurants=10, total=25),
 Row(city='Roa', cafes=1, bars=18, restaurants=6, total=25),
 Row(city='Quintanar de la Sierra', cafes=0, bars=18, restaurants=7, total=24),
 Row(city='Melgar de Fern

In [114]:
## We need a city -> postal code correspondence, so first split off the rows with an empty postal_code

# Garitos without a postal code (the name must match the one used in the cells below)
Sparkdf_garitos_null=Sparkdf_garitos.select(
            'name',
            'address',
            'county',
            'city',
            'postal_code'
            ).where(col('postal_code').isNull())

# Garitos that already have a postal code
Sparkdf_garitos=Sparkdf_garitos.select(
            'name',
            'address',
            'county',
            'city',
            'postal_code'
            ).where(col('postal_code').isNotNull())

Sparkdf_garitos_null.head(25)

[Row(name='LA PLAZA', address='PZA. MAYOR, S/N', county='León', city='Vegas del Condado', postal_code=None),
 Row(name='RAMSES', address='CARRETERA ESTACION S/N', county='Ávila', city='Sanchidrián', postal_code=None),
 Row(name='MESON LA BARRACA', address='AVDA. RODRÍGUEZ PANDIELLA, 42', county='León', city='León', postal_code=None),
 Row(name='MERCADO REGIONAL DE GANADOS', address='CTRA. BURGOS-PORTUGAL, KM.2', county='Salamanca', city='Salamanca', postal_code=None),
 Row(name='OASIS', address='LA VICTORIA, 4', county='León', city='Valencia de Don Juan', postal_code=None),
 Row(name='LA VUELTA', address='TRAVESIA DE SANTA TERESA S/N', county='Ávila', city='Hoyo de Pinares (El)', postal_code=None),
 Row(name='PUB EBANO', address='C/ JUAN FERRERO Nº 80', county='León', city='Valderrueda', postal_code=None),
 Row(name='AVENIDA', address='GONZÁLEZ DE LAMA, 10', county='León', city='León', postal_code=None),
 Row(name='VIFER', address='MAESTRO URIARTE, 25', county='León', city='León', post

In [126]:
# Extracting cities that have a single postal code
# sparkdf_postal_codes is the city / postal code table created earlier;
# max here is pyspark.sql.functions.max, imported at the top of the notebook

cities_with_unique_postal_codes=sparkdf_postal_codes.select(
            'city',
            'postal_code'
            ).groupBy("city") \
            .agg(count('postal_code').alias('postal_codes'),
                 max('postal_code').alias('postal_code')) \
            .orderBy('city', ascending=True) \
            .where("postal_codes=1")


cities_with_unique_postal_codes.show(15)



+--------------------+------------+-----------+
|                city|postal_codes|postal_code|
+--------------------+------------+-----------+
|              Abades|           1|      40141|
|    Abarca de Campos|           1|      34338|
|              Abejar|           1|      42146|
|             Abusejo|           1|      37640|
|      Adrada de Haza|           1|      09462|
|     Adrada de Pirón|           1|      40192|
|             Adrados|           1|      40354|
|        Aguilafuente|           1|      40340|
|   Aguilar de Campos|           1|      47814|
|Ahigal de los Ace...|           1|      37248|
|     Alamedilla (La)|           1|      37554|
|              Alaraz|           1|      37312|
|     Alba de Cerrato|           1|      34219|
|      Alba de Tormes|           1|      37800|
|      Alba de Yeltes|           1|      37478|
+--------------------+------------+-----------+
only showing top 15 rows



In [130]:
# Fill in the postal code of garitos that lack one when their city has a single postal code

Sparkdf_garitos_null = Sparkdf_garitos_null.join(
    cities_with_unique_postal_codes,
    Sparkdf_garitos_null.city == cities_with_unique_postal_codes.city,
    'left').select(
        'name',
        'address',
        'county',
        Sparkdf_garitos_null.city,
        cities_with_unique_postal_codes.postal_code
    )


# show() prints the table itself and returns None, so it is not wrapped in print()
Sparkdf_garitos_null.show(15)


+--------------------+--------------------+---------+--------------------+-----------+
|                name|             address|   county|                city|postal_code|
+--------------------+--------------------+---------+--------------------+-----------+
|            LA PLAZA|     PZA. MAYOR, S/N|     León|   Vegas del Condado|       null|
|              RAMSES|CARRETERA ESTACIO...|    Ávila|         Sanchidrián|       null|
|    MESON LA BARRACA|AVDA. RODRÍGUEZ P...|     León|                León|       null|
|MERCADO REGIONAL ...|CTRA. BURGOS-PORT...|Salamanca|           Salamanca|       null|
|               OASIS|      LA VICTORIA, 4|     León|Valencia de Don Juan|       null|
|           LA VUELTA|TRAVESIA DE SANTA...|    Ávila|Hoyo de Pinares (El)|       null|
|           PUB EBANO|C/ JUAN FERRERO N...|     León|         Valderrueda|       null|
|             AVENIDA|GONZÁLEZ DE LAMA, 10|     León|                León|       null|
|               VIFER| MAESTRO URIARTE, 25|

In [132]:
print("garitos without null postalcodes",Sparkdf_garitos.count())


# union() matches columns by position; both DataFrames select
# name, address, county, city, postal_code in the same order
Sparkdf_garitos = (
        Sparkdf_garitos.union(Sparkdf_garitos_null)
    )


print("total garitos",Sparkdf_garitos.count())


garitos without null postalcodes 22472
total garitos 22487
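The fill-and-union steps above can be illustrated without Spark. The following is only a plain-Python sketch of the same idea: the function names and sample rows are hypothetical, and, unlike the Spark cell (which counts rows per city), it collects postal codes into a set, so repeated city/postal code pairs would not make a city look ambiguous.

```python
from collections import defaultdict


def unique_postal_code_by_city(postal_rows):
    """Map city -> postal code, only for cities with exactly one postal code."""
    codes_by_city = defaultdict(set)
    for city, postal_code in postal_rows:
        if postal_code is not None:
            codes_by_city[city].add(postal_code)
    return {city: next(iter(codes))
            for city, codes in codes_by_city.items() if len(codes) == 1}


def fill_missing_postal_codes(garitos, lookup):
    """Return copies of the garito dicts, filling postal_code when the city is unambiguous."""
    filled = []
    for row in garitos:
        row = dict(row)  # copy, so the input rows are left untouched
        if row["postal_code"] is None:
            # Stays None when the city has several postal codes (or none known).
            row["postal_code"] = lookup.get(row["city"])
        filled.append(row)
    return filled


# Hypothetical sample rows, not taken from the real files.
postal_rows = [("Abades", "40141"), ("León", "24001"), ("León", "24003")]
garitos = [
    {"name": "BAR A", "city": "Abades", "postal_code": None},
    {"name": "BAR B", "city": "León", "postal_code": None},
    {"name": "BAR C", "city": "León", "postal_code": "24003"},
]
lookup = unique_postal_code_by_city(postal_rows)
filled = fill_missing_postal_codes(garitos, lookup)
assert len(filled) == len(garitos)  # filling must not add or drop rows
```

The final assertion mirrors the count check printed above: 22472 garitos with a postal code plus the 15 without one give the original 22487 rows.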


In [133]:
## CLEANED UP GARITOS TO PARQUET 

Sparkdf_garitos.write.partitionBy("county","postal_code").parquet(local_parquet_path + "garitos/", mode="overwrite")

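### CLEANING ASSOCIATIONS AND SPORTS CLUBS (ASOCIACIONES Y CLUBES DEPORTIVOS)
These two files hold similar information but do not share a schema (for example, `C_Postal` vs `C.Postal` and `Municipio` vs `Localidad`), so their columns are mapped to common names before the union.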

In [21]:
## One file is read from S3 (as a test) and the other from the local folder

print('s3a://'+bucket_name+'/'+data_asociaciones)

Sparkdf_association = spark_session.read.options(inferSchema='true',\
                                delimiter=';',\
                                header='true',\
                                encoding='ISO-8859-1')\
                                .csv('s3a://'+bucket_name+'/'+data_asociaciones)

Sparkdf_sports_club = spark_session.read.options(inferSchema='true',\
                                delimiter=';',\
                                header='true',\
                                encoding='ISO-8859-1')\
                                .csv(local_path+data_clubes_deportivos)


print(Sparkdf_association.describe())

print('\n')
print(Sparkdf_sports_club.describe())


s3a://raul-udacity/AsociacionesJCyL.csv
DataFrame[summary: string, Num_Asoc: string, Ambito: string, Asociación: string, Domicilio: string, Municipio: string, Provincia: string, C_Postal: string, Web: string, Fines: string, Fines_Específicos: string, F_Registro: string]


DataFrame[summary: string, Nº registro: string, Nombre: string, Domicilio: string, Provincia: string, Localidad: string, C.Postal: string, Teléfono: string, Fax: string, Email: string, Web: string, F.Fundación: string, F.Inscripción: string, Deportes: string, _c13: string]


In [22]:
Sparkdf_association.head(2)

[Row(Num_Asoc='05/1/0000002', Ambito='COMARCAL', Asociación='ASOCIACION DE MADRES Y PADRES DE ALUMNOS DEL INSTITUTO DE BACHILLERATO EULOGIO FLORENTINO SANZ', Domicilio='Avda. Emilio Romero, 22', Municipio='ARÉVALO', Provincia='AVILA', C_Postal='05200', Web=None, Fines='-Asistir a los padres/madres o tutores en todo aquello que concierne a la educación de sus hijos o pupilos. -Colaborar en las actividades del Instituto. -Promover la participación de los padres/madres o tutores de los alumnos/as en la gestión del Instituto. -Asistir a los padres/madres o tutores en el ejercicio de su derecho a intervenir en el control y gestión del Instituto. -Facilitar la representación y la participación de los padres/madres o tutores en el Consejo Escolar del Instituto. -Promover la integración de los padres/madres o tutores en el proceso educativo. -Promover el transporte escolar de los alumnos/as no residentes en Arévalo, pero dentro de la zona del ámbito territorial de la Asociación. -Fomentar la c

In [23]:
Sparkdf_sports_club.head(2)

[Row(Nº registro='CYA/000009', Nombre='CLUB DEPORTIVO AREVALO DO.SA.', Domicilio='C/ ADOVERAS 35 A, 2º B', Provincia='Ávila', Localidad='AREVALO', C.Postal=5200, Teléfono=None, Fax=None, Email=None, Web=None, F.Fundación='17/01/1974', F.Inscripción='12/11/1984', Deportes='AT001#ATLETISMO - PISTA#|AT002#ATLETISMO - CAMPO A TRAVES#|AT003#ATLETISMO - RUTA#|AT004#ATLETISMO - MARCHA ATLÉTICA#|BC001#BALONCESTO - BALONCESTO#|FU001#FÚTBOL - FÚTBOL#|VB001#VOLEIBOL - VOLEIBOL#|VB002#VOLEIBOL - VOLEY-PLAYA#|VB003#VOLEIBOL - MINIVOLEY#|', _c13=None),
 Row(Nº registro='CYA/000018', Nombre='"CLUB DEPORTIVO GALGUERO ""LA CASTELLANA"""', Domicilio='C/ GENERAL PRIMO DE RIVERA, 14-1', Provincia='Ávila', Localidad='FONTIVEROS', C.Postal=5310, Teléfono=None, Fax=None, Email=None, Web=None, F.Fundación='01/03/1982', F.Inscripción='12/11/1984', Deportes='CA001#CAZA - PICHON A BRAZO#|CA002#CAZA - CAZA MENOR CON PERROS#|CA003#CAZA - RECORRIDOS DE CAZA#|CA004#CAZA - CAZA SAN HUBERTO#|CA005#CAZA - PERROS DE CAZ

In [27]:
# HEADS UP! The postal code column is C_Postal in associations and C.Postal in sports clubs
# (the dot requires backticks inside col()), and Municipio in associations is Localidad in sports clubs

Sparkdf_social = Sparkdf_association.select(
            col('Asociación').alias('name'),
            col('Domicilio').alias('address'),
            col('Provincia').alias('county'),
            col('Municipio').alias('city'),
            col('`C_Postal`').alias('postal_code')
        ).withColumn("social_kind",lit('association')).distinct()\
        .union(Sparkdf_sports_club.select(
            col('Nombre').alias('name'),
            col('Domicilio').alias('address'),
            col('Provincia').alias('county'),
            col('Localidad').alias('city'),
            col('`C.Postal`').alias('postal_code')
        ).withColumn("social_kind",lit('sports_club')).distinct()
        )

Sparkdf_social.describe()

DataFrame[summary: string, name: string, address: string, county: string, city: string, postal_code: string, social_kind: string]
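The sample rows shown earlier suggest two mismatches between the sources that the union does not resolve: the sports club postal code was inferred as a number (`C.Postal=5200`) while associations keep the string `'05200'`, and place names differ in case and accents (`'AVILA'` vs `'Ávila'`, `'ARÉVALO'` vs `'AREVALO'`). A minimal plain-Python sketch of one way to normalise both follows; the helper names are hypothetical, and it assumes postal codes arrive as integers or strings (not floats).

```python
import unicodedata


def normalize_postal_code(value):
    """Return a 5-character, zero-padded postal code string, or None if missing."""
    if value is None:
        return None
    text = str(value).strip()
    if not text:
        return None
    return text.zfill(5)


def place_key(name):
    """Build an accent-free, upper-case key for comparing county or city names."""
    if name is None:
        return None
    # NFKD splits accented letters into base letter + combining mark,
    # so dropping the combining marks removes the accents.
    decomposed = unicodedata.normalize("NFKD", name.strip())
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.upper()
```

Note that `place_key` also turns `ñ` into `N`, which is acceptable for a comparison key but not for display. In Spark, the padding could be expressed with `lpad` after casting the column to string.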

In [28]:
## CLEANED UP SOCIAL TO PARQUET (LOCAL)

Sparkdf_social.write.partitionBy("county","postal_code").parquet(local_parquet_path + "social/", mode="overwrite")

In [32]:
## CLEANED UP SOCIAL TO PARQUET (S3)

#bucket_name = 'raul-udacity'
#bucket_parquet_path ='/parquet/'

#Sparkdf_social.write.partitionBy("county","postal_code").parquet('s3a://'+bucket_name+bucket_parquet_path + "social/", mode="overwrite")

In [52]:
## Sports per county (Burgos shown here)
# https://stackoverflow.com/questions/57066797/pyspark-dataframe-split-column-with-multiple-values-into-rows#57080133

from pyspark.sql.functions import explode, split

# split() takes a regular expression, so the pipe must be escaped;
# the raw string keeps Python from treating "\|" as a string escape
out=Sparkdf_sports_club.withColumn(
    "sport", 
    explode(split(col("Deportes"), r"\|"))
).where("Provincia='Burgos'").select(
    col('sport')
    ).groupBy('sport').agg(count('sport').alias('sport_associations')) \
    .orderBy('sport_associations', ascending=False)



out.head(35)

[Row(sport='', sport_associations=1103),
 Row(sport='CA013#CAZA - EDUCACIÓN CANINA#', sport_associations=385),
 Row(sport='CA005#CAZA - PERROS DE CAZA Y AGILITY#', sport_associations=385),
 Row(sport='CA014#CAZA - CAZA DE BECADAS#', sport_associations=384),
 Row(sport='CA010#CAZA - CAZA FOTOGRAFICA Y VIDEO#', sport_associations=384),
 Row(sport='CA008#CAZA - CAZA CON ARCO#', sport_associations=384),
 Row(sport='CA006#CAZA - CETRERIA#', sport_associations=384),
 Row(sport='CA001#CAZA - PICHON A BRAZO#', sport_associations=384),
 Row(sport='CA011#CAZA - COMPAK SPORTING#', sport_associations=384),
 Row(sport='CA009#CAZA - TIRO A CAZA LANZADA#', sport_associations=384),
 Row(sport='CA012#CAZA - PERDIZ CON RECLAMO#', sport_associations=384),
 Row(sport='CA004#CAZA - CAZA SAN HUBERTO#', sport_associations=384),
 Row(sport='CA007#CAZA - PAJAROS DE CANTO#', sport_associations=383),
 Row(sport='CA003#CAZA - RECORRIDOS DE CAZA#', sport_associations=383),
 Row(sport='CA002#CAZA - CAZA MENOR CON P
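The empty `sport=''` row at the top of this output most likely comes from the trailing `|` in each `Deportes` value, which leaves an empty last element after the split. As a sketch of how the field could be parsed into clean (code, name) pairs, assuming every entry has the `CODE#NAME#` shape seen in the sample rows (the function name is hypothetical):

```python
def parse_deportes(raw):
    """Split a Deportes string into (code, name) pairs, skipping empty entries."""
    sports = []
    if not raw:
        return sports
    for entry in raw.split("|"):
        entry = entry.strip()
        if not entry:
            continue  # the trailing "|" produces an empty last entry
        # "AT001#ATLETISMO - PISTA#" -> code "AT001", name "ATLETISMO - PISTA"
        code, _, name = entry.partition("#")
        sports.append((code, name.rstrip("#").strip()))
    return sports


sample = "AT001#ATLETISMO - PISTA#|BC001#BALONCESTO - BALONCESTO#|"
pairs = parse_deportes(sample)
```

In the Spark cell, a comparable effect could be had by filtering out empty `sport` values after the `explode`.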

# Step 3: Define the Data Model
## 3.1 Conceptual Data Model
Map out the conceptual data model and explain why you chose that model

## 3.2 Mapping Out Data Pipelines
List the steps necessary to pipeline the data into the chosen data model

# Step 4: Run Pipelines to Model the Data 
## 4.1 Create the data model
Build the data pipelines to create the data model.


## 4.2 Data Quality Checks
Explain the data quality checks you'll perform to ensure the pipeline ran as expected. These could include:
 * Integrity constraints on the relational database (e.g., unique key, data type, etc.)
 * Unit tests for the scripts to ensure they are doing the right thing
 * Source/Count checks to ensure completeness
 
Run Quality Checks
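The checks listed above could be prototyped on plain Python rows before being ported to Spark. The following is a sketch only: the function name, its parameters and the sample rows are hypothetical, and it covers a count check, a required-field check and a unique-key check.

```python
def run_quality_checks(rows, key_fields, required_fields, expected_count=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Source/count check: the table should hold the expected number of rows.
    if expected_count is not None and len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, found {len(rows)}")
    seen = set()
    for index, row in enumerate(rows):
        # Completeness check: required fields must not be null or empty.
        for field in required_fields:
            if row.get(field) in (None, ""):
                problems.append(f"row {index}: missing {field}")
        # Integrity check: the key fields together must be unique.
        key = tuple(row.get(field) for field in key_fields)
        if key in seen:
            problems.append(f"row {index}: duplicate key {key}")
        seen.add(key)
    return problems


# Hypothetical sample rows, not taken from the real files.
sample_rows = [
    {"city": "Abades", "postal_code": "40141"},
    {"city": "Abades", "postal_code": "40141"},
    {"city": "León", "postal_code": None},
]
problems = run_quality_checks(
    sample_rows,
    key_fields=("city", "postal_code"),
    required_fields=("postal_code",),
    expected_count=3,
)
```

Returning a list of problems instead of raising on the first failure lets a single run report every issue in a table.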

## 4.3 Data dictionary 
Create a data dictionary for your data model. For each field, provide a brief description of what the data is and where it came from. You can include the data dictionary in the notebook or in a separate file.

# Step 5: Complete Project Write Up
* Clearly state the rationale for the choice of tools and technologies for the project.
* Propose how often the data should be updated and why.
* Write a description of how you would approach the problem differently under the following scenarios:
 * The data was increased by 100x.
 * The data populates a dashboard that must be updated on a daily basis by 7am every day.
 * The database needed to be accessed by 100+ people.