### **Step 6.2.3 - Ingesting the "constructors.json" file**

Widgets let us create notebook parameters and supply their values at run time.

<center><img src="https://i.postimg.cc/NMfJ9t9h/db147.png"></center>

1. This notebook was run for the **2021-03-21** folder.
2. As in the previous examples, we can reuse it for the two remaining folders: **2021-03-28** and **2021-04-18**.
3. We only need to change the notebook parameter **p_file_date**.

In [None]:
# Create the "p_data_source" text widget (empty default) and read its current value
dbutils.widgets.text("p_data_source", "")
v_data_source = dbutils.widgets.get("p_data_source")

In [None]:
v_data_source

Out[2]: 'testing'

In [None]:
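# Create the "p_file_date" text widget; "2023-06-11" is only the default value,
# the value actually used in this run (2021-03-21) was supplied at run time
dbutils.widgets.text("p_file_date", "2023-06-11")
v_file_date = dbutils.widgets.get("p_file_date")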

In [None]:
v_file_date

Out[4]: '2021-03-21'

In [None]:
%run "../includes/configuration"

In [None]:
%run "../includes/common_functions"

#### Step 1 - Read the JSON file

In [None]:
# Schema as a DDL string: "column_name TYPE" pairs separated by commas
constructors_schema = "constructorId INT, constructorRef STRING, name STRING, nationality STRING, url STRING"

In [None]:
# "raw_folder_path" is defined in the "configuration" notebook
# "v_file_date" is defined in this notebook; its value comes from the "p_file_date" widget at run time
constructor_df = spark.read \
    .schema(constructors_schema) \
    .json(f"{raw_folder_path}/{v_file_date}/constructors.json")
# For this run, this is equivalent to the path: /mnt/formula1dl/raw/2021-03-21/constructors.json
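The input path is plain string interpolation of `raw_folder_path`, the `p_file_date` value and the file name. A minimal sketch of a helper that builds the same path and rejects a malformed date before Spark tries to read a folder that does not exist; `build_raw_path` is a hypothetical name and is not part of the `configuration` or `common_functions` notebooks.

```python
from datetime import datetime


def build_raw_path(raw_folder_path: str, file_date: str, file_name: str) -> str:
    """Build <raw_folder_path>/<file_date>/<file_name>, checking the date format.

    Hypothetical helper: the notebook itself builds the path inline with an f-string.
    """
    # Raises ValueError if file_date cannot be parsed as a YYYY-MM-DD date
    datetime.strptime(file_date, "%Y-%m-%d")
    # Drop a trailing slash on the base folder so the result never contains "//"
    return f"{raw_folder_path.rstrip('/')}/{file_date}/{file_name}"


print(build_raw_path("/mnt/formula1dl/raw", "2021-03-21", "constructors.json"))
# /mnt/formula1dl/raw/2021-03-21/constructors.json
```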

In [None]:
constructor_df.printSchema()

root
 |-- constructorId: integer (nullable = true)
 |-- constructorRef: string (nullable = true)
 |-- name: string (nullable = true)
 |-- nationality: string (nullable = true)
 |-- url: string (nullable = true)



In [None]:
constructor_df.show(truncate=False)

+-------------+--------------+-----------+-----------+------------------------------------------------------------+
|constructorId|constructorRef|name       |nationality|url                                                         |
+-------------+--------------+-----------+-----------+------------------------------------------------------------+
|1            |mclaren       |McLaren    |British    |http://en.wikipedia.org/wiki/McLaren                        |
|2            |bmw_sauber    |BMW Sauber |German     |http://en.wikipedia.org/wiki/BMW_Sauber                     |
|3            |williams      |Williams   |British    |http://en.wikipedia.org/wiki/Williams_Grand_Prix_Engineering|
|4            |renault       |Renault    |French     |http://en.wikipedia.org/wiki/Renault_in_Formula_One         |
|5            |toro_rosso    |Toro Rosso |Italian    |http://en.wikipedia.org/wiki/Scuderia_Toro_Rosso            |
|6            |ferrari       |Ferrari    |Italian    |http://en.wikipedi

#### Step 2 - Drop unwanted columns

In [None]:
from pyspark.sql.functions import col

In [None]:
constructor_dropped_df = constructor_df.drop(col('url'))

In [None]:
constructor_dropped_df.show(truncate=False)

+-------------+--------------+-----------+-----------+
|constructorId|constructorRef|name       |nationality|
+-------------+--------------+-----------+-----------+
|1            |mclaren       |McLaren    |British    |
|2            |bmw_sauber    |BMW Sauber |German     |
|3            |williams      |Williams   |British    |
|4            |renault       |Renault    |French     |
|5            |toro_rosso    |Toro Rosso |Italian    |
|6            |ferrari       |Ferrari    |Italian    |
|7            |toyota        |Toyota     |Japanese   |
|8            |super_aguri   |Super Aguri|Japanese   |
|9            |red_bull      |Red Bull   |Austrian   |
|10           |force_india   |Force India|Indian     |
|11           |honda         |Honda      |Japanese   |
|12           |spyker        |Spyker     |Dutch      |
|13           |mf1           |MF1        |Russian    |
|14           |spyker_mf1    |Spyker MF1 |Dutch      |
|15           |sauber        |Sauber     |Swiss      |
|16       

#### Step 3 - Rename columns and add the "data_source", "file_date" and "ingestion_date" columns

In [None]:
from pyspark.sql.functions import lit

In [None]:
constructor_renamed_df = constructor_dropped_df.withColumnRenamed("constructorId", "constructor_id") \
                                               .withColumnRenamed("constructorRef", "constructor_ref") \
                                               .withColumn("data_source", lit(v_data_source)) \
                                               .withColumn("file_date", lit(v_file_date))
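Steps 2 and 3 only reshape the column list: `drop` removes `url`, `withColumnRenamed` renames a column in place, and `withColumn` appends a new column at the end (or replaces an existing column with the same name). A plain-Python model of that effect, useful for checking the expected output columns before running Spark; `transform_columns` is a hypothetical helper that works on column names only, not on a DataFrame.

```python
def transform_columns(columns, drop, renames, added):
    """Model the column list after drop(), withColumnRenamed() and withColumn().

    Renamed columns keep their position; added columns that do not already
    exist are appended at the end, in the order they are added.
    """
    kept = [c for c in columns if c not in drop]
    renamed = [renames.get(c, c) for c in kept]
    return renamed + [c for c in added if c not in renamed]


raw_columns = ["constructorId", "constructorRef", "name", "nationality", "url"]
final_columns = transform_columns(
    raw_columns,
    drop={"url"},
    renames={"constructorId": "constructor_id", "constructorRef": "constructor_ref"},
    added=["data_source", "file_date"],
)
print(final_columns)
# ['constructor_id', 'constructor_ref', 'name', 'nationality', 'data_source', 'file_date']
```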

In [None]:
constructor_renamed_df.show(truncate=False)

+--------------+---------------+-----------+-----------+-----------+----------+
|constructor_id|constructor_ref|name       |nationality|data_source|file_date |
+--------------+---------------+-----------+-----------+-----------+----------+
|1             |mclaren        |McLaren    |British    |testing    |2021-03-21|
|2             |bmw_sauber     |BMW Sauber |German     |testing    |2021-03-21|
|3             |williams       |Williams   |British    |testing    |2021-03-21|
|4             |renault        |Renault    |French     |testing    |2021-03-21|
|5             |toro_rosso     |Toro Rosso |Italian    |testing    |2021-03-21|
|6             |ferrari        |Ferrari    |Italian    |testing    |2021-03-21|
|7             |toyota         |Toyota     |Japanese   |testing    |2021-03-21|
|8             |super_aguri    |Super Aguri|Japanese   |testing    |2021-03-21|
|9             |red_bull       |Red Bull   |Austrian   |testing    |2021-03-21|
|10            |force_india    |Force In

In [None]:
# The "add_ingestion_date()" function is defined in the "common_functions" notebook
constructor_final_df = add_ingestion_date(constructor_renamed_df)
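The body of `add_ingestion_date()` is in `common_functions` and is not shown here; in Spark such a helper is commonly written as `input_df.withColumn("ingestion_date", current_timestamp())`, but that is an assumption about its contents. The output is consistent with it: every row gets the same value, because `current_timestamp()` returns one value per query. It would also explain why the values read back from `f1_processed.constructors` (20:25:35.289) differ from those in `constructor_final_df.show()` (20:25:34.581): the DataFrame is evaluated lazily, so the write re-runs the plan and takes a new timestamp. The sketch below is a hypothetical plain-Python analogue on a list of dicts, not the Spark helper.

```python
from datetime import datetime, timezone


def add_ingestion_date(rows, now=None):
    """Return copies of rows with an 'ingestion_date' key (plain-Python analogue).

    The timestamp is taken once per call, so every row gets the same value,
    mirroring Spark's current_timestamp() returning one value per query.
    """
    ingestion_date = now if now is not None else datetime.now(timezone.utc)
    # Build new dicts so the input rows are left unchanged
    return [{**row, "ingestion_date": ingestion_date} for row in rows]


rows = [
    {"constructor_id": 1, "constructor_ref": "mclaren"},
    {"constructor_id": 2, "constructor_ref": "bmw_sauber"},
]
stamped = add_ingestion_date(rows)
print(stamped[0]["ingestion_date"] == stamped[1]["ingestion_date"])  # True
```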

In [None]:
constructor_final_df.show(truncate=False)

+--------------+---------------+-----------+-----------+-----------+----------+-----------------------+
|constructor_id|constructor_ref|name       |nationality|data_source|file_date |ingestion_date         |
+--------------+---------------+-----------+-----------+-----------+----------+-----------------------+
|1             |mclaren        |McLaren    |British    |testing    |2021-03-21|2023-06-17 20:25:34.581|
|2             |bmw_sauber     |BMW Sauber |German     |testing    |2021-03-21|2023-06-17 20:25:34.581|
|3             |williams       |Williams   |British    |testing    |2021-03-21|2023-06-17 20:25:34.581|
|4             |renault        |Renault    |French     |testing    |2021-03-21|2023-06-17 20:25:34.581|
|5             |toro_rosso     |Toro Rosso |Italian    |testing    |2021-03-21|2023-06-17 20:25:34.581|
|6             |ferrari        |Ferrari    |Italian    |testing    |2021-03-21|2023-06-17 20:25:34.581|
|7             |toyota         |Toyota     |Japanese   |testing 

#### Step 4 - Write the data to the data lake in Delta format and create the **constructors** table in the **f1_processed** database (Delta managed table)

In [None]:
# Write the DataFrame in Delta format as the "constructors" table in the "f1_processed" database
# .format("delta") is optional: Delta is the default table format on Databricks Runtime 8.0 and above
constructor_final_df.write.mode("overwrite").format("delta").saveAsTable("f1_processed.constructors")

In [None]:
spark.table("f1_processed.constructors").show(truncate=False)

+--------------+---------------+-----------+-----------+-----------+----------+-----------------------+
|constructor_id|constructor_ref|name       |nationality|data_source|file_date |ingestion_date         |
+--------------+---------------+-----------+-----------+-----------+----------+-----------------------+
|1             |mclaren        |McLaren    |British    |testing    |2021-03-21|2023-06-17 20:25:35.289|
|2             |bmw_sauber     |BMW Sauber |German     |testing    |2021-03-21|2023-06-17 20:25:35.289|
|3             |williams       |Williams   |British    |testing    |2021-03-21|2023-06-17 20:25:35.289|
|4             |renault        |Renault    |French     |testing    |2021-03-21|2023-06-17 20:25:35.289|
|5             |toro_rosso     |Toro Rosso |Italian    |testing    |2021-03-21|2023-06-17 20:25:35.289|
|6             |ferrari        |Ferrari    |Italian    |testing    |2021-03-21|2023-06-17 20:25:35.289|
|7             |toyota         |Toyota     |Japanese   |testing 

<center><img src="https://i.postimg.cc/cHVKH6Lh/db178.png"></center>

In [None]:
%sql
SELECT * FROM f1_processed.constructors;

constructor_id,constructor_ref,name,nationality,data_source,file_date,ingestion_date
1,mclaren,McLaren,British,testing,2021-03-21,2023-06-17T20:25:35.289+0000
2,bmw_sauber,BMW Sauber,German,testing,2021-03-21,2023-06-17T20:25:35.289+0000
3,williams,Williams,British,testing,2021-03-21,2023-06-17T20:25:35.289+0000
4,renault,Renault,French,testing,2021-03-21,2023-06-17T20:25:35.289+0000
5,toro_rosso,Toro Rosso,Italian,testing,2021-03-21,2023-06-17T20:25:35.289+0000
6,ferrari,Ferrari,Italian,testing,2021-03-21,2023-06-17T20:25:35.289+0000
7,toyota,Toyota,Japanese,testing,2021-03-21,2023-06-17T20:25:35.289+0000
8,super_aguri,Super Aguri,Japanese,testing,2021-03-21,2023-06-17T20:25:35.289+0000
9,red_bull,Red Bull,Austrian,testing,2021-03-21,2023-06-17T20:25:35.289+0000
10,force_india,Force India,Indian,testing,2021-03-21,2023-06-17T20:25:35.289+0000


In [None]:
%sql
DESCRIBE FORMATTED f1_processed.constructors;

col_name,data_type,comment
constructor_id,int,
constructor_ref,string,
name,string,
nationality,string,
data_source,string,
file_date,string,
ingestion_date,timestamp,
,,
# Detailed Table Information,,
Catalog,spark_catalog,


<center><img src="https://i.postimg.cc/k4V6qWqD/db177.png"></center>

In [None]:
dbutils.notebook.exit("Success")

Success
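This notebook can be reused for **2021-03-28** and **2021-04-18** by changing only `p_file_date`, and it ends with `dbutils.notebook.exit("Success")`. A minimal sketch of a driver that runs it once per date and checks that return value, assuming it is called from another Databricks notebook through `dbutils.notebook.run(path, timeout_seconds, arguments)`. `NOTEBOOK_PATH` and `run_for_dates` are hypothetical names; the runner is passed in as a parameter so the logic can be exercised without Databricks.

```python
from typing import Callable, Dict, List

# Hypothetical notebook path: adjust to where this ingestion notebook actually lives
NOTEBOOK_PATH = "ingest_constructors_file"
FILE_DATES = ["2021-03-21", "2021-03-28", "2021-04-18"]


def run_for_dates(run_notebook: Callable[[str, int, Dict[str, str]], str],
                  file_dates: List[str],
                  data_source: str = "testing",
                  timeout_seconds: int = 0) -> Dict[str, str]:
    """Run the ingestion notebook once per file date and collect its exit values."""
    results = {}
    for file_date in file_dates:
        # Keys match the widget names created at the top of the notebook
        arguments = {"p_data_source": data_source, "p_file_date": file_date}
        results[file_date] = run_notebook(NOTEBOOK_PATH, timeout_seconds, arguments)
    # The notebook exits with "Success"; treat any other value as a failure
    failed = [d for d, value in results.items() if value != "Success"]
    if failed:
        raise RuntimeError(f"Ingestion did not succeed for: {failed}")
    return results


# Local dry run with a stand-in runner; in Databricks pass dbutils.notebook.run instead
def fake_run(path, timeout_seconds, arguments):
    return "Success"


print(run_for_dates(fake_run, FILE_DATES))
# {'2021-03-21': 'Success', '2021-03-28': 'Success', '2021-04-18': 'Success'}
```

Because Step 4 writes with `mode("overwrite")`, running the three dates in sequence leaves `f1_processed.constructors` holding only the rows from the last date processed.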