# Gold Layer - Fare Amount

Note: This notebook runs on Azure Synapse Analytics with PySpark.

This notebook builds the Gold layer dataset for fare amounts from the Silver layer.

Based on domain knowledge, a first approach is to keep only the rows that satisfy both of the following conditions:

1. fare_amount must be >= 3.00
2. RatecodeID must equal 1 (the standard rate)

The columns to keep are listed below. RatecodeID is used only for filtering and is not carried into the Gold layer.
* pu_month
* pu_year
* pu_year_month
* trip_distance
* trip_duration_min
* fare_amount
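
To make the two filter rules and the column selection concrete, here is a minimal plain-Python sketch. The sample rows are made up for illustration and the helper name `to_gold` is hypothetical; the notebook itself applies the same logic with Spark.

```python
# Columns carried into the Gold layer (same list as in the notebook).
COLS_TO_KEEP = ["pu_month", "pu_year", "pu_year_month",
                "trip_distance", "trip_duration_min", "fare_amount"]


def to_gold(rows):
    """Keep rows with RatecodeID == 1 and fare_amount >= 3.00,
    then project each surviving row onto COLS_TO_KEEP."""
    return [
        {col: row[col] for col in COLS_TO_KEEP}
        for row in rows
        if row["RatecodeID"] == 1 and row["fare_amount"] >= 3.00
    ]


# Hypothetical sample rows (values are illustrative, not real data).
sample_rows = [
    {"RatecodeID": 1, "pu_month": 6, "pu_year": 2023, "pu_year_month": "2023-6",
     "trip_distance": 3.4, "trip_duration_min": 20.9, "fare_amount": 21.9},
    # Dropped: rate code is not 1.
    {"RatecodeID": 2, "pu_month": 6, "pu_year": 2023, "pu_year_month": "2023-6",
     "trip_distance": 17.0, "trip_duration_min": 45.0, "fare_amount": 70.0},
    # Dropped: fare is below 3.00.
    {"RatecodeID": 1, "pu_month": 6, "pu_year": 2023, "pu_year_month": "2023-6",
     "trip_distance": 0.1, "trip_duration_min": 0.5, "fare_amount": 2.5},
]

gold_rows = to_gold(sample_rows)
print(len(gold_rows))        # 1
print(sorted(gold_rows[0]))  # RatecodeID is no longer present
```

Only the first row survives, and it no longer carries RatecodeID.
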

In [9]:
cols_to_keep = ["pu_month", "pu_year", "pu_year_month", "trip_distance", "trip_duration_min", "fare_amount"]


In [2]:
import pyspark.sql.functions as F


In [3]:
%%pyspark
df = spark.read.load("abfss://yellowtaxicab@jdshammasstorage.dfs.core.windows.net/silver/union", format="parquet")


In [10]:
# Filter before projecting, since RatecodeID is not among cols_to_keep.
gold_df = df.filter((F.col("RatecodeID") == 1) & (F.col("fare_amount") >= 3.00)).select(cols_to_keep)


In [11]:
gold_df.write.partitionBy("pu_year_month").parquet("abfss://yellowtaxicab@jdshammasstorage.dfs.core.windows.net/gold/fare_amount_init", mode='overwrite')

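
Writing with `partitionBy("pu_year_month")` produces one subdirectory per distinct value of that column, named in the Hive style `column=value`. The plain-Python sketch below only illustrates which directory names would result. It does not touch storage, the helper `partition_dirs` is hypothetical, and the rows are made up.

```python
from collections import defaultdict


def partition_dirs(rows, key="pu_year_month"):
    """Group rows by the partition column and return a mapping from
    Hive-style directory name ("key=value") to the number of rows in it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {f"{key}={value}": len(members) for value, members in groups.items()}


# Hypothetical rows, for illustration only.
rows = [
    {"pu_year_month": "2023-6", "fare_amount": 21.9},
    {"pu_year_month": "2023-6", "fare_amount": 9.3},
    {"pu_year_month": "2023-7", "fare_amount": 12.0},
]

layout = partition_dirs(rows)
print(layout)  # {'pu_year_month=2023-6': 2, 'pu_year_month=2023-7': 1}
```
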

Let's slice off a three-month block of data (June through August 2023) to work with.

In [12]:
small_gold_df = gold_df.filter(F.col("pu_year_month").isin(["2023-6", "2023-7", "2023-8"]))

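
The `pu_year_month` keys in this dataset are not zero-padded (`2023-6`, not `2023-06`), so a padded string passed to `isin` would silently match nothing. As a minimal sketch, a small helper can generate the keys for a month range in that same format. The helper `year_month_keys` is hypothetical and not part of the notebook.

```python
def year_month_keys(year, first_month, last_month):
    """Return keys such as '2023-6' (no zero padding) for an inclusive
    range of months within a single year."""
    if not (1 <= first_month <= last_month <= 12):
        raise ValueError("expected 1 <= first_month <= last_month <= 12")
    return [f"{year}-{month}" for month in range(first_month, last_month + 1)]


keys = year_month_keys(2023, 6, 8)
print(keys)  # ['2023-6', '2023-7', '2023-8']
```

The resulting list could be passed to `isin` in place of the hand-written literals.
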

In [13]:
small_gold_df.coalesce(1).write.parquet("abfss://yellowtaxicab@jdshammasstorage.dfs.core.windows.net/gold/fare_amount_init_6_8_2023", mode='overwrite')


In [14]:
small_gold_df.show()


+--------+-------+-------------+-------------+------------------+-----------+
|pu_month|pu_year|pu_year_month|trip_distance| trip_duration_min|fare_amount|
+--------+-------+-------------+-------------+------------------+-----------+
|       6|   2023|       2023-6|          3.4|20.883333333333333|       21.9|
|       6|   2023|       2023-6|         10.2|18.716666666666665|       40.8|
|       6|   2023|       2023-6|         9.83|23.433333333333334|       39.4|
|       6|   2023|       2023-6|         1.17| 8.566666666666666|        9.3|
|       6|   2023|       2023-6|          3.6|13.266666666666667|       18.4|
|       6|   2023|       2023-6|         3.08|18.933333333333334|       19.8|
|       6|   2023|       2023-6|          1.1| 8.783333333333333|       10.0|
|       6|   2023|       2023-6|         0.99|              3.95|        6.5|
|       6|   2023|       2023-6|         0.73|              2.85|        5.1|
|       6|   2023|       2023-6|         5.43|16.466666666666665