# CREATE AIRPORT LOCATION DATA

This Python notebook reads a file of airport data and uses PySpark to build bronze, silver, and gold tables containing airport location details.

**Fetch Data** → The script reads in a file with airport information.

**Extract Relevant Fields** → The script extracts key details: ICAO code, IATA code, airport name, city, country, latitude, and longitude.

**Convert to PySpark DataFrame** → The extracted data is transformed into a structured PySpark DataFrame.

**Save as a Table** → The final DataFrame is written to a table in Databricks, making it easy to query and analyze airport locations.

This process automates loading the airport file, so that checked location information is stored in a structured format for further analysis. 🚀✈️

In [0]:
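# NECESSARY LIBRARIES
# IMPORT ONLY THE NAMES USED BELOW; A WILDCARD IMPORT FROM pyspark.sql.functions
# WOULD SHADOW PYTHON BUILT-INS SUCH AS sum, min, AND max
from pyspark.sql.functions import col, when
from pyspark.sql.types import DoubleType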

### LET'S CREATE BRONZE TABLES
THIS SECTION OF CODE READS IN THE FILE. IT CREATES THE RAW DATA TABLE.

In [0]:
# READ IN FILE
df = spark.read.option("delimiter", ":").csv("/Volumes/tabular/dataexpert/josephgabbrielle62095/capstone_flight/GlobalAirportDatabase.txt")

# RENAME COLUMNS
new_columns = ["icao", "iata", "airport", "city", "country", "latitude_degrees", "latitude_minutes", "latitude_seconds", "latitude_direction", "longitude_degrees", "longitude_minutes", "longitude_seconds", "longitude_direction", "altitude", "latitude_decimal_degrees", "longitude_decimal_degrees"]
df = df.toDF(*new_columns)

# LOOK AT THE DATA
display(df)
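
To make the column mapping above concrete, here is a small plain-Python sketch (no Spark needed) that splits one colon-delimited record into the sixteen names in `new_columns`. The sample record is invented for illustration and is not taken from `GlobalAirportDatabase.txt`. The helpers `parse_record` and `dms_to_decimal` are hypothetical, and the conversion assumes the decimal-degree columns are the signed equivalent of the degree/minute/second columns.

```python
# Column names in file order, matching new_columns in the cell above.
COLUMNS = [
    "icao", "iata", "airport", "city", "country",
    "latitude_degrees", "latitude_minutes", "latitude_seconds", "latitude_direction",
    "longitude_degrees", "longitude_minutes", "longitude_seconds", "longitude_direction",
    "altitude", "latitude_decimal_degrees", "longitude_decimal_degrees",
]

# Invented sample record (NOT real data), using ":" as the delimiter.
SAMPLE_LINE = (
    "XXXX:XXX:EXAMPLE AIRPORT:EXAMPLE CITY:EXAMPLELAND:"
    "010:030:000:N:020:015:000:W:00100:10.500:-20.250"
)


def parse_record(line):
    """Split one colon-delimited line into a dict keyed by column name."""
    values = line.strip().split(":")
    if len(values) != len(COLUMNS):
        raise ValueError(f"Expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


def dms_to_decimal(degrees, minutes, seconds, direction):
    """Convert degrees/minutes/seconds to signed decimal degrees (S and W are negative)."""
    value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    return -value if direction in ("S", "W") else value


record = parse_record(SAMPLE_LINE)
latitude = dms_to_decimal(
    record["latitude_degrees"], record["latitude_minutes"],
    record["latitude_seconds"], record["latitude_direction"],
)
longitude = dms_to_decimal(
    record["longitude_degrees"], record["longitude_minutes"],
    record["longitude_seconds"], record["longitude_direction"],
)
print(record["icao"], latitude, longitude)
```

Every value starts out as a string, which is why the silver step later casts the decimal-degree columns to a numeric type.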

In [0]:
# WRITE THE DATA TO A BRONZE TABLE
df.write.mode("overwrite").saveAsTable("tabular.dataexpert.josephgabbrielle62095_airport_geocode_bronze")

### LET'S CREATE THE SILVER TABLES
THIS SECTION OF CODE TAKES THE RAW TABLE AND CLEANS IT TO PUT IT IN A USABLE FORMAT.

In [0]:
airport_bronze = spark.sql("SELECT * FROM tabular.dataexpert.josephgabbrielle62095_airport_geocode_bronze")

In [0]:
# SELECT CERTAIN COLUMNS
airport_bronze = airport_bronze.select(
    "icao",
    "iata",
    "airport",
    "city",
    "country",
    "latitude_decimal_degrees",
    "longitude_decimal_degrees"
    )

# CHANGE LATITUDE AND LONGITUDE TO DOUBLE TYPE
airport_bronze = (
    airport_bronze
    .withColumn("latitude_decimal_degrees", col("latitude_decimal_degrees").cast(DoubleType()))
    .withColumn("longitude_decimal_degrees", col("longitude_decimal_degrees").cast(DoubleType()))
)

# CORRECT ENGLAND MISSPELLING
airport_bronze = airport_bronze.withColumn("country", when(col("country") == "ENGALND", "ENGLAND").otherwise(airport_bronze.country))

display(airport_bronze)

# WRITE TO SILVER TABLE
airport_bronze.write.mode("overwrite").saveAsTable("tabular.dataexpert.josephgabbrielle62095_airport_geocode_silver")
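
The cleaning rules in this cell can also be expressed on a single plain-Python row, which is handy for checking the logic without a cluster. This is only a sketch: `clean_row` and `to_double` are hypothetical helpers and the sample row is invented. Returning `None` for unparseable numbers is an assumption meant to mirror a Spark cast that yields null when ANSI mode is off; with ANSI mode on, the cast would raise an error instead.

```python
# Columns kept in the silver table, in the same order as the select() above.
SILVER_COLUMNS = [
    "icao", "iata", "airport", "city", "country",
    "latitude_decimal_degrees", "longitude_decimal_degrees",
]

# Known misspellings and their corrections (only the one fixed in the notebook).
COUNTRY_FIXES = {"ENGALND": "ENGLAND"}


def to_double(value):
    """Return value as a float, or None if it cannot be parsed (assumed non-ANSI cast behavior)."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def clean_row(row):
    """Keep the silver columns, cast coordinates to float, and fix known country typos."""
    cleaned = {name: row.get(name) for name in SILVER_COLUMNS}
    cleaned["latitude_decimal_degrees"] = to_double(cleaned["latitude_decimal_degrees"])
    cleaned["longitude_decimal_degrees"] = to_double(cleaned["longitude_decimal_degrees"])
    cleaned["country"] = COUNTRY_FIXES.get(cleaned["country"], cleaned["country"])
    return cleaned


# Invented raw row (NOT real data); every value is a string, as in the bronze table.
raw_row = {
    "icao": "XXXX", "iata": "XXX", "airport": "EXAMPLE AIRPORT",
    "city": "EXAMPLE CITY", "country": "ENGALND", "altitude": "00100",
    "latitude_decimal_degrees": "10.500", "longitude_decimal_degrees": "-20.250",
}
silver_row = clean_row(raw_row)
print(silver_row)
```

Keeping the corrections in a dictionary means further misspellings could be added in one place if more are found in the data.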

## CHECK SILVER TABLE
THIS CHECKS THE SILVER TABLE BEFORE PUSHING IT TO THE GOLD LEVEL TABLE BY PERFORMING UNIT TESTS. THIS ENSURES BAD DATA WILL BE KEPT OUT OF PRODUCTION AND ANY DATA VISUALIZATIONS.

In [0]:
airport_silver = spark.sql("SELECT * FROM tabular.dataexpert.josephgabbrielle62095_airport_geocode_silver")

# CHECK THAT EVERY COLUMN IS THERE
airport_columns = ["icao", "iata", "airport", "city", "country", "latitude_decimal_degrees", "longitude_decimal_degrees"]

for i in airport_columns:
    if i in airport_silver.columns:
        print(f"Column '{i}' exists in DataFrame")
    else:
        raise ValueError(f"Missing column: {i}")

# CHECK THE DATA ISN'T EMPTY
if airport_silver.count() > 0:
    print("Data found")
else:
    raise ValueError("There is no data!")

# CHECK FOR NULL DATA (IATA IS NOT IN THIS LIST, SO NULL IATA CODES ARE ALLOWED)
columns_to_check = ["icao", "airport", "city", "country", "latitude_decimal_degrees", "longitude_decimal_degrees"]

# LOOP THROUGH THE COLUMNS TO SEE WHICH ONE HAS NULL DATA
for col_name in columns_to_check:
    if airport_silver.filter(col(col_name).isNull()).limit(1).count() > 0:
        raise ValueError(f"There is a null in the {col_name} column!")

print("No nulls found in the dataset")

display(airport_silver)
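
The three checks above (expected columns, non-empty data, no nulls in required columns) can be collected into one function so they are easy to exercise on small samples. The sketch below works on a list of plain dicts rather than a Spark DataFrame; `validate_rows` and the sample rows are hypothetical and only illustrate the logic.

```python
EXPECTED_COLUMNS = [
    "icao", "iata", "airport", "city", "country",
    "latitude_decimal_degrees", "longitude_decimal_degrees",
]

# Same list as columns_to_check in the notebook: iata is allowed to be null.
REQUIRED_NOT_NULL = [
    "icao", "airport", "city", "country",
    "latitude_decimal_degrees", "longitude_decimal_degrees",
]


def validate_rows(rows):
    """Raise ValueError on the first failed check; return the row count otherwise."""
    # The emptiness check runs first here because a list of dicts
    # has no column names to inspect when it is empty.
    if not rows:
        raise ValueError("There is no data!")
    present = set(rows[0].keys())
    for name in EXPECTED_COLUMNS:
        if name not in present:
            raise ValueError(f"Missing column: {name}")
    for name in REQUIRED_NOT_NULL:
        if any(row[name] is None for row in rows):
            raise ValueError(f"There is a null in the {name} column!")
    return len(rows)


# Invented sample rows (NOT real data).
good_rows = [{
    "icao": "XXXX", "iata": None, "airport": "EXAMPLE AIRPORT",
    "city": "EXAMPLE CITY", "country": "EXAMPLELAND",
    "latitude_decimal_degrees": 10.5, "longitude_decimal_degrees": -20.25,
}]
bad_rows = [dict(good_rows[0], city=None)]

print(validate_rows(good_rows))
try:
    validate_rows(bad_rows)
except ValueError as error:
    print(error)
```

A single row passes the emptiness check here, which matches a `count() > 0` condition on the DataFrame.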

## PUSH TO GOLD TABLE
THIS CREATES THE GOLD TABLE. BECAUSE THE UNIT TESTS ABOVE RAISE AN ERROR ON BAD DATA, THE GOLD TABLE SHOULD ONLY INCLUDE READY-TO-USE DATA.

In [0]:
# WRITE THE CHECKED SILVER DATA TO THE GOLD TABLE
airport_silver.write.mode("overwrite").saveAsTable("tabular.dataexpert.josephgabbrielle62095_airport_geocode_gold")