
## Goal

This notebook ingests GeoPackage data from an Azure storage account, an AWS S3 bucket, or a Unity Catalog volume and loads it into Unity Catalog tables using Apache Sedona readers. Databricks notebook widgets parameterize the cloud provider and the dataset location.
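The parameterization described above could look roughly like the sketch below. It is an illustration only, not this notebook's actual code: the function `resolve_dataset_path` and the provider labels `azure`, `aws`, and `volume` are hypothetical. In a Databricks notebook the two arguments would typically come from widgets via `dbutils.widgets.get(...)`; here they are plain arguments so the logic runs anywhere. Only the URI schemes are standard (`abfss://` for ADLS Gen2, `s3://` for S3, `/Volumes/...` for Unity Catalog volumes).

```python
def resolve_dataset_path(cloud_provider: str, dataset_location: str) -> str:
    """Map a provider choice and a location to a path Spark can read.

    Hypothetical helper: the provider labels are assumptions, not values
    taken from this notebook.
    """
    provider = cloud_provider.strip().lower()
    # Drop leading slashes so the prefix and location join cleanly.
    location = dataset_location.strip().lstrip("/")
    if provider == "azure":
        # Expected form: container@account.dfs.core.windows.net/path
        return f"abfss://{location}"
    if provider == "aws":
        # Expected form: bucket/path
        return f"s3://{location}"
    if provider == "volume":
        # Expected form: catalog/schema/volume/path
        return f"/Volumes/{location}"
    raise ValueError(f"Unsupported cloud provider: {cloud_provider!r}")


# Example with made-up values; in Databricks these would be widget values.
print(resolve_dataset_path("aws", "my-bucket/data/roads.gpkg"))
```

Raising on an unknown provider makes a mistyped widget value fail early rather than producing a path that does not exist.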

In [0]:
%run ../get_user

In [0]:
# Getting the current user
user_email = spark.sql("SELECT current_user()").collect()[0][0]
username = get_username_from_email(user_email)
print(username)

In [0]:
catalog_name = "geospatial"

In [0]:
# Convert wkb_geometry to a geometry with ST_GeomFromEWKB, set its SRID to 27700 (British National Grid), and overwrite the table in Unity Catalog with the Databricks (geobrix) geometry type.
from pyspark.sql.functions import expr
from pyspark.databricks.sql import functions as dbf

schema_tables = {
    "lookups_geobrix": ["boundary_line_ceremonial_counties"],
    "greenspaces_geobrix": ["greenspace_site", "access_point"],
    "networks_geobrix": ["road_link", "road_node"]
}

for schema, tables in schema_tables.items():
    for table_name in tables:
        # Build the fully qualified name once; the same table is read and then overwritten.
        full_name = f"{catalog_name}.{schema}.{table_name}_{username}"
        df = spark.read.table(full_name).withColumn(
            "geometry",
            expr("ST_SetSrid(ST_GeomFromEWKB(wkb_geometry), 27700)"),
        )
        df.write.mode("overwrite").option("overwriteSchema", "true").saveAsTable(full_name)
        print(f"Table {full_name} is created, yay!")
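Because the loop above overwrites each table in place, it can help to list the fully qualified table names as a dry run before writing anything. The sketch below is a minimal illustration that reuses the `schema_tables` mapping; the helper `qualified_table_names` and the username `jane_doe` are hypothetical, and no Spark session is needed.

```python
from typing import Dict, List


def qualified_table_names(
    catalog_name: str,
    schema_tables: Dict[str, List[str]],
    username: str,
) -> List[str]:
    """Return catalog.schema.table_username for every table in the mapping."""
    return [
        f"{catalog_name}.{schema}.{table_name}_{username}"
        for schema, tables in schema_tables.items()
        for table_name in tables
    ]


schema_tables = {
    "lookups_geobrix": ["boundary_line_ceremonial_counties"],
    "greenspaces_geobrix": ["greenspace_site", "access_point"],
    "networks_geobrix": ["road_link", "road_node"],
}

# "jane_doe" is a placeholder; the notebook derives the real value from current_user().
for name in qualified_table_names("geospatial", schema_tables, "jane_doe"):
    print(name)
```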