In [None]:
# Fetch parameters passed in by Data Factory
schemaName = dbutils.widgets.get("schemaName")
tableName = dbutils.widgets.get("tableName")
filePath = dbutils.widgets.get("filePath")

# Create the bronze schema if it does not exist yet
spark.sql(f'CREATE SCHEMA IF NOT EXISTS bronze_{schemaName}')

# Drop the existing table definition (for an external table this removes only the metadata, not the files)
spark.sql(f'DROP TABLE IF EXISTS bronze_{schemaName}.{tableName}')

# Create new external table using latest datetime location
ddl_query = f"""
  CREATE TABLE bronze_{schemaName}.{tableName}
  USING PARQUET
  LOCATION '/mnt/bronze/{schemaName}/{filePath}/{tableName}.parquet'
"""

# Execute query
spark.sql(ddl_query)
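
Because the three values arrive as free-text widget parameters and are interpolated straight into SQL, it can be worth validating them before building the statement. The following is a minimal sketch of that idea, not part of the original notebook: the helper name `build_bronze_ddl` and the two allow-list patterns are assumptions chosen for illustration, and the set of permitted characters should be adjusted to your own naming and folder conventions.

```python
import re

# Hypothetical allow-lists (assumptions, not from the original notebook).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_PATH_SEGMENT = re.compile(r"[A-Za-z0-9_\-/=:.]+")


def build_bronze_ddl(schema_name: str, table_name: str, file_path: str) -> str:
    """Return the CREATE TABLE statement for an external Parquet table.

    Raises ValueError if any parameter contains characters outside the
    allow-lists above, so quotes or semicolons never reach the SQL text.
    """
    for name in (schema_name, table_name):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"Unsafe identifier: {name!r}")
    if not _PATH_SEGMENT.fullmatch(file_path):
        raise ValueError(f"Unsafe path segment: {file_path!r}")

    # Build the location as one string so no line break ends up inside the path.
    location = f"/mnt/bronze/{schema_name}/{file_path}/{table_name}.parquet"
    return (
        f"CREATE TABLE bronze_{schema_name}.{table_name} "
        f"USING PARQUET "
        f"LOCATION '{location}'"
    )
```

In the notebook, the result would be passed to `spark.sql(...)` in place of the inline f-string, for example `spark.sql(build_bronze_ddl(schemaName, tableName, filePath))`.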

External tables are simple to set up and avoid duplicating data, because the table only points at files that already exist in the bronze layer. However, keep the following trade-offs in mind:


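- Reading from external Parquet tables is generally slower than reading from Delta tables. Delta keeps file-level statistics in its transaction log, so a query can often skip files that cannot match its filter, whereas a plain Parquet table typically has to list and inspect the files themselves. Suboptimal partitioning of the source files, which affects how data is accessed and processed, widens the gap.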


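- Unlike managed Delta tables, external Parquet tables do not support advanced features such as schema enforcement (rejecting writes whose schema does not match the table) and time travel (querying earlier versions of the data), both of which make data management and analytics easier.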