#### Refresh Lakehouse Snapshot

##### Data ingestion strategy:
<mark style="background: #88D5FF;">**REPLACE**</mark>

##### Related pipeline:

**Ext_Load_PBI_Report_Usage_E2E**

##### Source:

**Files** from FUAM_Ext_Lakehouse

##### Target:

**All Delta tables** in FUAM_Ext_Lakehouse




In [None]:
## Parameters
display_data = True
lakehouse_name = "FUAM_Ext_lakehouse"

print("Successfully configured all parameters for this run.")

In [None]:
from pyspark.sql import SparkSession # type: ignore

print("Successfully imported all packages for this notebook.")

In [None]:
#
# Create the Spark session
#
app_name = "RefreshLakehouseSnapshot"

# Reuse the notebook's active Spark session, or create one if none exists
spark = SparkSession.builder \
    .appName(app_name) \
    .getOrCreate()

# Dynamic overwrite: overwrite writes replace only the partitions present in the new data
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

print(f"Spark session {app_name} is ready.")

In [None]:
#
# List tables in the specified lakehouse (assuming the lakehouse name is the database name)
#
spark.catalog.setCurrentDatabase(lakehouse_name)

# Get the list of tables as a DataFrame
tables_df = spark.sql("SHOW TABLES")

# Extract the persistent table names into a Python list (temporary views cannot be optimized)
table_names = [row.tableName for row in tables_df.collect() if not row.isTemporary]

print(f"The list of tables from lakehouse {lakehouse_name} has been created successfully.")

In [None]:
if display_data:
    display(tables_df)

In [None]:
#
# When Power BI imports data from a Fabric Lakehouse through the SQL analytics endpoint, it can read
# a snapshot of a Delta table that has not yet caught up with the latest physical data update.
# This is most likely when:
#   - You write to the Lakehouse from notebooks or pipelines.
#   - Updates are made via overwrites or non-transactional, file-level operations.
#   - Power BI's import query reads a Delta table snapshot before the _delta_log has been fully committed or compacted.
#
# ✅ Recommendation
#   1. Run OPTIMIZE after the Lakehouse update step to compact small files and commit a new, clean table version.
#      (OPTIMIZE may not write a new version when there is nothing to compact.)
#
for table in table_names:
    print(f"Optimizing table {table} ...")
    # Backtick-quote the name so identifiers with special characters parse correctly
    spark.sql(f"OPTIMIZE `{table}`")

print(f"\nAll {len(table_names)} tables in {lakehouse_name} have been optimized successfully.")
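
The OPTIMIZE loop above stops at the first table that fails and does not confirm that a new Delta version was actually committed. The sketch below is one way to address both, under stated assumptions: `quote_identifier`, `optimize_tables` and `latest_version` are hypothetical helper names, not part of the original notebook. The SQL callable is injected (in the notebook it would be `spark.sql`), and the version lookup is assumed to use the Delta Lake `DESCRIBE HISTORY ... LIMIT 1` command. The demo at the bottom uses fake callables and made-up table names so it runs without Spark.

```python
def quote_identifier(name):
    """Backtick-quote a Spark SQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def optimize_tables(table_names, run_sql, latest_version):
    """Run OPTIMIZE on each table, collecting per-table outcomes instead of stopping.

    run_sql:        callable that executes a SQL string (in the notebook: spark.sql).
    latest_version: callable returning a table's latest Delta version number.
    Returns (advanced, failures):
      advanced maps table -> True if OPTIMIZE produced a newer version, else False;
      failures maps table -> error message for tables that raised.
    """
    advanced, failures = {}, {}
    for table in table_names:
        try:
            before = latest_version(table)
            run_sql(f"OPTIMIZE {quote_identifier(table)}")
            advanced[table] = latest_version(table) > before
        except Exception as exc:  # record the error and continue with the next table
            failures[table] = str(exc)
    return advanced, failures


# Assumed notebook wiring (not run here):
# def latest_version(table):
#     row = spark.sql(f"DESCRIBE HISTORY {quote_identifier(table)} LIMIT 1").collect()[0]
#     return row["version"]
# advanced, failures = optimize_tables(table_names, spark.sql, latest_version)

# Self-contained demo with fake callables and hypothetical table names.
versions = {"report_usage": 3, "already_compact": 7}


def fake_sql(statement):
    if "broken" in statement:
        raise RuntimeError("table not found")
    if "report_usage" in statement:
        versions["report_usage"] += 1  # pretend compaction committed a new version


def fake_version(table):
    return versions.get(table, 0)


advanced, failures = optimize_tables(
    ["report_usage", "already_compact", "broken"], fake_sql, fake_version
)
print(advanced)   # {'report_usage': True, 'already_compact': False}
print(failures)   # {'broken': 'table not found'}
```

A `False` entry in `advanced` flags a table where OPTIMIZE apparently had nothing to compact, so no newer snapshot was produced for the SQL analytics endpoint to pick up.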

In [None]:
#
# Stop the Spark session
# NOTE: frees up limited F2 SKU capacity resources
#
spark.stop()

print("Spark session has been stopped successfully.")
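
Because a notebook run typically halts at the first failing cell, an error in the OPTIMIZE cell would skip `spark.stop()` and leave the session holding F2 capacity until it times out. One option is to run the work and the stop in a single cell with `try`/`finally`. The sketch below shows the pattern with a hypothetical `FakeSession` stand-in and a hypothetical `run_then_stop` helper so it runs without Spark; in the notebook, `spark` would take the place of the fake session.

```python
class FakeSession:
    """Hypothetical stand-in for a SparkSession, so the pattern runs without Spark."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


def run_then_stop(session, work):
    """Run work(session) and always call session.stop(), even if work raises."""
    try:
        return work(session)
    finally:
        session.stop()


# Usage: the work fails, but the session is still stopped.
session = FakeSession()
try:
    run_then_stop(session, lambda s: 1 / 0)
except ZeroDivisionError:
    print("work failed")
print(session.stopped)  # True
```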