<center>

# **Initialize Metadata Table**

</center>

# **Purpose**
#### This notebook initializes the metadata tracking table for the AquaQuiver pipeline. The metadata table stores execution information for each run, including:
- Notebook run timestamps
- Row counts for each table processed
- Success/failure status
- Error messages (if any)
- Execution duration
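
#### As an illustration, the fields listed above could map to a record like the one sketched below. The column names are taken from the query cell at the end of this notebook, except `error_message`, which is an assumed name. The status values and all example values are hypothetical, and the real schema is defined in `AquaQuiver_functions`.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class MetadataRecord:
    """One row of the metadata table (illustrative sketch only)."""
    run_timestamp: datetime               # when the notebook ran
    notebook_name: str
    table_name: str
    layer: str
    row_count: int
    status: str                           # assumed values: "success" / "failure"
    duration_seconds: float
    error_message: Optional[str] = None   # assumed column name; empty on success


example = MetadataRecord(
    run_timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    notebook_name="load_customers",       # hypothetical notebook
    table_name="customers",               # hypothetical table
    layer="bronze",                       # hypothetical layer name
    row_count=1250,
    status="success",
    duration_seconds=4.2,
)
print(asdict(example))
```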

#### Run this notebook once, before any of your data pipelines, so that the metadata table exists when they start logging.

# **Install Dependencies**

In [None]:
!pip install --quiet deltalake==0.18.2

# **Load AquaQuiver Functions**

In [None]:
%run AquaQuiver_functions

# **Initialize Metadata Table**
#### This cell creates the metadata table if it does not exist yet. If the table already exists, creation is skipped and existing records are left untouched.

In [None]:
# Initialize the metadata table
initialize_metadata_table(database="Base", delta_table="pipeline_metadata")

print("\nMetadata table initialization complete.")
print("The metadata table 'Base.pipeline_metadata' is ready to track pipeline executions.")

# **Query Metadata (Optional)**
#### You can use the following code to query existing metadata records:

In [None]:
from deltalake import DeltaTable
import pyarrow.compute as pc

# Read metadata table
lakehouse_properties = notebookutils.lakehouse.get("Base")
abfss_path = lakehouse_properties["properties"]["abfsPath"]
delta_table_path = f"{abfss_path}/Tables/pipeline_metadata"

aad_token = notebookutils.credentials.getToken('storage')
storage_options = {"bearer_token": aad_token, "use_fabric_endpoint": "true"}

try:
    dt = DeltaTable(delta_table_path, storage_options=storage_options)
    metadata_df = dt.to_pyarrow_table()
    
    if metadata_df.num_rows > 0:
        print(f"Total metadata records: {metadata_df.num_rows}")
        print("\nLast 5 pipeline runs:")
        # Sort by timestamp (newest first) and keep the 5 most recent records
        sorted_indices = pc.sort_indices(metadata_df, sort_keys=[("run_timestamp", "descending")])
        last_5 = metadata_df.take(sorted_indices[:5])
        
        # Display as pandas for better formatting
        import pandas as pd
        display(last_5.to_pandas()[['run_timestamp', 'notebook_name', 'table_name', 'layer', 'row_count', 'status', 'duration_seconds']])
    else:
        print("No metadata records found yet. Run your pipelines to start collecting metadata.")
except Exception as e:
    print(f"Could not read metadata table: {e}")