# Intro

This notebook reads all bronze tables from Mythic via the uploader folder, cleans them, and writes them to silver.


## Change History

<style>
  table {margin-left: 0 !important;}
</style>

| Date | Author | Description |
| :--- | :--- | :--- |
| 2025-03-14 | Mclain R | Created Date |

# Code

## Imports

###### notebookutils
- **mssparkutils**: A utility module in Microsoft Fabric that provides functions for handling file operations, secrets, and other notebook-related tasks within the Spark environment.

###### pyspark.sql.functions
- **col**: A function used to reference a DataFrame column in PySpark expressions, typically for transformations or filtering.
- **F**: A common alias for importing PySpark SQL functions, allowing access to various built-in functions (e.g., F.lit(), F.when(), etc.) for DataFrame transformations.

###### python
- **re**: The regular expressions module used for pattern matching, text parsing, and string manipulation in Python.
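This section imports `re` but does not show it in use. As an illustration only, here is a minimal sketch of the kind of column-name clean-up that `re` is commonly used for before writing to silver. `clean_column_name` is a hypothetical helper, not something defined in this notebook, and both the naming rules and the sample column names are assumptions.

```python
import re


def clean_column_name(name: str) -> str:
    """Hypothetical helper: normalise a raw column name (assumed rules)."""
    # Replace every run of non-alphanumeric characters with one underscore
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    # Drop leading/trailing underscores and lower-case the result
    return cleaned.strip("_").lower()


# Sample names are made up for illustration
raw_columns = ["Order ID", "Customer-Name ", "amount(USD)"]
cleaned_columns = [clean_column_name(c) for c in raw_columns]
print(cleaned_columns)
```

On a Spark DataFrame, names cleaned this way could be applied in one step with `df.toDF(*cleaned_columns)`.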

In [2]:
from notebookutils import mssparkutils
from pyspark.sql.functions import col
import re
from pyspark.sql import functions as F

## Define Parameters
- none

Note: the following is a parameter cell and will be interpreted by Pipelines as such.

## Reused Functions
- none

## Define Fields

- **workspace_id**: ID of the current Fabric workspace
- **workspace_name**: name of the current workspace, resolved from `workspace_id`


In [3]:
import sempy
import sempy.fabric as fabric

# Get the current workspace ID
workspace_id = fabric.get_workspace_id()
print(f"Workspace ID: {workspace_id}")

# Get the workspace name from the workspace ID
workspace_name = fabric.resolve_workspace_name(workspace_id)
print(f"Workspace Name: {workspace_name}")

Workspace ID: fef507b4-c0af-40cc-9309-b183e59c0547
Workspace Name: Mythic


## Process Data

In [4]:
# Base path containing the folders
base_path = f"abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/bronze_lakehouse.Lakehouse/Tables"

# List all items in the base path
items = mssparkutils.fs.ls(base_path)

# Get paths of the additional-fields tables (folders whose name starts with 'nucleus__sa_usq')
table_paths = [item.path for item in items if item.isDir and item.name.startswith('nucleus__sa_usq')]

# Process each table
for table_path in table_paths:
    table_name = table_path.split('/')[-1]

    try:
        # Read Delta Table from the folder
        df = spark.read.format("delta").load(table_path)
        display(f"Processing: {table_name}")

        display(df.head(10))

        # Write to Delta table
        delta_table_path = f"abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/silver_lakehouse.Lakehouse/Tables/{table_name}"
        df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save(delta_table_path)

    except Exception as e:
        # A read or write failure is reported and the loop moves on to the next table
        display(f"Skipping {table_name}: {e}")

'Processing: nucleus__sa_usq'

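The path handling in the cell above can be factored into small helpers that are testable without a Spark session. The following is a sketch under stated assumptions, not the notebook's actual implementation: `lakehouse_tables_path` and `table_name_from_path` are hypothetical names, and the `listed` paths stand in for what `mssparkutils.fs.ls` would return.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def lakehouse_tables_path(workspace_name: str, lakehouse_name: str) -> str:
    """Hypothetical helper: build the Tables root of a lakehouse in OneLake."""
    return f"abfss://{workspace_name}@{ONELAKE_HOST}/{lakehouse_name}.Lakehouse/Tables"


def table_name_from_path(table_path: str) -> str:
    """Hypothetical helper: last path segment, tolerating a trailing slash."""
    return table_path.rstrip("/").split("/")[-1]


bronze = lakehouse_tables_path("Mythic", "bronze_lakehouse")
silver = lakehouse_tables_path("Mythic", "silver_lakehouse")

# Stand-in for the directory listing (assumed values, for illustration only)
listed = [f"{bronze}/nucleus__sa_usq", f"{bronze}/other_table/"]

# Keep only tables with the expected prefix and map each to its silver target
prefix = "nucleus__sa_usq"
selected = [p for p in listed if table_name_from_path(p).startswith(prefix)]
targets = {
    table_name_from_path(p): f"{silver}/{table_name_from_path(p)}"
    for p in selected
}
print(targets)
```

The `rstrip("/")` is a defensive choice: a plain `split('/')[-1]` yields an empty table name if a listed path ends with a slash.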