# Intro

This notebook reads the additional account fields bronze tables loaded into the Mythic workspace via the uploader folder, drops unused columns, and writes the cleaned tables to silver.


## Change History

<style>
  table {margin-left: 0 !important;}
</style>

| Date | Author | Description |
| :--- | :--- | :--- |
| 2025-02-12 | Mclain R | Created |

# Code

## Imports

###### notebookutils
- **mssparkutils**: A utility module in Microsoft Fabric that provides functions for handling file operations, secrets, and other notebook-related tasks within the Spark environment.

###### pyspark.sql.functions
- **col**: A function used to reference a DataFrame column in PySpark expressions, typically for transformations or filtering.
- **F**: The conventional alias for the `pyspark.sql.functions` module, giving access to its built-in functions (e.g., `F.lit()`, `F.when()`) for DataFrame transformations.

###### python
- **re**: The regular expressions module used for pattern matching, text parsing, and string manipulation in Python.

In [1]:
from notebookutils import mssparkutils
from pyspark.sql.functions import col
import re
from pyspark.sql import functions as F


## Define Parameters
- none

Note: the following is a parameter cell and will be interpreted by Pipelines as such.

## Reused Functions
- none

## Define Fields

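- **workspace_id**: ID of the current workspace, returned by `fabric.get_workspace_id()`
- **workspace_name**: name of the current workspace, resolved from `workspace_id`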


In [2]:
import sempy
import sempy.fabric as fabric

# Get the current workspace ID
workspace_id = fabric.get_workspace_id()
print(f"Workspace ID: {workspace_id}")

# Get the workspace name from the workspace ID
workspace_name = fabric.resolve_workspace_name(workspace_id)
print(f"Workspace Name: {workspace_name}")


Workspace ID: fef507b4-c0af-40cc-9309-b183e59c0547
Workspace Name: Mythic
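
The workspace name resolved above is used later to build OneLake `abfss://` paths for the bronze and silver lakehouses. As a minimal sketch of that path layout, the helper below assembles the same URIs in plain Python. The function name `onelake_table_path` and the constant `ONELAKE_HOST` are hypothetical and not part of the notebook; the host and folder layout are taken from the paths used in the Process Data cell.

```python
# Host used by the notebook's abfss paths.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_table_path(workspace_name: str, lakehouse: str, table_name: str = "") -> str:
    """Build the abfss URI of a lakehouse's Tables folder, or of one table inside it.

    Hypothetical helper: it only formats a string and does not touch OneLake.
    """
    base = f"abfss://{workspace_name}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/Tables"
    return f"{base}/{table_name}" if table_name else base


# Paths equivalent to the ones the notebook builds inline.
bronze_tables = onelake_table_path("Mythic", "bronze_lakehouse")
silver_table = onelake_table_path(
    "Mythic", "silver_lakehouse", "nucleus__icrm_additional_account_fields"
)
print(bronze_tables)
print(silver_table)
```

Keeping the path format in one place would mean a change to the lakehouse names or layout only needs to be made once.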


## Process Data

In [3]:
# Base path containing the folders
base_path = f"abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/bronze_lakehouse.Lakehouse/Tables"

# List all items in the base path
items = mssparkutils.fs.ls(base_path)

# Get table paths for the additional account fields tables
table_prefixes = (
    "nucleus__icrm_additional_account_fields",
    "nucleus__dcrm_additional_account_fields",
)
table_paths = [
    item.path for item in items
    if item.isDir and item.name.startswith(table_prefixes)
]

# Define columns to drop (easier to update later)
columns_to_drop = {
    "nucleus__icrm_additional_account_fields": [
        "bac_code", "parent_bac", "parent_account", "pdn", "account_dba_name",
        "account_id", "dealer_group_code", "dealer_group_name", "dealer_status",
        "garage_package", "garage_current_provider", "f_i_relationship",
        "f_i_dir_of_sales", "region_description"
    ],
    "nucleus__dcrm_additional_account_fields": [  
        "oe_id", "account_name", "account_id", "parent_account", "group_id",
        "dealer_status", "group_name"
    ]
}

# Process each table
for table_path in table_paths:
    # Strip any trailing slash so the last path segment is the table name
    table_name = table_path.rstrip('/').split('/')[-1]

    try:
        # Read Delta Table from the folder
        df = spark.read.format("delta").load(table_path)
        display(f"Processing: {table_name}")

        # Drop unused columns if defined for this table
        if table_name in columns_to_drop:
            df = df.drop(*columns_to_drop[table_name])

        display(df.head(10))

        # Write to Delta table
        delta_table_path = f"abfss://{workspace_name}@onelake.dfs.fabric.microsoft.com/silver_lakehouse.Lakehouse/Tables/{table_name}"
        df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save(delta_table_path)

    except Exception as e:
        display(f"Skipping {table_name}: {str(e)}")



'Processing: nucleus__dcrm_additional_account_fields'


'Processing: nucleus__icrm_additional_account_fields'

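
One detail of the loop above is worth illustrating: tables are selected by name prefix, but their drop lists are looked up by exact name. If a bronze folder ever carried a suffix, it would be selected and written to silver with all of its columns. The sketch below shows, in plain Python without Spark, one way to resolve the drop list by prefix instead. The helper names, the suffixed folder name, and the sample column list are hypothetical, and the drop lists are abbreviated from `columns_to_drop`.

```python
from typing import Dict, Iterable, List

# Drop lists keyed by table-name prefix (abbreviated from the notebook's columns_to_drop).
DROP_MAP: Dict[str, List[str]] = {
    "nucleus__icrm_additional_account_fields": ["bac_code", "parent_bac", "pdn"],
    "nucleus__dcrm_additional_account_fields": ["oe_id", "account_name", "group_id"],
}


def select_tables(names: Iterable[str], prefixes: Iterable[str]) -> List[str]:
    """Keep the names that start with any of the given prefixes."""
    prefix_tuple = tuple(prefixes)
    return [name for name in names if name.startswith(prefix_tuple)]


def resolve_drop_list(table_name: str, drop_map: Dict[str, List[str]]) -> List[str]:
    """Return the drop list whose key is a prefix of table_name, or [] if none matches."""
    for prefix, columns in drop_map.items():
        if table_name.startswith(prefix):
            return columns
    return []


def remaining_columns(
    table_name: str, columns: Iterable[str], drop_map: Dict[str, List[str]]
) -> List[str]:
    """Return the columns that survive the drop list, preserving their order."""
    to_drop = set(resolve_drop_list(table_name, drop_map))
    return [column for column in columns if column not in to_drop]


# Hypothetical folder names; only the first matches a DROP_MAP key exactly.
folders = [
    "nucleus__dcrm_additional_account_fields",
    "nucleus__icrm_additional_account_fields_v2",
    "nucleus__accounts",
]
selected = select_tables(folders, DROP_MAP)
print(selected)

# Hypothetical column list for the suffixed table.
kept = remaining_columns(selected[1], ["bac_code", "pdn", "custom_flag"], DROP_MAP)
print(kept)
```

In the notebook itself, the equivalent change would be to pass the resolved list to `df.drop(*...)` in place of the exact-name lookup. Whether that is needed depends on whether suffixed folders can appear in bronze.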