# Refreshing and Loading a table from a Power BI Semantic Model into a Fabric Lakehouse

##### Been having a ton of fun learning Fabric and playing with PySpark. This notebook will:
##### - trigger a PBI semantic model refresh.
##### - wait until the refresh has completed.
##### - load a table from the newly refreshed semantic model into a lakehouse table. 

##### This allows you to schedule a Fabric notebook to refresh:
#####   - a PBI calculated table.
#####   - a bunch of PBI reports across multiple workspaces, and then run code once they have all finished.

### Step 1: Import Modules
###### Once you import a module in Python, you can use the functions and variables it defines within your code.

Install the SemPy Python library in the notebook kernel and import the SemPy fabric module as `fabric`. We will be using this package to interact with the Power BI semantic model we want to load data from.

In [None]:
%pip install semantic-link
import sempy.fabric as fabric

Import the functions module from the PySpark SQL package. The pyspark.sql package provides functions for working with structured data in Spark.

In [None]:
import pyspark.sql.functions as f

Import the time package; we will be using the time.sleep() function to pause the notebook's execution while we wait for the Power BI table we want to load to finish refreshing.

In [None]:
import time

Import the re package; we will use a regular expression to make sure the Power BI table's column names comply with the Fabric Lakehouse's column name requirements.

In [None]:
import re

### Step 2: Setting Variables
###### We are setting variables with the:
- Name of the workspace that contains the Power BI semantic model we want data from.
- Name of the semantic model that contains the table we want data from.
- Name of the Power BI table within the semantic model we want data from.
- Name of the table within the Fabric Lakehouse we want to create or overwrite.

In [None]:
#Setting the workspace name
workspace = 'Sneaker Workspace'
#Setting the semantic model name
semantic_model = 'Nike Sneaker Calendar'
#Setting the PBI Table name
pbi_source_table = 'Launches'
#Setting the Lakehouse Table name
lakehouse_destination_table = 'launches_export'

### Step 3: Refreshing the Power BI Table
###### Before we load data into the Fabric Lakehouse we need to make sure it is refreshed.
We are going to request a refresh of the Power BI table we want to load into the Fabric Lakehouse and then use a while loop to check whether the refresh has finished. We will break out of the loop if:
- Our requested refresh has a status of 'Completed'
- Our requested refresh has a status of 'Failed'
- The loop has waited for 120% of the average refresh time of the semantic model (six checks, each followed by a sleep of 20% of the average)

If we are going to schedule this notebook we want to avoid a never-ending loop, and the above conditions will help us with that.

We use the code below to determine the average refresh duration of the model. Subsequently, we divide that number by 5 to calculate 20% of the average refresh duration.

In [None]:
#using the fabric API to get a pandas dataframe of all the past refreshes of the semantic model
refresh_history = fabric.list_refresh_requests(workspace=workspace, dataset=semantic_model)
#Converting the pandas dataframe into a spark dataframe
refresh_history_spark = spark.createDataFrame(refresh_history)
#adding a calculated column to represent the refresh duration of each run
refresh_history_spark = refresh_history_spark.withColumn( \
        'Refresh Duration', \
        (f.col('End Time').cast('long') - f.col('Start Time').cast('long'))
    )
#getting the average duration using the newly created calculated column
refresh_average_duration = refresh_history_spark.agg({'Refresh Duration': 'avg'}).first()[0]
#Calculating 20% of the average refresh time by dividing by five.
time_to_sleep = refresh_average_duration / 5
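To make the arithmetic behind the timeout concrete, here is a small plain-Python sketch of the same calculation. The timestamps are made up for illustration and are not real refresh history. The row without an end time is skipped, which mirrors how Spark's `avg` ignores nulls.

```python
from datetime import datetime

# Made-up refresh history, for illustration only (not real data).
sample_history = [
    {"start": datetime(2024, 1, 1, 6, 0), "end": datetime(2024, 1, 1, 6, 4)},  # 240 s
    {"start": datetime(2024, 1, 2, 6, 0), "end": datetime(2024, 1, 2, 6, 6)},  # 360 s
    {"start": datetime(2024, 1, 3, 6, 0), "end": None},  # no end time yet, skipped
]

# Duration in seconds of every refresh that has an end time.
durations = [
    (row["end"] - row["start"]).total_seconds()
    for row in sample_history
    if row["end"] is not None
]

average_duration = sum(durations) / len(durations)  # 300.0 seconds
time_to_sleep = average_duration / 5                # 20% of the average: 60.0 seconds
max_checks = 6                                      # same limit as the while loop
max_wait = time_to_sleep * max_checks               # 360.0 seconds, i.e. 120% of the average

print(average_duration, time_to_sleep, max_wait)
```

With an average refresh of five minutes, the notebook would check once a minute and give up after roughly six minutes.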

We will now put the table we want to refresh into a list containing one dictionary and then submit it for a refresh.

In [None]:
#Create a list holding one dictionary for the specific table we want to refresh
table_dictionary = [{'table': pbi_source_table}]
#Use the fabric API to request a refresh of the table using the new list.
#This will return a refresh request ID which we are storing in a variable.
refresh_request_id = fabric.refresh_dataset(workspace=workspace, \
                                            dataset=semantic_model, \
                                            objects=table_dictionary)

We will now use a loop to check whether the refresh of the table has completed.

In [None]:
#setting a counter to ensure the loop does not run more than 6 times
loop_count = 0
#creating a variable outside of the loop to store the refresh status in
refresh_status = ''
#storing the refresh statuses that will break the loop 
exit_statuses = ['Completed','Failed']
#Creating a while loop
while loop_count < 6:
    #Gets the current status of the power bi refresh
    refresh_status = fabric.get_refresh_execution_details(workspace=workspace,dataset=semantic_model,refresh_request_id=refresh_request_id).status
    #if the refresh status is within the exit status array exiting the loop
    if refresh_status in exit_statuses:
        break
    #Sleeping the code for 20% of the average refresh time
    time.sleep(time_to_sleep)
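    #adding 1 to the loop count so the loop will not run endlessly
    #(Python has no ++ operator; ++loop_count is just two unary pluses and leaves the value unchanged)
    loop_count += 1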
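If you want to try the polling logic without triggering a real refresh, one option is to wrap it in a small function that takes the status lookup as a parameter. This is only a sketch: `wait_for_refresh`, `get_status`, and the `sleep` parameter are names made up for this example, and the status sequence is a stand-in for what the API would return.

```python
import time

def wait_for_refresh(get_status, sleep_seconds, max_checks=6,
                     exit_statuses=("Completed", "Failed"), sleep=time.sleep):
    """Poll get_status() until it returns an exit status or max_checks is reached.

    Returns the last status seen. get_status is any zero-argument callable;
    sleep is injectable so the loop can be tested without actually waiting.
    """
    status = ""
    for _ in range(max_checks):
        status = get_status()
        if status in exit_statuses:
            break
        sleep(sleep_seconds)
    return status

# Stand-in status sequence (hypothetical values, not real API output).
fake_statuses = iter(["Unknown", "Unknown", "Completed"])
naps = []  # records each requested sleep instead of pausing
result = wait_for_refresh(lambda: next(fake_statuses), sleep_seconds=60, sleep=naps.append)
print(result, naps)
```

In the notebook itself, `get_status` could be a lambda around the same `fabric.get_refresh_execution_details(...).status` call used in the loop above, with `sleep_seconds=time_to_sleep`. Using `for _ in range(max_checks)` also removes the manual counter, so there is nothing to forget to increment.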

If the table has refreshed, the code below will load it into the Lakehouse.

In [None]:
#checking to see if the refresh has completed
if refresh_status == 'Completed':
    #if the refresh has completed reading the table into a dataframe
    table_fabric_df = fabric.read_table(workspace=workspace, dataset=semantic_model, table=pbi_source_table)
    #changing the table into a spark data frame
    table_spark_df = spark.createDataFrame(table_fabric_df)
    #removing things like spaces from the column names of the data frame
    table_spark_df = table_spark_df.select([f.col(column_name).alias(re.sub('[^0-9a-zA-Z$]+', '', column_name)) for column_name in table_spark_df.columns])
    #loading the table into the lakehouse 
    #note: I am using the overwrite write mode; you could use append instead
    table_spark_df.write.mode('overwrite').option('overwriteSchema', 'true').format('delta').saveAsTable(lakehouse_destination_table)
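One thing to watch with the column clean-up: stripping characters can turn two different column names into the same one (for example `Launch Date` and `LaunchDate`). The sketch below applies the same regular expression in plain Python and raises an error on a collision or an empty result. The helper names and the sample column names are hypothetical and exist only for this example.

```python
import re

def clean_column_name(name):
    # Same pattern as the notebook: drop everything except letters, digits and $.
    return re.sub('[^0-9a-zA-Z$]+', '', name)

def build_rename_map(columns):
    """Map each original column name to its cleaned name, failing on clashes."""
    rename_map = {}
    seen = {}  # cleaned name -> original name that produced it
    for original in columns:
        cleaned = clean_column_name(original)
        if cleaned == '':
            raise ValueError(f"Column {original!r} has no characters left after cleaning")
        if cleaned in seen:
            raise ValueError(
                f"Columns {seen[cleaned]!r} and {original!r} both clean to {cleaned!r}"
            )
        seen[cleaned] = original
        rename_map[original] = cleaned
    return rename_map

# Hypothetical column names, for illustration only.
example = build_rename_map(['Launch Date', 'Retail Price ($)', 'SKU'])
print(example)
```

Running a check like this over `table_spark_df.columns` before the `select` would surface a naming clash as a clear error rather than an ambiguous column later on.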