# <span style="color: blue;">**DF Market** - _A Fabric Playset_ </span>

## About the solution
- This notebook will deploy multiple related Fabric items that can be used to generate sample data of variable size for a fictional grocery store chain called "DF Market" (DF for Data Factory)
- This solution intentionally uses multiple Fabric items so that it may be useful both to demo/test Fabric workloads and/or to demo/test data analysis/visualization scenarios
- The resulting model has 4 dimension tables (Stores, Products, Date, and Time) and one fact table called Sales
- Seed values are used in the solution so that the same data will be generated each time for the same input values.
- The data has some useful patterns/fields in it:
    - one store in each city, each with a different sales volume that changes at a different rate over time
    - sales at each hour differ by day of the week
    - stores have open/close dates and products have launch/discontinue dates
    - cities have lat/long for mapping
    - sales data has 1+ products per transaction for distinct count scenarios
    - stores table has store manager email for RLS and/or report bursting scenarios
    - the Stores, Products, and Date tables are created last and are filtered to only include rows that exist in the Sales table
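
To illustrate the seeding idea in the list above, here is a minimal Python sketch. It is not the generator used by the Dataflows in this solution; the function name `sample_daily_sales` and its parameters are hypothetical, and it only demonstrates that a fixed seed reproduces the same pseudo-random values for the same inputs.

```python
import random

def sample_daily_sales(seed, store_id, days):
    """Return one pseudo-random transaction count per day for a store (illustrative only)."""
    # Derive a per-store generator from the seed so each store gets its own repeatable stream.
    rng = random.Random(seed * 1_000_003 + store_id)
    return [rng.randint(50, 500) for _ in range(days)]

first_run = sample_daily_sales(seed=42, store_id=7, days=5)
second_run = sample_daily_sales(seed=42, store_id=7, days=5)

# Same seed and same inputs give identical data on every run.
print(first_run == second_run)
```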

## How to Deploy
- Hit "Run All" above or run each cell ***in order*** 
- Once all items are deployed
    - Open each of the two Dataflows (append sales and replace dims), click "Manage Connections", and choose or create a Lakehouse connection. ***Save each Dataflow.*** This updates the connection and, because there is a change, the save action publishes the dataflow. If you run into issues, make a small change (e.g., add a space in the formula bar) and save again to force a publish.
    - Open the Generate_Data pipeline and review the pipeline Variables (starting month and number of months). Also review the Dataflow parameters on the Append Monthly Sales dataflow activity inside the ForEach.
    - With the default settings, about 200 million Sales table rows will be generated in about 25 min. **Note this gets very big fast, so don't max out the values!** Do the math. For example, if you double the number of stores and double the transactions per day per store, you should get about 4X as many rows.
    - Refresh the semantic model in the Workspace UI and view the report to see how many rows were created
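
The scaling advice above can be checked with a little arithmetic before changing any pipeline values. The sketch below is a rough estimate only: it assumes the row count scales linearly with each input, takes the roughly 200 million default rows quoted above as its baseline, and uses hypothetical names (`estimate_sales_rows`, `store_factor`, `txn_factor`, `month_factor`) that do not appear in the pipeline itself.

```python
# Rough estimate only: assumes Sales rows scale linearly with each input.
BASELINE_ROWS = 200_000_000  # approximate default row count quoted in this notebook

def estimate_sales_rows(store_factor=1.0, txn_factor=1.0, month_factor=1.0,
                        baseline_rows=BASELINE_ROWS):
    """Scale the baseline row count by multipliers relative to the default settings."""
    return int(baseline_rows * store_factor * txn_factor * month_factor)

# Doubling the stores and the transactions per day per store gives about 4X the rows.
print(f"{estimate_sales_rows(store_factor=2, txn_factor=2):,}")
```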

## Future Plans
- This is V1 and more is planned (e.g., incremental refresh, copy jobs to load data to WH and Eventhouse, materialized lake view aggregation tables w/ updated semantic model)
- Your feedback and ideas are welcome. Please also share any additional items you create that may be useful to others.


In [None]:
import base64
import json
import time

import pandas as pd
import requests
import sempy.fabric as fabric
from sempy.fabric.exceptions import FabricHTTPException, WorkspaceNotFoundException

# Install semantic-link-labs only if it is not already available in the session
try:
    import sempy_labs as labs
    print('labs already installed')
except ImportError:
    print('installing labs')
    %pip install semantic-link-labs
    import sempy_labs as labs

In [None]:
newids = {
    'Workspace_DFMarket_GUID': '',
    'Lakehouse_DF_Market_LH_GUID': '',
    'Report_DF_Market_Report_GUID': '', 
    'SemanticModel_DF_Market_SM_GUID': '',
    'Notebook_Drop_Create_Sales_Table_GUID': '', 
    'Dataflow_Append_Sales_Table_GUID': '', 
    'Dataflow_Replace_DIM_Tables_GUID': '', 
    'DataPipeline_Generate_Data_GUID': '' ,
    'Lakehouse_DF_Market_LH_SQLEndpoint': '',
    'Lakehouse_DF_Market_LH_DatabaseId': ''
}
thisworkspaceid = spark.conf.get("trident.workspace.id")
newids['Workspace_DFMarket_GUID'] = thisworkspaceid

newids

In [None]:
# Create DF Market Lakehouse
access_token = notebookutils.credentials.getToken("pbi")
headers = {"Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json"}
url = f"https://api.fabric.microsoft.com/v1/workspaces/{thisworkspaceid}/lakehouses"
body = {
  "displayName": "DF_Market_LH",
  "description": "DF Market Lakehouse"
}
response = requests.post(url, headers=headers, json=body)
response.raise_for_status()  # stop here if the Lakehouse was not created
jsonresponse = response.json()
print(jsonresponse)
lakehouseid = jsonresponse['id']

# Add new LH id to newids
newids['Lakehouse_DF_Market_LH_GUID'] = lakehouseid
newids

In [None]:
# Get Lakehouse SQL Endpoint
# Give the Lakehouse and its SQL endpoint time to provision when "Run All" is used.
# If you run each cell manually, comment out the sleep and repeat this cell until
# the output shows sqlendpoint and databaseid values.
time.sleep(30)
access_token = notebookutils.credentials.getToken("pbi")
headers = {"Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json"}
url = f"https://api.fabric.microsoft.com/v1/workspaces/{thisworkspaceid}/lakehouses/{lakehouseid}"

response = requests.get(url, headers=headers)
jsonresponse = response.json()
# print(jsonresponse)

# Add new LH info to newids
newids['Lakehouse_DF_Market_LH_SQLEndpoint'] = jsonresponse['properties']['sqlEndpointProperties']['connectionString']
newids['Lakehouse_DF_Market_LH_DatabaseId'] = jsonresponse['properties']['sqlEndpointProperties']['id']
newids
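
As an alternative to the fixed 30-second sleep, the cell above could poll until the endpoint values appear. This is a sketch under assumptions: `wait_for_sql_endpoint` and its `fetch` parameter are hypothetical names, and `fetch` stands in for any callable that returns the parsed Get Lakehouse response. The stand-in responses below are made up so the example runs without calling the API.

```python
import time

def wait_for_sql_endpoint(fetch, attempts=10, delay_seconds=15):
    """Poll fetch() until the SQL endpoint connection string and id are populated."""
    for attempt in range(attempts):
        props = (fetch().get('properties') or {}).get('sqlEndpointProperties') or {}
        if props.get('connectionString') and props.get('id'):
            return props['connectionString'], props['id']
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # wait before asking again
    raise TimeoutError('SQL endpoint was not provisioned in time')

# Stand-in for the real API call: not ready on the first poll, ready on the second.
_fake_responses = iter([
    {'properties': {'sqlEndpointProperties': {'connectionString': None, 'id': None}}},
    {'properties': {'sqlEndpointProperties': {'connectionString': 'example-endpoint',
                                              'id': 'example-id'}}},
])
endpoint, database_id = wait_for_sql_endpoint(lambda: next(_fake_responses), delay_seconds=0)
print(endpoint, database_id)
```

In the notebook, `fetch` could be something like `lambda: requests.get(url, headers=headers).json()`, reusing the `url` and `headers` built in the cell above.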

In [None]:
# for troubleshooting
# lakehouseid = 'f80140a9-881b-4437-86fc-39c8baf87aef'
# thisworkspaceid = '17302819-7995-4a37-9d9f-86e2c5d2b2c3'

In [None]:
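from io import StringIO

url = "https://raw.githubusercontent.com/hoosierbi/fileshare/refs/heads/main/DFMarketFiles/V1/DFMarket_V1.json"
deployjson = requests.get(url).text
# Wrap the text in StringIO: newer pandas versions deprecate passing a raw JSON string
deploy_df = pd.read_json(StringIO(deployjson))
# Build the placeholder key for each item, e.g. 'Notebook_Drop_Create_Sales_Table_GUID'.
# .str.replace swaps spaces inside each name; Series.replace would only match whole values.
deploy_df['ReplaceString'] = deploy_df['type'] + '_' + deploy_df['displayName'].str.replace(' ', '_', regex=False) + '_GUID'
deploy_df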


In [None]:
# Define Functions
def tobase64(textstring):
    # Encode as UTF-8 so payloads containing non-ASCII characters do not raise UnicodeEncodeError
    textstring_bytes = textstring.encode("utf-8")
    base64_bytes = base64.b64encode(textstring_bytes)
    return base64_bytes.decode("ascii")

def convertpayloadstobase64(definitionjson):
    asjson = json.loads(definitionjson)
    for load in asjson['parts']:
        load['payload'] = tobase64(load['payload'])
    return asjson

def ReplaceGUIDs(defnstring):
    jsonstring = defnstring # json.dumps(defnstring)
    for guid1 in newids.keys():
        jsonstring = jsonstring.replace(guid1, newids[guid1])
    return jsonstring


# Create Item Function

def CreateItemFromDefinition(wsid, itemname, itemtype, itemdefinition):
    access_token = notebookutils.credentials.getToken("pbi")
    headers = {"Authorization": f"Bearer {access_token}",
                "Content-Type": "application/json"}
    workspaceId = wsid     
    url = f"https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}/items"
    body = {
        "displayName": itemname, 
        "type": itemtype, 
        "definition": itemdefinition
     }  
    response = requests.post(url, headers=headers, json = body)
    # return response.json()
    return response
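
To see what the helper functions above do to an item definition, the following self-contained sketch repeats the same two steps on a made-up definition: placeholder names are swapped for real ids, then each part's payload is base64-encoded. The sample definition, the id value, and the function names `replace_placeholders` and `encode_payloads` are illustrative assumptions rather than content from the deployment file.

```python
import base64
import json

sample_ids = {'Lakehouse_DF_Market_LH_GUID': '00000000-0000-0000-0000-000000000001'}  # made-up id

# A made-up definition with one part whose payload references the placeholder.
sample_definition = json.dumps({
    'parts': [{'path': 'example.json',
               'payload': 'lakehouse=Lakehouse_DF_Market_LH_GUID',
               'payloadType': 'InlineBase64'}]
})

def replace_placeholders(text, ids):
    """Swap every placeholder name in the text for its real id."""
    for placeholder, real_id in ids.items():
        text = text.replace(placeholder, real_id)
    return text

def encode_payloads(definition_json):
    """Parse the definition and base64-encode the payload of each part."""
    definition = json.loads(definition_json)
    for part in definition['parts']:
        part['payload'] = base64.b64encode(part['payload'].encode('utf-8')).decode('ascii')
    return definition

encoded = encode_payloads(replace_placeholders(sample_definition, sample_ids))
# Decoding the payload shows the placeholder was replaced before encoding.
decoded = base64.b64decode(encoded['parts'][0]['payload']).decode('utf-8')
print(decoded)
```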

In [None]:
deploylist =  [
    'Notebook_Drop_Create_Sales_Table_GUID'
    ,'Dataflow_Append_Sales_Table_GUID'
    ,'Dataflow_Replace_DIM_Tables_GUID'
    ,'DataPipeline_Generate_Data_GUID'
    ,'SemanticModel_DF_Market_SM_GUID'
    ,'Report_DF_Market_Report_GUID'
]

access_token = notebookutils.credentials.getToken("pbi")
headers = {"Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json"}

for replacestring in deploylist:
    itemrecord = deploy_df[deploy_df['ReplaceString'] == replacestring]
    definitionstring = itemrecord.iloc[0]['Definition']
    convertedstring = convertpayloadstobase64(ReplaceGUIDs(definitionstring))
    createitem = CreateItemFromDefinition(thisworkspaceid, itemrecord.iloc[0]['displayName'], itemrecord.iloc[0]['type'], convertedstring)
    # createitem = CreateItemFromDefinition(thisworkspaceid, 'SMtest', itemrecord.iloc[0]['type'], convertedstring) # for troubleshooting

    print(createitem.status_code)

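    if createitem.status_code in {200, 201}:
        newitemid = createitem.json()['id']
        newids[replacestring] = newitemid
        print(replacestring + " - " + newitemid)

    elif createitem.status_code == 202:
        # Long-running operation: poll the operation URL until it finishes
        while True:
            url = createitem.headers["Location"]
            retry_after = createitem.headers.get("Retry-After", 0)
            time.sleep(int(retry_after))

            headers = {"Authorization": f"Bearer {access_token}"}
            createitem = requests.get(url, headers=headers)
            createitem.raise_for_status()

            body = createitem.json()
            status = body["status"]
            if status == "Succeeded":
                # The Location header now points at the operation result
                url = createitem.headers["Location"]
                createitem = requests.get(url, headers=headers)
                newitemid = createitem.json()['id']
                newids[replacestring] = newitemid
                print(replacestring + " - " + newitemid)
                break
            if status == "Failed":
                # Without this branch a failed operation would be polled forever
                print(replacestring + " failed: " + str(body.get("error")))
                break

    else:
        # Surface the error body instead of silently moving on to the next item
        print(replacestring + " was not created: " + createitem.text)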
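
The polling loop above runs until the operation reports a final status. A bounded variant is sketched below, assuming a hypothetical `get_state` callable that returns the operation status together with the number of seconds to wait; the name `poll_operation`, the `'TimedOut'` result, and the stand-in states are illustrative and not part of the Fabric API.

```python
import time

def poll_operation(get_state, max_polls=60, default_wait=0):
    """Poll get_state() until the operation reaches a terminal status or polls run out.

    get_state is a hypothetical callable returning (status, retry_after_seconds).
    """
    for _ in range(max_polls):
        status, retry_after = get_state()
        if status in ('Succeeded', 'Failed'):
            return status
        # Honor the suggested wait when one is given, otherwise use the default
        time.sleep(retry_after if retry_after is not None else default_wait)
    return 'TimedOut'

# Stand-in states: two "Running" polls followed by success.
_states = iter([('Running', 0), ('Running', 0), ('Succeeded', 0)])
print(poll_operation(lambda: next(_states)))
```

Capping the number of polls keeps "Run All" from hanging indefinitely if an operation never reaches a final status.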

In [None]:
#bind semantic model to new lakehouse
from sempy_labs.directlake import update_direct_lake_model_lakehouse_connection
dataset_name = "DF_Market_SM"
lakehouse_name = "DF_Market_LH"
update_direct_lake_model_lakehouse_connection(dataset=dataset_name,lakehouse=lakehouse_name)