# Lakehouse architecture with Synapse Link - Fact and Dimension tables
Below is a small demo of a lakehouse architecture built on Delta (Parquet) files. We use PySpark code in a notebook so that the example is easy for programmers to follow.

We use Dynamics product and customer data from the data lake. Lookups and joins enrich the raw (bronze) Delta tables and produce more refined (silver) Delta tables. Finally, we aggregate the data into a gold Delta table and run some basic analytics right within this notebook.

We first read a Delta file into a DataFrame and display 10 rows.

In [1]:
#Set up file path for delta files
itempath = "abfss://containername@datalakename.dfs.core.windows.net/deltalake/inventtable_partitioned/"
df_item = spark.read.load(itempath, format='delta')
display(df_item.limit(10))

In [2]:
#Set up file path for delta files
productPath = "abfss://containername@datalakename.dfs.core.windows.net/deltalake/ecoresproduct_partitioned/"
df_product = spark.read.load(productPath, format='delta')
display(df_product.limit(10))

In [3]:
#Set up file path for delta files
productTransPath = "abfss://containername@datalakename.dfs.core.windows.net/deltalake/ecoresproducttranslation_partitioned/"
df_productTrans = spark.read.load(productTransPath, format='delta')
display(df_productTrans.limit(10))
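
The three cells above repeat the same read pattern and differ only in the table name. Below is a minimal sketch of a helper that builds the path and loads several tables in one call. `delta_path` and `read_delta_tables` are hypothetical names introduced for illustration; the base URL and the `_partitioned` suffix are taken from the paths used in this notebook, and `spark` is assumed to be the session that the Synapse notebook provides.

```python
# Base folder of the bronze Delta files (placeholder container and account names, as above)
BASE = "abfss://containername@datalakename.dfs.core.windows.net/deltalake"


def delta_path(table_name, base=BASE):
    """Build the folder path of a bronze Delta table, e.g. 'inventtable'."""
    return f"{base.rstrip('/')}/{table_name}_partitioned/"


def read_delta_tables(spark, table_names):
    """Load each table into a DataFrame and return them keyed by table name."""
    return {
        name: spark.read.load(delta_path(name), format="delta")
        for name in table_names
    }
```

In the notebook this could be called as `dfs = read_delta_tables(spark, ["inventtable", "ecoresproduct", "ecoresproducttranslation"])`, after which `dfs["inventtable"]` plays the role of `df_item`.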

In [4]:
#create joined dataframe 
df_item.createOrReplaceTempView("vw_items")
df_product.createOrReplaceTempView("vw_products")
df_productTrans.createOrReplaceTempView("vw_producttrans")

df_joineditem = spark.sql("""

	select
		it.Recid as ProductId, 
		p.displayproductnumber as ProductNumber, 
		pt.name ProductName,
		it.itemid as ItemNumber,
		it.namealias ItemShortName,
		it.dataareaid LegalEntity
	from vw_items as it
	left outer join vw_products p on p.recid = it.product
	left outer join vw_producttrans pt on it.product = pt.product and pt.languageid = 'en-us'

""")
display(df_joineditem.limit(10))
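
Because both lookups are left outer joins, the joined result should contain exactly one row per item, unless a lookup matches more than one row (for example, several `en-us` translations for the same product). A minimal sketch of a sanity check follows; `check_no_fanout` is a hypothetical helper, not part of the original notebook.

```python
def check_no_fanout(source_df, joined_df):
    """Return True if a left join kept exactly one output row per source row.

    A left outer join never drops source rows, so any difference in the
    row counts means a lookup matched more than once and duplicated rows.
    """
    source_rows = source_df.count()
    joined_rows = joined_df.count()
    if joined_rows != source_rows:
        print(f"Row count changed: {source_rows} -> {joined_rows}")
    return joined_rows == source_rows
```

In the notebook this would be called as `check_no_fanout(df_item, df_joineditem)` before saving the dimension table.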

In [5]:
df_joineditem.createOrReplaceTempView("vw_JoinedProducts")

In [6]:
%%sql
select * from vw_JoinedProducts limit 10

<Spark SQL result set with 10 rows and 6 fields>

**Next, we save the refined DataFrame as a table in the lake database SL_GoldServerless**

In [8]:
#Save dataframe as Dim Table
df_joineditem.write.mode("overwrite").format("delta").saveAsTable("SL_GoldServerless.Products_Dim")
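
The cell above assumes that the lake database already exists. As a sketch, the write can be wrapped in a small helper that creates the database if it is missing and reads the table back to confirm the write. `save_as_lake_table` is a hypothetical name, and `spark` is assumed to be the notebook session.

```python
def save_as_lake_table(spark, df, database, table):
    """Write df as a Delta table in a lake database and return the saved table."""
    full_name = f"{database}.{table}"
    # Create the lake database only if it does not exist yet
    spark.sql(f"CREATE DATABASE IF NOT EXISTS {database}")
    # Same write as in the cell above: overwrite the table in Delta format
    df.write.mode("overwrite").format("delta").saveAsTable(full_name)
    # Read the table back through the metastore to confirm the write
    return spark.table(full_name)
```

For example, `save_as_lake_table(spark, df_joineditem, "SL_GoldServerless", "Products_Dim").count()` returns the number of rows that were saved.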

**An alternative to saving a lake database table is to save your refined tables as Delta files**

In [None]:
#set paths for silver or gold tables
productSilverTablePath = 'abfss://containername@datalakename.dfs.core.windows.net/lakehouse2/productSilverTable'
#Save as delta file
df_joineditem.write.mode("overwrite").format("delta").option("overwriteSchema", "true").save(productSilverTablePath)
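
Files saved this way are read back by path rather than by table name. Each overwrite also adds a new version to the Delta transaction log, so an earlier version can be read with the `versionAsOf` option. The helper below is a minimal sketch; `read_delta_folder` is a hypothetical name and `spark` is assumed to be the notebook session.

```python
def read_delta_folder(spark, path, version=None):
    """Read a Delta folder by path, optionally at an earlier version number."""
    reader = spark.read.format("delta")
    if version is not None:
        # Delta time travel: read the data as it was at this version
        reader = reader.option("versionAsOf", version)
    return reader.load(path)
```

In the notebook, `read_delta_folder(spark, productSilverTablePath)` returns the latest data, while `read_delta_folder(spark, productSilverTablePath, version=0)` returns the first version that was written.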

**Also, we can do some aggregations and save them as tables too**

In [None]:
df_productAgg = df_joineditem.groupBy("LegalEntity").count().sort("LegalEntity")
df_productAgg.show()
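
The cell above only displays the aggregate. A minimal sketch of also saving it as a gold table is shown below; `save_legal_entity_counts` and the table name `SL_GoldServerless.ProductCount_Agg` are hypothetical and only serve as an illustration.

```python
def save_legal_entity_counts(df, table_name):
    """Count rows per LegalEntity, save the result as a Delta table, and return it."""
    # Same aggregation as in the cell above
    agg = df.groupBy("LegalEntity").count().sort("LegalEntity")
    # Persist the aggregate so it can be queried like the dimension tables
    agg.write.mode("overwrite").format("delta").saveAsTable(table_name)
    return agg
```

For example, `save_legal_entity_counts(df_joineditem, "SL_GoldServerless.ProductCount_Agg").show()` saves the counts and then prints them.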

**Let's now do the same with customer data**

In [None]:
#Set up file path for delta files
customerBronzePath = "abfss://containername@datalakename.dfs.core.windows.net/deltalake/custtable_partitioned/"
df_cust = spark.read.load(customerBronzePath, format='delta')
display(df_cust.limit(10))

In [None]:
#Set up file path for delta files
partyTablePath = "abfss://containername@datalakename.dfs.core.windows.net/deltalake/dirpartytable_partitioned/"
df_dirpartytable = spark.read.load(partyTablePath, format='delta')
display(df_dirpartytable.limit(10))

In [None]:
#Set up file path for delta files
partyTableLocationPath = "abfss://containername@datalakename.dfs.core.windows.net/deltalake/dirpartylocation_partitioned/"
df_dirpartylocation = spark.read.load(partyTableLocationPath, format='delta')
display(df_dirpartylocation.limit(10))

In [None]:
#Set up file path for delta files
lpaPath = "abfss://containername@datalakename.dfs.core.windows.net/deltalake/logisticspostaladdress_partitioned/"
df_lpa = spark.read.load(lpaPath, format='delta')
display(df_lpa.limit(10))

In [None]:
#Set up file path for delta files
llPath = "abfss://containername@datalakename.dfs.core.windows.net/deltalake/logisticslocation_partitioned/"
df_ll = spark.read.load(llPath, format='delta')
display(df_ll.limit(10))

In [None]:
#Set up file path for delta files
leaPath = "abfss://containername@datalakename.dfs.core.windows.net/deltalake/logisticselectronicaddress_partitioned/"
df_lea = spark.read.load(leaPath, format='delta')
display(df_lea.limit(10))

In [None]:
#create joined dataframe 
df_cust.createOrReplaceTempView("vw_customers")
df_dirpartytable.createOrReplaceTempView("vw_partytable")
df_dirpartylocation.createOrReplaceTempView("vw_partylocation")
df_lpa.createOrReplaceTempView("vw_lpa")
df_ll.createOrReplaceTempView("vw_ll")
df_lea.createOrReplaceTempView("vw_lea")

df_joinedcustomers = spark.sql("""

	select
		ct.RECID as CustomerId
		,ct.accountnum as AccountNumber
		,ct.dataareaid as LegalEntity
		,dpt.name as Name
		,dpt.namealias as SearchName
		,lpa.countryregionid as Country
		,lpa.state as State
		,lpa.city as City
		,lpa.district as District
		,lpa.street as Street
		,lpa.zipcode as ZipCode
		,lea_phone.locator as PhoneNumber
		,lea_email.locator as Email
	from vw_customers ct
	join vw_partytable dpt on ct.party = dpt.recid
	left outer join vw_partylocation dpl_lpa on dpl_lpa.party = dpt.recid and dpl_lpa.isprimary = 1 and dpl_lpa.ispostaladdress = 1
	left outer join vw_lpa lpa on dpl_lpa.location = lpa.location and lpa.validto > current_date()
	left outer join vw_partylocation dpl_lea on dpl_lea.party = dpt.recid and dpl_lea.isprimary = 1 and dpl_lea.ispostaladdress = 0
	left outer join vw_ll ll_lea on ll_lea.recid = dpl_lea.location
	left outer join vw_lea lea_phone on lea_phone.location = ll_lea.recid and lea_phone.type = 1
	left outer join vw_lea lea_email on lea_email.location = ll_lea.recid and lea_email.type = 2

""")
display(df_joinedcustomers.limit(10))

In [None]:
df_joinedcustomers.createOrReplaceTempView("vw_JoinedCustomers")

In [None]:
%%sql
select * from vw_JoinedCustomers limit 10

In [None]:
#Save dataframe as Dim Table
df_joinedcustomers.write.mode("overwrite").format("delta").saveAsTable("SL_GoldServerless.Customers_Dim")

In [None]:
df_customerAgg = df_joinedcustomers.groupBy("LegalEntity").count().sort("LegalEntity")
df_customerAgg.show()

**Another benefit of Synapse Link is that you get Dataverse and FO data all in one workspace**

In [None]:
#Set up file path for contact table from Dataverse
contactBronzePath = "abfss://containername@datalakename.dfs.core.windows.net/deltalake/contact_partitioned/"
df_contact = spark.read.load(contactBronzePath, format='delta')
display(df_contact.limit(10))

In [None]:
# rename columns that we need and create a new dataframe
df_contactSilver =  df_contact.selectExpr(
    'msdyn_contactpersonid AS ContactID',
    'address1_country AS Country',
    'address1_telephone1 AS Phone',
    'emailaddress1 AS Email',
    'fullname AS FullName')

display(df_contactSilver.limit(10))

In [None]:
df_contactSilver.createOrReplaceTempView("vw_contacts")
df_joinedcustomers.createOrReplaceTempView("vw_FOcustomers")

df_unifiedcustomers = spark.sql("SELECT co.*, cu.* FROM vw_contacts co INNER JOIN vw_FOcustomers cu ON co.ContactID = cu.AccountNumber")

display(df_unifiedcustomers.limit(10))
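
Note that both views expose `Country` and `Email` columns, so selecting `co.*` and `cu.*` yields duplicate column names, which can make later column references ambiguous and typically prevents saving the result as a table. A minimal sketch of one way around this is to prefix every column by its source. The helper names and the `Contact_` and `FO_` prefixes are hypothetical choices for illustration.

```python
def prefixed(alias, columns, prefix):
    """Build 'alias.col AS prefixcol' select expressions for each column."""
    return [f"{alias}.{c} AS {prefix}{c}" for c in columns]


def unified_customers_sql(contact_cols, customer_cols):
    """Build the unified-customers query with a distinct prefix per source."""
    select_list = ",\n    ".join(
        prefixed("co", contact_cols, "Contact_")
        + prefixed("cu", customer_cols, "FO_")
    )
    return (
        f"SELECT\n    {select_list}\n"
        "FROM vw_contacts co\n"
        "INNER JOIN vw_FOcustomers cu ON co.ContactID = cu.AccountNumber"
    )
```

In the notebook this could be used as `spark.sql(unified_customers_sql(df_contactSilver.columns, df_joinedcustomers.columns))`.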

In [None]:
#Set up file path for sales orders Fact table
salesPath = "abfss://containername@datalakename.dfs.core.windows.net/deltalake/salestable_partitioned/"
df_sales = spark.read.load(salesPath, format='delta')
display(df_sales.limit(10))

In [None]:
# rename columns that we need and create a new dataframe
df_salesSilver =  df_sales.selectExpr(
    'salesid AS SalesID',
    'custaccount AS CustomerNum',
    'deliverydate AS DelvDate',
    'inventlocationid AS Store',
    'smmsalesamounttotal AS TotalAmount')
    
display(df_salesSilver.limit(10))

In [None]:
#Save dataframe as Fact Table
df_salesSilver.write.mode("overwrite").format("delta").saveAsTable("SL_GoldServerless.Sales_Fact")

In [None]:
#Set up file path for retail sales transactions Fact table
retailsalesPath = "abfss://containername@datalakename.dfs.core.windows.net/deltalake/retailtransactionsalestrans_partitioned/"
df_retailsales = spark.read.load(retailsalesPath, format='delta')
display(df_retailsales.limit(10))

In [None]:
# rename columns that we need and create a new dataframe
df_retailsalesSilver =  df_retailsales.selectExpr(
    'store AS Store',
    'costamount AS Cost',
    'custaccount AS CustomerNum',
    'itemid AS ItemID',
    'netamountincltax AS NetAmountInclTax',
    'price AS Price',
    'qty AS Qty',
    'staffid AS StaffID',
    'transdate AS TransDate')
    
display(df_retailsalesSilver.limit(10))

In [None]:
#Save dataframe as Fact Table
df_retailsalesSilver.write.mode("overwrite").format("delta").saveAsTable("SL_GoldServerless.RetailSales_Fact")
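
With the fact and dimension tables saved, basic analytics can be run as a join between them. The sketch below builds a query for total sales per customer from `Sales_Fact` and `Customers_Dim`. `sales_by_customer_sql` is a hypothetical helper, and the join uses the account number only, which assumes account numbers are unique across legal entities; if they are not, the legal entity would need to be added to the fact table and to the join.

```python
def sales_by_customer_sql(database="SL_GoldServerless", top_n=10):
    """Build a query that ranks customers by their total sales amount."""
    return f"""
        SELECT c.AccountNumber, c.Name, SUM(s.TotalAmount) AS TotalSales
        FROM {database}.Sales_Fact s
        JOIN {database}.Customers_Dim c ON s.CustomerNum = c.AccountNumber
        GROUP BY c.AccountNumber, c.Name
        ORDER BY TotalSales DESC
        LIMIT {int(top_n)}
    """
```

In the notebook, `display(spark.sql(sales_by_customer_sql()))` would show the ten customers with the highest total.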
