# Data Loading and Joining in Databricks

This notebook loads orders from the silver layer and the customer and product dimensions from the gold layer, left-joins them into a fact table, and upserts the result into a gold-layer Delta table.

## Steps

1. **Load Orders Data**  
   Load the orders data from the silver layer.

2. **Load Dimension Tables**  
   Load customer and product dimension tables from the gold layer.

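3. **Join DataFrames**  
   Left-join the orders data with the customer and product dimensions to create a fact table.

4. **Write the Fact Table**  
   Merge the result into `databricks_catalog.gold.gold_orders` if the table already exists; otherwise create it as a Delta table.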


In [0]:
# Load the cleaned orders from the silver layer.
df = spark.sql("select * from databricks_catalog.silver.orders_silver")

df.display()

In [0]:
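# Customer dimension: surrogate key plus the business key, renamed to avoid a name clash in the join.
df_dim_cus = spark.sql("select DimCustomerKey, customer_id as dim_customer_id from databricks_catalog.gold.dimcustomers")

# Product dimension: only the business key, renamed for the same reason.
df_dim_pro = spark.sql("select product_id as dim_product_id from databricks_catalog.gold.dim_products")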


In [0]:

# Left joins keep every order; dimension columns are NULL where no match exists.
df_fact = (
    df.join(df_dim_cus, df["customer_id"] == df_dim_cus["dim_customer_id"], "left")
    .join(df_dim_pro, df["product_id"] == df_dim_pro["dim_product_id"], "left")
)

display(df_fact)
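
Because both joins are left joins, every order survives even when a dimension lookup finds nothing, and the dimension columns for that order come back as NULL. The sketch below illustrates this in plain Python, without Spark, on made-up rows. The `left_join` helper and the sample data are hypothetical; they only mimic what a single-key equality join with `'left'` does and are not part of the notebook.

```python
def left_join(left_rows, right_rows, left_key, right_key):
    """Mimic a single-key SQL left join on lists of dicts (illustration only)."""
    right_columns = {col for row in right_rows for col in row}
    joined = []
    for left in left_rows:
        # As in SQL, a NULL (None) key on the left never matches anything.
        matches = [
            r for r in right_rows
            if left[left_key] is not None and r[right_key] == left[left_key]
        ]
        if matches:
            joined.extend({**left, **r} for r in matches)
        else:
            # No match: keep the left row, fill the right-hand columns with None.
            joined.append({**left, **{col: None for col in right_columns}})
    return joined


# Hypothetical sample rows, not taken from the real tables.
orders = [
    {"order_id": 1, "customer_id": "C1"},
    {"order_id": 2, "customer_id": "C9"},  # no such customer in the dimension
]
dim_customers = [{"DimCustomerKey": 10, "dim_customer_id": "C1"}]

fact = left_join(orders, dim_customers, "customer_id", "dim_customer_id")

# Orders whose customer lookup failed carry a None surrogate key.
unmatched = [row["order_id"] for row in fact if row["DimCustomerKey"] is None]
print(unmatched)
```

Counting such unmatched rows in `df_fact` before writing is a cheap way to spot orders that reference customers or products missing from the dimensions.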

In [0]:
from delta.tables import DeltaTable

In [0]:
if spark.catalog.tableExists("databricks_catalog.gold.gold_orders"):
    # Table exists: upsert the new fact rows into it.
    dlt_obj = DeltaTable.forName(spark, "databricks_catalog.gold.gold_orders")
    # Caveat: `=` never matches NULL keys, so orders left without a dimension
    # match by the left joins are not matched and get inserted again on each run.
    (
        dlt_obj.alias("t")
        .merge(
            df_fact.alias("s"),
            "t.order_id = s.order_id AND t.DimCustomerKey = s.DimCustomerKey AND t.dim_customer_id = s.dim_customer_id AND t.dim_product_id = s.dim_product_id",
        )
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
else:
    # First run: create the table as an external Delta table at the given path.
    (
        df_fact.write.format("delta")
        .option("path", "abfss://gold@azuresadatalake.dfs.core.windows.net/FactOrders")
        .saveAsTable("databricks_catalog.gold.gold_orders")
    )
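
The merge condition above compares keys with `=`, which does not match when a key is NULL on either side. One possible variant, offered as an assumption and not as the notebook's method, is Spark SQL's null-safe equality operator `<=>`, which treats two NULLs as equal. The sketch below only builds the condition string; `build_merge_condition` is a hypothetical helper, and the aliases `t` and `s` follow the cell above.

```python
def build_merge_condition(key_columns, target_alias="t", source_alias="s"):
    """Join the key columns into a merge condition using null-safe equality."""
    if not key_columns:
        raise ValueError("at least one key column is required")
    return " AND ".join(
        f"{target_alias}.{col} <=> {source_alias}.{col}" for col in key_columns
    )


# Same four key columns as the hand-written condition in the merge cell.
condition = build_merge_condition(
    ["order_id", "DimCustomerKey", "dim_customer_id", "dim_product_id"]
)
print(condition)
```

The resulting string could be passed as the second argument of `merge` in place of the hand-written condition. Whether these four columns uniquely identify a fact row is an assumption worth checking against the data first.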



In [0]:
%sql
select * from databricks_catalog.gold.gold_orders