In [0]:
%python
# Decorator in Python: add_accessories wraps a function with extra behavior before and after it runs
def add_accessories(f_sendcar):
    def wrapper():
        print("3M will decorate your car")
        f_sendcar()  # call the original (decorated) function
        print("3M has decorated your car")
    return wrapper

@add_accessories
def car():
    # cost = 100000
    print("Irfan purchased a new car without decoration")

car()
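
The `@add_accessories` line is shorthand for rebinding the name: `car = add_accessories(car)`. The sketch below writes that rebinding out explicitly. As an assumed extension that is not in the original cell, it also forwards arguments and the return value through `*args`/`**kwargs` and uses `functools.wraps` so the decorated function keeps its own name. The `owner` parameter is hypothetical.

```python
import functools

def add_accessories(f_sendcar):
    """Generalized variant (assumption): forwards arguments and the return value."""
    @functools.wraps(f_sendcar)  # copy __name__, __doc__, etc. from the wrapped function
    def wrapper(*args, **kwargs):
        print("3M will decorate your car")
        result = f_sendcar(*args, **kwargs)  # call the original function with its arguments
        print("3M has decorated your car")
        return result
    return wrapper

def car(owner):  # 'owner' is a hypothetical parameter for illustration
    print(f"{owner} purchased a new car without decoration")
    return f"{owner}'s car"

# Same effect as writing @add_accessories directly above `def car`
car = add_accessories(car)

delivered = car("Irfan")
print(delivered)     # Irfan's car
print(car.__name__)  # car (preserved by functools.wraps)
```

Without `functools.wraps`, `car.__name__` would report `wrapper`, which makes logs and stack traces harder to read.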

![](/Workspace/Users/sunilasha625@gmail.com/DataBricksCodeRepo/data_bricks_use_case_workout/6_lakeflow_pipelines/connection_foreign_catalog.png)

###Foreign Catalog Connection
A foreign catalog is a Unity Catalog object that mirrors a database in an external data system, so Databricks can query that system's tables in place (Lakehouse Federation) without copying the data into Databricks.

Databricks can securely access an external database (Google Cloud SQL) using Unity Catalog-managed connections and foreign catalogs, without ingesting or copying the data.

**Use a foreign catalog for:**
- Ad-hoc analysis
- Replacing custom JDBC code or third-party connectors that copy data
- Incremental ingestion (e.g., pulling only yesterday's changes)
- Reference/lookup use cases where the data does not need to be persisted
- Live lookups (changes made in the source database are visible on the next query)
- Avoiding data duplication (a foreign table is a live reference, not a snapshot copy)
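
As a local analogy only (Python's built-in sqlite3, not Databricks), the difference between a live reference and a copy can be shown with a view and a CREATE TABLE ... AS SELECT. The view stands in for a foreign table because it reads the source at query time; the copied table stands in for ingested data. Table and view names are hypothetical.

```python
import sqlite3

# Assumption: a SQLite view plays the role of a foreign table (reads the source at query
# time) and CREATE TABLE ... AS SELECT plays the role of an ingested snapshot copy.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (shipment_id INTEGER PRIMARY KEY, role TEXT)")
conn.execute("INSERT INTO shipments VALUES (5000001, 'Driver')")

conn.execute("CREATE VIEW shipments_ref AS SELECT * FROM shipments")    # live reference
conn.execute("CREATE TABLE shipments_copy AS SELECT * FROM shipments")  # snapshot copy

# Change the source after both objects exist
conn.execute("UPDATE shipments SET role = 'Senior Driver' WHERE shipment_id = 5000001")

live_role = conn.execute("SELECT role FROM shipments_ref").fetchone()[0]
copied_role = conn.execute("SELECT role FROM shipments_copy").fetchone()[0]
print(live_role)    # Senior Driver
print(copied_role)  # Driver
```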

MySQL (shipments table) -> Connection + Foreign Catalog -> Databricks (foreign table in the foreign catalog) -> CDC filter on updated_at -> Bronze Delta table


####1. Source DB Readiness


**Run the following in the source MySQL instance to create the database and table**
create database if not exists logistics;

CREATE TABLE logistics.shipments (
  shipment_id INT PRIMARY KEY,
  first_name  VARCHAR(50),
  last_name   VARCHAR(50),
  age         INT,
  role        VARCHAR(50),
  updated_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);


INSERT INTO logistics.shipments VALUES
(5000001,'Rajesh','Kumar',35,'Driver',CURRENT_TIMESTAMP),
(5000002,'Anita','Sharma',29,'Dispatcher',CURRENT_TIMESTAMP),
(5000003,'Michael','Chen',41,'Warehouse Manager',CURRENT_TIMESTAMP),
(5000004,'Suresh',NULL,52,'Loader',CURRENT_TIMESTAMP),
(5000005,'Priya','Iyer',27,'Analyst',CURRENT_TIMESTAMP);


####2. Create Connection & Foreign Catalog
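**Foreign Catalog**: built on top of a connection. For MySQL, each database on the server appears as a schema in the foreign catalog, so the logistics.shipments table from step 1 is addressed as gcp_mysql_fc1.logistics.shipments. (The cells below read from a practice2 schema; substitute logistics if you created the table exactly as shown above.)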

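**Connection**: a Unity Catalog object that stores the host, port, and credentials of an external database. It helps to: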
- Avoid hard-coding usernames/passwords in notebooks
- Enable centralized governance via Unity Catalog
- Allow multiple users and tables to reuse the same connection
- Support SQL-based external access

In [0]:
%sql
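-- One-time setup: store the MySQL endpoint and credentials in a Unity Catalog connection
-- 'mysql_scope' / 'mysql_password' are placeholder secret scope/key names; create them before running this
CREATE CONNECTION gcp_mysql_conn
TYPE mysql
OPTIONS (
  host '34.123.166.158',
  port '3306',
  user 'devuser',
  password secret('mysql_scope', 'mysql_password')
);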

In [0]:
%sql
-- Create a foreign catalog that mirrors the MySQL databases behind the connection (metadata only; no data is copied)
CREATE FOREIGN CATALOG gcp_mysql_fc1
USING CONNECTION gcp_mysql_conn;

In [0]:
%sql
DESCRIBE CONNECTION gcp_mysql_conn;

In [0]:
%sql
show catalogs;

In [0]:
%sql
select * from gcp_mysql_fc1.practice2.shipments;

####3. Bronze table (Incremental ingestion)
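Here the foreign catalog itself serves as the ingestion mechanism, instead of a traditional JDBC driver or a third-party tool such as Azure Data Factory. The first cell below performs the initial full load (CREATE TABLE ... AS SELECT); later cells load only new and changed rows.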

In [0]:
%sql
CREATE TABLE IF NOT EXISTS lakehousecat.deltadb.bronze_shipments
USING DELTA
AS SELECT
  shipment_id,
  first_name,
  last_name,
  age,
  role,
  updated_at
FROM gcp_mysql_fc1.practice2.shipments;

In [0]:
%sql
DESCRIBE HISTORY lakehousecat.deltadb.bronze_shipments;

In [0]:
%sql
SELECT * FROM lakehousecat.deltadb.bronze_shipments

Appending each changed row approximates SCD Type 2: the bronze table keeps every version of a record, although without explicit valid-from/valid-to dates or a current-row flag.<br>
####Insert a row into the source database table, then run the incremental load<br>
INSERT INTO logistics.shipments VALUES (5000006,'Bala','Chander',35,'DE',CURRENT_TIMESTAMP);<br>
####Update that row in the source database table, then run the incremental load again<br>
UPDATE logistics.shipments SET role='Databricks Data Engineer', updated_at=CURRENT_TIMESTAMP WHERE shipment_id=5000006;<br>

In [0]:
%sql
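-- Watermark: latest updated_at already loaded into bronze (epoch start if the table is empty)
SELECT COALESCE(MAX(updated_at), TIMESTAMP '1970-01-01 00:00:00') AS watermark
  FROM lakehousecat.deltadb.bronze_shipments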

In [0]:
%sql
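-- Incremental load: append only rows inserted or updated in MySQL since the last load
-- Extraction: timestamp-based change data capture (CDC) on updated_at
-- Load: append-only, so each change lands as a new row (SCD Type 2-style history)
-- Limitation: hard deletes in the source are not captured by this filter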
INSERT INTO lakehousecat.deltadb.bronze_shipments
SELECT
  shipment_id,
  first_name,
  last_name,
  age,
  role,
  updated_at
FROM gcp_mysql_fc1.practice2.shipments
WHERE updated_at >
(
  SELECT COALESCE(MAX(updated_at), TIMESTAMP '1970-01-01 00:00:00')
  FROM lakehousecat.deltadb.bronze_shipments
);
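
The watermark pattern above can be tried locally with sqlite3. This is a sketch under assumptions: src and bronze are hypothetical stand-ins for the foreign table and the bronze Delta table, the timestamps are hypothetical ISO-8601 strings (which compare correctly as text), and only the role column is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (shipment_id INTEGER PRIMARY KEY, role TEXT, updated_at TEXT)")
conn.execute("CREATE TABLE bronze (shipment_id INTEGER, role TEXT, updated_at TEXT)")

# Append only source rows newer than the latest updated_at already in bronze
INCREMENTAL_LOAD = """
INSERT INTO bronze
SELECT shipment_id, role, updated_at
FROM src
WHERE updated_at > (SELECT COALESCE(MAX(updated_at), '1970-01-01 00:00:00') FROM bronze)
"""

# Initial load
conn.execute("INSERT INTO src VALUES (5000005, 'Analyst', '2024-01-01 10:00:00')")
conn.execute(INCREMENTAL_LOAD)

# New source row: only that row is appended
conn.execute("INSERT INTO src VALUES (5000006, 'DE', '2024-01-02 09:00:00')")
conn.execute(INCREMENTAL_LOAD)

# Update that row and bump updated_at: a second version is appended
conn.execute("UPDATE src SET role = 'Databricks Data Engineer', "
             "updated_at = '2024-01-03 08:00:00' WHERE shipment_id = 5000006")
conn.execute(INCREMENTAL_LOAD)

rows = conn.execute("SELECT shipment_id, role FROM bronze ORDER BY updated_at").fetchall()
print(rows)  # [(5000005, 'Analyst'), (5000006, 'DE'), (5000006, 'Databricks Data Engineer')]
```

Because the filter only sees rows that still exist in the source, a row deleted in MySQL never reaches bronze; the MERGE further down handles deletes.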


In [0]:
%sql
select * from lakehousecat.deltadb.bronze_shipments

In [0]:
%sql
-- rno = 1 is the latest version of each shipment; rno > 1 returns its older (historical) versions
select *,
row_number() over(partition by shipment_id order by updated_at desc) AS rno
from lakehousecat.deltadb.bronze_shipments
qualify rno > 1;
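
A sketch of turning the appended history into explicit SCD Type 2 validity ranges, using sqlite3 window functions as a local stand-in (assumes SQLite 3.25 or newer; the rows and timestamps are hypothetical, modeled on shipment 5000006). LEAD gives each version a valid_to equal to the next version's updated_at, and the current version gets NULL.

```python
import sqlite3

# Hypothetical bronze history, modeled on the 5000006 insert/update example
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bronze (shipment_id INTEGER, role TEXT, updated_at TEXT)")
conn.executemany("INSERT INTO bronze VALUES (?, ?, ?)", [
    (5000005, "Analyst", "2024-01-01 10:00:00"),
    (5000006, "DE", "2024-01-02 09:00:00"),
    (5000006, "Databricks Data Engineer", "2024-01-03 08:00:00"),
])

# One row per version: valid_to is the next version's updated_at (NULL for the current one);
# rno = 1 marks the latest version, as in the ROW_NUMBER query above
VERSIONS = """
SELECT shipment_id, role,
       updated_at AS valid_from,
       LEAD(updated_at) OVER (PARTITION BY shipment_id ORDER BY updated_at) AS valid_to,
       ROW_NUMBER() OVER (PARTITION BY shipment_id ORDER BY updated_at DESC) AS rno
FROM bronze
"""

# SQLite has no QUALIFY clause, so filter on rno in an outer query instead
history = conn.execute(f"SELECT * FROM ({VERSIONS}) WHERE rno > 1").fetchall()
current = conn.execute(f"SELECT * FROM ({VERSIONS}) WHERE rno = 1 ORDER BY shipment_id").fetchall()

print(history)  # [(5000006, 'DE', '2024-01-02 09:00:00', '2024-01-03 08:00:00', 2)]
print(current)
```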

In [0]:
%sql
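-- Incremental load with MERGE (SCD Type 1): update changed rows, insert new rows,
-- and delete rows that no longer exist in the source
-- Note: if earlier appends left several rows per shipment_id, all of them are updated to the same values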
MERGE INTO lakehousecat.deltadb.bronze_shipments AS target
USING gcp_mysql_fc1.practice2.shipments AS source
ON target.shipment_id = source.shipment_id
WHEN MATCHED THEN
UPDATE SET
  target.first_name = source.first_name,
  target.last_name = source.last_name,
  target.age = source.age,
  target.role = source.role,
  target.updated_at = source.updated_at
WHEN NOT MATCHED THEN
INSERT (shipment_id, first_name, last_name, age, role, updated_at)
VALUES (source.shipment_id, source.first_name, source.last_name, source.age, source.role, source.updated_at)
WHEN NOT MATCHED BY SOURCE THEN
DELETE;
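
To see what the three MERGE clauses do, here is a local emulation in sqlite3 (an assumption-based analogy, not Delta Lake behavior): SQLite has no MERGE, so matched/not-matched rows are handled with an UPSERT (INSERT ... ON CONFLICT DO UPDATE, SQLite 3.24+) and "not matched by source" with a NOT EXISTS delete. Table names and the 5000099 row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_shipments (shipment_id INTEGER PRIMARY KEY, role TEXT)")
conn.execute("CREATE TABLE tgt_shipments (shipment_id INTEGER PRIMARY KEY, role TEXT)")
conn.executemany("INSERT INTO src_shipments VALUES (?, ?)",
                 [(5000001, "Driver"), (5000006, "Databricks Data Engineer")])
conn.executemany("INSERT INTO tgt_shipments VALUES (?, ?)",
                 [(5000001, "Driver"), (5000006, "DE"), (5000099, "Deleted at source")])

with conn:  # run both statements in one transaction, like a single MERGE
    # WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT
    # ("WHERE true" avoids SQLite's parsing ambiguity for INSERT ... SELECT ... ON CONFLICT)
    conn.execute("""
        INSERT INTO tgt_shipments (shipment_id, role)
        SELECT shipment_id, role FROM src_shipments WHERE true
        ON CONFLICT(shipment_id) DO UPDATE SET role = excluded.role
    """)
    # WHEN NOT MATCHED BY SOURCE THEN DELETE
    conn.execute("""
        DELETE FROM tgt_shipments
        WHERE NOT EXISTS (SELECT 1 FROM src_shipments s
                          WHERE s.shipment_id = tgt_shipments.shipment_id)
    """)

result = conn.execute("SELECT shipment_id, role FROM tgt_shipments ORDER BY shipment_id").fetchall()
print(result)  # [(5000001, 'Driver'), (5000006, 'Databricks Data Engineer')]
```

The UPSERT needs a unique key on the target, which also highlights that a Type 1 table should hold one row per shipment_id, whereas the bronze table built by the earlier appends may hold several.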

In [0]:
%sql
select * from lakehousecat.deltadb.bronze_shipments

The next cell does manually what WHEN NOT MATCHED BY SOURCE THEN DELETE does inside MERGE: it deletes bronze rows whose shipment_id no longer exists in the source table.

In [0]:
%sql
DELETE FROM lakehousecat.deltadb.bronze_shipments
WHERE shipment_id IN (
  SELECT tgt.shipment_id
  FROM lakehousecat.deltadb.bronze_shipments tgt
  LEFT JOIN gcp_mysql_fc1.practice2.shipments src
  ON tgt.shipment_id = src.shipment_id
  WHERE src.shipment_id IS NULL
);

In [0]:
%sql
select * from lakehousecat.deltadb.bronze_shipments