# Data Engineering / Platform Review on Glue Connection

This notebook shows how an AWS Glue Data Catalog can be connected to Databricks through Lakehouse Federation.


For more information, see the Databricks documentation on Hive metastore federation for AWS Glue ([link](https://docs.databricks.com/aws/en/query-federation/hms-federation-glue)).

## Glue Connectivity


![Glue catalog overview](images/glue-catalog-overview.png)

![Glue foreign schema](images/glue-foreign-schema.png)

![Glue connection](images/glue-connection.png)

## Basic profiling query

In [0]:
USE CATALOG `joy-foreign-glue`;
USE SCHEMA joy_db;

In [0]:
SELECT *
FROM `joy-foreign-glue`.joy_db.joy_bronze_customer_data
LIMIT 10;


customer_id,customer_zip_code_prefix,customer_city,customer_state
customer_id,customer_zip_code_prefix,customer_city,customer_state
hCT0x9JiGXBQ,58125,varzea paulista,SP
PxA7fv9spyhx,3112,armacao dos buzios,RJ
g3nXeJkGI0Qw,4119,jandira,SP
EOEsCQ6QlpIg,18212,uberlandia,MG
mVz5LO2Vd6cL,88868,ilhabela,SP
UkqnhxmX7YMP,25902,porto uniao,SC
85jiDiGSfhTu,4762,guarulhos,SP
gDdkaN8b9s1g,75870,mogi-guacu,SP
9Csx6oXlpLl1,69068,bebedouro,SP
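In the output above, the first returned row repeats the column names. That usually means the CSV file's header line is being read as a data row. For Hive-style CSV tables this is typically governed by the `skip.header.line.count` table property in Glue, but whether that property is missing here, and whether Databricks honors it for this federated table, are assumptions to verify. Until the table definition is fixed, a minimal client-side sketch (the helper `drop_embedded_header` is hypothetical, not a Databricks API) can drop that row from a small sample:

```python
from typing import Any, Sequence


def drop_embedded_header(columns: Sequence[str], rows: Sequence[Sequence[Any]]) -> list:
    """Drop rows whose values exactly repeat the column names.

    Works with plain tuples and with PySpark Row objects, since Row is a
    tuple subclass and compares element by element.
    """
    header = tuple(columns)
    return [row for row in rows if tuple(row) != header]
```

From a `%python` cell this could be applied as `drop_embedded_header(df.columns, df.limit(10).collect())`, where `df = spark.table("joy_bronze_customer_data")`. For the full table, a Spark-side filter such as `df.filter(df.customer_id != "customer_id")` (or `WHERE customer_id <> 'customer_id'` in SQL) keeps the work in Spark instead of collecting rows to the driver.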


In [0]:
%python
# Point the session at the foreign catalog and schema that mirror Glue
spark.sql("USE CATALOG `joy-foreign-glue`")
spark.sql("USE SCHEMA joy_db")

df = spark.table("joy_bronze_customer_data")   # this is the Glue CSV-classified table
display(df.limit(10))

customer_id,customer_zip_code_prefix,customer_city,customer_state
customer_id,customer_zip_code_prefix,customer_city,customer_state
hCT0x9JiGXBQ,58125,varzea paulista,SP
PxA7fv9spyhx,3112,armacao dos buzios,RJ
g3nXeJkGI0Qw,4119,jandira,SP
EOEsCQ6QlpIg,18212,uberlandia,MG
mVz5LO2Vd6cL,88868,ilhabela,SP
UkqnhxmX7YMP,25902,porto uniao,SC
85jiDiGSfhTu,4762,guarulhos,SP
gDdkaN8b9s1g,75870,mogi-guacu,SP
9Csx6oXlpLl1,69068,bebedouro,SP
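The cell above only displays a sample. A basic profile of that sample (row count, null or blank values, and distinct values per column) can be sketched in plain Python over rows converted with PySpark's `Row.asDict()`. The helper name `profile_rows` is hypothetical, and treating whitespace-only strings as missing is an assumption suited to CSV-classified tables:

```python
from typing import Any, Iterable, Mapping


def profile_rows(rows: Iterable[Mapping[str, Any]]) -> dict[str, dict[str, int]]:
    """Per-column counts of rows, null/blank values, and distinct values."""
    materialized = list(rows)
    columns = list(materialized[0].keys()) if materialized else []
    profile: dict[str, dict[str, int]] = {}
    for col in columns:
        values = [row.get(col) for row in materialized]
        # Treat None and whitespace-only strings as missing values.
        missing = sum(
            1 for v in values if v is None or (isinstance(v, str) and not v.strip())
        )
        profile[col] = {
            "rows": len(values),
            "null_or_blank": missing,
            "distinct": len(set(values)),
        }
    return profile
```

Usage from the cell above: `profile_rows(row.asDict() for row in df.limit(10).collect())`. For the full table, the same counts are better computed in Spark (for example with `pyspark.sql.functions.countDistinct`) so the data is not collected to the driver.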


## Time travel for external catalogs

A foreign catalog is a virtual mirror of an external catalog or database. Databricks does not manage its storage and does not keep a Delta-style history for it.

Lakehouse Federation provides read-only access, and Databricks is not the system of record: it does not rewrite or version the data registered in Glue.

Because Databricks never creates its own snapshots or versions of these foreign tables, there is no Delta/UC history to show. As a result, `DESCRIBE HISTORY` and Delta time travel are not supported on foreign catalogs such as Redshift or Glue.
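Since these statements cannot succeed against a foreign catalog, a team might add a lightweight pre-flight check to shared notebooks or jobs. The sketch below is hypothetical (the function names and the pattern list are illustrative assumptions, not a Databricks API): it flags `DESCRIBE HISTORY` and `VERSION AS OF` / `TIMESTAMP AS OF` when the statement names a known foreign catalog such as `joy-foreign-glue`.

```python
import re
from typing import Iterable, Optional

# Delta-only constructs that depend on table history. Illustrative assumption,
# not an exhaustive SQL grammar.
_HISTORY_PATTERNS = (
    re.compile(r"\bDESCRIBE\s+HISTORY\b", re.IGNORECASE),
    re.compile(r"\b(?:VERSION|TIMESTAMP)\s+AS\s+OF\b", re.IGNORECASE),
)


def uses_delta_history(sql: str) -> bool:
    """True if the statement uses DESCRIBE HISTORY or VERSION/TIMESTAMP AS OF."""
    return any(p.search(sql) for p in _HISTORY_PATTERNS)


def references_catalog(sql: str, catalog: str) -> bool:
    """True if `catalog` is used as a name qualifier, backquoted or bare."""
    name = re.escape(catalog)
    pattern = rf"(?:`{name}`|(?<![\w`]){name})\."
    return re.search(pattern, sql, re.IGNORECASE) is not None


def history_violation(sql: str, foreign_catalogs: Iterable[str]) -> Optional[str]:
    """Return an error message if a history/time-travel statement names a foreign catalog.

    Limitation: unqualified table names that rely on USE CATALOG are not detected.
    """
    if not uses_delta_history(sql):
        return None
    for catalog in foreign_catalogs:
        if references_catalog(sql, catalog):
            return f"{catalog} is a foreign catalog: DESCRIBE HISTORY and time travel are not supported"
    return None
```

A regex check like this is deliberately shallow; it only catches fully qualified names, so it complements rather than replaces the error Databricks itself raises.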

## UPDATE / DELETE on Foreign Catalogs

Lakehouse Federation foreign catalogs (Redshift, Glue, etc.) are read-only from Databricks' perspective, so UPDATE and DELETE statements are not allowed. The same restriction applies to table-maintenance operations such as clustering, compaction, history retention, VACUUM, and OPTIMIZE.

Only SELECT (and some metadata operations like SHOW TABLES) are supported on foreign catalogs.
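The rule above can be expressed as a simple allow-list on a statement's leading keyword. This is a hypothetical sketch (the function name and keyword set are assumptions drawn from the notes above, not a Databricks API): SELECT and metadata commands such as SHOW and DESCRIBE pass, `USE` passes because it only switches session context as in the cells above, and writes or maintenance commands are rejected.

```python
# Allow-list based on the notes above: SELECT plus metadata commands.
# USE only changes session context. Illustrative assumption, not exhaustive.
_ALLOWED_FIRST_KEYWORDS = {"SELECT", "SHOW", "DESCRIBE", "DESC", "USE"}


def _leading_keywords(sql: str, n: int = 2) -> list[str]:
    """First n whitespace-separated tokens, ignoring full-line -- comments."""
    lines = [ln for ln in sql.splitlines() if not ln.strip().startswith("--")]
    return " ".join(lines).upper().split()[:n]


def is_read_only_statement(sql: str) -> bool:
    """True if the statement looks like a read or metadata command.

    Writes (UPDATE, DELETE, INSERT, MERGE) and maintenance (OPTIMIZE, VACUUM)
    fall outside the allow-list and return False.
    """
    words = _leading_keywords(sql)
    if not words:
        return False
    if words[0] in {"DESCRIBE", "DESC"} and len(words) > 1 and words[1] == "HISTORY":
        return False  # needs Delta history, which foreign tables do not have
    return words[0] in _ALLOWED_FIRST_KEYWORDS
```

An allow-list is used rather than a block-list so that any statement type not explicitly known to be read-only is treated as unsupported by default.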
