In [0]:
%run ../00.set_variables

## 1.2 Unity Catalog


Unity Catalog provides a unified governance layer in Databricks to centrally manage data access, lineage, and security across all data assets — including tables, files, models, and notebooks — with fine-grained, role-based controls.

<div style="text-align: center;">
  <img src="../demo_setup/images/Unity_Catalog.png" width="800px"/> 
</div>

## 1.3 🔐 Now Let's Secure the Datasets with Unity Catalog

In complex environments, ensuring robust data governance and security across the entire data platform is critical. Traditional approaches — like using SQL GRANT statements on individual tables — fall short. Governance must extend beyond just tables to include files, models, dashboards, features, and queries.

To both minimise risk and enable data-driven innovation, the team must:

- 🔄 Unify all data assets — including tables, files, ML models, features, dashboards, and queries — under a single governance layer
- 👥 Onboard and collaborate across multiple teams, from operations to data science
- 🌐 Securely share selected data and insights with external partners or organisations, while maintaining fine-grained access controls

Unity Catalog enables this unified governance approach across the platform — critical in regulated and high-value environments.

<style>
.box{
  box-shadow: 20px -20px #CCC; height:300px; box-shadow:  0 0 10px  rgba(0,0,0,0.3); padding: 5px 10px 0px 10px;}
.badge {
  clear: left; float: left; height: 30px; width: 30px;  display: table-cell; vertical-align: middle; border-radius: 50%; background: #fcba33ff; text-align: center; color: white; margin-right: 10px}
.badge_b { 
  height: 35px}
</style>
<link href='https://fonts.googleapis.com/css?family=DM Sans' rel='stylesheet'>
<div style="padding: 20px; font-family: 'DM Sans'; color: #1b5162">
  <div style="width:200px; float: left; text-align: center">
    <div class="box" style="">
      <div style="font-size: 26px;">
        <strong>Team A</strong>
      </div>
      <div style="font-size: 13px">
        <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/da.png" style="" width="60px"> <br/>
        Data Analysts<br/>
        <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/ds.png" style="" width="60px"> <br/>
        Data Scientists<br/>
        <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/de.png" style="" width="60px"> <br/>
        Data Engineers
      </div>
    </div>
    <div class="box" style="height: 80px; margin: 20px 0px 50px 0px">
      <div style="font-size: 26px;">
        <strong>Team B</strong>
      </div>
      <div style="font-size: 13px">...</div>
    </div>
  </div>
  <div style="float: left; width: 400px; padding: 0px 20px 0px 20px">
    <div style="margin: 20px 0px 0px 20px">Permissions on queries, dashboards</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
    <div style="margin: 20px 0px 0px 20px">Permissions on tables, columns, rows</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
    <div style="margin: 20px 0px 0px 20px">Permissions on features, ML models, endpoints, notebooks…</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
    <div style="margin: 20px 0px 0px 20px">Permissions on files, jobs</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
  </div>
  
  <div class="box" style="width:550px; float: left">
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/gov.png" style="float: left; margin-right: 10px;" width="80px"> 
    <div style="float: left; font-size: 26px; margin-top: 0px; line-height: 17px;"><strong>Emily</strong> <br />Governance and Security</div>
    <div style="font-size: 18px; clear: left; padding-top: 10px">
      <ul style="line-height: 2px;">
        <li>Central catalog - all data assets</li>
        <li>Data exploration & discovery to unlock new use-cases</li>
        <li>Permissions cross-teams</li>
        <li>Reduce risk with audit logs</li>
        <li>Measure impact with lineage</li>
      </ul>
      + Share data with external organization (Delta Sharing)
    </div>
  </div>
</div>


### Exploring our Iron Ore Processing Catalog

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/uc-base-1.png" style="float: right" width="800px"/> 

Now that we’ve processed the data, let’s take a look at how it’s organised in Unity Catalog — the unified governance layer for all data assets in Databricks.

Unity Catalog follows a three-tiered structure:

- 🗂️ CATALOG – The top-level container, typically used to separate environments or domains (e.g., iron_ore_demo)
- 📁 SCHEMA (or DATABASE) – A logical grouping of related tables within a catalog (e.g., raw, silver, gold)
- 📊 TABLE – The actual datasets used in queries and pipelines (e.g., flotation_data, lab_results)

You can manage all of this directly with SQL — for example:

`CREATE CATALOG IF NOT EXISTS my_catalog ...`

This structure makes it easy to organise, secure, and discover datasets across the processing lifecycle.
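
As a rough sketch of the three-level namespace, the helpers below build the `CREATE` statements and the fully qualified table name for a catalog, schema, and table. The helper names (`quote_ident`, `namespace_ddl`, `full_table_name`) are hypothetical and not part of this notebook; the example names come from the list above.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote an identifier, escaping embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def namespace_ddl(catalog: str, schema: str) -> list[str]:
    """Return the DDL that creates a catalog and a schema inside it."""
    return [
        f"CREATE CATALOG IF NOT EXISTS {quote_ident(catalog)}",
        f"CREATE SCHEMA IF NOT EXISTS {quote_ident(catalog)}.{quote_ident(schema)}",
    ]


def full_table_name(catalog: str, schema: str, table: str) -> str:
    """Build the three-part name: catalog.schema.table."""
    return ".".join(quote_ident(part) for part in (catalog, schema, table))


# Example names for illustration only; replace with your own.
statements = namespace_ddl("iron_ore_demo", "gold")
table_name = full_table_name("iron_ore_demo", "gold", "flotation_data")

for statement in statements:
    print(statement)  # in a Databricks notebook: spark.sql(statement)
print(table_name)
```

Quoting each part separately keeps the three levels distinct even when a name contains unusual characters.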

In [0]:
spark.sql(f"USE CATALOG {catalog_name}")
spark.sql(f"USE SCHEMA {schema_name}")


### Let's review the tables we created under our schema


Unity Catalog provides a comprehensive Data Explorer, available from the left-hand menu.

It lists all your tables and lets you access and administer them.

Users with the appropriate permissions can also create additional tables in this schema.

#### Discoverability 

In addition, Unity Catalog provides explorability and discoverability. 

Anyone with access to the tables can search for them and analyze their main usage. <br>
You can use the Search menu (⌘ + P) to navigate your data assets (tables, notebooks, queries...)

<img src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/retail/resources/images/lakehouse-retail/lakehouse-retail-churn-data-explorer.gif" style="float: right" width="800px"/> 


In [0]:
display(spark.sql("SHOW TABLES"))

#### 🛡️ Data Classification, PII Scanning and Anomaly Detection
Unity Catalog supports automatic, wide-ranging PII scanning, using a combination of GenAI and ML models to identify and classify sensitive data and to detect anomalies in the data.

##### Data Classification Dashboard

![](https://docs.databricks.com/aws/en/assets/images/data-classification-dashboard-overview-be8575d1deec9cba00a5663139cbd016.png)




In [0]:
host_url = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiUrl().get()
#https://e2-demo-field-eng.cloud.databricks.com/explore/data/jack_demos?o=1444828305810485&activeTab=details
displayHTML(f"<b style='font-size:24px;'>Classification Settings URL:</b> <a href='{host_url}/explore/data/{catalog_name}?activeTab=details' target='_blank' style='font-size:24px;'>'{host_url}/explore/data/{catalog_name}?activeTab=details'</a>")



#### 🛡️ PII Masking, Row-Level Security & Column-Level Filtering with Unity Catalog

The `gold_iron_ore_prediction_dataset` table contains personally identifiable information (PII), such as operator names. In the cells below, we’ll demonstrate how to protect sensitive data using column- and row-level masking techniques.

In [0]:
display(spark.sql("SELECT * FROM gold_iron_ore_prediction_dataset"))

In [0]:
# Two working copies of the gold dataset: one for the time-travel demo, one for masking and row filtering
spark.sql("CREATE OR REPLACE TABLE gold_iop_features_version_demo AS SELECT * FROM gold_iron_ore_prediction_dataset")
spark.sql("CREATE OR REPLACE TABLE gold_iop_features_protected AS SELECT * FROM gold_iron_ore_prediction_dataset")

In [0]:
spark.sql("""
-- Members of really_important_group see the real value; all other users see a masked value.
CREATE OR REPLACE FUNCTION simple_mask(column_value STRING)
   RETURN IF(is_account_group_member('really_important_group'), column_value, '****')
   """)
   
spark.sql("""
-- Mask all PII information
ALTER TABLE gold_iop_features_protected ALTER COLUMN operator_name SET MASK simple_mask
""")

spark.sql("""
ALTER TABLE gold_iop_features_version_demo ALTER COLUMN operator_name SET MASK simple_mask
""")

spark.sql("""
-- Row filter function: keep only rows recorded after a given timestamp
CREATE OR REPLACE FUNCTION date_filter(date_param TIMESTAMP) 
RETURN 
  date_param > '2017-09-09T20:00:00.000+00:00'                  -- Filter rows for records with a certain date
""")

spark.sql("""
ALTER TABLE gold_iop_features_protected SET ROW FILTER date_filter ON (date)
""")

display(spark.sql("""
SELECT * FROM gold_iop_features_protected
"""))
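
If you need to undo the protections applied above, for example to reset the demo, the column mask and the row filter can be dropped again. The sketch below only builds the statements; `protection_teardown_sql` is a hypothetical helper, and it assumes the table and column names used in this notebook.

```python
def protection_teardown_sql(table: str, masked_columns: list[str]) -> list[str]:
    """Build statements that remove column masks and the row filter from a table."""
    statements = [
        f"ALTER TABLE {table} ALTER COLUMN {column} DROP MASK"
        for column in masked_columns
    ]
    statements.append(f"ALTER TABLE {table} DROP ROW FILTER")
    return statements


teardown = protection_teardown_sql("gold_iop_features_protected", ["operator_name"])
for statement in teardown:
    print(statement)  # in a Databricks notebook: spark.sql(statement)
```

Dropping the mask and the filter leaves the `simple_mask` and `date_filter` functions in place, so they can be re-applied later.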

#### 🕵️ Auditing Your Features with Time Travel
With our feature table now secured, we can take advantage of powerful capabilities provided by Delta Tables and Unity Catalog, such as table versioning and time travel.

In this step, we’ll create a new table to demonstrate how you can:
- 🔍 Audit and review changes to a feature set over time
- ♻️ Restore a table to a previous version if needed — a critical feature for debugging, governance, and compliance

These capabilities make it easy to track feature evolution, support reproducibility, and maintain trust in your machine learning pipelines.

In [0]:
spark.sql("""
          CREATE OR REPLACE TABLE gold_iop_features_version_demo AS SELECT * FROM gold_iron_ore_prediction_dataset
          """)

##### Now let's delete some data by "accident"

Don't worry, we can recover it with the power of Time Travel and table versions!

In [0]:
spark.sql("""
    --Uh oh... I've deleted everything from my table!
    DELETE FROM gold_iop_features_version_demo
""")

display(spark.sql("""
    --Let's take a look at what happened to the table - notice the DELETE operation
    DESCRIBE HISTORY gold_iop_features_version_demo
    """))

##### Restore to a Previous Version of the Table

Luckily, we can undo the mistake by restoring the table to the version before the DELETE operation.

We can also restore the table to a point in time:

`RESTORE TABLE gold_iop_features_version_demo TO TIMESTAMP AS OF '2025-05-01T06:00:00';`
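
Rather than hard-coding a version number, the version to restore can be derived from the table history. The sketch below is one possible approach, not the notebook's own method: `version_before_last` is a hypothetical helper, and the sample rows are hand-written stand-ins for the `version` and `operation` columns returned by `DESCRIBE HISTORY`.

```python
def version_before_last(history: list[dict], operation: str = "DELETE") -> int:
    """Return the table version just before the most recent `operation`.

    `history` is a list of rows with at least `version` and `operation` keys,
    in any order.
    """
    matching = [row["version"] for row in history if row["operation"] == operation]
    if not matching:
        raise ValueError(f"no {operation} operation found in history")
    target = max(matching) - 1
    if target < 0:
        raise ValueError(f"{operation} happened at version 0; nothing to restore to")
    return target


# Hand-written sample rows for illustration, not real output.
sample_history = [
    {"version": 2, "operation": "DELETE"},
    {"version": 1, "operation": "CREATE OR REPLACE TABLE AS SELECT"},
    {"version": 0, "operation": "CREATE OR REPLACE TABLE AS SELECT"},
]

restore_version = version_before_last(sample_history)
restore_sql = (
    f"RESTORE TABLE gold_iop_features_version_demo TO VERSION AS OF {restore_version}"
)
print(restore_sql)
```

In a notebook, the history rows could be collected with `[row.asDict() for row in spark.sql("DESCRIBE HISTORY gold_iop_features_version_demo").collect()]` and the resulting statement passed to `spark.sql`.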

In [0]:
spark.sql("""
  --Restoring is as easy as going back to a previous version
  RESTORE TABLE gold_iop_features_version_demo TO VERSION AS OF 0
""")

display(spark.sql("""
  --Huzzah! It is restored!
  SELECT * FROM gold_iop_features_version_demo
"""))

#### Going further with Data Governance & Security

By bringing all your data assets together, Unity Catalog lets you build complete yet simple governance that helps you scale your teams.

Unity Catalog can be used for anything from simple GRANT statements to building a complete data mesh organization.
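
As a minimal sketch of the "simple GRANT" end of that spectrum, the helper below builds the three grants a principal typically needs to query a single table: access to the catalog, access to the schema, and `SELECT` on the table. The helper name `read_only_grants`, the group name `analysts`, and the catalog and schema names are assumptions for illustration.

```python
def read_only_grants(catalog: str, schema: str, table: str, principal: str) -> list[str]:
    """Build the grants a principal needs to read one table."""
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{principal}`",
    ]


# `analysts` is a hypothetical group; the object names are examples.
grants = read_only_grants(
    "iron_ore_demo", "gold", "gold_iop_features_protected", "analysts"
)
for statement in grants:
    print(statement)  # in a Databricks notebook: spark.sql(statement)
```

Granting on the protected table means the column mask and row filter defined earlier still apply to whatever the group reads.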

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/lineage/lineage-table.gif" style="float: right; margin-left: 10px"/>

##### Fine-grained ACL: row/column level access

Need more advanced control? You can choose to dynamically change your table output based on the user's permissions: `dbdemos.install('uc-01-acl')`

##### Secure external location (S3/ADLS/GCS)

Unity Catalog lets you secure your managed tables as well as your external locations:  `dbdemos.install('uc-02-external-location')`

##### Lineage 

UC automatically captures table dependencies and lets you track how your data is used, down to the column level: `dbdemos.install('uc-03-data-lineage')`

This lets you analyze downstream impact or monitor sensitive information across the entire organization (GDPR).


##### Audit log

UC captures all events. Need to know who is accessing which data? Query your audit log:  `dbdemos.install('uc-04-audit-log')`
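
As a hedged sketch of what such a query might look like, the helper below builds a statement that counts Unity Catalog events per user and action over the last few days. It assumes that system tables are enabled in your account and that the `system.access.audit` table exposes the `user_identity.email`, `service_name`, `action_name`, and `event_date` columns; check the schema in your workspace before relying on it. The helper name `audit_activity_sql` is hypothetical.

```python
def audit_activity_sql(days: int = 7, limit: int = 20) -> str:
    """Build a query counting Unity Catalog audit events per user and action."""
    if days <= 0 or limit <= 0:
        raise ValueError("days and limit must be positive")
    return f"""
SELECT user_identity.email AS user_email,
       action_name,
       COUNT(*) AS events
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND event_date >= date_sub(current_date(), {days})
GROUP BY user_identity.email, action_name
ORDER BY events DESC
LIMIT {limit}
""".strip()


audit_sql = audit_activity_sql(days=7, limit=20)
print(audit_sql)  # in a Databricks notebook: display(spark.sql(audit_sql))
```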



##### Sharing data with external organizations

Sharing your data with consumers outside your Databricks workspace is simple with Delta Sharing, and doesn't require your data consumers to use Databricks:  `dbdemos.install('delta-sharing-airlines')`