# Ensuring governance and security for our C360 lakehouse

Data governance and security are hard to get right across a complete data platform. SQL GRANT statements on tables are not enough: security must be enforced across many kinds of data assets (dashboards, models, files, etc.).

To reduce risk and drive innovation, Emily's team needs to:

- Unify all data assets (Tables, Files, ML models, Features, Dashboards, Queries)
- Onboard data with multiple teams
- Share & monetize assets with external organizations

<style>
.box{
  box-shadow: 20px -20px #CCC; height:300px; box-shadow:  0 0 10px  rgba(0,0,0,0.3); padding: 5px 10px 0px 10px;}
.badge {
  clear: left; float: left; height: 30px; width: 30px;  display: table-cell; vertical-align: middle; border-radius: 50%; background: #fcba33ff; text-align: center; color: white; margin-right: 10px}
.badge_b { 
  height: 35px}
</style>
<link href='https://fonts.googleapis.com/css?family=DM Sans' rel='stylesheet'>
<div style="padding: 20px; font-family: 'DM Sans'; color: #1b5162">
  <div style="width:200px; float: left; text-align: center">
    <div class="box" style="">
      <div style="font-size: 26px;">
        <strong>Team A</strong>
      </div>
      <div style="font-size: 13px">
        <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/alice.png" style="" width="60px"> <br/>
        Data Analysts<br/>
        <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/marc.png" style="" width="60px"> <br/>
        Data Scientists<br/>
        <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/john.png" style="" width="60px"> <br/>
        Data Engineers
      </div>
    </div>
    <div class="box" style="height: 80px; margin: 20px 0px 50px 0px">
      <div style="font-size: 26px;">
        <strong>Team B</strong>
      </div>
      <div style="font-size: 13px">...</div>
    </div>
  </div>
  <div style="float: left; width: 400px; padding: 0px 20px 0px 20px">
    <div style="margin: 20px 0px 0px 20px">Permissions on queries, dashboards</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
    <div style="margin: 20px 0px 0px 20px">Permissions on tables, columns, rows</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
    <div style="margin: 20px 0px 0px 20px">Permissions on features, ML models, endpoints, notebooks…</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
    <div style="margin: 20px 0px 0px 20px">Permissions on files, jobs</div>
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/horizontal-arrow-dash.png" style="width: 400px">
  </div>
  
  <div class="box" style="width:550px; float: left">
    <img src="https://github.com/databricks-demos/dbdemos-resources/raw/main/images/emily.png" style="float: left; margin-right: 10px;" width="80px"> 
    <div style="float: left; font-size: 26px; margin-top: 0px; line-height: 17px;"><strong>Emily</strong> <br />Governance and Security</div>
    <div style="font-size: 17px; clear: left; padding-top: 10px">
      <ul style="line-height: 2px;">
        <li>Central catalog - all data assets</li>
        <li>Data exploration & discovery to unlock new use-cases</li>
        <li>Cross-team permissions</li>
        <li>Reduce risk with audit logs</li>
        <li>Measure impact with lineage</li>
      </ul>
      + Monetize & share data with external organizations (Delta Sharing)
    </div>
  </div>
</div>


# Scalable Data Governance with Unity Catalog  

<img src="https://github.com/databricks-demos/dbdemos-resources/blob/main/images/cross_demo_assets/Lakehouse_Demo_Team_architecture_2.png?raw=true" style="float: right" width="500px">


Managing secure, scalable data access is critical. With **Unity Catalog**, the **Lakehouse** enables seamless governance while ensuring teams can collaborate efficiently.  

### The Challenge  
Our data, stored as **Delta Tables**, needs to be secured while remaining accessible to different teams:  
- **Data Engineers** manage and update core datasets.  
- **Data Scientists** read final tables and refine feature sets.  
- **Analysts** explore and transform data within governed schemas.  
- **Access is dynamically masked/anonymized** based on user roles.  

### The Solution: Unity Catalog  
By centralizing access control, **Unity Catalog** enables:  
✅ Fine-grained **ACLs**  
✅ **Audit logs** for compliance  
✅ **Data lineage** for transparency  
✅ **Easy exploration & discovery**  
✅ **Seamless data sharing** across teams and organizations (**Delta Sharing**)  

With **Unity Catalog**, teams can confidently manage **governance, security, and collaboration** across workspaces. 🚀  

In [0]:
%run ../_resources/00-setup $reset_all_data=false

## Exploring our Customer360 database

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/uc-base-1.png" style="float: right" width="800px"/> 

Let's review the data created.

Unity Catalog works with 3 layers:

* CATALOG
* SCHEMA (or DATABASE)
* TABLE

All Unity Catalog objects can be managed with SQL (`CREATE CATALOG IF NOT EXISTS my_catalog` ...)

To query a table, you can specify its full path: `SELECT * FROM <CATALOG>.<SCHEMA>.<TABLE>`
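The three-level path can be illustrated with a small helper that assembles a fully qualified table name. This is a minimal sketch in plain Python: `quote_identifier`, `qualified_name`, and the catalog and schema values are hypothetical names chosen for illustration, and the quoting assumes the usual rule that a backtick inside an identifier is escaped by doubling it.

```python
def quote_identifier(name: str) -> str:
    """Wrap one identifier in backticks, doubling any backtick it contains."""
    return "`" + name.replace("`", "``") + "`"


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build the CATALOG.SCHEMA.TABLE path for a table."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


# Hypothetical catalog and schema names, for illustration only.
full_name = qualified_name("my_catalog", "my_schema", "churn_users")
print(f"SELECT * FROM {full_name}")
```

Quoting each level separately keeps the path valid even when a name contains characters such as dashes or dots.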

In [0]:
-- the catalog has been created for your user and is defined as default. 
-- make sure you run the 00-setup cell above to init the catalog to your user. 
-- CREATE CATALOG IF NOT EXISTS dbdemos;
-- USE CATALOG dbdemos;
SELECT CURRENT_CATALOG();


## Let's review the tables we created under our schema

<img src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/retail/resources/images/lakehouse-retail/lakehouse-retail-churn-data-explorer.gif" style="float: right" width="800px"/> 

Unity Catalog provides a comprehensive Data Explorer that you can access on the left menu.

There you'll find all your tables, and you can use it to access and administer them.

Users with the appropriate privileges can also create additional tables in this schema.

### Discoverability 

Unity Catalog also makes your data assets easy to explore and discover.

Anyone with access to the tables can search for them and review their main usage. <br>
You can use the Search menu (⌘ + P) to navigate your data assets (tables, notebooks, queries...).

In [0]:
SHOW TABLES

In [0]:
-- Let's grant our analysts SELECT permission on each table:
-- Note: make sure the analysts and dataengineers groups exist first (create them in the account console).
GRANT SELECT ON TABLE churn_users TO `analysts`;
GRANT SELECT ON TABLE churn_app_events TO `analysts`;
GRANT SELECT ON TABLE churn_orders TO `analysts`;

-- We'll also grant our data engineers MODIFY, at the schema level.
-- Note: adjust the catalog and schema name below if yours differ.
GRANT SELECT, MODIFY ON SCHEMA demos.dbdemos_retail_c360 TO `dataengineers`;
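
When the same privilege has to be granted on many tables, the statements can be generated rather than typed by hand. The following is a minimal sketch in plain Python, not part of the demo: `grant_statements` is a hypothetical helper, and it only builds the SQL text for the table and group names used in the cell above.

```python
from typing import Iterable, List


def grant_statements(privileges: Iterable[str], tables: Iterable[str], group: str) -> List[str]:
    """Return one GRANT statement per table for the given group."""
    privilege_list = ", ".join(privileges)
    return [
        f"GRANT {privilege_list} ON TABLE {table} TO `{group}`"
        for table in tables
    ]


churn_tables = ["churn_users", "churn_app_events", "churn_orders"]
statements = grant_statements(["SELECT"], churn_tables, "analysts")
for statement in statements:
    print(statement)
```

Each string holds a single statement, so in a Databricks notebook it could be passed one at a time to `spark.sql`.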


## PII data masking, row and column-level filtering

In the cells below, we demonstrate how to protect sensitive data with column masks and row filters.

In [0]:
-- IF EXISTS avoids an error on the first run, when the table has not been created yet
DROP TABLE IF EXISTS churn_users_protected;
CREATE OR REPLACE TABLE churn_users_protected AS SELECT * FROM churn_users;

In [0]:
-- Members of the retail_admin group see the real values; all other users see masked values.
CREATE OR REPLACE FUNCTION simple_mask(column_value STRING)
   RETURN IF(is_account_group_member('retail_admin'), column_value, '****');
   
-- Mask all PII information
ALTER TABLE churn_users_protected ALTER COLUMN email SET MASK simple_mask;
ALTER TABLE churn_users_protected ALTER COLUMN firstname SET MASK simple_mask;
ALTER TABLE churn_users_protected ALTER COLUMN lastname SET MASK simple_mask;
ALTER TABLE churn_users_protected ALTER COLUMN address SET MASK simple_mask;

-- Apply row filter based on the country
CREATE OR REPLACE FUNCTION country_filter(country_param STRING) 
RETURN 
  is_account_group_member('retail_admin') OR  -- retail_admin can access all countries (you could drive this from another table)
  country_param LIKE 'US%';                   -- everyone else can only access rows whose country starts with US

ALTER TABLE churn_users_protected SET ROW FILTER country_filter ON (country);

-- ALTER FUNCTION simple_mask OWNER TO `account users`; -- gives all users access to the function for the demo - don't do this in production
-- ALTER FUNCTION country_filter OWNER TO `account users`; -- gives all users access to the function for the demo - don't do this in production

SELECT * FROM churn_users_protected
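
To make the combined effect of the mask and the row filter concrete, the logic can be modeled in plain Python. This is only a sketch of the semantics, not how Unity Catalog evaluates them: the `is_admin` flag stands in for `is_account_group_member('retail_admin')`, `visible_rows` is a hypothetical helper, and the sample rows are invented for illustration.

```python
from typing import Dict, List

# Columns that carry a mask in the cell above.
PII_COLUMNS = ("email", "firstname", "lastname", "address")


def simple_mask(column_value: str, is_admin: bool) -> str:
    """Model of the SQL simple_mask: admins see the value, others see ****."""
    return column_value if is_admin else "****"


def country_filter(country: str, is_admin: bool) -> bool:
    """Model of the SQL country_filter: the US% pattern is a prefix match."""
    return is_admin or country.startswith("US")


def visible_rows(rows: List[Dict[str, str]], is_admin: bool) -> List[Dict[str, str]]:
    """Drop rows rejected by the filter, then mask the PII columns of the rest."""
    result = []
    for row in rows:
        if not country_filter(row["country"], is_admin):
            continue
        result.append({
            column: simple_mask(value, is_admin) if column in PII_COLUMNS else value
            for column, value in row.items()
        })
    return result


# Invented sample rows (assumption: not taken from the demo dataset).
sample = [
    {"email": "a@example.com", "firstname": "Ann", "lastname": "Lee", "address": "1 Main St", "country": "USA"},
    {"email": "b@example.com", "firstname": "Bob", "lastname": "Roy", "address": "2 Rue X", "country": "FR"},
]

print(visible_rows(sample, is_admin=False))  # one row, PII columns masked
print(visible_rows(sample, is_admin=True))   # both rows, unmasked
```

In this model a non-admin user gets only the row whose country starts with `US`, with every PII column replaced by `****`, while an admin gets both rows unchanged.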


## Going further with Data governance & security

By bringing all your data assets together, Unity Catalog lets you build complete yet simple governance that helps you scale your teams.

Unity Catalog can support anything from a simple GRANT to a complete data mesh organization.

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/lineage/lineage-table.gif" style="float: right; margin-left: 10px"/>

### Fine-grained ACL: row/column level access

Need more advanced control? You can choose to dynamically change your table output based on the user's permissions: `dbdemos.install('uc-01-acl')`

### Secure external location (S3/ADLS/GCS)

Unity Catalog lets you secure not only your managed tables but also your external locations:  `dbdemos.install('uc-02-external-location')`

### Lineage 

UC automatically captures table dependencies and lets you track how your data is used, down to the column level: `dbdemos.install('uc-03-data-lineage')`

This lets you analyze downstream impact, or monitor sensitive information across the entire organization (GDPR).


### Audit log

UC captures all events. Need to know who is accessing which data? Query your audit log:  `dbdemos.install('uc-04-audit-log')`


### Upgrading to UC

Already using Databricks without UC? Upgrading your tables to benefit from Unity Catalog is simple:  `dbdemos.install('uc-05-upgrade')`

### Sharing data with external organizations

Sharing your data with people outside your Databricks users is simple with Delta Sharing, and doesn't require your data consumers to use Databricks:  `dbdemos.install('delta-sharing-airlines')`

# Next: Start building a semantic layer with Databricks SQL

Now that these tables are available in our Lakehouse and secured, let's see how we can define our business semantics using metric views.

Jump to the [UC / Metric views notebook]($02.2-UC-metric-views) or [Go back to the introduction]($../00-churn-introduction-lakehouse)