# Advanced Unity Catalog Data Governance & Security Lab

**Scenario:**  
You’re building a production-grade, multi-environment analytics platform for a financial services organization with strict security, audit, and data sharing needs.

**Key Focus:**
- Environment isolation
- Cross-catalog sharing
- Fine-grained and dynamic access (row/column masking)
- Automated retention
- Auditing access
- External data integration
- Marketplace and lineage

---

## 1. Multi-Environment Catalog Structure

Establish dev/test/prod catalogs for isolation.

**Exam Tip:**  
Know why isolating environments by catalog matters for compliance and DevOps.

In [0]:
for env in ['dev', 'test', 'prod']:
    spark.sql(f"""
    CREATE CATALOG IF NOT EXISTS fin_{env}
    MANAGED LOCATION 'abfss://data@deassociateadls.dfs.core.windows.net/newcatalog/{env}/finance'
    COMMENT 'Financial {env} environment';
    """)

In [0]:
for env in ['dev', 'test', 'prod']:
    spark.sql(f"""
    CREATE SCHEMA IF NOT EXISTS fin_{env}.trans COMMENT 'Transactional data'
    """)
    spark.sql(f"""
    CREATE SCHEMA IF NOT EXISTS fin_{env}.pii COMMENT 'PII data'
    """)

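Isolation only pays off if each catalog is granted to a different audience. The sketch below builds one `GRANT` statement per environment. The group names (`fin_dev_engineers`, `fin_test_engineers`, `fin_prod_readers`) and the privilege split are assumptions for illustration and are not part of this lab. The helper only returns strings, so the statements can be reviewed before anything runs.

```python
# Hypothetical group and privilege mapping per environment (assumed, not from the lab).
ENV_GRANTS = {
    "dev": ("fin_dev_engineers", ["USE CATALOG", "USE SCHEMA", "SELECT", "MODIFY", "CREATE TABLE"]),
    "test": ("fin_test_engineers", ["USE CATALOG", "USE SCHEMA", "SELECT", "MODIFY"]),
    "prod": ("fin_prod_readers", ["USE CATALOG", "USE SCHEMA", "SELECT"]),
}


def build_catalog_grants(env_grants):
    """Return one GRANT statement per environment catalog (fin_dev, fin_test, fin_prod)."""
    statements = []
    for env, (group, privileges) in env_grants.items():
        privilege_list = ", ".join(privileges)
        statements.append(f"GRANT {privilege_list} ON CATALOG fin_{env} TO `{group}`")
    return statements


grant_statements = build_catalog_grants(ENV_GRANTS)
for stmt in grant_statements:
    print(stmt)
    # In a Databricks notebook you would run: spark.sql(stmt)
```

Keeping prod read-only for most groups while dev stays writable is one common split; adjust the mapping to your own group model.
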
## 2. Simulate Sensitive Table Creation & Ingestion

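Create one PII table and one non-PII table in the prod catalog. These cells only define the tables; load your own sample rows before testing the views in later sections.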

In [0]:
# PII Table
spark.sql('''
CREATE TABLE IF NOT EXISTS fin_prod.pii.customers (
    customer_id STRING,
    customer_name STRING,
    ssn STRING,
    email STRING
) USING DELTA
''')

In [0]:
# Non-PII Table
spark.sql('''
CREATE TABLE IF NOT EXISTS fin_prod.trans.transactions (
    transaction_id STRING,
    customer_id STRING,
    amount DOUBLE,
    timestamp TIMESTAMP
) USING DELTA
''')

## 3. Column Masking for PII Data

Mask SSN for users not in the `pii_admins` group.

**Advanced Exam Concept:**  
Dynamic views for column masking.

In [0]:
spark.sql('''
CREATE OR REPLACE VIEW fin_prod.pii.customers_masked AS
SELECT
  customer_id,
  customer_name,
  CASE WHEN is_member('pii_admins') THEN ssn ELSE 'XXX-XX-XXXX' END as ssn,
  email
FROM fin_prod.pii.customers
''')

In [0]:
spark.sql('GRANT SELECT ON VIEW fin_prod.pii.customers_masked TO `analysts`')
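To reason about what each audience sees without a cluster, the view's `CASE` expression can be modeled locally. This is a plain-Python sketch of the masking rule, not Databricks code: `caller_groups` stands in for what `is_member()` would check, and the customer row is synthetic.

```python
MASKED_SSN = "XXX-XX-XXXX"


def mask_ssn(ssn, caller_groups):
    """Mirror the view's CASE expression: only pii_admins see the real value."""
    return ssn if "pii_admins" in caller_groups else MASKED_SSN


def view_rows(rows, caller_groups):
    """Apply the mask to every row, leaving the other columns untouched."""
    return [{**row, "ssn": mask_ssn(row["ssn"], caller_groups)} for row in rows]


# Synthetic row, made up for illustration only.
customers = [
    {"customer_id": "c-001", "customer_name": "Test Customer",
     "ssn": "000-00-0000", "email": "test@example.com"},
]

analyst_view = view_rows(customers, caller_groups={"analysts"})
admin_view = view_rows(customers, caller_groups={"analysts", "pii_admins"})
print(analyst_view[0]["ssn"])  # masked placeholder
print(admin_view[0]["ssn"])    # original value
```

Note that `email` passes through unmasked for every caller, exactly as in the view definition; if email counts as PII in your policy, it needs its own `CASE` branch.
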

## 4. Row-Level Security Based on Department/Region

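Restrict each user to their own transactions with a dynamic view that compares a column against `current_user()`. This simplified version matches `customer_id` directly against the caller; filtering by branch or region would additionally need a mapping table from user to region, which this lab does not create.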

In [0]:
# Uses fin_prod.pii.customers (created in section 2) as the lookup table.
# Create a dynamic view that keeps only transactions whose customer_id equals the caller.

# Check if the table exists
spark.sql("SHOW TABLES IN fin_prod.pii").show()

# Assuming the table exists, create the view
spark.sql('''
CREATE OR REPLACE VIEW fin_prod.trans.secure_transactions AS
SELECT t.*
FROM fin_prod.trans.transactions t
JOIN fin_prod.pii.customers u ON u.customer_id = t.customer_id
WHERE u.customer_id = current_user()
''')

In [0]:
spark.sql('GRANT SELECT ON VIEW fin_prod.trans.secure_transactions TO `regional_analysts`')
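The effect of the join plus the `current_user()` filter can be checked locally with a small model. This sketch assumes `customer_id` is unique in the customers table and that customer IDs hold the caller's login name (an email address here); all rows are synthetic.

```python
def secure_transactions(transactions, customers, current_user):
    """Local model of the view: inner join on customer_id, then filter by caller."""
    known_ids = {c["customer_id"] for c in customers}
    return [
        t for t in transactions
        if t["customer_id"] in known_ids and t["customer_id"] == current_user
    ]


# Synthetic data, made up for illustration only.
customers = [
    {"customer_id": "alice@example.com"},
    {"customer_id": "bob@example.com"},
]
transactions = [
    {"transaction_id": "t1", "customer_id": "alice@example.com", "amount": 10.0},
    {"transaction_id": "t2", "customer_id": "bob@example.com", "amount": 25.5},
    {"transaction_id": "t3", "customer_id": "alice@example.com", "amount": 7.25},
    # No matching customer row, so the inner join drops this one for everybody.
    {"transaction_id": "t4", "customer_id": "carol@example.com", "amount": 3.0},
]

alice_rows = secure_transactions(transactions, customers, "alice@example.com")
carol_rows = secure_transactions(transactions, customers, "carol@example.com")
print([t["transaction_id"] for t in alice_rows])
print(carol_rows)
```

The `carol@example.com` case shows a side effect of the inner join: a user with transactions but no customer row sees nothing at all.
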

## 5. Cross-Catalog Data Sharing: Reporting

Expose production transaction data to a separate reporting catalog (read-only view, no direct table access).

In [0]:
%sql
CREATE CATALOG IF NOT EXISTS reporting
MANAGED LOCATION 'abfss://data@deassociateadls.dfs.core.windows.net/uc/reporting';
CREATE SCHEMA IF NOT EXISTS reporting.finance;
CREATE OR REPLACE VIEW reporting.finance.prod_transactions AS
SELECT * FROM fin_prod.trans.transactions;

In [0]:
spark.sql('GRANT SELECT ON VIEW reporting.finance.prod_transactions TO `reporting_team`')

## 6. Automated Retention Policy Check

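Set `delta.deletedFileRetentionDuration` to 90 days on the prod transactions table, then verify it. This property controls how long data files that the table no longer references are kept before `VACUUM` may delete them; it does not remove rows older than 90 days.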

In [0]:
spark.sql('''
ALTER TABLE fin_prod.trans.transactions SET TBLPROPERTIES (
  "delta.deletedFileRetentionDuration" = "interval 90 days"
)
''')

In [0]:
# Check the retention property
tbl = spark.sql('DESCRIBE DETAIL fin_prod.trans.transactions').toPandas()
tbl[['name', 'properties']]
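Eyeballing the `properties` column works for one table, but an automated check needs to parse the interval. The sketch below assumes the property value has the form `interval N days`, as set above, and that the properties arrive as a dict; the function names are illustrative.

```python
import re


def retention_days(properties):
    """Return the configured retention in days, or None if unset or unparseable."""
    raw = properties.get("delta.deletedFileRetentionDuration")
    if raw is None:
        return None
    match = re.fullmatch(r"\s*interval\s+(\d+)\s+days?\s*", raw, flags=re.IGNORECASE)
    return int(match.group(1)) if match else None


def meets_retention(properties, required_days=90):
    """True only when a retention is set and is at least required_days."""
    days = retention_days(properties)
    return days is not None and days >= required_days


# Stand-in for the properties map returned by DESCRIBE DETAIL.
sample_properties = {"delta.deletedFileRetentionDuration": "interval 90 days"}
print(retention_days(sample_properties))
print(meets_retention(sample_properties))
```

In the notebook, something like `dict(tbl.loc[0, 'properties'])` should yield the input for these functions, though the exact shape of a map column after `toPandas()` can vary with the Spark and Arrow configuration.
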

## 7. Advanced Audit Query

Show all access events for a specific user/group (requires connection to cloud logs or audit log table, simulated here).

In [0]:
# This is a pattern; replace with your audit log query path if available
# Simulated: Query audit logs for all SELECTs by 'analysts'
try:
    logs = spark.read.json('/path/to/audit_logs')
    logs.filter((logs.actionName == 'select') & (logs.principalGroupName == 'analysts')).display()
except Exception as e:
    # Report the actual failure instead of assuming the logs are simply absent
    print(f'Could not query audit logs in this environment: {e}')
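Without access to real logs, the same predicate can be tried on a few in-memory records. The records below are synthetic. The field names `actionName` and `principalGroupName` come from the simulated query in this lab, the `table` field is hypothetical, and real audit log schemas may differ.

```python
def select_events_for_group(events, group):
    """Keep events whose action is 'select' and whose group matches."""
    return [
        e for e in events
        if e.get("actionName") == "select" and e.get("principalGroupName") == group
    ]


# Synthetic audit records, made up for illustration only.
sample_events = [
    {"actionName": "select", "principalGroupName": "analysts",
     "table": "fin_prod.pii.customers_masked"},
    {"actionName": "select", "principalGroupName": "reporting_team",
     "table": "reporting.finance.prod_transactions"},
    {"actionName": "grant", "principalGroupName": "analysts",
     "table": "fin_prod.pii.customers_masked"},
]

analyst_selects = select_events_for_group(sample_events, "analysts")
print(len(analyst_selects))
```
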

In [0]:
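# Note: fin_prod.trans.marketplace_enriched is not created anywhere in this lab;
# this grant fails unless that view already exists in your workspace.
spark.sql('GRANT SELECT ON VIEW fin_prod.trans.marketplace_enriched TO `risk_team`')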

## 8. External Location & Secure Data Sharing

Register an external location for data exchange and grant access to it only to a trusted partner group.

In [0]:
%sql
CREATE EXTERNAL LOCATION IF NOT EXISTS partner_share_new
URL 'abfss://data@deassociateadls.dfs.core.windows.net/uc/share_new'
WITH (STORAGE CREDENTIAL adls_azuremanagedidentity_1748360566649)

In [0]:
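# The location created above is partner_share_new; external locations take file-level privileges such as READ FILES
spark.sql('GRANT READ FILES ON EXTERNAL LOCATION partner_share_new TO `trusted_partners`')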
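Grants on an external location use file-level privileges such as `READ FILES` rather than a generic usage privilege. The sketch below builds such a grant and rejects anything outside a small allow-list. The allow-list is a deliberately incomplete subset chosen for this example, and the helper name is illustrative.

```python
# Subset of privileges assumed grantable on an external location (not exhaustive).
EXTERNAL_LOCATION_PRIVILEGES = {"READ FILES", "WRITE FILES", "CREATE EXTERNAL TABLE"}


def build_location_grant(location, principal, privileges):
    """Return a GRANT statement, refusing privileges outside the allow-list."""
    requested = [p.upper() for p in privileges]
    unknown = [p for p in requested if p not in EXTERNAL_LOCATION_PRIVILEGES]
    if unknown:
        raise ValueError(f"Not an external-location privilege in this sketch: {unknown}")
    return f"GRANT {', '.join(requested)} ON EXTERNAL LOCATION `{location}` TO `{principal}`"


read_only_grant = build_location_grant("partner_share_new", "trusted_partners", ["READ FILES"])
print(read_only_grant)
# In a Databricks notebook you would run: spark.sql(read_only_grant)
```

Granting only `READ FILES` keeps the partner group from writing into the shared path; add `WRITE FILES` only if the exchange is meant to be two-way.
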

---
## End of Advanced Scenario

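- Validate the masking and row-filter views by changing group membership (for example, adding or removing a user from `pii_admins`) and re-running the queries.
- Use the Unity Catalog UI for lineage visualization.
- Review logs for security incidents.
- Clean up the catalogs, views, and external location created in this lab when you are done.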

---
**Exam Practice:**
- Identify when and how to use dynamic views, external locations, and cross-catalog sharing.
- Be able to explain the business and compliance reasons for each setup.
