# 🗄️ Unity Catalog Demo: F1 Data Governance & Lineage
*Complete Unity Catalog features demonstration in under 20 cells*

## 🎯 What You'll Learn
We'll create and evolve a comprehensive F1 driver analytics table that demonstrates:

✅ **Data Lineage Tracking** - Visualize Bronze → Silver → Gold transformations  
✅ **Table History & Versioning** - Track changes with Delta Lake operations (INSERT, UPDATE, DELETE, OPTIMIZE)   
✅ **Governance Features** - Tags, comments, and metadata management  
✅ **Impact Analysis** - Understand downstream dependencies  
✅ **3-Level Namespace** - Catalog.schema.table organization

## 📊 Our F1 Medallion Architecture

Let's go to Unity Catalog and take a look at the Lineage tab for `main.default.f1_bronze_race_results`. You should see something similar to the image below. Use the + signs to expand the lineage if you don't see all the tables. If you double-click a column, you can view column-level lineage as well.

![](./Images/Lineage.png "Lineage.png")

In [0]:
%sql
-- Verify our F1 medallion architecture tables
SHOW TABLES IN main.default LIKE 'f1_*';

## 🏆 Creating Our Unity Catalog Demo Table

We'll create `f1_driver_analytics_demo` - a comprehensive table that demonstrates:

🔗 **Rich Lineage**: Combines data from multiple gold tables  
📊 **Business Logic**: Advanced analytics and classifications  
🗓️ **Table History**: We'll perform various operations to build history  
🏷️ **Governance**: Tags, comments, and metadata

This table will be our **main focus** for exploring Unity Catalog features in the UI.

In [0]:
%sql
-- Create our main Unity Catalog demo table with rich lineage using SQL

CREATE OR REPLACE TABLE main.default.f1_driver_analytics_demo AS
SELECT
  d.driver AS driver_name,
  d.team AS team_name,
  d.races_entered,
  d.total_points AS driver_points,
  d.wins AS driver_wins,
  d.podiums AS driver_podiums,
  COALESCE(t.total_points, 0) AS team_total_points,
  COALESCE(t.wins, 0) AS team_wins,
  CASE
    WHEN d.wins >= 5 THEN 'Elite'
    WHEN d.wins >= 2 THEN 'Winner'
    WHEN d.podiums >= 5 THEN 'Podium_Regular'
    WHEN d.points_per_race >= 10 THEN 'Strong'
    ELSE 'Developing'
  END AS performance_tier,
  ROUND((CAST(d.wins AS DOUBLE) / d.races_entered) * 100, 2) AS win_rate_pct,
  ROUND((CAST(d.podiums AS DOUBLE) / d.races_entered) * 100, 2) AS podium_rate_pct,
  ROUND(d.points_per_race, 2) AS points_per_race,
  CASE
    WHEN COALESCE(t.wins, 0) >= 8 THEN 'Championship_Team'
    WHEN COALESCE(t.wins, 0) >= 3 THEN 'Winning_Team'
    ELSE 'Developing_Team'
  END AS team_tier,
  current_timestamp() AS created_at,
  'initial_load' AS load_type
FROM main.default.f1_gold_driver_championship d
LEFT JOIN main.default.f1_gold_team_championship t
  ON d.team = t.team;

In [0]:
%sql
-- Preview the analytics table
SELECT driver_name, team_name, performance_tier, win_rate_pct, team_tier
FROM main.default.f1_driver_analytics_demo
LIMIT 8;

In [0]:
%sql
-- Add rich metadata and governance to our demo table
-- This showcases Unity Catalog's governance capabilities

ALTER TABLE main.default.f1_driver_analytics_demo 
SET TBLPROPERTIES (
  'comment' = 'Comprehensive F1 driver and team analytics for Unity Catalog lineage demonstration',
  'data_classification' = 'public',
  'business_domain' = 'sports_analytics',
  'data_source' = 'gold_driver_championship,gold_team_championship',
  'owner_team' = 'data_analytics',
  'business_purpose' = 'driver_performance_analysis_and_team_insights',
  'update_frequency' = 'after_each_race_weekend',
  'data_quality_level' = 'gold',
  'contains_pii' = 'false',
  'retention_period' = '7_years',
  'created_by' = 'unity_catalog_demo',
  'version' = '1.0'
);

-- Add tags for better discoverability
ALTER TABLE main.default.f1_driver_analytics_demo SET TAGS ('formula1' = 'motorsport', 'analytics' = 'performance', 'demo' = 'yes');

You can see both the metadata and the tags in the UI as well. If you navigate to the table in the Catalog UI, the tags appear with the table metadata, and the remaining properties appear on the Details tab.

Table Tags:   
![](./Images/Table Tags.png "Table Tags.png")

Table Details Tab: 
![](./Images/Table Details.png "Table Details.png")

In [0]:
%sql
-- Verify our demo table and its metadata
-- The details below are seen partially in the Overview tab above as well as in Details
DESCRIBE EXTENDED main.default.f1_driver_analytics_demo;
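
The properties and tags set earlier can also be read back individually with SQL, which may be easier to scan than the full `DESCRIBE EXTENDED` output. The sketch below assumes the catalog's `information_schema.table_tags` view is available in your workspace and that you have privileges on the table. Adjust the names if your tables live outside `main.default`.

```sql
-- List only the table properties (including the custom keys set earlier)
SHOW TBLPROPERTIES main.default.f1_driver_analytics_demo;

-- List the tags attached to the table
SELECT tag_name, tag_value
FROM main.information_schema.table_tags
WHERE schema_name = 'default'
  AND table_name = 'f1_driver_analytics_demo';
```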

## 🗓️ Building Rich Table History

Now we'll perform various operations to create a rich **Delta Lake history** that Unity Catalog can track:

📝 **INSERT** - Add new driver records  
🔄 **UPDATE** - Update records based on conditions  
❌ **DELETE** - Remove test records  
⚙️ **OPTIMIZE** - Compact files for performance

Each operation creates a new version in Delta Lake, providing a complete audit trail!

**Wait** until the stream is up and running before executing the code below. Once you execute the code below, check the stream and see if anything changed!

In [0]:
%sql
-- INSERT operation: Add some new driver records to simulate data updates
INSERT INTO main.default.f1_driver_analytics_demo
VALUES 
  ('Test Driver 1', 'Test Team Alpha', 5, 45, 1, 2, 120, 3, 'Developing', 20.0, 40.0, 9.0, 'Winning_Team', current_timestamp(), 'manual_insert');

In [0]:
%sql
-- UPDATE operation: Reclassify the test team's tier (same naming as the CASE logic above)
UPDATE main.default.f1_driver_analytics_demo
SET team_tier = 'Developing_Team'
WHERE team_name = 'Test Team Alpha';

In [0]:
%sql
-- DELETE operation: Remove the test records
DELETE FROM main.default.f1_driver_analytics_demo
WHERE team_name = 'Test Team Alpha';

In [0]:
%sql
-- OPTIMIZE operation: Compact small files for better performance
OPTIMIZE main.default.f1_driver_analytics_demo;

In [0]:
%sql
-- Show table history (Delta Lake versions) - this is key for Unity Catalog! Pull this up in the UI as well. 
DESCRIBE HISTORY main.default.f1_driver_analytics_demo;

## 🔍 Unity Catalog UI Exploration Guide

### 🎯 **Main Focus Table**: `main.default.f1_driver_analytics_demo`

Now explore this table in the **Unity Catalog UI** to see all the features we've demonstrated:

### 📊 **1. Data Lineage Visualization**
🔗 **Navigate to**: Catalog → main → default → f1_driver_analytics_demo  
🔍 **Click**: "Lineage" tab  
👁️ **Observe**: 
- **Upstream dependencies**: `f1_gold_driver_championship` + `f1_gold_team_championship`
- **Column-level lineage**: Click any column to see its transformation path
- **Transformation logic**: View the JOIN and business logic we applied in the source notebooks or queries
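
The lineage shown in the UI is also exposed through system tables. As a rough sketch, assuming the `system.access` schema is enabled for your metastore and you have `SELECT` on it, the query below lists which upstream columns feed each column of the demo table. Lineage events may take some time to appear after the table is created.

```sql
-- Column-level lineage: upstream columns that feed the demo table
SELECT DISTINCT
  source_table_full_name,
  source_column_name,
  target_column_name
FROM system.access.column_lineage
WHERE target_table_full_name = 'main.default.f1_driver_analytics_demo'
ORDER BY target_column_name, source_table_full_name;
```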

### 🗓️ **2. Table History & Versioning**
📊 **Click**: "History" tab  
👁️ **Observe**:
- **Version timeline**: See all our INSERT, UPDATE, DELETE, and OPTIMIZE operations
- **Operation details**: Each version shows what changed
- **Time travel**: You can query any previous version
- **File statistics**: See how OPTIMIZE improved file structure
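
Time travel can be tried directly in a SQL cell. The sketch below assumes version 0 is the initial load created by `CREATE OR REPLACE TABLE`; check the `DESCRIBE HISTORY` output for the actual version numbers in your workspace. Time travel may not be supported once a column mask is applied to the table, so these queries are best run before the masking cell later in this notebook.

```sql
-- Row count at the initial load (the version number is an assumption)
SELECT COUNT(*) AS row_count
FROM main.default.f1_driver_analytics_demo VERSION AS OF 0;

-- Row count at the current version, for comparison
SELECT COUNT(*) AS row_count
FROM main.default.f1_driver_analytics_demo;
```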

### 🏷️ **3. Governance & Metadata**
📝 **Click**: "Details" tab  
👁️ **Observe**:
- **Rich metadata**: All the properties we added
- **Tags**: formula1=motorsport, analytics=performance, demo=yes
- **Business context**: Owner team, data classification, retention period

In [0]:
%sql
-- Create a simple column masking function for data governance demonstration
-- This showcases Unity Catalog's data protection capabilities

-- Create a masking function to partially hide driver names
CREATE OR REPLACE FUNCTION main.default.mask_driver_name(driver_name STRING)
RETURNS STRING
RETURN 
  CASE 
    WHEN IS_ACCOUNT_GROUP_MEMBER('admins') THEN driver_name  -- Admins see full name
    WHEN driver_name IS NULL THEN NULL
    ELSE CONCAT(LEFT(driver_name, 1), '***', RIGHT(driver_name, 1))  -- Others see masked
  END;

-- Apply the masking function to our demo table's driver_name column
ALTER TABLE main.default.f1_driver_analytics_demo 
ALTER COLUMN driver_name SET MASK main.default.mask_driver_name;

-- Test the masking function (you'll see masked names unless you belong to the account-level `admins` group)
SELECT driver_name, team_name, performance_tier, driver_points
FROM main.default.f1_driver_analytics_demo 
ORDER BY driver_points DESC 
LIMIT 5;
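
To see what the function returns for your own user without querying the table, it can be called directly on a literal. This is a minimal sketch: the sample name is illustrative, and the result depends on whether you belong to an account-level group named `admins`. The last statement removes the mask again, which can be useful when re-running the notebook.

```sql
-- Call the masking function directly on a sample value
-- Members of the `admins` account group see 'Lewis Hamilton'; others see 'L***n'
SELECT main.default.mask_driver_name('Lewis Hamilton') AS masked_name;

-- Optional cleanup: remove the mask from the column
ALTER TABLE main.default.f1_driver_analytics_demo
ALTER COLUMN driver_name DROP MASK;
```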

## 🚀 Advanced Unity Catalog Features to Explore

### 🔍 **Search & Discovery**
- **Global Search**: Use the search bar to find "formula1" related assets
- **Tag-based Discovery**: Search by tags like "analytics" or "performance"
- **Metadata Search**: Find tables by business purpose or owner team

### 📊 **Impact Analysis**
- **Downstream Dependencies**: See what would break if you change `f1_gold_driver_championship`
- **Usage Analytics**: View which notebooks and queries use our tables
- **Schema Evolution**: Track how table schemas change over time with versioning
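
A first pass at impact analysis can also be done in SQL. The sketch below assumes the `system.access.table_lineage` system table is enabled and readable in your workspace. It lists the tables that read from one of the gold tables, along with the kind of entity (notebook, job, and so on) that performed the read.

```sql
-- Downstream consumers of the gold driver table
SELECT
  target_table_full_name,
  entity_type,
  MAX(event_time) AS last_seen
FROM system.access.table_lineage
WHERE source_table_full_name = 'main.default.f1_gold_driver_championship'
  AND target_table_full_name IS NOT NULL
GROUP BY target_table_full_name, entity_type
ORDER BY last_seen DESC;
```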

### 🔒 **Data Governance**
- **Access Control**: Set table and column-level permissions
- **Data Classification**: Mark sensitive data with appropriate tags
- **Audit Logging**: Track who accessed what data when ([system tables](https://docs.databricks.com/aws/en/admin/system-tables/audit-logs))
- **Quality Monitoring**: Set up data quality rules and alerts ([automated data quality monitoring](https://docs.databricks.com/aws/en/lakehouse-monitoring/data-quality-monitoring))
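
Table-level access control is expressed with `GRANT` statements. In this sketch, `f1_analysts` is a hypothetical group name; replace it with a group that exists in your account. Reading a table also requires `USE CATALOG` and `USE SCHEMA` on its parent catalog and schema.

```sql
-- Give a (hypothetical) group read access to the demo table
GRANT USE CATALOG ON CATALOG main TO `f1_analysts`;
GRANT USE SCHEMA ON SCHEMA main.default TO `f1_analysts`;
GRANT SELECT ON TABLE main.default.f1_driver_analytics_demo TO `f1_analysts`;

-- Review the resulting privileges
SHOW GRANTS ON TABLE main.default.f1_driver_analytics_demo;
```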

### 🌐 **Data Sharing**
- **Delta Sharing**: Share tables across organizations securely
- **Catalog Federation**: Mirror external data sources
- **Multi-cloud**: Manage data across AWS, Azure, GCP

### 🤖 **AI Integration**
- **AI-Generated Comments**: Use the AI comment generator in the UI
- **Intelligent Recommendations**: Get suggestions for related tables
- **Automated PII Tagging**: AI-powered metadata enrichment for sensitive data masking

## ✅ Unity Catalog Demo Complete!

### 🏆 **What We've Accomplished**

✅ **Rich Data Lineage**: Created `f1_driver_analytics_demo` with dependencies from 2 gold tables  
✅ **Table History**: Built 6+ versions through INSERT, UPDATE, DELETE, and OPTIMIZE operations  
✅ **Comprehensive Governance**: Added metadata, tags, and business context  
✅ **Real-world Scenarios**: Simulated data updates and performance optimizations  
✅ **Production-ready**: Demonstrated enterprise governance capabilities

### 🚀 **Next Steps**
1. **Explore the UI**: Navigate to [main.default.f1_driver_analytics_demo](#table) in Unity Catalog
2. **Try AI Comments**: Use the AI comment generator on table columns
3. **Set Permissions**: Configure access controls for your team
4. **Create Dashboards**: Build analytics using this governed data
5. **Expand Governance**: Add more tags and metadata to other tables

---

**📚 Learn More**: [Unity Catalog Documentation](https://docs.databricks.com/data-governance/unity-catalog/index.html)