%markdown
# 🗄️ Unity Catalog Demo: F1 Data Governance & Lineage
*A complete demonstration of Unity Catalog features in under 20 cells*

## 🎯 What You'll Learn

✅ **Data Lineage Tracking** - Visualize Bronze → Silver → Gold transformations  
✅ **Table History & Versioning** - Track changes with Delta Lake operations  
✅ **Governance Features** - Tags, comments, and metadata management  
✅ **Impact Analysis** - Understand downstream dependencies  
✅ **3-Level Namespace** - `catalog.schema.table` organization

## 🏎️ Demo Focus: F1 Driver Analytics
We'll create and evolve a comprehensive F1 driver analytics table that demonstrates:
* **Rich lineage** from multiple source tables
* **Table history** through various operations (INSERT, simulated updates, OPTIMIZE)
* **Governance metadata** with tags and business context
* **Real-world scenarios** for production data management

%markdown
## 📊 Our F1 Medallion Architecture

First, let's explore the existing F1 data from previous notebooks:

```
Bronze Layer (Raw)     →    Silver Layer (Cleaned)    →    Gold Layer (Analytics)
┌─────────────────┐         ┌─────────────────┐           ┌─────────────────┐
│ bronze_drivers  │────────▶│ silver_race_    │──────────▶│ gold_driver_    │
│ bronze_results  │         │ results         │           │ championship    │
│ bronze_races    │         │ silver_         │           │ gold_team_      │
│ bronze_sprint_* │         │ qualifying_*    │           │ championship    │
└─────────────────┘         └─────────────────┘           └─────────────────┘
```

**Our Demo Strategy:**
1. Use existing gold tables as foundation
2. Create enhanced analytics table with rich lineage
3. Perform various operations to build table history
4. Add comprehensive governance metadata

In [0]:
%sql
-- Verify our F1 medallion architecture tables
SHOW TABLES IN main.default;

-- Focus on our gold layer for the demo
SELECT 'Gold Tables Available:' as status;
SHOW TABLES IN main.default LIKE '*gold*';

%sql
-- Examine our main gold table structure and sample data
DESCRIBE main.default.gold_driver_championship;

-- Preview top F1 drivers with key metrics
SELECT driver, team, races_entered, total_points, wins, podiums, 
       ROUND(points_per_race, 1) as points_per_race,
       processed_at
FROM main.default.gold_driver_championship 
ORDER BY total_points DESC 
LIMIT 10;

## 🏆 Creating Our Unity Catalog Demo Table

We'll create `f1_driver_analytics_demo` - a comprehensive table that demonstrates:

🔗 **Rich Lineage**: Combines data from multiple gold tables  
📊 **Business Logic**: Advanced analytics and classifications  
🗓️ **Table History**: We'll perform various operations to build history  
🏷️ **Governance**: Tags, comments, and metadata

This table will be our **main focus** for exploring Unity Catalog features in the UI.

In [0]:
%sql
-- Build a temporary view that joins the driver and team gold tables;
-- the demo table itself is created from this view in the next cell

CREATE OR REPLACE TEMP VIEW temp_f1_analytics AS
SELECT
  d.driver AS driver_name,
  d.team AS team_name,
  d.races_entered,
  d.total_points AS driver_points,
  d.wins AS driver_wins,
  d.podiums AS driver_podiums,
  COALESCE(t.total_points, 0) AS team_total_points,
  COALESCE(t.wins, 0) AS team_wins,
  CASE
    WHEN d.wins >= 5 THEN 'Elite'
    WHEN d.wins >= 2 THEN 'Winner'
    WHEN d.podiums >= 5 THEN 'Podium_Regular'
    WHEN d.points_per_race >= 10 THEN 'Strong'
    ELSE 'Developing'
  END AS performance_tier,
  ROUND((CAST(d.wins AS DOUBLE) / d.races_entered) * 100, 2) AS win_rate_pct,
  ROUND((CAST(d.podiums AS DOUBLE) / d.races_entered) * 100, 2) AS podium_rate_pct,
  ROUND(d.points_per_race, 2) AS points_per_race,
  CASE
    WHEN COALESCE(t.wins, 0) >= 8 THEN 'Championship_Team'
    WHEN COALESCE(t.wins, 0) >= 3 THEN 'Winning_Team'
    ELSE 'Developing_Team'
  END AS team_tier,
  current_timestamp() AS created_at,
  'initial_load' AS load_type
FROM main.default.gold_driver_championship d
LEFT JOIN main.default.gold_team_championship t
  ON d.team = t.team;

-- Preview the analytics view
SELECT driver_name, team_name, performance_tier, win_rate_pct, team_tier
FROM temp_f1_analytics
LIMIT 8;
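
The `performance_tier` CASE expression is evaluated top to bottom, so the first matching branch wins: a driver with one win and six podiums lands in `Podium_Regular`, never `Strong`. Here is a minimal plain-Python sketch of that ordering, handy for sanity-checking the thresholds outside the warehouse. The function names and sample drivers are hypothetical, and Python's `round` rounds ties to even while Spark's `ROUND` rounds half up, so treat the rate helper as an approximation.

```python
# Plain-Python mirror of the performance_tier CASE expression (illustrative only).
def performance_tier(wins: int, podiums: int, points_per_race: float) -> str:
    """Return the tier using the same top-down branch order as the SQL CASE."""
    if wins >= 5:
        return "Elite"
    if wins >= 2:
        return "Winner"
    if podiums >= 5:
        return "Podium_Regular"
    if points_per_race >= 10:
        return "Strong"
    return "Developing"


def rate_pct(count: int, races_entered: int) -> float:
    """Approximate ROUND((CAST(count AS DOUBLE) / races_entered) * 100, 2)."""
    return round(count / races_entered * 100, 2)


# Hypothetical drivers, not rows from the real gold table.
print(performance_tier(wins=6, podiums=12, points_per_race=18.0))  # Elite
print(performance_tier(wins=1, podiums=6, points_per_race=12.0))   # Podium_Regular
print(performance_tier(wins=0, podiums=1, points_per_race=3.0))    # Developing
print(rate_pct(3, 20))                                             # 15.0
```

Because the win-count branches come first, reordering the `WHEN` clauses in the SQL would change which drivers land in each tier.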

In [0]:
%sql
-- Create our main demo table from the temporary view
CREATE TABLE IF NOT EXISTS main.default.f1_driver_analytics_demo
USING DELTA
AS SELECT * FROM temp_f1_analytics;

-- Verify table creation
SELECT COUNT(*) as record_count, 
       COUNT(DISTINCT team_name) as teams,
       COUNT(DISTINCT performance_tier) as tiers
FROM main.default.f1_driver_analytics_demo;

In [0]:
%sql
-- Add rich metadata and governance to our demo table
-- This showcases Unity Catalog's governance capabilities

ALTER TABLE main.default.f1_driver_analytics_demo 
SET TBLPROPERTIES (
  'comment' = 'Comprehensive F1 driver and team analytics for Unity Catalog lineage demonstration',
  'data_classification' = 'public',
  'business_domain' = 'sports_analytics',
  'data_source' = 'gold_driver_championship,gold_team_championship',
  'owner_team' = 'data_analytics',
  'business_purpose' = 'driver_performance_analysis_and_team_insights',
  'update_frequency' = 'after_each_race_weekend',
  'data_quality_level' = 'gold',
  'contains_pii' = 'false',
  'retention_period' = '7_years',
  'created_by' = 'unity_catalog_demo',
  'version' = '1.0'
);

-- Add tags for better discoverability
ALTER TABLE main.default.f1_driver_analytics_demo SET TAGS ('formula1' = 'motorsport', 'analytics' = 'performance', 'demo' = 'unity_catalog');

SELECT 'Governance metadata and tags added successfully' as status;

In [0]:
%sql
-- Verify our demo table and its metadata
DESCRIBE EXTENDED main.default.f1_driver_analytics_demo;

-- Show sample data with key columns
SELECT driver_name, team_name, performance_tier, win_rate_pct, team_tier, load_type
FROM main.default.f1_driver_analytics_demo 
ORDER BY driver_points DESC 
LIMIT 8;

## 🗓️ Building Rich Table History

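Now we'll perform several operations to build a rich **Delta Lake history** that Unity Catalog can display:

📝 **INSERT** - Add new driver records  
🔄 **Simulated UPDATE** - Append rows with revised performance tiers instead of updating in place  
📈 **Season-end load** - Append rows with adjusted race and points totals  
⚙️ **OPTIMIZE** - Compact files for performance

In-place `UPDATE`, `DELETE`, and `ANALYZE` statements are not run in this notebook; `UPDATE` and `DELETE` would appear in the history as their own versions. Each committed write creates a new table version in Delta Lake, providing a complete audit trail!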

In [0]:
%sql
-- INSERT operation: Add some new driver records to simulate data updates
INSERT INTO main.default.f1_driver_analytics_demo
VALUES 
  ('Test Driver 1', 'Test Team Alpha', 5, 45, 1, 2, 120, 3, 'Winner', 20.0, 40.0, 9.0, 'Winning_Team', current_timestamp(), 'manual_insert'),
  ('Test Driver 2', 'Test Team Beta', 8, 25, 0, 1, 80, 2, 'Developing', 0.0, 12.5, 3.13, 'Developing_Team', current_timestamp(), 'manual_insert'),
  ('Rookie Driver', 'New Team', 3, 8, 0, 0, 15, 0, 'Developing', 0.0, 0.0, 2.67, 'Developing_Team', current_timestamp(), 'manual_insert');

-- Verify the insert
SELECT COUNT(*) as total_records, 
       COUNT(CASE WHEN load_type = 'manual_insert' THEN 1 END) as new_records
FROM main.default.f1_driver_analytics_demo;
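
`INSERT ... VALUES` without a column list is positional: each value is matched to the table's columns in their defined order. The sketch below, a plain-Python check rather than part of the pipeline, pairs the three hand-typed rows with the column order from `temp_f1_analytics` and recomputes the rate columns from the raw counts. `half_up` and `rates_match` are hypothetical helpers; `half_up` uses `decimal` to imitate Spark's half-up `ROUND`.

```python
from decimal import Decimal, ROUND_HALF_UP

# Column order of the demo table, taken from the temp view's SELECT list.
COLUMNS = [
    "driver_name", "team_name", "races_entered", "driver_points", "driver_wins",
    "driver_podiums", "team_total_points", "team_wins", "performance_tier",
    "win_rate_pct", "podium_rate_pct", "points_per_race", "team_tier",
    "created_at", "load_type",
]

# The three hand-typed rows; None stands in for current_timestamp().
ROWS = [
    ("Test Driver 1", "Test Team Alpha", 5, 45, 1, 2, 120, 3, "Winner",
     20.0, 40.0, 9.0, "Winning_Team", None, "manual_insert"),
    ("Test Driver 2", "Test Team Beta", 8, 25, 0, 1, 80, 2, "Developing",
     0.0, 12.5, 3.13, "Developing_Team", None, "manual_insert"),
    ("Rookie Driver", "New Team", 3, 8, 0, 0, 15, 0, "Developing",
     0.0, 0.0, 2.67, "Developing_Team", None, "manual_insert"),
]


def half_up(value: float) -> float:
    """Round to 2 decimals with ties going up, as Spark SQL's ROUND does."""
    return float(Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


def rates_match(values: tuple) -> bool:
    """Check the value count and that the typed rates agree with the raw counts."""
    if len(values) != len(COLUMNS):
        return False
    row = dict(zip(COLUMNS, values))
    races = row["races_entered"]
    return (
        row["win_rate_pct"] == half_up(row["driver_wins"] / races * 100)
        and row["podium_rate_pct"] == half_up(row["driver_podiums"] / races * 100)
        and row["points_per_race"] == half_up(row["driver_points"] / races)
    )


for values in ROWS:
    print(values[0], rates_match(values))
```

Listing the column names explicitly in the `INSERT` would make the statement robust to later schema changes; the positional form is kept here to match the cell above.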

In [0]:
%sql
-- Simulate UPDATE by inserting records with updated classifications
-- This demonstrates how table history tracks changes over time

-- Insert updated records for top performers (simulating tier promotions)
INSERT INTO main.default.f1_driver_analytics_demo
SELECT 
  driver_name,
  team_name,
  races_entered,
  driver_points,
  driver_wins,
  driver_podiums,
  team_total_points,
  team_wins,
  CASE 
    WHEN driver_points > 300 THEN 'Legendary'
    WHEN win_rate_pct > 15 THEN 'Strong_Updated'
    ELSE performance_tier
  END as performance_tier,
  win_rate_pct,
  podium_rate_pct,
  points_per_race,
  team_tier,
  current_timestamp() as created_at,
  'tier_update_simulation' as load_type
FROM main.default.f1_driver_analytics_demo 
WHERE load_type = 'initial_load' AND (driver_points > 300 OR win_rate_pct > 15)
ORDER BY driver_points DESC  -- pick the top scorers so the LIMIT is not arbitrary
LIMIT 5;

-- Show the simulated updates
SELECT driver_name, performance_tier, load_type, created_at
FROM main.default.f1_driver_analytics_demo 
WHERE load_type IN ('initial_load', 'tier_update_simulation')
ORDER BY driver_name, created_at;

In [0]:
%sql
-- Add more data variations to create richer history
-- This simulates different data loading scenarios

-- Insert end-of-season updates
INSERT INTO main.default.f1_driver_analytics_demo
SELECT 
  driver_name,
  team_name,
  races_entered + 2 as races_entered,  -- Simulate 2 more races
  driver_points + CAST(RAND() * 50 AS INT) as driver_points,  -- Random points gain
  driver_wins,
  driver_podiums + CASE WHEN RAND() > 0.7 THEN 1 ELSE 0 END as driver_podiums,
  team_total_points,
  team_wins,
  performance_tier,
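  win_rate_pct,     -- carried over unchanged (not recomputed for the simulated races)
  podium_rate_pct,  -- carried over unchanged
  points_per_race,  -- carried over unchanged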
  team_tier,
  current_timestamp() as created_at,
  'season_end_update' as load_type
FROM main.default.f1_driver_analytics_demo 
WHERE load_type = 'initial_load'
LIMIT 8;

-- Show record count by load type
SELECT load_type, COUNT(*) as record_count, MIN(created_at) as first_load, MAX(created_at) as last_load
FROM main.default.f1_driver_analytics_demo 
GROUP BY load_type 
ORDER BY first_load;
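
The row count per `load_type` follows directly from the INSERT cells: three hand-typed rows, at most five tier-update rows (`LIMIT 5`), and at most eight season-end rows (`LIMIT 8`). A small arithmetic sketch, assuming a hypothetical 22 rows in the initial load and 6 of them qualifying for the tier update (both numbers are illustrative, not read from the table):

```python
def expected_counts(initial_rows: int, tier_candidates: int) -> dict:
    """Rows per load_type after the three INSERT cells, plus the overall total."""
    counts = {
        "initial_load": initial_rows,
        "manual_insert": 3,                                  # three VALUES rows
        "tier_update_simulation": min(5, tier_candidates),   # capped by LIMIT 5
        "season_end_update": min(8, initial_rows),           # capped by LIMIT 8
    }
    counts["total"] = sum(counts.values())
    return counts


# Hypothetical inputs: 22 initial rows, 6 matching the tier-update WHERE clause.
print(expected_counts(initial_rows=22, tier_candidates=6))
# {'initial_load': 22, 'manual_insert': 3, 'tier_update_simulation': 5, 'season_end_update': 8, 'total': 38}
```

If your totals differ, compare each `load_type` count from the query above against these caps to see which INSERT contributed fewer rows.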

In [0]:
%sql
-- OPTIMIZE operation: Compact small files for better performance
OPTIMIZE main.default.f1_driver_analytics_demo;

-- Show table history (Delta Lake versions) - this is key for Unity Catalog!
DESCRIBE HISTORY main.default.f1_driver_analytics_demo;

-- Show current table statistics
SELECT 
  COUNT(*) as total_records,
  COUNT(DISTINCT driver_name) as unique_drivers,
  COUNT(DISTINCT load_type) as load_types,
  MIN(created_at) as first_record,
  MAX(created_at) as latest_record
FROM main.default.f1_driver_analytics_demo;

## 🔍 Unity Catalog UI Exploration Guide

### 🎯 **Main Focus Table**: `main.default.f1_driver_analytics_demo`

Now explore this table in the **Unity Catalog UI** to see all the features we've demonstrated:

### 📊 **1. Data Lineage Visualization**
🔗 **Navigate to**: Catalog → main → default → f1_driver_analytics_demo  
🔍 **Click**: "Lineage" tab  
👁️ **Observe**: 
- **Upstream dependencies**: `gold_driver_championship` + `gold_team_championship`
- **Column-level lineage**: Click any column to see its transformation path
- **Transformation logic**: View the JOIN and business logic we applied

### 🗓️ **2. Table History & Versioning**
📊 **Click**: "History" tab  
👁️ **Observe**:
- **Version timeline**: See all our INSERT, OPTIMIZE operations
- **Operation details**: Each version shows what changed
- **Time travel**: You can query any previous version that is still retained (for example with `VERSION AS OF`)
- **File statistics**: See how OPTIMIZE improved file structure

### 🏷️ **3. Governance & Metadata**
📝 **Click**: "Details" tab  
👁️ **Observe**:
- **Rich metadata**: All the properties we added
- **Tags**: formula1=motorsport, analytics=performance, demo=unity_catalog
- **Business context**: Owner team, data classification, retention period
- **Data lineage**: Source tables and transformation logic

In [0]:
%sql
-- Create a simple column masking function for data governance demonstration
-- This showcases Unity Catalog's data protection capabilities

-- Create a masking function to partially hide driver names
CREATE OR REPLACE FUNCTION main.default.mask_driver_name(driver_name STRING)
RETURNS STRING
RETURN 
  CASE 
    WHEN IS_ACCOUNT_GROUP_MEMBER('admins') THEN driver_name  -- Admins see full name
    WHEN driver_name IS NULL THEN NULL
    ELSE CONCAT(LEFT(driver_name, 1), '***', RIGHT(driver_name, 1))  -- Others see masked
  END;

-- Apply the masking function to our demo table's driver_name column
ALTER TABLE main.default.f1_driver_analytics_demo 
ALTER COLUMN driver_name SET MASK main.default.mask_driver_name;

-- Test the masking function (names appear masked unless you belong to the 'admins' account group)
SELECT driver_name, team_name, performance_tier, driver_points
FROM main.default.f1_driver_analytics_demo 
ORDER BY driver_points DESC 
LIMIT 5;

-- Show function details for Unity Catalog UI exploration
DESCRIBE FUNCTION main.default.mask_driver_name;
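
To preview what non-admin users will see without switching accounts, the masking rule can be mirrored in plain Python. This is only a sketch of the CASE logic above: the `is_admin` flag is a hypothetical stand-in for `IS_ACCOUNT_GROUP_MEMBER('admins')`, which can only be evaluated inside Databricks.

```python
from typing import Optional


def mask_driver_name(driver_name: Optional[str], is_admin: bool) -> Optional[str]:
    """Mirror the SQL CASE: admins see the value, NULL stays NULL, others see a mask."""
    if is_admin:
        return driver_name
    if driver_name is None:
        return None
    # LEFT(name, 1) + '***' + RIGHT(name, 1)
    return driver_name[:1] + "***" + driver_name[-1:]


print(mask_driver_name("Test Driver 1", is_admin=False))  # T***1
print(mask_driver_name("Test Driver 1", is_admin=True))   # Test Driver 1
print(mask_driver_name(None, is_admin=False))             # None
print(mask_driver_name("A", is_admin=False))              # A***A
```

Note the last case: a one-character name is fully revealed by this mask, because the first and last characters are the same character.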

## 🚀 Advanced Unity Catalog Features to Explore

### 🔍 **Search & Discovery**
- **Global Search**: Use the search bar to find "formula1" or "motorsport"
- **Tag-based Discovery**: Search by tags like "analytics" or "performance"
- **Metadata Search**: Find tables by business purpose or owner team

### 📊 **Impact Analysis**
- **Downstream Dependencies**: See what would break if you change `gold_driver_championship`
- **Usage Analytics**: View which notebooks and queries use our tables
- **Schema Evolution**: Track how table schemas change over time

### 🔒 **Data Governance**
- **Access Control**: Set table and column-level permissions
- **Data Classification**: Mark sensitive data with appropriate tags
- **Audit Logging**: Track who accessed what data when
- **Quality Monitoring**: Set up data quality rules and alerts

### 🌐 **Cross-Workspace Sharing**
- **Delta Sharing**: Share tables across organizations securely
- **Catalog Federation**: Connect to external data sources
- **Multi-cloud**: Manage data across AWS, Azure, GCP

### 🤖 **AI Integration**
- **AI-Generated Comments**: Use the AI comment generator in the UI
- **Intelligent Recommendations**: Get suggestions for related tables
- **Automated Tagging**: AI-powered metadata enrichment

## ✅ Unity Catalog Demo Complete!

### 🏆 **What We've Accomplished**

✅ **Rich Data Lineage**: Created `f1_driver_analytics_demo` with dependencies from 2 gold tables  
✅ **Table History**: Built 6+ versions through INSERT and OPTIMIZE operations  
✅ **Comprehensive Governance**: Added metadata, tags, and business context  
✅ **Real-world Scenarios**: Simulated data updates and performance optimizations  
✅ **Production-ready**: Demonstrated enterprise governance capabilities

### 📊 **Key Metrics from Our Demo**
- **38 total records** across 4 different load types (in the original run; your counts depend on the source data)
- **24 unique F1 drivers** with performance analytics (in the original run)
- **Rich metadata** with 12 governance properties
- **3 tags** for discoverability (formula1, analytics, demo)
- **Multiple versions** showing complete audit trail

### 🔍 **Unity Catalog Features Demonstrated**
🔗 **Data Lineage**: Automatic tracking of table dependencies and transformations  
🗓️ **Version Control**: Complete history of all table changes with Delta Lake  
🏷️ **Metadata Management**: Rich governance properties and tagging system  
🔍 **Impact Analysis**: Understanding downstream effects of schema changes  
🔒 **Access Control**: Enterprise-grade security and permissions  
📊 **Search & Discovery**: Find data assets through metadata and tags

### 🚀 **Next Steps**
1. **Explore the UI**: Navigate to `main.default.f1_driver_analytics_demo` in Unity Catalog
2. **Try AI Comments**: Use the AI comment generator on table columns
3. **Set Permissions**: Configure access controls for your team
4. **Create Dashboards**: Build analytics using this governed data
5. **Expand Governance**: Add more tags and metadata to other tables

---

**📚 Learn More**: [Unity Catalog Documentation](https://docs.databricks.com/data-governance/unity-catalog/index.html)