# 🗄️ Unity Catalog Demo: Data Governance & Lineage
*Explore Unity Catalog features with Formula 1 data lineage in 5 minutes*

---

## 🎯 Learning Objectives

By the end of this demo, you'll understand:
- ✅ **Unity Catalog's 3-level namespace** (catalog.schema.table)
- ✅ **Data lineage tracking** and visualization
- ✅ **Governance features** for enterprise data management
- ✅ **Best practices** for organizing data assets
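
To make the 3-level namespace concrete, here is a minimal sketch that splits a fully qualified name into its catalog, schema, and table parts. The `TableName` class and `parse_table_name` helper are hypothetical names used only for illustration, not part of any Databricks API, and the sketch does not handle backtick-quoted identifiers that contain dots.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    """The three levels of a Unity Catalog table identifier."""
    catalog: str
    schema: str
    table: str

    @property
    def full_name(self) -> str:
        # Rebuild the dotted identifier used in SQL statements
        return f"{self.catalog}.{self.schema}.{self.table}"


def parse_table_name(name: str) -> TableName:
    """Split 'catalog.schema.table' into its three levels."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected catalog.schema.table, got: {name!r}")
    return TableName(*parts)


gold = parse_table_name("main.default.gold_driver_standings")
print(gold.catalog, gold.schema, gold.table)
```

In this demo every table lives in the `main` catalog and the `default` schema, so only the third level changes from table to table.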

---

## 📊 What We'll Explore

**Unity Catalog features covered in this demo:**
```
🔍 Unity Catalog Features:
├── 📋 Data Discovery (search and explore tables)
├── 📈 Lineage Visualization (track data flow)
├── 🔒 Governance Controls (permissions and security)
├── 📝 Metadata Management (descriptions and tags)
└── 🔍 Impact Analysis (understand dependencies)
```

### 🎯 Using Our F1 Data
We'll explore the data pipeline we created in Notebook 02:
- **Bronze Tables** → **Silver Tables** → **Gold Tables**
- **Data lineage** from raw CSV files to analytics
- **Impact analysis** for schema changes
- **Governance** for production data assets

In [None]:
# Let's explore our F1 data catalog structure
print("🗄️ Unity Catalog Structure for F1 Data:")
print("="*50)

# Show all tables in our schema
tables_df = spark.sql("SHOW TABLES IN main.default")
tables_df.show()

print(f"\n📊 Total tables in main.default: {tables_df.count()}")

## 🔍 Data Discovery and Metadata

Unity Catalog provides rich metadata and discovery capabilities. Let's explore our F1 tables:

In [None]:
# Explore detailed table information
print("📋 Detailed Table Information:")
print("="*60)

# Get detailed info for our gold table
print("\n🥇 Gold Driver Standings Table:")
spark.sql("DESCRIBE EXTENDED main.default.gold_driver_standings").show(50, False)

print("\n📊 Table Statistics:")
spark.sql("DESCRIBE DETAIL main.default.gold_driver_standings").show(1, False)

## 📈 Data Lineage Visualization

**[Screenshot: Unity Catalog Lineage Graph]**
*📁 Image location: `images/03_lineage_graph.png`*
*Screenshot guidance: Show the Unity Catalog lineage view with F1 tables connected, displaying the flow from Bronze → Silver → Gold*

### 🔄 Lineage Features to Explore:

#### 1. **Navigate to Unity Catalog UI**
- Click **"Catalog"** in the left sidebar
- Navigate to **main** → **default** → **gold_driver_standings**
- Click the **"Lineage"** tab

#### 2. **Explore Lineage Graph**
- **Upstream dependencies** - see bronze and silver tables
- **Transformation logic** - open the notebook, job, or query that produced the table to review its JOIN operations
- **Column-level lineage** - trace individual metrics
- **Downstream usage** - find dashboards and queries using this data

#### 3. **Impact Analysis**
- **Schema changes** - understand what would break
- **Dependency mapping** - see all affected assets
- **Change propagation** - track impact of modifications
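
The idea behind impact analysis can be sketched as a graph traversal: given upstream-to-downstream edges, collect everything reachable from the table you plan to change. The edges below are assumptions for illustration (the bronze table names and the exact dependencies are hypothetical); in practice the Lineage tab shows the real graph.

```python
from collections import deque

# Hypothetical lineage edges (upstream -> downstream). The bronze table names
# and the exact dependencies are assumptions for illustration only.
LINEAGE = {
    "bronze_drivers": ["silver_drivers"],
    "bronze_races": ["silver_races"],
    "bronze_results": ["silver_results"],
    "silver_drivers": ["gold_driver_standings"],
    "silver_races": ["gold_season_stats"],
    "silver_results": ["gold_driver_standings", "gold_season_stats"],
}


def downstream_of(table, edges):
    """Return every table reachable from `table`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(edges.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        queue.extend(edges.get(current, []))
    return order


# Which tables might break if the schema of bronze_results changes?
print(downstream_of("bronze_results", LINEAGE))
# → ['silver_results', 'gold_driver_standings', 'gold_season_stats']
```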

In [None]:
# Let's add some metadata to our tables for better governance
print("📝 Adding Metadata to F1 Tables:")
print("="*40)

# Add table comments for better documentation
table_comments = {
    "gold_driver_standings": "Comprehensive F1 driver career statistics and performance metrics aggregated from race results",
    "gold_season_stats": "Annual Formula 1 season-level analytics including driver counts, races, and completion rates",
    "silver_drivers": "Cleaned and validated F1 driver information with standardized names and data types",
    "silver_races": "Processed F1 race information with validated dates and circuit details",
    "silver_results": "Clean race results with calculated fields for winners, podiums, and performance metrics"
}

for table_name, comment in table_comments.items():
    try:
        # COMMENT ON TABLE is the dedicated statement for setting a table comment
        spark.sql(f"COMMENT ON TABLE main.default.{table_name} IS '{comment}'")
        print(f"✅ Added comment to {table_name}")
    except Exception as e:
        print(f"⚠️ Could not add comment to {table_name}: {e}")

print("\n💡 Now you can search for 'F1' or 'Formula' in Unity Catalog to find these tables!")
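
Besides the search bar, table comments can usually be queried with SQL. The sketch below builds a keyword search against the catalog's `information_schema.tables` view; it assumes the view is available in your workspace, and `find_tables_query` is a hypothetical helper name.

```python
def find_tables_query(catalog: str, schema: str, keyword: str) -> str:
    """Build a query that lists tables whose comment mentions `keyword`."""
    if "'" in keyword:
        # Keep the sketch simple: reject quotes instead of escaping them
        raise ValueError("keyword must not contain single quotes")
    return (
        f"SELECT table_name, comment "
        f"FROM {catalog}.information_schema.tables "
        f"WHERE table_schema = '{schema}' "
        f"AND lower(comment) LIKE '%{keyword.lower()}%'"
    )


query = find_tables_query("main", "default", "F1")
print(query)
# In a Databricks notebook: spark.sql(query).show(truncate=False)
```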

## 🔒 Governance and Security Features

Unity Catalog provides enterprise-grade governance capabilities:

### 🛡️ Security Controls:
- **Fine-grained access control** (table, column, row level)
- **Dynamic data masking** for sensitive information
- **Audit logging** for compliance and monitoring
- **Attribute-based access control** (ABAC)
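
As a rough sketch of table-level access control, the helper below builds the `GRANT` statements a group typically needs to read one table: usage on the catalog, usage on the schema, and `SELECT` on the table. The group name `f1_analysts` and the helper `read_access_statements` are hypothetical, and running the statements requires sufficient privileges on these objects.

```python
def read_access_statements(catalog: str, schema: str, table: str, principal: str) -> list:
    """Build the GRANT statements a principal needs to query one table."""
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{principal}`",
    ]


# `f1_analysts` is a hypothetical group name
statements = read_access_statements("main", "default", "gold_driver_standings", "f1_analysts")
for stmt in statements:
    print(stmt)
    # In a Databricks notebook: spark.sql(stmt)
```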

### 📋 Data Classification:
- **Sensitivity labels** (PII, confidential, public)
- **Compliance tags** (GDPR, CCPA, SOX)
- **Business classification** (finance, marketing, operations)
- **Quality indicators** (bronze, silver, gold, certified)
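
Classification like this is commonly applied with table tags. A minimal sketch, assuming the `ALTER TABLE ... SET TAGS` statement is available in your workspace; the tag keys and values (`layer`, `domain`) and the helper `set_tags_statement` are illustrative choices, not a required scheme.

```python
def set_tags_statement(full_table_name: str, tags: dict) -> str:
    """Build an ALTER TABLE ... SET TAGS statement from a dict of tags."""
    for key, value in tags.items():
        if "'" in key or "'" in value:
            # Keep the sketch simple: reject quotes instead of escaping them
            raise ValueError("tag keys and values must not contain single quotes")
    pairs = ", ".join(f"'{key}' = '{value}'" for key, value in tags.items())
    return f"ALTER TABLE {full_table_name} SET TAGS ({pairs})"


# Hypothetical tags marking the table's medallion layer and business domain
stmt = set_tags_statement(
    "main.default.gold_driver_standings",
    {"layer": "gold", "domain": "formula1"},
)
print(stmt)
# In a Databricks notebook: spark.sql(stmt)
```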

### 🔍 Monitoring and Auditing:
- **Access patterns** and usage analytics
- **Change history** and version control
- **Performance metrics** and optimization recommendations
- **Data freshness** and quality monitoring
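
A simple freshness check can be sketched on top of the `lastModified` value that `DESCRIBE DETAIL` returns for Delta tables. The 24-hour threshold and the `is_stale` helper are assumptions for illustration; fixed timestamps are used so the example runs outside Databricks.

```python
from datetime import datetime, timedelta


def is_stale(last_modified: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True when the table was last written more than `max_age` ago."""
    return now - last_modified > max_age


# In a Databricks notebook, the timestamp could come from the table details:
#   last_modified = spark.sql(
#       "DESCRIBE DETAIL main.default.gold_driver_standings"
#   ).first()["lastModified"]
# Both datetimes must be either naive or timezone-aware; mixing them raises TypeError.
last_modified = datetime(2024, 1, 1, 6, 0)
now = datetime(2024, 1, 2, 12, 0)

print(is_stale(last_modified, now, max_age=timedelta(hours=24)))
# → True (the table was written 30 hours before `now`)
```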

## ✅ Unity Catalog Demo Complete!

**🎉 Great job! You've explored Unity Catalog's powerful governance features!**

### What You've Learned:
- ✅ **3-level namespace** organization (catalog.schema.table)
- ✅ **Data lineage** visualization and impact analysis
- ✅ **Metadata management** and table documentation
- ✅ **Governance capabilities** for enterprise data management

### 🚀 Next Steps in Unity Catalog:
1. **Explore the UI** - Navigate to Catalog → main → default
2. **View Lineage** - Click on any gold table and explore lineage
3. **Search Data** - Use the search bar to find F1-related assets
4. **Set up Permissions** - Configure access controls for your team
5. **Add Tags** - Classify your data with business-relevant tags

### 💡 Key Governance Benefits:
- **Data Discovery** - Find relevant datasets quickly
- **Impact Analysis** - Understand change consequences
- **Compliance** - Meet regulatory requirements
- **Quality** - Trust your data by tracing its lineage and transformations
- **Security** - Control access at granular levels

**Continue to the next notebook:** `04_Job_Creation.ipynb`

**🏁 Ready to automate your F1 pipeline? Let's create some jobs! 🚀**