# Databricks Unity Catalog and Table Management Lab Exercise

## Overview
This lab covers Unity Catalog concepts, table types, views, CTAS operations, and Delta Lake constraints. You will work with a synthetic HR dataset to practice enterprise data management patterns.

## Prerequisites
- Databricks workspace with Unity Catalog enabled
- Running compute cluster with Unity Catalog access
- Basic knowledge of SQL and Spark

## Learning Objectives
By the end of this lab, you will be able to:
- Understand the difference between managed and external tables in Unity Catalog
- Create and manage external locations and storage credentials
- Work with different types of views (standard, temporary, global temporary)
- Use CREATE TABLE AS SELECT (CTAS) statements effectively
- Implement and understand Delta Lake constraints
- Apply data governance best practices with Unity Catalog

---

## Lab Setup

### Step 1: Environment Setup
Create a new notebook in your Databricks workspace and run the following setup commands:

```python
# Import required libraries
from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *
from datetime import datetime, timedelta
import random

# Disable Arrow-based pandas conversion for this session (optional; no other settings are reset)
spark.sql("SET spark.sql.execution.arrow.pyspark.enabled = false")

print("Lab environment initialized successfully")
```

---

## Exercise 1: Unity Catalog Setup and Managed vs External Tables

### Task 1.1: Create Catalog and Schema Structure
```sql
-- Create a catalog for this lab
CREATE CATALOG IF NOT EXISTS hr_analytics_lab;

-- Use the catalog
USE CATALOG hr_analytics_lab;

-- Create schemas for different business units
CREATE SCHEMA IF NOT EXISTS hr_data;
CREATE SCHEMA IF NOT EXISTS sales_data;
CREATE SCHEMA IF NOT EXISTS analytics;

-- Set default schema
USE hr_analytics_lab.hr_data;
```
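
Unity Catalog addresses every table through a three-level `catalog.schema.table` name. The sketch below shows one way to build such names in Python before passing them to `spark.sql`. `quote_ident` and `fq_name` are hypothetical helpers introduced for this lab, not part of any Databricks API.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote one identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def fq_name(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified three-level name: catalog.schema.table."""
    return ".".join(quote_ident(part) for part in (catalog, schema, table))


employees_table = fq_name("hr_analytics_lab", "hr_data", "employees_managed")
print(employees_table)

# In a notebook attached to a Unity Catalog cluster you could then run:
# spark.sql(f"SELECT COUNT(*) FROM {employees_table}").show()
```

Using fully qualified names keeps each statement independent of whichever catalog and schema happen to be current in the session.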

### Task 1.2: Create Sample Employee Data
```python
# Generate comprehensive employee dataset
def generate_employee_data(num_employees=1000):
    cities = ["New York", "San Francisco", "Chicago", "Austin", "Seattle", "Boston", "Los Angeles", "Miami"]
    departments = ["Engineering", "Sales", "Marketing", "HR", "Finance", "Operations"]
    job_levels = ["Junior", "Senior", "Lead", "Manager", "Director"]
    
    employees = []
    for i in range(1, num_employees + 1):
        birth_date = (datetime.now() - timedelta(days=random.randint(8000, 15000))).strftime("%Y-%m-%d")
        hire_date = (datetime.now() - timedelta(days=random.randint(30, 2000))).strftime("%Y-%m-%d")
        
        employee = {
            "employee_id": i,
            "first_name": f"Employee_{i}",
            "last_name": f"LastName_{i}",
            "email": f"employee{i}@company.com",
            "phone": f"555-{random.randint(1000, 9999)}",
            "hire_date": hire_date,
            "birth_date": birth_date,
            "salary": random.randint(40000, 200000),
            "department": random.choice(departments),
            "job_title": f"{random.choice(job_levels)} {random.choice(departments)} Specialist",
            "city": random.choice(cities),
            "is_active": random.choice([True, True, True, False])  # 75% active
        }
        employees.append(employee)
    
    return spark.createDataFrame(employees)

# Create the employee dataset
employee_df = generate_employee_data(1000)
employee_df.show(10)
```
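
The generator above stores dates as strings and numbers as Python integers, which Spark infers as STRING and BIGINT rather than the DATE, INT, and DOUBLE types used by the tables in this lab. The following sketch shows an alternative way to build rows with `datetime.date` and `float` values and a seeded random generator, so repeated runs give the same data. `make_typed_row` is a hypothetical helper and only a subset of the columns is shown.

```python
import random
from datetime import date, timedelta


def make_typed_row(employee_id: int, rng: random.Random, today: date) -> dict:
    """Build one employee record whose Python types map to DATE and DOUBLE in Spark."""
    return {
        "employee_id": employee_id,
        "hire_date": today - timedelta(days=rng.randint(30, 2000)),      # datetime.date -> DATE
        "birth_date": today - timedelta(days=rng.randint(8000, 15000)),  # datetime.date -> DATE
        "salary": float(rng.randint(40000, 200000)),                     # float -> DOUBLE
    }


# A fixed seed and a fixed reference date make every run produce the same rows.
rng = random.Random(42)
reference_day = date(2024, 1, 1)
sample = [make_typed_row(i, rng, reference_day) for i in range(1, 4)]
for row in sample:
    print(row)
```

Note that `employee_id` is still a Python `int`, which Spark infers as BIGINT, so a cast or an explicit schema is still needed before writing to an INT column.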

### Task 1.3: Create Managed Table (Unity Catalog Default)
```sql
-- Drop table if exists for clean start
DROP TABLE IF EXISTS hr_analytics_lab.hr_data.employees_managed;

-- This will be created as a managed table automatically
CREATE TABLE hr_analytics_lab.hr_data.employees_managed (
    employee_id INT NOT NULL,
    first_name STRING NOT NULL,
    last_name STRING NOT NULL,
    email STRING,
    phone STRING,
    hire_date DATE,
    birth_date DATE,
    salary DOUBLE,
    department STRING,
    job_title STRING,
    city STRING,
    is_active BOOLEAN
) 
USING DELTA
COMMENT "Managed table containing employee information"
TBLPROPERTIES (
    'department'='HR',
    'data_classification'='PII',
    'created_by'='data_engineering_team'
);
```

### Task 1.4: Insert Data into Managed Table
```python
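# Cast to the table's declared types first: the generator yields BIGINT and STRING
# columns, which Delta schema enforcement may reject for INT, DOUBLE, and DATE columns.
target_table = "hr_analytics_lab.hr_data.employees_managed"
target_schema = spark.table(target_table).schema
typed_df = employee_df.select(
    [col(f.name).cast(f.dataType) for f in target_schema.fields]
)

typed_df.write \
    .format("delta") \
    .mode("append") \
    .saveAsTable(target_table)

print("Data inserted into managed table")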
```

### Task 1.5: Examine Managed Table Properties
```sql
-- Describe the managed table in detail
DESCRIBE EXTENDED hr_analytics_lab.hr_data.employees_managed;

-- Show table properties
SHOW TBLPROPERTIES hr_analytics_lab.hr_data.employees_managed;

-- Check the location (notice it's managed by Unity Catalog)
DESCRIBE DETAIL hr_analytics_lab.hr_data.employees_managed;
```
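
`DESCRIBE EXTENDED` returns its result as rows of `col_name`, `data_type`, and `comment`, and the table kind appears in a row whose `col_name` is `Type`. The sketch below shows one way to pull that value out in Python. `table_type_from_describe` is a hypothetical helper, and the example rows are hand-written stand-ins rather than real output.

```python
from typing import Iterable, Optional, Tuple


def table_type_from_describe(rows: Iterable[Tuple[str, Optional[str]]]) -> Optional[str]:
    """Return the value of the 'Type' row from DESCRIBE EXTENDED-style output.

    `rows` is an iterable of (col_name, data_type) pairs. The last match wins,
    so a regular column that happens to be called "type" does not hide the
    entry in the detailed table information section.
    """
    found = None
    for col_name, data_type in rows:
        if col_name.strip().lower() == "type":
            found = (data_type or "").strip().upper()
    return found


# Hand-written stand-in for DESCRIBE EXTENDED output (values are illustrative).
example_rows = [
    ("employee_id", "int"),
    ("salary", "double"),
    ("# Detailed Table Information", ""),
    ("Name", "hr_analytics_lab.hr_data.employees_managed"),
    ("Type", "MANAGED"),
    ("Provider", "delta"),
]
print(table_type_from_describe(example_rows))

# With a live session, the same function could be fed real output:
# rows = [(r["col_name"], r["data_type"]) for r in spark.sql(
#     "DESCRIBE EXTENDED hr_analytics_lab.hr_data.employees_managed").collect()]
# print(table_type_from_describe(rows))
```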

### Questions for Exercise 1:
1. Where is the data for the managed table stored?
2. What are the benefits of using managed tables?
3. Who owns the lifecycle of managed table data?

---

## Exercise 2: External Locations and Storage Credentials

### Task 2.1: Understand External Location Concepts
```sql
-- Show existing external locations (if any)
SHOW EXTERNAL LOCATIONS;

-- Show storage credentials (if any)
SHOW STORAGE CREDENTIALS;
```

### Task 2.2: Create External Table with Cloud Storage Path
```sql
-- Create an external table at an explicit cloud storage path.
-- Note: The path must be covered by a Unity Catalog external location (backed by a
-- storage credential), and you need the CREATE EXTERNAL TABLE privilege on that location.

-- Create external table structure
CREATE TABLE IF NOT EXISTS hr_analytics_lab.hr_data.employees_external (
    employee_id INT NOT NULL,
    first_name STRING NOT NULL,
    last_name STRING NOT NULL,
    email STRING,
    phone STRING,
    hire_date DATE,
    birth_date DATE,
    salary DOUBLE,
    department STRING,
    job_title STRING,
    city STRING,
    is_active BOOLEAN
) 
USING DELTA
COMMENT "External table containing employee information"
LOCATION 's3://your-bucket/hr-data/employees/'  -- Replace with actual external location
TBLPROPERTIES (
    'department'='HR',
    'data_classification'='PII',
    'storage_type'='external'
);
```
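
An external table can only be created at a path that an external location covers, and an external location in turn pairs a cloud URL with a storage credential. The sketch below builds a `CREATE EXTERNAL LOCATION` statement and checks whether a table path falls under a location URL. The names `hr_lab_location` and `hr_lab_credential` are hypothetical, the bucket is the placeholder from the task above, and both helper functions are illustrations rather than Databricks APIs.

```python
def create_external_location_sql(name: str, url: str, credential: str) -> str:
    """Build a CREATE EXTERNAL LOCATION statement (usually run by an administrator)."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
        f"URL '{url}' "
        f"WITH (STORAGE CREDENTIAL `{credential}`)"
    )


def path_is_covered(location_url: str, table_path: str) -> bool:
    """True if table_path is the location URL itself or sits underneath it."""
    base = location_url.rstrip("/") + "/"
    return (table_path.rstrip("/") + "/").startswith(base)


location_url = "s3://your-bucket/hr-data"  # placeholder bucket from the task above
statement = create_external_location_sql("hr_lab_location", location_url, "hr_lab_credential")
print(statement)
print(path_is_covered(location_url, "s3://your-bucket/hr-data/employees/"))
print(path_is_covered(location_url, "s3://your-bucket/hr-data-archive/employees/"))

# In a notebook, with sufficient privileges: spark.sql(statement)
```

Comparing on a trailing slash avoids treating `hr-data-archive` as if it were inside `hr-data`, which a plain prefix match would get wrong.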

### Task 2.3: Benefits Analysis - External vs Managed Tables
```sql
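-- Illustrative comparison only: both branches read the managed table, so the
-- 'external' row is a placeholder until employees_external is accessible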
WITH table_comparison AS (
    SELECT 
        'managed' as table_type,
        COUNT(*) as record_count,
        'Unity Catalog managed storage' as location_type
    FROM hr_analytics_lab.hr_data.employees_managed
    
    UNION ALL
    
    SELECT 
        'external' as table_type,
        COUNT(*) as record_count,
        'External cloud storage' as location_type
    FROM hr_analytics_lab.hr_data.employees_managed  -- Using managed for demo since external might not be accessible
)
SELECT * FROM table_comparison;
```

### Questions for Exercise 2:
1. What are the main benefits of external locations in Unity Catalog?
2. When would you choose external tables over managed tables?
3. What security advantages do storage credentials provide?

---

## Exercise 3: Working with Different Types of Views

### Task 3.1: Create Standard (Stored) Views
```sql
-- Create a standard view for active employees
CREATE OR REPLACE VIEW hr_analytics_lab.analytics.active_employees_view
AS
SELECT 
    employee_id,
    CONCAT(first_name, ' ', last_name) as full_name,
    email,
    department,
    job_title,
    city,
    salary,
    hire_date,
    DATEDIFF(CURRENT_DATE(), hire_date) as days_employed
FROM hr_analytics_lab.hr_data.employees_managed
WHERE is_active = true;

-- Create a view for department statistics
CREATE OR REPLACE VIEW hr_analytics_lab.analytics.department_stats_view
AS
SELECT 
    department,
    COUNT(*) as employee_count,
    AVG(salary) as avg_salary,
    MIN(salary) as min_salary,
    MAX(salary) as max_salary,
    AVG(DATEDIFF(CURRENT_DATE(), hire_date)) as avg_tenure_days
FROM hr_analytics_lab.hr_data.employees_managed
WHERE is_active = true
GROUP BY department
ORDER BY avg_salary DESC;

-- Query the views
SELECT * FROM hr_analytics_lab.analytics.active_employees_view LIMIT 10;
SELECT * FROM hr_analytics_lab.analytics.department_stats_view;
```

### Task 3.2: Create Temporary Views
```sql
-- Create a temporary view (session-scoped)
CREATE OR REPLACE TEMPORARY VIEW temp_salary_analysis
AS
SELECT 
    department,
    city,
    COUNT(*) as employee_count,
    PERCENTILE_APPROX(salary, 0.5) as median_salary,
    PERCENTILE_APPROX(salary, 0.9) as p90_salary
FROM hr_analytics_lab.hr_data.employees_managed
WHERE is_active = true
GROUP BY department, city;

-- Query temporary view
SELECT * FROM temp_salary_analysis WHERE employee_count >= 5;
```

### Task 3.3: Create Global Temporary Views
```sql
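-- Create a global temporary view (shared across sessions on the same cluster and registered in the global_temp schema)
-- Note: the view name must be unqualified in CREATE; global temporary views may not be available on every Unity Catalog compute type
CREATE OR REPLACE GLOBAL TEMPORARY VIEW employee_performance_metrics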
AS
SELECT 
    employee_id,
    first_name,
    last_name,
    department,
    salary,
    CASE 
        WHEN salary >= 150000 THEN 'High Performer'
        WHEN salary >= 100000 THEN 'Mid Performer'
        ELSE 'Standard Performer'
    END as performance_tier,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as dept_salary_rank
FROM hr_analytics_lab.hr_data.employees_managed
WHERE is_active = true;

-- Query global temporary view
SELECT * FROM global_temp.employee_performance_metrics 
WHERE performance_tier = 'High Performer' 
LIMIT 20;
```

### Task 3.4: View Metadata and Management
```sql
-- Show all views in the analytics schema
SHOW VIEWS IN hr_analytics_lab.analytics;

-- Describe view definition
DESCRIBE EXTENDED hr_analytics_lab.analytics.active_employees_view;

-- List tables and views in the schema (SHOW TABLES includes views; it does not show dependencies)
SHOW TABLES IN hr_analytics_lab.analytics;
```

### Questions for Exercise 3:
1. What is the difference between temporary and global temporary views?
2. When would you use a stored view vs a temporary view?
3. How do views contribute to data governance in Unity Catalog?

---

## Exercise 4: CREATE TABLE AS SELECT (CTAS) Operations

### Task 4.1: Basic CTAS with Filtering
```sql
-- Create a table for high-salary employees using CTAS
CREATE OR REPLACE TABLE hr_analytics_lab.analytics.high_salary_employees
COMMENT "Employees with salary above 120K"
AS
SELECT 
    employee_id,
    first_name,
    last_name,
    email,
    department,
    job_title,
    salary,
    city,
    hire_date
FROM hr_analytics_lab.hr_data.employees_managed
WHERE salary > 120000 
  AND is_active = true;

-- Verify the created table
SELECT COUNT(*) as high_earners FROM hr_analytics_lab.analytics.high_salary_employees;
SELECT * FROM hr_analytics_lab.analytics.high_salary_employees LIMIT 10;
```

### Task 4.2: CTAS with Column Renaming and Transformation
```sql
-- Create a marketing-friendly employee contact table
CREATE OR REPLACE TABLE hr_analytics_lab.sales_data.employee_contacts
COMMENT "Employee contact information for internal directory"
AS
SELECT 
    employee_id as emp_id,
    CONCAT(first_name, ' ', last_name) as full_name,
    UPPER(email) as email_address,
    phone as contact_number,
    department as dept_name,
    CASE 
        WHEN job_title LIKE '%Manager%' OR job_title LIKE '%Director%' THEN 'Leadership'
        WHEN job_title LIKE '%Senior%' OR job_title LIKE '%Lead%' THEN 'Senior Level'
        ELSE 'Individual Contributor'
    END as employee_level,
    city as office_location
FROM hr_analytics_lab.hr_data.employees_managed
WHERE is_active = true;

-- Check the results
SELECT employee_level, COUNT(*) as count 
FROM hr_analytics_lab.sales_data.employee_contacts 
GROUP BY employee_level;
```

### Task 4.3: Advanced CTAS with Partitioning
```sql
-- Create a partitioned table with CTAS
CREATE OR REPLACE TABLE hr_analytics_lab.analytics.employees_by_location
COMMENT "Contains PII - Employee data partitioned by city and birth year"
PARTITIONED BY (city, birth_year)
AS
SELECT 
    employee_id as id,
    CONCAT(first_name, ' ', last_name) as name,
    email,
    birth_date,
    city,
    YEAR(birth_date) as birth_year,
    department,
    salary
FROM hr_analytics_lab.hr_data.employees_managed
WHERE is_active = true;

-- Verify partitioning
SHOW PARTITIONS hr_analytics_lab.analytics.employees_by_location;

-- Query specific partition
SELECT COUNT(*) 
FROM hr_analytics_lab.analytics.employees_by_location 
WHERE city = 'New York' AND birth_year = 1990;
```
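
Partitioning by `(city, birth_year)` creates one partition per distinct combination. With about 1,000 rows, eight cities, and birth years spread over roughly two decades, most partitions hold only a handful of rows. The sketch below estimates partition counts for a candidate key on plain Python rows; `partition_stats` is a hypothetical helper and the sample rows are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def partition_stats(rows: Iterable[Dict], keys: Sequence[str]) -> Tuple[int, float]:
    """Return (number of distinct partitions, average rows per partition)."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    total_rows = sum(counts.values())
    n_partitions = len(counts)
    average = total_rows / n_partitions if n_partitions else 0.0
    return n_partitions, average


rows = [
    {"city": "New York", "birth_year": 1990},
    {"city": "New York", "birth_year": 1990},
    {"city": "New York", "birth_year": 1985},
    {"city": "Austin", "birth_year": 1990},
]
print(partition_stats(rows, ["city", "birth_year"]))  # three partitions for four rows
print(partition_stats(rows, ["city"]))                # two partitions for four rows
```

On a live table, the same numbers can be obtained with a `GROUP BY` over the candidate partition columns. A very low average is a hint that the key is too fine-grained for the data volume.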

### Task 4.4: CTAS with Aggregations
```sql
-- Create department summary table using CTAS
CREATE OR REPLACE TABLE hr_analytics_lab.analytics.department_analytics
COMMENT "Comprehensive department-level analytics"
AS
SELECT 
    department,
    COUNT(*) as total_employees,
    AVG(salary) as avg_salary,
    MEDIAN(salary) as median_salary,
    MIN(salary) as min_salary,
    MAX(salary) as max_salary,
    STDDEV(salary) as salary_stddev,
    COUNT(CASE WHEN salary > 100000 THEN 1 END) as high_earners,
    AVG(DATEDIFF(CURRENT_DATE(), hire_date)) as avg_tenure_days,
    COUNT(DISTINCT city) as cities_represented,
    CURRENT_TIMESTAMP() as analysis_timestamp
FROM hr_analytics_lab.hr_data.employees_managed
WHERE is_active = true
GROUP BY department;

-- Query the analytics table
SELECT * FROM hr_analytics_lab.analytics.department_analytics
ORDER BY avg_salary DESC;
```

### Questions for Exercise 4:
1. How does CTAS differ from regular CREATE TABLE followed by INSERT?
2. What are the benefits of partitioning in the CTAS example?
3. When would you use CTAS vs creating views?

---

## Exercise 5: Delta Lake Constraints Implementation

### Task 5.1: Understanding Constraint Enforcement Types
```python
# First, review how each constraint type is enforced
print("=== Delta Lake Constraint Enforcement ===")
print("WHO: Delta Lake performs the checks")
print("WHEN: At write time (when data is written to the table)")
print("\nConstraint Types:")
print("- NOT NULL: Enforced by Delta Lake at write time")
print("- CHECK: Enforced by Delta Lake at write time")
print("- PRIMARY KEY: Informational only, not enforced (Unity Catalog)")
print("- FOREIGN KEY: Informational only, not enforced (Unity Catalog)")
print("- UNIQUE: Not available as a Databricks table constraint")
print("- DEFAULT: A column default rather than a constraint; requires the allowColumnDefaults table feature")
```

### Task 5.2: Create Table with NOT NULL Constraints
```sql
-- Create a table with NOT NULL constraints
DROP TABLE IF EXISTS hr_analytics_lab.hr_data.employees_with_constraints;

CREATE TABLE hr_analytics_lab.hr_data.employees_with_constraints (
    employee_id INT NOT NULL,
    first_name STRING NOT NULL,
    last_name STRING NOT NULL,
    email STRING NOT NULL,
    department STRING NOT NULL,
    salary DOUBLE NOT NULL,
    hire_date DATE NOT NULL,
    city STRING,
    is_active BOOLEAN NOT NULL DEFAULT true
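) 
USING DELTA
COMMENT "Employee table with NOT NULL constraints"
TBLPROPERTIES ('delta.feature.allowColumnDefaults' = 'supported');  -- needed for the DEFAULT on is_active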
```

### Task 5.3: Test NOT NULL Constraint Enforcement
```python
# Use an explicit schema so the DataFrame types match the table (INT, DOUBLE, DATE)
# and so a None value does not break schema inference.
from datetime import date

constraint_schema = (
    "employee_id INT, first_name STRING, last_name STRING, email STRING, "
    "department STRING, salary DOUBLE, hire_date DATE, city STRING, is_active BOOLEAN"
)
constraint_table = "hr_analytics_lab.hr_data.employees_with_constraints"

# Test NOT NULL constraint - this should succeed
valid_data = [(1, "John", "Doe", "john.doe@company.com", "Engineering", 75000.0, date(2023, 1, 15), "New York", True)]
valid_df = spark.createDataFrame(valid_data, constraint_schema)

# Insert valid data
valid_df.write.format("delta").mode("append").saveAsTable(constraint_table)
print("Valid data inserted successfully")

# Test NOT NULL constraint violation - this should fail
try:
    invalid_data = [(None, "Jane", "Smith", "jane.smith@company.com", "Sales", 65000.0, date(2023, 2, 1), "Chicago", True)]
    invalid_df = spark.createDataFrame(invalid_data, constraint_schema)

    invalid_df.write.format("delta").mode("append").saveAsTable(constraint_table)
    print("ERROR: Invalid data was inserted (this shouldn't happen)")
except Exception as e:
    print(f"NOT NULL constraint enforced successfully: {str(e)}")
```

### Task 5.4: Implement CHECK Constraints
```sql
-- Add CHECK constraints to existing table
ALTER TABLE hr_analytics_lab.hr_data.employees_with_constraints 
ADD CONSTRAINT salary_positive CHECK (salary > 0);

ALTER TABLE hr_analytics_lab.hr_data.employees_with_constraints 
ADD CONSTRAINT valid_email CHECK (email LIKE '%@%.%');

ALTER TABLE hr_analytics_lab.hr_data.employees_with_constraints 
ADD CONSTRAINT reasonable_salary CHECK (salary >= 30000 AND salary <= 500000);

-- Show table constraints
DESCRIBE EXTENDED hr_analytics_lab.hr_data.employees_with_constraints;
```
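
`ALTER TABLE ... ADD CONSTRAINT` validates the rows already in the table and fails if any of them violate the new predicate. The sketch below builds a query that counts such rows per constraint so they can be inspected first. `violation_count_sql` is a hypothetical helper; the predicates are copied from the task above. Rows where a predicate evaluates to NULL are counted as violations here, which is a conservative assumption.

```python
from typing import Dict


def violation_count_sql(table: str, checks: Dict[str, str]) -> str:
    """Build one SELECT that counts, per constraint, rows where the predicate is not true."""
    columns = ",\n".join(
        f"    SUM(CASE WHEN {expr} THEN 0 ELSE 1 END) AS {name}"
        for name, expr in checks.items()
    )
    return f"SELECT\n{columns}\nFROM {table}"


checks = {
    "salary_positive": "salary > 0",
    "valid_email": "email LIKE '%@%.%'",
    "reasonable_salary": "salary >= 30000 AND salary <= 500000",
}
query = violation_count_sql("hr_analytics_lab.hr_data.employees_with_constraints", checks)
print(query)

# In a notebook: spark.sql(query).show()
```

A non-zero count for any column means the matching `ADD CONSTRAINT` statement would be rejected until those rows are corrected or removed.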

### Task 5.5: Test CHECK Constraint Enforcement
```python
# Test CHECK constraint enforcement
from datetime import date

# Explicit schema so the DataFrame types match the table's INT, DOUBLE, and DATE columns;
# otherwise a write could fail on a type mismatch rather than on the constraint under test.
constraint_schema = (
    "employee_id INT, first_name STRING, last_name STRING, email STRING, "
    "department STRING, salary DOUBLE, hire_date DATE, city STRING, is_active BOOLEAN"
)
constraint_table = "hr_analytics_lab.hr_data.employees_with_constraints"

# Test 1: Negative salary (should fail)
try:
    negative_salary_data = [(2, "Bob", "Wilson", "bob.wilson@company.com", "Marketing", -50000.0, date(2023, 3, 1), "Austin", True)]
    negative_salary_df = spark.createDataFrame(negative_salary_data, constraint_schema)

    negative_salary_df.write.format("delta").mode("append").saveAsTable(constraint_table)
    print("ERROR: Negative salary was accepted")
except Exception as e:
    print(f"CHECK constraint enforced (negative salary rejected): {str(e)[:100]}...")

# Test 2: Invalid email format (should fail)
try:
    invalid_email_data = [(3, "Alice", "Johnson", "invalid-email", "HR", 80000.0, date(2023, 4, 1), "Seattle", True)]
    invalid_email_df = spark.createDataFrame(invalid_email_data, constraint_schema)

    invalid_email_df.write.format("delta").mode("append").saveAsTable(constraint_table)
    print("ERROR: Invalid email was accepted")
except Exception as e:
    print(f"CHECK constraint (valid_email) enforced: {str(e)[:100]}...")

# Test 3: Valid data (should succeed)
try:
    valid_data2 = [(4, "Sarah", "Brown", "sarah.brown@company.com", "Finance", 95000.0, date(2023, 5, 1), "Miami", True)]
    valid_df2 = spark.createDataFrame(valid_data2, constraint_schema)

    valid_df2.write.format("delta").mode("append").saveAsTable(constraint_table)
    print("Valid data inserted successfully")
except Exception as e:
    print(f"Unexpected error: {str(e)}")
```

### Task 5.6: Metadata-Only Constraints (Unity Catalog)
```sql
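-- Add informational constraints (recorded in Unity Catalog for documentation and governance; not enforced on write)
ALTER TABLE hr_analytics_lab.hr_data.employees_with_constraints 
ADD CONSTRAINT pk_employee_id PRIMARY KEY (employee_id);

-- Note: FOREIGN KEY constraints are added similarly and are also informational.
-- Neither is enforced at write time, so duplicate or orphaned keys are still accepted.

-- Show constraints: primary and foreign keys are listed by DESCRIBE TABLE EXTENDED;
-- CHECK constraints also appear in SHOW TBLPROPERTIES as delta.constraints.* entries
DESCRIBE TABLE EXTENDED hr_analytics_lab.hr_data.employees_with_constraints;
SHOW TBLPROPERTIES hr_analytics_lab.hr_data.employees_with_constraints;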
```

### Task 5.7: DEFAULT Constraint Example
```sql
-- Create a new table demonstrating DEFAULT constraints
CREATE OR REPLACE TABLE hr_analytics_lab.hr_data.employee_status (
    employee_id INT NOT NULL,
    status_date DATE NOT NULL,
    employment_status STRING NOT NULL DEFAULT 'Active',
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP(),
    updated_by STRING DEFAULT 'system'
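) 
USING DELTA
COMMENT "Employee status tracking with DEFAULT constraints"
TBLPROPERTIES ('delta.feature.allowColumnDefaults' = 'supported');  -- column defaults require this Delta table feature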

-- Insert data without specifying default columns
INSERT INTO hr_analytics_lab.hr_data.employee_status (employee_id, status_date)
VALUES (1, '2023-01-15'), (2, '2023-02-01');

-- Check the results
SELECT * FROM hr_analytics_lab.hr_data.employee_status;
```

### Questions for Exercise 5:
1. What is the difference between enforced constraints and metadata-only constraints?
2. Why are CHECK constraints enforced at write time by Delta Lake?
3. How do constraints contribute to data quality in a data lake?

---

## Exercise 6: Advanced Constraint Scenarios

### Task 6.1: Complex CHECK Constraints
```sql
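-- Create the table first, then add the business rules as CHECK constraints.
-- On Databricks, CHECK constraints are added with ALTER TABLE ... ADD CONSTRAINT
-- rather than inside the CREATE TABLE column list.
CREATE OR REPLACE TABLE hr_analytics_lab.hr_data.employee_compensation (
    employee_id INT NOT NULL,
    base_salary DOUBLE NOT NULL,
    bonus_percentage DOUBLE,
    commission_rate DOUBLE,
    total_compensation DOUBLE,
    effective_date DATE NOT NULL
) 
USING DELTA
COMMENT "Employee compensation with complex business rules";

ALTER TABLE hr_analytics_lab.hr_data.employee_compensation
ADD CONSTRAINT positive_base_salary CHECK (base_salary > 0);
ALTER TABLE hr_analytics_lab.hr_data.employee_compensation
ADD CONSTRAINT reasonable_bonus CHECK (bonus_percentage >= 0 AND bonus_percentage <= 100);
ALTER TABLE hr_analytics_lab.hr_data.employee_compensation
ADD CONSTRAINT reasonable_commission CHECK (commission_rate >= 0 AND commission_rate <= 50);
ALTER TABLE hr_analytics_lab.hr_data.employee_compensation
ADD CONSTRAINT logical_compensation CHECK (total_compensation >= base_salary);
ALTER TABLE hr_analytics_lab.hr_data.employee_compensation
ADD CONSTRAINT future_effective_date CHECK (effective_date >= '2020-01-01');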
```

### Task 6.2: Test Complex Constraints
```python
# Test complex constraint scenarios
from datetime import date

# Explicit schema so types match the table (INT, DOUBLE, DATE) and failures reflect the constraints
compensation_schema = (
    "employee_id INT, base_salary DOUBLE, bonus_percentage DOUBLE, "
    "commission_rate DOUBLE, total_compensation DOUBLE, effective_date DATE"
)

test_cases = [
    # Valid case
    (1, 80000.0, 10.0, 2.5, 90000.0, date(2023, 1, 1), "Should succeed"),
    # Invalid: total_compensation less than base_salary
    (2, 90000.0, 5.0, 1.0, 80000.0, date(2023, 1, 1), "Should fail - total < base"),
    # Invalid: bonus_percentage over 100
    (3, 70000.0, 150.0, 2.0, 175000.0, date(2023, 1, 1), "Should fail - bonus > 100%"),
    # Invalid: effective date before 2020-01-01
    (4, 60000.0, 8.0, 1.5, 65000.0, date(2019, 12, 31), "Should fail - date too old")
]

for case in test_cases:
    try:
        test_data = [case[:6]]  # First 6 elements are the data
        test_df = spark.createDataFrame(test_data, compensation_schema)
        
        test_df.write.format("delta").mode("append").saveAsTable("hr_analytics_lab.hr_data.employee_compensation")
        print(f"✅ Case {case[0]}: {case[6]} - WRITE ACCEPTED")
    except Exception as e:
        print(f"❌ Case {case[0]}: {case[6]} - WRITE REJECTED: {str(e)[:80]}...")
```
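
To reason about which rule a given row breaks before writing it, the five predicates from Task 6.1 can be mirrored in plain Python. This is only an emulation for study purposes: it does not handle NULL values and is not how Delta Lake evaluates constraints. `RULES` and `violated_rules` are hypothetical names introduced here.

```python
from datetime import date
from typing import Callable, Dict, List

# Python stand-ins for the five CHECK predicates defined in Task 6.1.
RULES: Dict[str, Callable[[dict], bool]] = {
    "positive_base_salary": lambda r: r["base_salary"] > 0,
    "reasonable_bonus": lambda r: 0 <= r["bonus_percentage"] <= 100,
    "reasonable_commission": lambda r: 0 <= r["commission_rate"] <= 50,
    "logical_compensation": lambda r: r["total_compensation"] >= r["base_salary"],
    "future_effective_date": lambda r: r["effective_date"] >= date(2020, 1, 1),
}


def violated_rules(row: dict) -> List[str]:
    """Return the names of every rule the row breaks (an empty list means the row is valid)."""
    return [name for name, rule in RULES.items() if not rule(row)]


valid_row = {
    "base_salary": 80000.0, "bonus_percentage": 10.0, "commission_rate": 2.5,
    "total_compensation": 90000.0, "effective_date": date(2023, 1, 1),
}
# This row breaks two rules at once: the bonus is over 100 and the total is below the base.
double_violation = dict(valid_row, base_salary=90000.0, bonus_percentage=150.0, total_compensation=80000.0)

print(violated_rules(valid_row))
print(violated_rules(double_violation))
```

The emulation lists every broken rule. What the Delta error message reports when several constraints fail at once is best confirmed by running such a row in your own workspace.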

### Questions for Exercise 6:
1. How do complex CHECK constraints help enforce business rules?
2. What happens when multiple constraints are violated simultaneously?
3. How would you design constraints for a real-world scenario?

---

## Exercise 7: Data Governance and Best Practices

### Task 7.1: Table Documentation and Metadata
```sql
-- Add comprehensive metadata to tables
ALTER TABLE hr_analytics_lab.hr_data.employees_managed 
SET TBLPROPERTIES (
    'data_steward' = 'HR Data Team',
    'data_classification' = 'PII',
    'retention_policy' = '7_years',
    'last_quality_check' = '2023-10-01',
    'quality_score' = '95'
);

-- Add column comments
ALTER TABLE hr_analytics_lab.hr_data.employees_managed 
ALTER COLUMN salary COMMENT "Annual salary in USD";

ALTER TABLE hr_analytics_lab.hr_data.employees_managed 
ALTER COLUMN email COMMENT "Primary business email address";
```
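
Properties such as `last_quality_check` go stale when they are typed by hand. The sketch below generates the `SET TBLPROPERTIES` statement from Python values so the date can come from the run itself. `set_tblproperties_sql` and `quality_props` are hypothetical helpers; the property names are the ones used above.

```python
from datetime import date
from typing import Dict


def set_tblproperties_sql(table: str, props: Dict[str, str]) -> str:
    """Build an ALTER TABLE ... SET TBLPROPERTIES statement from a dict."""
    for text in list(props) + list(props.values()):
        # Keep the sketch simple: refuse text that would need SQL escaping.
        if "'" in text or "\\" in text:
            raise ValueError(f"unsupported character in property text: {text!r}")
    pairs = ", ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({pairs})"


def quality_props(check_date: date, score: int) -> Dict[str, str]:
    """Table property values are strings, so convert the date and score explicitly."""
    return {"last_quality_check": check_date.isoformat(), "quality_score": str(score)}


statement = set_tblproperties_sql(
    "hr_analytics_lab.hr_data.employees_managed",
    quality_props(date(2023, 10, 1), 95),
)
print(statement)

# In a notebook: spark.sql(statement), typically passing date.today() as check_date
```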

### Task 7.2: Create Governance Dashboard
```sql
-- Create a view for data governance monitoring
CREATE OR REPLACE VIEW hr_analytics_lab.analytics.data_governance_dashboard
AS
SELECT 
    'hr_analytics_lab.hr_data.employees_managed' as table_name,
    COUNT(*) as total_records,
    COUNT(CASE WHEN email IS NULL THEN 1 END) as missing_emails,
    COUNT(CASE WHEN salary < 30000 OR salary > 500000 THEN 1 END) as salary_outliers,
    COUNT(CASE WHEN hire_date > CURRENT_DATE() THEN 1 END) as future_hire_dates,
    ROUND((COUNT(*) - COUNT(CASE WHEN email IS NULL THEN 1 END)) * 100.0 / COUNT(*), 2) as email_completeness_pct,
    CURRENT_TIMESTAMP() as last_checked
FROM hr_analytics_lab.hr_data.employees_managed;

-- Query the governance dashboard
SELECT * FROM hr_analytics_lab.analytics.data_governance_dashboard;
```
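
The `email_completeness_pct` column divides by `COUNT(*)`, so the result for an empty table depends on how the session treats division by zero. The sketch below restates the same formula in Python with an explicit guard; `completeness_pct` is a hypothetical helper. In SQL, wrapping the divisor as `NULLIF(COUNT(*), 0)` is one way to get a comparable guard.

```python
from typing import Optional


def completeness_pct(total: int, missing: int) -> Optional[float]:
    """Share of non-missing values as a percentage, or None when there are no rows."""
    if total == 0:
        return None
    return round((total - missing) * 100.0 / total, 2)


print(completeness_pct(1000, 25))  # 97.5
print(completeness_pct(0, 0))      # None
```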

### Task 7.3: Cleanup and Security
```sql
-- Show all objects created in this lab
SHOW TABLES IN hr_analytics_lab.hr_data;
SHOW TABLES IN hr_analytics_lab.analytics;
SHOW VIEWS IN hr_analytics_lab.analytics;

-- Example of dropping sensitive views (if needed)
-- DROP VIEW IF EXISTS hr_analytics_lab.analytics.active_employees_view;
```

### Questions for Exercise 7:
1. How do table properties contribute to data governance?
2. What metadata should you track for production tables?
3. How do Unity Catalog constraints support compliance requirements?

---

## Summary and Review

### Key Concepts Covered:
1. **Managed vs External Tables**: Understanding Unity Catalog storage models
2. **External Locations & Storage Credentials**: Security and governance for external data
3. **View Types**: Standard, temporary, and global temporary views
4. **CTAS Operations**: Creating tables with filtering, renaming, and partitioning
5. **Delta Constraints**: NOT NULL and CHECK (enforced at write time); PRIMARY KEY and FOREIGN KEY (informational only)
6. **Data Governance**: Metadata management and quality monitoring

### Lab Cleanup (Optional)
```sql
-- Clean up lab resources if needed
-- DROP CATALOG hr_analytics_lab CASCADE;
```

### Next Steps
- Practice creating external locations with proper storage credentials
- Experiment with more complex CHECK constraints
- Explore Unity Catalog security features (grants, row-level security)
- Study partition strategies for large datasets
- Learn about Delta Lake advanced features (CDC, liquid clustering)