# Monitor and Govern Databricks Workspaces

Use system tables to monitor usage and costs, and implement governance with Unity Catalog.

## What You'll Learn

✅ Query system tables for observability  
✅ Analyze billing and cost allocation  
✅ Monitor workspace usage and performance  
✅ Implement Unity Catalog security  
✅ Create governance dashboards  

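**Note**: Because students typically won't have access to the real system tables, we'll use synthetic data modeled on their schemas. The synthetic tables are simplified: for example, the real `system.billing.usage` table has no `list_price` column, and prices come from `system.billing.list_prices`.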

---

**References:**
- [System Tables](https://docs.databricks.com/aws/en/admin/system-tables/)
- [Billing Tables](https://docs.databricks.com/aws/en/admin/system-tables/billing)
- [Unity Catalog Governance](https://docs.databricks.com/aws/en/data-governance/unity-catalog/)
- [Observability Dashboards](https://github.com/CodyAustinDavis/dbsql_sme/tree/main/Observability%20Dashboards%20and%20DBA%20Resources)

## 1. System Tables Overview

### What are System Tables?

**System Tables** provide observability into:
- Billing and usage
- Query execution
- Warehouse performance
- Audit logs
- Lineage information

### Available Schemas

```
system.billing.*        - Cost and usage data (usage, list_prices)
system.compute.*        - Cluster and warehouse metadata and metrics
system.query.*          - Query execution history (system.query.history)
system.access.*         - Audit logs and data lineage (audit, table_lineage, column_lineage)
```

### Access Requirements

**In Production:**
- An account admin to enable the system schemas
- Unity Catalog enabled on the workspace
- `USE` and `SELECT` privileges on the system schemas for anyone who queries them

**In This Training:**
- We'll use synthetic data modeled on the real schemas
- The queries follow real-world patterns

---

## 2. Billing and Cost Analysis

### Synthetic Billing Data Setup

```python
# Create synthetic billing data for training
from pyspark.sql import functions as F
from datetime import datetime, timedelta
import random

# Generate sample billing records covering the last 30 days
dates = [(datetime.now() - timedelta(days=x)).strftime('%Y-%m-%d') for x in range(30)]
workspaces = ['prod-workspace', 'dev-workspace', 'staging-workspace']
sku_names = ['JOBS_COMPUTE', 'ALL_PURPOSE_COMPUTE', 'SQL_COMPUTE', 'DELTA_LIVE_TABLES']
users = ['user1@company.com', 'user2@company.com', 'user3@company.com', 'system']

billing_data = []
for date in dates:
    for _ in range(50):
        billing_data.append({
            'usage_date': date,
            'workspace_id': random.choice(workspaces),
            'sku_name': random.choice(sku_names),
            'usage_quantity': round(random.uniform(0.1, 10.0), 2),
            'usage_unit': 'DBU',
            'list_price': round(random.uniform(0.1, 2.0), 2),
            # A nested dict is inferred as a MAP<STRING, STRING> column
            'usage_metadata': {
                'job_id': f'job_{random.randint(1, 100)}',
                'cluster_id': f'cluster_{random.randint(1, 20)}',
                'user': random.choice(users)
            }
        })

# Make sure the target schema exists in the current catalog
spark.sql('CREATE SCHEMA IF NOT EXISTS training')

# Cast usage_date from string to DATE so later date filters compare dates, not strings
billing_df = (
    spark.createDataFrame(billing_data)
    .withColumn('usage_date', F.to_date('usage_date'))
)
# overwriteSchema lets a rerun replace a table created earlier with a different column type
billing_df.write.mode('overwrite').option('overwriteSchema', 'true').saveAsTable('training.system_billing')

print('✅ Synthetic billing data created')
billing_df.limit(10).display()
```

### Cost Analysis Queries

**Total Cost by Day:**
```sql
SELECT 
  usage_date,
  SUM(usage_quantity * list_price) as total_cost
FROM training.system_billing
GROUP BY usage_date
ORDER BY usage_date DESC;
```

**Cost by Workspace:**
```sql
SELECT 
  workspace_id,
  SUM(usage_quantity * list_price) as total_cost,
  COUNT(*) as num_operations
FROM training.system_billing
WHERE usage_date >= CURRENT_DATE - 30
GROUP BY workspace_id
ORDER BY total_cost DESC;
```

**Cost by SKU Type:**
```sql
SELECT 
  sku_name,
  SUM(usage_quantity) as total_dbus,
  SUM(usage_quantity * list_price) as total_cost,
  AVG(usage_quantity * list_price) as avg_cost_per_operation
FROM training.system_billing
WHERE usage_date >= CURRENT_DATE - 30
GROUP BY sku_name
ORDER BY total_cost DESC;
```

**Cost by User:**
```sql
SELECT 
  usage_metadata.user,
  COUNT(*) as operations,
  SUM(usage_quantity * list_price) as total_cost
FROM training.system_billing
WHERE usage_date >= CURRENT_DATE - 30
GROUP BY usage_metadata.user
ORDER BY total_cost DESC
LIMIT 10;
```
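
To make the arithmetic behind these queries concrete, here is a minimal plain-Python sketch of the same roll-up: each record costs `usage_quantity * list_price`, and the costs are summed per group. The three sample records and the `cost_by` helper are made up for illustration; in the notebook the SQL above does this work.

```python
from collections import defaultdict

# Hypothetical sample records shaped like rows of training.system_billing
records = [
    {"workspace_id": "prod-workspace", "sku_name": "JOBS_COMPUTE", "usage_quantity": 4.0, "list_price": 0.5},
    {"workspace_id": "prod-workspace", "sku_name": "SQL_COMPUTE", "usage_quantity": 2.0, "list_price": 1.5},
    {"workspace_id": "dev-workspace", "sku_name": "JOBS_COMPUTE", "usage_quantity": 10.0, "list_price": 0.25},
]

def cost_by(rows, key):
    """Sum usage_quantity * list_price per distinct value of `key`, highest cost first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["usage_quantity"] * row["list_price"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

by_workspace = cost_by(records, "workspace_id")
by_sku = cost_by(records, "sku_name")
print(by_workspace)  # [('prod-workspace', 5.0), ('dev-workspace', 2.5)]
print(by_sku)        # [('JOBS_COMPUTE', 4.5), ('SQL_COMPUTE', 3.0)]
```

Passing a different `key` is the Python counterpart of changing the `GROUP BY` column in the SQL versions.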

---

## 3. Usage Monitoring

### Warehouse Usage
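
The queries in this section read from `training.query_history`, which the billing setup does not create. The sketch below shows one way to generate matching synthetic records. The column names are taken from the queries in this section; the value ranges, the sample query text, and the flat per-second cost model are assumptions made up for the exercise.

```python
import random
from datetime import datetime, timedelta

def make_query_history(num_rows=200, days=7, seed=42):
    """Build synthetic query-history records with the columns this section's queries expect."""
    rng = random.Random(seed)  # seeded so reruns draw the same random values
    tables = ["training.system_billing", "training.query_history"]
    now = datetime.now()
    rows = []
    for i in range(num_rows):
        table = rng.choice(tables)
        duration_ms = rng.randint(50, 60_000)
        rows.append({
            "query_id": f"query_{i:05d}",
            "query_text": f"SELECT * FROM {table} LIMIT 100",
            "query_start_time": now - timedelta(minutes=rng.randint(0, days * 24 * 60)),
            "execution_time_ms": duration_ms,
            "rows_produced": rng.randint(0, 100_000),
            "bytes_scanned": rng.randint(1_000, 10_000_000_000),
            # Made-up cost model: a flat rate per second of runtime
            "compute_cost": round(duration_ms / 1000 * 0.002, 4),
        })
    return rows

query_history = make_query_history()
print(query_history[0])

# In a Databricks notebook, where `spark` is predefined, the records could be saved with:
# spark.createDataFrame(query_history).write.mode("overwrite").saveAsTable("training.query_history")
```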

**Query Execution Metrics:**
```sql
-- Most expensive queries
SELECT 
  query_id,
  query_text,
  execution_time_ms,
  rows_produced,
  bytes_scanned,
  compute_cost
FROM training.query_history
WHERE query_start_time >= CURRENT_DATE - 7
ORDER BY compute_cost DESC
LIMIT 20;
```

**Query Patterns:**
```sql
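-- Most frequently queried tables (first table named after FROM)
-- Backslashes are doubled because Spark SQL string literals treat \ as an escape character
SELECT 
  REGEXP_EXTRACT(query_text, '(?i)FROM\\s+([\\w.]+)', 1) as table_name,
  COUNT(*) as query_count,
  AVG(execution_time_ms) as avg_duration_ms
FROM training.query_history
GROUP BY table_name
-- REGEXP_EXTRACT returns an empty string, not NULL, when nothing matches
HAVING table_name <> ''
ORDER BY query_count DESC;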
```
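
The extraction pattern is easy to try outside Spark. This sketch uses Python's `re` module with a pattern that also accepts dotted names such as `schema.table`. The `first_table` helper is hypothetical, and it returns `None` for a miss, whereas Spark's `REGEXP_EXTRACT` returns an empty string.

```python
import re

# Capture the first identifier after FROM; [\w.]+ also accepts dotted names,
# and re.IGNORECASE accepts a lowercase "from".
TABLE_PATTERN = re.compile(r"FROM\s+([\w.]+)", re.IGNORECASE)

def first_table(query_text):
    """Return the first table name after FROM, or None when there is none."""
    match = TABLE_PATTERN.search(query_text)
    return match.group(1) if match else None

print(first_table("SELECT * FROM training.system_billing WHERE usage_quantity > 1"))
# training.system_billing
print(first_table("select 1"))
# None
```

A regex like this only sees the first `FROM` target, so joined tables and subqueries are undercounted. Treat the result as a rough usage signal rather than exact lineage.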

---

## 4. Unity Catalog Security

### Access Control

**Grant Privileges:**
```sql
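-- Grant access to the catalog and schema (required before objects inside them can be used)
GRANT USE CATALOG ON CATALOG default TO `data-analysts`;
GRANT USE SCHEMA ON SCHEMA default.db_crash_course TO `data-analysts`;

-- Grant SELECT on every current and future table in the schema
GRANT SELECT ON SCHEMA default.db_crash_course TO `data-analysts`;

-- Or grant SELECT on a single table
GRANT SELECT ON TABLE default.db_crash_course.sensor_enriched TO `data-analysts`;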
```

**Row-Level Security:**
```sql
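-- Create row filter: returns TRUE for rows the current user may see
-- (user_permissions is assumed to map user_email to permitted regions)
CREATE FUNCTION default.db_crash_course.filter_by_region(region_value STRING)
RETURN region_value IN (
  SELECT p.region FROM user_permissions p
  WHERE p.user_email = current_user()
);

-- Apply filter, passing the table's region column to the function
ALTER TABLE default.db_crash_course.sensor_enriched
SET ROW FILTER default.db_crash_course.filter_by_region ON (region);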
```
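
The effect of a row filter can be pictured as a predicate evaluated once per row for the querying user. The plain-Python sketch below mimics that behavior. The permission mapping, the sample rows, and the `visible_rows` helper are all hypothetical and stand in for the `user_permissions` table and `current_user()`.

```python
# Hypothetical permission mapping: user email -> regions that user may see
user_permissions = {
    "user1@company.com": {"EMEA"},
    "user2@company.com": {"EMEA", "APAC"},
}

# Hypothetical table rows
rows = [
    {"device_id": "dev-0001", "region": "EMEA"},
    {"device_id": "dev-0002", "region": "APAC"},
    {"device_id": "dev-0003", "region": "AMER"},
]

def visible_rows(rows, current_user):
    """Keep only rows whose region is permitted for current_user (none if the user is unknown)."""
    allowed = user_permissions.get(current_user, set())
    return [row for row in rows if row["region"] in allowed]

print([r["device_id"] for r in visible_rows(rows, "user1@company.com")])  # ['dev-0001']
print([r["device_id"] for r in visible_rows(rows, "user2@company.com")])  # ['dev-0001', 'dev-0002']
```

A user with no entry in the mapping sees no rows, which matches the deny-by-default behavior of the filter function above.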

**Column Masking:**
```sql
-- Mask sensitive columns
CREATE FUNCTION default.db_crash_course.mask_device_id(device_id STRING)
RETURN CASE 
  -- is_account_group_member checks account-level groups, which Unity Catalog uses
  WHEN is_account_group_member('admin') THEN device_id
  ELSE CONCAT('***', SUBSTRING(device_id, -4, 4))
END;

-- Apply mask
ALTER TABLE default.db_crash_course.sensor_enriched
ALTER COLUMN device_id
SET MASK default.db_crash_course.mask_device_id;
```
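
The masking rule itself is small enough to check by hand. Here is a minimal Python sketch of the same logic, with a hypothetical `is_admin` flag standing in for the group-membership check.

```python
def mask_device_id(device_id, is_admin):
    """Return the full ID for admins; otherwise '***' plus the last four characters."""
    if device_id is None:
        return None  # mirror SQL, where CONCAT with a NULL argument yields NULL
    if is_admin:
        return device_id
    return "***" + device_id[-4:]

print(mask_device_id("sensor-12345678", is_admin=True))   # sensor-12345678
print(mask_device_id("sensor-12345678", is_admin=False))  # ***5678
```

In this sketch, an ID of four characters or fewer is shown in full after the `***` prefix, so a mask of this shape suits only identifiers that are reliably longer than the visible suffix.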

### Audit Logging

**Track Data Access:**
```sql
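-- Unity Catalog table reads and deletions in the last 7 days.
-- Audit rows record API actions (for example getTable), not SQL verbs;
-- adjust the action list to the events you want to track.
SELECT 
  event_time,
  user_identity.email as user,
  request_params.full_name_arg as table_accessed,
  action_name
FROM system.access.audit
WHERE service_name = 'unityCatalog'
  AND action_name IN ('getTable', 'deleteTable')
  AND event_date >= CURRENT_DATE - 7
ORDER BY event_time DESC;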
```
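
Raw audit rows are usually summarized before review. As a sketch of that step, the plain-Python example below counts accesses per user and table. The sample events are invented and keep only the `user` and `table_accessed` fields from the query above.

```python
from collections import Counter

# Hypothetical audit events, reduced to two of the fields the query above selects
events = [
    {"user": "user1@company.com", "table_accessed": "default.db_crash_course.sensor_enriched"},
    {"user": "user2@company.com", "table_accessed": "default.db_crash_course.sensor_enriched"},
    {"user": "user1@company.com", "table_accessed": "default.db_crash_course.sensor_enriched"},
]

# Count how often each (user, table) pair appears
access_counts = Counter((e["user"], e["table_accessed"]) for e in events)

for (user, table), count in access_counts.most_common():
    print(f"{user} accessed {table} {count} time(s)")
```

The same summary could be produced in SQL by grouping the audit query on user and table; the Python version only illustrates the shape of the result.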

---

## 5. Observability Dashboards

### Create Cost Dashboard

**Key Visualizations:**

1. **Daily Cost Trend** (Line Chart)
```sql
SELECT usage_date, SUM(usage_quantity * list_price) as cost
FROM training.system_billing
GROUP BY usage_date
ORDER BY usage_date;
```

2. **Cost by Workspace** (Bar Chart)
```sql
SELECT workspace_id, SUM(usage_quantity * list_price) as cost
FROM training.system_billing
WHERE usage_date >= CURRENT_DATE - 30
GROUP BY workspace_id;
```

3. **Top Cost Drivers** (Table)
```sql
SELECT 
  usage_metadata.job_id,
  sku_name,
  SUM(usage_quantity * list_price) as total_cost
FROM training.system_billing
WHERE usage_date >= CURRENT_DATE - 7
GROUP BY usage_metadata.job_id, sku_name
ORDER BY total_cost DESC
LIMIT 20;
```

4. **User Cost Allocation** (Pie Chart)
```sql
SELECT 
  usage_metadata.user,
  SUM(usage_quantity * list_price) as cost
FROM training.system_billing
WHERE usage_date >= CURRENT_DATE - 30
GROUP BY usage_metadata.user;
```

---

## Summary

✅ **System tables** - Observability into usage and costs  
✅ **Billing analysis** - Track and allocate costs  
✅ **Usage monitoring** - Query patterns and performance  
✅ **Unity Catalog security** - Row filters and column masking  
✅ **Governance dashboards** - Visual cost tracking  

### Key Takeaways:

1. **Monitor costs regularly** - Daily/weekly review
2. **Implement cost allocation** - Tag and track by team/project
3. **Use row-level security** - Protect sensitive data
4. **Audit access** - Track who accesses what
5. **Create dashboards** - Visualize key metrics

### Cost Optimization Tips:

- Use job clusters instead of all-purpose clusters for scheduled workloads
- Enable auto-termination
- Right-size clusters
- Use spot instances where possible
- Schedule non-urgent jobs for off-peak hours
- Archive old data to cheaper storage

---

**Additional Resources:**
- [System Tables Guide](https://docs.databricks.com/aws/en/admin/system-tables/)
- [Unity Catalog Security](https://docs.databricks.com/aws/en/data-governance/unity-catalog/access-control)
- [Observability Examples](https://github.com/CodyAustinDavis/dbsql_sme/tree/main/Observability%20Dashboards%20and%20DBA%20Resources)
- [Cost Management](https://docs.databricks.com/aws/en/admin/account-settings/usage-detail-tags)