# Governing Data with Unity Catalog

### **What Is Unity Catalog?**

* An **open-source, centralized governance** solution across all workspaces and clouds.
* Manages files, tables, ML models, and dashboards uniformly.
* Ensures consistent **access control** and simplified **data lifecycle management**.

[Learn More](https://docs.databricks.com/aws/en/data-governance/unity-catalog/)

### **Unity Catalog Architecture**

#### **Before UC**

* Governance was isolated per workspace.
* Identity, metastore, and access control required separate setup.
* Led to **redundancy and inconsistencies**.

#### **With UC**

* Centralized management via **account console**.
* Decoupled from individual workspaces.
* Improves **consistency, security, and operational efficiency**.

### **Key Architectural Advancements**

1. **Centralized Identity Management**

   * Users and groups managed via account console.
   * Reused across multiple workspaces.

2. **Centralized Metastore**

   * Single UC metastore per region can serve multiple workspaces.
   * Reduces data duplication and enhances data sharing.

3. **Centralized Access Control**

   * Access policies defined once and enforced across all workspaces.
   * Promotes consistency and reduces risk.


### **Three-Level Namespace**

* Transition from Hive’s `schema.table` to UC’s `catalog.schema.table`.
* Enables **granular organization** and **better scalability**.
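
As a minimal sketch of how the three-level name resolves, the statements below reference the same table first by its fully qualified name and then by a short name after setting session defaults. It assumes the `sales_catalog.retail_db.fact_sales` table created later in this notebook; `hive_metastore.default.legacy_sales` is a hypothetical legacy table used only for illustration.

```sql
-- Fully qualified reference: catalog.schema.table
SELECT * FROM sales_catalog.retail_db.fact_sales;

-- Set session defaults so that shorter names resolve to the same table
USE CATALOG sales_catalog;
USE SCHEMA retail_db;
SELECT * FROM fact_sales;

-- Check which catalog and schema unqualified names currently resolve against
SELECT current_catalog(), current_schema();

-- Legacy Hive metastore tables are addressed through the hive_metastore catalog
-- (legacy_sales is a hypothetical table name)
SELECT * FROM hive_metastore.default.legacy_sales;
```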

### **Unity Catalog Object Model**

1. **Metastore**

   * Top-level container for catalogs and access policies.
   * Managed independently of workspaces.

1. **Catalogs**

   * Group of schemas; 1st level in namespace.

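1. **Schemas (Databases)**

   * Contain related tables, views, ML models, etc.
   * 2nd level in namespace.

1. **Tables, Views, Volumes, Functions, and Models**

   * Data and AI objects that live inside a schema.
   * 3rd level in namespace.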



1. **Storage Access Objects**

   * **Storage Credentials**: Abstract cloud credentials.
   * **External Locations**: Map credentials to storage paths.

1. **Delta Sharing Entities**

   * **Shares**: Collections of shareable data assets.
   * Facilitate data exchange with external consumers.

![The Unity Catalog object model](https://docs.databricks.com/aws/en/assets/images/object-model-40d730065eefed283b936a8664f1b247.png)
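
The object model can be explored and extended from SQL. The sketch below lists each kind of object and then creates one external location and one share. The names `sales_landing`, `sales_credential`, and `sales_share`, and the storage URL, are hypothetical placeholders; the storage credential is assumed to exist already, and the caller is assumed to hold the required metastore-level privileges.

```sql
-- Metastore currently attached to this workspace
SELECT current_metastore();

-- Walk the hierarchy: catalogs -> schemas -> tables
SHOW CATALOGS;
SHOW SCHEMAS IN sales_catalog;
SHOW TABLES IN sales_catalog.retail_db;

-- Storage access objects
SHOW STORAGE CREDENTIALS;
SHOW EXTERNAL LOCATIONS;

-- Delta Sharing entities
SHOW SHARES;

-- Map an existing storage credential to a storage path
-- (location name, credential name, and URL are hypothetical)
CREATE EXTERNAL LOCATION IF NOT EXISTS sales_landing
URL 'abfss://landing@<storage-account>.dfs.core.windows.net/sales'
WITH (STORAGE CREDENTIAL sales_credential)
COMMENT 'Landing zone for raw sales files';

-- Create a share and add a table to it (share name is hypothetical)
CREATE SHARE IF NOT EXISTS sales_share
COMMENT 'Sales data shared with external consumers';
ALTER SHARE sales_share ADD TABLE sales_catalog.retail_db.fact_sales;
```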

### **Identity Management**

* **Users**: Human users; identified by email.
* **Service Principals**: App/tool identities; support automation.
* **Groups**: Logical units combining users/service principals.

  * Groups can be **nested** for better role delegation.
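
To see how these identities surface in SQL, the short sketch below checks who the current session runs as and whether that identity belongs to a group. It assumes an account-level group named `retail_analysts`, the group name used in the grant examples later in this notebook.

```sql
-- Identity the current session runs as
SELECT current_user();

-- TRUE if the current identity is a member of the account-level group
SELECT is_account_group_member('retail_analysts');

-- List the groups visible in this workspace
SHOW GROUPS;
```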

### **Identity Federation**

* Identities defined at **account level**.
* Federated to multiple workspaces.
* Reduces duplication and enhances **security + administrative efficiency**.


### **UC Security Model**

* Based on **ANSI SQL GRANT** statements.

#### **Privileges**

* **Core Privileges:**

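  * `CREATE CATALOG`, `CREATE SCHEMA`, `CREATE TABLE`, `CREATE FUNCTION`: Create the named object type inside its parent (metastore, catalog, or schema).
  * `USE CATALOG`, `USE SCHEMA`: Required to reach objects inside a catalog or schema; they do not grant access to the data by themselves.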
  * `SELECT`: Read data.
  * `MODIFY`: Insert, update, delete.

* **Storage Privileges:**

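  * `READ FILES`: Read files directly from an external location (volumes use `READ VOLUME`).
  * `WRITE FILES`: Write files directly to an external location (volumes use `WRITE VOLUME`).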

* **Execution Privilege:**

  * `EXECUTE`: Run functions or models.
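
The privilege categories above map onto `GRANT` statements roughly as sketched below. The group `retail_analysts` and the catalog, schema, and table names come from the examples later in this notebook; the external location `sales_landing` and the function `total_revenue` are hypothetical and are assumed to exist before the grants run.

```sql
-- Core privileges: reach the catalog and schema, create tables, read and modify data
GRANT USE CATALOG ON CATALOG sales_catalog TO `retail_analysts`;
GRANT USE SCHEMA, CREATE TABLE ON SCHEMA sales_catalog.retail_db TO `retail_analysts`;
GRANT SELECT, MODIFY ON TABLE sales_catalog.retail_db.fact_sales TO `retail_analysts`;

-- Storage privilege on an external location (sales_landing is hypothetical)
GRANT READ FILES ON EXTERNAL LOCATION sales_landing TO `retail_analysts`;

-- Execution privilege on a function (total_revenue is hypothetical)
GRANT EXECUTE ON FUNCTION sales_catalog.retail_db.total_revenue TO `retail_analysts`;

-- Review what has been granted on the schema
SHOW GRANTS ON SCHEMA sales_catalog.retail_db;
```

Granting to a group rather than to individual users keeps the policy stable as people join or leave a team.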

### Create Catalog

In [0]:
-- Create a catalog named 'sales_catalog'
DROP CATALOG IF EXISTS sales_catalog CASCADE;
CREATE CATALOG sales_catalog
MANAGED LOCATION 'abfss://catalog@storage33e.dfs.core.windows.net/sales_db'
COMMENT 'Catalog to manage sales-related data';
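
A quick way to confirm the result is to inspect the catalog metadata, as in the sketch below. Note that `MANAGED LOCATION` is expected to point at a path covered by an external location the caller is allowed to use, so the `CREATE CATALOG` statement above may fail in a workspace where that path has not been registered.

```sql
-- Show the catalog's owner, comment, and storage details
DESCRIBE CATALOG EXTENDED sales_catalog;

-- Confirm the catalog is visible to the current user
SHOW CATALOGS LIKE 'sales*';
```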

### Create Schema (Database)

In [0]:
-- Create a schema under the catalog
-- CASCADE also drops any tables left in the schema, so this cell can be re-run on its own
DROP SCHEMA IF EXISTS sales_catalog.retail_db CASCADE;
CREATE SCHEMA sales_catalog.retail_db
COMMENT 'Schema for retail sales data';

### Create Table with Sample Data

In [0]:
-- Create a table in the schema
DROP TABLE IF EXISTS sales_catalog.retail_db.fact_sales;
CREATE TABLE sales_catalog.retail_db.fact_sales (
  sale_id INT,
  product_name STRING,
  quantity INT,
  unit_price DOUBLE,
  sale_date DATE
);

-- Insert sample data
INSERT INTO sales_catalog.retail_db.fact_sales VALUES
(1, 'Mobile', 2, 19999.0, '2024-01-01'),
(2, 'Laptop', 1, 49999.0, '2024-01-02'),
(3, 'Tablet', 3, 14999.0, '2024-01-03');
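
To check what was created, the table's metadata and write history can be inspected as sketched below. This assumes the table was created as a Delta table, which is the default for managed tables in Unity Catalog.

```sql
-- Column definitions plus owner, table type, and storage location
DESCRIBE TABLE EXTENDED sales_catalog.retail_db.fact_sales;

-- Delta transaction history: the CREATE TABLE and INSERT operations appear here
DESCRIBE HISTORY sales_catalog.retail_db.fact_sales;
```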


### Query the Data

In [0]:
-- Query sample data
SELECT * FROM sales_catalog.retail_db.fact_sales;
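
Beyond `SELECT *`, a derived column shows the sample data in use. The sketch below computes revenue per sale, where `revenue` is an alias introduced here for illustration.

```sql
-- Revenue per sale, highest first
SELECT
  product_name,
  quantity,
  unit_price,
  quantity * unit_price AS revenue
FROM sales_catalog.retail_db.fact_sales
ORDER BY revenue DESC;
-- With the three sample rows inserted earlier, the order is
-- Laptop (49999), Tablet (44997), Mobile (39998)
```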

### Grant Object-Level Permissions

In [0]:
-- Grant USE CATALOG on the catalog (Unity Catalog's name for the legacy USAGE privilege)
GRANT USE CATALOG ON CATALOG sales_catalog TO `user1@pankajacksgmail.onmicrosoft.com`;

In [0]:
-- Grant USE SCHEMA on the schema (Unity Catalog's name for the legacy USAGE privilege)
GRANT USE SCHEMA ON SCHEMA sales_catalog.retail_db TO `retail_analysts`;

In [0]:
-- Grant SELECT on the table
GRANT SELECT ON TABLE sales_catalog.retail_db.fact_sales TO `user1@pankajacksgmail.onmicrosoft.com`;
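
Reading a table requires privileges at all three levels: catalog, schema, and table. In the grants above, the schema-level privilege goes to the `retail_analysts` group, so the user can query the table only if they belong to that group or hold the schema privilege some other way. The sketch below shows one way to verify what has been granted.

```sql
-- All grants on the table
SHOW GRANTS ON TABLE sales_catalog.retail_db.fact_sales;

-- Grants held by one principal on the table
SHOW GRANTS `user1@pankajacksgmail.onmicrosoft.com` ON TABLE sales_catalog.retail_db.fact_sales;

-- Grants held by the group on the schema
SHOW GRANTS `retail_analysts` ON SCHEMA sales_catalog.retail_db;
```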

### Deny or Revoke Access

In [0]:
-- Revoking SELECT privileges on the table from the user
REVOKE SELECT ON
TABLE sales_catalog.retail_db.fact_sales
FROM `user1@pankajacksgmail.onmicrosoft.com`;

In [0]:
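-- DENY comes from the legacy Hive metastore table ACL model and is not supported
-- for Unity Catalog objects, where anything not explicitly granted is already denied.
-- The statement is kept commented out for reference; use REVOKE (above) instead.
-- DENY SELECT ON
-- TABLE sales_catalog.retail_db.fact_sales
-- TO `user1@pankajacksgmail.onmicrosoft.com`;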
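
When the goal is to withhold only part of a table rather than all of it, one option is a dynamic view that masks columns based on group membership. The following is a sketch under assumptions, not a prescribed pattern: the view name `fact_sales_restricted` is hypothetical, and the choice to hide `unit_price` from anyone outside `retail_analysts` is purely illustrative.

```sql
-- Expose every column, but return unit_price only to members of retail_analysts
CREATE OR REPLACE VIEW sales_catalog.retail_db.fact_sales_restricted AS
SELECT
  sale_id,
  product_name,
  quantity,
  CASE
    WHEN is_account_group_member('retail_analysts') THEN unit_price
    ELSE NULL
  END AS unit_price,
  sale_date
FROM sales_catalog.retail_db.fact_sales;

-- Grant access to the view instead of the underlying table
GRANT SELECT ON VIEW sales_catalog.retail_db.fact_sales_restricted
TO `user1@pankajacksgmail.onmicrosoft.com`;
```

The user still needs the catalog and schema privileges to reach the view, but no longer needs `SELECT` on the base table.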