# Introduction to Unity Catalog & Data Governance

**Objective:** Understand the concept of "Unified Governance" in Databricks using Unity Catalog. We will learn about the object hierarchy (Metastore, Catalog, Schema, Table) and the importance of the 3-level namespace.

---

## 1. What is Data Governance?

Data Governance ensures your data is:
1.  **Secure:** Only authorized users can access it.
2.  **Available:** Users can find the data they need (Discovery).
3.  **Accurate:** You know where the data came from (Lineage) and who changed it (Audit).

### The Problem Without Unity Catalog
*   Governance was tied to a specific **Workspace**: each workspace had its own Hive metastore and its own access controls.
*   If you had a `Dev` workspace and a `Prod` workspace, you had to define users and permissions twice.
*   Data was siloed within workspaces.

### The Solution With Unity Catalog
*   **Unified Governance:** Users and groups are managed once at the Account level, and permissions are granted once on objects in the Metastore. They then apply in every Workspace attached to that Metastore.
*   **Centralized Metadata:** Metadata lives in the Control Plane, while actual data remains in your Cloud Storage (Data Plane).
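
As a minimal sketch of "define once", the grants below give a group read access to one table. The group name `data_analysts` is a hypothetical account-level group, and the object names reuse the example names from this notebook; none of them are assumed to exist in your environment.

```sql
-- Assumption: `data_analysts` is an account-level group and the objects below already exist.
-- A principal needs USE CATALOG and USE SCHEMA on the parents before SELECT on the table is usable.
GRANT USE CATALOG ON CATALOG prod_catalog TO `data_analysts`;
GRANT USE SCHEMA ON SCHEMA prod_catalog.sales_schema TO `data_analysts`;
GRANT SELECT ON TABLE prod_catalog.sales_schema.orders_table TO `data_analysts`;

-- Inspect what has been granted on the table
SHOW GRANTS ON TABLE prod_catalog.sales_schema.orders_table;
```

Because the grants are stored in the Metastore rather than in a workspace, they do not need to be repeated in each workspace attached to that Metastore.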

## 2. The Unity Catalog Object Model

Unity Catalog introduces a strict hierarchy to organize data assets.

| Level | Object | Description |
| :--- | :--- | :--- |
| **0** | **Metastore** | The top-level container for metadata. Usually created **one per region** (e.g., East US). |
| **1** | **Catalog** | The first level of data grouping. Useful for separating environments (e.g., `dev_catalog`, `prod_catalog`) or business units. |
| **2** | **Schema** | Also called a Database. Contains tables, views, volumes, models, and functions. |
| **3** | **Data Assets** | **Tables** (structured data), **Views** (saved queries over tables), **Volumes** (non-tabular files), **Models** (ML), **Functions**. |
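
The hierarchy in the table can be sketched as a sequence of `CREATE` statements, one per level. This is an illustration only: the names `dev_catalog`, `sales_schema`, `orders_table`, and `raw_files` are placeholders, and the sketch assumes a Metastore is already attached to the workspace, has managed storage configured, and that you hold the privileges to create catalogs.

```sql
-- Level 1: a catalog inside the metastore (hypothetical name)
CREATE CATALOG IF NOT EXISTS dev_catalog;

-- Level 2: a schema inside the catalog
CREATE SCHEMA IF NOT EXISTS dev_catalog.sales_schema;

-- Level 3: data assets inside the schema
CREATE TABLE IF NOT EXISTS dev_catalog.sales_schema.orders_table (
  order_id BIGINT,
  amount   DECIMAL(10, 2)
);
CREATE VOLUME IF NOT EXISTS dev_catalog.sales_schema.raw_files;

-- Walk back down the hierarchy to confirm what was created
SHOW SCHEMAS IN dev_catalog;
SHOW TABLES IN dev_catalog.sales_schema;
```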

---

## 3. The 3-Level Namespace

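Every table in Unity Catalog is identified by a **3-level namespace**. The full name always works; shorter names resolve only after you set a current catalog and schema. The three parts are joined by periods:

$$ \text{catalog\_name}.\text{schema\_name}.\text{table\_name} $$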

*Example:* `prod_catalog.sales_schema.orders_table`

```sql
-- Concept: Addressing data using 3-level namespace
-- Note: This code will only run if you have a Catalog created (which we will do in the next video).

-- 1. Select from a specific catalog and schema
SELECT * FROM main.default.my_table;

-- 2. Switch context to a specific catalog
USE CATALOG main;

-- 3. Switch context to a specific schema
USE SCHEMA default;

-- Now you can query directly (implicit context)
SELECT * FROM my_table;
```
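
When relying on implicit context, it can help to check which catalog and schema are currently active. The sketch below uses the built-in `current_catalog()` and `current_schema()` functions; `my_table` is the same placeholder table as above and is assumed to exist.

```sql
-- Show the catalog and schema that unqualified names currently resolve against
SELECT current_catalog(), current_schema();

-- With only the catalog set, a two-level name (schema.table) is enough
USE CATALOG main;
SELECT * FROM default.my_table;
```

Fully qualified names are the safer choice in shared notebooks and jobs, since they do not depend on whatever context an earlier cell happened to set.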

## 4. Managed vs. External Data

Unity Catalog manages data in two ways:

### A. Managed Tables
*   **What:** You create a table without specifying a location.
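*   **Where:** Data is stored in the managed storage location (the "Root Storage Account") configured for the Metastore, Catalog, or Schema.
*   **Behavior:** If you drop the table in Databricks, the underlying files are **deleted** as well. Deletion is not necessarily immediate: Unity Catalog keeps a dropped managed table recoverable with `UNDROP TABLE` for a limited period before the files are removed.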

### B. External Tables
*   **What:** You create a table pointing to a specific path (e.g., `s3://my-bucket/data` or `abfss://...`).
*   **Where:** Data stays in your existing storage location.
*   **Behavior:** If you drop the table, only the metadata is removed. The actual files **remain**.
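
The difference comes down to whether the `CREATE TABLE` statement includes a `LOCATION` clause. In this sketch the table names and the sub-path under `s3://my-bucket/data` are hypothetical, and the external table assumes an external location and storage credential covering that path have already been set up.

```sql
-- Managed table: no LOCATION, so Unity Catalog chooses the path in managed storage
CREATE TABLE IF NOT EXISTS main.default.managed_orders (
  order_id BIGINT,
  amount   DECIMAL(10, 2)
);

-- External table: LOCATION points at a path you own (hypothetical path)
CREATE TABLE IF NOT EXISTS main.default.external_orders (
  order_id BIGINT,
  amount   DECIMAL(10, 2)
)
LOCATION 's3://my-bucket/data/external_orders';

-- The Type and Location rows of the output show which kind each table is
DESCRIBE TABLE EXTENDED main.default.managed_orders;
DESCRIBE TABLE EXTENDED main.default.external_orders;

-- Dropping the external table removes only its metadata; the files stay at the path
DROP TABLE main.default.external_orders;
```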

---

## 5. Key Benefits of Unity Catalog

1.  **Define Once, Secure Everywhere:** Grant access to a group on a Catalog, and the group gets that access in every workspace attached to the Metastore.
2.  **Data Lineage:** Automatically tracks how data flows (e.g., Table A -> Notebook -> Table B).
3.  **Data Discovery:** Search for data assets across the entire organization.
4.  **Delta Sharing:** Securely share data with external organizations without copying it.
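
Data Discovery is usually done through the Catalog Explorer UI, but the same metadata can also be queried with SQL. As a rough sketch, the query below searches `system.information_schema.tables` for tables whose name contains "orders"; the search term is only an example, and the results are limited to objects you have permission to see.

```sql
-- Search table metadata across all catalogs in the metastore
SELECT table_catalog,
       table_schema,
       table_name,
       table_type
FROM system.information_schema.tables
WHERE table_name LIKE '%orders%'
ORDER BY table_catalog, table_schema, table_name;
```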

## Next Steps
Now that we understand the theory and the 3-level namespace, the next video covers the **Hands-on Setup of Unity Catalog**. We will:
1.  Create a Metastore.
2.  Create an Access Connector (Managed Identity).
3.  Create our first Catalog and Schema.