# Creating Catalogs & External Locations in Unity Catalog

**Objective:** Learn how to create the first level of the Unity Catalog hierarchy: the **Catalog**. We will create two types of catalogs:
1.  **Standard Catalog:** Uses the default storage location defined at the Metastore level.
2.  **Isolated Catalog:** Uses a dedicated **External Location** (Azure ADLS Gen2) to store its managed tables, keeping them physically separate from other data.

---

## 1. The Unity Catalog Object Model (Recap)

$$ \text{Metastore} \rightarrow \text{Catalog} \rightarrow \text{Schema} \rightarrow \text{Table/Volume} $$

*   **Catalog:** The first level of the three-level namespace `catalog.schema.table` (e.g., `prod_catalog`.`sales`.`orders`).
*   **Storage Hierarchy:** If a catalog has no storage location of its own, its managed tables fall back to the Metastore's root storage. If you define one, managed tables in that catalog are stored there, unless a schema specifies its own managed location.
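
As a minimal sketch of how the three-level namespace resolves, the queries below address the same table once by its full name and once by a short name after setting session defaults. The catalog, schema, and table names are the illustrative ones from the bullet above and are assumed to exist in your workspace.

```sql
-- Fully qualified reference: catalog.schema.table
SELECT * FROM prod_catalog.sales.orders LIMIT 10;

-- Same table, after setting the default catalog and schema for the session
USE CATALOG prod_catalog;
USE SCHEMA sales;
SELECT * FROM orders LIMIT 10;
```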

---

## 2. Managing the Legacy `hive_metastore`
Before creating new catalogs, note that every workspace comes with a `hive_metastore` catalog. It exposes the workspace's legacy Hive metastore, and the objects inside it are not governed by Unity Catalog.

In [None]:
-- Check existing catalogs
-- You should see 'hive_metastore' (legacy) and 'main' (default UC catalog) if enabled.
SHOW CATALOGS;
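
To see where the legacy layer sits, a short sketch like the following lists the schemas inside `hive_metastore` and shows which catalog the current session defaults to. The output depends entirely on your workspace.

```sql
-- List schemas (databases) that live in the legacy Hive metastore
SHOW SCHEMAS IN hive_metastore;

-- Show the catalog that unqualified names currently resolve against
SELECT current_catalog();
```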

## 3. Creating a Standard Catalog
A standard catalog is created without a `MANAGED LOCATION` clause, so it inherits its storage path from the Metastore's root storage. This only works if the Metastore was set up with a root storage location.

In [None]:
-- Create a new catalog named 'dev'
CREATE CATALOG IF NOT EXISTS dev
COMMENT 'This is a standard development catalog using Metastore root storage';

-- Verify creation and check the 'Storage Root' property
DESCRIBE CATALOG EXTENDED dev;

## 4. Setting up Storage for an Isolated Catalog

To create a catalog that stores data in a *different* storage container (e.g., for separation of duties or region-specific requirements), we need to set up:
1.  **Azure Storage Container:** A container in an ADLS Gen2 storage account that will hold the catalog's data.
2.  **Access Connector:** An Azure resource whose Managed Identity gives Databricks access to the storage account.
3.  **Storage Credential:** A Unity Catalog object wrapping the Access Connector.
4.  **External Location:** A Unity Catalog object that pairs a specific container URL with the Storage Credential used to access it.

### Step 4.1: Azure Setup (Manual Steps)
*   **Create Container:** Create a container named `data` in your Storage Account. Create the folder path `adb/catalog` inside it, matching the URL used in Step 4.3.
*   **Create Access Connector:** Create an "Access Connector for Azure Databricks" in the Azure Portal and note its Resource ID.
*   **IAM Assignment:** Go to your Storage Account -> Access Control (IAM) -> Add Role Assignment -> **Storage Blob Data Contributor** -> Assign to the Access Connector created above.

### Step 4.2: Create Storage Credential
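*Note: Replace the `access_connector_id` value with the Resource ID of your own Access Connector from Azure. If your workspace rejects this SQL statement, create the Storage Credential in Catalog Explorer instead.*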

In [None]:
-- Create the credential that Unity Catalog uses to talk to Azure
-- Replace the quoted placeholder below (angle brackets included) with the Resource ID of your Azure Access Connector
CREATE STORAGE CREDENTIAL IF NOT EXISTS sc_catalog_storage
    USING 'azure_managed_identity'
    OPTIONS (access_connector_id '</subscriptions/.../resourceGroups/.../providers/Microsoft.Databricks/accessConnectors/adb-uc-connector>');
    
-- Verify
DESCRIBE STORAGE CREDENTIAL sc_catalog_storage;

### Step 4.3: Create External Location
This registers the cloud path in Unity Catalog and authorizes reads and writes to it through the credential we just created.

In [None]:
-- Define the path where this catalog's data will live
-- Format: abfss://<container>@<storage_account>.dfs.core.windows.net/<path>
CREATE EXTERNAL LOCATION IF NOT EXISTS ext_catalog_loc
    URL 'abfss://data@adbeasewithdata01.dfs.core.windows.net/adb/catalog'
    WITH STORAGE CREDENTIAL sc_catalog_storage;

-- Verify
DESCRIBE EXTERNAL LOCATION ext_catalog_loc;
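
A quick way to check that the external location actually works is to list its contents. This is a sketch that assumes you own `ext_catalog_loc` or hold the `READ FILES` privilege on it; the `LIST` result is simply empty if the folder contains no files yet.

```sql
-- Confirm the location is registered in the metastore
SHOW EXTERNAL LOCATIONS;

-- Try to read the path through Unity Catalog; an access error here usually
-- points at the IAM role assignment or the storage credential
LIST 'abfss://data@adbeasewithdata01.dfs.core.windows.net/adb/catalog';
```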

## 5. Creating a Catalog with Managed Location
Now we create the catalog `dev_ext` and explicitly set its `MANAGED LOCATION` to the path covered by `ext_catalog_loc`. The path must fall within a registered External Location. Any managed table created in this catalog will be stored physically in the `data` container, NOT the root `metastore` container.

In [None]:
-- Create catalog with specific storage isolation
CREATE CATALOG IF NOT EXISTS dev_ext
MANAGED LOCATION 'abfss://data@adbeasewithdata01.dfs.core.windows.net/adb/catalog';

-- Verify the 'Storage Root' property. 
-- It should point to your new 'data' container, not the default 'metastore' container.
DESCRIBE CATALOG EXTENDED dev_ext;
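
To confirm the isolation end to end, one option is to create a throwaway managed table and inspect where it lands. This is only a sketch: `demo_schema` and `demo_table` are hypothetical names introduced for illustration, and the exact subfolder layout under the managed location is chosen by Unity Catalog.

```sql
-- Hypothetical schema and table, used only to inspect the storage path
CREATE SCHEMA IF NOT EXISTS dev_ext.demo_schema;

CREATE TABLE IF NOT EXISTS dev_ext.demo_schema.demo_table (
    id   INT,
    name STRING
);

-- The 'Location' row should sit under the catalog's managed location
-- (abfss://data@adbeasewithdata01.dfs.core.windows.net/adb/catalog/...)
DESCRIBE TABLE EXTENDED dev_ext.demo_schema.demo_table;
```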

## 6. Summary of Created Objects

| Object | Name | Purpose |
| :--- | :--- | :--- |
| **Catalog** | `dev` | Uses default Metastore storage. Good for general use. |
| **Storage Credential** | `sc_catalog_storage` | The key/identity to access the secondary storage container. |
| **External Location** | `ext_catalog_loc` | The bridge allowing Databricks to read from and write to that specific container path. |
| **Catalog** | `dev_ext` | Isolated catalog. All data resides in its own dedicated storage path. |
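
If you are experimenting and do not plan to continue with these catalogs, the objects can be removed in reverse order of dependency. Treat this as a hedged cleanup sketch rather than a required step: `CASCADE` drops every schema and table inside the catalog, and the drops may fail if other objects still depend on the location or credential.

```sql
-- Drop the catalogs first (CASCADE removes their schemas and tables)
DROP CATALOG IF EXISTS dev_ext CASCADE;
DROP CATALOG IF EXISTS dev CASCADE;

-- Then the external location, and finally the credential it relied on
DROP EXTERNAL LOCATION IF EXISTS ext_catalog_loc;
DROP STORAGE CREDENTIAL IF EXISTS sc_catalog_storage;
```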

## Next Steps
In the next video, we will create **Schemas** (Databases) inside these catalogs and start creating tables to see how data is physically stored in these different locations.