<!-- SPDX-License-Identifier: CC-BY-4.0 -->
<!-- Copyright Contributors to the ODPi Egeria project 2024. -->

![Egeria Logo](https://raw.githubusercontent.com/odpi/egeria/main/assets/img/ODPi_Egeria_Logo_color.png)

### Egeria Workbook

# Surveying and Cataloguing Unity Catalog (UC)

## Introduction

Unity Catalog and Egeria are both open source projects hosted by the LF AI & Data Foundation. The main differences between these technologies are:

 * Unity Catalog is responsible for governing access to data, whereas Egeria governs the exchange of metadata between tools and systems, such as Unity Catalog.

 * Unity Catalog also maintains a metadata repository describing the data it is protecting. In contrast, Egeria maintains a distributed network of metadata repositories containing metadata about the technology (systems, tools, data), the processes that operate on them, and the people and organizations involved.

This notebook shows the integration between Egeria and Unity Catalog. Egeria can survey the catalogs within a Unity Catalog server, catalog their contents (so that they can be incorporated into larger governance processes and searches), and provision catalogs, schemas and volumes into a Unity Catalog server.

Run the code below to initialize Egeria's Python libraries and create a client for the Egeria servers.

---


In [1]:
# Initialize pyegeria

%run ../../pyegeria/initialize-pyegeria.ipynb


In [2]:

egeria_tech = EgeriaTech(view_server, url, user_id, user_pwd)
token = egeria_tech.create_egeria_bearer_token()


---

## Loading support for Unity Catalog

The definitions of the Unity Catalog connectors, templates and associated reference data are loaded via a [Content Pack](https://egeria-project.org/content-packs/) called `UnityCatalogContentPack.omarchive`. The content pack can be loaded multiple times without ill effect, so run the following command to make sure it is loaded.

---

In [3]:

egeria_tech.add_archive_file("content-packs/UnityCatalogContentPack.omarchive", None, "qs-metadata-store")

print("Archive loaded!")


Archive loaded!


----

Run the command below to confirm that the connectors that synchronize metadata between Egeria and Unity Catalog are running and waiting for work:

* UnityCatalogServerSynchronizer synchronizes catalog information from a Unity Catalog Server. It passes details of the catalogs it finds on to UnityCatalogInsideCatalogSynchronizer.
* UnityCatalogInsideCatalogSynchronizer synchronizes the schema, volume, table and function metadata between Egeria and Unity Catalog.
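
As a rough mental model only, the hand-off between these two connectors can be sketched in plain Python. The classes and methods below (`ServerSynchronizerSketch`, `InsideCatalogSynchronizerSketch`, `refresh`, `add_catalog_target`) are hypothetical and say nothing about how the real connectors are implemented; the sketch just shows one synchronizer discovering catalogs and passing each one on as a target for the other.

```python
from dataclasses import dataclass, field


@dataclass
class InsideCatalogSynchronizerSketch:
    """Hypothetical stand-in for UnityCatalogInsideCatalogSynchronizer."""
    catalog_targets: list[str] = field(default_factory=list)

    def add_catalog_target(self, catalog_name: str) -> None:
        # Register each catalog once, even if it is reported on every refresh.
        if catalog_name not in self.catalog_targets:
            self.catalog_targets.append(catalog_name)


@dataclass
class ServerSynchronizerSketch:
    """Hypothetical stand-in for UnityCatalogServerSynchronizer."""
    downstream: InsideCatalogSynchronizerSketch

    def refresh(self, catalogs_on_server: list[str]) -> None:
        # Pass every catalog found on the server on to the inside-catalog synchronizer.
        for catalog_name in catalogs_on_server:
            self.downstream.add_catalog_target(catalog_name)


inside = InsideCatalogSynchronizerSketch()
server = ServerSynchronizerSketch(downstream=inside)
server.refresh(["unity", "new_catalog"])
server.refresh(["unity"])  # a later refresh does not duplicate existing targets
print(inside.catalog_targets)  # ['unity', 'new_catalog']
```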

----

In [4]:
display_integration_daemon_status(['UnityCatalogServerSynchronizer', 'UnityCatalogInsideCatalogSynchronizer'], paging=True, width=150)

Integration Daemon Status @ Fri Jun 20 13:08:53 2025

| Connector Name | Status | Last Refresh Time | Min Refresh (mins) | Target Element | Exception Message |
|---|---|---|---|---|---|
| UnityCatalogInsideCatalogSynchronizer | WAITING | 2025-06-20T13:01:52 | 60 | … | … |

----

## Gathering data about your Unity Catalog Server

You need to provide Egeria with some basic information about your Unity Catalog server.  Fill in the details below.

-----

In [5]:
# This is the technology type if the server is the Open Source version of Unity Catalog.
# Use "Databricks Unity Catalog Server" if this is a Databricks cloud service version of Unity Catalog.
serverTechnologyType = "Unity Catalog Server"

# This is the network location used to call the Unity Catalog Server.
# For Unity Catalog running on your local machine, use:
#   * "http://localhost" if Egeria is also running natively on your local machine, or
#   * "http://host.docker.internal" if Egeria is running in a docker image.
#
hostURL="http://localhost"
#hostURL="http://host.docker.internal"
portNumber="8087"

# This is the unique name that the server is known by
serverName="Unity Catalog 1"

# Add a short description of the server
description="Local instance of the Unity Catalog (UC) Server."

# This is the version of Unity Catalog
versionIdentifier="v0.3.0-SNAPSHOT"

# This is the userId that the connectors will use when creating metadata in Egeria that describes resources in Unity Catalog.
serverUserId="uc1"

# These values are used in later sections
serverQualifiedName=serverTechnologyType + ":" + serverName
serverNetworkAddress=hostURL + ":" + portNumber
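
Before moving on, it can help to sanity-check these values, since a malformed URL or port will only surface later as a failed survey. The helper `check_server_settings` below is a hypothetical addition (it is not part of pyegeria) that uses only the standard library: it confirms that `hostURL` has an `http`/`https` scheme and a host but no port or path, that `portNumber` is a valid TCP port, and then rebuilds `serverNetworkAddress` the same way the cell above does.

```python
from urllib.parse import urlparse


def check_server_settings(host_url: str, port_number: str) -> str:
    """Validate hostURL and portNumber, then return the combined network address."""
    parsed = urlparse(host_url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"hostURL should look like 'http://hostname', got {host_url!r}")
    # The port is supplied separately, so hostURL itself should stop after the host name.
    if parsed.port is not None or parsed.path:
        raise ValueError(f"hostURL should not include a port or path, got {host_url!r}")
    if not port_number.isdigit() or not 0 < int(port_number) < 65536:
        raise ValueError(f"portNumber should be between 1 and 65535, got {port_number!r}")
    # Same concatenation as the configuration cell: hostURL + ":" + portNumber
    return host_url + ":" + port_number


print(check_server_settings("http://localhost", "8087"))  # http://localhost:8087
```

In the notebook you would call it as `check_server_settings(hostURL, portNumber)` and compare the result with `serverNetworkAddress`.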


-----

## Survey a Unity Catalog Server

The Unity Catalog support includes the ability to survey the contents of a Unity Catalog Server.  This command creates a description of the Unity Catalog Server and runs a survey to understand its contents.  A summary of the survey results can be found in /distribution-hub/surveys.

---

In [6]:

createAndSurveyServerName="UnityCatalogServer:CreateAndSurveyGovernanceActionProcess"

process_guid = egeria_tech.get_element_guid_by_unique_name(createAndSurveyServerName)

process_graph = egeria_tech.get_gov_action_process_graph(process_guid)
print_governance_action_process_graph(process_graph)


In [7]:

requestParameters = {
    "hostURL" : hostURL,
    "portNumber" : portNumber,
    "serverName" : serverName,  
    "versionIdentifier" : versionIdentifier,
    "description" : description,
    "serverUserId" : serverUserId
}

egeria_tech.initiate_gov_action_process(createAndSurveyServerName, None, None, None, requestParameters, None, None)


'696d0b59-78ba-49a8-bb11-670835b05bb0'
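
The same six request parameters are passed to the survey process here, to the cataloguing process later, and to the delete process in **Starting again ...**. A small helper such as the hypothetical `server_request_parameters` below (not part of pyegeria) would keep those three calls consistent and fail early if a value was left blank.

```python
def server_request_parameters(host_url: str, port_number: str, server_name: str,
                              version_identifier: str, description: str,
                              server_user_id: str) -> dict[str, str]:
    """Build the request parameters shared by the Unity Catalog server processes."""
    params = {
        "hostURL": host_url,
        "portNumber": port_number,
        "serverName": server_name,
        "versionIdentifier": version_identifier,
        "description": description,
        "serverUserId": server_user_id,
    }
    # An empty value usually means a variable was not set in the configuration cell.
    missing = [key for key, value in params.items() if not value.strip()]
    if missing:
        raise ValueError(f"missing values for: {', '.join(missing)}")
    return params


params = server_request_parameters("http://localhost", "8087", "Unity Catalog 1",
                                   "v0.3.0-SNAPSHOT",
                                   "Local instance of the Unity Catalog (UC) Server.", "uc1")
# In the notebook the dict would be passed exactly as before, for example:
# egeria_tech.initiate_gov_action_process(createAndSurveyServerName, None, None, None, params, None, None)
```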

----

The command below displays the latest governance actions. They should reach **ACTIONED** status once the survey has completed. If you see failures, either the Unity Catalog server is not running or the values describing its location are not correct. If Unity Catalog is down, restart it and re-run the cell above. If one or more of the values describing Unity Catalog is wrong, go to the section **Starting again ...** at the bottom of this notebook and run the delete process. Then go back to the section **Gathering data about your Unity Catalog Server** and carry on from there.
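
Rather than re-running the status cell by hand, you could poll until the actions settle. The sketch below is generic: `fetch_status` is a hypothetical zero-argument callable you would write yourself around whichever engine-action query your pyegeria version offers, and the loop simply calls it until it returns the target status, a failure status, or the timeout expires. Only `ACTIONED` comes from the text above; the names in `FAILURE_STATES` are assumptions, so adjust them to what your output shows.

```python
import time
from typing import Callable

# Assumed terminal failure statuses; check them against your own engine action output.
FAILURE_STATES = {"FAILED", "CANCELLED", "INVALID"}


def wait_for_status(fetch_status: Callable[[], str], target: str = "ACTIONED",
                    timeout: float = 60.0, interval: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll fetch_status() until it returns target, a failure state, or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status == target or status in FAILURE_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(interval)


# Simulated statuses stand in for a real query against Egeria.
simulated = iter(["APPROVED", "IN_PROGRESS", "ACTIONED"])
print(wait_for_status(lambda: next(simulated), sleep=lambda _: None))  # ACTIONED
```

Injecting `sleep` keeps the loop easy to test without real delays.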

----

In [8]:
display_engine_activity_c(row_limit=3, width=150)

Engine Action Status for Platform https://host.docker.internal:9443 @ Fri Jun 20 13:09:11 2025

| Requested Time | Core Info | Target Elements | Action Status | Completion Time | Core Results |
|---|---|---|---|---|---|
| 2025-06-20T13:09:11 | Start Time: …; Engine Name: … | Target Name: newAsset; Target GUID: … | APPROVED | | Completion Guards: … |

----

Open the survey file and review the contents of the Unity Catalog Server. Notice that a Unity Catalog Server can contain multiple catalogs, and notice the hierarchical naming of the Unity Catalog elements: catalogs have schemas inside them, and schemas can have tables, functions and/or volumes within them.
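
That hierarchy shows up in dotted names such as `unity.default` (the `default` schema of the `unity` catalog, used later in this notebook) or `catalog.schema.volume`. The hypothetical helper below (standard library only, not part of pyegeria) splits such a name into its levels, which can be handy when reading the survey output. It assumes no individual name contains a dot.

```python
from typing import NamedTuple, Optional


class UCName(NamedTuple):
    catalog: str
    schema: Optional[str] = None
    element: Optional[str] = None  # a table, volume or function


def parse_uc_name(full_name: str) -> UCName:
    """Split 'catalog', 'catalog.schema' or 'catalog.schema.element' into its parts."""
    parts = full_name.split(".")
    if not 1 <= len(parts) <= 3 or any(not part for part in parts):
        raise ValueError(f"not a Unity Catalog name: {full_name!r}")
    return UCName(*parts)


print(parse_uc_name("unity.default"))  # UCName(catalog='unity', schema='default', element=None)
```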

----

Use the command `hey_egeria_ops show engines activity --compressed` to view the governance actions that ran as a result of the survey request. There were two steps. First, the process created a `SoftwareServer` element to represent the Unity Catalog Server; this element stores the network address of the server. Then the survey was run using this element.

-----

Now navigate to `distribution-hub/logs/openlineage/GovernanceActions`. This directory stores the OpenLineage events created when the surveys were run. Each event records the start or stop of a governance action.
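
For a quick overview of these events, you could summarise them with a short script. This is a sketch under assumptions: it assumes each file in the directory holds a single OpenLineage run event as JSON (the layout Egeria actually writes may differ), and it reads only the standard OpenLineage fields `eventType`, `eventTime` and `job.name`.

```python
import json
from pathlib import Path


def summarise_lineage_events(directory: str) -> dict[str, list[str]]:
    """Group OpenLineage event types by job name, in eventTime order."""
    events = []
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue
        try:
            event = json.loads(path.read_text())
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip files that are not a single JSON document
        if isinstance(event, dict):
            events.append(event)
    # ISO-8601 timestamps written in the same format sort correctly as strings.
    events.sort(key=lambda e: e.get("eventTime", ""))
    summary: dict[str, list[str]] = {}
    for event in events:
        job_name = event.get("job", {}).get("name", "<unknown job>")
        summary.setdefault(job_name, []).append(event.get("eventType", "?"))
    return summary


log_dir = Path("distribution-hub/logs/openlineage/GovernanceActions")
if log_dir.is_dir():
    for job, event_types in summarise_lineage_events(str(log_dir)).items():
        print(f"{job}: {' -> '.join(event_types)}")
```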



---

## Cataloguing Unity Catalog metadata in Egeria

If the survey results look interesting, you can synchronize the metadata between Unity Catalog and Egeria. Run the command below.

The process shown below requests that the contents of the Unity Catalog server described above are catalogued into Egeria.

---

In [9]:

createAndCatalogServerName="UnityCatalogServer:CreateAsCatalogTargetGovernanceActionProcess"

process_guid = egeria_tech.get_element_guid_by_unique_name(createAndCatalogServerName)

process_graph = egeria_tech.get_gov_action_process_graph(process_guid)
print_governance_action_process_graph(process_graph)


----

The code below will configure these connectors to catalog the server.

----

In [10]:

requestParameters = {
    "hostURL" : hostURL,
    "portNumber" : portNumber,
    "serverName" : serverName,  
    "versionIdentifier" : versionIdentifier,
    "description" : description,
    "serverUserId" : serverUserId
}

egeria_tech.initiate_gov_action_process(createAndCatalogServerName, None, None, None, requestParameters, None, None)


'5eb9468b-b56e-4d23-9fca-e17e87b03bca'

----

Again it is possible to watch the execution of the process.  Notice that only 2 engine actions are activated.

----

In [12]:
display_engine_activity_c(row_limit=2,width=150)

Engine Action Status for Platform https://host.docker.internal:9443 @ Fri Jun 20 13:09:27 2025

| Requested Time | Core Info | Target Elements | Action Status | Completion Time | Core Results |
|---|---|---|---|---|---|
| 2025-06-20T13:09:24 | Start Time: 2025-06-20T13:09:27; … | Target Name: integrationConnector; … | ACTIONED | 2025-06-20T13:09:27 | Completion Guards: … |

----

The effect of the process is to configure the *UnityCatalogServerSynchronizer* connector to extract the catalogs found in the Unity Catalog Server, and then to configure the *UnityCatalogInsideCatalogSynchronizer* to catalog each one it finds.
The Target Element column shows the details of the server and catalogs each connector is working with.

----

In [13]:
display_integration_daemon_status(['UnityCatalogServerSynchronizer', 'UnityCatalogInsideCatalogSynchronizer'], paging=True, width=170)

Integration Daemon Status @ Fri Jun 20 13:09:30 2025

| Connector Name | Status | Last Refresh Time | Min Refresh (mins) | Target Element | Exception Message |
|---|---|---|---|---|---|

---

Below is a graph of the elements found during the cataloguing process.  You can see the schemas and the tables, volumes and functions underneath them.  The types of the elements are shown using [open metadata types](https://egeria-project.org/connectors/unity-catalog/#open-metadata-type-mapping-for-unity-catalog).

---

In [15]:
unity_default_qualified_name="Unity Catalog Schema::" + serverNetworkAddress + "::unity.default"

print_asset_lineage_graph(egeria_tech,unity_default_qualified_name)




----

You can use the following commands to show the elements from Unity Catalog in Egeria:

* `hey_egeria cat show assets elements-of-tech-type --tech_type 'Unity Catalog Server'` for the Unity Catalog Servers.
* `hey_egeria cat show assets elements-of-tech-type --tech_type 'Unity Catalog Catalog'` for the Unity Catalog Catalogs.
* `hey_egeria cat show assets elements-of-tech-type --tech_type 'Unity Catalog Schema'` for the Unity Catalog Schemas.
* `hey_egeria cat show assets elements-of-tech-type --tech_type 'Unity Catalog Table'` for the Unity Catalog Tables.
* `hey_egeria cat show assets elements-of-tech-type --tech_type 'Unity Catalog Volume'` for the Unity Catalog Volumes.
* `hey_egeria cat show assets elements-of-tech-type --tech_type 'Unity Catalog Function'` for the Unity Catalog Functions.

You can also use `hey_egeria cat show assets in-asset-domain inventory` to search for assets that match the word `inventory`.
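
If you want to run all six technology-type queries in one go, the commands listed above can be generated from a list of technology types. This is a convenience sketch, not a pyegeria feature: `tech_type_commands` is a hypothetical helper that only builds the argument lists shown in the bullets and, when a `hey_egeria` executable is found on the `PATH`, runs each one with `subprocess`.

```python
import shutil
import subprocess

UC_TECH_TYPES = [
    "Unity Catalog Server",
    "Unity Catalog Catalog",
    "Unity Catalog Schema",
    "Unity Catalog Table",
    "Unity Catalog Volume",
    "Unity Catalog Function",
]


def tech_type_commands(tech_types: list[str]) -> list[list[str]]:
    """Build one hey_egeria argument list per technology type."""
    return [["hey_egeria", "cat", "show", "assets", "elements-of-tech-type", "--tech_type", tech_type]
            for tech_type in tech_types]


if shutil.which("hey_egeria"):
    for command in tech_type_commands(UC_TECH_TYPES):
        # Passing a list rather than a shell string avoids quoting the spaces in each type name.
        subprocess.run(command, check=False)
```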

----

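It is also possible to use Python functions to retrieve information about the Unity Catalog resources. The cells below list the catalog targets of the *UnityCatalogInsideCatalogSynchronizer* and then create an `egeria_cat` client for searching.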

----

In [None]:
from commands.ops.list_catalog_targets import display_catalog_targets

display_catalog_targets("UnityCatalogInsideCatalogSynchronizer",
                       view_server,
                       url,
                       user_id,
                       user_pwd)


In [None]:

egeria_cat = EgeriaCat(view_server, url, user_id, user_pwd)
egeria_cat.set_bearer_token(token)


----

The command below retrieves assets that match the search query. If you see the message "no assets found", re-run the request, as the synchronization process may not have completed yet.

----

In [None]:

search_results = egeria_cat.find_in_asset_domain("numbers")

print_search_results(search_results)
       

----
This shows that metadata can be copied from Unity Catalog into Egeria's metadata repository and represented using Open Metadata Types. The next part of the demonstration shows metadata flowing from Egeria to Unity Catalog.


----

# Provisioning Unity Catalog (UC) from Egeria

Egeria has the ability to provision resources into Unity Catalog.  The desired resources are described in Egeria using Open Metadata Elements and linked to the representation of the server, catalog or schema where the new resource is to be created.  Once the description is in place, the appropriate Unity Catalog Integration Connector will create a matching definition in Unity Catalog.

The sections below go through creating a catalog, then a schema within that catalog, and then a volume within that schema. The ordering is important: a catalog must exist before schemas can be created in it, and a schema must exist before volumes can be created in it.
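
The ordering rule can be expressed as a simple sort on dotted names, because a parent (`catalog`) always has fewer parts than its children (`catalog.schema`, `catalog.schema.volume`). The hypothetical helper below is plain Python with no Egeria calls: it orders a list of names to provision and checks that each one's parent either already exists or comes earlier in the list.

```python
def provisioning_order(names: list[str], existing: frozenset[str] = frozenset()) -> list[str]:
    """Order dotted Unity Catalog names so every parent is provisioned before its children."""
    # Fewer dots means higher in the hierarchy; sorted() keeps ties in their original order.
    ordered = sorted(names, key=lambda name: name.count("."))
    known = set(existing)
    for name in ordered:
        parent = name.rpartition(".")[0]  # '' for a top-level catalog
        if parent and parent not in known:
            raise ValueError(f"{name!r} needs {parent!r} to be provisioned first")
        known.add(name)
    return ordered


print(provisioning_order(["new_catalog.new_schema.new_volume", "new_catalog", "new_catalog.new_schema"]))
# ['new_catalog', 'new_catalog.new_schema', 'new_catalog.new_schema.new_volume']
```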

----

## Provision a new catalog into a Unity Catalog (UC) Server

There is only one step to provision a new catalog into a Unity Catalog Server:

-----

In [None]:

provisionCatalogName="Provision:UnityCatalogCatalog:GovernanceActionProcess"

process_guid = egeria_tech.get_element_guid_by_unique_name(provisionCatalogName)

process_graph = egeria_tech.get_gov_action_process_graph(process_guid)
print_governance_action_process_graph(process_graph)


----

Fill in details about the new catalog that you want to create.

----

In [None]:

catalogName = "new_catalog"
catalogDescription = "My new catalog."


----

Now run the governance process to create the catalog.

----

In [None]:

requestParameters = {
    "ucServerQualifiedName" : serverQualifiedName,
    "serverNetworkAddress" : serverNetworkAddress,
    "ucCatalogName" : catalogName,
    "versionIdentifier" : versionIdentifier,
    "description" : catalogDescription
}

egeria_tech.initiate_gov_action_process(provisionCatalogName, None, None, None, requestParameters, None, None)


------

The code below shows the catalog you now have defined.

------

In [None]:

catalog_qualified_name="Unity Catalog Catalog:" + serverNetworkAddress + ":" + catalogName

element=egeria_tech.get_element_by_unique_name(catalog_qualified_name)

print_element("", element)


----

The next command lists the relationships to other elements that this catalog has:

----

In [None]:

catalog_guid=egeria_tech.get_element_guid_by_unique_name(catalog_qualified_name)
print(catalog_guid)

related_elements = egeria_tech.get_related_elements(catalog_guid)

print_related_elements("", related_elements)


----

## Provision a new schema into a Unity Catalog (UC) Catalog

This is the process to provision a new schema into a Unity Catalog Catalog:

-----

In [None]:

provisionSchemaName="Provision:UnityCatalogSchema:GovernanceActionProcess"

process_guid = egeria_tech.get_element_guid_by_unique_name(provisionSchemaName)

process_graph = egeria_tech.get_gov_action_process_graph(process_guid)
print_governance_action_process_graph(process_graph)


----

Fill in details about the new schema that you want to create.

----

In [None]:

schemaName = "new_schema"
schemaDescription = "My new schema."


----

Now run the governance process to create the schema.

----

In [None]:

requestParameters = {
    "serverNetworkAddress" : serverNetworkAddress,
    "ucCatalogName" : catalogName,
    "ucSchemaName" : schemaName,
    "versionIdentifier" : versionIdentifier,
    "description" : schemaDescription
}

egeria_tech.initiate_gov_action_process(provisionSchemaName, None, None, None, requestParameters, None, None)


------

The code below shows the schema you now have defined.

------

In [None]:

schema_qualified_name="Unity Catalog Schema:" + serverNetworkAddress + ":" + catalogName + "." + schemaName

element=egeria_tech.get_element_by_unique_name(schema_qualified_name)

print_element("", element)


----

And the elements related to it ...

----

In [None]:

schema_guid=egeria_tech.get_element_guid_by_unique_name(schema_qualified_name)
print(schema_guid)

related_elements = egeria_tech.get_related_elements(schema_guid)

print_related_elements("", related_elements)


----

## Provision a new volume into a Unity Catalog (UC) Schema

This is the process to provision a new volume into a Unity Catalog Schema:

-----

In [None]:

provisionVolumeName="Provision:UnityCatalogVolume:GovernanceActionProcess"

process_guid = egeria_tech.get_element_guid_by_unique_name(provisionVolumeName)

process_graph = egeria_tech.get_gov_action_process_graph(process_guid)
print_governance_action_process_graph(process_graph)


----

Fill in details about the new volume that you want to create.

----

In [None]:

volumeName = "new_volume"
volumeDescription = "My new volume."
storageLocation = "data/new_volume"
volumeType = "EXTERNAL"


----

Now run the governance process to create the volume.

----

In [None]:

requestParameters = {
    "serverNetworkAddress" : serverNetworkAddress,
    "ucCatalogName" : catalogName,
    "ucSchemaName" : schemaName,
    "ucVolumeName" : volumeName,
    "versionIdentifier" : versionIdentifier,
    "ucStorageLocation" : storageLocation,
    "description" : volumeDescription,
    "ucVolumeType" : volumeType
}

egeria_tech.initiate_gov_action_process(provisionVolumeName, None, None, None, requestParameters, None, None)
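
Unity Catalog volumes are either `MANAGED` (storage handled by Unity Catalog) or `EXTERNAL` (pointing at a storage location you supply), which is why this request passes both `ucVolumeType` and `ucStorageLocation`. The hypothetical check below (not part of pyegeria) catches an unsupported volume type, or an external volume with no storage location, before the process is started. Whether the provisioning process also accepts `MANAGED` is an assumption worth verifying against your Egeria release.

```python
VOLUME_TYPES = {"MANAGED", "EXTERNAL"}


def check_volume_settings(volume_type: str, storage_location: str) -> None:
    """Raise ValueError if the volume type or storage location looks wrong."""
    if volume_type not in VOLUME_TYPES:
        raise ValueError(f"ucVolumeType must be one of {sorted(VOLUME_TYPES)}, got {volume_type!r}")
    # An external volume is only meaningful if it points somewhere.
    if volume_type == "EXTERNAL" and not storage_location.strip():
        raise ValueError("an EXTERNAL volume needs a ucStorageLocation")


check_volume_settings("EXTERNAL", "data/new_volume")  # passes silently
```

In the notebook this would be called as `check_volume_settings(volumeType, storageLocation)` before building `requestParameters`.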


------

The code below shows the volume you now have defined.

------

In [None]:

volume_qualified_name="Unity Catalog Volume:" + serverNetworkAddress + ":" + catalogName + "." + schemaName + "." + volumeName

element=egeria_tech.get_element_by_unique_name(volume_qualified_name)

print_element("", element)
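
The catalog, schema and volume cells each build a qualified name by hand. A small hypothetical helper could build all three from the same pieces, following exactly the pattern those cells use (`<technology type>:<network address>:<dotted name>`). Note that the `unity.default` schema name used earlier in this notebook is built with `::` separators instead, so treat the separator as something to confirm against the elements in your repository rather than as a rule.

```python
def uc_qualified_name(technology_type: str, network_address: str, *name_parts: str) -> str:
    """Build '<technology type>:<network address>:<catalog>[.<schema>[.<element>]]'."""
    if not 1 <= len(name_parts) <= 3:
        raise ValueError("expected a catalog name, optionally followed by a schema and an element name")
    return technology_type + ":" + network_address + ":" + ".".join(name_parts)


address = "http://localhost:8087"
print(uc_qualified_name("Unity Catalog Catalog", address, "new_catalog"))
print(uc_qualified_name("Unity Catalog Volume", address, "new_catalog", "new_schema", "new_volume"))
```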


----

And the elements related to this volume ...

----

In [None]:

volume_guid=egeria_tech.get_element_guid_by_unique_name(volume_qualified_name)
print(volume_guid)

related_elements = egeria_tech.get_related_elements(volume_guid)

print_related_elements("", related_elements)


----

# Reviewing the integration

As Egeria exchanges metadata with Unity Catalog, it builds a map between the identifiers used by Unity Catalog and the elements that have been created for them in the open metadata ecosystem.

The code below retrieves and prints these mappings for each catalog within the Unity Catalog (UC) servers known to Egeria.
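
Conceptually, each mapping pairs an identifier issued by Unity Catalog with the GUID of the open metadata element that represents it, and it needs to be searchable in both directions. The class below is a hypothetical, in-memory illustration of that idea only; it is not how Egeria stores these mappings, and the identifier values in the example are made up.

```python
from typing import Optional


class ExternalIdMapSketch:
    """Two-way lookup between external (Unity Catalog) ids and open metadata GUIDs."""

    def __init__(self) -> None:
        self._by_external_id: dict[str, str] = {}
        self._by_guid: dict[str, str] = {}

    def add(self, external_id: str, guid: str) -> None:
        # Reject conflicting mappings so each side maps to exactly one partner.
        current_guid = self._by_external_id.get(external_id)
        current_external_id = self._by_guid.get(guid)
        if current_guid not in (None, guid) or current_external_id not in (None, external_id):
            raise ValueError(f"conflicting mapping for {external_id!r} / {guid!r}")
        self._by_external_id[external_id] = guid
        self._by_guid[guid] = external_id

    def guid_for(self, external_id: str) -> Optional[str]:
        return self._by_external_id.get(external_id)

    def external_id_for(self, guid: str) -> Optional[str]:
        return self._by_guid.get(guid)


id_map = ExternalIdMapSketch()
id_map.add("uc-catalog-id-1", "egeria-guid-1")  # made-up identifiers
print(id_map.guid_for("uc-catalog-id-1"))  # egeria-guid-1
```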

-----

In [None]:

unity_catalog_catalogs = egeria_tech.get_technology_type_elements("Unity Catalog Catalog")

if unity_catalog_catalogs:
    for catalog in unity_catalog_catalogs:
        print()
        print("----------------------------")
        print_external_id_map(catalog)
        

----

# Starting again ...

If you made a mistake in the server details supplied for Unity Catalog, you can remove the server definition using the delete process as follows. Leave all of the settings unchanged and run this process. Then go back to the top, change the settings, and re-run the survey and/or catalog processes as desired.

-----

In [None]:
deleteCatalogName="UnityCatalogServer:DeleteAssetWithTemplateGovernanceActionProcess"

process_guid = egeria_tech.get_element_guid_by_unique_name(deleteCatalogName)

process_graph = egeria_tech.get_gov_action_process_graph(process_guid)
print_governance_action_process_graph(process_graph)


In [None]:
requestParameters = {
    "hostURL" : hostURL,
    "portNumber" : portNumber,
    "serverName" : serverName,  
    "versionIdentifier" : versionIdentifier,
    "description" : description,
    "serverUserId" : serverUserId
}

egeria_tech.initiate_gov_action_process(deleteCatalogName, None, None, None, requestParameters, None, None)

------