<!-- SPDX-License-Identifier: CC-BY-4.0 -->
<!-- Copyright Contributors to the ODPi Egeria project 2024. -->

![Egeria Logo](https://raw.githubusercontent.com/odpi/egeria/main/assets/img/ODPi_Egeria_Logo_color.png)

### Egeria Workbook

# Surveying and Cataloguing Unity Catalog (UC)

## Introduction

Both Unity Catalog and Egeria are open source projects hosted by the LF AI & Data Foundation.  The differences between these technologies are:

 * Unity Catalog is responsible for governing access to data, whereas Egeria governs the exchange of metadata between tools and systems, such as Unity Catalog.

 * Similarly, Unity Catalog maintains a metadata repository describing the data it is protecting.  In contrast, Egeria maintains a distributed network of metadata repositories containing metadata about the technology (systems, tools, data), the processes that operate on them, and the people and organizations involved.

Run the code below to create a client for the Egeria servers.

---


In [1]:
import os
view_server = os.environ.get("VIEW_SERVER","view-server")
url = os.environ.get("EGERIA_VIEW_SERVER_URL","https://localhost:9443")
user_id = os.environ.get("EGERIA_USER", "peterprofile")
user_pwd = os.environ.get("EGERIA_USER_PASSWORD")

from pyegeria import EgeriaTech, EgeriaCat
import asyncio
import nest_asyncio
nest_asyncio.apply()

from pyegeria import load_mermaid, render_mermaid, generate_process_graph
load_mermaid()
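The cell above reads its settings from environment variables, falling back to defaults. The sketch below packages the same idea as a reusable, testable function. The `EgeriaSettings` dataclass and `load_settings` function are hypothetical helpers. Failing fast when `EGERIA_USER_PASSWORD` is unset is an assumed policy, not a pyegeria requirement; the original cell simply passes `None` on.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EgeriaSettings:
    """Connection settings for the Egeria view server (hypothetical helper type)."""
    view_server: str
    url: str
    user_id: str
    user_pwd: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> EgeriaSettings:
    """Read the same environment variables as the cell above, with the same defaults.

    Raises ValueError when no password is available instead of passing None on.
    """
    env = os.environ if env is None else env
    user_pwd = env.get("EGERIA_USER_PASSWORD")
    if not user_pwd:
        raise ValueError("EGERIA_USER_PASSWORD is not set")
    return EgeriaSettings(
        view_server=env.get("VIEW_SERVER", "view-server"),
        url=env.get("EGERIA_VIEW_SERVER_URL", "https://localhost:9443"),
        user_id=env.get("EGERIA_USER", "peterprofile"),
        user_pwd=user_pwd,
    )
```

With this helper, the client could be created as `EgeriaTech(s.view_server, s.url, s.user_id, s.user_pwd)`, where `s = load_settings()`.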


In [2]:
# print functions

def print_property(indent, property_name, property_value):
    # str() lets numbers, booleans and None be printed without a TypeError
    print(indent + property_name + ": " + str(property_value))

def get_property(indent, property_name, properties):
    if properties:
        property_value = properties.get(property_name)
        if property_value:
            print_property(indent, property_name, property_value)

def print_element_header(indent, element_header):
    type_name = element_header["type"]["typeName"]
    guid = element_header["guid"]
    print(indent + type_name + " [" + guid + "]")

def print_properties(indent, property_name, properties):
    if properties:
        if isinstance(properties, dict):
            # Flatten nested property maps, printing each leaf value
            for key in properties.keys():
                print_properties(indent, key, properties[key])
        elif isinstance(properties, (str, int, float, bool)):
            print_property(indent, property_name, str(properties))
        else:
            print(f"Funny property: type is {type(properties)}")

def print_element(indent, element):
    if element:
        print()
        print_element_header(indent + "", element.get("elementHeader"))
        print_properties(indent + " > ", "properties", element.get("properties"))
        print_properties(indent + " > ", "referenceableProperties", element.get("referenceableProperties"))

def print_related_elements(indent, related_elements):
    if related_elements:
        for related_element in related_elements:
            if related_element:
                print()
                print_element_header(indent, related_element.get("relationshipHeader"))
                print_properties(indent + " > ", "relationshipProperties", related_element.get("relationshipProperties"))
                print_element(indent + "   ", related_element.get("relatedElement"))
                                
def print_search_results(search_results):
    if search_results:
        if search_results == "no assets found":
            print(search_results)
        else:
            for asset in search_results:
                if asset:
                    print()
                    print("Asset: ")
                    print_element_header(" > " , asset.get("elementHeader"))
                    print_properties(" > ", "properties", asset.get("properties"))
                    matchingElements = asset.get("matchingElements")
                    if matchingElements:
                        for matchingElement in matchingElements:
                            print("Matching Element:")
                            print_element_header("   > " , matchingElement.get("elementHeader"))
                            print_properties("   > " , "properties", matchingElement.get("properties"))

def print_external_id_map(catalog):
    if catalog:
        catalog_guid = catalog["elementHeader"]["guid"]
        print_element("", catalog)
        external_ids = egeria_tech.get_related_elements(catalog_guid, "ExternalIdScope")
        if external_ids:
            for external_id in external_ids:
                print()
                relationship_properties = external_id.get("relationshipProperties")
                if relationship_properties:
                    print_property(" > ", "Permitted Synchronization", relationship_properties.get("permittedSynchronization"))
                related_element = external_id.get("relatedElement")
                external_id_guid = related_element["elementHeader"]["guid"]
                open_metadata_elements = egeria_tech.get_related_elements(external_id_guid, "ExternalIdLink")
                print_related_elements("      ", open_metadata_elements)
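The print functions above expect each element to be a dictionary. Each element has an `elementHeader`, which holds `type.typeName` and `guid`, plus a `properties` map. The sketch below works like `print_element_header` and `print_properties`, but returns the lines instead of printing them, so the expected structure is easy to check. `format_element` is a hypothetical helper. The sample element is abridged from the catalog output shown later in this notebook.

```python
from typing import Any, Dict, List


def format_element(element: Dict[str, Any], indent: str = "") -> List[str]:
    """Return the header line plus one line per leaf property.

    Nested property dictionaries are flattened depth-first; None values are skipped.
    """
    header = element["elementHeader"]
    lines = [f"{indent}{header['type']['typeName']} [{header['guid']}]"]

    def walk(props: Dict[str, Any]) -> None:
        for key, value in props.items():
            if isinstance(value, dict):
                walk(value)
            elif value is not None:
                lines.append(f"{indent} > {key}: {value}")

    walk(element.get("properties") or {})
    return lines


sample = {
    "elementHeader": {
        "type": {"typeName": "Catalog"},
        "guid": "8ccf3201-c8cc-48a6-b53e-c84c7f4b24a2",
    },
    "properties": {"name": "new_catalog", "description": "My new catalog."},
}
for line in format_element(sample):
    print(line)
```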



In [3]:

egeria_tech = EgeriaTech(view_server, url, user_id, user_pwd)
token = egeria_tech.create_egeria_bearer_token()


---

## Loading support for Unity Catalog

The definitions of the Unity Catalog connectors, templates and associated reference data are loaded via a [Content Pack](https://egeria-project.org/content-packs/) called `UnityCatalogContentPack.omarchive`.  The content pack can be loaded multiple times without ill effect, so run the following command to make sure it is loaded.

---

In [4]:

egeria_tech.add_archive_file("content-packs/UnityCatalogContentPack.omarchive", None, "active-metadata-store")

print("Archive loaded!")


Archive loaded!


----

## Gathering data about your Unity Catalog Server

You need to provide Egeria with some basic information about your Unity Catalog server.  Fill in the details below.

-----

In [5]:
# Use this technology type if the server is the open source version of Unity Catalog.
# Use "Databricks Unity Catalog Server" if this is a Databricks cloud service version of Unity Catalog.
serverTechnologyType = "Unity Catalog Server"

# This is the network location used to call the Unity Catalog Server.
# For Unity Catalog running on your local machine, use:
#   * "http://localhost" if Egeria is also running natively on your local machine, or
#   * "http://host.docker.internal" if Egeria is running in a docker image.
#
hostURL="http://localhost"
portNumber="8080"

# This is the unique name by which the server is known
serverName="Unity Catalog 1"

# Add a short description of the server
description="Local instance of the Unity Catalog (UC) Server."

# This is the version of Unity Catalog
versionIdentifier="v0.1.0-SNAPSHOT"

# This is the userId that the connectors will use when creating metadata in Egeria that describes resources in Unity Catalog.
serverUserId="uc1"

# These values are used in later sections
serverQualifiedName=serverTechnologyType + ":" + serverName
serverNetworkAddress=hostURL + ":" + portNumber
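The last two assignments derive values that the provisioning requests use later. The sketch below wraps that derivation in a function and adds the host choice described in the comments above. `choose_host_url` and `server_identifiers` are hypothetical helpers. The rule that `host.docker.internal` is needed only when Egeria runs in a container comes from those comments.

```python
from typing import Tuple


def choose_host_url(egeria_in_docker: bool) -> str:
    """Pick the host Egeria should use to reach a Unity Catalog server on this machine."""
    # Inside a container, "localhost" refers to the container itself,
    # so Docker's special host name is used to reach the host machine instead.
    return "http://host.docker.internal" if egeria_in_docker else "http://localhost"


def server_identifiers(technology_type: str, server_name: str,
                       host_url: str, port_number: str) -> Tuple[str, str]:
    """Return (qualified name, network address), built as in the cell above."""
    qualified_name = technology_type + ":" + server_name
    network_address = host_url + ":" + port_number
    return qualified_name, network_address


qn, address = server_identifiers("Unity Catalog Server", "Unity Catalog 1",
                                 choose_host_url(egeria_in_docker=False), "8080")
print(qn)       # Unity Catalog Server:Unity Catalog 1
print(address)  # http://localhost:8080
```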


-----

## Survey a Unity Catalog Server

The Unity Catalog support includes the ability to survey the contents of a Unity Catalog Server.  The governance action process below creates a description of the Unity Catalog Server and then runs a survey to understand its contents.  A summary of the survey results can be found in `distribution-hub/surveys`.

---

In [6]:

createAndSurveyServerName="UnityCatalogServer:CreateAndSurveyGovernanceActionProcess"

process_guid = egeria_tech.get_element_guid_by_unique_name(createAndSurveyServerName)

mermaid_graph = generate_process_graph(process_guid)
render_mermaid(mermaid_graph)


In [7]:

requestParameters = {
    "hostURL" : hostURL,
    "portNumber" : portNumber,
    "serverName" : serverName,  
    "versionIdentifier" : versionIdentifier,
    "description" : description,
    "serverUserId" : serverUserId
}

egeria_tech.initiate_gov_action_process(createAndSurveyServerName, None, None, None, requestParameters, None, None)


'344712e4-3037-4e0a-9336-f41768dfde3c'

----

Open the survey file and review the contents of the Unity Catalog Server. Notice that there can be multiple catalogs in a Unity Catalog Server.  Also notice the hierarchical naming of the Unity Catalog elements.  Catalogs have schemas inside them, and the schemas can have tables, functions and/or volumes within them.
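The hierarchical names also appear in the qualified names that Egeria assigns. One example is `Unity Catalog Table:http://localhost:8080:unity.default.numbers`, from the search results later in this notebook. The sketch below splits such a name into its parts. It is a hypothetical helper that assumes the `<technology type>:<network address>:<catalog>.<schema>.<object>` pattern seen in this notebook. It also assumes that no individual catalog, schema or object name contains a dot. It applies to catalogs and the elements inside them, not to the server's own qualified name.

```python
from typing import Dict, Optional


def parse_uc_qualified_name(qualified_name: str) -> Dict[str, Optional[str]]:
    """Split a Unity Catalog qualified name into its components.

    Assumes the form "<tech type>:<network address>:<dotted path>", where the
    network address may itself contain ":" (for example, before a port number).
    """
    tech_type, rest = qualified_name.split(":", 1)
    network_address, path = rest.rsplit(":", 1)
    parts = path.split(".")
    # Pad shorter paths (catalogs, schemas) so every key is always present.
    parts += [None] * (3 - len(parts))
    return {
        "technologyType": tech_type,
        "networkAddress": network_address,
        "catalog": parts[0],
        "schema": parts[1],
        "name": parts[2],
    }


print(parse_uc_qualified_name(
    "Unity Catalog Table:http://localhost:8080:unity.default.numbers"))
```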

----

Use the command `hey_egeria_ops show engines activity --compressed` to view the governance actions that ran as a result of the survey request.
There were two steps.  First, the process created a `SoftwareServer` element to represent the Unity Catalog Server; this element stores the network address of the server.  Then the survey was run using this element.

-----

Now navigate to `distribution-hub/logs/openlineage/GovernanceActions`.  This directory stores the OpenLineage events created when the surveys were run.  Each event records the start or stop of a governance action.
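An OpenLineage run event is a JSON document. Its `eventType` field (for example `START`, `COMPLETE` or `FAIL`) says which point in a run the event describes. The run is identified by `run.runId`, and the job by `job.namespace` and `job.name`. The sketch below groups event types by run. It assumes each file holds one JSON event per line. That layout is an assumption, so check one of the files before relying on it. `summarise_events` is a hypothetical helper.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def summarise_events(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group OpenLineage event types by run id, in the order they appear.

    Assumes each non-blank line holds one JSON-encoded OpenLineage RunEvent.
    """
    runs: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        run_id = event.get("run", {}).get("runId", "unknown")
        runs[run_id].append(event.get("eventType", "OTHER"))
    return dict(runs)


# Usage sketch: open one of the files in the GovernanceActions directory, then
# with open(path_to_event_file) as f:
#     print(summarise_events(f))
```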

----

If the surveys look interesting, it is possible to synchronize the metadata between Unity Catalog and Egeria.  Run the command `hey_egeria_ops show integrations status` in a separate command window to start the monitor for the integration daemon.  You can see a list of connectors waiting to synchronize data with different types of technology.  At the bottom of this list are two connectors dedicated to synchronizing metadata between Egeria and Unity Catalog:

* **UnityCatalogServerSynchronizer** synchronizes catalog information from a Unity Catalog Server.  It passes details of the catalogs it finds on to **UnityCatalogInsideCatalogSynchronizer**.
* **UnityCatalogInsideCatalogSynchronizer** synchronizes the schema, volume, table and function metadata between Egeria and Unity Catalog.  

The code below requests that the contents of the Unity Catalog server defined above are catalogued into Egeria.

---

In [8]:

createAndCatalogServerName="UnityCatalogServer:CreateAndCatalogGovernanceActionProcess"

process_guid = egeria_tech.get_element_guid_by_unique_name(createAndCatalogServerName)

mermaid_graph = generate_process_graph(process_guid)
render_mermaid(mermaid_graph)


In [9]:

requestParameters = {
    "hostURL" : hostURL,
    "portNumber" : portNumber,
    "serverName" : serverName,  
    "versionIdentifier" : versionIdentifier,
    "description" : description,
    "serverUserId" : serverUserId
}

egeria_tech.initiate_gov_action_process(createAndCatalogServerName, None, None, None, requestParameters, None, None)


'72840e98-5595-47cd-8e45-d0ec17209e4a'
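The survey and catalog requests above pass an identical set of request parameters. Each call returns the GUID of the process instance it started. The sketch below factors out the parameter dictionary and checks the returned value. `server_request_parameters` and `is_guid` are hypothetical helpers. Treating any non-UUID return value as a failed start is an assumption.

```python
import uuid
from typing import Dict


def server_request_parameters(host_url: str, port_number: str, server_name: str,
                              version_identifier: str, description: str,
                              server_user_id: str) -> Dict[str, str]:
    """Build the request parameters used by both the survey and the catalog processes."""
    return {
        "hostURL": host_url,
        "portNumber": port_number,
        "serverName": server_name,
        "versionIdentifier": version_identifier,
        "description": description,
        "serverUserId": server_user_id,
    }


def is_guid(value: object) -> bool:
    """Return True if value is a string that parses as a UUID."""
    if not isinstance(value, str):
        return False
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


# Usage in this notebook:
# params = server_request_parameters(hostURL, portNumber, serverName,
#                                    versionIdentifier, description, serverUserId)
# guid = egeria_tech.initiate_gov_action_process(createAndCatalogServerName,
#                                                None, None, None, params, None, None)
# print("started" if is_guid(guid) else f"unexpected response: {guid}")
```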

----

Switch back to the integration daemon monitor and you will see that there are now catalog targets for the server with UnityCatalogServerSynchronizer, and for each Unity Catalog catalog with UnityCatalogInsideCatalogSynchronizer.

----

You can use the following commands to show the elements from Unity Catalog in Egeria:

* `hey_egeria cat show tech-type-elements --tech_type 'Unity Catalog Server'` for the Unity Catalog Servers.
* `hey_egeria cat show tech-type-elements --tech_type 'Unity Catalog Catalog'` for the Unity Catalog Catalogs.
* `hey_egeria cat show tech-type-elements --tech_type 'Unity Catalog Schema'` for the Unity Catalog Schemas.
* `hey_egeria cat show tech-type-elements --tech_type 'Unity Catalog Table'` for the Unity Catalog Tables.
* `hey_egeria cat show tech-type-elements --tech_type 'Unity Catalog Volume'` for the Unity Catalog Volumes.
* `hey_egeria cat show tech-type-elements --tech_type 'Unity Catalog Function'` for the Unity Catalog Functions.

You can also use `hey_egeria_cat show assets inventory` to search for assets that match the word `inventory`.
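The same per-technology-type listing can also be produced from Python. The method `get_technology_type_elements`, which is used later in this notebook, supports this. The sketch below counts the elements for each Unity Catalog technology type. The client is passed in as a parameter so that the function can be exercised with a stand-in. `count_tech_type_elements` is a hypothetical helper. Counting a non-list response, such as a "not found" message, as zero is an assumption about the return value.

```python
from typing import Dict, Iterable

UC_TECH_TYPES = [
    "Unity Catalog Server",
    "Unity Catalog Catalog",
    "Unity Catalog Schema",
    "Unity Catalog Table",
    "Unity Catalog Volume",
    "Unity Catalog Function",
]


def count_tech_type_elements(client, tech_types: Iterable[str] = UC_TECH_TYPES) -> Dict[str, int]:
    """Return the number of elements Egeria holds for each technology type.

    `client` must offer get_technology_type_elements(tech_type), as egeria_tech
    does in this notebook; any response that is not a list is counted as 0.
    """
    counts = {}
    for tech_type in tech_types:
        elements = client.get_technology_type_elements(tech_type)
        counts[tech_type] = len(elements) if isinstance(elements, list) else 0
    return counts


# Usage in this notebook (requires the running Egeria servers):
# print(count_tech_type_elements(egeria_tech))
```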

----

It is also possible to use python functions to retrieve information about the Unity Catalog resources.  This uses the `egeria_cat` client.

----

In [10]:

egeria_cat = EgeriaCat(view_server, url, user_id, user_pwd)
egeria_cat.set_bearer_token(token)


----

The command below retrieves assets that match the search query.  If you see the message "no assets found", re-run the request, because the synchronization process may not have completed yet.

----

In [11]:

search_results = egeria_cat.find_assets_in_domain("numbers")

print_search_results(search_results)
       


Asset: 
 > DeployedDatabaseSchema [75a8ef4c-8ea0-43fc-a58f-1775364219d6]
 > class: AssetProperties
 > typeName: DeployedDatabaseSchema
 > qualifiedName: Unity Catalog Schema:http://localhost:8080:unity.default
 > displayName: default
 > displayDescription: Default schema
 > name: default
 > resourceName: unity.default
 > resourceDescription: Default schema
 > deployedImplementationType: Unity Catalog Schema
Matching Element:
   > VirtualRelationalTable [c93b6eb2-b477-4d7c-b7bd-942d7e1724aa]
   > qualifiedName: Unity Catalog Table:http://localhost:8080:unity.default.numbers
   > name: numbers
   > description: External table
   > resourceName: unity.default.numbers
   > deployedImplementationType: Unity Catalog Table
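Synchronization runs in the background, so the search above may need repeating. The sketch below retries a search until it returns a list or the attempts run out. `retry_search` is a hypothetical helper, and its attempt count and delay are illustrative. The check for a non-list result is based on the `"no assets found"` string that `print_search_results` handles.

```python
import time
from typing import Any, Callable


def retry_search(search: Callable[[], Any], attempts: int = 5,
                 delay_seconds: float = 10.0) -> Any:
    """Call search() until it returns a list, sleeping between attempts.

    Returns the last result, which may still be a message string such as
    "no assets found" if synchronization has not caught up.
    """
    result = None
    for attempt in range(attempts):
        result = search()
        if isinstance(result, list):
            return result
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return result


# Usage in this notebook:
# search_results = retry_search(lambda: egeria_cat.find_assets_in_domain("numbers"))
# print_search_results(search_results)
```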


----
This shows that metadata can be copied from Unity Catalog into Egeria's metadata repository and represented using Open Metadata Types.  The next part of the demonstration shows metadata flowing from Egeria to Unity Catalog.


----

# Provisioning Unity Catalog (UC) from Egeria

Egeria has the ability to provision resources into Unity Catalog.  The desired resources are described in Egeria using Open Metadata Elements and linked to the representation of the server, catalog or schema where the new resource is to be created.  Once the description is in place, the appropriate Unity Catalog Integration Connector will create a matching definition in Unity Catalog.

The sections below go through creating a catalog, then a schema within that catalog, and then a volume within that schema.  Notice that the ordering is important: the catalog must be created before its schemas, and a schema before its volumes.
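Each provisioned resource is looked up by a qualified name. That name combines its technology type, the server's network address, and its dotted position in the catalog/schema/volume hierarchy. The lookup cells in the following sections build these names by hand. The sketch below captures that pattern, assuming the format used in those cells. `uc_qualified_name` is a hypothetical helper. Note how each name extends its parent's path, which is why the parent has to exist first.

```python
def uc_qualified_name(technology_type: str, network_address: str, *path: str) -> str:
    """Join a technology type, network address and hierarchical path into a qualified name.

    The path runs from the catalog downwards, e.g. (catalog, schema, volume).
    """
    if not path:
        raise ValueError("at least one path element (the catalog name) is required")
    return technology_type + ":" + network_address + ":" + ".".join(path)


address = "http://localhost:8080"
print(uc_qualified_name("Unity Catalog Catalog", address, "new_catalog"))
print(uc_qualified_name("Unity Catalog Schema", address, "new_catalog", "new_schema"))
print(uc_qualified_name("Unity Catalog Volume", address, "new_catalog", "new_schema", "new_volume"))
```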

-----


## Provision a new catalog into a Unity Catalog (UC) Server

There is only one step to provision a new catalog into a Unity Catalog Server:

-----

In [12]:

provisionCatalogName="Provision:UnityCatalogCatalog:GovernanceActionProcess"

process_guid = egeria_tech.get_element_guid_by_unique_name(provisionCatalogName)

mermaid_graph = generate_process_graph(process_guid)
render_mermaid(mermaid_graph)


----

Fill in details about the new catalog that you want to create.

----

In [13]:

catalogName = "new_catalog"
catalogDescription = "My new catalog."


----

Now run the governance process to create the catalog.

----

In [14]:

requestParameters = {
    "ucServerQualifiedName" : serverQualifiedName,
    "serverNetworkAddress" : serverNetworkAddress,
    "ucCatalogName" : catalogName,
    "versionIdentifier" : versionIdentifier,
    "description" : catalogDescription
}

egeria_tech.initiate_gov_action_process(provisionCatalogName, None, None, None, requestParameters, None, None)


'103c0ed0-af5a-450e-9bac-3b4bf99c1466'

------

The code below shows the catalog you now have defined.

------

In [15]:

catalog_qualified_name="Unity Catalog Catalog:" + serverNetworkAddress + ":" + catalogName

element=egeria_tech.get_element_by_unique_name(catalog_qualified_name)

print_element("", element)



Catalog [8ccf3201-c8cc-48a6-b53e-c84c7f4b24a2]
 > qualifiedName: Unity Catalog Catalog:http://localhost:8080:new_catalog
 > name: new_catalog
 > description: My new catalog.
 > capabilityVersion: v0.1.0-SNAPSHOT
 > deployedImplementationType: Unity Catalog Catalog


----

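The next command lists the relationships this catalog has with other elements.  These are the Unity Catalog Server that hosts it and, through the `SourcedFrom` relationship, the template it was created from.  This is why the template's properties still contain `{{...}}` placeholders.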

----

In [16]:

catalog_guid=egeria_tech.get_element_guid_by_unique_name(catalog_qualified_name)
print(catalog_guid)

related_elements = egeria_tech.get_related_elements(catalog_guid)

print_related_elements("", related_elements)


8ccf3201-c8cc-48a6-b53e-c84c7f4b24a2

SupportedSoftwareCapability [f2de94f4-1b3f-4abb-bedd-0fa96dd07b5a]
 > operationalStatus: Enabled

   SoftwareServer [9a748d9d-4a3a-4d58-8c03-6fdfeeda4f55]
    > versionIdentifier: v0.1.0-SNAPSHOT
    > qualifiedName: Unity Catalog Server:Unity Catalog 1
    > name: Unity Catalog 1
    > description: Local instance of the Unity Catalog (UC) Server.
    > resourceName: Unity Catalog 1
    > deployedImplementationType: Unity Catalog Server

SourcedFrom [4f402baf-2772-4cba-afe5-8e45f3fc3910]
 > sourceVersionNumber: 1728633401237

   Catalog [5ee006aa-a6d6-411b-9b8d-5f720c079cae]
    > qualifiedName: Unity Catalog Catalog:{{serverNetworkAddress}}:{{ucCatalogName}}
    > name: {{ucCatalogName}}
    > description: {{description}}
    > capabilityVersion: {{versionIdentifier}}
    > deployedImplementationType: Unity Catalog Catalog
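To pick out one kind of relationship, such as `SourcedFrom`, from a result like the one above, the related elements can be filtered by relationship type. The sketch below reads the same `relationshipHeader` and `relatedElement` structure that `print_related_elements` uses. `related_by_type` is a hypothetical helper, and the sample data is abridged from the output above.

```python
from typing import Any, Dict, List, Optional


def related_by_type(related_elements: Optional[List[Dict[str, Any]]],
                    relationship_type: str) -> List[Dict[str, Any]]:
    """Return the elements reached through relationships of the given type."""
    matches = []
    for item in related_elements or []:
        header = item.get("relationshipHeader") or {}
        if header.get("type", {}).get("typeName") == relationship_type:
            matches.append(item.get("relatedElement"))
    return matches


sample = [
    {"relationshipHeader": {"type": {"typeName": "SupportedSoftwareCapability"},
                            "guid": "f2de94f4-1b3f-4abb-bedd-0fa96dd07b5a"},
     "relatedElement": {"elementHeader": {"type": {"typeName": "SoftwareServer"},
                                          "guid": "9a748d9d-4a3a-4d58-8c03-6fdfeeda4f55"}}},
    {"relationshipHeader": {"type": {"typeName": "SourcedFrom"},
                            "guid": "4f402baf-2772-4cba-afe5-8e45f3fc3910"},
     "relatedElement": {"elementHeader": {"type": {"typeName": "Catalog"},
                                          "guid": "5ee006aa-a6d6-411b-9b8d-5f720c079cae"}}},
]
templates = related_by_type(sample, "SourcedFrom")
print(templates[0]["elementHeader"]["guid"])  # 5ee006aa-a6d6-411b-9b8d-5f720c079cae
```

In the notebook, the same call would be `related_by_type(related_elements, "SourcedFrom")`.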


----

## Provision a new schema into a Unity Catalog (UC) Catalog

This is the process to provision a new schema into a Unity Catalog Catalog:

-----

In [17]:

provisionSchemaName="Provision:UnityCatalogSchema:GovernanceActionProcess"

process_guid = egeria_tech.get_element_guid_by_unique_name(provisionSchemaName)

mermaid_graph = generate_process_graph(process_guid)
render_mermaid(mermaid_graph)


----

Fill in details about the new schema that you want to create.

----

In [18]:

schemaName = "new_schema"
schemaDescription = "My new schema."


----

Now run the governance process to create the schema.

----

In [19]:

requestParameters = {
    "serverNetworkAddress" : serverNetworkAddress,
    "ucCatalogName" : catalogName,
    "ucSchemaName" : schemaName,
    "versionIdentifier" : versionIdentifier,
    "description" : schemaDescription
}

egeria_tech.initiate_gov_action_process(provisionSchemaName, None, None, None, requestParameters, None, None)


'54299bd4-fbdb-461c-98f5-d8a5af55afed'

------

The code below shows the schema you now have defined.

------

In [20]:

schema_qualified_name="Unity Catalog Schema:" + serverNetworkAddress + ":" + catalogName + "." + schemaName

element=egeria_tech.get_element_by_unique_name(schema_qualified_name)

print_element("", element)



DeployedDatabaseSchema [e5bcb9b3-b2ff-4c5e-8507-81033a388472]
 > versionIdentifier: v0.1.0-SNAPSHOT
 > qualifiedName: Unity Catalog Schema:http://localhost:8080:new_catalog.new_schema
 > name: new_schema
 > description: My new schema.
 > resourceName: new_catalog.new_schema
 > deployedImplementationType: Unity Catalog Schema


----

And the elements related to it ...

----

In [21]:

schema_guid=egeria_tech.get_element_guid_by_unique_name(schema_qualified_name)
print(schema_guid)

related_elements = egeria_tech.get_related_elements(schema_guid)

print_related_elements("", related_elements)


e5bcb9b3-b2ff-4c5e-8507-81033a388472

ServerAssetUse [c709d5b4-e926-41bc-a40a-2988a57943da]
 > useType: Owns

   Catalog [8ccf3201-c8cc-48a6-b53e-c84c7f4b24a2]
    > qualifiedName: Unity Catalog Catalog:http://localhost:8080:new_catalog
    > name: new_catalog
    > description: My new catalog.
    > capabilityVersion: v0.1.0-SNAPSHOT
    > deployedImplementationType: Unity Catalog Catalog

SourcedFrom [5494d673-e5bd-4e3a-becc-dac263bcaf55]
 > sourceVersionNumber: 1728633401237

   DeployedDatabaseSchema [5bf92b0f-3970-41ea-b0a3-aacfbf6fd92e]
    > versionIdentifier: {{versionIdentifier}}
    > qualifiedName: Unity Catalog Schema:{{serverNetworkAddress}}:{{ucCatalogName}}.{{ucSchemaName}}
    > name: {{ucSchemaName}}
    > description: {{description}}
    > resourceName: {{ucCatalogName}}.{{ucSchemaName}}
    > deployedImplementationType: Unity Catalog Schema


----

## Provision a new volume into a Unity Catalog (UC) Schema

This is the process to provision a new volume into a Unity Catalog Schema:

-----

In [22]:

provisionVolumeName="Provision:UnityCatalogVolume:GovernanceActionProcess"

process_guid = egeria_tech.get_element_guid_by_unique_name(provisionVolumeName)

mermaid_graph = generate_process_graph(process_guid)
render_mermaid(mermaid_graph)


----

Fill in details about the new volume that you want to create.

----

In [23]:

volumeName = "new_volume"
volumeDescription = "My new volume."
storageLocation = "data/new_volume"
volumeType = "EXTERNAL"
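Unity Catalog volumes are either `MANAGED` or `EXTERNAL`, and this notebook supplies a storage location for its external volume. The sketch below checks the volume details before the request is sent and builds the parameter dictionary that the next step passes to the provisioning process. `volume_request_parameters` is a hypothetical helper. Requiring a storage location only for `EXTERNAL` volumes is an assumption based on how Unity Catalog treats the two volume types.

```python
from typing import Dict

VOLUME_TYPES = {"MANAGED", "EXTERNAL"}


def volume_request_parameters(network_address: str, catalog_name: str, schema_name: str,
                              volume_name: str, version_identifier: str,
                              storage_location: str, description: str,
                              volume_type: str) -> Dict[str, str]:
    """Validate the volume details and return the provisioning request parameters."""
    volume_type = volume_type.upper()
    if volume_type not in VOLUME_TYPES:
        raise ValueError(f"volume type must be one of {sorted(VOLUME_TYPES)}, not {volume_type!r}")
    # Assumption: an external volume must point at an existing storage location.
    if volume_type == "EXTERNAL" and not storage_location:
        raise ValueError("an EXTERNAL volume needs a storage location")
    return {
        "serverNetworkAddress": network_address,
        "ucCatalogName": catalog_name,
        "ucSchemaName": schema_name,
        "ucVolumeName": volume_name,
        "versionIdentifier": version_identifier,
        "ucStorageLocation": storage_location,
        "description": description,
        "ucVolumeType": volume_type,
    }


# Usage with the values above:
# requestParameters = volume_request_parameters(serverNetworkAddress, catalogName, schemaName,
#                                               volumeName, versionIdentifier, storageLocation,
#                                               volumeDescription, volumeType)
```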


----

Now run the governance process to create the volume.

----

In [24]:

requestParameters = {
    "serverNetworkAddress" : serverNetworkAddress,
    "ucCatalogName" : catalogName,
    "ucSchemaName" : schemaName,
    "ucVolumeName" : volumeName,
    "versionIdentifier" : versionIdentifier,
    "ucStorageLocation" : storageLocation,
    "description" : volumeDescription,
    "ucVolumeType" : volumeType
}

egeria_tech.initiate_gov_action_process(provisionVolumeName, None, None, None, requestParameters, None, None)


'f17e21ce-c250-480e-8f7f-85096d3d87da'

------

The code below shows the volume you now have defined.

------

In [25]:

volume_qualified_name="Unity Catalog Volume:" + serverNetworkAddress + ":" + catalogName + "." + schemaName + "." + volumeName

element=egeria_tech.get_element_by_unique_name(volume_qualified_name)

print_element("", element)



DataFolder [8434e9f1-b071-4bbc-a0f4-11f1905ed1f5]
 > pathName: data/new_volume
 > versionIdentifier: v0.1.0-SNAPSHOT
 > qualifiedName: Unity Catalog Volume:http://localhost:8080:new_catalog.new_schema.new_volume
 > name: new_volume
 > description: My new volume.
 > resourceName: new_catalog.new_schema.new_volume
 > deployedImplementationType: Unity Catalog Volume


----

And the elements related to this volume ...

----

In [26]:

volume_guid=egeria_tech.get_element_guid_by_unique_name(volume_qualified_name)
print(volume_guid)

related_elements = egeria_tech.get_related_elements(volume_guid)

print_related_elements("", related_elements)


8434e9f1-b071-4bbc-a0f4-11f1905ed1f5

ReferenceableFacet [a117c401-e38f-497e-a778-20a011d7cc37]
 > source: Unity Catalog (UC)

   PropertyFacet [ec6b158c-cb08-4990-8e00-057d56d0442d]
    > schemaVersion: 1.0
    > qualifiedName: Unity Catalog Volume:http://localhost:8080:new_catalog.new_schema.new_volume_propertyFacetFrom_Unity Catalog (UC)@1.0
    > description: vendorProperties
    > properties: {ucStorageLocation=data/new_volume, ucVolumeType=EXTERNAL}

DataContentForDataSet [9cf81002-09f3-4c44-b2e0-e9aeffa2ac27]

   DeployedDatabaseSchema [e5bcb9b3-b2ff-4c5e-8507-81033a388472]
    > versionIdentifier: v0.1.0-SNAPSHOT
    > qualifiedName: Unity Catalog Schema:http://localhost:8080:new_catalog.new_schema
    > name: new_schema
    > description: My new schema.
    > resourceName: new_catalog.new_schema
    > deployedImplementationType: Unity Catalog Schema

SourcedFrom [521fbe9f-c8df-46ff-b9c4-a63582064d0d]
 > sourceVersionNumber: 1728633401237

   DataFolder [92d2d2dc-0798-41f0-9512
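Some property values arrive as a single string in the `{key=value, ...}` form. Examples are the `properties` of the `PropertyFacet` above and the `mappingProperties` shown in the next section. The sketch below turns such a string into a Python dictionary. `parse_map_string` is a hypothetical helper that assumes keys contain no `=` and values contain no `, ` sequence.

```python
from typing import Dict


def parse_map_string(text: str) -> Dict[str, str]:
    """Parse a "{key1=value1, key2=value2}" string into a dict of strings.

    Assumes keys contain no "=" and values contain no ", " sequence;
    anything after the first "=" in a pair is kept as the value.
    """
    body = text.strip()
    if body.startswith("{") and body.endswith("}"):
        body = body[1:-1]
    result = {}
    for pair in body.split(", "):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        result[key.strip()] = value.strip()
    return result


print(parse_map_string("{ucStorageLocation=data/new_volume, ucVolumeType=EXTERNAL}"))
```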

----

# Reviewing the integration

As Egeria exchanges metadata with Unity Catalog, it builds a map that links the identifiers used by Unity Catalog to the elements that have been created in the open metadata ecosystem.

The code below retrieves the mappings for each catalog within the Unity Catalog (UC) servers known to Egeria.

-----

In [27]:

unity_catalog_catalogs = egeria_tech.get_technology_type_elements("Unity Catalog Catalog")

if unity_catalog_catalogs:
    for catalog in unity_catalog_catalogs:
        print()
        print("----------------------------")
        print_external_id_map(catalog)
        


----------------------------

Catalog [8ccf3201-c8cc-48a6-b53e-c84c7f4b24a2]
 > class: ReferenceableProperties
 > typeName: Catalog
 > name: new_catalog
 > description: My new catalog.
 > capabilityVersion: v0.1.0-SNAPSHOT
 > deployedImplementationType: Unity Catalog Catalog
 > qualifiedName: Unity Catalog Catalog:http://localhost:8080:new_catalog

 > Permitted Synchronization: ToThirdParty

      ExternalIdLink [8ca0cf26-371b-458f-a652-a0887360ec23]
       > lastSynchronized: 1728916791937
       > mappingProperties: {ucCatalogName=new_catalog, ucSchemaName=new_schema, serverNetworkAddress=http://localhost:8080, schemaName=new_schema}
       > source: Unity Catalog Catalog

         DeployedDatabaseSchema [e5bcb9b3-b2ff-4c5e-8507-81033a388472]
          > versionIdentifier: v0.1.0-SNAPSHOT
          > qualifiedName: Unity Catalog Schema:http://localhost:8080:new_catalog.new_schema
          > name: new_schema
          > description: My new schema.
          > resourceName: new_catal