<!-- SPDX-License-Identifier: CC-BY-4.0 -->
<!-- Copyright Contributors to the ODPi Egeria project 2024. -->

![Egeria Logo](https://raw.githubusercontent.com/odpi/egeria/main/assets/img/ODPi_Egeria_Logo_color.png)

### Egeria Workbook

# Cataloguing Unity Catalog (UC)

## Introduction

Both Unity Catalog and Egeria are open source projects within the LF AI & Data Foundation.  The differences between these technologies are:

 * Unity Catalog is responsible for governing access to data, whereas Egeria governs the exchange of metadata between tools and systems, such as Unity Catalog.

 * Similarly, Unity Catalog maintains a metadata repository describing the data it is protecting.  In contrast, Egeria maintains a distributed network of metadata repositories containing metadata about the technology (systems, tools, data), the processes that are operating on them, along with the people and organizations involved.

Run the code below to create a client to the Egeria servers.

---


In [None]:
import os
view_server = os.environ.get("VIEW_SERVER","view-server")
url = os.environ.get("EGERIA_VIEW_SERVER_URL","https://localhost:9443")
user_id = os.environ.get("EGERIA_USER", "peterprofile")
user_pwd = os.environ.get("EGERIA_USER_PASSWORD")

from pyegeria import EgeriaTech
import asyncio
import nest_asyncio
nest_asyncio.apply()

In [None]:

egeria_tech = EgeriaTech(view_server, url, user_id, user_pwd)
token = egeria_tech.create_egeria_bearer_token()


---
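The connection settings above are read from environment variables, with defaults for everything except the password. The following is a minimal sketch for checking which settings are in effect before the client is created. `load_egeria_settings` is a hypothetical helper written for this workbook and is not part of pyegeria; it only reuses the variable names and defaults shown in the cell above.

```python
import os


def load_egeria_settings(env=None):
    """Collect the connection settings using the same defaults as the cell above.

    Returns the settings and a list of the settings that have no value.
    """
    env = os.environ if env is None else env
    settings = {
        "view_server": env.get("VIEW_SERVER", "view-server"),
        "url": env.get("EGERIA_VIEW_SERVER_URL", "https://localhost:9443"),
        "user_id": env.get("EGERIA_USER", "peterprofile"),
        # The password has no default, so it is None when the variable is unset.
        "user_pwd": env.get("EGERIA_USER_PASSWORD"),
    }
    missing = [name for name, value in settings.items() if value is None]
    return settings, missing


settings, missing = load_egeria_settings()
if missing:
    print("Not set:", ", ".join(missing))
```

If `user_pwd` is reported as not set, export `EGERIA_USER_PASSWORD` before starting the notebook.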

## Loading support for Unity Catalog

The definitions of the Unity Catalog connectors, templates and associated reference data are loaded via a [Content Pack](https://egeria-project.org/content-packs/) called `UnityCatalogContentPack.omarchive`.  The content pack can be loaded multiple times without ill effect, so run the following command to make sure it is loaded.

---

In [None]:

egeria_tech.add_archive_file("content-packs/UnityCatalogContentPack.omarchive", None, "active-metadata-store")

print("Archive loaded!")


----

## Survey a Unity Catalog Server

The Unity Catalog support includes the ability to survey the contents of a Unity Catalog Server.  This command creates a description of the Unity Catalog Server and runs a survey to understand its contents.  A summary of the survey results can be found in /distribution-hub/surveys.

---

In [None]:
createAndSurveyServerName="UnityCatalogServer:CreateAndSurveyGovernanceActionProcess"

requestParameters = {
    "hostURL" : "http://host.docker.internal",
    "portNumber" : "8080",
    "serverName" : "Unity Catalog 1",  
    "versionIdentifier" : "V1.0",
    "description" : "Local instance of the Unity Catalog (UC) Server.",
    "serverUserId" : "uc1"
}

egeria_tech.initiate_gov_action_process(createAndSurveyServerName, None, None, None, requestParameters, None, None)


----

Open up the survey file and review the contents of the Unity Catalog Server. Notice that there can be multiple catalogs in a Unity Catalog Server.  Also notice the hierarchical naming of the Unity Catalog elements: catalogs contain schemas, and schemas can contain tables, functions and/or volumes.
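The hierarchical naming can be illustrated with a small sketch. It assumes the dotted `catalog.schema.element` form for full names; `split_uc_name` and the sample name are hypothetical and are used only for illustration.

```python
def split_uc_name(full_name):
    """Split a dotted Unity Catalog name into its levels.

    One part is a catalog, two parts are catalog.schema, and three parts are
    catalog.schema.element, where the element is a table, function or volume.
    """
    parts = full_name.split(".")
    if not 1 <= len(parts) <= 3 or any(not part for part in parts):
        raise ValueError(f"unexpected Unity Catalog name: {full_name!r}")
    levels = ("catalog", "schema", "element")
    return dict(zip(levels, parts))


# Illustrative name only; use the names that appear in your own survey report.
print(split_uc_name("unity.default.marksheet"))
```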

----
Use the command `hey_egeria_ops show engines activity --compressed` to view the governance actions that ran as a result of the survey requests.  
There were two steps.  First it created a `SoftwareServer` entity to represent the Unity Catalog Server. 
This stores the network address of the server.  Then the survey was run using this information.

-----

Now navigate to `distribution-hub/logs/openlineage/GovernanceActions`.  This directory stores the OpenLineage events created when the surveys were run.  Each event records the start or stop of a governance action.
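The events in this directory can be summarised with a short script. This is a sketch under assumptions: it assumes each file holds a single JSON OpenLineage run event with the standard `eventType`, `eventTime`, `run.runId` and `job.name` fields. The file layout, the helper names and the sample events are hypothetical, so check them against the files you actually find.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_events(directory):
    """Read every file under directory that parses as a JSON object."""
    events = []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        try:
            event = json.loads(path.read_text(encoding="utf-8"))
        except (ValueError, OSError):
            continue  # skip files that are unreadable or not JSON
        if isinstance(event, dict):
            events.append(event)
    return events


def summarise_events(events):
    """Group (eventType, job name) pairs by run id, ordered by event time."""
    runs = defaultdict(list)
    for event in sorted(events, key=lambda e: e.get("eventTime", "")):
        run_id = event.get("run", {}).get("runId", "unknown")
        job_name = event.get("job", {}).get("name")
        runs[run_id].append((event.get("eventType"), job_name))
    return dict(runs)


# Hypothetical sample events; replace with
# load_events("distribution-hub/logs/openlineage/GovernanceActions").
sample = [
    {"eventType": "COMPLETE", "eventTime": "2024-01-01T10:00:05Z",
     "run": {"runId": "run-1"}, "job": {"name": "survey"}},
    {"eventType": "START", "eventTime": "2024-01-01T10:00:00Z",
     "run": {"runId": "run-1"}, "job": {"name": "survey"}},
]
for run_id, steps in summarise_events(sample).items():
    print(run_id, [event_type for event_type, _ in steps])
```

Grouping by run id pairs each start event with its matching stop event, which makes it easy to spot a governance action that started but never completed.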

----

If the surveys look interesting, it is possible to synchronize the metadata between Unity Catalog and Egeria.  Run the command `hey_egeria_ops show integrations status` in a separate command window to start the monitor for the integration daemon.  You can see a list of connectors waiting to synchronize data with different types of technology.  At the bottom of this list are two connectors dedicated to synchronizing metadata between Egeria and Unity Catalog:

* **UnityCatalogServerSynchronizer** synchronizes catalog information from a Unity Catalog Server.  It passes details of the catalogs it finds onto **UnityCatalogInsideCatalogSynchronizer**.
* **UnityCatalogInsideCatalogSynchronizer** synchronizes the schema, volume, table and function metadata between Egeria and Unity Catalog.  

The code below requests that the contents of the first Unity Catalog Server are catalogued into Egeria.

---

In [None]:

createAndCatalogServerName="UnityCatalogServer:CreateAndCatalogGovernanceActionProcess"

requestParameters = {
    "hostURL" : "http://host.docker.internal",
    "portNumber" : "8080",
    "serverName" : "Unity Catalog 1",  
    "versionIdentifier" : "V1.0",
    "description" : "Local instance of the Unity Catalog (UC) Server.",
    "serverUserId" : "uc1"
}

egeria_tech.initiate_gov_action_process(createAndCatalogServerName, None, None, None, requestParameters, None, None)


----
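The survey and catalog requests use the same request parameters. The following sketch builds them once so that the two calls cannot drift apart. `build_uc_request_parameters` is a hypothetical helper, not part of pyegeria; the keys and default values are copied from the cells above.

```python
def build_uc_request_parameters(
    server_name,
    host_url="http://host.docker.internal",
    port_number="8080",
    version_identifier="V1.0",
    description="Local instance of the Unity Catalog (UC) Server.",
    server_user_id="uc1",
):
    """Return the request parameters used by the Unity Catalog processes above."""
    return {
        "hostURL": host_url,
        # The cells above pass the port as a string, so convert numbers too.
        "portNumber": str(port_number),
        "serverName": server_name,
        "versionIdentifier": version_identifier,
        "description": description,
        "serverUserId": server_user_id,
    }


requestParameters = build_uc_request_parameters("Unity Catalog 1")
print(requestParameters)
```

The resulting dictionary can be passed as the fifth argument to `egeria_tech.initiate_gov_action_process`, in the same position as in the cells above.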

Switch back to the integration daemon monitor and you will see that there are now catalog targets for the server with UnityCatalogServerSynchronizer and for each of the Unity Catalog catalogs with UnityCatalogInsideCatalogSynchronizer.

----


You can use the following commands to show the elements from Unity Catalog in Egeria:

* `hey_egeria cat show tech-type-elements --tech_type 'Unity Catalog Server'` for the Unity Catalog Servers.
* `hey_egeria cat show tech-type-elements --tech_type 'Unity Catalog Catalog'` for the Unity Catalog Catalogs.
* `hey_egeria cat show tech-type-elements --tech_type 'Unity Catalog Schema'` for the Unity Catalog Schemas.
* `hey_egeria cat show tech-type-elements --tech_type 'Unity Catalog Table'` for the Unity Catalog Tables.
* `hey_egeria cat show tech-type-elements --tech_type 'Unity Catalog Volume'` for the Unity Catalog Volumes.
* `hey_egeria cat show tech-type-elements --tech_type 'Unity Catalog Function'` for the Unity Catalog Functions.
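The commands in this list differ only in the technology type name. A small sketch that generates them, which may be handy for scripting; `tech_type_command` is a hypothetical helper and the command text is copied from the list.

```python
import shlex

ELEMENT_KINDS = ["Server", "Catalog", "Schema", "Table", "Volume", "Function"]


def tech_type_command(element_kind):
    """Build the hey_egeria command for one kind of Unity Catalog element."""
    tech_type = f"Unity Catalog {element_kind}"
    # shlex.quote adds the quotes needed because the name contains spaces.
    return f"hey_egeria cat show tech-type-elements --tech_type {shlex.quote(tech_type)}"


for kind in ELEMENT_KINDS:
    print(tech_type_command(kind))
```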

You can also use `hey_egeria_cat show assets inventory` to search for assets that include the word `inventory`.

This shows that metadata can be copied from Unity Catalog into Egeria's metadata repository and represented using Open Metadata Types.  The next part of the demonstration shows metadata flowing from Egeria to Unity Catalog.



# Provisioning Unity Catalog (UC) from Egeria



In [None]:
# Refresh the bearer token using the client created at the start of the workbook.
token = egeria_tech.create_egeria_bearer_token()