![Egeria Logo](https://raw.githubusercontent.com/odpi/egeria/master/assets/img/ODPi_Egeria_Logo_color.png)

### Egeria Hands-On Lab
# Welcome to the Automated Curation Lab

**NOTE - this lab is still under construction and should not be used**

## Introduction

Egeria is an open source project that provides open standards and implementation libraries to connect tools, catalogs and platforms together so they can share information about data and technology (called metadata).

In the [Building a Data Catalog](building-a-data-catalog.ipynb) lab, Peter Profile and Erin Overview
manually catalogued the weekly measurement files for the Drop Foot clinical trial.

In this hands-on lab you will get a chance to work with Egeria's governance servers to
automate this onboarding process.

## The scenario

[Coco Pharmaceuticals](https://opengovernance.odpi.org/coco-pharmaceuticals/)
is conducting a clinical trial with two hospitals: Oak Dene Hospital and Old Market Hospital.
Each week the two hospitals send Coco Pharmaceuticals a set of measurements from the patients
involved in the trial.  These measurements are located in a CSV file that the hospital sends through
secure file transfer to a folder in Coco Pharmaceuticals' landing area.

These files need to be copied into the data lake and catalogued so that they are only visible to the
staff involved in the clinical trial.  It is also important that the lineage of these files is
maintained so the source of the data can be traced.  This process is shown in Figure 1.

![Scenario](../images/automated-curation-scenario.png)
> **Figure 1:** Clinical trial weekly measurements onboarding process

Peter Profile and Erin Overview are responsible for this onboarding process.
![Peter and Erin](../images/peter-and-erin.png)

They have defined a list of requirements for the process:

* Files must be in the landing area for a minimum amount of time.
* As a new file is received, it needs to be catalogued, including:
   * Description
   * Connection details to enable the data scientists to access the contents
   * Column details
   * Governance zones defining the files' visibility
   * Owner 
* A file is not accessible by any of the data lake users until the cataloguing process is complete.
* They must record lineage of each measurements file so they know which hospital it came from.

They have been [manually cataloguing the measurements files](building-a-data-catalog.ipynb) for
the first few weeks to prove the approach but now it is time to automate the process since:
* This clinical trial is planned to run for two years.
* Other clinical trials are expected to ramp up and run simultaneously, and the
  file onboarding workload would soon become overwhelming if they continued with the manual approach.

They plan to use an
[Integration Daemon](https://egeria.odpi.org/open-metadata-implementation/admin-services/docs/concepts/integration-daemon.html)
called **exchangeDL01** to capture the technical metadata of the files.
Then the 
[Engine Host](https://egeria.odpi.org/open-metadata-implementation/admin-services/docs/concepts/engine-host.html)
server called **governDL01** will manage the move of the file into the data lake,
the augmentation of the metadata properties of the files and the creation of the lineage.

## Setting up

Coco Pharmaceuticals make widespread use of Egeria for tracking and managing their data and related assets.
Figure 2 below shows their servers and the Open Metadata and Governance (OMAG) Server Platforms that are hosting them.

![Figure 2](../images/coco-pharmaceuticals-systems-omag-server-platforms.png)
> **Figure 2:** Coco Pharmaceuticals' OMAG Server Platforms

The code below checks that the platforms are running.  It then checks that each server is configured and whether it is running on its platform.  If a server is configured but not running, the code starts it.

Look for the "Done." message that is displayed after the governance servers have started.

In [None]:
# Start up the metadata servers and the view server
%run ../common/environment-check.ipynb

print("Start up the Governance Servers")
activatePlatform(dataLakePlatformName, dataLakePlatformURL, [governDL01Name, exchangeDL01Name])

print("Done. ")

----
You should see that both the metadata servers `cocoMDS1` and `cocoMDS2` along with the integration daemon `exchangeDL01` and the engine host server `governDL01` have started.

If any of the platforms are not running, follow [this link to set up and run the platform](https://egeria.odpi.org/open-metadata-resources/open-metadata-labs/).  If any server is reporting that it is not configured then
run the steps in the [Server Configuration](../egeria-server-config.ipynb) lab to configure
the servers.  Then re-run the previous step to ensure all of the servers are started.

## Review the status of the integration daemon

At this point, even though both `exchangeDL01` and `governDL01` are running, there is still work to set up the full data pipeline.  The command below queries the status of `exchangeDL01`.

In [None]:
getIntegrationDaemonStatus(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, petersUserId)

----
Notice that the Files Integrator Open Metadata Integration Service (OMIS) is running three connectors and they are all failing because the directories (folders) that they are supposed to be monitoring do not exist.  This is because no data files
have arrived from either hospital.  As the data pipeline is set up in this lab, the directories will get created and we will be able to restart the connectors to get them working.
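
If you happen to have access to the file system where the platform runs, the reason for the failing connectors can be confirmed with a quick existence check. The sketch below is only an illustration: `missing_directories` is a hypothetical helper that is not part of the lab's common functions, and it uses a temporary directory rather than the lab's real `fileSystemRoot`.

```python
import os
import tempfile

def missing_directories(paths):
    """Return the subset of paths that are not existing directories."""
    return [path for path in paths if not os.path.isdir(path)]

# Hypothetical root used only for this illustration; the lab's real root is fileSystemRoot.
with tempfile.TemporaryDirectory() as example_root:
    landing_folder = os.path.join(example_root, "landing-area", "hospitals", "oak-dene")
    data_lake_folder = os.path.join(example_root, "data-lake", "research")

    # Create only the data lake folder, so the landing folder is still missing.
    os.makedirs(data_lake_folder)

    missing_report = missing_directories([landing_folder, data_lake_folder])
    for path in missing_report:
        print("Missing directory: " + path)
```

A connector monitoring a folder reported here as missing would be expected to fail until that folder is created.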

In [None]:

OakDeneConnectorFolder   = fileSystemRoot + '/landing-area/hospitals/oak-dene/clinical-trials/drop-foot'
OldMarketConnectorFolder = fileSystemRoot + '/landing-area/hospitals/old-market/clinical-trials/drop-foot'
folderConnectorFolder    = fileSystemRoot + '/data-lake/research/clinical-trials/drop-foot/weekly-measurements'


If you know the class name of an integration connector's provider, you can check whether the connector is of the right type for an integration service.  The validation request also returns full details of the connector type, which often include descriptive information as well as the configuration properties that the connector supports.

The connectors configured in the Files Integrator OMIS are shown in Figure 3:

![Figure 3](../images/integration-daemon.png)
> **Figure 3:** exchangeDL01 with its connectors

The class names of these integration connectors' providers can be seen in the connection object embedded in the error message displayed with the connectors' status.

The commands below request that the Files Integrator OMIS service validate and return the connector type for each of these connectors.

In [None]:
dataFilesMonitorProviderClassName = "org.odpi.openmetadata.adapters.connectors.integration.basicfiles.DataFilesMonitorIntegrationProvider"
dataFolderMonitorProviderClassName = "org.odpi.openmetadata.adapters.connectors.integration.basicfiles.DataFolderMonitorIntegrationProvider"

print("Data Files Monitor Integration Connector Type:")
validateIntegrationConnector(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, "files-integrator", petersUserId, dataFilesMonitorProviderClassName)

print("")
print("Data Folder Monitor Integration Connector Type:")
validateIntegrationConnector(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, "files-integrator", petersUserId, dataFolderMonitorProviderClassName)

----
Both connectors support the `templateQualifiedName` and the `allowCatalogDelete` configuration properties.  If you are curious
about their meaning, review the definitions in the connector catalog:

* [Data Files Monitor Integration Connector](https://egeria.odpi.org/open-metadata-publication/website/connector-catalog/data-files-monitor-integration-connector.html)
* [Data Folder Monitor Integration Connector](https://egeria.odpi.org/open-metadata-publication/website/connector-catalog/data-folder-monitor-integration-connector.html)

Later in the lab, we will be setting up the `templateQualifiedName`.

----
The command below tries to validate a connector that does not exist.  You can see that the request fails with
a class not found exception.

In [None]:
invalidProviderClassName = "org.odpi.openmetadata.adapters.connectors.integration.basicfiles.DummyProvider"

print("Invalid Integration Connector:")
validateIntegrationConnector(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, "files-integrator", petersUserId, invalidProviderClassName)

----
The next command tries to validate the data files connector with the Database Integrator OMIS.  This
request also fails since this connector is not compatible with the Database Integrator OMIS.

In [None]:
print("Data Files Monitor Integration Connector:")
validateIntegrationConnector(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, "database-integrator", petersUserId, dataFilesMonitorProviderClassName)


----
## Setting up the file transfer into the landing area

The directories (folders) that the integration connectors are configured to monitor could be created by using a file system command.
In this lab, however, the creation of a landing area folder will occur when the first file is received from the corresponding hospital.
This is to avoid needing to provide this notebook with access to the file system.

We are going to use a provisioning governance action service called `Move/Copy File Governance Action Service` to simulate the
file transfer from a hospital to its folder in the landing area.  This service
runs in a governance engine that is supported by the Governance Action Open Metadata Engine Service (OMES).  As with the integration services,
it is possible to validate and understand the connector through a call to the appropriate server.


In [None]:
moveCopyFileProviderClassName = "org.odpi.openmetadata.adapters.connectors.governanceactions.provisioning.MoveCopyFileGovernanceActionProvider"

print("Move/Copy File Governance Action Service:")
validateGovernanceActionEngineConnector(governDL01Name, governDL01PlatformName, governDL01PlatformURL, petersUserId, moveCopyFileProviderClassName)


## Configuring the governance engine

Figure 4 shows the three governance engines configured in the `governDL01` engine host server.

![Figure 4](../images/engine-host.png)
> **Figure 4:** Governance Engines for governDL01

The command below queries the status of each governance engine running in `governDL01`.
The governance action services that will support the onboarding of files for clinical trials will run in the `AssetGovernance`
governance engine.  The other two governance engines are the subject of the [Open Discovery Lab](../administration-labs/open-discovery-config.ipynb).

In [None]:

getGovernanceEngineStatuses(governDL01Name, governDL01PlatformName, governDL01PlatformURL, petersUserId)


----
The status code `ASSIGNED` means that the governance engine was listed in Engine Host's configuration
document (that is, the governance engine was assigned to this server) but Engine Host has not been
able to retrieve the configuration for the governance engine from the metadata server (`cocoMDS1`).

When the basic governance engine properties have been retrieved from the metadata server then the status code
becomes `CONFIGURING` and more descriptive information is returned with the status.

When governance services are registered with the governance engine, the status moves to `RUNNING` and it is possible to see the list of supported request types for the governance engine.
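
The progression between these three status codes can be summarised as a small decision function. This is a simplified model of the description above, not Egeria's implementation: the `EngineStatus` enum and `engine_status` function are hypothetical names, and a real engine host may report other statuses that are not covered here.

```python
from enum import Enum

class EngineStatus(Enum):
    """The three status codes described in this lab."""
    ASSIGNED = "ASSIGNED"
    CONFIGURING = "CONFIGURING"
    RUNNING = "RUNNING"

def engine_status(properties_retrieved, registered_service_count):
    """Derive a status from what the engine host knows about the governance engine."""
    if not properties_retrieved:
        # Listed in the configuration document, but nothing retrieved from the metadata server yet.
        return EngineStatus.ASSIGNED
    if registered_service_count == 0:
        # Basic properties are known, but no governance services are registered.
        return EngineStatus.CONFIGURING
    # At least one governance service is registered, so request types are available.
    return EngineStatus.RUNNING

for retrieved, count in [(False, 0), (True, 0), (True, 2)]:
    print(retrieved, count, engine_status(retrieved, count).value)
```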

The next step in the lab is to add configuration for the governance engine to `cocoMDS2` until the
`AssetGovernance` governance engine is running.

In [None]:
assetGovernanceEngineName = "AssetGovernance"
assetGovernanceEngineDisplayName = "Asset Governance Action Engine"
assetGovernanceEngineDescription = "Monitors, validates and enriches metadata relating to assets."

assetGovernanceEngineGUID = createGovernanceEngine(cocoMDS2Name,
                                                   cocoMDS2PlatformName,
                                                   cocoMDS2PlatformURL,
                                                   erinsUserId,
                                                   "GovernanceActionEngine",
                                                   assetGovernanceEngineName,
                                                   assetGovernanceEngineDisplayName,
                                                   assetGovernanceEngineDescription)

if (assetGovernanceEngineGUID):
    print (" ")
    print ("The guid for the " + assetGovernanceEngineName + " governance engine is: " + assetGovernanceEngineGUID)
    print (" ")

print ("Done. ")    

----

Now that the governance engine is defined, its status moves to `CONFIGURING`.

----

In [None]:

getGovernanceEngineStatuses(governDL01Name, governDL01PlatformName, governDL01PlatformURL, petersUserId)


----

Next, the description of the `Move/Copy File Governance Action Service` is added to cocoMDS2.  There are two parts to
registering a governance action service.  The first is to create a GovernanceService definition that identifies
the implementation of the service and the second part registers this GovernanceService definition with the governance engine.
This registration maps one or more of the governance engine's request types, along with the default request parameters to
the GovernanceService definition.  This mapping is used to translate a request to the governance engine into an
invocation of a governance action service.
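
The mapping from request type to governance action service can be pictured as a lookup table. The sketch below models a governance engine as a plain dictionary purely for illustration. `register_service`, `resolve_request` and the parameter names are hypothetical, and the assumption that caller-supplied parameters override the registered defaults is a choice made for this example, not a statement about Egeria's behaviour.

```python
def register_service(engine, request_type, service_name, default_parameters=None):
    """Map a request type to a service name plus its default request parameters."""
    engine[request_type] = {
        "service": service_name,
        "defaults": dict(default_parameters or {}),
    }

def resolve_request(engine, request_type, request_parameters=None):
    """Translate a request to the engine into the service to invoke and its parameters."""
    registration = engine.get(request_type)
    if registration is None:
        raise KeyError("Unsupported request type: " + request_type)
    parameters = dict(registration["defaults"])
    # Assumption for this sketch: values supplied on the request win over the defaults.
    parameters.update(request_parameters or {})
    return registration["service"], parameters

asset_governance = {}
register_service(asset_governance, "ftp-to-landing-area",
                 "ftp-governance-action-service", {"exampleParameter": "default"})
register_service(asset_governance, "provision-to-data-lake",
                 "provision-weekly-measurements-governance-action-service")

service, parameters = resolve_request(asset_governance, "ftp-to-landing-area",
                                      {"exampleOverride": "supplied"})
print(service, parameters)
```

Keeping the request type separate from the service definition is what allows the same implementation to be registered under different request types.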

Figure 5 shows the structure of the resulting definitions for a governance engine.  A governance action service may be
registered with multiple governance engines, using the same or different request types.

![Figure 5](../images/governance-action-request-type.png)
> **Figure 5:** Structure of the governance services within a governance engine

Since a governance action service is implemented as a connector, part of the GovernanceService definition includes the
connection object used to initialize the service.
`Move/Copy File Governance Action Service` is highly
configurable through the configuration properties supplied in its connection object and so we can use it in two modes.

The first instance will simulate the secure file transfer from the hospital to the
landing area. In this case, it should not make use of any metadata and not create lineage.

In [None]:

ftpGovernanceServiceName = "ftp-governance-action-service"
ftpGovernanceServiceDisplayName = "FTP Governance Action Service"
ftpGovernanceServiceDescription = "Simulates FTP from an external party."
ftpGovernanceServiceProviderClassName = moveCopyFileProviderClassName
ftpGovernanceServiceConfigurationProperties = {
        "provisionUncataloguedFiles" : "",
        "noLineage" : ""
    }
ftpGovernanceServiceRequestType = "ftp-to-landing-area"


The second instance is part of Coco Pharmaceuticals' onboarding process and will be driven from the appearance of the
Asset created by the integration daemon when a file arrives in the landing area.
This instance will also produce lineage and change the resulting filename so that the files are sequenced according to their
arrival.  For example:
 * DropFoot_000001.csv
 * DropFoot_000002.csv

This aids the time-based loading of the files into a database by ensuring any corrections to the readings are applied in the
correct order.
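
The effect of the zero-padded sequence number can be shown with a Python analogue of the naming scheme. This is only a sketch: `sequenced_file_name` is a hypothetical helper, and it assumes the number in the file name is a simple arrival counter.

```python
def sequenced_file_name(sequence_number):
    """Build a file name with a six-digit, zero-padded sequence number."""
    return "DropFoot_{:06d}.csv".format(sequence_number)

# Names generated in arrival order.
padded_names = [sequenced_file_name(n) for n in (1, 2, 10)]
print(padded_names)

# Without padding, an alphabetical listing no longer matches arrival order.
unpadded_names = ["DropFoot_1.csv", "DropFoot_2.csv", "DropFoot_10.csv"]
print(sorted(padded_names) == padded_names)
print(sorted(unpadded_names) == unpadded_names)
```

With padding, an alphabetical listing of the folder matches the arrival order, so a loader that processes files by name applies corrections in the right sequence.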

In [None]:

dlGovernanceServiceName = "provision-weekly-measurements-governance-action-service"
dlGovernanceServiceDisplayName = "Provision Weekly Measurements Governance Action Service"
dlGovernanceServiceDescription = "Provisions weekly measurement files from the landing area to the data lake."
dlGovernanceServiceProviderClassName = moveCopyFileProviderClassName
dlGovernanceServiceConfigurationProperties = {
        "targetFileNamePattern" : "DropFoot_{1, number,000000}.csv"
    }
dlGovernanceServiceRequestType = "provision-to-data-lake"



The next code cell issues the calls to create the two versions of the governance action service and add them to the `AssetGovernance`
governance engine.


In [None]:


governanceServiceGUID = createGovernanceService(cocoMDS2Name,
                                                cocoMDS2PlatformName,
                                                cocoMDS2PlatformURL,
                                                erinsUserId,
                                                "GovernanceActionService",
                                                ftpGovernanceServiceName,
                                                ftpGovernanceServiceDisplayName,
                                                ftpGovernanceServiceDescription,
                                                ftpGovernanceServiceProviderClassName,
                                                ftpGovernanceServiceConfigurationProperties)

if (governanceServiceGUID):
    print (" ")
    print ("The guid for the " + ftpGovernanceServiceName + " governance service is: " + governanceServiceGUID)
    registerGovernanceServiceWithEngine(cocoMDS1Name,
                                        cocoMDS1PlatformName,
                                        cocoMDS1PlatformURL,
                                        petersUserId,
                                        assetGovernanceEngineGUID,
                                        governanceServiceGUID,
                                        ftpGovernanceServiceRequestType)
    print ("Service registered as: " + ftpGovernanceServiceRequestType)
    print (" ")
    

governanceServiceGUID = createGovernanceService(cocoMDS2Name,
                                                cocoMDS2PlatformName,
                                                cocoMDS2PlatformURL,
                                                erinsUserId,
                                                "GovernanceActionService",
                                                dlGovernanceServiceName,
                                                dlGovernanceServiceDisplayName,
                                                dlGovernanceServiceDescription,
                                                dlGovernanceServiceProviderClassName,
                                                dlGovernanceServiceConfigurationProperties)

if (governanceServiceGUID):
    print (" ")
    print ("The guid for the " + dlGovernanceServiceName + " governance service is: " + governanceServiceGUID)
    registerGovernanceServiceWithEngine(cocoMDS1Name,
                                        cocoMDS1PlatformName,
                                        cocoMDS1PlatformURL,
                                        petersUserId,
                                        assetGovernanceEngineGUID,
                                        governanceServiceGUID,
                                        dlGovernanceServiceRequestType)
    print ("Service registered as: " + dlGovernanceServiceRequestType)
    print (" ")   
  

Next we need a governance action service to listen for the 
    

In [None]:
def initiateGovernanceAction(serverName, serverPlatformName, serverPlatformURL, userId, governanceEngineName, qualifiedName, requestType, requestProperties):
    commandURLRoot = serverPlatformURL + "/servers/" + serverName + "/open-metadata/access-services/governance-engine/users/" + userId
    initiateGovernanceActionURL = commandURLRoot + '/governance-engines/' + governanceEngineName + '/governance-actions/initiate'
    governanceActionGUID = None
    try:
        body = {
            "class" : "GovernanceActionRequestBody",
            "qualifiedName" : qualifiedName,
            "requestType" : requestType,
            "requestProperties" : requestProperties
        }
        response=issuePost(initiateGovernanceActionURL, body)
        if response.status_code == 200:
            relatedHTTPCode = response.json().get('relatedHTTPCode')
            if relatedHTTPCode == 200:
                governanceActionGUID = response.json().get('guid')
            else:
                printUnexpectedResponse(serverName, serverPlatformName, serverPlatformURL, response)
        else:
            printUnexpectedResponse(serverName, serverPlatformName, serverPlatformURL, response)
    except Exception as error:
        print("Exception: %s" % error )
        print("Platform " + serverPlatformURL + " is returning an error")
    return governanceActionGUID
     
def getGovernanceAction(serverName, serverPlatformName, serverPlatformURL, userId, governanceActionGUID):
    commandURLRoot = serverPlatformURL + "/servers/" + serverName + "/open-metadata/access-services/governance-engine/users/" + userId
    getGovernanceActionURL = commandURLRoot + '/governance-actions/' + governanceActionGUID
    try:
        response=issueGet(getGovernanceActionURL)
        if response.status_code == 200:
            relatedHTTPCode = response.json().get('relatedHTTPCode')
            if relatedHTTPCode == 200:
                element = response.json().get('element')
                if element:
                    print(element)
            else:
                printUnexpectedResponse(serverName, serverPlatformName, serverPlatformURL, response)
        else:
            printUnexpectedResponse(serverName, serverPlatformName, serverPlatformURL, response)
    except Exception as error:
        print("Exception: %s" % error )
        print("Platform " + serverPlatformURL + " is returning an error")
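
----
To inspect the request that `initiateGovernanceAction` sends without a running platform, the URL and body can be assembled separately. The sketch below mirrors the strings used in the function above. `build_initiate_request` is a hypothetical helper, and the platform URL, server name, user id, qualified name and request property values are placeholders chosen for illustration.

```python
import json

def build_initiate_request(platform_url, server_name, user_id, engine_name,
                           qualified_name, request_type, request_properties):
    """Return the URL and JSON body for an 'initiate governance action' request."""
    url = (platform_url + "/servers/" + server_name +
           "/open-metadata/access-services/governance-engine/users/" + user_id +
           "/governance-engines/" + engine_name + "/governance-actions/initiate")
    body = {
        "class": "GovernanceActionRequestBody",
        "qualifiedName": qualified_name,
        "requestType": request_type,
        "requestProperties": request_properties,
    }
    return url, body

# Placeholder values for illustration only.
example_url, example_body = build_initiate_request(
    "https://localhost:9443", "exampleServer", "exampleUser", "AssetGovernance",
    "example-governance-action", "ftp-to-landing-area", {"exampleProperty": "value"})

print(example_url)
print(json.dumps(example_body, indent=2))
```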


In [None]:
ftpRequestType = "copy-file"

if governanceServiceGUID:
    registerGovernanceServiceWithEngine(cocoMDS2Name,
                                        cocoMDS2PlatformName,
                                        cocoMDS2PlatformURL,
                                        erinsUserId,
                                        assetGovernanceEngineGUID,
                                        governanceServiceGUID,
                                        ftpRequestType)
    print (" ")
    print ("Service registered as: " + ftpRequestType)
    print (" ")
    
print ("Done. ")

----
**Stop here**

Plan of the lab

* Review the status of the integration daemon and engine host - show that they are both in need of work
* Create the folders for the integration daemon connectors and restart them - validate that they are working
* Set up the FTP provisioning in the engine host
* Add the first file into Oak Dene's landing area
* Validate that the file is created in open metadata by the integration connector
* Set up the provisioning governance action service and test it by creating a governance action to drive it.
* Validate that the file is transferred and the new asset is catalogued and the folder is updated
* View the lineage
* Set up the provisioning service in a simple governance action process and test it
* Set up the watch dog service and check it drives the governance action process
* Set up the origin and template - then configure the landing area integration connector to use a template - show the template affecting the set up of the landing area asset
* Extend the governance action process to use the lineage to set up the origin. Show the origin on the new file asset.
* Explain the use of triage if the origin can not be set.
* Set up a template for the owner and reconfigure the data lake folder
* Extend the governance action to create a verification point and set the zone - with a triage alternate.
----


In [None]:
# Validate the CSV discovery service connector

csvDiscoveryProviderClassName = "org.odpi.openmetadata.adapters.connectors.discoveryservices.CSVDiscoveryServiceProvider"

print("CSV Discovery Service:")
validateAssetAnalysisEngineConnector(governDL01Name, governDL01PlatformName, governDL01PlatformURL, petersUserId, csvDiscoveryProviderClassName)



