![Egeria Logo](https://raw.githubusercontent.com/odpi/egeria/master/assets/img/ODPi_Egeria_Logo_color.png)

### Egeria Hands-On Lab
# Welcome to the Automated Curation Lab

**NOTE - this lab is still under construction and should not be used**

## Introduction

Egeria is an open source project that provides open standards and implementation libraries to connect tools, catalogs and platforms together so they can share information about data and technology (called metadata).

In the [Building a Data Catalog](building-a-data-catalog.ipynb) lab, Peter Profile and Erin Overview
manually catalogued the weekly measurement files for the Drop Foot clinical trial.

In this hands-on lab you will get a chance to work with Egeria's governance servers to
automate this onboarding process.

## The scenario

[Coco Pharmaceuticals](https://opengovernance.odpi.org/coco-pharmaceuticals/)
is conducting a clinical trial with two hospitals: Oak Dene Hospital and Old Market Hospital.
Each week the two hospitals send Coco Pharmaceuticals a set of measurements from the patients
involved in the trial.  These measurements are located in a CSV file that the hospital sends through
secure file transfer to a folder in Coco Pharmaceuticals' landing area.

These files need to be copied into the data lake and catalogued so that they are only visible to the
staff involved in the clinical trial.  It is also important that the lineage of these files is
maintained so the source of the data can be traced.  This process is shown in Figure 1.

![Scenario](../images/automated-curation-scenario.png)
> **Figure 1:** Clinical trial weekly measurements onboarding process

Peter Profile and Erin Overview are responsible for this onboarding process.
![Peter and Erin](../images/peter-and-erin.png)

They have defined a list of requirements for the process:

* Files must be in the landing area for a minimum amount of time.
* As a new file is received, it needs to be catalogued, including:
   * Description
   * Connection details to enable the data scientists to access the contents
   * Column details
   * Governance zones defining the files' visibility
   * Owner 
   * Origin
* A file is not accessible by any of the data lake users until the cataloguing process is complete.
* They must record lineage of each measurements file so they know which hospital it came from.

They have been [manually cataloguing the measurements files](building-a-data-catalog.ipynb) for
the first few weeks to prove the approach but now it is time to automate the process since:
* This clinical trial is planned to run for two years.
* A ramp-up of other clinical trials running simultaneously is expected, and the
  file onboarding workload would soon become overwhelming if they continued with the manual approach.

They plan to use an
[Integration Daemon](https://egeria.odpi.org/open-metadata-implementation/admin-services/docs/concepts/integration-daemon.html)
called **exchangeDL01** to capture the technical metadata of the files.
Then the 
[Engine Host](https://egeria.odpi.org/open-metadata-implementation/admin-services/docs/concepts/engine-host.html)
server called **governDL01** will manage the move of the file into the data lake,
the augmentation of the metadata properties of the files and the creation of the lineage.

This lab sets up the automated onboarding process in six phases, as shown in Figure 2:

![Phases](../images/automated-curation-scenario-6.png)
> **Figure 2:** All six phases

1. Bring files in from "outside" using the move/copy file governance action service
2. Create a template used to catalog files in the landing areas
3. Trigger a governance action process when a file appears in a landing folder
4. Move the file into the data lake folder using a governance action process
5. Create a template used to catalog files in the data lake folder
6. Enrich the metadata using governance action services


## Setting up

Coco Pharmaceuticals make widespread use of Egeria for tracking and managing their data and related assets.
Figure 3 below shows their servers and the Open Metadata and Governance (OMAG) Server Platforms that are hosting them.

![Figure 3](../images/coco-pharmaceuticals-systems-omag-server-platforms.png)
> **Figure 3:** Coco Pharmaceuticals' OMAG Server Platforms

The code below checks that the platforms are running.  It checks that the servers are configured and whether they are running on the platform.  If a server is configured, but not running, it will start it.

Look for the second "Done." message that is displayed just after the governance servers have started.  It may take up to a minute to start up all of the servers.  If `cocoMDS2` seems to be very slow starting, check that Apache Kafka is running.  The metadata servers will wait for Kafka during start up if they are connected to a cohort.  When they are waiting in this manner, they periodically output a message on the console.

In [None]:
# Start up the metadata servers and the view server
%run ../common/environment-check.ipynb

print("Start up the Governance Servers")
activatePlatform(dataLakePlatformName, dataLakePlatformURL, [governDL01Name, exchangeDL01Name])

print("Done. ")

----
You should see that the metadata servers `cocoMDS1` and `cocoMDS2`, along with the integration daemon `exchangeDL01` and the engine host server `governDL01`, have started.  These are the servers used in this lab, and they are connected together
as shown in Figure 4.  The `cocoMDS1` metadata server is where the new files are catalogued, whereas the definitions used
to drive the governance processes come from `cocoMDS2`.  

![Figure 4](../images/governance-servers-for-automated-onboarding.png)
> **Figure 4:** Governance servers for automated curation

If any of the platforms are not running, follow [this link to set up and run the platform](https://egeria.odpi.org/open-metadata-resources/open-metadata-labs/).  If any server is reporting that it is not configured then
run the steps in the [Server Configuration](../egeria-server-config.ipynb) lab to configure
the servers.  Then re-run the previous step to ensure all of the servers are started.

----
**Stop here**

Plan of the lab

* Review the status of the integration daemon and engine host - show that they both need further set up before they can work
* Create the folders for the integration daemon connectors and restart them - validate that they are working
* Set up the FTP provisioning in the engine host
* Add the first file into Oak Dene's landing area
* Validate that the file is created in open metadata by the integration connector
* Set up the provisioning governance action service and test it by creating a governance action to drive it.
* Validate that the file is transferred and the new asset is catalogued and the folder is updated
* View the lineage
* Set up the provisioning service in a simple governance action process and test it
* Set up the watch dog service and check it drives the governance action process
* Set up the origin and template - then configure the landing area integration connector to use a template - show the template affecting the set up of the landing area asset
* Extend the governance action process to use the lineage to set up the origin. Show the origin on the new file asset.
* Explain the use of triage if the origin cannot be set.
* Set up a template for the owner and reconfigure the data lake folder
* Extend the governance action to create a verification point and set the zone - with a triage alternate.
----

## Review the status of the integration daemon

At this point, even though both `exchangeDL01` and `governDL01` are running, there is still work to do to set up the full data onboarding pipeline.  Let's start with the integration daemon that catalogs the files as they appear in a folder.

The command below queries the status of the integration daemon called `exchangeDL01`.

In [None]:

getIntegrationDaemonStatus(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, petersUserId)


----
Notice that the Files Integrator Open Metadata Integration Service (OMIS) is running four integration connectors and they are all failing because the directories (folders) that they are supposed to be monitoring do not exist.  As the data onboarding pipeline is set up in this lab, the directories will get created and we will be able to restart the connectors to get them working.

Below are the names of the directories that the four integration connectors are monitoring.  There are two integration connectors monitoring the files in the data lake directory.  One is cataloguing the file and the other is maintaining the last update time for the [DataFolder](https://egeria.odpi.org/open-metadata-publication/website/modelling-technology/) asset that represents the collection of measurements received for the clinical trial.

----

In [None]:

OakDeneLandingDirectory   = fileSystemRoot + '/landing-area/hospitals/oak-dene/clinical-trials/drop-foot'
OldMarketLandingDirectory = fileSystemRoot + '/landing-area/hospitals/old-market/clinical-trials/drop-foot'
dataLakeDirectory         = fileSystemRoot + '/data-lake/research/clinical-trials/drop-foot/weekly-measurements'


----

At this time there are no assets catalogued for either the landing area or the weekly measurements directory.
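
The search strings used in the next cell (`".*landing.*"` and `".*drop-foot/weekly-measurements.*"`) are regular expressions. As a rough local illustration of what they would select, the sketch below applies the same two patterns to a pair of made-up qualified names using Python's `re` module. The sample names, the `matching_names` helper and the choice of whole-string matching are assumptions made for illustration; the metadata server may evaluate the expression differently.

```python
import re

# Hypothetical qualified names; the real values depend on fileSystemRoot.
qualified_names = [
    "data/landing-area/hospitals/oak-dene/clinical-trials/drop-foot/week1.csv",
    "data/data-lake/research/clinical-trials/drop-foot/weekly-measurements/week1.csv",
]


def matching_names(pattern, names):
    """Return the names that match the pattern in full (an assumed matching rule)."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


landing_matches = matching_names(".*landing.*", qualified_names)
data_lake_matches = matching_names(".*drop-foot/weekly-measurements.*", qualified_names)

print(landing_matches)
print(data_lake_matches)
```

Each pattern selects only the name from its own area, which is why the two queries can be used to inspect the landing area and the data lake folder separately.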

----

In [None]:

assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*landing.*")


assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*drop-foot/weekly-measurements.*")


----
## Setting up the file transfer into the landing area

The directories (folders) that the integration connectors are configured to monitor could be created using a file system command.
In this lab, however, the creation of a landing area folder will occur when the first file is received from the corresponding hospital.
This is simply to avoid needing to provide this notebook with access to the file system.
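
For readers who do have file system access, the alternative mentioned above could look something like the following sketch. It is not part of the lab flow: it creates the three directory paths under a temporary scratch directory rather than under the real `fileSystemRoot`, and the `create_onboarding_directories` helper is a hypothetical name introduced here.

```python
import tempfile
from pathlib import Path

# Relative paths copied from the directory definitions earlier in this lab.
RELATIVE_DIRECTORIES = [
    "landing-area/hospitals/oak-dene/clinical-trials/drop-foot",
    "landing-area/hospitals/old-market/clinical-trials/drop-foot",
    "data-lake/research/clinical-trials/drop-foot/weekly-measurements",
]


def create_onboarding_directories(root):
    """Create each directory (and any missing parents) under root."""
    created = []
    for relative in RELATIVE_DIRECTORIES:
        directory = Path(root) / relative
        # exist_ok=True makes the call safe to repeat.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created


# A scratch directory stands in for fileSystemRoot (an assumption for this sketch).
with tempfile.TemporaryDirectory() as scratch_root:
    directories = create_onboarding_directories(scratch_root)
    all_exist = all(directory.is_dir() for directory in directories)

print(all_exist)
```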

We are going to use a [provisioning governance action service](https://egeria.odpi.org/open-metadata-implementation/frameworks/governance-action-framework/docs/provisioning-governance-service.html) called `Move/Copy File Governance Action Service`
([link to catalog description](https://egeria.odpi.org/open-metadata-publication/website/connector-catalog/move-copy-file-provisioning-governance-action-service.html))
to simulate the file transfer from a hospital to its folder in the landing zone.  This service
runs in a governance engine that is supported by the [Governance Action Open Metadata Engine Service (OMES)](https://egeria.odpi.org/open-metadata-implementation/engine-services/governance-action/).

It is possible to validate and understand the function of the governance action service
through an API call to the Governance Action OMES running in an engine host server such as `governDL01`.
This API call is designed to support tools and other services configuring governance action services.

Governance action services are implemented as connectors, which means they are initialized with a
[connection](https://egeria.odpi.org/open-metadata-implementation/frameworks/open-connector-framework/docs/concepts/connection.html) object.
This includes configuration properties that can be used to control the service's behavior.
`Move/Copy File Governance Action Service` is highly configurable.

----

In [None]:

moveCopyFileProviderClassName = "org.odpi.openmetadata.adapters.connectors.governanceactions.provisioning.MoveCopyFileGovernanceActionProvider"

print("Move/Copy File Governance Action Service:")
validateGovernanceActionEngineConnector(governDL01Name, governDL01PlatformName, governDL01PlatformURL, petersUserId, moveCopyFileProviderClassName)


----

To simulate the secure file transfer from the hospital to the
landing area the `Move/Copy File Governance Action Service` will be configured so it expects that the name of the
source file to copy will be supplied when it is called (rather than using the catalog entry for the
source file) and it does not create lineage.
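
The two behaviours described above are switched on through the `provisionUncataloguedFiles` and `noLineage` configuration properties, which the next cell sets with empty string values. The sketch below shows one way such flag-style properties could be read. It assumes that the presence of the key is what matters and that the value is ignored; that is an assumption about the connector, not a statement of its implementation. The `enabled_flags` helper and `someHypotheticalFlag` are made up for illustration.

```python
def enabled_flags(configuration_properties, known_flags):
    """Report each known flag as enabled if its key is present, whatever its value."""
    return {flag: flag in configuration_properties for flag in known_flags}


ftp_properties = {
    "provisionUncataloguedFiles": "",
    "noLineage": "",
}

flags = enabled_flags(
    ftp_properties,
    ["provisionUncataloguedFiles", "noLineage", "someHypotheticalFlag"],
)

print(flags)
```

Testing for the key rather than the value matters here because an empty string is falsy in Python, so a check such as `if properties.get("noLineage"):` would treat the flag as switched off.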

----

In [None]:

ftpGovernanceServiceName = "ftp-governance-action-service"
ftpGovernanceServiceDisplayName = "FTP Governance Action Service"
ftpGovernanceServiceDescription = "Simulates FTP from an external party."
ftpGovernanceServiceProviderClassName = moveCopyFileProviderClassName
ftpGovernanceServiceConfigurationProperties = {
        "provisionUncataloguedFiles" : "",
        "noLineage" : ""
    }
ftpGovernanceServiceRequestType = "copy-file"


----

This governance action service
runs in a governance engine supported by the Governance Action Open Metadata Engine Services (OMES).
Figure 5 shows the three governance engines configured in the `governDL01` engine host server during the
[Configuring Egeria Servers Lab](../egeria-server-config.ipynb).

![Figure 5](../images/engine-host.png)
> **Figure 5:** Governance Engines for governDL01

The command below queries the status of each governance engine running in `governDL01`.
The governance action services that will support the onboarding of files for clinical trials will run in the `AssetGovernance`
governance engine.  The other two governance engines are the subject of the [Improving Data Quality](../administration-labs/open-discovery-config.ipynb) lab.

----

In [None]:

printGovernanceEngineStatuses(governDL01Name, governDL01PlatformName, governDL01PlatformURL, petersUserId)


----
The status code `ASSIGNED` means that the governance engine was listed in Engine Host's configuration
document - i.e. the governance engine was assigned to this server - but Engine Host has not been
able to retrieve the configuration for the governance engine from the governance metadata server (`cocoMDS2`).

The next step in the lab is to add configuration for the governance engine to `cocoMDS2` until the
`AssetGovernance` governance engine is running.  The first step is to add a GovernanceEngine definition
to the metadata server for `AssetGovernance`.  

----

In [None]:
assetGovernanceEngineName = "AssetGovernance"
assetGovernanceEngineDisplayName = "Asset Governance Action Engine"
assetGovernanceEngineDescription = "Monitors, validates and enriches metadata relating to assets."

assetGovernanceEngineGUID = createGovernanceEngine(cocoMDS2Name,
                                                   cocoMDS2PlatformName,
                                                   cocoMDS2PlatformURL,
                                                   erinsUserId,
                                                   "GovernanceActionEngine",
                                                   assetGovernanceEngineName,
                                                   assetGovernanceEngineDisplayName,
                                                   assetGovernanceEngineDescription)

if (assetGovernanceEngineGUID):
    print (" ")
    print ("The guid for the " + assetGovernanceEngineName + " governance engine is: " + assetGovernanceEngineGUID)
    print (" ")

print ("Done. ")    

----

Now that the governance engine is defined, its definition is sent to `governDL01` in an event
and the engine's status moves to `CONFIGURING`.
Notice that more descriptive information is returned with the status.

----

In [None]:

waitForConfiguringGovernanceEngine(governDL01Name, governDL01PlatformName, governDL01PlatformURL, petersUserId, assetGovernanceEngineName)

printGovernanceEngineStatuses(governDL01Name, governDL01PlatformName, governDL01PlatformURL, petersUserId)


----

Now we add the description of the `Move/Copy File Governance Action Service` to `cocoMDS2`.  There are two parts to
registering a governance action service.  The first is to create a GovernanceService definition that identifies
the implementation of the governance action service and the second part registers this
GovernanceService definition with the GovernanceEngine definition.
This registration maps one or more of the governance engine's request types, along with the default request parameters to
the GovernanceService definition.  This mapping is used to translate a request to the governance engine into an
invocation of a governance action service.

Figure 6 shows the structure of the resulting definitions for a governance engine.  A governance action service may be
registered with multiple governance engines, using the same or different request types.

![Figure 6](../images/governance-action-request-type.png)
> **Figure 6:** Structure of the governance services within a governance engine
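
The mapping from request type to governance service can be pictured with a small in-memory model. The sketch below is only an illustration of the structure in Figure 6: the class names, the `register` and `resolve` methods, the provider class name and the default parameter value are all hypothetical, and the rule that caller-supplied request parameters override the registered defaults is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class GovernanceServiceDefinition:
    name: str
    provider_class_name: str
    configuration_properties: dict = field(default_factory=dict)


@dataclass
class GovernanceEngineDefinition:
    name: str
    # Maps request type -> (service definition, default request parameters).
    registrations: dict = field(default_factory=dict)

    def register(self, request_type, service, default_parameters=None):
        self.registrations[request_type] = (service, dict(default_parameters or {}))

    def resolve(self, request_type, request_parameters=None):
        """Translate a request to the engine into a service plus its parameters."""
        service, defaults = self.registrations[request_type]
        # Assumption: parameters supplied on the request win over the defaults.
        return service, {**defaults, **(request_parameters or {})}


ftp_service = GovernanceServiceDefinition(
    name="ftp-governance-action-service",
    provider_class_name="hypothetical.ProviderClass",
    configuration_properties={"provisionUncataloguedFiles": "", "noLineage": ""},
)

engine = GovernanceEngineDefinition(name="AssetGovernance")
engine.register("copy-file", ftp_service, {"destination-folder": "hypothetical/default"})

service, parameters = engine.resolve("copy-file", {"source-file": "week1.csv"})
print(service.name, parameters)
```

Because the registration is held by the engine rather than the service, the same service definition could be registered again under a different request type or with another engine.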

This next code issues the calls to create a definition of the GovernanceService and add it to the AssetGovernance
GovernanceEngine definition.

----

In [None]:
governanceServiceGUID = createGovernanceService(cocoMDS2Name,
                                                cocoMDS2PlatformName,
                                                cocoMDS2PlatformURL,
                                                erinsUserId,
                                                "GovernanceActionService",
                                                ftpGovernanceServiceName,
                                                ftpGovernanceServiceDisplayName,
                                                ftpGovernanceServiceDescription,
                                                ftpGovernanceServiceProviderClassName,
                                                ftpGovernanceServiceConfigurationProperties)

if (governanceServiceGUID):
    print (" ")
    print ("The guid for the " + ftpGovernanceServiceName + " governance service is: " + governanceServiceGUID)
    registerGovernanceServiceWithEngine(cocoMDS2Name,
                                        cocoMDS2PlatformName,
                                        cocoMDS2PlatformURL,
                                        erinsUserId,
                                        assetGovernanceEngineGUID,
                                        governanceServiceGUID,
                                        ftpGovernanceServiceRequestType)
    print ("Service registered as: " + ftpGovernanceServiceRequestType)
    print (" ")
    

----

When at least one governance service is registered with the governance engine, the status of the governance engine in
`governDL01` moves to `RUNNING` and it is possible to see the list of supported request types for the governance engine. 
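
The three status codes seen so far form a simple progression. The sketch below restates the rules given in this lab as a small function; it is a reading aid rather than the engine host's actual logic, and the `EngineStatus` enum and `engine_status` function are hypothetical names.

```python
from enum import Enum


class EngineStatus(Enum):
    ASSIGNED = "ASSIGNED"
    CONFIGURING = "CONFIGURING"
    RUNNING = "RUNNING"


def engine_status(definition_retrieved, registered_request_types):
    """Derive the status from what the engine host has been able to retrieve."""
    if not definition_retrieved:
        # Listed in the engine host's configuration, but no definition found yet.
        return EngineStatus.ASSIGNED
    if not registered_request_types:
        # Definition found, but no governance service registered with it.
        return EngineStatus.CONFIGURING
    return EngineStatus.RUNNING


print(engine_status(False, []).value)
print(engine_status(True, []).value)
print(engine_status(True, ["copy-file"]).value)
```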

----

In [None]:

waitForRunningGovernanceEngine(governDL01Name, governDL01PlatformName, governDL01PlatformURL, petersUserId, assetGovernanceEngineName)

printGovernanceEngineStatuses(governDL01Name, governDL01PlatformName, governDL01PlatformURL, petersUserId)


----

With the governance action service defined in a running governance engine, all that remains is to call it.

Any defined governance action service can be called directly through the Governance Engine OMAS,
which in this lab is running in the `cocoMDS2` metadata server.  This call results in an event being sent to all
engine hosts running the named governance engine.  The first one that claims it gets to run it.  In this lab
there is only one engine host and so the request will run on `governDL01`.
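
The "first one that claims it gets to run it" rule can be illustrated with a toy simulation. In the sketch below, three threads stand in for engine hosts competing for the same request, and a lock ensures that exactly one claim succeeds. This only demonstrates the idea: the real claim is coordinated through the metadata server, not an in-process lock, and the class, method and extra host names are hypothetical.

```python
import threading


class GovernanceActionRequest:
    def __init__(self, qualified_name):
        self.qualified_name = qualified_name
        self.claimed_by = None
        self._lock = threading.Lock()

    def try_claim(self, engine_host_name):
        """Return True for the first caller only; later callers get False."""
        with self._lock:
            if self.claimed_by is None:
                self.claimed_by = engine_host_name
                return True
            return False


request = GovernanceActionRequest("FTP Oak Dene Week 1")
results = {}


def attempt(engine_host_name):
    results[engine_host_name] = request.try_claim(engine_host_name)


# governDL01 is the lab's engine host; the other two names are made up.
hosts = ["governDL01", "hypotheticalHost02", "hypotheticalHost03"]
threads = [threading.Thread(target=attempt, args=(host,)) for host in hosts]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

winners = [host for host, won in results.items() if won]
print(winners)
```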

----

In [None]:
OakDeneSourceFolder = 'sample-data/oak-dene-drop-foot-weekly-measurements'

sourceFileName = OakDeneSourceFolder + "/" + "week1.csv"

requestParameters = {
        "source-file" : sourceFileName,
        "destination-folder" : OakDeneLandingDirectory
    }
qualifiedName = "FTP Oak Dene Week 1"

governanceActionGUID = None
governanceActionGUID = initiateGovernanceAction(cocoMDS2Name,
                                                cocoMDS2PlatformName,
                                                cocoMDS2PlatformURL,
                                                erinsUserId,
                                                assetGovernanceEngineName,
                                                qualifiedName,
                                                ftpGovernanceServiceRequestType,
                                                requestParameters)


----

This governance action does not take long to run so you should soon see its status as completed.
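
Waiting for a status change is typically done by polling. The sketch below shows a generic polling loop with a timeout, run against a simulated sequence of statuses. It is not the implementation of the lab's helper functions: `wait_for_status` is a hypothetical name, and the status labels `WAITING`, `IN_PROGRESS`, `COMPLETED` and `FAILED` are placeholders rather than the values returned by the server.

```python
import time


def wait_for_status(get_status, target_statuses, timeout_seconds=5.0, poll_interval_seconds=0.01):
    """Call get_status repeatedly until it returns one of target_statuses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in target_statuses:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("Timed out; last status was " + str(status))
        time.sleep(poll_interval_seconds)


# Simulated status sequence standing in for calls to the metadata server.
simulated_statuses = iter(["WAITING", "IN_PROGRESS", "COMPLETED"])

final_status = wait_for_status(lambda: next(simulated_statuses), {"COMPLETED", "FAILED"})
print(final_status)
```

Including a failure status in the target set means the loop stops promptly when the action fails instead of waiting for the timeout.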

----

In [None]:

waitForRunningGovernanceAction(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, governanceActionGUID)

printGovernanceAction(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, governanceActionGUID)
    

----
We now have the ability to provision new files into either landing area as shown in Figure 7.

![Phase 1](../images/automated-curation-scenario-1.png)
> **Figure 7:** Phase 1: Arrival of new files

Restarting the Oak Dene integration connector in the `exchangeDL01` integration daemon will cause it to
test again to see if the directory it is monitoring exists.

----

In [None]:
OakDeneLandingAreaConnectorName = 'OakDeneLandingAreaFilesMonitor'

restartIntegrationConnector(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, petersUserId, "files-integrator", OakDeneLandingAreaConnectorName)

----

If we check the status of the integration connectors again, we can see that this connector is now running.

----

In [None]:

getIntegrationDaemonStatus(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, petersUserId)


----

The connector will begin cataloguing any files placed in the directory.  We should be able to see the `week1.csv` file in the catalog.

----

In [None]:

assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*landing.*")


----
The action of the integration connector has been to catalog the new file and the folders above it.
The catalog definition is minimal, consisting of just what can be gleaned from the file system.
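
To give a feel for how little the file system alone reveals, the sketch below collects the kind of facts that are available for a file without opening it: its name, extension, path, size and last modified time. The property names and the `describe_file` helper are hypothetical and do not reflect the connector's actual output; the sample file is created in a temporary directory with placeholder content.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path):
    """Collect the basic facts that the file system holds about a file."""
    file_path = Path(path)
    info = file_path.stat()
    return {
        "name": file_path.name,
        "fileType": file_path.suffix.lstrip("."),
        "pathName": str(file_path),
        "sizeInBytes": info.st_size,
        "modifiedTime": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


with tempfile.TemporaryDirectory() as landing_directory:
    sample_file = Path(landing_directory) / "week1.csv"
    # Placeholder content; the real measurement columns are not shown here.
    sample_file.write_bytes(b"col1,col2\n")
    description = describe_file(sample_file)

print(description["name"], description["fileType"], description["sizeInBytes"])
```

Nothing in this description says which hospital sent the file, who owns it or who may see it.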

It is possible to provide the integration connector with a template metadata element to copy when it
is cataloguing files.   This template can include classifications such as zones, origin, confidentiality and attachments such as
connections, schemas and lineage mappings. (There is more information on templates in the
[Egeria website](https://egeria.odpi.org/open-metadata-publication/website/cataloging-assets/templated-cataloging.html)).

Templates are used to set up values for the assets created by the connector that are always the same.
In this example, all of the files that this connector encounters are from the Oak Dene Hospital so we can use a template to set up the
file's origin and the connection information needed to access the file.

Including this type of detail in the asset means that no-one has to remember that this landing area folder was used by
the Oak Dene Hospital and it simplifies the downstream cataloguing.
Other values that are useful to set up in a template are any licenses for the file, schema information, zones, known lineage to this
directory, plus other classifications.  Details of the types of information that can be attached to an
asset are available on the [Egeria website](https://egeria.odpi.org/open-metadata-publication/website/cataloging-assets/asset-catalog-contents.html).
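
Conceptually, cataloguing with a template means copying the template's fixed values and then overlaying the values that are specific to the new file. The sketch below models that idea with plain dictionaries. It is a simplification under stated assumptions: the property names, the zone names, the sample qualified names and the `create_asset_from_template` helper are all hypothetical, and Egeria's templates work on metadata elements rather than dictionaries.

```python
import copy


def create_asset_from_template(template, file_specific_properties):
    """Copy the template, then overlay the properties unique to this file."""
    # A deep copy keeps nested values in the template safe from later edits.
    new_asset = copy.deepcopy(template)
    new_asset.update(file_specific_properties)
    return new_asset


oak_dene_template = {
    "qualifiedName": "hypothetical-template-name",
    "origin": {"organization": "Oak Dene Hospital"},
    "zones": ["hypothetical-landing-zone"],
}

week1_asset = create_asset_from_template(
    oak_dene_template,
    {"qualifiedName": "hypothetical/landing/week1.csv", "name": "week1.csv"},
)

# Changing the new asset leaves the template untouched.
week1_asset["zones"].append("hypothetical-extra-zone")

print(week1_asset["origin"], oak_dene_template["zones"])
```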



## Working with templates

If you know the class name of an integration connector's provider, it is possible to check if the connector is of the right type for an integration service.  This function also returns full details of the connector type, which often includes descriptive information as well as the configuration properties that it supports.

The connectors configured in the Files Integrator OMIS are shown in figure 8:

![Figure 8](../images/integration-daemon.png)
> **Figure 8:** exchangeDL01 with its connectors

The class names of these integration connectors' providers can be seen in the connection object embedded in the error message displayed with the connectors' status.

The commands below request that the Files Integrator OMIS service validate and return the connector type for each of these connectors.

In [None]:
dataFilesMonitorProviderClassName = "org.odpi.openmetadata.adapters.connectors.integration.basicfiles.DataFilesMonitorIntegrationProvider"
dataFolderMonitorProviderClassName = "org.odpi.openmetadata.adapters.connectors.integration.basicfiles.DataFolderMonitorIntegrationProvider"

print("Data Files Monitor Integration Connector Type:")
validateIntegrationConnector(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, "files-integrator", petersUserId, dataFilesMonitorProviderClassName)

print("")
print("Data Folder Monitor Integration Connector Type:")
validateIntegrationConnector(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, "files-integrator", petersUserId, dataFolderMonitorProviderClassName)

----
Both connectors support the `templateQualifiedName` and the `allowCatalogDelete` configuration properties.  If you are curious
about their meaning, review the definitions in the connector catalog:

* [Data Files Monitor Integration Connector](https://egeria.odpi.org/open-metadata-publication/website/connector-catalog/data-files-monitor-integration-connector.html)
* [Data Folder Monitor Integration Connector](https://egeria.odpi.org/open-metadata-publication/website/connector-catalog/data-folder-monitor-integration-connector.html)

We are going to set the `templateQualifiedName` with the qualified name of an asset that has the origin set up for the appropriate originating hospitals.

----
The commands below create the template assets for each of the hospitals.

In [None]:
t1AssetName     = "Oak Dene Template"
t1QualifiedName = "template:clinical-trials:drop-foot:weekly-measurments:oak-dene"
t1DisplayName   = "Drop Foot Clinical Trial Measurements Template Asset: Oak Dene"
t1Description   = "Template asset for Oak Dene Hospital."
t1Contact       = "Robbie Records"
t1Dept          = "Drop Foot Research Centre"
t1Org           = "Oak Dene Hospital"

t2AssetName     = "Old Market Template"
t2QualifiedName = "template:clinical-trials:drop-foot:weekly-measurments:old-market"
t2DisplayName   = "Drop Foot Clinical Trial Measurements Template Asset: Old Market"
t2Description   = "Template asset for Old Market Hospital."
t2Contact       = "TBD"
t2Dept          = "DFRG1F6"
t2Org           = "Old Market Hospital"


asset1guids = assetOwnerCreateCSVAsset(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, t1DisplayName, t1Description, t1QualifiedName)
asset1guid = getLastGUID(asset1guids)
addOrigin(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, t1AssetName, asset1guid, t1Contact, t1Dept, t1Org)

asset2guids = assetOwnerCreateCSVAsset(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, t2DisplayName, t2Description, t2QualifiedName)
asset2guid = getLastGUID(asset2guids)
addOrigin(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, t2AssetName, asset2guid, t2Contact, t2Dept, t2Org)

print ("\n\nNew Template Assets:")
assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*Template.*")


----

The next commands reconfigure the integration connectors that are monitoring for new files in the landing area to use the templates.

----

In [None]:

t1ConnectorName = "OakDeneLandingAreaFilesMonitor"
t1ConfigurationProperties = {
    "templateQualifiedName" : t1QualifiedName
}

t2ConnectorName = "OldMarketLandingAreaFilesMonitor"
t2ConfigurationProperties = {
    "templateQualifiedName" : t2QualifiedName
}

updateConnectorConfigurationProperties(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, "files-integrator", petersUserId, t1ConnectorName, t1ConfigurationProperties)
updateConnectorConfigurationProperties(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, "files-integrator", petersUserId, t2ConnectorName, t2ConfigurationProperties)

getIntegrationConnectorConfigProperties(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, "files-integrator", petersUserId, t1ConnectorName)
getIntegrationConnectorConfigProperties(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, "files-integrator", petersUserId, t2ConnectorName)


----

Now let's add the file for week 2 into the landing area.

----

In [None]:

sourceFileName = OakDeneSourceFolder + "/" + "week2.csv"

requestParameters = {
        "source-file" : sourceFileName,
        "destination-folder" : OakDeneLandingDirectory
    }
qualifiedName = "FTP Oak Dene Week 2"

governanceActionGUID = None
governanceActionGUID = initiateGovernanceAction(cocoMDS2Name,
                                                cocoMDS2PlatformName,
                                                cocoMDS2PlatformURL,
                                                erinsUserId,
                                                assetGovernanceEngineName,
                                                qualifiedName,
                                                ftpGovernanceServiceRequestType,
                                                requestParameters)


waitForRunningGovernanceAction(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, governanceActionGUID)
printGovernanceAction(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, governanceActionGUID)
 

----

The newly catalogued file will show that the origin is set up.

----

In [None]:

assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*landing-area/hospitals/oak-dene/clinical-trials/drop-foot.*")


----
We are now cataloguing files arriving in the landing folders and their origin is set up - and so phase 2 shown in figure 9 is complete.


![Phase 2](../images/automated-curation-scenario-2.png)
> **Figure 9:** Phase 2: Creating assets for newly arrived files with a template


At this point, the integration daemon is ready to support the onboarding of data from either hospital.
The next step is to configure the governance action process that provisions the newly arrived files in the data lake.

## Listening for new assets

The watchdog governance action services are responsible for listening for events that indicate specific types of activity and then acting on them.
They can initiate
a [governance action](https://egeria.odpi.org/open-metadata-implementation/access-services/governance-engine/docs/concepts/governance-action.html),
a [governance action process](https://egeria.odpi.org/open-metadata-implementation/access-services/governance-engine/docs/concepts/governance-action-process.html) or
an [incident report](https://egeria.odpi.org/open-metadata-implementation/access-services/governance-engine/docs/concepts/incident-report.html).
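
The listen-then-act pattern of a watchdog can be sketched as an event filter that starts a process when a matching event arrives. The example below is illustrative only: the event dictionary layout, the event type labels, the list of type names (assumed to include supertypes) and the `make_new_element_watchdog` helper are hypothetical and do not describe the real event payload or service.

```python
def make_new_element_watchdog(interesting_type_name, start_process):
    """Build a handler that starts a process for new elements of the given type."""

    def on_event(event):
        is_new = event.get("eventType") == "NEW_ELEMENT"
        # typeNames is assumed to list the element's type and its supertypes.
        is_interesting = interesting_type_name in event.get("typeNames", [])
        if is_new and is_interesting:
            start_process(event["elementGUID"])
            return True
        return False

    return on_event


started_for = []
watchdog = make_new_element_watchdog("Asset", started_for.append)

events = [
    {"eventType": "NEW_ELEMENT", "typeNames": ["CSVFile", "DataFile", "Asset"], "elementGUID": "guid-1"},
    {"eventType": "UPDATED_ELEMENT", "typeNames": ["CSVFile", "DataFile", "Asset"], "elementGUID": "guid-1"},
    {"eventType": "NEW_ELEMENT", "typeNames": ["GlossaryTerm"], "elementGUID": "guid-2"},
]

handled = [watchdog(event) for event in events]
print(handled, started_for)
```

Only the first event triggers the process: the second is an update rather than a new element, and the third is not an asset.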

For this onboarding process, we are going to set up a watchdog governance action service called `GenericElementWatchdogGovernanceActionService`.
It can be configured to listen for specific types of events, on specific types of objects and even focus on changes to a specific instance.
Below is the definition of this governance action service.

----

In [None]:

genericElementWatchdogProviderClassName = "org.odpi.openmetadata.adapters.connectors.governanceactions.watchdog.GenericElementWatchdogGovernanceActionProvider"

print("Generic Element Watchdog Governance Action Service:")
validateGovernanceActionEngineConnector(governDL01Name, governDL01PlatformName, governDL01PlatformURL, petersUserId, genericElementWatchdogProviderClassName)


----

The calls below define this governance action service and create a governance action to cause it to run.

----

In [None]:
watchdogGovernanceServiceName = "new-asset-watchdog-governance-action-service"
watchdogGovernanceServiceDisplayName = "New Asset Watchdog Governance Action Service"
watchdogGovernanceServiceDescription = "Initiates a governance action process when a new asset arrives."
watchdogGovernanceServiceProviderClassName = "org.odpi.openmetadata.adapters.connectors.governanceactions.watchdog.GenericElementWatchdogGovernanceActionProvider"
watchdogGovernanceServiceConfigurationProperties = {}
watchdogGovernanceServiceRequestType = "process-multiple-events"
watchdogGovernanceServiceProcessToCall = "governance-action-process:clinical-trials:drop-foot:weekly-measurements:onboarding"


governanceServiceGUID = createGovernanceService(cocoMDS2Name,
                                                cocoMDS2PlatformName,
                                                cocoMDS2PlatformURL,
                                                erinsUserId,
                                                "GovernanceActionService",
                                                watchdogGovernanceServiceName,
                                                watchdogGovernanceServiceDisplayName,
                                                watchdogGovernanceServiceDescription,
                                                watchdogGovernanceServiceProviderClassName,
                                                watchdogGovernanceServiceConfigurationProperties)

if (governanceServiceGUID):
    print (" ")
    print ("The guid for the " + watchdogGovernanceServiceName + " governance service is: " + governanceServiceGUID)
    registerGovernanceServiceWithEngine(cocoMDS1Name,
                                        cocoMDS1PlatformName,
                                        cocoMDS1PlatformURL,
                                        petersUserId,
                                        assetGovernanceEngineGUID,
                                        governanceServiceGUID,
                                        watchdogGovernanceServiceRequestType)
    print ("Service registered as: " + watchdogGovernanceServiceRequestType)
    print (" ")


requestParameters = {
        "interestingTypeName" : "Asset",
        "newElementProcessName" : watchdogGovernanceServiceProcessToCall
    }
qualifiedName = "Listen for new assets (Drop Foot Clinical Trial)"

governanceActionGUID = None
governanceActionGUID = initiateGovernanceAction(cocoMDS2Name,
                                                cocoMDS2PlatformName,
                                                cocoMDS2PlatformURL,
                                                erinsUserId,
                                                assetGovernanceEngineName,
                                                qualifiedName,
                                                watchdogGovernanceServiceRequestType,
                                                requestParameters)


----

With the watchdog in place, any new files added to the metadata repository will trigger a governance action process called `governance-action-process:clinical-trials:drop-foot:weekly-measurements:onboarding`.

![Phase 3](../images/automated-curation-scenario-3.png)
> **Figure 10:** Phase 3: Triggering provisioning of the file into the data lake
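
As a rough illustration of the filtering the watchdog performs, the sketch below decides whether an incoming event should start a process. It is not the implementation of `GenericElementWatchdogGovernanceActionService`: the `select_process` function and the layout of the event dictionaries are hypothetical. Only the `interestingTypeName` and `newElementProcessName` request parameters are taken from the lab.

```python
from typing import Optional


def select_process(event: dict, request_parameters: dict) -> Optional[str]:
    """Return the name of the process to start for this event, or None to ignore it."""
    # Hypothetical event layout: an event type plus the element's type and supertypes.
    if event.get("eventType") != "NEW_ELEMENT":
        return None

    interesting_type = request_parameters.get("interestingTypeName")
    element_types = [event.get("elementTypeName")] + event.get("elementSuperTypes", [])
    if interesting_type is not None and interesting_type not in element_types:
        return None

    return request_parameters.get("newElementProcessName")


request_parameters = {
    "interestingTypeName": "Asset",
    "newElementProcessName": "governance-action-process:clinical-trials:drop-foot:weekly-measurements:onboarding",
}

# Illustrative events only; real events carry much more information.
new_file_event = {
    "eventType": "NEW_ELEMENT",
    "elementTypeName": "CSVFile",
    "elementSuperTypes": ["DataFile", "DataStore", "Asset"],
}
glossary_event = {
    "eventType": "NEW_ELEMENT",
    "elementTypeName": "GlossaryTerm",
    "elementSuperTypes": ["Referenceable"],
}

print(select_process(new_file_event, request_parameters))
print(select_process(glossary_event, request_parameters))
```

The first event describes a new asset, so the onboarding process name is returned. The second is not an asset and is ignored.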

## Creating a governance action process for new assets

A governance action process defines a flow of governance actions that are linked together by the guards that they produce when they run.
It is represented by a metadata element of type Process.  This gives the process a unique name and
the anchor point to connect it into the governance definitions.
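
To make the idea of guards more concrete, here is a minimal sketch of a flow in which each step returns the guards it produced and those guards select the next steps. The `GovernanceActionFlow` class, the step names and the guard names are all hypothetical and are not part of Egeria's API. The sketch only models the linking behaviour.

```python
class GovernanceActionFlow:
    """Toy model of a governance action process: steps linked by guards."""

    def __init__(self, first_step):
        self.first_step = first_step
        self.links = {}  # (current step, guard) -> list of next steps

    def link(self, current_step, guard, next_step):
        """Record that next_step runs when current_step produces guard."""
        self.links.setdefault((current_step, guard), []).append(next_step)

    def run(self, services):
        """Run the flow; services maps a step name to a function returning its guards."""
        executed = []
        pending = [self.first_step]
        while pending:
            step = pending.pop(0)
            executed.append(step)
            for guard in services[step]():
                pending.extend(self.links.get((step, guard), []))
        return executed


flow = GovernanceActionFlow("move-file")
flow.link("move-file", "provisioning-complete", "set-owner-and-origin")
flow.link("move-file", "provisioning-failed", "raise-incident")
flow.link("set-owner-and-origin", "origin-assigned", "set-zones")

# Each hypothetical service simply reports the guards it produced.
services = {
    "move-file": lambda: ["provisioning-complete"],
    "set-owner-and-origin": lambda: ["origin-assigned"],
    "set-zones": lambda: ["zones-assigned"],
    "raise-incident": lambda: [],
}

executed_steps = flow.run(services)
print(executed_steps)
```

Because `move-file` produces `provisioning-complete` rather than `provisioning-failed`, the `raise-incident` step is never run.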

----

In [None]:

qualifiedName = watchdogGovernanceServiceProcessToCall
displayName   = "Drop Foot Onboard Weekly Measurement Files"
description   = "Ensures that new weekly drop foot measurement files from the hospitals are correctly catalogued in the data lake."
technicalName = "DFOBWKLY01"
technicalDescription = """ This process performs the following functions:
 1) The physical file is moved to the data lake and renamed
 2) A new asset is created for the new file
 3) Lineage is created between the original file asset and the new file asset
 4) The owner and origin are assigned
 5) The governance zones are assigned to make the new asset visible to the research team."""

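# Create the process element that anchors the flow of governance actions.
governanceActionProcessGUID = createGovernanceActionProcess(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, qualifiedName, displayName, description, technicalName, technicalDescription)

# NOTE: the remaining calls outline the steps that link action types into the process.
# Values such as supportedGuards, requestType, optionalGuard, currentActionTypeGUID,
# nextActionTypeGUID, guard, mandatoryGuard and ignoreMultipleTriggers are not set
# in this cell and need to be defined before these calls will run.
actionTypeGUID = createGovernanceActionType(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, governanceEngineGUID, qualifiedName, supportedGuards, requestType, requestParameters) 

setupFirstActionType(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, governanceActionProcessGUID, actionTypeGUID, optionalGuard)

setupNextActionType(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, currentActionTypeGUID, nextActionTypeGUID, guard, mandatoryGuard, ignoreMultipleTriggers)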


----

Let's provision the week 3 and week 4 files to see if the process is triggered and the files are moved to the data lake folder.

----

In [None]:

sourceFileName = OakDeneSourceFolder + "/" + "week3.csv"

requestParameters = {
        "source-file" : sourceFileName,
        "destination-folder" : OakDeneLandingDirectory
    }
qualifiedName = "FTP Oak Dene Week 3"

governanceActionGUID = None
governanceActionGUID = initiateGovernanceAction(cocoMDS2Name,
                                                cocoMDS2PlatformName,
                                                cocoMDS2PlatformURL,
                                                erinsUserId,
                                                assetGovernanceEngineName,
                                                qualifiedName,
                                                ftpGovernanceServiceRequestType,
                                                requestParameters)


waitForRunningGovernanceAction(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, governanceActionGUID)
printGovernanceAction(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, governanceActionGUID)


In [None]:

sourceFileName = OakDeneSourceFolder + "/" + "week4.csv"

requestParameters = {
        "source-file" : sourceFileName,
        "destination-folder" : OakDeneLandingDirectory
    }
qualifiedName = "FTP Oak Dene Week 4"

governanceActionGUID = None
governanceActionGUID = initiateGovernanceAction(cocoMDS2Name,
                                                cocoMDS2PlatformName,
                                                cocoMDS2PlatformURL,
                                                erinsUserId,
                                                assetGovernanceEngineName,
                                                qualifiedName,
                                                ftpGovernanceServiceRequestType,
                                                requestParameters)


waitForRunningGovernanceAction(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, governanceActionGUID)
printGovernanceAction(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, governanceActionGUID)


The second instance of the Move/Copy File governance action service is part of Coco Pharmaceuticals' onboarding process.
It is driven by the appearance of the asset that the integration daemon creates when a file arrives in the landing area.
This instance also produces lineage and changes the resulting file name so that the files are sequenced according to their
arrival.  For example:
 * DropFoot_000001.csv
 * DropFoot_000002.csv

This aids the time-based loading of the files into a database by ensuring any corrections to the readings are applied in the
correct order.
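
The zero padding is what keeps the files in arrival order when they are sorted by name. The short sketch below shows the effect in plain Python. The `sequenced_file_name` function is hypothetical and only mimics the naming scheme. It is not how the governance action service generates its names.

```python
def sequenced_file_name(sequence_number: int, prefix: str = "DropFoot", extension: str = "csv") -> str:
    """Build a file name with a six-digit, zero-padded sequence number."""
    return f"{prefix}_{sequence_number:06d}.{extension}"


arrival_order = (1, 2, 10)

padded = [sequenced_file_name(n) for n in arrival_order]
unpadded = [f"DropFoot_{n}.csv" for n in arrival_order]

print(padded)
print(sorted(padded) == padded)      # name order matches arrival order
print(sorted(unpadded) == unpadded)  # without padding, file 10 sorts before file 2
```

With padded names, a loader that processes the files in name order applies the readings in the order they arrived.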

In [None]:

dlGovernanceServiceName = "provision-weekly-measurements-governance-action-service"
dlGovernanceServiceDisplayName = "FTP Governance Action Service"
dlGovernanceServiceDescription = "Provisions weekly measurement files from the landing area to the data lake."
dlGovernanceServiceProviderClassName = moveCopyFileProviderClassName
dlGovernanceServiceConfigurationProperties = {
        "targetFileNamePattern" : "DropFoot_{1, number,000000}.csv"
    }
dlGovernanceServiceRequestType = "move-file"


----

The next cell issues the calls that create another definition of the `Move/Copy File Governance Action Service` and register it with the AssetGovernance
governance engine.

----

In [None]:

governanceServiceGUID = createGovernanceService(cocoMDS2Name,
                                                cocoMDS2PlatformName,
                                                cocoMDS2PlatformURL,
                                                erinsUserId,
                                                "GovernanceActionService",
                                                dlGovernanceServiceName,
                                                dlGovernanceServiceDisplayName,
                                                dlGovernanceServiceDescription,
                                                dlGovernanceServiceProviderClassName,
                                                dlGovernanceServiceConfigurationProperties)

if (governanceServiceGUID):
    print (" ")
    print ("The guid for the " + dlGovernanceServiceName + " governance service is: " + governanceServiceGUID)
    registerGovernanceServiceWithEngine(cocoMDS1Name,
                                        cocoMDS1PlatformName,
                                        cocoMDS1PlatformURL,
                                        petersUserId,
                                        assetGovernanceEngineGUID,
                                        governanceServiceGUID,
                                        dlGovernanceServiceRequestType)
    print ("Service registered as: " + dlGovernanceServiceRequestType)
    print (" ")   
  

Next we need a governance action service to listen for the new assets being created in the landing area.
    

![Phase 4](../images/automated-curation-scenario-4.png)
> **Figure 11:** Phase 4: Move the file into the data lake

![Phase 5](../images/automated-curation-scenario-5.png)
> **Figure 12:** Phase 5: Set up the template for newly arrived files in the data lake

![Phase 6](../images/automated-curation-scenario-6.png)
> **Figure 13:** Phase 6: Extend the governance action process to enrich the asset

In [None]:
ftpRequestType = "copy-file"

if governanceServiceGUID:
    registerGovernanceServiceWithEngine(cocoMDS2Name,
                                        cocoMDS2PlatformName,
                                        cocoMDS2PlatformURL,
                                        erinsUserId,
                                        assetGovernanceEngineGUID,
                                        governanceServiceGUID,
                                        ftpRequestType)
    print (" ")
    print ("Service registered as: " + ftpRequestType)
    print (" ")
    
print ("Done. ")

In [None]:
# Test the status of all of the connectors

csvDiscoveryProviderClassName = "org.odpi.openmetadata.adapters.connectors.discoveryservices.CSVDiscoveryServiceProvider"

print("CSV Discovery Service:")
validateAssetAnalysisEngineConnector(governDL01Name, governDL01PlatformName, governDL01PlatformURL, petersUserId, csvDiscoveryProviderClassName)




----

