![Egeria Logo](https://raw.githubusercontent.com/odpi/egeria/main/assets/img/ODPi_Egeria_Logo_color.png)

### Egeria Hands-On Lab
# Welcome to the Automated Curation Lab

## Introduction

Egeria is an open source project that provides open standards and implementation libraries to connect tools, catalogs and platforms together so they can share information about data and technology (called metadata).

In the [Building a Data Catalog](building-a-data-catalog.ipynb) lab, Peter Profile and Erin Overview
manually catalogued the weekly measurement files for the Drop Foot clinical trial.

In this hands-on lab you will get a chance to work with Egeria's governance servers to
automate this onboarding process.

**Note: this Lab needs Egeria release 3.15 or later to run**

## The scenario

[Coco Pharmaceuticals](https://egeria-project.org/practices/coco-pharmaceuticals/)
is conducting a clinical trial with two hospitals: Oak Dene Hospital and Old Market Hospital.
Each week the two hospitals send Coco Pharmaceuticals a set of measurements from the patients
involved in the trial.  These measurements are located in a CSV file that the hospital sends through
secure file transfer to a folder in Coco Pharmaceuticals' landing area.

These files need to be copied into the data lake and catalogued so that they are only visible to the
staff involved in the clinical trial.  It is also important that the lineage of these files is
maintained so the source of the data can be traced.  This process is shown in Figure 1.

![Scenario](../images/automated-curation-scenario.png)
> **Figure 1:** Clinical trial weekly measurements onboarding process

Peter Profile and Erin Overview are responsible for this onboarding process.

<figure style="margin-left: 7%; display:inline-block;">  
  <img src="https://raw.githubusercontent.com/odpi/egeria-docs/main/site/docs/practices/coco-pharmaceuticals/personas/peter-profile.png">
  <figcaption style="margin-left: 15%;"><strong>Peter Profile</strong></figcaption>
</figure>

<figure style="margin-left: 20%; display:inline-block;">  
  <img src="https://raw.githubusercontent.com/odpi/egeria-docs/main/site/docs/practices/coco-pharmaceuticals/personas/erin-overview.png">
  <figcaption style="margin-left: 15%;"><strong>Erin Overview</strong></figcaption>
</figure>


They have defined a list of requirements for the process:

* Files must remain in the landing area for as short a time as possible.
* As a new file is received, it needs to be catalogued, including:
   * Description
   * Connection details to enable the data scientists to access the contents
   * Column details
   * Governance zones defining the files' visibility
   * Owner 
   * Origin
* A file is not accessible by any of the data lake users until the cataloguing process is complete.
* They must record lineage of each measurements file so they know which hospital it came from.

They have been [manually cataloguing the measurements files](building-a-data-catalog.ipynb) for
the first few weeks to prove the approach but now it is time to automate the process since:
* This clinical trial is planned to run for two years.
* There is expected to be a ramp-up of other clinical trials running simultaneously and the
  file onboarding workload would soon become overwhelming if they continued with the manual approach.

They plan to use an
[Integration Daemon](https://egeria-project.org/concepts/integration-daemon)
called **exchangeDL01** to capture the technical metadata of the files.
Then the 
[Engine Host](https://egeria-project.org/concepts/engine-host)
server called **governDL01** will manage the move of the file into the data lake,
the augmentation of the metadata properties of the files and the creation of the lineage.

This lab sets up the automated onboarding process using 6 phases as shown in Figure 2:

![Phases](../images/automated-curation-scenario-6.png)
> **Figure 2:** All six phases

1. Bring files in from "outside" using the move/copy file governance action service
2. Create a template used to catalog files in the landing areas and data lake
3. Trigger a governance action process when a file appears in a landing folder
4. Move the file into the data lake folder using a governance action process
5. Detect new files as they are added to the data lake folder and catalog them as assets (using the template)
6. Enrich the new assets using governance action services

## Setting up

Coco Pharmaceuticals make widespread use of Egeria for tracking and managing their data and related assets.
Figure 3 below shows their servers and the Open Metadata and Governance (OMAG) Server Platforms that are hosting them.

![Figure 3](../images/coco-pharmaceuticals-systems-omag-server-platforms.png)
> **Figure 3:** Coco Pharmaceuticals' OMAG Server Platforms

The code below checks that the platforms are running.  It checks that the servers are configured and whether they are running on the platform.  If a server is configured, but not running, it will start it.

Look for the second "Done." message that is displayed just after the governance servers have started.  It may take up to a minute to start up all of the servers.  If `cocoMDS2` seems to be very slow starting, check that Apache Kafka is running.  The metadata servers will wait for Kafka during start up if they are connected to a cohort.  When they are waiting in this manner, they periodically output a message on the console.

In [None]:
# Start up the metadata servers and the view server
%run ../common/common-functions.ipynb
%run ../common/environment-check.ipynb

print("Start up the Governance Servers")
activatePlatform(dataLakePlatformName, dataLakePlatformURL, [governDL01Name, exchangeDL01Name])

print("Done. ")

----
You should see that both the metadata servers `cocoMDS1` and `cocoMDS2` along with the integration daemon `exchangeDL01` and the engine host server `governDL01` have started.  These are the servers that will be used in this lab and they are connected together
as shown in Figure 4.  The `cocoMDS1` metadata server is where the new files are catalogued, whereas the definitions used
to drive the governance processes come from `cocoMDS2`.  

![Figure 4](../images/governance-servers-for-automated-onboarding.png)
> **Figure 4:** Governance servers for automated curation

If any of the platforms are not running, follow [this link to set up and run the platform](https://egeria-project.org/education/open-metadata-labs/overview/).  If any server is reporting that it is not configured then
run the steps in the [Server Configuration](../egeria-server-config.ipynb) lab to configure
the servers.  Then re-run the previous step to ensure all of the servers are started.

## Review the status of the integration daemon

At this point, even though both `exchangeDL01` and `governDL01` are running, there is still work to do to set up the full data onboarding pipeline.  Let's start with the integration daemon that catalogs the files as they appear in a folder.

The command below queries the status of the integration daemon called `exchangeDL01`.

In [None]:

getIntegrationDaemonStatus(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, petersUserId)


----

Notice that the [Files Integrator Open Metadata Integration Service (OMIS)](https://egeria-project.org/services/omis/files-integrator/overview/) is running three integration connectors and they are all failing because the directories (folders) that they are supposed to be monitoring do not exist.  

You may be wondering why the integration connectors fail when the directories are not present.  Surely they could keep running and periodically check whether the directory has been created yet.  The answer is that they can.  It is a design choice made when the integration connectors are written.  In this example, the assumption made by the integration connector developer was that the folders should be present and there is an issue in the set-up if they are not.
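
The two design choices can be sketched in plain Python.  This is only an illustration of the trade-off, not the code of the Files Integrator connectors: the class names `StrictFolderMonitor` and `PatientFolderMonitor`, the `refresh()` method and the status strings are all hypothetical.

```python
import os
import tempfile


class StrictFolderMonitor:
    """Hypothetical monitor that treats a missing directory as a set-up error."""

    def __init__(self, directory):
        self.directory = directory
        self.status = "INITIALIZED"

    def start(self):
        # Fail straight away if the directory is missing; a restart is needed later.
        self.status = "RUNNING" if os.path.isdir(self.directory) else "FAILED"


class PatientFolderMonitor(StrictFolderMonitor):
    """Hypothetical monitor that keeps running and waits for the directory."""

    def start(self):
        self.status = "RUNNING" if os.path.isdir(self.directory) else "WAITING"

    def refresh(self):
        # Called periodically; notices the directory once it exists.
        self.start()


with tempfile.TemporaryDirectory() as root:
    landing = os.path.join(root, "landing-area")

    strict = StrictFolderMonitor(landing)
    patient = PatientFolderMonitor(landing)
    strict.start()
    patient.start()
    statuses_before = (strict.status, patient.status)

    os.makedirs(landing)   # the directory is created some time later
    patient.refresh()      # the patient monitor recovers on its next refresh
    strict.start()         # the strict monitor needs an explicit restart
    statuses_after = (strict.status, patient.status)

print(statuses_before, statuses_after)
```

The strict style surfaces configuration mistakes early, at the cost of needing a restart once the directory exists, which is the behaviour this lab relies on.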

As the data onboarding pipeline is developed in this lab, the directories will get created and we will be able to restart the connectors to get them working.

Peter records the names of the three integration connectors so he can reconfigure them and restart them later.

---

In [None]:

OakDeneLandingAreaConnectorName   = 'OakDeneLandingAreaFilesMonitor'
OldMarketLandingAreaConnectorName = 'OldMarketLandingAreaFilesMonitor'
DataLakeDirectoryConnectorName    = 'DropFootClinicalTrialResultsFolderMonitor'


----

Peter records the names of the directories that the three integration connectors are monitoring.

The `DropFootClinicalTrialResultsFolderMonitor` integration connector that monitors the data lake directory maintains the last update time for the [DataFolder](https://egeria-project.org/types/2/0220-Files-and-Folders/) asset that represents the collection of measurements received from all hospitals for the clinical trial.

----

In [None]:

landingAreaDirectory      = fileSystemRoot + '/landing-area/hospitals'
dataLakeDirectory         = fileSystemRoot + '/data-lake/research/clinical-trials/drop-foot/weekly-measurements'

OakDeneLandingDirectory   = landingAreaDirectory + '/oak-dene/clinical-trials/drop-foot'
OldMarketLandingDirectory = landingAreaDirectory + '/old-market/clinical-trials/drop-foot'


----

For the purposes of this lab, we need to be able to simulate the arrival of files from the hospital.  These files are located in the sample data that is in the Egeria distribution.

----

In [None]:

OakDeneSourceFolder   = egeriaSampleDataRoot + '/sample-data/oak-dene-drop-foot-weekly-measurements'
OldMarketSourceFolder = egeriaSampleDataRoot + '/sample-data/old-market-drop-foot-weekly-measurements'


----

When the files are catalogued in Egeria, they are represented by specialized [assets](https://egeria-project.org/concepts/asset/) of type `CSVFile`.


At this time there are no assets catalogued for either the landing area or the weekly measurements directory.

----

In [None]:

print("Assets in the landing area:")
assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*landing.*")

print("\nAssets in the data lake:")
assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*drop-foot/weekly-measurements.*")

print()



----
## Setting up the file transfer into the landing area

The directories that the integration connectors are configured to monitor could be created using a file system command.
In this lab, however, the creation of a landing area folder will occur when the first file is received from the corresponding hospital.
This is to illustrate how changes in the physical data landscape can trigger actions in the open metadata ecosystem.

Peter is going to use a
[provisioning governance service](https://egeria-project.org/guides/developer/governance-action-services/provisioning-governance-service/) called
[Move/Copy File Governance Action Service](https://egeria-project.org/connectors/governance-action/move-copy-file-provisioning-governance-action-service/)
to simulate the file transfer from a hospital to its folder in the landing zone.
This service runs in a governance engine that is supported by the
[Governance Action Open Metadata Engine Service (OMES)](https://egeria-project.org/services/omes/governance-action/overview).
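
The effect of the simulated transfer on the file system can be pictured with a few lines of standard-library Python.  This is a rough sketch under stated assumptions, not the implementation of the Move/Copy File Governance Action Service: `copy_file_to_folder()` is a hypothetical helper, and the file content is made up.  It shows the destination folder being created when the first file arrives.

```python
import os
import shutil
import tempfile


def copy_file_to_folder(source_file, destination_folder):
    """Copy source_file into destination_folder, creating the folder if needed."""
    os.makedirs(destination_folder, exist_ok=True)
    return shutil.copy(source_file, destination_folder)


with tempfile.TemporaryDirectory() as root:
    # Stand-in for the sample data shipped with the distribution.
    source_folder = os.path.join(root, "sample-data")
    os.makedirs(source_folder)
    source_file = os.path.join(source_folder, "week1.csv")
    with open(source_file, "w") as f:
        f.write("PatientId,Date,Angle\n")   # made-up header row

    landing_folder = os.path.join(root, "landing-area", "hospitals", "oak-dene")
    folder_existed_before = os.path.isdir(landing_folder)

    copied_path = copy_file_to_folder(source_file, landing_folder)

    folder_exists_after = os.path.isdir(landing_folder)
    copied_name = os.path.basename(copied_path)

print(folder_existed_before, folder_exists_after, copied_name)
```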

Figure 5 shows the three governance engines configured in the `governDL01` engine host server during the
[Configuring Egeria Servers Lab](../egeria-server-config.ipynb).

![Figure 5](../images/engine-host.png)
> **Figure 5:** Governance Engines for governDL01

The command below queries the status of each governance engine running in `governDL01`.
The governance action services that will support the onboarding of files for clinical trials will run in the `AssetGovernance`
governance engine.  The other two governance engines are the subject of the [Improving Data Quality](../administration-labs/open-discovery-config.ipynb) lab.


----

In [None]:

printGovernanceEngineStatuses(governDL01Name, governDL01PlatformName, governDL01PlatformURL, petersUserId)


----
If you see the status code `ASSIGNED`, it means that the governance engine was listed in the `governDL01` [Engine Host's configuration
document](https://egeria-project.org/guides/admin/servers/configuring-an-engine-host/#list-engine-services) (the governance engine was assigned to this server) but the Engine Host has not been
able to retrieve the configuration for the governance engine from its metadata server (`cocoMDS2`).

Peter adds the definitions for the governance engine to `cocoMDS2` to start the `AssetGovernance` governance engine running. This definition of a governance engine is stored in an [open metadata archive](https://egeria-project.org/guides/developer/open-metadata-archives/creating-governance-engine-packs/).  

The command below loads the definitions for `AssetGovernance`, `AssetDiscovery` and `AssetQuality` into `cocoMDS2`.

----

In [None]:

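def loadArchive(serverName, serverPlatformName, serverPlatformURL, archiveFileName):
    # Ask the platform to load the named open metadata archive file into the running server.
    # adminUserId, requests and json are set up by the common functions loaded earlier in this notebook.
    loadArchiveURL = serverPlatformURL + '/open-metadata/admin-services/users/' + adminUserId + '/servers/' + serverName + '/instance/open-metadata-archives/file'
    print(" ")
    # The request body is the archive file name; verify=False skips TLS certificate checks for the lab platforms.
    response = requests.post(loadArchiveURL, archiveFileName, verify=False)
    if response.status_code == 200:
        print("Archive loaded: " + archiveFileName)
    else:
        print("Returns:")
        prettyResponse = json.dumps(response.json(), indent=4)
        print(prettyResponse)
    print(" ")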
    
archiveFileName = 'content-packs/CocoGovernanceEngineDefinitionsArchive.json'

loadArchive(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, archiveFileName)

print("Sleeping to complete processing of archive ...")
time.sleep(3)

print("Done. ")

----

The archive contains five [*governance service definitions*](https://egeria-project.org/concepts/governance-service-definition/).  These describe reusable governance components called [*governance services*](https://egeria-project.org/concepts/governance-service/) that Peter and Erin will be using in the onboarding process.  

Peter lists the governance service definitions added by the archive.  (*If fewer than five are returned, re-run the cell to retrieve them all*).

----

In [None]:
print("")
print("Governance Service Definitions ...")
getGovernanceServiceDefinitions(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId)
print("")

----

The governance service definitions are linked into [*governance engine definitions*](https://egeria-project.org/concepts/governance-engine-definition/) that describe the functions that are available in a governance engine running on an Engine Host.


![Figure 6](../images/governance-engine-definition.png)
> **Figure 6:** Structure of the governance services within a governance engine

A governance engine's request types are the functions it supports.  They are mapped to the governance services that provide the implementation.  When a call is made to the governance engine, a request type is passed.  This is mapped to a governance service.  The governance engine runs the governance service.
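
The mapping from request type to governance service can be pictured as a simple lookup.  The sketch below is illustrative only: `GovernanceEngineSketch`, the two stand-in service functions and the `move-file` request type are hypothetical names, and a real governance engine does considerably more than this.

```python
def copy_file_service(request_parameters):
    """Stand-in for a governance service that copies a file."""
    return "copied " + request_parameters["sourceFile"]


def move_file_service(request_parameters):
    """Stand-in for a governance service that moves a file."""
    return "moved " + request_parameters["sourceFile"]


class GovernanceEngineSketch:
    """Hypothetical engine: a named table of request types to services."""

    def __init__(self, name):
        self.name = name
        self.request_types = {}

    def register(self, request_type, governance_service):
        self.request_types[request_type] = governance_service

    def run(self, request_type, request_parameters):
        # Look up the service for the request type, then run it.
        if request_type not in self.request_types:
            raise ValueError("Unsupported request type: " + request_type)
        return self.request_types[request_type](request_parameters)


engine = GovernanceEngineSketch("AssetGovernance")
engine.register("simulate-ftp", copy_file_service)
engine.register("move-file", move_file_service)   # hypothetical request type

result = engine.run("simulate-ftp", {"sourceFile": "week1.csv"})
print(result)
```

Keeping the request type separate from the service means the caller only needs to know the function it wants, not which implementation provides it.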

----

Peter lists the governance engines that were defined in the archive.
They plan to use the *AssetGovernance* engine since the request types listed below match their needs for building the automated onboarding process.

----

In [None]:

print("Governance Engine Definitions ...")
getGovernanceEngineDefinitions(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId)


----

Once a governance engine definition is loaded, it is sent to the `governDL01` Engine Host in an event
and, on receipt, the engine's status moves to `CONFIGURING`.  The Engine Host then starts loading the governance services.

When at least one governance service definition is linked to the governance engine, the status of the
governance engine in `governDL01` moves to `RUNNING` and it is possible to see the list of supported
request types for the governance engine.
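
The status values described in this lab can be summarized as a small decision function.  This is a reading aid based on the descriptions in this notebook, not Egeria code: `engine_status()` and its parameters are hypothetical, and the service counts in the example are made up.

```python
def engine_status(listed_in_engine_host, definition_received, registered_services):
    """Return the status an Engine Host would report for a governance engine."""
    if not listed_in_engine_host:
        return None            # not assigned to this Engine Host, so not reported
    if not definition_received:
        return "ASSIGNED"      # listed in the configuration, definition not yet retrieved
    if registered_services == 0:
        return "CONFIGURING"   # definition received, but no governance services linked
    return "RUNNING"           # at least one governance service is linked


before_archive = engine_status(True, False, 0)
empty_engine = engine_status(True, True, 0)
working_engine = engine_status(True, True, 3)    # illustrative count
unlisted_engine = engine_status(False, True, 3)  # illustrative count

print(before_archive, empty_engine, working_engine, unlisted_engine)
```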

----

Peter checks that the governance engine definitions have been passed to the Engine Host.

Notice that the descriptions from the governance engine definitions have been picked up.  Since *AssetQuality* had no governance services registered with it, it is still in `CONFIGURING` state, whereas the other engines are in `RUNNING` state.  The *FileProvisioning* governance engine has been ignored because it was not listed in the Engine Host's configuration.

----

In [None]:
assetGovernanceEngineName = "AssetGovernance"

waitForRunningGovernanceEngine(governDL01Name, governDL01PlatformName, governDL01PlatformURL, petersUserId, assetGovernanceEngineName)

printGovernanceEngineStatuses(governDL01Name, governDL01PlatformName, governDL01PlatformURL, petersUserId)


----

With the governance action services defined in a running governance engine, all that remains is to call them.

Any defined governance action service can be called directly through the Governance Engine OMAS's `initiateGovernanceAction()` service. The `addFileToLandingArea()` function defined below requests that the Governance Engine OMAS run the *simulate-ftp* request type on the *AssetGovernance* governance engine.

----

In [None]:


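def addFileToLandingArea(sourceFolder, destinationFolder, weekNumber, qualifiedName):
    # weekNumber is passed as a string, for example "1" selects week1.csv in the source folder.
    sourceFileName = sourceFolder + "/" + "week" + weekNumber + ".csv"
    # Request parameters tell the governance service which file to copy and where to put it.
    requestParameters = {
        "sourceFile" : sourceFileName,
        "destinationFolder" : destinationFolder
    }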
    governanceActionGUID = initiateGovernanceAction(cocoMDS2Name,
                                                    cocoMDS2PlatformName,
                                                    cocoMDS2PlatformURL,
                                                    erinsUserId,
                                                    assetGovernanceEngineName,
                                                    qualifiedName,
                                                    "simulate-ftp",
                                                    requestParameters,
                                                    "Populate landing area")
    waitForRunningGovernanceAction(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, governanceActionGUID)
    printGovernanceAction(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, governanceActionGUID)


----

Peter calls this function to add a file to the Oak Dene Hospital landing area directory.

----

In this lab, Governance Engine OMAS is running in the `cocoMDS2` metadata server.
The call below results in an event being sent to all
engine hosts running the named governance engine.  The first one that claims it, gets to run it.  In this lab
there is only one engine host and so the request will run on `governDL01`.

This governance action does not take long to run so you should soon see its status as `ACTIONED`.

----

In [None]:

addFileToLandingArea(OakDeneSourceFolder, OakDeneLandingDirectory, "1", "FTP Oak Dene Week 1")


----

A [governance action](https://egeria-project.org/concepts/governance-action/) identifies the governance engine that is/was called and the *requestType* and *requestParameters* used to set up the call to the associated governance service.  The *actionStatus* shows the status of the governance action (`ACTIONED` means it completed successfully) and the [*completionGuards*](https://egeria-project.org/concepts/guard/) describe the outcome of the governance action.

The *processingEngineUserId* identifies which Engine Host claimed the governance action.  (Multiple Engine Hosts can run the same governance engine for scalability and resilience.)  In this case `governDL01npa` is the userId of the `governDL01` Engine Host.
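
The "first one to claim it runs it" rule can be sketched as follows.  This is a simplified model, not how the Governance Engine OMAS is implemented: `GovernanceActionSketch` is a hypothetical class, and `governDL02npa` is an invented second Engine Host that does not exist in this lab.

```python
import threading


class GovernanceActionSketch:
    """Hypothetical governance action that can be claimed by one Engine Host only."""

    def __init__(self, request_type):
        self.request_type = request_type
        self.processing_engine_user_id = None
        self._lock = threading.Lock()

    def claim(self, engine_host_user_id):
        # The lock makes the check-and-set atomic, so only one claim can succeed.
        with self._lock:
            if self.processing_engine_user_id is None:
                self.processing_engine_user_id = engine_host_user_id
                return True
            return False


action = GovernanceActionSketch("simulate-ftp")

first_claim = action.claim("governDL01npa")
second_claim = action.claim("governDL02npa")   # invented second Engine Host

print(first_claim, second_claim, action.processing_engine_user_id)
```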

----

Peter and Erin now have the ability to provision new files into either landing area as shown in Figure 7.  This will allow them to test their onboarding process before real files start to arrive from the hospitals.

![Phase 1](../images/automated-curation-scenario-1.png)
> **Figure 7:** Phase 1: Arrival of new files


----

Now that there is a file in the landing area directory, restarting the Oak Dene integration connector in the `exchangeDL01` integration daemon will cause it to test again to see if the directory it is monitoring exists.

----

In [None]:

restartIntegrationConnector(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, petersUserId, "files-integrator", OakDeneLandingAreaConnectorName)


----

Peter checks the status of the integration connectors to be sure that the `OakDeneLandingAreaFilesMonitor` connector is now running.

----

In [None]:

getIntegrationDaemonStatus(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, petersUserId)


----

The connector will begin cataloguing any files placed in the directory.  We should be able to see the `week1.csv` file in the catalog along with catalog entries for each of the nested directories it sits in (6 assets in total). 

*If fewer than 6 assets are shown, wait a short while and run the cell again and they will appear*.
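
One way to see where the count of 6 comes from is to list the file together with each folder above it and keep the path names that match the search string.  The sketch below assumes a file system root called `data` (the real value of `fileSystemRoot` is set in the common functions and may differ) and assumes the search string is applied as a regular expression against the whole path name.

```python
import re
from pathlib import PurePosixPath


def catalogued_path_names(file_path):
    """Return the path names of every folder above the file, then the file itself."""
    path = PurePosixPath(file_path)
    # parents runs from the nearest folder outwards, so reverse it; drop the "." entry.
    folders = [str(p) for p in list(path.parents)[::-1] if str(p) != "."]
    return folders + [str(path)]


# "data" is an assumed root used only for this illustration.
file_path = "data/landing-area/hospitals/oak-dene/clinical-trials/drop-foot/week1.csv"

path_names = catalogued_path_names(file_path)
matching = [name for name in path_names if re.fullmatch(".*landing.*", name)]

for name in matching:
    print(name)
print(len(matching))
```

Under these assumptions the root folder is the only path name that does not contain "landing", leaving five folders and one file.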

----

In [None]:
assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*landing.*")

In [None]:
## If you would like to see more detail for any of these assets, set up the assetGUID value below with the unique identifier (guid)
## of the asset you are interested in and uncomment (remove #) from the front of printSelectiveAssetUniverse and then run this cell.

assetGUID = "Add guid here"

#printSelectiveAssetUniverse(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, 'asset-owner', petersUserId, assetGUID, True, True)

----

The call below captures the full path name of the landing area directory by extracting it from the catalogued asset for this directory.
This will be used when configuring a later part of the onboarding process.

----

In [None]:

landingAreaDirectoryName = assetOwnerFindAssetPathName(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*" + landingAreaDirectory)

print(landingAreaDirectoryName)


----

With the full path name, it is possible to list the files in the file system.

----

In [None]:
oakDeneLandingAreaFullPath = "/" + landingAreaDirectoryName + "/oak-dene/clinical-trials/drop-foot"

# When running in a kubernetes environment, the os.listdir command will fail 
# because the topology of the system is different. Consequently the default is to comment out this command.
# If your topology allows (e.g. you are running Egeria locally to your Jupyter server), feel free to uncomment this 
# so that you can watch the files change.
#
# os.listdir (oakDeneLandingAreaFullPath)

----
The action of the integration connector has been to catalog the new file and the folders above it.
The catalog definition is minimal, consisting of just what can be gleaned from the file system plus the setting of the default zone for assets catalogued by Data Manager OMAS on cocoMDS1.

It is possible to provide the integration connector with a template metadata element to copy when it
is cataloguing files.   This template can include classifications such as zones, origin, confidentiality and attachments such as
connections, schemas and lineage mappings. (There is more information about templates on the
[Egeria website](https://egeria-project.org/features/templated-cataloguing/overview/)).

Templates are used to set up values for the assets created by the connector that are always the same.
In this example, all of the files that this connector encounters are from the Oak Dene Hospital so we can use a template to set up the
file's origin and the connection information needed to access the file.

Including this type of detail in the asset means that no-one has to remember that this landing area folder was used by
the Oak Dene Hospital and it simplifies the downstream cataloguing.
Other values that are useful to set up in a template are any licenses for the file, schema information, zones, known lineage to this
directory, plus other classifications.  More detail about the types of information that can be attached to an
asset is available on the [Egeria website](https://egeria-project.org/patterns/metadata-manager/overview/).
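
The idea behind a template can be sketched with plain dictionaries: the fixed values are copied from the template and only the file-specific values are filled in.  This is an illustration, not the Egeria templating implementation; `create_asset_from_template()` and the property names used here are hypothetical.

```python
import copy


def create_asset_from_template(template, path_name):
    """Copy the template's fixed values and add the values specific to one file."""
    asset = copy.deepcopy(template)   # deep copy so the template itself is never changed
    asset["pathName"] = path_name
    asset["displayName"] = path_name.split("/")[-1]
    return asset


oak_dene_template = {
    "description": "Measurement file from Oak Dene Hospital.",
    "origin": {
        "organization": "Oak Dene Hospital",
        "businessCapability": "Drop Foot Research Centre",
        "contact": "Robbie Records",
    },
    "zones": ["quarantine"],
}

week2_asset = create_asset_from_template(
    oak_dene_template,
    "landing-area/hospitals/oak-dene/clinical-trials/drop-foot/week2.csv",
)

print(week2_asset["displayName"], week2_asset["origin"]["organization"], week2_asset["zones"])
```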



## Working with templates

If you know the class name of an integration connector's provider, it is possible to check if the connector is of the right type for an integration service.  This function also returns full details of the connector type, which often includes descriptive information as well as the configuration properties that it supports.

The connectors configured in the Files Integrator OMIS are shown in Figure 8:

![Figure 8](../images/integration-daemon.png)
> **Figure 8:** exchangeDL01 with its connectors

The class names of these integration connectors' providers can be seen in the connection object embedded in the error message displayed with the connectors' status.

The commands below request that the Files Integrator OMIS service validate and return the connector type for each of these connectors.

In [None]:
dataFilesMonitorProviderClassName = "org.odpi.openmetadata.adapters.connectors.integration.basicfiles.DataFilesMonitorIntegrationProvider"
dataFolderMonitorProviderClassName = "org.odpi.openmetadata.adapters.connectors.integration.basicfiles.DataFolderMonitorIntegrationProvider"

print("")
print("Data Files Monitor Integration Connector Type:")
validateIntegrationConnector(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, "files-integrator", petersUserId, dataFilesMonitorProviderClassName)
print("")


----

In [None]:

print("")
print("Data Folder Monitor Integration Connector Type:")
validateIntegrationConnector(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, "files-integrator", petersUserId, dataFolderMonitorProviderClassName)
print("")


----
Both connectors support the `templateQualifiedName` and the `allowCatalogDelete` configuration properties.  If you are curious
about their meaning, review the definitions in the connector catalog:

* [Data Files Monitor Integration Connector](https://egeria-project.org/connectors/integration/data-files-monitor-integration-connector/)
* [Data Folder Monitor Integration Connector](https://egeria-project.org/connectors/integration/data-folder-monitor-integration-connector/)

Peter is going to set the `templateQualifiedName` to the qualified name of an asset that has the origin set up for the appropriate originating hospital and membership of the default governance zone of "quarantine".
This zone means that the asset is still being set up and it is not visible to the data lake users.
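
The effect of a governance zone on visibility can be pictured as a filter: a user sees an asset only if the asset belongs to at least one of the zones that the user's service supports.  The sketch below is a simplified model; `visible_assets()`, the `data-lake` and `clinical-trials` zone names and the `supplier-list.csv` asset are assumptions made for the illustration.

```python
def visible_assets(assets, supported_zones):
    """Return the names of assets that share at least one zone with supported_zones."""
    return [
        asset["name"]
        for asset in assets
        if set(asset["zones"]) & set(supported_zones)
    ]


assets = [
    {"name": "week1.csv", "zones": ["quarantine"]},
    {"name": "supplier-list.csv", "zones": ["data-lake"]},   # invented asset
]

data_lake_user_zones = ["data-lake", "clinical-trials"]      # assumed zone names

visible_while_quarantined = visible_assets(assets, data_lake_user_zones)

# Once cataloguing is complete, the asset is moved out of quarantine.
assets[0]["zones"] = ["clinical-trials"]
visible_after_onboarding = visible_assets(assets, data_lake_user_zones)

print(visible_while_quarantined, visible_after_onboarding)
```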

----

The commands below create the template assets for each of the hospitals. In addition, Peter creates a third template for the asset created to
represent the file when it is stored in the data lake folder.  It sets up Tanya Tidie as the owner of the files and marks the transition
of ownership from the data lake operations team to the clinical trials team.  It also has the zone membership set to the default
governance zone of "quarantine".

This third template will be incorporated into the flow when we configure the governance action that moves the file from the landing area to the data lake.

----

In [None]:
newAssetZone    = "quarantine"

## Set up landing area templates

t1AssetName     = "Oak Dene Template"
t1PathName      = "template:clinical-trials:drop-foot:weekly-measurements:oak-dene.csv"
t1DisplayName   = "Drop Foot Clinical Trial Measurements Template Asset: Oak Dene"
t1Description   = "Measurement file from Oak Dene Hospital."
t1Contact       = "Robbie Records"
t1Dept          = "Drop Foot Research Centre"
t1Org           = "Oak Dene Hospital"

t2AssetName     = "Old Market Template"
t2PathName      = "template:clinical-trials:drop-foot:weekly-measurements:old-market.csv"
t2DisplayName   = "Drop Foot Clinical Trial Measurements Template Asset: Old Market"
t2Description   = "Measurement file from Old Market Hospital."
t2Contact       = "TBD"
t2Dept          = "DFRG1F6"
t2Org           = "Old Market Hospital"


template1guids = assetOwnerCreateCSVAsset(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, t1DisplayName, t1Description, t1PathName)
template1guid = getLastGUID(template1guids)
addOrigin(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, t1AssetName, template1guid, t1Contact, t1Dept, t1Org)
addZones(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, petersUserId, t1AssetName, template1guid, [newAssetZone])

template2guids = assetOwnerCreateCSVAsset(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, t2DisplayName, t2Description, t2PathName)
template2guid = getLastGUID(template2guids)
addOrigin(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, t2AssetName, template2guid, t2Contact, t2Dept, t2Org)
addZones(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, petersUserId, t2AssetName, template2guid, [newAssetZone])

# Set up the template for new files in the data lake

t3AssetName     = "Data Lake Measurements Template"
t3PathName      = "template:clinical-trials:drop-foot:weekly-measurements:data-lake.csv"
t3QualifiedName = "CSVFile:" + t3PathName
t3DisplayName   = "Drop Foot Clinical Trial Measurements Template Asset: Data Lake"
t3Description   = "Weekly measurements file from a single hospital."
t3Owner         = "tanyatidie"
t3OwnerType     = "USER_ID"

template3guids = assetOwnerCreateCSVAsset(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, t3DisplayName, t3Description, t3PathName)
template3guid = getLastGUID(template3guids)
addOwner(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, petersUserId, t3AssetName, template3guid, t3Owner, t3OwnerType)
addZones(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, petersUserId, t3AssetName, template3guid, [newAssetZone])

print("\n\nNew Template Assets:")
assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*Template.*")

----

With the templates defined, Peter reconfigures the integration connectors that are monitoring for new files in the landing area to use the templates ...

---

In [None]:

t1ConnectorName = "OakDeneLandingAreaFilesMonitor"
t1ConfigurationProperties = {
    "templateQualifiedName" : t1PathName
}

t2ConnectorName = "OldMarketLandingAreaFilesMonitor"
t2ConfigurationProperties = {
    "templateQualifiedName" : t2PathName
}

updateConnectorConfigurationProperties(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, "files-integrator", petersUserId, t1ConnectorName, t1ConfigurationProperties)
updateConnectorConfigurationProperties(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, "files-integrator", petersUserId, t2ConnectorName, t2ConfigurationProperties)


----

Peter verifies that the configuration properties have been updated, using these next commands to retrieve the configuration properties for each integration connector.

----

In [None]:

print("Configuration properties update for connector: " + t1ConnectorName)
getIntegrationConnectorConfigProperties(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, "files-integrator", petersUserId, t1ConnectorName)

print("Configuration properties update for connector: " + t2ConnectorName)
getIntegrationConnectorConfigProperties(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, "files-integrator", petersUserId, t2ConnectorName)


----

To test the new templates, Peter adds the file for week 2 into the Oak Dene Hospital's landing area directory.

----

In [None]:

addFileToLandingArea(OakDeneSourceFolder, OakDeneLandingDirectory, "2", "FTP Oak Dene Week 2")
 

----

The newly catalogued file, `week2.csv`, has the origin set up.  This shows that the template is working.

Compare the `week1.csv` with the `week2.csv`...

----

In [None]:

assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*clinical-trials/drop-foot/week.*")


----
Files arriving in the landing folders are now catalogued with their origin set up, so phase 2, shown in Figure 9, is complete.


![Phase 2](../images/automated-curation-scenario-2.png)
> **Figure 9:** Phase 2: Creating assets for newly arrived files with a template

----

The integration daemon now supports the cataloguing of new files received from either hospital.  The files are represented by `CSVFile` assets.  The template used when creating these assets ensures the origin is set up with the details of the particular hospital that sent the file. 
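
The idea behind template-based cataloguing can be sketched in a few lines of Python. This is only an illustration, not the integration connector's implementation: the `catalog_from_template` function, the dictionary layout and the template values are all hypothetical.

```python
import copy

def catalog_from_template(template, path_name):
    """Create a new asset description by copying a template and
    replacing the values that are unique to the new file."""
    asset = copy.deepcopy(template)      # never modify the shared template
    asset["pathName"] = path_name
    asset["displayName"] = path_name.rsplit("/", 1)[-1]
    return asset

# Hypothetical template for files arriving from one hospital.
oak_dene_template = {
    "typeName": "CSVFile",
    "pathName": "template:oak-dene",
    "displayName": "Oak Dene template",
    "origin": {"organization": "Oak Dene Hospital"},
}

new_asset = catalog_from_template(
    oak_dene_template,
    "landing-area/hospitals/oak-dene/clinical-trials/drop-foot/week2.csv")

print(new_asset["displayName"])
print(new_asset["origin"])
```

Because the origin is part of the template, every asset created from it carries the same hospital details without any manual step.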

One of the requirements of the onboarding process was that files remained in the landing area for as short a time as possible. Peter and Erin need to define an automated process that will immediately bring the files from the landing area into the data lake and catalog them with the appropriate classification and lineage.

## Creating a governance action process for new assets

Up to this point in the lab, we have been calling governance actions directly.
A *Governance Action Process* defines a flow of calls to the governance engines.
Each call results in the creation of a *Governance Action* that runs as you have seen above.

A governance action process is represented by a metadata element of type [GovernanceActionProcess](https://egeria-project.org/types/4/0462-Governance-Action-Types).
This element gives the process a unique name and the anchor point to connect it into the governance action definitions.

Erin defines the governance action process first.

----

In [None]:
governanceActionProcessName = "governance-action-process:clinical-trials:drop-foot:weekly-measurements:onboarding"

qualifiedName = governanceActionProcessName
displayName   = "Drop Foot Onboard Weekly Measurement Files"
description   = "Ensures that new weekly drop foot measurement files from the hospitals are correctly catalogued in the data lake."
technicalName = "DFOBWKLY01"
technicalDescription = """ This process performs the following functions:
 1) The physical file is moved to the data lake and renamed 
 2) A new asset is created for the new file 
 3) Lineage is created between the original file asset and the new file asset
 4) The owner and origin are assigned
 5) The governance zones are assigned to make the new asset visible to the research team."""

governanceActionProcessGUID = createGovernanceActionProcess(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, qualifiedName, displayName, description, technicalName, technicalDescription)


----

She then adds the definitions of which type of governance actions should run when the process is initiated.
These definitions are represented by
[GovernanceActionType](https://egeria-project.org/types/4/0462-Governance-Action-Types) metadata elements.

They are linked together to define which governance action is started depending on the guards produced by the previous governance action.

![Governance Action Process Flow](../images/governance-action-process-flow.png)
> **Figure 10:** Definition of a governance action process flow using a chain of governance action types
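
One way to picture this chaining is as a lookup from a completed governance action type and one of its guards to the governance action type that should run next. The sketch below is an assumption-laden illustration: the action type and guard names are the ones used later in this lab, but the dictionary and the `next_action_types` function are hypothetical stand-ins for the relationships that Egeria stores in the metadata repository.

```python
# Each entry maps (previous action type, guard) -> next action type.
FIRST_ACTION_TYPE = "provision-weekly-measurements-governance-action-type"

NEXT_ACTION_TYPES = {
    ("provision-weekly-measurements-governance-action-type", "provisioning-complete"):
        "origin-seeker-measurements-governance-action-type",
    ("origin-seeker-measurements-governance-action-type", "origin-assigned"):
        "zone-publisher-measurements-governance-action-type",
}

def next_action_types(completed_action_type, guards):
    """Return the action types to start after `completed_action_type`
    finishes with the supplied guards."""
    return [NEXT_ACTION_TYPES[(completed_action_type, guard)]
            for guard in guards
            if (completed_action_type, guard) in NEXT_ACTION_TYPES]

print(next_action_types(FIRST_ACTION_TYPE, ["provisioning-complete"]))
print(next_action_types(FIRST_ACTION_TYPE, ["provisioning-failed"]))
```

A guard with no matching entry, such as `provisioning-failed` here, starts nothing further, so the flow stops at that point.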

----

Each GovernanceActionType also links to a Governance Engine.  The governance request type is set up in the relationship.

![Governance Action Process Implementation](../images/governance-action-process-implementation.png)
> **Figure 11:** Linkage of the governance action process to the Asset Governance governance engine

----

The first step in the process is to move the file from the landing area to the data lake folder.

For this Erin is using a different call to the `Move/Copy File` Governance Action Service.
This instance will be called using the `move-file` request type which moves the file, produces lineage and changes the resulting filename so that the files are sequenced according to their arrival.  For example:
 * DropFoot_000001.csv
 * DropFoot_000002.csv

This aids the time-based loading of the files into a database by ensuring any corrections to the readings are applied in the
correct order if a hospital sends a correction file at a later date.
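
The effect of the sequenced file names can be reproduced with a small Python sketch. The `sequenced_file_name` function is hypothetical and is not how the governance action service generates names; it only shows why six-digit zero padding keeps the files in arrival order when they are sorted by name.

```python
def sequenced_file_name(index, prefix="DropFoot_", extension=".csv", width=6):
    """Build a zero-padded, sequenced file name."""
    return f"{prefix}{index:0{width}d}{extension}"

names = [sequenced_file_name(index) for index in (1, 2, 10)]
print(names)

# Zero padding keeps alphabetical order identical to arrival order.
print(sorted(names) == names)
```

Without the padding, `DropFoot_10.csv` would sort before `DropFoot_2.csv` and a later correction could be loaded too early.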

Erin defines the call to this governance action service via the following governance action type.
Notice that the template created for cataloguing the files as they move into the data lake directory is set up in the `destinationFileTemplateQualifiedName` requestParameter as required for phase 5.

![Phase 5](../images/automated-curation-scenario-5.png)
> **Figure 12:** Phase 5: Set up the template for newly arrived files in the data lake

----

In [None]:

assetGovernanceEngineGUID = getGovernanceEngineGUID(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, assetGovernanceEngineName)


In [None]:

prGovernanceServiceRequestType = "move-file"

prGovernanceServiceType = "provision-weekly-measurements-governance-action-type"
prSupportedGuards = ["provisioning-complete", "provisioning-failed"]
prGovernanceServiceRequestParameters = {
        "destinationFileTemplateQualifiedName" : t3QualifiedName,
        "targetFileNamePattern" : "DropFoot_{1, number,000000}.csv",
        "destinationFolder" : dataLakeDirectory
    }

prActionTypeGUID = None
prActionTypeGUID = createGovernanceActionType(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, assetGovernanceEngineGUID, prGovernanceServiceType, prSupportedGuards, prGovernanceServiceRequestType, prGovernanceServiceRequestParameters) 

if prActionTypeGUID:
    setupFirstActionType(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, governanceActionProcessGUID, prActionTypeGUID, False)


----

Now that the files are in the correct folder with the appropriate file name and a new catalog entry has been created, it is time to
add the last touches to the catalog entry before it is published to the data lake users.
This is phase 6 of the onboarding process.

![Phase 6](../images/automated-curation-scenario-6.png)
> **Figure 12:** Phase 6: Extend the governance action process to enrich the asset

The template used to create the catalog was able to set up the owner of the file because it is the same for each file that is moved into this directory.
However, the origin is variable, depending on which hospital sent the original file.
To set the origin, we are going to use the
[Origin Seeker Remediation Governance Action Service](https://egeria-project.org/connectors/governance-action/origin-seeker-remediation-governance-action-service/)
that is able to navigate back through the lineage relationship created by the `Move/Copy File` governance action service to pick up the origin from the landing area asset.
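
The navigation back through lineage can be sketched as a simple graph walk. This is a simplified illustration under stated assumptions: the `seek_origin` function and the two dictionaries are hypothetical, the asset names are made up, and the mapping of outcomes to the guard names used in this lab is a guess at the intent rather than a description of the service's code.

```python
def seek_origin(asset_name, lineage_sources, origins):
    """Walk back through lineage and return (guard, origin).

    lineage_sources: maps an asset to the assets it was derived from.
    origins: maps an asset to its origin, when one is set.
    """
    if asset_name in origins:
        return "origin-already-assigned", origins[asset_name]

    found = set()
    visited = set()
    to_visit = list(lineage_sources.get(asset_name, []))
    while to_visit:
        current = to_visit.pop()
        if current in visited:
            continue
        visited.add(current)
        if current in origins:
            found.add(origins[current])   # stop at the first origin on this path
        else:
            to_visit.extend(lineage_sources.get(current, []))

    if not found:
        return "no-origins-detected", None
    if len(found) > 1:
        return "multiple-origins-detected", None
    return "origin-assigned", found.pop()

lineage_sources = {"data-lake/DropFoot_000001.csv": ["landing-area/oak-dene/week3.csv"]}
origins = {"landing-area/oak-dene/week3.csv": "Oak Dene Hospital"}

print(seek_origin("data-lake/DropFoot_000001.csv", lineage_sources, origins))
```

Only the `origin-assigned` outcome lets the process continue to the next step; the other guards indicate that someone needs to investigate.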

----

In [None]:

osGovernanceServiceRequestType = "seek-origin"

osPreviousGuard = "provisioning-complete"
osGovernanceServiceType = "origin-seeker-measurements-governance-action-type"
osSupportedGuards = [ "origin-assigned", "origin-already-assigned", "multiple-origins-detected", "no-origins-detected", "no-targets-detected", "multiple-targets-detected", "origin-seeking-failed"]
osGovernanceServiceRequestParameters = {
        "destinationFileTemplateQualifiedName" : t3QualifiedName
    }

osActionTypeGUID = createGovernanceActionType(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, assetGovernanceEngineGUID, osGovernanceServiceType, osSupportedGuards, osGovernanceServiceRequestType, osGovernanceServiceRequestParameters) 

setupNextActionType(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, prActionTypeGUID, osActionTypeGUID, osPreviousGuard, True, True)

----

Once the origin is set, the asset is completely defined and it is ready to be consumed by the data lake users.
The last governance action service called `ZonePublisher` sets the zone membership for the asset so that it becomes visible to the data lake users.
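
To make the link between zones and visibility concrete, here is a minimal sketch. It assumes that the `publishZones` request parameter is a comma-separated list and that an asset is visible to a user when it belongs to at least one zone supported by the server they are calling. Both helper functions and the `quarantine` example are hypothetical.

```python
def parse_publish_zones(publish_zones):
    """Split a comma-separated list of zone names, ignoring stray spaces."""
    return [zone.strip() for zone in publish_zones.split(",") if zone.strip()]

def is_visible(asset_zones, supported_zones):
    """An asset is visible when it shares at least one zone with the server."""
    return bool(set(asset_zones) & set(supported_zones))

zones = parse_publish_zones("data-lake,clinical-trials")
print(zones)

print(is_visible(zones, ["data-lake"]))
print(is_visible(["quarantine"], ["data-lake"]))
```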

----

In [None]:

zpGovernanceServiceRequestType = "set-zone-membership"

zpPreviousGuard = "origin-assigned"
zpGovernanceServiceType = "zone-publisher-measurements-governance-action-type"
zpSupportedGuards = [ "zone-assigned","no-zones-detected","no-targets-detected", "zone-publishing-failed"]
zpGovernanceServiceRequestParameters = {
    "publishZones" : "data-lake,clinical-trials"
    }

zpActionTypeGUID = createGovernanceActionType(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, assetGovernanceEngineGUID, zpGovernanceServiceType, zpSupportedGuards, zpGovernanceServiceRequestType, zpGovernanceServiceRequestParameters) 

setupNextActionType(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, osActionTypeGUID, zpActionTypeGUID, zpPreviousGuard, True, True)


----

Erin now has the process that will complete the onboarding of the files into the data lake.

The next step is to set up a listener for the events broadcast when a new asset is created by the integration connectors to represent a new file appearing in the landing area directory.
When the listener receives such an event,
it should initiate a new instance of this governance action process to move the file into the data lake, 
catalog the file in its new location as a new asset,
link it with lineage to the original asset and add the appropriate classifications to the new data lake asset.

The movement of the file out of the landing area causes the appropriate landing area integration connector to archive the associated asset.  This means that the [*Memento*](https://egeria-project.org/concepts/memento/) classification is added to it and it becomes invisible to all but lineage requests.
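
The effect of the *Memento* classification on queries can be pictured with the following sketch. It is an illustration only: the `retrieve_assets` function and the in-memory `catalog` list are hypothetical, and the real filtering happens inside Egeria's metadata access services.

```python
def retrieve_assets(catalog, for_lineage=False):
    """Return assets, hiding archived (Memento) ones unless this is a lineage query."""
    return [asset for asset in catalog
            if for_lineage or "Memento" not in asset["classifications"]]

catalog = [
    {"name": "landing area week3.csv", "classifications": ["Memento"]},
    {"name": "data lake DropFoot_000001.csv", "classifications": []},
]

print([asset["name"] for asset in retrieve_assets(catalog)])
print([asset["name"] for asset in retrieve_assets(catalog, for_lineage=True)])
```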

## Listening for new assets

The watchdog governance action services are responsible for listening for events that indicate specific types of activity and then acting on it.
They can initiate
a [governance action](https://egeria-project.org/concepts/governance-action),
a [governance action process](https://egeria-project.org/concepts/governance-action-process) or
an [incident report](https://egeria-project.org/concepts/incident-report).

Erin sets up a watchdog governance action service called `GenericFolderWatchdog` to listen for assets that are catalogued and linked to the landing area directory.  When assets appear in the catalog (created by the integration connectors monitoring the landing area), the watchdog governance action service will run the `governance-action-process:clinical-trials:drop-foot:weekly-measurements:onboarding` governance action process just defined above.
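
The decision the watchdog makes for each event can be sketched as a small filter. The request parameter names below are the ones Erin passes in the next cell, but everything else is an assumption made for illustration: the `process_to_start` function, the shape of the event, the `NEW_ELEMENT` event type and the folder value are hypothetical.

```python
def process_to_start(event, request_parameters):
    """Return the name of the process to start for this event, or None to ignore it."""
    folder = request_parameters["folderName"].rstrip("/") + "/"
    if event["eventType"] != "NEW_ELEMENT":
        return None
    if request_parameters["interestingTypeName"] not in event["typeNames"]:
        return None
    if not event["pathName"].startswith(folder):
        return None
    return request_parameters["newElementProcessName"]

request_parameters = {
    "interestingTypeName": "DataFile",
    "folderName": "landing-area/hospitals",      # illustrative value
    "actionTargetName": "sourceFile",
    "newElementProcessName":
        "governance-action-process:clinical-trials:drop-foot:weekly-measurements:onboarding",
}

new_file_event = {
    "eventType": "NEW_ELEMENT",
    "typeNames": ["CSVFile", "DataFile", "Asset"],   # the type plus its supertypes
    "pathName": "landing-area/hospitals/oak-dene/clinical-trials/drop-foot/week3.csv",
}

print(process_to_start(new_file_event, request_parameters))
```

Events for other element types, or for files outside the watched folder, return `None` and are ignored.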

----

In [None]:
watchdogGovernanceServiceRequestType = "watch-for-new-files"
watchdogGovernanceServiceProcessToCall = governanceActionProcessName


requestParameters = {
        "interestingTypeName" : "DataFile",
        "folderName" : landingAreaDirectoryName,
        "actionTargetName" : "sourceFile",
        "newElementProcessName" : watchdogGovernanceServiceProcessToCall
    }
qualifiedName = "Listener: " + landingAreaDirectory

governanceActionGUID = None
governanceActionGUID = initiateGovernanceAction(cocoMDS2Name,
                                                cocoMDS2PlatformName,
                                                cocoMDS2PlatformURL,
                                                erinsUserId,
                                                assetGovernanceEngineName,
                                                qualifiedName,
                                                watchdogGovernanceServiceRequestType,
                                                requestParameters,
                                                "Landing Area Monitoring")

print("Sleeping to ensure watchdog service is running ...")
time.sleep(2)

print("Done. ")

----

The watchdog service is listening for events.  Here is its governance action.  You can see that its *actionStatus* is `IN_PROGRESS`.

----

In [None]:
printGovernanceAction(cocoMDS2Name,cocoMDS2PlatformName, cocoMDS2PlatformURL, erinsUserId, governanceActionGUID)

----

With the watchdog in place, any new files added to the metadata repository will trigger a governance action process called `governance-action-process:clinical-trials:drop-foot:weekly-measurements:onboarding`.

![Phase 3](../images/automated-curation-scenario-3.png)
> **Figure 13:** Phase 3: Triggering provisioning of the file into the data lake

----

Peter provisions the week 3 file into the Oak Dene Hospital's landing area folder
to see if the process is triggered and the file is moved to the data lake folder.

----

In [None]:

addFileToLandingArea(OakDeneSourceFolder, OakDeneLandingDirectory, "3", "FTP Oak Dene Week 3")


----

Peter monitors the status of the governance actions.  

*Retry running the* monitorGovernanceActions *command until you can see all of the governance actions with a status of `ACTIONED` except the watchdog governance action that stays `IN_PROGRESS`.*
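
If you prefer not to rerun the cell by hand, the same check can be automated with a polling loop. The sketch below is not part of the lab's helper functions: `wait_for_completion` is hypothetical, and it expects a `fetch_statuses` callable (which you would need to write) that returns a dictionary mapping each governance action's name to its status.

```python
import time

def wait_for_completion(fetch_statuses, watchdog_name, attempts=10, delay_seconds=3.0):
    """Poll until every governance action except the watchdog is ACTIONED."""
    for _ in range(attempts):
        statuses = fetch_statuses()
        others = {name: status for name, status in statuses.items()
                  if name != watchdog_name}
        if others and all(status == "ACTIONED" for status in others.values()):
            return True
        time.sleep(delay_seconds)
    return False

# Simulated status snapshots standing in for calls to the server.
snapshots = iter([
    {"Listener": "IN_PROGRESS", "move-file": "IN_PROGRESS"},
    {"Listener": "IN_PROGRESS", "move-file": "ACTIONED"},
])

print(wait_for_completion(lambda: next(snapshots), "Listener", delay_seconds=0))
```

Capping the number of attempts prevents the loop from running forever if a governance action fails.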

----

In [None]:

monitorGovernanceActions(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, petersUserId)
time.sleep(3)

----

The file is created in the data lake folder with all of the correct attributes.  *(If no assets are returned, repeatedly run the cell again until an asset appears.)*

----

In [None]:

assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*clinical-trials/drop-foot/weekly-measurements/.*")


----

The week3.csv file in the landing area is gone.

----

In [None]:

assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*landing-area/hospitals/oak-dene/clinical-trials/drop-foot/.*")


----

Restarting the data lake directory's integration connector in the integration daemon will enable it to maintain the last update date for the folder.

----

In [None]:
restartIntegrationConnector(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, petersUserId, "files-integrator", DataLakeDirectoryConnectorName)

----

We can check that the connector is running ...

----

In [None]:
getIntegrationDaemonStatus(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, petersUserId)

----

## Running the complete onboarding pipeline

The finale of this lab is to show the complete pipeline running.  To do this Peter will use files from both Oak Dene Hospital and the Old Market Hospital. 

----

The code below uses the `Move/Copy File` governance action service to complete the first five weeks of files from each hospital in their landing areas.

----

In [None]:

# Add the remaining files from Oak Dene Hospital

addFileToLandingArea(OakDeneSourceFolder,   OakDeneLandingDirectory,   "4", "FTP Oak Dene Hospital Week 4")
addFileToLandingArea(OakDeneSourceFolder,   OakDeneLandingDirectory,   "5", "FTP Oak Dene Hospital Week 5")

# Add all of the files from Old Market Hospital

addFileToLandingArea(OldMarketSourceFolder, OldMarketLandingDirectory, "1", "FTP Old Market Hospital Week 1")
addFileToLandingArea(OldMarketSourceFolder, OldMarketLandingDirectory, "2", "FTP Old Market Hospital Week 2")
addFileToLandingArea(OldMarketSourceFolder, OldMarketLandingDirectory, "3", "FTP Old Market Hospital Week 3")
addFileToLandingArea(OldMarketSourceFolder, OldMarketLandingDirectory, "4", "FTP Old Market Hospital Week 4")
addFileToLandingArea(OldMarketSourceFolder, OldMarketLandingDirectory, "5", "FTP Old Market Hospital Week 5")
print("Done. ")


----

The files added to Oak Dene Hospital's landing folder are immediately catalogued as new assets by its integration connector.  The newly created assets are picked up by the *watchdog governance service* and it kicks off the *governance action process* which moves them into the data lake folder and sets up the origin and governance zone classifications on the asset so they are ready for use.

Peter checks there are no new files catalogued for the landing area.

----

In [None]:

assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*landing-area/hospitals/oak-dene/clinical-trials/drop-foot/.*")


----

They have been populated into the data lake with the correct classifications.

----

In [None]:

assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*clinical-trials/drop-foot/weekly-measurements/.*")


----

This is, however, the first time files from Old Market Hospital have been added to its landing folder.  Its integration connector is still in `FAILED` status because when it started up, the landing area folder for Old Market Hospital had not been created.  The command below shows the status of the integration connector.
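
The reason a restart is needed can be shown with a toy example. This is only a sketch of the general behaviour described above, assuming the folder is checked once at start-up: the `FolderMonitor` class and the `INITIALIZED` and `RUNNING` status names are hypothetical and do not come from the integration connector's code.

```python
import os
import tempfile

class FolderMonitor:
    """Toy stand-in for an integration connector that watches one folder."""

    def __init__(self, folder):
        self.folder = folder
        self.status = "INITIALIZED"

    def start(self):
        # The folder is only checked at start-up, so a folder that is
        # created later is not noticed until the monitor is restarted.
        self.status = "RUNNING" if os.path.isdir(self.folder) else "FAILED"
        return self.status

with tempfile.TemporaryDirectory() as root:
    landing_folder = os.path.join(root, "old-market")
    monitor = FolderMonitor(landing_folder)
    print(monitor.start())       # the folder does not exist yet
    os.makedirs(landing_folder)  # the first files arrive and the folder is created
    print(monitor.status)        # still the status recorded at start-up
    print(monitor.start())       # a restart picks up the new folder
```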

----

In [None]:

getIntegrationDaemonStatus(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, petersUserId)


----

Because the integration connector is in `FAILED` status, it is not cataloguing the new files.

----

In [None]:

assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*landing-area/hospitals/old-market/clinical-trials/drop-foot/.*")


----

The code below restarts the Old Market Hospital integration connector in `exchangeDL01`.  It will begin cataloguing the files in that landing area.  The new assets will be picked up by the watchdog governance action service, which will kick off the governance action process to move the files to the data lake folder.

----

In [None]:

restartIntegrationConnector(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, petersUserId, "files-integrator", OldMarketLandingAreaConnectorName)


----

When we check the status of the integration daemon, all connectors are running.

----

In [None]:

getIntegrationDaemonStatus(exchangeDL01Name, exchangeDL01PlatformName, exchangeDL01PlatformURL, petersUserId)


----

The files are now being catalogued and moved into the data lake folder.  You can see the status of the governance actions as they run to completion by repeatedly running the monitorGovernanceActions command.  You may see the governance actions in different states.  The picture below shows the states a governance action moves through. 

<img src="https://egeria-project.org/patterns/metadata-governance/governance-action-status.svg">


When all of the files have been onboarded, all of the governance actions are in `ACTIONED` status except the watchdog governance action, which is `IN_PROGRESS`, listening for new files.  

----

In [None]:

monitorGovernanceActions(cocoMDS2Name, cocoMDS2PlatformName, cocoMDS2PlatformURL, petersUserId)
time.sleep(3)

----

Peter checks that all of the files have been created and catalogued in the data lake folder ...

----

In [None]:
assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*DropFoot.*")

----

Then Peter checks that the landing area folder for Oak Dene Hospital still only contains the two original test files ...

----

In [None]:
oakDeneLandingAreaFullPath = landingAreaDirectoryName + "/oak-dene/clinical-trials/drop-foot"
#
#
# When running in a kubernetes environment, the os.listdir command will fail 
# because the topology of the system is different. Consequently the default is to comment out this command.
# If your topology allows (e.g. you are running Egeria locally to your Jupyter server), feel free to uncomment this 
# so that you can watch the files change.
#
# os.listdir (oakDeneLandingAreaFullPath)


----

A similar check that the Old Market Hospital landing area is empty ...

----

In [None]:
oldMarketLandingAreaFullPath = landingAreaDirectoryName + "/old-market/clinical-trials/drop-foot"
#
#
# When running in a kubernetes environment, the os.listdir command will fail 
# because the topology of the system is different. Consequently the default is to comment out this command.
# If your topology allows (e.g. you are running Egeria locally to your Jupyter server), feel free to uncomment this 
# so that you can watch the files change.
#
# os.listdir (oldMarketLandingAreaFullPath)

----

There should be no catalog entries retrieved for any of the new files that were in the landing area.

----

In [None]:
assetOwnerPrintAssets(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, petersUserId, ".*landing-area/hospitals/.*csv")

----

However, they are still in the catalog, just archived so they are available for lineage.  This next function retrieves the archived file asset.  Notice the `forLineage=true` parameter on the URL.  This tells Egeria that the query is a lineage query and archived assets should be returned.

----

In [None]:
def getArchivedFileByPathName(serverName, serverPlatformName, serverPlatformURL, serviceURLMarker, userId, assetPathName):
    metadataStoreURL = serverPlatformURL + '/servers/' + serverName + '/open-metadata/framework-services/' + serviceURLMarker + '/open-metadata-store/users/' + userId
    getAsset = metadataStoreURL + '/metadata-elements/by-unique-name?forLineage=true&forDuplicateProcessing=false&effectiveTime=0'
    getAssetBody = {
        "class" : "NameRequestBody",
        "name" : assetPathName,
        "nameParameterName" : "pathName",
        "namePropertyName" : "pathName"
    }
    response=issuePost(getAsset, getAssetBody)
    asset = response.json().get('element')
    if asset:
        return asset
    else:
        print ("No Asset returned")
        processErrorResponse(serverName, serverPlatformName, serverPlatformURL, response) 

----

Peter retrieves the asset for `week3.csv`.   Notice that there is a *Memento* classification attached.  This is how archived files are identified. 

----

In [None]:
asset = getArchivedFileByPathName(cocoMDS1Name, cocoMDS1PlatformName, cocoMDS1PlatformURL, "asset-owner", petersUserId, oakDeneLandingAreaFullPath + "/week3.csv")

# getArchivedFileByPathName returns nothing if the asset is not found, so check before printing.
if asset:
    print("Element: " + asset.get('elementGUID'))
    elementProperties = asset.get('elementProperties')
    if elementProperties:
        print("Properties:")
        properties = elementProperties.get('propertyValueMap') or {}
        for propertyName, propertyStructure in properties.items():
            # Convert to a string in case the property value is not a string.
            print("    " + propertyName + ": " + str(propertyStructure.get('primitiveValue')))

    classifications = asset.get('classifications')
    if classifications:
        print("Classifications:")
        for classification in classifications:
            print("    " + classification.get('classificationName'))




----

<img src="https://raw.githubusercontent.com/odpi/egeria-docs/main/site/docs/practices/coco-pharmaceuticals/personas/callie-quartile.png" style="float:right">

Finally we check that Callie Quartile, a data scientist working in the research team, is able to see the files from `cocoMDS3`:



In [None]:

assetConsumerPrintAssets(cocoMDS3Name, cocoMDS3PlatformName, cocoMDS3PlatformURL, calliesUserId, ".*clinical-trials/drop-foot/weekly-measurements.*")


----
## Where to next

* [Improving Data Quality Lab](improving-data-quality-lab.ipynb) - follow Peter as he makes use of automated metadata discovery to detect quality errors in the
  measurement files from the hospitals.
* [Understanding an Asset](understanding-an-asset.ipynb) - work with Callie to retrieve and understand various assets.

----