![Egeria Logo](https://raw.githubusercontent.com/odpi/egeria/master/assets/img/ODPi_Egeria_Logo_color.png)

### ODPi Egeria Hands-On Lab
# Welcome to the Building a Data Catalog Lab

## Introduction

ODPi Egeria is an open source project that provides open standards and implementation libraries to connect tools, catalogues and platforms together so they can share information about data and technology (called metadata).

In this hands-on lab you will get a chance to work with three ODPi Egeria metadata servers to build a distributed catalog of data assets and then experiment with attaching feedback (comments) to the catalog entries from different servers.

## The Scenario

The ODPi Egeria team uses personas from a fictitious company called Coco Pharmaceuticals (see https://opengovernance.odpi.org/coco-pharmaceuticals/ for more information).

The two main characters in this scenario are Peter Profile and Erin Overview.

![Peter and Erin](../images/peter-and-erin.png)

In [None]:
petersUserId = "peterprofile"
erinsUserId  = "erinoverview"

Peter and Erin are cataloguing new data sets that have been received from a hospital.  These data sets are part of a clinical trial that the hospital is participating in.

## Setting up

Coco Pharmaceuticals make widespread use of ODPi Egeria for tracking and managing their data and related assets.
Figure 1 below shows their metadata servers and the platforms that are hosting them.

![Figure 1](../images/coco-pharmaceuticals-systems-omag-server-platforms.png)
> **Figure 1:** Coco Pharmaceuticals' OMAG Server Platforms

In [None]:
import os

corePlatformURL     = os.environ.get('corePlatformURL','http://localhost:8080') 
dataLakePlatformURL = os.environ.get('dataLakePlatformURL','http://localhost:8081') 
devPlatformURL      = os.environ.get('devPlatformURL','http://localhost:8082')

Peter is using the data lake operations metadata server called `cocoMDS1`. This server is hosted on the Data Lake OMAG Server Platform.

In [None]:
server1            = "cocoMDS1"
server1PlatformURL = dataLakePlatformURL

The following request checks that this server is running.

In [None]:
import requests
import pprint
import json

adminUserId = "garygeeke"

isServer1ActiveURL = server1PlatformURL + "/open-metadata/platform-services/users/" + adminUserId + "/server-platform/servers/" + server1 + "/status"

print (" ")
print ("GET " + isServer1ActiveURL)
print (" ")

response = requests.get(isServer1ActiveURL)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

serverStatus = response.json().get('active')
if serverStatus == True:
    print("Server " + server1 + " is active - ready to begin")
else:
    print("Server " + server1 + " is down - start it before proceeding")


----
If you see `Server cocoMDS1 is active - ready to begin` then the server is running.  If the server is down, follow the instructions in the **Managing Servers** notebook to start the server.
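
The check above can also be split into two small pure functions, which makes it easy to reuse for the other servers in this lab. The following is a minimal sketch rather than part of any Egeria client library: `build_status_url` and `describe_server_status` are hypothetical helper names, and the sample response body is hand-written, not real server output.

```python
def build_status_url(platform_url, admin_user_id, server_name):
    """Build the platform-services status URL used in the cell above."""
    return (platform_url
            + "/open-metadata/platform-services/users/" + admin_user_id
            + "/server-platform/servers/" + server_name + "/status")


def describe_server_status(server_name, status_json):
    """Turn a parsed status response into the message printed by this lab."""
    # 'active' is the property read by the cell above.
    # Assumption: a missing or non-True value is treated as "down".
    if status_json.get("active") is True:
        return "Server " + server_name + " is active - ready to begin"
    return "Server " + server_name + " is down - start it before proceeding"


# Example with a hand-written response body (not real server output).
print(build_status_url("http://localhost:8081", "garygeeke", "cocoMDS1"))
print(describe_server_status("cocoMDS1", {"active": True}))
```

In the notebook, `describe_server_status(server1, response.json())` would replace the `if`/`else` block after the `requests.get` call.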

----
## Exercise 1

### Adding assets to the catalog

In the first exercise, Peter Profile is adding some new data sets (assets) to the catalog. 

Peter uses the **Asset Owner** Open Metadata Access Service (OMAS) API to manage assets in the catalog.  All of the requests to the Asset Owner OMAS begin with the following URL root.

In [None]:
server1AssetOwnerURL = server1PlatformURL + '/servers/' + server1 + '/open-metadata/access-services/asset-owner/users/' + petersUserId 

First Peter will query the current list of Clinical Trial Assets from cocoMDS1.

In [None]:

server1GetAssetsURL = server1AssetOwnerURL + '/assets/by-name?startFrom=0&pageSize=50'
searchString="Drop Foot"

print (" ")
print ("POST " + server1GetAssetsURL)
print ("{ " + searchString + " }")
print (" ")

response=requests.post(server1GetAssetsURL, data=searchString)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

if response.json().get('assets'):
    if len(response.json().get('assets')) == 1:
        print ("1 asset found")
    else:
        print (str(len(response.json().get('assets'))) + " assets found")
else:
    print ("No assets found")


----
We can see here that no assets are returned as the repository is empty.
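
The same counting logic is repeated after every asset query in this lab. One way to avoid the repetition is a small function that takes the parsed JSON and returns the summary line. This is only a sketch: `summarize_assets` is a hypothetical name, and the sample dictionaries below are hand-written stand-ins for server responses.

```python
def summarize_assets(response_json):
    """Return the summary line printed after each asset query in this lab."""
    # 'assets' is the property read by the cell above.
    # Assumption: it may be absent, null or an empty list when nothing matches.
    assets = response_json.get("assets") or []
    if not assets:
        return "No assets found"
    if len(assets) == 1:
        return "1 asset found"
    return str(len(assets)) + " assets found"


# Hand-written examples (not real server output).
print(summarize_assets({}))
print(summarize_assets({"assets": [{"displayName": "Week 1: Drop Foot Clinical Trial Measurements"}]}))
```

With this in place, each query cell could end with `print(summarize_assets(response.json()))`.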

#### Adding weekly clinical trial assets


Peter is now going to create three weeks of clinical trial data, one data set (asset) per week. We'll start with week 1.

In [None]:
server1CreateAssetURL = server1AssetOwnerURL + '/assets/csv-files'

print (" ")
print ("POST: " + server1CreateAssetURL)

jsonHeader = {'content-type':'application/json'}
createAssetBody = {
	"class" : "NewFileAssetRequestBody",
	"displayName" : "Week 1: Drop Foot Clinical Trial Measurements",
	"description" : "One week's data covering foot angle, hip displacement and mobility measurements.",
	"fullPath" : "file://secured/research/clinical-trials/drop-foot/DropFootMeasurementsWeek1.csv"
}

response=requests.post(server1CreateAssetURL, json=createAssetBody, headers=jsonHeader)

response.json()


----
Notice that the response includes a property called `guid`.  This is the unique identifier of the asset, and we need to save it in a variable to use later.

In [None]:
asset1guid=response.json().get('guid')

print (" ")
print ("The guid for asset 1 is: " + asset1guid)
print (" ")


----
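If the create request fails, `response.json().get('guid')` returns `None`, and the string concatenation in the print statement then raises a `TypeError` that hides the real problem. A defensive variant is sketched below. The helper name `extract_guid` and the sample values are hypothetical, and the shape of an error response is an assumption (here simply "no `guid` property").

```python
def extract_guid(response_json):
    """Return the 'guid' property of a create response, or fail with a clear message."""
    guid = response_json.get("guid")
    if not guid:
        # Assumption: a response without a guid means the create request did not succeed.
        raise ValueError("No guid in response: " + str(response_json))
    return guid


# Hand-written examples; the guid value is made up for illustration.
print(extract_guid({"guid": "example-guid-0001"}))

try:
    extract_guid({"relatedHTTPCode": 500})
except ValueError as error:
    print("Create failed: " + str(error))
```

In the notebook this would be used as `asset1guid = extract_guid(response.json())`.
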
Now let's take another look at what assets are in the repository, using the same request we used earlier.


In [None]:

print (" ")
print ("POST " + server1GetAssetsURL)
print ("{ " + searchString + " }")
print (" ")

response=requests.post(server1GetAssetsURL, data=searchString)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

if response.json().get('assets'):
    if len(response.json().get('assets')) == 1:
        print ("1 asset found")
    else:
        print (str(len(response.json().get('assets'))) + " assets found")
else:
    print ("No assets found")


----

Peter is now going to add the next two weeks of assets.

In [None]:

csvbody2 = {
	"class" : "NewFileAssetRequestBody",
	"displayName" : "Week 2: Drop Foot Clinical Trial Measurements",
	"description" : "One week's data covering foot angle, hip displacement and mobility measurements.",
	"fullPath" : "file://secured/research/clinical-trials/drop-foot/DropFootMeasurementsWeek2.csv"
}

response2=requests.post(server1CreateAssetURL, json=csvbody2, headers=jsonHeader)

print ("Second request responded with: " + str(response2.status_code))

asset2guid=response2.json().get('guid')


csvbody3 = {
	"class" : "NewFileAssetRequestBody",
	"displayName" : "Week 3: Drop Foot Clinical Trial Measurements",
	"description" : "One week's data covering foot angle, hip displacement and mobility measurements.",
	"fullPath" : "file://secured/research/clinical-trials/drop-foot/DropFootMeasurementsWeek3.csv"
}

response3=requests.post(server1CreateAssetURL, json=csvbody3, headers=jsonHeader)

print ("Third request responded with: "  + str(response3.status_code))

asset3guid=response3.json().get('guid')

print (" ")
print ('Asset 1 guid is: ' + asset1guid)
print ('Asset 2 guid is: ' + asset2guid)
print ('Asset 3 guid is: ' + asset3guid)


----
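The three request bodies used so far differ only in the week number. A small builder function, sketched below, produces the same dictionaries from that number. `build_week_asset_body` is a hypothetical helper name; the property names and values are copied from the cells above.

```python
def build_week_asset_body(week_number):
    """Build the NewFileAssetRequestBody used in this lab for a given week."""
    week = str(week_number)
    return {
        "class": "NewFileAssetRequestBody",
        "displayName": "Week " + week + ": Drop Foot Clinical Trial Measurements",
        "description": "One week's data covering foot angle, hip displacement and mobility measurements.",
        "fullPath": "file://secured/research/clinical-trials/drop-foot/DropFootMeasurementsWeek" + week + ".csv",
    }


# Show the display name and path generated for each of the three weeks.
for week_number in (1, 2, 3):
    body = build_week_asset_body(week_number)
    print(body["displayName"] + " -> " + body["fullPath"])
```

Each body could then be posted with `requests.post(server1CreateAssetURL, json=body, headers=jsonHeader)`, exactly as in the cells above.
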
Peter has successfully created three assets:

In [None]:

print (" ")
print ("POST " + server1GetAssetsURL)
print ("{ " + searchString + " }")
print (" ")

response=requests.post(server1GetAssetsURL, data=searchString)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

if response.json().get('assets'):
    if len(response.json().get('assets')) == 1:
        print ("1 asset found")
    else:
        print (str(len(response.json().get('assets'))) + " assets found")
else:
    print ("No assets found")
    

----
## Exercise 2 - Sharing the catalog and adding feedback

In this next exercise Erin is going to work with the assets that Peter created.  Erin is part of the governance team.  She is accessing
metadata using the `cocoMDS2` server.  It sits on the core OMAG Server Platform.

![Figure 1](../images/coco-pharmaceuticals-systems-omag-server-platforms.png)
> **Figure 1:** Coco Pharmaceuticals' OMAG Server Platforms (repeat)

In [None]:
server2            = "cocoMDS2"
server2PlatformURL = corePlatformURL

The following request checks that `cocoMDS2` is running.

In [None]:

isServer2ActiveURL = server2PlatformURL + "/open-metadata/platform-services/users/" + adminUserId + "/server-platform/servers/" + server2 + "/status"

print (" ")
print ("GET " + isServer2ActiveURL)
print (" ")

response = requests.get(isServer2ActiveURL)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

serverStatus = response.json().get('active')
if serverStatus == True:
    print("Server " + server2 + " is active - ready to begin")
else:
    print("Server " + server2 + " is down - start it before proceeding")


----
If you see `Server cocoMDS2 is active - ready to begin` then the server is running.  If the server is down, follow the instructions in the **Managing Servers** notebook to start the server.

----
The metadata servers `cocoMDS1` and `cocoMDS2` are part of the same open metadata cohort called `cocoCohort`.  This means that they are actively sharing metadata.

![Figure 2](../images/coco-pharmaceuticals-systems-metadata-servers.png)
> **Figure 2:** Membership of Coco Pharmaceuticals' cohorts

----
Even though Erin is connected to a different server to Peter, she can see the same assets.

In [None]:

server2AssetConsumerURL = server2PlatformURL + '/servers/' + server2 + '/open-metadata/access-services/asset-consumer/users/' + erinsUserId 
server2GetAssetsURL = server2AssetConsumerURL + '/assets/by-name?startFrom=0&pageSize=50'

print (" ")
print ("POST " + server2GetAssetsURL)
print ("{ " + searchString + " }")
print (" ")

response=requests.post(server2GetAssetsURL, data=searchString)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

if response.json().get('assets'):
    if len(response.json().get('assets')) == 1:
        print ("1 asset found")
    else:
        print (str(len(response.json().get('assets'))) + " assets found")
else:
    print ("No assets found")


----
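The Asset Owner root used by Peter and the Asset Consumer root used by Erin follow the same layout: platform URL, server name, access service name and user id. The sketch below captures that pattern in one function. `access_service_url` is a hypothetical helper name, and the platform URLs are the localhost defaults from the start of this lab.

```python
def access_service_url(platform_url, server_name, service_name, user_id):
    """Build the URL root for an access service, following the pattern used in this lab."""
    return (platform_url + "/servers/" + server_name
            + "/open-metadata/access-services/" + service_name
            + "/users/" + user_id)


# Peter on cocoMDS1 (data lake platform) and Erin on cocoMDS2 (core platform).
print(access_service_url("http://localhost:8081", "cocoMDS1", "asset-owner", "peterprofile"))
print(access_service_url("http://localhost:8080", "cocoMDS2", "asset-consumer", "erinoverview"))
```

Only the `asset-owner` and `asset-consumer` service names appear in this lab; other names are not assumed here.
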
Erin looks at the new assets that Peter has defined and has a question.  She adds a comment to the first asset.

In [None]:

server2AddCommentURL = server2AssetConsumerURL + '/assets/' + asset1guid + '/comments'

print("")
print ("POST " + server2AddCommentURL)

commentBody = {
    "class" : "CommentRequestBody",
    "commentType" : "QUESTION",
    "commentText" : "This file has much less data than normal.  Did the hospital provide any additional information about this batch to explain it?",
    "isPublic" : True
}
addCommentResponse = requests.post(server2AddCommentURL, json=commentBody, headers=jsonHeader)

addCommentResponse.json()

In [None]:
commentGUID = addCommentResponse.json().get('guid')

print (" ")
print ('Erin\'s comment guid is: ' + commentGUID)

----
The comment is attached to the asset.  Peter can query an asset's comments as follows:

In [None]:

server1ConnectedAssetURL = server1PlatformURL + '/servers/' + server1 + '/open-metadata/common-services/asset-consumer/connected-asset/users/' + petersUserId 
server1CommentQuery = server1ConnectedAssetURL + '/assets/' + asset1guid + '/comments?elementStart=0&maxElements=50'

print (" ")
print ("GET " + server1CommentQuery)

getCommentsResponse = requests.get(server1CommentQuery)
getCommentsResponse.json()


----
He replies to Erin's question.

In [None]:

server1AssetConsumerURL = server1PlatformURL + '/servers/' + server1 + '/open-metadata/access-services/asset-consumer/users/' + petersUserId 
server1CommentReplyURL = server1AssetConsumerURL + '/assets/' + asset1guid + '/comments/' + commentGUID + '/replies'

print (" ")
print ("POST " + server1CommentReplyURL)

commentReplyBody = {
    "class" : "CommentRequestBody",
    "commentType" : "ANSWER",
    "commentText" : "I checked back with Bobbie Records and they had an air conditioning failure that caused them to cancel patient appointments for 2 days - hence less data.  They are working to catch up on their waiting list so expect increased data for the next few weeks.",
    "isPublic" : True
}

addCommentReplyResponse = requests.post(server1CommentReplyURL, json=commentReplyBody, headers=jsonHeader)
addCommentReplyResponse.json()

----
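Erin's question and Peter's answer use the same `CommentRequestBody` shape and differ only in `commentType` and `commentText`. A minimal sketch of a shared builder follows. `build_comment_body` is a hypothetical helper name, the comment texts are shortened for illustration, and only the two comment types seen in this lab are used.

```python
import json


def build_comment_body(comment_type, comment_text, is_public=True):
    """Build a CommentRequestBody like the ones posted in this exercise."""
    return {
        "class": "CommentRequestBody",
        "commentType": comment_type,
        "commentText": comment_text,
        "isPublic": is_public,
    }


question = build_comment_body("QUESTION", "This file has much less data than normal.")
answer = build_comment_body("ANSWER", "The hospital cancelled appointments for 2 days.")

# json.dumps shows what is sent on the wire: Python's True becomes JSON true.
print(json.dumps(question, indent=4))
print(json.dumps(answer, indent=4))
```

Passing the dictionary through the `json=` parameter of `requests.post`, as the cells above do, performs this serialization automatically.
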
Erin views the reply.

In [None]:
server2ConnectedAssetURL = server2PlatformURL + '/servers/' + server2 + '/open-metadata/common-services/asset-consumer/connected-asset/users/' + erinsUserId 
server2CommentReplyQuery = server2ConnectedAssetURL + '/assets/' + asset1guid + '/comments/' + commentGUID + '/replies?elementStart=0&maxElements=50'

print (" ")
print ("GET " + server2CommentReplyQuery)

getCommentRepliesResponse = requests.get(server2CommentReplyQuery)

getCommentRepliesResponse.json()

----
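The comment and reply queries above end with `elementStart=0&maxElements=50`. The sketch below builds that query string with the standard library, assuming that `elementStart` is a zero-based offset and `maxElements` is a page size, which is how the queries in this lab appear to use them. `paged_url` and the example base URL are hypothetical.

```python
from urllib.parse import urlencode


def paged_url(base_url, element_start=0, max_elements=50):
    """Append the paging parameters used by the connected-asset queries in this lab."""
    query = urlencode({"elementStart": element_start, "maxElements": max_elements})
    return base_url + "?" + query


# Hypothetical base URL; in the notebook this would be built from server2ConnectedAssetURL.
base = "http://localhost:8080/example/assets/example-guid-0001/comments"

# First page, then the next page under the offset assumption stated above.
print(paged_url(base))
print(paged_url(base, element_start=50))
```

Whether the server signals the last page with an empty list or in some other way is not shown in this lab, so a loop over pages is left out of the sketch.
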
This is the current information known about the first asset:

In [None]:
server2GetAsset1 = server2ConnectedAssetURL + '/assets/' + asset1guid

print (" ")
print ("GET " + server2GetAsset1)

getAssetResponse = requests.get(server2GetAsset1)

getAssetResponse.json()

In [None]:
server2GetRelatedAssets1 = server2ConnectedAssetURL + '/assets/' + asset1guid + '/related-assets?elementStart=0&maxElements=50'

print (" ")
print ("GET " + server2GetRelatedAssets1)

getAssetResponse = requests.get(server2GetRelatedAssets1)

getAssetResponse.json()

----
## Exercise 3 - Controlling access to assets

