![Egeria Logo](https://raw.githubusercontent.com/odpi/egeria/master/assets/img/ODPi_Egeria_Logo_color.png)

### ODPi Egeria Hands-On Lab
# Welcome to the Building a Data Catalog Lab

## Introduction

ODPi Egeria is an open source project that provides open standards and implementation libraries to connect tools, catalogues and platforms together so they can share information about data and technology (called metadata).

In this hands-on lab you will get a chance to work with three ODPi Egeria metadata servers to build a distributed catalog of data assets and then experiment with attaching feedback (comments) to the catalog entries from different servers.  We will also cover how governance zones can be used to group assets together and control who can discover them in the data catalog.

## The Scenario

The ODPi Egeria team use the personas and scenarios from the fictitious company called Coco Pharmaceuticals.  (See https://opengovernance.odpi.org/coco-pharmaceuticals/ for more information.)

As part of the huge business transformation that Coco Pharmaceuticals has embarked on, they have created a data lake to manage data for research, for analytics, and for exchange between their internal organizations and business partners (such as hospitals).  As a result, the data lake has to be designed to handle a wide variety of data, including some highly sensitive and regulated data.

In this lab we look at how data is cataloged in the data lake.  The two main characters engaged in the first part of this lab are Peter Profile and Erin Overview.

![Peter and Erin](../images/peter-and-erin.png)

In [260]:
petersUserId = "peterprofile"
erinsUserId  = "erinoverview"

Peter and Erin are cataloguing new data sets that have been received from a hospital.  These data sets are part of a clinical trial that the hospital is participating in.

## Setting up

Coco Pharmaceuticals make widespread use of ODPi Egeria for tracking and managing their data and related assets.
Figure 1 below shows their metadata servers and the Open Metadata and Governance (OMAG) Server Platforms that are hosting them.  Each metadata server supports a department in the organization.  The servers are distributed across the platforms to even out the workload.  Servers can be moved to a different platform if needed.

![Figure 1](../images/coco-pharmaceuticals-systems-omag-server-platforms.png)
> **Figure 1:** Coco Pharmaceuticals' OMAG Server Platforms

The code below sets up the network addresses for the three platforms.  These vary depending on whether you are running them locally, through **docker-compose** or on **kubernetes**.

In [261]:
import os

corePlatformURL     = os.environ.get('corePlatformURL','http://localhost:8080') 
dataLakePlatformURL = os.environ.get('dataLakePlatformURL','http://localhost:8081') 
devPlatformURL      = os.environ.get('devPlatformURL','http://localhost:8082')

Peter is using the data lake operations metadata server called `cocoMDS1`. This server is hosted on the Data Lake OMAG Server Platform.

In [262]:
server1            = "cocoMDS1"
server1PlatformURL = dataLakePlatformURL

The following request checks that the `cocoMDS1` server is running.  The user id `garygeeke` used for the command belongs to Gary Geeke who is the IT Administration Leader and has permission to issue these types of commands.

In [263]:
import requests
import pprint
import json

adminUserId = "garygeeke"

isServer1ActiveURL = server1PlatformURL + "/open-metadata/platform-services/users/" + adminUserId + "/server-platform/servers/" + server1 + "/status"

print (" ")
print ("GET " + isServer1ActiveURL)
print (" ")

response = requests.get(isServer1ActiveURL)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

serverStatus = response.json().get('active')
if serverStatus == True:
    print("Server " + server1 + " is active - ready to begin")
else:
    print("Server " + server1 + " is down - start it before proceeding")


 
GET http://localhost:8081/open-metadata/platform-services/users/garygeeke/server-platform/servers/cocoMDS1/status
 
Returns:
{
    "relatedHTTPCode": 200,
    "serverName": "cocoMDS1",
    "serverStartTime": "2019-09-24T22:02:19.163+0000",
    "active": true
}
 
Server cocoMDS1 is active - ready to begin


----
If you see `Server cocoMDS1 is active - ready to begin` then the server is running.  If the server is down, follow the instructions in the **Managing Servers** notebook to start the server.
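
The status check above can be wrapped in two small helpers so that it is easy to repeat for other servers. This is a minimal sketch: the function names `build_status_url` and `is_server_active` are hypothetical (they are not part of Egeria), and the sample body is copied from the response shown above.

```python
def build_status_url(platform_url, admin_user_id, server_name):
    """Return the platform-services URL that reports a server's status."""
    return (platform_url
            + "/open-metadata/platform-services/users/" + admin_user_id
            + "/server-platform/servers/" + server_name + "/status")


def is_server_active(status_body):
    """Interpret the JSON body of a status response.

    A missing or false 'active' property is treated as 'not running'.
    """
    return status_body.get("active") is True


# Sample body copied from the response shown above.
sample_status = {
    "relatedHTTPCode": 200,
    "serverName": "cocoMDS1",
    "serverStartTime": "2019-09-24T22:02:19.163+0000",
    "active": True,
}

status_url = build_status_url("http://localhost:8081", "garygeeke", "cocoMDS1")
print(status_url)
print(is_server_active(sample_status))
```

In the notebook, the body would come from `requests.get(status_url).json()` rather than from a sample dictionary.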

----
## Exercise 1

### Adding assets to the catalog

In the first exercise, Peter Profile is adding descriptions of some new data sets to the catalog. They are stored in the catalog as **Assets**.  An Asset represents a real resource of value that needs to be governed to ensure it is properly managed and used.

Every Asset identifies the owner of the resource.  This is either a person or a team.  The owner's role is to set up the Asset with the correct properties to ensure that the real resources (data sets in this case) are managed correctly.  This management is performed by tools, platforms and engines that host and/or work with the real resources.  If these technologies can connect to an open metadata repository, they can read these properties directly and ensure the correct actions are taken.  Some technologies do not support a direct connection to an open metadata repository.  Egeria provides governance servers to push the Asset properties to these types of technologies.

In either case, the owner's role in setting up the correct properties is an important one.

Peter will be acting as the owner of these new data sets. He uses the **Asset Owner** Open Metadata Access Service (OMAS) API to set up the Assets in the catalog.

All of the requests for the Asset Owner OMAS begin with the following URL root.

In [264]:
server1AssetOwnerURL = server1PlatformURL + '/servers/' + server1 + '/open-metadata/access-services/asset-owner/users/' + petersUserId

print (" ")
print (server1AssetOwnerURL)
print (" ")

 
http://localhost:8081/servers/cocoMDS1/open-metadata/access-services/asset-owner/users/peterprofile
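
The access services called in this lab follow the same URL pattern, so the root can be built by a small function. This is a sketch, assuming the pattern seen in this notebook; `omas_url_root` is a hypothetical helper name, not an Egeria function.

```python
def omas_url_root(platform_url, server_name, access_service, user_id):
    """Build the URL root for an Open Metadata Access Service (OMAS).

    access_service is the URL name of the service, for example
    'asset-owner' or 'asset-consumer'.
    """
    return (platform_url + "/servers/" + server_name
            + "/open-metadata/access-services/" + access_service
            + "/users/" + user_id)


asset_owner_root = omas_url_root("http://localhost:8081", "cocoMDS1",
                                 "asset-owner", "peterprofile")
print(asset_owner_root)
```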
 


----
Before adding the new Assets, Peter queries the current list of Clinical Trial Assets from cocoMDS1 to check that these data sets have not been added already.

In [265]:

server1GetAssetsURL = server1AssetOwnerURL + '/assets/by-search-string?startFrom=0&pageSize=50'
searchString=".*file.*"

print (" ")
print ("GET " + server1GetAssetsURL)
print ("{ " + searchString + " }")
print (" ")

response=requests.post(server1GetAssetsURL, data=searchString)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

if response.json().get('assets'):
    if len(response.json().get('assets')) == 1:
        print ("1 asset found")
    else:
        print (str(len(response.json().get('assets'))) + " assets found")
else:
    print ("No assets found")


 
GET http://localhost:8081/servers/cocoMDS1/open-metadata/access-services/asset-owner/users/peterprofile/assets/by-search-string?startFrom=0&pageSize=50
{ .*file.* }
 
Returns:
{
    "class": "AssetsResponse",
    "relatedHTTPCode": 200,
    "startingFromElement": 0
}
 
No assets found


----
We can see here that no assets are returned as the repository is empty.

#### Adding weekly clinical trial assets


Peter is now going to create three weeks of clinical asset data. This data is stored in three data sets, one for each week.

He begins with week 1.  The Asset he creates includes the full path of the data set as well as some descriptive information.  This descriptive information helps others to locate and understand the data set.

In [266]:
server1CreateAssetURL = server1AssetOwnerURL + '/assets/data-files/csv'

print (" ")
print ("POST: " + server1CreateAssetURL)

jsonHeader = {'content-type':'application/json'}
createAssetBody = {
	"class" : "NewCSVFileAssetRequestBody",
	"displayName" : "Week 1: Drop Foot Clinical Trial Measurements",
	"description" : "One week's data covering foot angle, hip displacement and mobility measurements.",
	"fullPath" : "file://secured/research/clinical-trials/drop-foot/DropFootMeasurementsWeek1.csv"
}

response=requests.post(server1CreateAssetURL, json=createAssetBody, headers=jsonHeader)

response.json()


 
POST: http://localhost:8081/servers/cocoMDS1/open-metadata/access-services/asset-owner/users/peterprofile/assets/data-files/csv


{'relatedHTTPCode': 200,
 'guids': ['c2d3a05d-3f55-4e2e-80e9-1c2e58477f11',
  'b8269848-517e-43f6-8065-d2a0946aadc6',
  '316ec7cd-8717-4716-8358-a128539f81b2',
  'b61ac73b-91ed-49f3-b296-02780d5c728d',
  '74cea331-0378-4c55-af6b-83865639031c',
  'fc8fc4a7-09ba-403e-aac9-2da4b787bb4a']}

----
Notice the response includes a property called `guids`.  This is the list of unique identifiers (GUIDs) of the chain of assets for the folder structure and the file itself.

![Figure 2](../images/file-asset-hierarchy.png)
> **Figure 2:** Hierarchy of assets for a file

We need to save the file's unique identifier (the last one in the list) in a variable to use later.

In [267]:
asset1guids = response.json().get('guids')

# The file's GUID is the last entry in the list; the earlier entries describe the folder structure.
if asset1guids:
    asset1guid = asset1guids[-1]
else:
    asset1guid = "<unknown>"

print (" ")
print ("The GUID for asset 1 is: " + asset1guid)
print (" ")


 
The GUID for asset 1 is: fc8fc4a7-09ba-403e-aac9-2da4b787bb4a
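
Since the file is the last entry in the `guids` list, the parent chain and the file GUID can be separated with list slicing. This is a minimal sketch using the GUIDs returned above; `split_guid_chain` is a hypothetical helper, and it assumes the file is always the last entry, as described in the text.

```python
def split_guid_chain(guids):
    """Split a 'guids' list into (parent_guids, file_guid).

    Returns ([], None) when the list is missing or empty.
    """
    if not guids:
        return [], None
    return guids[:-1], guids[-1]


# GUIDs copied from the week 1 response shown above.
week1_guids = [
    "c2d3a05d-3f55-4e2e-80e9-1c2e58477f11",
    "b8269848-517e-43f6-8065-d2a0946aadc6",
    "316ec7cd-8717-4716-8358-a128539f81b2",
    "b61ac73b-91ed-49f3-b296-02780d5c728d",
    "74cea331-0378-4c55-af6b-83865639031c",
    "fc8fc4a7-09ba-403e-aac9-2da4b787bb4a",
]

parent_guids, file_guid = split_guid_chain(week1_guids)
print("File GUID: " + file_guid)
print(str(len(parent_guids)) + " parent elements")
```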
 


----
Now let's take a look again at what assets are in the repository using the same search request we used earlier.


In [268]:

print (" ")
print ("GET " + server1GetAssetsURL)
print ("{ " + searchString + " }")
print (" ")

response=requests.post(server1GetAssetsURL, data=searchString)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

if response.json().get('assets'):
    if len(response.json().get('assets')) == 1:
        print ("1 asset found")
    else:
        print (str(len(response.json().get('assets'))) + " assets found")
else:
    print ("No assets found")


 
GET http://localhost:8081/servers/cocoMDS1/open-metadata/access-services/asset-owner/users/peterprofile/assets/by-search-string?startFrom=0&pageSize=50
{ .*file.* }
 
Returns:
{
    "class": "AssetsResponse",
    "relatedHTTPCode": 200,
    "startingFromElement": 0,
    "assets": [
        {
            "class": "Asset",
            "type": {
                "class": "ElementType",
                "elementTypeId": "229ed5cc-de31-45fc-beb4-9919fd247398",
                "elementTypeName": "FileFolder",
                "elementSuperTypeNames": [
                    "DataStore",
                    "Asset",
                    "Referenceable"
                ],
                "elementTypeVersion": 1,
                "elementTypeDescription": "A description of a folder (directory) in a file system.",
                "elementSourceServer": "cocoMDS1",
                "elementOrigin": "LOCAL_COHORT",
                "elementHomeMetadataCollectionId": "9aea0985-fc43-455d-8446-80dc7e63073a"

----

Notice that five assets are returned.  Four are folders and one is for the file.  The file system is not returned because strictly speaking, it is not an [Asset](https://egeria.odpi.org/open-metadata-publication/website/open-metadata-types/0010-Base-Model.html), it is a [SoftwareServerCapability](https://egeria.odpi.org/open-metadata-publication/website/open-metadata-types/0042-Software-Server-Capabilities.html).  This is part of a [SoftwareServer](https://egeria.odpi.org/open-metadata-publication/website/open-metadata-types/0040-Software-Servers.html) description.
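
One way to confirm the mix of folders and files is to count the returned assets by type name. The sketch below assumes each entry in `assets` carries `type.elementTypeName`, as in the response above. The sample data is an illustrative reduction of that response to the type name only, and `count_assets_by_type` is a hypothetical helper.

```python
from collections import Counter


def count_assets_by_type(assets_response):
    """Count assets per elementTypeName.

    A response without an 'assets' property yields an empty Counter.
    """
    assets = assets_response.get("assets") or []
    return Counter(asset["type"]["elementTypeName"] for asset in assets)


# Illustrative sample: four folders and one CSV file, reduced to the type name.
sample_response = {
    "class": "AssetsResponse",
    "relatedHTTPCode": 200,
    "startingFromElement": 0,
    "assets": (
        [{"type": {"elementTypeName": "FileFolder"}}] * 4
        + [{"type": {"elementTypeName": "CSVFile"}}]
    ),
}

type_counts = count_assets_by_type(sample_response)
for type_name, count in sorted(type_counts.items()):
    print(type_name + ": " + str(count))
```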

Peter is now going to add the files for the next two weeks:

In [269]:
csvbody2 = {
	"class" : "NewCSVFileAssetRequestBody",
	"displayName" : "Week 2: Drop Foot Clinical Trial Measurements",
	"description" : "One week's data covering foot angle, hip displacement and mobility measurements.",
	"fullPath" : "file://secured/research/clinical-trials/drop-foot/DropFootMeasurementsWeek2.csv"
}

response2=requests.post(server1CreateAssetURL, json=csvbody2, headers=jsonHeader)

asset2guids=response2.json().get('guids')

for guid in asset2guids:
    asset2guid=guid
    
print ("Request to create the week 2 Asset responded with: " )
response2.json()

Request to create the week 2 Asset responded with: 


{'relatedHTTPCode': 200,
 'guids': ['c2d3a05d-3f55-4e2e-80e9-1c2e58477f11',
  'b8269848-517e-43f6-8065-d2a0946aadc6',
  '316ec7cd-8717-4716-8358-a128539f81b2',
  'b61ac73b-91ed-49f3-b296-02780d5c728d',
  '74cea331-0378-4c55-af6b-83865639031c',
  'b23c1e6e-33f0-4b41-8b7d-d4bf89b30c8d']}

In [270]:
csvbody3 = {
	"class" : "NewCSVFileAssetRequestBody",
	"displayName" : "Week 3: Drop Foot Clinical Trial Measurements",
	"description" : "One week's data covering foot angle, hip displacement and mobility measurements.",
	"fullPath" : "file://secured/research/clinical-trials/drop-foot/DropFootMeasurementsWeek3.csv"
}

response3=requests.post(server1CreateAssetURL, json=csvbody3, headers=jsonHeader)

asset3guids=response3.json().get('guids')

for guid in asset3guids:
    asset3guid=guid

print ("Request to create the week 3 Asset responded with: " )
response3.json()

Request to create the week 3 Asset responded with: 


{'relatedHTTPCode': 200,
 'guids': ['c2d3a05d-3f55-4e2e-80e9-1c2e58477f11',
  'b8269848-517e-43f6-8065-d2a0946aadc6',
  '316ec7cd-8717-4716-8358-a128539f81b2',
  'b61ac73b-91ed-49f3-b296-02780d5c728d',
  '74cea331-0378-4c55-af6b-83865639031c',
  '206960ce-26a0-4622-a09f-dbea325ea3e8']}

In [271]:
print (" ")
print ("Summary of the assets so far:")
print (' Asset 1 GUID is: ' + asset1guid)
print (' Asset 2 GUID is: ' + asset2guid)
print (' Asset 3 GUID is: ' + asset3guid)

 
Summary of the assets so far:
 Asset 1 GUID is: fc8fc4a7-09ba-403e-aac9-2da4b787bb4a
 Asset 2 GUID is: b23c1e6e-33f0-4b41-8b7d-d4bf89b30c8d
 Asset 3 GUID is: 206960ce-26a0-4622-a09f-dbea325ea3e8
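
The three create requests differ only in the week number, so the request bodies can be generated in a loop. This is a sketch, assuming the same display name, description and path pattern as the cells above; `build_week_asset_body` is a hypothetical helper and nothing is sent to the server here.

```python
def build_week_asset_body(week):
    """Build the request body for one week's CSV file asset."""
    return {
        "class": "NewCSVFileAssetRequestBody",
        "displayName": "Week " + str(week) + ": Drop Foot Clinical Trial Measurements",
        "description": "One week's data covering foot angle, hip displacement and mobility measurements.",
        "fullPath": "file://secured/research/clinical-trials/drop-foot/DropFootMeasurementsWeek"
                    + str(week) + ".csv",
    }


week_bodies = [build_week_asset_body(week) for week in (1, 2, 3)]

for body in week_bodies:
    print(body["displayName"] + " -> " + body["fullPath"])
```

Each body could then be posted with `requests.post(server1CreateAssetURL, json=body, headers=jsonHeader)`, as in the cells above.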


----
Peter has successfully onboarded three file assets.  When we query the assets again, there are now seven assets.  All of the files are stored in the same folder on disk, so all of the Assets for these files are stored under the same FileFolder Asset in the metadata server.  So there are now four FileFolder Assets and three DataFile Assets.

In [272]:

print (" ")
print ("GET " + server1GetAssetsURL)
print ("{ " + searchString + " }")
print (" ")

response=requests.post(server1GetAssetsURL, data=searchString)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

if response.json().get('assets'):
    if len(response.json().get('assets')) == 1:
        print ("1 asset found")
    else:
        print (str(len(response.json().get('assets'))) + " assets found")
else:
    print ("No assets found")
    

 
GET http://localhost:8081/servers/cocoMDS1/open-metadata/access-services/asset-owner/users/peterprofile/assets/by-search-string?startFrom=0&pageSize=50
{ .*file.* }
 
Returns:
{
    "class": "AssetsResponse",
    "relatedHTTPCode": 200,
    "startingFromElement": 0,
    "assets": [
        {
            "class": "Asset",
            "type": {
                "class": "ElementType",
                "elementTypeId": "229ed5cc-de31-45fc-beb4-9919fd247398",
                "elementTypeName": "FileFolder",
                "elementSuperTypeNames": [
                    "DataStore",
                    "Asset",
                    "Referenceable"
                ],
                "elementTypeVersion": 1,
                "elementTypeDescription": "A description of a folder (directory) in a file system.",
                "elementSourceServer": "cocoMDS1",
                "elementOrigin": "LOCAL_COHORT",
                "elementHomeMetadataCollectionId": "9aea0985-fc43-455d-8446-80dc7e63073a"

----
## Exercise 2 - Sharing the catalog and adding feedback

In this next exercise Erin is going to work with the assets that Peter created.  Erin is part of the governance team.  She is accessing
metadata using the `cocoMDS2` server.  It sits on the core OMAG Server Platform.

![Figure 1](../images/coco-pharmaceuticals-systems-omag-server-platforms.png)
> **Figure 1:** Coco Pharmaceuticals' OMAG Server Platforms (repeat)

Erin is therefore using a different server, hosted on a different platform, from the one Peter uses.

In [273]:
server2            = "cocoMDS2"
server2PlatformURL = corePlatformURL

This next code checks that cocoMDS2 is running ...

In [274]:

isServer2ActiveURL = server2PlatformURL + "/open-metadata/platform-services/users/" + adminUserId + "/server-platform/servers/" + server2 + "/status"

print (" ")
print ("GET " + isServer2ActiveURL)
print (" ")

response = requests.get(isServer2ActiveURL)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

serverStatus = response.json().get('active')
if serverStatus == True:
    print("Server " + server2 + " is active - ready to begin")
else:
    print("Server " + server2 + " is down - start it before proceeding")


 
GET http://localhost:8080/open-metadata/platform-services/users/garygeeke/server-platform/servers/cocoMDS2/status
 
Returns:
{
    "relatedHTTPCode": 200,
    "serverName": "cocoMDS2",
    "serverStartTime": "2019-09-24T22:02:14.752+0000",
    "active": true
}
 
Server cocoMDS2 is active - ready to begin


----
If you see `Server cocoMDS2 is active - ready to begin` then the server is running. If the server is down, follow the instructions in the **Managing Servers** notebook to start the server.

----
The metadata servers `cocoMDS1` and `cocoMDS2` are part of the same open metadata cohort called `cocoCohort`.  This means that they are actively sharing metadata.

![Figure 3](../images/coco-pharmaceuticals-systems-metadata-servers.png)
> **Figure 3:** Membership of Coco Pharmaceuticals' cohorts

----
Even though Erin is connected to a different server from Peter, she can see the same assets.  The search request below uses the Asset Consumer OMAS interface of cocoMDS2 to return the unique identifiers (GUIDs) of the assets for the three new files.

In [275]:
newFilesSearchString=".*Drop Foot Clinical Trial Measurements.*"

server2AssetConsumerURL = server2PlatformURL + '/servers/' + server2 + '/open-metadata/access-services/asset-consumer/users/' + erinsUserId 
server2GetAssetsURL = server2AssetConsumerURL + '/assets/by-search-string?startFrom=0&pageSize=50'


print (" ")
print ("GET " + server2GetAssetsURL)
print ("{ " + newFilesSearchString + " }")
print (" ")

response=requests.post(server2GetAssetsURL, data=newFilesSearchString)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

if response.json().get('guids'):
    if len(response.json().get('guids')) == 1:
        print ("1 asset found")
    else:
        print (str(len(response.json().get('guids'))) + " GUIDs for matching assets found")
else:
    print ("No assets found")


 
GET http://localhost:8080/servers/cocoMDS2/open-metadata/access-services/asset-consumer/users/erinoverview/assets/by-search-string?startFrom=0&pageSize=50
{ .*Drop Foot Clinical Trial Measurements.* }
 
Returns:
{
    "relatedHTTPCode": 200,
    "guids": [
        "fc8fc4a7-09ba-403e-aac9-2da4b787bb4a",
        "206960ce-26a0-4622-a09f-dbea325ea3e8",
        "b23c1e6e-33f0-4b41-8b7d-d4bf89b30c8d"
    ]
}
 
3 GUIDs for matching assets found


----
These are the same GUIDs as the ones saved when Peter created the assets:

In [276]:
print (" ")
print ("Summary of the assets so far:")
print (' Asset 1 GUID is: ' + asset1guid)
print (' Asset 2 GUID is: ' + asset2guid)
print (' Asset 3 GUID is: ' + asset3guid)

 
Summary of the assets so far:
 Asset 1 GUID is: fc8fc4a7-09ba-403e-aac9-2da4b787bb4a
 Asset 2 GUID is: b23c1e6e-33f0-4b41-8b7d-d4bf89b30c8d
 Asset 3 GUID is: 206960ce-26a0-4622-a09f-dbea325ea3e8
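
Note that cocoMDS2 returned the GUIDs in a different order from the order in which Peter created the assets, so a comparison should ignore ordering. A minimal sketch using the values shown above:

```python
# GUIDs saved when Peter created the assets through cocoMDS1.
created_guids = [
    "fc8fc4a7-09ba-403e-aac9-2da4b787bb4a",
    "b23c1e6e-33f0-4b41-8b7d-d4bf89b30c8d",
    "206960ce-26a0-4622-a09f-dbea325ea3e8",
]

# GUIDs returned by Erin's search through cocoMDS2 (in the order returned).
found_guids = [
    "fc8fc4a7-09ba-403e-aac9-2da4b787bb4a",
    "206960ce-26a0-4622-a09f-dbea325ea3e8",
    "b23c1e6e-33f0-4b41-8b7d-d4bf89b30c8d",
]

# Sets compare membership only, so the differing order does not matter.
same_assets = set(created_guids) == set(found_guids)
print("Both servers report the same assets: " + str(same_assets))
```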


----
Erin looks at the new assets that Peter has defined and has a question.  She adds a comment to the first asset.

In [277]:

server2AddCommentURL = server2AssetConsumerURL + '/assets/' + asset1guid + '/comments'

print("")
print ("POST " + server2AddCommentURL)

commentBody={
	"class" : "CommentRequestBody",
	"commentType" : "QUESTION",
	"commentText" : "This file has much less data than normal.  Did the hospital provide any additional information about this batch to explain it?",
    "isPublic" : True
}
addCommentResponse = requests.post(server2AddCommentURL, json=commentBody, headers=jsonHeader)

addCommentResponse.json()


POST http://localhost:8080/servers/cocoMDS2/open-metadata/access-services/asset-consumer/users/erinoverview/assets/fc8fc4a7-09ba-403e-aac9-2da4b787bb4a/comments


{'relatedHTTPCode': 200, 'guid': 'f199724b-f63f-4552-866d-8956626d4ead'}

In [278]:
commentGUID = addCommentResponse.json().get('guid')

print (" ")
print ('Erin\'s comment guid is: ' + commentGUID)

 
Erin's comment guid is: f199724b-f63f-4552-866d-8956626d4ead


----
The comment is attached to the asset.  Peter can query an asset's comments as follows:

In [279]:

server1ConnectedAssetURL = server1PlatformURL + '/servers/' + server1 + '/open-metadata/common-services/asset-consumer/connected-asset/users/' + petersUserId 
server1CommentQuery = server1ConnectedAssetURL + '/assets/' + asset1guid + '/comments?elementStart=0&maxElements=50'

print (" ")
print ("GET " + server1CommentQuery)

getCommentsResponse = requests.get(server1CommentQuery)
getCommentsResponse.json()


 
GET http://localhost:8081/servers/cocoMDS1/open-metadata/common-services/asset-consumer/connected-asset/users/peterprofile/assets/fc8fc4a7-09ba-403e-aac9-2da4b787bb4a/comments?elementStart=0&maxElements=50


{'class': 'CommentsResponse', 'relatedHTTPCode': 200, 'startingFromElement': 0}

----
He replies to Erin's question.

In [280]:

server1AssetConsumerURL = server1PlatformURL + '/servers/' + server1 + '/open-metadata/access-services/asset-consumer/users/' + petersUserId 
server1CommentReplyURL = server1AssetConsumerURL + '/assets/' + asset1guid + '/comments/' + commentGUID + '/replies'

print (" ")
print ("POST " + server1CommentReplyURL)

commentReplyBody={
	"class" : "CommentRequestBody",
	"commentType" : "ANSWER",
	"commentText" : "I checked back with Bobbie Records and they had an air conditioning failure that caused them to cancel patient appointments for 2 days - hence less data.  They are working to catch up on their waiting list so expect increased data for the next few weeks.",
    "isPublic" : True
}

addCommentReplyResponse = requests.post(server1CommentReplyURL, json=commentReplyBody, headers=jsonHeader)
addCommentReplyResponse.json()

 
POST http://localhost:8081/servers/cocoMDS1/open-metadata/access-services/asset-consumer/users/peterprofile/assets/fc8fc4a7-09ba-403e-aac9-2da4b787bb4a/comments/f199724b-f63f-4552-866d-8956626d4ead/replies


{'relatedHTTPCode': 200, 'guid': '94ec7b90-dba5-43b8-a618-bfc702e3d0e1'}
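
The comment and reply requests share the same body structure and differ only in the URL and the `commentType`. Below is a sketch of two hypothetical helpers (`build_comment_body` and `comment_replies_url`) that follow the pattern of the cells above; they only build the request and do not call the server.

```python
def build_comment_body(comment_type, comment_text, is_public=True):
    """Build a CommentRequestBody such as the QUESTION and ANSWER bodies above."""
    return {
        "class": "CommentRequestBody",
        "commentType": comment_type,
        "commentText": comment_text,
        "isPublic": is_public,
    }


def comment_replies_url(asset_consumer_root, asset_guid, comment_guid):
    """Build the URL used to post a reply to an existing comment on an asset."""
    return (asset_consumer_root + "/assets/" + asset_guid
            + "/comments/" + comment_guid + "/replies")


reply_url = comment_replies_url(
    "http://localhost:8081/servers/cocoMDS1/open-metadata/access-services/asset-consumer/users/peterprofile",
    "fc8fc4a7-09ba-403e-aac9-2da4b787bb4a",
    "f199724b-f63f-4552-866d-8956626d4ead",
)
# Placeholder text; the real reply text is in the cell above.
reply_body = build_comment_body("ANSWER", "Text of the reply.")
print("POST " + reply_url)
```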

----
Erin views the reply.

In [281]:
server2ConnectedAssetURL = server2PlatformURL + '/servers/' + server2 + '/open-metadata/common-services/asset-consumer/connected-asset/users/' + erinsUserId 
server2CommentReplyQuery = server2ConnectedAssetURL + '/assets/' + asset1guid + '/comments/' + commentGUID + '/replies?elementStart=0&maxElements=50'

print (" ")
print ("GET " + server2CommentReplyQuery)

getCommentRepliesResponse = requests.get(server2CommentReplyQuery)

getCommentRepliesResponse.json()

 
GET http://localhost:8080/servers/cocoMDS2/open-metadata/common-services/asset-consumer/connected-asset/users/erinoverview/assets/fc8fc4a7-09ba-403e-aac9-2da4b787bb4a/comments/f199724b-f63f-4552-866d-8956626d4ead/replies?elementStart=0&maxElements=50


{'class': 'CommentsResponse',
 'relatedHTTPCode': 200,
 'startingFromElement': 0,
 'list': [{'class': 'CommentResponse',
   'relatedHTTPCode': 200,
   'comment': {'class': 'Comment',
    'type': {'class': 'ElementType',
     'elementTypeId': '1a226073-9c84-40e4-a422-fbddb9b84278',
     'elementTypeName': 'Comment',
     'elementSuperTypeNames': ['Referenceable'],
     'elementTypeVersion': 1,
     'elementTypeDescription': 'Descriptive feedback or discussion related to an item.',
     'elementSourceServer': 'cocoMDS2',
     'elementOrigin': 'LOCAL_COHORT',
     'elementHomeMetadataCollectionId': 'fac57bd7-0c63-4836-8278-d140cf1b7f4d'},
    'guid': 'f199724b-f63f-4552-866d-8956626d4ead',
    'extendedProperties': {'anchorGUID': 'fc8fc4a7-09ba-403e-aac9-2da4b787bb4a'},
    'commentType': 'QUESTION',
    'commentText': 'This file has much less data than normal.  Did the hospital provide any additional information about this batch to explain it?',
    'user': 'erinoverview',
    'public': 

----
This is the current information known about the first asset:

In [282]:
server2GetAsset1 = server2ConnectedAssetURL + '/assets/' + asset1guid

print (" ")
print ("GET " + server2GetAsset1)

getAssetResponse = requests.get(server2GetAsset1)

getAssetResponse.json()

 
GET http://localhost:8080/servers/cocoMDS2/open-metadata/common-services/asset-consumer/connected-asset/users/erinoverview/assets/fc8fc4a7-09ba-403e-aac9-2da4b787bb4a


{'class': 'AssetResponse',
 'relatedHTTPCode': 200,
 'asset': {'class': 'Asset',
  'type': {'class': 'ElementType',
   'elementTypeId': '2ccb2117-9cee-47ca-8150-9b3a543adcec',
   'elementTypeName': 'CSVFile',
   'elementSuperTypeNames': ['DataFile',
    'DataStore',
    'Asset',
    'Referenceable'],
   'elementTypeVersion': 1,
   'elementTypeDescription': 'A description of a comma separated value (CSV) file',
   'elementSourceServer': 'cocoMDS1',
   'elementOrigin': 'LOCAL_COHORT',
   'elementHomeMetadataCollectionId': '9aea0985-fc43-455d-8446-80dc7e63073a'},
  'guid': 'fc8fc4a7-09ba-403e-aac9-2da4b787bb4a',
  'extendedProperties': {'quoteCharacter': '"',
   'delimiterCharacter': ',',
   'fileType': 'csv'},
  'qualifiedName': 'CSVFile:file://secured/research/clinical-trials/drop-foot/DropFootMeasurementsWeek1.csv',
  'displayName': 'Week 1: Drop Foot Clinical Trial Measurements',
  'description': "One week's data covering foot angle, hip displacement and mobility measurements.",
  'ow

In [283]:
server2GetRelatedAssets1 = server2ConnectedAssetURL + '/assets/' + asset1guid + '/related-assets?elementStart=0&maxElements=50'

print (" ")
print ("GET " + server2GetRelatedAssets1)

getAssetResponse = requests.get(server2GetRelatedAssets1)

getAssetResponse.json()

 
GET http://localhost:8080/servers/cocoMDS2/open-metadata/common-services/asset-consumer/connected-asset/users/erinoverview/assets/fc8fc4a7-09ba-403e-aac9-2da4b787bb4a/related-assets?elementStart=0&maxElements=50


{'class': 'RelatedAssetsResponse',
 'relatedHTTPCode': 200,
 'startingFromElement': 0,
 'list': [{'class': 'RelatedAsset',
   'typeName': 'FileFolder',
   'attributeName': 'homeFolder',
   'relatedAsset': {'class': 'Asset',
    'type': {'class': 'ElementType',
     'elementTypeId': '229ed5cc-de31-45fc-beb4-9919fd247398',
     'elementTypeName': 'FileFolder',
     'elementSuperTypeNames': ['DataStore', 'Asset', 'Referenceable'],
     'elementTypeVersion': 1,
     'elementTypeDescription': 'A description of a folder (directory) in a file system.',
     'elementSourceServer': 'cocoMDS1',
     'elementOrigin': 'LOCAL_COHORT',
     'elementHomeMetadataCollectionId': '9aea0985-fc43-455d-8446-80dc7e63073a'},
    'guid': '74cea331-0378-4c55-af6b-83865639031c',
    'qualifiedName': 'file://secured/research/clinical-trials/drop-foot',
    'displayName': 'secured/research/clinical-trials/drop-foot',
    'owner': 'peterprofile',
    'ownerType': 'USER_ID',
    'zoneMembership': ['quarantine'],
   

## Summary of Exercises 1 and 2

In the first two exercises of this hands-on lab you have shown that two servers with their own repositories can share and extend the metadata contributed by the other.  It began with Peter creating three assets in cocoMDS1.  Erin then connected to cocoMDS2 and she could also see these assets.  Then Erin was able to attach a comment to one of those assets through cocoMDS2 and Peter was then able to respond through cocoMDS1.

Hence this is a truly distributed catalogue.


![Figure 4](../images/distributed-asset-with-comments.png)
> **Figure 4:** Asset and Comments distributed across 2 servers


----
## Exercise 3 - controlling access to assets

In the next exercise we will consider how organizations control the visibility of assets.
Peter and Erin are joined by their colleague Callie Quartile, a data scientist working in the research team.

![Callie Quartile](https://raw.githubusercontent.com/odpi/data-governance/master/docs/coco-pharmaceuticals/personas/callie-quartile.png)

Callie's userId is `calliequartile`.

In [284]:
calliesUserId = 'calliequartile'

Callie has heard that the clinical trial files have arrived.  She is keen to start working on them as there was a delay in receiving the first two weeks' worth of data.

Since Callie works in the research team, she uses the `cocoMDS3` metadata server.  She tries a search for the files.

In [296]:
server3            = "cocoMDS3"
server3PlatformURL = corePlatformURL

server3AssetConsumerURL = server3PlatformURL + '/servers/' + server3 + '/open-metadata/access-services/asset-consumer/users/' + calliesUserId 
server3GetAssetsURL = server3AssetConsumerURL + '/assets/by-search-string?startFrom=0&pageSize=50'


print (" ")
print ("GET " + server3GetAssetsURL)
print ("{ " + newFilesSearchString + " }")
print (" ")

response=requests.post(server3GetAssetsURL, data=newFilesSearchString)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

if response.json().get('guids'):
    if len(response.json().get('guids')) == 1:
        print ("1 asset found")
    else:
        print (str(len(response.json().get('guids'))) + " GUIDs for matching assets found")
else:
    print ("No assets found")

 
GET http://localhost:8080/servers/cocoMDS3/open-metadata/access-services/asset-consumer/users/calliequartile/assets/by-search-string?startFrom=0&pageSize=50
{ .*Drop Foot Clinical Trial Measurements.* }
 
Returns:
{
    "relatedHTTPCode": 200
}
 
No assets found


----
## Bonus material

This final section is an opportunity to dig a little deeper into the workings of Egeria.

The APIs used in the exercises above are from the access services - or Open Metadata Access Services (OMASs) to give them their formal name.  These APIs are domain specific - designed for use by tools, engines and platforms.

Underneath the access services are the repository services (Open Metadata Repository Services (OMRS)) and the platform services (Open Metadata and Governance (OMAG) Server Platform Services).

The repository services manage the exchange of metadata between servers.  The platform services provide a platform for running Egeria servers such as cocoMDS1 and cocoMDS2.


### Repository services

The repository services provide the ability for metadata to be accessed and exchanged from different servers.
Each server that has a repository (store) of metadata is assigned a **metadata collection id**.  This is a unique identifier that is associated with all metadata that originates from that repository.

The command below extracts the metadata collection id for cocoMDS1.

In [285]:
server1RepositoryServicesURL = server1PlatformURL + '/servers/' + server1 + '/open-metadata/repository-services/users/' + adminUserId 
server1MetadataColectionIdQuery = server1RepositoryServicesURL + '/metadata-collection-id'

print (" ")
print ("GET " + server1MetadataColectionIdQuery)

response = requests.get(server1MetadataColectionIdQuery)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

serverStatus = response.json().get('relatedHTTPCode')
if serverStatus == 200:
    cocoMDS1MetadataCollectionId = response.json().get('metadataCollectionId')
    print("Metadata collection id for " + server1 + " is " + cocoMDS1MetadataCollectionId)
else:
    print("Server " + server1 + " is not able to supply a metadata collection id")

 
GET http://localhost:8081/servers/cocoMDS1/open-metadata/repository-services/users/garygeeke/metadata-collection-id
Returns:
{
    "class": "MetadataCollectionIdResponse",
    "relatedHTTPCode": 200,
    "metadataCollectionId": "9aea0985-fc43-455d-8446-80dc7e63073a"
}
 
Metadata collection id for cocoMDS1 is 9aea0985-fc43-455d-8446-80dc7e63073a


----
Now we extract the metadata collection id for cocoMDS2.

In [286]:
server2RepositoryServicesURL = server2PlatformURL + '/servers/' + server2 + '/open-metadata/repository-services/users/' + adminUserId 
server2MetadataColectionIdQuery = server2RepositoryServicesURL + '/metadata-collection-id'

print (" ")
print ("GET " + server2MetadataColectionIdQuery)

response = requests.get(server2MetadataColectionIdQuery)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

serverStatus = response.json().get('relatedHTTPCode')
if serverStatus == 200:
    cocoMDS2MetadataCollectionId = response.json().get('metadataCollectionId')
    print("Metadata collection id for " + server2 + " is " + cocoMDS2MetadataCollectionId)
else:
    print("Server " + server2 + " is not able to supply a metadata collection id")

 
GET http://localhost:8080/servers/cocoMDS2/open-metadata/repository-services/users/garygeeke/metadata-collection-id
Returns:
{
    "class": "MetadataCollectionIdResponse",
    "relatedHTTPCode": 200,
    "metadataCollectionId": "fac57bd7-0c63-4836-8278-d140cf1b7f4d"
}
 
Metadata collection id for cocoMDS2 is fac57bd7-0c63-4836-8278-d140cf1b7f4d


----

The metadata collection id is allocated when the server is first configured.  Once the server starts sharing metadata, the metadata collection id must never change as it is used in the metadata repository to identify where each piece of metadata came from.

The cocoMDS4 server does not have a repository and uses federated queries to retrieve metadata from other servers.

In [288]:
server4            = "cocoMDS4"
server4PlatformURL = dataLakePlatformURL

server4RepositoryServicesURL = server4PlatformURL + '/servers/' + server4 + '/open-metadata/repository-services/users/' + adminUserId 
server4MetadataColectionIdQuery = server4RepositoryServicesURL + '/metadata-collection-id'

print (" ")
print ("GET " + server4MetadataColectionIdQuery)

response = requests.get(server4MetadataColectionIdQuery)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

serverStatus = response.json().get('relatedHTTPCode')
if serverStatus == 200:
    cocoMDS4MetadataCollectionId = response.json().get('metadataCollectionId')
    print("Metadata collection id for " + server4 + " is " + cocoMDS4MetadataCollectionId)
else:
    print("Server " + server4 + " is not able to supply a metadata collection id")

 
GET http://localhost:8081/servers/cocoMDS4/open-metadata/repository-services/users/garygeeke/metadata-collection-id
Returns:
{
    "class": "MetadataCollectionIdResponse",
    "relatedHTTPCode": 503,
    "exceptionClassName": "org.odpi.openmetadata.repositoryservices.ffdc.exception.RepositoryErrorException",
    "exceptionErrorMessage": "OMRS-REST-API-503-001 There is no local repository to support REST API call getMetadataCollectionId",
    "exceptionSystemAction": "The server has received a call on its open metadata repository REST API services but is unable to process it because the local repository is not active.",
    "exceptionUserAction": "Ensure that the open metadata services have been activated in the server. If they are active and the server is supposed to have a local repository, correct the server's configuration document to include a local repository and restart the server."
}
 
Server cocoMDS4 is not able to supply a metadata collection id


----
This result is also a demonstration of the error handling in Egeria. All errors consist of a message, a system action and a user action.
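
A minimal sketch of how a caller might pick these three parts out of a response is shown below.  The function name `describe_error` and the keys of the dictionary it returns are hypothetical; the response field names and the sample error are copied from the cocoMDS4 output above.

```python
def describe_error(response_json):
    """Pick out the three parts of an Egeria error from a REST response.

    Returns None when relatedHTTPCode is 200, that is, there is no error.
    """
    if response_json.get('relatedHTTPCode') == 200:
        return None
    return {
        'code': response_json.get('relatedHTTPCode'),
        'message': response_json.get('exceptionErrorMessage'),
        'systemAction': response_json.get('exceptionSystemAction'),
        'userAction': response_json.get('exceptionUserAction'),
    }


# The error returned by cocoMDS4, copied from the sample output above.
sample_error = {
    "class": "MetadataCollectionIdResponse",
    "relatedHTTPCode": 503,
    "exceptionClassName": "org.odpi.openmetadata.repositoryservices.ffdc.exception.RepositoryErrorException",
    "exceptionErrorMessage": "OMRS-REST-API-503-001 There is no local repository to support REST API call getMetadataCollectionId",
    "exceptionSystemAction": "The server has received a call on its open metadata repository REST API services but is unable to process it because the local repository is not active.",
    "exceptionUserAction": "Ensure that the open metadata services have been activated in the server. If they are active and the server is supposed to have a local repository, correct the server's configuration document to include a local repository and restart the server.",
}

error = describe_error(sample_error)
for part, text in error.items():
    print(part + ": " + str(text))
```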

----
Metadata instances such as the Assets and Comments that you were working with in Exercises 1 and 2 are stored in the repository as entities.  These entities are linked together with relationships (it is a logical graph model).

The command below uses the repository services to retrieve one of the assets created in Exercise 1.

In [289]:
server2AssetEntityQuery = server2RepositoryServicesURL + '/instances/entity/' + asset1guid

print (" ")
print ("GET " + server2AssetEntityQuery)

response = requests.get(server2AssetEntityQuery)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

 
GET http://localhost:8080/servers/cocoMDS2/open-metadata/repository-services/users/garygeeke/instances/entity/fc8fc4a7-09ba-403e-aac9-2da4b787bb4a
Returns:
{
    "class": "EntityDetailResponse",
    "relatedHTTPCode": 200,
    "entity": {
        "class": "EntityDetail",
        "type": {
            "class": "InstanceType",
            "typeDefCategory": "ENTITY_DEF",
            "typeDefGUID": "2ccb2117-9cee-47ca-8150-9b3a543adcec",
            "typeDefName": "CSVFile",
            "typeDefVersion": 1,
            "typeDefDescription": "A description of a comma separated value (CSV) file",
            "typeDefSuperTypes": [
                {
                    "guid": "10752b4a-4b5d-4519-9eae-fdd6d162122f",
                    "name": "DataFile"
                },
                {
                    "guid": "30756d0b-362b-4bfa-a0de-fce6a8f47b47",
                    "name": "DataStore"
                },
                {
                    "guid": "896d14c2-7522-4f6c-8519-7577

The entity includes its type definition and the properties of the asset.  Also notice the metadata collection id for cocoMDS1 around the middle of the structure.
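
Rather than scanning the full JSON by eye, the two values of interest can be extracted with a small helper such as the sketch below.  The function name `entity_summary` is hypothetical, and the sample is a cut-down response rather than real output: it assumes the entity carries `metadataCollectionId` at the same level as `type`, as the relationship output later in this section does.

```python
def entity_summary(response_json):
    """Return (type name, metadata collection id) from an entity response."""
    entity = response_json.get('entity') or {}
    type_name = (entity.get('type') or {}).get('typeDefName')
    return type_name, entity.get('metadataCollectionId')


# Cut-down sample (not real output); the id is the one reported for cocoMDS1.
# Assumption: metadataCollectionId sits directly inside "entity".
sample_asset_response = {
    "class": "EntityDetailResponse",
    "relatedHTTPCode": 200,
    "entity": {
        "class": "EntityDetail",
        "type": {"class": "InstanceType", "typeDefName": "CSVFile"},
        "metadataCollectionId": "9aea0985-fc43-455d-8446-80dc7e63073a",
    },
}

type_name, collection_id = entity_summary(sample_asset_response)
print("Type: " + type_name)
print("Metadata collection id: " + collection_id)
```

In the notebook itself the same helper could be applied to `response.json()` after the entity query has run.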

Contrast the asset entity with the comment that Erin created.  Notice that the type information is different and that the metadata collection id is the one for cocoMDS2.

In [290]:
server2CommentEntityQuery = server2RepositoryServicesURL + '/instances/entity/' + commentGUID

print (" ")
print ("GET " + server2CommentEntityQuery)

response = requests.get(server2CommentEntityQuery)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

 
GET http://localhost:8080/servers/cocoMDS2/open-metadata/repository-services/users/garygeeke/instances/entity/f199724b-f63f-4552-866d-8956626d4ead
Returns:
{
    "class": "EntityDetailResponse",
    "relatedHTTPCode": 200,
    "entity": {
        "class": "EntityDetail",
        "type": {
            "class": "InstanceType",
            "typeDefCategory": "ENTITY_DEF",
            "typeDefGUID": "1a226073-9c84-40e4-a422-fbddb9b84278",
            "typeDefName": "Comment",
            "typeDefVersion": 1,
            "typeDefDescription": "Descriptive feedback or discussion related to an item.",
            "typeDefSuperTypes": [
                {
                    "guid": "a32316b8-dc8c-48c5-b12b-71c1b2a080bf",
                    "name": "Referenceable"
                }
            ],
            "validInstanceProperties": [
                "qualifiedName",
                "additionalProperties",
                "anchorGUID",
                "text",
                "type"
       

----
Finally, consider the relationship between the asset and the comment.  It includes summary information about the two entities (called an **entity proxy**).  This is how it is possible to transmit and even store relationships independently of the entities.

In [291]:
server2AssetRelationshipQuery = server2RepositoryServicesURL + '/instances/entity/' + asset1guid + '/relationships'

print (" ")
print ("POST " + server2AssetRelationshipQuery)

# Paging values are sent as integers, matching the offset and pageSize echoed in the response.
relationshipRequestBody = {
    "class": "TypeLimitedFindRequest",
    "offset": 0,
    "pageSize": 100
}
response = requests.post(server2AssetRelationshipQuery, json=relationshipRequestBody, headers=jsonHeader)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")



 
POST http://localhost:8080/servers/cocoMDS2/open-metadata/repository-services/users/garygeeke/instances/entity/fc8fc4a7-09ba-403e-aac9-2da4b787bb4a/relationships
Returns:
{
    "class": "RelationshipListResponse",
    "relatedHTTPCode": 200,
    "offset": 0,
    "pageSize": 100,
    "relationships": [
        {
            "class": "Relationship",
            "type": {
                "class": "InstanceType",
                "typeDefCategory": "RELATIONSHIP_DEF",
                "typeDefGUID": "0d90501b-bf29-4621-a207-0c8c953bdac9",
                "typeDefName": "AttachedComment",
                "typeDefVersion": 1,
                "typeDefDescription": "Links a comment to an item, or another comment.",
                "validInstanceProperties": [
                    "isPublic"
                ]
            },
            "instanceProvenanceType": "LOCAL_COHORT",
            "metadataCollectionId": "fac57bd7-0c63-4836-8278-d140cf1b7f4d",
            "metadataCollectionName": "cocoM

Which server was the relationship created in?
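
One way to check your answer is to look up each relationship's metadata collection id against the ids retrieved earlier.  The sketch below is illustrative only: the function name `relationship_origins` and the lookup dictionary are hypothetical, and the sample is a cut-down version of the response above that keeps only the fields the function reads.

```python
def relationship_origins(response_json, known_collections):
    """Return (relationship type name, originating server) pairs."""
    results = []
    for relationship in response_json.get('relationships') or []:
        type_name = (relationship.get('type') or {}).get('typeDefName')
        collection_id = relationship.get('metadataCollectionId')
        # Ids that are not in the lookup are reported as 'unknown'.
        results.append((type_name, known_collections.get(collection_id, 'unknown')))
    return results


# Metadata collection ids reported earlier for the two repositories.
known_collections = {
    "9aea0985-fc43-455d-8446-80dc7e63073a": "cocoMDS1",
    "fac57bd7-0c63-4836-8278-d140cf1b7f4d": "cocoMDS2",
}

# Cut-down copy of the relationship response shown above.
sample_relationships = {
    "class": "RelationshipListResponse",
    "relatedHTTPCode": 200,
    "relationships": [
        {
            "class": "Relationship",
            "type": {"typeDefName": "AttachedComment"},
            "instanceProvenanceType": "LOCAL_COHORT",
            "metadataCollectionId": "fac57bd7-0c63-4836-8278-d140cf1b7f4d",
        }
    ],
}

for type_name, server_name in relationship_origins(sample_relationships, known_collections):
    print(type_name + " originated in " + server_name)
```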

----
#### Open Metadata Cohorts

The metadata exchange between the servers is a peer-to-peer protocol.  Each server registers with one or more open metadata cohorts.  

Figure 4 shows which metadata servers belong to each cohort.

![Figure 4](../images/coco-pharmaceuticals-systems-metadata-servers.png)
> **Figure 4:** Membership of Coco Pharmaceuticals' cohorts

----
The command below queries cocoMDS2's view of the cohorts.

In [292]:
server2cohortURLcore =  server2RepositoryServicesURL + '/metadata-highway'

import pprint
import json

print (" ")
print ("Querying cohorts for " + server2 + " ...")
url = server2cohortURLcore + '/cohort-descriptions'
print ("GET " + url)

response = requests.get(url)

print (" ")

serverStatus = response.json().get('relatedHTTPCode')
if serverStatus == 200:
    # Loop over however many cohorts are returned rather than assuming exactly three.
    cohorts = response.json().get('cohorts') or []
    for position, cohort in enumerate(cohorts, start=1):
        print("Cohort " + str(position) + " for " + server2 + " is " + str(cohort.get('cohortName')))
else:
    prettyResponse = json.dumps(response.json(), indent=4)
    print (prettyResponse)
    print (" ")

 
Querying cohorts for cocoMDS2 ...
GET http://localhost:8080/servers/cocoMDS2/open-metadata/repository-services/users/garygeeke/metadata-highway/cohort-descriptions
 
Cohort 1 for cocoMDS2 is cocoCohort
Cohort 2 for cocoMDS2 is devCohort
Cohort 3 for cocoMDS2 is iotCohort


----
There are more examples and explanations of how cohorts work in the **Understanding Cohorts** notebook.


----
### Metadata security

Security of metadata is extremely important.  Egeria has multiple levels of security so that access to individual metadata instances can be controlled.  The command below is a simple test of what happens when an unauthorized user tries to access one of Coco Pharmaceuticals' metadata servers.


In [293]:
unauthorizedUserQuery = server2PlatformURL + '/servers/' + server2 + '/open-metadata/repository-services/users/evilEdna/metadata-collection-id'

print (" ")
print ("GET " + unauthorizedUserQuery)

response = requests.get(unauthorizedUserQuery)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

 
GET http://localhost:8080/servers/cocoMDS2/open-metadata/repository-services/users/evilEdna/metadata-collection-id
Returns:
{
    "class": "MetadataCollectionIdResponse",
    "relatedHTTPCode": 403,
    "exceptionClassName": "org.odpi.openmetadata.repositoryservices.ffdc.exception.UserNotAuthorizedException",
    "exceptionErrorMessage": "OMAG-PLATFORM-SECURITY-403-002 User evilEdna is not authorized to issue a request to server cocoMDS2",
    "exceptionSystemAction": "The system is unable to process a request from the user because they do not have access to the necessary services and/or resources.",
    "exceptionUserAction": "The request fails with a UserNotAuthorizedException exception."
}
 


----
### Platform services

The platform services are for the infrastructure team running an Egeria service.  In the case of a cloud service, this may be a different organization from the metadata owners.  As a result, the users able to work with the platform services are kept separate from those able to work with the access and repository services.

This first command queries the servers running on a platform.

In [294]:
corePlatformServices = corePlatformURL + '/open-metadata/platform-services/users/' + adminUserId + '/server-platform'
corePlatformServers  = corePlatformServices + '/servers'

print (" ")
print ("CorePlatform's Servers ")
print ("GET " + corePlatformServers)

response = requests.get(corePlatformServers)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

dataLakePlatformServices = dataLakePlatformURL + '/open-metadata/platform-services/users/' + adminUserId + '/server-platform'
dataLakePlatformServers  = dataLakePlatformServices + '/servers'

print (" ")
print ("DataLakePlatform's Servers ")
print ("GET " + dataLakePlatformServers)

response = requests.get(dataLakePlatformServers)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

 
CorePlatform's Servers 
GET http://localhost:8080/open-metadata/platform-services/users/garygeeke/server-platform/servers
Returns:
{
    "relatedHTTPCode": 200,
    "serverList": [
        "cocoMDS2",
        "cocoMDS3"
    ]
}
 
 
DataLakePlatform's Servers 
GET http://localhost:8081/open-metadata/platform-services/users/garygeeke/server-platform/servers
Returns:
{
    "relatedHTTPCode": 200,
    "serverList": [
        "cocoMDS1",
        "cocoMDS4"
    ]
}
 


----
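Because servers can be moved between platforms, it can be handy to turn these server lists into a lookup from server name to hosting platform.  The sketch below shows one possible approach; the function name `locate_servers` is hypothetical, and the sample data is copied from the two responses above.

```python
def locate_servers(server_lists_by_platform):
    """Map each server name to the URL of the platform that hosts it."""
    locations = {}
    for platform_url, response_json in server_lists_by_platform.items():
        for server_name in response_json.get('serverList') or []:
            locations[server_name] = platform_url
    return locations


# Server lists copied from the sample output above.
sample_server_lists = {
    "http://localhost:8080": {"relatedHTTPCode": 200, "serverList": ["cocoMDS2", "cocoMDS3"]},
    "http://localhost:8081": {"relatedHTTPCode": 200, "serverList": ["cocoMDS1", "cocoMDS4"]},
}

locations = locate_servers(sample_server_lists)
for server_name in sorted(locations):
    print(server_name + " is hosted on " + locations[server_name])
```

----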
This last command queries the services that are active on cocoMDS1.

In [295]:
server1Services = dataLakePlatformServices + '/servers/' + server1 + '/services'

print (" ")
print (server1 + " services ")
print ("GET " + server1Services)

response = requests.get(server1Services)

print ("Returns:")
prettyResponse = json.dumps(response.json(), indent=4)
print (prettyResponse)
print (" ")

 
cocoMDS1 services 
GET http://localhost:8081/open-metadata/platform-services/users/garygeeke/server-platform/servers/cocoMDS1/services
Returns:
{
    "relatedHTTPCode": 200,
    "serverName": "cocoMDS1",
    "serverServicesList": [
        "Open Metadata Repository Services (OMRS)",
        "Connected Asset Services",
        "Data Engine",
        "Community Profile OMAS",
        "Asset Consumer OMAS",
        "OMAG Server Operational Services",
        "Discovery Engine OMAS",
        "Glossary View",
        "Data Platform OMAS",
        "Asset Owner OMAS"
    ]
}
 


----