## Creating service links and linking them to file bundles

With this notebook you can do the following:
1. Query file bundles via the API
2. Create URL instances in the KGE
3. Create service links and link the correct file bundle to the correct URL instance
4. Post the newly created instances to the KGE

To run the script, you need the following:
- Python version >= 3.6
- openMINDS package (can be downloaded from https://pypi.org/project/openMINDS/)
- read and write permission to the KG via the API

Information about the URL links should be stored in a .csv file. **The file may contain additional columns; they are not used and do not affect the script.** The following columns are required, with names spelled exactly as shown:
- subjectName
- sampleName
- fileBundle
- URL_link

The subject and sample names are used to generate a label for the service link, which can be found under the "view data" tab on the dataset card. The file bundle name is used to find the UUID of the file bundle in the KG so that it can be linked to the service link.
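
The column requirements above can be checked before any instances are created. The following is a minimal sketch, not part of the original script: `missing_columns` is a hypothetical helper, and the example file content is made up for illustration.

```python
import csv
import io

# Column names the script relies on (taken from the list above)
REQUIRED_COLUMNS = ["subjectName", "sampleName", "fileBundle", "URL_link"]

def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]

# Hypothetical file content, made up for illustration only.
# The extra "comment" column is allowed and simply ignored.
example_csv = io.StringIO(
    "subjectName,sampleName,fileBundle,URL_link,comment\n"
    "sub-01,sample-01,bundle-01,https://example.org/viewer,not used\n"
)
header = next(csv.reader(example_csv))
print("Missing columns:", missing_columns(header))
```

Once the file has been loaded with pandas, the same check could be run as `missing_columns(df.columns)`.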

In [None]:
# import relevant packages
from getpass import getpass
import requests
import os
import json
import glob
import pandas as pd
import openMINDS
import openMINDS.version_manager

### Authentication

To interact with the API, you need an access token. To request a token, follow this link: https://nexus-iam.humanbrainproject.org/v0/oauth2/authorize or copy your token from the Knowledge Graph Editor (if you have access).

In [1]:
token = getpass(prompt='Please paste your token: ')

### Extract file bundle metadata from Knowledge Graph v3

We use a saved query to extract the file bundles that have already been generated in the Knowledge Graph Editor.

In [1]:
headers = {"accept": "*/*",
        "Authorization": "Bearer " + token
        }

url = "https://core.kg.ebrains.eu/v3-beta/queries/025339f7-10f0-407f-8106-bd839aab9677/instances?stage=IN_PROGRESS"

# Query results
resp = requests.get(url, headers=headers)
fb_info = resp.json()
file_list = fb_info['data']
print('\nNumber of file bundles found: ' + str(len(file_list)) + "\n")
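
Later cells look up each file bundle by its `name` and take the UUID from the last segment of its `id`. The sketch below illustrates both steps on a made-up payload; the helper names and the example identifiers are hypothetical, and only the `data`, `name` and `id` keys are taken from the script.

```python
def index_file_bundles(payload):
    """Map each file bundle name to its KG identifier."""
    return {bundle["name"]: bundle["id"] for bundle in payload.get("data", [])}

def uuid_from_id(instance_id):
    """Return the last path segment of an identifier, i.e. the UUID."""
    return instance_id.split("/")[-1]

# Hypothetical query result, shaped like the 'data' list used in this notebook.
example_payload = {
    "data": [
        {"name": "bundle-01",
         "id": "https://kg.ebrains.eu/api/instances/00000000-0000-0000-0000-000000000001"},
        {"name": "bundle-02",
         "id": "https://kg.ebrains.eu/api/instances/00000000-0000-0000-0000-000000000002"},
    ]
}

bundle_ids = index_file_bundles(example_payload)
print(uuid_from_id(bundle_ids["bundle-01"]))
```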

### Import information about service links

As a next step we will import the csv file with the URLs in it.

In [None]:
# Place the script in the same folder as the csv file, or define the location of the files
cwd = os.getcwd()
answer = input("Is this where your files are stored: " + cwd + "? yes (y) or no (n) ")

# Any answer other than "y" prompts for a path, so fpath is always defined
if answer == "y":
    fpath = cwd
else:
    fpath = input("Please define your path: ")

# Append a trailing path separator in a platform-independent way
fpath = os.path.join(fpath, "")
os.chdir(fpath)

kg_prefix = "https://kg.ebrains.eu/api/instances/"

# Load information for the service links
filename = input("What is the name of the service link file (e.g. servicelinks.csv)? ")
df = pd.read_csv(filename)

### Create instances for the URL and service links

In [None]:
# Function to create URL instances and service link instances
def createInstances(df, file_list): 
    """
    
    Parameters
    ----------
    df : pandas DataFrame
        DataFrame with information to create URL and service link instances
    file_list : List
        List of file bundles extracted from the KGE.

    Returns
    -------
    data : pandas DataFrame
        Overview of all information and newly created instances.

    """
    
    # Ask which service the links should be opened in.
    answer = input("Should the link be opened in 1) LocaliZoom or 2) Siibra-explorer: ")
    if answer == "1":
        service_atid = "https://openminds.ebrains.eu/instances/service/LocaliZoom"
    elif answer == "2":
        service_atid = "https://openminds.ebrains.eu/instances/service/siibraExplorer"
    else:
        # Without this branch, service_atid would be undefined further down
        raise ValueError("Please answer 1 or 2.")
    
    fileBundles = df.fileBundle.unique()
    link_dict = {}
    url_dict = {}
    data = pd.DataFrame([])
    for file in file_list:
        
        name_file = file["name"]
            
        # Check if a file bundle exists for the subject
        if name_file in fileBundles:
            name_sub = df.subjectName[df.index[df.fileBundle == name_file][0]]
            tsc_name = df.sampleName[df.index[df.fileBundle == name_file][0]]
            
            print("Creating URL for subject " + str(name_sub) + " tissue sample collection " + str(tsc_name) + "\n")
            
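            # initiate the collection into which you will store all metadata instances
            # NOTE: `helper` is not defined in this notebook; it is assumed to be an
            # openMINDS helper object created in an earlier setup step.
            mycol = helper.create_collection()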
            
            # Create URL link 
            url_dict[name_file] = mycol.add_core_URL(URL = df.URL_link[df.index[df.fileBundle == name_file][0]])
            
            print("Creating service link for subject " + str(name_sub) + " file bundle " + str(name_file) + "\n")
        
            # Create Service link    
            link_dict[name_file] = mycol.add_core_serviceLink(
                dataLocation = [{"@id": file["id"]}],
                openDataIn = [{"@id": kg_prefix + url_dict[name_file].split("/")[-1]}],
                service = [{"@id": service_atid}]) 
            if  pd.isnull(name_sub) and pd.isnull(tsc_name):
                label = "tissue sample collection (subject " + str(name_file) + ")"
            elif name_sub == tsc_name:
                label = "tissue sample collection (subject " + str(name_sub) + ")"
            else:
                label = "tissue sample collection " + str(tsc_name) + " (subject " + str(name_sub) + ")"
            mycol.get(link_dict[name_file]).name = label
        
            # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
            data = pd.concat([data, pd.DataFrame({"subject_name" : name_sub,
                        "tsc_name" : tsc_name,
                        "fileBundle_name" : name_file,  
                        "URL_link" : df.URL_link[df.index[df.fileBundle == name_file][0]],
                        "URL_uuid" : url_dict[name_file].split("/")[-1],
                        "ServiceLink_uuid" : link_dict[name_file].split("/")[-1],
                        "ServiceLink_dataLocation_uuid" : file["id"].split("/")[-1],
                        "ServiceLink_name" : label,
                        "ServiceLink_service_atid" : service_atid,
                        "DescendedFrom_name" : file["descendedFrom"][0]["lookupLabel"],
                        "DescendedFrom_atid" : file["descendedFrom"][0]["id"]},                
                               index=[0])], ignore_index=True)
        
            mycol.save(".\\")  
            
    # Return only after all file bundles have been processed
    return data

In [None]:
# Create instances and save them    
data = createInstances(df, file_list)
data.to_csv('serviceLinksInstances.csv', index = False, header=True)
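
The label rules inside `createInstances` can be read in isolation as a small pure function. This is a sketch under stated assumptions: `make_label` and `is_missing` are hypothetical names, and `is_missing` is a simplified stand-in for `pd.isnull` that only handles `None` and float NaN.

```python
import math

def is_missing(value):
    """Simplified stand-in for pd.isnull on scalars: True for None and float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def make_label(subject_name, sample_name, bundle_name):
    """Build the service link label using the same three cases as createInstances."""
    if is_missing(subject_name) and is_missing(sample_name):
        # No subject or sample name: fall back to the file bundle name
        return "tissue sample collection (subject " + str(bundle_name) + ")"
    if subject_name == sample_name:
        # Identical names: mention the subject only once
        return "tissue sample collection (subject " + str(subject_name) + ")"
    return ("tissue sample collection " + str(sample_name)
            + " (subject " + str(subject_name) + ")")

# Hypothetical names, made up for illustration
print(make_label("sub-01", "sample-01", "bundle-01"))
print(make_label("sub-01", "sub-01", "bundle-01"))
print(make_label(float("nan"), None, "bundle-01"))
```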

### Upload instances to the Knowledge Graph editor

In [None]:
# Function to upload the instances to the KGE
def upload(instances_fnames, token, space_name):
    """
    
    Parameters
    ----------
    instances_fnames : List 
        list of file paths to instances that need to be uploaded
    token : string
        Authorisation token to get access to the KGE
    space_name : string
        Space that the instances need to be uploaded to, e.g. "dataset", "common", etc.

    Returns
    -------
    response : dictionary
        For each UUID, the response that indicates whether the upload
        was successful

    """
    
    hed = {"accept": "*/*",
           "Authorization": "Bearer " + token,
           "Content-Type": "application/json"
           }
    
    # Prefix to upload to the right space
    url = "https://core.kg.ebrains.eu/v3-beta/instances/{}?space=" + space_name
    kg_prefix = "https://kg.ebrains.eu/api/instances/"
    
    new_instances = []
    for fname in instances_fnames:
        # The with-statement closes the file automatically
        with open(fname, 'r') as f:
            new_instances.append(json.load(f))
    
    # Correct the capitalisation in the openMINDS package
    for instance in new_instances:
        atid = kg_prefix + instance["@id"].split("/")[-1] #only take the UUID 
        instance["@id"] = atid
        if "openDataIn" in instance.keys():
            atid = kg_prefix + instance["openDataIn"][0]["@id"].split("/")[-1] #only take the UUID 
            instance["openDataIn"][0]["@id"] = atid
        if instance["@type"].endswith("Servicelink"):
            splittype = instance["@type"].split("/")[:-1]
            splittype.append("ServiceLink")
            instance["@type"] = "/".join(splittype)
        if instance["@type"].endswith("Url"):
            splittype = instance["@type"].split("/")[:-1]
            splittype.append("URL")
            instance["@type"] = "/".join(splittype)
    
    # Upload to the KGE
    print("\nUploading instances now:\n")
    
    count = 0
    response = {}    
    for instance in new_instances:
        count += 1
        print("Posting instance " + str(count)+"/"+str(len(new_instances)))
        atid = instance["@id"].split("/")[-1] 
        response[atid] = requests.post(url.format(atid), json=instance, headers=hed)
        # Compare the numeric status code, not the Response object itself
        status = response[atid].status_code
        if status == 200:
            print(response[atid], "OK!" )
        elif status == 409:
            print(response[atid], "Instance already exists")
        elif status == 401:
            print(response[atid], "Token not valid, authorisation not successful")
        else:
            print(response[atid])
        
        
    return response  
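
The correction step in `upload` rewrites the identifiers and the type capitalisation of each instance before posting. The sketch below shows that transformation on a made-up instance. `normalise_instance` is a hypothetical helper, the example identifiers and type URL are invented, and unlike the original `endswith` checks it matches the last path segment of `@type` exactly.

```python
import copy

KG_PREFIX = "https://kg.ebrains.eu/api/instances/"
# Type names as written by the package, mapped to the capitalisation the KG expects
TYPE_FIXES = {"Servicelink": "ServiceLink", "Url": "URL"}

def normalise_instance(instance):
    """Return a corrected copy of an instance; the input is left unchanged."""
    fixed = copy.deepcopy(instance)
    # Keep only the UUID and put the KG prefix in front of it
    fixed["@id"] = KG_PREFIX + fixed["@id"].split("/")[-1]
    if "openDataIn" in fixed:
        target = fixed["openDataIn"][0]
        target["@id"] = KG_PREFIX + target["@id"].split("/")[-1]
    # Correct the capitalisation of the last segment of the type
    parts = fixed["@type"].split("/")
    parts[-1] = TYPE_FIXES.get(parts[-1], parts[-1])
    fixed["@type"] = "/".join(parts)
    return fixed

# Hypothetical instance, made up for illustration
example = {
    "@id": "https://localhost/servicelink/1111",
    "@type": "https://openminds.ebrains.eu/core/Servicelink",
    "openDataIn": [{"@id": "https://localhost/url/2222"}],
}
fixed = normalise_instance(example)
print(fixed["@id"])
print(fixed["@type"])
```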

In [None]:
# Upload instances to the KGE
answer = input("Would you like to upload the instances you created to the KGE? yes (y) or no (n) " ) 

if answer == "y":
    # Collect the files one folder level below fpath, independent of the platform
    instances_fnames = glob.glob(os.path.join(fpath, "*", "*"), recursive = True)

    print("\nUploading data now:\n")
    
    if token != "":
        response = upload(instances_fnames, token, space_name = "dataset")  
        
elif answer == "n":
    print("\nDone!")
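
After an upload it can help to summarise the outcome per status code. The following is a minimal sketch that works on plain status codes rather than on response objects; `describe_status` and `summarise` are hypothetical helpers, the messages are the ones printed by `upload`, and the UUIDs are made up.

```python
from collections import Counter

# Messages for the status codes that upload() distinguishes
STATUS_MESSAGES = {
    200: "OK!",
    409: "Instance already exists",
    401: "Token not valid, authorisation not successful",
}

def describe_status(status_code):
    """Translate a status code into a short human-readable message."""
    return STATUS_MESSAGES.get(status_code, "Unexpected status " + str(status_code))

def summarise(status_by_uuid):
    """Count how many uploads ended with each message."""
    return Counter(describe_status(code) for code in status_by_uuid.values())

# Hypothetical outcome of three uploads
example_statuses = {"uuid-1": 200, "uuid-2": 200, "uuid-3": 409}
summary = summarise(example_statuses)
for message, count in summary.items():
    print(message, count)
```

With the dictionary returned by `upload`, the input could be built as `{uuid: r.status_code for uuid, r in response.items()}`.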