## Adding metadata to subjects and samples
*Latest version 30 June 2022*

With this notebook you can add or change metadata (defined in an Excel file) on existing subject and sample instances in the KGE.

**Note:** The script will overwrite whatever is currently in the KGE. Only specify what you want to add or change!
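To illustrate the note above, here is a minimal sketch of how empty cells can be left out of what is sent, so that only the filled-in properties are overwritten. The helper names `is_empty` and `fields_to_send` are hypothetical and not part of this notebook. The notebook itself relies on `pd.isnull`; treating blank strings as empty is an extra assumption made here.

```python
import math


def is_empty(value):
    """True for None, NaN and blank strings (blank strings are an assumption)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def fields_to_send(row):
    """Keep only the cells that hold a value; 'atid' only selects the instance."""
    return {key: value for key, value in row.items()
            if key != "atid" and not is_empty(value)}


# One row of the Excel sheet, represented as a plain dictionary
row = {"atid": "<UUID>", "lookupLabel": "sub-01",
       "internalIdentifier": float("nan"), "biologicalSex": None}
print(fields_to_send(row))  # {'lookupLabel': 'sub-01'}
```

In this example only `lookupLabel` would be changed; the other properties of the instance are not part of the request.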

Currently the following metadata properties are supported:
For subject groups, subjects, tissue sample collections and tissue samples:
- lookup label
- internal identifier
- biological sex

For states of the specimen:
- age category
- age (either a range or a discrete number)
- weight (either a range or a discrete number)
- subject attribute (one or more, separated by commas)

For age and weight, both the value and the unit need to be given. Ranges are separated by a dash (e.g. 200-300).
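As a rough sketch of the range convention, the hypothetical helper `parse_quantity` below (not part of this notebook) splits a cell value on the dash. It converts to `float` to keep decimals, whereas the notebook's own code converts to `int`; that difference is a choice made for this illustration only.

```python
def parse_quantity(raw):
    """Return min/max for a range such as '200-300', or a single value otherwise."""
    text = str(raw).strip()
    if "-" in text:
        # Split once on the dash and strip spaces around both bounds
        low, high = (part.strip() for part in text.split("-", 1))
        return {"minValue": float(low), "maxValue": float(high)}
    return {"value": float(text)}


print(parse_quantity("200-300"))  # {'minValue': 200.0, 'maxValue': 300.0}
print(parse_quantity(12))         # {'value': 12.0}
```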

To run the script, you need to meet the following requirements:
- Python version >= 3.6
- read and write permission to the KG via the API

In [None]:
# import relevant packages
from getpass import getpass
import requests
import os
import pandas as pd

### Authentication

To interact with the API, you need an access token. Copy your token from the Knowledge Graph Editor or Query Builder (if you do not have access, request it via support@ebrains.eu).

If your token has expired, rerun the cell below.

In [None]:
token = getpass(prompt='Please paste your token: ')

In [None]:
# Place the script in the same folder as the Excel file, or define the location of the file
cwd = os.getcwd()

answer = ""
while answer not in ["y", "n"]: 
    answer = input(f"Is this where your files are stored: {cwd}? yes (y) or no (n) " ) 
    if answer == "y":
        fpath = cwd
        break
    elif answer == "n":
        fpath = input("Please define your path: ")
        break


## Select file and patch instances

Make an Excel file (.xlsx) with the metadata you want to add to the instances, as in the example below. To ensure that the correct instance is adjusted, fill in the UUID of the instance in the "atid" column. Please use the correct spelling and capitalisation for metadata that are controlled terms (for more information, look at the specimen template in the MetaBot package or online at [https://humanbrainproject.github.io/openMINDS/v3/](https://humanbrainproject.github.io/openMINDS/v3/)).

**Example**
</br>
<div>
<img src="img/examplePatch.png"/>
</div>
</br>
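Before patching, it may help to check the column headers of the file. The sketch below is a hypothetical helper (`check_columns` is not part of this notebook). The set of column names is taken from the names the patch function further down looks for; columns with other names are silently ignored by that function.

```python
# Column names that the patch function in this notebook reads
SUPPORTED_COLUMNS = {
    "atid", "lookupLabel", "internalIdentifier", "biologicalSex",
    "ageCategory", "ageValue", "ageUnit", "weightValue", "weightUnit",
    "attribute",
}


def check_columns(columns):
    """Return a list of human-readable problems found in the column headers."""
    columns = list(columns)
    problems = []
    if "atid" not in columns:
        problems.append("missing required column 'atid'")
    for name in columns:
        if name not in SUPPORTED_COLUMNS:
            problems.append(f"unrecognised column '{name}' (will be ignored)")
    # A value column without its unit column cannot be patched
    for value_col, unit_col in (("ageValue", "ageUnit"), ("weightValue", "weightUnit")):
        if value_col in columns and unit_col not in columns:
            problems.append(f"'{value_col}' needs a matching '{unit_col}' column")
    return problems


print(check_columns(["atid", "lookupLabel", "ageValue", "ageUnit"]))  # []
print(check_columns(["atid", "lookuplabel"]))
```

With the data frame loaded in the next cell, this could be called as `check_columns(data.columns)`.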

In [None]:
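fileInfo = input("What is the name of the file (without the .xlsx extension) with the information that needs to be patched? ")
# Read from the folder chosen above (fpath), which may differ from the working directory
data = pd.read_excel(os.path.join(fpath, fileInfo + ".xlsx"))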

In [None]:
# Function to patch and upload the instances to the KGE
def patchANDupload(df, token, space_name, response):
    """
    Patch one instance in the KGE with the metadata given in a one-row data frame.

    Parameters
    ----------
    df : pandas.DataFrame
        Single-row data frame holding the UUID ("atid") of the instance and
        the metadata that need to be added to or changed on that instance
    token : string
        Authorisation token to get access to the KGE
    space_name : string
        Space that the instance needs to be uploaded to, e.g. "dataset", "common", etc.
    response : dictionary
        Responses collected so far, keyed by UUID

    Returns
    -------
    response : dictionary
        For each UUID, the response that indicates whether the upload
        was successful

    """
    
    hed = {"accept": "*/*",
           "Authorization": "Bearer " + token,
           "Content-Type": "application/json"
           }
    
   
    
    # Prefix to upload to the right space
    url = "https://core.kg.ebrains.eu/v3-beta/instances/{}?space=" + space_name
    
    # Select the UUID of the instance that requires changing
    atid = df.atid[0]

    instance = []
    instance.append(("@context", {"@vocab": "https://openminds.ebrains.eu/vocab/"}))

    if 'lookupLabel' in df.columns:
        if not pd.isnull(df.lookupLabel[0]):
            instance.append(("lookupLabel", str(df.lookupLabel[0])))

    if 'internalIdentifier' in df.columns:
        if not pd.isnull(df.internalIdentifier[0]):
            instance.append(("internalIdentifier", str(df.internalIdentifier[0])))

    if 'biologicalSex' in df.columns:
        if not pd.isnull(df.biologicalSex[0]):
            sex = [{"@id" : "https://openminds.ebrains.eu/instances/biologicalSex/" +  str(df.biologicalSex[0])}]
            instance.append(("biologicalSex", sex))

    if 'ageCategory' in df.columns:
        if not pd.isnull(df.ageCategory[0]):
            ageCategory = [{"@id" : "https://openminds.ebrains.eu/instances/ageCategory/" +  str(df.ageCategory[0])}]
            instance.append(("ageCategory", ageCategory))

    if 'ageValue' in df.columns:
        if not pd.isnull(df.ageValue[0]) and not pd.isnull(df.ageUnit[0]):
            if str(df.ageValue[0]).find("-") != -1:
                ages = df.ageValue[0].split("-")
                age = [{"@type" : "https://openminds.ebrains.eu/core/QuantitativeValueRange",
                            "minValueUnit" : {"@id": "https://openminds.ebrains.eu/instances/unitOfMeasurement/" + str(df.ageUnit[0])},
                            "maxValueUnit" : {"@id": "https://openminds.ebrains.eu/instances/unitOfMeasurement/" + str(df.ageUnit[0])}, 
                            "minValue" : int(ages[0].strip()),
                            "maxValue" : int(ages[1].strip())
                        }]
            else:
                age = [{"@type" : "https://openminds.ebrains.eu/core/QuantitativeValue",
                            "unit" : {"@id": "https://openminds.ebrains.eu/instances/unitOfMeasurement/" + str(df.ageUnit[0])}, 
                            "value" : int(df.ageValue[0])
                        }]
        
            instance.append(("age", age))

    if 'weightValue' in df.columns:
        if not pd.isnull(df.weightValue[0]) and not pd.isnull(df.weightUnit[0]):
            if str(df.weightValue[0]).find("-") != -1:
                weights = df.weightValue[0].split("-")
                weight = [{"@type" : "https://openminds.ebrains.eu/core/QuantitativeValueRange",
                            "minValueUnit" : {"@id": "https://openminds.ebrains.eu/instances/unitOfMeasurement/" + str(df.weightUnit[0])},
                            "maxValueUnit" : {"@id": "https://openminds.ebrains.eu/instances/unitOfMeasurement/" + str(df.weightUnit[0])}, 
                            "minValue" : int(weights[0].strip()),
                            "maxValue" : int(weights[1].strip())
                        }]
            else:
                weight = [{"@type" : "https://openminds.ebrains.eu/core/QuantitativeValue",
                            "unit" : {"@id": "https://openminds.ebrains.eu/instances/unitOfMeasurement/" + str(df.weightUnit[0])}, 
                            "value" : int(df.weightValue[0])
                            }]

            instance.append(("weight", weight))


    if 'attribute' in df.columns:
        if not pd.isnull(df.attribute[0]):
            # Split on commas; a single attribute gives a one-element list
            attribute = [{"@id": "https://openminds.ebrains.eu/instances/subjectAttribute/" + part.strip()}
                         for part in str(df.attribute[0]).split(",")]
        
            instance.append(("attribute", attribute))

    # convert the list of (key, value) pairs to a dictionary
    instance_dict = dict(instance)

    print("Patching instance", atid)
    response[atid] = requests.patch(url.format(atid), json=instance_dict, headers=hed)
    if response[atid].status_code == 200:
        print(response[atid].status_code, "OK!" )
    elif response[atid].status_code == 401:
        print(response[atid], "Token not valid, authorisation unsuccessful")
    elif response[atid].status_code == 404:
        print(response[atid], "Instance not found")
    else:
        print(response[atid])
        
        
    return response  

### Run the cell below to make the changes in the KGE

You will be notified whether the change was successful. If you get the response that the token was not valid, refresh the token and run the authentication cell again before rerunning the cell below.

In [None]:
response = {}
space_name="dataset"
for idx in range(len(data)):
    df = data.iloc[idx:idx+1].reset_index(drop=True)
    response = patchANDupload(df, token, space_name, response)
    print(f"instance[{idx + 1}/{len(data)}]")

print("Done!")
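
Once the loop has finished, the collected `response` dictionary can be inspected to find the instances that were not patched. The sketch below is one possible way to do that; `summarise` is a hypothetical helper, and the stand-in objects only mimic the `status_code` attribute of a real response so that the example runs without contacting the API.

```python
from collections import Counter
from types import SimpleNamespace


def summarise(responses):
    """Count responses per status code and list the UUIDs that did not return 200."""
    counts = Counter(r.status_code for r in responses.values())
    failed = [uuid for uuid, r in responses.items() if r.status_code != 200]
    return counts, failed


# Stand-in objects instead of real responses (illustration only)
demo = {
    "uuid-a": SimpleNamespace(status_code=200),
    "uuid-b": SimpleNamespace(status_code=404),
}
counts, failed = summarise(demo)
print(dict(counts))  # {200: 1, 404: 1}
print(failed)        # ['uuid-b']
```

In the notebook, `summarise(response)` would give the same overview for the real upload.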