#### Copyright IBM Inc. All Rights Reserved.
#### SPDX-License-Identifier: Apache-2.0

# ST4SD Property retrieval


This notebook demonstrates
* Authenticating to the `st4sd-runtime-service` REST-API
* Querying available experiments
* Submitting a `band-gap-pm3-gamess-us` experiment
* Querying status of `band-gap-pm3-gamess-us` instance
* Retrieving top-level output results from the instance

For information on how to retrieve detailed information via the `st4sd-datastore`, see the notebook `ST4SD Datastore - Common Query Examples`, located in the same repository as this notebook.


## Notes

* See https://st4sd.github.io/overview/api-docs/openapi/st4sd-runtime-service/st4sd-runtime-service.openapi.html for the full API specification
* For details on the **band-gap-gamess** experiments, see https://github.com/st4sd/band-gap-gamess
   
## Terminology
- **experiment**: Refers to the *definition* of a particular workflow e.g. the `band-gap-pm3-gamess-us` experiment
- **instance**: Refers to a particular execution of an experiment

## Setup

In [None]:
from __future__ import print_function
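from urllib.error import HTTPError

import experiment.service.db
import experiment.service.errors
# urllib.request is a submodule, so import it explicitly instead of relying on `import urllib`
import urllib.request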
import pprint
import logging
import pandas as pd
import time
import datetime
import json

import ipywidgets as widgets
from IPython.display import display
from IPython.display import clear_output

def pretty_json(what):
    return json.dumps(what, indent=2, sort_keys=True)

logging.basicConfig(format='%(levelname)-9s %(name)-15s: %(funcName)-20s %(asctime)-15s: %(message)s')
root = logging.getLogger()
root.setLevel(logging.CRITICAL)

Modify the routes below to match the ones of your OpenShift cluster.

Technically the only one that *must* be defined is `route_st4sd_runtime_service`, as the wrapper can auto-detect the other two.

In [None]:
route_st4sd_runtime_service = 'https://st4sd-prod.ve-5446-9ca4d14d48413d18ce61b80811ba4308-0000.us-south.containers.appdomain.cloud/rs'
route_st4sd_rest = None
route_st4sd_registry = None
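
The cells that follow build endpoint URLs with `'/'.join((route, path))`, so a route that ends in a slash would yield a double slash in the resulting URL. A minimal sketch of a more forgiving join is shown here; `join_route` is a hypothetical helper and not part of the ST4SD Python wrapper, and the route used in the example is a placeholder.

```python
def join_route(route: str, path: str) -> str:
    """Join a base route and a relative path with exactly one slash between them."""
    return route.rstrip('/') + '/' + path.lstrip('/')


# Placeholder route for illustration only (not a real cluster)
example_url = join_route('https://st4sd.example.com/rs/', '/authorisation/token')
print(example_url)
```
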

### Authenticate to the ST4SD Stack

Virtually all instances of the ST4SD stack have authentication enabled. Please follow the instructions in the cells below to correctly authenticate, if required.

**NOTE: If you are running this notebook via the st4sd-runtime-core container please ensure it has been recently updated**

In [None]:
authentication_enabled = False
try:
    # When authentication is enabled, an unauthenticated request is rejected with HTTP 403
    response = urllib.request.urlopen('/'.join((route_st4sd_runtime_service, 'oauth/sign_in')))
except HTTPError as e:
    authentication_enabled = (e.code == 403)

# Report the outcome in every case, including when the request succeeds
if authentication_enabled:
    print("Authentication is enabled, please proceed with the 'Authentication Enabled' section")
else:
    print("Authentication is not enabled, skip to 'Connect to API'")

#### Authentication Enabled
- Visit the URL printed in the cell below
- If it's your first time, you will be prompted to log in
   - You will have to choose the login method (depending on the OpenShift instance this can be LDAP, IBM SSO, etc.)
   - The first time you log in, you will be asked to consent to the workflow-operator ServiceAccount knowing that your username has authenticated to OpenShift. You need to agree to this before you can authenticate to the `st4sd-runtime-service` REST-API.
- After logging in, you will be presented with an authentication token that you will provide to the `experiment.service.db.ExperimentRestAPI` wrapper in a Python cell below.

**The token will last for 168 hours**

In [None]:
if authentication_enabled:
    auth_url = '/'.join((route_st4sd_runtime_service, 'authorisation/token'))
    print(f"Visit this URL to get your authentication token:\n{auth_url}")

Run the cell below and paste into its widget the authorization token you were presented with when you visited the URL printed by the previous cell. Alternatively, you can use an OpenShift token (the one that you would normally provide to the `--token` parameter of `oc login`) with the `cc_bearer_key` argument of `experiment.service.db.ExperimentRestAPI`.

In [None]:
w_label = 'Input your authentication token here:'
display(w_label)
token_widget = widgets.Password()
display(token_widget)

In [None]:
auth_token = ''
if authentication_enabled and auth_token == '':
    auth_token = token_widget.value
    if auth_token == '':
        print("Authentication is required. Please fill in your token in the box above.")
        raise Exception("Missing authentication")
    if auth_token.startswith("\""):
        auth_token = auth_token[1:]
    if auth_token.endswith("\""):
        auth_token = auth_token[:-1]
    token_widget.value = ""
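
The clean-up performed in the cell above can also be written as a small function, which makes it easy to try on sample values. This is only a sketch: `clean_token` is a hypothetical name, and stripping surrounding whitespace is an extra step that the original cell does not perform.

```python
def clean_token(raw: str) -> str:
    """Strip surrounding whitespace, then one leading and one trailing double quote."""
    token = raw.strip()
    if token.startswith('"'):
        token = token[1:]
    if token.endswith('"'):
        token = token[:-1]
    return token


# A token pasted together with its quotes and a trailing newline
print(clean_token(' "abc123" \n'))
```
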

### Connect to the API

In [None]:
# Uncomment line below to view the documentation of the python wrapper
# help(experiment.service.db.ExperimentRestAPI)

In [None]:
# Ensure that your account is authorised to use the ST4SD Runtime Service REST API
try:
    api = experiment.service.db.ExperimentRestAPI(route_st4sd_runtime_service, route_st4sd_registry, 
                                      route_st4sd_rest, max_retries=2, secs_between_retries=1,
                                      cc_auth_token=auth_token)
except experiment.service.errors.UnauthorisedRequest:
    err = ValueError(f"Visit {auth_url} to authenticate first, then provide the printed string to the cc_auth_token "
                    "parameter of experiment.service.db.ExperimentRestAPI constructor")
else:
    print(f"You've successfully authenticated to {route_st4sd_runtime_service}")

## List and Add Experiments

In [None]:
# Query available experiments
experiments = api.api_experiment_list()
to_show = 5

print(f"There are {len(experiments.keys())} registered experiments", end='')
if len(experiments.keys()) > to_show:
    print(f". The first {to_show} are:", end='\n\n')
else:
    print(":", end='\n\n')

for idx, e in enumerate(experiments.keys()):
    # idx is zero-based, so stop once to_show names have been printed
    if idx >= to_show:
        break
    print(e)
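
Another way to cap a listing at a fixed number of entries is `itertools.islice`, which avoids index bookkeeping altogether. The sketch below uses a stand-in dictionary rather than the real return value of `api.api_experiment_list()`; `first_n` and `example_experiments` are hypothetical names.

```python
from itertools import islice


def first_n(names, n):
    """Return at most the first n items of an iterable as a list."""
    return list(islice(names, n))


# Stand-in for the dictionary of registered experiments
example_experiments = {'exp-a': {}, 'exp-b': {}, 'exp-c': {}}
for name in first_n(example_experiments, 2):
    print(name)
```
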


In [None]:
if len(experiments.keys()) > 0:
    print("The entry for the first experiment is:", end='\n\n')
    first_experiment = experiments[list(experiments.keys())[0]]
    print(pretty_json(first_experiment))

## Upsert an experiment

`band-gap-pm3-gamess-us` uses the PM3 semi-empirical method, run with GAMESS-US, to optimize the geometry of each input molecule and calculate its band gap along with its HOMO and LUMO energies. The experiment takes as input a table of molecules as SMILES strings and returns one table containing the results of the band-gap calculation.

From: https://github.com/st4sd/band-gap-gamess

In [None]:
package = {
    "base": {
        "packages": [
            {
                "source": {
                    "git": {
                        "location": {
                            "url": "https://github.com/st4sd/band-gap-gamess.git",
                            "tag": "1.0.0"
                        }
                    }
                },
                "config": {
                    "path": "semi-empirical/homo-lumo-dft-semi-empirical.yaml",
                    "manifestPath": "semi-empirical/manifest.yaml"
                }
            }
        ]
    },
    "metadata": {
        "package": {
            "name": "band-gap-pm3-gamess-us",
            "tags": [
                "latest",
                "1.0.0"
            ],
            "maintainer": "https://github.com/michael-johnston",
            "description": "Uses the PM3 semi-empirical method to perform the geometry optimization and calculate the band-gap and related properties. The calculation is performed with GAMESS-US",
            "keywords": [
                "smiles",
                "computational chemistry",
                "semi-empirical",
                "geometry-optimization",
                "pm3",
                "homo",
                "lumo",
                "band-gap",
                "gamess-us"
            ]
        }
    },
    "parameterisation": {
        "presets": {
            "runtime": {
                "args": [
                    "--failSafeDelays=no",
                    "--registerWorkflow=yes"
                ]
            }
        },
        "executionOptions": {
            "variables": [
                {
                    "name": "numberMolecules"
                },
                {
                    "name": "startIndex"
                },
                {
                    "name":  "gamess-walltime-minutes"
                },
                {
                    "name":  "gamess-grace-period-seconds"
                },
                {
                    "name":  "number-processors"
                }
            ],
            "platform": [
                "openshift",
                "openshift-kubeflux"
            ]
        }
    }
}


package = api.api_experiment_push(package)
print(pretty_json(package))

Thanks to the virtual-experiment interface, we can see which properties the experiment calculates:

In [None]:
pd.DataFrame(package['metadata']['registry']['interface']['propertiesSpec'])[['name','description']]

In [None]:
# You may delete the experiment 'band-gap-pm3-gamess-us' using

# api.api_experiment_delete('band-gap-pm3-gamess-us')

# WARNING: The only way to reinstate the experiment definition
# is to call api_experiment_push() once again.

## Submit Experiment

In [None]:
#Input data - A CSV-formatted string containing a label and a SMILES string for each molecule
#You can also load the CSV content from a file on your hard drive instead
moleculeData = '''label,smiles
mymolecule,CCCCCCCC[SH2+]
'''.rstrip()
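
When the molecules come from a Python list rather than a hand-written string, the same `label,smiles` CSV content can be generated with the standard `csv` module. This is a sketch under the assumption that the experiment only needs these two columns, as in the cell above; `molecules_to_csv` is a hypothetical helper.

```python
import csv
import io


def molecules_to_csv(rows):
    """Serialise (label, smiles) pairs to a CSV string with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['label', 'smiles'])
    writer.writerows(rows)
    # Drop the trailing newline, mirroring the .rstrip() used above
    return buffer.getvalue().rstrip()


example_csv = molecules_to_csv([('mymolecule', 'CCCCCCCC[SH2+]')])
print(example_csv)
```
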

In [None]:
display("Choose the number of molecules you want to calculate values for:")
num_molecules = widgets.IntSlider(min = 1, max = 5, value = 1)
display(num_molecules)

In [None]:
#Input configuration
#See the experiment description for defaults and other options
experimentConfiguration = {
    "inputs": [
        {"filename": "input_smiles.csv", "content": moleculeData}
    ],
    "variables": {
        "startIndex": 0,
        # you can submit multiple molecules in 1 experiment
        "numberMolecules": num_molecules.value,
    },
    "metadata": {
      # you can provide user-metadata `key: value` pairs which you can use
      # in the future for querying the database (user-metadata)
      "author": "amazing-person"
    },
    "additionalOptions": [
      # you can provide additional arguments to elaunch here;
      # they cannot override the arguments in the experiment definition,
      # which are applied automatically as well
      "--useMemoization=y"
    ],
    "orchestrator_resources": {
      "cpu": "1",
      "memory": "2Gi"
    }
}
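
Before submitting, it can be useful to check locally that every variable in the configuration is one that the experiment definition lists under `parameterisation.executionOptions.variables`. The sketch below assumes that only those variables may be set at submission time, which is an assumption about the service and is not stated in this notebook; `unexposed_variables` and the example dictionaries are hypothetical.

```python
def unexposed_variables(package: dict, configuration: dict) -> list:
    """Return variable names in the configuration that the package does not expose."""
    execution_options = package.get('parameterisation', {}).get('executionOptions', {})
    exposed = {entry['name'] for entry in execution_options.get('variables', [])}
    return sorted(set(configuration.get('variables', {})) - exposed)


# Trimmed-down stand-ins for `package` and `experimentConfiguration`
example_package = {
    'parameterisation': {
        'executionOptions': {
            'variables': [{'name': 'numberMolecules'}, {'name': 'startIndex'}]
        }
    }
}
example_configuration = {'variables': {'startIndex': 0, 'notExposed': 1}}
print(unexposed_variables(example_package, example_configuration))
```
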

In [None]:
#Submit an instance of the parameterised virtual experiment package
experimentId = 'band-gap-pm3-gamess-us'
rest_uid = api.api_experiment_start(experimentId, experimentConfiguration)

In [None]:
#Print REST-uid of experiment instance
print("rest-uid:", rest_uid)

In [None]:
#Get instance status - run this cell periodically, until experiment state becomes "Finished"
#it should take about 5 minutes
instance_status = api.api_rest_uid_status(rest_uid)

status = instance_status['status']
status = {key: status[key] for key in status if key != "meta"}
print("Status of instance is\n",json.dumps(status, indent=2))

#Uncomment to see the full, verbose state of the instance
#print(pretty_json(instance_status))

In [None]:
while True:
    clear_output(wait=True)
    instance_status = api.api_rest_uid_status(rest_uid)

    print(f"Outputs produced so far are {pretty_json(instance_status['outputs'])}", end='\n\n')
    exp_state = instance_status['status']['experiment-state']
    if exp_state is None:
        next_call = datetime.datetime.now() + datetime.timedelta(seconds=10)
        print(f"Kubernetes is spinning up objects - try again at {next_call}")
        time.sleep(10)
        continue
                                         
    if exp_state in ["running", "initialising"]:
        print(f"Experiment is {exp_state}, it may produce more outputs")
        print("The experiment in this example produces only 1 output - OptimisationResults")
    else:
        print(f"Experiment is {exp_state} - it will not produce new outputs", end='\n\n')

    # Get the CSV data associated with a particular result type.
    # If you attempt to fetch an output that has no entry in the instance_status['outputs'] dictionary,
    # or if there is no workflow instance, the statement below raises an InvalidHTTPRequest exception.
    # Read the description of the exception for more information.
    if 'OptimisationResults' in instance_status['outputs']:
        filename, contents = api.api_rest_uid_output(rest_uid, 'OptimisationResults')
        print("Contents (i.e. bytes) of", filename, "are:")
        print(contents.decode('utf-8'))
        break
    else:
        next_call = datetime.datetime.now() + datetime.timedelta(seconds=10)
        print(f"Experiment has not produced outputs yet - try again at {next_call}")
        time.sleep(10)
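
The polling pattern used in the cell above can be wrapped in a reusable function with a timeout, so that it cannot wait forever. The sketch below is not part of the ST4SD wrapper: `wait_for_output` is a hypothetical helper that takes any zero-argument function returning a status dictionary with an `outputs` key, and the demo uses canned responses instead of calling the API.

```python
import time


def wait_for_output(fetch_status, output_name, poll_seconds=10, timeout_seconds=600):
    """Poll fetch_status() until output_name appears in its 'outputs' mapping.

    Returns the last status dictionary, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if output_name in (status.get('outputs') or {}):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{output_name} not available after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Demo with canned responses: the output appears on the second poll
_responses = iter([{'outputs': {}}, {'outputs': {'OptimisationResults': {}}}])
final_status = wait_for_output(lambda: next(_responses), 'OptimisationResults',
                               poll_seconds=0)
print(sorted(final_status['outputs']))
```

With a live connection, the first argument would be something like `lambda: api.api_rest_uid_status(rest_uid)`.
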

## Extract measured properties

Extracting measured properties can be done in three ways:
1. Making a GET request to the `/instances/{rest_uid}/properties` endpoint of the `st4sd-runtime-service`.
2. Using the `include_properties` parameter of the `api.cdb_get_document_experiment_for_rest_uid` method.
3. Using the `include_properties` parameter of the `api.cdb_get_document_experiment` method.

All these methods also allow the user to specify only a certain subset of properties to retrieve.

### Retrieve all properties

#### HTTP Request to st4sd-runtime-service

In [None]:
x = api.api_request_get(f"/instances/{rest_uid}/properties")
x = pd.DataFrame.from_dict(x)
display(x.head(num_molecules.value))

#### api.cdb_get_document_experiment_for_rest_uid

In [None]:
x = api.cdb_get_document_experiment_for_rest_uid(rest_uid, include_properties=['*'])
instance = x['instance']
x = pd.DataFrame.from_dict(x['interface']['propertyTable'])
display(x)

#### api.cdb_get_document_experiment

In [None]:
x = api.cdb_get_document_experiment(instance=instance, include_properties=['*'])[0]
x = pd.DataFrame.from_dict(x['interface']['propertyTable'])
display(x)

### Specify subset of properties to retrieve

#### HTTP Request to st4sd-runtime-service

In [None]:
x = api.api_request_get(f"/instances/{rest_uid}/properties?includeProperties=homo,lumo")
x = pd.DataFrame.from_dict(x)
display(x.head(num_molecules.value))
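
The request path used above can be assembled from a list of property names instead of being written by hand. The sketch below only builds the string and makes no request; `properties_endpoint` is a hypothetical helper, and the `includeProperties` query parameter is the one shown in the cell above.

```python
from urllib.parse import urlencode


def properties_endpoint(rest_uid, properties=None):
    """Build the /instances/<rest_uid>/properties path, optionally restricted to some properties."""
    path = f"/instances/{rest_uid}/properties"
    if properties:
        # Keep commas unescaped so the value reads homo,lumo rather than homo%2Clumo
        query = urlencode({'includeProperties': ','.join(properties)}, safe=',')
        path = f"{path}?{query}"
    return path


print(properties_endpoint('example-uid', ['homo', 'lumo']))
```
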

#### api.cdb_get_document_experiment_for_rest_uid

In [None]:
x = api.cdb_get_document_experiment_for_rest_uid(rest_uid, include_properties=['homo,lumo'])
x = pd.DataFrame.from_dict(x['interface']['propertyTable'])
display(x)

#### api.cdb_get_document_experiment

In [None]:
x = api.cdb_get_document_experiment(instance=instance, include_properties=['homo,lumo'])[0]
x = pd.DataFrame.from_dict(x['interface']['propertyTable'])
display(x)

In [None]:
# Review the instance status now that the experiment has completed
print(pretty_json(instance_status))

## Query the ST4SD Datastore

Keep in mind that the `st4sd-datastore` API truncates files that are larger than 32 MiB (33554432 bytes). In that case, the returned contents end with the message below, placed right after the truncated contents:
`FILE TRUNCATED to 33554432 bytes, actual file size is <the-file-size-in-bytes>`
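
A simple way to detect truncation is to look for this message at the end of the returned contents. The sketch below assumes that the contents are `bytes` and that the file size in the message is a plain decimal integer; `truncation_info` is a hypothetical helper and the sample contents are made up.

```python
import re

TRUNCATION_PATTERN = re.compile(
    rb"FILE TRUNCATED to (\d+) bytes, actual file size is (\d+)")


def truncation_info(contents: bytes):
    """Return (returned_bytes, actual_bytes) if the truncation notice is present, else None."""
    # The notice sits at the very end, so only the tail needs to be searched
    match = TRUNCATION_PATTERN.search(contents[-200:])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


example_contents = b"a,b\n1,2\nFILE TRUNCATED to 33554432 bytes, actual file size is 40000000"
print(truncation_info(example_contents))
```
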

In [None]:
docs = api.cdb_get_document_experiment(query={})
print("Recorded workflow instances: %d" % len(docs))

In [None]:
# Uncomment to print the `type` field of each document
# for d in docs:
#     pprint.pprint(d['type'])

In [None]:
docs, files = api.cdb_get_data(stage=1, instance='band-gap-.*', component='ExtractEnergies', filename='energies.csv')
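# Guard against an empty result so that docs[-1] cannot raise an IndexError
if docs:
    print("The last matching component document is", end='\n\n')
    print(pretty_json(docs[-1]))
else:
    print("No matching component documents were found")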