# MarketPlace data sink app and reaxpro platform interfacing



*If you are operating on **Snellius**, follow [this documentation](https://servicedesk.surf.nl/wiki/pages/viewpage.action?pageId=30660252#JupyterNotebooksonSnellius-UsingPythonvirtualenvironmentswithSnellius'JupyterHub) to set up your virtual environment with the needed modules and packages.*

Run the following `bash` commands to set up the Jupyter environment:

```bash
# Load the Python module used by the https://jupyter.snellius.surf.nl/2022 JupyterHub
module load 2022
module load Python/3.10.4-GCCcore-11.3.0

# Create the virtual environment
virtualenv marketplace_env

# Purge modules so that subsequent pip installs don't pick up Python packages from the module environment
module purge

# Activate the virtual environment
source marketplace_env/bin/activate

# Make sure that you are using the pip executable from the virtual environment
which pip

# Install ipykernel, the ReaxPro workflow service and the MarketPlace Python SDK in the virtual environment
pip install ipykernel reaxpro-workflow-service git+https://github.com/materials-marketplace/python-sdk.git@fix/import-error

# Make sure that you are using the Python interpreter from the virtual environment
which python3

# Install the virtual environment as a custom kernel. It will show up in the Jupyter Notebook Server under the name passed to '--name'.
python3 -m ipykernel install --user --name=marketplace_env

# Make sure the kernel only uses Python packages from the virtual environment, not from the module environment
sed -i '/"-m",/i \ \ "-E",' ~/.local/share/jupyter/kernels/marketplace_env/kernel.json
```
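The last `sed` command inserts `"-E"` into the `argv` list of the generated `kernel.json`, directly before `"-m"`, so the kernel's interpreter ignores `PYTHON*` environment variables such as a `PYTHONPATH` set by loaded modules. The sketch below performs the same edit in Python, assuming the usual kernel spec layout (`argv` starting with the interpreter, followed by `-m ipykernel_launcher`). The helper names `add_isolation_flag` and `patch_kernel_file` are hypothetical and the spec shown is trimmed down for illustration.

```python
import json
from pathlib import Path


def add_isolation_flag(spec: dict) -> dict:
    """Return a copy of a kernel spec with "-E" inserted before "-m" in argv."""
    argv = list(spec["argv"])
    if "-E" not in argv:
        # Same position as the sed command: directly in front of "-m"
        argv.insert(argv.index("-m"), "-E")
    return {**spec, "argv": argv}


def patch_kernel_file(path: Path) -> None:
    """Rewrite a kernel.json file in place with the "-E" flag added."""
    spec = json.loads(path.read_text())
    path.write_text(json.dumps(add_isolation_flag(spec), indent=1))


# Trimmed-down spec as written by `ipykernel install` (interpreter path is illustrative)
spec = {
    "argv": ["/home/user/marketplace_env/bin/python3", "-m", "ipykernel_launcher",
             "-f", "{connection_file}"],
    "display_name": "marketplace_env",
    "language": "python",
}
print(add_isolation_flag(spec)["argv"][:3])
# ['/home/user/marketplace_env/bin/python3', '-E', '-m']
```

Calling `patch_kernel_file(Path.home() / ".local/share/jupyter/kernels/marketplace_env/kernel.json")` would apply the edit to the kernel installed above. Unlike the `sed` line, running it twice does not insert a second `-E`.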

Now, navigate to `https://jupyter.snellius.surf.nl/2022/` and log in with your credentials.

Once you have followed the previous steps, you can run this notebook.

Make sure that the required packages are installed, if you have not already done so in the previous steps.

In [None]:
!pip install git+https://github.com/materials-marketplace/python-sdk.git@fix/import-error --no-cache-dir

Import `MPSession` for the data sink client, the `requests` module for the platform's REST API, and the EMMO namespace used in the SPARQL queries:

In [2]:
import requests
import tempfile
from getpass import getpass
from pprint import pprint

from marketplace.datasink_client.session import MPSession
from osp.core.namespaces import emmo

Set the host of your ReaxPro platform: the IP address and port of the machine on which you launched the app via Docker or Singularity.

In [3]:
HOST = "172.18.57.86:8081"

Now, let us repeat the tutorial from the [official documentation](https://reaxpro-workflow-service.readthedocs.io/en/latest/index.html), this time for the [PES calculation in AMS](https://reaxpro-workflow-service.readthedocs.io/en/latest/usecases.html#res-calculation).

In [4]:
response = requests.get(f"http://{HOST}/models/registered")
print("First of all, we make sure that PESExploration is among the registered models:")
pprint(response.json())

First of all, we make sure that PESExploration is among the registered models:
{'message': 'Fetched registry of data models.',
 'registered_models': ['COPt111FullscaleModel',
                       'COPt111FromMesoScaleModel',
                       'PESExploration',
                       'COPt111MesoscaleModel',
                       'COCatalyticFOAMModel',
                       'EnergyLandscapeRefinement',
                       'COpyZacrosModel']}
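Instead of reading the printed list, the check can be made explicit. A minimal sketch, assuming the response body has the `registered_models` key shown above; `require_model` is a hypothetical helper, not part of the platform's API.

```python
def require_model(payload: dict, model: str) -> None:
    """Raise ValueError if `model` is missing from a /models/registered response."""
    registered = payload.get("registered_models", [])
    if model not in registered:
        raise ValueError(f"{model!r} is not registered; available: {sorted(registered)}")


# Shortened version of the payload printed above
payload = {
    "message": "Fetched registry of data models.",
    "registered_models": ["COPt111FullscaleModel", "PESExploration", "COpyZacrosModel"],
}
require_model(payload, "PESExploration")  # passes silently
```

In the notebook, `require_model(response.json(), "PESExploration")` would stop execution early if the platform was started without the PES exploration model.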


Now, we define the input data from the documentation:

In [5]:
data = {
    "force_field": "CHONSFPtClNi",
    "solver_type": "Direct",
    "n_expeditions": 30,
    "n_explorers": 3,
    "max_energy": 2.0,
    "max_distance": 3.8,
    "random_seed": 100,
    "fixed_region": "surface",
    "reference_region": "surface",
    "symmetry_check": "T",
    "molecule": "4442d5c3-4b61-4b13-9bbb-fdf942776ca6",
    "lattice": "4442d5c3-4b61-4b13-9bbb-fdf942776ca6"
}    
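The `molecule` and `lattice` entries are UUID strings, which appear to reference files already available on the platform. A malformed ID is cheap to catch before the model is created. A sketch, assuming these two fields must hold UUIDs; `check_uuid_fields` is a hypothetical helper.

```python
import uuid


def check_uuid_fields(data: dict, fields=("molecule", "lattice")) -> list:
    """Return the names of `fields` that are missing or not well-formed UUIDs."""
    bad = []
    for field in fields:
        try:
            uuid.UUID(str(data[field]))
        except (KeyError, ValueError):
            bad.append(field)
    return bad


example = {
    "molecule": "4442d5c3-4b61-4b13-9bbb-fdf942776ca6",
    "lattice": "not-a-uuid",
}
print(check_uuid_fields(example))  # ['lattice']
```

Applied to the `data` dictionary above, `check_uuid_fields(data)` should return an empty list.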

In [6]:
response = requests.post(f"http://{HOST}/models/create/PESExploration", json=data)
print("Create the model and get the cache id from the response:")
cache = response.json()
pprint(cache)

Create the model and get the cache id from the response:
{'cache_id': '062729c0-dac7-4c86-a33f-984cdaeb6cfe'}


In [7]:
response = requests.post(f"http://{HOST}/task/send", json=cache)

print("Send the task and get the task id from the response:")

task = response.json()
pprint(task)
task_id = task["task_id"]

Send the task and get the task id from the response:
{'args': None,
 'date_done': None,
 'kwargs': None,
 'state': 'PENDING',
 'status': 'PENDING',
 'task_id': '8ac4b841-019f-4957-bbb5-be1d5f4b125d',
 'traceback': None}


In [8]:
response = requests.get(f"http://{HOST}/task/log/{task_id}")

print("Get logging messages of the task:\n")
print(response.text)

Get logging messages of the task:

2023-08-29 12:16:01,844 - INFO - received cache_key 062729c0-dac7-4c86-a33f-984cdaeb6cfe
2023-08-29 12:16:02,162 - INFO - 293 CUDS objects have been added to CeleryWorkflowSession
2023-08-29 12:16:02,183 - INFO - 1 CUDS object has been updated in CeleryWorkflowSession
2023-08-29 12:16:02,201 - INFO - 0 CUDS objects have been deleted from CeleryWorkflowSession
2023-08-29 12:16:02,226 - INFO - Did not find any workflow steps with
            complementary workers. Will scan for single
            object of type emmo.Calculation.
2023-08-29 12:16:02,253 - INFO - Found additional workers [(<emmo.ProcessSearch: 42e6a340-cf07-42d6-a84d-022781876f04,  CeleryWorkflowSession: @0x14f056f5f640>, 'simphony-ams')] in the buffer,
        but will ignored because not part of a workflow chain.
2023-08-29 12:16:02,312 - INFO - received cache_key 062729c0-dac7-4c86-a33f-984cdaeb6cfe
2023-08-29 12:16:02,629 - INFO - 293 CUDS objects have been added to Some Wrapper Sessi

In [9]:
response = requests.get(f"http://{HOST}/task/status/{task_id}")

print("Check that the task was successful:\n")
pprint(response.json())

Check that the task was successful:

{'args': None,
 'date_done': '2023-08-29T10:16:05.632829',
 'kwargs': None,
 'state': 'SUCCESS',
 'status': 'SUCCESS',
 'task_id': '8ac4b841-019f-4957-bbb5-be1d5f4b125d',
 'traceback': None}
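Instead of re-running the status cell by hand, one can poll `/task/status/{task_id}` until the task reaches a final state. The sketch below assumes the Celery-style states seen above (`PENDING`, `SUCCESS`) and treats Celery's terminal states `SUCCESS`, `FAILURE` and `REVOKED` as final. `wait_for_task` and its parameters are hypothetical, and the status request is passed in as a function so the loop can be tried without a running platform.

```python
import time
from typing import Callable

# Celery's terminal ("ready") task states
TERMINAL_STATES = {"SUCCESS", "FAILURE", "REVOKED"}


def wait_for_task(fetch_status: Callable[[], dict], interval: float = 5.0,
                  timeout: float = 600.0) -> dict:
    """Call `fetch_status` until the task state is terminal or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("state") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status.get('state')} after {timeout} s")
        time.sleep(interval)


# Offline demo: a fake status source that reports PENDING twice, then SUCCESS
states = iter(["PENDING", "PENDING", "SUCCESS"])
final = wait_for_task(lambda: {"state": next(states)}, interval=0.01)
print(final["state"])  # SUCCESS
```

Against the platform, the fetcher would be `lambda: requests.get(f"http://{HOST}/task/status/{task_id}").json()`. Note that a `FAILURE` state is returned rather than raised, so the caller should still inspect `state` and `traceback`.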


In [10]:
response = requests.get(f"http://{HOST}/task/result/{task_id}")

print("Get results from the simulation:\n")
result = response.json()
pprint(result)

graph_key = result["result"]["cache_meta"]
raw_data_key = result["result"]["cache_raw"]["0_simphony-ams"]

Get results from the simulation:

{'date_done': '2023-08-29T10:16:05.632829',
 'result': {'cache_meta': '7dd89cd0-9e39-4224-aee3-a3b257892d2e',
            'cache_raw': {'0_simphony-ams': '182491e1-92bb-4257-a56a-f4f8be1140db'}},
 'task_id': '8ac4b841-019f-4957-bbb5-be1d5f4b125d',
 'traceback': None}
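The cell above picks the raw-data key of the `0_simphony-ams` step by name. If a workflow produces raw data in several steps, all keys can be collected from `cache_raw`. A sketch, assuming each key in `cache_raw` is a step index and a worker name joined by an underscore, as in `0_simphony-ams`; this convention is inferred from the single example shown, and `raw_data_keys` is a hypothetical helper.

```python
def raw_data_keys(result: dict) -> list:
    """Return (step, worker, cache_key) tuples from a task result, ordered by step."""
    entries = []
    for label, cache_key in result["result"]["cache_raw"].items():
        # "0_simphony-ams" -> ("0", "_", "simphony-ams")
        step, _, worker = label.partition("_")
        entries.append((int(step), worker, cache_key))
    return sorted(entries)


example = {
    "result": {
        "cache_meta": "7dd89cd0-9e39-4224-aee3-a3b257892d2e",
        "cache_raw": {"0_simphony-ams": "182491e1-92bb-4257-a56a-f4f8be1140db"},
    }
}
print(raw_data_keys(example))
# [(0, 'simphony-ams', '182491e1-92bb-4257-a56a-f4f8be1140db')]
```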


In [11]:
print("First of all, get the resulting graph from the simulation:")

response = requests.get(f"http://{HOST}/cache/download/{graph_key}")

with tempfile.NamedTemporaryFile("w", delete=False, suffix=".ttl") as tmp_graph:
    tmp_graph.write(response.text)
print("File name:", tmp_graph.name)

print("Then, get zipped archive with the raw data from the simulation:")

response = requests.get(f"http://{HOST}/cache/download/{raw_data_key}")

with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".tar") as tmp_raw:
    # The archive is binary: write the raw bytes instead of decoded text
    tmp_raw.write(response.content)
print("File name:", tmp_raw.name)

First of all, get the resulting graph from the simulation:
File name: /scratch-local/mbueschel.3559299/tmp15xfkrkp.ttl
Then, get zipped archive with the raw data from the simulation:
File name: /scratch-local/mbueschel.3559299/tmpz_87sxq2.tar
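The raw data is a binary archive, so it should be written from `response.content` in mode `"wb"`; decoding it as text can corrupt it. Python's `tarfile` module offers a quick integrity check, and it also recognizes gzip-, bz2- and xz-compressed tar files. The sketch below builds a tiny archive locally so that it runs on its own; `check_archive` is a hypothetical helper, and in the notebook you would call `check_archive(tmp_raw.name)`.

```python
import io
import tarfile
import tempfile


def check_archive(path: str) -> list:
    """Return the member names of a (possibly compressed) tar archive at `path`."""
    if not tarfile.is_tarfile(path):
        raise ValueError(f"{path} is not a readable tar archive")
    with tarfile.open(path) as archive:
        return archive.getnames()


# Self-contained demo: write a one-member archive to a temporary file
content = b"example output\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".tar") as tmp:
    with tarfile.open(fileobj=tmp, mode="w") as archive:
        member = tarfile.TarInfo(name="output.txt")
        member.size = len(content)
        archive.addfile(member, io.BytesIO(content))
print(check_archive(tmp.name))  # ['output.txt']
```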


Now that we have downloaded our data, we will use the data sink on the MarketPlace to store and organize it.

For this purpose, we set our client ID (make sure that you have purchased the data sink app on the platform first):

In [12]:
client_id = "edb56699-9377-4f41-b1c7-ef2f46dac707"

And set the access token:

In [13]:
token = getpass()

 ········


And start the MP session:

In [14]:
session = MPSession(access_token=token, client_id=client_id)

To follow the conventions of DCAT (the Data Catalog Vocabulary), every dataset is uploaded into a collection. For simplicity, we derive both the collection and the dataset names from the `task_id`: `graph-<task_id>` for the graph and `raw-data-ams-<task_id>` for the raw data.
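This naming scheme can be kept in one small helper, so that the upload, listing, DCAT and query cells all refer to the same names. `collection_names` is hypothetical, and its `worker` parameter, which generalizes the hard-coded `ams`, is an assumption.

```python
def collection_names(task_id: str, worker: str = "ams") -> dict:
    """Derive the graph and raw-data names for a task from its task_id."""
    return {
        "graph": f"graph-{task_id}",
        "raw": f"raw-data-{worker}-{task_id}",
    }


names = collection_names("8ac4b841-019f-4957-bbb5-be1d5f4b125d")
print(names["graph"])  # graph-8ac4b841-019f-4957-bbb5-be1d5f4b125d
```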

In [22]:
print("Upload the graph:\n")

graph_name = f"graph-{task_id}"
raw_name = f"raw-data-ams-{task_id}"

objects = session.create_dataset_from_path(
    path=tmp_graph.name,
    dataset_name=graph_name,
    collection_name=graph_name,
)
print(objects)

print("\nAnd upload the raw data:\n")

objects = session.create_dataset_from_path(
    path=tmp_raw.name,
    dataset_name=raw_name,
    collection_name=raw_name,
)
print(objects)

Upload the graph:

{'last_modified': datetime.datetime(2023, 8, 29, 10, 20, 22, 67540)}
[('graph-8ac4b841-019f-4957-bbb5-be1d5f4b125d', '700396c8-2fe0-4807-be39-9d9374b8eb63'), ('/scratch-local/mbueschel.3559299/tmp15xfkrkp.ttl', None)]

And upload the raw data:

Error: Server returned 413 while creating dataset raw-data-ams-8ac4b841-019f-4957-bbb5-be1d5f4b125d: <html>
<head><title>413 Request Entity Too Large</title></head>
<body>
<center><h1>413 Request Entity Too Large</h1></center>
<hr><center>nginx/1.21.6</center>
</body>
</html>

[('raw-data-ams-8ac4b841-019f-4957-bbb5-be1d5f4b125d', 'eac669e6-a75c-41b9-bc05-443437373020'), ('/scratch-local/mbueschel.3559299/tmpz_87sxq2.tar', None)]
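The raw-data upload above failed: the server's nginx proxy answered `413 Request Entity Too Large`, meaning the archive exceeds the maximum request body size it accepts, so the raw data did not reach the data sink. That limit is configured on the server and is not stated in this notebook, so the sketch below uses a hypothetical `MAX_UPLOAD_BYTES` value. `fits_upload_limit` is a hypothetical helper that only checks the local file size before an upload is attempted.

```python
import os
import tempfile

# Hypothetical limit: the real value is set in the server's nginx configuration
MAX_UPLOAD_BYTES = 100 * 1024 * 1024  # 100 MiB, illustrative only


def fits_upload_limit(path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit


# Demo with a 2 KiB temporary file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 2048)
print(fits_upload_limit(tmp.name, limit=1024))  # False
```

In the notebook, `fits_upload_limit(tmp_raw.name)` could guard the `create_dataset_from_path` call for the raw data, once the actual server limit is known.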


In [23]:
objects = session.list_datasets(collection_name=graph_name)
pprint(objects)

{'items': [{'bytes': 117922,
            'content_type': 'application/octet-stream',
            'hash': 'a48711d6c3dfa9ebf55209c5ba102b14',
            'last_modified': datetime.datetime(2023, 8, 29, 10, 16, 59, 182858),
            'name': 'graph-8ac4b841-019f-4957-bbb5-be1d5f4b125d'}]}
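The listing returns one entry per dataset with its size and a 32-character `hash`. A sketch for picking an entry by name and computing a local digest to compare against. It assumes, without confirmation from this notebook, that `hash` is the MD5 of the stored content (as is common for S3-compatible object storage with single-part uploads). `find_dataset` and `md5_of_file` are hypothetical helpers.

```python
import datetime
import hashlib
import tempfile


def find_dataset(listing: dict, name: str):
    """Return the listing entry called `name`, or None if it is absent."""
    return next((item for item in listing.get("items", []) if item["name"] == name), None)


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a local file, read in chunks of `chunk_size` bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# The listing printed above
listing = {"items": [{
    "bytes": 117922,
    "content_type": "application/octet-stream",
    "hash": "a48711d6c3dfa9ebf55209c5ba102b14",
    "last_modified": datetime.datetime(2023, 8, 29, 10, 16, 59, 182858),
    "name": "graph-8ac4b841-019f-4957-bbb5-be1d5f4b125d",
}]}
entry = find_dataset(listing, "graph-8ac4b841-019f-4957-bbb5-be1d5f4b125d")
print(entry["bytes"])  # 117922

# Local digest of a small demo file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
print(md5_of_file(tmp.name))  # 900150983cd24fb0d6963f7d28e17f72
```

Under the MD5 assumption, `md5_of_file(tmp_graph.name) == entry["hash"]` would confirm that the stored graph matches the local file.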


In [24]:
objects = session.get_collection_dcat(collection_name=graph_name)
pprint(objects)

('[{"@id":"http://marketplace-datasink.org/datasets/f7a3b0ed-c805-4baa-a3d6-b6abc40e0f97","@type":["http://www.w3.org/ns/dcat#Dataset"],"http://purl.org/dc/terms/identifier":[{"@value":"f7a3b0ed-c805-4baa-a3d6-b6abc40e0f97"}],"http://purl.org/dc/terms/isPartOf":[{"@value":"http://marketplace-datasink.org/catalogs/166092ae-a38a-4dde-9351-6ffd1b4a59f8"},{"@id":"http://marketplace-datasink.org/catalogs/166092ae-a38a-4dde-9351-6ffd1b4a59f8"}],"http://purl.org/dc/terms/issued":[{"@type":"xsd:date","@value":"2023-08-29 '
 '10:16:59.182858"}],"http://purl.org/dc/terms/modified":[{"@type":"xsd:date","@value":"2023-08-29 '
 '10:16:59.182858"}],"http://www.w3.org/ns/dcat#distribution":[{"@id":"http://marketplace-datasink.org/distributions/f7a3b0ed-c805-4baa-a3d6-b6abc40e0f97/graph-8ac4b841-019f-4957-bbb5-be1d5f4b125d"}]},{"@id":"http://marketplace-datasink.org/distributions/f7a3b0ed-c805-4baa-a3d6-b6abc40e0f97/graph-8ac4b841-019f-4957-bbb5-be1d5f4b125d","@type":["http://www.w3.org/ns/dcat#Distri
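The DCAT description comes back as a JSON-LD string in expanded form, with full IRIs as keys. A minimal sketch that lists each `dcat:Dataset` with its identifier and distributions using only the standard `json` module. It assumes the top level is a JSON array of nodes, as in the truncated output above; `datasets_from_dcat` is a hypothetical helper, and the sample is a shortened copy of that output.

```python
import json

DCAT = "http://www.w3.org/ns/dcat#"
DCTERMS = "http://purl.org/dc/terms/"


def datasets_from_dcat(jsonld: str) -> list:
    """Return (identifier, [distribution IRIs]) for each dcat:Dataset node."""
    found = []
    for node in json.loads(jsonld):
        if DCAT + "Dataset" not in node.get("@type", []):
            continue  # skip distributions, catalogs, ...
        identifier = node[DCTERMS + "identifier"][0]["@value"]
        distributions = [d["@id"] for d in node.get(DCAT + "distribution", [])]
        found.append((identifier, distributions))
    return found


dist_iri = ("http://marketplace-datasink.org/distributions/f7a3b0ed-c805-4baa-a3d6-b6abc40e0f97/"
            "graph-8ac4b841-019f-4957-bbb5-be1d5f4b125d")
sample = json.dumps([
    {
        "@id": "http://marketplace-datasink.org/datasets/f7a3b0ed-c805-4baa-a3d6-b6abc40e0f97",
        "@type": [DCAT + "Dataset"],
        DCTERMS + "identifier": [{"@value": "f7a3b0ed-c805-4baa-a3d6-b6abc40e0f97"}],
        DCAT + "distribution": [{"@id": dist_iri}],
    },
    {"@id": dist_iri, "@type": [DCAT + "Distribution"]},
])
for identifier, distributions in datasets_from_dcat(sample):
    print(identifier, len(distributions))
```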

Next, let us send SPARQL queries to the graph stored in the data sink.

First, give me all individuals of type `AtomisticCalculation`:

In [30]:
query = f"""PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    SELECT ?calculation WHERE {{
        ?calculation rdf:type <{emmo.AtomisticCalculation.iri}> .
    }}
"""

objects = session.query_dataset(collection_name=graph_name, dataset_name=graph_name, query=query)
pprint(objects)

'[]'
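The queries in this notebook rely on the `rdf:`, `rdfs:` and `skos:` prefixes. Standard SPARQL requires every prefix to be declared in the query unless the endpoint predeclares it, so it is safer to generate the `PREFIX` header in one place. `with_prefixes` is a hypothetical helper; the namespace IRIs are the standard W3C ones.

```python
from typing import Mapping, Optional

STANDARD_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def with_prefixes(body: str, prefixes: Optional[Mapping[str, str]] = None) -> str:
    """Prepend one PREFIX declaration per (name, IRI) pair to a SPARQL query body."""
    if prefixes is None:
        prefixes = STANDARD_PREFIXES
    header = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())
    return f"{header}\n{body}" if header else body


query = with_prefixes("SELECT ?calculation WHERE { ?calculation rdf:type ?type . }")
print(query.splitlines()[0])
# PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
```

With the EMMO namespace, the body would be an f-string such as `f"SELECT ?calculation WHERE {{ ?calculation rdf:type <{emmo.AtomisticCalculation.iri}> . }}"`.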


Next, give me all calculations with an output that has a total electronic energy, together with its value and unit:

In [32]:
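query = f"""PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    SELECT ?calculation ?calculationname ?optgeometry ?valuenumber ?unitsymbol WHERE {{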
        ?calculationtype rdfs:subClassOf* <{emmo.Calculation.iri}> .
        ?calculation rdf:type ?calculationtype .
        ?calculationtype skos:prefLabel ?calculationname .
        ?calculation <{emmo.hasOutput.iri}> ?optgeometry .
        ?optgeometry <{emmo.hasProperty.iri}> ?totalelectronicenergy .
        ?totalelectronicenergy rdf:type <{emmo.TotalElectronicEnergy.iri}> .
        ?totalelectronicenergy <{emmo.hasQuantityValue.iri}> ?value .
        ?value <{emmo.hasNumericalData.iri}> ?valuenumber .
        ?totalelectronicenergy <{emmo.hasReferenceUnit.iri}> ?unit .
        ?unit <{emmo.hasSymbolData.iri}> ?unitsymbol .

    }}
"""

objects = session.query_dataset(collection_name=graph_name, dataset_name=graph_name, query=query)
pprint(objects)

'[]'
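Both queries return the string `'[]'`, which appears to be a JSON-encoded empty list rather than a Python list. A minimal sketch for decoding the returned string before further processing, assuming the data sink always returns a JSON array. The shape of non-empty rows is not shown in this notebook, so `parse_query_result` (a hypothetical helper) does not interpret them.

```python
import json


def parse_query_result(raw: str) -> list:
    """Decode a query result returned as a JSON string into a Python list."""
    rows = json.loads(raw)
    if not isinstance(rows, list):
        raise TypeError(f"expected a JSON array, got {type(rows).__name__}")
    return rows


rows = parse_query_result("[]")
if not rows:
    print("The query matched no individuals.")
```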
