# MarketPlace HPC Gateway SDK

The HPC gateway SDK lets app developers and MarketPlace users run time-consuming tasks on HPC clusters.
Install this Python SDK and use it to interact with a cluster and run simulation jobs.
The MarketPlace has two HPC deployments: the EPFL Materials Cloud (`mc`) deployment and the IWM (`iwm`) deployment.

- The `iwm` deployment **does not** currently have Slurm running properly on the cluster, so job submission does not work. All other capabilities work.
- The EPFL Materials Cloud (`mc`) deployment supports all capabilities, and app developers can embed it in apps that need to run heavy calculations.
  However, the `mc` deployment is for testing only: the job time limit is hard-coded to 10 minutes.
  The `mc` deployment will not be maintained after March 2023.

## Install the SDK

To install the SDK package, run the following cell (outside a notebook, run `pip install marketplace-hpc` in a shell):

```python
%pip install marketplace-hpc
```

## Initialize the app instance

Use `hpc_gateway_sdk.get_app` to create an interface for interacting with the HPC gateway app.
The name is either `iwm` or `mc`, one for each deployment.

To initialize the instance, provide the deployment name and a MarketPlace `access_token`.
The access token can be relayed from the app that uses the HPC gateway app as its calculation backend.
To run this notebook, put a `.env` file that sets `ACCESS_TOKEN` in the same folder.

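```python
import os

from dotenv import load_dotenv
from hpc_gateway_sdk import get_app

# Load ACCESS_TOKEN from the .env file in the current folder.
load_dotenv(".env")

access_token = os.environ.get("ACCESS_TOKEN")
# Connect to the EPFL Materials Cloud deployment ("iwm" is the other option).
app = get_app(name="mc", access_token=access_token)
```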
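
`os.environ.get` returns `None` when `ACCESS_TOKEN` is missing, so `get_app` would silently receive `None`. The following is a minimal sketch of a stricter loader that uses only the standard library and fails early. `load_access_token` is a hypothetical helper (not part of the SDK or of python-dotenv), and it assumes plain `KEY=VALUE` lines, without the fuller syntax that python-dotenv supports.

```python
import os
from pathlib import Path


def load_access_token(env_path: str = ".env", key: str = "ACCESS_TOKEN") -> str:
    """Return the token from the environment, else from a simple KEY=VALUE file.

    Hypothetical helper: only plain KEY=VALUE lines (optionally quoted) are
    understood; python-dotenv's full syntax is NOT supported.
    """
    token = os.environ.get(key)
    if token:
        return token
    path = Path(env_path)
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            # Skip blank lines, comments and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            if name.strip() == key:
                # Drop optional surrounding quotes.
                value = value.strip().strip('"').strip("'")
                if value:
                    return value
    raise RuntimeError(f"{key} is not set in the environment or in {env_path}")
```

With this helper the connection becomes `app = get_app(name="mc", access_token=load_access_token())`, and a missing token raises an error before any request is sent.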

The first time you use the HPC gateway app, you need to create your user in the HPC app's database, which records the job data for each MarketPlace user account.
`create_user` also creates the user folder on the cluster, which holds the job folders.

```python
user_info = app.create_user()
print(user_info)
```

```
{
  "_id": "638f355e57bd4aa2a97b98d0", 
  "email": "jusong.yu@epfl.ch", 
  "home": "/scratch/snx3000/jyu/firecrest/jusong_yu", 
  "message": "Success: Create user in database.", 
  "name": "Jusong Yu"
}
```
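
`create_user` reports the user's `home` folder on the cluster, where the job folders are created. The following is a minimal sketch of extracting it. It assumes the response is either the JSON text printed above or an already-decoded dict with the same keys, since the exact return type is not documented here. `user_home` is a hypothetical helper, not part of the SDK.

```python
import json
from typing import Any, Mapping, Union


def user_home(user_info: Union[str, bytes, Mapping[str, Any]]) -> str:
    """Return the cluster home folder from a create_user response.

    Assumption: the response is the JSON text shown above or an equivalent dict.
    """
    data = json.loads(user_info) if isinstance(user_info, (str, bytes)) else user_info
    if "home" not in data:
        # Surface the gateway's message, if any, to explain the failure.
        raise KeyError(f"no 'home' in response: {data.get('message', data)}")
    return data["home"]
```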



To create a job, use the `create_job` endpoint of the gateway app.
It creates a job folder on the remote cluster that stores the job's files and from which the job is submitted.
It returns a `jobid` for further operations.
The parameter `new_transformation` is a dict describing the job; it is used to create the Slurm job script.
The following parameters must be provided to create a job:

- `job_name`: the name of the job.
- `ntasks_per_node`: the number of tasks per node, i.e. the number of MPI processes of your job (the number that follows `mpirun -n`).
- `partition`: for the EPFL Materials Cloud (`mc`) deployment, the available partitions are `debug` and `normal`.
- `image`: for security and agile deployment, simulations run inside a Singularity container. The image is pulled from a given URI. Supported URIs include:

    - `library`: pull an image from the currently configured library (`library://user/collection/container[:tag]`)
    - `docker`: pull a Docker/OCI image from Docker Hub or another OCI registry (`docker://user/image:tag`)
    - `shub`: pull an image from Singularity Hub (`shub://user/image:tag`)
    - `oras`: pull a SIF image from an OCI registry that supports ORAS (`oras://registry/namespace/image:tag`)
    - `http`, `https`: pull an image over HTTP(S)
- `executable_cmd`: the command that runs the simulation inside the container.
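
A mistake in `new_transformation` may only surface once the job reaches the remote cluster, so it can help to check the dict locally first. The following is a minimal sketch based only on the rules listed above (required keys, the `debug`/`normal` partitions on `mc`, the supported URI schemes). `validate_transformation` is a hypothetical helper, and the gateway may enforce different or additional rules.

```python
from typing import Any, Dict

# Keys listed above as mandatory for create_job's new_transformation.
REQUIRED_KEYS = ("job_name", "ntasks_per_node", "partition", "image", "executable_cmd")
# URI schemes Singularity can pull from, per the list above.
SUPPORTED_SCHEMES = ("library", "docker", "shub", "oras", "http", "https")
# Partitions available on the mc deployment.
MC_PARTITIONS = ("debug", "normal")


def validate_transformation(t: Dict[str, Any], deployment: str = "mc") -> None:
    """Raise ValueError if t lacks a required key or has an obviously bad value."""
    missing = [k for k in REQUIRED_KEYS if k not in t]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    n = t["ntasks_per_node"]
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(n, int) or isinstance(n, bool) or n < 1:
        raise ValueError("ntasks_per_node must be a positive integer")
    if deployment == "mc" and t["partition"] not in MC_PARTITIONS:
        raise ValueError(f"partition must be one of {MC_PARTITIONS} on mc")
    scheme, sep, _ = str(t["image"]).partition("://")
    if not sep or scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"image URI scheme must be one of {SUPPORTED_SCHEMES}")
```

Calling `validate_transformation(...)` on the dict right before `app.create_job(new_transformation=...)` catches typos such as a missing `docker://` prefix early.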

Using the MarketPlace private Docker registry (hosted on GitLab) is not supported yet.
Once a GitLab account is available for this purpose, set the following environment variables on the remote cluster:

```bash
export SINGULARITY_DOCKER_USERNAME='$oauthtoken'
export SINGULARITY_DOCKER_PASSWORD=<redacted>
```

As mentioned, the EPFL Materials Cloud (`mc`) deployment is for testing only; the job time is limited to 10 minutes.

To build a container that can run parallel simulations, see the example LAMMPS and Quantum ESPRESSO Dockerfiles at https://github.com/containers4hpc.
We encourage building the container on the `base-mpi314` image, which uses MPICH v3.1.4. This MPICH version supports [ABI compatibility](https://www.mpich.org/abi/), so the container can run with multiple compatible MPI libraries.

```python
jobid = app.create_job(new_transformation={
  "job_name": "demo00",
  "ntasks_per_node": 1,
  "partition": "debug",
  "image": "docker://hello-world:latest",
  "executable_cmd": "> output",
})
print(jobid)
```

```
639615ac5be67d529e2187cd
```


`create_job` only prepares the folder and the Slurm job script on the remote cluster; to launch the simulation, use `launch_job`.
Pass the `jobid` returned by `create_job`, and the job is launched on the remote cluster.
Notifications about the job state are sent to the email address registered with the MarketPlace account.

```python
resp = app.launch_job(jobid)
resp
```

```
{'jobid': '639615ac5be67d529e2187cd'}
```

`check_job_state` returns the list of files in the job folder.

```python
resp = app.check_job_state(jobid)
resp
```

```
{'files': [{'group': 'mrcloud',
   'last_modified': '2022-12-11T18:38:52',
   'link_target': '',
   'name': 'job.sh',
   'permissions': 'rw-r--r--',
   'size': '519',
   'type': '-',
   'user': 'jyu'},
  {'group': 'mrcloud',
   'last_modified': '2022-12-11T18:39:01',
   'link_target': '',
   'name': 'slurm-43437415.out',
   'permissions': 'rw-r--r--',
   'size': '0',
   'type': '-',
   'user': 'jyu'}],
 'message': 'Files in the job folder.'}
```
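
Because `check_job_state` returns a file listing rather than a job status, one way to follow a launched job is to poll it until the Slurm output file appears with content. The following is a minimal sketch assuming the listing format shown above; `find_slurm_output` and `wait_for_slurm_output` are hypothetical helpers. A non-empty `slurm-<id>.out` only shows that the job has started writing output, not that it has finished; the email notifications report the actual job state.

```python
import re
import time
from typing import Any, Callable, Dict, Optional

# Slurm's default output file name: slurm-<jobid>.out
SLURM_OUT = re.compile(r"^slurm-(\d+)\.out$")


def find_slurm_output(listing: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the slurm-<id>.out entry from a check_job_state response, or None."""
    for entry in listing.get("files", []):
        if SLURM_OUT.match(entry["name"]):
            return entry
    return None


def wait_for_slurm_output(check: Callable[[str], Dict[str, Any]], jobid: str,
                          timeout: float = 600.0, interval: float = 10.0) -> Dict[str, Any]:
    """Poll check(jobid) until the Slurm output file is non-empty.

    Raises TimeoutError after `timeout` seconds. A non-empty file does not
    prove that the job has finished.
    """
    deadline = time.monotonic() + timeout
    while True:
        entry = find_slurm_output(check(jobid))
        if entry is not None and int(entry["size"]) > 0:
            return entry
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no Slurm output for job {jobid} after {timeout} s")
        time.sleep(interval)
```

For example, `wait_for_slurm_output(app.check_job_state, jobid, timeout=600)` waits at most as long as the 10-minute limit of the `mc` deployment.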

You can cancel the job with `cancel_job`.

```python
resp = app.cancel_job(jobid)
resp
```

```
{'message': 'Send cancelling signal to job-639615ac5be67d529e2187cd, of f7t job id=43437415'}
```
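
The cancel message embeds the FirecREST (`f7t`) job id, which is the same number that appears in `slurm-43437415.out`. The following is a minimal sketch of extracting it, for example to look the job up in Slurm on the cluster. `f7t_job_id` is a hypothetical helper, and it assumes the message keeps the wording shown above.

```python
import re
from typing import Optional


def f7t_job_id(message: str) -> Optional[str]:
    """Return the FirecREST (f7t) job id mentioned in a gateway message, if any.

    Assumption: the message contains "f7t job id=<digits>" as in the output above.
    """
    match = re.search(r"f7t job id\s*=\s*(\d+)", message)
    return match.group(1) if match else None
```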

Simulations usually need input files; upload them with `upload_file`, as in the example below.

```python
app.upload_file(jobid, filename="file_upload_test.txt", source_path="./file_upload_test.txt")
resp = app.check_job_state(jobid)
resp
```

```
{'files': [{'group': 'mrcloud',
   'last_modified': '2022-12-11T18:39:07',
   'link_target': '',
   'name': 'file_upload_test.txt',
   'permissions': 'rw-r--r--',
   'size': '7',
   'type': '-',
   'user': 'jyu'},
  {'group': 'mrcloud',
   'last_modified': '2022-12-11T18:38:52',
   'link_target': '',
   'name': 'job.sh',
   'permissions': 'rw-r--r--',
   'size': '519',
   'type': '-',
   'user': 'jyu'},
  {'group': 'mrcloud',
   'last_modified': '2022-12-11T18:39:05',
   'link_target': '',
   'name': 'output',
   'permissions': 'rw-r--r--',
   'size': '807',
   'type': '-',
   'user': 'jyu'},
  {'group': 'mrcloud',
   'last_modified': '2022-12-11T18:39:08',
   'link_target': '',
   'name': 'slurm-43437415.out',
   'permissions': 'rw-r--r--',
   'size': '1489',
   'type': '-',
   'user': 'jyu'}],
 'message': 'Files in the job folder.'}
```
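
When a simulation needs several input files, the `upload_file` call can be repeated for each one. The following is a minimal sketch assuming `upload_file` accepts `(jobid, filename=..., source_path=...)` as in the cell above. `upload_inputs` is a hypothetical helper that takes the upload function as an argument, so it can be tried without a cluster.

```python
from pathlib import Path
from typing import Callable, List


def upload_inputs(upload_file: Callable[..., object], jobid: str, directory: str,
                  pattern: str = "*") -> List[str]:
    """Upload every regular file in `directory` matching `pattern`.

    Returns the uploaded file names in sorted order; subdirectories are skipped.
    """
    uploaded = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            upload_file(jobid, filename=path.name, source_path=str(path))
            uploaded.append(path.name)
    return uploaded
```

For example, `upload_inputs(app.upload_file, jobid, "./inputs", "*.in")` would upload all `*.in` files from a hypothetical local `./inputs` folder into the job folder.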

Once the simulation has finished or failed, download the output or the Slurm error file to check the job details.
Binary files are supported; to write the download to a local file, use the following code.

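```python
resp = app.download_file(jobid, filename="output")
# Stream the response to a local file in binary mode, 1 KiB at a time.
with open("output", "wb") as fh:
    for chunk in resp.iter_content(chunk_size=1024):
        if chunk:  # skip empty keep-alive chunks
            fh.write(chunk)
```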
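
The `size` field reported by `check_job_state` offers a cheap sanity check on a download. The following is a minimal sketch assuming `size` is the file size in bytes, given as a string as in the listings above. `save_chunks` and `expected_size` are hypothetical helpers.

```python
from typing import Any, Dict, Iterable


def save_chunks(chunks: Iterable[bytes], dest: str) -> int:
    """Write the non-empty byte chunks to dest in binary mode; return bytes written."""
    written = 0
    with open(dest, "wb") as fh:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
                written += len(chunk)
    return written


def expected_size(listing: Dict[str, Any], filename: str) -> int:
    """Return the size check_job_state reports for filename (assumed bytes)."""
    for entry in listing.get("files", []):
        if entry["name"] == filename:
            return int(entry["size"])
    raise KeyError(filename)
```

For example, the result of `save_chunks(app.download_file(jobid, filename="output").iter_content(chunk_size=1024), "output")` can be compared with `expected_size(app.check_job_state(jobid), "output")`; a mismatch suggests an interrupted transfer or a file still being written.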

To delete a file in the job folder, use `delete_file`.

```python
app.delete_file(jobid, filename="file_upload_test.txt")
resp = app.check_job_state(jobid)
resp
```

```
{'files': [{'group': 'mrcloud',
   'last_modified': '2022-12-11T18:38:52',
   'link_target': '',
   'name': 'job.sh',
   'permissions': 'rw-r--r--',
   'size': '519',
   'type': '-',
   'user': 'jyu'},
  {'group': 'mrcloud',
   'last_modified': '2022-12-11T18:39:05',
   'link_target': '',
   'name': 'output',
   'permissions': 'rw-r--r--',
   'size': '807',
   'type': '-',
   'user': 'jyu'},
  {'group': 'mrcloud',
   'last_modified': '2022-12-11T18:39:08',
   'link_target': '',
   'name': 'slurm-43437415.out',
   'permissions': 'rw-r--r--',
   'size': '1489',
   'type': '-',
   'user': 'jyu'}],
 'message': 'Files in the job folder.'}
```
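
To tidy a job folder after the results have been downloaded, `delete_file` can be applied to every entry in a listing except the files you want to keep. The following is a minimal sketch assuming the listing format shown above and that `type` `-` marks a regular file, as in `ls -l` output. `clean_job_folder` is a hypothetical helper that takes the delete function as an argument.

```python
from typing import Any, Callable, Dict, Iterable, List


def clean_job_folder(listing: Dict[str, Any], delete_file: Callable[..., object],
                     jobid: str, keep: Iterable[str] = ("job.sh",)) -> List[str]:
    """Delete regular files whose names are not in `keep`; return the deleted names.

    Assumption: entries with type "-" are regular files (as in `ls -l`);
    anything else, for example a directory, is left alone.
    """
    keep_set = set(keep)
    deleted = []
    for entry in listing.get("files", []):
        name = entry["name"]
        if entry.get("type") == "-" and name not in keep_set:
            delete_file(jobid, filename=name)
            deleted.append(name)
    return deleted
```

For example, `clean_job_folder(app.check_job_state(jobid), app.delete_file, jobid, keep=("job.sh", "slurm-43437415.out"))` would remove `output` from the folder listed above.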