
Using MsPASS with Singularity (on HPC)


Introduction to Singularity

What’s Singularity

Singularity is a container implementation designed for HPC cluster environments. It provides features similar to Docker while avoiding Docker's "escalated privilege" problem. Singularity is also compatible with Docker containers, which means one can develop a container with Docker and use Singularity as the runtime on HPC clusters.

Getting Started with Singularity

In this section, we walk through the steps of building a Singularity container. Unless otherwise noted, the following commands and outputs are from the Stampede2 cluster at TACC.

Building A Singularity Image from Docker

Singularity is compatible with Docker containers, and Docker is the more widely used implementation, so we recommend building a Singularity image from a Docker image.

First, load the Singularity module on the HPC system:

module load tacc-singularity

Then, execute:

singularity build mspass.simg docker://mspass/mspass

The command above builds a Singularity container from the latest released MsPASS Docker container and saves it as mspass.simg in the current directory.

For development purposes, you may want to build an image from a customized Docker image. If your HPC environment supports the Docker runtime, you can build a customized Singularity image as follows:

docker build -t local/mspass_dev .	# Build the docker image from Dockerfile

singularity build mspass.simg docker-daemon://local/mspass_dev

However, if the HPC environment doesn't support Docker (as is the case at TACC), you can first build the Docker image on a local machine and pack it into a tar file using:

docker save -o mspass_dev.tar local/mspass_dev		# Dump the docker image to a tar file on local machine

Then, transfer the tar file to the HPC machine using scp or another file transfer tool. To build the Singularity image from a Docker tar archive, use:

singularity build mspass_dev.simg docker-archive://mspass_dev.tar

Sandbox Mode

The approach described above produces images in the Singularity Image File (SIF) format. SIF images are read-only and suitable for production use. If you need to make frequent changes to the image, however, you can add the --sandbox option to the build command to create a writable directory for interactive development. For example:

singularity build --sandbox mspass_dev.simg docker-archive://mspass_dev.tar
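
To modify the files inside a sandbox image interactively, you can open a writable shell in it; a minimal sketch:

singularity shell --writable mspass_dev.simg

Changes made in the writable shell are persisted to the sandbox directory.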

Running a Singularity Container

After the image is created, launch the container with:

singularity run mspass_dev.simg

Similar to Docker, Singularity executes the container's ENTRYPOINT script when the container is started with the run command. In the MsPASS image, the entrypoint script is stored at /usr/sbin/start-mspass.sh. You can refer to the code here. Which applications run in the container is determined by the environment variables passed to it. If none are specified, the container runs all components (scheduler, worker, db) as a standalone node. For instructions on setting roles for containers, please refer to the following sections.

Running MsPASS with Singularity

MsPASS consists of three components: a MongoDB server, workers, and a scheduler. In this section, we first introduce how to configure and run these applications (MongoDB, Dask/Spark) with Singularity, then give example scripts that automatically set up all nodes and applications.

Manual Setup

In this part, we introduce how to manually run each application with Singularity.

Running MongoDB with Singularity

MongoDB Single Node Server

Before starting a MongoDB server, make sure you have created a dedicated directory for the database files. Here we assume it is ./data. The command to start the MongoDB server is:

singularity exec mspass.simg mongod --dbpath ./data --logpath ./log --fork --bind_ip_all

The --dbpath and --logpath options of mongod specify where to keep the database files and logs. --fork runs the MongoDB server process in the background. --bind_ip_all binds the MongoDB instance to all IPv4 addresses of the machine.
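
If the database only needs to accept connections from processes on the same node, you can bind to the loopback address instead of all interfaces; a minimal variation of the command above:

singularity exec mspass.simg mongod --dbpath ./data --logpath ./log --fork --bind_ip 127.0.0.1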

Then, launch the client locally with:

singularity exec mspass.simg mongo

To stop the MongoDB server, type the following command in the mongo shell:

use admin
db.shutdownServer()
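
Alternatively, the shutdown command can be issued non-interactively with --eval, for example:

singularity exec mspass.simg mongo admin --eval "db.shutdownServer()"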

MongoDB Sharded Cluster Server

Sharding is a method for distributing data across multiple nodes. MsPASS supports sharding of the MongoDB server to improve performance. A sharded MongoDB cluster consists of three components: a query router, a config server, and shards. For more information, please refer to MongoDB Sharding.

Normally, we run the config server and query router on the same machine (the master node). Assume we want to run the config server and query router on node-0, and the shards on the other nodes (node-1, node-2, …, node-n).

First, launch the config server and query router on node-0 using:

singularity exec mspass.simg bash -c 'mongod --port 27018 --configsvr --replSet configserver --dbpath ./data_config --logpath ./log_config --bind_ip_all &
mongo --port 27018 --eval "rs.initiate({_id: \"configserver\", configsvr: true, version: 1, members: [{ _id: 0, host : \"node-0:27018\" }]})";
mongos --port 27017 --configdb configserver/node-0:27018 --logpath log_router --bind_ip_all &'

Then, once the shard nodes (set up as described below) are running, register each shard with the cluster from the master node:

singularity exec mspass.simg mongo --host node-0 --port 27017 --eval "sh.addShard(\"rs1/node-1:27017\")"
singularity exec mspass.simg mongo --host node-0 --port 27017 --eval "sh.addShard(\"rs2/node-2:27017\")"
…
singularity exec mspass.simg mongo --host node-0 --port 27017 --eval "sh.addShard(\"rsn/node-n:27017\")"

To launch the shard nodes, log in to each shard node (e.g., node-1) and use:

singularity exec mspass.simg mongod --port 27017 --shardsvr --replSet "rs1" --dbpath ./data_shard --logpath ./log_shard --fork --bind_ip_all
singularity exec mspass.simg mongo --port 27017 --eval "rs.initiate({_id: \"rs1\", version: 1, members: [{ _id: 0, host : \"node-1:27017\" }]})"

These commands initialize the shard replica set on the node and configure the shard server. Note that the "rs1" and "node-1" values should be replaced with the ones corresponding to each node (node-2 -> rs2), as shown in the example below.
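
For example, on node-2 the same two commands become:

singularity exec mspass.simg mongod --port 27017 --shardsvr --replSet "rs2" --dbpath ./data_shard --logpath ./log_shard --fork --bind_ip_all
singularity exec mspass.simg mongo --port 27017 --eval "rs.initiate({_id: \"rs2\", version: 1, members: [{ _id: 0, host : \"node-2:27017\" }]})"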

After the sharded cluster is deployed, one can shard the target collections by executing the following command on the master node:

singularity exec mspass.simg mongo --host node-0 --port 27017 --eval "sh.shardCollection(\"DATABASE.COLLECTION\", { INDEX: \"hashed\"})"

The example above shards DATABASE.COLLECTION with a hashed shard key on the INDEX field. For more detailed information and instructions on sharding a collection, please refer to the manual here.
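
For example, using the default database and collection that appear later in distributed_node.sh (the usarraytest database and its arrival collection, sharded on a hashed _id key), the command would be:

singularity exec mspass.simg mongo --host node-0 --port 27017 --eval "sh.shardCollection(\"usarraytest.arrival\", { _id: \"hashed\"})"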

Running Dask with Singularity

Starting a Dask Scheduler

To start a dask scheduler on node-0, use:

singularity exec mspass.simg dask-scheduler --port 8786

You might want to collect the log output of dask-scheduler and run it in the background by adding > ./dask_scheduler_log 2>&1 & at the end of the command.
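
Putting it together, the full command would be:

singularity exec mspass.simg dask-scheduler --port 8786 > ./dask_scheduler_log 2>&1 &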

Starting a Dask Worker Node

Before starting a dask worker, make sure you have created a directory to store the runtime data for Dask. Here we assume it is ./dask_worker_dir. The command to start a worker node is:

singularity exec mspass.simg dask-worker --nprocs 8 --nthreads 1 --memory-limit 0 --local-directory ./dask_worker_dir tcp://node-0:8786

This starts a worker, connects it to the scheduler (node-0), and configures it to run 8 worker processes with 1 thread each. To learn more about worker settings, please refer to the manual here.
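
As with the scheduler, you can run the worker in the background and collect its log, for example:

singularity exec mspass.simg dask-worker --nprocs 8 --nthreads 1 --memory-limit 0 --local-directory ./dask_worker_dir tcp://node-0:8786 > ./dask_worker_log 2>&1 &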

Running Spark with Singularity

To start a Spark scheduler on node-0, first set the $SPARK_HOME environment variable to the Spark installation directory. To launch the Spark scheduler, use:

singularity exec mspass.simg $SPARK_HOME/sbin/start-master.sh

To launch a Spark worker, use:

singularity exec mspass.simg $SPARK_HOME/sbin/start-slave.sh spark://node-0:$SPARK_MASTER_PORT
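
Both commands assume $SPARK_HOME points at the Spark installation, and the worker command also assumes $SPARK_MASTER_PORT is set. A minimal sketch with an illustrative install path and the default master port (7077, the same port used in the pyspark example below):

export SPARK_HOME=/opt/spark		# illustrative path; use your actual Spark install directory
export SPARK_MASTER_PORT=7077		# default Spark master port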

To integrate MongoDB and Spark, first start MongoDB as described above. Then launch a Python shell with pyspark using:

singularity exec mspass.simg pyspark \
--conf "spark.mongodb.input.uri=mongodb://node-0:$27017/test.misc" \
--conf "spark.mongodb.output.uri=mongodb://node-0:27017/test.misc" \
--conf "spark.master=spark://node-0:7077" \
--packages org.mongodb.spark:mongo-spark-connector_2.12:3.0.0

Getting MongoDB, Dask/Spark Running Using Entrypoint Script

In addition to using the singularity exec command to launch and configure each component, the MsPASS container supports a simpler approach: the entrypoint script.

The entrypoint script contains all the commands for setting up and starting MongoDB and Dask/Spark; users only need to configure a few variables before running it. Users who want to customize the configuration of each component can still launch the components individually using the approach described in the previous section.

The entrypoint script defines several roles: db, dbmanager, shard, scheduler, worker, and frontend. Each role corresponds to a set of components; specifying a role and running the script launches those components.

Here is the definition of each role:

db: A standalone MongoDB server
dbmanager: A MongoDB master node running a query router and a config server; only needed for a distributed sharded cluster
shard: A MongoDB shard node running a shard replica set; only needed for a distributed sharded cluster
scheduler: A scheduler node for Dask/Spark; users need to set $MSPASS_SCHEDULER before running to determine whether Dask or Spark is used
worker: A worker node for Dask/Spark; users need to set $MSPASS_SCHEDULER before running
frontend: A Jupyter Notebook instance, usually run on the same machine as db/dbmanager
all: The default value; starts the MsPASS system on a standalone machine, running a MongoDB server, a scheduler, a worker, and a Jupyter frontend instance

To launch a specific role of MsPASS, first set the SINGULARITYENV_MSPASS_ROLE environment variable, then start the instance with the singularity run command. For example, to start a standalone instance, use:

export SINGULARITYENV_MSPASS_ROLE=all
singularity run mspass.simg
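
As another example, to start a dedicated scheduler node (here assuming Dask is the chosen scheduler), you would set the role and the scheduler type through the same SINGULARITYENV_ prefix; a minimal sketch:

export SINGULARITYENV_MSPASS_ROLE=scheduler
export SINGULARITYENV_MSPASS_SCHEDULER=dask
singularity run mspass.simg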

For more information on the usage of the entrypoint script, please refer to the source code here. Some other important environment variables are listed below:

SINGULARITYENV_MSPASS_WORK_DIR: The directory that stores the worker's runtime data
SINGULARITYENV_MSPASS_SCHEDULER_ADDRESS: The IP address of the scheduler node
SINGULARITYENV_MSPASS_DB_PATH: The directory that stores MongoDB data; should be either 'tmp' or 'scratch' on TACC machines
SINGULARITYENV_MSPASS_DB_ADDRESS: The IP address of the MongoDB server
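
For example, a worker node connecting to a scheduler and a MongoDB server that both run on node-0 could be launched as follows (the work directory path is only illustrative, and hostnames are shown where the script expects addresses):

export SINGULARITYENV_MSPASS_ROLE=worker
export SINGULARITYENV_MSPASS_SCHEDULER=dask
export SINGULARITYENV_MSPASS_WORK_DIR=$SCRATCH/mspass/workdir	# illustrative path
export SINGULARITYENV_MSPASS_SCHEDULER_ADDRESS=node-0
export SINGULARITYENV_MSPASS_DB_ADDRESS=node-0
singularity run mspass.simg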

More examples can be found in distributed_node.sh and single_node.sh.

Examples: Setup Scripts

To simplify the setup, users can use the example scripts provided in the repo. In this section, we briefly describe how to use them.

Please note that these scripts are written for the TACC environment; users may need to modify them to fit their own HPC environment.

Quick Start

To run MsPASS on the TACC Stampede2 machines, simply use sbatch ./single_node.sh or sbatch ./distributed_node.sh. The sbatch command submits the batch job to the cluster; the workload manager (Slurm) allocates the nodes and runs the batch script on the first allocated node.

After the script is submitted and executed, a file named mspass.o{JOBID} is created in the current directory; it stores all the logs from MsPASS:

Sun Mar 20 02:44:20 CDT 2022
primary node c492-092
got login node port 19292
Created reverse ports on Stampede2 logins
cleaning done
[WARN  tini (144729)] [WARN  tini (144730)] Tini is not running as PID 1 and isn't registered as a child subreaper.
Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
Zombie processes will not be re-parented to Tini, so zombie reaping won't work.

[WARN  tini (206416)] Tini is not running as PID 1 and isn't registered as a child subreaper.
Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
[WARN  tini (145112)] Tini is not running as PID 1 and isn't registered as a child subreaper.
Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
To fix the problem, use the -s option or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.
[I 02:46:48.738 NotebookApp] Serving notebooks from local directory: /scratch/08431/ztyang/mspass/workdir
[I 02:46:48.738 NotebookApp] Jupyter Notebook 6.2.0 is running at:
[I 02:46:48.738 NotebookApp] http://c492-092.stampede2.tacc.utexas.edu:8888/?token=fee57b8bf77dc608fc32ecf6ad8d85a70467d84cf52b439d
[I 02:46:48.738 NotebookApp]  or http://127.0.0.1:8888/?token=fee57b8bf77dc608fc32ecf6ad8d85a70467d84cf52b439d
[I 02:46:48.738 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 02:46:48.749 NotebookApp]

    To access the notebook, open this file in a browser:
        file:///home1/08431/ztyang/.local/share/jupyter/runtime/nbserver-145143-open.html
    Or copy and paste one of these URLs:
        http://c492-092.stampede2.tacc.utexas.edu:8888/?token=fee57b8bf77dc608fc32ecf6ad8d85a70467d84cf52b439d
     or http://127.0.0.1:8888/?token=fee57b8bf77dc608fc32ecf6ad8d85a70467d84cf52b439d

As can be seen above, the log gives you the token for the Jupyter notebook. If the reverse tunnel port is enabled, you also get a login node port. Replace 8888 with that port number to get the external Jupyter link (in this case, http://stampede2.tacc.utexas.edu:19292/?token=fee57b8bf77dc608fc32ecf6ad8d85a70467d84cf52b439d). Meanwhile, another port is opened for the status page; its number is the new port number + 1.

While MsPASS is running, check the job status using:

showq | grep mspass

To stop the MsPASS job, you can simply wait until the end of the run time, or explicitly cancel it using:

scancel {JOBID}

Single Node

Let's first look at single_node.sh. It sets up a standalone instance with the MongoDB server, the Dask scheduler and worker, and the Jupyter frontend all on a single node.

The script first sets the Slurm job parameters:

#SBATCH -J mspass           # Job name
#SBATCH -o mspass.o%j       # Name of stdout output file
#SBATCH -p skx-dev          # Queue (partition) name
#SBATCH -N 1                # Total # of nodes (must be 1 for serial)
#SBATCH -n 1                # Total # of mpi tasks (should be 1 for serial)
#SBATCH -t 02:00:00         # Run time (hh:mm:ss)
#SBATCH -A MsPASS           # Allocation name (req'd if you have more than 1)

You may want to change some of these fields to suit your needs. Note: the queue name and allocation name here refer to TACC Slurm queues and TACC allocations.

Then the script sets the work directory and the Singularity image to use:

WORK_DIR=$SCRATCH/mspass/single_workdir
MSPASS_CONTAINER=$WORK2/mspass/mspass_latest.sif
DB_PATH='scratch'

Make sure you set WORK_DIR to a dedicated directory that stores all runtime data, and set MSPASS_CONTAINER to the Singularity image you created. For DB_PATH, there are two options: 'scratch' and 'tmp'. 'scratch' saves the database data to a shared filesystem, while 'tmp' saves it to a local filesystem. For more details, please refer to the TACC user guide.

The script enables external access to the Jupyter webpage by creating reverse tunnel ports:

for i in `seq 4`; do
    ssh -q -f -g -N -R $LOGIN_PORT:$NODE_HOSTNAME:8888 login$i
    ssh -q -f -g -N -R $STATUS_PORT:$NODE_HOSTNAME:8787 login$i
done

The code above creates one tunnel per login node so that users can connect to the Stampede2 login address to reach the Jupyter notebook and the status page. If you are not using a TACC machine, you may need to modify or remove this part to run in your own HPC environment.
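
If your cluster does not allow this kind of reverse tunnel from compute nodes, a common alternative is a local port forward started from your own machine through the login node; a sketch with hypothetical hostnames:

# Run on your local machine; replace the placeholders with your cluster's login node, the allocated compute node, and your username
ssh -N -L 8888:<compute-node>:8888 <username>@<login-node>

You can then open http://localhost:8888 in a local browser and paste in the token from the log.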

Distributed Node

To run MsPASS on a distributed cluster, use the distributed_node.sh script.

Like the single node script, the distributed node script also does some setup at the beginning:

#SBATCH -J mspass           # Job name
#SBATCH -o mspass.o%j       # Name of stdout output file
#SBATCH -p skx-dev          # Queue (partition) name
#SBATCH -N 3                # Total # of nodes
#SBATCH -n 3                # Total # of mpi tasks
#SBATCH -t 02:00:00         # Run time (hh:mm:ss)
#SBATCH -A MsPASS           # Allocation name (req'd if you have more than 1)

The main difference is that the number of nodes and MPI tasks should be set to more than 1. You can set these two values to any number, but a larger number may result in a longer wait in the queue.

Here are some variables specific to the distributed node script:

HOSTNAME_BASE='stampede2.tacc.utexas.edu'
DB_SHARDING=true
SHARD_DATABASE="usarraytest"
SHARD_COLLECTIONS=(
    "arrival:_id"
)

HOSTNAME_BASE is used to derive the address of each node; make sure you set this value correctly for the HPC cluster you use.

DB_SHARDING indicates whether sharding is in use.

SHARD_DATABASE and SHARD_COLLECTIONS are used for MongoDB's sharding feature, indicating the database, collections, and indexes you want to use for sharding. You can also skip these two variables and set up sharding through MongoDB commands; please refer to the MongoDB Sharded Cluster Server section above for more detailed instructions.
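
Each entry of SHARD_COLLECTIONS pairs a collection name with the index field to shard on. For example, to shard an additional (hypothetical) collection on its _id field as well, the array could look like:

SHARD_COLLECTIONS=(
    "arrival:_id"
    "site:_id"        # hypothetical additional collection to shard
)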