Using MsPASS with Singularity (on HPC)

Ian Wang edited this page Oct 3, 2019 · 6 revisions

On machines that have Singularity set up, use the following command to build the image as mspass.simg in the current directory:

singularity build mspass.simg docker://wangyinz/mspass

Getting MongoDB Running with Singularity on a Single Node

Before starting the MongoDB server, make sure a dedicated directory exists for the database files. Here we assume it is ./data. The following command starts the MongoDB server for localhost only:

singularity exec mspass.simg mongod --dbpath ./data --logpath ./log --fork
  • The --dbpath and --logpath options of mongod specify where to keep the database files and the log, respectively.
  • The --fork option runs the MongoDB server process in the background.
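
As a minimal sketch of the directory preparation mentioned above, the helper below creates the database directory if it is missing and checks that it is writable before mongod is started. The function name prepare_db_dir is hypothetical and not part of MsPASS; the default ./data matches the path assumed in this section.

```shell
#!/bin/sh
# Hypothetical helper (not part of MsPASS): make sure the MongoDB
# database directory exists and is writable before starting mongod.
prepare_db_dir() {
    db_dir="${1:-./data}"
    # Create the directory (and any missing parents) if needed.
    mkdir -p "$db_dir" || return 1
    # mongod must be able to write its files here, so fail early otherwise.
    [ -w "$db_dir" ] || return 1
    # Print the path so callers can pass it on to --dbpath.
    printf '%s\n' "$db_dir"
}

# Possible usage (requires the mspass.simg image built earlier):
#   db_dir=$(prepare_db_dir ./data) && \
#     singularity exec mspass.simg mongod --dbpath "$db_dir" --logpath ./log --fork
```

Failing before mongod starts keeps the error visible on the terminal instead of in the log file of a forked process.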

Then, launch the client locally with:

singularity exec mspass.simg mongo

To stop the MongoDB server, type the following command in the mongo shell:

use admin
db.shutdownServer()
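
In a batch script it may be more convenient to issue the same shutdown without opening an interactive shell. The sketch below assumes the mongo shell in the image accepts a database name and an --eval expression on the command line. MSPASS_RUN and stop_mongod are hypothetical names introduced here for illustration; setting MSPASS_RUN=echo prints the command instead of running it.

```shell
#!/bin/sh
# Command prefix used to run tools from the image (hypothetical variable).
# Set MSPASS_RUN=echo to preview the command without a container runtime.
MSPASS_RUN="${MSPASS_RUN:-singularity exec mspass.simg}"

# Ask the mongod instance on the local node to shut down.
stop_mongod() {
    # Connect to the admin database and call db.shutdownServer(),
    # the same call used in the interactive mongo shell.
    $MSPASS_RUN mongo admin --eval 'db.shutdownServer()'
}
```

Run stop_mongod on the node that hosts the server. The shell may report a dropped connection as the server exits, so treat its exit status with some caution.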

Getting MongoDB Running with Singularity on Multiple Nodes

First, request an interactive session with more than one node. Below we assume the hostnames (the output of the hostname command) of the two requested nodes are node-1 and node-2. Make sure to change the names to match your system setup.

To run the MongoDB server on node-1 and allow a remote client to connect, start the server with:

singularity exec mspass.simg mongod --dbpath ./data --logpath ./log --fork --bind_ip_all
  • --bind_ip_all will bind the MongoDB server to all IPv4 addresses, so it can be accessed from another node.

To launch the client from node-2, run ssh node-2 to log in to that node, and then run:

singularity exec mspass.simg mongo --host node-1

It will connect to the MongoDB server running on node-1.
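
To check the connection from a script rather than interactively, one option is to send a ping command through the mongo shell. This is a sketch under the assumption that the mongo shell in the image supports --quiet and --eval; MSPASS_RUN and check_mongo are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Command prefix used to run tools from the image (hypothetical variable).
# Set MSPASS_RUN=echo to preview the command without a container runtime.
MSPASS_RUN="${MSPASS_RUN:-singularity exec mspass.simg}"

# Send a ping to the MongoDB server on the given host (default: node-1).
check_mongo() {
    host="${1:-node-1}"
    # The ping command replies with an "ok" field when the server answers;
    # --quiet suppresses the connection banner so only the result is shown.
    $MSPASS_RUN mongo --host "$host" --quiet --eval 'db.runCommand({ping: 1}).ok'
}
```

Calling check_mongo node-1 from node-2 should fail if the server was started without --bind_ip_all, which makes it a quick way to confirm the binding.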

To stop the MongoDB server, type the following commands in the mongo shell on node-1:

use admin
db.shutdownServer()

Getting Spark and MongoDB Running with Singularity on Multiple Nodes

Assume the two nodes requested in an interactive session are node-1 and node-2. To launch the Spark master and the MongoDB server on node-1, use the following command on node-1:

singularity run mspass.simg &

This requires a data directory to already exist in the current directory. It also creates the log files of the Spark master and MongoDB in the current directory. The & runs the servers in the background.

To launch a Spark worker on node-2, first run ssh node-2, and then run:

singularity exec mspass.simg bash -c 'export SPARK_MASTER=node-1; \
    export SPARK_LOG_DIR=path_to_current_dir; \
    export SPARK_WORKER_DIR=path_to_current_dir; \
    $SPARK_HOME/sbin/start-slave.sh spark://$SPARK_MASTER:$SPARK_MASTER_PORT'

In this version, you need to set three environment variables: SPARK_MASTER, SPARK_LOG_DIR, and SPARK_WORKER_DIR. Replace path_to_current_dir with the path to your current directory.
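
The master URL passed to start-slave.sh has the form spark://host:port. As a small sketch, the hypothetical helper below assembles that URL from a hostname, assuming the standalone master listens on port 7077 unless another port is given. The name spark_master_url is introduced here for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: build a Spark standalone master URL.
# The port defaults to 7077 (an assumption about the master's configuration).
spark_master_url() {
    host="$1"
    port="${2:-7077}"
    if [ -z "$host" ]; then
        echo "usage: spark_master_url HOST [PORT]" >&2
        return 1
    fi
    printf 'spark://%s:%s\n' "$host" "$port"
}

# Possible usage:
#   spark_master_url node-1        # prints spark://node-1:7077
#   spark_master_url node-1 7078   # prints spark://node-1:7078
```

Building the URL in one place helps keep the worker command and any later job submission pointing at the same master.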

To test the setup with the Pi calculation example, use the following command on either node-1 or node-2:

singularity exec mspass.simg /usr/local/spark/bin/run-example --master spark://node-1:7077 SparkPi 10

Each run creates a directory named app-X-X, which contains files such as stderr.

The MongoDB server can be accessed in the same way as described above.

To launch the Python shell with pyspark, use:

singularity exec mspass.simg pyspark \
    --conf "spark.mongodb.input.uri=mongodb://node-1/test.myCollection?readPreference=primaryPreferred" \
    --conf "spark.mongodb.output.uri=mongodb://node-1/test.myCollection" \
    --conf "spark.master=spark://node-1:7077" \
    --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1
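
The two MongoDB URIs in the command above differ only in the optional query string. A minimal sketch of how they could be assembled for other hosts, databases, or collections follows; mongo_spark_uri is a hypothetical helper name, not part of MsPASS or the connector.

```shell
#!/bin/sh
# Hypothetical helper: build a MongoDB URI of the form
#   mongodb://HOST/DB.COLLECTION[?OPTIONS]
# as used for spark.mongodb.input.uri and spark.mongodb.output.uri.
mongo_spark_uri() {
    host="$1"
    db="$2"
    coll="$3"
    opts="$4"
    if [ -z "$host" ] || [ -z "$db" ] || [ -z "$coll" ]; then
        echo "usage: mongo_spark_uri HOST DB COLLECTION [OPTIONS]" >&2
        return 1
    fi
    uri="mongodb://${host}/${db}.${coll}"
    # Append the query string only when options were supplied.
    if [ -n "$opts" ]; then
        uri="${uri}?${opts}"
    fi
    printf '%s\n' "$uri"
}
```

For example, the input URI above corresponds to mongo_spark_uri node-1 test myCollection readPreference=primaryPreferred, and the output URI to the same call without the last argument.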