Using MsPASS with Docker

Ian Wang edited this page Dec 17, 2019 · 26 revisions

Key concepts

A few key concepts underlie this procedure.

  • Docker is software that creates containers (lightweight virtual machines) that run on your machine. It is best suited for a single host with multiple processors that can be exploited for parallel processing. See the related section on Singularity for clusters.
  • A container in Docker is a lightweight instance of a virtual machine, and containers share a common configuration. It might be helpful to think of each container as a child of the root Docker virtual machine. The containers run largely in isolation from each other but share a common virtual operating system.
  • docker-compose is a related tool for working with multiple Docker containers that MsPASS uses for parallel operations. docker-compose is configured with a YAML file. For MsPASS an example configuration is stored at the top level of the GitHub tree as the file docker-compose.yml.
  • A key feature of Docker is that it provides a way to standardize the setup of MongoDB, which would otherwise be a burden on most users. That step is described below.
  • It can be confusing to understand where data is stored in a virtual machine environment. In the discussion below files or data we reference that reside on a virtual machine will be set in italics. Local files/data will be referred to with normal font text.

Setting up your system

Overview

Setting up your system for MsPASS with Docker takes two distinct steps: (1) installing Docker and docker-compose, and (2) configuring the virtual machine environment for running MsPASS. The next two sections cover these steps in detail.

Setting up docker

To install Docker on machines where you have root access, please refer to the guide here. For HPC systems, please refer to the following section and use Singularity instead.

On Macs Docker is a normal package available for download here. It is installed in the standard manner most Mac users will be familiar with and ends up in the Applications folder. To launch the Docker daemon simply double-click the application icon. It will create an icon on the menu bar that can be used to terminate the daemon and perform a few other tasks described in the documentation for the software.

On Linux systems Docker is normally installed through the package manager of your distribution. For example, here is a useful source for Ubuntu.

For Linux systems we note two issues you may encounter; knowing about them in advance will speed this process:

  1. By default Docker can only be run with a sudo command. That means each "docker" call below would need to be changed to "sudo docker". You can do that, but it can get annoying. To avoid this you need to add your user name to the docker group. Unix variants differ in how groups are handled. Follow this link for instructions on Ubuntu. You may also find it necessary to restart your machine to get the revised groups recognized.
  2. You will need both docker and docker-compose. Unix package managers may split them into separate packages, e.g. on Ubuntu you need to use apt-get for both Docker itself and docker-compose.
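
As a rough sketch of the group fix in item 1, the helper below only checks whether a list of group names includes docker. It assumes the group is named docker, and `has_docker_group` is a hypothetical name, not part of Docker or MsPASS. The command that actually changes group membership is left as a comment because it needs sudo and its details can vary between distributions.

```shell
# Hypothetical helper: succeeds if the space-separated group list in $1
# contains a group named "docker" (the group name is an assumption).
has_docker_group() {
  case " $1 " in
    *" docker "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Typical way to add the current user to the docker group on Ubuntu
# (requires sudo; log out and back in, or reboot, afterwards):
#   sudo usermod -aG docker "$USER"
#
# Check the result for the current user:
#   has_docker_group "$(id -nG)" && echo "docker group OK"
```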

To proceed from here we assume Docker has been installed and the docker daemon is running in the background.

Configuring the virtual machine

Once you have Docker set up properly, use the following command in a terminal to pull the Docker image from Docker Hub to your local machine:

docker pull wangyinz/mspass

Be patient as this can take a few minutes. Note you can run this command from anywhere. It loads data only into the virtual machine data space, so you will not see anything happen in the directory where you run this command, but it will use up a few hundred megabytes on your disk. Any data created and stored inside the container will be invisible to the local system (outside) except for data in directories that are mounted to the container.

Getting MongoDB Running with Docker

Now you need to make a decision about where you will be storing data on your local system. Most data in mspass is managed by MongoDB. Here we set up MongoDB to run within the container but write to a local directory. We will refer to that top level directory in the remainder of this document as DBDIR.

To start the MongoDB server, cd to the DBDIR you have chosen and run:

docker run --name MsPASS -d --mount src=`pwd`,target=/home,type=bind wangyinz/mspass

  • The --name option gives the container the name MsPASS.
  • The -d option runs the container as a daemon so that the process is kept in the background.
  • The --mount option binds the current directory (DBDIR) on your system to /home within the container. /home is the default directory for database files and logs. This option keeps the files outside of the container, so your data will be accessible after the container is removed. The container will create a data directory for all the database files if /home/data does not already exist.
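
The options above can be wrapped in a small function so that DBDIR is passed explicitly instead of relying on the current directory. This is only a sketch: `start_mspass` and the `DRY_RUN` variable are hypothetical names introduced here, while the docker run arguments are the ones shown above.

```shell
# Start the MsPASS container with DBDIR bound to /home in the container.
# $1 is DBDIR (defaults to the current directory).
# Set DRY_RUN=1 to print the command instead of running it.
start_mspass() {
  dbdir=${1:-$(pwd)}
  set -- docker run --name MsPASS -d \
    --mount "src=${dbdir},target=/home,type=bind" wangyinz/mspass
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Example (the directory name is a placeholder):
#   mkdir -p "$HOME/mspass_db" && start_mspass "$HOME/mspass_db"
```

Quoting the --mount argument keeps the command intact when DBDIR contains spaces.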

Alternatively, if you are doing development or want to use MongoDB for another purpose, you may want to access the MongoDB server from outside of the Docker container. If so, use the following command:

docker run --name MsPASS -d -p 27017:27017 --mount src=`pwd`,target=/home,type=bind wangyinz/mspass

The only difference from the above is that we add the -p 27017:27017 arguments to map host port 27017 to the container's port 27017. Note that 27017 is the default communications port for MongoDB. If port 27017 is already in use on the host, change the first port number (e.g. to use port 9999 on the host use -p 9999:27017).
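
To make the host-side port easy to change, the mapping argument can be generated by a helper. This is a sketch under stated assumptions: `mongo_port_mapping` is a hypothetical name, and the last comment assumes a mongo client is installed on the host, which this page does not require.

```shell
# Hypothetical helper: print the -p argument that maps a host port to the
# container's MongoDB port 27017. Rejects non-numeric or out-of-range values.
mongo_port_mapping() {
  host_port=${1:-27017}
  case $host_port in
    ''|*[!0-9]*|??????*)
      echo "invalid port: $host_port" >&2
      return 1 ;;
  esac
  if [ "$host_port" -lt 1 ] || [ "$host_port" -gt 65535 ]; then
    echo "port out of range: $host_port" >&2
    return 1
  fi
  echo "-p ${host_port}:27017"
}

# Example: expose MongoDB on host port 9999 (substitution left unquoted on
# purpose so that it expands to two arguments):
#   docker run --name MsPASS -d $(mongo_port_mapping 9999) \
#     --mount "src=$(pwd),target=/home,type=bind" wangyinz/mspass
# A mongo client on the host could then connect with: mongo --port 9999
```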

Testing setup

On your initial setup you will want to verify this is all working. After the above step, you may have to wait a couple of seconds for the MongoDB server to initialize. Then you can launch the MongoDB client with:

docker exec -it MsPASS mongo

This will launch the mongo shell within the MsPASS container created by the previous command. The -i and -t options request an interactive pseudo-TTY session. If you get a > prompt without any errors, this has succeeded. You can exit the interactive shell by typing exit or pressing Ctrl-D.
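
Instead of guessing how long the server needs to initialize, you can poll until it answers. The sketch below is one possible approach, not part of MsPASS: `retry` and `mongo_ready` are hypothetical helpers, and the readiness probe assumes that the mongo shell in the container exits with a non-zero status when it cannot connect.

```shell
# Retry a command up to $1 times, sleeping $2 seconds between attempts.
# Succeeds as soon as the command succeeds; fails if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Assumed readiness probe: send a ping through the mongo shell in the
# MsPASS container and discard the output.
mongo_ready() {
  docker exec MsPASS mongo --quiet --eval 'db.runCommand({ ping: 1 })' \
    >/dev/null 2>&1
}

# Example: wait up to about 10 seconds, then open the interactive shell.
#   retry 10 1 mongo_ready && docker exec -it MsPASS mongo
```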

Exiting

If you won't be using MsPASS for a while, you will want to stop MongoDB from running and wasting machine resources. If it is not already running, first launch a MongoDB client shell as described in the previous section. To stop the MongoDB server, type the following commands in the mongo shell:

use admin
db.shutdownServer()

and then remove the container with:

docker rm MsPASS
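
The same two steps can be scripted without opening an interactive shell. This is a sketch only: `run`, `stop_mspass`, and `DRY_RUN` are hypothetical names, and it assumes, as the steps above imply, that the container stops once the MongoDB server shuts down. The mongo shell may report a dropped connection when the server exits, so the two commands are deliberately not chained with &&.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Shut down the MongoDB server in the MsPASS container, then remove the
# container. Passing "admin" selects the admin database, which is the
# non-interactive equivalent of typing "use admin" in the mongo shell.
stop_mspass() {
  run docker exec MsPASS mongo admin --quiet --eval 'db.shutdownServer()'
  run docker rm MsPASS
}

# Example: preview the commands first, then run them.
#   DRY_RUN=1 stop_mspass
#   stop_mspass
```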

Getting Spark and MongoDB Running with Docker

This section assumes you have completed all the preliminaries above and have shut down the test container as described in the previous section.

We will use the docker-compose command to launch two container instances that define a Spark standalone cluster. One, called mspass-master, is used in this example to run the MongoDB server and the Spark master. The other, called mspass-worker, is used to run a Spark worker. Both containers run on the same machine in this setup.

The configuration for this example is the file docker-compose.yml at the root of this repository. We will call that root directory $path_to_MsPASS. Now cd to the root directory of the data area you created above (DBDIR) and type

docker-compose -f $path_to_MsPASS/docker-compose.yml up -d

Note that the -d option will let the containers run as daemons so that the processes will be kept in the background.
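
Because the command depends on $path_to_MsPASS pointing at the right place, it can help to check for the configuration file before launching. The helper below is a hypothetical convenience, not part of MsPASS or docker-compose.

```shell
# Hypothetical helper: print the path of docker-compose.yml inside the
# directory given as $1, or fail with a message if the file is missing.
compose_file() {
  f="$1/docker-compose.yml"
  if [ ! -f "$f" ]; then
    echo "no docker-compose.yml in $1" >&2
    return 1
  fi
  echo "$f"
}

# Example (run from DBDIR; path_to_MsPASS must already be set):
#   f=$(compose_file "$path_to_MsPASS") && docker-compose -f "$f" up -d
```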

Once the containers are running, you will see several log files from MongoDB and Spark created in the current directory. Since we have the port mapping feature of Docker enabled, you can also open localhost:8080 in your browser to check the status of Spark through the master’s web UI, where you should see the worker is listed as ALIVE. Note that the links to the worker will not work due to the container's network setup.

First, we want to make sure the Spark cluster is set up and running correctly. This can be done by running the pi calculation example within the Spark distribution. To submit the example from mspass-master, use:

docker exec mspass-master /usr/local/spark/bin/run-example --master spark://mspass-master:7077 SparkPi 10

To submit it from mspass-worker, use:

docker exec mspass-worker /usr/local/spark/bin/run-example --master spark://mspass-master:7077 SparkPi 10

  • docker exec runs the command within the mspass-master or mspass-worker container.
  • The --master option specifies the Spark master, which is mspass-master in our case. 7077 is the default port of the Spark master.

The output of this example is very verbose, but near the end of the stdout you should see a line reading "Pi is roughly 3.141...", which is the result of the calculation. You should also see the jobs in the Running Applications or Completed Applications section at localhost:8080.
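
Since the result is buried in log output, a simple filter makes the check easier to read. This is a minimal sketch; `pi_line` is a hypothetical helper name, and the example merges stderr into stdout so the filter works wherever Spark writes its log messages.

```shell
# Keep only the result line of the SparkPi example from the text on stdin.
# Returns non-zero if no such line is present.
pi_line() {
  grep 'Pi is roughly'
}

# Example:
#   docker exec mspass-master /usr/local/spark/bin/run-example \
#     --master spark://mspass-master:7077 SparkPi 10 2>&1 | pi_line
```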

To launch an interactive mongo shell within mspass-master, use:

docker exec -it mspass-master mongo

To access the MongoDB server from mspass-worker, use:

docker exec -it mspass-worker mongo --host mspass-master

  • The -it option opens an interactive pseudo-TTY session.
  • The --host option directs the client to the server running on mspass-master.

To launch an interactive Python session to run Spark jobs, use the pyspark command through mspass-master:

docker exec -it mspass-master pyspark \
  --conf "spark.mongodb.input.uri=mongodb://mspass-master/test.myCollection?readPreference=primaryPreferred" \
  --conf "spark.mongodb.output.uri=mongodb://mspass-master/test.myCollection" \
  --conf "spark.master=spark://mspass-master:7077" \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1

or through mspass-worker:

docker exec -it mspass-worker pyspark \
  --conf "spark.mongodb.input.uri=mongodb://mspass-master/test.myCollection?readPreference=primaryPreferred" \
  --conf "spark.mongodb.output.uri=mongodb://mspass-master/test.myCollection" \
  --conf "spark.master=spark://mspass-master:7077" \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1

  • The three --conf options specify the input and output database collections and the Spark master. The Spark master and the MongoDB server are running on mspass-master, so the URLs should point to that in both cases. Replace test and myCollection with the desired database and collection names.
  • The --packages option sets up the MongoDB Spark connector environment in this Python session.
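
When substituting your own database and collection names, building the URI in one place avoids mismatches between the input and output settings. The sketch below assumes the same host name as above; `mongo_uri`, `DB`, and `COLLECTION` are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: print the MongoDB URI for database $1 and
# collection $2 on the mspass-master host.
mongo_uri() {
  echo "mongodb://mspass-master/$1.$2"
}

DB=test                  # placeholder database name
COLLECTION=myCollection  # placeholder collection name

# Example: the pyspark command from above with the names substituted.
#   uri=$(mongo_uri "$DB" "$COLLECTION")
#   docker exec -it mspass-master pyspark \
#     --conf "spark.mongodb.input.uri=${uri}?readPreference=primaryPreferred" \
#     --conf "spark.mongodb.output.uri=${uri}" \
#     --conf "spark.master=spark://mspass-master:7077" \
#     --packages org.mongodb.spark:mongo-spark-connector_2.11:2.4.1
```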

Please refer to this documentation for more details about the MongoDB Spark connector.

To bring down the containers, run:

docker-compose down

or, if docker-compose.yml is not in the current directory,

docker-compose -f $path_to_MsPASS/docker-compose.yml down