Clowdr is a command-line utility for iteratively developing pipelines, deploying them at scale, and sharing data and derivatives.
Clowdr can be thought of as a cloud execution utility for Boutiques, the JSON-based descriptive command-line framework. As Boutiques and the Boutiques tools allow the encapsulation, validation, evaluation, and deployment of command-line routines, Clowdr inherits and extends this functionality to remote datasets and computational resources.
Clowdr exposes several levels of evaluation. The local mode runs tasks using the system scheduler and, paired with the -dev flag, enables rapid prototyping of tools, descriptors, and invocations. The cluster mode generates exactly the same executions as local but submits them through a cluster's scheduler for parallel execution. The cloud mode runs the tasks on a remote cloud such as Amazon. Finally, the share mode launches a lightweight web server and ultimately generates a static HTML page, which can be stored and redistributed, that documents provenance and run information for the launched tasks.
Clowdr requires Python 3 and either Docker or Singularity. It has only been tested on Mac OS X and Linux; because none of its requirements are specific to these operating systems, it may also work on Windows.
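The requirements above can be checked before installing. The following is a minimal sketch rather than part of Clowdr: the helper names has_cmd and check_clowdr_prereqs are hypothetical, and the check only confirms that the executables are on your PATH, not that their versions are suitable.

```shell
#!/bin/sh
# Return success if the named command is available on PATH.
has_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether Python 3 and at least one container runtime
# (Docker or Singularity) appear to be installed.
# Hypothetical helper: not shipped with Clowdr.
check_clowdr_prereqs() {
    rc=0
    if has_cmd python3; then
        echo "python3: found"
    else
        echo "python3: missing"
        rc=1
    fi
    if has_cmd docker || has_cmd singularity; then
        echo "container runtime: found"
    else
        echo "container runtime: missing (need docker or singularity)"
        rc=1
    fi
    return "$rc"
}

check_clowdr_prereqs || echo "Install the missing tools before running Clowdr."
```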
Installation is quite simple - just run:
pip install clowdr
Clowdr is available on Docker Hub, and can be downloaded with:
docker pull clowdr/clowdr
Clowdr is also available on Singularity Hub, and can be downloaded with:
singularity pull clowdr/clowdr
(For up-to-date command lines, please check our documentation.)
Below we'll explore each of the three main modes of operation for Clowdr. If in doubt, always feel free to turn back to the help text.
From this directory, assuming the BIDS dataset ds114 is installed on your system (the command below uses /data/ds114/) and Docker is installed, run:
clowdr local examples/descriptor_d.json examples/invocation.json examples/task/ /data/ds114/ -v /data/ds114/:/data/ds114 -b
What you just did was launch local mode with the tool examples/descriptor_d.json and the invocation examples/invocation.json. Outputs are stored in the Clowdr directory, examples/task. The data is read from /data/ds114 and mounted at the same location in the container (-v /data/ds114:/data/ds114), and it happens to be organized according to the BIDS specification (-b). If you also want verbose output (-V) or development mode (-d), these and some other options can be discovered with the help flag.
If the data weren't organized in BIDS format, we could provide a directory of invocations in place of examples/invocation.json to run a group of tasks, or a single invocation to run a single task, omitting the -b flag in both cases.
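The three cases just described (a BIDS dataset with -b, a directory of invocations, and a single invocation) differ only in the invocation argument and the -b flag. The sketch below prints the corresponding commands without running them. The function clowdr_local_cmd and the directory examples/invocations/ are hypothetical names used for illustration; the remaining arguments are copied from the local example above.

```shell
#!/bin/sh
# Print (do not run) a `clowdr local` command.
#   $1: an invocation file, or a directory of invocations
#   $2: "bids" to append the -b flag, anything else to omit it
# Hypothetical helper: not part of Clowdr.
clowdr_local_cmd() {
    invocation=$1
    layout=$2
    cmd="clowdr local examples/descriptor_d.json $invocation examples/task/ /data/ds114/ -v /data/ds114/:/data/ds114"
    if [ "$layout" = "bids" ]; then
        cmd="$cmd -b"
    fi
    echo "$cmd"
}

# BIDS dataset, single invocation template (the example above).
clowdr_local_cmd examples/invocation.json bids
# Non-BIDS data, a group of tasks; examples/invocations/ is a hypothetical directory.
clowdr_local_cmd examples/invocations/ none
# Non-BIDS data, a single task.
clowdr_local_cmd examples/invocation.json none
```

Printing the command first makes it easy to review the arguments before launching containers.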
You can now look in the Clowdr directory to see the outputs of this pipeline.
If you want to scale up your analysis, you can then turn to cluster mode. The arguments supplied are exactly the same, with some minor additions: the hostname prefixed to your data location, your cluster type (here slurm), and your account identifier and job name:
clowdr cluster ./examples/descriptor_s.json ./examples/invocation.json ./examples/task/ server.hostname.ca:/path/to/data/ slurm -v /path/to/data/:/data/ --account my-account-id --jobname clowdr-taskname -b
Execution takes place exactly as in local mode, except that here we specified a Singularity version of the descriptor. Flags such as -d for development/single-execution mode are also consistent in this mode and are helpful for prototyping analyses prior to large executions.
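One way to use the -d flag for prototyping is to submit a development run first and the full set of tasks afterwards. The following is a sketch under assumptions: the run and clowdr_cluster helpers and the DRY_RUN variable are hypothetical, the arguments are copied from the cluster example above, and placing -d at the end of the command is assumed to be accepted. By default the script only prints the commands.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is 1 (the default here).
# Hypothetical helper: not part of Clowdr.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# The cluster example from above, with any extra flags appended.
clowdr_cluster() {
    run clowdr cluster ./examples/descriptor_s.json ./examples/invocation.json ./examples/task/ \
        server.hostname.ca:/path/to/data/ slurm -v /path/to/data/:/data/ \
        --account my-account-id --jobname clowdr-taskname -b "$@"
}

clowdr_cluster -d   # step 1: development/single-execution check
clowdr_cluster      # step 2: full submission once step 1 looks right
```

Setting DRY_RUN=0 in the environment would execute the commands instead of printing them.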
Presuming you ran locally and were happy with the results, but have larger collections of data you'd like to process and don't have access to a cluster, you can turn to the cloud. If you've uploaded the same dataset to Amazon Web Services S3 at s3://mybucket/ds114/ and have your credentials stored in this directory (credentials.csv in the command below), run:
clowdr cloud examples/descriptor_d.json examples/invocation.json s3://mybucket/clowdr/ s3://mybucket/ds114/ aws credentials.csv -bv -r us-east-1
Here, you did the same as above, except in cloud mode, with remote data on S3, specifying the Amazon endpoint, aws, and supplying your Amazon-specific settings as in the command above (credentials.csv and -r us-east-1).
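Because a cloud run depends on remote locations and a credentials file, it can help to sanity-check those inputs before launching. This is a minimal sketch, not a Clowdr feature: is_s3_uri and cloud_preflight are hypothetical helpers, and the check assumes that both the output and dataset locations are S3 URIs, as in the example above. It does not contact Amazon or validate the contents of the credentials file.

```shell
#!/bin/sh
# Succeed only for strings that start with the s3:// scheme.
is_s3_uri() {
    case "$1" in
        s3://?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the inputs of the cloud example before launching anything.
#   $1: output location, $2: dataset location, $3: credentials file
# Hypothetical helper: assumes both locations live on S3.
cloud_preflight() {
    is_s3_uri "$1" || { echo "output location is not an S3 URI: $1" >&2; return 1; }
    is_s3_uri "$2" || { echo "dataset location is not an S3 URI: $2" >&2; return 1; }
    [ -r "$3" ] || { echo "credentials file not readable: $3" >&2; return 1; }
    echo "ok"
}

if cloud_preflight s3://mybucket/clowdr/ s3://mybucket/ds114/ credentials.csv; then
    echo "inputs look plausible; the clowdr cloud command can be run"
fi
```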
Once Clowdr tasks are launched, they return a directory that holds the output task information, either on Amazon S3 or locally, depending on the parameters provided. The share mode lets you quickly inspect and explore the launched tasks, gives updates on their status, and ultimately provides a static HTML page that can be downloaded and shared with the processed derivatives as provenance information about the execution. You can point the share service at either your Clowdr output directory or, in the case of an example packaged with the repository, the path in the line below:
clowdr share ./examples/task/bids-example/clowdr/ -d
For detailed and up-to-date documentation, check out our Read the Docs page at clowdr.rtfd.io.
This project is covered under the MIT License.
If you're having trouble, notice a bug, or want to contribute (such as a fix to the bug you may have just found), feel free to open a git issue or pull request. Enjoy!