fix modellauncher
This fixes issue PecanProject#2262 preventing modellauncher from working.

Updated documentation for modellauncher
Added an example xml file in tests for modellauncher.
robkooper committed Feb 1, 2019
1 parent 7e7b70a commit f79f578
Showing 5 changed files with 82 additions and 4 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,9 @@ For more information about this file see also [Keep a Changelog](http://keepacha

## [Unreleased]

### Fixes
- Fixed issue that prevented modellauncher from working properly #2262

### Changed
- Reverting back from PR #2137 to fix issues with MAAT wrappers.
- Moved docker files for models into model specific folder, for example Dockerfile for sipnet now is in models/sipnet/Dockerfile.
7 changes: 5 additions & 2 deletions base/remote/R/setup_modellauncher.R
@@ -10,10 +10,13 @@ setup_modellauncher <- function(run, rundir, host_rundir, mpirun, binary) {
   run_string <- format(run, scientific = FALSE)
   run_id_dir <- file.path(rundir, run_string)
   launcherfile <- file.path(run_id_dir, "launcher.sh")
-  file.remove(file.path(run_id_dir, "joblist.txt"))
+  if (file.exists(file.path(run_id_dir, "joblist.txt"))) {
+    file.remove(file.path(run_id_dir, "joblist.txt"))
+  }
   jobfile <- file(file.path(run_id_dir, "joblist.txt"), "w")
 
   writeLines(c("#!/bin/bash", paste(mpirun, binary, file.path(host_rundir, run_string, "joblist.txt"))),
              con = launcherfile)
   writeLines("./job.sh", con = jobfile)
-}
+  return(invisible(jobfile))
+}
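The `file.exists()` guard added in this hunk matters because base R's `file.remove()` does not silently skip a missing path. It returns `FALSE` and emits a warning. The base-R sketch below compares the two patterns on a throwaway file in `tempdir()`. The file name and the `raises_warning()` helper are illustrative only and are not part of PEcAn.

```r
# Illustrative stand-in for <rundir>/<run_id>/joblist.txt.
joblist <- file.path(tempdir(), "joblist-example.txt")
unlink(joblist)  # make sure the file does not exist yet

# Returns TRUE if evaluating `expr` raised a warning (the warning is muffled).
raises_warning <- function(expr) {
  warned <- FALSE
  withCallingHandlers(expr, warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  })
  warned
}

# Pattern before this commit: unconditional removal of a missing file.
unguarded <- raises_warning(file.remove(joblist))

# Pattern after this commit: remove only when the file is present.
guarded <- raises_warning(if (file.exists(joblist)) file.remove(joblist))

# Either way, file(..., "w") then creates (or truncates) joblist.txt.
con <- file(joblist, "w")
writeLines("./job.sh", con = con)
close(con)

cat("unguarded warned:", unguarded, "\n")  # expected: TRUE
cat("guarded warned:", guarded, "\n")      # expected: FALSE
```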
7 changes: 5 additions & 2 deletions base/remote/R/start.model.runs.R
@@ -87,8 +87,11 @@ start.model.runs <- function(settings, write = TRUE, stop.on.error = TRUE) {
       # set up launcher script if we use modellauncher
       if (is.null(firstrun)) {
         firstrun <- run
-        setup_modellauncher(run = run, rundir = settings$rundir, host_rundir = settings$host$rundir,
-                            mpirun = settings$host$modellauncher$mpirun, binary = settings$host$modellauncher$binary)
+        jobfile <- setup_modellauncher(run = run,
+                                       rundir = settings$rundir,
+                                       host_rundir = settings$host$rundir,
+                                       mpirun = settings$host$modellauncher$mpirun,
+                                       binary = settings$host$modellauncher$binary)
       }
       writeLines(c(file.path(settings$host$rundir, run_id_string)), con = jobfile)
       pbi <- pbi + 1
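In this hunk, `start.model.runs()` now assigns the connection returned by `setup_modellauncher()` to `jobfile`. In the old lines that return value was discarded, so the later `writeLines(..., con = jobfile)` had no connection from that call to write to. The base-R sketch below mirrors the loop under some assumptions:

* `open_joblist()` is a hypothetical, cut-down stand-in for `setup_modellauncher()`.
* The run IDs and the `/scratch/pecan/run` host path are made up.
* The connection is closed after the loop. The diff does not show where PEcAn closes it.

```r
# Hypothetical, cut-down stand-in for setup_modellauncher(): opens
# joblist.txt for writing, writes the command line, and returns the
# still-open connection to the caller.
open_joblist <- function(launch_dir) {
  jobfile <- file(file.path(launch_dir, "joblist.txt"), "w")
  writeLines("./job.sh", con = jobfile)
  invisible(jobfile)
}

# Mirrors the loop in start.model.runs(): the first run sets up the
# launcher, and every run (including the first) appends its directory.
queue_runs <- function(runs, rundir, host_rundir) {
  firstrun <- NULL
  jobfile <- NULL
  for (run in runs) {
    run_id_string <- format(run, scientific = FALSE)
    if (is.null(firstrun)) {
      firstrun <- run
      # The fix: keep the connection instead of discarding it.
      jobfile <- open_joblist(file.path(rundir, run_id_string))
    }
    writeLines(file.path(host_rundir, run_id_string), con = jobfile)
  }
  close(jobfile)  # assumed: closing is not shown in the diff
  file.path(rundir, format(firstrun, scientific = FALSE), "joblist.txt")
}

# Made-up run IDs and directories for the demonstration.
rundir <- tempfile("rundir")
runs <- c(1000000001, 1000000002, 1000000003)
for (r in runs) {
  dir.create(file.path(rundir, format(r, scientific = FALSE)), recursive = TRUE)
}

joblist <- queue_runs(runs, rundir, host_rundir = "/scratch/pecan/run")
print(readLines(joblist))
```

The loop holds one connection open for all runs. As a result, `joblist.txt` is truncated once, at setup, and each later run is appended in order after the `./job.sh` line.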
11 changes: 11 additions & 0 deletions book_source/03_intermediate_users_guide/01_pecan_xml.Rmd
@@ -468,6 +468,10 @@ The following provides a quick overview of XML tags related to remote execution.
  <qsub.jobid>Your job ([0-9]+) .*</qsub.jobid>
  <qstat>qstat -j @JOBID@ &> /dev/null || echo DONE</qstat>
  <job.sh>module load udunits R/R-3.0.0_gnu-4.4.6</job.sh>
  <modellauncher>
    <binary>/usr/local/bin/modellauncher</binary>
    <qsub.extra>-pe omp 20</qsub.extra>
  </modellauncher>
</host>
```

@@ -482,6 +486,13 @@ The `host` section has the following tags:
* `qsub.jobid`: [optional] the regular expression used to find the `jobid` returned from `qsub`. If not specified (and `qsub` is), it defaults to `Your job ([0-9]+) .*`.
* `qstat`: [optional] the command to execute to check if a job is finished; it should return DONE if the job is finished. This command takes one parameter, `@JOBID@`, which is the ID of the job as returned by `qsub.jobid`. If not specified (and `qsub` is), it defaults to `qstat -j @JOBID@ || echo DONE`.
* `job.sh`: [optional] additional options to add at the top of `job.sh`.
* `modellauncher`: [optional] an experimental section that lets you submit all runs as a single job to an HPC system.

If specified, the `modellauncher` section groups all runs together and submits only a single job to the HPC cluster. This single job uses an MPI program to execute the individual runs. Some HPC systems limit the number of jobs that can run in parallel; with `modellauncher`, only a single job (using multiple nodes) is submitted. If there is no limit on the number of jobs, a single PEcAn run without `modellauncher` could submit so many jobs that it fills the entire cluster, preventing others from executing jobs on it.

The `modellauncher` section has 2 arguments:

* `binary`: [required] the full path to the modellauncher binary. Source code for this program can be found in [`pecan/contrib/modellauncher`](https://github.com/PecanProject/pecan/tree/develop/contrib/modellauncher).
* `qsub.extra`: [optional] additional flags to pass to `qsub`, besides those specified in the `qsub` tag of the `host` section. Use this option to request the MPI environment and the number of nodes to use.

## Advanced features {#xml-advanced}

58 changes: 58 additions & 0 deletions tests/modellauncher.xml
@@ -0,0 +1,58 @@
<?xml version="1.0"?>
<pecan>
  <outdir>pecan</outdir>

  <database>
    <bety>
      <user>bety</user>
      <password>bety</password>
      <host>postgres</host>
      <dbname>bety</dbname>
      <driver>PostgreSQL</driver>
      <write>FALSE</write>
    </bety>
    <dbfiles>/data/dbfiles/dbfiles</dbfiles>
  </database>

  <host>
    <name>localhost</name>
    <modellauncher>
      <binary>/usr/local/bin/modellauncher</binary>
      <qsub.extra>-pe omp 20</qsub.extra>
    </modellauncher>
  </host>

  <pfts>
    <pft>
      <name>temperate.coniferous</name>
    </pft>
  </pfts>

  <meta.analysis>
    <iter>3000</iter>
    <random.effects>FALSE</random.effects>
    <threshold>1.2</threshold>
    <update>AUTO</update>
  </meta.analysis>

  <ensemble>
    <size>200</size>
    <variable>NPP</variable>
  </ensemble>

  <model>
    <binary>/usr/local/bin/sipnet.runk</binary>
    <type>SIPNET</type>
  </model>

  <run>
    <site>
      <id>772</id>
    </site>
    <inputs>
      <met>/data/sites/niwot/niwot.clim</met>
    </inputs>
    <start.date>2002-01-01 00:00:00</start.date>
    <end.date>2005-12-31 00:00:00</end.date>
  </run>
</pecan>