Example-Parallel-SLURM

Example Code for Parallel Batch Jobs using BatchJobs and BatchExperiments on LRZ Linux-Cluster

First step:

  • Log in to the LRZ Linux-Cluster
  • Clone repo: git clone https://github.com/philippstats/Example-Parallel-SLURM.git
  • In bmr.R, adjust the setwd() call in line 6 to point to your own working directory.
  • In lrz_parallel.tmpl, update the email address in line 38.
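
Before editing by line number, it can help to check what those lines currently contain. The following is a minimal sketch; `show_line` is a hypothetical helper, not part of the repository, and it assumes you are inside the cloned directory.

```shell
# Hypothetical helper: print line N of a file so it can be checked before editing.
show_line() {
    # $1 = line number, $2 = file name
    sed -n "${1}p" "$2"
}

# Intended usage inside the cloned repository:
#   show_line 6 bmr.R                # should show the setwd() call
#   show_line 38 lrz_parallel.tmpl   # should show the email setting
```

If the printed line is not the one described above, the file has probably changed since this README was written, so search for `setwd` or the email address instead of relying on the line number.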

Second step:

  • Start the jobs by running: Rscript bmr.R
  • Get status using one of the following:
    • Rscript get_status.R
    • squeue -u $USER --clusters=mpp2 (tip: define an alias for this command in your .profile)
  • The jobs take approx. 7 minutes to finish after starting
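
The shortcut mentioned in the tip could look like the sketch below. It uses a shell function rather than an alias, because a function also works in scripts and accepts extra arguments. The name `sq` is an arbitrary choice and not something the repository defines.

```shell
# Add to ~/.profile: short command for checking your own jobs on mpp2.
# "sq" is a hypothetical name; any extra arguments are passed on to squeue.
sq() {
    squeue -u "$USER" --clusters=mpp2 "$@"
}
```

After opening a new login shell (or sourcing `~/.profile`), typing `sq` lists your queued and running jobs.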

Third step:

  • Collect the results: Rscript get_results.R

Result should look like this:

> getJobInfo(reg)
  id       prob        algo repl      time.submitted        time.started
1  1 calhousing RF_parallel    1 2016-05-18 15:42:56 2016-05-18 15:43:07
2  2 calhousing RF_parallel    1 2016-05-18 15:42:56 2016-05-18 15:43:05
3  3 calhousing   RF_serial    1 2016-05-18 15:42:57 2016-05-18 15:43:07
            time.done time.running memory time.queued error.msg      nodename
1 2016-05-18 15:46:50          223   47.4          11      <NA> mpp2r03c02s08
2 2016-05-18 15:44:06           61   47.6           9      <NA> mpp2r07c01s01
3 2016-05-18 15:50:05          418 4454.1          10      <NA> mpp2r08c03s11
  batch.id r.pid     seed
1    91610  5633 45680646
2    91611 25895 45680647
3    91612  1585 45680648

> res = reduceResultsExperiments(reg, ids = findDone(reg), fun = function(job, res) res$aggr)
Reducing 3 results...
reduceResultsExperiments |+++++++++++++++++++++++++++++++++++| 100% (00:00:00)
> res
  id       prob        algo n_cores repl mmce.test.mean
1  1 calhousing RF_parallel       2    1         0.1084
2  2 calhousing RF_parallel      10    1         0.1071
3  3 calhousing   RF_serial      NA    1         0.1079
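
To read the output above as a speedup, divide the serial runtime by the parallel runtime. The sketch below does this with the time.running values from the example run (418 s serial, 223 s with 2 cores, 61 s with 10 cores). It assumes time.running is in seconds, which matches the start and end timestamps shown. `speedup` is a hypothetical helper, and the numbers come from one run only, so they will vary.

```shell
# Hypothetical helper: speedup = serial runtime / parallel runtime.
speedup() {
    # $1 = serial runtime in seconds, $2 = parallel runtime in seconds
    awk -v s="$1" -v p="$2" 'BEGIN { printf "%.2f\n", s / p }'
}

speedup 418 223   # RF_parallel with 2 cores  -> 1.87
speedup 418 61    # RF_parallel with 10 cores -> 6.85
```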
