@sahilseth sahilseth released this Apr 19, 2016 · 76 commits to master since this release

tl;dr (summary of changes)

  • Flowr Rscript gets further enhancements, taking advantage of improved funr
  • run function now accepts paths.
# 'cherries' version
cd <path to pipelines>
flowr x=mypipeline

# 'dates' version
flowr x=<path to pipelines>/mypipeline
  • Previously, flowr expected a specific directory structure. Now, using ~/.flowr.conf,
    one may specify a custom structure, which allows more flexibility.
# 'cherries' version: a fixed directory structure was recommended:
├── conf
│   ├── flowr.conf
├── pipelines
│   ├── sleep_pipe.R
├── runs

# 'dates' version: one may change the default paths for runs, config files etc. using ~/.flowr.conf

# this file controls the location of these folders:
flow_base_path  ~/flowr # flowr home
flow_conf_path  {{flow_base_path}}/conf  # path to configuration files, not required if using ~/.flowr.conf
flow_run_path   ~/flowr/runs  # default home of all executed flows, you may change this
flow_pipe_paths ~/flowr/pipelines,<add new paths...> # multiple paths can be specified using ","
  • a few bug fixes in to_flow
  • several other minor changes to messages, errors and warnings.
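
The path settings above rely on two conventions: a `{{name}}` placeholder that refers to an earlier entry, and a comma-separated list for flow_pipe_paths. The sketch below shows one way such a file could be resolved. It is written in Python purely for illustration and is not flowr's own parser; the helper names `parse_conf` and `pipe_paths` and the second pipeline path `~/my_pipelines` are hypothetical.

```python
import re

def parse_conf(text):
    """Parse 'name value  # comment' lines and expand {{name}} placeholders.

    Assumption: a placeholder always refers to an entry defined earlier
    in the same file (an undefined name raises KeyError).
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        parts = line.split(None, 1)          # name, then the rest of the line
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        value = re.sub(r"\{\{(\w+)\}\}", lambda m: conf[m.group(1)], value)
        conf[name] = value
    return conf

def pipe_paths(conf):
    """Split the comma-separated flow_pipe_paths entry into a list of paths."""
    return [p.strip() for p in conf["flow_pipe_paths"].split(",") if p.strip()]

example = """
flow_base_path  ~/flowr # flowr home
flow_conf_path  {{flow_base_path}}/conf
flow_run_path   ~/flowr/runs
flow_pipe_paths ~/flowr/pipelines,~/my_pipelines
"""

conf = parse_conf(example)
print(conf["flow_conf_path"])
print(pipe_paths(conf))
```

Expanding placeholders while reading keeps the file order-dependent but simple: flow_base_path must appear before any entry that refers to it.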

@sahilseth sahilseth released this Dec 6, 2015 · 122 commits to master since this release

tl;dr (summary of changes)

  • Better handling of multiple flows in terms of running and re-running.
  • Nicer and cleaner messages.

additions/changes to flowr.conf file

  • New: option local_cores, which determines (max) number of cores to use when running local jobs.
  • New: a module_cmds variable can now be added to the config file; its value is prefixed to every script of the pipeline. A commented example is included in the config excerpt below.
  • New: flow_pipe_paths now supports multiple paths, separated by commas. fetch_pipes() splits the value at the commas.
  • IMP: the new version needs additional components in the flowr.conf file:
# version >=
# max number of cores to use when running on a local server
local_cores 4

# default module of a pipeline
# version >=
module_cmds ''

# examples: one may define all modules used in a pipeline here, 
# further one may specify any other command which should be run before 
# script executes.
#module_cmds    'module load samtools;export PATH=$PATH:/apps/bin'
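
To make the effect of module_cmds concrete, here is a small sketch of prefixing one setup string to every command of a pipeline. It is an illustration in Python, not flowr's implementation; the function name `prefix_module_cmds`, the `;` used to join the pieces, and the sample samtools command are assumptions.

```python
def prefix_module_cmds(cmds, module_cmds=""):
    """Return the commands with module_cmds placed in front of each one.

    Assumption: the prefix and the command are joined with ';' so that
    the shell runs the setup first. An empty prefix leaves commands as-is.
    """
    if not module_cmds:
        return list(cmds)
    return [f"{module_cmds};{cmd}" for cmd in cmds]

module_cmds = "module load samtools;export PATH=$PATH:/apps/bin"
cmds = ["samtools index sample1.bam"]  # hypothetical pipeline step

for line in prefix_module_cmds(cmds, module_cmds):
    print(line)
```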

addition/changes to status()

  • New: status gets a new argument to turn off progress bar if needed.
  • New: enhanced get_wds/status, so that if the current working directory contains a flow_details file, status is shown for this folder and not its sub-folder(s).
## now this works well !
flowr status x=.
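
The folder-selection rule described for get_wds/status can be sketched as follows. This is a Python illustration of the stated behaviour only, not the package's code; the function name `find_flow_dirs` and the exact file name `flow_details.txt` are assumptions (the notes only mention "a flow_details file").

```python
import os
import tempfile

def find_flow_dirs(path):
    """Return the folders whose status should be reported for `path`.

    If `path` itself holds a flow_details file it is used directly;
    otherwise its immediate sub-folders that hold one are returned.
    """
    def is_flow_dir(d):
        return any(
            f.startswith("flow_details") and os.path.isfile(os.path.join(d, f))
            for f in os.listdir(d)
        )

    if is_flow_dir(path):
        return [path]
    subdirs = sorted(
        os.path.join(path, d)
        for d in os.listdir(path)
        if os.path.isdir(os.path.join(path, d))
    )
    return [d for d in subdirs if is_flow_dir(d)]

# Demo: a parent folder with two runs, queried from the parent and from one run.
with tempfile.TemporaryDirectory() as parent:
    for name in ("run_a", "run_b"):
        os.makedirs(os.path.join(parent, name))
        open(os.path.join(parent, name, "flow_details.txt"), "w").close()
    from_parent = [os.path.basename(d) for d in find_flow_dirs(parent)]
    from_run = [
        os.path.basename(d)
        for d in find_flow_dirs(os.path.join(parent, "run_a"))
    ]

print(from_parent)
print(from_run)
```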

addition/changes to run() and rerun() functions

  • New: the run function now accepts a custom configuration parameter, conf. See help(flowr::run) for details.
    The conf file specifies various parameters used for that pipeline.
  • New: run() now supports re-running as well, i.e. one can generate a new set of commands but execute them in the previous folder, possibly from an intermediate step (trial feature).
  • New: rerun() now supports multiple folders. One may specify a parent folder containing multiple flowr runs and ask it to re-run all of them from a specific intermediate step.
  • New: Flowr creates a new folder if there are multiple samples in the flowmat; this containerizes the run, keeping the logs clean and making debugging easier.

other changes

  • New: to_flowdef() can now guess submission and dependency types (experimental feature).
  • IMP: to_flowdef now adds a parameter nodes, to enable specifying number of nodes required per-job.
  • IMP: opts_flow$get replaces get_opts, for reliability etc. Also this closely follows how knitr options are set.
  • fixed bugs in documentation (changed the formatting of output messages).

@sahilseth sahilseth released this Oct 3, 2015 · 225 commits to master since this release

  • Modified the output of status function, to add a status column. Specifically,
    this uses information from other columns and summarizes whether a specific step is
    pending, processing, completed or errored.
|                | total| started| completed| exit_status|status     |
|:---------------|-----:|-------:|---------:|-----------:|:----------|
|001.alnCmd1     |   109|     109|       109|           0|completed  |
|007.markCmd     |     3|       3|         0|           0|processing |
    • Switched default value of flow@status to "created".
    • When using status on several folders, the output used to be a little cluttered.
      Spacing has been added so that it is easier to read.
  • Introducing verbose levels:

    One can set the verbosity level using opts_flow$set(verbose=2),
    where the level may be 0, 1, 2, ...
    Level 1 is good for most purposes, whereas
    level 0 is almost silent, producing messages
    only when necessary.
    Level 2 is good when developing a new pipeline, and level 3 provides
    additional details useful for debugging.

  • Detailed checking of flowdef

checking if required columns are present...
checking if resources columns are present...
checking if dependency column has valid names...
checking if submission column has valid names...
checking for missing rows in def...
checking for extra rows in def...
checking submission and dependency types...
    jobname prev.sub_type --> dep_type --> sub_type: relationship
    1: aln1_a   none --> none --> scatter 
    2: aln2_a   scatter --> none --> scatter 
    3: sampe_a  scatter --> serial --> scatter rel: complex one:one
    4: fixrg_a  scatter --> serial --> scatter rel: complex one:one
    5: merge_a  scatter --> gather --> serial rel: many:one
    6: markdup_a    serial --> serial --> serial rel: simple one:one
    7: target_a serial --> serial --> serial rel: simple one:one
    8: realign_a    serial --> burst --> scatter rel: one:many
    9: baserecalib_a    scatter --> serial --> scatter rel: complex one:one
    10: printreads_a    scatter --> serial --> scatter rel: complex one:one
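
The relationship labels in the listing above follow a pattern that can be written down as a small lookup. The sketch below is in Python for illustration and is inferred only from the ten rows shown; it is not flowr's checking code, and the function name `relationship` is hypothetical. Combinations that do not appear in the listing return None rather than a guess.

```python
def relationship(prev_sub_type, dep_type, sub_type):
    """Label the link between a step and the previous one.

    Rules are read off the example listing only:
      none   -> no relationship printed
      gather -> many:one
      burst  -> one:many
      serial -> 'complex one:one' between two scatter steps,
                'simple one:one' between two serial steps
    """
    if dep_type == "none":
        return ""
    if dep_type == "gather":
        return "many:one"
    if dep_type == "burst":
        return "one:many"
    if dep_type == "serial":
        if prev_sub_type == "scatter" and sub_type == "scatter":
            return "complex one:one"
        if prev_sub_type == "serial" and sub_type == "serial":
            return "simple one:one"
    return None  # combination not covered by the listing

print(relationship("scatter", "gather", "serial"))
print(relationship("serial", "burst", "scatter"))
```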
  • rerun():
    • Previously, one could specify a starting point from which a re-run flow
      would resume execution. Now one may also specify an arbitrary set of
      steps to re-run, using select and ignore.
  • job killing and submission now sport a progress bar:
    • |============================================================ | 70%
    • This is especially useful for flows with thousands of jobs.
  • Fixed 2 important bugs in the moab.sh and lsf.sh template files, which were missing the -n argument.
  • The status function now has a new argument, use_cache. If enabled, it skips
    fetching statuses of previously completed jobs, which speeds things up considerably.
    • Also we have added a progress bar to show the status of this summarization.
  • Several detailed changes to the documentation.
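
The status column introduced in this release summarizes the other columns of the status table. One plausible way to derive it is sketched below in Python. The rules are assumptions chosen to agree with the two example rows in these notes, not the package's actual logic, and `summarize_status` is a hypothetical name.

```python
def summarize_status(total, started, completed, exit_status):
    """Collapse the per-step counts into a single status label.

    Assumptions: exit_status > 0 means at least one job failed, and a
    step is complete once every job in it has completed.
    """
    if exit_status > 0:
        return "errored"
    if total > 0 and completed == total:
        return "completed"
    if started > 0:
        return "processing"
    return "pending"

# The two rows shown in the status table of this release:
print(summarize_status(109, 109, 109, 0))
print(summarize_status(3, 3, 0, 0))
```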
@sahilseth sahilseth released this Aug 24, 2015 · 451 commits to master since this release

  • This release requires params 0.2.4 at the minimum
  • This release adds and changes functionality of several functions.
  • A new function, run(), creates and submits a pipeline. Specifically, it performs the following steps:
    • One supplies the name of the pipeline, which is used to fetch the pipeline using:
    • create a flowmat by running a function called mypipeline(), where mypipeline is the name of the pipeline.
    • load configuration file, with paths to tools etc using load_conf()
    • fetch the flow definition using fetch_pipe, and load it using as.flowdef()
    • Further, create a flow object using to_flow()
    • Finally, submit to the cluster, submit_flow()
  • kill(): now an S3 function, and operates on both a flow object
    and a flow_wd folder
  • check(): Now works on flowdef and flowmat
  • as.flowmat(), as.flowdef(): easy ways to fetch and check these tables
  • fetch() along with fetch_pipes() and fetch_conf() simplify finding files
  • Reduced function overload by moving several functions to a separate params package:
    • moved read_sheet(), write_sheet()
    • moved get_opts(), set_opts()
    • moved .load_conf() load_conf()
    • the kable function is now part of params, which removes the dependency on the knitr package
  • plot_flow: supports flowdef
@sahilseth sahilseth released this Jul 27, 2015 · 504 commits to master since this release

@sahilseth sahilseth released this Jun 19, 2015 · 595 commits to master since this release

Now supports:
includes to_flow() function to quickly create flows

includes new vignettes and updated documentation.

Several bug fixes

@sahilseth sahilseth released this Apr 3, 2015 · 641 commits to master since this release

@sahilseth sahilseth released this Mar 18, 2015 · 641 commits to master since this release

