
Commit

Docs update
nsheff committed Jan 24, 2017
1 parent a00c9ef commit 27c39ca
Showing 13 changed files with 115 additions and 157 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -4,6 +4,9 @@

__`Looper`__ is a pipeline submission engine that parses sample inputs and submits pipelines for each sample. Looper was conceived to use [pypiper](https://github.com/epigen/pypiper/) pipelines, but does not require this.

You can download the latest version from the [releases page](https://github.com/epigen/looper/releases).



# Links

63 changes: 7 additions & 56 deletions doc/source/advanced.rst
@@ -65,15 +65,9 @@ Check out the complete working example in the `microtest repository <https://git
Using cluster resource managers
****************************************

.. warning:: This is still in progress
For each sample, ``looper`` will create one or more submission scripts. The ``compute`` settings specify how these scripts will be both produced and run. This makes it very portable and easy to change cluster management systems by just changing a few variables in a configuration file. By default, looper builds a shell script for each sample and runs them serially: the shell will block until each run is finished and control is returned to ``looper`` for the next iteration. Compute settings can be changed using an environment configuration file called ``looperenv``. Settings for several common engines (SLURM and SGE) are included by default, but the system gives you complete flexibility, so you can easily configure looper to work with your resource manager.

Looper uses a template-based system for building scripts. By default, looper will just build shell scripts and run them serially. Compute settings can be changed using an environment script, which you point to with a shell environment variable called ``LOOPERENV``.

Complete instructions for configuring your compute environment are available in the looperenv repository at https://github.com/epigen/looperenv.

For each iteration, ``looper`` will create one or more submission scripts for the sample. The ``compute`` settings specify how these scripts will be both produced and run. This makes it very portable and easy to change cluster management systems, or to just use local compute power like a laptop or standalone server, by changing just the two variables in the ``compute`` section.

Example:
For complete instructions on configuring your compute environment, see the looperenv repository at https://github.com/epigen/looperenv. Here's a brief overview, along with an example ``looperenv`` file:

.. code-block:: yaml

   compute:
     default:
       submission_template: pipelines/templates/local_template.sub
       submission_command: sh
     slurm:
       submission_template: pipelines/templates/slurm_template.sub
       submission_command: sbatch
       partition: queue_name

@@ -87,49 +81,16 @@
There are two sub-parameters in the compute section. First, ``submission_template`` is a (relative or absolute) path to the template submission script. This is a template with variables (encoded like ``{VARIABLE}``), which will be populated independently for each sample as defined in ``pipeline_interface.yaml``. The one variable ``{CODE}`` is a reserved variable that refers to the actual python command that will run the pipeline. Otherwise, you can use any variables you define in your ``pipeline_interface.yaml``.
There are two sub-parameters in the compute section. First, ``submission_template`` is a (relative or absolute) path to the template submission script. Looper uses a template-based system for building scripts: the template contains variables (encoded like ``{VARIABLE}``), which will be populated independently for each sample as defined in ``pipeline_interface.yaml``. The variable ``{CODE}`` is reserved; it refers to the actual shell command that will run the pipeline. Otherwise, you can use any variables you define in your ``pipeline_interface.yaml``.

Second, ``submission_command`` is the shell command that ``looper`` will prepend to the path of the produced submission script to actually run it (``sbatch`` for SLURM, ``qsub`` for SGE, ``sh`` for localhost, etc.).

In `Templates <https://github.com/epigen/looper/tree/master/templates>`__ you will find example submission templates for `SLURM <https://github.com/epigen/looper/blob/master/templates/slurm_template.sub>`__, `SGE <https://github.com/epigen/looper/blob/master/templates/sge_template.sub>`__, and `local runs <https://github.com/epigen/looper/blob/master/templates/localhost_template.sub>`__. For a local run, just pass the script to the shell with ``submission_command: sh``. This will cause each sample to run sequentially, as the shell will block until the run is finished and control is returned to ``looper`` for the next iteration.
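To make the template system concrete, here is a minimal sketch of what a SLURM submission template might contain. Only ``{CODE}`` is confirmed by this page as a reserved variable; the other ``{VARIABLE}`` names are hypothetical placeholders you would define yourself in ``pipeline_interface.yaml``:

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name={JOBNAME}
   #SBATCH --partition={PARTITION}
   #SBATCH --mem={MEM}
   #SBATCH --time={TIME}
   #SBATCH --output={LOGFILE}

   # {CODE} is replaced by looper with the shell command that runs the pipeline
   {CODE}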


.. _cluster-resource-managers:

Using cluster resource managers
****************************************

.. warning:: This is still in progress
In `Templates <https://github.com/epigen/looper/tree/master/templates>`__ you will find example submission templates for `SLURM <https://github.com/epigen/looper/blob/master/templates/slurm_template.sub>`__, `SGE <https://github.com/epigen/looper/blob/master/templates/sge_template.sub>`__, and `local runs <https://github.com/epigen/looper/blob/master/templates/localhost_template.sub>`__.

Looper uses a template-based system for building scripts. By default, looper will just build shell scripts and run them serially. Compute settings can be changed using an environment script, which you point to with a shell environment variable called ``LOOPERENV``.

Complete instructions for configuring your compute environment are available in the looperenv repository at https://github.com/epigen/looperenv.

For each iteration, `looper` will create one or more submission scripts for the sample. The `compute` settings specify how these scripts will be both produced and run. This makes it very portable and easy to change cluster management systems, or to just use local compute power like a laptop or standalone server, by changing just the two variables in the `compute` section.

Example:

.. code-block:: yaml

   compute:
     default:
       submission_template: pipelines/templates/local_template.sub
       submission_command: sh
     slurm:
       submission_template: pipelines/templates/slurm_template.sub
       submission_command: sbatch
       partition: queue_name
There are two sub-parameters in the compute section. First, `submission_template` is a (relative or absolute) path to the template submission script. This is a template with variables (encoded like `{VARIABLE}`), which will be populated independently for each sample as defined in `pipeline_interface.yaml`. The one variable ``{CODE}`` is a reserved variable that refers to the actual python command that will run the pipeline. Otherwise, you can use any variables you define in your `pipeline_interface.yaml`.

Second, the `submission_command` is the command-line command that `looper` will prepend to the path of the produced submission script to actually run it (`sbatch` for SLURM, `qsub` for SGE, `sh` for localhost, etc).

In [`templates/`](templates/) you will find example submission templates for [SLURM](templates/slurm_template.sub), [SGE](templates/sge_template.sub), and [local runs](templates/localhost_template.sub). For a local run, just pass the script to the shell with `submission_command: sh`. This will cause each sample to run sequentially, as the shell will block until the run is finished and control is returned to `looper` for the next iteration.



Merge table
Handling multiple input files with a merge table
****************************************

Sometimes you have multiple input files that you want to merge for one sample. Rather than putting multiple lines in your sample annotation sheet, which causes conceptual and analytical challenges, we introduce a *merge table*, which maps input files to samples that have more than one input file.
@@ -141,17 +102,7 @@ metadata:

Make sure the ``sample_name`` column of this table matches the sample names in your annotation sheet, and then include any columns you need to point to the data. ``Looper`` will automatically include all of these files as input passed to the pipelines.
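As an illustration only, a merge table is a simple ``csv``; the ``sample_name`` values must match the annotation sheet, and the other column name below (``data_path``) is hypothetical:

.. csv-table::
   :header: "sample_name", "data_path"

   "frog_1", "data/frog1a.fq.gz"
   "frog_1", "data/frog1b.fq.gz"
   "frog_2", "data/frog2.fq.gz"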

Note: to handle different *classes* of input files, like read1 and read2, these are *not* merged and should be handled as different derived columns in the main sample annotation sheet.


Data produced at CeMM
****************************************
In the case of data produced at CeMM by the BSF, three additional columns will allow the discovery of files associated with the sample:

- ``flowcell`` - the name of the BSF flowcell (should be something like BSFXXX)
- ``lane`` - the lane number in the instrument
- ``BSF_name`` - the name used to describe the sample in the BSF annotation.

Note: to handle different *classes* of input files, like read1 and read2, these are *not* merged and should be handled as different derived columns in the main sample annotation sheet (and therefore different arguments to the pipeline).


.. _extending-sample-objects:
@@ -163,7 +114,7 @@ Looper uses object oriented programming (OOP) under the hood. This means that co

By default we use `generic models <https://github.com/epigen/looper/tree/master/looper/models.py>`__ (see the `API <api.html>`__ for more) to handle samples in Looper, but these can also be reused in other contexts by importing ``looper.models`` or by means of object serialization through YAML files.

Since these models provide useful methods to interact, update and store attributes in the objects (most notably *samples* - ``Sample`` object), a useful use case is during the run of a pipeline: pipeline scripts can extend ``Sample`` objects with further attributes or methods.
Since these models provide useful methods to interact with, update, and store attributes in the objects (most notably the ``Sample`` object), one particularly useful application is during a pipeline run: pipeline scripts can extend ``Sample`` objects with further attributes or methods.

Example:

2 changes: 1 addition & 1 deletion doc/source/connecting-pipelines.rst
@@ -1,7 +1,7 @@
Connecting pipelines
=============================================

If you're a pipeline author, you can connect any pipeline to work with looper, giving you the power of all of looper's features on your project. To connect your pipeline, you will need two files:
Pipeline users don't need to worry about this section. If you're a pipeline developer, you can connect any pipeline to work with looper, giving you the power of all of looper's features on your project. To connect your pipeline, you will need two files (a sketch of the first follows the list):

1. **Protocol mappings** - a ``yaml`` file that maps sample **library** to one or more **pipeline scripts**.
2. **Pipeline interface** - a ``yaml`` file telling ``Looper`` the arguments and resources required by each pipeline script.
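As a sketch of the first file: the mapping can be as small as one line per protocol. The pipeline script names below are hypothetical; your pipeline repository defines the real ones, and the pipeline interface format is covered in its own section:

.. code-block:: yaml

   # Hypothetical protocol_mappings.yaml: sample library -> pipeline script
   RNA-seq: rnaseq.py
   ATAC-seq: atacseq.py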
4 changes: 1 addition & 3 deletions doc/source/define-your-project.rst
@@ -36,9 +36,7 @@ The second file (**sample annotation sheet**) is where you list your samples, wh
"frog_4", "RNA-seq", "frog4.fq.gz"


With those two simple files, you could run looper, and that's fine for just running a quick test on a few files. You just type: ``looper run path/to/project_config.yaml`` and it will run all your samples through the appropriate pipeline. In practice, you'll probably want to use some of the more advanced features of looper by adding additional information to your configuration ``yaml`` file and your sample annotation ``csv`` file.

For example, by default, your jobs will run serially on your local computer, where you're running ``looper``. If you want to submit to a cluster resource manager (like SLURM or SGE), you just need to add a ``compute`` section to your **project config file**.
With those two simple files, you could run looper, and that's fine for just running a quick test on a few files. You just type: ``looper run path/to/project_config.yaml`` and it will run all your samples through the appropriate pipeline. In practice, you'll probably want to use some of the more advanced features of looper by adding additional information to your configuration ``yaml`` file and your sample annotation ``csv`` file. These advanced options are detailed below.
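For orientation, a bare-bones project config might look like the sketch below; the key names are assumptions for illustration rather than a confirmed schema:

.. code-block:: yaml

   # Hypothetical minimal project_config.yaml
   metadata:
     sample_annotation: /path/to/sample_annotation.csv
     output_dir: /path/to/results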

Now, let's go through the more advanced details of both annotation sheets and project config files:

2 changes: 1 addition & 1 deletion doc/source/features.rst
@@ -9,7 +9,7 @@ Simplicity for the beginning, power when you need to expand.

- **Flexible compute:** If you don't change any settings, looper will simply run your jobs serially. But Looper includes a templating system that will let you process your pipelines on any cluster resource manager (SLURM, SGE, etc.). We include default templates for SLURM and SGE, but it's easy to add your own as well. Looper also gives you a way to determine which compute queue/partition to submit on-the-fly, by passing the ``--compute`` parameter to your call to ``looper run`` (see the example below this list), making it simple to use by default, but very flexible if you have complex resource needs.

- **Standardized project definition:** Looper defines a flexible standard format for describing projects, and there are other tools that can read these same formats. For example, we are working on an R package that will read the same project definition and provide all your sample metadata (and pipeline results) in and R analysis environment, with no additional effort. With a standardized project definition, the possibilities are endless.
- **Standardized project definition:** Looper defines a flexible standard format for describing projects, and there are other tools that can read these same formats. For example, we are working on an R package that will read the same project definition and provide all your sample metadata (and pipeline results) in an R analysis environment, with no additional effort. With a standardized project definition, the possibilities are endless.

- **Subprojects:** Subprojects make it easy to define two very similar projects without duplicating project metadata.
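As promised in the flexible-compute bullet above, a sketch of selecting compute settings on the fly; the package name ``slurm`` is an assumption and must match an entry in your ``looperenv`` file:

.. code-block:: bash

   # Default: run jobs serially on the local machine
   looper run path/to/project_config.yaml

   # Hypothetical: submit to SLURM instead, via the 'slurm' compute package
   looper run path/to/project_config.yaml --compute slurm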

1 change: 1 addition & 0 deletions doc/source/index.rst
@@ -26,6 +26,7 @@ Contents
:maxdepth: 2

intro.rst
install.rst
features.rst
usage-and-commands.rst
tutorials.rst
23 changes: 23 additions & 0 deletions doc/source/install.rst
@@ -0,0 +1,23 @@

Installing
=====================================

Installation instructions for the latest release version are specified on the `looper releases page <https://github.com/epigen/looper/releases>`_. You can install the latest release directly from GitHub using pip:

.. code-block:: bash

   pip install --user https://github.com/epigen/looper/zipball/master
Update with:

.. code-block:: bash

   pip install --user --upgrade https://github.com/epigen/looper/zipball/master
To put the ``looper`` executable in your ``$PATH``, add the following line to your ``.bashrc`` or ``.profile``:

.. code-block:: bash

   export PATH=~/.local/bin:$PATH
28 changes: 0 additions & 28 deletions doc/source/intro.rst
@@ -2,8 +2,6 @@
Introduction
=====================================

Overview
******************************
Looper is a job submitting engine. If you have a pipeline and a bunch of samples you want to run through it, looper can help you organize the inputs and outputs. By default, it will just run your jobs sequentially on the local computer, but with a small configuration change, it will create and submit jobs to any cluster resource manager (like SLURM, SGE, or LSF).

Here's the idea: We essentially provide a format specification (the :ref:`project config file <project-config-file>`), which you use to describe your project. You create this single configuration file (in `yaml format <http://www.yaml.org/>`_), which includes:
@@ -20,32 +18,6 @@ Looper is modular and totally configurable, so it scales as your needs grow. We



Installing
******************************

You can install directly from GitHub using pip:

.. code-block:: bash

   pip install --user https://github.com/epigen/looper/zipball/master
Update with:

.. code-block:: bash

   pip install --user --upgrade https://github.com/epigen/looper/zipball/master
To have the ``looper`` executable in your ``$PATH``, add the following line to your .bashrc file:

.. code-block:: bash

   export PATH=$PATH:~/.local/bin

Support
******************************
Please use the issue tracker at GitHub to file bug reports or feature requests: https://github.com/epigen/looper/issues.
3 changes: 1 addition & 2 deletions doc/source/pipeline-interface.rst
@@ -2,7 +2,7 @@
Pipeline interface YAML
**************************************************

The pipeline interface file describes how looper, which submits jobs, knows what arguments to pass to the pipeline and (possibly) what resources to request. For each pipeline (defined by the filename of the script itself), you specify some optional and required variables:
The pipeline interface file describes how looper knows what arguments to pass to the pipeline and (possibly) what resources to request. For each pipeline (defined by the filename of the script itself), you specify some optional and required variables:

- **name (recommended)**: Name of the pipeline
- **arguments (required)**: List of key-value pairs of arguments, and attribute sources to pass to the pipeline (details below).
@@ -80,7 +80,7 @@ Example:
"--genome": transcriptome
"--input": data_path
"--single-or-paired": read_type
resources:
default:
file_size: "0"
