
Commit

Merge old docs update
nsheff committed Feb 17, 2017
1 parent 09a7195 commit e8f553b
Showing 9 changed files with 80 additions and 17 deletions.
16 changes: 10 additions & 6 deletions doc/source/advanced.rst
@@ -65,9 +65,15 @@ Check out the complete working example in the `microtest repository <https://git
Using cluster resource managers
****************************************

For each sample, ``looper`` creates one or more submission scripts. The ``compute`` settings specify how these scripts will be both produced and run. This makes it very portable and easy to change cluster management systems by just changing a few variables in a configuration file. By default, looper builds a shell script for each sample and runs them serially: the shell will block until each run is finished and control is returned to ``looper`` for the next iteration. Compute settings can be changed using an environment configuration file called ``looperenv``. Several common engines (SLURM and SGE) are supported by default, but the system gives you complete flexibility, so you can easily configure looper to work with your resource manager.
.. warning:: This is still in progress

For complete instructions on configuring your compute environment, see the looperenv repository at https://github.com/epigen/looperenv. Here's a brief overview, along with an example `looperenv` file:
Looper uses a template-based system for building scripts. By default, looper will just build shell scripts and run them serially. Compute settings can be changed using an environment script, which you point to with a shell environment variable called ``LOOPERENV``.

Complete instructions for configuring your compute environment are available in the looperenv repository at https://github.com/epigen/looperenv.

For each sample, `looper` will create one or more submission scripts. The `compute` settings specify how these scripts will be both produced and run. This makes it very portable and easy to change cluster management systems, or to just use local computing power like a laptop or standalone server, by changing the two variables in the `compute` section.

Example:

.. code-block:: yaml
@@ -81,15 +87,13 @@ For complete instructions on configuring your compute environment, see the loope
partition: queue_name
There are two sub-parameters in the compute section. First, ``submission_template`` is a (relative or absolute) path to the template submission script. Looper uses a template-based system for building scripts. This is a template with variables (encoded like ``{VARIABLE}``), which will be populated independently for each sample as defined in ``pipeline_interface.yaml``. The one variable ``{CODE}`` is a reserved variable that refers to the actual shell command that will run the pipeline. Otherwise, you can use any variables you define in your ``pipeline_interface.yaml``.
There are two sub-parameters in the compute section. First, `submission_template` is a (relative or absolute) path to the template submission script. This is a template with variables (encoded like `{VARIABLE}`), which will be populated independently for each sample as defined in `pipeline_interface.yaml`. The one variable ``{CODE}`` is a reserved variable that refers to the actual python command that will run the pipeline. Otherwise, you can use any variables you define in your `pipeline_interface.yaml`.
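To make the template mechanism concrete, here is a minimal Python sketch of how such a population step could work. This is an illustration only, not looper's actual implementation, and all placeholder names apart from the reserved ``{CODE}`` are hypothetical:

```python
# Illustrative sketch of template-based script building.
# A template contains {VARIABLE} placeholders; they are filled
# independently for each sample, with {CODE} reserved for the
# command that actually runs the pipeline.

template = """#!/bin/bash
#SBATCH --job-name={JOBNAME}
#SBATCH --partition={PARTITION}
{CODE}
"""

# Hypothetical per-sample values; in practice these would come from
# the sample annotation sheet and pipeline_interface.yaml.
sample_vars = {
    "JOBNAME": "frog_1_rnaseq",
    "PARTITION": "queue_name",
    "CODE": "python pipeline.py --input frog1.fq.gz",
}

script = template.format(**sample_vars)
print(script)
```

The populated ``script`` string is what would be written to disk and handed to the ``submission_command``.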

Second, the ``submission_command`` is the command-line command that ``looper`` will prepend to the path of the produced submission script to actually run it (``sbatch`` for SLURM, ``qsub`` for SGE, ``sh`` for localhost, etc.).
Second, the `submission_command` is the command-line command that `looper` will prepend to the path of the produced submission script to actually run it (`sbatch` for SLURM, `qsub` for SGE, `sh` for localhost, etc).

Example submission templates for `SLURM <https://github.com/epigen/looper/blob/master/templates/slurm_template.sub>`__, `SGE <https://github.com/epigen/looper/blob/master/templates/sge_template.sub>`__, and `local runs <https://github.com/epigen/looper/blob/master/templates/localhost_template.sub>`__ are available in the `Templates <https://github.com/epigen/looper/tree/master/templates>`__ directory.
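For orientation, a submission template is just a script with placeholders. A minimal SLURM-style sketch might look like the following (the ``{MEM}`` and ``{TIME}`` names here are illustrative assumptions; consult the linked repository templates for the real variable names):

```shell
#!/bin/bash
#SBATCH --job-name={JOBNAME}
#SBATCH --partition={PARTITION}
#SBATCH --mem={MEM}
#SBATCH --time={TIME}

# {CODE} is the reserved placeholder that is replaced with the
# actual command that runs the pipeline for this sample.
{CODE}
```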




Handling multiple input files with a merge table
****************************************

2 changes: 1 addition & 1 deletion doc/source/config-files.rst
Expand Up @@ -4,6 +4,7 @@ Configuration files

Looper uses `YAML <http://www.yaml.org/>`_ configuration files to describe a project. Looper is a very modular system, so there are a few different YAML files. Here's an explanation of each. Which ones you need to know about will depend on whether you're a pipeline user (running pipelines on your project) or a pipeline developer (building your own pipeline).


Pipeline users
*****************

@@ -28,4 +29,3 @@ If you want to add a new pipeline to looper, then there are two YAML files that
Finally, if you're using Pypiper to develop pipelines, it uses a pipeline-specific configuration file (detailed in the Pypiper documentation):

- `Pypiper pipeline config file <http://pypiper.readthedocs.io/en/latest/advanced.html#pipeline-config-files>`_: Each pipeline may have a configuration file describing where software is, and parameters to use for tasks within the pipeline
12 changes: 6 additions & 6 deletions doc/source/define-your-project.rst
Expand Up @@ -10,7 +10,7 @@ The format is simple and modular, so you only need to define the components you
1. **Project config file** - a ``yaml`` file describing input and output file paths and other (optional) project settings
2. **Sample annotation sheet** - a ``csv`` file with 1 row per sample

The first file (**project config**) is just a few lines of ``yaml`` in the simplest case. Here's a minimal example **project_config.yaml**:
In the simplest case, ``project_config.yaml`` is just a few lines of ``yaml``. Here's a minimal example **project_config.yaml**:


.. code-block:: yaml
@@ -21,9 +21,7 @@ The first file (**project config**) is just a few lines of ``yaml`` in the simpl
pipelines_dir: /path/to/pipelines/repository
The **output_dir** describes where you want to save pipeline results, and **pipelines_dir** describes where your pipeline code is stored.

The second file (**sample annotation sheet**) is where you list your samples. It is a comma-separated value (``csv``) file containing at least a few defined columns: a unique identifier column named ``sample_name``; a column named ``library`` describing the sample type (e.g. RNA-seq); and some way of specifying an input file. Here's a minimal example of **sample_annotation.csv**:
The **output_dir** describes where you want to save pipeline results, and **pipelines_dir** describes where your pipeline code is stored. You will also need a second file to describe samples, which is a comma-separated value (``csv``) file containing at least a unique identifier column named ``sample_name``, a column named ``library`` describing the sample type, and some way of specifying an input file. Here's a minimal example of **sample_annotation.csv**:


.. csv-table:: Minimal Sample Annotation Sheet
@@ -36,9 +34,11 @@ The second file (**sample annotation sheet**) is where you list your samples, wh
"frog_4", "RNA-seq", "frog4.fq.gz"


With those two simple files, you could run looper, and that's fine for just running a quick test on a few files. You just type: ``looper run path/to/project_config.yaml`` and it will run all your samples through the appropriate pipeline. In practice, you'll probably want to use some of the more advanced features of looper by adding additional information to your configuration ``yaml`` file and your sample annotation ``csv`` file. These advanced options are detailed below.
With those two simple files, you could run looper, and that's fine for just running a quick test on a few files. In practice, you'll probably want to use some of the more advanced features of looper by adding additional information to your configuration ``yaml`` file and your sample annotation ``csv`` file.

For example, by default, your jobs will run serially on your local computer, where you're running ``looper``. If you want to submit to a cluster resource manager (like SLURM or SGE), you just need to specify a ``compute`` section.
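A ``compute`` section along these lines might look like the following sketch (the template path is a placeholder; see the cluster documentation for details):

```yaml
compute:
  submission_template: templates/slurm_template.sub
  submission_command: sbatch
```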

Now, let's go through the more advanced details of both annotation sheets and project config files:
Let's go through the more advanced details of both annotation sheets and project config files:

.. include:: sample-annotation-sheet.rst

1 change: 1 addition & 0 deletions doc/source/faq.rst
@@ -6,3 +6,4 @@ FAQ
- How can I run my jobs on a cluster? See :ref:`cluster resource managers <cluster-resource-managers>`

- Which configuration file has which settings? Here's a list: :doc:`config files <config-files>`

1 change: 0 additions & 1 deletion doc/source/features.rst
Expand Up @@ -6,7 +6,6 @@ Simplicity for the beginning, power when you need to expand.

- **Flexible pipelines:** Use looper with any pipeline, any library, in any domain. We designed it to work with `Pypiper <http://pypiper.readthedocs.io/>`_, but looper has an infinitely flexible command-line argument system that will let you configure it to work with any script (pipeline) that accepts command-line arguments. You can also configure looper to submit multiple pipelines per sample.


- **Flexible compute:** If you don't change any settings, looper will simply run your jobs serially. But Looper includes a templating system that will let you process your pipelines on any cluster resource manager (SLURM, SGE, etc.). We include default templates for SLURM and SGE, but it's easy to add your own as well. Looper also gives you a way to determine which compute queue/partition to submit on-the-fly, by passing the ``--compute`` parameter to your call to ``looper run``, making it simple to use by default, but very flexible if you have complex resource needs.

- **Standardized project definition:** Looper defines a flexible standard format for describing projects, and there are other tools that can read these same formats. For example, we are working on an R package that will read the same project definition and provide all your sample metadata (and pipeline results) in an R analysis environment, with no additional effort. With a standardized project definition, the possibilities are endless.
1 change: 0 additions & 1 deletion doc/source/index.rst
@@ -38,7 +38,6 @@ Contents
faq.rst
changelog.rst


Indices and tables
==================

12 changes: 12 additions & 0 deletions doc/source/intro.rst
Expand Up @@ -2,6 +2,7 @@
Introduction
=====================================


Looper is a job submitting engine. If you have a pipeline and a bunch of samples you want to run through it, looper can help you organize the inputs and outputs. By default, it will just run your jobs sequentially on the local computer, but with a small configuration change, it will create and submit jobs to any cluster resource manager (like SLURM, SGE, or LSF).

Here's the idea: We essentially provide a format specification (the :ref:`project config file <project-config-file>`), which you use to describe your project. You create this single configuration file (in `yaml format <http://www.yaml.org/>`_), which includes:
@@ -18,6 +19,17 @@ Looper is modular and totally configurable, so it scales as your needs grow. We



Installing
******************************

You can install directly from GitHub using pip:

.. code-block:: bash

    pip install --user https://github.com/epigen/looper/zipball/master

Support
******************************
Please use the issue tracker at GitHub to file bug reports or feature requests: https://github.com/epigen/looper/issues.
51 changes: 49 additions & 2 deletions doc/source/tutorials.rst
@@ -14,6 +14,55 @@ First, install looper and pypiper (since our tutorial uses pypiper pipelines):
Now, you will need to grab a project to run, and some pipelines to run on it. We have a functional working project example and an open source pipeline repository on github.


.. code:: bash

    git clone https://github.com/epigen/microtest.git
    git clone https://github.com/epigen/open_pipelines.git

Now you can run this project with looper! Just use ``looper run``:

.. code:: bash

    looper run microtest/config/microtest_config.yaml

If the looper executable isn't in your path, check out the :doc:`FAQ <faq>`.
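One common cause: ``pip install --user`` places console scripts in ``~/.local/bin`` on Linux, which may not be on your ``PATH``. A quick fix is shown below (the exact location is an assumption; check ``python -m site --user-base`` on your system):

```shell
# Add the pip --user script directory to PATH for this session.
# Assumes the default Linux user-base location; adjust if yours differs.
export PATH="$HOME/.local/bin:$PATH"
```

To make this permanent, add the line to your ``~/.bashrc`` or equivalent shell startup file.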

Pipeline outputs
^^^^^^^^^^^^^^^^^^^^^^^^^^
Outputs of pipeline runs will be under the directory specified in the ``output_dir`` variable under the ``paths`` section in the project config file (see :doc:`config-files`); this is usually the name of the project being run.

Inside there will be two directories:

- ``results_pipeline`` [1]_ - a directory containing one subdirectory per sample, holding that sample's pipeline output.
- ``submissions`` [2]_ - which holds yaml representations of the samples and log files of the submitted jobs.


The sample-specific output of each pipeline type varies and is described in :doc:`pipelines`.

To use pre-made pipelines with your project, all you have to do is :doc:`define your project <define-your-project>` using looper's standard format. To link your own, custom built pipelines, you can :doc:`connect your pipeline to looper <connecting-pipelines>`.



.. rubric:: Footnotes

.. [1] This variable can also be specified in the ``results_subdir`` variable under the ``paths`` section of the project config file
.. [2] This variable can also be specified in the ``submission_subdir`` variable under the ``paths`` section of the project config file
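Taken together, the footnoted settings correspond to a ``paths`` section like this sketch (the values shown are placeholders, and the defaults listed are assumptions based on the directory names above):

```yaml
paths:
  output_dir: /path/to/project/output
  results_subdir: results_pipeline
  submission_subdir: submissions
```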
First, install looper and pypiper (since our tutorial uses pypiper pipelines):


.. code:: bash

    pip install --user https://github.com/epigen/looper/zipball/master
    pip install --user https://github.com/epigen/pypiper/zipball/master

Now, you will need to grab a project to run, and some pipelines to run on it. We have a functional working project example and an open source pipeline repository on github.


.. code:: bash

    git clone https://github.com/epigen/microtest.git
@@ -49,9 +98,7 @@ Inside there will be two directories:
To use pre-made pipelines with your project, all you have to do is :doc:`define your project <define-your-project>` using looper's standard format. To link your own, custom built pipelines, you can :doc:`connect your pipeline to looper <connecting-pipelines>`.



.. rubric:: Footnotes

.. [1] This variable can also be specified in the ``results_subdir`` variable under the ``metadata`` section of the project config file
.. [2] This variable can also be specified in the ``submission_subdir`` variable under the ``metadata`` section of the project config file
1 change: 1 addition & 0 deletions doc/source/usage-and-commands.rst
@@ -1,6 +1,7 @@
Usage and commands
******************************


Looper doesn't just run pipelines, it can also check and summarize the progress of your jobs, as well as remove all files created by them.

Each task is controlled by one of the four main commands ``run``, ``summarize``, ``destroy``, ``check``:
