Update README
samirelanduk committed May 24, 2023
1 parent 29b7517 commit 22dcbc3
Showing 2 changed files with 94 additions and 65 deletions.
155 changes: 92 additions & 63 deletions README.rst
Expand Up @@ -18,8 +18,6 @@ nextflow.py
.. |license| image:: https://img.shields.io/pypi/l/nextflowpy.svg?color=blue)
:target: https://github.com/goodwright/nextflow.py/blob/master/LICENSE

**IMPORTANT: The name of the package on PyPI has now changed from `nextflow` to `nextflowpy`.**

nextflow.py is a Python wrapper around the Nextflow pipeline framework. It lets
you run Nextflow pipelines from Python code.

Expand All @@ -41,11 +39,6 @@ nextflow.py can be installed using pip::

$ pip install nextflowpy

If you get permission errors, try using ``sudo``::

$ sudo pip install nextflowpy


Development
~~~~~~~~~~~

Expand Down Expand Up @@ -81,45 +74,94 @@ You can opt to only run unit tests or integration tests::
Overview
--------

The starting point for any nextflow.py pipeline is the ``Pipeline``
object. This is initialised with a path to the file in question, and,
optionally, the location of an accompanying config file:

>>> pipeline1 = nextflow.Pipeline("pipelines/my-pipeline.nf")
>>> pipeline2 = nextflow.Pipeline("main.nf", config="nextflow.config")

Running
~~~~~~~

To actually execute the pipeline, the ``run`` method is used:
To run a pipeline, the ``run`` function is used. The only required
parameter is the path to the pipeline file:

>>> pipeline = nextflow.Pipeline("pipelines/my-pipeline.nf")
>>> execution = pipeline.run()

This will return an ``Execution`` object, which represents the pipeline
execution that just took place. You can customise the execution with various
options:
execution that just took place (see below for details on this object). You can
customise the execution with various options:

>>> execution = pipeline.run(location="./rundir", params={"param1": "123"}, profiles=["docker", "test"], version="22.0.1", configs=["env.config"])

>>> execution = pipeline.run(location="./rundir", params={"param1": "123"}, profile=["docker", "test"], version="22.0.1", config=["env.config"])
* ``location`` - The location to run the pipeline from, which by default is just
the current working directory.

* ``params`` - A dictionary of parameters to pass to the pipeline as command
line arguments. In the above example, this would run the pipeline with ``--param1=123``.

* ``profiles`` - A list of Nextflow profiles to use when running the pipeline.
These are defined in the ``nextflow.config`` file, and can be used to
configure things like the executor to use, or the container engine to use.
In the above example, this would run the pipeline with ``-profile docker,test``.

* ``version`` - The version of Nextflow to use when running the pipeline. By
default, the version of Nextflow installed on the system is used, but this
can be overridden with this parameter.

* ``configs`` - A list of config files to use when running the pipeline. These
are merged with the config files specified in the pipeline itself, and can
be used to override any of the settings in the pipeline config.

Custom Runners
~~~~~~~~~~~~~~

This sets the execution to take place in a different location, passes
``--param1=123`` as a command line argument when the pipeline is run, uses the
Nextflow profiles 'docker' and 'test', runs with Nextflow version 22.0.1
(regardless of what version of Nextflow is installed), and passes in an extra
config file to use on the run.
When you run a pipeline with nextflow.py, it will generate the command string
that you would use at the command line if you were running the pipeline
manually. This will be some variant of ``nextflow run some-pipeline.nf``, and
will include any parameters, profiles, versions, and config files that you
passed in.
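
As a rough illustration of the kind of command string described above, the sketch below assembles one from a path, params, profiles and config files. The ``build_command`` helper is hypothetical and is not part of nextflow.py; it assumes params become ``--name=value`` arguments and profiles become a comma-separated ``-profile`` value (as described earlier), and that extra config files are passed with Nextflow's ``-c`` option. The exact string nextflow.py generates may differ.

```python
def build_command(path, params=None, profiles=None, configs=None):
    """Hypothetical helper: assemble a ``nextflow run`` command string.

    This is an illustration only, not nextflow.py's own implementation.
    """
    parts = ["nextflow"]
    # Assumption: extra config files are supplied with Nextflow's -c option.
    for config in configs or []:
        parts.append(f"-c {config}")
    parts.append(f"run {path}")
    # Each param becomes a --name=value command line argument.
    for name, value in (params or {}).items():
        parts.append(f"--{name}={value}")
    # Profiles are joined with commas into a single -profile argument.
    if profiles:
        parts.append("-profile " + ",".join(profiles))
    return " ".join(parts)


command = build_command(
    "my-pipeline.nf",
    params={"param1": "123"},
    profiles=["docker", "test"],
    configs=["env.config"],
)
print(command)
```
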

By default, nextflow.py will then run this command using the standard Python
``subprocess`` module. However, you can customise this behaviour by passing in
a custom 'runner' function. This is a function which takes the command string
and submits the job in some other way. For example, you could use a custom
runner to submit the job to a cluster, or to a cloud platform.

This runner function is passed to the ``run`` function as the
``runner`` parameter:

>>> execution = nextflow.run("my-pipeline.nf", runner=my_custom_runner)

Once the run command string has been passed to the runner, nextflow.py will
wait for the pipeline to complete by watching the execution directory, and then
return the ``Execution`` object as normal.
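
A minimal sketch of what such a runner could look like is shown here. It assumes only what is stated above, namely that the runner receives the command string as its single argument. Whether nextflow.py uses the runner's return value is not stated, so returning the ``Popen`` handle is purely a convenience of this sketch.

```python
import subprocess


def my_custom_runner(command):
    """Hypothetical runner: log the command string, then launch it.

    A real runner might instead hand the command to a cluster scheduler
    or a cloud API; this sketch just starts it in a local shell.
    """
    print(f"Submitting: {command}")
    # shell=True because the runner receives a single command string.
    # The process is started but not waited on here.
    return subprocess.Popen(command, shell=True)
```

A function like this would then be supplied as the ``runner`` parameter described above.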

Polling
~~~~~~~

The function described above will run the pipeline and wait while it does, with
the completed ``Execution`` being returned only at the end.

An alternate method is to use ``run_and_poll``, which returns an
``Execution`` object every few seconds representing the state of the
pipeline execution at that moment in time, as a generator::

    for execution in pipeline.run_and_poll(sleep=2, location="./rundir", params={"param1": "123"}):
        print("Processing intermediate execution")

By default, an ``Execution`` will be returned every second, but you can
adjust this as required with the ``sleep`` parameter. This is useful if you want
to get information about the progress of the pipeline execution as it proceeds.
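
The polling pattern itself can be sketched with a plain Python generator. The ``poll_states`` function and its ``get_state`` and ``is_finished`` arguments are hypothetical stand-ins used to show the idea of yielding a snapshot every ``sleep`` seconds until the work is done; this is not how ``run_and_poll`` is necessarily implemented.

```python
import time


def poll_states(get_state, is_finished, sleep=1):
    """Hypothetical sketch: yield a snapshot of some state every ``sleep`` seconds.

    Stops after yielding the first snapshot for which ``is_finished`` is true.
    """
    while True:
        state = get_state()
        yield state
        if is_finished(state):
            return
        time.sleep(sleep)


# Stand-in for a pipeline whose status changes over time.
statuses = iter(["running", "running", "done"])
snapshots = []
for snapshot in poll_states(lambda: next(statuses), lambda s: s == "done", sleep=0):
    snapshots.append(snapshot)
    print("Processing intermediate state:", snapshot)
```
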

Executions
##########
~~~~~~~~~~

An ``Execution`` represents a single execution of a
``Pipeline``. It has properties for:
An ``Execution`` represents a single execution of a pipeline. It has
properties for:

* ``id`` - The unique ID of that run, generated by Nextflow.
* ``identifier`` - The unique ID of that run, generated by Nextflow.

* ``started`` - When the pipeline ran (as a UNIX timestamp).
* ``started`` - When the pipeline ran (as a Python datetime).

* ``started_dt`` - When the pipeline ran (as a Python datetime).
* ``finished`` - When the pipeline completed (as a Python datetime).

* ``duration`` - how long the execution took in seconds.
* ``duration`` - how long the pipeline ran for (if finished).

* ``status`` - the status Nextflow reports on completion.

Expand All @@ -131,17 +173,17 @@ An ``Execution`` represents a single execution of a

* ``log`` - the full text of the log file produced.

* ``returncode`` - the exit code of the run - usually 0 or 1.
* ``return_code`` - the exit code of the run - usually 0 or 1.

* ``pipeline`` - the ``Pipeline`` that created the execution.
* ``path`` - the path to the execution directory.

It also has a ``process_executions`` property, which is a list of
``ProcessExecution`` objects. Nextflow processes data by chaining
together isolated 'processes', and each of these has a
``ProcessExecution`` object representing its execution. These have the
following properties:

* ``hash`` - The unique ID generated by Nextflow, of the form ``xx/xxxxxx``.
* ``identifier`` - The unique ID generated by Nextflow, of the form ``xx/xxxxxx``.

* ``process`` - The name of the process that spawned the process execution.

Expand All @@ -153,13 +195,19 @@ following properties:

* ``stderr`` - the stderr of the process execution.

* ``started`` - When the process execution ran (as a UNIX timestamp).
* ``started`` - When the process execution ran (as a Python datetime).

* ``started_dt`` - When the process execution ran (as a Python datetime).
* ``finished`` - When the process execution completed (as a Python datetime).

* ``duration`` - how long the process execution took in seconds.

* ``returncode`` - the exit code of the process execution - usually 0 or 1.
* ``return_code`` - the exit code of the process execution - usually 0 or 1.

* ``path`` - the local path to the process execution directory.

* ``full_path`` - the absolute path to the process execution directory.

* ``bash`` - the bash file contents generated for the process execution.
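
To show how these properties might be used together, the sketch below summarises an execution and lists any process executions with a non-zero exit code. The ``summarise`` helper is hypothetical, and ``SimpleNamespace`` objects stand in for real ``Execution`` and ``ProcessExecution`` objects so the example runs without Nextflow. Only attribute names listed above are used, and since the type of ``return_code`` is not stated here, it is compared as a string.

```python
from types import SimpleNamespace


def summarise(execution):
    """Hypothetical helper: one summary line plus one line per failed process."""
    lines = [f"{execution.identifier}: {execution.status}"]
    for process_execution in execution.process_executions:
        # Compare as a string because the type of return_code is an assumption.
        if str(process_execution.return_code) != "0":
            lines.append(
                f"  failed: {process_execution.process} ({process_execution.identifier})"
            )
    return lines


# Stand-in objects with made-up values, for illustration only.
example_execution = SimpleNamespace(
    identifier="example_run",
    status="ERR",
    process_executions=[
        SimpleNamespace(identifier="ab/123456", process="ALIGN", return_code="1"),
        SimpleNamespace(identifier="cd/654321", process="SORT", return_code="0"),
    ],
)
summary = summarise(example_execution)
print("\n".join(summary))
```
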

Process executions can have various files passed to them, and will create files
during their execution too. These can be obtained as follows:
Expand All @@ -175,37 +223,18 @@ during their execution too. These can be obtained as follows:
distinguish these once execution is complete, so nextflow.py reports all
output files, not just those which are 'published'.

Polling
~~~~~~~

The method described above will run the pipeline and wait while it does, with
the completed ``Execution`` being returned only at the end.

An alternate method is to use ``run_and_poll``, which returns an
``Execution`` object every few seconds representing the state of the
pipeline execution at that moment in time, as a generator::

    for execution in pipeline.run_and_poll(sleep=2, location="./rundir", params={"param1": "123"}, profile=["docker", "test"], version="22.0.1"):
        print("Processing intermediate execution")

By default, an ``Execution`` will be returned every 5 seconds, but you
can adjust this as required with the ``sleep`` parameter. This is useful if you
want to get information about the progress of the pipeline execution as it
proceeds.
Changelog
---------

Direct Running
~~~~~~~~~~~~~~
Release 0.6.0
~~~~~~~~~~~~~

If you just want to run a single pipeline without initialising a
``Pipeline`` object first, you can ``run`` or
``run_and_poll`` directly, without needing to create a
``Pipeline``:
`24th May 2023`

>>> import nextflow
>>> execution = nextflow.run(path="pipeline.nf", config=["settings.config"], params={"param1": "123"})
* Added ability to use custom runners for starting jobs.
* Removed the ``Pipeline`` class.
* Overhauled architecture.

Changelog
---------

Release 0.5.0
~~~~~~~~~~~~~
Expand Down
4 changes: 2 additions & 2 deletions docs/source/installing.rst
Expand Up @@ -6,11 +6,11 @@ pip

nextflow.py can be installed using pip::

$ pip install nextflow
$ pip install nextflowpy

If you get permission errors, try using ``sudo``::

$ sudo pip install nextflow
$ sudo pip install nextflowpy


Development
Expand Down
