Infrastructure and docs for building and linting CWL tools.

- Through galaxy-lib updates, lint will now lint CWL tools.
  - Checks for validation, a description, a Docker image, and cwlVersion.
- Update ``tool_init`` to take in a ``--cwl`` parameter and generate CWL tools.
- Add a section to the documentation on building CWL tools - replicate the
  first three ``planemo tool_init`` commands for the seqtk_seq example.
- Tests and docstring improvements.

Requires galaxy-lib improvements in 16.7.2.

Fixes #450, xref #408.
jmchilton committed May 5, 2016
1 parent 3f4ab44 commit a4e6958631140faac324bdc0ab8dd0b2d00498d5
@@ -0,0 +1,102 @@
The Basics
====================================================

.. include:: _writing_using_seqtk.rst

Common Workflow Language tool files are just simple YAML_ files, so at this point
one could just open a text editor and start implementing the tool. Planemo has a
command ``tool_init`` to quickly generate a skeleton to work from, so let's
start by doing that.

::

    $ planemo tool_init --cwl --id 'seqtk_seq' --name 'Convert to FASTA (seqtk)'

The ``tool_init`` command can take various complex arguments - but the three
most basic ones are shown above: ``--cwl``, ``--id``, and ``--name``. The ``--cwl``
flag simply tells Planemo to generate a Common Workflow Language tool. ``--id`` is
a short identifier for this tool and it should be unique across all tools.
``--name`` is a short, human-readable name for the tool - it corresponds
to the ``label`` attribute in the CWL tool document.

The above command will generate the file ``seqtk_seq.cwl`` - which should look
like this.

.. literalinclude:: writing/seqtk_seq_v1.cwl
    :language: yaml

This tool file has the common fields required for a CWL tool with TODO notes,
but you will still need to open up the editor and fill out the command, describe
input parameters and tool outputs, write up a help section, etc.

The ``tool_init`` command can do a little bit better than this as well. We can
use the test command we tried above, ``seqtk seq -a 2.fastq > 2.fasta``, as
an example to generate a command block by specifying the inputs and the outputs
as follows.

::

    $ planemo tool_init --force \
                        --cwl \
                        --id 'seqtk_seq' \
                        --name 'Convert to FASTA (seqtk)' \
                        --example_command 'seqtk seq -a 2.fastq > 2.fasta' \
                        --example_input 2.fastq \
                        --example_output 2.fasta

This will generate the following CWL tool definition - which now has correct
definitions for the input, output, and command specified. These represent a best
guess by planemo, and in most cases will need to be tweaked manually after the
tool is generated.

.. literalinclude:: writing/seqtk_seq_v2.cwl
    :language: yaml
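
To give a feel for the kind of guessing involved, here is a minimal Python
sketch. It is not Planemo's implementation: the function name
``sketch_command`` and the dictionary layout are made up for illustration, and
it only handles the simple pattern used in this example (options before inputs,
output captured with ``>``).

```python
import shlex


def sketch_command(example_command, example_inputs, example_outputs):
    """Guess CWL-like pieces from an example command (illustrative only)."""
    tokens = shlex.split(example_command)

    # Treat "> file" as a stdout redirect rather than as an argument.
    redirect_target = None
    if ">" in tokens:
        index = tokens.index(">")
        if index + 1 < len(tokens):
            redirect_target = tokens[index + 1]
        tokens = tokens[:index]

    base_command = []
    inputs = []
    pending_prefix = None
    for token in tokens:
        if token in example_inputs:
            # An example input becomes a parameter; the option seen just
            # before it (if any) is recorded as its prefix.
            position = len(inputs) + 1
            inputs.append({
                "id": "input%d" % position,
                "position": position,
                "prefix": pending_prefix,
            })
            pending_prefix = None
        elif token.startswith("-"):
            pending_prefix = token
        elif not inputs and pending_prefix is None:
            # Leading plain words form the base command.
            base_command.append(token)
        # Anything else (e.g. option values) is ignored in this sketch.

    # Outputs that were the redirect target are collected from stdout.
    stdout_outputs = [o for o in example_outputs if o == redirect_target]
    return {
        "baseCommand": base_command,
        "inputs": inputs,
        "stdout_outputs": stdout_outputs,
    }


guess = sketch_command(
    "seqtk seq -a 2.fastq > 2.fasta", ["2.fastq"], ["2.fasta"]
)
print(guess["baseCommand"])
print(guess["inputs"])
```

For the seqtk example, the sketch attaches ``-a`` to the input as its prefix,
just as the generated file does. Whether that is the right way to model a given
option is exactly the sort of thing to check by hand after generation.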

.. include:: _writing_from_help_command.rst

::

    $ planemo tool_init --force \
                        --cwl \
                        --id 'seqtk_seq' \
                        --name 'Convert to FASTA (seqtk)' \
                        --example_command 'seqtk seq -a 2.fastq > 2.fasta' \
                        --example_input 2.fastq \
                        --example_output 2.fasta \
                        --help_from_command 'seqtk seq'

This command generates the following CWL YAML file.

.. literalinclude:: writing/seqtk_seq_v3.cwl
    :language: yaml

.. include:: _writing_lint_intro.rst

::

    $ planemo l seqtk_seq.cwl
    Linting tool /home/john/workspace/planemo/docs/writing/seqtk_seq.cwl
    Applying linter general... CHECK
    .. CHECK: Tool defines a version [0.0.1].
    .. CHECK: Tool defines a name [Convert to FASTA (seqtk)].
    .. CHECK: Tool defines an id [seqtk_seq_v3].
    Applying linter cwl_validation... CHECK
    .. INFO: CWL appears to be valid.
    Applying linter docker_image... WARNING
    .. WARNING: Tool does not specify a DockerPull source.
    Applying linter new_draft... CHECK
    .. INFO: Modern CWL version [cwl:draft-3]
    Failed linting

Here the linting failed because we have not yet defined a Docker image for the
tool. A later revision of this document will cover specifying a Docker image
for this tool with the ``--container`` argument and discuss defining more
parameters for this tool.
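
The checks in the output above can be pictured with a small Python sketch. This
is only an illustration, not the galaxy-lib linter: ``lint_cwl_tool`` is a
hypothetical function that takes an already-parsed tool document as a ``dict``,
and apart from the DockerPull warning its message texts are invented. It assumes
the draft-3 layout in which ``requirements`` and ``hints`` are lists of entries
carrying a ``class`` key.

```python
def lint_cwl_tool(tool):
    """Return (level, message) pairs for a parsed CWL tool (illustrative only)."""
    messages = []

    version = tool.get("cwlVersion")
    if version is None:
        messages.append(("error", "Tool does not define a cwlVersion."))
    else:
        messages.append(("info", "CWL version [%s]" % version))

    description = (tool.get("description") or "").strip()
    if not description or description.startswith("TODO"):
        messages.append(("warn", "Tool does not define a description."))

    # A Docker image may be declared under either requirements or hints.
    entries = (tool.get("requirements") or []) + (tool.get("hints") or [])
    has_docker = any(
        entry.get("class") == "DockerRequirement" and entry.get("dockerPull")
        for entry in entries
    )
    if not has_docker:
        messages.append(("warn", "Tool does not specify a DockerPull source."))

    return messages


def lint_failed(messages):
    """Treat any warning or error as a failed lint, as in the run above."""
    return any(level in ("warn", "error") for level, _ in messages)


tool = {
    "cwlVersion": "cwl:draft-3",
    "class": "CommandLineTool",
    "id": "seqtk_seq",
    "description": "Convert FASTQ to FASTA.",
}
for level, message in lint_cwl_tool(tool):
    print("%s: %s" % (level.upper(), message))
print("Failed linting" if lint_failed(lint_cwl_tool(tool)) else "Passed linting")
```

Adding a ``DockerRequirement`` entry with a ``dockerPull`` value to ``hints``
would clear the warning in this sketch.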

For more information on the Common Workflow Language check out the Draft 3
`User Guide`_ and Specification_.

.. _YAML: http://yaml.org/
.. _User Guide: http://www.commonwl.org/draft-3/UserGuide.html
.. _Specification: http://www.commonwl.org/draft-3/CommandLineTool.html

@@ -0,0 +1,10 @@
As shown at the beginning of this section, the command ``seqtk seq`` generates
a help message for the ``seq`` command. ``tool_init`` can take that help message and
stick it right in the generated tool file using the ``--help_from_command`` option.

Generally command help messages aren't exactly appropriate for tools
since they mention argument names and similar details that are abstracted
away by the tool - but they can be an excellent place to start.
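
Conceptually, grabbing a help message amounts to running the command with no
arguments and keeping whatever it prints. The Python sketch below shows that
idea only. ``capture_help`` is a hypothetical helper, not Planemo's code, and it
assumes the usage text may arrive on either stdout or stderr and may come with a
non-zero exit code.

```python
import shlex
import subprocess


def capture_help(command):
    """Run a command and return everything it prints (illustrative only)."""
    completed = subprocess.run(
        shlex.split(command),
        stdout=subprocess.PIPE,
        # Merge stderr into stdout, since usage text is often written there.
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    # The exit code is deliberately ignored: printing usage and exiting with
    # an error status is common when a command is run without arguments.
    return completed.stdout
```

With Seqtk installed, ``capture_help("seqtk seq")`` would return the usage text
shown at the beginning of this section.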

The following planemo ``tool_init`` call has been enhanced to use ``--help_from_command``.

@@ -1,56 +1,11 @@
The Basics
====================================================

This guide is going to demonstrate building up Galaxy tools wrappers for
commands from Heng Li's Seqtk_ package - a package for processing sequence
data in FASTA and FASTQ files. For fully worked through Seqtk wrappers -
checkout Eric Rasche's `wrappers <https://github.com/galaxyproject/tools-
iuc/tree/master/tools/seqtk>`_ on Github.
.. include:: _writing_using_seqtk.rst

To get started let's install Seqtk, download an example FASTQ file, and test
out the a simple Seqtk command - ``seq`` which converts FASTQ files into
FASTA. Here we are going to use ``brew`` to install seqtk - but however you
obtain it should be fine.

::

$ brew tap homebrew/science
$ brew install seqtk
==> Installing seqtk from homebrew/homebrew-science
==> Downloading https://github.com/lh3/seqtk/archive/73866e7.tar.gz
######################################################################## 100.0%
==> make
/home/john/.linuxbrew/Cellar/seqtk/1.0-r68: 3 files, 208K, built in 2 seconds
$ wget https://raw.githubusercontent.com/galaxyproject/galaxy-test-data/master/2.fastq
$ seqtk seq

Usage: seqtk seq [options] <in.fq>|<in.fa>

Options: -q INT mask bases with quality lower than INT [0]
-X INT mask bases with quality higher than INT [255]
-n CHAR masked bases converted to CHAR; 0 for lowercase [0]
-l INT number of residues per line; 0 for 2^32-1 [0]
-Q INT quality shift: ASCII-INT gives base quality [33]
-s INT random seed (effective with -f) [11]
-f FLOAT sample FLOAT fraction of sequences [1]
-M FILE mask regions in BED or name list FILE [null]
-L INT drop sequences with length shorter than INT [0]
-c mask complement region (effective with -M)
-r reverse complement
-A force FASTA output (discard quality)
-C drop comments at the header lines
-N drop sequences containing ambiguous bases
-1 output the 2n-1 reads only
-2 output the 2n reads only
-V shift quality by '(-Q) - 33'
$ seqtk seq -a 2.fastq > 2.fasta
$ cat 2.fasta
>EAS54_6_R1_2_1_413_324
CCCTTCTTGTCTTCAGCGTTTCTCC
>EAS54_6_R1_2_1_540_792
TTGGCAGGCCAAGGCCGATGGATCA
>EAS54_6_R1_2_1_443_348
GTTGCTTCTGGCGTGGGTGGGGGGG
For fully worked through Seqtk wrappers - check out Eric Rasche's
`wrappers <https://github.com/galaxyproject/tools-iuc/tree/master/tools/seqtk>`__
on GitHub.

Galaxy tool files are just simple XML files, so at this point one could just
open a text editor and start implementing the tool. Planemo has a command
@@ -74,7 +29,7 @@ like this.
.. literalinclude:: writing/seqtk_seq_v1.xml
:language: xml

This tool file has the common sections required for Galaxy tool but you will
This tool file has the common sections required for a Galaxy tool but you will
still need to open up the editor and fill out the command template, describe
input parameters, tool outputs, writeup a help section, etc....

@@ -100,12 +55,7 @@ definitions for the input and output as well as an actual command template.
:language: xml
:emphasize-lines: 8-16

As shown above the command ``seqtk seq`` generates a help message for the
``seq`` command. ``tool_init`` can take that help message and stick it right
in the generated tool file using the ``help_from_command`` option. Generally
command help messages aren't exactly appropriate for Galaxy tool wrappers
since they mention argument names and simillar details that are abstracted
away by the tool - but they can be a good place to start.
.. include:: _writing_from_help_command.rst

::

@@ -120,18 +70,15 @@ away by the tool - but they can be a good place to start.
--cite_url 'https://github.com/lh3/seqtk' \
--help_from_command 'seqtk seq'

In addition to demonstrating ``--help_from_command``, this demonstrates generating
a test case from our example with ``--test_case`` and adding a citation for the
underlying tool. The resulting tool XML file is:

.. literalinclude:: writing/seqtk_seq_v3.xml
:language: xml
:emphasize-lines: 17-58

At this point we have a fairly a functional tool with test and help. This was
a pretty simple example - usually you will need to put more work into the tool
XML to get to this point - ``tool_init`` is really just designed to get you
started.

Now lets lint and test the tool we have developed. The planemo ``lint`` (or
just ``l``) command will reviews tools for obvious mistakes and compliance
with best practices.
.. include:: _writing_lint_intro.rst

::

@@ -164,4 +111,3 @@ command. This will print a lot of output but should ultimately reveal our one
test passed.

.. _DOI: http://www.doi.org/
.. _Seqtk: https://github.com/lh3/seqtk
@@ -0,0 +1,8 @@
At this point we have a fairly functional tool with test and help. This was
a pretty simple example - usually you will need to put more work into the tool
to get to this point - ``tool_init`` is really just designed to get you
started.

Now let's lint and test the tool we have developed. The planemo ``lint`` (or
just ``l``) command reviews tools for obvious mistakes and compliance
with best practices.
@@ -0,0 +1,52 @@
This guide is going to demonstrate building up tools for commands from Heng
Li's Seqtk_ package - a package for processing sequence data in FASTA_ and
FASTQ_ files.

To get started let's install Seqtk, download an example FASTQ file, and test
out a simple Seqtk command - ``seq``, which converts FASTQ files into
FASTA. Here we are going to use ``brew`` to install Seqtk - but however you
obtain it should be fine.

::

    $ brew tap homebrew/science
    $ brew install seqtk
    ==> Installing seqtk from homebrew/homebrew-science
    ==> Downloading https://github.com/lh3/seqtk/archive/73866e7.tar.gz
    ######################################################################## 100.0%
    ==> make
    /home/john/.linuxbrew/Cellar/seqtk/1.0-r68: 3 files, 208K, built in 2 seconds
    $ wget https://raw.githubusercontent.com/galaxyproject/galaxy-test-data/master/2.fastq
    $ seqtk seq

    Usage:   seqtk seq [options] <in.fq>|<in.fa>

    Options: -q INT    mask bases with quality lower than INT [0]
             -X INT    mask bases with quality higher than INT [255]
             -n CHAR   masked bases converted to CHAR; 0 for lowercase [0]
             -l INT    number of residues per line; 0 for 2^32-1 [0]
             -Q INT    quality shift: ASCII-INT gives base quality [33]
             -s INT    random seed (effective with -f) [11]
             -f FLOAT  sample FLOAT fraction of sequences [1]
             -M FILE   mask regions in BED or name list FILE [null]
             -L INT    drop sequences with length shorter than INT [0]
             -c        mask complement region (effective with -M)
             -r        reverse complement
             -A        force FASTA output (discard quality)
             -C        drop comments at the header lines
             -N        drop sequences containing ambiguous bases
             -1        output the 2n-1 reads only
             -2        output the 2n reads only
             -V        shift quality by '(-Q) - 33'
    $ seqtk seq -a 2.fastq > 2.fasta
    $ cat 2.fasta
    >EAS54_6_R1_2_1_413_324
    CCCTTCTTGTCTTCAGCGTTTCTCC
    >EAS54_6_R1_2_1_540_792
    TTGGCAGGCCAAGGCCGATGGATCA
    >EAS54_6_R1_2_1_443_348
    GTTGCTTCTGGCGTGGGTGGGGGGG

.. _Seqtk: https://github.com/lh3/seqtk
.. _FASTA: https://en.wikipedia.org/wiki/FASTA_format
.. _FASTQ: https://en.wikipedia.org/wiki/FASTQ_format
@@ -16,6 +16,7 @@ Contents:
configuration
appliance
writing
writing_cwl
publishing
commands
standards/docs/best_practices
@@ -18,5 +18,29 @@ planemo tool_init --force \
--example_input 2.fastq \
--example_output 2.fasta \
--test_case \
--cite_url 'https://github.com/lh3/seqtk' \
--help_from_command 'seqtk seq'
mv seqtk_seq.xml seqtk_seq_v3.xml


planemo tool_init --cwl --id 'seqtk_seq' --name 'Convert to FASTA (seqtk)'
mv seqtk_seq.cwl seqtk_seq_v1.cwl

planemo tool_init --force \
--cwl \
--id 'seqtk_seq' \
--name 'Convert to FASTA (seqtk)' \
--example_command 'seqtk seq -a 2.fastq > 2.fasta' \
--example_input 2.fastq \
--example_output 2.fasta
mv seqtk_seq.cwl seqtk_seq_v2.cwl

planemo tool_init --force \
--cwl \
--id 'seqtk_seq' \
--name 'Convert to FASTA (seqtk)' \
--example_command 'seqtk seq -a 2.fastq > 2.fasta' \
--example_input 2.fastq \
--example_output 2.fasta \
--help_from_command 'seqtk seq'
mv seqtk_seq.cwl seqtk_seq_v3.cwl
@@ -0,0 +1,11 @@
#!/usr/bin/env cwl-runner
cwlVersion: 'cwl:draft-3'
class: CommandLineTool
id: "seqtk_seq"
label: "Convert to FASTA (seqtk)"
inputs: [] # TODO
outputs: [] # TODO
baseCommand: []
arguments: []
description: |
  TODO: Fill in description.
@@ -0,0 +1,25 @@
#!/usr/bin/env cwl-runner
cwlVersion: 'cwl:draft-3'
class: CommandLineTool
id: "seqtk_seq"
label: "Convert to FASTA (seqtk)"
inputs:
  - id: input1
    type: File
    description: |
      TODO
    inputBinding:
      position: 1
      prefix: "-a"
outputs:
  - id: output1
    type: File
    outputBinding:
      glob: out
baseCommand:
  - "seqtk"
  - "seq"
arguments: []
stdout: out
description: |
  TODO: Fill in description.