
Commit a4e6958

Infrastructure and docs for building and linting CWL tools.
- Through galaxy-lib updates, lint will now lint CWL tools.
- Checks for a validation, description, Docker image, and cwlVersion.
- Update ``tool_init`` to take in a ``--cwl`` parameter and generate CWL tools.
- Add a section to the documentation on building CWL tools - replicate the first three ``planemo tool_init`` commands for the seqtk_seq example.
- Tests and docstring improvements.

Requires galaxy-lib improvements in 16.7.2.

Fixes #450, xref #408.
1 parent 3f4ab44 commit a4e6958

22 files changed

+904
-134
lines changed

docs/_writing_cwl_intro.rst

Lines changed: 102 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,102 @@
1+
The Basics
2+
====================================================
3+
4+
.. include:: _writing_using_seqtk.rst
5+
6+
Common Workflow Language tool files are just simple YAML_ files, so at this point
7+
one could just open a text editor and start implementing the tool. Planemo has a
8+
command ``tool_init`` to quickly generate a skeleton to work from, so let's
9+
start by doing that.
10+
11+
::
12+
13+
$ planemo tool_init --cwl --id 'seqtk_seq' --name 'Convert to FASTA (seqtk)'
14+
15+
The ``tool_init`` command can take various complex arguments - but the three
16+
most basic ones are shown above: ``--cwl``, ``--id`` and ``--name``. The ``--cwl``
17+
flag simply tells Planemo to generate a Common Workflow Language tool. ``--id`` is
18+
a short identifier for this tool and it should be unique across all tools.
19+
``--name`` is a short, human-readable name for the tool - it corresponds
20+
to the ``label`` attribute in the CWL tool document.
21+
22+
The above command will generate the file ``seqtk_seq.cwl`` - which should look
23+
like this.
24+
25+
.. literalinclude:: writing/seqtk_seq_v1.cwl
26+
:language: yaml
27+
28+
This tool file has the common fields required for a CWL tool with TODO notes,
29+
but you will still need to open up the editor and fill out the command, describe
30+
input parameters, tool outputs, write up a help section, etc.
31+
32+
The ``tool_init`` command can do a little bit better than this as well. We can
33+
use the test command we tried above ``seqtk seq -a 2.fastq > 2.fasta`` as
34+
an example to generate a command block by specifying the inputs and the outputs
35+
as follows.
36+
37+
::
38+
39+
$ planemo tool_init --force \
40+
--cwl \
41+
--id 'seqtk_seq' \
42+
--name 'Convert to FASTA (seqtk)' \
43+
--example_command 'seqtk seq -a 2.fastq > 2.fasta' \
44+
--example_input 2.fastq \
45+
--example_output 2.fasta
46+
47+
This will generate the following CWL tool definition - which now has correct
48+
definitions for the input, output, and command specified. These represent a best
49+
guess by planemo, and in most cases will need to be tweaked manually after the
50+
tool is generated.
51+
52+
.. literalinclude:: writing/seqtk_seq_v2.cwl
53+
:language: yaml
54+
55+
.. include:: _writing_from_help_command.rst
56+
57+
::
58+
59+
$ planemo tool_init --force \
60+
--cwl \
61+
--id 'seqtk_seq' \
62+
--name 'Convert to FASTA (seqtk)' \
63+
--example_command 'seqtk seq -a 2.fastq > 2.fasta' \
64+
--example_input 2.fastq \
65+
--example_output 2.fasta \
66+
--help_from_command 'seqtk seq'
67+
68+
This command generates the following CWL YAML file.
69+
70+
.. literalinclude:: writing/seqtk_seq_v3.cwl
71+
:language: yaml
72+
73+
.. include:: _writing_lint_intro.rst
74+
75+
::
76+
77+
$ planemo l seqtk_seq.cwl
78+
Linting tool /home/john/workspace/planemo/docs/writing/seqtk_seq.cwl
79+
Applying linter general... CHECK
80+
.. CHECK: Tool defines a version [0.0.1].
81+
.. CHECK: Tool defines a name [Convert to FASTA (seqtk)].
82+
.. CHECK: Tool defines an id [seqtk_seq_v3].
83+
Applying linter cwl_validation... CHECK
84+
.. INFO: CWL appears to be valid.
85+
Applying linter docker_image... WARNING
86+
.. WARNING: Tool does not specify a DockerPull source.
87+
Applying linter new_draft... CHECK
88+
.. INFO: Modern CWL version [cwl:draft-3]
89+
Failed linting
90+
91+
Here the linting failed because we have not yet defined a Docker image for the
92+
tool. A later revision of this document will cover specifying a Docker image
93+
for this tool with the ``--container`` argument and discuss defining more
94+
parameters for this tool.
95+
96+
For more information on the Common Workflow Language check out the Draft 3
97+
`User Guide`_ and Specification_.
98+
99+
.. _YAML: http://yaml.org/
100+
.. _User Guide: http://www.commonwl.org/draft-3/UserGuide.html
101+
.. _Specification: http://www.commonwl.org/draft-3/CommandLineTool.html
102+
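The ``docker_image`` warning in the lint output above is reported because the tool declares no Docker image. As a rough sketch only, a draft-3 tool document can name an image through a ``DockerRequirement`` hint. The image name below is a hypothetical placeholder, not something this commit or Planemo generates, and the ``--container`` workflow mentioned above is not shown here.

```yaml
# Hypothetical fragment for seqtk_seq.cwl (not generated by this commit).
hints:
  - class: DockerRequirement
    # Assumed placeholder image name; substitute an image that provides seqtk.
    dockerPull: "example/seqtk:latest"
```

Whether this alone satisfies the linter is not confirmed by the commit; it only illustrates where a ``dockerPull`` source would live in the document.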

docs/_writing_from_help_command.rst

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
As shown at the beginning of this section, the command ``seqtk seq`` generates
2+
a help message for the ``seq`` command. ``tool_init`` can take that help message and
3+
stick it right in the generated tool file using the ``--help_from_command`` option.
4+
5+
Generally command help messages aren't exactly appropriate for tools
6+
since they mention argument names and similar details that are abstracted
7+
away by the tool - but they can be an excellent place to start.
8+
9+
The following planemo ``tool_init`` call has been enhanced to use ``--help_from_command``.
10+

docs/_writing_intro.rst

Lines changed: 11 additions & 65 deletions
@@ -1,56 +1,11 @@
11
The Basics
22
====================================================
33

4-
This guide is going to demonstrate building up Galaxy tools wrappers for
5-
commands from Heng Li's Seqtk_ package - a package for processing sequence
6-
data in FASTA and FASTQ files. For fully worked through Seqtk wrappers -
7-
checkout Eric Rasche's `wrappers <https://github.com/galaxyproject/tools-
8-
iuc/tree/master/tools/seqtk>`_ on Github.
4+
.. include:: _writing_using_seqtk.rst
95

10-
To get started let's install Seqtk, download an example FASTQ file, and test
11-
out the a simple Seqtk command - ``seq`` which converts FASTQ files into
12-
FASTA. Here we are going to use ``brew`` to install seqtk - but however you
13-
obtain it should be fine.
14-
15-
::
16-
17-
$ brew tap homebrew/science
18-
$ brew install seqtk
19-
==> Installing seqtk from homebrew/homebrew-science
20-
==> Downloading https://github.com/lh3/seqtk/archive/73866e7.tar.gz
21-
######################################################################## 100.0%
22-
==> make
23-
/home/john/.linuxbrew/Cellar/seqtk/1.0-r68: 3 files, 208K, built in 2 seconds
24-
$ wget https://raw.githubusercontent.com/galaxyproject/galaxy-test-data/master/2.fastq
25-
$ seqtk seq
26-
27-
Usage: seqtk seq [options] <in.fq>|<in.fa>
28-
29-
Options: -q INT mask bases with quality lower than INT [0]
30-
-X INT mask bases with quality higher than INT [255]
31-
-n CHAR masked bases converted to CHAR; 0 for lowercase [0]
32-
-l INT number of residues per line; 0 for 2^32-1 [0]
33-
-Q INT quality shift: ASCII-INT gives base quality [33]
34-
-s INT random seed (effective with -f) [11]
35-
-f FLOAT sample FLOAT fraction of sequences [1]
36-
-M FILE mask regions in BED or name list FILE [null]
37-
-L INT drop sequences with length shorter than INT [0]
38-
-c mask complement region (effective with -M)
39-
-r reverse complement
40-
-A force FASTA output (discard quality)
41-
-C drop comments at the header lines
42-
-N drop sequences containing ambiguous bases
43-
-1 output the 2n-1 reads only
44-
-2 output the 2n reads only
45-
-V shift quality by '(-Q) - 33'
46-
$ seqtk seq -a 2.fastq > 2.fasta
47-
$ cat 2.fasta
48-
>EAS54_6_R1_2_1_413_324
49-
CCCTTCTTGTCTTCAGCGTTTCTCC
50-
>EAS54_6_R1_2_1_540_792
51-
TTGGCAGGCCAAGGCCGATGGATCA
52-
>EAS54_6_R1_2_1_443_348
53-
GTTGCTTCTGGCGTGGGTGGGGGGG
6+
For fully worked through Seqtk wrappers - check out Eric Rasche's
7+
`wrappers <https://github.com/galaxyproject/tools-iuc/tree/master/tools/seqtk>`__
8+
on GitHub.
549

5510
Galaxy tool files are just simple XML files, so at this point one could just
5611
open a text editor and start implementing the tool. Planemo has a command
@@ -74,7 +29,7 @@ like this.
7429
.. literalinclude:: writing/seqtk_seq_v1.xml
7530
:language: xml
7631

77-
This tool file has the common sections required for Galaxy tool but you will
32+
This tool file has the common sections required for a Galaxy tool but you will
7833
still need to open up the editor and fill out the command template, describe
7934
input parameters, tool outputs, writeup a help section, etc....
8035

@@ -100,12 +55,7 @@ definitions for the input and output as well as an actual command template.
10055
:language: xml
10156
:emphasize-lines: 8-16
10257

103-
As shown above the command ``seqtk seq`` generates a help message for the
104-
``seq`` command. ``tool_init`` can take that help message and stick it right
105-
in the generated tool file using the ``help_from_command`` option. Generally
106-
command help messages aren't exactly appropriate for Galaxy tool wrappers
107-
since they mention argument names and simillar details that are abstracted
108-
away by the tool - but they can be a good place to start.
58+
.. include:: _writing_from_help_command.rst
10959

11060
::
11161

@@ -120,18 +70,15 @@ away by the tool - but they can be a good place to start.
12070
--cite_url 'https://github.com/lh3/seqtk' \
12171
--help_from_command 'seqtk seq'
12272

73+
In addition to demonstrating ``--help_from_command``, this demonstrates generating
74+
a test case from our example with ``--test_case`` and adding a citation for the
75+
underlying tool. The resulting tool XML file is:
76+
12377
.. literalinclude:: writing/seqtk_seq_v3.xml
12478
:language: xml
12579
:emphasize-lines: 17-58
12680

127-
At this point we have a fairly a functional tool with test and help. This was
128-
a pretty simple example - usually you will need to put more work into the tool
129-
XML to get to this point - ``tool_init`` is really just designed to get you
130-
started.
131-
132-
Now lets lint and test the tool we have developed. The planemo ``lint`` (or
133-
just ``l``) command will reviews tools for obvious mistakes and compliance
134-
with best practices.
81+
.. include:: _writing_lint_intro.rst
13582

13683
::
13784

@@ -164,4 +111,3 @@ command. This will print a lot of output but should ultimately reveal our one
164111
test passed.
165112

166113
.. _DOI: http://www.doi.org/
167-
.. _Seqtk: https://github.com/lh3/seqtk

docs/_writing_lint_intro.rst

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
At this point we have a fairly functional tool with test and help. This was
2+
a pretty simple example - usually you will need to put more work into the tool
3+
to get to this point - ``tool_init`` is really just designed to get you
4+
started.
5+
6+
Now let's lint and test the tool we have developed. The planemo ``lint`` (or
7+
just ``l``) command reviews tools for obvious mistakes and compliance
8+
with best practices.

docs/_writing_using_seqtk.rst

Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,52 @@
1+
This guide is going to demonstrate building up tools for commands from Heng
2+
Li's Seqtk_ package - a package for processing sequence data in FASTA_ and
3+
FASTQ_ files.
4+
5+
To get started let's install Seqtk, download an example FASTQ file, and test
6+
out a simple Seqtk command - ``seq`` which converts FASTQ files into
7+
FASTA. Here we are going to use ``brew`` to install Seqtk - but however you
8+
obtain it should be fine.
9+
10+
::
11+
12+
$ brew tap homebrew/science
13+
$ brew install seqtk
14+
==> Installing seqtk from homebrew/homebrew-science
15+
==> Downloading https://github.com/lh3/seqtk/archive/73866e7.tar.gz
16+
######################################################################## 100.0%
17+
==> make
18+
/home/john/.linuxbrew/Cellar/seqtk/1.0-r68: 3 files, 208K, built in 2 seconds
19+
$ wget https://raw.githubusercontent.com/galaxyproject/galaxy-test-data/master/2.fastq
20+
$ seqtk seq
21+
22+
Usage: seqtk seq [options] <in.fq>|<in.fa>
23+
24+
Options: -q INT mask bases with quality lower than INT [0]
25+
-X INT mask bases with quality higher than INT [255]
26+
-n CHAR masked bases converted to CHAR; 0 for lowercase [0]
27+
-l INT number of residues per line; 0 for 2^32-1 [0]
28+
-Q INT quality shift: ASCII-INT gives base quality [33]
29+
-s INT random seed (effective with -f) [11]
30+
-f FLOAT sample FLOAT fraction of sequences [1]
31+
-M FILE mask regions in BED or name list FILE [null]
32+
-L INT drop sequences with length shorter than INT [0]
33+
-c mask complement region (effective with -M)
34+
-r reverse complement
35+
-A force FASTA output (discard quality)
36+
-C drop comments at the header lines
37+
-N drop sequences containing ambiguous bases
38+
-1 output the 2n-1 reads only
39+
-2 output the 2n reads only
40+
-V shift quality by '(-Q) - 33'
41+
$ seqtk seq -a 2.fastq > 2.fasta
42+
$ cat 2.fasta
43+
>EAS54_6_R1_2_1_413_324
44+
CCCTTCTTGTCTTCAGCGTTTCTCC
45+
>EAS54_6_R1_2_1_540_792
46+
TTGGCAGGCCAAGGCCGATGGATCA
47+
>EAS54_6_R1_2_1_443_348
48+
GTTGCTTCTGGCGTGGGTGGGGGGG
49+
50+
.. _Seqtk: https://github.com/lh3/seqtk
51+
.. _FASTA: https://en.wikipedia.org/wiki/FASTA_format
52+
.. _FASTQ: https://en.wikipedia.org/wiki/FASTQ_format

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ Contents:
1616
configuration
1717
appliance
1818
writing
19+
writing_cwl
1920
publishing
2021
commands
2122
standards/docs/best_practices

docs/writing/gen.sh

Lines changed: 24 additions & 0 deletions
@@ -18,5 +18,29 @@ planemo tool_init --force \
1818
--example_input 2.fastq \
1919
--example_output 2.fasta \
2020
--test_case \
21+
--cite_url 'https://github.com/lh3/seqtk' \
2122
--help_from_command 'seqtk seq'
2223
mv seqtk_seq.xml seqtk_seq_v3.xml
24+
25+
26+
planemo tool_init --cwl --id 'seqtk_seq' --name 'Convert to FASTA (seqtk)'
27+
mv seqtk_seq.cwl seqtk_seq_v1.cwl
28+
29+
planemo tool_init --force \
30+
--cwl \
31+
--id 'seqtk_seq' \
32+
--name 'Convert to FASTA (seqtk)' \
33+
--example_command 'seqtk seq -a 2.fastq > 2.fasta' \
34+
--example_input 2.fastq \
35+
--example_output 2.fasta
36+
mv seqtk_seq.cwl seqtk_seq_v2.cwl
37+
38+
planemo tool_init --force \
39+
--cwl \
40+
--id 'seqtk_seq' \
41+
--name 'Convert to FASTA (seqtk)' \
42+
--example_command 'seqtk seq -a 2.fastq > 2.fasta' \
43+
--example_input 2.fastq \
44+
--example_output 2.fasta \
45+
--help_from_command 'seqtk seq'
46+
mv seqtk_seq.cwl seqtk_seq_v3.cwl

docs/writing/seqtk_seq_v1.cwl

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
#!/usr/bin/env cwl-runner
2+
cwlVersion: 'cwl:draft-3'
3+
class: CommandLineTool
4+
id: "seqtk_seq"
5+
label: "Convert to FASTA (seqtk)"
6+
inputs: [] # TODO
7+
outputs: [] # TODO
8+
baseCommand: []
9+
arguments: []
10+
description: |
11+
TODO: Fill in description.

docs/writing/seqtk_seq_v2.cwl

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
#!/usr/bin/env cwl-runner
2+
cwlVersion: 'cwl:draft-3'
3+
class: CommandLineTool
4+
id: "seqtk_seq"
5+
label: "Convert to FASTA (seqtk)"
6+
inputs:
7+
- id: input1
8+
type: File
9+
description: |
10+
TODO
11+
inputBinding:
12+
position: 1
13+
prefix: "-a"
14+
outputs:
15+
- id: output1
16+
type: File
17+
outputBinding:
18+
glob: out
19+
baseCommand:
20+
- "seqtk"
21+
- "seq"
22+
arguments: []
23+
stdout: out
24+
description: |
25+
TODO: Fill in description.
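The accompanying documentation notes that the generated definitions are a best guess and usually need manual tweaking. One possible tweak, sketched here as an assumption about intent rather than as Planemo output: ``-a`` is a flag of ``seqtk seq`` rather than an option that takes the input file as its value, so it could be moved into ``arguments`` and the input left as a plain positional binding.

```yaml
# Hand-edited variant of the generated fragment (illustrative, not generated).
inputs:
  - id: input1
    type: File
    description: |
      TODO
    inputBinding:
      position: 1   # input file follows the flag on the command line
baseCommand:
  - "seqtk"
  - "seq"
arguments:
  - "-a"            # fixed flag, no longer tied to the input binding
```

Both forms should yield a command line of the shape ``seqtk seq -a <input>``; the variant only makes explicit that the flag and the file are separate.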
