Skip to content

Commit

Permalink
Initial outline of tool builder.
Browse files Browse the repository at this point in the history
  • Loading branch information
jmchilton committed Feb 16, 2015
1 parent d759965 commit 78f8274
Show file tree
Hide file tree
Showing 11 changed files with 794 additions and 8 deletions.
36 changes: 33 additions & 3 deletions docs/commands/tool_init.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,11 +12,41 @@ This section is auto-generated from the help text for the planemo command

**Help**

Not yet implemented, but someone should write a wizard for initializing
Galaxy tools and open a pull request :).
Generate a tool outline from supplied arguments.

**Options**::


--help Show this message and exit.
--test_case For use with --example_commmand, generate a tool
test case from the supplied example.
--container TEXT Add a Docker image identifier for this tool.
--requirement TEXT Add a tool requirement package (e.g. 'seqtk' or
'seqtk@1.68').
--help_from_command TEXT Auto populate help from supplied command.
--help_text TEXT Help text (reStructuredText)
--named_output TEXT Create a named output for use with command block
for example specify --named_output=output1.bam and
then use '-o $ouput1' in your command block.
--output TEXT An output location (e.g. output.bam), the Galaxy
datatype is inferred from the extension.
--input TEXT An input description (e.g. input.fasta)
--version_command TEXT Command to print version (e.g. 'seqtk --version')
--example_output TEXT For use with --example_command, replace input file
(e.g. 2.fastq with a tool output).
--example_input TEXT For use with --example_command, replace input file
(e.g. 2.fastq with a data input parameter).
--example_command TEXT Example to command with paths to build Cheetah
template from (e.g. 'seqtk seq -a 2.fastq >
2.fasta'). Option cannot be used with --command,
should be used --example_input and
--example_output.
-c, --command TEXT Command potentially including cheetah variables
()(e.g. 'seqtk seq -a $input > $output')
-d, --description TEXT Short description for new tool (user facing)
--version TEXT Tool XML version.
-n, --name TEXT Name for new tool (user facing)
-t, --tool PATH Output path for new tool (default is <id>.xml)
-f, --force Overwrite existing tool if present.
-i, --id TEXT Short identifier for new tool (no whitespace)
--help Show this message and exit.
1 change: 1 addition & 0 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ Contents:
readme
installation
configuration
writing
commands
contributing
appliance
Expand Down
8 changes: 8 additions & 0 deletions docs/planemo.rst
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,14 @@ planemo.shed module
:undoc-members:
:show-inheritance:

planemo.tool_builder module
---------------------------

.. automodule:: planemo.tool_builder
:members:
:undoc-members:
:show-inheritance:


Module contents
---------------
Expand Down
189 changes: 189 additions & 0 deletions docs/writing.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,189 @@
======================
Building a Galaxy Tool
======================

This guide is going to demontrate building up Galaxy tools wrappers for
commands from Heng Li's Seqtk_ package - a package for processing sequence
data in FASTA and FASTQ files. For fully worked through Seqtk wrappers -
checkout Eric Rasche's `wrappers <https://github.com/galaxyproject/tools-
iuc/tree/master/tools/seqtk>`_ on Github.

To get started lets install Seqtk, download an example FASTQ file, and test
out the a simple Seqtk command - ``seq`` which converts FASTQ files into
FASTA. Here we are going to use ``brew`` to install seqtk - but however you
obtain it should be fine.

::

% brew tap homebrew/science
% brew install seqtk
==> Installing seqtk from homebrew/homebrew-science
==> Downloading https://github.com/lh3/seqtk/archive/73866e7.tar.gz
######################################################################## 100.0%
==> make
/home/john/.linuxbrew/Cellar/seqtk/1.0-r68: 3 files, 208K, built in 2 seconds
% wget https://raw.githubusercontent.com/galaxyproject/galaxy-test-data/master/2.fastq
% seqtk seq

Usage: seqtk seq [options] <in.fq>|<in.fa>

Options: -q INT mask bases with quality lower than INT [0]
-X INT mask bases with quality higher than INT [255]
-n CHAR masked bases converted to CHAR; 0 for lowercase [0]
-l INT number of residues per line; 0 for 2^32-1 [0]
-Q INT quality shift: ASCII-INT gives base quality [33]
-s INT random seed (effective with -f) [11]
-f FLOAT sample FLOAT fraction of sequences [1]
-M FILE mask regions in BED or name list FILE [null]
-L INT drop sequences with length shorter than INT [0]
-c mask complement region (effective with -M)
-r reverse complement
-A force FASTA output (discard quality)
-C drop comments at the header lines
-N drop sequences containing ambiguous bases
-1 output the 2n-1 reads only
-2 output the 2n reads only
-V shift quality by '(-Q) - 33'
% seqtk seq -a 2.fastq > 2.fasta
% cat 2.fasta
>EAS54_6_R1_2_1_413_324
CCCTTCTTGTCTTCAGCGTTTCTCC
>EAS54_6_R1_2_1_540_792
TTGGCAGGCCAAGGCCGATGGATCA
>EAS54_6_R1_2_1_443_348
GTTGCTTCTGGCGTGGGTGGGGGGG

Galaxy tool files are just simple XML files, so at this point one could just
open a text editor and start implementing the tool. Planemo has a command
``tool_init`` to quickly generate some of the boilerplate XML, so lets
start by doing that.

::

% planemo tool_init --id 'seqtk_seq' --name 'Convert to FASTA (seqtk)'

The ``tool_init`` command can take various complex arguments - but the two
most basic ones are shown above ``--id`` and ``--name``. Every Galaxy tool
needs an ``id` (this a short identifier used by Galaxy itself to identify the
tool) and a ``name`` (this is display to the Galaxy user and should be a short
description of the tool). In general ``name`` can have whitespace and ``id``
should not.

The above command will generate the file ``seqtk_seq.xml`` - which should look
like this.

.. literalinclude:: writing/seqtk_seq_v1.xml
:language: xml

This tool file has the common sections required for Galaxy tool but you will
still need to open up the editor and fill out the command template, describe
input parameters, tool outputs, writeup a help section, etc....

The ``tool_init`` command can do a little bit better than this as well. We can
use the test command we generated above ``seqtk seq -a 2.fastq > 2.fasta`` as
an example to generate a command block by specifing the inputs and the outputs
as follows.

::

% planemo tool_init --force \
--id 'seqtk_seq' \
--name 'Convert to FASTA (seqtk)' \
--requirement seqtk@1.0-r68 \
--example_command 'seqtk seq -a 2.fastq > 2.fasta' \
--example_input 2.fastq \
--example_output 2.fasta

This will generate the following tool XML file - which now has correct
definitions for the input and output as well as an actual command template.

.. literalinclude:: writing/seqtk_seq_v2.xml
:language: xml

As shown above the command ``seqtk seq`` generates a help message for this the
``seq`` command. ``tool_init`` can take that help message and stick it right
in the generated tool file using the ``help_from_command`` option. Generally
command help messages aren't exactly appropriate for Galaxy tool wrappers
since they mention argument names and simillar details that are abstracted
away by the tool - but they can be a good place to start.

::

% planemo tool_init --force \
--id 'seqtk_seq' \
--name 'Convert to FASTA (seqtk)' \
--requirement seqtk@1.0-r68 \
--example_command 'seqtk seq -a 2.fastq > 2.fasta' \
--example_input 2.fastq \
--example_output 2.fasta \
--test_case \
--help_from_command 'seqtk seq'

.. literalinclude:: writing/seqtk_seq_v3.xml
:language: xml

At this point we have a fairly a functional tool with test and help. This was
a pretty simple example - usually you will need to put more work into the tool
XML to get to this point - ``tool_init`` is really just designed to get you
started.

Now lets lint and test the tool we have developed. The planemo ``lint`` (or
just ``l``) command will reviews tools for obvious mistakes and compliance
with best practices.

::

% planemo l
Linting tool /home/john/test/seqtk_seq.xml
Applying linter lint_top_level... CHECK
.. CHECK: Tool defines a version.
.. CHECK: Tool defines a name.
.. CHECK: Tool defines an id name.
Applying linter lint_tests... CHECK
.. CHECK: 1 test(s) found.
Applying linter lint_output... CHECK
.. INFO: 1 output datasets found.
Applying linter lint_inputs... CHECK
.. INFO: Found 1 input parameters.
Applying linter lint_help... CHECK
.. CHECK: Tool contains help section.
.. CHECK: Help contains valid reStructuredText.
Applying linter lint_command... CHECK
.. INFO: Tool contains a command.
Applying linter lint_citations... WARNING
.. WARNING: No citations found, consider adding citations to your tool.

By default ``lint`` will find all the tools in your current working directory,
but we could have specified a particular tool with ``planemo lint
seqtk_seq.xml``. The only warning we received here is telling us the tool
lacks citations. Seqtk_ is unpublished, but if there were a paper to cite we
could have done so by passing in the DOI_ (e.g. ``--doi '10.1101/010538'``).

Next we can run our tool's functional test with the ``test`` (or just ``t``)
command. This will print a lot of output but should ultimately reveal our one
test passed.

::

% planemo --galaxy_root=/path/to/galaxy t
...
All 1 test(s) executed passed.
seqtk_seq[0]: passed

Now we can open Galaxy

::

% planemo --galaxy_root=/path/to/galaxy s
...
serving on http://127.0.0.1:9090

Open up http://127.0.0.1:9090 in a web browser to view your new tool.

More information:

* `Galaxy's Tool XML Syntax <https://wiki.galaxyproject.org/Admin/Tools/ToolConfigSyntax>`_
* `Big List of Tool Development Resources <https://wiki.galaxyproject.org/Develop/ResourcesTools>`_

.. _DOI: http://www.doi.org/
.. _Seqtk: https://github.com/lh3/seqtk
22 changes: 22 additions & 0 deletions docs/writing/gen.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
#!/bin/bash

planemo tool_init --id 'seqtk_seq' --name 'Convert to FASTA (seqtk)'
mv seqtk_seq.xml seqtk_seq_v1.xml
planemo tool_init --force \
--id 'seqtk_seq' \
--name 'Convert to FASTA (seqtk)' \
--requirement seqtk@1.0-r68 \
--example_command 'seqtk seq -a 2.fastq > 2.fasta' \
--example_input 2.fastq \
--example_output 2.fasta
mv seqtk_seq.xml seqtk_seq_v2.xml
planemo tool_init --force \
--id 'seqtk_seq' \
--name 'Convert to FASTA (seqtk)' \
--requirement seqtk@1.0-r68 \
--example_command 'seqtk seq -a 2.fastq > 2.fasta' \
--example_input 2.fastq \
--example_output 2.fasta \
--test_case \
--help_from_command 'seqtk seq'
mv seqtk_seq.xml seqtk_seq_v3.xml
17 changes: 17 additions & 0 deletions docs/writing/seqtk_seq_v1.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
<tool id="seqtk_seq" name="Convert to FASTA (seqtk)" version="0.1.0">
<stdio>
<exit_code range="1:" />
</stdio>
<requirements>
</requirements>
<command><![CDATA[
TODO: Fill in command template.
]]></command>
<inputs>
</inputs>
<outputs>
</outputs>
<help><![CDATA[
TODO: Fill in help.
]]></help>
</tool>
20 changes: 20 additions & 0 deletions docs/writing/seqtk_seq_v2.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
<tool id="seqtk_seq" name="Convert to FASTA (seqtk)" version="0.1.0">
<stdio>
<exit_code range="1:" />
</stdio>
<requirements>
<requirement type="package" version="1.0-r68">seqtk</requirement>
</requirements>
<command><![CDATA[
seqtk seq -a $input1 > $output1
]]></command>
<inputs>
<param type="data" name="input1" format="fastq" />
</inputs>
<outputs>
<data name="output1" format="fasta" from_work_dir="2.fasta" />
</outputs>
<help><![CDATA[
TODO: Fill in help.
]]></help>
</tool>
47 changes: 47 additions & 0 deletions docs/writing/seqtk_seq_v3.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
<tool id="seqtk_seq" name="Convert to FASTA (seqtk)" version="0.1.0">
<stdio>
<exit_code range="1:" />
</stdio>
<requirements>
<requirement type="package" version="1.0-r68">seqtk</requirement>
</requirements>
<command><![CDATA[
seqtk seq -a $input1 > $output1
]]></command>
<inputs>
<param type="data" name="input1" format="fastq" />
</inputs>
<outputs>
<data name="output1" format="fasta" from_work_dir="2.fasta" />
</outputs>
<tests>
<test>
<param name="input1" value="2.fastq"/>
<output name="output1" file="2.fasta"/>
</test>
</tests>
<help><![CDATA[
Usage: seqtk seq [options] <in.fq>|<in.fa>
Options: -q INT mask bases with quality lower than INT [0]
-X INT mask bases with quality higher than INT [255]
-n CHAR masked bases converted to CHAR; 0 for lowercase [0]
-l INT number of residues per line; 0 for 2^32-1 [0]
-Q INT quality shift: ASCII-INT gives base quality [33]
-s INT random seed (effective with -f) [11]
-f FLOAT sample FLOAT fraction of sequences [1]
-M FILE mask regions in BED or name list FILE [null]
-L INT drop sequences with length shorter than INT [0]
-c mask complement region (effective with -M)
-r reverse complement
-A force FASTA output (discard quality)
-C drop comments at the header lines
-N drop sequences containing ambiguous bases
-1 output the 2n-1 reads only
-2 output the 2n reads only
-V shift quality by '(-Q) - 33'
]]></help>
</tool>
Loading

0 comments on commit 78f8274

Please sign in to comment.