How do I...

This section contains a number of smaller topics with links and examples meant to provide relatively concrete answers for specific tool development scenarios.

... deal with index/reference data?

Galaxy's data tables are meant to give tools access to reference datasets or index data that are not tied to particular histories or users. A common example would be FASTA files for various genomes or mapper-specific indices of those files (e.g. a BWA index for the hg19 genome).

Galaxy data managers are specialized tools designed to populate tool data tables.
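As a rough sketch of how the pieces fit together, a data table is declared in tool_data_table_conf.xml and then referenced from a select parameter in the tool. The table name bwa_indexes, the .loc path, and the parameter name are illustrative assumptions, not taken from a particular tool.

```xml
<!-- tool_data_table_conf.xml: declares the table and the .loc file backing it -->
<tables>
    <table name="bwa_indexes" comment_char="#">
        <columns>value, dbkey, name, path</columns>
        <file path="tool-data/bwa_index.loc" />
    </table>
</tables>
```

```xml
<!-- tool XML: offers the table's entries as options of a select parameter -->
<param name="index" type="select" label="Reference genome">
    <options from_data_table="bwa_indexes" />
</param>
```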

... cite tools without an obvious DOI?

In the absence of an obvious DOI, tools may contain embedded BibTeX directly.
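A minimal sketch of an embedded BibTeX citation follows. The entry itself (key, author, title, URL) is a made-up placeholder and should be replaced with the real reference.

```xml
<citations>
    <!-- Placeholder entry for illustration only; not a real reference -->
    <citation type="bibtex">
@misc{exampleTool,
    author = {Doe, Jane},
    title = {Example Tool},
    year = {2015},
    url = {https://example.org/example-tool}
}
    </citation>
</citations>
```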

Further reading:

  • bibtex.xml (test tool with a bunch of random examples)
  • bwa-mem.xml (BWA-MEM tool by Anton Nekrutenko demonstrating citation of an arXiv article)
  • macros.xml (Macros for vcflib tool demonstrating citing a github repository)

... declare a Docker container for my tool?

Galaxy tools can be decorated with container tags indicating the Docker container ids that the tools can run inside of.
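For illustration, a container tag sits inside the tool's requirements block. The image id below is only an example; substitute whichever image actually provides the tool's dependencies.

```xml
<requirements>
    <!-- The image id is an example value, not a recommendation -->
    <container type="docker">busybox:ubuntu-14.04</container>
</requirements>
```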

The longer term plan for the Tool Shed ecosystem is to be able to automatically build Docker containers for tool dependency descriptions and thereby obtain this Docker functionality for free and in a way that is completely backward compatible with non-Docker deployments.

Further reading:

... do extra validation of parameters?

Tool parameters support a validator element (syntax) to perform validation of a single parameter. More complex validation across parameters can be performed with arbitrary Python functions via the code file syntax, but this feature should be used sparingly.
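A short sketch of single-parameter validation is shown here, using the in_range and regex validator types. The parameter names, bounds, and messages are hypothetical.

```xml
<!-- Numeric parameter restricted to a range -->
<param name="min_length" type="integer" value="10" label="Minimum length">
    <validator type="in_range" min="1" max="1000" message="Minimum length must be between 1 and 1000" />
</param>

<!-- Text parameter restricted by a regular expression -->
<param name="sample_id" type="text" label="Sample identifier">
    <validator type="regex" message="Use only letters, digits, and underscores">^[A-Za-z0-9_]+$</validator>
</param>
```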

Further reading:

  • validator XML tag syntax on the Galaxy wiki.
  • fastq_filter.xml (a FASTQ filtering tool demonstrating validator constructs)
  • gffread.xml (a tool by Jim Johnson demonstrating using regular expressions with validator tags)
  • code_file.xml, code_file.py (test files demonstrating defining a simple constraint in Python across two parameters)
  • deseq2 tool by Björn Grüning demonstrating advanced code file validation.

... check input type in command blocks?

Input data parameters may specify multiple formats. For example:

<param name="input" type="data" format="fastq,fasta" label="Input" />

If the command line under construction doesn't require changes based on the input type, the input may simply be referenced as $input. However, if the command line uses, for instance, different argument names depending on the type, it becomes important to dispatch on the underlying type.

In this example, $input.ext returns the short code for the actual datatype of the supplied input. For the definition above, the strings fasta or fastqsanger would be valid values.

While .ext may sometimes be useful, there are many cases where it is inappropriate because of subtypes. Checking whether .ext is equal to fastq in the above example would not catch fastqsanger inputs, for instance. To check whether an input matches a type or any subtype thereof, the is_of_type method can be used. For instance

$input.is_of_type('fastq')

would check if the input is of type fastq or any derivative types such as fastqsanger.
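Put together, a command block might dispatch on the input type roughly as follows. The executable my_tool and its --fastq and --fasta options are hypothetical stand-ins for a real command line.

```xml
<command><![CDATA[
## my_tool and its flags are hypothetical; only the is_of_type check is the point here
#if $input.is_of_type('fastq')
    my_tool --fastq '$input'
#else
    my_tool --fasta '$input'
#end if
]]></command>
```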

... handle arbitrary output data formats?

If the output format of a tool's output cannot be known ahead of time, Galaxy can be instructed to "sniff" the output and determine the data type using the same method used for uploads. Adding the auto_format="true" attribute to a tool's output enables this.

<data name="out1" auto_format="true" label="Auto Output" />

... determine the user submitting a job?

The variable $__user_email__ (as well as $__user_name__ and $__user_id__) is available when building up your command in the tool's <command> block. The following tool demonstrates the use of this and a few other special parameters available to all tools.
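A minimal sketch of using one of these variables in a command block is shown here. The output name $output is an assumption about the surrounding tool.

```xml
<command><![CDATA[
## $output is assumed to be a data output declared elsewhere in the tool
echo 'Job submitted by $__user_email__' > '$output'
]]></command>
```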

... test with multiple value inputs?

To write tests that supply multiple values to a multiple="true" select or data parameter, simply specify the multiple values as a comma-separated list.

Here are examples of each:
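As a sketch, assuming a tool with a multiple select parameter named columns and a multiple data parameter named inputs (both names, the values, and the test file names are hypothetical):

```xml
<test>
    <!-- multiple="true" select parameter: comma-separated option values -->
    <param name="columns" value="c1,c3" />
    <!-- multiple="true" data parameter: comma-separated test-data file names -->
    <param name="inputs" value="1.bed,2.bed" />
    <output name="out_file1" file="expected_output.bed" />
</test>
```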

... test dataset collections?

Here are some examples of testing tools that consume collections with type="data_collection" parameters.
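A sketch of a test supplying a paired collection as input follows. The parameter name, output name, and file names are placeholders.

```xml
<test>
    <param name="input_pair">
        <collection type="paired">
            <element name="forward" value="forward_reads.fastq" />
            <element name="reverse" value="reverse_reads.fastq" />
        </collection>
    </param>
    <output name="out_file1" file="expected_output.txt" />
</test>
```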

Here are some examples of testing tools that produce collections with output_collection elements.
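A corresponding sketch for a tool that produces a paired collection is shown here. The collection name, input file, and expected contents are placeholders.

```xml
<test>
    <param name="input" value="interleaved_reads.fastq" />
    <output_collection name="split_output" type="paired">
        <element name="forward">
            <assert_contents>
                <has_line line="@read1/1" />
            </assert_contents>
        </element>
        <element name="reverse">
            <assert_contents>
                <has_line line="@read1/2" />
            </assert_contents>
        </element>
    </output_collection>
</test>
```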

... test discovered datasets?

Tools which dynamically discover datasets after the job is complete, whether using the <discover_datasets> element, the older default pattern approach (e.g. finding files with names like primary_DATASET_ID_sample1_true_bam_hg18), or the undocumented galaxy.json approach, can be tested by placing discovered_dataset elements beneath the corresponding output element, with the designation attribute identifying the file to test.

<test>
  <param name="input" value="7" />
  <output name="report" file="example_output.html">
    <discovered_dataset designation="world1" file="world1.txt" />
    <discovered_dataset designation="world2">
      <assert_contents>
        <has_line line="World Contents" />
      </assert_contents>
    </discovered_dataset>
  </output>
</test>

The test examples distributed with Galaxy demonstrating dynamic discovery and the testing thereof include:

... test composite dataset contents?

Tools which consume Galaxy composite datatypes can generate test inputs using the composite_data element demonstrated by the following tool.

Tools which produce Galaxy composite datatypes can specify tests for the individual output files using the extra_files element demonstrated by the following tool.
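The two elements can be sketched together in one test, as below. The parameter names, the velvet datatype, and the file paths are illustrative and would need to match a real composite datatype and real test data.

```xml
<test>
    <!-- Composite input: the primary file plus its component files -->
    <param name="input" value="velveth_test1/output.html" ftype="velvet">
        <composite_data value="velveth_test1/Sequences" />
        <composite_data value="velveth_test1/Roadmaps" />
        <composite_data value="velveth_test1/Log" />
    </param>
    <!-- Composite output: check an individual extra file by name -->
    <output name="output" file="velveth_test1/output.html">
        <extra_files type="file" name="Sequences" value="velveth_test1/Sequences" />
    </output>
</test>
```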

... test index (.loc) data?

There is an idiom for supplying test index data during tests using Planemo.

To create this kind of test, provide a tool_data_table_conf.xml.test file beside your tool's tool_data_table_conf.xml.sample file. It specifies paths to test .loc files, which in turn define paths to the test index data. Both the .loc files and the tool_data_table_conf.xml.test file can use the value ${__HERE__}, which will be replaced with the path to the directory the file lives in. This allows the use of relative-like paths in these files, which is needed for portable tests.
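A minimal sketch of such a tool_data_table_conf.xml.test file is shown here. The table name, column list, and .loc file location are assumptions and should mirror the entries in the tool's .sample file.

```xml
<tables>
    <table name="picard_indexes" comment_char="#">
        <columns>value, dbkey, name, path</columns>
        <!-- ${__HERE__} expands to the directory containing this file -->
        <file path="${__HERE__}/test-data/picard_index.loc" />
    </table>
</tables>
```

The referenced .loc file can then use ${__HERE__} in its path column in the same way to point at the test index files.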

An example commit demonstrating the application of this approach to a Picard tool can be found here.

These tests can then be run with the Planemo test command.

Warning

This idiom does not work with the Tool Shed's automated test framework at this time, so these tests will largely only pass with Planemo.

... test exit codes?

A test element can check the exit code of the underlying job using the expect_exit_code="n" attribute.
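For illustration, the test element in Galaxy's tool schema carries this check as the expect_exit_code attribute. The sketch below assumes a hypothetical tool that exits with status 2 on bad input; the parameter name and value are placeholders.

```xml
<!-- A non-zero exit normally fails the job, so expect_failure is set as well -->
<test expect_exit_code="2" expect_failure="true">
    <param name="input" value="malformed_input.txt" />
</test>
```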

... test failure states?

Normally, all tool test cases described by a test element are expected to pass, but one can assert that a job should fail by adding expect_failure="true" to the test element.
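A sketch of a failure test that also checks the error message follows. The parameter name, value, and stderr text are hypothetical.

```xml
<test expect_failure="true">
    <param name="threshold" value="-1" />
    <!-- Optionally assert on what the failing job wrote to standard error -->
    <assert_stderr>
        <has_text text="threshold must be positive" />
    </assert_stderr>
</test>
```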

... test output filters work?

If your tool contains filter elements, you can't verify properties of outputs that are filtered out and do not exist. The test element may contain an expect_num_outputs attribute to specify the expected number of outputs. This can be used to verify that outputs not listed in the test are filtered out during tool execution.
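As a sketch, assume a tool whose optional report output is filtered out when a hypothetical produce_report parameter is false, leaving a single output:

```xml
<test expect_num_outputs="1">
    <param name="input" value="input.txt" />
    <param name="produce_report" value="false" />
    <!-- Only the main output should exist; the report is filtered out -->
    <output name="main_out" file="expected_main.txt" />
</test>
```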

... test metadata?

Output metadata can be checked using metadata elements in the XML description of the output.
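A minimal sketch of a metadata check inside a test output follows. The output name, file, datatype, and the metadata key and value are placeholders that depend on the datatype being produced.

```xml
<test>
    <param name="input" value="input.sam" />
    <output name="out_file1" file="expected.sam" ftype="sam">
        <!-- The metadata key must be one defined by the output's datatype -->
        <metadata name="sam_versions" value="1.3" />
    </output>
</test>
```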

... test tools installed in an existing Galaxy instance?

Do not use Planemo for this; Galaxy should be used to test its tools directly. The following two commands can be used to test Galaxy tools in an existing instance.

$ sh run_tests.sh --report_file tool_tests_shed.html --installed

The above command specifies the --installed flag when calling run_tests.sh. This tells the test framework to test Tool Shed installed tools and only those tools.

$ GALAXY_TEST_TOOL_CONF=config/tool_conf.xml sh run_tests.sh --report_file tool_tests_tool_conf.html functional.test_toolbox

The second command sets the GALAXY_TEST_TOOL_CONF environment variable, which restricts the testing framework to a single tool conf file (such as config/tool_conf.xml.sample, which lists the default tools that ship with Galaxy and whose dependencies must be set up manually). The last argument to run_tests.sh, functional.test_toolbox, tells the test framework to run all the tool tests in the configured tool conf file.

Note

Tip: To speed up tests you can use a pre-migrated database file the way Planemo does by setting the following environment variable before running run_tests.sh.

$ export GALAXY_TEST_DB_TEMPLATE="https://github.com/jmchilton/galaxy-downloads/raw/master/db_gx_rev_0127.sqlite"