Contents
- Generating KNIME Nodes for External Tools
- Prerequisites
- Running Example
- Preparation: Building samtools and Downloading GenericKnimeNodes
- Preparation: Installing KNIME File Handling
- Overview
- Obtaining the Demo Workflow Plugin Directory
- Creating an Eclipse Plugin from the Plugin Directory
- Importing the Generated Projects into Eclipse
- Launching Eclipse with our Nodes
- Anatomy of a Plugin Directory
- Generating KNIME Nodes for SeqAn Apps
$HOME/eclipse_knime_2.8.0
).
We will adapt some functions from the samtools package to KNIME:
- samtools view -o ${OUT} ${IN} (BAM to SAM conversion)
- samtools view -Sb -o ${OUT} ${IN} (SAM to BAM conversion)
- samtools sort -f -o ${OUT} ${IN} (BAM sorting)
Hint
The -f flag is required for integrating samtools without a wrapper, because samtools sort would otherwise append .bam to ${OUT} to form the output file name.
However, only the current trunk version from the samtools GitHub project supports this flag.
As mentioned above, we have to build the current trunk version of samtools for the sort_bam tool to work. The following shell commands download the current samtools trunk from GitHub and build samtools. We will work in a new directory knime_samtools (for the rest of the tutorial, we will assume that this directory is located directly in your $HOME).
knime_samtools # git clone https://github.com/samtools/samtools
...
knime_samtools # cd samtools
samtools # make
...
samtools # ls -l samtools
-rwxr-xr-x 1 user group 1952339 May 7 16:36 samtools
samtools # cd ..
knime_samtools #
Then, we need to download GenericKnimeNodes:
knime_samtools # git clone git://github.com/genericworkflownodes/GenericKnimeNodes.git
We need to install support for file handling nodes in KNIME.
For this, open the window for installing Eclipse plugins; in the program's main menu: Help > Install New Software....
Here, enter http://www.knime.org/update/2.8/ into the Work with: field, enter file into the search box, and finally select KNIME File Handling Nodes in the list.
Then, click Next and follow through with the installation of the plugin. When done, Eclipse must be restarted.
KNIME nodes are shipped as Eclipse plugins. The GenericKnimeNodes (GKN) package provides the infrastructure to automatically generate such nodes from the description of their command line. The description of the command line is kept in XML files called Common Tool Descriptor (CTD) files. The input of the GKN package is a directory tree with the following structure:
plugin_dir
│
├── plugin.properties
│
├── descriptors (place your ctd files and mime.types here)
│
├── payload (place your binaries here)
│
├── icons (the icons to be used must be here)
│
├── DESCRIPTION (A short description of the project)
│
├── LICENSE (Licensing information of the project)
│
└── COPYRIGHT (Copyright information of the project)
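Before running the generator, it can help to confirm that a plugin directory contains the entries listed above. The following is a minimal sketch, not part of GenericKnimeNodes; the names EXPECTED_ENTRIES and missing_entries are hypothetical and chosen for illustration only.

```python
from pathlib import Path

# Entries taken from the directory tree shown above.
EXPECTED_ENTRIES = [
    "plugin.properties",
    "descriptors",
    "payload",
    "icons",
    "DESCRIPTION",
    "LICENSE",
    "COPYRIGHT",
]


def missing_entries(plugin_dir):
    """Return the expected entries that do not exist below plugin_dir."""
    root = Path(plugin_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

Calling missing_entries on the unpacked workflow_plugin_dir should return an empty list if the layout is complete.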
The GKN project provides tools to convert such a plugin directory into an Eclipse plugin. This plugin can then be launched together with KNIME. The following picture illustrates the process.
Please download the file workflow_plugin_dir.zip and look around in the archive.
Also have a look into the binaries_*_*.zip files in payload.
The structure of this ZIP file is explained below in Anatomy of a Plugin Directory.
The next step is to use GKN to create an Eclipse plugin from the workflow plugin directory. For this, change to the directory GenericKnimeNodes that we cloned using git earlier. We then execute ant and pass the variables knime.sdk with the path to the KNIME SDK that you downloaded earlier and plugin.dir with the path of our plugin directory.
knime_samtools # cd GenericKnimeNodes
GenericKnimeNodes # ant -Dknime.sdk=${HOME}/eclipse_knime_2.8.0 \
-Dplugin.dir=$HOME/knime_samtools/workflow_plugin_dir
This generates an Eclipse plugin with wrapper classes for our nodes. The generated files are within the generated_plugin directory of the directory GenericKnimeNodes.
In the main menu, select File > Import.... In the Import window, select General > Existing Projects into Workspace.
In the next dialog, click Browse... next to Select root directory.
Then, select the directory of your GenericKnimeNodes checkout. The final dialog should then look as follows.
Clicking Finish will import (1) the GKN classes themselves and (2) your generated plugin's classes.
Now, the packages of the GKN classes and your plugin show up in the left Package Explorer pane of Eclipse.
Hint
Information: Synchronizing ant build result with Eclipse.
Since the code generation happens outside of Eclipse, there are often problems caused by Eclipse not recognizing updates in generated .java files.
After each call to ant, you should clean all built files in all projects by selecting the menu entries Project > Clean..., selecting Clean all projects, and then clicking OK.
Then, select all projects in the Package Explorer, right-click and select Refresh.
Finally, we have to launch KNIME with our plugin. We have to create a run configuration for this. Select Run > Run Configurations....
In the Run Configurations window, select Eclipse Application on the left, then click the small New launch configuration icon on the top left (both marked in the following screenshot).
Now, set the Name field to "KNIME", select Run an application and select org.knime.product.KNIME_APPLICATION in the drop down menu.
Finally, click Run.
Your tool will show up in the tool selector in community/SAM and BAM.
Here is an example KNIME workflow with the nodes that we just created.
You can download a ZIP archive of the resulting project from the attached file workflow_plugin_dir.zip. We will ignore the contents of icons, DESCRIPTION, LICENSE, and COPYRIGHT here. You can see all relevant details by inspecting the ZIP archive.
The content of the file plugin.properties is as follows:
# the package of the plugin
pluginPackage=net.sf.samtools
# the name of the plugin
pluginName=SamTools
# the version of the plugin
pluginVersion=0.1.17
# the path (starting from KNIMEs Community Nodes node)
nodeRepositoyRoot=community
executor=com.genericworkflownodes.knime.execution.impl.LocalToolExecutor
commandGenerator=com.genericworkflownodes.knime.execution.impl.CLICommandGenerator
When creating your own plugin directory, you only have to update the first three properties: pluginPackage, pluginName, and pluginVersion.
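To illustrate the key=value layout of plugin.properties, here is a minimal sketch that reads such a file and reports which of the first three properties are missing or empty. It is a simplified reader written for this tutorial, not the parser used by GenericKnimeNodes; the function names parse_properties and missing_required are hypothetical.

```python
REQUIRED_KEYS = ("pluginPackage", "pluginName", "pluginVersion")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    properties = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        properties[key.strip()] = value.strip()
    return properties


def missing_required(properties):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in REQUIRED_KEYS if not properties.get(key)]


SAMPLE = """\
# the package of the plugin
pluginPackage=net.sf.samtools
# the name of the plugin
pluginName=SamTools
# the version of the plugin
pluginVersion=0.1.17
"""

properties = parse_properties(SAMPLE)
```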
The contents of the file descriptors/mime.types are shown below. Each line contains the definition of one MIME type. The name of the MIME type is followed (separated by a space) by the file extensions associated with the file type. There must be no ambiguous mappings, e.g. assigning the same extension to both application/x-fasta and application/x-fastq.
application/x-fasta fa fasta
application/x-fastq fq fastq
application/x-sam sam
application/x-bam bam
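The rule that no extension may map to two MIME types can be checked mechanically. The following is a minimal sketch under the line format described above (MIME type first, then space-separated extensions); parse_mime_types is a hypothetical helper, not a GenericKnimeNodes function.

```python
def parse_mime_types(text):
    """Return {extension: MIME type}; raise ValueError on an ambiguous extension."""
    by_extension = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        mime_type, extensions = fields[0], fields[1:]
        for extension in extensions:
            known = by_extension.get(extension)
            if known is not None and known != mime_type:
                raise ValueError(
                    f"extension {extension!r} maps to both {known} and {mime_type}"
                )
            by_extension[extension] = mime_type
    return by_extension


MIME_TYPES = """\
application/x-fasta fa fasta
application/x-fastq fq fastq
application/x-sam sam
application/x-bam bam
"""

mapping = parse_mime_types(MIME_TYPES)
```

With the file shown above, the call succeeds; adding fa to the application/x-fastq line would raise a ValueError.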
This CTD file describes the SortBam tool for sorting BAM files. We do not describe the files descriptors/samtools_sam_to_bam.ctd and descriptors/samtools_bam_to_sam.ctd in the same detail, since you can infer their structure from this one.
<?xml version="1.0" encoding="UTF-8"?>
<tool name="SortBam" version="0.1.17" category="SAM and BAM"
      docurl="http://samtools.sourceforge.net/samtools.shtml">
    <executableName>samtools</executableName>
    <description><![CDATA[SAMtools BAM Sorting.]]></description>
    <manual><![CDATA[samtools sort]]></manual>
    <docurl>Direct links in docs</docurl>
    <cli>
        <clielement optionIdentifier="sort" isList="false" />
        <clielement optionIdentifier="-f" isList="false" />
        <!-- Following clielements are arguments. You should consider
             providing a help text to ease understanding. -->
        <clielement optionIdentifier="" isList="false">
            <mapping referenceName="bam_to_sam.argument-0" />
        </clielement>
        <clielement optionIdentifier="" isList="false">
            <mapping referenceName="bam_to_sam.argument-1" />
        </clielement>
        <clielement optionIdentifier="" isList="false">
            <mapping referenceName="bam_to_sam.argument-2" />
        </clielement>
    </cli>
    <PARAMETERS version="1.4"
                xsi:noNamespaceSchemaLocation="http://open-ms.sourceforge.net/schemas/Param_1_4.xsd"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <NODE name="bam_to_sam" description="SAMtools BAM to SAM conversion">
            <ITEM name="argument-0" value="" type="input-file" required="true"
                  description="Input BAM file." supported_formats="*.bam" />
            <ITEM name="argument-1" value="" type="output-file" required="true"
                  description="Output BAM file." supported_formats="*.bam" />
            <ITEM name="argument-2" value="" type="string" required="true"
                  description="Sort by query name (-n) instead of position (default)" restrictions=",-n" />
        </NODE>
    </PARAMETERS>
</tool>
Here is a description of the tags and the attributes:
- <cli>: Container for the <clielement> tags. These tags describe the command line options and arguments of the tool. The command line options and arguments can be mapped to parameters which are configurable through the UI. The parameters are stored in tool/PARAMETERS.
- optionIdentifier (attribute of <clielement>): The identifier of the option on the command line. For example, for the -l option of ls, this is -l.
- isList (attribute of <clielement>): Whether the element accepts a list of values. One of true and false.
- referenceName (attribute of <mapping>): The path of the mapped parameter. The <ITEM>s in tool/PARAMETERS are stored in nested <NODE> tags and this gives the path to the specific parameter.
- <PARAMETERS>: Container for the <NODE> and <ITEM> tags. The <PARAMETERS> tag is in a different namespace and provides its own XSI.
- version (attribute of <PARAMETERS>): Format version of the <PARAMETERS> section.
- type (attribute of <ITEM>): One of string, int, double, input-file, output-path, input-prefix, or output-prefix. Booleans are encoded as string with the restrictions attribute set to "true,false".
- supported_formats (attribute of <ITEM>): A list of supported file formats, e.g. "*.bam,*.sam".
- restrictions (attribute of <ITEM>): In case of int or double types, the restrictions have the form min:, :max, or min:max and give the smallest and/or largest number a value can have. In the case of string types, restrictions gives the list of allowed values, e.g. one,two,three. If the type is string and the restriction field equals "true,false", then the parameter is a boolean and is set in case true is selected in the GUI. A good example for this would be the -l flag of the ls program.
Hint
If a <clielement> provides an empty optionIdentifier, then it is a positional argument without a flag (examples for parameters with flags are -n 1 and --number 1).
If a <clielement> does not provide a <mapping>, then it is passed regardless of whether it has been configured or not.
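The restrictions syntax described above can be illustrated with a small validator. This is a minimal sketch and not the validation code of GenericKnimeNodes: the function name satisfies_restrictions is hypothetical, and treating the numeric bounds as inclusive is an assumption.

```python
def satisfies_restrictions(type_name, restrictions, value):
    """Check a parameter value against a CTD-style restrictions string."""
    if not restrictions:
        return True  # no restrictions given
    if type_name in ("int", "double"):
        # Forms: "min:", ":max", "min:max" (bounds assumed to be inclusive).
        low, _, high = restrictions.partition(":")
        number = float(value)
        if low and number < float(low):
            return False
        if high and number > float(high):
            return False
        return True
    if type_name == "string":
        # Comma-separated list of allowed values; an empty entry allows "".
        return value in restrictions.split(",")
    return True  # other types are not restricted in this sketch
```

For the SortBam descriptor, the restrictions string ",-n" of argument-2 accepts either the empty string or -n under this reading.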
The samtools_sort_bam tool from above does not provide any flag-style options but only positional arguments: the input file, the output file, and an optional -n.
These are by convention called argument-0, argument-1, and argument-2 but could have any name.
Also, we always call the program with sort -f as the first two command line arguments since we do not provide a mapping for these arguments.
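To make the mapping rules concrete, the following sketch assembles a command line from a trimmed copy of the SortBam descriptor. It is a simplified model of the rules stated above (unmapped elements are always passed, mapped elements receive their parameter value, an empty optionIdentifier means a positional argument). It is not GKN's actual CLICommandGenerator; skipping parameters whose value is empty is an assumption, boolean parameters are not handled, and build_command and parameter_defaults are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the descriptor above (schema attributes and help texts omitted).
CTD = """<tool name="SortBam" version="0.1.17" category="SAM and BAM">
  <executableName>samtools</executableName>
  <cli>
    <clielement optionIdentifier="sort" isList="false" />
    <clielement optionIdentifier="-f" isList="false" />
    <clielement optionIdentifier="" isList="false">
      <mapping referenceName="bam_to_sam.argument-0" />
    </clielement>
    <clielement optionIdentifier="" isList="false">
      <mapping referenceName="bam_to_sam.argument-1" />
    </clielement>
    <clielement optionIdentifier="" isList="false">
      <mapping referenceName="bam_to_sam.argument-2" />
    </clielement>
  </cli>
  <PARAMETERS version="1.4">
    <NODE name="bam_to_sam" description="SAMtools BAM to SAM conversion">
      <ITEM name="argument-0" value="" type="input-file" required="true" />
      <ITEM name="argument-1" value="" type="output-file" required="true" />
      <ITEM name="argument-2" value="" type="string" required="true" restrictions=",-n" />
    </NODE>
  </PARAMETERS>
</tool>"""


def parameter_defaults(node, prefix=""):
    """Collect {dotted path: default value} for all ITEMs nested below node."""
    values = {}
    for child in node:
        if child.tag == "NODE":
            values.update(parameter_defaults(child, prefix + child.get("name") + "."))
        elif child.tag == "ITEM":
            values[prefix + child.get("name")] = child.get("value", "")
    return values


def build_command(ctd_text, settings):
    """Return the argument list for the tool, given user settings by parameter path."""
    tool = ET.fromstring(ctd_text)
    values = parameter_defaults(tool.find("PARAMETERS"))
    values.update(settings)
    command = [tool.findtext("executableName")]
    for element in tool.find("cli"):
        flag = element.get("optionIdentifier")
        mapping = element.find("mapping")
        if mapping is None:
            command.append(flag)  # no mapping: always passed
            continue
        value = values.get(mapping.get("referenceName"), "")
        if value == "":
            continue  # assumption: unset parameters are left out
        if flag:
            command.append(flag)  # flagged option: flag first, then value
        command.append(value)
    return command
```

With the settings {"bam_to_sam.argument-0": "in.bam", "bam_to_sam.argument-1": "out.bam"}, build_command(CTD, ...) returns ['samtools', 'sort', '-f', 'in.bam', 'out.bam'].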
The directory payload contains ZIP files with the executable tool binaries.
There is one ZIP file for each platform (Linux, Windows, and Mac OS X) and each architecture (32 bit and 64 bit).
The names of the files are binaries_${plat}_${arch}.zip where ${plat} is one of lnx, win, or mac, and ${arch} is one of 32 and 64.
Each ZIP file contains a directory /bin which is used as the search path for the binary given by <executableName>.
Also, it provides an INI file /binaries.ini which can be used to define environment variables to set before executing any tools.
The ZIP file can also provide other files in directories such as /share.
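The naming scheme and the minimal archive layout described above can be sketched as follows. The helpers payload_names and looks_like_payload are hypothetical, and the check assumes that ZIP entries are stored relative to the archive root (bin/..., binaries.ini) without a leading slash.

```python
import zipfile
from itertools import product

PLATFORMS = ("lnx", "win", "mac")
ARCHITECTURES = ("32", "64")


def payload_names():
    """List all payload file names binaries_${plat}_${arch}.zip."""
    return [
        f"binaries_{plat}_{arch}.zip"
        for plat, arch in product(PLATFORMS, ARCHITECTURES)
    ]


def looks_like_payload(zip_path):
    """Check that an archive has entries below bin/ and a binaries.ini file."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
    has_bin = any(name.startswith("bin/") for name in names)
    return has_bin and "binaries.ini" in names
```

Running looks_like_payload on each existing file from payload_names() gives a quick sanity check of a payload directory before calling ant.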
You can generate a workflow plugin directory for the SeqAn apps using the prepare_workflow_plugin target.
Then, you can generate the KNIME nodes/Eclipse plugins as described above using ant.
~ # svn co http://svn.seqan.de/seqan/trunk seqan-trunk
~ # mkdir -p seqan-trunk-build/release
~ # cd seqan-trunk-build/release
release # cmake ../../seqan-trunk
release # make prepare_workflow_plugin
release # cd ~/knime_samtools/GenericKnimeNodes
GenericKnimeNodes # ant -Dknime.sdk=${HOME}/eclipse_knime_2.8.0 \
-Dplugin.dir=$HOME/seqan-trunk-build/release/workflow_plugin_dir