A Galaxy tool wrapper for Mauro Tutino's Amplicon_analysis pipeline script at https://github.com/MTutino/Amplicon_analysis
The pipeline can analyse paired-end 16S rRNA data from Illumina MiSeq (Casava >= 1.8) and performs the following operations:
- QC and clean up of input data
- Removal of singletons and chimeras and building of OTU table and phylogenetic tree
- Beta and alpha diversity analysis
Usage of the tool (including required inputs) is documented within the help section of the tool XML.
The following sections describe how to install the tool files, dependencies and reference data, and how to configure the Galaxy instance to detect the dependencies and reference data correctly at run time.
The core tool is hosted on the Galaxy toolshed, so it can be installed directly from there (this is the recommended route).
Alternatively it can be installed manually; in this case there are two files to install:
- amplicon_analysis_pipeline.xml (the Galaxy tool definition)
- amplicon_analysis_pipeline.py (the Python wrapper script)
Put these in a directory that is visible to Galaxy (e.g. a tools/Amplicon_analysis/ folder), and modify the tool_conf.xml file to tell Galaxy to offer the tool by adding a line such as:
<tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" />
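To confirm that the entry is present after editing the tool configuration file, a short script can parse the file and look for the tool definition. The following is a minimal sketch and not part of the tool: the function name `tool_is_registered` and the sample configuration text are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET


def tool_is_registered(tool_conf_text, tool_file):
    """Return True if any <tool> element in the config references tool_file."""
    root = ET.fromstring(tool_conf_text)
    # <tool> entries may sit directly under <toolbox> or inside <section>
    # elements, so search the whole tree rather than only the top level.
    return any(t.get("file") == tool_file for t in root.iter("tool"))


# Hypothetical configuration used only to demonstrate the check.
SAMPLE_CONF = """
<toolbox>
  <section id="amplicon" name="Amplicon analysis">
    <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" />
  </section>
</toolbox>
"""

print(tool_is_registered(SAMPLE_CONF, "Amplicon_analysis/amplicon_analysis_pipeline.xml"))
```

In practice the text would be read from the Galaxy instance's own tool configuration file rather than from an inline string.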
The script References.sh from the pipeline package at https://github.com/MTutino/Amplicon_analysis can be run to install the reference data, for example:
cd /path/to/pipeline/data
wget https://github.com/MTutino/Amplicon_analysis/raw/master/References.sh
/bin/bash ./References.sh
This will install the data in /path/to/pipeline/data.
NB The final amount of data downloaded and uncompressed will be around 9GB.
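Given the size of the reference data, it may be worth checking the free space on the target filesystem before starting the download. The sketch below is one way to do that with the Python standard library; the helper name `has_free_space` is hypothetical, and the 9GB figure is simply the approximate final size quoted above (any extra headroom needed while downloading and uncompressing is not accounted for).

```python
import shutil


def has_free_space(path, required_gb):
    """Return True if the filesystem holding path has at least required_gb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


# "." stands in for the reference data directory (e.g. /path/to/pipeline/data).
print(has_free_space(".", 9))
```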
The final step is to make your Galaxy installation aware of the location of the reference data, so that the tool can locate it at run time.
The tool locates the reference data via an environment variable called AMPLICON_ANALYSIS_REF_DATA_PATH, which needs to be set to the parent directory where the reference data has been installed.
There are various ways to do this, depending on how your Galaxy installation is configured:
- For local instances: add a line to set it in the config/local_env.sh file of your Galaxy installation (you may need to create a new empty file first), e.g.:
  export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data
- For production instances: set the value in the job_conf.xml configuration file, e.g.:
  <destination id="amplicon_analysis">
    <env id="AMPLICON_ANALYSIS_REF_DATA_PATH">/path/to/pipeline/data</env>
  </destination>
  and then specify that the pipeline tool uses this destination:
  <tool id="amplicon_analysis_pipeline" destination="amplicon_analysis"/>
  (For more about job destinations see the Galaxy documentation at https://docs.galaxyproject.org/en/master/admin/jobs.html#job-destinations)
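Whichever route is used, the setting can be sanity-checked from the environment in which Galaxy jobs run. The following sketch shows one way a script might resolve and validate the variable; it is an illustration under stated assumptions, not the wrapper's actual code, and the function name `resolve_ref_data_dir` is hypothetical.

```python
import os

ENV_VAR = "AMPLICON_ANALYSIS_REF_DATA_PATH"


def resolve_ref_data_dir(environ=None):
    """Return the absolute reference data directory named by ENV_VAR.

    Raises RuntimeError if the variable is unset, empty, or does not
    point at an existing directory.
    """
    environ = os.environ if environ is None else environ
    path = environ.get(ENV_VAR)
    if not path:
        raise RuntimeError(f"{ENV_VAR} is not set")
    if not os.path.isdir(path):
        raise RuntimeError(f"{ENV_VAR} points to a missing directory: {path}")
    return os.path.abspath(path)


# Demonstration with an explicit mapping instead of the real environment.
print(resolve_ref_data_dir({ENV_VAR: "."}))
```

Passing the environment as a mapping keeps the check easy to exercise without modifying the real process environment.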
To ensure that HTML outputs are displayed correctly in Galaxy
(for example the Vsearch OTU table heatmaps), Galaxy needs to be
configured not to sanitize the outputs from the Amplicon_analysis
tool.
Either:
- For local instances: set sanitize_all_html = False in config/galaxy.ini (NB don't do this on production servers or public instances!); or
- For production instances: add the Amplicon_analysis tool to the display whitelist in the Galaxy instance:
  - Set sanitize_whitelist_file = config/whitelist.txt in config/galaxy.ini and restart Galaxy;
  - Go to Admin>Manage Display Whitelist, check the box for Amplicon_analysis (hint: use your browser's 'find-in-page' search function to help locate it) and click on Submit new whitelist to update the settings.
Some other things to be aware of:
- Note that using the SILVA database requires a minimum of 18GB of RAM.
- Only the VSEARCH pipeline in Mauro's script is currently available via the Galaxy tool; the USEARCH and QIIME pipelines have yet to be implemented.
- The images in the tool help section are not visible if the tool has been installed locally, or if it has been installed in a Galaxy instance which is served from a subdirectory. These are both problems with Galaxy and not the tool; see galaxyproject/galaxy#4490 and galaxyproject/galaxy#1676.
If the tool is installed from the Galaxy toolshed (recommended) then the dependencies should be installed automatically, and the manual dependency installation described below can be skipped.
Otherwise the install_amplicon_analysis_deps.sh script can be used to fetch and install the dependencies locally, for example:
install_amplicon_analysis_deps.sh /path/to/local_tool_dependencies
(This is the same script as is used to install dependencies from the toolshed.) This can take some time to complete, and when completed will have created a directory called Amplicon_analysis-1.2.3 containing the dependencies under the specified top level directory.
NB The installed dependencies will occupy around 2.6GB of disk space.
You will need to make sure that the bin subdirectory of this directory is on Galaxy's PATH at runtime, for the tool to be able to access the dependencies - for example by adding a line to the local_env.sh file like:
export PATH=/path/to/local_tool_dependencies/Amplicon_analysis-1.2.3/bin:$PATH
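Once the PATH has been updated, a quick check can confirm that the dependencies are actually visible. The sketch below reports which of a list of program names cannot be found on the search path; the helper name `missing_programs` is hypothetical, and "vsearch" is used only as an assumed example of an executable the pipeline relies on.

```python
import shutil


def missing_programs(names, path=None):
    """Return the names that cannot be found as executables on the search path."""
    # shutil.which returns None when no matching executable is found;
    # path=None means "use the PATH of the current process".
    return [name for name in names if shutil.which(name, path=path) is None]


# "vsearch" is an assumed example name; substitute the executables that
# actually appear in the installed bin directory.
print(missing_programs(["vsearch"]))
```

An empty list means every named program was found; run the check from the same environment that Galaxy uses to launch jobs, since that is the PATH that matters.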
Version | Changes |
1.3.5.0 | Updated to Amplicon_Analysis_Pipeline version 1.3.5. |
1.2.3.0 | Updated to Amplicon_Analysis_Pipeline version 1.2.3; install dependencies via tool_dependencies.xml. |
1.2.2.0 | Updated to Amplicon_Analysis_Pipeline version 1.2.2 (removes jackknifed analysis which is not captured by Galaxy tool) |
1.2.1.0 | Updated to Amplicon_Analysis_Pipeline version 1.2.1 (adds option to use the Human Oral Microbiome Database v15.1, and updates SILVA database to v123) |
1.1.0 | First official version on Galaxy toolshed. |
1.0.6 | Expand inline documentation to provide detailed usage guidance. |
1.0.5 | Updates including: |
1.0.4 | Various updates: |
1.0.3 | Take the sample names from the collection dataset names when using collection as input (this is now the default input mode); collect additional output dataset; disable usearch-based pipelines (i.e. UPARSE and QIIME). |
1.0.2 | Enable support for FASTQs supplied via dataset collections and fix some broken output datasets. |
1.0.1 | Initial version |