-
Notifications
You must be signed in to change notification settings - Fork 0
HIgh Throughput Microbial ANalysis
License
fangly/hitman
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
HiTMAn - High Throughput Microbial Analysis Hitman is a pipeline for analysing 16S rRNA amplicon datasets. The default values make it suitable for bacterial and archaeal profiling in 454 and Illumina datasets. INSTALLATION Please install these dependencies first: perl Getopt::Euclid perl module IPC::System::Simple perl module bpipe 0.9.8.5 pigz 2.1.6 sff2fastq 0.8.0 pear 0.9 fastx_toolkit 0.0.13.2 trimmomatic 0.32 ea-utils 1.1.2 acacia 1.50 usearch v7 emboss 6.6.0 bio-community 0.001005 bioperl Then add the ./scripts/ and ./contrib/ folder into your PATH environment variable. DESCRIPTION Hitman is a bioinformatic workflow based around the UPARSE methodology and composed using bpipe. In brief, Hitman: 1/ demultiplexes libraries using EA-utils' fastq-multx tool 2/ joins read pairs with PEAR, but keeps the forward read when pairs cannot be joined 3/ clips sequencing adapters using TRIMMOMATIC 4/ truncates the 3' end of sequences at the first residue below a threshold quality Q value using TRIMMOMATIC 5/ trims the 3' end of all sequences to a target length L using TRIMMOMATIC, discarding all smaller sequences 6/ removes sequences exceeding the maximum number of expected errors E using USEARCH's fastq_filter 7/ uses USEARCH's cluster_otus to form operational taxonomic units (OTUs) from high-fidelity sequences (stringent QA in steps 4 and 6) that are sorted by decreasing abundance, occur at least twice in the dataset and have at least O% similarity 8/ discards chimeric OTUs in a reference-independent using USEARCH's cluster_otus, and in a reference-based fashion using UCHIME with database C 9/ assigns regular-fidelity sequences (less stringent QA in steps 4 and 6) to each OTU using USEARCH's usearch_global 10/ formats the results in BIOM format using Bio-Community's bc_convert_files 11/ gives a taxonomic assignment to each OTU by globally aligning their reference sequences against a database T of reference sequences trimmed to the target region (keeping only the best-matching alignment with at least I% identity) using USEARCH's usearch_global 12/ removes OTUs belonging to specific taxa W using Bio-Community's bc_manage_samples 13/ rarefies the microbiome profiles at the given depth D with Bio-Community's bc_accumulate 14/ corrects gene-copy number (GCN) bias using CopyRighter. Many of these steps are optional or accept user-specified parameters. The output is a BIOM file containing OTU-level, rarefied, GCN-corrected microbial profiles with taxonomic affiliations. RUNNING HITMAN Three types of input files are necessary: 1/ SFF or FASTQ files containing regular or paired-end amplicon reads (can be gzipped) 2/ FASTA file containing the PCR primers used 3/ If the sequencing run was 454-multiplexed, a 2-column tabular mapping files indicating the multiplex identifiers (MIDs) used for each sample The pipeline itself comprises three steps: 1/ hitman_hire: Collect amplicon reads to analyze from SFF/FASTQ files. 2/ hitman_deploy: Quality-filter and trim the sequences. 3/ hitman_execute: Produce a rarefied OTU table with taxonomic assignments. Run these scripts with the --man flag to learn more about what they do. A typical 454 GS-FLX Ti processing looks like this: hitman_hire --dir 01-hire/ --input Gasket67.sff --mapping Gasket67.mapping ln -s 01-hire/*cat_files.fastq seqs.fastq hitman_deploy --dir 02-deploy/ --input seqs.fastq ln -s 02-deploy/*trunc_seqs.fna seqs_qa.fna hitman_execute --dir 03-execute/ --input seqs_qa.fna --primers primers.fna For typical Illumina MiSeq processing (with paired-end reads and no mapping file), we can skip the Acacia homopolymer denoising step and increase rarefaction depth from 1,000 (by default) to 10,000 reads: hitman_hire --dir 01-hire/ --five Sample1.R1.fasta.gz Sample2.R1.fasta.gz --three Sample1.R2.fasta.gz Sample2.R2.fasta.gz ln -s 01-hire/*cat_files.fastq seqs.fastq hitman_deploy --dir 02-deploy/ --input seqs.fastq -p SKIP_DENOISING=1 -p LIB_ADAPTERS=/path/to/trimmomatic/adapters/NexteraPE-PE.fa ln -s 02-deploy/*trunc_seqs.fna seqs_qa.fna hitman_execute --dir 03-execute/ --input seqs_qa.fna --primers primers.fna -p RAREFACTION_DEPTH=10000 If you are feeling comfortable with the process, though, I advise running a four-step process: 1/ hitman_hire (as usual) 2/ hitman_deploy, using stringent quality cutoffs (these reads will be used to determine OTU representative sequences only) 3/ hitman_deploy, using regular quality cutoffs (these reads will be mapped to high-quality OTUs) 4/ hitman_execute, using --input and --input2 to pass high- and regular- quality reads Note: Hitman relies quite heavily on file names and extensions to work. Make sure your files are named properly. COPYRIGHT Copyright 2012-2014 Florent ANGLY <florent.angly@gmail.com> Hitman is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. Hitman is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with Hitman. If not, see <http://www.gnu.org/licenses/>.
About
HIgh Throughput Microbial ANalysis
Resources
License
Stars
Watchers
Forks
Packages 0
No packages published