illumiprocessor is a tool for batch processing Illumina sequencing reads with the excellent trimmomatic package. The program takes a configuration file in Microsoft Windows INI format (key:value pairs; see the example file).
illumiprocessor will trim adapter contamination from SE and PE Illumina reads and is capable of dealing with double-indexed reads and read trimming (example to come shortly). The current version of illumiprocessor uses trimmomatic instead of scythe and sickle (used in v1.x) because we have found the performance of trimmomatic to be better, particularly when dealing with double-indexed Illumina reads. However, you may find that running scythe after trimming with illumiprocessor or trimmomatic removes any adapter contamination that remains.
illumiprocessor is suited to parallel processing in which each set of Illumina reads is processed on a separate (physical) compute core. illumiprocessor assumes that all fastq files input to the program represent individual samples (i.e., that you have merged multiple files for each read from the same sample by combining fastq.gz files).
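The merging step mentioned above can be sketched in Python. This is a minimal sketch, not part of illumiprocessor: the function name `merge_fastq_gz` and the file names in the usage comment are hypothetical. It relies on the fact that gzip files concatenated byte-for-byte form a valid multi-member gzip file.

```python
import shutil
from pathlib import Path


def merge_fastq_gz(parts, merged):
    """Concatenate gzipped FASTQ files byte-for-byte into one .fastq.gz file.

    `parts` is an ordered list of input paths; `merged` is the output path.
    No decompression is needed because concatenated gzip members remain
    a valid gzip stream.
    """
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(merged)


# Hypothetical usage (file names are illustrative only):
# merge_fastq_gz(
#     ["F09-44_L001_R1.fastq.gz", "F09-44_L002_R1.fastq.gz"],
#     "F09-44_ATGAGGC_R1.fastq.gz",
# )
```

For paired-end data, merge the R1 files and the R2 files in the same order so that read pairs stay in sync.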
If you use illumiprocessor in your work, you can cite the software as follows:
Faircloth, BC. 2013. illumiprocessor: a trimmomatic wrapper for parallel
adapter and quality trimming. http://dx.doi.org/10.6079/J9ILL.
Please be sure also to cite trimmomatic:
Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible
trimmer for Illumina Sequence Data. Bioinformatics.
http://dx.doi.org/10.1093/bioinformatics/btu170.
illumiprocessor uses trimmomatic, which is a Java program, so you need to install Java for your platform.
If you are using anaconda or the conda package manager, you can
automatically install everything you need by editing ~/.condarc
to add the faircloth-lab repository, so that the file looks like:
# channel locations. These override conda defaults, i.e., conda will
# search *only* the channels listed here, in the order given. Use "defaults"
# to automatically include all default channels.
channels:
- defaults
- http://conda.binstar.org/faircloth-lab
Then run:
conda install illumiprocessor
This will install trimmomatic and illumiprocessor.
First make sure that Java and trimmomatic are installed. Then download the illumiprocessor source and, from the source directory, run:
python setup.py install
To run illumiprocessor, set up a config file (<path-to-config-file>) like:
[adapters]
i7:AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
[tag sequences]
BFIDT-030:ATGAGGC
BFIDT-003:AATACTT
[tag map]
F09-44_ATGAGGC:BFIDT-030
F09-96_AATACTT:BFIDT-003
[names]
F09-44_ATGAGGC:F09-44
F09-96_AATACTT:F09-96
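Because the sections cross-reference each other, it can help to check a config for consistency before starting a long run. The following is a minimal sketch using Python's standard `configparser`, not illumiprocessor's own parser. The function name `check_config` is hypothetical, and the two checks (every entry in [tag map] points at a tag defined in [tag sequences] and also has an entry in [names]) are assumptions inferred from the example above.

```python
import configparser

REQUIRED_SECTIONS = ("adapters", "tag sequences", "tag map", "names")


def check_config(text):
    """Return a list of problems found in an illumiprocessor-style config."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys case-sensitive (default lowercases)
    parser.read_string(text)

    problems = [
        f"missing section [{name}]"
        for name in REQUIRED_SECTIONS
        if not parser.has_section(name)
    ]
    if problems:
        return problems

    for read_prefix, tag_name in parser["tag map"].items():
        # Assumed rule: each mapped tag must be defined in [tag sequences].
        if tag_name not in parser["tag sequences"]:
            problems.append(f"{read_prefix}: unknown tag {tag_name!r}")
        # Assumed rule: each mapped read prefix needs an output name.
        if read_prefix not in parser["names"]:
            problems.append(f"{read_prefix}: no entry in [names]")
    return problems
```

An empty list means the checks passed. Call it with the file contents, for example `check_config(open(path).read())`, where `path` is your config file.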
Then you run illumiprocessor against the config file using:
illumiprocessor \
--input <path-to-directory-of-read-files-to-clean> \
--output <path-to-directory-of-cleaned-reads-to-output> \
--config <path-to-config-file> \
--cores 12
This will output a directory containing reads organised using the following structure:
sample1-name/
    adapters.fasta
    raw-reads/
        [symlink to R1]
        [symlink to R2]
    split-adapter-quality-trimmed/
        sample1-name-READ1.fastq.gz
        sample1-name-READ2.fastq.gz
        sample1-name-READ-singleton.fastq.gz
    stats/
        sample1-name-adapter-contam.txt
sample2-name/
    ...
sample3-name
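Downstream steps usually need the trimmed reads for each sample. The following is a minimal sketch that walks an output directory laid out as above and collects the trimmed FASTQ file names per sample. The function name `find_trimmed_reads` is hypothetical, and the only assumption about the layout is the `split-adapter-quality-trimmed` subdirectory shown in the listing.

```python
from pathlib import Path


def find_trimmed_reads(output_dir):
    """Map each sample directory name to its trimmed .fastq.gz file names."""
    reads = {}
    for sample_dir in sorted(Path(output_dir).iterdir()):
        trimmed = sample_dir / "split-adapter-quality-trimmed"
        # Skip anything that is not a sample directory with trimmed reads.
        if trimmed.is_dir():
            reads[sample_dir.name] = sorted(
                path.name for path in trimmed.glob("*.fastq.gz")
            )
    return reads
```

Calling `find_trimmed_reads("<path-to-directory-of-cleaned-reads-to-output>")` returns a dictionary keyed by sample name, which also makes it easy to spot samples with missing READ1 or READ2 files.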
For more information and a more complete description of all of these steps, please see the documentation.