illumiprocessor is a tool to batch process Illumina sequencing reads using the excellent trimmomatic package. The program takes a configuration file in Microsoft Windows INI format (sections of key:value pairs; see the example file).
illumiprocessor will trim adapter contamination from single-end (SE) and paired-end (PE) Illumina reads and is capable of dealing with double-indexed reads and read trimming (example to come shortly). The current version of illumiprocessor uses trimmomatic instead of scythe and sickle (used in v1.x) because we have found the performance of trimmomatic to be better, particularly when dealing with double-indexed Illumina reads. However, you may find that running scythe after trimming with illumiprocessor or trimmomatic removes any adapter contamination that remains.
illumiprocessor is suited to parallel processing in which each set of Illumina reads is processed on a separate (physical) compute core. illumiprocessor assumes that all fastq files input to the program represent individual samples (i.e., that you have merged multiple files for each read from the same sample by combining fastq.gz files).
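Merging several fastq.gz files for one read of one sample can be sketched as follows. This is a minimal illustration, not part of illumiprocessor: the function name `merge_fastq_gz` and the file names in the usage note are hypothetical. It relies on the fact that gzip streams can be concatenated byte for byte.

```python
import shutil
from pathlib import Path


def merge_fastq_gz(parts, merged):
    """Concatenate gzip-compressed FASTQ files into one file, in the order given.

    Hypothetical helper for illustration only; not part of illumiprocessor.
    """
    merged = Path(merged)
    with merged.open("wb") as out:
        for part in parts:
            with Path(part).open("rb") as src:
                # Concatenated gzip members decompress to the concatenation
                # of the original files, so no recompression is needed.
                shutil.copyfileobj(src, out)
    return merged
```

For example, `merge_fastq_gz(["lane1_R1.fastq.gz", "lane2_R1.fastq.gz"], "sample_R1.fastq.gz")` would combine two lanes of READ1. For paired-end data, merge the READ1 and READ2 files in the same order so that read pairs stay in sync.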
If you use illumiprocessor in your work, you can cite the software as follows:
Faircloth, BC. 2013. illumiprocessor: a trimmomatic wrapper for parallel adapter and quality trimming. http://dx.doi.org/10.6079/J9ILL.
Please be sure also to cite trimmomatic:
Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics. http://dx.doi.org/10.1093/bioinformatics/btu170.
illumiprocessor uses trimmomatic, which is a Java program, so you need to install Java for your platform.
    # channel locations. These override conda defaults, i.e., conda will
    # search *only* the channels listed here, in the order given. Use "default"
    # to automatically include all default channels.
    channels:
      - defaults
      - http://conda.binstar.org/faircloth-lab
    conda install illumiprocessor
Ensure that you have installed Java, then install trimmomatic. Once both are in place, download the source, then:
    python setup.py install
To run illumiprocessor, you set up a config file:
    [adapters]
    i7:AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
    i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

    [tag sequences]
    BFIDT-030:ATGAGGC
    BFIDT-003:AATACTT

    [tag map]
    F09-44_ATGAGGC:BFIDT-030
    F09-96_AATACTT:BFIDT-003

    [names]
    F09-44_ATGAGGC:F09-44
    F09-96_AATACTT:F09-96
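To make the relationship between the sections concrete, the sketch below reads the example config with Python's standard `configparser` and joins `[tag map]`, `[tag sequences]`, and `[names]` for each sample. It is an illustration of the file layout only and is not how illumiprocessor itself necessarily parses the file; the function name `read_samples` and the result keys are hypothetical.

```python
import configparser

EXAMPLE = """\
[adapters]
i7:AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
BFIDT-030:ATGAGGC
BFIDT-003:AATACTT

[tag map]
F09-44_ATGAGGC:BFIDT-030
F09-96_AATACTT:BFIDT-003

[names]
F09-44_ATGAGGC:F09-44
F09-96_AATACTT:F09-96
"""


def read_samples(text):
    """Return {file prefix: {name, tag, tag_sequence}} from config text.

    Hypothetical helper for illustration only.
    """
    parser = configparser.ConfigParser(interpolation=None)
    # configparser lowercases keys by default; sample prefixes and tag
    # names are case-sensitive, so keep them as written.
    parser.optionxform = str
    parser.read_string(text)

    samples = {}
    for prefix, tag_name in parser["tag map"].items():
        samples[prefix] = {
            "name": parser["names"][prefix],
            "tag": tag_name,
            "tag_sequence": parser["tag sequences"][tag_name],
        }
    return samples
```

With the example above, `read_samples(EXAMPLE)["F09-44_ATGAGGC"]` links the file prefix `F09-44_ATGAGGC` to the output name `F09-44` and the tag `BFIDT-030` with sequence `ATGAGGC`. A missing key raises `KeyError`, which is a quick way to spot a prefix that appears in one section but not another.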
Then you run illumiprocessor against the config file using:
    illumiprocessor \
        --input <path-to-directory-of-read-files-to-clean> \
        --output <path-to-directory-of-cleaned-reads-to-output> \
        --config <path-to-config-file> \
        --cores 12
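If you drive illumiprocessor from a Python script, the same invocation can be assembled as an argument list. This is a minimal sketch that uses only the four options shown above; the helper names `build_command` and `run_illumiprocessor` are hypothetical, and the sketch assumes that `illumiprocessor` is on your `PATH`.

```python
import subprocess


def build_command(input_dir, output_dir, config, cores=1):
    """Build the illumiprocessor argument list (hypothetical helper)."""
    return [
        "illumiprocessor",
        "--input", str(input_dir),
        "--output", str(output_dir),
        "--config", str(config),
        "--cores", str(cores),
    ]


def run_illumiprocessor(input_dir, output_dir, config, cores=1):
    """Run illumiprocessor and raise CalledProcessError on a non-zero exit."""
    # Passing a list (not a shell string) avoids quoting problems with
    # paths that contain spaces.
    subprocess.run(build_command(input_dir, output_dir, config, cores), check=True)
```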
This will output a directory containing reads organised using the following structure:
    sample1-name/
        adapters.fasta
        raw-reads/
            [symlink to R1]
            [symlink to R2]
        split-adapter-quality-trimmed/
            sample1-name-READ1.fastq.gz
            sample1-name-READ2.fastq.gz
            sample1-name-READ-singleton.fastq.gz
        stats/
            sample1-name-adapter-contam.txt
    sample2-name/
    ...
    sample3-name
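Downstream scripts often need the cleaned files for each sample. The sketch below walks an output directory laid out as shown above and collects the names of the files under `split-adapter-quality-trimmed/`. The function name `cleaned_reads` is hypothetical, and the sketch assumes only the directory and file naming shown in the listing.

```python
from pathlib import Path


def cleaned_reads(output_dir):
    """Map each sample directory name to its sorted cleaned fastq.gz file names.

    Hypothetical helper for illustration only.
    """
    result = {}
    for sample_dir in sorted(Path(output_dir).iterdir()):
        trimmed = sample_dir / "split-adapter-quality-trimmed"
        # Skip anything that is not a sample directory with trimmed reads.
        if trimmed.is_dir():
            result[sample_dir.name] = sorted(
                p.name for p in trimmed.glob("*.fastq.gz")
            )
    return result
```

Calling `cleaned_reads("<path-to-directory-of-cleaned-reads-to-output>")` returns one entry per sample, each listing its READ1, READ2, and singleton files.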
For more information and a more complete description of all of these steps, please see the documentation.