pre-process illumina reads

illumiprocessor

illumiprocessor is a tool to batch process Illumina sequencing reads using the excellent trimmomatic package. The program takes a configuration file formatted in the Microsoft Windows INI file format (sections of key:value pairs; see the example file).
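
The configuration format can be read with Python's standard configparser module. The sketch below is an illustration, not illumiprocessor's own parser: load_config is a hypothetical helper, and it assumes that ":" is the only key/value delimiter and that key case must be preserved (configparser lowercases keys by default, which would alter sample names such as F09-44_ATGAGGC).

```python
import configparser

# Example config text, copied from the quick-start section of this README.
EXAMPLE_CONFIG = """\
[adapters]
i7:AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
BFIDT-030:ATGAGGC
BFIDT-003:AATACTT

[tag map]
F09-44_ATGAGGC:BFIDT-030
F09-96_AATACTT:BFIDT-003

[names]
F09-44_ATGAGGC:F09-44
F09-96_AATACTT:F09-96
"""


def load_config(text):
    """Parse INI-style key:value text into {section: {key: value}} (hypothetical helper)."""
    parser = configparser.ConfigParser(
        delimiters=(":",),   # assumption: keys and values are separated by ":"
        interpolation=None,  # treat values literally
    )
    parser.optionxform = str  # keep key case; the default lowercases keys
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


config = load_config(EXAMPLE_CONFIG)
print(config["names"]["F09-44_ATGAGGC"])  # F09-44
```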

illumiprocessor trims adapter contamination from single-end (SE) and paired-end (PE) Illumina reads, handles double-indexed reads, and performs quality trimming (example to come shortly). The current version of illumiprocessor uses trimmomatic instead of scythe and sickle (used in v1.x) because we have found that trimmomatic performs better, particularly with double-indexed Illumina reads. However, you may find that running scythe after trimming with illumiprocessor or trimmomatic helps ensure that every bit of potential adapter contamination is removed.
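
For paired-end data, the trimming for each sample comes down to one trimmomatic run. The function below only assembles an argument list for trimmomatic's PE mode; it is a sketch, not illumiprocessor's code. The function name and output file names are hypothetical, Phred+33 quality encoding is assumed, and the step values (ILLUMINACLIP:...:2:30:10, LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, MINLEN:36) are the illustrative values from the trimmomatic manual, not necessarily the ones illumiprocessor uses.

```python
def trimmomatic_pe_command(jar, r1, r2, out_prefix, adapters_fasta, threads=1):
    """Return an argv list for a trimmomatic paired-end run (hypothetical helper).

    The command is only built here, not executed.
    """
    return [
        "java", "-jar", jar, "PE",
        "-threads", str(threads),
        "-phred33",  # assumption: Phred+33 quality encoding
        r1, r2,
        # PE mode writes paired and unpaired outputs for each read.
        f"{out_prefix}-READ1.fastq.gz",
        f"{out_prefix}-READ1-unpaired.fastq.gz",
        f"{out_prefix}-READ2.fastq.gz",
        f"{out_prefix}-READ2-unpaired.fastq.gz",
        # Illustrative step values taken from the trimmomatic manual.
        f"ILLUMINACLIP:{adapters_fasta}:2:30:10",
        "LEADING:3",
        "TRAILING:3",
        "SLIDINGWINDOW:4:15",
        "MINLEN:36",
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True). Because trimmomatic writes a separate unpaired file for each read, a wrapper would have to combine the two to obtain a single file like the READ-singleton file in the output layout of the quick-start section.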

illumiprocessor is suited to parallel processing, in which each set of Illumina reads is processed on a separate (physical) compute core. illumiprocessor assumes that every fastq file passed to the program represents an individual sample (i.e., that you have already merged multiple files for each read of the same sample into a single fastq.gz file).
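
If a sample was sequenced across several lanes, its files need to be merged before running illumiprocessor. Gzip files can be combined by plain concatenation, because a concatenation of gzip members is itself a valid gzip stream (the same result as cat a.fastq.gz b.fastq.gz > merged.fastq.gz). merge_fastq_gz is a hypothetical helper; pass R1 and R2 files in the same lane order so that read pairs stay in step.

```python
import shutil


def merge_fastq_gz(inputs, output):
    """Concatenate gzipped FASTQ files byte-for-byte into one file (hypothetical helper).

    A concatenation of gzip files is a valid multi-member gzip file, so no
    decompression is needed. All inputs should be the same read (e.g., all R1)
    of the same sample, listed in a consistent order.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
```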

Citing Illumiprocessor

If you use illumiprocessor in your work, you can cite the software as follows:

Faircloth, BC. 2013. illumiprocessor: a trimmomatic wrapper for parallel
adapter and quality trimming. http://dx.doi.org/10.6079/J9ILL.

Please be sure also to cite trimmomatic:

Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible 
trimmer for Illumina Sequence Data. Bioinformatics.
http://dx.doi.org/10.1093/bioinformatics/btu170.

installation

Illumiprocessor uses trimmomatic, which is a Java program, so you need to install Java for your platform.

conda

If you are using anaconda or the conda package manager, you can automatically install everything you need by editing ~/.condarc to add the faircloth-lab repository, so that the file looks like:

# channel locations. These override conda defaults, i.e., conda will
# search *only* the channels listed here, in the order given. Use "defaults"
# to automatically include all default channels.

channels:
  - defaults
  - http://conda.binstar.org/faircloth-lab

Then run:

conda install illumiprocessor

This will install trimmomatic and illumiprocessor.

standard

Ensure that Java is installed, then install trimmomatic. Once both are installed, download the illumiprocessor source and run:

python setup.py install

quick-start

To run illumiprocessor, first set up a config file (<path-to-config-file>) like:

[adapters]
i7:AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG
i5:AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

[tag sequences]
BFIDT-030:ATGAGGC
BFIDT-003:AATACTT

[tag map]
F09-44_ATGAGGC:BFIDT-030
F09-96_AATACTT:BFIDT-003

[names]
F09-44_ATGAGGC:F09-44
F09-96_AATACTT:F09-96
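
The * in the i7 adapter appears to mark where each sample's index (tag) sequence belongs. Assuming that reading, and assuming the tag is inserted exactly as written, the lookups chain from [tag map] to [tag sequences] to the adapter, with [names] giving the output name. sample_adapters is a hypothetical helper, and illumiprocessor's real adapters.fasta may use different record names or orientation.

```python
# Values copied from the example config above.
ADAPTERS = {
    "i7": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG",
    "i5": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT",
}
TAG_SEQUENCES = {"BFIDT-030": "ATGAGGC", "BFIDT-003": "AATACTT"}
TAG_MAP = {"F09-44_ATGAGGC": "BFIDT-030", "F09-96_AATACTT": "BFIDT-003"}
NAMES = {"F09-44_ATGAGGC": "F09-44", "F09-96_AATACTT": "F09-96"}


def sample_adapters(sample_key):
    """Return (output name, FASTA text) for one sample (hypothetical helper)."""
    # [tag map] gives the tag name; [tag sequences] gives its bases.
    tag = TAG_SEQUENCES[TAG_MAP[sample_key]]
    # Assumption: "*" is a placeholder for the sample's index sequence.
    i7 = ADAPTERS["i7"].replace("*", tag)
    fasta = f">i7\n{i7}\n>i5\n{ADAPTERS['i5']}\n"
    return NAMES[sample_key], fasta


name, fasta = sample_adapters("F09-44_ATGAGGC")
print(name)  # F09-44
print(fasta)
```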

Then you run illumiprocessor against the config file using:

illumiprocessor \
    --input <path-to-directory-of-read-files-to-clean> \
    --output <path-to-directory-of-cleaned-reads-to-output> \
    --config <path-to-config-file> \
    --cores 12
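
With --cores 12, up to 12 samples can be processed at once, one per core. A minimal sketch of that scheduling idea, not illumiprocessor's implementation: because each sample's trimming runs as an external Java process, a thread pool is enough to keep `cores` processes busy, since the Python threads mostly wait on their child processes. run_command and run_all are hypothetical names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_command(argv):
    """Run one per-sample command (e.g., a trimmomatic call); return its exit code."""
    return subprocess.run(argv, check=False).returncode


def run_all(commands, cores):
    """Run every command, keeping at most `cores` running at the same time.

    Exit codes come back in the same order as `commands`.
    """
    with ThreadPoolExecutor(max_workers=cores) as pool:
        return list(pool.map(run_command, commands))
```

A call such as run_all(per_sample_commands, cores=12), where per_sample_commands is a hypothetical list holding one argv list per sample, mirrors the --cores 12 setting above.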

Running this command creates an output directory containing reads organised using the following structure:

sample1-name/
    adapters.fasta
    raw-reads/
        [symlink to R1]
        [symlink to R2]
    split-adapter-quality-trimmed/
        sample1-name-READ1.fastq.gz
        sample1-name-READ2.fastq.gz
        sample1-name-READ-singleton.fastq.gz
    stats/
        sample1-name-adapter-contam.txt
sample2-name/
    ...
sample3-name/
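
Downstream steps usually need the cleaned reads for each sample. Assuming only the layout shown above, a short pathlib sketch can collect them; trimmed_reads is a hypothetical helper.

```python
from pathlib import Path


def trimmed_reads(output_dir):
    """Map each sample directory name to its sorted cleaned fastq.gz file names."""
    result = {}
    for sample_dir in sorted(Path(output_dir).iterdir()):
        trimmed = sample_dir / "split-adapter-quality-trimmed"
        # Skip entries without a trimmed-reads folder (e.g., stray files).
        if trimmed.is_dir():
            result[sample_dir.name] = sorted(p.name for p in trimmed.glob("*.fastq.gz"))
    return result
```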

More information

For more information and a more complete description of all of these steps, please see the documentation.