- About the Project
- Getting Started
- Usage
- Examples
- Contributing
- License
- Acknowledgements and References
- Contact
Based on an analysis of 180 pattern-matching (PM) algorithms, the main idea of the project is to create a simple and fast tool to remove adapter fragments from FASTQ files. After testing all 180 algorithms with the SMART tool, we analysed the results and ended up with 5 algorithms that performed well at the approximate pattern length of an adapter (between 8 and 16 nitrogenous bases). QF43 and Sbndmq-4 had the best results; however, Sbndmq-4 was slightly better with patterns of 8 nitrogenous bases, so it became our choice for this project. More information about FAIR and the 180 Pattern-Matching Algorithms Analysis can be found at:
The project was built mainly with C++, but some functionalities are based on Python scripts, including the 180 Pattern-Matching Algorithms Analysis present in this repository.
FAIR works with single, paired forward/reverse, and interlaced FASTQ files to identify, trim and remove adapters and low-quality / N bases from sequences. You can choose the number of threads used during processing and request Phred-offset quality identification and/or adapter identification. At the end of the execution, a new FASTQ file with the adapter segments removed is created in the directory chosen by the user, along with an additional file containing the deleted bases. FAIR does not yet work with tar.gz files.
This repository can be built with any C++ compiler. During the conception of the project we used gcc without any major problems. Additionally, Python is necessary for some extra functionalities.
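To give a feel for the bit-parallel family that Sbndmq-4 belongs to, here is a minimal Python sketch of plain BNDM (Navarro and Raffinot), the base algorithm that the SBNDMq variants build on. It is not FAIR's C++ implementation and omits the q-gram optimisation; the function name `bndm_search` is hypothetical.

```python
def bndm_search(text, pattern):
    """Return the start index of every occurrence of pattern in text (plain BNDM)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Bit (m - 1 - i) of masks[c] is set when pattern[i] == c.
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << (m - 1 - i))

    full = (1 << m) - 1   # m one-bits: every pattern factor is still possible
    high = 1 << (m - 1)   # set when the scanned window suffix is a pattern prefix
    matches = []
    pos = 0
    while pos <= n - m:
        j, shift, state = m, m, full
        # Scan the window right to left while some factor of the pattern matches.
        while state:
            state &= masks.get(text[pos + j - 1], 0)
            j -= 1
            if state & high:
                if j > 0:
                    shift = j            # longest pattern prefix seen so far
                else:
                    matches.append(pos)  # the whole window matched
            state = (state << 1) & full
        pos += shift
    return matches


print(bndm_search("GATTACAGATTACA", "TTAC"))  # [2, 9]
```

Python integers are unbounded, so the sketch accepts patterns of any length; a C++ version would normally keep the state in one machine word, which comfortably fits adapters of 8 to 16 bases.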
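The trimming of N bases and low-quality bases at the read termini described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `trim_read` and its default values are hypothetical, and FAIR's actual C++ logic may differ. The comparison `<=` follows the `--trim-quality` description in the usage message.

```python
def trim_read(seq, qual, min_quality=20, phred_offset=33, trim_n=True):
    """Trim N bases and low-quality bases from both ends of one read.

    min_quality=20 and phred_offset=33 are illustrative defaults (assumptions).
    """
    def is_bad(i):
        if trim_n and seq[i] in "Nn":
            return True
        # Decode the quality character and compare it with the threshold.
        return ord(qual[i]) - phred_offset <= min_quality

    start, end = 0, len(seq)
    while start < end and is_bad(start):      # 5' end
        start += 1
    while end > start and is_bad(end - 1):    # 3' end
        end -= 1
    return seq[start:end], qual[start:end]


print(trim_read("NNACGTN", "!!IIII!"))
```

Only the termini are inspected, so a low-quality base in the middle of a read is kept.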
- gcc
sudo apt-get install gcc
- python
sudo apt-get install python
If you want to run the algorithm evaluation located in utils, some extra Python packages are required, namely pandas, matplotlib and numpy. Thankfully, you can install them all at once using pip.
pip install -r requirements.txt --user
- Clone the repo
git clone https://github.com/jvcanavarro/FAIR-Fast-Adapter-Identification-and-Removal.git
- Build with compiler
cd FAIR-Fast-Adapter-Identification-and-Removal
g++ source/main.cpp -o FAIR
All available FAIR parameters are listed below.
Usage: /home/jvcanavarro/FAIR-Fast-Adapter-Identification-and-Removal [options] -o <output_dir>
Basic options:
-o/--output <output_dir> directory to store all the resulting files (required)
-h/--help prints this usage message
-v/--version prints version
Input data:
-s/--single <filename> file with unpaired reads
-f/--forward <filename> file with forward paired-end reads
-r/--reverse <filename> file with reverse paired-end reads
-i/--interlaced <filename> file with interlaced forward and reverse paired-end reads
Pipeline options:
--only-identify runs only adapter identification (without removal)
--only-remove runs only adapter removal (without identification)
need to set adapter(s) if this option is set
--trim trim ambiguous bases (N) at 5'/3' termini
--trim-quality trim bases at 5'/3' termini with quality scores <= to
--min-quality value
--min-quality <int> minimal quality value to trim
Advanced options:
--adapter <adapter> adapter sequence that will be removed (unpaired reads)
required with --only-remove
--forward-adapter <adapter> adapter sequence that will be removed
in the forward paired-end reads (required with --only-remove)
--reverse-adapter <adapter> adapter sequence that will be removed
in the reverse paired-end reads (required with --only-remove)
-t/--threads <int> number of threads
[default: 4]
--phred-offset <33 or 64> PHRED quality offset in the input reads (33 or 64)
[default: auto-detect]
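How FAIR auto-detects the `--phred-offset` value is not described here. A common heuristic, shown below as a hedged Python sketch, looks at the range of quality characters: anything below ASCII 59 can only occur with offset 33, and anything above ASCII 74 points to offset 64. The function name `guess_phred_offset` and the thresholds are assumptions, not FAIR's implementation.

```python
def guess_phred_offset(quality_lines):
    """Guess the Phred offset (33 or 64) from FASTQ quality strings.

    Returns None when the observed characters fit both encodings.
    """
    codes = [ord(c) for line in quality_lines for c in line]
    if not codes:
        return None
    if min(codes) < 59:   # too low for any offset-64 encoding
        return 33
    if max(codes) > 74:   # above the usual offset-33 range ('J')
        return 64
    return None           # ambiguous: ask the user to pass --phred-offset


print(guess_phred_offset(["IIII!!", "FFFF##"]))
```

When the result is ambiguous, setting the offset explicitly on the command line avoids a wrong guess.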
For more examples, please refer to the Documentation
You can test the program using the samples sample1.fastq and sample2.fastq located in data. The new files are stored in results. Some common usages are listed below.
- Remove Adapters from Single FASTQ File with Adapter and Quality Identification
./FAIR --output results/ --single sample1.fastq
- Remove Adapters from Forward and Reverse FASTQ Files with Adapter and Quality Identification
./FAIR --output results/ --forward sample1.fastq --reverse sample2.fastq
- Remove Adapters from Forward and Reverse FASTQ Files without Adapters Identification
./FAIR --output results/ --forward sample1.fastq --reverse sample2.fastq --only-remove --forward-adapter CCCCCCC --reverse-adapter CCCATCC
- Remove Adapters from Single FASTQ File with Trim, Trim-Quality, Min-Quality, Number of Threads and Phred-Offset
./FAIR --output results/ --single sample1.fastq --trim --trim-quality 90 --min-quality 90 --threads 8 --phred-offset 33
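To check what a run changed, one option is to compare the number of reads and the total number of bases before and after. The Python sketch below assumes plain four-line FASTQ records; the helper names are hypothetical, and the in-memory strings stand in for the real input and output files, whose names inside results are not specified here.

```python
import io


def fastq_records(handle):
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip()
        if not header:           # end of file (or a blank line) stops the scan
            return
        seq = handle.readline().rstrip()
        handle.readline()        # '+' separator line, ignored
        qual = handle.readline().rstrip()
        yield header, seq, qual


def summarize(handle):
    """Return (number of reads, total number of bases)."""
    count = total = 0
    for _, seq, _ in fastq_records(handle):
        count += 1
        total += len(seq)
    return count, total


# Stand-ins for an input file and the corresponding trimmed output.
before = io.StringIO("@r1\nACGTCCCCCCC\n+\nIIIIIIIIIII\n")
after = io.StringIO("@r1\nACGT\n+\nIIII\n")
print(summarize(before), summarize(after))
```

With real data, replace the `io.StringIO` objects with `open(path)` on the original sample and on the file FAIR wrote.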
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
- R. S. Boyer and J. S. Moore. A fast string searching algorithm. Commun. ACM, 20(10):762–772, 1977.
- Koloud Al-Khamaiseh and Shadi ALShagarin, “A Survey of String Matching Algorithms” in Int. Journal of Engineering Research and Applications, IJERA, ISSN: 2248-9622, Vol. 4, Issue 7 (Version 2), July 2014, pp. 144-156
- B. Durian, H. Peltola, L. Salmela, and J. Tarhio. Bit-parallel search algorithms for long patterns. In P. Festa, editor, Symposium on Experimental Algorithms, LNCS 6049, 129–140, Springer-Verlag, Berlin, 2010.
- G. Navarro, M. Raffinot, “A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching”, in Proc. of the 9th Annual Symposium on Combinatorial Pattern Matching, No. 1448.
- SMART (String Matching Algorithm Research Tool)
- bio-playground by Brent Pedersen
João V. Canavarro - jvcanavarro@gmail.com
Project Link: https://github.com/jvcanavarro/FAIR-Fast-Adapter-Identification-and-Removal