Patrick Douglas edited this page Mar 29, 2019 · 15 revisions

SeqsExtractor

Support, feedback or questions to patrick@ufpa.br

Version 1.0 (Mar 15, 2017) By Patrick Douglas email: patrick@ufpa.br

What is SeqsExtractor?

A simple tool that helps extract:

  • BLASTed sequences, using the tabular BLAST file and the query sequences, producing a .FASTA file that contains only the sequences that matched with a user-defined percentage. This software also makes the BLAST command line friendlier.

  • Sequences containing microsatellites from a MISA file, producing a .FASTA file with only the sequences that contain microsatellites. SeqsExtractor lets you run a MISA search and then extract the sequences, or just extract the sequences from an existing MISA file.

  • Sequences from any .FASTA file, using a text file containing the IDs of the target sequences.

  • Sequences from the Trinity DE pipeline.
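
The first and third kinds of extraction can be pictured with a small Bash sketch. This is not SeqsExtractor's own code, only an illustration of the idea under stated assumptions: the BLAST file uses the default tabular column layout (query ID in column 1, percent identity in column 3), "matched with a percentage" is taken to mean identity greater than or equal to the threshold, and the function and file names (`blast_ids_above`, `extract_fasta_by_ids`, `hits.tsv`, `queries.fasta`) are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; this is NOT how SeqsExtractor is implemented.

# Print the unique query IDs (column 1) of hits whose percent identity
# (column 3 in the default BLAST tabular layout; an assumption here)
# is greater than or equal to the given threshold.
blast_ids_above() {
    local blast_tab="$1" min_identity="$2"
    awk -F'\t' -v min="$min_identity" \
        '$3 + 0 >= min + 0 { print $1 }' "$blast_tab" | sort -u
}

# Print only the FASTA records whose ID (the first word after ">")
# appears in the ID file, which lists one ID per line.
extract_fasta_by_ids() {
    local id_file="$1" fasta="$2"
    awk '
        # First file: remember every non-empty ID.
        FILENAME == ARGV[1] { if ($1 != "") wanted[$1] = 1; next }
        # Header line: decide whether this record is kept.
        /^>/ { id = substr($1, 2); keep = (id in wanted) }
        # Print header and sequence lines of kept records.
        keep
    ' "$id_file" "$fasta"
}

# Hypothetical usage:
#   blast_ids_above hits.tsv 90 > ids.txt
#   extract_fasta_by_ids ids.txt queries.fasta > matched.fasta
```

Splitting the work into an ID-selection step and an ID-based extraction step means the second function also covers the case where the ID list is written by hand rather than derived from a BLAST file.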

It runs in the Bash shell. SeqsExtractor is developed for Linux; it should also work on any Debian-based UNIX-like operating system, and was tested under Linux Mint 17.3 Xfce and Ubuntu 16.04 LTS.

SeqsExtractor is also compatible with Mac OS X (see Minimum System Requirements).

Download SeqsExtractor

Click here to download the latest version of SeqsExtractor

Download Example files

Option 8 in the SeqsExtractor main interface lets you download the SeqsExtractor example files; you can also download the files by clicking here. To use this option, run the command below:

SeqsExtractor -8

Please read the user manual for more information. You can also access an example of a BLAST analysis using SeqsExtractor.

Paper to cite SeqsExtractor:

Available at: https://peerj.com/preprints/3364/

Seqs-Extractor: Automated sequences extraction to reduce tedious manual corrections of large datasets

Patrick D C Pereira, Cleyssian Dias, Mauro A D Melo, Nara G M Magalhães, Cristovam G Diniz, Cristovam W P Diniz

Abstract

The analysis of large numbers of sequences requires the reduction of ambiguities during the analytical work to ensure that the effort will focus only on confirmed sequences. Performing this work automatically may help to minimize potential errors associated with tedious manual correction, allowing more effective results. Basic local alignment search tool (BLAST) seems to be the most widely used sequence analysis program. It is free, but commercial parties enhanced BLAST applications and charge a fee for their uses. There are some tools of public domain that can perform the search of microsatellites in the next generation sequencing (NGS) data, as the microsatellite identification tool (MISA), which has some features to discover microsatellites in large datasets. Here, we developed a basic shell script (BASH script) to be ran under Linux environment that can be used to extract from a sequence dataset only confirmed (BLASTed) sequences from both nucleotide (BLASTN) and protein (BLASTX) databases and extract sequences that contains microsatellites using MISA tool, using a friendly interface and no fees charged. SeqsExtractor is a helpful tool that may enhance the analysis of large datasets in BLAST+ and MISA by minimizing the time of management, reducing potential errors caused by manipulating data and no fees charged. SeqsExtractor is available at https://github.com/patrick-douglas/SeqsExtractor/wiki.
