
Extract sequences using BLAST results

Patrick Douglas edited this page Mar 29, 2019 · 1 revision

The SeqsExtractor-extract-from-blast option lets you extract your sequences directly from an existing BLAST results file, which is useful if you have already performed a BLAST search. The results file must be in tabular format (outfmt6) to use this option.

At present, SeqsExtractor only works with the tabular file format. See the BLAST+ documentation for help with the tabular format.
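For orientation, the sketch below shows how a tabular BLAST+ file can be read in Python. It assumes the default outfmt 6 column order from the BLAST+ documentation; the function name read_outfmt6 and the two sample rows are invented for illustration and are not part of SeqsExtractor.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_outfmt6(handle):
    """Yield one dict per hit from a tab-separated outfmt6 stream."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        hit["pident"] = float(hit["pident"])  # percent identity as a number
        yield hit


# Hypothetical sample rows, not real SeqsExtractor data.
sample = io.StringIO(
    "seq1\tprotA\t95.50\t200\t9\t0\t1\t600\t1\t200\t1e-100\t400\n"
    "seq2\tprotB\t42.10\t150\t80\t3\t10\t460\t5\t154\t2e-20\t95.0\n"
)
hits = list(read_outfmt6(sample))
for hit in hits:
    print(hit["qseqid"], hit["sseqid"], hit["pident"])
```

The third column (pident) holds the percentage of identity, which is the value the -p option filters on.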

USAGE:
Example: SeqsExtractor-extract-from-blast -i query.fa -o /home/user/test -b new-arriv_wint_pre-mig_vs_ncbi_protein.blastx.outfmt6 -p 90-100

Required arguments: 
-i <string> | Query fasta
-o <string> | Output directory
-b <string> | BLAST+ outfmt6 file
-p <string> | Percentage of identity used to extract sequences

To extract the sequences you need your FASTA file and the tabular BLAST results.
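The idea of combining the two inputs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SeqsExtractor's actual code: it assumes the sequence ID is the first whitespace-delimited token of each FASTA header and that it must equal the first column (qseqid) of the BLAST table. The helper names and sample data are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def seq_id(header):
    """Return the first token of a FASTA header (assumed to be the ID)."""
    parts = header.split()
    return parts[0] if parts else ""


def extract_hits(fasta_handle, blast_handle):
    """Keep FASTA records whose ID appears in column 1 of the BLAST table."""
    wanted = {
        line.split("\t", 1)[0] for line in blast_handle if line.strip()
    }
    return [(h, s) for h, s in read_fasta(fasta_handle) if seq_id(h) in wanted]


# Hypothetical inputs for illustration only.
fasta = io.StringIO(">seq1 first\nACGT\nACGT\n>seq2\nTTTT\n>seq3\nGGGG\n")
blast = io.StringIO(
    "seq1\tprotA\t95.50\t200\t9\t0\t1\t600\t1\t200\t1e-100\t400\n"
    "seq3\tprotC\t91.00\t120\t11\t0\t1\t360\t1\t120\t3e-60\t210\n"
)
records = extract_hits(fasta, blast)
for header, sequence in records:
    print(f">{header}\n{sequence}")
```

Here seq2 is dropped because it has no row in the BLAST table.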

Example commandline:

SeqsExtractor-extract-from-blast -i M.musculus_NCBI_entire_genome.fasta -o /home/user/test -b new-arriv_wint_pre-mig_vs_ncbi_protein.blastx.outfmt6 -p 90-100

Input FASTA file:

Enter the FASTA file to be used as the query

-i /home/me/M.musculus_NCBI_entire_genome.fasta

Output directory to save all results:

Enter the directory where the results will be saved

-o /home/me/test

Percentage of identity to extract Sequences:

Now you can choose a specific percentage of identity to extract your sequences. All available options are listed below:

10  to get only the sequences that match with 10%	
20  to get only the sequences that match with 20%	
30  to get only the sequences that match with 30%		
40  to get only the sequences that match with 40%		
50  to get only the sequences that match with 50%		
60  to get only the sequences that match with 60%		
70  to get only the sequences that match with 70%		
80  to get only the sequences that match with 80%		
90  to get only the sequences that match with 90%		
100  to get only the sequences that match with 100%
10-100  to get only the sequences that match with 10% to 100% of hits	
20-100  to get only the sequences that match with 20% to 100% of hits	
30-100  to get only the sequences that match with 30% to 100% of hits	
40-100  to get only the sequences that match with 40% to 100% of hits	
50-100  to get only the sequences that match with 50% to 100% of hits	
60-100  to get only the sequences that match with 60% to 100% of hits	
70-100  to get only the sequences that match with 70% to 100% of hits	
80-100  to get only the sequences that match with 80% to 100% of hits	
90-100  to get only the sequences that match with 90% to 100% of hits	

Or type all to apply no filter and get every sequence that matched in the BLAST search.

Example:

-p 90-100

This extracts the sequences that match with 90% to 100% identity.
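A rough Python sketch of how a -p value could be turned into a filter is shown here. It is an assumption-laden illustration, not the tool's implementation: in particular, treating a single number such as 90 as "exactly 90%" and treating ranges as inclusive are guesses, and the function names are hypothetical.

```python
def parse_identity_option(value):
    """Turn a -p style value into an inclusive (low, high) range.

    Assumption: a single number N means exactly N percent. The real tool
    may interpret single values differently.
    """
    if value == "all":
        return (0.0, 100.0)  # no filtering
    if "-" in value:
        low, high = value.split("-", 1)
        return (float(low), float(high))
    return (float(value), float(value))


def within_identity(pident, identity_range):
    """Return True if a percent identity falls inside the range."""
    low, high = identity_range
    return low <= pident <= high


# Hypothetical percent-identity values taken from column 3 of a BLAST table.
pidents = [95.5, 42.1, 90.0, 100.0]
selected = [p for p in pidents if within_identity(p, parse_identity_option("90-100"))]
print(selected)
```

With -p 90-100 the value 42.1 is filtered out, while all keeps every hit.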


The final screen shows the names of the files that will be stored in the output directory.

