Similarity Search Flags

These flags are specific to similarity searching with DIAMOND. Each flag is set either on the command line (denoted CMD) or in the ini file (denoted INI).

-d / --database [CMD]

Specify any number of FASTA-formatted databases you would like to configure for EnTAP.
Not necessary if you already have DIAMOND-configured databases (.dmnd).

--data-type [INI]

Specify which EnTAP database you'd like to use for execution (UniProt, Gene Ontology, and Taxonomy lookups)
- 0. Binary Database (default) - This is much quicker and is recommended
- 1. SQL Database - Slower, but more easily compatible with every system
This can be flagged multiple times (ex: ``--data-type 0 --data-type 1``)
Avoid this flag unless you are experiencing issues with the EnTAP Binary Database

-c / --contam [multi-string] [INI]

Specify :ref:`contaminant<tax-label>` level of filtering

Multiple contaminants can be selected through repeated flags
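Repeated flags are easy to generate from a list. The sketch below is a hypothetical helper, not part of EnTAP: it expands a list of values into one flag/value pair per entry. The contaminant names are placeholders chosen only for illustration.

```python
from typing import Iterable, List


def repeat_flag(flag: str, values: Iterable[str]) -> List[str]:
    """Expand values into repeated 'flag value' pairs for an argument list."""
    args: List[str] = []
    for value in values:
        args.extend([flag, str(value)])
    return args


# Placeholder contaminant names (assumptions, not a list of valid values).
contam_args = repeat_flag("--contam", ["bacteria", "fungi"])
print(" ".join(contam_args))
```

The same helper works for any other flag that can be given more than once.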

--taxon [string] [INI]

This flag will allow for :ref:`taxonomic<tax-label>` 'favoring' of hits that are closer to your target species or lineage. Any lineage can be used as referenced by the NCBI Taxonomic database, such as genus, phylum, or species.

Format must replace all spaces with underscores ('_') as follows: ``--taxon homo_sapiens`` or ``--taxon primates``
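The underscore format can be produced with a small helper. This is a minimal sketch; `format_taxon` is a hypothetical name, and lowercasing simply mirrors the examples above (whether EnTAP is case-sensitive is not stated here).

```python
def format_taxon(name: str) -> str:
    """Join whitespace-separated words with underscores and lowercase them."""
    # split() with no arguments also trims leading and trailing whitespace.
    return "_".join(name.split()).lower()


print(format_taxon("Homo sapiens"))
```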

--level [multi-integer] [INI]

Specify Gene Ontology levels you would like to normalize to (ex: 0, 1, 2, 3, 4)

A level of '0' indicates all levels will be printed

This flag can be repeated any number of times

Default: 1

More information at: http://geneontology.org/page/ontology-structure

-e [scientific] [INI]

Specify the E-value cutoff for similarity searching (scientific notation)

Default: 10E-5

--tcoverage [decimal] [INI]

Specify minimum target coverage for similarity searching

Default: 50%

--qcoverage [decimal] [INI]

Specify minimum query coverage for similarity searching

Default: 50%
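To show how the E-value and coverage thresholds interact, here is a minimal sketch of a hit filter. The function and parameter names are hypothetical, and the inclusive comparisons are an assumption; the exact checks EnTAP performs are not described in this section.

```python
def passes_thresholds(
    evalue: float,
    query_cov: float,
    target_cov: float,
    max_evalue: float = float("10E-5"),  # Python parses "10E-5" as 0.0001
    min_query_cov: float = 50.0,         # percent, per the documented default
    min_target_cov: float = 50.0,        # percent, per the documented default
) -> bool:
    """Return True if a hit meets all three thresholds (assumed inclusive)."""
    return (
        evalue <= max_evalue
        and query_cov >= min_query_cov
        and target_cov >= min_target_cov
    )


# Hypothetical hit: strong E-value, 80% query coverage, 75% target coverage.
print(passes_thresholds(1e-10, 80.0, 75.0))
```

A hit that fails any one of the three checks is rejected in this sketch.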

--uninformative [string] [INI]

Path to a list of terms you would like to be deemed "uninformative"

The file must be formatted with one term on each line of the file

Example (defaults):

conserved

predicted

unnamed

hypothetical

putative

unidentified

uncharacterized

unknown

uncultured

uninformative
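A terms file in this one-term-per-line format can be written and read back as follows. This is a sketch under assumptions: the file name and helper names are hypothetical, and the case-insensitive substring match is only one plausible way to apply the terms, not necessarily how EnTAP does it.

```python
import tempfile
from pathlib import Path
from typing import List

# The default terms listed above.
DEFAULT_TERMS = [
    "conserved", "predicted", "unnamed", "hypothetical", "putative",
    "unidentified", "uncharacterized", "unknown", "uncultured",
    "uninformative",
]


def write_terms(path: Path, terms: List[str]) -> None:
    """Write one term per line."""
    path.write_text("\n".join(terms) + "\n", encoding="utf-8")


def read_terms(path: Path) -> List[str]:
    """Read terms back, skipping blank lines."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def is_uninformative(description: str, terms: List[str]) -> bool:
    """Assumed rule: any term appears in the description, ignoring case."""
    lowered = description.lower()
    return any(term.lower() in lowered for term in terms)


with tempfile.TemporaryDirectory() as tmp:
    terms_file = Path(tmp) / "uninformative.txt"  # hypothetical file name
    write_terms(terms_file, DEFAULT_TERMS)
    terms = read_terms(terms_file)
    print(is_uninformative("hypothetical protein", terms))
```

The path to a file like this is what would be passed to the flag.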
