similarity_search_flags.rst

Similarity Search Flags

These flags are specific to similarity searching with DIAMOND. Each flag is set either on the command line (denoted CMD) or in the ini file (denoted INI).

-d / --database [CMD]

  • Specify any number of FASTA-formatted databases you would like to configure for EnTAP
  • Not necessary if you already have DIAMOND-configured databases (.dmnd)

--data-type [INI]

  • Specify which EnTAP database you'd like to use for execution (UniProt, Gene Ontology, and Taxonomy lookups)

      0. Binary Database (default) - This will be much quicker and is recommended
      1. SQL Database - Slower, but more easily compatible with every system
  • This can be flagged multiple times (ex: --data-type 0 --data-type 1)

  • Avoid this flag unless you are experiencing issues with the EnTAP Binary Database

-c / --contam [multi-string] [INI]

--taxon [string] [INI]

  • This flag will allow for :ref:`taxonomic<tax-label>` 'favoring' of hits that are closer to your target species or lineage. Any lineage can be used as referenced by the NCBI Taxonomic database, such as genus, phylum, or species.
  • Format must replace all spaces with underscores ('_') as follows: "--taxon homo_sapiens" or "--taxon primates"
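
The underscore formatting can be automated before the value is written to the ini file. This is a minimal sketch; `format_taxon` is a hypothetical helper and not part of EnTAP, and the lowercasing only mirrors the examples above (it is an assumption, not a documented requirement).

```python
def format_taxon(name: str) -> str:
    """Return a lineage name with runs of whitespace replaced by underscores."""
    # split() with no arguments trims the ends and collapses repeated whitespace.
    # Lowercasing is an assumption that mirrors the documented examples.
    return "_".join(name.lower().split())


print(format_taxon("Homo sapiens"))  # homo_sapiens
print(format_taxon("Primates"))      # primates
```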

--level [multi-integer] [INI]

  • Specify Gene Ontology levels you would like to normalize to (ex: 0, 1, 2, 3, 4)
  • A level of '0' indicates all levels will be printed
  • This flag can be specified any number of times
  • Default: 1
  • More information at: http://geneontology.org/page/ontology-structure

-e [scientific] [INI]

  • Specify the E-value cutoff for similarity searching (scientific notation); hits with an E-value above the cutoff are excluded
  • Default: 10E-5

--tcoverage [decimal] [INI]

  • Specify minimum target coverage for similarity searching
  • Default: 50%

--qcoverage [decimal] [INI]

  • Specify minimum query coverage for similarity searching
  • Default: 50%
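
How the E-value and coverage cutoffs combine can be illustrated with a short sketch. This is not EnTAP's implementation: the `Hit` record and `passes_cutoffs` function are hypothetical, the E-value cutoff is treated as an upper bound, coverage is assumed to be a percentage, and the `1e-5` passed below is just an example value.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one alignment; not an EnTAP or DIAMOND type."""
    subject: str
    evalue: float
    query_coverage: float   # percent of the query covered by the alignment
    target_coverage: float  # percent of the target covered by the alignment


def passes_cutoffs(hit, max_evalue, min_qcoverage=50.0, min_tcoverage=50.0):
    """Keep a hit only if it satisfies all three cutoffs."""
    return (
        hit.evalue <= max_evalue
        and hit.query_coverage >= min_qcoverage
        and hit.target_coverage >= min_tcoverage
    )


hits = [
    Hit("protein_a", 1e-30, 92.0, 88.0),
    Hit("protein_b", 1e-3, 95.0, 90.0),   # fails the E-value cutoff
    Hit("protein_c", 1e-20, 35.0, 80.0),  # fails the query coverage cutoff
]
kept = [h.subject for h in hits if passes_cutoffs(h, max_evalue=1e-5)]
print(kept)  # ['protein_a']
```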

--uninformative [string] [INI]

  • Path to a list of terms you would like to be deemed "uninformative"

  • The file must be formatted with one term on each line of the file

  • Example (defaults):

    • conserved
    • predicted
    • unnamed
    • hypothetical
    • putative
    • unidentified
    • uncharacterized
    • unknown
    • uncultured
    • uninformative
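
A file in the one-term-per-line format described above can be produced and read back with a few lines of Python. This is a sketch only: the helper names and the file name `uninformative_terms.txt` are hypothetical, and the case-insensitive substring match in `is_uninformative` is an assumption about how such terms might be applied, not a description of EnTAP's matching logic.

```python
import tempfile
from pathlib import Path

# The default terms listed above.
DEFAULT_TERMS = [
    "conserved", "predicted", "unnamed", "hypothetical", "putative",
    "unidentified", "uncharacterized", "unknown", "uncultured", "uninformative",
]


def write_terms_file(path, terms):
    """Write one term per line, as the flag expects."""
    Path(path).write_text("\n".join(terms) + "\n")


def load_terms(path):
    """Read the terms back, skipping blank lines."""
    lines = Path(path).read_text().splitlines()
    return [line.strip().lower() for line in lines if line.strip()]


def is_uninformative(description, terms):
    """Assumed rule: a description is uninformative if it contains any term."""
    text = description.lower()
    return any(term in text for term in terms)


with tempfile.TemporaryDirectory() as tmp:
    terms_path = Path(tmp) / "uninformative_terms.txt"  # hypothetical file name
    write_terms_file(terms_path, DEFAULT_TERMS)
    terms = load_terms(terms_path)

print(is_uninformative("Hypothetical protein", terms))        # True
print(is_uninformative("ATP synthase subunit alpha", terms))  # False
```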