martinghunt edited this page Sep 25, 2015 · 4 revisions

Task: get_dnaa

This task downloads a set of genes from UniProt, by default searching for dnaA genes. It then filters the results in three ways: the sequence name must match dnaA (or any other regular expression supplied by the user), the sequence length must fall within a set range, and only one sequence is kept per species. The remaining amino acid sequences are reverse translated into nucleotide sequences so that they can be used with promer when running fixstart. This task generates the default set of dnaA genes used by Circlator.
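The filtering and reverse translation steps can be sketched as follows. This is a minimal illustration, not Circlator's actual code: the names `Protein`, `filter_proteins`, `reverse_translate` and `CODON` are hypothetical, as are the removal-reason labels. It also assumes that the filters run in the order name, length, species; that the length bounds are inclusive; that the first sequence seen for a species is the one kept; and that each amino acid maps to one fixed codon (the codon choice here is arbitrary).

```python
import re
from collections import namedtuple

# Hypothetical record: name is the FASTA header text, species the organism.
Protein = namedtuple("Protein", ["name", "species", "seq"])

# Assumed one-codon-per-amino-acid table, chosen for illustration only.
CODON = {
    "A": "GCT", "R": "CGT", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "CTT", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCT",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
}


def reverse_translate(aa_seq):
    """Map each amino acid to a fixed codon; unknown residues become NNN."""
    return "".join(CODON.get(aa, "NNN") for aa in aa_seq.upper())


def filter_proteins(proteins, name_re="dnaa", min_length=333,
                    max_length=500, case_sensitive=False):
    """Return (kept, removed); removed holds (reason, name) pairs."""
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(name_re, flags)
    kept, removed, seen_species = [], [], set()
    for p in proteins:
        if not regex.search(p.name):
            removed.append(("name", p.name))
        elif not min_length <= len(p.seq) <= max_length:
            removed.append(("length", p.name))
        elif p.species in seen_species:
            # Only the first sequence per species is kept (assumption).
            removed.append(("duplicate_species", p.name))
        else:
            seen_species.add(p.species)
            kept.append(p)
    return kept, removed
```

Because every amino acid becomes exactly one codon, each nucleotide sequence is three times the length of its protein, which is enough for promer to align it in translated space.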

Usage and options

The general usage is

circlator get_dnaa [options] <outprefix>

The options are:

  • --min_length INT: minimum length in amino acids. Default: 333.
  • --max_length INT: maximum length in amino acids. Default: 500.
  • --uniprot_search STRING: UniProt search term. Default: dnaa.
  • --name_re STRING: Each sequence name must match this regular expression. Default: dnaa.
  • --name_re_case_sensitive: Do a case-sensitive match against the regular expression given by --name_re. Default is to ignore case.
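To show how these options and their defaults fit together, here is a sketch of an equivalent argparse parser. It only mirrors the documented flags and defaults; it is not Circlator's own parser, and `build_parser` and `compile_name_re` are hypothetical names. It assumes the name is searched for the pattern anywhere in the string, case-insensitively unless --name_re_case_sensitive is given.

```python
import argparse
import re


def build_parser():
    # Mirrors the documented options of `circlator get_dnaa` (sketch only).
    parser = argparse.ArgumentParser(prog="circlator get_dnaa")
    parser.add_argument("--min_length", type=int, default=333, metavar="INT",
                        help="Minimum length in amino acids")
    parser.add_argument("--max_length", type=int, default=500, metavar="INT",
                        help="Maximum length in amino acids")
    parser.add_argument("--uniprot_search", default="dnaa", metavar="STRING",
                        help="UniProt search term")
    parser.add_argument("--name_re", default="dnaa", metavar="STRING",
                        help="Each sequence name must match this regex")
    parser.add_argument("--name_re_case_sensitive", action="store_true",
                        help="Do a case-sensitive match to --name_re")
    parser.add_argument("outprefix", help="Prefix of output files")
    return parser


def compile_name_re(args):
    """Compile --name_re, ignoring case unless asked not to."""
    flags = 0 if args.name_re_case_sensitive else re.IGNORECASE
    return re.compile(args.name_re, flags)
```

With the defaults, a name containing "DnaA" matches "dnaa"; adding --name_re_case_sensitive makes the same name fail to match.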

Output files

The FASTA file of nucleotide gene sequences is called outprefix.nucleotides.fa. The amino acid FASTA file downloaded from UniProt is called outprefix.aa.fa. A log file called outprefix.log records why sequences were removed. It is tab-delimited with two columns: the first column gives the reason for removing the sequence and the second column gives the sequence name.
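Because the log has two tab-separated columns, it is easy to summarise how many sequences each filter removed. The sketch below assumes only that layout; the functions `read_removed` and `count_reasons` are hypothetical, and the exact reason strings Circlator writes are not assumed.

```python
from collections import Counter


def read_removed(lines):
    """Parse (reason, name) pairs from the tab-delimited log lines."""
    removed = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, in case a name contains tabs.
        reason, name = line.split("\t", 1)
        removed.append((reason, name))
    return removed


def count_reasons(removed):
    """Count how many sequences were removed for each reason."""
    return Counter(reason for reason, _ in removed)
```

For example, `with open("outprefix.log") as fh: print(count_reasons(read_removed(fh)))` prints one count per removal reason.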