Genbank Mutation Locator: give it a mutation and it will tell you the location in the reference genome (and vice versa)



gemucator is short for "Genbank Mutation Locator". It is a simple Python3 class, intended for incorporation into other tools: give it a mutation and it will tell you its location in the reference genome (and vice versa). The gemucator class accepts a path to a genbank file; since I am working with M. tuberculosis, this is the H37rV genbank file by default, but any genbank file should work.

The package comes with a simple script that shows how it works. All these examples are for TB.

First, let's see what happens when we give it an amino acid mutation (which has to be, by definition, in the coding sequence of a gene).

> --mutation rpoB_S450L
761153 t
761154 c
761155 g

It returns three rows, since there are three bases in the triplet, each with the position in the H37rV reference genome and the reference base. Note that the code is very defensive and checks that tcg codes for serine, which happily it does. If we get the wildtype amino acid wrong, the code will stop with an AssertionError.
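The wildtype check can be sketched without any dependencies. This is a minimal illustration under stated assumptions, not gemucator's own implementation: the function name `check_wildtype`, the regular expression and the hand-built codon table are all made up for this example.

```python
import re
from itertools import product

# Standard genetic code, listed in the conventional t, c, a, g base order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical pattern for amino acid mutations such as "rpoB_S450L".
AA_MUTATION = re.compile(
    r"^(?P<gene>\w+?)_(?P<before>[A-Z])(?P<residue>\d+)(?P<after>[A-Z])$"
)


def check_wildtype(mutation, codon):
    """Parse the mutation and assert that `codon` encodes its wildtype residue."""
    match = AA_MUTATION.match(mutation)
    assert match is not None, mutation + " is not an amino acid mutation"
    before = match.group("before")
    translated = CODON_TABLE[codon.lower()]
    assert before == translated, (
        "wildtype amino acid " + before + " does not match codon "
        + codon + " (" + translated + ")"
    )
    return (match.group("gene"), before,
            int(match.group("residue")), match.group("after"))


print(check_wildtype("rpoB_S450L", "tcg"))
```

Calling `check_wildtype("rpoB_K450L", "tcg")` instead raises an AssertionError, because tcg does not code for lysine.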

> --mutation rpoB_K450L
Traceback (most recent call last):
  File "/Users/fowler/Library/Python/3.5/bin/", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/Users/fowler/packages/gemucator/bin/", line 14, in <module>
  File "/Users/fowler/packages/gemucator/gemucator/", line 127, in locate_mutation
    assert before==bases.translate(), "wildtype amino acid specified in mutation does not match the "+self.genbank_file+" genbank file"
AssertionError: wildtype amino acid specified in mutation does not match the config/H37rV.gbk genbank file

Now we can go the other way as well.

> --location 761153
> --location 761154
> --location 761155
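For a gene on the forward strand, going from a location back to a residue is plain integer arithmetic. The sketch below uses a made-up gene start (`GENE_START`) and a hypothetical helper `locate_residue`; neither is taken from gemucator or from the genbank file, and reverse-strand genes are ignored.

```python
GENE_START = 1000  # made-up genome position of the first base of a toy gene


def locate_residue(position, gene_start=GENE_START):
    """Map a genome position to (residue number, position in codon), both 1-based."""
    offset = position - gene_start
    assert offset >= 0, "position lies upstream of the gene"
    return offset // 3 + 1, offset % 3 + 1


# Three consecutive positions fall in the same codon, mirroring the
# three rows returned for an amino acid mutation.
for position in (1006, 1007, 1008):
    print(position, locate_residue(position))
```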

It also handles promoter (nucleotide) mutations, e.g.

> --mutation pncA_t-12c
2289252 t

Now a single row is returned. Again the code will check that what you give it matches the genbank file! Likewise, it checks that you are giving it a nucleotide.

> --mutation pncA_x-12c
Traceback (most recent call last):
  File "/Users/fowler/Library/Python/3.5/bin/", line 6, in <module>
    exec(compile(open(__file__).read(), __file__, 'exec'))
  File "/Users/fowler/packages/gemucator/bin/", line 14, in <module>
  File "/Users/fowler/packages/gemucator/gemucator/", line 61, in locate_mutation
    assert before in ['c','t','g','a'], before+" is not a nucleotide!"
AssertionError: x is not a nucleotide!
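The nucleotide check can be illustrated in a few lines. This is a sketch only: the pattern `NT_MUTATION` and the function `parse_nucleotide_mutation` are assumptions for this example, and checking the mutant base as well as the wildtype base is a choice made here rather than something the traceback confirms.

```python
import re

# Hypothetical pattern for nucleotide mutations such as "pncA_t-12c".
NT_MUTATION = re.compile(
    r"^(?P<gene>\w+?)_(?P<before>[a-z])(?P<position>-?\d+)(?P<after>[a-z])$"
)


def parse_nucleotide_mutation(mutation):
    """Return (gene, before, position, after, is_promoter) for a nucleotide mutation."""
    match = NT_MUTATION.match(mutation)
    assert match is not None, mutation + " is not a nucleotide mutation"
    before, after = match.group("before"), match.group("after")
    for base in (before, after):
        assert base in ["c", "t", "g", "a"], base + " is not a nucleotide!"
    position = int(match.group("position"))
    # A negative position lies upstream of the start codon, i.e. in the promoter.
    return match.group("gene"), before, position, after, position < 0


print(parse_nucleotide_mutation("pncA_t-12c"))
```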

Note that mutation->location->mutation is not uniquely defined for some 'promoters': the promoter for gene X may lie within the coding region of gene Y, which makes it difficult to decide whether a mutation there should be assigned as a CDS mutation or a PROM mutation.
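A toy example makes the ambiguity concrete. Everything here is invented for illustration: the two genes, their coordinates, the 100-base promoter window (`PROMOTER_LENGTH`) and the `annotate` helper are assumptions, not gemucator's behaviour.

```python
PROMOTER_LENGTH = 100  # assumed size of the upstream 'promoter' window

# Made-up forward-strand genes as (name, first base, last base).
GENES = [("geneY", 1000, 1999), ("geneX", 2050, 2999)]


def annotate(position):
    """List every (gene, region) label that applies to a genome position."""
    labels = []
    for name, start, end in GENES:
        if start <= position <= end:
            labels.append((name, "CDS"))
        elif start - PROMOTER_LENGTH <= position < start:
            labels.append((name, "PROM"))
    return labels


# Position 1980 is inside geneY and also within 100 bases upstream of geneX,
# so it receives two labels and the reverse lookup is ambiguous.
print(annotate(1980))
```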

Finally, it will parse insertions and deletions, as long as they conform to the format shown in the example below.

> --mutation rpoB_1300_ins_*
761106 t

This means an insertion (ins) of any length (*) at nucleotide 1300 in the coding sequence of the rpoB gene. You can replace the wildcard with a positive integer to be specific about the number of bases inserted (e.g. for a frame shift). Likewise, for a deletion replace ins with del.
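The indel format described above can be parsed with a single regular expression. The following is a minimal sketch; `INDEL` and `parse_indel` are hypothetical names for this example and may differ from how gemucator parses the string internally.

```python
import re

# Hypothetical pattern for indels such as "rpoB_1300_ins_*" or "rpoB_1300_del_2".
INDEL = re.compile(
    r"^(?P<gene>\w+?)_(?P<position>\d+)_(?P<kind>ins|del)_(?P<length>\*|\d+)$"
)


def parse_indel(mutation):
    """Return (gene, position, kind, length); length is None for the '*' wildcard."""
    match = INDEL.match(mutation)
    assert match is not None, mutation + " is not a valid insertion or deletion"
    length = match.group("length")
    if length == "*":
        number_of_bases = None  # any length
    else:
        number_of_bases = int(length)
        assert number_of_bases > 0, "number of bases must be a positive integer"
    return (match.group("gene"), int(match.group("position")),
            match.group("kind"), number_of_bases)


print(parse_indel("rpoB_1300_ins_*"))
print(parse_indel("rpoB_1300_del_2"))
```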


First clone the repository to your local machine

> git clone

Now enter the directory and install

> cd gemucator
> python3 install --user

The --user flag installs the Python package under this user's $HOME directory, so you don't need root access. The only dependency is BioPython version 1.70 or newer; the above process will download and install it if it cannot find BioPython on your machine. The script should now be in your $PATH, so try typing one of the examples above!