System-3 N-Label Editor - automatically replaces ambiguous basecalls in AB1 tracefiles (PhD Thesis Project)
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.


System-3 N-Label Editor (S3)

Created:        2017-09-17
Author:         Eddie Ma

This is the trained network for performing automated N-Label replacements in
Applied Biosystems (AB1) tracefiles from Sanger Sequencing.

Basecalls emitted using the KB algorithm have an associated quality value --
those basecalls marked with a QV < 20 indicate a p(error) > 1%.

This software reads ab1 files, and replaces a subset of the N-labels with
discrete basecalls only if its own internal estimation of error is below 0.8%;

In my validation, this results in an observed error rate below 1%;
On average, validation data contained 14 N-labels per tracefile,
of which seven N-labels are recovered.
The other seven generally produce rates of error higher than 1%,
and are usually ignored by S3.

The training and validation of this system is described in:

This software was developed, tested, and evaluated for downstream bioinformatics
impact and DNA Barcoding applications in my PhD thesis.


-- Requires: Python 2.7 with Cython 0.27
-- Do "make" to build the c-based Cython modules.


$ python someAbiFile.ab1

== OUTPUT ==

A json file is printed to stdout, with the following structure:

        'build'     :   "2015-09-17",
        'version'   :   "3.50",
        'comment'   :   "",
        'authors'   :   "Eddie YT Ma, Sujeevan Ratnasingham, Stefan C Kremer",
        'qv_convert':   20,     # basecalls below QV 20 are treated as N-labels
        'threshold' :   0.008,  # only accept edits when error is at most 0.8%
        'ibase'     :[], # base index corresponding to an N-Label editing event
        'ploc'      :[], # editing event AB1 tracefile peak location
        'recall'    :'', # a possible base label replacement
        'votes'     :'', # ensemble system votes
        'lzhao'     :[], # mean conditional variance value
        'perror'    :[], # predicted error computed from mean variance
        'accept'    :'', # accept replacement? [Y]es|[N]o
    'KB'            :"", # the original KB basecaller sequence
    'ge20'          :"", # KB basecaller sequence, bases with QV < 20 as N
    'sys3'          :"", # resulting System-3 N-label Editor sequence