Locate, error-correct, trim, and organize (hierarchically) sequence-tagged next-generation DNA reads

Introduction

demuxipy is a program for parsing, error-correcting, and tracking DNA reads identified using molecular sequence tags. Specifically, demuxipy demultiplexes ("demuxes") hierarchically tagged sequence reads, meaning those identified using multiple sequence tags as plate and well markers, respectively. demuxipy differs from previous software and methodological approaches because:

  • it demultiplexes `hierarchical reads`_. Hierarchical tagging vastly expands the number of sequence pools that may be mixed during any single next-generation sequencing run
  • it corrects sequencing errors within tags, which are somewhat common on certain next-generation sequencing platforms, by combining fuzzy string matching with sequence tags separated by a Levenshtein distance >= X. The Levenshtein distance differs from other distance measures (e.g., the Hamming distance, which counts only substitutions) in that it represents the number of insertions, deletions, and substitutions needed to get from one sequence of characters to another. For additional information, see Suggested reading, below.
  • it organizes sequence read data, by tag or other metadata, in a relational database (sqlite) to ease downstream processing by sequence group(s)
  • it can take advantage of multiprocessing for the parallel parsing and error correcting of sequence reads to reduce overall processing time
  • it allows the user to specify linkers and hierarchical tag combinations in a flexible and easily-edited configuration file. Once the appropriate sequence tags are added to the file, it is only a matter of providing the combinations used within a particular run.
  • it intelligently creates regular expression groups based on tag combinations so that only those combinations within a given run are searched for
  • it can search for potential concatemers within sequence reads
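
To make the tag-related points above concrete, here is a minimal sketch of edit-distance tag correction and of a regular expression restricted to the tags in a run. It is not demuxipy's own code: the function names (``levenshtein``, ``correct_tag``, ``tag_pattern``), the ``max_distance`` parameter, and the example tags are all hypothetical and chosen only for illustration.

```python
import re


def levenshtein(a, b):
    """Count the insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def correct_tag(observed, known_tags, max_distance=1):
    """Return the single known tag within max_distance of observed, else None."""
    scored = sorted((levenshtein(observed, tag), tag) for tag in known_tags)
    best_distance, best_tag = scored[0]
    if best_distance > max_distance:
        return None
    # Refuse to guess when two tags are equally close to the observed sequence.
    if len(scored) > 1 and scored[1][0] == best_distance:
        return None
    return best_tag


def tag_pattern(tags):
    """Compile a regex matching only the given tags at the start of a read."""
    alternatives = "|".join(re.escape(tag) for tag in sorted(tags))
    return re.compile("^(%s)" % alternatives)


# Hypothetical tags, invented for this example.
TAGS = ["ACGTAC", "TGCATG", "GATCGA"]

# "ACGAC" is "ACGTAC" with one base deleted; a Hamming-style comparison
# cannot handle this because the two strings differ in length.
corrected = correct_tag("ACGAC", TAGS)

# Only the first two tags are used in this imaginary run.
run_pattern = tag_pattern(TAGS[:2])
```

Because the Levenshtein distance is a metric, a tag set whose members are all at least X edits apart lets an observed tag with at most (X - 1) // 2 errors be assigned to exactly one tag, which is why the sketch returns nothing when two tags tie.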

Dependencies

Installation

demuxipy requires numpy_ and seqtools. After installing these dependencies, install demuxipy using one of the following methods:

  • from source:

    tar -xzvf ~/your/download/location/demuxipy-*.tar.gz
    cd demuxipy-*
    python setup.py install
    
  • using easy_install:

    easy_install demuxipy
    
  • using pip:

    pip install demuxipy
    

Tests

To run the unit tests associated with demuxipy, you have several options. If you have installed python-nose, then you can do the following:

>>> import demuxipy
>>> demuxipy.test()

Alternatively, you can run::

    python setup.py test

Or, you can run::

    python demuxipy/tests/run.py

If you would like to run the tests for seqtools, please see the README file for that library.

Sequence tagging

We have written a software package named edittag to help you design edit-distance sequence tags. edittag is available from GitHub and PyPI.

Running demuxi.py

To run demuxipy, prepare a valid configuration file and run::

    demuxi.py my_configuration_file.conf

Using your data

After you have demultiplexed your data, you probably want to be able to use it, rather than just keep it within the database. You have several options.

  • we provide helper scripts to pull data out of the database and write those data to appropriate output files
  • the database stores an object for each read containing the read itself, the read's quality, and its tag data. You can write a query to grab the objects you're looking for, unpickle them, and use them directly in whatever code or pipeline you'd like.
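
As a rough sketch of the second option, the example below stores a pickled object in an sqlite database and reads it back by tag. The schema is an assumption made for illustration only: the table name ``reads``, the columns ``tag`` and ``record``, and the dictionary standing in for a read object are all hypothetical, so check the actual schema of your database before adapting the query.

```python
import pickle
import sqlite3

# An in-memory database standing in for the one demuxipy writes.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reads (id INTEGER PRIMARY KEY, tag TEXT, record BLOB)"
)

# A plain dictionary standing in for the stored read object.
example = {"name": "read_1", "sequence": "ACGTACGT", "quality": [30] * 8}
conn.execute(
    "INSERT INTO reads (tag, record) VALUES (?, ?)",
    ("plate1_well1", pickle.dumps(example)),
)
conn.commit()


def fetch_reads(connection, tag):
    """Yield the unpickled objects stored under the given tag."""
    query = "SELECT record FROM reads WHERE tag = ?"
    for (blob,) in connection.execute(query, (tag,)):
        yield pickle.loads(bytes(blob))


reads = list(fetch_reads(conn, "plate1_well1"))
```

Unpickling executes code embedded in the stored bytes, so only load objects from databases you created or otherwise trust.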

Frequently asked questions

Please see the FAQ.

Suggested reading

If you'd like to know more about the history of sequence tagging, sequence tags, error correction, and error-correcting codes, the following should be of interest: