demuxipy is a program for parsing, error-correcting, and tracking DNA reads identified using molecular sequence tags. Specifically, demuxipy demultiplexes ("demuxes") hierarchically tagged sequence reads, meaning reads identified using multiple sequence tags that serve as plate and well markers, respectively. demuxipy differs from previous software and methodological approaches because:
- it demultiplexes `hierarchical reads`_. Hierarchical tagging vastly expands the number of sequence pools that may be mixed during any single next-generation sequencing run
- it corrects sequencing errors within tags by combining fuzzy string matching with sequence tags separated by a Levenshtein distance >= X. Such errors are somewhat common on certain next-generation sequencing platforms. The Levenshtein distance differs from other distance measures (e.g., the Hamming distance, which counts substitutions only) in that it represents the minimum number of insertions, deletions, and substitutions needed to get from one sequence of characters to another. For additional information see Suggested Readings, below.
- it organizes sequence read data, by tag or other metadata, in a relational database (SQLite) to ease downstream processing by sequence group(s)
- it can take advantage of multiprocessing for the parallel parsing and error correcting of sequence reads to reduce overall processing time
- it allows the user to specify linkers and hierarchical tag combinations in a flexible and easily-edited configuration file. Once the appropriate sequence tags are added to the file, it is only a matter of providing the combinations used within a particular run.
- it intelligently creates regular expression groups based on tag combinations so that only those combinations within a given run are searched for
- it can search for potential concatemers within sequence reads
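To make the difference between the two distance measures concrete, here is a minimal sketch. It is not demuxipy's own code: the functions `levenshtein`, `hamming`, and `correct_tag`, the example tags, and the `max_distance` parameter are all hypothetical names chosen for illustration. It shows why a single insertion or deletion looks like many errors under the Hamming distance but only one under the Levenshtein distance, and how a damaged tag might be assigned to its nearest known tag.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_tag(observed, known_tags, max_distance=1):
    """Return the single closest known tag within max_distance, else None.

    Hypothetical helper: ties between equally close tags are treated as
    uncorrectable rather than guessed.
    """
    ranked = sorted((levenshtein(observed, tag), tag) for tag in known_tags)
    best_distance, best_tag = ranked[0]
    if best_distance > max_distance:
        return None
    if len(ranked) > 1 and ranked[1][0] == best_distance:
        return None
    return best_tag


# Made-up tags for illustration only.
TAGS = ["ACGTAC", "TGCATG", "GATCGA"]

# A one-base shift: every position mismatches, yet only two edits are needed.
shift_hamming = hamming("ACGTAC", "CGTACA")          # 6
shift_levenshtein = levenshtein("ACGTAC", "CGTACA")  # 2

# A read whose tag lost one base is still assigned to the right tag.
corrected = correct_tag("ACGAC", TAGS)               # "ACGTAC"
```

Tags designed to sit at least some minimum Levenshtein distance apart leave room for this kind of correction: the farther apart the tags are, the more errors a read can carry before it becomes ambiguous.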
tar -xzvf ~/your/download/location/demuxipy-*.tar.gz
python setup.py install
pip install demuxipy
>>> import demuxipy
>>> demuxipy.test()
Alternatively, you can run::
python setup.py test
Or, you can run::
To run demuxipy, prepare a valid configuration file and run::
Using your data
After you have demultiplexed your data, you probably want to be able to use it, rather than just keep it within the database. You have several options.
- we provide helper scripts to pull data out of the database and write those data to appropriate output files
- the database stores an object for each read containing the read itself, the read's quality, and its tag data. You can write a query to grab the objects you're looking for, unpickle them, and use them directly in whatever code/pipeline you'd like.
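The second option can be sketched as follows. This is only an illustration under stated assumptions: the table name `reads`, its columns `tag` and `record`, the dictionary layout of each stored read, and the helpers `store_read` and `fetch_reads` are all hypothetical, so check the schema of your own demuxipy database before adapting it. The example builds a small in-memory database so that it runs on its own.

```python
import pickle
import sqlite3

# Hypothetical schema: one pickled object per read, indexed by its tag.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reads (id INTEGER PRIMARY KEY, tag TEXT, record BLOB)"
)


def store_read(connection, tag, read):
    """Pickle a read object and store it under the given tag."""
    connection.execute(
        "INSERT INTO reads (tag, record) VALUES (?, ?)",
        (tag, pickle.dumps(read)),
    )


def fetch_reads(connection, tag):
    """Query the pickled objects for one tag and unpickle them."""
    rows = connection.execute(
        "SELECT record FROM reads WHERE tag = ? ORDER BY id", (tag,)
    )
    return [pickle.loads(blob) for (blob,) in rows]


# Made-up reads standing in for demultiplexed data.
store_read(conn, "plate1-well3", {"sequence": "ACGTTGCA", "quality": [30, 31, 29, 35, 36, 33, 30, 28]})
store_read(conn, "plate1-well4", {"sequence": "GGCATTAC", "quality": [22, 25, 31, 30, 34, 35, 29, 27]})
conn.commit()

well3_reads = fetch_reads(conn, "plate1-well3")
for read in well3_reads:
    print(read["sequence"], min(read["quality"]))
```

Unpickling executes code embedded in the pickled data, so only unpickle records from databases you created or otherwise trust.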
Frequently asked questions
Please see the FAQ.
If you'd like to know more about the history of sequence tagging, sequence tags, error correction, and error-correcting codes, the following should be of interest:
- Meyer M, Stenzel U, Myles S, Prüfer K, Hofreiter M (2007) Targeted high-throughput sequencing of tagged nucleic acid samples. Nucleic Acids Research 35(15):e97
- Hamady M, Walker JJ, Harris JK, Gold NJ, Knight R (2008) Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nature Methods 5(3):235-237
- Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E (2007) The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS ONE 2(2):e197
- Levenshtein VI (1966). Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10:707–10
- Hamming RW (1950) Error detecting and error correcting codes. Bell System Technical Journal 29(2):147–160