A toolset for high-throughput sequence analysis using a streaming approach facilitated by Linux pipes.
Please see the web page for installation and usage instructions: https://s4hts.github.io/HTStream/
If you prefer to install via a Docker image, see: https://hub.docker.com/r/dzs74/htstream
If you encounter any bugs or have suggestions for improvement, please post them to: https://github.com/s4hts/HTStream/issues
MultiQC support is in development. Check out https://github.com/bnjenner/MultiQC for progress on HTML reports from HTStream log files!
Thanks for trying HTStream!
Until HTStream has its own publication, please cite this GitHub repository. In addition, an older version of SuperDeduper, which uses the same algorithm but is no longer maintained, was presented and published under this citation:
Petersen, Kristen R., David A. Streett, Alida T. Gerritsen, Samuel S. Hunter, and Matthew L. Settles. "Super deduper, fast PCR duplicate detection in fastq files." In Proceedings of the 6th ACM Conference on Bioinformatics, Computational Biology and Health Informatics, pp. 491-492. 2015.
Additional information and a tutorial on building an example pipeline can be found here: