title: "sourmash: a library for MinHash sketching of DNA"
tags:
  - MinHash
  - k-mers
  - Python
authors:
  - name: C. Titus Brown
    orcid: 0000-0001-6001-2677
    affiliation: University of California, Davis
  - name: Luiz Irber
    orcid: 0000-0003-4371-9659
    affiliation: University of California, Davis
date: 13 Sep 2016
bibliography: paper.bib

Summary

sourmash is a toolbox for creating, comparing, and manipulating MinHash sketches of genomic data.

MinHash sketches provide a lightweight way to store "signatures" of large DNA or RNA sequence collections, and then compare or search them using a Jaccard index. MinHash sketches can be used to identify samples, find similar samples, identify data sets with shared sequences, and build phylogenetic trees [@ondov2015fast].
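
To make the idea concrete, here is a minimal sketch of a bottom-k MinHash comparison in plain Python. It is an illustration only, not sourmash's API: the function names, the BLAKE2b hash, and the default `k` and `num` values are all assumptions chosen for the example, and it ignores reverse-complement handling for simplicity.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    # Stable 64-bit hash. Assumption: any well-mixed hash would do here;
    # BLAKE2b is used only because it ships with the standard library.
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_sketch(seq, k=21, num=500):
    """Return the `num` smallest distinct k-mer hashes, sorted (the "signature")."""
    hashes = {hash_kmer(km) for km in kmers(seq.upper(), k)}
    return sorted(hashes)[:num]


def jaccard_estimate(sketch_a, sketch_b, num=500):
    """Estimate the Jaccard index of two sequences from their sketches."""
    # Keep the `num` smallest hashes of the combined sketches, then count
    # how many of those appear in both sketches.
    merged = sorted(set(sketch_a) | set(sketch_b))[:num]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)
```

When a sequence has fewer than `num` distinct k-mers, the sketch holds every hash and the estimate equals the exact Jaccard index; for larger inputs the sketch stays at a fixed size and the result is an approximation.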

sourmash provides a command-line script, a Python library, and a CPython module for MinHash sketches.

References