documentation etc.
ctb committed Jun 5, 2016
1 parent a1bbf40 commit f7b3bcd
Showing 8 changed files with 113 additions and 16 deletions.
23 changes: 19 additions & 4 deletions README.md
@@ -8,11 +8,11 @@ Compute MinHash signatures for DNA sequences.

Usage:

./sourmash compute *.fq.gz
./sourmash compare *.sig -o distances
./plot-comparison.py distances
sourmash compute *.fq.gz
sourmash compare *.sig -o distances
sourmash plot distances

We have demo notebooks on binder:
We have demo notebooks on binder that you can interact with:

[![Binder](http://mybinder.org/badge.svg)](http://mybinder.org/repo/dib-lab/sourmash)

@@ -21,6 +21,12 @@ We have demo notebooks on binder:
The name is a riff off of [Mash](https://github.com/marbl/Mash), combined with
my love of whiskey. (Sour mash is used in making whiskey.)

Authors: [C. Titus Brown](mailto:titus@idyll.org) and Luiz C. Irber, Jr.

sourmash is a product of the
[Lab for Data-Intensive Biology](http://ivory.idyll.org/lab/) at the
[UC Davis School of Veterinary Medicine](http://www.vetmed.ucdavis.edu).

## Installation

You can do:
@@ -36,8 +42,17 @@ The comparison code (`sourmash compare`) uses numpy, and the plotting
code uses matplotlib and scipy, but most of the code is usable without
these.

## Support

Please ask questions and file issues
[on GitHub](https://github.com/dib-lab/sourmash/issues). The developers
sometimes hang out [on gitter](https://gitter.im/dib-lab/khmer).

## Development

Development happens on GitHub at
[dib-lab/sourmash](https://github.com/dib-lab/sourmash).

`sourmash` is the main command-line entry point; run it for help.

`sourmash_lib/` contains the library code.
29 changes: 29 additions & 0 deletions doc/api.rst
@@ -0,0 +1,29 @@
=======================
``sourmash`` Python API
=======================

The primary programmatic way of interacting with ``sourmash`` is via
its Python API. (The core MinHash Python API closely mirrors the
underlying C++ code, but for now this is undocumented.)

.. contents::
:depth: 2

``Estimators``: basic MinHash sketch functionality
==================================================

.. automodule:: sourmash_lib
:members:

``SourmashSignature``: save and load MinHash sketches in YAML
=============================================================

.. automodule:: sourmash_lib.signature
:members:

``sourmash_lib.fig``: make plots and figures
============================================

.. automodule:: sourmash_lib.fig
:members:

3 changes: 3 additions & 0 deletions doc/command-line.rst
@@ -18,6 +18,9 @@ sourmash uses a subcommand syntax, so all commands start with
``sourmash`` followed by a subcommand specifying the action to be
taken.

.. contents::
:depth: 3

An example
==========

6 changes: 3 additions & 3 deletions doc/conf.py
@@ -19,7 +19,7 @@
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('..'))

# -- General configuration ------------------------------------------------

@@ -52,8 +52,8 @@

# General information about the project.
project = 'sourmash'
copyright = '2016, C. Titus Brown'
author = 'C. Titus Brown'
copyright = '2016, C. Titus Brown and Luiz Irber'
author = 'C. Titus Brown and Luiz Irber'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
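
The `sys.path` change in `doc/conf.py` is what allows the `automodule` directives in `doc/api.rst` to import `sourmash_lib` from the repository root when the docs are built from inside `doc/`. A minimal sketch of the relevant configuration, assuming `sphinx.ext.autodoc` is the extension in use (the `extensions` list is not shown in this diff, so that line is an assumption):

```python
import os
import sys

# conf.py lives in doc/, one level below the repository root, so '..'
# resolves to the directory that contains sourmash_lib/ when Sphinx is
# run from doc/.
repo_root = os.path.abspath('..')
sys.path.insert(0, repo_root)

# Assumption: autodoc is enabled so that automodule can import the package.
extensions = ['sphinx.ext.autodoc']
```

Putting the path at position 0 makes the checked-out source take precedence over any installed copy of the package, which is usually what you want when building docs for a development tree.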
37 changes: 35 additions & 2 deletions doc/index.rst
@@ -3,15 +3,48 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to sourmash's documentation!
====================================
Welcome to sourmash!
====================

sourmash is a command-line tool and Python library for computing
`MinHash sketches <https://en.wikipedia.org/wiki/MinHash>`__ from DNA
sequences, comparing them to each other, and plotting the results. This
allows you to estimate sequence similarity quickly and accurately.

Please see the `mash <http://mash.readthedocs.io/en/latest/>`__
software and the `mash paper (Ondov et al., 2016)
<http://biorxiv.org/content/early/2015/10/26/029827>`__ for background
information on how and why MinHash sketches work.

In brief,

* ``sourmash`` provides command line utilities for creating, comparing,
and searching MinHash sketches, as well as plotting and clustering
sketches by distance (see `the command-line docs <command-line.html>`__).

* ``sourmash`` supports saving, loading, and communication of MinHash
sketches via `YAML <http://yaml.org/>`__, a ~human-readable & editable
format.

* ``sourmash`` also has a simple Python API for interacting with sketches,
including support for online updating and querying of sketches
(see `the API docs <api.html>`__).

* ``sourmash`` isn't terribly slow, and relies on an underlying CPython
module.

* ``sourmash`` is developed `on GitHub
<https://github.com/dib-lab/sourmash>`__ and is freely and openly
available under the BSD 3-clause license.

Contents:
---------

.. toctree::
:maxdepth: 2

command-line
api


Indices and tables
13 changes: 12 additions & 1 deletion sourmash_lib/__init__.py
@@ -13,9 +13,19 @@
class Estimators(object):
"""
A simple bottom n-sketch MinHash implementation.
Usage::
E = Estimators(n=1000, ksize=31)
E.add_sequence(dna)
...
E.jaccard(other_E)
``Estimators`` supports the pickle protocol.
"""

def __init__(self, n=None, ksize=None, protein=False):
"Create a new MinHash estimator with size n and k-mer size ksize."
if n is None:
raise ValueError("n is required")
if ksize is None:
@@ -45,14 +55,15 @@ def __eq__(self, other):
return self.__getstate__() == other.__getstate__()

def add(self, kmer):
"Add kmer into sketch, keeping sketch sorted."
"Add kmer into sketch."
self.mh.add_sequence(kmer)

def add_sequence(self, seq, force=False):
"Sanitize and add a sequence to the sketch."
self.mh.add_sequence(seq, force)

def jaccard(self, other):
"Calculate Jaccard index of two sketches."
return self.mh.compare(other.mh)
similarity = jaccard
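
To make the `Estimators` docstrings concrete, here is a minimal pure-Python sketch of a bottom-n MinHash and one common way to estimate the Jaccard index from two sketches. It is an illustration only: the class name `BottomSketch`, the MD5-based hash, and the absence of reverse-complement handling are assumptions made for brevity, and none of this is the behavior of the underlying C++ code.

```python
import hashlib


def kmers(seq, ksize):
    """Yield every overlapping k-mer of length ksize in seq."""
    for i in range(len(seq) - ksize + 1):
        yield seq[i:i + ksize]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(kmer.encode('ascii')).digest()
    return int.from_bytes(digest[:8], 'big')


class BottomSketch(object):
    """Hypothetical bottom-n sketch: keep the n smallest distinct hashes."""

    def __init__(self, n, ksize):
        self.n = n
        self.ksize = ksize
        self.hashes = set()

    def add_sequence(self, seq):
        for kmer in kmers(seq.upper(), self.ksize):
            self.hashes.add(hash_kmer(kmer))
        # Trim back to the n smallest values; anything dropped here can
        # never re-enter the bottom n later.
        self.hashes = set(sorted(self.hashes)[:self.n])

    def jaccard(self, other):
        """Estimate Jaccard from the n smallest hashes of the union."""
        union_bottom = sorted(self.hashes | other.hashes)[:self.n]
        if not union_bottom:
            return 0.0
        shared = sum(1 for h in union_bottom
                     if h in self.hashes and h in other.hashes)
        return shared / float(len(union_bottom))


a = BottomSketch(n=100, ksize=3)
a.add_sequence('ACGTAC')
b = BottomSketch(n=100, ksize=3)
b.add_sequence('ACGTTT')
estimate = a.jaccard(b)
```

When `n` is larger than the number of distinct k-mers, as in the usage lines above, the sketch holds every hash and the estimate equals the exact Jaccard index of the two k-mer sets; with smaller `n` it becomes an approximation.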

13 changes: 9 additions & 4 deletions sourmash_lib/fig.py
@@ -1,25 +1,30 @@
#! /usr/bin/env python
"""
Plot things associated with the distance matrix+labels output by
'sourmash compare'.
Make plots using the distance matrix+labels output by ``sourmash compare``.
"""
try:
import numpy
import scipy
import pylab
import scipy.cluster.hierarchy as sch
except (RuntimeError, ImportError): # for tests, ignore.
except (RuntimeError, ImportError):
pass


def load_matrix_and_labels(basefile):
"""Load the comparison matrix and associated labels.
Returns a square numpy matrix & list of labels.
"""
D = numpy.load(open(basefile, 'rb'))
labeltext = [x.strip() for x in open(basefile + '.labels.txt')]
return (D, labeltext)


def plot_composite_matrix(D, labeltext, show_labels=True, show_indices=True,
vmax=1.0, vmin=0.0):
"""Build a composite plot showing dendrogram + distance matrix/heatmap.
Returns a matplotlib figure."""
if show_labels:
show_indices = True

5 changes: 3 additions & 2 deletions sourmash_lib/signature.py
@@ -8,6 +8,7 @@

SIGNATURE_VERSION=0.4


class SourmashSignature(object):
"Main class for signature information."

@@ -69,7 +70,7 @@ def similarity(self, other):


def load_signatures(data, select_ksize=None, ignore_md5sum=False):
"""Load a YAML file with signatures into classes.
"""Load a YAML string with signatures into classes.
Returns list of SourmashSignature objects.
"""
@@ -131,7 +132,7 @@ def _load_one_signature(sketch, email, name, filename, ignore_md5sum=False):


def save_signatures(siglist, fp=None):
"""Save multiple signatures into a YAML string."""
"Save multiple signatures into a YAML string (or into file handle 'fp')"
top_records = {}
for sig in siglist:
email, name, filename, sketch = sig.save()
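
The updated `save_signatures` docstring describes a common pattern: return the serialized text when no file handle is given, otherwise write to the handle. A minimal sketch of that pattern follows. It uses `json` from the standard library as a stand-in for YAML so the example has no third-party dependency, and `save_records` and its record layout are hypothetical rather than sourmash's own function. Whether the real function returns anything when `fp` is given is not shown in this diff; returning `None` here is an assumption.

```python
import io
import json


def save_records(records, fp=None):
    """Serialize records; write to fp if given, else return the string."""
    text = json.dumps(records, indent=2, sort_keys=True)
    if fp is not None:
        fp.write(text)
        return None
    return text


records = [{'name': 'sample1', 'ksize': 31}]

# No handle: the serialized text comes back as the return value.
as_string = save_records(records)

# Handle given: the same text is written to it instead.
buf = io.StringIO()
save_records(records, fp=buf)
```

Accepting an optional handle lets callers stream straight to a file or socket while still keeping the simple "give me a string" call for tests and interactive use.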