In space, no one can hear you miao. And graphs!

spacegraphcats

Explore large, annoying graphs using hierarchies of dominating sets - because in space, no one can hear you miao!

This is a collaboration between the Theory In Practice lab at NC State and the Lab for Data Intensive Biology at UC Davis, generously supported by the Moore Foundation's Data Driven Discovery Initiative.

Installation and execution quickstart

See installation instructions and the run guide.

For help or support with this software, please file an issue on GitHub. Thank you!

Notable dependencies

spacegraphcats uses code from BBHash, a C++ library for building minimal perfect hash functions (Guillaume Rizk, Antoine Limasset, Rayan Chikhi; see Limasset et al., 2017, arXiv), as wrapped by pybbhash.

spacegraphcats also uses functionality from khmer and sourmash.

Citation information

This is pre-publication code; a manuscript is in preparation. If you wish to use and cite it, please contact the authors for the current citation information.

Pointers to interesting code

Interesting algorithms

The rdomset code for efficiently calculating a dominating set of a graph at a given radius R is in spacegraphcats/catlas/rdomset.py.
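
To make the idea of a radius-R dominating set concrete, here is a minimal sketch. It is a naive greedy construction, not the algorithm implemented in rdomset.py, and the names `ball`, `greedy_rdomset`, and `is_rdominating` are hypothetical helpers introduced only for this illustration.

```python
from collections import deque


def ball(adj, source, radius):
    """Return the set of nodes within `radius` hops of `source` (plain BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand past the radius
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


def greedy_rdomset(adj, radius):
    """Naive greedy sketch: pick any undominated node, mark its ball as dominated."""
    undominated = set(adj)
    domset = []
    for node in sorted(adj):
        if node in undominated:
            domset.append(node)
            undominated -= ball(adj, node, radius)
    return domset


def is_rdominating(adj, domset, radius):
    """Check that every node lies within `radius` hops of some dominator."""
    covered = set()
    for dominator in domset:
        covered |= ball(adj, dominator, radius)
    return covered == set(adj)


# Toy input (an assumption for illustration): the path graph 0-1-2-3-4-5-6.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
domset_r1 = greedy_rdomset(path, 1)
domset_r2 = greedy_rdomset(path, 2)
```

On this toy path a larger radius yields a smaller dominating set, which is the property a hierarchy of dominating sets relies on.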

The graph denoising code for removing low-abundance pendants from BCALM cDBGs is in the function contract_degree_two in cdbg/bcalm_to_gxt.py.
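
As a rough illustration of pendant removal, the sketch below drops degree-one nodes whose abundance falls below a threshold. It is a single pass over a toy adjacency dict, not the code in bcalm_to_gxt.py; the function name, the `min_abundance` parameter, and the data layout are all assumptions, and any follow-up contraction of degree-two nodes is not shown.

```python
def remove_low_abundance_pendants(adj, abundance, min_abundance):
    """Drop degree-one nodes with abundance below `min_abundance`.

    `adj` maps node -> set of neighbors; `abundance` maps node -> float.
    Returns the pruned adjacency dict and the set of removed nodes.
    """
    pendants = {
        node
        for node, nbrs in adj.items()
        if len(nbrs) == 1 and abundance[node] < min_abundance
    }
    pruned = {
        node: nbrs - pendants
        for node, nbrs in adj.items()
        if node not in pendants
    }
    return pruned, pendants


# Toy star graph (hypothetical data): node 1 in the middle, three pendants.
toy_adj = {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1}}
toy_abundance = {0: 10.0, 1: 12.0, 2: 1.0, 3: 9.0}
pruned, removed = remove_low_abundance_pendants(toy_adj, toy_abundance, 2.0)
```

Only node 2 is both a pendant and below the threshold here, so it is the only node removed.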

Part of the indexPieces code for indexing cDBG nodes by dominating nodes is in index/index_contigs_by_kmer.py. The remainder is implemented in search, below.
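
One way to picture grouping cDBG nodes under dominating nodes is a multi-source breadth-first search that hands each node to the first dominator to reach it. This is an assumed illustration of the concept, not what index_contigs_by_kmer.py does; `assign_to_dominators` and the toy graph are hypothetical.

```python
from collections import deque


def assign_to_dominators(adj, domset):
    """Group nodes by the dominator that reaches them first in a multi-source BFS."""
    owner = {d: d for d in domset}
    queue = deque(domset)
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in owner:
                owner[nbr] = owner[node]
                queue.append(nbr)
    pieces = {d: [] for d in domset}
    for node, dominator in owner.items():
        pieces[dominator].append(node)
    return {d: sorted(nodes) for d, nodes in pieces.items()}


# Toy input (an assumption): the path 0-1-2-3-4-5-6 with dominators 0, 3 and 6.
path_adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
pieces = assign_to_dominators(path_adj, [0, 3, 6])
```

Each node ends up in exactly one piece, so the pieces partition the graph.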

The search code for extracting query neighborhoods is in search/extract_nodes_by_query.py; see especially the call to kmer_idx.get_match_counts(...).
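
The general idea of counting query k-mer matches per cDBG node can be sketched with a plain dictionary. This is a toy stand-in, not the signature or behavior of kmer_idx.get_match_counts; the helper names are hypothetical, reverse complements are ignored, and the real index is built on minimal perfect hashing rather than a Python dict.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of `seq`, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_kmer_index(unitigs, k):
    """Map each k-mer to the id of the unitig (cDBG node) that contains it."""
    index = {}
    for node_id, seq in unitigs.items():
        for kmer in kmers(seq, k):
            index[kmer] = node_id
    return index


def count_matches(index, query, k):
    """Count, per cDBG node, how many query k-mers are found in the index."""
    counts = Counter()
    for kmer in kmers(query, k):
        node_id = index.get(kmer)
        if node_id is not None:
            counts[node_id] += 1
    return counts


# Hypothetical toy data: two unitigs and one query sequence, k = 3.
toy_unitigs = {0: "ACGTA", 1: "TTGCA"}
toy_index = build_kmer_index(toy_unitigs, 3)
match_counts = count_matches(toy_index, "ACGTTGC", 3)
```

Nodes with a nonzero count are the ones a query touches; a neighborhood search would then expand from those nodes.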

Interesting library functionality

Code for indexing large FASTQ/FASTA read files by cDBG unitig is available in cdbg/label_cdbg.py. Code for extracting the reads corresponding to individual unitigs from BGZF files is in the function get_reads_by_cdbg in search/search_utils.py.
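
The offset-based retrieval idea can be sketched on an uncompressed, in-memory file. This is not the code in label_cdbg.py or get_reads_by_cdbg: the function names are hypothetical, the read-to-unitig mapping is assumed to be given, records are assumed to be single-line FASTA, and plain byte offsets stand in for the block-based offsets a BGZF file would need.

```python
import io


def index_read_offsets(handle, read_to_node):
    """Map each unitig id to the byte offsets of the FASTA records assigned to it."""
    offsets = {}
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b">"):
            name = line[1:].split()[0].decode()
            node = read_to_node.get(name)
            if node is not None:
                offsets.setdefault(node, []).append(pos)
    return offsets


def get_reads(handle, offsets, node):
    """Yield (name, sequence) for each read indexed under `node`, by seeking."""
    for pos in offsets.get(node, []):
        handle.seek(pos)
        header = handle.readline().decode().strip()
        sequence = handle.readline().decode().strip()
        yield header[1:], sequence


# Hypothetical toy data: three reads, two of which belong to unitig 0.
reads_file = io.BytesIO(b">r1\nACGT\n>r2\nTTTT\n>r3\nGGGG\n")
read_to_node = {"r1": 0, "r2": 1, "r3": 0}
read_offsets = index_read_offsets(reads_file, read_to_node)
unitig0_reads = list(get_reads(reads_file, read_offsets, 0))
```

Storing offsets rather than sequences keeps the index small, and seeking avoids rescanning the whole read file for each query.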