sourmash-utils repo? #201
Comments
+1 sourmash-utils repo
I think
OK, I went ahead and created dib-lab/sourmash-utils, and a PR for adding the As an example, I set up the
As sourmash expands to consume many (but not all!) things, we are building many a utility script. For example,
subtract.py:
https://gist.github.com/ctb/a3aad5020039b014b26a4d7c2b097465
some hash-clustering-fu that we are applying to TARA and soil:
https://github.com/ctb/tara-sourmash/blob/master/sourmash-sigs/abundtrim/README.md
downsampling of --scaled signatures:
https://github.com/ctb/tara-sourmash/blob/master/pick-k-and-downsample.py
tetramer nucleotide analysis:
https://github.com/ngs-docs/2017-ucsc-metagenomics/blob/clustering/files/sourmash_tetramer-cluster-extract.ipynb
I am not sure how to manage all of this but it seems increasingly clear that having it all in different repos is confusing :). Shall we start a sourmash-utils repo that has somewhat looser testing and contribution requirements than sourmash itself?
A few other thoughts:
I want to avoid adding too many things to the ‘sourmash’ command line itself, e.g. ‘sourmash signature downsample’ or something like it will eventually be there but it’s probably a niche function for the moment;
this could be viewed as a way to enrich and flesh out the internal Python API of sourmash, which has been simplified recently but could use more use cases;
we also need to do something like protocols or recipes for sourmash; this would be something different, more like khmer/sandbox/ (which is, yes, a festering pile of cat turds, but that’s scientific software development for you :).
the lack of fast sourmash releases is a problem here; we'd probably have a sourmash-utils branch in the sourmash repo so that we could spot-fix various problems as we discover them, before they make their way into the master branch. This seems preferable to releasing new versions of sourmash quickly, which (at least with our current software development approach) would lead to lots of buggy or ill-considered sourmash releases, not to mention a fair amount of overhead.
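To make the kind of script under discussion concrete, here is a minimal sketch of two of the utilities mentioned above (subtracting one set of hashes from another, and downsampling a scaled signature), written in plain Python on sets of integers. It deliberately does not call the sourmash API: the function names are hypothetical, and the cutoff convention (64-bit hashes, keep values at or below 2**64 divided by the scaled factor) is an assumption for illustration, not a statement of what the linked scripts do.

```python
# Illustrative only: hypothetical helpers, not part of sourmash.
MAX_HASH_SPACE = 2**64  # assumption: hash values are unsigned 64-bit integers


def max_hash_for_scaled(scaled):
    """Return the assumed largest hash value retained at a given scaled factor."""
    if scaled < 1:
        raise ValueError("scaled must be >= 1")
    return MAX_HASH_SPACE // scaled


def downsample_scaled(hashes, old_scaled, new_scaled):
    """Downsample a set of hashes from old_scaled to a coarser new_scaled.

    Only moving to a larger scaled value is possible, because hashes that
    were discarded at the finer setting cannot be recovered.
    """
    if new_scaled < old_scaled:
        raise ValueError("cannot downsample to a smaller scaled value")
    cutoff = max_hash_for_scaled(new_scaled)
    return {h for h in hashes if h <= cutoff}


def subtract_hashes(hashes, to_remove):
    """Return the hashes that do not appear in to_remove (set difference)."""
    return set(hashes) - set(to_remove)


if __name__ == "__main__":
    example = {1, 2**60, 2**63, 2**64 - 1}
    kept = downsample_scaled(example, old_scaled=1, new_scaled=4)
    print(sorted(kept))
    print(sorted(subtract_hashes(kept, {1})))
```

Small, self-contained functions like these are easy to test in isolation, which fits the idea of a utils repo with looser requirements while still exercising use cases for the internal Python API.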