sourmash-utils repo? #201

Open
ctb opened this issue May 3, 2017 · 3 comments

Comments

@ctb
Contributor

ctb commented May 3, 2017

As sourmash expands to consume many (but not all!) things, we are building many a utility script. For example,

subtract.py —
https://gist.github.com/ctb/a3aad5020039b014b26a4d7c2b097465

some hash-clustering-fu that we are applying to TARA and soil --
https://github.com/ctb/tara-sourmash/blob/master/sourmash-sigs/abundtrim/README.md

downsampling of --scaled signatures --
https://github.com/ctb/tara-sourmash/blob/master/pick-k-and-downsample.py

tetramer nucleotide analysis:
https://github.com/ngs-docs/2017-ucsc-metagenomics/blob/clustering/files/sourmash_tetramer-cluster-extract.ipynb
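To make the downsampling item above concrete, here is a minimal sketch in plain Python. It assumes that a signature built with a given `scaled` value keeps only the hashes that fall below roughly 2^64 / scaled, so downsampling to a larger `scaled` value just means discarding hashes above the new cutoff. The function names are hypothetical and this is not the sourmash API; the exact rounding and boundary handling in sourmash may differ.

```python
# Illustrative only: plain sets of integers stand in for signature hashes.
MAX_HASH_SPACE = 2**64  # assumption: 64-bit hash values


def max_hash_for_scaled(scaled):
    """Approximate cutoff below which hashes are kept for a scaled value."""
    if scaled < 1:
        raise ValueError("scaled must be >= 1")
    return MAX_HASH_SPACE // scaled


def downsample(hashes, old_scaled, new_scaled):
    """Keep only the hashes that would survive at the coarser new_scaled."""
    if new_scaled < old_scaled:
        # Hashes dropped at old_scaled cannot be recovered.
        raise ValueError("cannot downsample to a smaller scaled value")
    cutoff = max_hash_for_scaled(new_scaled)
    return {h for h in hashes if h < cutoff}
```

Because the retained set only ever shrinks, two signatures built at different `scaled` values can be compared after downsampling both to the larger of the two values.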

I am not sure how to manage all of this but it seems increasingly clear that having it all in different repos is confusing :). Shall we start a sourmash-utils repo that has somewhat looser testing and contribution requirements than sourmash itself?

A few other thoughts —

  • I want to avoid adding too many things to the ‘sourmash’ command line itself, e.g. ‘sourmash signature downsample’ or something like it will eventually be there but it’s probably a niche function for the moment;

  • this could be viewed as a way to enrich and flesh out the internal Python API of sourmash, which has been simplified recently but could use more use cases;

  • we also need to do something like protocols or recipes for sourmash; this would be something different, more like khmer/sandbox/ (which is, yes, a festering pile of cat turds, but that’s scientific software development for you :).

  • the lack of fast sourmash releases is a problem here; we'd probably have a sourmash-utils branch in the sourmash repo so that we could spot-fix various problems as we discover them, and before they make their way into the master branch. This seems preferable to releasing new versions of sourmash quickly, which (at least with our current software development approach) would lead to lots of buggy or ill-considered sourmash releases, not to mention a fair amount of overhead.

@taylorreiter
Contributor

+1 sourmash-utils repo

@luizirber
Member

I think a sourmash-utils repo is a good idea; we can still tie it into sourmash as a subcommand (sourmash utils <scriptname>).

@luizirber
Member

luizirber commented May 17, 2017

OK, I went ahead and created dib-lab/sourmash-utils, and a PR for adding the sourmash utils subcommand to sourmash. If sourmash-utils is not installed, the command will not be available in sourmash.

As an example, I set up the subtract script there. Still need to do some boilerplate code fixes, but it works =P

$ sourmash-utils also works.
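For anyone curious how a subcommand can appear only when an optional package is installed, here is a rough sketch of the general pattern using `argparse` and `importlib`. It is not the code from the PR: the module name `sourmash_utils`, the `build_parser` function, and the placeholder `compute` entry are assumptions made for illustration.

```python
import argparse
import importlib.util


def build_parser(optional_module="sourmash_utils"):
    """Build a CLI parser; add 'utils' only if the optional package exists.

    The module name is a hypothetical default, not taken from the PR.
    """
    parser = argparse.ArgumentParser(prog="sourmash")
    subparsers = parser.add_subparsers(dest="command")

    # Placeholder for the commands that are always available.
    subparsers.add_parser("compute", help="built-in command (placeholder)")

    # find_spec returns None when a top-level module cannot be found,
    # so the subcommand is simply never registered in that case.
    if importlib.util.find_spec(optional_module) is not None:
        utils = subparsers.add_parser(
            "utils", help="run a script from the optional package"
        )
        utils.add_argument("scriptname")

    return parser
```

With this shape, `sourmash utils subtract` parses normally when the package is present, and `argparse` reports an invalid command when it is not.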
