[MRG] revise sourmash tax user experience and output formats #2158

Merged on Aug 2, 2022 (31 commits)

Conversation

ctb
Contributor

@ctb ctb commented Jul 29, 2022

Miscellaneous sourmash tax UX tidying, motivated by STAMPS 2022 tutorial.

Please see the updated tutorial, added to this PR, here.

This PR:

  • adds -F as a synonym for --output-format to tax metagenome and tax genome
  • adds -F human as a type of output format to tax metagenome and tax genome
  • adds -F lineage_csv as output format for tax metagenome (see #2153, "tax genome should directly support lineage CSVs as an output format")
  • quotes output filenames in error messages
  • only outputs information about missing lineages when there are some :)
  • switches to using forward quotes instead of backquotes for format output message
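
As a rough illustration of the first two bullets, here is a minimal argparse sketch of how a short flag can act as a synonym for a long option with a fixed set of choices. The program name, the `FORMATS` list, and the default are assumptions made for this example; this is not sourmash's actual parser code.

```python
import argparse

# Hypothetical list of formats, for illustration only; the real set of
# formats accepted by each subcommand may differ.
FORMATS = ["csv_summary", "krona", "human", "lineage_csv"]


def build_parser():
    """Build a toy parser where -F and --output-format are the same option."""
    parser = argparse.ArgumentParser(prog="tax-demo")
    parser.add_argument(
        "-F", "--output-format",
        dest="output_format",
        choices=FORMATS,
        default="csv_summary",  # assumed default for this sketch
        help="output format to write",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-F", "human"])
    print(args.output_format)
```

Because both spellings are registered in a single `add_argument` call, they share one destination, one default, and one set of allowed values.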

Fixes #2153
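
For readers unfamiliar with lineage CSVs, the sketch below shows one way a semicolon-separated lineage string could be split into one column per rank. The column names in `RANKS` and the helper `lineage_rows_to_csv` are assumptions for illustration only and are not taken from the sourmash code in this PR.

```python
import csv
import io

# Assumed rank column names for this sketch; the real header may differ.
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def lineage_rows_to_csv(rows):
    """Convert (ident, 'a;b;c') pairs into CSV text with one column per rank."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ident"] + RANKS)
    for ident, lineage in rows:
        names = lineage.split(";")
        # Pad lineages that stop above species so every row has all columns.
        names += [""] * (len(RANKS) - len(names))
        writer.writerow([ident] + names[:len(RANKS)])
    return out.getvalue()


if __name__ == "__main__":
    print(lineage_rows_to_csv([("MAG3_1", "d__Bacteria;p__Bacteroidota")]))
```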

TODO:

  • add, like, tests, maaaan
  • add headers
  • create issue to change default output to human for sourmash v5?
  • add (adjusted) tutorial?
  • 'human' output filenames - should we add anything to them, similarly to 'krona.tsv'?
  • revisit hackmd formatting...

@codecov

codecov bot commented Jul 29, 2022

Codecov Report

Merging #2158 (a83ff3b) into latest (2f38f6c) will increase coverage by 0.08%.
The diff coverage is 98.73%.

@@            Coverage Diff             @@
##           latest    #2158      +/-   ##
==========================================
+ Coverage   84.34%   84.43%   +0.08%     
==========================================
  Files         130      130              
  Lines       15320    15370      +50     
  Branches     2176     2192      +16     
==========================================
+ Hits        12922    12978      +56     
+ Misses       2095     2092       -3     
+ Partials      303      300       -3     
Flag     Coverage Δ
python   91.81% <98.73%> (+0.09%) ⬆️
rust     65.29% <ø> (ø)

Impacted Files                        Coverage Δ
src/sourmash/tax/__main__.py          88.47% <95.45%> (+0.49%) ⬆️
src/sourmash/cli/tax/annotate.py      100.00% <100.00%> (+10.52%) ⬆️
src/sourmash/cli/tax/genome.py        90.32% <100.00%> (+6.45%) ⬆️
src/sourmash/cli/tax/metagenome.py    89.28% <100.00%> (+7.14%) ⬆️
src/sourmash/tax/tax_utils.py         98.08% <100.00%> (+0.13%) ⬆️

@ctb
Contributor Author

ctb commented Jul 29, 2022

@bluegenes what do you think of the new human-readable format?

I made a new copy of the STAMPS 2022 tutorial and updated the sourmash tax commands there - https://hackmd.io/vr6BT3mdQKWdlXEqcLUnZg?view. The necessary signatures are attached to this comment in a zip, if you want to give it a try!
stamps2022-files.zip

@bluegenes
Contributor

column headers would be helpful for printed output! otherwise looking better!

src/sourmash/tax/__main__.py (outdated, resolved)
src/sourmash/tax/__main__.py (outdated, resolved)
src/sourmash/tax/tax_utils.py (outdated, resolved)

@ctb
Contributor Author

ctb commented Jul 30, 2022

@bluegenes should we use ANI in the human-readable output? Or add ANI in there? Seems like a good idea...

@bluegenes
Contributor

> @bluegenes should we use ANI in the human-readable output? Or add ANI in there? Seems like a good idea...

At minimum, we should provide it for tax genome results!

@ctb
Contributor Author

ctb commented Jul 30, 2022

done! example output -

sample name    proportion   ANI    lineage
-----------    ----------   ---    -------
MAG3_1             5.3%     91.0%  d__Bacteria;p__Bacteroidota;c__Chlorobia;o__Chlorobiales;f__Chlorobiaceae;g__Prosthecochloris;s__Prosthecochloris vibrioformis
MAG2_1             5.0%     90.8%  d__Bacteria;p__Bacteroidota;c__Chlorobia;o__Chlorobiales;f__Chlorobiaceae;g__Chlorobaculum;s__Chlorobaculum parvum_B
MAG1_1             1.1%     86.5%  d__Bacteria;p__Patescibacteria;c__Paceibacteria;o__Moranbacterales;f__UBA1568;g__JAAXTX01;s__JAAXTX01 sp013334245
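
A table like the one above can be produced with plain Python format specifications. The following is a minimal sketch only; `format_human`, its column widths, and the input tuple layout are hypothetical and do not reproduce the sourmash implementation or its exact spacing.

```python
def format_human(rows):
    """Render (name, fraction, ani, lineage) tuples as an aligned text table.

    `fraction` and `ani` are floats in [0, 1]; they are shown as percentages.
    Column widths are arbitrary choices for this sketch.
    """
    lines = [
        f"{'sample name':<14} {'proportion':<12} {'ANI':<6} lineage",
        f"{'-----------':<14} {'----------':<12} {'---':<6} -------",
    ]
    for name, fraction, ani, lineage in rows:
        # '.1%' multiplies by 100 and appends a percent sign.
        lines.append(f"{name:<14} {fraction:>9.1%}    {ani:<6.1%} {lineage}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [("MAG3_1", 0.053, 0.910, "d__Bacteria;p__Bacteroidota")]
    print(format_human(demo))
```

Keeping the lineage as the last column means long lineage strings never disturb the alignment of the numeric columns.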

@ctb changed the title from "[WIP] revisit sourmash tax user experience and output formats" to "[MRG] revise sourmash tax user experience and output formats" on Jul 31, 2022

@ctb
Contributor Author

ctb commented Jul 31, 2022

Ready for review & merge, but no hurry @bluegenes

@bluegenes
Contributor

> sample name proportion ANI lineage

should we call this cANI (for "containment ANI"), as I'm using for the writeup?

doc/tutorial-lemonade.md (outdated, resolved)

@ctb
Contributor Author

ctb commented Aug 1, 2022

Fixed - ready for review & merge @bluegenes !

doc/tutorial-lemonade.md (outdated, resolved)
doc/tutorial-lemonade.md (outdated, resolved)
doc/tutorial-lemonade.md (outdated, resolved)
doc/tutorial-lemonade.md (resolved)
doc/tutorial-lemonade.md (outdated, resolved)
doc/tutorial-lemonade.md (outdated, resolved)
doc/tutorial-lemonade.md (outdated, resolved)
doc/tutorial-lemonade.md (outdated, resolved)
doc/tutorial-lemonade.md (outdated, resolved)
src/sourmash/cli/tax/genome.py (resolved)
tests/test_tax.py (outdated, resolved)

Contributor

@bluegenes bluegenes left a comment

human readable format is v helpful and tutorial is great!

some suggested changes, otherwise good

ctb and others added 2 commits August 1, 2022 15:48
Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>
Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>
@ctb
Contributor Author

ctb commented Aug 1, 2022

@bluegenes re the two remaining comments about classifying at a specific rank, I'm worried I don't understand the code well enough to be sure of what's going on :). I'll take a look in more detail and either resolve the comments or amplify on them with a clearer question before I merge, unless you get a chance to dig into the code and reassure me that I'm not messing anything up here.

Development

Successfully merging this pull request may close these issues.

tax genome should directly support lineage CSVs as an output format
2 participants