Output unassigned hashes from "gather" to same moltype #1151

Closed
wants to merge 1 commit into from

@olgabot olgabot commented Aug 6, 2020

Currently, unassigned hashes are output to a MinHash that is created without a moltype, so it falls back to the default, DNA. After running gather on some protein data and analyzing the output, it turns out the moltype of the input signature is not preserved. This (hopefully) fixes that!
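A minimal sketch of the problem described above, using a toy stand-in rather than the real sourmash class. `ToySketch` and `moltype_kwds_for` are hypothetical names, and the flag names (`is_protein`, `dayhoff`, `hp`) are an assumption about what a `moltype_kwds` dictionary might carry; the PR does not show how it is built.

```python
from dataclasses import dataclass, field


@dataclass
class ToySketch:
    """Hypothetical stand-in for a MinHash sketch; NOT the sourmash class."""
    ksize: int
    is_protein: bool = False
    dayhoff: bool = False
    hp: bool = False
    hashes: set = field(default_factory=set)

    @property
    def moltype(self) -> str:
        # With no flag set, the sketch is treated as DNA (the default).
        if self.is_protein:
            return "protein"
        if self.dayhoff:
            return "dayhoff"
        if self.hp:
            return "hp"
        return "DNA"


def moltype_kwds_for(sketch: ToySketch) -> dict:
    """Collect the molecule-type flags so they can be forwarded with **."""
    return dict(is_protein=sketch.is_protein,
                dayhoff=sketch.dayhoff,
                hp=sketch.hp)


query = ToySketch(ksize=21, is_protein=True, hashes={1, 2, 3})

# Built without the flags: silently falls back to the DNA default.
lost = ToySketch(ksize=query.ksize)

# Built with the flags forwarded: keeps the query's molecule type.
kept = ToySketch(ksize=query.ksize, **moltype_kwds_for(query))

print(lost.moltype, kept.moltype)
```

The point is only that any constructor call that omits the molecule-type flags yields a DNA sketch, whatever the input was.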

  • Is it mergeable?
  • make test Did it pass the tests?
  • make coverage Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

codecov bot commented Aug 6, 2020

Codecov Report

❗ No coverage uploaded for pull request base (latest@f4fac4b).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             latest    #1151   +/-   ##
=========================================
  Coverage          ?   92.55%           
=========================================
  Files             ?       71           
  Lines             ?     5705           
  Branches          ?        0           
=========================================
  Hits              ?     5280           
  Misses            ?      425           
  Partials          ?        0           

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f4fac4b...942dc1b.

  with_abundance = next_query.minhash.track_abundance
  e = MinHash(ksize=query.minhash.ksize, n=0, max_hash=new_max_hash,
-             track_abundance=with_abundance)
+             track_abundance=with_abundance, **moltype_kwds)
Contributor

hmm, I'm wondering if it might be possible to create one with a "e = query.minhash.copy_and_clear()" instead, which will clone the attributes of the query MinHash? The only thing I can see being problematic is that you probably want to use new_max_hash instead of the query.minhash.max_hash, in case the signature's been downsampled. Hmm.
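The trade-off in this comment can be sketched with a toy class: clone every parameter of the query sketch, then override only the hash ceiling. `ToySketch` and `empty_like` are hypothetical names, and the toy `copy_and_clear` only imitates the behaviour described in the comment (same attributes, no hashes); it is not the sourmash implementation.

```python
from dataclasses import dataclass, field, replace


@dataclass
class ToySketch:
    """Hypothetical stand-in for a MinHash sketch; NOT the sourmash class."""
    ksize: int
    max_hash: int = 0
    is_protein: bool = False
    track_abundance: bool = False
    hashes: set = field(default_factory=set)

    def copy_and_clear(self) -> "ToySketch":
        """Return a sketch with the same parameters and no hashes."""
        return replace(self, hashes=set())


def empty_like(sketch: ToySketch, max_hash: int) -> ToySketch:
    """Clone all parameters of `sketch`, overriding only max_hash."""
    return replace(sketch.copy_and_clear(), max_hash=max_hash)


query = ToySketch(ksize=10, max_hash=1000, is_protein=True,
                  hashes={5, 50, 500})

# Assumed value: the ceiling after the signature has been downsampled.
new_max_hash = 100

e = empty_like(query, new_max_hash)
print(e)
```

Cloning keeps the molecule type without listing each flag by hand, and the explicit override covers the downsampling concern raised in the comment.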

Contributor

it turns out this wasn't even necessary - next_query is actually the right thing to save directly! w00t!
see #1156
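To illustrate why the remaining query can be saved as-is, here is a toy greedy loop. It is not sourmash's gather implementation; `ToySketch`, `toy_gather`, and the dictionary-of-sets database are all hypothetical. Each round removes the best match's hashes from the query, so whatever is left at the end is exactly the unassigned hashes, and it still carries the query's molecule type.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToySketch:
    """Hypothetical stand-in for a signature's sketch; NOT the sourmash class."""
    moltype: str
    hashes: frozenset


def toy_gather(query: ToySketch, database: dict):
    """Greedy loop: take the best-overlapping entry, subtract it, repeat."""
    next_query = query
    matches = []
    while database:
        # Pick the database entry sharing the most hashes with what is left.
        best = max(database,
                   key=lambda name: len(database[name] & next_query.hashes))
        overlap = database[best] & next_query.hashes
        if not overlap:
            break  # nothing left in the query matches anything
        matches.append(best)
        # Derive the next query from the current one, so moltype is retained.
        next_query = replace(next_query, hashes=next_query.hashes - overlap)
    return matches, next_query


query = ToySketch("protein", frozenset({1, 2, 3, 4, 5, 6}))
database = {
    "a": frozenset({1, 2, 3}),
    "b": frozenset({3, 4}),
    "c": frozenset({9}),
}

matches, unassigned = toy_gather(query, database)
print(matches, sorted(unassigned.hashes), unassigned.moltype)
```

Because the leftover sketch is derived from the query rather than built from scratch, no molecule-type flags need to be passed around.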

@olgabot olgabot closed this Aug 10, 2020
@ctb ctb deleted the olgabot-patch-1 branch August 20, 2022 15:38