explain p_match and p_query in sourmash documentation #1289
from @bluegenes in an e-mail response -
These sourmash issues are directly related to your #2 and might also be useful: |
@ctb This question is motivated by unexpected values of overlap, as seen below:
The last 2 entries have the same p_query, but the overlap seems to be proportional to p_match, which does not make sense to me. |
The documentation here (appendix A, on sourmash gather) and here (on abundance weighting) would probably be the place to look. Specific suggestions for changes would be very welcome!

The most likely explanation for the above situation is that the results are from query signatures computed with

code references

The (more optimized, hence ugly/unreadable) code for this is in the

    f_match = len(intersect_mh) / len(found_mh)
    f_orig_query = len(intersect_orig_mh) / orig_query_len

The code in

    # f_orig_query is the containment of the query by the match.
    # (note, this only works because containment is 100% in combined).
    assert approx_equal(combined_sig.contained_by(match), f_orig_query)

but it's only tested for non-abund signatures at the moment. It would be a great addition if someone were to do something similar with abund signatures. |
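To make the two ratios quoted above concrete, here is a minimal sketch that uses plain Python sets as stand-ins for sketches of hashes. It is not the sourmash implementation: the function name `containment_fractions` and the toy hash values are hypothetical, and it considers a single match against the original query, ignoring the iterative subtraction that gather performs between matches.

```python
def containment_fractions(query_hashes, match_hashes):
    """Return (f_match, f_orig_query) for one match against one query.

    Hypothetical helper for illustration only; plain sets stand in for
    the MinHash objects used in the snippets quoted above.
    """
    intersect = query_hashes & match_hashes
    # fraction of the match that is found in the query
    f_match = len(intersect) / len(match_hashes)
    # fraction of the original query that is covered by the match
    f_orig_query = len(intersect) / len(query_hashes)
    return f_match, f_orig_query


# Toy data: a 10-hash query and a 20-hash match sharing 4 hashes (6, 7, 8, 9).
query = set(range(10))
match = set(range(6, 26))

f_match, f_orig_query = containment_fractions(query, match)
print(f"f_match={f_match:.2f} f_orig_query={f_orig_query:.2f}")
```

Both fractions share the same numerator (the number of shared hashes) and differ only in the denominator, so the same overlap can look large relative to a small query and small relative to a large match.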
Oh, and the code for what's actually output by the sourmash CLI for p_match when you run sourmash gather is in

    pct_query = '{:.1f}%'.format(result.f_unique_weighted*100)

so it does in fact use the |
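As a rough illustration of why a weighted value can differ from a flat one, the sketch below formats a percentage the same way as the `pct_query` line quoted above. The weighting shown here (summed abundance of matched hashes divided by total query abundance) is an assumption about what an abundance-weighted fraction means, not a copy of the sourmash code; `weighted_fraction` and the toy abundances are hypothetical.

```python
def weighted_fraction(query_abunds, matched_hashes):
    """Abundance-weighted fraction of the query covered by matched_hashes.

    Assumed definition for illustration: sum of abundances of the matched
    query hashes divided by the sum of all query abundances.
    """
    total = sum(query_abunds.values())
    matched = sum(a for h, a in query_abunds.items() if h in matched_hashes)
    return matched / total


# Toy query: hash -> abundance. Two hashes are abundant, two are rare.
query_abunds = {1: 10, 2: 10, 3: 1, 4: 1}
matched = {1, 2}

# Weighted: 20 of 22 total abundance units are covered by the match.
f_unique_weighted = weighted_fraction(query_abunds, matched)
# Unweighted: 2 of 4 distinct hashes are covered by the match.
f_unique_flat = len(matched & set(query_abunds)) / len(query_abunds)

pct_query = '{:.1f}%'.format(f_unique_weighted * 100)
pct_query_flat = '{:.1f}%'.format(f_unique_flat * 100)
print(pct_query, pct_query_flat)  # 90.9% 50.0%
```

In this toy case the same match accounts for half of the distinct query hashes but about 91% of the query by abundance, which is the kind of gap that can appear between runs with and without abundance tracking.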
Regarding the documentation structure, I have some specific suggestions:
|
I think I am using 2 samples (sampleA.fq and sampleB.fq). Each one has 25k reads.
Then queried the metagenome against the 2 samples with and without
The output with abund tracking
The output with
|
So far, I can not find a clear definition to |
ref #1227 |
I just wrote the following up for #2184. Comments welcome!

Appendix C: sourmash gather output examples

Below we show two real gather analyses done with a mock metagenome,

sourmash gather with a query containing abundance information
sourmash gather with the same query, ignoring abundances
Notes and comparisons

There are a few interesting things to point out about the above output:
Last but not least, something interesting is going on here with strains. Consider a few more details:
What's up?! What's happening here is that

The main things to keep in mind for gather are these:
We know it's confusing but it's the best output we've been able to |
it's not easy to find!