Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Erroneous f_unique_to_query calculation in Rust code. #3137

Closed
ctb opened this issue May 4, 2024 · 3 comments · Fixed by #3138
Closed

Erroneous f_unique_to_query calculation in Rust code. #3137

ctb opened this issue May 4, 2024 · 3 comments · Fixed by #3138
Labels

Comments

@ctb
Copy link
Contributor

ctb commented May 4, 2024

It looks like #2943 introduced a subtly broken calculation that uses the current size of the query metagenome as the denominator for the f_unique_to_query calculation:

let f_unique_to_query = match_size as f64 / query.size() as f64;

In order to match the Python version, this should use the original size of the query metagenome:

self.f_unique_to_query = (

Discovered over in sourmash-bio/sourmash_plugin_branchwater#318 when trying to debug discrepancies there ;)

@ctb ctb added the rust label May 4, 2024
@ctb
Copy link
Contributor Author

ctb commented May 4, 2024

I can't tell if the same calculation in linear.rs is problematic:

let f_unique_to_query = intersect_orig as f64 / query.size() as f64;

It looks like here maybe query is actually orig_query? If so I would suggest renaming it in the source code.

@ctb
Copy link
Contributor Author

ctb commented May 4, 2024

And, also, we should have tests for this on the Rust side in sourmash core. Right now we're relying on catching it over in the branchwater plugin, which is, well, roundabout and also a bit unreliable!

@ctb
Copy link
Contributor Author

ctb commented May 9, 2024

bluegenes pushed a commit that referenced this issue May 10, 2024
This PR fixes an issue introduced in #2943 where we introduced a subtly
broken calculation that uses the _current_ size of the query metagenome
as the denominator for the `f_unique_to_query` calculation.

Fixes #3137

This PR also adds some commented-out test code that demonstrates
#3139 /
sourmash-bio/sourmash_plugin_branchwater#322.
That's something I haven't been able to debug, so I'd suggest fixing
that independently - I'd rather fix _a_ problem _now_, rather than
waiting until we can fix _multiple_ problems at some later indeterminate
time :).

## Notes

- [x] do we need to fix same problem in `linear.rs`? or just rename
things per #3137?
- [x] we should add some tests for this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant