Optional Boundary Inclusiveness for Within Queries#293
Closed
cbueth wants to merge 41 commits intosdd:masterfrom
Closed
Optional Boundary Inclusiveness for Within Queries#293cbueth wants to merge 41 commits intosdd:masterfrom
cbueth wants to merge 41 commits intosdd:masterfrom
Conversation
…ded in Rust 1.54 instead
fixes bug where nearest_n_within accessed self.content_items instead of remainder_items for remainder elements, causing incorrect results when dataset size % CHUNK_SIZE != 0. Also removed unnecessary unsafe code in best_n_within. Signed-off-by: Markus Zoppelt <markus.zoppelt@helsing.ai>
tests nearest_n_within with size-33 dataset to verify items in remainder region are found correctly. Before the fix, this would access self.content_items[0] instead of remainder_items[0], returning wrong items. Signed-off-by: Markus Zoppelt <markus.zoppelt@helsing.ai>
If leaf_items.len() exceeds u32::MAX (~4.3 billion), this silently truncates. For datasets with billions of points, this is realistic and causes severe corruption.
* release-plz checkout depth fixed so that full changelogs are generated * add commitlint with conventional commits config
…thin_unsorted_iter within_unsorted_iter is modified to decouple the lifetime of the iterator from that of the query by performing a generally very cheap copy just once at the start of the query
…lidean distance metrics
- deprecate `rd_update` with `D::accumulate` for consistent handling of sum-based and max-based metrics - conditional logic for SIMD (L1/L2) and general L∞ - differentiate distance accumulation behaviour
- integration `nearest_n` tests (Chebyshev, Manhattan, SquaredEuclidean).
…ceMetric` doc, add Gaussian scenario to tests
… trait - improve test coverage
…metrics, clarify usage scenarios
…h parameterised power
- add flag into query logic - add test `test_within_squared_euclidean` for all metrics
03d0166 to
7b947a6
Compare
Author
|
Closed in favor of #294. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
TLDR; I have implemented the ability to toggle boundary inclusiveness for
withinqueries across all KD-tree variants in Kiddo (Mutable,Immutable,Hybrid, andFixed).Description
This PR introduces the option for non-inclusive boundary searches in within queries. This change is motivated by requirements in certain information-theoretic estimators where a strict distance less than the radius is necessary (KSG: Type I vs Type 2 https://doi.org/10.1103/PhysRevE.69.066138,
distance < radiusis required for one of them). By generalizing the internal logic, we avoid code duplication while maintaining the high performance of Kiddo, especially in the leaf-level SIMD loops.The implementation propagates an inclusive flag through the search macros for Mutable, Immutable, Hybrid, and Fixed trees. This is consistent with the findings about prunings and the changes of PR #290 & #291. New
*_with_conditionmethods provide this flexibility while keeping the standard API backward compatible. So as #290 would change edge cases, with this PRs changes the user can choose to include or not to include points at exactly the maximal distance. If you want, please suggest an alternative name or API.A full suite of tests has been added to verify boundary behavior across all tree types and distance metrics. One thing to keep in mind is whether we should eventually move towards a more structured
QueryOptionsapproach if more search parameters are added in the future.Changes:
inclusive: boolflag. This flag is propagated through the query stack.*_with_conditionvariants:within->within_with_conditionwithin_unsorted->within_unsorted_with_conditionwithin_unsorted_iter->within_unsorted_iter_with_conditionnearest_n_within->nearest_n_within_with_condition