Skip to content

Optional Boundary Inclusiveness for Within Queries#293

Closed
cbueth wants to merge 41 commits intosdd:masterfrom
cbueth:feat/optional-boundary-inclusiveness
Closed

Optional Boundary Inclusiveness for Within Queries#293
cbueth wants to merge 41 commits intosdd:masterfrom
cbueth:feat/optional-boundary-inclusiveness

Conversation

@cbueth
Copy link

@cbueth cbueth commented Feb 26, 2026

TLDR; I have implemented the ability to toggle boundary inclusiveness for within queries across all KD-tree variants in Kiddo (Mutable, Immutable, Hybrid, and Fixed).

Description

This PR introduces the option for non-inclusive boundary searches in within queries. This change is motivated by requirements in certain information-theoretic estimators where a strict distance less than the radius is necessary (KSG: Type I vs Type 2 https://doi.org/10.1103/PhysRevE.69.066138, distance < radius is required for one of them). By generalizing the internal logic, we avoid code duplication while maintaining the high performance of Kiddo, especially in the leaf-level SIMD loops.

The implementation propagates an inclusive flag through the search macros for Mutable, Immutable, Hybrid, and Fixed trees. This is consistent with the findings about prunings and the changes of PR #290 & #291. New *_with_condition methods provide this flexibility while keeping the standard API backward compatible. So as #290 would change edge cases, with this PRs changes the user can choose to include or not to include points at exactly the maximal distance. If you want, please suggest an alternative name or API.

A full suite of tests has been added to verify boundary behavior across all tree types and distance metrics. One thing to keep in mind is whether we should eventually move towards a more structured QueryOptions approach if more search parameters are added in the future.

Changes:

  • I've updated the internal search macros and leaf-level distance checks to support an inclusive: bool flag. This flag is propagated through the query stack.
  • Introduced *_with_condition variants:
    • within -> within_with_condition
    • within_unsorted -> within_unsorted_with_condition
    • within_unsorted_iter -> within_unsorted_iter_with_condition
    • nearest_n_within -> nearest_n_within_with_condition
  • This is fully backward compatible.

sdd and others added 30 commits December 8, 2025 07:07
fixes bug where nearest_n_within accessed self.content_items instead of
remainder_items for remainder elements, causing incorrect results when
dataset size % CHUNK_SIZE != 0. Also removed unnecessary unsafe code in
best_n_within.

Signed-off-by: Markus Zoppelt <markus.zoppelt@helsing.ai>
tests nearest_n_within with size-33 dataset to verify items in remainder
region are found correctly. Before the fix, this would access
self.content_items[0] instead of remainder_items[0], returning wrong items.

Signed-off-by: Markus Zoppelt <markus.zoppelt@helsing.ai>
If leaf_items.len() exceeds u32::MAX (~4.3 billion), this silently
truncates. For datasets with billions of points, this is realistic and
causes severe corruption.
* release-plz checkout depth fixed so that full changelogs are generated
* add commitlint with conventional commits config
…thin_unsorted_iter

within_unsorted_iter is modified to decouple the lifetime
of the iterator from that of the query by performing a
generally very cheap copy just once at the start of the query
- deprecate `rd_update` with `D::accumulate` for consistent handling of sum-based and max-based metrics
- conditional logic for SIMD (L1/L2) and general L∞
- differentiate distance accumulation behaviour
- integration `nearest_n` tests (Chebyshev, Manhattan, SquaredEuclidean).
@cbueth cbueth force-pushed the feat/optional-boundary-inclusiveness branch from 03d0166 to 7b947a6 Compare February 26, 2026 14:35
@cbueth
Copy link
Author

cbueth commented Feb 26, 2026

Closed in favor of #294.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants