Fix isolation forest edge cases and improve input validation #8

Merged
whilo merged 2 commits into main from bugfix/iforest-edge-cases on Apr 13, 2026

@whilo (Member) commented on Apr 13, 2026

  • Use complete binary tree sizing (2^(depth+1)-1 nodes) instead of 2n-1, fixing ArrayIndexOutOfBounds for non-power-of-2 sample sizes
  • Cap sample-size to n-rows in training (matches scikit-learn behavior)
  • Guard against NaN scores when sample-size=1 (cPsi=0 division)
  • Switch to Fisher-Yates partial shuffle for sampling without replacement
  • Add column length mismatch validation with clear error messages
  • Add explicit nil-column and wrong-type error messages in prepare-features
  • Add 8 edge-case tests: single-row, two-rows, sample-size capping, mismatched lengths, missing features, wrong types, NaN inputs, sampling-without-replacement correctness
  • Format fixes (query/plan.clj, iforest.clj, iforest_test.clj)
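
The sizing, sampling, and scoring fixes listed above can be sketched roughly as follows. This is an illustrative Python sketch, not the project's Clojure code. All function names are hypothetical. It assumes a depth limit of ceil(log2(sample size)) and a heap-style array layout with children at 2i+1 and 2i+2. Returning 0.5 when c(psi) is zero is also an assumed guard value, since the description does not say which value is used.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def tree_array_size(sample_size):
    """Slots for an array-backed tree with children at 2i+1 and 2i+2 (assumed layout)."""
    depth = (max(sample_size, 1) - 1).bit_length()  # equals ceil(log2(sample_size))
    return 2 ** (depth + 1) - 1  # complete binary tree of that depth


def sample_rows(n_rows, sample_size, rng):
    """Draw row indices without replacement via a partial Fisher-Yates shuffle."""
    k = min(sample_size, n_rows)  # cap the sample size at the number of rows
    idx = list(range(n_rows))
    for i in range(k):
        j = rng.randrange(i, n_rows)  # choose from the not-yet-fixed suffix
        idx[i], idx[j] = idx[j], idx[i]
    return idx[:k]  # only the first k positions were shuffled


def c_factor(psi):
    """Average path length of an unsuccessful BST search over psi points."""
    if psi <= 1:
        return 0.0
    if psi == 2:
        return 1.0
    harmonic = math.log(psi - 1) + EULER_GAMMA  # approximation of H(psi - 1)
    return 2.0 * harmonic - 2.0 * (psi - 1) / psi


def anomaly_score(mean_path_length, psi):
    """Score 2^(-E[h] / c(psi)), guarded for c(psi) == 0."""
    c = c_factor(psi)
    if c == 0.0:
        # Assumption: return a neutral score rather than dividing by zero
        # (0/0 is NaN on the JVM; Python would raise ZeroDivisionError).
        return 0.5
    return 2.0 ** (-mean_path_length / c)
```

For a sample size of 3, `tree_array_size` returns 7, whereas 2n-1 gives only 5 slots. Under the assumed layout, a tree that splits its right child places nodes at indices 5 and 6, which is past the end of the smaller array.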

whilo added 2 commits April 13, 2026 00:44
- Use complete binary tree sizing (2^(depth+1)-1 nodes) instead of 2n-1,
  fixing ArrayIndexOutOfBounds for non-power-of-2 sample sizes
- Cap sample-size to n-rows in training (matches scikit-learn behavior)
- Guard against NaN scores when sample-size=1 (cPsi=0 division)
- Switch to Fisher-Yates partial shuffle for sampling without replacement
- Add column length mismatch validation with clear error messages
- Add explicit nil-column and wrong-type error messages in prepare-features
- Add 8 edge-case tests: single-row, two-rows, sample-size capping,
  mismatched lengths, missing features, wrong types, NaN inputs,
  sampling-without-replacement correctness
- Format fixes (query/plan.clj, iforest.clj, iforest_test.clj)
- Rewrite resolve-anomaly-expressions to walk all query clauses
  (SELECT, WHERE, HAVING, ORDER BY) instead of only SELECT
- Tree-walk collects unique anomaly expressions, computes each once,
  and rewrites references throughout the query
- WHERE/HAVING use synthetic column name (pre-projection),
  ORDER BY uses SELECT alias when available (post-projection)
- Inject 50 anomalies into --demo taxi data (high fares, zero tips)
- Train and register 'taxi_anomaly' model automatically in --demo mode
- Add end-to-end tests for WHERE filter, ORDER BY, and deduplication
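
The clause-walking rewrite described in the second half of this list might look roughly like the Python sketch below. It is not the project's Clojure implementation. The query representation (a dict of clauses with expressions as nested tuples), the `anomaly_score` function name, the `__anomaly_N` synthetic column names, and the 0.7 threshold are all hypothetical, chosen only to illustrate collecting each distinct expression once and rewriting references per clause.

```python
ANOMALY_FN = "anomaly_score"  # hypothetical name for the anomaly function


def is_anomaly_call(node):
    return isinstance(node, tuple) and len(node) > 0 and node[0] == ANOMALY_FN


def collect(node, synthetic):
    """Record each distinct anomaly call once, in first-seen order."""
    if is_anomaly_call(node):
        synthetic.setdefault(node, f"__anomaly_{len(synthetic)}")
    elif isinstance(node, (list, tuple)):
        for child in node:
            collect(child, synthetic)


def rewrite(node, names):
    """Replace every anomaly call in an expression tree with its column name."""
    if is_anomaly_call(node):
        return names[node]
    if isinstance(node, tuple):
        return tuple(rewrite(child, names) for child in node)
    if isinstance(node, list):
        return [rewrite(child, names) for child in node]
    return node


def resolve_anomaly_expressions(query):
    select = query.get("select", [])  # list of (expression, alias) pairs
    synthetic = {}
    for expr, _alias in select:
        collect(expr, synthetic)
    for clause in ("where", "having", "order_by"):
        collect(query.get(clause), synthetic)

    # ORDER BY runs after projection, so prefer the SELECT alias when one exists.
    post_projection = dict(synthetic)
    for expr, alias in select:
        if is_anomaly_call(expr) and alias is not None:
            post_projection[expr] = alias

    resolved = {"select": [(rewrite(e, synthetic), a) for e, a in select]}
    for clause in ("where", "having"):  # pre-projection: synthetic names
        if query.get(clause) is not None:
            resolved[clause] = rewrite(query[clause], synthetic)
    if query.get("order_by") is not None:
        resolved["order_by"] = rewrite(query["order_by"], post_projection)
    return resolved, synthetic


query = {
    "select": [("fare", None), (("anomaly_score", "taxi_anomaly"), "score")],
    "where": (">", ("anomaly_score", "taxi_anomaly"), 0.7),
    "order_by": [("anomaly_score", "taxi_anomaly")],
}
resolved, computed = resolve_anomaly_expressions(query)
```

In this example the expression appears three times but `computed` holds a single entry. The WHERE clause refers to the synthetic column `__anomaly_0`, while ORDER BY refers to the SELECT alias `score`.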
@whilo merged commit 40d9bbb into main on Apr 13, 2026
5 of 6 checks passed
@whilo deleted the bugfix/iforest-edge-cases branch on April 13, 2026 08:36