feat: move validation & crossmatch to Rust, add 98 tests across full pipeline#2
feat: move validation & crossmatch to Rust, add 98 tests across full pipeline#2humancto wants to merge 2 commits into
Conversation
…pipeline Restructure the Rust codebase from a monolithic main.rs into a testable library (exoplanet_hunter) with four modules: bls, validate, crossmatch, and io. Add CLI subcommands (search, validate, crossmatch) with backward compatibility for the original top-level flags. Rust changes: - bls.rs: BLS algorithm, SNR estimation, phase computation, median - validate.rs: 5 false-positive tests (odd/even depth, secondary eclipse, transit shape, period agreement, radius ratio) + scoring, parallel via Rayon - crossmatch.rs: Hash-based O(1) catalog lookups replacing O(N*M) Python loop - io.rs: CSV light curve parsing with NaN/Inf filtering, file discovery Python changes: - Vectorize phase-fold binning loop in analyze_candidates.py (numpy.add.at) - Add pytest test suite with shared fixtures (synthetic light curves, mock catalogs) Test coverage (98 tests total): - 66 Rust tests: BLS period recovery, SNR edge cases, all validation tests, scoring boundaries, catalog indexing, CSV parsing - 32 Python tests: validation functions, scoring, cross-matching, plotting, binning correctness Infrastructure: - Cargo.toml: library target, dev-dependencies (tempfile) - pytest.ini, tests/conftest.py: Python test configuration - Makefile: `make test` target (cargo test + pytest) - README.md + CLAUDE.md: updated architecture docs https://claude.ai/code/session_01LxfL2UwEep3JY7vDMANz7A
…drops - crossmatch: resolve hostname collisions by matching longest hostname first (fixes non-deterministic HashMap iteration bug for overlapping names like toi_12 vs toi_123), add test for collision case - validate: handle NaN in transit shape sort with unwrap_or instead of unwrap; clamp negative secondary depth to 0; validate our_period in period agreement test - main: log warnings for skipped candidates instead of silent filter_map dropout; report skipped count in validation summary - bls: guard against division by zero in SNR when out_transit has < 2 pts https://claude.ai/code/session_017xDdpKgRCbtSiWx9YzXYMk
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 046a6cba28
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let normalized = normalize_name(filename); | ||
| let mut best_match: Option<(&str, &Vec<CatalogEntry>)> = None; | ||
| for (hostname, entries) in &self.by_hostname { | ||
| if normalized.contains(hostname.as_str()) { |
There was a problem hiding this comment.
Match hostnames on token boundaries
Using contains for hostname matching will mark unrelated targets as KNOWN when one hostname is a prefix of another (for example, TOI-123 matching a filename for TOI-1234). In catalogs with many similarly numbered systems, this produces false positives in crossmatch results and can incorrectly suppress genuinely new candidates.
Useful? React with 👍 / 👎.
| } | ||
| } | ||
| } | ||
| best_match.and_then(|(_, entries)| entries.first()) |
There was a problem hiding this comment.
Choose matched planet by period instead of first row
For multi-planet hosts, returning entries.first() can attach the wrong known_planet and known_period to a candidate even when the host match is correct. This happens whenever catalog row order does not correspond to the candidate’s orbital period (e.g., host has planets b/c/d), so the output metadata becomes misleading for downstream analysis.
Useful? React with 👍 / 👎.
Restructure the Rust codebase from a monolithic main.rs into a testable
library (exoplanet_hunter) with four modules: bls, validate, crossmatch,
and io. Add CLI subcommands (search, validate, crossmatch) with backward
compatibility for the original top-level flags.
Rust changes:
transit shape, period agreement, radius ratio) + scoring, parallel via Rayon
Python changes:
Test coverage (98 tests total):
scoring boundaries, catalog indexing, CSV parsing
binning correctness
Infrastructure:
make testtarget (cargo test + pytest)https://claude.ai/code/session_01LxfL2UwEep3JY7vDMANz7A