Skip to content

feat: move validation & crossmatch to Rust, add 98 tests across full pipeline#2

Closed
humancto wants to merge 2 commits into
mainfrom
claude/review-exohuntr-pr-QBV0C
Closed

feat: move validation & crossmatch to Rust, add 98 tests across full pipeline#2
humancto wants to merge 2 commits into
mainfrom
claude/review-exohuntr-pr-QBV0C

Conversation

@humancto
Copy link
Copy Markdown
Owner

Restructure the Rust codebase from a monolithic main.rs into a testable
library (exoplanet_hunter) with four modules: bls, validate, crossmatch,
and io. Add CLI subcommands (search, validate, crossmatch) with backward
compatibility for the original top-level flags.

Rust changes:

  • bls.rs: BLS algorithm, SNR estimation, phase computation, median
  • validate.rs: 5 false-positive tests (odd/even depth, secondary eclipse,
    transit shape, period agreement, radius ratio) + scoring, parallel via Rayon
  • crossmatch.rs: Hash-based O(1) catalog lookups replacing O(N*M) Python loop
  • io.rs: CSV light curve parsing with NaN/Inf filtering, file discovery

Python changes:

  • Vectorize phase-fold binning loop in analyze_candidates.py (numpy.add.at)
  • Add pytest test suite with shared fixtures (synthetic light curves, mock catalogs)

Test coverage (98 tests total):

  • 66 Rust tests: BLS period recovery, SNR edge cases, all validation tests,
    scoring boundaries, catalog indexing, CSV parsing
  • 32 Python tests: validation functions, scoring, cross-matching, plotting,
    binning correctness

Infrastructure:

  • Cargo.toml: library target, dev-dependencies (tempfile)
  • pytest.ini, tests/conftest.py: Python test configuration
  • Makefile: make test target (cargo test + pytest)
  • README.md + CLAUDE.md: updated architecture docs

https://claude.ai/code/session_01LxfL2UwEep3JY7vDMANz7A

claude added 2 commits March 29, 2026 19:05
…pipeline

Restructure the Rust codebase from a monolithic main.rs into a testable
library (exoplanet_hunter) with four modules: bls, validate, crossmatch,
and io. Add CLI subcommands (search, validate, crossmatch) with backward
compatibility for the original top-level flags.

Rust changes:
- bls.rs: BLS algorithm, SNR estimation, phase computation, median
- validate.rs: 5 false-positive tests (odd/even depth, secondary eclipse,
  transit shape, period agreement, radius ratio) + scoring, parallel via Rayon
- crossmatch.rs: Hash-based O(1) catalog lookups replacing O(N*M) Python loop
- io.rs: CSV light curve parsing with NaN/Inf filtering, file discovery

Python changes:
- Vectorize phase-fold binning loop in analyze_candidates.py (numpy.add.at)
- Add pytest test suite with shared fixtures (synthetic light curves, mock catalogs)

Test coverage (98 tests total):
- 66 Rust tests: BLS period recovery, SNR edge cases, all validation tests,
  scoring boundaries, catalog indexing, CSV parsing
- 32 Python tests: validation functions, scoring, cross-matching, plotting,
  binning correctness

Infrastructure:
- Cargo.toml: library target, dev-dependencies (tempfile)
- pytest.ini, tests/conftest.py: Python test configuration
- Makefile: `make test` target (cargo test + pytest)
- README.md + CLAUDE.md: updated architecture docs

https://claude.ai/code/session_01LxfL2UwEep3JY7vDMANz7A
…drops

- crossmatch: resolve hostname collisions by matching longest hostname
  first (fixes non-deterministic HashMap iteration bug for overlapping
  names like toi_12 vs toi_123), add test for collision case
- validate: handle NaN in transit shape sort with unwrap_or instead of
  unwrap; clamp negative secondary depth to 0; validate our_period in
  period agreement test
- main: log warnings for skipped candidates instead of silent filter_map
  dropout; report skipped count in validation summary
- bls: guard against division by zero in SNR when out_transit has < 2 pts

https://claude.ai/code/session_017xDdpKgRCbtSiWx9YzXYMk
@humancto humancto closed this Mar 29, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 046a6cba28

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/crossmatch.rs
let normalized = normalize_name(filename);
let mut best_match: Option<(&str, &Vec<CatalogEntry>)> = None;
for (hostname, entries) in &self.by_hostname {
if normalized.contains(hostname.as_str()) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Match hostnames on token boundaries

Using contains for hostname matching will mark unrelated targets as KNOWN when one hostname is a prefix of another (for example, TOI-123 matching a filename for TOI-1234). In catalogs with many similarly numbered systems, this produces false positives in crossmatch results and can incorrectly suppress genuinely new candidates.

Useful? React with 👍 / 👎.

Comment thread src/crossmatch.rs
}
}
}
best_match.and_then(|(_, entries)| entries.first())
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Choose matched planet by period instead of first row

For multi-planet hosts, returning entries.first() can attach the wrong known_planet and known_period to a candidate even when the host match is correct. This happens whenever catalog row order does not correspond to the candidate’s orbital period (e.g., host has planets b/c/d), so the output metadata becomes misleading for downstream analysis.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants