Removed nan-handlers for torch tensors. Updated the unit tests. #751

Merged
drewoldag merged 1 commit into main from issue/737/remove-old-nan-handlers
Mar 4, 2026

@drewoldag
Collaborator

Change Description

Closes #737
Removed nan handlers for torch tensors and updated the associated test file.
The unit tests were simplified as well. There was an unused (or hardly used) dataset class that would produce data with NaNs, but that capability had already been packed into HyraxRandomDataset. So I removed the old RandomNanDataset, which was used exclusively for testing this feature. This is OK because HyraxRandomDataset can produce NaNs in its random data and exercise the same functionality.

Contributor

Copilot AI left a comment

Pull request overview

Removes the torch-specific NaN handling paths from DataProvider now that batches are expected to remain NumPy until they reach the model, and updates the NaN-related unit tests accordingly.

Changes:

  • Removed the torch.Tensor-specific _handle_nans registration and torch-based NaN helper functions.
  • Updated test_nan.py to use NumPy-based NaN assertions and removed the test-only NaN dataset subclass.
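
For readers unfamiliar with what a NumPy-based NaN assertion looks like, here is a minimal sketch. It does not reproduce the contents of test_nan.py or the DataProvider code: `replace_nans` and its zero-fill strategy are hypothetical stand-ins chosen only to show the shape of such a test.

```python
import numpy as np


def replace_nans(batch: np.ndarray, fill_value: float = 0.0) -> np.ndarray:
    """Hypothetical stand-in for a NumPy-only NaN handler (not the Hyrax API)."""
    # Work on a copy so the caller's array is left untouched.
    cleaned = batch.copy()
    # Boolean-mask assignment replaces only NaN entries and keeps the dtype.
    cleaned[np.isnan(cleaned)] = fill_value
    return cleaned


def test_nans_are_replaced():
    batch = np.array([[1.0, np.nan], [np.nan, 4.0]], dtype=np.float32)
    cleaned = replace_nans(batch)

    # No torch import needed: np.isnan covers the NaN check.
    assert not np.isnan(cleaned).any()
    # The input still contains its NaNs because we operated on a copy.
    assert np.isnan(batch).sum() == 2
    np.testing.assert_array_equal(
        cleaned, np.array([[1.0, 0.0], [0.0, 4.0]], dtype=np.float32)
    )
```

The point of the sketch is that `np.isnan` and `np.testing` are enough to express the checks, so the test module no longer needs a torch dependency.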

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/hyrax/data_sets/data_provider.py | Drops torch tensor NaN-handling logic and simplifies tuple/list handling to NumPy arrays only. |
| tests/hyrax/test_nan.py | Removes torch dependency and adapts tests to validate NaN handling with NumPy arrays and HyraxRandomDataset. |

Comment on lines 48 to 52
         else:
-            # Keep non-tensor elements unchanged (e.g., labels, metadata)
+            # Keep non-numpy elements unchanged (e.g., labels, metadata)
             handled_elements.append(element)

     return tuple(handled_elements)

Copilot AI Mar 4, 2026

_handle_nans_tuple is registered for both tuple and list, but it always returns tuple(handled_elements). This changes the type for list inputs (and contradicts the backward-compatibility comment). Consider preserving the input type (e.g., return list for list inputs) and updating the docstring/comment to reflect that it handles both tuples and lists.
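
One way to act on this suggestion is sketched below. It assumes the handlers are dispatched with `functools.singledispatch`, which the word "registered" hints at but the PR does not confirm. `handle_nans`, `_handle_array`, and `_handle_sequence` are hypothetical names, and the zero-fill is only a placeholder strategy.

```python
from functools import singledispatch

import numpy as np


@singledispatch
def handle_nans(batch):
    """Fallback: leave unsupported types (e.g., labels, metadata) unchanged."""
    return batch


@handle_nans.register(np.ndarray)
def _handle_array(batch):
    cleaned = batch.copy()
    # Only floating-point arrays can hold NaN; other dtypes pass through.
    if np.issubdtype(cleaned.dtype, np.floating):
        cleaned[np.isnan(cleaned)] = 0.0  # placeholder fill strategy
    return cleaned


@handle_nans.register(tuple)
@handle_nans.register(list)
def _handle_sequence(batch):
    """Handle tuples and lists element by element, preserving the input type."""
    handled = [handle_nans(element) for element in batch]
    # type(batch) is list for list inputs and tuple for tuple inputs.
    return type(batch)(handled)
```

With this shape, `handle_nans([arr, "label"])` returns a list and `handle_nans((arr, "label"))` returns a tuple. One caveat: `type(batch)(handled)` would fail for namedtuple subclasses with more than one field, so those would need separate handling if they can occur.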

@codecov
codecov Bot commented Mar 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.62%. Comparing base (6156e07) to head (f447fe0).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #751      +/-   ##
==========================================
+ Coverage   64.45%   64.62%   +0.17%     
==========================================
  Files          61       61              
  Lines        5925     5875      -50     
==========================================
- Hits         3819     3797      -22     
+ Misses       2106     2078      -28     

Collaborator

@mtauraso mtauraso left a comment

LGTM 🚢 it

@drewoldag drewoldag merged commit 101a720 into main Mar 4, 2026
13 checks passed
@drewoldag drewoldag deleted the issue/737/remove-old-nan-handlers branch March 4, 2026 23:50
@github-actions
github-actions Bot commented Mar 5, 2026

| Before [6156e07] | After [54da373] | Ratio | Benchmark (Parameter) |
| --- | --- | --- | --- |
| failed | failed | n/a | data_cache_benchmarks.DataCacheBenchmarks.time_preload_cache_hsc1k |
| failed | failed | n/a | data_cache_benchmarks.DataCacheBenchmarks.track_cache_hsc1k_hyrax_size_undercount |
| failed | failed | n/a | data_request_benchmarks.DatasetRequestBenchmarks.time_request_all_data |
| 39.1±0.7ms | 40.5±1ms | 1.04 | benchmarks.time_nb_obj_dir |
| 38.2±0.3ms | 38.5±0.4ms | 1.01 | benchmarks.time_nb_obj_construct |
| 1.98±0.01s | 1.99±0.02s | 1.01 | benchmarks.time_train_help |
| 3.62±0s | 3.64±0.02s | 1.01 | vector_db_benchmarks.VectorDBInsertBenchmarks.time_load_vector_db(256, 'chromadb') |
| 2.01±0.01s | 2.01±0.02s | 1.00 | benchmarks.time_infer_help |
| 1.99±0.03s | 1.99±0.01s | 1.00 | benchmarks.time_lookup_help |
| 2.03±0.01s | 2.02±0.03s | 1.00 | benchmarks.time_rebuild_manifest_help |

Development

Successfully merging this pull request may close these issues.

Remove nan handling functions that operate on tensors

3 participants