Standardize prepare_inputs docstrings across built-in models#877

Merged
drewoldag merged 3 commits into main from
copilot/update-docstrings-for-prepare-inputs
Apr 15, 2026

Contributor

Copilot AI commented Apr 14, 2026

Change Description

Built-in prepare_inputs methods had inconsistent, inaccurate docstrings — some said "Does NOT convert to PyTorch tensors" (odd phrasing post-rename), others said "Convert structured data to tensor format" (wrong: these return NumPy arrays), and one had no docstring at all.

Solution Description

All five prepare_inputs implementations now share a uniform NumPy-style docstring structure with:

  • A one-line summary describing what the method extracts.
  • A note that prepare_inputs is the interface between the data pipeline and the model, and that users should override it on their model class to reshape or select fields from the collated batch to match the inputs their model expects.
  • A note that Hyrax handles the numpy→tensor conversion and device placement automatically.
  • Parameters and Returns sections following the NumPy docstring convention used elsewhere in the codebase.
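
As a rough illustration, a docstring following this structure might look like the sketch below. The class name `ExampleModel` and the `data_dict["data"]["image"]` batch layout are assumptions made for this example; they are not taken from the Hyrax source.

```python
import numpy as np


class ExampleModel:
    """Hypothetical model class, used only to show the docstring layout."""

    @staticmethod
    def prepare_inputs(data_dict):
        """Extract the image array from a collated batch.

        This static method is the interface between the data pipeline and the
        model. Override it on the model class to reshape or select fields from
        the collated batch to match the inputs your model expects. Hyrax
        handles the numpy-to-tensor conversion and device placement
        automatically.

        Parameters
        ----------
        data_dict : dict
            Collated batch. The ``"data" -> "image"`` layout is an assumption
            made for this sketch.

        Returns
        -------
        image : numpy.ndarray
            The batch of images, still as a NumPy array.
        """
        # Return a NumPy array; no tensor conversion happens here.
        return np.asarray(data_dict["data"]["image"])


# Usage with a fake collated batch of four 3-channel 8x8 images.
batch = {"data": {"image": np.zeros((4, 3, 8, 8), dtype=np.float32)}}
image = ExampleModel.prepare_inputs(batch)
```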

Files updated:

  • hyrax_cnn.py — replaced informal "Does NOT convert to PyTorch Tensors..." with uniform docstring.
  • image_dcae.py — fixed factually incorrect "Convert structured data to tensor format." (returns ndarray, not tensor).
  • hyrax_autoencoderv2.py / hyrax_autoencoder.py — replaced "converts structured data to the input tensor we need to run" with accurate docstring.
  • model_registry.py (default_prepare_inputs) — added missing docstring.

Code Quality

  • I have read the Contribution Guide and agree to the Code of Conduct
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

Copilot AI linked an issue Apr 14, 2026 that may be closed by this pull request
Copilot AI changed the title [WIP] Update docstrings for prepare_inputs functions Standardize prepare_inputs docstrings across built-in models Apr 14, 2026
Copilot AI requested a review from drewoldag April 14, 2026 23:36
@drewoldag
Collaborator

@copilot since these models also act as examples for users, please also include a line in the docstring mentioning that the static method is meant to be modified when the user wants to experiment with different inputs.

The method is the interface between the data and the model. It exists to change the shape of the collated batch of data into the shape that the model expects.

Contributor Author

Copilot AI commented Apr 15, 2026

@copilot since these models also act as examples for users, please also include a line in the docstring mentioning that the static method is meant to be modified when the user wants to experiment with...

Done in b95bc5f. Each prepare_inputs docstring now includes:

This static method is the interface between the data pipeline and the model. Override it on the model class to reshape or select fields from the collated batch to match the inputs your model expects.
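
To make the "override it" guidance concrete, here is a minimal sketch of a user subclass that reshapes the collated batch before it reaches the model. Both class names and the batch layout are hypothetical and only stand in for a real built-in model.

```python
import numpy as np


class BuiltInLikeModel:
    """Hypothetical stand-in for a built-in model."""

    @staticmethod
    def prepare_inputs(data_dict):
        # Default behaviour in this sketch: pass the image array through.
        return np.asarray(data_dict["data"]["image"])


class SingleChannelModel(BuiltInLikeModel):
    """Hypothetical user model that expects a channel axis."""

    @staticmethod
    def prepare_inputs(data_dict):
        image = np.asarray(data_dict["data"]["image"], dtype=np.float32)
        if image.ndim == 3:
            # (batch, height, width) -> (batch, 1, height, width)
            image = image[:, np.newaxis, :, :]
        return image


# A batch of two 5x5 single-channel images without a channel axis.
batch = {"data": {"image": np.zeros((2, 5, 5))}}
reshaped = SingleChannelModel.prepare_inputs(batch)
```

The override stays a static method and still returns a NumPy array, so the tensor conversion described in the docstring is left to the framework.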

Collaborator

@drewoldag drewoldag left a comment


The updates to the docstrings look good to me. No other models implement their own prepare_inputs.

@drewoldag drewoldag marked this pull request as ready for review April 15, 2026 16:19
@codecov

codecov bot commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.54%. Comparing base (9cb5c42) to head (b95bc5f).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #877   +/-   ##
=======================================
  Coverage   66.54%   66.54%           
=======================================
  Files          62       62           
  Lines        6513     6513           
=======================================
  Hits         4334     4334           
  Misses       2179     2179           


Contributor

Copilot AI left a comment


Pull request overview

Standardizes the documentation for built-in prepare_inputs methods to consistently describe how models extract NumPy arrays from the collated batch dict, and clarifies that Hyrax handles numpy→tensor conversion/device placement.

Changes:

  • Replaced inconsistent/inaccurate prepare_inputs docstrings across built-in models with a uniform NumPy-style structure.
  • Added a missing docstring for default_prepare_inputs in the model registry.
  • Corrected prior misleading wording implying tensor outputs where the implementations return NumPy arrays.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/hyrax/models/model_registry.py | Adds a standardized docstring for default_prepare_inputs. |
| src/hyrax/models/image_dcae.py | Updates prepare_inputs docstring to correctly describe extracting a NumPy image array. |
| src/hyrax/models/hyrax_cnn.py | Replaces informal/odd phrasing with standardized prepare_inputs docstring. |
| src/hyrax/models/hyrax_autoencoderv2.py | Updates prepare_inputs docstring (but leaves a return type hint mismatch). |
| src/hyrax/models/hyrax_autoencoder.py | Updates prepare_inputs docstring (but leaves a return type hint mismatch). |

@@ -204,12 +204,25 @@ def infer_batch(self, batch):

@staticmethod
def prepare_inputs(data_dict) -> tuple:

Copilot AI Apr 15, 2026


prepare_inputs is annotated as returning tuple, but the implementation returns a single image array. This conflicts with the updated docstring (Returns: image : numpy.ndarray) and can mislead type checkers and callers. Update the return annotation to an NDArray type (or remove it) to match the actual return value.

Suggested change
- def prepare_inputs(data_dict) -> tuple:
+ def prepare_inputs(data_dict):
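
If the annotation is kept rather than dropped, one option is `numpy.typing.NDArray`. The following is a hedged sketch only: the float32 dtype and the `data_dict["data"]["image"]` layout are assumptions, not details confirmed by this pull request.

```python
import numpy as np
from numpy.typing import NDArray


def prepare_inputs(data_dict: dict) -> NDArray[np.float32]:
    """Return the image batch as a float32 NumPy array (illustrative only)."""
    # The annotation now matches the single array that is actually returned.
    return np.asarray(data_dict["data"]["image"], dtype=np.float32)


# Usage with a fake batch; the key names are hypothetical.
example_batch = {"data": {"image": np.ones((2, 3, 4, 4))}}
annotated_result = prepare_inputs(example_batch)
```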

@@ -269,12 +269,25 @@ def infer_batch(self, batch):

@staticmethod
def prepare_inputs(data_dict) -> tuple:

Copilot AI Apr 15, 2026


prepare_inputs is annotated as returning tuple, but it returns a single image array. This is inconsistent with the new docstring (Returns: image : numpy.ndarray) and reduces the usefulness of type hints. Align the return annotation with the actual return type (e.g., an NDArray) or drop the annotation.

Suggested change
- def prepare_inputs(data_dict) -> tuple:
+ def prepare_inputs(data_dict):

Collaborator

@mtauraso mtauraso left a comment


In the future we may want a way to check that a benchmark works in GitHub CI without committing to main; however, I think the trade-off here (posting to Slack) is reasonable given that we don't add benchmarks that often.

Ugh typed in wrong box. Reviewing this now

Collaborator

@mtauraso mtauraso left a comment


LGTM

@drewoldag drewoldag merged commit 21fcbf4 into main Apr 15, 2026
13 checks passed
@drewoldag drewoldag deleted the copilot/update-docstrings-for-prepare-inputs branch April 15, 2026 20:15

Development

Successfully merging this pull request may close these issues.

Review docstrings for built-in prepare_inputs

4 participants