Skip to content

Fix incorrect primary_id_field display for multimodal DataProvider#866

Merged
drewoldag merged 6 commits intomainfrom
copilot/fix-printing-primary-id-field
Apr 15, 2026
Merged

Fix incorrect primary_id_field display for multimodal DataProvider#866
drewoldag merged 6 commits intomainfrom
copilot/fix-printing-primary-id-field

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 6, 2026

Change Description

DataProvider.__repr__ prints primary_id_field for every dataset in a multimodal provider, not just the one that defines it. A secondary dataset with no primary_id_field incorrectly shows the primary dataset's value.

Solution Description

Two fixes in data_provider.py:

  • __repr__: Check per-dataset data dict for "primary_id_field" key instead of the class-level self.primary_dataset_id_field_name (which is truthy for all datasets once any dataset sets it)
  • prepare_datasets: Use dataset_definition.get("primary_id_field") is not None instead of "primary_id_field" in dataset_definition to correctly skip None/False sentinel values (TOML has no null)

Code Quality

  • I have read the Contribution Guide and agree to the Code of Conduct
  • My code follows the code style of this project
  • My code builds (or compiles) cleanly without any errors or warnings
  • My code contains relevant comments and necessary documentation

- In __repr__, check data dict for 'primary_id_field' key instead of
  using self.primary_dataset_id_field_name (which printed it for all datasets)
- In prepare_datasets, use .get() with is not None check instead of 'in'
  operator to properly handle None/False sentinel values

Agent-Logs-Url: https://github.com/lincc-frameworks/hyrax/sessions/46dba48b-42a4-45ba-bde5-f40d414e0c48

Co-authored-by: drewoldag <47493171+drewoldag@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix incorrectly printing primary_id_field for multimodal DataProvider Fix incorrect primary_id_field display for multimodal DataProvider Apr 6, 2026
Copilot AI requested a review from drewoldag April 6, 2026 22:04
Copy link
Copy Markdown
Collaborator

@drewoldag drewoldag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me.

@drewoldag drewoldag marked this pull request as ready for review April 6, 2026 22:58
Copilot AI review requested due to automatic review settings April 6, 2026 22:58
@drewoldag drewoldag requested a review from mtauraso April 6, 2026 22:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes DataProvider handling of primary_id_field in multimodal configurations so that only the dataset that defines primary_id_field is displayed/treated as primary, and TOML “unset” sentinel values don’t get misinterpreted.

Changes:

  • Update DataProvider.__repr__ to display primary_id_field based on the per-dataset request dict rather than the provider-level cached value.
  • Update DataProvider.prepare_datasets primary-dataset detection logic to use .get(...) rather than key presence.

Comment thread src/hyrax/datasets/data_provider.py Outdated
Comment thread src/hyrax/datasets/data_provider.py Outdated
Comment thread src/hyrax/datasets/data_provider.py Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.55%. Comparing base (21fcbf4) to head (c04a311).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #866      +/-   ##
==========================================
+ Coverage   66.54%   66.55%   +0.01%     
==========================================
  Files          62       62              
  Lines        6513     6515       +2     
==========================================
+ Hits         4334     4336       +2     
  Misses       2179     2179              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 13, 2026

Before [21fcbf4] After [faefe97] Ratio Benchmark (Parameter)
failed failed n/a vector_db_benchmarks.VectorDBInsertBenchmarks.peakmem_load_vector_db(16384, 'chromadb')
failed failed n/a vector_db_benchmarks.VectorDBInsertBenchmarks.peakmem_load_vector_db(16384, 'qdrant')
failed failed n/a vector_db_benchmarks.VectorDBInsertBenchmarks.peakmem_load_vector_db(2048, 'chromadb')
failed failed n/a vector_db_benchmarks.VectorDBInsertBenchmarks.peakmem_load_vector_db(2048, 'qdrant')
failed failed n/a vector_db_benchmarks.VectorDBInsertBenchmarks.peakmem_load_vector_db(256, 'chromadb')
failed failed n/a vector_db_benchmarks.VectorDBInsertBenchmarks.peakmem_load_vector_db(256, 'qdrant')
failed failed n/a vector_db_benchmarks.VectorDBInsertBenchmarks.peakmem_load_vector_db(64, 'chromadb')
failed failed n/a vector_db_benchmarks.VectorDBInsertBenchmarks.peakmem_load_vector_db(64, 'qdrant')
failed failed n/a vector_db_benchmarks.VectorDBInsertBenchmarks.time_load_vector_db(16384, 'chromadb')
failed failed n/a vector_db_benchmarks.VectorDBInsertBenchmarks.time_load_vector_db(16384, 'qdrant')

Click here to view all benchmarks.

Copy link
Copy Markdown
Collaborator

@mtauraso mtauraso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@drewoldag drewoldag merged commit 3c66141 into main Apr 15, 2026
8 of 9 checks passed
@drewoldag drewoldag deleted the copilot/fix-printing-primary-id-field branch April 15, 2026 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrectly printing primary_id_field for multimodal DataProvider

4 participants