Fix incorrect primary_id_field display for multimodal DataProvider#866
Merged
- In __repr__, check the data dict for the 'primary_id_field' key instead of using self.primary_dataset_id_field_name (which printed it for all datasets)
- In prepare_datasets, use .get() with an 'is not None' check instead of the 'in' operator to properly handle None/False sentinel values

Agent-Logs-Url: https://github.com/lincc-frameworks/hyrax/sessions/46dba48b-42a4-45ba-bde5-f40d414e0c48
Co-authored-by: drewoldag <47493171+drewoldag@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix incorrectly printing primary_id_field for multimodal DataProvider" to "Fix incorrect primary_id_field display for multimodal DataProvider" on Apr 6, 2026
drewoldag (Collaborator) approved these changes on Apr 6, 2026 and left a comment:
This looks correct to me.
Contributor
Pull request overview
This PR fixes DataProvider handling of primary_id_field in multimodal configurations so that only the dataset that defines primary_id_field is displayed/treated as primary, and TOML “unset” sentinel values don’t get misinterpreted.
Changes:
- Update `DataProvider.__repr__` to display `primary_id_field` based on the per-dataset request dict rather than the provider-level cached value.
- Update `DataProvider.prepare_datasets` primary-dataset detection logic to use `.get(...)` rather than key presence.
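The two changes above can be illustrated with a minimal, standalone sketch. It is not the code from `data_provider.py`: the dataset names, the `dataset_class` values, and the helpers `is_primary_by_key`, `is_primary_by_value`, and `describe` are all hypothetical, chosen only to contrast a key-presence check with a `.get(...) is not None` check and to show a per-dataset display line that reads the dataset's own dict.

```python
# Hypothetical per-dataset request dicts (names and values are illustrative,
# not taken from the hyrax code base).
datasets = {
    "images": {"dataset_class": "ImageSet", "primary_id_field": "object_id"},
    "spectra": {"dataset_class": "SpectraSet", "primary_id_field": None},
    "catalog": {"dataset_class": "CatalogSet"},
}


def is_primary_by_key(definition: dict) -> bool:
    """Key-presence check: true whenever the key exists, even if it maps to None."""
    return "primary_id_field" in definition


def is_primary_by_value(definition: dict) -> bool:
    """Value check: true only when the key maps to something other than None."""
    return definition.get("primary_id_field") is not None


def describe(name: str, definition: dict) -> str:
    """Build one display line from the dataset's own dict, not a shared value."""
    line = f"{name}: {definition['dataset_class']}"
    if is_primary_by_value(definition):
        line += f" (primary_id_field={definition['primary_id_field']!r})"
    return line


by_key = [name for name, d in datasets.items() if is_primary_by_key(d)]
by_value = [name for name, d in datasets.items() if is_primary_by_value(d)]
report = [describe(name, d) for name, d in datasets.items()]

print(by_key)    # the key-presence check also flags the dataset holding None
print(by_value)  # the value check flags only the dataset with a real field name
print("\n".join(report))
```

In this sketch the key-presence check reports both `images` and `spectra` as primary, while the value check reports only `images`. Note that `is not None` on its own still accepts a `False` value, so a configuration that uses `False` as its "unset" marker would need that case excluded explicitly.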
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| | main | #866 | +/- |
|---|---|---|---|
| Coverage | 66.54% | 66.55% | +0.01% |
| Files | 62 | 62 | |
| Lines | 6513 | 6515 | +2 |
| Hits | 4334 | 4336 | +2 |
| Misses | 2179 | 2179 | |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Change Description
`DataProvider.__repr__` prints `primary_id_field` for every dataset in a multimodal provider, not just the one that defines it. A secondary dataset with no `primary_id_field` incorrectly shows the primary dataset's value.

Solution Description
Two fixes in `data_provider.py`:
- `__repr__`: Check the per-dataset `data` dict for the `"primary_id_field"` key instead of the class-level `self.primary_dataset_id_field_name` (which is truthy for all datasets once any dataset sets it).
- `prepare_datasets`: Use `dataset_definition.get("primary_id_field") is not None` instead of `"primary_id_field" in dataset_definition` to correctly skip `None`/`False` sentinel values (TOML has no null).

Code Quality