Fix handling of missing terms-of-use metadata in sample set manifest (#766) #990
|
Thanks again for clarifying the expected behaviour. This update ensures metadata loading remains robust when terms-of-use columns are absent in legacy or pre-release manifests. I’ve kept the change minimal and added tests to cover the missing-column scenarios. Happy to refine further if needed. |
|
All checks have passed. @jonbrenas please take a look when you have time. |
|
Sorry @adilraza99, I thought about this one a bit more and we had already decided to go a different route: the error needs to stay because the data needs to exist for traceability. For the record, a sample set that has not been released yet should have:
|
|
Understood - keeping the error preserves traceability when the metadata is required. I’ll update the implementation so unreleased sample sets use:
• terms_of_use_expiry_date = "2099-12-31"
This keeps the metadata explicit and aligns with the intended behaviour. |
8a40254 to
ea15930
|
Thanks @jonbrenas - that clarification helped. I tested this locally with the Ag3 simulator to confirm the behaviour. When the terms-of-use columns are missing, the lookup returns the placeholder values (2099-12-31 / NaN / False). This matches the expected handling for unreleased sample sets. |
|
Sorry, @adilraza99, I was not quite clear with my explanation. What I mean is that if the API is asked to access these 3 columns and one or more of them is missing, it is a problem with the data, not with the API. An error should be raised because someone (most likely me) screwed the pooch upstream and this data should never have been released this way. #766 is thus not an issue for the API, and the code should not be modified to address it. |
5abe0f4 to
202476e
|
@jonbrenas I’ve updated the implementation so that missing terms-of-use columns now raise a clear error instead of being silently filled. This ensures incomplete metadata is surfaced as a data integrity issue while preserving traceability and keeping the API behaviour explicit. Tests have been updated to reflect the strict validation behaviour. Please let me know if this aligns with the intended design, or if you’d like any adjustments. |
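For readers following the thread, the strict validation described above could look roughly like the sketch below. This is not the code in the PR: the helper name check_terms_of_use_columns, the choice of ValueError, and the wording of the message are assumptions made for illustration. Only the three column names come from the PR description.

```python
from typing import Iterable

# Column names as listed in the PR description.
TERMS_OF_USE_COLUMNS = (
    "terms_of_use_expiry_date",
    "terms_of_use_url",
    "unrestricted_use",
)


def check_terms_of_use_columns(columns: Iterable[str]) -> None:
    """Hypothetical helper: raise if any terms-of-use column is absent.

    `columns` is any iterable of column names, for example the
    `.columns` attribute of a pandas DataFrame holding the manifest.
    """
    present = set(columns)
    missing = [name for name in TERMS_OF_USE_COLUMNS if name not in present]
    if missing:
        # The exception type and message are illustrative assumptions.
        raise ValueError(
            "Sample set manifest is missing terms-of-use column(s): "
            + ", ".join(missing)
            + ". This is a data problem and should be fixed upstream."
        )


# Example: a manifest that carries none of the terms-of-use columns.
try:
    check_terms_of_use_columns(["sample_set", "sample_count"])
except ValueError as err:
    print(err)
```

Naming every missing column in the message keeps the failure traceable to the manifest itself, which is the point made in the review.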
|
@jonbrenas could you take a look at the changes when you have a moment? |
Summary
Fix handling of sample set manifests that do not include terms-of-use metadata.
Some pre-release and legacy manifests legitimately omit the following fields:
• terms_of_use_expiry_date
• terms_of_use_url
• unrestricted_use
The current implementation assumes these columns are always present, which results in a KeyError and prevents sample_metadata() from loading.
Changes
• Guard access to unrestricted_use in _sample_set_has_unrestricted_use()
• Make lookup_terms_of_use_info() tolerant of missing columns
Why this change
Pre-release datasets may not yet have terms-of-use information.
Handling this case explicitly allows metadata to load successfully while making it clear that usage status is unknown.
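A minimal sketch of the difference between the failing lookup and a tolerant one, using a plain dict to stand in for one manifest row. The helper name lookup_unrestricted_use, the example records, and the use of None for "unknown" are assumptions for illustration, not the library's actual code. Note that the review discussion on this PR asks for an error to be raised rather than for this tolerant behaviour.

```python
from typing import Any, Mapping, Optional


def lookup_unrestricted_use(record: Mapping[str, Any]) -> Optional[bool]:
    """Hypothetical helper: None means the usage status is unknown."""
    if "unrestricted_use" not in record:
        return None
    return bool(record["unrestricted_use"])


# Hypothetical manifest rows; "example-set" is a made-up identifier.
pre_release = {"sample_set": "example-set"}
public = {"sample_set": "example-set", "unrestricted_use": True}

# Direct indexing reproduces the KeyError described in the summary.
try:
    pre_release["unrestricted_use"]
    direct_lookup_failed = False
except KeyError:
    direct_lookup_failed = True

print(direct_lookup_failed)
print(lookup_unrestricted_use(pre_release))
print(lookup_unrestricted_use(public))
```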
Additional context
This update keeps behaviour unchanged for public releases and ensures
metadata outputs remain consistent across different release stages.
fixes #766