Add method to enforce compliance with a given version of the SSSOM specification. #616
Merged
Add a new method to the MappingSetDataFrame class to automatically determine the minimum version of the SSSOM specification the set is compatible with, that is, the earliest version that defines all the slots and all the enum values present in the set.
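The computation described above is a maximum over the versions that introduced each slot and enum value used in the set. The following is a minimal sketch, not the SSSOM-Py implementation: it assumes hypothetical lookup tables `SLOT_VERSIONS` and `ENUM_VALUE_VERSIONS` (the real method gets this information from the schema), represents a mapping set as a plain list of dicts, and uses a made-up slot `example_new_slot` to stand for a slot introduced in 1.1. Only "composed entity expression" being new in 1.1 comes from this PR.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int]

# Hypothetical table: slot name -> version that introduced the slot.
# "example_new_slot" is made up for illustration.
SLOT_VERSIONS: Dict[str, Version] = {
    "subject_id": (1, 0),
    "predicate_id": (1, 0),
    "object_id": (1, 0),
    "subject_type": (1, 0),
    "example_new_slot": (1, 1),
}

# Hypothetical table: (slot, enum value) -> version that introduced the value.
ENUM_VALUE_VERSIONS: Dict[Tuple[str, str], Version] = {
    ("subject_type", "composed entity expression"): (1, 1),
}

BASELINE: Version = (1, 0)


def get_compatible_version(mappings: List[Dict[str, str]]) -> Version:
    """Return the earliest version defining every slot and enum value used."""
    versions: List[Version] = [BASELINE]
    for mapping in mappings:
        for slot, value in mapping.items():
            # Slots absent from the table (e.g. extensions) do not raise the minimum.
            versions.append(SLOT_VERSIONS.get(slot, BASELINE))
            versions.append(ENUM_VALUE_VERSIONS.get((slot, value), BASELINE))
    return max(versions)
```

A set that uses only 1.0 slots and values yields `(1, 0)`; a single `subject_type` of "composed entity expression" is enough to raise the result to `(1, 1)`.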
Fix the wrong slot name used when looking for the "composed entity expression" value. Let Python compare version numbers as tuples of integers. Use `max(list)` instead of `sorted(list)[-1]`.
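Comparing versions as integer tuples rather than strings matters as soon as a component reaches two digits. The sketch below uses hypothetical version strings chosen to expose the pitfall (they are not claimed to be real SSSOM releases) and a hypothetical `parse_version` helper. It also shows `max()`, which finds the largest element in one linear pass instead of building a sorted copy.

```python
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Turn an "X.Y" string into an (X, Y) tuple of integers."""
    major, minor = text.split(".")
    return (int(major), int(minor))


# Hypothetical version strings, chosen to expose the pitfall.
versions: List[str] = ["1.0", "1.10", "1.9"]

# String comparison goes character by character, so "1.9" > "1.10".
latest_as_string = max(versions)

# Tuple comparison goes element by element on integers, so (1, 10) > (1, 9).
parsed = [parse_version(v) for v in versions]
latest_as_tuple = max(parsed)  # one linear pass, no sorted copy

print(latest_as_string, latest_as_tuple)  # prints: 1.9 (1, 10)
```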
Amend the SSSOMSchemaView#get_minimum_version() method to return a (major, minor) tuple rather than a SssomVersionEnum object. The SssomVersionEnum class (which is automatically generated from the LinkML schema) is cumbersome to use, for at least two reasons: 1) obtaining the actual value of the enum requires accessing two levels of attributes (SssomVersionEnum.code.text); 2) SssomVersionEnum values cannot be meaningfully compared (e.g. to check that a given version number is higher than another), so we must (a) obtain the text value, (b) split that value on the dot, (c) convert the strings to integers, and (d) put the integers into a tuple. This can be done in one line of code, but it is cumbersome all the same, and that kind of thing is best not left to client code.
Add a small helper function to turn an "X.Y" string into a valid SSSOM version number represented as a tuple of integers (X, Y). Instead of working on the input string directly (splitting it into two substrings, then converting the substrings to integers), we first convert the string into a SssomVersionEnum object, from which we get the string back. This way we rely on the LinkML-generated code to check that the provided value identifies a valid SSSOM version, without having to embed in the method the knowledge of which versions are valid at any given time.
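The round trip through the enum can be sketched as follows. This is an illustration under stated assumptions: a plain Python `enum.Enum` named `SssomVersion` stands in for the LinkML-generated `SssomVersionEnum`, its two values are assumed, and the helper name `version_tuple` is hypothetical. With the real generated class, the string would be read back through `.code.text` as described above, and the exception it raises for an unknown value may differ from the `ValueError` raised here.

```python
from enum import Enum
from typing import Tuple


class SssomVersion(Enum):
    """Stand-in for the LinkML-generated SssomVersionEnum (assumed values)."""

    V1_0 = "1.0"
    V1_1 = "1.1"


def version_tuple(text: str) -> Tuple[int, int]:
    """Turn an "X.Y" string into (X, Y), accepting only known SSSOM versions."""
    # Round-trip through the enum: an unknown value raises ValueError here,
    # so this function needs no list of valid versions of its own.
    checked = SssomVersion(text).value
    major, minor = checked.split(".")
    return (int(major), int(minor))
```

With this design, supporting a new specification version only requires regenerating the enum; `version_tuple` itself does not change.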
Add a new method `MappingSetDataFrame#enforce_compliance()` to make a mapping set compliant with a given version of the SSSOM specification, by removing any slot or slot value that was only defined in a later version. With `strict=True`, the method also removes any non-standard slot that has not been properly declared as an extension slot.
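A minimal sketch of the filtering rules, assuming a mapping set modelled as a list of dicts and hypothetical `SLOT_VERSIONS` / `ENUM_VALUE_VERSIONS` tables (the slot `example_new_slot` is made up). It also assumes that a too-recent enum value is handled by dropping that value from the record; the actual method works on a Pandas data frame and may treat such records differently.

```python
from typing import AbstractSet, Dict, List, Tuple

Version = Tuple[int, int]

# Hypothetical tables; the real method gets this information from the schema.
SLOT_VERSIONS: Dict[str, Version] = {
    "subject_id": (1, 0),
    "predicate_id": (1, 0),
    "object_id": (1, 0),
    "subject_type": (1, 0),
    "example_new_slot": (1, 1),  # made-up slot introduced in 1.1
}
ENUM_VALUE_VERSIONS: Dict[Tuple[str, str], Version] = {
    ("subject_type", "composed entity expression"): (1, 1),
}


def enforce_compliance(
    mappings: List[Dict[str, str]],
    target: Version,
    extension_slots: AbstractSet[str] = frozenset(),
    strict: bool = False,
) -> List[Dict[str, str]]:
    """Return copies of the mappings without anything newer than `target`."""
    result: List[Dict[str, str]] = []
    for mapping in mappings:
        kept: Dict[str, str] = {}
        for slot, value in mapping.items():
            if slot not in SLOT_VERSIONS:
                # Non-standard slot: dropped only in strict mode, and only
                # if it was not declared as an extension slot.
                if strict and slot not in extension_slots:
                    continue
            elif SLOT_VERSIONS[slot] > target:
                continue  # slot only defined in a later version
            if ENUM_VALUE_VERSIONS.get((slot, value), (1, 0)) > target:
                continue  # enum value only defined in a later version
            kept[slot] = value
        result.append(kept)
    return result
```

The input list is never modified; each record is rebuilt from the slots that survive the checks.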
Currently, the fact that a given enum value was added in a specific version of the specification (for example, "composed entity expression" is new in 1.1) is not formally recorded anywhere, so wherever that information is needed we need custom code to deal with it. This commit adds a new constant, `NEW_ENUM_VALUES`, that records that information once and for all. This makes the code of both the `get_compatible_version()` and `enforce_version()` methods much simpler, avoids duplicated information, and will make it easier to cope with new enum values in the future.
Add a `get_new_enum_values()` method to the SSSOMSchemaView class to get the enum values that were introduced after a given version of the spec. First, this spares client code from having to explicitly import the `NEW_ENUM_VALUES` constant. Second, we may in the future be able to get the information about new enum values directly from the LinkML schema rather than from a hard-coded list; when that happens, we will only have to update the `get_new_enum_values()` method, without impacting the code that calls it.
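One possible shape for the constant and the accessor is sketched below. The record layout (`NewEnumValue` with `slots`, `value`, `introduced_in`) is an assumption, and so are the slot names; only the value "composed entity expression" and its 1.1 version come from the commit messages. The accessor is a plain function here rather than a method on SSSOMSchemaView.

```python
from typing import FrozenSet, List, NamedTuple, Tuple

Version = Tuple[int, int]


class NewEnumValue(NamedTuple):
    """One enum value together with the spec version that introduced it."""

    slots: FrozenSet[str]
    value: str
    introduced_in: Version


# Hypothetical content: the slot names are assumptions.
NEW_ENUM_VALUES: List[NewEnumValue] = [
    NewEnumValue(frozenset({"subject_type", "object_type"}), "composed entity expression", (1, 1)),
]


def get_new_enum_values(after: Version) -> List[NewEnumValue]:
    """Return the enum values introduced in a version later than `after`."""
    return [entry for entry in NEW_ENUM_VALUES if entry.introduced_in > after]
```

Because callers only see `get_new_enum_values()`, the hard-coded list could later be replaced by a lookup in the LinkML schema without changing their code.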
matentzn (Collaborator) approved these changes on Oct 8, 2025 and left a comment:
This looks great to me. Thank you @gouttegd! 🚀
This PR builds upon #604 to implement the `enforce_version()` method briefly envisioned in the discussion about future releases of the SSSOM schema (mapping-commons/sssom#490). As with #604, this will only work once (1) a new release of the SSSOM schema is available and (2) SSSOM-Py has been updated to use that release.
Briefly, the idea is that, given any mapping set object (`msdf`), calling `msdf.enforce_version("1.0")` will forcibly remove any slot or enum value that is only defined in a version later than 1.0. The method either returns a new mapping set (default) or modifies the set it is called upon directly (`inplace=True`), which should be familiar to anyone who has worked with Pandas data frames.
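The Pandas-style `inplace` convention can be sketched with a toy class. `MappingSet` is a hypothetical stand-in for MappingSetDataFrame, the slot table is made up, and only slot removal is shown. As with Pandas, the default returns a modified copy and leaves the original untouched, while `inplace=True` modifies the object itself and returns `None`.

```python
import copy
from typing import Dict, List, Optional, Tuple

# Hypothetical table: slot name -> version that introduced it.
SLOT_VERSIONS: Dict[str, Tuple[int, int]] = {
    "subject_id": (1, 0),
    "object_id": (1, 0),
    "example_new_slot": (1, 1),  # made-up slot introduced in 1.1
}


class MappingSet:
    """Toy stand-in for MappingSetDataFrame, holding mappings as dicts."""

    def __init__(self, mappings: List[Dict[str, str]]) -> None:
        self.mappings = mappings

    def enforce_version(self, version: str, inplace: bool = False) -> Optional["MappingSet"]:
        major, minor = version.split(".")
        limit = (int(major), int(minor))
        # Work on a deep copy unless asked to modify this object directly.
        target = self if inplace else MappingSet(copy.deepcopy(self.mappings))
        for mapping in target.mappings:
            for slot in list(mapping):  # list() so we can delete while iterating
                if SLOT_VERSIONS.get(slot, (1, 0)) > limit:
                    del mapping[slot]
        return None if inplace else target
```

Usage: `new_set = ms.enforce_version("1.0")` keeps `ms` intact, whereas `ms.enforce_version("1.0", inplace=True)` strips `ms` itself.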