@gouttegd
Contributor

This PR builds upon #604 to implement the enforce_version() method briefly envisioned in the discussion about future releases of the SSSOM schema (mapping-commons/sssom#490).

As with #604, this will only work once (1) a new release of the SSSOM schema is available and (2) SSSOM-Py has been updated to use that release.

Briefly, the idea is that, given any mapping set object (msdf), calling msdf.enforce_version("1.0") forcibly removes any slot or enum value that is defined only in a version later than 1.0. The method either returns a new mapping set (the default) or modifies the set it is called upon directly (inplace=True); this should be familiar to anyone who has worked with Pandas data frames.
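
A minimal sketch of that behaviour, using a toy stand-in rather than the real `MappingSetDataFrame`. The `ToyMappingSet` class, the `SLOT_ADDED_IN` table, and the slot `made_up_new_slot` are hypothetical and exist only for illustration; only the `enforce_version(version, inplace=...)` call shape comes from the description above. Returning `None` when `inplace=True` is an assumption borrowed from the Pandas convention.

```python
from __future__ import annotations

# Hypothetical table: slot name -> (major, minor) version that introduced it.
# In the real library this information would come from the SSSOM schema.
SLOT_ADDED_IN = {
    "subject_id": (1, 0),
    "object_id": (1, 0),
    "made_up_new_slot": (1, 1),
}


class ToyMappingSet:
    """Toy stand-in for a mapping set: a list of {slot: value} rows."""

    def __init__(self, rows: list[dict[str, str]]) -> None:
        self.rows = rows

    def enforce_version(self, version: str, inplace: bool = False) -> ToyMappingSet | None:
        major, minor = (int(part) for part in version.split("."))
        target = (major, minor)
        # Keep a slot only if it was introduced at or before the target version.
        # Slots missing from the table are treated as 1.0 slots in this sketch.
        kept = [
            {
                slot: value
                for slot, value in row.items()
                if SLOT_ADDED_IN.get(slot, (1, 0)) <= target
            }
            for row in self.rows
        ]
        if inplace:
            self.rows = kept
            return None  # assumption: mirror the Pandas inplace convention
        return ToyMappingSet(kept)
```

With the default `inplace=False`, the original object is left untouched and the filtered copy is returned; with `inplace=True`, the object itself loses the newer slots.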

gouttegd and others added 12 commits September 1, 2025 18:34
Add a new method to the MappingSetDataFrame class to automatically
determine the minimum version of the SSSOM specification the set is
compatible with -- that is, the earliest version that defines all the
slots and all the enum values present in the set.
Fix wrong slot name when looking for "composed entity expression".

Let Python compare version numbers as tuples of integers.

Use `max(list)` instead of `sorted(list)[-1]`.
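
A small illustration of why tuples of integers are convenient here. The list of versions is made up for the example; the point is only that tuples compare element by element, so `max()` picks the latest version directly.

```python
# Versions that introduced each slot found in a hypothetical mapping set.
introduced_in = [(1, 0), (1, 1), (1, 0)]

# The minimum compatible version is the latest of those versions.
minimum_version = max(introduced_in)

# Tuples of integers order correctly even with two-digit components,
# whereas the equivalent strings would be ordered the wrong way round.
tuple_order_ok = (1, 10) > (1, 9)
string_order_wrong = "1.10" < "1.9"
```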
Amend the SSSOMSchemaView#get_minimum_version() method to return a
(major, minor) tuple, rather than a SssomVersionEnum object. The
SssomVersionEnum class (which is automatically generated from the LinkML
schema) is cumbersome to use, for at least two reasons:

1) obtaining the actual value of the enum requires accessing two levels
   of attributes (SssomVersionEnum.code.text);
2) SssomVersionEnum values cannot be meaningfully compared (e.g. to
   check that a given version number is higher than another given
   version). To compare them, we must (a) obtain the text value, (b)
   split that value at the dot, (c) convert the strings to integers,
   and (d) put the integers into a tuple. This can be done in one line
   of code, but it is cumbersome all the same, and that kind of thing
   is best not left to client code.
Add a small helper function to turn a "X.Y" string into a valid SSSOM
version number represented as a tuple of integers (X, Y).

Instead of working on the input string directly (splitting into two
substrings, then converting the substrings to integers), we first
convert the string into a SssomVersionEnum object, from which we get the
string back. This is so we can rely on the LinkML-generated code to
check that the provided value identifies a valid SSSOM version, without
having to embed into the method the knowledge of which versions are
valid at any given time.
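
A sketch of the round-trip idea, assuming a standard-library `Enum` as a stand-in for the LinkML-generated class. `ToyVersionEnum` and `parse_version` are hypothetical names, and the real generated enum exposes its value differently (through `.code.text`, as noted above), so this only illustrates the validation-by-construction pattern.

```python
from enum import Enum


class ToyVersionEnum(Enum):
    """Stand-in for the generated version enum; members are illustrative."""

    V1_0 = "1.0"
    V1_1 = "1.1"


def parse_version(value: str) -> tuple[int, int]:
    """Turn an "X.Y" string into an (X, Y) tuple, rejecting unknown versions."""
    # Constructing the enum member validates the input: an unknown
    # version string raises ValueError before any parsing happens.
    checked = ToyVersionEnum(value).value
    major, minor = checked.split(".")
    return (int(major), int(minor))
```

The helper itself holds no list of valid versions; adding a member to the enum is enough for it to accept a new version.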
Add a new method `MappingSetDataFrame#enforce_compliance()` to ensure
that a mapping set is compliant with a given version of the SSSOM
specification, by removing any slot or slot value that has only been
defined in a later version.

The method can also be used to optionally remove any extra non-standard
slot that has not been properly declared as an extension slot
(strict=True).
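
A rough sketch of how the `strict` option could interact with version filtering. The `filter_slots` function, its parameters, and all slot names are hypothetical; this is not the library's implementation.

```python
def filter_slots(slots, added_in, target, declared_extensions=(), strict=False):
    """Return the slots that survive enforcement against `target`.

    `added_in` maps each standard slot to the (major, minor) version that
    introduced it; any slot absent from it is treated as non-standard.
    """
    kept = []
    for slot in slots:
        if slot in added_in:
            # Standard slot: keep it only if the target version defines it.
            if added_in[slot] <= target:
                kept.append(slot)
        elif slot in declared_extensions or not strict:
            # Non-standard slot: dropped only when strict and undeclared.
            kept.append(slot)
    return kept
```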
Currently the fact that a given enum value has been added in a specific
version of the specification (for example, "composed entity expression"
is new in 1.1) is not formally recorded anywhere, and wherever that
information is needed we need custom code to deal with it.

This commit adds a new constant `NEW_ENUM_VALUES` that provides that
information once and for all.

This makes the code for both the `get_compatible_version()` and
`enforce_version()` methods much simpler, avoids duplicated information,
and will make it easier to later cope with future new enum values.
@gouttegd gouttegd self-assigned this Sep 10, 2025
gouttegd and others added 3 commits September 11, 2025 09:53
Add a `get_new_enum_values()` method to the SSSOMSchemaView class to get
enum values that were introduced after a given version of the spec.

First, this spares client code from having to explicitly import the
NEW_ENUM_VALUES constant. Second, hopefully in the future we might be
able to get the information about new enum values directly from the
LinkML schema rather than from a hard-coded list, and when that happens
we will simply have to update the `get_new_enum_values()` method without
impacting the code that is calling that method.
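
To make the idea concrete, here is one possible shape for such a constant and its accessor. The layout of `NEW_ENUM_VALUES` and the signature of `get_new_enum_values()` shown here are assumptions for illustration, not the actual definitions; only the fact that "composed entity expression" is new in 1.1 comes from the commit messages above.

```python
# Assumed layout: version that introduced the values -> list of enum values.
NEW_ENUM_VALUES = {
    (1, 1): ["composed entity expression"],
}


def get_new_enum_values(after=(1, 0)):
    """Return the enum values introduced in any version later than `after`."""
    return [
        value
        for version, values in NEW_ENUM_VALUES.items()
        if version > after
        for value in values
    ]
```

Callers depend only on the function, so the hard-coded table could later be replaced by a lookup in the schema without touching them.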
@gouttegd gouttegd marked this pull request as ready for review October 8, 2025 17:54
@gouttegd gouttegd requested a review from matentzn October 8, 2025 17:54
Collaborator

@matentzn matentzn left a comment

This looks great to me. Thank you @gouttegd! 🚀

@gouttegd gouttegd merged commit 76165d5 into mapping-commons:master Oct 8, 2025
6 checks passed
@gouttegd gouttegd deleted the enforce-version branch October 8, 2025 19:30

2 participants