@gouttegd (Contributor)

This PR completes support for propagation and condensation (#584) by making all parsing functions automatically perform propagation (unless explicitly told not to) before returning the parsed set, and all writing functions automatically perform condensation (again unless explicitly told not to) prior to writing a set.

This notably makes it possible to parse a set containing literal mappings whose object_type slot has been condensed (#606).
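To make the two operations concrete, here is a minimal sketch on plain Python dicts. It is not the sssom-py implementation. It assumes that a set is a metadata dict plus a list of mapping dicts, that only the slots in the hypothetical `PROPAGATABLE` tuple can be condensed (an illustrative subset, not the specification's full list), and that a value already present on a mapping takes precedence over the set-level value during propagation.

```python
from typing import Any

# Illustrative subset of propagatable slots (assumption, not the full list).
PROPAGATABLE = ("subject_type", "object_type", "mapping_tool")

Record = dict[str, Any]


def propagate(metadata: Record, mappings: list[Record]) -> None:
    """Move set-level values of propagatable slots down to every mapping."""
    for slot in PROPAGATABLE:
        if slot not in metadata:
            continue
        value = metadata.pop(slot)
        for m in mappings:
            # Assumption: an existing mapping-level value is kept as-is.
            m.setdefault(slot, value)


def condense(metadata: Record, mappings: list[Record]) -> None:
    """Move a propagatable slot up to the set level if all mappings agree."""
    if not mappings:
        return
    for slot in PROPAGATABLE:
        values = {m.get(slot) for m in mappings}
        # Condense only when every mapping carries the same non-empty value.
        if len(values) == 1 and None not in values:
            metadata[slot] = values.pop()
            for m in mappings:
                del m[slot]
```

For example, a set with `{"object_type": "rdfs literal"}` in its metadata ends up, after `propagate`, with `object_type` on each mapping and not in the metadata; `condense` reverses that.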

Some exceptions to the default behaviour described above:

  • Propagation is disabled by default for parse_obographs_json and parse_alignment_xml, because it does not make much sense for those parsers anyway (a set extracted from an OBOGraph-JSON document cannot be condensed in the first place). Those functions still accept an optional propagate parameter so that all parsing functions can be called in exactly the same way, which matters for code that uses get_parsing_function without knowing exactly which function it will be calling.
  • Condensation is disabled by default for write_rdf and write_owl: RDF or OWL applications are not aware of the concept of “condensed slots”, so it is better to provide them with non-condensed sets.
  • Condensation is always disabled for write_fhir_json and write_ontoportal_json, because they are deprecated and changing their behaviour is best avoided.
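The writer defaults listed above can be pictured with the following sketch. Everything in it is hypothetical: `write_table_sketch`, `write_rdf_sketch` and `write_fhir_json_sketch` stand in for the real writers, the set is a plain metadata dict plus a list of row dicts, and each "writer" simply returns what it would write. Whether the deprecated writers accept a `condense` argument at all is not stated here; the sketch assumes they accept and ignore it.

```python
import copy
from typing import Any

Row = dict[str, Any]
PROPAGATABLE = ("subject_type", "object_type", "mapping_tool")  # illustrative subset


def _condensed(metadata: Row, rows: list[Row]) -> tuple[Row, list[Row]]:
    """Return condensed copies of the set; the inputs are left untouched."""
    metadata, rows = copy.deepcopy(metadata), copy.deepcopy(rows)
    for slot in PROPAGATABLE:
        values = {r.get(slot) for r in rows}
        if rows and len(values) == 1 and None not in values:
            metadata[slot] = values.pop()
            for r in rows:
                del r[slot]
    return metadata, rows


def write_table_sketch(metadata: Row, rows: list[Row], condense: bool = True):
    # Regular writers: condense by default.
    return _condensed(metadata, rows) if condense else (metadata, rows)


def write_rdf_sketch(metadata: Row, rows: list[Row], condense: bool = False):
    # RDF/OWL writers: consumers know nothing about condensed slots,
    # so by default every slot is written on every mapping.
    return _condensed(metadata, rows) if condense else (metadata, rows)


def write_fhir_json_sketch(metadata: Row, rows: list[Row], condense: bool = False):
    # Deprecated writers (assumption): the flag is accepted but ignored,
    # so their output never changes.
    return metadata, rows
```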

On the command line side, --no-propagate and --no-condense options are added to the following subcommands:

  • convert,
  • parse,
  • validate,
  • and merge.
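A hedged sketch of how such flags can be wired to the library parameters. The real `sssom` command-line code is not reproduced here; this uses the standard-library `argparse` to stay self-contained, and `build_parser` and `parse_options` are hypothetical helpers. The point is simply that each `--no-...` flag flips a parameter that otherwise defaults to True.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sssom")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("convert", "parse", "validate", "merge"):
        cmd = sub.add_parser(name)
        cmd.add_argument("inputs", nargs="+")
        # store_false: the attribute is True unless the flag is given.
        cmd.add_argument("--no-propagate", dest="propagate", action="store_false")
        cmd.add_argument("--no-condense", dest="condense", action="store_false")
    return parser


def parse_options(argv: list[str]) -> dict:
    """Translate command-line flags into keyword arguments for the library."""
    args = build_parser().parse_args(argv)
    return {
        "command": args.command,
        "inputs": args.inputs,
        "propagate": args.propagate,
        "condense": args.condense,
    }
```

With this wiring, `merge a.tsv b.tsv --no-condense` yields `propagate=True, condense=False`.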

I personally do not believe it would be worth adding those options to every single command. For example, if you do not want the result of, say, sssom query to be condensed, you can always run convert afterwards with the --no-condense option. The need to disable condensation should be rare anyway, so that should be enough. Still, I am open to other opinions about this.

closes #586
closes #606

Add a `propagate` parameter to all `parse_*` functions. When that
parameter is true, the parsing function will automatically propagate all
condensed slots in the parsed mapping set, prior to returning a
MappingSetDataFrame.

For the `parse_obographs_json` function, the parameter does not actually
make much sense (a mapping set extracted from an OBOGraph-JSON document
cannot be in a condensed state), but having all parsing functions accept
the same `propagate` parameter lets us keep using the
`get_parsing_function` logic without special-casing the functions that
would not otherwise accept a `propagate` parameter. The same goes for the
`parse_alignment_xml` function.
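The benefit of a uniform signature can be sketched as follows. The registry, `get_parsing_function_sketch` and the two toy parsers are hypothetical stand-ins, not the sssom-py functions, and for brevity the "input" is an already-read `(metadata, rows)` pair. The sketch only shows that a caller can pass `propagate=` without knowing which parser it got, while each parser keeps its own default.

```python
from typing import Any, Callable

ParsedSet = tuple[dict[str, Any], list[dict[str, Any]]]


def _propagate(metadata: dict[str, Any], rows: list[dict[str, Any]]) -> None:
    # Simplified propagation of a single slot, for illustration only.
    if "object_type" in metadata:
        value = metadata.pop("object_type")
        for row in rows:
            row.setdefault("object_type", value)


def parse_table_sketch(source: ParsedSet, propagate: bool = True) -> ParsedSet:
    metadata, rows = dict(source[0]), [dict(r) for r in source[1]]
    if propagate:
        _propagate(metadata, rows)
    return metadata, rows


def parse_obographs_sketch(source: ParsedSet, propagate: bool = False) -> ParsedSet:
    # Such sets are never condensed; the flag exists only so that the
    # signature matches the other parsers.
    metadata, rows = dict(source[0]), [dict(r) for r in source[1]]
    if propagate:
        _propagate(metadata, rows)
    return metadata, rows


PARSERS: dict[str, Callable[..., ParsedSet]] = {
    "tsv": parse_table_sketch,
    "obographs-json": parse_obographs_sketch,
}


def get_parsing_function_sketch(fmt: str) -> Callable[..., ParsedSet]:
    return PARSERS[fmt]
```

A caller can then write `get_parsing_function_sketch(fmt)(source, propagate=True)` for any format, with no special case.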

The `propagate` parameter defaults to True (the behaviour recommended by
the SSSOM specification), except for the aforementioned
`parse_obographs_json` and `parse_alignment_xml` functions.

Propagation is normally performed by calling the
`MappingSetDataFrame#propagate()` method, but in the case of the
`parse_sssom_table()` function this cannot work, because propagation
needs to be performed before we can get a MappingSetDataFrame instance
(otherwise we could fail to construct a MappingSetDataFrame if the
mapping set contains literal mappings and the `subject_type` or
`object_type` slot is condensed). So we use a new
`propagate_condensed_slots()` method instead, which does not require a
MappingSetDataFrame object.
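Why propagation has to happen before the container is built can be illustrated with a toy class. `ToyMappingSet`, its validation rule and `propagate_raw` (which plays the role of `propagate_condensed_slots()` but does not reproduce its real signature) are assumptions made for illustration; the actual checks performed by sssom-py may differ. The sketch only shows that a literal mapping without an `object_id` is rejected unless its `object_type` is visible at the mapping level.

```python
from typing import Any

LITERAL = "rdfs literal"


class ToyMappingSet:
    """Stand-in for a mapping-set container that validates its rows."""

    def __init__(self, metadata: dict[str, Any], rows: list[dict[str, Any]]):
        for row in rows:
            # A mapping may omit object_id only if it is a literal mapping,
            # and this container only looks at the row itself.
            if "object_id" not in row and row.get("object_type") != LITERAL:
                raise ValueError(f"missing object_id in {row}")
        self.metadata, self.rows = metadata, rows


def propagate_raw(metadata: dict[str, Any], rows: list[dict[str, Any]]) -> None:
    """Simplified propagation on raw metadata and rows (no container needed)."""
    for slot in ("subject_type", "object_type"):
        if slot in metadata:
            value = metadata.pop(slot)
            for row in rows:
                row.setdefault(slot, value)


def parse_table_sketch(metadata, rows, propagate: bool = True) -> ToyMappingSet:
    if propagate:
        # Must run before the container is constructed, or validation fails.
        propagate_raw(metadata, rows)
    return ToyMappingSet(metadata, rows)
```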

Conversely, we also add a `condense` parameter to all the `write_*`
functions, to perform the opposite operation prior to writing a set. The
parameter defaults to True for all writing functions, except `write_rdf`
since writing condensed slots in RDF is most likely not wanted.

Now that (most) parsing functions default to propagating condensed slots,
some tests need to be updated. Parsing-time propagation must be disabled
for:

* some tests that are performed on "half-condensed" test files and do
  not expect the structure of the set to be modified;
* the tests for the actual propagation/condensation feature (can't
  properly test propagation if the parser already did it for us).
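In test code, disabling parsing-time propagation looks roughly like this. The parser and `propagate_slots` are again hypothetical stand-ins working on `(metadata, rows)` pairs, not the sssom-py test suite. The test checks that, with `propagate=False`, the set still arrives condensed, so the propagation step itself is what gets exercised; run it with `python -m unittest`.

```python
import copy
import unittest
from typing import Any

SetPair = tuple[dict[str, Any], list[dict[str, Any]]]

CONDENSED: SetPair = (
    {"mapping_tool": "toy-matcher"},
    [{"subject_id": "EX:1"}, {"subject_id": "EX:2"}],
)


def propagate_slots(metadata: dict[str, Any], rows: list[dict[str, Any]]) -> None:
    # Simplified propagation of a single slot, for illustration only.
    if "mapping_tool" in metadata:
        value = metadata.pop("mapping_tool")
        for row in rows:
            row.setdefault("mapping_tool", value)


def parse_sketch(source: SetPair, propagate: bool = True) -> SetPair:
    metadata, rows = copy.deepcopy(source)
    if propagate:
        propagate_slots(metadata, rows)
    return metadata, rows


class PropagationTest(unittest.TestCase):
    def test_propagation(self):
        # Disable parsing-time propagation so the set stays condensed...
        metadata, rows = parse_sketch(CONDENSED, propagate=False)
        self.assertEqual(metadata, {"mapping_tool": "toy-matcher"})
        # ...then exercise the propagation step under test.
        propagate_slots(metadata, rows)
        self.assertEqual(metadata, {})
        self.assertTrue(all(r["mapping_tool"] == "toy-matcher" for r in rows))
```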

In addition, we add a test to check that parsing-time propagation does
allow us to read a set containing literal mappings when the
object_type slot is condensed (mapping-commons#606).

Now that parsing and writing functions default to automatically
propagating and condensing sets (respectively), we add `--no-propagate`
and `--no-condense` options to some commands to alter the default
behaviour.

Specifically, we add such options to:

* `sssom convert`,
* `sssom parse`,
* `sssom validate`,
* and `sssom merge`.

It is currently not deemed useful (by me at least) to add those options
to every single command.

@gouttegd self-assigned this Oct 11, 2025
@gouttegd requested a review from matentzn October 11, 2025 22:01

@matentzn (Collaborator) left a comment

This is a fantastic enhancement of sssom-py, thank you so much!! 🚀

I looked through all the changes, and could not spot any particularly worrying or surprising pieces. Much appreciated!

@gouttegd merged commit 427f6e8 into mapping-commons:master Oct 13, 2025
6 checks passed
@gouttegd deleted the auto-propagation branch October 13, 2025 17:40
Development

Successfully merging this pull request may close these issues.

Missing support for literal mappings
