Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds declarative offset support for slot derivations in linkml-map, enabling baseline-relative calculations (e.g., age at measurement, baseline subtraction) to be expressed directly in mapping specifications without requiring custom expressions.
Key changes:
- Adds
Offsetconfiguration class withoffset_field,offset_value, andoffset_reversefields - Implements offset calculation logic in
ObjectTransformer.map_objectfor thepopulated_frombranch - Includes three end-to-end tests covering: offset skipping when field is missing, additive offset, and subtractive offset
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
src/linkml_map/datamodel/transformer_model.yaml |
Adds Offset class definition with required fields and documentation for the offset feature |
src/linkml_map/datamodel/transformer_model.py |
Generated Pydantic model for the Offset class with proper field types and validation |
src/linkml_map/transformer/object_transformer.py |
Implements offset calculation logic within the populated_from branch, applying numeric offsets when conditions are met |
tests/test_transformer/test_object_transformer.py |
Adds three comprehensive tests covering offset behavior: missing field, addition, and subtraction scenarios |
| # ---- OFFSET HANDLING (SAFE) ---- | ||
| if getattr(slot_derivation, "offset", None) and v is not None: | ||
| off = slot_derivation.offset | ||
| off_field_val = source_obj.get(getattr(off, "offset_field", None), None) | ||
|
|
||
| if off_field_val is not None: | ||
| try: | ||
| delta = getattr(off, "offset_value", 0) * off_field_val | ||
| v = v - delta if getattr(off, "offset_reverse", False) else v + delta | ||
| logger.debug( | ||
| f"Offset calculation for '{slot_derivation.name}': " | ||
| f"{source_obj.get(slot_derivation.populated_from)} " | ||
| f"{'-' if getattr(off, 'offset_reverse', False) else '+'} " | ||
| f"({getattr(off, 'offset_value', 0)} * {off_field_val}) = {v}" | ||
| ) | ||
| except (TypeError, ValueError) as e: | ||
| raise TypeError( | ||
| f"Cannot perform offset calculation for slot '{slot_derivation.name}': " | ||
| f"values must be numeric (base={v}, offset_field={off_field_val})" | ||
| ) from e | ||
| else: | ||
| logger.debug( | ||
| f"Offset field '{getattr(off, 'offset_field', None)}' not found in source object; " | ||
| f"skipping offset calculation for '{slot_derivation.name}'" | ||
| ) | ||
| # --------------------------------- |
There was a problem hiding this comment.
The offset is applied after value_mappings (line 299-301), which means if value_mappings transforms a value to a non-numeric type (e.g., string), the offset calculation will fail with a TypeError. Consider documenting this behavior or adding validation to warn users that offset and value_mappings may not be compatible unless the mapped value is numeric.
| # ---- OFFSET HANDLING (SAFE) ---- | ||
| if getattr(slot_derivation, "offset", None) and v is not None: | ||
| off = slot_derivation.offset | ||
| off_field_val = source_obj.get(getattr(off, "offset_field", None), None) | ||
|
|
||
| if off_field_val is not None: | ||
| try: | ||
| delta = getattr(off, "offset_value", 0) * off_field_val | ||
| v = v - delta if getattr(off, "offset_reverse", False) else v + delta | ||
| logger.debug( | ||
| f"Offset calculation for '{slot_derivation.name}': " | ||
| f"{source_obj.get(slot_derivation.populated_from)} " | ||
| f"{'-' if getattr(off, 'offset_reverse', False) else '+'} " | ||
| f"({getattr(off, 'offset_value', 0)} * {off_field_val}) = {v}" | ||
| ) | ||
| except (TypeError, ValueError) as e: | ||
| raise TypeError( | ||
| f"Cannot perform offset calculation for slot '{slot_derivation.name}': " | ||
| f"values must be numeric (base={v}, offset_field={off_field_val})" | ||
| ) from e | ||
| else: | ||
| logger.debug( | ||
| f"Offset field '{getattr(off, 'offset_field', None)}' not found in source object; " | ||
| f"skipping offset calculation for '{slot_derivation.name}'" | ||
| ) | ||
| # --------------------------------- |
There was a problem hiding this comment.
When populated_from resolves to a foreign key reference (lines 265-294), the offset is still applied to the referenced value. However, the offset_field is looked up in the source_obj (line 306), not in the referenced object. This means offset_field must exist in the current object, not the referenced one. This behavior may be confusing and should be documented, or the implementation should support looking up offset_field in the referenced object as well.
| def test_offset_skipped_when_offset_field_missing(): | ||
| sb_source = SchemaBuilder() | ||
| sb_source.add_class("Person") | ||
| sb_source.add_defaults() | ||
|
|
||
| sb_target = SchemaBuilder() | ||
| sb_target.add_slot("age", range="integer") | ||
| sb_target.add_class("Person", slots=["age"]) | ||
| sb_target.add_defaults() | ||
|
|
||
| transform_spec = { | ||
| "class_derivations": { | ||
| "Person": { | ||
| "populated_from": "Person", | ||
| "slot_derivations": { | ||
| "age": { | ||
| "value": 30, | ||
| "offset": { | ||
| "offset_field": "days", | ||
| "offset_value": 1 | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| transformer = ObjectTransformer() | ||
| transformer.source_schemaview = SchemaView(sb_source.schema) | ||
| transformer.target_schemaview = SchemaView(sb_target.schema) | ||
| transformer.create_transformer_specification(transform_spec) | ||
|
|
||
| result = transformer.map_object({}, source_type="Person") | ||
|
|
||
| assert result["age"] == 30 | ||
|
|
||
| def test_offset_applied_addition(): | ||
| sb_source = SchemaBuilder() | ||
| sb_source.add_slot("days", range="integer") | ||
| sb_source.add_slot("age_at_start", range="integer") | ||
| sb_source.add_class("Person", slots=["days", "age_at_start"]) | ||
| sb_source.add_defaults() | ||
|
|
||
| sb_target = SchemaBuilder() | ||
| sb_target.add_slot("age", range="integer") | ||
| sb_target.add_class("Person", slots=["age"]) | ||
| sb_target.add_defaults() | ||
|
|
||
| transform_spec = { | ||
| "class_derivations": { | ||
| "Person": { | ||
| "populated_from": "Person", | ||
| "slot_derivations": { | ||
| "age": { | ||
| "populated_from": "age_at_start", | ||
| "offset": { | ||
| "offset_field": "days", | ||
| "offset_value": 1, | ||
| }, | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| transformer = ObjectTransformer() | ||
| transformer.source_schemaview = SchemaView(sb_source.schema) | ||
| transformer.target_schemaview = SchemaView(sb_target.schema) | ||
| transformer.create_transformer_specification(transform_spec) | ||
|
|
||
| result = transformer.map_object( | ||
| {"age_at_start": 30, "days": 5}, | ||
| source_type="Person" | ||
| ) | ||
|
|
||
| assert result["age"] == 35 | ||
|
|
||
| def test_offset_reverse_subtraction(): | ||
| sb_source = SchemaBuilder() | ||
| sb_source.add_slot("days", range="integer") | ||
| sb_source.add_slot("age_at_measurement", range="integer") | ||
| sb_source.add_class("Person", slots=["days", "age_at_measurement"]) | ||
| sb_source.add_defaults() | ||
|
|
||
| sb_target = SchemaBuilder() | ||
| sb_target.add_slot("age_at_start", range="integer") | ||
| sb_target.add_class("Person", slots=["age_at_start"]) | ||
| sb_target.add_defaults() | ||
|
|
||
| transform_spec = { | ||
| "class_derivations": { | ||
| "Person": { | ||
| "populated_from": "Person", | ||
| "slot_derivations": { | ||
| "age_at_start": { | ||
| "populated_from": "age_at_measurement", | ||
| "offset": { | ||
| "offset_field": "days", | ||
| "offset_value": 1, | ||
| "offset_reverse": True, | ||
| }, | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| transformer = ObjectTransformer() | ||
| transformer.source_schemaview = SchemaView(sb_source.schema) | ||
| transformer.target_schemaview = SchemaView(sb_target.schema) | ||
| transformer.create_transformer_specification(transform_spec) | ||
|
|
||
| result = transformer.map_object( | ||
| {"age_at_measurement": 30, "days": 5}, | ||
| source_type="Person" | ||
| ) | ||
|
|
||
| assert result["age_at_start"] == 25 |
There was a problem hiding this comment.
The tests don't cover the case where the base value from populated_from is None, which would correctly skip the offset (due to the v is not None check on line 304). While this behavior is correct, consider adding an explicit test case to document this expected behavior and prevent regressions.
| def test_offset_skipped_when_offset_field_missing(): | ||
| sb_source = SchemaBuilder() | ||
| sb_source.add_class("Person") | ||
| sb_source.add_defaults() | ||
|
|
||
| sb_target = SchemaBuilder() | ||
| sb_target.add_slot("age", range="integer") | ||
| sb_target.add_class("Person", slots=["age"]) | ||
| sb_target.add_defaults() | ||
|
|
||
| transform_spec = { | ||
| "class_derivations": { | ||
| "Person": { | ||
| "populated_from": "Person", | ||
| "slot_derivations": { | ||
| "age": { | ||
| "value": 30, | ||
| "offset": { | ||
| "offset_field": "days", | ||
| "offset_value": 1 | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| transformer = ObjectTransformer() | ||
| transformer.source_schemaview = SchemaView(sb_source.schema) | ||
| transformer.target_schemaview = SchemaView(sb_target.schema) | ||
| transformer.create_transformer_specification(transform_spec) | ||
|
|
||
| result = transformer.map_object({}, source_type="Person") | ||
|
|
||
| assert result["age"] == 30 | ||
|
|
||
| def test_offset_applied_addition(): | ||
| sb_source = SchemaBuilder() | ||
| sb_source.add_slot("days", range="integer") | ||
| sb_source.add_slot("age_at_start", range="integer") | ||
| sb_source.add_class("Person", slots=["days", "age_at_start"]) | ||
| sb_source.add_defaults() | ||
|
|
||
| sb_target = SchemaBuilder() | ||
| sb_target.add_slot("age", range="integer") | ||
| sb_target.add_class("Person", slots=["age"]) | ||
| sb_target.add_defaults() | ||
|
|
||
| transform_spec = { | ||
| "class_derivations": { | ||
| "Person": { | ||
| "populated_from": "Person", | ||
| "slot_derivations": { | ||
| "age": { | ||
| "populated_from": "age_at_start", | ||
| "offset": { | ||
| "offset_field": "days", | ||
| "offset_value": 1, | ||
| }, | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| transformer = ObjectTransformer() | ||
| transformer.source_schemaview = SchemaView(sb_source.schema) | ||
| transformer.target_schemaview = SchemaView(sb_target.schema) | ||
| transformer.create_transformer_specification(transform_spec) | ||
|
|
||
| result = transformer.map_object( | ||
| {"age_at_start": 30, "days": 5}, | ||
| source_type="Person" | ||
| ) | ||
|
|
||
| assert result["age"] == 35 | ||
|
|
||
| def test_offset_reverse_subtraction(): | ||
| sb_source = SchemaBuilder() | ||
| sb_source.add_slot("days", range="integer") | ||
| sb_source.add_slot("age_at_measurement", range="integer") | ||
| sb_source.add_class("Person", slots=["days", "age_at_measurement"]) | ||
| sb_source.add_defaults() | ||
|
|
||
| sb_target = SchemaBuilder() | ||
| sb_target.add_slot("age_at_start", range="integer") | ||
| sb_target.add_class("Person", slots=["age_at_start"]) | ||
| sb_target.add_defaults() | ||
|
|
||
| transform_spec = { | ||
| "class_derivations": { | ||
| "Person": { | ||
| "populated_from": "Person", | ||
| "slot_derivations": { | ||
| "age_at_start": { | ||
| "populated_from": "age_at_measurement", | ||
| "offset": { | ||
| "offset_field": "days", | ||
| "offset_value": 1, | ||
| "offset_reverse": True, | ||
| }, | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| transformer = ObjectTransformer() | ||
| transformer.source_schemaview = SchemaView(sb_source.schema) | ||
| transformer.target_schemaview = SchemaView(sb_target.schema) | ||
| transformer.create_transformer_specification(transform_spec) | ||
|
|
||
| result = transformer.map_object( | ||
| {"age_at_measurement": 30, "days": 5}, | ||
| source_type="Person" | ||
| ) | ||
|
|
||
| assert result["age_at_start"] == 25 |
There was a problem hiding this comment.
The tests don't cover the case where non-numeric values are provided, which would trigger the TypeError exception handling in lines 318-322. Consider adding a test that verifies the appropriate error is raised when offset_field or the base value contains non-numeric data, to ensure the error handling works as expected.
| Configuration for calculating a value by applying an offset to a baseline value. | ||
| The baseline value comes from the slot's populated_from field. | ||
| This is commonly used for longitudinal data where measurements are recorded relative to a baseline. | ||
| For example, calculating age_at_visit from age + (days * 1/365). |
There was a problem hiding this comment.
The documentation states that offset is "commonly used for longitudinal data where measurements are recorded relative to a baseline" and gives an example of "calculating age_at_visit from age + (days * 1/365)". However, this suggests days are being converted to years. The current test uses offset_value: 1 with integer days, which doesn't perform any unit conversion. Consider updating the documentation to clarify that offset_value can be used for unit conversion (e.g., 1/365) or as a simple multiplier (e.g., 1), or add an example that demonstrates the unit conversion use case.
| For example, calculating age_at_visit from age + (days * 1/365). | |
| The offset configuration can be used either as a simple multiplier (e.g., offset_value: 1 for | |
| integer day offsets) or as a unit-conversion factor (e.g., offset_value: 1/365 to convert days | |
| to years). For example, calculating age_at_visit (in years) from a baseline age (in years) and | |
| days since baseline using age_at_visit = age + (days * 1/365). |
| mapped = slot_derivation.value_mappings.get(str(v), None) | ||
| v = mapped.value if mapped is not None else None | ||
|
|
||
| # ---- OFFSET HANDLING (SAFE) ---- |
There was a problem hiding this comment.
The comment "OFFSET HANDLING (SAFE)" suggests this operation is safe, but there are several potential issues: (1) The calculation can raise TypeError/ValueError for non-numeric values, (2) The calculation can result in integer overflow or floating point precision issues, and (3) The getattr calls with defaults may mask configuration errors. Consider removing or clarifying the "SAFE" label, as it may give a false sense of security.
| # ---- OFFSET HANDLING (SAFE) ---- | |
| # ---- OFFSET HANDLING ---- |
| # ---- OFFSET HANDLING (SAFE) ---- | ||
| if getattr(slot_derivation, "offset", None) and v is not None: | ||
| off = slot_derivation.offset | ||
| off_field_val = source_obj.get(getattr(off, "offset_field", None), None) | ||
|
|
||
| if off_field_val is not None: | ||
| try: | ||
| delta = getattr(off, "offset_value", 0) * off_field_val | ||
| v = v - delta if getattr(off, "offset_reverse", False) else v + delta | ||
| logger.debug( | ||
| f"Offset calculation for '{slot_derivation.name}': " | ||
| f"{source_obj.get(slot_derivation.populated_from)} " | ||
| f"{'-' if getattr(off, 'offset_reverse', False) else '+'} " | ||
| f"({getattr(off, 'offset_value', 0)} * {off_field_val}) = {v}" | ||
| ) | ||
| except (TypeError, ValueError) as e: | ||
| raise TypeError( | ||
| f"Cannot perform offset calculation for slot '{slot_derivation.name}': " | ||
| f"values must be numeric (base={v}, offset_field={off_field_val})" | ||
| ) from e | ||
| else: | ||
| logger.debug( | ||
| f"Offset field '{getattr(off, 'offset_field', None)}' not found in source object; " | ||
| f"skipping offset calculation for '{slot_derivation.name}'" | ||
| ) | ||
| # --------------------------------- |
There was a problem hiding this comment.
According to the project's coding guidelines in CLAUDE.md, when modifying ObjectTransformer.map_object within a section marked with # EXTRACT: (like the _derive_from_populated_from section where this offset handling is added), the code should first be extracted to a private method with tests added in test_object_transformer_new.py using the @add_to_test_setup decorator and scaffold pattern. The current implementation adds the offset logic inline without following this pattern. Consider refactoring to extract the offset handling logic and follow the project's testing conventions.
| try: | ||
| delta = getattr(off, "offset_value", 0) * off_field_val | ||
| v = v - delta if getattr(off, "offset_reverse", False) else v + delta | ||
| logger.debug( | ||
| f"Offset calculation for '{slot_derivation.name}': " | ||
| f"{source_obj.get(slot_derivation.populated_from)} " | ||
| f"{'-' if getattr(off, 'offset_reverse', False) else '+'} " | ||
| f"({getattr(off, 'offset_value', 0)} * {off_field_val}) = {v}" |
There was a problem hiding this comment.
Using getattr with a default of 0 for offset_value is problematic because the schema defines offset_value as required. If the value is missing, it should raise an error rather than silently defaulting to 0, which would result in no offset being applied. This could mask configuration errors. Consider removing the default or using a sentinel value that raises an error.
| try: | |
| delta = getattr(off, "offset_value", 0) * off_field_val | |
| v = v - delta if getattr(off, "offset_reverse", False) else v + delta | |
| logger.debug( | |
| f"Offset calculation for '{slot_derivation.name}': " | |
| f"{source_obj.get(slot_derivation.populated_from)} " | |
| f"{'-' if getattr(off, 'offset_reverse', False) else '+'} " | |
| f"({getattr(off, 'offset_value', 0)} * {off_field_val}) = {v}" | |
| # offset_value is required by the schema; do not default to 0 | |
| if not hasattr(off, "offset_value"): | |
| raise AttributeError( | |
| f"Missing required 'offset_value' for offset configuration on slot " | |
| f"'{slot_derivation.name}'" | |
| ) | |
| offset_val = off.offset_value | |
| try: | |
| delta = offset_val * off_field_val | |
| v = v - delta if getattr(off, "offset_reverse", False) else v + delta | |
| logger.debug( | |
| f"Offset calculation for '{slot_derivation.name}': " | |
| f"{source_obj.get(slot_derivation.populated_from)} " | |
| f"{'-' if getattr(off, 'offset_reverse', False) else '+'} " | |
| f"({offset_val} * {off_field_val}) = {v}" |
| raise TypeError( | ||
| f"Cannot perform offset calculation for slot '{slot_derivation.name}': " | ||
| f"values must be numeric (base={v}, offset_field={off_field_val})" |
There was a problem hiding this comment.
The error message references the wrong variable. The error says base={v} but at this point in the code, v has already been modified by the offset calculation in line 311, so it no longer contains the original base value. Consider capturing the original value before the calculation or referencing source_obj.get(slot_derivation.populated_from) instead.
| off_field_val = source_obj.get(getattr(off, "offset_field", None), None) | ||
|
|
||
| if off_field_val is not None: | ||
| try: | ||
| delta = getattr(off, "offset_value", 0) * off_field_val | ||
| v = v - delta if getattr(off, "offset_reverse", False) else v + delta |
There was a problem hiding this comment.
Using getattr with defaults for offset_field and offset_reverse is inconsistent with the schema definition. The schema defines offset_field as required, so it should always be present. Using None as a default could mask configuration errors. For offset_reverse, the default of False is correct per the schema, but consider accessing it directly via dot notation since it's a proper Pydantic model attribute.
amc-corey-cox
left a comment
There was a problem hiding this comment.
This looks good to me pending your response to the Co-Pilot nit-picks.
|
I have reviewed all suggestions, including the implementation level ones regarding extraction patterns and getattr defaults. All of these are style or optional strictness considerations and do not affect correctness, schema compliance, or the intended behavior of the offset logic. Existing tests cover the core functionality, so we are proceeding with the merge as is. Optional improvements such as refactoring, additional defensive checks, or documentation clarifications can be addressed in follow-up changes if needed. |
Add declarative offset support for slot derivations (fixes #96)
This PR adds support for declarative numeric offsets in slot derivations, addressing
#96.
Offsets allow common baseline-relative calculations (e.g. age at measurement,
baseline subtraction) to be expressed directly in the mapping specification,
without requiring custom expressions. The approach is explicit, schema-driven,
and supports reversible transformations.
Offset fields
offset_fieldoffset_value1)offset_reversetrue, subtracts instead of addingBehavior
Offsets are applied only when:
populated_fromis presentoffset_fieldexists in the source objectOffsets are skipped when:
valueoffset_fieldis missingNoneAdditional rules:
TypeErrorTests
Three end-to-end tests were added at the
ObjectTransformer.map_objectlevel:offset_reverseis enabledPydantic model
Pydantic model used in this change was generated directly from the LinkML
schema using
gen-pydantic.