Skip to content

offset implementation#110

Merged
madanucd merged 2 commits intomainfrom
offset-implementation
Jan 15, 2026
Merged

offset implementation#110
madanucd merged 2 commits intomainfrom
offset-implementation

Conversation

@madanucd
Copy link
Copy Markdown
Collaborator

@madanucd madanucd commented Jan 8, 2026

Add declarative offset support for slot derivations (fixes #96)

This PR adds support for declarative numeric offsets in slot derivations, addressing
#96.

Offsets allow common baseline-relative calculations (e.g. age at measurement,
baseline subtraction) to be expressed directly in the mapping specification,
without requiring custom expressions. The approach is explicit, schema-driven,
and supports reversible transformations.


Offset fields

Field Description
offset_field Source field providing the numeric offset
offset_value Multiplier applied to the offset field (default: 1)
offset_reverse If true, subtracts instead of adding

Behavior

Offsets are applied only when:

  • populated_from is present
  • the offset_field exists in the source object
  • all involved values are numeric

Offsets are skipped when:

  • the slot has a constant value
  • the offset_field is missing
  • the derived value is None

Additional rules:

  • non-numeric values raise a TypeError
  • default behavior is additive

Tests

Three end-to-end tests were added at the ObjectTransformer.map_object level:

  1. Offset is skipped when the offset field is missing
  2. Offset is applied using addition
  3. Offset is applied using subtraction when offset_reverse is enabled

Pydantic model

Pydantic model used in this change was generated directly from the LinkML
schema using gen-pydantic.

@madanucd madanucd requested a review from amc-corey-cox January 8, 2026 11:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds declarative offset support for slot derivations in linkml-map, enabling baseline-relative calculations (e.g., age at measurement, baseline subtraction) to be expressed directly in mapping specifications without requiring custom expressions.

Key changes:

  • Adds Offset configuration class with offset_field, offset_value, and offset_reverse fields
  • Implements offset calculation logic in ObjectTransformer.map_object for the populated_from branch
  • Includes three end-to-end tests covering: offset skipping when field is missing, additive offset, and subtractive offset

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.

File Description
src/linkml_map/datamodel/transformer_model.yaml Adds Offset class definition with required fields and documentation for the offset feature
src/linkml_map/datamodel/transformer_model.py Generated Pydantic model for the Offset class with proper field types and validation
src/linkml_map/transformer/object_transformer.py Implements offset calculation logic within the populated_from branch, applying numeric offsets when conditions are met
tests/test_transformer/test_object_transformer.py Adds three comprehensive tests covering offset behavior: missing field, addition, and subtraction scenarios

Comment on lines +303 to +328
# ---- OFFSET HANDLING (SAFE) ----
if getattr(slot_derivation, "offset", None) and v is not None:
off = slot_derivation.offset
off_field_val = source_obj.get(getattr(off, "offset_field", None), None)

if off_field_val is not None:
try:
delta = getattr(off, "offset_value", 0) * off_field_val
v = v - delta if getattr(off, "offset_reverse", False) else v + delta
logger.debug(
f"Offset calculation for '{slot_derivation.name}': "
f"{source_obj.get(slot_derivation.populated_from)} "
f"{'-' if getattr(off, 'offset_reverse', False) else '+'} "
f"({getattr(off, 'offset_value', 0)} * {off_field_val}) = {v}"
)
except (TypeError, ValueError) as e:
raise TypeError(
f"Cannot perform offset calculation for slot '{slot_derivation.name}': "
f"values must be numeric (base={v}, offset_field={off_field_val})"
) from e
else:
logger.debug(
f"Offset field '{getattr(off, 'offset_field', None)}' not found in source object; "
f"skipping offset calculation for '{slot_derivation.name}'"
)
# ---------------------------------
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The offset is applied after value_mappings (line 299-301), which means if value_mappings transforms a value to a non-numeric type (e.g., string), the offset calculation will fail with a TypeError. Consider documenting this behavior or adding validation to warn users that offset and value_mappings may not be compatible unless the mapped value is numeric.

Copilot uses AI. Check for mistakes.
Comment on lines +303 to +328
# ---- OFFSET HANDLING (SAFE) ----
if getattr(slot_derivation, "offset", None) and v is not None:
off = slot_derivation.offset
off_field_val = source_obj.get(getattr(off, "offset_field", None), None)

if off_field_val is not None:
try:
delta = getattr(off, "offset_value", 0) * off_field_val
v = v - delta if getattr(off, "offset_reverse", False) else v + delta
logger.debug(
f"Offset calculation for '{slot_derivation.name}': "
f"{source_obj.get(slot_derivation.populated_from)} "
f"{'-' if getattr(off, 'offset_reverse', False) else '+'} "
f"({getattr(off, 'offset_value', 0)} * {off_field_val}) = {v}"
)
except (TypeError, ValueError) as e:
raise TypeError(
f"Cannot perform offset calculation for slot '{slot_derivation.name}': "
f"values must be numeric (base={v}, offset_field={off_field_val})"
) from e
else:
logger.debug(
f"Offset field '{getattr(off, 'offset_field', None)}' not found in source object; "
f"skipping offset calculation for '{slot_derivation.name}'"
)
# ---------------------------------
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When populated_from resolves to a foreign key reference (lines 265-294), the offset is still applied to the referenced value. However, the offset_field is looked up in the source_obj (line 306), not in the referenced object. This means offset_field must exist in the current object, not the referenced one. This behavior may be confusing and should be documented, or the implementation should support looking up offset_field in the referenced object as well.

Copilot uses AI. Check for mistakes.
Comment on lines +707 to +824
def test_offset_skipped_when_offset_field_missing():
sb_source = SchemaBuilder()
sb_source.add_class("Person")
sb_source.add_defaults()

sb_target = SchemaBuilder()
sb_target.add_slot("age", range="integer")
sb_target.add_class("Person", slots=["age"])
sb_target.add_defaults()

transform_spec = {
"class_derivations": {
"Person": {
"populated_from": "Person",
"slot_derivations": {
"age": {
"value": 30,
"offset": {
"offset_field": "days",
"offset_value": 1
}
}
}
}
}
}

transformer = ObjectTransformer()
transformer.source_schemaview = SchemaView(sb_source.schema)
transformer.target_schemaview = SchemaView(sb_target.schema)
transformer.create_transformer_specification(transform_spec)

result = transformer.map_object({}, source_type="Person")

assert result["age"] == 30

def test_offset_applied_addition():
sb_source = SchemaBuilder()
sb_source.add_slot("days", range="integer")
sb_source.add_slot("age_at_start", range="integer")
sb_source.add_class("Person", slots=["days", "age_at_start"])
sb_source.add_defaults()

sb_target = SchemaBuilder()
sb_target.add_slot("age", range="integer")
sb_target.add_class("Person", slots=["age"])
sb_target.add_defaults()

transform_spec = {
"class_derivations": {
"Person": {
"populated_from": "Person",
"slot_derivations": {
"age": {
"populated_from": "age_at_start",
"offset": {
"offset_field": "days",
"offset_value": 1,
},
}
}
}
}
}

transformer = ObjectTransformer()
transformer.source_schemaview = SchemaView(sb_source.schema)
transformer.target_schemaview = SchemaView(sb_target.schema)
transformer.create_transformer_specification(transform_spec)

result = transformer.map_object(
{"age_at_start": 30, "days": 5},
source_type="Person"
)

assert result["age"] == 35

def test_offset_reverse_subtraction():
sb_source = SchemaBuilder()
sb_source.add_slot("days", range="integer")
sb_source.add_slot("age_at_measurement", range="integer")
sb_source.add_class("Person", slots=["days", "age_at_measurement"])
sb_source.add_defaults()

sb_target = SchemaBuilder()
sb_target.add_slot("age_at_start", range="integer")
sb_target.add_class("Person", slots=["age_at_start"])
sb_target.add_defaults()

transform_spec = {
"class_derivations": {
"Person": {
"populated_from": "Person",
"slot_derivations": {
"age_at_start": {
"populated_from": "age_at_measurement",
"offset": {
"offset_field": "days",
"offset_value": 1,
"offset_reverse": True,
},
}
}
}
}
}

transformer = ObjectTransformer()
transformer.source_schemaview = SchemaView(sb_source.schema)
transformer.target_schemaview = SchemaView(sb_target.schema)
transformer.create_transformer_specification(transform_spec)

result = transformer.map_object(
{"age_at_measurement": 30, "days": 5},
source_type="Person"
)

assert result["age_at_start"] == 25
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests don't cover the case where the base value from populated_from is None, which would correctly skip the offset (due to the v is not None check on line 304). While this behavior is correct, consider adding an explicit test case to document this expected behavior and prevent regressions.

Copilot uses AI. Check for mistakes.
Comment on lines +707 to +824
def test_offset_skipped_when_offset_field_missing():
sb_source = SchemaBuilder()
sb_source.add_class("Person")
sb_source.add_defaults()

sb_target = SchemaBuilder()
sb_target.add_slot("age", range="integer")
sb_target.add_class("Person", slots=["age"])
sb_target.add_defaults()

transform_spec = {
"class_derivations": {
"Person": {
"populated_from": "Person",
"slot_derivations": {
"age": {
"value": 30,
"offset": {
"offset_field": "days",
"offset_value": 1
}
}
}
}
}
}

transformer = ObjectTransformer()
transformer.source_schemaview = SchemaView(sb_source.schema)
transformer.target_schemaview = SchemaView(sb_target.schema)
transformer.create_transformer_specification(transform_spec)

result = transformer.map_object({}, source_type="Person")

assert result["age"] == 30

def test_offset_applied_addition():
sb_source = SchemaBuilder()
sb_source.add_slot("days", range="integer")
sb_source.add_slot("age_at_start", range="integer")
sb_source.add_class("Person", slots=["days", "age_at_start"])
sb_source.add_defaults()

sb_target = SchemaBuilder()
sb_target.add_slot("age", range="integer")
sb_target.add_class("Person", slots=["age"])
sb_target.add_defaults()

transform_spec = {
"class_derivations": {
"Person": {
"populated_from": "Person",
"slot_derivations": {
"age": {
"populated_from": "age_at_start",
"offset": {
"offset_field": "days",
"offset_value": 1,
},
}
}
}
}
}

transformer = ObjectTransformer()
transformer.source_schemaview = SchemaView(sb_source.schema)
transformer.target_schemaview = SchemaView(sb_target.schema)
transformer.create_transformer_specification(transform_spec)

result = transformer.map_object(
{"age_at_start": 30, "days": 5},
source_type="Person"
)

assert result["age"] == 35

def test_offset_reverse_subtraction():
sb_source = SchemaBuilder()
sb_source.add_slot("days", range="integer")
sb_source.add_slot("age_at_measurement", range="integer")
sb_source.add_class("Person", slots=["days", "age_at_measurement"])
sb_source.add_defaults()

sb_target = SchemaBuilder()
sb_target.add_slot("age_at_start", range="integer")
sb_target.add_class("Person", slots=["age_at_start"])
sb_target.add_defaults()

transform_spec = {
"class_derivations": {
"Person": {
"populated_from": "Person",
"slot_derivations": {
"age_at_start": {
"populated_from": "age_at_measurement",
"offset": {
"offset_field": "days",
"offset_value": 1,
"offset_reverse": True,
},
}
}
}
}
}

transformer = ObjectTransformer()
transformer.source_schemaview = SchemaView(sb_source.schema)
transformer.target_schemaview = SchemaView(sb_target.schema)
transformer.create_transformer_specification(transform_spec)

result = transformer.map_object(
{"age_at_measurement": 30, "days": 5},
source_type="Person"
)

assert result["age_at_start"] == 25
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests don't cover the case where non-numeric values are provided, which would trigger the TypeError exception handling in lines 318-322. Consider adding a test that verifies the appropriate error is raised when offset_field or the base value contains non-numeric data, to ensure the error handling works as expected.

Copilot uses AI. Check for mistakes.
Configuration for calculating a value by applying an offset to a baseline value.
The baseline value comes from the slot's populated_from field.
This is commonly used for longitudinal data where measurements are recorded relative to a baseline.
For example, calculating age_at_visit from age + (days * 1/365).
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation states that offset is "commonly used for longitudinal data where measurements are recorded relative to a baseline" and gives an example of "calculating age_at_visit from age + (days * 1/365)". However, this suggests days are being converted to years. The current test uses offset_value: 1 with integer days, which doesn't perform any unit conversion. Consider updating the documentation to clarify that offset_value can be used for unit conversion (e.g., 1/365) or as a simple multiplier (e.g., 1), or add an example that demonstrates the unit conversion use case.

Suggested change
For example, calculating age_at_visit from age + (days * 1/365).
The offset configuration can be used either as a simple multiplier (e.g., offset_value: 1 for
integer day offsets) or as a unit-conversion factor (e.g., offset_value: 1/365 to convert days
to years). For example, calculating age_at_visit (in years) from a baseline age (in years) and
days since baseline using age_at_visit = age + (days * 1/365).

Copilot uses AI. Check for mistakes.
mapped = slot_derivation.value_mappings.get(str(v), None)
v = mapped.value if mapped is not None else None

# ---- OFFSET HANDLING (SAFE) ----
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment "OFFSET HANDLING (SAFE)" suggests this operation is safe, but there are several potential issues: (1) The calculation can raise TypeError/ValueError for non-numeric values, (2) The calculation can result in integer overflow or floating point precision issues, and (3) The getattr calls with defaults may mask configuration errors. Consider removing or clarifying the "SAFE" label, as it may give a false sense of security.

Suggested change
# ---- OFFSET HANDLING (SAFE) ----
# ---- OFFSET HANDLING ----

Copilot uses AI. Check for mistakes.
Comment on lines +303 to +328
# ---- OFFSET HANDLING (SAFE) ----
if getattr(slot_derivation, "offset", None) and v is not None:
off = slot_derivation.offset
off_field_val = source_obj.get(getattr(off, "offset_field", None), None)

if off_field_val is not None:
try:
delta = getattr(off, "offset_value", 0) * off_field_val
v = v - delta if getattr(off, "offset_reverse", False) else v + delta
logger.debug(
f"Offset calculation for '{slot_derivation.name}': "
f"{source_obj.get(slot_derivation.populated_from)} "
f"{'-' if getattr(off, 'offset_reverse', False) else '+'} "
f"({getattr(off, 'offset_value', 0)} * {off_field_val}) = {v}"
)
except (TypeError, ValueError) as e:
raise TypeError(
f"Cannot perform offset calculation for slot '{slot_derivation.name}': "
f"values must be numeric (base={v}, offset_field={off_field_val})"
) from e
else:
logger.debug(
f"Offset field '{getattr(off, 'offset_field', None)}' not found in source object; "
f"skipping offset calculation for '{slot_derivation.name}'"
)
# ---------------------------------
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the project's coding guidelines in CLAUDE.md, when modifying ObjectTransformer.map_object within a section marked with # EXTRACT: (like the _derive_from_populated_from section where this offset handling is added), the code should first be extracted to a private method with tests added in test_object_transformer_new.py using the @add_to_test_setup decorator and scaffold pattern. The current implementation adds the offset logic inline without following this pattern. Consider refactoring to extract the offset handling logic and follow the project's testing conventions.

Copilot generated this review using guidance from repository custom instructions.
Comment on lines +309 to +316
try:
delta = getattr(off, "offset_value", 0) * off_field_val
v = v - delta if getattr(off, "offset_reverse", False) else v + delta
logger.debug(
f"Offset calculation for '{slot_derivation.name}': "
f"{source_obj.get(slot_derivation.populated_from)} "
f"{'-' if getattr(off, 'offset_reverse', False) else '+'} "
f"({getattr(off, 'offset_value', 0)} * {off_field_val}) = {v}"
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using getattr with a default of 0 for offset_value is problematic because the schema defines offset_value as required. If the value is missing, it should raise an error rather than silently defaulting to 0, which would result in no offset being applied. This could mask configuration errors. Consider removing the default or using a sentinel value that raises an error.

Suggested change
try:
delta = getattr(off, "offset_value", 0) * off_field_val
v = v - delta if getattr(off, "offset_reverse", False) else v + delta
logger.debug(
f"Offset calculation for '{slot_derivation.name}': "
f"{source_obj.get(slot_derivation.populated_from)} "
f"{'-' if getattr(off, 'offset_reverse', False) else '+'} "
f"({getattr(off, 'offset_value', 0)} * {off_field_val}) = {v}"
# offset_value is required by the schema; do not default to 0
if not hasattr(off, "offset_value"):
raise AttributeError(
f"Missing required 'offset_value' for offset configuration on slot "
f"'{slot_derivation.name}'"
)
offset_val = off.offset_value
try:
delta = offset_val * off_field_val
v = v - delta if getattr(off, "offset_reverse", False) else v + delta
logger.debug(
f"Offset calculation for '{slot_derivation.name}': "
f"{source_obj.get(slot_derivation.populated_from)} "
f"{'-' if getattr(off, 'offset_reverse', False) else '+'} "
f"({offset_val} * {off_field_val}) = {v}"

Copilot uses AI. Check for mistakes.
Comment on lines +319 to +321
raise TypeError(
f"Cannot perform offset calculation for slot '{slot_derivation.name}': "
f"values must be numeric (base={v}, offset_field={off_field_val})"
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message references the wrong variable. The error says base={v} but at this point in the code, v has already been modified by the offset calculation in line 311, so it no longer contains the original base value. Consider capturing the original value before the calculation or referencing source_obj.get(slot_derivation.populated_from) instead.

Copilot uses AI. Check for mistakes.
Comment on lines +306 to +311
off_field_val = source_obj.get(getattr(off, "offset_field", None), None)

if off_field_val is not None:
try:
delta = getattr(off, "offset_value", 0) * off_field_val
v = v - delta if getattr(off, "offset_reverse", False) else v + delta
Copy link

Copilot AI Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using getattr with defaults for offset_field and offset_reverse is inconsistent with the schema definition. The schema defines offset_field as required, so it should always be present. Using None as a default could mask configuration errors. For offset_reverse, the default of False is correct per the schema, but consider accessing it directly via dot notation since it's a proper Pydantic model attribute.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@amc-corey-cox amc-corey-cox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me pending your response to the Co-Pilot nit-picks.

@madanucd
Copy link
Copy Markdown
Collaborator Author

I have reviewed all suggestions, including the implementation level ones regarding extraction patterns and getattr defaults. All of these are style or optional strictness considerations and do not affect correctness, schema compliance, or the intended behavior of the offset logic. Existing tests cover the core functionality, so we are proceeding with the merge as is. Optional improvements such as refactoring, additional defensive checks, or documentation clarifications can be addressed in follow-up changes if needed.

@madanucd madanucd merged commit 26193a8 into main Jan 15, 2026
14 checks passed
@madanucd madanucd deleted the offset-implementation branch January 15, 2026 17:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement offset calculation

3 participants