NMDC: Document string/scalar-to-structured measurement transforms (tests only)#131
Conversation
Pull request overview
This PR adds comprehensive tests demonstrating linkml-map's capability to transform slots from string ranges to structured object ranges (e.g., depth: string → depth: QuantityValue). This pattern is common in schema integration projects where upstream schemas use simple strings for values that downstream schemas model as structured objects.
Changes:
- Adds three test scenarios: string-to-string passthrough (baseline), constructing objects from separate fields via expressions, and parsing composite strings into structured objects
- Documents that dict construction via `expr` works without code changes
- Complements PR #130's inverse pattern (object-to-scalar flattening)
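As context, the parse pattern these tests exercise can be sketched in plain Python. This mirrors the `expr` string used in the tests; `parse_depth` is an illustrative name, not part of the PR:

```python
# Illustrative sketch (plain Python, not linkml-map code): what the
# tests' parse expr computes for a composite string like "5 m".
def parse_depth(depth: str) -> dict:
    return {
        "has_numeric_value": float(depth.split(" ")[0]),
        "has_unit": depth.split(" ")[1],
    }

print(parse_depth("5 m"))  # {'has_numeric_value': 5.0, 'has_unit': 'm'}
```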
```python
assert result["depth"] == {
    "has_numeric_value": 5.0,
    "has_unit": "m",
}
```
The test_parse_string_into_object test parses a string with `split(" ")` but does not validate edge cases. Consider adding tests for:
- Empty string input (e.g., `depth: ""`)
- Null/None input (e.g., `depth: null`)
- Malformed strings (e.g., `depth: "5"` without a unit, `depth: "five m"` with a non-numeric value, `depth: "5  m"` with multiple spaces)
- Strings with extra whitespace (e.g., `depth: " 5 m "`)
Without validation, these cases could cause IndexError or ValueError exceptions at runtime. While the current test documents the happy path, adding edge case tests would ensure robust handling of real-world data where input quality may vary.
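The failure modes can be probed in plain Python, outside linkml-map's evaluator (the commit notes later in this thread report that inside linkml-map these errors are swallowed by simpleeval error handling and yield None instead):

```python
# Plain-Python probe (not run through linkml-map's evaluator) of how
# the split(" ")-based expr behaves on the edge cases listed above.
def parse(depth):
    parts = depth.split(" ")
    return {"has_numeric_value": float(parts[0]), "has_unit": parts[1]}

results = {}
for value in ("", None, "5", "five m", "5  m", " 5 m "):
    try:
        results[value] = parse(value)
    except (AttributeError, IndexError, ValueError) as exc:
        results[value] = type(exc).__name__

# "5  m" (double space) is the nasty one: it raises nothing and
# silently produces an empty has_unit.
print(results)
```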
```python
def test_parse_string_into_object_empty_and_null_depth() -> None:
    """Edge cases: empty string or None depth should not parse as valid."""
    for depth in ("", None):
        with pytest.raises((TypeError, ValueError, IndexError)):
            _run(
                source_schema=_source_schema_string(),
                target_schema=_target_schema_quantity(),
                transform_spec=TRANSFORM_PARSE,
                input_data={"id": "samp1", "depth": depth},
                source_type="StringSample",
            )


def test_parse_string_into_object_malformed_and_whitespace() -> None:
    """Edge cases: malformed or oddly spaced depth strings."""
    # Missing unit, non-numeric value, multiple internal spaces, extra outer spaces
    malformed_depth_values = [
        "5",        # no unit
        "five m",   # non-numeric value
        "5  m",     # multiple spaces between value and unit
        " 5 m ",    # leading/trailing whitespace that split(' ') may mishandle
    ]
    for depth in malformed_depth_values:
        with pytest.raises((TypeError, ValueError, IndexError)):
            _run(
                source_schema=_source_schema_string(),
                target_schema=_target_schema_quantity(),
                transform_spec=TRANSFORM_PARSE,
                input_data={"id": "samp1", "depth": depth},
                source_type="StringSample",
            )
```
```python
    '{"has_numeric_value": float(depth.split(" ")[0]),'
    ' "has_unit": depth.split(" ")[1]}'
),
```
The expression in line 114 calls split(" ") twice on the same string, which is inefficient. While this works correctly, consider documenting or testing whether the expression evaluator supports variable assignment within expressions (e.g., parts = depth.split(" ")) to avoid redundant string operations. If variable assignment is not supported, this duplication is necessary but worth noting in a comment for maintainability.
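In plain Python the duplication could be avoided with a single `str.partition` pass; whether a single-expression evaluator like simpleeval permits the unpacking used here is exactly the open question, so this is only a reference point, not a drop-in `expr`:

```python
# Plain-Python alternative that scans the string once; this is NOT an
# `expr` string -- evaluators restricted to single expressions may not
# permit the tuple unpacking used here.
def parse_depth_once(depth: str) -> dict:
    value, _, unit = depth.partition(" ")
    return {"has_numeric_value": float(value), "has_unit": unit}

print(parse_depth_once("5 m"))  # {'has_numeric_value': 5.0, 'has_unit': 'm'}
```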
```python
"has_unit": "m",
}
```
The test constructs a QuantityValue object using float() conversion, but the test only validates the happy path. Consider adding a test case for test_construct_object_from_subfields where depth_value contains a non-numeric string (e.g., "five") to document the expected behavior. Should this raise an error during transformation, or should it be validated before the transformation? This would help clarify the contract for users of this pattern.
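For reference, `float()`'s coercion contract in plain Python (illustrative only, independent of how the expression evaluator surfaces the error):

```python
# What float() does with the kinds of raw values the construct
# pattern might receive; plain Python, no linkml-map involved.
outcomes = {}
for raw in ("5", "5.0", " 5 ", "five"):
    try:
        outcomes[raw] = float(raw)
    except ValueError:
        outcomes[raw] = "ValueError"

print(outcomes)  # float() strips surrounding whitespace but rejects words
```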
```python
def test_construct_object_from_subfields_non_numeric() -> None:
    """Non-numeric depth_value should raise an error during transformation."""
    with pytest.raises(ValueError):
        _run(
            source_schema=_source_schema_flat(),
            target_schema=_target_schema_quantity(),
            transform_spec=TRANSFORM_CONSTRUCT,
            input_data={"id": "samp1", "depth_value": "five", "depth_unit": "m"},
            source_type="FlatSample",
        )
```
The PR description notes that expressions returning dictionaries are not validated against the target schema. Consider adding a test case that explicitly validates this behavior - for example, an expression that returns {"wrong_key": 5.0} for a QuantityValue target - to document the current behavior. This would make it easier to track when/if schema validation of expression outputs is implemented in the future, and helps users understand the current limitations.
```python
# ---------------------------------------------------------------------------
# Transformation where the expr returns a dict that does NOT match the
# QuantityValue schema. This documents that expression outputs are not
# currently validated against the target schema.
TRANSFORM_BAD_QUANTITY: dict[str, Any] = {
    "class_derivations": {
        "StructuredSample": {
            "populated_from": "StringSample",
            "slot_derivations": {
                "id": {},
                "depth": {
                    # Deliberately return a dict with the wrong key for a
                    # QuantityValue target; current behavior is that this
                    # passes through without schema validation.
                    "expr": '{"wrong_key": 5.0}',
                },
            },
        },
    },
}


# ---------------------------------------------------------------------------
# Tests documenting current behavior for dict-valued expressions
# ---------------------------------------------------------------------------
def test_dict_expression_not_validated_against_target_schema() -> None:
    """Expression dict outputs are not schema-validated for QuantityValue.

    This test encodes the current limitation: an expr that returns a dict
    with an unexpected key for a QuantityValue target is accepted and the
    dict is used as-is, rather than being validated against the
    QuantityValue schema (which would require has_numeric_value/has_unit).
    """
    source_schema = _source_schema_string()
    target_schema = _target_schema_quantity()
    result = _run(
        source_schema=source_schema,
        target_schema=target_schema,
        transform_spec=TRANSFORM_BAD_QUANTITY,
        input_data={"id": "sample1", "depth": "5 m"},
        source_type="StringSample",
    )
    # The transform should succeed and produce the unvalidated dict.
    assert "depth" in result
    assert isinstance(result["depth"], dict)
    assert "wrong_key" in result["depth"]
    # And it should *not* be coerced into the expected QuantityValue shape.
    assert "has_numeric_value" not in result["depth"]
    assert "has_unit" not in result["depth"]
# ---------------------------------------------------------------------------
```
- Fix spec mutation: use copy.deepcopy() on transform specs before passing to create_transformer_specification()
- Add parametrized tests for malformed parse inputs (null, empty, no-unit, non-numeric): all yield None via simpleeval error handling
- Add test for non-numeric depth_value in construct expr
- Add test documenting validation gap: wrong dict keys pass through without target schema validation
amc-corey-cox
left a comment
Looks like Copilot has a bunch of false positives. This looks good to me.
Closes #129
NMDC Relevance
This test documents the inverse NMDC pattern: transforming string or split scalar inputs into structured measurement objects (for example `depth` -> `QuantityValue`).
What This PR Adds
Tests demonstrating dict construction via `expr`, targeting `main`.
Coverage
Parsing composite strings such as `"5 m"` into object fields.
Why It Matters
For NMDC schema-integration workflows (including MIxS-adjacent data), this confirms linkml-map can already express string/scalar -> structured-object transforms without engine changes.