Multi-artifact emission: one source instance into multiple target containers #239

@amc-corey-cox

Motivation

Some transformations need one source instance to emit artifacts into multiple target containers. Current class_derivations are one-to-one: a derivation emits one class instance per source match. Authors handle the one-to-many case by writing two separate derivations, both rooted at the source class, so the source is walked twice; see docs/examples/MetamodelMapping.ipynb for the canonical workaround.

This works but couples the artifacts only by author convention, not by the spec model. The planner can't see that the two derivations operate on the same source instances, can't validate cross-references between the emitted artifacts, and can't optimize execution.

Concrete driver: schema-automator's EML importer (linkml/schema-automator#208). An EML <attribute> with an <enumeratedDomain> is the canonical origin of both a slot_definition (in the parent class's attributes map) and an enum_definition (in the schema-level enums map). This is the first concrete schema-to-schema use case driving the need; dm-bip and similar data-to-data consumers don't exercise it. Future XSD/JSON-Schema importers will hit the same pattern.

Proposed direction (strawman)

Allow a ClassDerivation to declare additional emitted artifacts beyond its primary class:

class_derivations:
  AttributeToSlot:
    populated_from: Attribute
    target_class: slot_definition
    publishes: { enum_name_for: self }
    slot_derivations: { ... }
    also_emit:
      - target_class: enum_definition
        when: "is_present(measurementScale.nominal.nonNumericDomain.enumeratedDomain)"
        place_into: "$target.enums"
        slot_derivations:
          name: { expr: "<naming expression>" }
          permissible_values: { ... }

The cross-reference between primary and secondary artifacts is declared via publishes / ref (proposed in #237) — the slot in the primary artifact and the enum in the secondary share a binding-keyed name.
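To make the intended semantics concrete, here is a minimal sketch in plain Python of what a single walk with a guarded secondary emission might do. It does not use linkml-map's API; the function name, the flattened `enumerated_domain` field, and the `<name>_enum` naming rule are all hypothetical stand-ins (the last one for the elided naming expression).

```python
from typing import Any


def derive_attribute(attribute: dict[str, Any],
                     target: dict[str, Any],
                     class_name: str) -> None:
    """One walk over a source attribute; may emit into two target containers."""
    # Hypothetical naming rule, standing in for "<naming expression>".
    enum_name = f"{attribute['name']}_enum"
    slot: dict[str, Any] = {"name": attribute["name"]}

    # Hypothetical flattened field; plays the role of the `when:` guard.
    codes = attribute.get("enumerated_domain")
    if codes:
        # The shared name is the cross-reference between the two artifacts.
        slot["range"] = enum_name
        # Secondary artifact goes into the schema-level enums map.
        target.setdefault("enums", {})[enum_name] = {
            "name": enum_name,
            "permissible_values": {code: {} for code in codes},
        }

    # Primary artifact goes into the parent class's attributes map.
    cls = target.setdefault("classes", {}).setdefault(class_name, {"attributes": {}})
    cls["attributes"][slot["name"]] = slot


target: dict[str, Any] = {}
derive_attribute({"name": "color", "enumerated_domain": ["red", "blue"]}, target, "Sample")
derive_attribute({"name": "depth"}, target, "Sample")
```

Because both artifacts are produced in the same call, the name they share is computed once, which is the coupling that the two-derivation workaround leaves to author convention.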

Reversibility

also_emit doesn't change linkml-map's existing reversibility-where-lossless rule. The inverse engine needs to know which slot-and-enum pairs originated from the same source instance; when the cross-reference (#237) is explicit, that pairing is mechanical. Expression invertibility is inherited from the expressions themselves (a spec that uses slugify for naming is lossy regardless of also_emit; a spec that uses identity expressions remains reversible).

Rule: also_emit is reversible iff (a) the cross-reference binding is declared, and (b) the binding expression is invertible. This adds one rule and inherits the rest.
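The rule can be written as a small predicate, sketched below. `AlsoEmitSpec`, `is_reversible`, and `pair_artifacts` are hypothetical names for illustration, not part of linkml-map, and the pairing helper assumes the binding surfaces as the slot's `range` holding the enum's key.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class AlsoEmitSpec:
    """Hypothetical summary of one also_emit entry (not a linkml-map class)."""
    binding_declared: bool          # (a) a publishes/ref cross-reference exists
    binding_expr_invertible: bool   # (b) the binding expression can be inverted


def is_reversible(spec: AlsoEmitSpec) -> bool:
    """Apply the stated rule: reversible iff (a) and (b) both hold."""
    return spec.binding_declared and spec.binding_expr_invertible


def pair_artifacts(slots: list[dict[str, Any]],
                   enums: dict[str, dict[str, Any]]) -> list[tuple[str, str]]:
    """Pair each slot with the enum it references by name.

    Slots whose range is not a key in `enums` have no secondary artifact.
    """
    return [(s["name"], s["range"]) for s in slots if s.get("range") in enums]


identity_naming = AlsoEmitSpec(binding_declared=True, binding_expr_invertible=True)
slugified_naming = AlsoEmitSpec(binding_declared=True, binding_expr_invertible=False)
undeclared = AlsoEmitSpec(binding_declared=False, binding_expr_invertible=True)

slots = [{"name": "color", "range": "color"}, {"name": "depth", "range": "float"}]
enums = {"color": {"permissible_values": {"red": {}, "blue": {}}}}
pairs = pair_artifacts(slots, enums)
```

With an explicit binding the pairing step is a dictionary lookup; whether the source instance can then be reconstructed still depends on condition (b).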

Open questions

  • place_into surface. A dotted path into the target schema ("$target.enums") is explicit. Alternative: have the target slot in the trans-spec model carry container-selection semantics, so emission routes by target schema structure. The dotted path is simpler to start.
  • Interaction with dictionary_key / cast_collection_as. How does emission into a map-keyed container interact with the secondary artifact's own keying? Likely the target slot's keying applies and the secondary artifact must include the key slot.
  • Alternative: canonicalize the two-walk pattern. If appetite for also_emit is low, the alternative is to elevate the MetamodelMapping.ipynb two-walk pattern as the documented idiom and add tooling (validation, cross-ref declarations) around it. also_emit makes the relationship first-class; the alternative keeps it a convention, backed by tooling.
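For the first two questions, a minimal sketch of how a dotted `place_into` path could resolve into a map-keyed container, under the assumption that the target slot's keying applies and the secondary artifact must carry the key slot. The helper name, the `key_slot` parameter, and the error behaviour are hypothetical, not linkml-map API.

```python
from typing import Any


def place_into(target: dict[str, Any],
               path: str,
               artifact: dict[str, Any],
               key_slot: str = "name") -> None:
    """Insert `artifact` into the map found at a "$target."-rooted dotted path."""
    prefix = "$target."
    if not path.startswith(prefix):
        raise ValueError(f"path must start with {prefix!r}: {path!r}")

    # Walk (and create) nested maps along the dotted path.
    container = target
    for part in path[len(prefix):].split("."):
        container = container.setdefault(part, {})

    # Assumption: the container is keyed by `key_slot`, so the artifact must have it.
    if key_slot not in artifact:
        raise KeyError(f"artifact is missing key slot {key_slot!r}")
    key = artifact[key_slot]
    if key in container:
        raise ValueError(f"duplicate key {key!r} at {path!r}")
    container[key] = artifact


schema: dict[str, Any] = {}
place_into(schema, "$target.enums", {"name": "color_enum", "permissible_values": {}})
```

Rejecting duplicate keys is one possible policy; merging or overwriting are alternatives the issue leaves open.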
