Add semantic validation to validate-spec #207

Merged
amc-corey-cox merged 6 commits into main from semantic-validation
May 7, 2026

@amc-corey-cox
Contributor

Summary

  • Extends validate-spec to cross-reference transformation specs against source and/or target LinkML schemas, catching typos and schema drift that would otherwise silently produce None at runtime
  • Adds ValidationMessage dataclass with severity levels (error/warning) — replaces plain string returns
  • AST-based expression slot reference extraction (extract_expr_slot_references()) identifies slot names in expr fields without regex
  • Auto-detects schemas from spec's source_schema/target_schema fields when not explicitly provided
  • CLI gains --source-schema, --target-schema, --strict, --no-warnings options
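
The AST-based extraction mentioned above can be illustrated with a minimal sketch. This is not the PR's `extract_expr_slot_references()` implementation; `sketch_slot_refs` is a hypothetical name, and the sketch only assumes that expressions parse as Python source.

```python
import ast


def sketch_slot_refs(expr: str) -> set[str]:
    """Hypothetical helper: collect bare identifiers read by an expression.

    Illustrative only; the real extract_expr_slot_references() may differ.
    """
    try:
        tree = ast.parse(expr)
    except SyntaxError:
        # An unparsable expression yields no candidates in this sketch.
        return set()
    return {
        node.id
        for node in ast.walk(tree)
        # ast.Name nodes with a Load context are identifiers being read.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }


print(sorted(sketch_slot_refs("first + ' ' + last")))
```

Walking the tree avoids the false matches a regex would produce inside string literals. Note that a sketch like this also returns names of called functions (for example `str`), so a real check would need to exclude known builtins before comparing against the source class's slots.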

What gets checked

| Check | Severity |
|---|---|
| Target class name not in target schema | error |
| populated_from class not in source schema | error |
| Target slot name not valid on target class | error |
| populated_from slot not valid on source class | error |
| Target enum not in target schema | error |
| populated_from enum not in source schema | error |
| Unresolved expression slot reference | warning (error with --strict) |
| Required target slot with no derivation | warning |
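
A rough sketch of how severities could map to an exit status, assuming the behavior settled on later in this PR (`--strict` exits 1 on any warning). `Message` and `exit_code` are hypothetical names; only the `severity` and `path` fields are taken from the PR description, and the `text` field is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical stand-in for ValidationMessage."""

    severity: str  # "error" or "warning"
    text: str  # assumed field name, not confirmed by the PR
    path: str = ""


def exit_code(messages: list[Message], strict: bool = False) -> int:
    """Return 1 on any error, or on any warning when strict is set."""
    has_error = any(m.severity == "error" for m in messages)
    has_warning = any(m.severity == "warning" for m in messages)
    return 1 if has_error or (strict and has_warning) else 0


msgs = [Message("warning", "Required target slot has no derivation")]
print(exit_code(msgs), exit_code(msgs, strict=True))
```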

Example

linkml-map validate-spec \
  --source-schema source.yaml \
  --target-schema target.yaml \
  my-transform.yaml

Test plan

  • 65 tests (was 36): expression extraction, all semantic check categories, strict mode, CLI options
  • Full suite: 624 passed, 4 skipped
  • Smoke tested against personinfo_basic (catches real issues) and flattening (clean pass)

Closes #199

Extends the validator to cross-reference transformation specs against
source and/or target LinkML schemas, catching typos and schema drift
that would otherwise silently produce None at runtime.

New features:
- ValidationMessage dataclass with severity (error/warning) and path
- validate_spec_semantics() checks class/slot/enum names and
  populated_from references against source/target SchemaViews
- extract_expr_slot_references() uses ast.parse to find slot name
  candidates in expressions
- Auto-detection of schemas from spec source_schema/target_schema fields
- CLI: --source-schema, --target-schema, --strict, --no-warnings

Closes #199
Copilot AI review requested due to automatic review settings April 24, 2026 18:16
Contributor

Copilot AI left a comment

Pull request overview

Extends validate-spec beyond JSON Schema structural checks to optionally perform semantic validation of transformation specs against source/target LinkML schemas, returning structured validation messages and adding CLI controls for schema inputs and warning handling.

Changes:

  • Introduces ValidationMessage (error/warning + path) and updates validator APIs to return messages instead of plain strings.
  • Adds AST-based expression slot reference extraction and semantic checks for class/slot/enum references (plus required-slot warnings).
  • Updates validate-spec CLI with --source-schema, --target-schema, --strict, and --no-warnings, and expands test coverage accordingly.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| src/linkml_map/validator.py | Adds message dataclass, structural+semantic validation pipeline, schema loading, and AST-based expression reference extraction. |
| src/linkml_map/cli/cli.py | Extends validate-spec command with schema/strict/no-warnings options and updated output/exit behavior. |
| tests/test_validator.py | Updates existing validator tests for ValidationMessage output and adds semantic validation + expression extraction tests. |
| tests/test_cli/test_cli_validate.py | Adds CLI coverage for semantic validation, strict mode behavior, and warning suppression. |

- Try SchemaView resolution for URLs/identifiers with 10s timeout
- Error (not warn) when explicitly provided schema fails to load
- Filter assignment targets and comprehension variables from expression
  slot reference extraction, eliminating false-positive warnings
- Make --strict cause exit 1 on any warnings, not just expr refs
- Fix codespell: unparseable → unparsable
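
The filtering of assignment targets and comprehension variables might look roughly like the following. It is a sketch under the assumption that names bound inside the expression (AST `Store` context) should never be reported as slot references; `external_names` is a hypothetical helper, not the function in validator.py.

```python
import ast


def external_names(expr: str) -> set[str]:
    """Hypothetical helper: names read by expr that expr does not bind itself."""
    tree = ast.parse(expr)
    loaded: set[str] = set()
    bound: set[str] = set()
    for node in ast.walk(tree):
        if not isinstance(node, ast.Name):
            continue
        if isinstance(node.ctx, ast.Store):
            # Assignment targets and comprehension variables are bound locally.
            bound.add(node.id)
        elif isinstance(node.ctx, ast.Load):
            loaded.add(node.id)
    return loaded - bound


print(sorted(external_names("[x * 2 for x in values]")))
```

Subtracting bound names is a coarse rule: a name that is both a slot and reassigned inside the expression would be dropped, which trades a possible missed warning for fewer false positives.
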
Bare identifiers like 's1' that aren't local files should not be passed
to SchemaView — only values containing '://' are treated as URLs worth
attempting. Fixes doctest failures where auto-detected non-file schema
references produced spurious warnings.
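
A minimal sketch of the rule described in this commit message, assuming local files are checked first and that only values containing '://' are attempted as URLs. `classify_schema_ref` is a hypothetical name; the actual logic lives in the validator's schema-resolution code and may differ.

```python
from pathlib import Path


def classify_schema_ref(value: str) -> str:
    """Hypothetical helper: decide how an auto-detected schema value is handled."""
    if Path(value).is_file():
        return "file"
    if "://" in value:
        # Only URL-looking values are worth a network attempt.
        return "url"
    # Bare identifiers such as 's1' are skipped rather than resolved.
    return "skip"


print(classify_schema_ref("s1"), classify_schema_ref("https://example.org/schema.yaml"))
```
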
Copilot AI review requested due to automatic review settings April 24, 2026 19:03
Contributor

Copilot AI left a comment

Pull request overview

This PR enhances validate-spec by adding semantic validation of transformation specs against LinkML source/target schemas, helping detect schema drift and typos that would otherwise fail silently at runtime.

Changes:

  • Introduces ValidationMessage (error/warning) and updates validation APIs and tests to return structured messages instead of plain strings.
  • Adds AST-based expression slot reference extraction and semantic checks for class/slot/enum references and required slots.
  • Extends the CLI with schema/strict/no-warnings options and adds CLI + semantic validation test coverage.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| src/linkml_map/validator.py | Adds ValidationMessage, structural+semantic validation, schema auto-detection/loading, and AST expression reference extraction. |
| src/linkml_map/cli/cli.py | Extends validate-spec CLI with schema options, warning handling, and strict mode exit behavior. |
| tests/test_validator.py | Updates existing tests for message objects and adds semantic validation + expression extraction tests. |
| tests/test_cli/test_cli_validate.py | Adds CLI tests covering semantic validation, strict mode, and warning suppression. |

- Use non-blocking executor shutdown on timeout to avoid hanging
- Skip semantic validation when structural errors exist to prevent
  crashes on malformed input
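
The non-blocking shutdown pattern can be sketched with the standard library as follows. `call_with_timeout` and `slow_load` are hypothetical names used for illustration; the sketch assumes Python 3.9+ for `cancel_futures` and does not reproduce the schema-loading code itself.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(fn, timeout: float):
    """Hypothetical helper: run fn in a worker thread, give up after timeout seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        return None
    finally:
        # wait=False returns immediately instead of joining a stuck worker.
        executor.shutdown(wait=False, cancel_futures=True)


def slow_load():
    """Stand-in for a schema fetch that takes too long."""
    time.sleep(0.3)
    return "schema"


print(call_with_timeout(slow_load, 0.05))
```

Using the executor as a context manager would call `shutdown(wait=True)` on exit and block until the worker finishes, which defeats the timeout. The worker thread still runs to completion in the background; only the caller stops waiting.
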
Resolve conflicts in cli.py validate_spec_cmd / _validate_spec_individual:
combine main's --merge/--entity/--emit-spec options (added by PR #203)
with this branch's --source-schema/--target-schema/--strict/--no-warnings
options. The two feature sets are orthogonal and now compose:

  validate-spec --merge --entity X --source-schema src.yaml \
                --target-schema tgt.yaml --strict specs/

Extended _validate_spec_merged to accept and forward the semantic-validation
parameters and to handle warnings/strict the same way the per-file path does
(separate error/warning grouping, --no-warnings suppresses warning lines,
--strict exits 1 on warnings).

Also fix test_validate_spec_multiple_files: with auto-schema-detection now
in main, the personinfo spec triggers benign warnings when its source/target
schema URLs don't resolve in the test environment. The summary line for
each file becomes 'ok (with warnings)' instead of just 'ok'. Loosened the
assertion to accept either form — both signal success.
Copilot AI review requested due to automatic review settings May 6, 2026 19:02
@amc-corey-cox
Contributor Author

Merge of main + integration with PR #203 features (c4ab33f)

Main moved substantially since this PR was last touched (April 24). The recent merge of #203 added --merge, --entity, and --emit-spec to validate-spec — orthogonal to this PR's --source-schema, --target-schema, --strict, --no-warnings, but they now share validate_spec_cmd and its helpers.

Resolution: The two feature sets compose. After the merge:

linkml-map validate-spec --merge --entity Person \
  --source-schema src.yaml --target-schema tgt.yaml \
  --strict specs/
  • _validate_spec_individual keeps this branch's per-file warning/error grouping and --no-warnings / --strict handling, picking up main's resolve_spec_paths (directory expansion).
  • _validate_spec_merged (introduced for the merge path by #203, "Support multi-file spec loading, --entity filter, and --emit-spec") now also accepts and forwards the semantic-validation parameters and applies the same warnings/strict semantics.

Test fix: test_validate_spec_multiple_files was asserting every output line ends with : ok. With auto-schema-detection now in this branch, the personinfo spec's source/target schema URLs (which 404 in CI) trigger benign warnings, so its summary line becomes ok (with warnings). Loosened the assertion to accept both — neither is an error.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Three fixes from Copilot's review of the merge commit:

1. Skip join aliases in expression-reference validation (validator.py).
   Previously, expressions like {demographics.age_at_exam} where
   'demographics' is a class_derivation.joins alias would warn:
   'Expression references demographics which is not a slot on the
   source class'. Now _validate_class_derivation collects join names
   and passes them to _validate_slot_derivation, which excludes them
   from refs before checking against source_class_slots.

   Validating {join_alias.some_slot} against the joined class itself
   would be the next step but requires resolving the joined class's
   SchemaView — deferred as a follow-up enhancement.

   Added test_semantics_join_alias_in_expr_no_warning. Verified by
   temporarily removing the '- joined_aliases' filter — test fails
   with the false-positive warning.

2. Document _resolve_schema_path's identifier behavior (validator.py
   docstring + cli.py --source-schema/--target-schema help). Auto-
   detection supports paths and URLs only; identifier-style values
   (e.g. 'biolink') are skipped silently to avoid surprise network
   requests on typos. Users with identifier-style schemas should pass
   --source-schema / --target-schema explicitly. Also added a debug
   log for visibility when verbose.

3. Fix misleading comment in test_semantics_auto_detect_schemas. The
   test uses FLATTENING_TR, which has source_schema: s1 (a placeholder),
   not a URL; the comment now reflects that auto-detection skips
   non-resolvable identifier values.
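
The join-alias fix in item 1 amounts to a set difference before the slot check. A minimal sketch, with `unresolved_refs` as a hypothetical name rather than the code in _validate_slot_derivation:

```python
def unresolved_refs(expr_refs, source_class_slots, joined_aliases):
    """Hypothetical helper: expression references that should trigger a warning.

    Join aliases are removed first, so {demographics.age_at_exam} does not
    warn about 'demographics' when it is a class_derivation join name.
    """
    candidates = set(expr_refs) - set(joined_aliases)
    return sorted(candidates - set(source_class_slots))


print(unresolved_refs({"demographics", "age", "agee"}, {"age"}, {"demographics"}))
```

The slot after the alias (age_at_exam here) is not checked in this sketch, matching the deferred follow-up described in item 1.
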
@amc-corey-cox amc-corey-cox merged commit f1a05a8 into main May 7, 2026
12 checks passed
@amc-corey-cox amc-corey-cox deleted the semantic-validation branch May 7, 2026 15:08

Development

Successfully merging this pull request may close these issues.

validate-spec should check class and slot references against source and target schemas
