Add semantic validation to validate-spec #207
Extends the validator to cross-reference transformation specs against source and/or target LinkML schemas, catching typos and schema drift that would otherwise silently produce None at runtime.

New features:
- ValidationMessage dataclass with severity (error/warning) and path
- validate_spec_semantics() checks class/slot/enum names and populated_from references against source/target SchemaViews
- extract_expr_slot_references() uses ast.parse to find slot name candidates in expressions
- Auto-detection of schemas from spec source_schema/target_schema fields
- CLI: --source-schema, --target-schema, --strict, --no-warnings

Closes #199
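For readers skimming the description, a structured message with a severity and a path could look roughly like the sketch below. The field names, the `check_populated_from` helper, and the choice of "error" as the severity are assumptions made for illustration; they are not taken from the PR's code.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class ValidationMessage:
    """Hypothetical shape; field names are assumptions, not the PR's code."""

    severity: str                 # "error" or "warning"
    message: str
    path: Optional[str] = None    # location inside the spec, if known

    def __str__(self) -> str:
        where = f" {self.path}:" if self.path else ""
        return f"[{self.severity}]{where} {self.message}"


def check_populated_from(
    slot_name: str, source_slots: Set[str], path: str
) -> List[ValidationMessage]:
    """Illustrative helper: flag a populated_from value missing from the source class."""
    if slot_name in source_slots:
        return []
    text = f"populated_from slot '{slot_name}' not found on source class"
    # Severity "error" is an assumption for this sketch.
    return [ValidationMessage("error", text, path)]
```

Returning a list of message objects rather than strings lets the caller group by severity and decide the exit code separately from the output.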
Pull request overview
Extends validate-spec beyond JSON Schema structural checks to optionally perform semantic validation of transformation specs against source/target LinkML schemas, returning structured validation messages and adding CLI controls for schema inputs and warning handling.
Changes:
- Introduces `ValidationMessage` (error/warning + path) and updates validator APIs to return messages instead of plain strings.
- Adds AST-based expression slot reference extraction and semantic checks for class/slot/enum references (plus required-slot warnings).
- Updates `validate-spec` CLI with `--source-schema`, `--target-schema`, `--strict`, and `--no-warnings`, and expands test coverage accordingly.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/linkml_map/validator.py` | Adds message dataclass, structural+semantic validation pipeline, schema loading, and AST-based expression reference extraction. |
| `src/linkml_map/cli/cli.py` | Extends validate-spec command with schema/strict/no-warnings options and updated output/exit behavior. |
| `tests/test_validator.py` | Updates existing validator tests for ValidationMessage output and adds semantic validation + expression extraction tests. |
| `tests/test_cli/test_cli_validate.py` | Adds CLI coverage for semantic validation, strict mode behavior, and warning suppression. |
- Try SchemaView resolution for URLs/identifiers with 10s timeout
- Error (not warn) when explicitly provided schema fails to load
- Filter assignment targets and comprehension variables from expression slot reference extraction, eliminating false-positive warnings
- Make --strict cause exit 1 on any warnings, not just expr refs
- Fix codespell: unparseable → unparsable
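The filtering of assignment targets and comprehension variables can be sketched with the standard `ast` module as below. This is a minimal illustration, not the PR's `extract_expr_slot_references()`; the function name `extract_slot_candidates` is hypothetical, and the sketch does not filter built-in names such as `len`.

```python
import ast
from typing import Set


def extract_slot_candidates(expr: str) -> Set[str]:
    """Names read by `expr`, minus names the expression binds itself."""
    try:
        tree = ast.parse(expr)
    except SyntaxError:
        return set()  # unparsable expressions yield no candidates

    loaded: Set[str] = set()
    bound: Set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:
                # ast.Store / ast.Del: assignment targets and loop variables
                bound.add(node.id)
    return loaded - bound
```

With this approach, `[x * 2 for x in items]` yields only `items`, because `x` appears as a comprehension target and is removed from the candidates.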
Bare identifiers like 's1' that aren't local files should not be passed to SchemaView — only values containing '://' are treated as URLs worth attempting. Fixes doctest failures where auto-detected non-file schema references produced spurious warnings.
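The rule described in this commit (local file first, then only values containing '://') might be expressed as in the sketch below. The function name `classify_schema_ref` and its return values are hypothetical and only illustrate the decision order.

```python
from pathlib import Path
from typing import Optional


def classify_schema_ref(value: str) -> Optional[str]:
    """Decide how an auto-detected schema reference would be handled.

    Returns "file", "url", or None when the value should be skipped.
    """
    if Path(value).is_file():
        return "file"
    if "://" in value:
        return "url"  # worth attempting a remote resolution
    # Bare identifiers such as 's1' are skipped rather than resolved.
    return None
```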
Pull request overview
This PR enhances validate-spec by adding semantic validation of transformation specs against LinkML source/target schemas, helping detect schema drift and typos that would otherwise fail silently at runtime.
Changes:
- Introduces `ValidationMessage` (error/warning) and updates validation APIs and tests to return structured messages instead of plain strings.
- Adds AST-based expression slot reference extraction and semantic checks for class/slot/enum references and required slots.
- Extends the CLI with schema/strict/no-warnings options and adds CLI + semantic validation test coverage.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/linkml_map/validator.py` | Adds ValidationMessage, structural+semantic validation, schema auto-detection/loading, and AST expression reference extraction. |
| `src/linkml_map/cli/cli.py` | Extends validate-spec CLI with schema options, warning handling, and strict mode exit behavior. |
| `tests/test_validator.py` | Updates existing tests for message objects and adds semantic validation + expression extraction tests. |
| `tests/test_cli/test_cli_validate.py` | Adds CLI tests covering semantic validation, strict mode, and warning suppression. |
- Use non-blocking executor shutdown on timeout to avoid hanging
- Skip semantic validation when structural errors exist to prevent crashes on malformed input
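The non-blocking shutdown matters because a `with ThreadPoolExecutor(...)` block waits for the worker on exit, which defeats the timeout. A minimal sketch of the pattern, with hypothetical names `call_with_timeout` and `slow_load` (the latter stands in for a slow schema fetch):

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(fn, timeout_s, *args):
    """Run fn(*args) in a worker thread; return None if it exceeds timeout_s."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return None
    finally:
        # wait=False returns immediately instead of joining the worker.
        executor.shutdown(wait=False)


def slow_load():
    time.sleep(0.5)  # stands in for a slow remote schema load
    return "schema"
```

Note that the worker thread is not killed; it keeps running in the background until it finishes, so this only prevents the caller from blocking.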
Resolve conflicts in cli.py validate_spec_cmd / _validate_spec_individual: combine main's --merge/--entity/--emit-spec options (added by PR #203) with this branch's --source-schema/--target-schema/--strict/--no-warnings options. The two feature sets are orthogonal and now compose:

    validate-spec --merge --entity X --source-schema src.yaml \
        --target-schema tgt.yaml --strict specs/

Extended _validate_spec_merged to accept and forward the semantic-validation parameters and to handle warnings/strict the same way the per-file path does (separate error/warning grouping, --no-warnings suppresses warning lines, --strict exits 1 on warnings).

Also fix test_validate_spec_multiple_files: with auto-schema-detection now in main, the personinfo spec triggers benign warnings when its source/target schema URLs don't resolve in the test environment. The summary line for each file becomes 'ok (with warnings)' instead of just 'ok'. Loosened the assertion to accept either form; both signal success.
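The shared warning/strict handling can be pictured as a single function that both the per-file and merged paths call. The sketch below is illustrative only: `summarize`, the `(severity, text)` tuple shape, and the output prefixes are hypothetical, and it assumes `--no-warnings` affects output only, not the exit code.

```python
from typing import Iterable, List, Tuple


def summarize(
    messages: Iterable[Tuple[str, str]],
    strict: bool = False,
    no_warnings: bool = False,
) -> Tuple[List[str], int]:
    """Group (severity, text) pairs and compute output lines and an exit code."""
    messages = list(messages)
    errors = [text for sev, text in messages if sev == "error"]
    warnings = [text for sev, text in messages if sev == "warning"]

    lines = [f"ERROR: {text}" for text in errors]
    if not no_warnings:
        lines += [f"WARNING: {text}" for text in warnings]

    # Errors always fail; warnings fail only in strict mode.
    failed = bool(errors) or (strict and bool(warnings))
    return lines, 1 if failed else 0
```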
Merge of main + integration with PR #203 features (c4ab33f)
Main moved substantially since this PR was last touched (April 24). The recent merge of #203 added
Resolution: The two feature sets compose. After the merge:

    linkml-map validate-spec --merge --entity Person \
        --source-schema src.yaml --target-schema tgt.yaml \
        --strict specs/
Test fix: |
Three fixes from Copilot's review of the merge commit:
1. Skip join aliases in expression-reference validation (validator.py).
Previously, expressions like {demographics.age_at_exam} where
'demographics' is a class_derivation.joins alias would warn:
'Expression references demographics which is not a slot on the
source class'. Now _validate_class_derivation collects join names
and passes them to _validate_slot_derivation, which excludes them
from refs before checking against source_class_slots.
Validating {join_alias.some_slot} against the joined class itself
would be the next step but requires resolving the joined class's
SchemaView — deferred as a follow-up enhancement.
Added test_semantics_join_alias_in_expr_no_warning. Verified by
temporarily removing the '- joined_aliases' filter — test fails
with the false-positive warning.
2. Document _resolve_schema_path's identifier behavior (validator.py
docstring + cli.py --source-schema/--target-schema help). Auto-
detection supports paths and URLs only; identifier-style values
(e.g. 'biolink') are skipped silently to avoid surprise network
requests on typos. Users with identifier-style schemas should pass
--source-schema / --target-schema explicitly. Also added a debug
log for visibility when verbose.
3. Fix misleading comment in test_semantics_auto_detect_schemas. The
test uses FLATTENING_TR, which has source_schema: s1 (a placeholder),
not a URL; the comment now reflects that auto-detection skips
non-resolvable identifier values.
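The join-alias fix in item 1 amounts to a set difference applied before the slot check. A minimal sketch under that reading follows; `unknown_expr_refs` is a hypothetical name, and the real `_validate_slot_derivation` may be structured differently.

```python
import ast
from typing import Iterable, Set


def unknown_expr_refs(
    expr: str,
    source_class_slots: Iterable[str],
    joined_aliases: Iterable[str],
) -> Set[str]:
    """Names read by expr that are neither source slots nor join aliases."""
    tree = ast.parse(expr)
    refs = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    # Drop join aliases first, then compare what remains with the slots.
    return (refs - set(joined_aliases)) - set(source_class_slots)
```

For `{demographics.age_at_exam}`, only `demographics` is an `ast.Name`; `age_at_exam` is an attribute and is not checked here, which matches the deferred follow-up described above.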
Summary
- Extends `validate-spec` to cross-reference transformation specs against source and/or target LinkML schemas, catching typos and schema drift that would otherwise silently produce `None` at runtime
- `ValidationMessage` dataclass with severity levels (error/warning); replaces plain string returns
- AST-based extraction (`extract_expr_slot_references()`) identifies slot names in `expr` fields without regex
- Auto-detects schemas from the spec's `source_schema`/`target_schema` fields when not explicitly provided
- CLI: `--source-schema`, `--target-schema`, `--strict`, `--no-warnings` options

What gets checked
- `populated_from` class not in source schema
- `populated_from` slot not valid on source class
- `populated_from` enum not in source schema
- … `--strict`)

Example
Test plan
Closes #199