# Getting Started with linkml-term-validator

This tutorial demonstrates how to use the `linkml-term-validator` CLI to validate LinkML schemas and data that reference external ontology terms.

## What is linkml-term-validator?

linkml-term-validator performs three kinds of validation:
1. **Schema validation**: `meaning` fields in enum permissible values reference valid ontology terms
2. **Data validation**: Data instances comply with dynamic enums (`reachable_from`, `matches`, `concepts`)
3. **Binding validation**: Nested object fields satisfy binding constraints

## Installation

First, make sure linkml-term-validator is installed:

In [None]:
%%bash
# Check if installed
linkml-term-validator --help > /dev/null 2>&1 && echo "✅ linkml-term-validator is installed" || echo "❌ Install with: pip install linkml-term-validator"


## Setup: Create Test Files

Let's create a temporary directory with example schemas and data:

In [None]:
import tempfile
from pathlib import Path
import os

# Create temp directory
tmpdir = Path(tempfile.mkdtemp())
os.chdir(tmpdir)
print(f"Working in: {tmpdir}")

## Part 1: Schema Validation

Schema validation checks that `meaning` fields in enum permissible values reference valid ontology terms.
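
To make the idea concrete, here is a minimal sketch of the first step such a check needs: collecting the `meaning` CURIEs from a schema's enums and expanding them with the schema's prefix map. The helper names `expand_curie` and `collect_meanings` are hypothetical, the schema is written as a plain dict rather than loaded from YAML, and none of this is the validator's actual code.

```python
def expand_curie(curie: str, prefixes: dict) -> str:
    """Expand a CURIE such as 'GO:0008150' into a full IRI using a prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"Not a CURIE: {curie!r}")
    if prefix not in prefixes:
        raise KeyError(f"Undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local_id


def collect_meanings(enums: dict) -> dict:
    """Map 'EnumName.VALUE_NAME' to its `meaning` CURIE, skipping values without one."""
    found = {}
    for enum_name, enum_def in enums.items():
        values = enum_def.get("permissible_values") or {}
        for value_name, value_def in values.items():
            meaning = (value_def or {}).get("meaning")
            if meaning:
                found[f"{enum_name}.{value_name}"] = meaning
    return found


# Schema fragment in the style of this tutorial, written as a plain dict.
# The UNMAPPED value is a made-up entry showing that `meaning` is optional.
schema = {
    "prefixes": {"GO": "http://purl.obolibrary.org/obo/GO_"},
    "enums": {
        "BiologicalProcessEnum": {
            "permissible_values": {
                "CELL_CYCLE": {"title": "cell cycle", "meaning": "GO:0007049"},
                "UNMAPPED": {"title": "no ontology term"},
            }
        }
    },
}

meanings = collect_meanings(schema["enums"])
iris = {name: expand_curie(curie, schema["prefixes"]) for name, curie in meanings.items()}
print(iris)
```

Expanding the CURIE only shows that the prefix is declared. Whether the resulting term exists in the ontology still requires a lookup, which is the part the CLI handles.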

### Create a Valid Schema

In [None]:
%%bash
cat > valid_schema.yaml << 'EOF'
id: https://example.org/my-schema
name: my-schema

prefixes:
  GO: http://purl.obolibrary.org/obo/GO_
  linkml: https://w3id.org/linkml/

default_prefix: my-schema
default_range: string

classes:
  Sample:
    attributes:
      id:
        identifier: true
      process:
        range: BiologicalProcessEnum

enums:
  BiologicalProcessEnum:
    permissible_values:
      BIOLOGICAL_PROCESS:
        title: biological_process
        meaning: GO:0008150
      CELL_CYCLE:
        title: cell cycle
        meaning: GO:0007049
      DNA_REPLICATION:
        title: DNA replication
        meaning: GO:0006260
EOF

echo "✅ Created valid_schema.yaml"

### Validate the Schema (Success Case)

In [None]:
%%bash
linkml-term-validator validate-schema valid_schema.yaml --cache-dir cache && echo "✅ Validation passed!"

### Understanding the Cache

The validator caches ontology labels on disk, under the directory passed to `--cache-dir`, so that later runs can reuse them. Let's look at what's in the cache:

In [None]:
%%bash
find ./cache

In [None]:
%%bash
cat cache/go/terms.csv
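
The benefit of caching can be sketched in a few lines of Python. This is an in-memory illustration only: `LabelCache` and `fake_fetch` are hypothetical names, and the validator's real cache is file-based (as the listing above shows) with a layout that is not reproduced here.

```python
class LabelCache:
    """Remember labels already looked up so each CURIE is fetched at most once."""

    def __init__(self, fetch_label):
        self._fetch_label = fetch_label  # callable: CURIE -> label (or None)
        self._labels = {}
        self.fetch_count = 0  # how many times the slow lookup actually ran

    def get(self, curie):
        if curie not in self._labels:
            self.fetch_count += 1
            self._labels[curie] = self._fetch_label(curie)
        return self._labels[curie]


def fake_fetch(curie):
    """Stand-in for a slow ontology lookup (hypothetical, hard-coded data)."""
    return {"GO:0008150": "biological_process"}.get(curie)


cache = LabelCache(fake_fetch)
first = cache.get("GO:0008150")   # triggers the lookup
second = cache.get("GO:0008150")  # served from the cache
print(first, second, cache.fetch_count)
```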

### Create a Schema with Errors

Now let's create a schema with some common mistakes:

In [None]:
%%bash
cat > invalid_schema.yaml << 'EOF'
id: https://example.org/invalid-schema
name: invalid-schema

prefixes:
  GO: http://purl.obolibrary.org/obo/GO_
  linkml: https://w3id.org/linkml/

default_prefix: invalid-schema
default_range: string

enums:
  BiologicalProcessEnum:
    permissible_values:
      INVALID_TERM:
        title: this term does not exist
        meaning: GO:9999999  # Well-formed CURIE, but no such GO term exists
      WRONG_LABEL:
        title: wrong label here
        meaning: GO:0008150  # Valid CURIE but wrong label (should be biological_process)
EOF

echo "✅ Created invalid_schema.yaml with intentional errors"

### Validate the Invalid Schema (Failure Cases)

In [None]:
%%bash
# This should fail and show errors
linkml-term-validator validate-schema invalid_schema.yaml --cache-dir cache --verbose || echo "❌ Validation failed as expected"
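
The two mistakes in `invalid_schema.yaml` correspond to two different checks: does the term exist, and does its label agree with the `title`? The sketch below illustrates both against a small hard-coded lookup table. `KNOWN_LABELS` and `check_permissible_value` are hypothetical, the exact string comparison is an assumption, and the messages are not the CLI's real output.

```python
# Hypothetical stand-in for an ontology lookup, limited to the terms in this tutorial
KNOWN_LABELS = {
    "GO:0008150": "biological_process",
    "GO:0007049": "cell cycle",
    "GO:0006260": "DNA replication",
}


def check_permissible_value(name, title, meaning, labels=KNOWN_LABELS):
    """Return a problem description, or None if the value looks consistent."""
    if meaning not in labels:
        return f"{name}: {meaning} was not found in the ontology"
    expected = labels[meaning]
    if title != expected:  # assumption: exact match; a real tool may normalize
        return f"{name}: title {title!r} differs from ontology label {expected!r}"
    return None


values = [
    ("INVALID_TERM", "this term does not exist", "GO:9999999"),
    ("WRONG_LABEL", "wrong label here", "GO:0008150"),
    ("CELL_CYCLE", "cell cycle", "GO:0007049"),
]

problems = [p for p in (check_permissible_value(*v) for v in values) if p]
for problem in problems:
    print(problem)
```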

## Part 2: Data Validation with Dynamic Enums

Dynamic enums allow you to define valid values based on ontology queries rather than static lists.
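
Conceptually, a `reachable_from` enum describes the set of terms that can be reached from the source nodes by following the listed relationships. The sketch below computes that set over a tiny hand-written graph. The `IS_A` edges are simplified for illustration (intermediate GO terms are omitted), `descendants` is a hypothetical helper, and leaving the source node out of the result is a choice made here for simplicity.

```python
from collections import deque

# Toy "is a" edges, child -> parents. Simplified; not the full GO hierarchy.
IS_A = {
    "GO:0009987": ["GO:0008150"],  # cellular process -> biological_process
    "GO:0007049": ["GO:0009987"],  # cell cycle -> cellular process
    "GO:0005634": ["GO:0005575"],  # nucleus -> cellular_component (intermediates omitted)
}


def descendants(source_nodes, is_a):
    """Return every term that reaches a source node by following is_a edges upward."""
    # Invert the child -> parents map so we can walk downward from the sources
    children = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)

    seen = set()
    queue = deque(source_nodes)
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


allowed = descendants(["GO:0008150"], IS_A)
print(sorted(allowed))
print("GO:0005634" in allowed)  # nucleus sits in a different branch
```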

### Create a Schema with Dynamic Enum

In [None]:
%%bash
cat > dynamic_schema.yaml << 'EOF'
id: https://example.org/dynamic-schema
name: dynamic-schema

prefixes:
  GO: http://purl.obolibrary.org/obo/GO_
  linkml: https://w3id.org/linkml/

default_prefix: dynamic-schema
default_range: string

classes:
  Sample:
    attributes:
      id:
        identifier: true
      process_type:
        range: BiologicalProcessEnum
        required: true

enums:
  BiologicalProcessEnum:
    description: Any biological process or its descendants
    reachable_from:
      source_ontology: obo:go
      source_nodes:
        - GO:0008150  # biological_process
      relationship_types:
        - rdfs:subClassOf
EOF

echo "✅ Created dynamic_schema.yaml"

### Create Valid Data

In [None]:
%%bash
cat > valid_data.yaml << 'EOF'
- id: sample1
  process_type: GO:0007049  # cell cycle - descendant of biological_process

- id: sample2
  process_type: GO:0006260  # DNA replication - also a descendant
EOF

echo "✅ Created valid_data.yaml"

### Validate Valid Data (Success)

In [None]:
%%bash
linkml-term-validator validate-data valid_data.yaml \
  --schema dynamic_schema.yaml \
  --target-class Sample \
  --cache-dir cache

### Create Invalid Data

Let's create data with terms that are NOT biological processes:

In [None]:
%%bash
cat > invalid_data.yaml << 'EOF'
- id: sample1
  process_type: GO:0005634  # nucleus - this is a cellular component, NOT a process!

- id: sample2
  process_type: GO:0003674  # molecular_function - wrong branch of GO!
EOF

echo "✅ Created invalid_data.yaml with intentional errors"

### Validate Invalid Data (Failures)

In [None]:
%%bash
linkml-term-validator validate-data invalid_data.yaml \
  --schema dynamic_schema.yaml \
  --target-class Sample \
  --cache-dir cache \
  || echo "❌ Validation failed as expected"
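
The `||` fallback above depends on the command's exit status. The same check can be sketched from Python with `subprocess`, using only the options shown in this tutorial. The helper names are hypothetical, and the sketch assumes the CLI exits with a non-zero status when validation fails.

```python
import shutil
import subprocess


def build_validate_data_command(data_file, schema_file, target_class, cache_dir="cache"):
    """Assemble the validate-data command line used in this tutorial."""
    return [
        "linkml-term-validator", "validate-data", data_file,
        "--schema", schema_file,
        "--target-class", target_class,
        "--cache-dir", cache_dir,
    ]


def run_validation(command):
    """Run the command and report success.

    Returns True on exit status 0, False otherwise, and None if the
    executable is not on PATH.
    """
    if shutil.which(command[0]) is None:
        return None
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode == 0


command = build_validate_data_command("invalid_data.yaml", "dynamic_schema.yaml", "Sample")
print(" ".join(command))
# In the notebook, call run_validation(command) after the files above exist.
```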

## CLI Options

### Verbose Output

Pass `--verbose` to get more detailed output:

In [None]:
%%bash
linkml-term-validator validate-schema valid_schema.yaml --verbose --cache-dir cache && echo "✅ Validation passed!"

### Strict Mode

Pass `--strict` to treat warnings as errors:

In [None]:
%%bash
linkml-term-validator validate-schema valid_schema.yaml --strict --cache-dir cache && echo "✅ Validation passed!"

### Help

Append `--help` to the top-level command or to any subcommand:

In [None]:
%%bash
linkml-term-validator --help

In [None]:
%%bash
linkml-term-validator validate-data --help

## Next Steps

- [**Tutorial 2: Advanced Usage**](../02_advanced_usage/) - Custom configs, bindings, and local OBO files
- [**Tutorial 3: Python API**](../03_python_api/) - Programmatic usage

## Cleanup

In [None]:
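import os
import shutil
# Leave the temp directory first: deleting the current working directory fails on Windows
os.chdir(tmpdir.parent)
shutil.rmtree(tmpdir)
print("✅ Temporary files cleaned up")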