# Python API for Programmatic Usage

This tutorial shows how to use `linkml-term-validator` programmatically in Python scripts and applications.

## Setup

In [None]:
import tempfile
from pathlib import Path
import os

tmpdir = Path(tempfile.mkdtemp())
os.chdir(tmpdir)
print(f"Working in: {tmpdir}")

## Part 1: Using Validation Plugins

The recommended approach is to use validation plugins with LinkML's Validator API.

### Schema Validation with PermissibleValueMeaningPlugin

In [None]:
# Create a test schema
schema_path = tmpdir / "schema.yaml"
schema_path.write_text("""
id: https://example.org/test
name: test

prefixes:
  GO: http://purl.obolibrary.org/obo/GO_
  linkml: https://w3id.org/linkml/

default_prefix: test
default_range: string

enums:
  ProcessEnum:
    permissible_values:
      CELL_CYCLE:
        title: cell cycle
        meaning: GO:0007049
""")

print(f"Created schema: {schema_path}")

In [None]:
from linkml.validator import Validator
from linkml_term_validator.plugins import PermissibleValueMeaningPlugin

# Create plugin
plugin = PermissibleValueMeaningPlugin(
    oak_adapter_string="sqlite:obo:",
    cache_dir=tmpdir / "cache",
    strict_mode=False
)

# Create validator
validator = Validator(
    schema=str(schema_path),
    validation_plugins=[plugin]
)

# Nothing is validated in this cell; it only constructs the validator.
# (This plugin targets the schema itself, not instance data.)
print("Schema validation plugin configured successfully")

### Data Validation with DynamicEnumPlugin

In [None]:
# Create schema with dynamic enum
dynamic_schema_path = tmpdir / "dynamic_schema.yaml"
dynamic_schema_path.write_text("""
id: https://example.org/dynamic
name: dynamic

prefixes:
  GO: http://purl.obolibrary.org/obo/GO_
  linkml: https://w3id.org/linkml/

default_prefix: dynamic
default_range: string

classes:
  Sample:
    attributes:
      id:
        identifier: true
      process:
        range: ProcessEnum

enums:
  ProcessEnum:
    reachable_from:
      source_ontology: obo:go
      source_nodes:
        - GO:0008150
      relationship_types:
        - rdfs:subClassOf
""")

# Create valid data
data_path = tmpdir / "data.yaml"
data_path.write_text("""
- id: sample1
  process: GO:0007049
""")

print("Created schema and data files")

In [None]:
from linkml.validator.loaders import YamlLoader
from linkml_term_validator.plugins import DynamicEnumPlugin

# Create plugin
dynamic_plugin = DynamicEnumPlugin(
    oak_adapter_string="sqlite:obo:",
    cache_dir=tmpdir / "cache"
)

# Create validator
dynamic_validator = Validator(
    schema=str(dynamic_schema_path),
    validation_plugins=[dynamic_plugin]
)

# Load and validate data
loader = YamlLoader(data_path)
report = dynamic_validator.validate_source(loader, target_class="Sample")

# Check results
if len(report.results) == 0:
    print("✅ Validation passed!")
else:
    print(f"❌ Validation failed with {len(report.results)} issues:")
    for result in report.results:
        print(f"  - {result.message}")

### Combining Multiple Plugins

In [None]:
from linkml_term_validator.plugins import (
    DynamicEnumPlugin,
    BindingValidationPlugin
)

# Create multiple plugins
plugins = [
    DynamicEnumPlugin(
        oak_adapter_string="sqlite:obo:",
        cache_dir=tmpdir / "cache"
    ),
    BindingValidationPlugin(
        oak_adapter_string="sqlite:obo:",
        validate_labels=True,
        cache_dir=tmpdir / "cache"
    )
]

# Create comprehensive validator
comprehensive_validator = Validator(
    schema=str(dynamic_schema_path),
    validation_plugins=plugins
)

print("✅ Comprehensive validator created with multiple plugins")

## Part 2: Using Custom OAK Configurations

In [None]:
# Create OAK config
oak_config_path = tmpdir / "oak_config.yaml"
oak_config_path.write_text("""
ontology_adapters:
  GO: sqlite:obo:go
  MY: ""  # Skip validation for MY prefix
""")

# Create plugin with custom config
custom_plugin = DynamicEnumPlugin(
    oak_adapter_string="sqlite:obo:",
    oak_config_path=oak_config_path,
    cache_dir=tmpdir / "cache"
)

print("✅ Plugin created with custom OAK configuration")
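When the prefix-to-adapter mapping already lives in Python, the config file can be generated instead of hand-written. The following is a minimal sketch using only the standard library; `write_oak_config` is a hypothetical helper name, not part of linkml-term-validator, and it assumes the `ontology_adapters` layout shown in the cell above is all the file needs.

```python
import json
import tempfile
from pathlib import Path


def write_oak_config(path, adapters):
    """Write a prefix-to-adapter mapping in the `ontology_adapters` layout.

    Hypothetical helper. Values are emitted as double-quoted strings via
    json.dumps, which YAML accepts, so an empty string stays an explicit "".
    """
    lines = ["ontology_adapters:"]
    for prefix, adapter in adapters.items():
        lines.append(f"  {prefix}: {json.dumps(adapter)}")
    path = Path(path)
    path.write_text("\n".join(lines) + "\n")
    return path


with tempfile.TemporaryDirectory() as workdir:
    config_path = write_oak_config(
        Path(workdir) / "oak_config.yaml",
        {"GO": "sqlite:obo:go", "MY": ""},  # same entries as the cell above
    )
    print(config_path.read_text())
```

The returned path can then be passed as `oak_config_path` in the same way as the hand-written file.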

## Part 3: Error Handling

In [None]:
# Create invalid data
invalid_data_path = tmpdir / "invalid_data.yaml"
invalid_data_path.write_text("""
- id: sample1
  process: GO:0005634  # nucleus - NOT a process!
""")

# Validate and handle errors
loader = YamlLoader(invalid_data_path)
report = dynamic_validator.validate_source(loader, target_class="Sample")

if len(report.results) == 0:
    print("✅ Validation passed")
else:
    print(f"❌ Found {len(report.results)} validation issues:\n")
    for i, result in enumerate(report.results, 1):
        print(f"{i}. Severity: {result.severity.name}")
        print(f"   Message: {result.message}")
        print(f"   Type: {result.type}")
        if getattr(result, "instance", None) is not None:
            print(f"   Instance: {result.instance}")
        print()
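For longer reports it can help to group messages by severity before printing them. The sketch below does this with stand-in objects so that it runs without ontology access: `StubSeverity`, `StubResult`, and the sample messages are made up for illustration, and the only assumption about real results is that they expose the `severity.name` and `message` attributes used in the cell above.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class StubSeverity(Enum):
    """Stand-in for the severity enum on real validation results."""
    ERROR = "ERROR"
    WARN = "WARN"


@dataclass
class StubResult:
    """Stand-in exposing only the attributes this sketch relies on."""
    severity: StubSeverity
    message: str


def group_by_severity(results):
    """Map each severity name to the list of messages reported at that level."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result.severity.name].append(result.message)
    return dict(grouped)


# Illustrative messages only; real plugin messages will differ.
sample_results = [
    StubResult(StubSeverity.ERROR, "GO:0005634 is not in ProcessEnum"),
    StubResult(StubSeverity.WARN, "label mismatch for GO:0007049"),
    StubResult(StubSeverity.ERROR, "unknown term GO:9999999"),
]
for severity, messages in group_by_severity(sample_results).items():
    print(f"{severity}: {len(messages)} issue(s)")
```

Passing `report.results` in place of `sample_results` should work as long as those attributes are present.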

## Part 4: Integration Example

Here's a complete example of integrating validation into a data processing pipeline:

In [None]:
def validate_linkml_data(schema_file, data_file, target_class=None):
    """Validate LinkML data with ontology term checking.
    
    Args:
        schema_file: Path to LinkML schema
        data_file: Path to data file (YAML/JSON)
        target_class: Target class name (optional)
        
    Returns:
        tuple: (is_valid: bool, error_messages: list)
    """
    from linkml.validator import Validator
    from linkml.validator.loaders import default_loader_for_file
    from linkml_term_validator.plugins import (
        DynamicEnumPlugin,
        BindingValidationPlugin
    )
    
    # Create plugins
    plugins = [
        DynamicEnumPlugin(oak_adapter_string="sqlite:obo:"),
        BindingValidationPlugin(oak_adapter_string="sqlite:obo:")
    ]
    
    # Create validator
    validator = Validator(schema=str(schema_file), validation_plugins=plugins)
    
    # Load and validate
    loader = default_loader_for_file(data_file)
    report = validator.validate_source(loader, target_class=target_class)
    
    # Extract errors
    errors = [result.message for result in report.results]
    
    return (len(errors) == 0, errors)

# Test the function
is_valid, errors = validate_linkml_data(dynamic_schema_path, data_path, "Sample")

if is_valid:
    print("✅ Data is valid!")
else:
    print(f"❌ Data has {len(errors)} errors:")
    for error in errors:
        print(f"  - {error}")
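In a pipeline, a function like `validate_linkml_data` is often called once per input file, with the outcome turned into a process exit code. The following is a minimal sketch of that pattern. `validate_many`, `exit_code_for`, and `fake_validate` are hypothetical names, and the validator is passed in as a parameter so the example runs without ontology access; in real use, `validate_linkml_data` would take the place of `fake_validate`.

```python
import sys


def validate_many(validate, schema_file, data_files, target_class=None):
    """Run `validate` on each data file and collect the failures.

    `validate` is any callable taking (schema_file, data_file, target_class)
    and returning (is_valid, error_messages).
    """
    failures = {}
    for data_file in data_files:
        is_valid, errors = validate(schema_file, data_file, target_class)
        if not is_valid:
            failures[str(data_file)] = errors
    return failures


def exit_code_for(failures):
    """Return 0 when every file passed, 1 otherwise (the usual shell convention)."""
    return 1 if failures else 0


# Hypothetical stand-in validator: fails any file whose name contains "invalid".
def fake_validate(schema_file, data_file, target_class=None):
    if "invalid" in str(data_file):
        return (False, ["example error message"])
    return (True, [])


failures = validate_many(
    fake_validate, "schema.yaml", ["data.yaml", "invalid_data.yaml"], "Sample"
)
for path, errors in failures.items():
    print(f"{path}: {len(errors)} error(s)", file=sys.stderr)
print(f"exit code: {exit_code_for(failures)}")
```

A script would typically finish with `sys.exit(exit_code_for(failures))` so that a CI job fails when any file does.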

## Summary

Key Python API patterns:

1. **Create plugins** with configuration (adapter, cache, etc.)
2. **Create Validator** with schema and plugins
3. **Load data** with appropriate loader (YamlLoader, JsonLoader, etc.)
4. **Validate** with `validator.validate_source(loader, target_class)`
5. **Check results** with `len(report.results) == 0`

For most use cases, the CLI (`linkml-term-validator`) is recommended. Use the Python API when you need:
- Integration with existing Python code
- Custom error handling
- Programmatic control over validation
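One way to approach custom error handling is to turn a non-empty list of messages into an exception that callers can catch. This is only a sketch: `TermValidationError` and `raise_if_invalid` are hypothetical names, not part of linkml-term-validator, and the messages are placeholders.

```python
class TermValidationError(Exception):
    """Hypothetical exception carrying every validation message."""

    def __init__(self, messages):
        self.messages = list(messages)
        summary = f"{len(self.messages)} validation issue(s)"
        super().__init__(summary + ": " + "; ".join(self.messages))


def raise_if_invalid(messages):
    """Raise TermValidationError when `messages` is non-empty; otherwise do nothing."""
    if messages:
        raise TermValidationError(messages)


try:
    # In real use, the list would come from [r.message for r in report.results].
    raise_if_invalid(["example message one", "example message two"])
except TermValidationError as exc:
    print(f"Caught: {exc}")
```

Keeping the full list on `exc.messages` lets a caller log every issue rather than only the first.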

## Cleanup

In [None]:
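import os
import shutil
import tempfile

# Setup made tmpdir the working directory, so leave it before deleting it
os.chdir(tempfile.gettempdir())
shutil.rmtree(tmpdir)
print("✅ Temporary files cleaned up")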
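Outside a notebook, the setup and cleanup steps can be folded into a single context manager, so the temporary directory is removed even if validation raises. The sketch below uses only the standard library; `temporary_workspace` is a hypothetical helper name.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_workspace():
    """Yield a fresh temporary directory as the current working directory.

    On exit, the previous working directory is restored and the temporary
    directory is deleted, even if the body raised an exception.
    """
    previous_cwd = Path.cwd()
    with tempfile.TemporaryDirectory() as name:
        workspace = Path(name)
        os.chdir(workspace)
        try:
            yield workspace
        finally:
            # Leave the directory before TemporaryDirectory removes it.
            os.chdir(previous_cwd)


with temporary_workspace() as workspace:
    (workspace / "schema.yaml").write_text("id: https://example.org/test\n")
    print(f"Working in: {workspace}")
```

Restoring the working directory before deletion matters because some platforms refuse to remove a directory that is still the current one.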