## Summary

Key takeaways:

1. **Local OBO files** enable offline validation with the `simpleobo:` adapter
2. **OAK config** maps prefixes to specific adapters, or disables validation for a prefix (empty string)
3. **Bindings** ensure that fields of nested objects use terms from the correct ontology branch
4. **Caching strategies**:
   - **Progressive (default)**: lazy validation that caches terms as they are encountered; best for large ontologies
   - **Greedy**: expands all terms upfront; deterministic, good for CI/CD
5. **CLI flags** provide fine-grained control:
   - `--cache-strategy` to choose progressive or greedy
   - `--labels` to validate labels
   - `--no-dynamic-enums` or `--no-bindings` to skip validators
   - `--verbose` for detailed output
   - `--strict` to treat warnings as errors

**Next**: [Tutorial 3: Python API](../03_python_api/) covers programmatic usage.

### Configure Cache Strategy in YAML

You can also set the default cache strategy in your OAK config file (here `oak_config_with_strategy.yaml`) instead of passing `--cache-strategy` on every run:

In [None]:
%%bash
cat > oak_config_with_strategy.yaml << 'EOF'
# Set default cache strategy
cache_strategy: progressive  # or "greedy"

ontology_adapters:
  MY: simpleobo:my_ontology.obo
  GO: sqlite:obo:go
EOF

cat oak_config_with_strategy.yaml

In [None]:
%%bash
# Greedy caching - expands entire enum upfront
rm -rf cache/enums  # Clear enum cache to see the difference

linkml-term-validator validate-data experiment_data.yaml \
  --schema local_dynamic_schema.yaml \
  --target-class Experiment \
  --config oak_config.yaml \
  --cache-strategy greedy \
  --cache-dir cache
echo ""
echo "=== Greedy cache file (all terms) ==="
cat cache/enums/*.csv 2>/dev/null || echo "(no cache file)"

# Advanced Usage: Custom Configs and Local OBO Files

This tutorial covers:
1. Using custom OAK configurations
2. Working with local OBO files (offline validation)
3. Binding validation
4. Advanced CLI options
5. Troubleshooting common failures
6. Caching strategies for dynamic enums

## Setup

In [None]:
import tempfile
from pathlib import Path
import os

tmpdir = Path(tempfile.mkdtemp())
os.chdir(tmpdir)
print(f"Working in: {tmpdir}")

## Part 1: Working with Local OBO Files

For offline work or faster validation, you can point OAK's `simpleobo:` adapter at a local OBO file instead of downloading a prebuilt ontology database.
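
An OBO file is plain text made of stanzas. The minimal, stdlib-only sketch below reads `[Term]` stanzas and collects each term's `id`, `name`, and `is_a` parents, using a trimmed copy of the terms created in the next cell. It only illustrates the format (every other tag and stanza type is ignored) and is not how OAK's `simpleobo` adapter works; `parse_obo_terms` is a hypothetical helper.

```python
def parse_obo_terms(text: str) -> dict[str, dict]:
    """Collect id, name, and is_a parents from [Term] stanzas (illustrative only)."""
    terms: dict[str, dict] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are tracked.
            current = {"name": None, "is_a": []} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, untracked stanzas
        tag, value = (part.strip() for part in line.split(":", 1))
        value = value.split("!", 1)[0].strip()  # drop a trailing "! comment"
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
    return terms


SAMPLE = """\
format-version: 1.2
ontology: my_ontology

[Term]
id: MY:0000001
name: root process

[Term]
id: MY:0000003
name: cell division
is_a: MY:0000001

[Term]
id: MY:0000004
name: mitosis
is_a: MY:0000003
"""

terms = parse_obo_terms(SAMPLE)
print(terms["MY:0000004"])  # {'name': 'mitosis', 'is_a': ['MY:0000003']}
```

The `is_a` lines are what give the ontology its hierarchy, which the dynamic enum later in this part relies on.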

### Create a Local OBO File

In [None]:
%%bash
cat > my_ontology.obo << 'EOF'
format-version: 1.2
ontology: my_ontology

[Term]
id: MY:0000001
name: root process

[Term]
id: MY:0000002
name: cell growth
is_a: MY:0000001

[Term]
id: MY:0000003
name: cell division
is_a: MY:0000001

[Term]
id: MY:0000004
name: mitosis
is_a: MY:0000003
EOF

echo "✅ Created local OBO file: my_ontology.obo"

### Create OAK Config for Local File
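
An OAK config maps CURIE prefixes to adapter strings. The sketch below is a hypothetical mental model of that lookup, not the validator's code: a configured prefix uses its adapter, an empty string means "skip this prefix" (as used in the troubleshooting section), and any other prefix falls back to a default adapter. Building the fallback as `sqlite:obo:` plus the lowercased prefix is an assumption made for illustration; `resolve_adapter` is a hypothetical name.

```python
from typing import Optional

DEFAULT_ADAPTER = "sqlite:obo:"  # default shown in the --adapter example


def resolve_adapter(curie: str, config: dict[str, str],
                    default: str = DEFAULT_ADAPTER) -> Optional[str]:
    """Return the adapter string for a CURIE, or None if validation is skipped.

    Hypothetical model of prefix lookup; not the validator's implementation.
    """
    prefix = curie.split(":", 1)[0]
    if prefix in config:
        # An empty string in the config means "do not validate this prefix".
        return config[prefix] or None
    # Assumed fallback: default adapter + lowercased prefix (e.g. sqlite:obo:go)
    return default + prefix.lower()


config = {"MY": "simpleobo:my_ontology.obo", "UNKNOWN": ""}
print(resolve_adapter("MY:0000002", config))     # simpleobo:my_ontology.obo
print(resolve_adapter("GO:0007049", config))     # sqlite:obo:go
print(resolve_adapter("UNKNOWN:12345", config))  # None
```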

In [None]:
%%bash
cat > oak_config.yaml << 'EOF'
ontology_adapters:
  MY: simpleobo:my_ontology.obo
EOF

echo "✅ Created OAK config pointing to local file"

### Create Schema Using Local Ontology

In [None]:
%%bash
cat > local_schema.yaml << 'EOF'
id: https://example.org/local-schema
name: local-schema

prefixes:
  MY: http://example.org/MY_
  linkml: https://w3id.org/linkml/

default_prefix: local-schema
default_range: string

enums:
  ProcessEnum:
    permissible_values:
      CELL_GROWTH:
        title: cell growth
        meaning: MY:0000002
      CELL_DIVISION:
        title: cell division
        meaning: MY:0000003
EOF

echo "✅ Created schema using local ontology"

### Validate with Custom OAK Config

In [None]:
%%bash
linkml-term-validator validate-schema local_schema.yaml \
  --config oak_config.yaml \
  --cache-dir cache \
  --verbose && echo "✅ Validation passed!"

### Dynamic Enum with Local OBO
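
A `reachable_from` enum is defined by a graph query rather than a fixed list: its permissible values are the terms reachable from `source_nodes` by following the listed `relationship_types` (here `rdfs:subClassOf`, which is what OBO `is_a` lines map to). The sketch below computes that set for the local ontology with a breadth-first walk over child-to-parent edges copied by hand from `my_ontology.obo`. It leaves out the source node itself; LinkML's `reachable_from` has an `include_self` setting for that case. This illustrates the idea only and is not how OAK computes descendants.

```python
from collections import deque

# child -> parents, copied from the is_a lines in my_ontology.obo
IS_A = {
    "MY:0000002": ["MY:0000001"],  # cell growth
    "MY:0000003": ["MY:0000001"],  # cell division
    "MY:0000004": ["MY:0000003"],  # mitosis
}


def descendants(roots: list[str], is_a: dict[str, list[str]]) -> set[str]:
    """All terms reachable from `roots` by walking is_a edges downward."""
    # Invert the edges so we can walk from parent to child.
    children: dict[str, list[str]] = {}
    for child, parents in is_a.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    found: set[str] = set()
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


allowed = descendants(["MY:0000001"], IS_A)
print(sorted(allowed))          # ['MY:0000002', 'MY:0000003', 'MY:0000004']
print("MY:0000004" in allowed)  # True: mitosis -> cell division -> root process
```

Mitosis is accepted even though it is not a direct child of the root, which is why `exp1` in the data below passes.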

In [None]:
%%bash
cat > local_dynamic_schema.yaml << 'EOF'
id: https://example.org/local-dynamic
name: local-dynamic

prefixes:
  MY: http://example.org/MY_
  linkml: https://w3id.org/linkml/

default_prefix: local-dynamic
default_range: string

classes:
  Experiment:
    attributes:
      id:
        identifier: true
      process:
        range: AllProcessesEnum

enums:
  AllProcessesEnum:
    description: Processes reachable from the root process (MY:0000001) via subClassOf
    reachable_from:
      source_ontology: simpleobo:my_ontology.obo
      source_nodes:
        - MY:0000001  # root process
      relationship_types:
        - rdfs:subClassOf
EOF

echo "✅ Created dynamic enum schema with local OBO"

In [None]:
%%bash
cat > experiment_data.yaml << 'EOF'
- id: exp1
  process: MY:0000004  # mitosis - valid (descendant of root)

- id: exp2
  process: MY:0000002  # cell growth - valid
EOF

echo "✅ Created experiment_data.yaml"


In [None]:
%%bash
linkml-term-validator validate-data experiment_data.yaml \
  --schema local_dynamic_schema.yaml \
  --target-class Experiment \
  --config oak_config.yaml \
  --cache-dir cache && echo "✅ Validation passed!"


## Part 2: Binding Validation

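Bindings constrain a field of a nested object to the values of a specific enum. Here, the binding on `go_term` requires the nested `GOTerm`'s `id` to be a member of `BiologicalProcessEnum`, so an annotation cannot point at a GO term from the wrong branch (for example, a cellular component).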
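
Conceptually, a binding picks one field of the nested object (`binds_value_of`) and requires its value to be in the bound enum. The hypothetical sketch below shows that check with a hand-written stand-in for the expanded `BiologicalProcessEnum`; the real validator obtains the allowed GO terms via OAK, and `check_binding` is not part of its API.

```python
def check_binding(obj: dict, binds_value_of: str, allowed: set[str]) -> list[str]:
    """Return error messages if obj[binds_value_of] is not in the allowed set."""
    value = obj.get(binds_value_of)
    if value is None:
        return []  # nothing to check; whether the field is required is separate
    if value not in allowed:
        return [f"{binds_value_of}={value!r} is not in the bound enum"]
    return []


# Stand-in for BiologicalProcessEnum (descendants of GO:0008150); illustrative only
biological_process = {"GO:0007049"}  # cell cycle

print(check_binding({"id": "GO:0007049", "label": "cell cycle"}, "id", biological_process))
# []
print(check_binding({"id": "GO:0005634", "label": "nucleus"}, "id", biological_process))
# ["id='GO:0005634' is not in the bound enum"]
```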

### Create Schema with Bindings

In [None]:
%%bash
cat > binding_schema.yaml << 'EOF'
id: https://example.org/binding-schema
name: binding-schema

prefixes:
  GO: http://purl.obolibrary.org/obo/GO_
  linkml: https://w3id.org/linkml/

default_prefix: binding-schema
default_range: string

classes:
  GeneAnnotation:
    description: Annotation of a gene with a GO term
    attributes:
      id:
        identifier: true
      gene_symbol:
        range: string
      go_term:
        range: GOTerm
        bindings:
          - binds_value_of: id
            range: BiologicalProcessEnum

  GOTerm:
    attributes:
      id:
        identifier: true
      label:
        range: string

enums:
  BiologicalProcessEnum:
    description: Biological process terms
    reachable_from:
      source_ontology: obo:go
      source_nodes:
        - GO:0008150  # biological_process
      relationship_types:
        - rdfs:subClassOf
EOF

echo "✅ Created schema with binding constraints"

### Valid Data (Passes Binding Check)

In [None]:
%%bash
cat > valid_binding_data.yaml << 'EOF'
- id: annot1
  gene_symbol: BRCA1
  go_term:
    id: GO:0007049  # cell cycle - IS a biological process
    label: cell cycle
EOF

In [None]:
%%bash
linkml-term-validator validate-data valid_binding_data.yaml \
  --schema binding_schema.yaml \
  --target-class GeneAnnotation \
  --cache-dir cache || echo "⚠️  Validation failed (possibly because the GO database could not be downloaded)"

### Invalid Data (Fails Binding Check)

In [None]:
%%bash
cat > invalid_binding_data.yaml << 'EOF'
- id: annot1
  gene_symbol: BRCA1
  go_term:
    id: GO:0005634  # nucleus - this is a cellular component, NOT a biological process!
    label: nucleus
EOF

In [None]:
%%bash
linkml-term-validator validate-data invalid_binding_data.yaml \
  --schema binding_schema.yaml \
  --target-class GeneAnnotation \
  --cache-dir cache \
  || echo "❌ Binding validation failed as expected"

## Part 3: Advanced CLI Options

### Disable Specific Validators

In [None]:
%%bash
# Skip dynamic enum validation
linkml-term-validator validate-data valid_binding_data.yaml \
  --schema binding_schema.yaml \
  --target-class GeneAnnotation \
  --no-dynamic-enums \
  --cache-dir cache || echo "⚠️  May fail due to GO database download requirements"

echo ""
echo "---"
echo ""

# Skip binding validation
linkml-term-validator validate-data valid_binding_data.yaml \
  --schema binding_schema.yaml \
  --target-class GeneAnnotation \
  --no-bindings \
  --cache-dir cache || echo "⚠️  May fail due to GO database download requirements"

### Label Validation

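Label checking is optional and off by default. With `--labels`, the validator also checks that the `label` in your data matches the ontology's label for that term ID. The next cells create data with a deliberately wrong label and validate it both ways.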
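
Conceptually, a label check looks up the ontology's label for each term ID and compares it with the label in the data. In this hypothetical sketch the ontology labels are a hard-coded dict standing in for an OAK lookup, and an exact string match is assumed; the validator may normalize case or whitespace differently.

```python
from typing import Optional


def label_mismatch(term_id: str, given_label: str,
                   ontology_labels: dict[str, str]) -> Optional[str]:
    """Return a message if given_label differs from the ontology label, else None."""
    expected = ontology_labels.get(term_id)
    if expected is None:
        return f"{term_id}: no label found in ontology"
    if given_label != expected:  # assumed exact match
        return f"{term_id}: label {given_label!r} != expected {expected!r}"
    return None


labels = {"GO:0007049": "cell cycle"}  # stand-in for an ontology label lookup
print(label_mismatch("GO:0007049", "cell cycle", labels))  # None
print(label_mismatch("GO:0007049", "wrong label here", labels))
# GO:0007049: label 'wrong label here' != expected 'cell cycle'
```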

In [None]:
%%bash
cat > wrong_label_data.yaml << 'EOF'
- id: annot1
  gene_symbol: BRCA1
  go_term:
    id: GO:0007049
    label: wrong label here  # Incorrect label
EOF

In [None]:
%%bash
echo "=== Without --labels (labels not checked) ==="
linkml-term-validator validate-data wrong_label_data.yaml \
  --schema binding_schema.yaml \
  --target-class GeneAnnotation \
  --cache-dir cache
echo ""
echo "=== With --labels (checks labels) ==="
linkml-term-validator validate-data wrong_label_data.yaml \
  --schema binding_schema.yaml \
  --target-class GeneAnnotation \
  --labels \
  --cache-dir cache \
  || echo "❌ Label validation failed as expected"

### Custom Adapter String

The `--adapter` option selects the OAK adapter to use (default: `sqlite:obo:`):

In [None]:
%%bash
# Default: sqlite:obo:
linkml-term-validator validate-schema local_schema.yaml \
  --adapter sqlite:obo: \
  --config oak_config.yaml \
  --cache-dir cache && echo "✅ Validation passed!"

# Other adapters can be selected the same way, for example:
# --adapter ubergraph:  (requires Ubergraph access)
# --adapter bioportal:  (requires a BioPortal API key)

## Part 4: Troubleshooting Common Failures

### Missing Ontology Prefix
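
Before running the validator, you can list which `meaning` prefixes in a schema have no explicit entry in your OAK config (and would therefore fall back to the default adapter). The helper below is a hypothetical pre-flight check, not part of the validator; it works on the `enums` block as a plain dict, which you could obtain with PyYAML via `yaml.safe_load(...)["enums"]`.

```python
def unconfigured_prefixes(enums: dict, config: dict[str, str]) -> set[str]:
    """Prefixes used in permissible-value meanings that have no config entry."""
    found: set[str] = set()
    for enum in enums.values():
        for pv in (enum.get("permissible_values") or {}).values():
            meaning = (pv or {}).get("meaning")
            if meaning and ":" in meaning:
                prefix = meaning.split(":", 1)[0]
                if prefix not in config:
                    found.add(prefix)
    return found


# Mirrors the enums block of missing_prefix_schema.yaml, written as a dict
enums = {
    "TestEnum": {
        "permissible_values": {
            "SOMETHING": {"title": "something", "meaning": "UNKNOWN:12345"},
        }
    }
}
print(unconfigured_prefixes(enums, {"MY": "simpleobo:my_ontology.obo"}))  # {'UNKNOWN'}
print(unconfigured_prefixes(enums, {"MY": "simpleobo:my_ontology.obo", "UNKNOWN": ""}))  # set()
```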

In [None]:
%%bash
cat > missing_prefix_schema.yaml << 'EOF'
id: https://example.org/missing-prefix
name: missing-prefix

prefixes:
  UNKNOWN: http://example.org/UNKNOWN_
  linkml: https://w3id.org/linkml/

default_prefix: missing-prefix
default_range: string

enums:
  TestEnum:
    permissible_values:
      SOMETHING:
        title: something
        meaning: UNKNOWN:12345  # No OAK adapter configured for UNKNOWN
EOF

In [None]:
%%bash
linkml-term-validator validate-schema missing_prefix_schema.yaml \
  --cache-dir cache \
  || echo "❌ Failed: Unknown prefix not configured"

### Solution: Add to OAK Config

In [None]:
%%bash
cat > fix_oak_config.yaml << 'EOF'
ontology_adapters:
  MY: simpleobo:my_ontology.obo
  UNKNOWN: ""  # Empty string = skip validation for this prefix
EOF

In [None]:
%%bash
linkml-term-validator validate-schema missing_prefix_schema.yaml \
  --config fix_oak_config.yaml \
  --cache-dir cache && echo "✅ Validation passed (UNKNOWN prefix skipped)"

### Network/Database Issues

If you see errors like:
- 'Could not download database'
- 'Network timeout'
- 'Unable to connect'

**Solutions:**
1. Use local OBO files with simpleobo adapter
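2. Pre-download databases while online. OAK fetches a `sqlite:obo:` database on first use and caches it locally, so running a single query is enough:
   ```bash
   runoak -i sqlite:obo:go info GO:0008150
   ```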
3. Use `--no-cache` to force re-download
4. Check your internet connection

## Part 5: Caching Strategies for Dynamic Enums

The validator supports two caching strategies for dynamic enum validation:

- **Progressive (default)**: Validates lazily, caching valid terms as encountered
- **Greedy**: Expands entire enum upfront and caches all terms

### When to Use Each Strategy

| Use Case | Strategy |
|----------|----------|
| Large ontologies (SNOMED, NCBI Taxonomy) | Progressive |
| Small enums (< 1000 terms) | Either |
| CI/CD (deterministic) | Greedy |
| Development (fast startup) | Progressive |
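
The difference between the two strategies can be sketched in a few lines. This is a simplified, hypothetical model rather than the validator's cache code: `expand` stands in for computing the full enum (costly for large ontologies), `is_member` stands in for a single-term reachability check, neither name comes from the tool, and on-disk persistence is omitted.

```python
from typing import Callable, Iterable


class GreedyCache:
    """Expand the whole enum once, then answer every lookup from memory."""

    def __init__(self, expand: Callable[[], Iterable[str]]):
        self.terms = set(expand())  # all permissible values, computed upfront

    def is_valid(self, term: str) -> bool:
        return term in self.terms


class ProgressiveCache:
    """Check terms one at a time and remember the ones found to be valid."""

    def __init__(self, is_member: Callable[[str], bool]):
        self.is_member = is_member
        self.terms: set[str] = set()  # grows only as valid terms are seen

    def is_valid(self, term: str) -> bool:
        if term in self.terms:
            return True
        if self.is_member(term):
            self.terms.add(term)
            return True
        return False


ENUM = {"MY:0000002", "MY:0000003", "MY:0000004"}  # descendants of MY:0000001
calls: list[str] = []


def is_member(term: str) -> bool:
    calls.append(term)  # record each (potentially expensive) ontology query
    return term in ENUM


greedy = GreedyCache(lambda: ENUM)
progressive = ProgressiveCache(is_member)
for term in ["MY:0000004", "MY:0000002", "MY:0000004"]:
    assert greedy.is_valid(term) and progressive.is_valid(term)

print(len(greedy.terms))          # 3: everything cached upfront
print(sorted(progressive.terms))  # ['MY:0000002', 'MY:0000004']: only terms seen
print(len(calls))                 # 2: the repeated MY:0000004 hit the cache
```

In this model the greedy cache pays the full expansion cost once and is the same for every dataset, while the progressive cache only queries the terms that actually occur in your data. Invalid terms are not cached here, so each one is re-queried; this sketch does not say whether the real tool caches them.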

In [None]:
%%bash
# Progressive caching (default) - validates lazily
linkml-term-validator validate-data experiment_data.yaml \
  --schema local_dynamic_schema.yaml \
  --target-class Experiment \
  --config oak_config.yaml \
  --cache-strategy progressive \
  --cache-dir cache
echo ""
echo "=== Progressive cache files (may be empty initially) ==="
find cache/enums -name "*.csv" 2>/dev/null | head -3 | grep . || echo "(no enum cache files yet)"

## Cleanup

In [None]:
import os
import shutil
from pathlib import Path

os.chdir(Path.home())  # leave tmpdir before deleting it
shutil.rmtree(tmpdir)
print("✅ Temporary files cleaned up")