# Getting Started with linkml-reference-validator

This tutorial demonstrates how to use the `linkml-reference-validator` CLI to validate that supporting text quotes actually appear in their cited references.

## What is linkml-reference-validator?

linkml-reference-validator validates that:
1. **Quoted text exists**: Supporting text claims actually appear in the referenced publication
2. **Accurate citations**: References are properly cited and accessible
3. **Deterministic matching**: Uses substring matching (not fuzzy/AI-based)
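"Deterministic" means the check is a plain containment test: the same quote and text always produce the same verdict, with no similarity score or threshold. The bash sketch below illustrates only that core idea. `quote_in_text` is a hypothetical helper, not part of the CLI. The sample string is illustrative and was not fetched from PubMed. The comparison is exact, so it leaves out the normalization, editorial-note, and ellipsis handling described later.

```bash
# Hypothetical helper illustrating exact, case-sensitive substring matching.
# Returns 0 only if the quote ($1) occurs verbatim inside the text ($2).
quote_in_text() {
  case $2 in
    *"$1"*) return 0 ;;   # quoted "$1" is matched literally, not as a glob
    *) return 1 ;;
  esac
}

sample='MUC1 oncoprotein blocks nuclear targeting of c-Abl'  # illustrative text

quote_in_text 'blocks nuclear targeting' "$sample" && echo "found"           # prints: found
quote_in_text 'activates the JAK-STAT pathway' "$sample" || echo "not found" # prints: not found
```

Because `"$1"` is quoted inside the `case` pattern, characters such as `*` or `?` in a quote are compared literally instead of being treated as wildcards.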

The tool fetches publications from PubMed and PMC and caches them locally for offline use.

## Installation

First, make sure linkml-reference-validator is installed:

```bash
# Check that the CLI is installed
linkml-reference-validator --help > /dev/null 2>&1 \
  && echo "✅ linkml-reference-validator is installed" \
  || echo "❌ Install with: pip install linkml-reference-validator"
```

## Part 1: Basic Validation with `validate text`

The most common use case is validating a single supporting text quote against a reference.

### Example 1: Validate a Real Quote

Let's validate a quote from a real scientific paper (PMID:16888623):

```bash
# This quote appears in the referenced paper
linkml-reference-validator validate text \
  "MUC1 oncoprotein blocks nuclear targeting of c-Abl" \
  PMID:16888623 \
  && echo "✅ Quote validated!"
```

**Note**: The first time you run this, the tool fetches the reference from PubMed and caches it locally in `references_cache/`. Subsequent validations read the cached copy instead of fetching it again, so they are much faster and also work offline.

### Example 2: Validation Failure

What happens when the quote doesn't appear in the reference?

```bash
# This text does NOT appear in PMID:16888623
linkml-reference-validator validate text \
  "MUC1 activates the JAK-STAT pathway" \
  PMID:16888623 \
  || echo "❌ Validation failed - text not found in reference"
```

### Example 3: Partial Quotes

A quote does not have to be a complete sentence. Any contiguous fragment of the reference text validates:

```bash
# Just a portion of the text
linkml-reference-validator validate text \
  "blocks nuclear targeting" \
  PMID:16888623 \
  && echo "✅ Partial quote validated!"
```

## Part 2: Editorial Notes with `[...]`

Use square brackets for editorial clarifications that should be ignored during matching.

For example, if you want to clarify what "MUC1" stands for in your quote:

```bash
# Editorial clarification: the bracketed text is ignored during matching
linkml-reference-validator validate text \
  'MUC1 [mucin 1] oncoprotein blocks nuclear targeting of c-Abl' \
  PMID:16888623 \
  && echo "✅ Editorial note ignored during matching!"
```

```bash
# Multiple editorial notes
linkml-reference-validator validate text \
  'MUC1 [an oncoprotein] blocks nuclear targeting of c-Abl [a tyrosine kinase]' \
  PMID:16888623 \
  && echo "✅ Multiple editorial notes handled!"
```
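One plausible way to picture this is that bracketed notes are deleted from the quote before it is compared with the reference. The sketch below assumes exactly that: notes are simply removed and never nest. It is not taken from the tool's source, and `strip_editorial_notes` is a hypothetical name.

```bash
# Hypothetical helper: remove every [...] editorial note from a quote,
# then squeeze the runs of spaces left behind and trim both ends.
strip_editorial_notes() {
  printf '%s\n' "$1" \
    | sed -E 's/\[[^]]*]//g' \
    | tr -s ' ' \
    | sed -E 's/^ +//; s/ +$//'
}

strip_editorial_notes 'MUC1 [an oncoprotein] blocks nuclear targeting of c-Abl [a tyrosine kinase]'
# prints: MUC1 blocks nuclear targeting of c-Abl
```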

## Part 3: Ellipsis for Omitted Text (`...`)

Use `...` to indicate omitted text between two parts of a quote. Both parts must be found in the reference.

```bash
# Multi-part quote with ellipsis
linkml-reference-validator validate text \
  "MUC1 oncoprotein ... c-Abl in the apoptotic response" \
  PMID:16888623 \
  && echo "✅ Both parts of ellipsis quote found!"
```
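The ellipsis rule can be sketched as follows: split the quote on `...`, trim each fragment, and require every fragment to occur in the reference. The hypothetical `all_parts_found` function below compares text exactly. Normalization is covered in the next part. The function checks each fragment independently, so it assumes no ordering requirement. The sample string is illustrative only.

```bash
# Hypothetical helper: succeed only if every '...'-separated part of the
# quote ($1) occurs somewhere in the reference text ($2).
all_parts_found() {
  local rest part
  rest=$1
  while [ -n "$rest" ]; do
    case $rest in
      *...*)
        part=${rest%%...*}   # text before the first '...'
        rest=${rest#*...}    # text after the first '...'
        ;;
      *)
        part=$rest           # last (or only) fragment
        rest=
        ;;
    esac
    # Trim leading and trailing spaces from the fragment.
    part=${part#"${part%%[! ]*}"}
    part=${part%"${part##*[! ]}"}
    [ -z "$part" ] && continue
    case $2 in
      *"$part"*) ;;          # fragment found, check the next one
      *) return 1 ;;         # fragment missing: the whole quote fails
    esac
  done
  return 0
}

sample='MUC1 oncoprotein blocks nuclear targeting of c-Abl in the apoptotic response'  # illustrative
all_parts_found 'MUC1 oncoprotein ... c-Abl in the apoptotic response' "$sample" && echo "all parts found"
```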

## Part 4: Text Normalization

Before matching, both the quote and the reference text are normalized:
- Lowercased
- Punctuation removed
- Runs of whitespace collapsed to a single space

As a result, a quote that differs from the reference only in case, punctuation, or spacing still matches:

```bash
# Case, punctuation, and spacing differ from the original text
linkml-reference-validator validate text \
  "muc1 ONCOPROTEIN  blocks   NUCLEAR targeting!!!" \
  PMID:16888623 \
  && echo "✅ Normalized text matched!"
```
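The three normalization rules can be approximated with `tr` and `sed`. This is a sketch based on two assumptions, not the tool's implementation. First, it assumes punctuation is deleted outright rather than replaced by a space. Second, it assumes the quote and the reference are normalized the same way before the substring test. `normalize` and `normalized_match` are hypothetical names.

```bash
# Hypothetical approximation of the normalization rules:
# lowercase, delete punctuation, collapse whitespace, trim the ends.
normalize() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | tr -d '[:punct:]' \
    | tr -s '[:space:]' ' ' \
    | sed -E 's/^ +//; s/ +$//'
}

# Succeeds if the normalized quote ($1) is a substring of the normalized text ($2).
normalized_match() {
  local q t
  q=$(normalize "$1")
  t=$(normalize "$2")
  case $t in
    *"$q"*) return 0 ;;
    *) return 1 ;;
  esac
}

sample='MUC1 oncoprotein blocks nuclear targeting of c-Abl'  # illustrative text
normalized_match 'muc1 ONCOPROTEIN  blocks   NUCLEAR targeting!!!' "$sample" && echo "match"
```

Under the deletion assumption, a hyphenated form such as `NUCLEAR-TARGETING` becomes `nucleartargeting` and would no longer match `nuclear targeting`. It is worth checking unusual variants against the real CLI rather than relying on this approximation.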

## Part 5: Pre-caching References with `cache reference`

You can pre-fetch and cache references for offline use:

```bash
# Pre-cache a reference (shows metadata)
linkml-reference-validator cache reference PMID:16888623
```
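To warm the cache for many references, for example before working offline, you can loop over identifiers and call the same `cache reference` subcommand for each one. In this sketch, `precache_all` is a hypothetical helper. It reads one identifier per line from standard input, keeps going after a failure, and returns non-zero if any reference could not be cached. `LRV_CMD` is a variable of this sketch only, useful for substituting a stub. The tool itself does not read it.

```bash
# Hypothetical helper: read one reference ID per line from stdin and cache
# each one with `linkml-reference-validator cache reference`.
# LRV_CMD is a variable of this sketch only, not a setting of the tool.
precache_all() {
  local cli id failed=0
  cli=${LRV_CMD:-linkml-reference-validator}
  while IFS= read -r id; do
    [ -z "$id" ] && continue    # skip blank lines
    # </dev/null keeps the CLI from consuming the remaining IDs on stdin.
    if ! "$cli" cache reference "$id" < /dev/null > /dev/null 2>&1; then
      echo "failed to cache: $id" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}

# Usage (one ID per line):
# printf '%s\n' PMID:16888623 | precache_all
```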

## Part 6: Verbose Output

Use `--verbose` to see detailed validation information:

```bash
# Verbose output shows fetching and matching details
linkml-reference-validator validate text \
  "MUC1 oncoprotein blocks nuclear targeting" \
  PMID:16888623 \
  --verbose
```

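## Part 7: Using in Shell Scripts

The CLI follows the usual exit-status convention: 0 when the quote is found and non-zero when it is not. It therefore works directly in `if` statements and `||` fallbacks: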

```bash
# Example shell script usage
if linkml-reference-validator validate text \
    "MUC1 oncoprotein blocks nuclear targeting" \
    PMID:16888623 > /dev/null 2>&1; then
  echo "✅ Quote verified successfully"
else
  echo "❌ Quote validation failed"
  exit 1
fi
```
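The same exit-status check extends to batch validation. The sketch below assumes a hypothetical tab-separated input with one `quote<TAB>reference` pair per line. `validate_all` runs `validate text` for each pair, prints one status line per quote, and returns non-zero if any quote fails, so it could serve as a CI gate. `LRV_CMD` is a variable of this sketch only, not a setting of the tool.

```bash
# Hypothetical batch check: each input line is "<quote><TAB><reference ID>".
# Prints one status line per quote; returns non-zero if any quote failed.
# LRV_CMD is a variable of this sketch only, not a setting of the tool.
validate_all() {
  local cli tab quote ref failed=0
  cli=${LRV_CMD:-linkml-reference-validator}
  tab=$(printf '\t')
  while IFS="$tab" read -r quote ref; do
    [ -z "$quote" ] && continue    # skip blank lines
    if "$cli" validate text "$quote" "$ref" < /dev/null > /dev/null 2>&1; then
      printf 'ok      %s  %s\n' "$ref" "$quote"
    else
      printf 'FAILED  %s  %s\n' "$ref" "$quote"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}

# Usage: fail the script if any quote is unsupported
# printf 'MUC1 oncoprotein blocks nuclear targeting\tPMID:16888623\n' \
#   | validate_all || exit 1
```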

## Part 8: Understanding the Cache

References are cached in `references_cache/` by default. Let's see what's in there:

```bash
# List cached references
ls -lh references_cache/ | head -10
```

```bash
# Peek at a cached reference
head -20 references_cache/PMID_16888623.md
```

The cache files are in markdown format with YAML frontmatter, making them human-readable!
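Because cache entries are plain files, ordinary shell tools are enough to inspect them. The sketch below makes two assumptions. First, the naming pattern seen above holds for every identifier: the colon in `PMID:16888623` becomes an underscore in `PMID_16888623.md`. Second, the frontmatter is delimited by `---` lines starting on the first line of the file. Both `cache_path` and `frontmatter` are hypothetical helpers.

```bash
# Hypothetical helper: map a reference ID to its assumed cache file path
# (colon replaced by underscore, inferred from PMID_16888623.md).
cache_path() {
  printf 'references_cache/%s.md\n' "$(printf '%s' "$1" | tr ':' '_')"
}

# Hypothetical helper: print only the YAML frontmatter, i.e. the lines
# between a '---' on line 1 and the next '---'.
frontmatter() {
  awk 'NR == 1 && $0 == "---" { fm = 1; next }
       fm && $0 == "---"      { exit }
       fm                     { print }' "$1"
}

# Usage:
# f=$(cache_path PMID:16888623)
# [ -f "$f" ] && frontmatter "$f"
```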

## CLI Help

Get help for any command:

```bash
linkml-reference-validator --help
linkml-reference-validator validate --help
linkml-reference-validator validate text --help
linkml-reference-validator cache reference --help
```

## Summary

In this tutorial, we learned:

- **Basic validation**: `validate text "quote" PMID:12345`
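- **Editorial notes**: Wrap clarifications in `[...]`; bracketed text is ignored during matching
- **Ellipsis**: Use `...` for omitted text; each part must appear in the reference
- **Normalization**: Differences in case, punctuation, and whitespace don't matter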
- **Caching**: References cached automatically in `references_cache/`
- **PMC support**: Full-text articles available

## Next Steps

- **Tutorial 2**: Advanced usage with data files and LinkML schemas (`validate data`)
- **Tutorial 3**: Python API for programmatic usage
- [Full Documentation](https://monarch-initiative.github.io/linkml-reference-validator)