# Getting Started with linkml-reference-validator

This tutorial demonstrates how to use the `linkml-reference-validator` CLI to validate that supporting text quotes actually appear in their cited references.

## What is linkml-reference-validator?

linkml-reference-validator checks that:
1. **Quoted text exists**: Supporting text claims actually appear in the referenced publication
2. **Citations are accurate**: References are properly cited and accessible
3. **Matching is deterministic**: Quotes are compared by substring matching, not fuzzy or AI-based matching

The tool fetches publications from PubMed and PMC and caches them locally for offline use.

## Installation

First, make sure linkml-reference-validator is installed:

In [1]:
%%bash
# Check if installed
linkml-reference-validator --help > /dev/null && echo "✅ linkml-reference-validator is installed" || echo "❌ Install with: pip install linkml-reference-validator"

✅ linkml-reference-validator is installed


## Part 1: Basic Validation with `validate text`

The most common use case is validating a single supporting text quote against a reference.

### Example 1: Validate a Real Quote

Let's validate a quote from a real scientific paper (PMID:16888623):

In [2]:
%%bash
# This quote appears in the referenced paper
linkml-reference-validator validate text \
  "MUC1 oncoprotein blocks nuclear targeting of c-Abl" \
  PMID:16888623

echo "✅ Quote validated!"

Validating text against PMID:16888623...
  Text: MUC1 oncoprotein blocks nuclear targeting of c-Abl

Result:
  Valid: True
  Message: Supporting text validated successfully in PMID:16888623
  Matched text: MUC1 oncoprotein blocks nuclear targeting of c-Abl...

✅ Quote validated!


**Note**: The first time you run this, it fetches the reference from PubMed and caches it locally in `references_cache/`. Subsequent validations use the cached copy, making them much faster!

### Example 2: Validation Failure

What happens when the quote doesn't appear in the reference?

In [3]:
%%bash
# This text does NOT appear in PMID:16888623
linkml-reference-validator validate text \
  "MUC1 activates the JAK-STAT pathway" \
  PMID:16888623 \
  || echo "❌ Validation failed - text not found in reference"

Validating text against PMID:16888623...
  Text: MUC1 activates the JAK-STAT pathway

Result:
  Valid: False
  Message: Text part not found as substring: 'MUC1 activates the JAK-STAT pathway'

❌ Validation failed - text not found in reference


### Example 3: Partial Quotes

You can validate partial quotes from the reference:

In [4]:
%%bash
# Just a portion of the text
linkml-reference-validator validate text \
  "blocks nuclear targeting" \
  PMID:16888623

echo "✅ Partial quote validated!"

Validating text against PMID:16888623...
  Text: blocks nuclear targeting

Result:
  Valid: True
  Message: Supporting text validated successfully in PMID:16888623
  Matched text: blocks nuclear targeting...

✅ Partial quote validated!


## Part 2: Editorial Notes with `[...]`

Use square brackets for editorial clarifications that should be ignored during matching.

For example, if you want to clarify what "MUC1" stands for in your quote:

In [5]:
%%bash
# Editorial clarification - brackets are ignored during matching
linkml-reference-validator validate text \
  'MUC1 [mucin 1] oncoprotein blocks nuclear targeting of c-Abl' \
  PMID:16888623

echo "✅ Editorial note ignored during matching!"

Validating text against PMID:16888623...
  Text: MUC1 [mucin 1] oncoprotein blocks nuclear targeting of c-Abl

Result:
  Valid: True
  Message: Supporting text validated successfully in PMID:16888623
  Matched text: MUC1   oncoprotein blocks nuclear targeting of c-Abl...

✅ Editorial note ignored during matching!


In [6]:
%%bash
# Multiple editorial notes
linkml-reference-validator validate text \
  'MUC1 [an oncoprotein] blocks nuclear targeting of c-Abl [a tyrosine kinase]' \
  PMID:16888623

echo "✅ Multiple editorial notes handled!"

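Validating text against PMID:16888623...
  Text: MUC1 [an oncoprotein] blocks nuclear targeting of c-Abl [a tyrosine kinase]

Result:
  Valid: False
  Message: Text part not found as substring: 'MUC1   blocks nuclear targeting of c-Abl'

✅ Multiple editorial notes handled!

**Note**: This run reports `Valid: False`, and the closing `echo` runs unconditionally, so the ✅ line does not reflect the result. Bracketed text is removed before matching, so it can only add clarification; it cannot stand in for words that are in the source. Here `[an oncoprotein]` replaces the word "oncoprotein", leaving `MUC1 blocks nuclear targeting of c-Abl`, which is not a contiguous substring of the reference text ("MUC1 oncoprotein blocks nuclear targeting of c-Abl ...").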
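
The two results above can be reproduced with a few lines of Python. This is only an illustrative sketch, not the validator's implementation: the helper names (`strip_editorial_notes`, `squash`, `quote_in_reference`) are hypothetical, the reference text is just the title shown in the cached file, and replacing each bracketed span with a single space is an assumption based on the triple space in the "Matched text" output.

```python
import re

# Title of PMID:16888623, copied from the cached reference shown in this tutorial.
REFERENCE = (
    "MUC1 oncoprotein blocks nuclear targeting of c-Abl "
    "in the apoptotic response to DNA damage."
)


def strip_editorial_notes(quote: str) -> str:
    """Replace each [...] span with a single space (assumed behaviour)."""
    return re.sub(r"\[[^\]]*\]", " ", quote)


def squash(text: str) -> str:
    """Lowercase and collapse runs of whitespace so spacing differences vanish."""
    return " ".join(text.lower().split())


def quote_in_reference(quote: str, reference: str = REFERENCE) -> bool:
    """True if the quote, minus editorial notes, is a substring of the reference."""
    return squash(strip_editorial_notes(quote)) in squash(reference)


# A note that only adds information leaves a contiguous substring behind.
print(quote_in_reference("MUC1 [mucin 1] oncoprotein blocks nuclear targeting of c-Abl"))

# A note that replaces a source word ("oncoprotein") breaks the substring.
print(quote_in_reference(
    "MUC1 [an oncoprotein] blocks nuclear targeting of c-Abl [a tyrosine kinase]"
))
```

The first call prints `True` and the second `False`, mirroring the two CLI runs: a bracketed note is safe when it is inserted next to the source wording, not when it substitutes for it.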


## Part 3: Ellipsis for Omitted Text (`...`)

Use `...` to indicate omitted text between two parts of a quote. Both parts must be found in the reference.

In [7]:
%%bash
# Multi-part quote with ellipsis
linkml-reference-validator validate text \
  "MUC1 oncoprotein ... c-Abl in the apoptotic response" \
  PMID:16888623

echo "✅ Both parts of ellipsis quote found!"

Validating text against PMID:16888623...
  Text: MUC1 oncoprotein ... c-Abl in the apoptotic response

Result:
  Valid: True
  Message: Supporting text validated successfully in PMID:16888623
  Matched text: MUC1 oncoprotein ... c-Abl in the apoptotic response...

✅ Both parts of ellipsis quote found!
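
As a rough mental model, an ellipsis quote can be treated as several independent substring checks. The sketch below is an assumption-laden illustration, not the tool's code: `split_on_ellipsis` and `all_parts_found` are hypothetical names, the reference text is only the title from the cached file, and whether the real validator also enforces the order of the parts is not stated in this tutorial, so the sketch does not check it.

```python
from typing import List

# Title of PMID:16888623, copied from the cached reference shown in this tutorial.
REFERENCE = (
    "MUC1 oncoprotein blocks nuclear targeting of c-Abl "
    "in the apoptotic response to DNA damage."
)


def split_on_ellipsis(quote: str) -> List[str]:
    """Split a quote on '...' and drop empty fragments."""
    return [part.strip() for part in quote.split("...") if part.strip()]


def all_parts_found(quote: str, reference: str = REFERENCE) -> bool:
    """True only if every fragment is a substring of the reference.

    Order of the fragments is deliberately not checked (assumption).
    """
    haystack = " ".join(reference.lower().split())
    return all(
        " ".join(part.lower().split()) in haystack
        for part in split_on_ellipsis(quote)
    )


print(split_on_ellipsis("MUC1 oncoprotein ... c-Abl in the apoptotic response"))
print(all_parts_found("MUC1 oncoprotein ... c-Abl in the apoptotic response"))
print(all_parts_found("MUC1 oncoprotein ... the JAK-STAT pathway"))
```

The last call returns `False` because one missing fragment is enough to fail the whole quote, which matches the rule that both parts must be found.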


## Part 5: Text Normalization

Before matching, text is normalized:
- Lowercased
- Punctuation removed
- Extra whitespace collapsed

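As a result, differences in case and surrounding punctuation should not affect matching. The example below changes the case, appends exclamation marks, and inserts hyphens into `MUC1` and `nuclear targeting`:

In [8]:
%%bash
# Same quote with altered case, inserted hyphens, and trailing punctuation
linkml-reference-validator validate text \
  "MUC-1 ONCOPROTEIN blocks NUCLEAR-TARGETING!!!" \
  PMID:16888623

echo "✅ Normalized text matched!"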

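Validating text against PMID:16888623...
  Text: MUC-1 ONCOPROTEIN blocks NUCLEAR-TARGETING!!!

Result:
  Valid: False
  Message: Text part not found as substring: 'MUC-1 ONCOPROTEIN blocks NUCLEAR-TARGETING!!!'

✅ Normalized text matched!

**Note**: This run reports `Valid: False`, and the closing `echo` runs unconditionally, so the ✅ line does not reflect the result. The inserted hyphens are the likely cause: the reference reads `MUC1` and `nuclear targeting`, and whether a removed hyphen is dropped outright or turned into a space, one of `MUC-1` and `NUCLEAR-TARGETING` ends up differing from the reference (`muc 1` vs. `muc1`, or `nucleartargeting` vs. `nuclear targeting`). Changes in case and trailing punctuation alone are covered by the normalization rules listed above.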
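
The three normalization steps can be sketched in Python as follows. This is a hedged illustration rather than the validator's own code: `normalize` is a hypothetical helper, the reference text is only the title from the cached file, and turning each punctuation character into a space (instead of deleting it) is an assumption.

```python
import re

# Title of PMID:16888623, copied from the cached reference shown in this tutorial.
REFERENCE = (
    "MUC1 oncoprotein blocks nuclear targeting of c-Abl "
    "in the apoptotic response to DNA damage."
)


def normalize(text: str) -> str:
    """Lowercase, remove punctuation, and collapse whitespace."""
    lowered = text.lower()
    # Assumption: punctuation becomes a space rather than disappearing entirely.
    without_punct = re.sub(r"[^\w\s]", " ", lowered)
    return " ".join(without_punct.split())


def matches(quote: str, reference: str = REFERENCE) -> bool:
    """Substring test on the normalized forms of both strings."""
    return normalize(quote) in normalize(reference)


print(normalize("MUC-1 ONCOPROTEIN blocks NUCLEAR-TARGETING!!!"))
print(matches("MUC1 Oncoprotein, blocks nuclear targeting!"))    # case and punctuation only
print(matches("MUC-1 ONCOPROTEIN blocks NUCLEAR-TARGETING!!!"))  # hyphen inside "MUC1"
```

Under these assumptions the second call matches and the third does not, because `MUC-1` normalizes to `muc 1` while the reference contains `muc1`. Punctuation placed inside a token changes the token, so it is safer to copy identifiers exactly as they appear in the source.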


## Part 6: Pre-caching References with `cache reference`

You can pre-fetch and cache references for offline use:

In [9]:
%%bash
# Pre-cache a reference (shows metadata)
linkml-reference-validator cache reference PMID:16888623

Fetching PMID:16888623...

Successfully cached PMID:16888623
  Title: MUC1 oncoprotein blocks nuclear targeting of c-Abl in the apoptotic response to DNA damage.
  Authors: Raina D, Ahmad R, Kumar S
  Content type: abstract_only
  Content length: 1569 characters
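
To pre-cache several references in one go, the same command can be driven from Python. The sketch below is a minimal example under stated assumptions: `cache_command` and `precache` are hypothetical helpers, the `--cache-dir` and `--force` flags are the ones listed in the `cache reference --help` output, and treating a non-zero exit status as "caching failed" is an assumption that this tutorial does not confirm for the `cache` command.

```python
import subprocess
from typing import Callable, Iterable, List, Optional


def cache_command(
    reference_id: str,
    cache_dir: Optional[str] = None,
    force: bool = False,
) -> List[str]:
    """Build the argument list for `linkml-reference-validator cache reference`."""
    cmd = ["linkml-reference-validator", "cache", "reference", reference_id]
    if cache_dir:
        cmd += ["--cache-dir", cache_dir]
    if force:
        cmd.append("--force")
    return cmd


def precache(
    reference_ids: Iterable[str],
    run: Callable = subprocess.run,
    cache_dir: Optional[str] = None,
    force: bool = False,
) -> List[str]:
    """Run the cache command for each ID and return the IDs that failed.

    A non-zero return code is treated as failure (assumption).
    """
    failed = []
    for reference_id in reference_ids:
        result = run(cache_command(reference_id, cache_dir=cache_dir, force=force))
        if result.returncode != 0:
            failed.append(reference_id)
    return failed
```

With the CLI installed, `precache(["PMID:16888623", "PMID:21258405"])` would fetch both references into the default cache directory. The `run` parameter exists only so the function can be exercised without network access.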


## Part 7: Verbose Output

Use `--verbose` to see detailed validation information:

In [10]:
%%bash
# Verbose output shows fetching and matching details
linkml-reference-validator validate text \
  "MUC1 oncoprotein blocks nuclear targeting" \
  PMID:16888623 \
  --verbose

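Validating text against PMID:16888623...
  Text: MUC1 oncoprotein blocks nuclear targeting

Result:
  Valid: True
  Message: Supporting text validated successfully in PMID:16888623
  Matched text: MUC1 oncoprotein blocks nuclear targeting...

**Note**: The output captured in this run is the same as without the flag. According to the CLI help, `--verbose` enables detailed logging, which may have been written to a stream that the notebook did not capture.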


## Part 8: Using in Shell Scripts

The CLI exits with status 0 when validation succeeds and with a non-zero status when it fails, so it can be used directly in shell conditionals:

In [11]:
%%bash
# Example shell script usage
if linkml-reference-validator validate text \
    "MUC1 oncoprotein blocks nuclear targeting" \
    PMID:16888623 > /dev/null 2>&1; then
  echo "✅ Quote verified successfully"
else
  echo "❌ Quote validation failed"
  exit 1
fi

✅ Quote verified successfully
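
The same exit-code convention works from Python through the standard `subprocess` module. This is a small sketch, not part of the package: `quote_is_valid` is a hypothetical wrapper, and the `cli` parameter is only there so the command prefix can be swapped out (for example in tests).

```python
import subprocess
from typing import Sequence


def quote_is_valid(
    text: str,
    reference_id: str,
    cli: Sequence[str] = ("linkml-reference-validator",),
) -> bool:
    """Return True when `validate text` exits with status 0."""
    result = subprocess.run(
        [*cli, "validate", "text", text, reference_id],
        capture_output=True,  # keep the CLI's report out of our own output
        text=True,
    )
    return result.returncode == 0
```

With the CLI on the `PATH`, `quote_is_valid("MUC1 oncoprotein blocks nuclear targeting", "PMID:16888623")` corresponds to the shell conditional above. Passing the quote as a list element avoids shell quoting problems with brackets and ellipses.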


## Part 9: Understanding the Cache

References are cached in `references_cache/` by default. Let's see what's in there:

In [12]:
%%bash
# List cached references
ls -lh references_cache/ | head -10

total 24
-rw-r--r--  1 cjm  staff   2.1K Nov 16 16:32 PMID_16888623.md
-rw-r--r--  1 cjm  staff   2.4K Nov 16 17:08 PMID_21258405.md
-rw-r--r--  1 cjm  staff   1.7K Nov 16 14:11 PMID_9974395.md


In [13]:
%%bash
# Peek at a cached reference
head -20 references_cache/PMID_16888623.md

---
reference_id: PMID:16888623
title: MUC1 oncoprotein blocks nuclear targeting of c-Abl in the apoptotic response to DNA damage.
authors:
- Raina D
- Ahmad R
- Kumar S
- Ren J
- Yoshida K
- Kharbanda S
- Kufe D
journal: EMBO J
year: '2006'
doi: 10.1038/sj.emboj.7601263
content_type: abstract_only
---

# MUC1 oncoprotein blocks nuclear targeting of c-Abl in the apoptotic response to DNA damage.
**Authors:** Raina D, Ahmad R, Kumar S, Ren J, Yoshida K, Kharbanda S, Kufe D
**Journal:** EMBO J (2006)


The cache files are in markdown format with YAML frontmatter, making them human-readable!

## CLI Help

Get help for any command:

In [14]:
%%bash
linkml-reference-validator --help

 Usage: linkml-reference-validator [OPTIONS] COMMAND [ARGS]...

 Validation of supporting text from references and publications

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.      │
│ --show-completion             Show completion for the current shell, to copy │
│                               it or customize the installation.              │
│ --help                        Show this message and exit.                    │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ validate   Validate supporting text against references                       │
│ cache      Manage reference cache                                            │
╰──────────────────────────────────────────────────────────────────────────────╯



In [15]:
%%bash
linkml-reference-validator validate --help

 Usage: linkml-reference-validator validate [OPTIONS] COMMAND [ARGS]...

 Validate supporting text against references

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ text   Validate a single supporting text quote against a reference.          │
│ data   Validate supporting text in data against references.                  │
╰──────────────────────────────────────────────────────────────────────────────╯





In [16]:
%%bash
linkml-reference-validator validate text --help

 Usage: linkml-reference-validator validate text [OPTIONS] TEXT REFERENCE_ID

 Validate a single supporting text quote against a reference.

 Uses deterministic substring matching. Supports [...] for editorial notes and
 ... for omitted text.
 Examples:
 linkml-reference-validator validate text "protein functions in cells"
 PMID:12345678
 linkml-reference-validator validate text "protein [X] functions ... cells"
 PMID:12345678 --verbose

╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│ *    text              TEXT  Supporting text to validate [required]          │
│ *    reference_id      TEXT  Reference ID (e.g., PMID:12345678) [required]   │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --cache-dir  -c      PATH  Directory for caching references (default:        │
│                            references_cache)                                 │
│ --verbose    -v            Verbose output with detailed logging              │
│ --help                     Show this message and exit.                       │
╰──────────────────────────────────────────────────────────────────────────────╯



In [17]:
%%bash
linkml-reference-validator cache reference --help

 Usage: linkml-reference-validator cache reference [OPTIONS] REFERENCE_ID

 Cache a reference for offline use.

 Downloads and caches the full text of a reference for offline validation.
 Useful for pre-populating the cache or ensuring a reference is available.
 Examples:
 linkml-reference-validator cache reference PMID:12345678
 linkml-reference-validator cache reference PMID:12345678 --force --verbose

╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│ *    reference_id      TEXT  Reference ID (e.g., PMID:12345678) [required]   │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --cache-dir  -c      PATH  Directory for caching references (default:        │
│                            references_cache)                                 │
│ --force      -f            Force operation (e.g., re-fetch even if cached)   │
│ --verbose    -v            Verbose output with detailed logging              │
│ --help                     Show this message and exit.                       │
╰──────────────────────────────────────────────────────────────────────────────╯



## Summary

In this tutorial, we learned:

- **Basic validation**: `validate text "quote" PMID:12345`
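- **Editorial notes**: Use `[...]` for clarifications; bracketed text is ignored, but the remaining text must still match the reference
- **Ellipsis**: Use `...` for omitted text
- **Normalization**: Text is lowercased, punctuation is removed, and extra whitespace is collapsed before matching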
- **Caching**: References cached automatically in `references_cache/`
- **PMC support**: Full-text articles available

## Next Steps

- **Tutorial 2**: Advanced usage with data files and LinkML schemas (`validate data`)
- **Tutorial 3**: Python API for programmatic usage
- [Full Documentation](https://monarch-initiative.github.io/linkml-reference-validator)