Merged
…ness of field names within sources
…the formatting function that does dict -> str
…ontent and extra keys to metadata
…ig should point to a string/number and not an object; recursive parsing of objects is not done; something to consider for future
Pull request overview
Adds a configurable mechanism to extract additional per-source fields from raw reference API responses (via JSONPath), append them to the cached reference content as markdown sections, and persist which fields were captured in cache frontmatter/metadata.
Changes:
- Introduces `ReferenceValidationConfig.source_extra_fields` (per-source field name → JSONPath).
- Adds shared helpers to extract/format extra fields and integrates them into multiple sources (ClinicalTrials, PMID, Entrez, DOI).
- Persists `extra_fields_captured` to cache frontmatter and restores it on load; adds/updates tests and docs.
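The extract-and-append flow summarized above might look roughly like the following. This is a minimal sketch, not the PR's code: the two helper names and the `dict[str, str]` return type are taken from the PR description, but the function bodies are assumptions. `_resolve_simple_path` is a hypothetical stand-in that handles only plain `$.a.b` paths rather than full JSONPath, and the sample record and its keys are made up for illustration.

```python
import json
from typing import Any


def _resolve_simple_path(data: Any, path: str) -> Any:
    """Hypothetical stand-in for a JSONPath engine: supports only `$.key.key` paths."""
    current = data
    for key in path.removeprefix("$.").split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def extract_extra_fields(data: dict, field_map: dict[str, str]) -> dict[str, str]:
    """Return field name -> string value for every path that resolves."""
    extra: dict[str, str] = {}
    for name, path in field_map.items():
        value = _resolve_simple_path(data, path)
        if value is None:
            continue  # skip paths with no match
        # Objects and arrays are stringified rather than parsed recursively.
        extra[name] = value if isinstance(value, str) else json.dumps(value)
    return extra


def format_extra_fields_for_content(extra: dict[str, str]) -> str:
    """Render each captured field as a `### name` markdown section."""
    return "\n\n".join(f"### {name}\n\n{value}" for name, value in extra.items())


# Made-up record and paths, for illustration only.
record = {"protocol": {"eligibility": {"criteria": "Inclusion: age 18 or older"}}}
field_map = {
    "eligibility": "$.protocol.eligibility.criteria",
    "missing": "$.protocol.nope",
}

extra = extract_extra_fields(record, field_map)
section = format_extra_fields_for_content(extra)
print(section)
```

Returning only strings keeps the later frontmatter and content writes simple, which is presumably why single-value paths are preferred.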
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/linkml_reference_validator/models.py | Adds source_extra_fields config field and documentation. |
| src/linkml_reference_validator/etl/sources/utils.py | New helpers to extract extra fields with JSONPath and format them for appending to content. |
| src/linkml_reference_validator/etl/sources/clinicaltrials.py | Extracts configured extra fields from raw CT.gov response and appends to content/metadata. |
| src/linkml_reference_validator/etl/sources/pmid.py | Extracts configured extra fields from Entrez record and appends to content/metadata. |
| src/linkml_reference_validator/etl/sources/entrez.py | Extracts configured extra fields for Entrez summary sources (incl. GEO) and appends to content/metadata. |
| src/linkml_reference_validator/etl/sources/doi.py | Extracts configured extra fields from Crossref/DataCite payloads and appends to abstract/metadata. |
| src/linkml_reference_validator/etl/reference_fetcher.py | Saves extra_fields_captured into YAML frontmatter and restores it into ReferenceContent.metadata. |
| tests/test_source_utils.py | New unit tests for extraction/formatting helpers. |
| tests/test_sources.py | Adds ClinicalTrials integration test for source_extra_fields; minor formatting changes. |
| tests/test_reference_fetcher.py | Adds save/load test for extra_fields_captured; minor formatting changes. |
| README.md | Documents source_extra_fields configuration and usage. |
| .gitignore | Ignores .cursor directory. |
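To make the `reference_fetcher.py` row concrete, here is a minimal sketch of a save/load round trip for `extra_fields_captured`. It is not the project's implementation: `save_with_frontmatter` and `load_with_frontmatter` are hypothetical names, the frontmatter is written by hand with the standard library instead of a YAML library, and the list is emitted as a JSON array (which is also a valid YAML flow sequence).

```python
import json


def save_with_frontmatter(content: str, metadata: dict) -> str:
    """Hypothetical helper: prepend a frontmatter block listing captured fields."""
    lines = ["---"]
    captured = metadata.get("extra_fields_captured")
    if captured:
        # A JSON array of strings is also a valid YAML flow sequence.
        lines.append(f"extra_fields_captured: {json.dumps(captured)}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + content


def load_with_frontmatter(text: str) -> tuple[str, dict]:
    """Hypothetical helper: split frontmatter from content and restore metadata."""
    metadata: dict = {}
    lines = text.split("\n")
    if lines[0] != "---" or "---" not in lines[1:]:
        return text, metadata  # no frontmatter block present
    end = lines.index("---", 1)  # index of the closing delimiter
    for line in lines[1:end]:
        key, sep, value = line.partition(": ")
        if sep and key == "extra_fields_captured":
            metadata[key] = json.loads(value)
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return body, metadata


original = "Abstract text\n\n### eligibility\n\nInclusion: age"
saved = save_with_frontmatter(original, {"extra_fields_captured": ["eligibility"]})
restored_content, restored_metadata = load_with_frontmatter(saved)
print(restored_metadata)
```

Storing only the field names in frontmatter, with the values living in the markdown body, keeps the cached file readable and avoids putting long free text into YAML.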
…ls.py, specifying handling of lists in JSONPath expressions.
…quested which has text, content_type now is set to "summary" for this case instead of "unavailable"
…ent is empty but extra fields produce content in clinical trials source.
… fields are strings and no weird structure that will break the YAML formatting when writing to disk
caufieldjh (Contributor) approved these changes on Mar 16, 2026 and left a comment:
Looks good - thanks @Reasat!
Summary
Allows users to capture additional fields from reference API responses via config. JSONPath expressions are evaluated per source; extracted values are appended to reference content (with `### field_name` headings) and are included in validation. Captured field names are stored in metadata and in the cache frontmatter.

Changes
- `ReferenceValidationConfig.source_extra_fields`: per-source map of field name → JSONPath. Docstring notes to prefer single-value paths; objects/arrays are stringified.
- `extract_extra_fields(data, field_map)` returns `dict[str, str]`; `format_extra_fields_for_content(extra)` formats for appending to content.
- Sources: read `source_extra_fields`, extract from raw response, append to `content`, set `metadata["extra_fields_captured"]`.
- `_save_to_disk` writes `extra_fields_captured` to YAML frontmatter; `_load_markdown_format` restores it into `ReferenceContent.metadata`.
- Docs: `source_extra_fields` subsection under Configuration with example and commands.
- Tests: `extract_extra_fields` / `format_extra_fields_for_content`, ClinicalTrials fetch with extra fields, save/load of `extra_fields_captured`.

How to test
1. Create `my-config.yaml` with `validation.source_extra_fields` (e.g. clinicaltrials eligibility JSONPath).
2. Run `uv run linkml-reference-validator cache reference clinicaltrials:NCT00001372 --config my-config.yaml`.
3. Check `references_cache/` for `### eligibility` (or your field) and `extra_fields_captured` in frontmatter.
4. Run `uv run linkml-reference-validator validate text "Inclusion: age" clinicaltrials:NCT00001372 --config my-config.yaml`; it should pass if the text is in the extra section.

Fixes #39
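The per-source integration step, together with the commit note about `content_type` becoming "summary" instead of "unavailable", could be sketched as below. Everything here is an assumption made for illustration: `ReferenceContentSketch` is a hypothetical, trimmed-down stand-in for `ReferenceContent`, `merge_extra_fields` is not a function from the PR, and the fallback rule is inferred only from the commit messages.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceContentSketch:
    """Hypothetical, trimmed-down stand-in for ReferenceContent."""
    content: str
    content_type: str
    metadata: dict = field(default_factory=dict)


def merge_extra_fields(
    base: ReferenceContentSketch, extra: dict[str, str]
) -> ReferenceContentSketch:
    """Append `### name` sections to the content and record the captured field names."""
    if not extra:
        return base
    sections = "\n\n".join(f"### {name}\n\n{value}" for name, value in extra.items())
    content = f"{base.content}\n\n{sections}" if base.content else sections
    # Assumption: when only the extra fields supply text, report "summary"
    # rather than "unavailable"; otherwise keep the source's own content type.
    content_type = base.content_type if base.content else "summary"
    metadata = {**base.metadata, "extra_fields_captured": list(extra)}
    return ReferenceContentSketch(content, content_type, metadata)


empty = ReferenceContentSketch(content="", content_type="unavailable")
merged = merge_extra_fields(empty, {"eligibility": "Inclusion: age 18 or older"})
print(merged.content_type)  # summary
```

Because the extra sections become part of `content`, a quote such as "Inclusion: age" found only in the eligibility section is then visible to whatever text matching the validator performs.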