VerifyRef

A tool for verifying the authenticity of academic references in PDF documents using multiple academic databases and optional AI-powered analysis.

Important Note for Reviewers

This tool may produce false positives: authentic references can sometimes be flagged as suspicious or unverified. This can happen due to:

New papers not yet indexed in databases

Author name format variations (e.g., "J. Smith" vs "John Smith")

Regional or specialized venues with limited database coverage

OCR/extraction errors from PDF processing

Always manually verify flagged references before making decisions. VerifyRef is a screening tool to assist human reviewers, not a replacement for careful manual checking.
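
Name-format variations such as "J. Smith" vs "John Smith" are one of the causes listed above. The following is a minimal sketch of a tolerant comparison, assuming authors are matched on surname plus first initial. The function names are hypothetical and VerifyRef's actual matching logic may differ; multi-word surnames, for example, are not handled here.

```python
import re
import unicodedata


def name_key(name: str) -> tuple[str, str]:
    """Reduce a name to (surname, first initial). Illustrative helper only."""
    # Strip accents so "Müller" and "Muller" compare equal.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Turn "Last, First" into "First Last".
    if "," in ascii_name:
        last, first = ascii_name.split(",", 1)
        ascii_name = f"{first} {last}"
    parts = re.findall(r"[A-Za-z]+", ascii_name.lower())
    if not parts:
        return ("", "")
    return (parts[-1], parts[0][0])


def same_author(a: str, b: str) -> bool:
    """True when two name strings plausibly refer to the same person."""
    return name_key(a) == name_key(b)


print(same_author("J. Smith", "John Smith"))   # True
print(same_author("Smith, John", "J. Smith"))  # True
print(same_author("J. Smith", "A. Smith"))     # False
```

A comparison this loose trades precision for recall: two different authors who share a surname and first initial will also match, which is one reason flagged and unflagged results both deserve a manual look.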

Why VerifyRef?

While reviewing a journal submission, I found a reference that listed my brother, a businessman with no connection to cryptography, as a co-author of a paper on symmetric-key cryptanalysis alongside a well-known researcher. He had nothing to do with that paper. This prompted me to inspect that reference and the others in the submission, which turned out to be partially AI-generated and to contain multiple fake references.

Manually checking dozens of references was time-consuming, so I created VerifyRef to automatically extract and verify references against trusted academic databases. Here is the summary of the output for that paper:

                   Verification Summary                   
╭──────────────────────────┬───────┬────────────┬────────╮
│ Classification           │ Count │ Percentage │ Status │
├──────────────────────────┼───────┼────────────┼────────┤
│ [+] AUTHENTIC            │    11 │      61.1% │   *    │
│ [?] SUSPICIOUS           │     6 │      33.3% │   *    │
│ [X] FAKE                 │     0 │       0.0% │   -    │
│ [~] AUTHOR MANIPULATION  │     1 │       5.6% │   *    │
│ [-] FABRICATED           │     0 │       0.0% │   -    │
│ [!] INCONCLUSIVE         │     0 │       0.0% │   -    │
╰──────────────────────────┴───────┴────────────┴────────╯

[REVIEW RECOMMENDED] Some references require manual verification

This tool helps reviewers quickly identify potentially problematic references and AI-generated content, making peer review more efficient. VerifyRef assists human judgment rather than replacing it: it may occasionally misclassify authentic references, so always double-check flagged items manually.

Features

Multi-database verification across 8+ academic databases
PDF processing using GROBID (works out of the box with public server)
Retraction detection via CrossRef and Retraction Watch
Author manipulation detection (real titles with fake authors)
Optional AI verification using free (Gemini, Groq, Ollama) or paid (OpenAI) providers
Book reference handling for textbooks that may not appear in paper databases
Parallel processing with multi-threaded database queries
JSON and text output formats
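
The multi-threaded database queries listed above could look roughly like the sketch below. It assumes each database client is a plain callable that takes a title and returns a match (or `None`); `query_all` and the stub clients are hypothetical names for illustration, not VerifyRef's internal API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Optional

# Assumption: a "client" is any callable mapping a title to a match dict or None.
Client = Callable[[str], Optional[dict]]


def query_all(title: str, clients: Dict[str, Client], timeout: float = 30.0) -> Dict[str, dict]:
    """Query every client concurrently and collect the non-empty results."""
    results: Dict[str, dict] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(clients))) as pool:
        futures = {pool.submit(fn, title): name for name, fn in clients.items()}
        for future in as_completed(futures, timeout=timeout):
            name = futures[future]
            try:
                match = future.result()
            except Exception:
                # One failing database should not abort the others.
                continue
            if match is not None:
                results[name] = match
    return results


# Stub clients standing in for real database lookups.
def fake_dblp(title: str) -> Optional[dict]:
    return {"title": title, "source": "stub"}


def fake_arxiv(title: str) -> Optional[dict]:
    return None


def fake_broken(title: str) -> Optional[dict]:
    raise TimeoutError("simulated network failure")


found = query_all(
    "Differential Cryptanalysis of DES",
    {"dblp": fake_dblp, "arxiv": fake_arxiv, "broken": fake_broken},
)
print(sorted(found))  # ['dblp']
```

Threads suit this workload because the time is spent waiting on network responses rather than on computation.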

Installation

From PyPI (Recommended)

pip install verifyref

# Run verification
verifyref paper.pdf -o results.txt

From Source

git clone https://github.com/hadipourh/verifyref.git
cd verifyref
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Run verification (uses public GROBID server automatically)
python verifyref.py paper.pdf -o results.txt

Docker Installation

git clone https://github.com/hadipourh/verifyref.git
cd verifyref
docker build -t verifyref .

# Interactive mode
docker run -it --rm -v "$(pwd):/app/workspace" verifyref

# Inside the container:
cd /app/workspace/
verifyref paper.pdf -o results.txt

Local GROBID (Optional)

For faster processing or privacy, run GROBID locally:

docker run -d -p 8070:8070 lfoppiano/grobid:0.8.2
export GROBID_URL="http://localhost:8070"
python verifyref.py paper.pdf

VerifyRef automatically detects and uses local GROBID when available.
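
For illustration, detecting a local server can be done by probing GROBID's `/api/isalive` health endpoint. The sketch below is an assumption about how such detection might work, not VerifyRef's actual code; `choose_grobid_url` and its parameters are hypothetical, and the public server URL is left as a caller-supplied fallback because the source does not state it.

```python
import os
import urllib.request
from typing import Callable, Mapping, Optional


def grobid_is_alive(base_url: str, timeout: float = 3.0) -> bool:
    """Return True when GROBID's health endpoint answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/api/isalive"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, connection refused, and timeouts are all OSError subclasses.
        return False


def choose_grobid_url(
    fallback_url: str,
    probe: Callable[[str], bool] = grobid_is_alive,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Prefer an explicit GROBID_URL, then a live local server, then the fallback."""
    env = os.environ if env is None else env
    explicit = env.get("GROBID_URL")
    if explicit:
        return explicit
    local = "http://localhost:8070"
    return local if probe(local) else fallback_url


# Demo with a fake probe so that no network access is needed.
print(choose_grobid_url("https://grobid.example.invalid", probe=lambda url: False, env={}))
```

Passing the probe and the environment in as parameters keeps the decision logic testable without a running server.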

Usage

Basic Usage

# Verify references in a PDF
python verifyref.py paper.pdf -o results.txt

# Search for a specific citation
python verifyref.py --cite "Differential Cryptanalysis of DES"

# Verify a single reference
python verifyref.py --verify "Author, A.: Title. Venue, 2024"

Advanced Options

# Verification rigor levels
python verifyref.py paper.pdf --rigor strict    # High precision
python verifyref.py paper.pdf --rigor balanced  # Default
python verifyref.py paper.pdf --rigor lenient   # High recall

# Context-aware search
python verifyref.py --cite "cryptanalysis" --context cs
python verifyref.py --cite "gene therapy" --context bio

# AI-enhanced verification
python verifyref.py paper.pdf --enable-ai

# Verbose output
python verifyref.py paper.pdf --verbose

AI Verification Setup

VerifyRef supports multiple AI providers. Ollama is recommended for unlimited free usage:

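# Option 1: Ollama (free, local, no rate limits)
brew install ollama        # Homebrew; on other platforms use the installer from ollama.com
ollama serve               # runs in the foreground; keep it running in a separate terminal
ollama pull llama3.2
export AI_PROVIDER="ollama"
python verifyref.py paper.pdf --enable-ai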

# Option 2: Google Gemini (free tier)
export AI_PROVIDER="gemini"
export GOOGLE_GEMINI_API_KEY="your-key"
python verifyref.py paper.pdf --enable-ai

# Option 3: Groq (free tier)
export AI_PROVIDER="groq"
export GROQ_API_KEY="your-key"
python verifyref.py paper.pdf --enable-ai
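
Before running with `--enable-ai`, it can help to confirm that the environment matches the chosen provider. The preflight check below is a hypothetical helper, not part of VerifyRef; it only covers the three providers shown above, using the variable names from those examples.

```python
import os
from typing import Mapping, Optional

# Environment variable each provider needs; Ollama runs locally and needs none.
REQUIRED_KEY = {
    "ollama": None,
    "gemini": "GOOGLE_GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
}


def check_ai_setup(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a one-line status message describing the AI provider setup."""
    env = os.environ if env is None else env
    provider = env.get("AI_PROVIDER", "").strip().lower()
    if provider not in REQUIRED_KEY:
        return f"AI_PROVIDER must be one of {sorted(REQUIRED_KEY)}, got {provider!r}"
    key = REQUIRED_KEY[provider]
    if key and not env.get(key):
        return f"{provider} selected but {key} is not set"
    return f"{provider} is configured"


print(check_ai_setup({"AI_PROVIDER": "ollama"}))  # ollama is configured
print(check_ai_setup({"AI_PROVIDER": "groq"}))    # groq selected but GROQ_API_KEY is not set
```

Calling `check_ai_setup()` with no argument inspects the real process environment.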

Classification System

VerifyRef uses a 5-category system to evaluate reference authenticity:

| Category | Criteria | Action |
|---|---|---|
| AUTHENTIC | High similarity (>55%), multiple database matches | Accept |
| SUSPICIOUS | Moderate similarity (25-55%), limited evidence | Manual review |
| FABRICATED | Very low similarity (<25%), no database matches | Investigate |
| AUTHOR_MANIPULATION | Title matches but authors differ significantly | Flag misconduct |
| INCONCLUSIVE | Parsing errors, books, or network issues | Re-verify |

Retracted papers are flagged with a warning regardless of classification.
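
The thresholds in the table can be read as a simple decision rule. The sketch below is a simplified illustration and not the logic in `classifier.py`: it assumes similarity is a score between 0 and 1, ignores the number of database matches, and uses hypothetical parameter names.

```python
def classify(
    similarity: float,
    title_matches: bool = False,
    authors_match: bool = True,
    parse_ok: bool = True,
) -> str:
    """Map a best-match similarity score (0.0 to 1.0) to a category label."""
    if not parse_ok:
        return "INCONCLUSIVE"
    # A real title paired with the wrong authors outranks the similarity score.
    if title_matches and not authors_match:
        return "AUTHOR_MANIPULATION"
    if similarity > 0.55:
        return "AUTHENTIC"
    if similarity >= 0.25:
        return "SUSPICIOUS"
    return "FABRICATED"


print(classify(0.80))                                           # AUTHENTIC
print(classify(0.40))                                           # SUSPICIOUS
print(classify(0.10))                                           # FABRICATED
print(classify(0.90, title_matches=True, authors_match=False))  # AUTHOR_MANIPULATION
print(classify(0.0, parse_ok=False))                            # INCONCLUSIVE
```

Retraction status is deliberately left out of the function, since it is reported as a separate warning rather than as a category.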

Database Integration

Primary Databases (no API key required):

OpenAlex - Comprehensive coverage (200M+ works)
DBLP - Computer Science
IACR - Cryptography
ArXiv - Preprints
CrossRef - DOI metadata and retraction status

Enhanced with API Keys (optional):

Semantic Scholar - Higher rate limits
PubMed - Biomedical (NCBI key)
Springer Nature - STM publications

Smart Fallback:

Google Scholar - Used only when other databases find poor matches (<70% similarity)
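
The fallback rule above amounts to checking the best similarity found so far against a 70% threshold. A minimal sketch follows, assuming titles are compared with Python's `difflib.SequenceMatcher`; the source does not say which similarity metric VerifyRef uses, and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable

FALLBACK_THRESHOLD = 0.70  # from the "<70% similarity" rule above


def title_similarity(a: str, b: str) -> float:
    """Case- and whitespace-insensitive similarity ratio between 0.0 and 1.0."""
    def normalize(s: str) -> str:
        return " ".join(s.lower().split())

    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def needs_fallback(
    cited_title: str,
    candidate_titles: Iterable[str],
    threshold: float = FALLBACK_THRESHOLD,
) -> bool:
    """True when no candidate from the primary databases is a good enough match."""
    best = max(
        (title_similarity(cited_title, t) for t in candidate_titles),
        default=0.0,
    )
    return best < threshold


cited = "Differential Cryptanalysis of DES"
print(needs_fallback(cited, ["differential  cryptanalysis of DES"]))  # False
print(needs_fallback(cited, []))                                      # True
```

Keeping the fallback behind a threshold means the extra lookup only runs for the references that the primary databases could not confirm.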

Configuration

Edit config.py to configure:

# Required
CROSSREF_EMAIL = "your.email@domain.com"

# Optional API keys
SEMANTIC_SCHOLAR_API_KEY = ""
NCBI_API_KEY = ""
SPRINGER_API_KEY = ""

# AI providers (for --enable-ai)
GOOGLE_GEMINI_API_KEY = ""
GROQ_API_KEY = ""
OPENAI_API_KEY = ""

# Database toggles
ENABLE_CROSSREF = True
ENABLE_GOOGLE_SCHOLAR = True
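
As background on why an email address is required: Crossref's REST API accepts a `mailto` query parameter that identifies the caller. The sketch below builds such a request URL; it illustrates the Crossref convention only, and how VerifyRef itself passes `CROSSREF_EMAIL` is an assumption. `crossref_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

CROSSREF_EMAIL = "your.email@domain.com"  # same placeholder as in config.py


def crossref_query_url(reference_text: str, email: str = CROSSREF_EMAIL, rows: int = 3) -> str:
    """Build a Crossref works search URL that identifies the caller by email."""
    params = {
        "query.bibliographic": reference_text,  # free-text citation search
        "rows": rows,                           # number of candidates to return
        "mailto": email,                        # contact address for the request
    }
    return "https://api.crossref.org/works?" + urlencode(params)


print(crossref_query_url("Differential Cryptanalysis of DES"))
```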

GROBID Configuration

VerifyRef uses a smart fallback chain for PDF processing:

Public GROBID server (default, no setup required)
Local GROBID (if running on localhost:8070)
PyMuPDF fallback (lower accuracy, used when GROBID unavailable)

Override the default GROBID URL:

export GROBID_URL="http://localhost:8070"
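
The fallback chain described in this section follows a common pattern: try each extractor in order and return the first usable result. The sketch below shows that pattern with stub extractors. The names are hypothetical, and the real logic lives in `grobid/client.py` and may differ.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: an extractor takes a PDF path and returns raw reference strings.
Extractor = Callable[[str], List[str]]


def extract_references(
    pdf_path: str,
    chain: Sequence[Tuple[str, Extractor]],
) -> Tuple[str, List[str]]:
    """Try each extractor in order; return (extractor name, references)."""
    errors = []
    for name, extractor in chain:
        try:
            refs = extractor(pdf_path)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
            continue
        if refs:
            return name, refs
        errors.append(f"{name}: no references found")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


# Stubs standing in for a GROBID call and a PyMuPDF-based parser.
def grobid_stub(path: str) -> List[str]:
    raise TimeoutError("server busy")


def pymupdf_stub(path: str) -> List[str]:
    return ["Author, A.: Title. Venue, 2024"]


source, refs = extract_references(
    "paper.pdf",
    [("grobid", grobid_stub), ("pymupdf", pymupdf_stub)],
)
print(source, len(refs))  # pymupdf 1
```

Collecting the per-extractor errors makes the final failure message useful when every step in the chain fails.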

Project Structure

verifyref/
├── verifyref.py              # CLI entry point
├── config.py                 # Configuration
├── grobid/
│   ├── client.py             # GROBID client with smart fallback
│   └── fallback_parser.py    # PyMuPDF fallback parser
├── extractor/
│   └── reference_parser.py   # Reference parsing
├── verifier/
│   ├── multi_database_verifier.py
│   ├── classifier.py         # Classification logic
│   ├── ai_verifier.py        # AI verification
│   ├── doi_validation_client.py  # DOI and retraction checking
│   └── *_client.py           # Database clients
└── utils/
    ├── helpers.py
    ├── report_generator.py
    └── ...

Troubleshooting

| Issue | Solution |
|---|---|
| No references found | Check PDF quality; try a different PDF |
| GROBID timeout | Public server may be busy; try local GROBID |
| High INCONCLUSIVE rate | Use `--rigor lenient` |
| AI rate limits | Use Ollama (no limits) or wait for cooldown |
Ethical Usage

VerifyRef follows strict ethical guidelines:

API-only access (no web scraping)
Respects all service rate limits
No personal data collection
Proper attribution in requests

Contributing

See contributing.md for guidelines.

License

GNU General Public License v3 (GPLv3)

Documentation

Technical Documentation (technical_documentation.md) - Architecture and API reference
Ethical Guidelines (ethical_guidelines.md) - Usage policies
Contributing (contributing.md) - Development guidelines

Caution

VerifyRef is designed to assist in verification of academic references and should not be used as a sole determinant of reference authenticity. It is intended to complement human judgment in the peer review process.
