This repository contains code for our scalable VeriScore implementation, adapted from the original VeriScore repo.
The code supports two LLM backend engines: vLLM, for efficient LLM inference on the local machine, and matrix, for scalable LLM inference on a remote server.
It can be run in two modes: offline and online.
- **Offline Mode**: scores a prediction file on multiple GPUs using SLURM.
- **Online Mode**: starts a web server that processes requests in real time.
- Make a new Python 3.10+ environment using `virtualenv` or `conda`.
- Install from source using `pip`.
- Download `en_core_web_sm` using the `spacy` library by running `python -m spacy download en_core_web_sm`.
- The evidence retrieval step is performed with Google Search via the Serper API, so you will need to get a Serper API key (https://serper.dev).
```bash
pip install -e .
python -m spacy download en_core_web_sm
export SERPER_KEY_PRIVATE={your_serper_api_key}
```
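As a quick sanity check of the setup, the snippet below is a sketch (not part of the repo) that assumes the key is read from `SERPER_KEY_PRIVATE` as exported above, and issues a one-off query against Serper's public search endpoint using the `requests` library:

```python
import os

import requests

# Read the API key the same way the export above provides it.
api_key = os.environ["SERPER_KEY_PRIVATE"]

# Issue a single search against Serper's public endpoint to confirm the key works.
resp = requests.post(
    "https://google.serper.dev/search",
    headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
    json={"q": "When was the Eiffel Tower completed?"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json().get("organic", [])[:1])  # first organic search result, if any
```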
The following script launches ScalableVeriScore on multiple GPUs (or nodes) using SLURM.
See `data/data_sample.jsonl` for the format of the input JSONL file.
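For illustration only, an input line might look like the example below; the field names here are an assumption mirroring the online API's payload, so defer to `data/data_sample.jsonl` for the authoritative schema:

```json
{"question": "Who wrote Middlemarch?", "response": "Middlemarch was written by George Eliot, the pen name of Mary Ann Evans."}
```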
`vllm_mistral_extract` and `peft_verify` are the names of the fine-tuned claim extractor and claim verifier from the original VeriScore paper.
```bash
python -m veriscore.stool run name=<run_name> \
    partition=learn \
    nodes=1 \
    tasks=8 \
    ngpu=8 \
    account=<your_slurm_account> \
    qos=<your_slurm_qos> \
    time=1440 \
    root_dump_dir=<root_dump_dir> \
    args=" --data_dir ./data/ --output_dir ./outputs/ --input_file <input_jsonl_path> --model_name_extraction vllm_mistral_extract --extraction_llm_backend vllm --extraction_prompt_format finetuned --model_name_verification peft_verify --verification_llm_backend transformers --verification_prompt_format finetuned "
```
Running ScalableVeriScore as an online server that processes requests in real time requires setting up a matrix cluster with LLM inference workers.
The following command starts a matrix cluster with 4 nodes (32 GPUs) and deploys 8 replicas of the Llama-3.3-70B-Instruct model:
```bash
matrix start_cluster --force_new_head
matrix start_cluster --add_workers 4
matrix deploy_applications --applications '[{"name": "llama3_3_70b", "model_name": "meta-llama/Llama-3.3-70B-Instruct", "min_replica": 8, "max_replica": 16, "model_size": "3_3_70B"}]'
```
Once the matrix cluster is running, the following command starts an online ScalableVeriScore API server:

```bash
python -m veriscore.veriscore_server
```

Alternatively, serve it under hypercorn with multiple workers:

```bash
hypercorn veriscore.veriscore_server:app --bind 0.0.0.0:42000 --workers 8 --log-file - --access-logfile - --access-logformat '%(h)s %(l)s %(l)s %(t)s "%(r)s" %(s)s %(b)s "%(L)s" "%(a)s"'
```
Once the API server is running, you can issue requests to it via:
```bash
curl -X POST http://<server_addr>:<server_port>/veriscore -H 'Content-Type: application/json' -d '{"question": <question_here>, "response": <response_here>, "last_sentence_only": false}'
```
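Equivalently, from Python (a minimal sketch using the `requests` library; the payload mirrors the curl command above, and the server address and example strings are placeholders):

```python
import requests

# Placeholder address: substitute your server's host and port.
SERVER = "http://localhost:42000"

payload = {
    "question": "Who wrote Middlemarch?",  # the prompt that was answered
    "response": "Middlemarch was written by George Eliot.",  # the model response to score
    "last_sentence_only": False,  # score all sentences, not just the last one
}

resp = requests.post(f"{SERVER}/veriscore", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json())  # result schema is defined by veriscore_server
```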
If you use this repo, please cite the following paper:
```bibtex
@misc{chen2025learning,
    title={Learning to Reason for Factuality},
    author={Xilun Chen and Ilia Kulikov and Vincent-Pierre Berges and Barlas Oğuz and Rulin Shao and Gargi Ghosh and Jason Weston and Wen-tau Yih},
    year={2025},
    eprint={2508.05618},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2508.05618},
}
```
ScalableVeriScore is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: VeriScore is licensed under the Apache 2.0 license.