AgentSLR is an open-source pipeline for conducting end-to-end systematic literature reviews in epidemiology. It brings together article retrieval, title and abstract screening, PDF-to-Markdown OCR conversion, full-text screening, structured extraction and report generation in one modular codebase.
Systematic literature reviews are essential for synthesising scientific evidence but are costly, difficult to scale and time-intensive, creating bottlenecks for evidence-based policy. We study whether large language models can automate the complete systematic review workflow, from article retrieval and screening through data extraction to report synthesis. Applied to epidemiological reviews of nine WHO-designated priority pathogens and validated against expert-curated ground truth, our open-source agentic pipeline (AgentSLR) achieves performance comparable to human researchers while reducing review time from approximately 7 weeks to 20 hours (a 58x speed-up). Our comparison of five frontier models reveals that SLR performance is driven less by model size or inference cost than by each model's distinctive capabilities. Through human-in-the-loop validation, we identify key failure modes. Our results demonstrate that agentic AI can substantially accelerate scientific evidence synthesis in specialised domains.
AgentSLR automates the review flow from search and retrieval to report (living review) generation.
| Directory | Purpose | Documentation |
|---|---|---|
| `src/harvest/` | Metadata retrieval and PDF download | README |
| `src/screening/` | Abstract and full-text screening | README |
| `src/ocr/` | PDF-to-Markdown conversion | README |
| `src/extraction/` | Structured data extraction | README |
| `src/analysis/` | Report generation | README |
| `scripts/` | Shell wrappers for all stages | See subdirectory READMEs |
| `eval/` | Evaluation against PERG ground truth | README |
| `notebooks/` | Paper figures and statistics | README |
`main.py` is the single CLI entrypoint for the pipeline.
It supports nine WHO (World Health Organization) priority pathogens: Marburg, Ebola, Lassa, SARS, Zika, MERS, Nipah, Rift Valley fever and Crimean-Congo haemorrhagic fever.
The scripts allow for configurable directories to store model artefacts and results across each stage. Because epidemiological data are critical and each review processes hundreds of thousands of articles, we persist the outputs of every pipeline stage, as these are often required to meet reporting standards.
Assuming `data/agentslr` is set as the main output directory, the directory structure looks like the tree below. The structure is simple: harvest artefacts live once per pathogen, while LLM-specific outputs live under `client/<client_dir_name>/...` so different reasoning models can be compared against the same corpus.
```
data/
├── agentslr/
│   ├── harvests/
│   │   └── <pathogen>/
│   │       ├── harvest_metadata.csv
│   │       ├── harvest_downloaded_pdfs.csv
│   │       ├── articles_with_markdown.csv
│   │       ├── pdfs/
│   │       └── ocr/
│   │           └── <ocr_client>/
│   │               └── markdown/
│   └── client/
│       └── <client_dir_name>/
│           └── <pathogen>/
│               ├── screening/
│               │   ├── abstract_screening.csv
│               │   └── fulltext_screening.csv
│               ├── extractions/
│               │   ├── data_extraction_parameters.jsonl
│               │   ├── data_extraction_models.csv
│               │   └── data_extraction_outbreaks.csv
│               ├── report/
│               └── logs/
└── perg/
    ├── screening/
    └── extracted/
```
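The layout above can be captured with small path helpers. A minimal sketch using `pathlib` (hypothetical helpers for illustration; not part of the AgentSLR API):

```python
from pathlib import Path

def harvest_dir(data_dir: str, pathogen: str) -> Path:
    # Harvest artefacts are stored once per pathogen.
    return Path(data_dir) / "harvests" / pathogen

def client_dir(data_dir: str, client_dir_name: str, pathogen: str) -> Path:
    # LLM-specific outputs are keyed by client so that different
    # reasoning models can be compared against the same corpus.
    return Path(data_dir) / "client" / client_dir_name / pathogen

print((client_dir("data/agentslr", "gpt_oss_120b", "lassa") / "screening").as_posix())
# data/agentslr/client/gpt_oss_120b/lassa/screening
```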
`client_dir_name` is either inferred from `--model-name` or set explicitly with `--client-dir-name`.
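The inference step can be pictured as a simple sanitisation of the model name. A hypothetical sketch (the actual rule in `main.py` may differ, so pass `--client-dir-name` explicitly when in doubt):

```python
import re

def infer_client_dir_name(model_name: str) -> str:
    """Derive a filesystem-safe directory label from a model name.

    Hypothetical helper for illustration; not the codebase's actual rule.
    """
    # Drop a provider prefix such as "openai/" and normalise separators.
    tail = model_name.split("/")[-1]
    return re.sub(r"[^0-9A-Za-z]+", "_", tail).strip("_").lower()

print(infer_client_dir_name("openai/gpt-oss-120b"))  # gpt_oss_120b
```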
```shell
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

Use this environment for harvest, screening, extraction, evaluation and report generation.
Note: this environment also works for OCR when using the Mistral API.
If you want to run local OCR backends such as GLM or Paddle, create a dedicated OCR environment:
```shell
python3 -m venv .venv-ocr
source .venv-ocr/bin/activate
pip install --upgrade pip
# Install a torch build that matches your machine first
pip install torch
# Install exactly one Paddle runtime
# CPU example:
pip install paddlepaddle
# GPU example:
# pip install paddlepaddle-gpu
pip install -r requirements-ocr.txt
```

You can pass credentials directly on the command line or store them in `config.json`:
```json
{
  "openai_api_key": "your-openai-api-key",
  "openrouter_api_key": "your-openrouter-api-key",
  "mistral_api_key": "your-mistral-api-key"
}
```

Useful environment variables:
```shell
export OPENAI_API_KEY="..."               # Optional if using a local hosted model
export OPENALEX_API_KEY="..."             # Optional
export NCBI_API_KEY="..."                 # Optional
export NCBI_EMAIL="you@example.com"       # Optional
export UNPAYWALL_EMAIL="you@example.com"  # Optional
export MISTRAL_API_KEY="..."              # Optional if using a local hosted model
```

Run the full pipeline:
```shell
scripts/pipeline/run_all_stages.sh \
  <pathogen> <data-dir> <model-name> <client-dir-name> \
  --base-url <api-endpoint> \
  --api-key <api-key> \
  --ocr-client mistral \
  --config-json config.json
# e.g. lassa data/agentslr gpt-oss-120b gpt_oss_120b --base-url http://localhost:1738/v1
```

Run individual stages:
```shell
python main.py \
  --stage <stage> \
  --pathogen <pathogen> \
  --data-dir <data-dir> \
  --model-name <model-name> \
  --client-dir-name <client-dir-name>
```

The recommended entrypoints are the shell wrappers in `scripts/`; they cover all stages.
⚠️ The pipeline defaults are currently tuned for reasoning-capable models and OpenAI-compatible backends that accept arguments such as `reasoning_effort`, `reasoning` and, in some cases, provider-specific `extra_body` fields. If you use a model or server that does not support these arguments, you may encounter request failures such as connection issues or `400` errors.
See the `scripts/` subdirectory READMEs for details.
The key `main.py` arguments for pipeline runs are:
| Argument | Required? | What to pass | Why it matters |
|---|---|---|---|
| `--stage` | Yes | A stage from the table below, most often `run_all`. | Selects which pipeline block to run. |
| `--pathogen` | Yes | One of `marburg`, `ebola`, `lassa`, `sars`, `zika`, `nipah`, `rvf`, `cchf`, `mers`. | Chooses the review topic and query set. |
| `--data-dir` | Optional | A workspace root such as `data/agentslr`. | Controls where harvests, screening outputs, extractions and reports are written. |
| `--model-name` | Recommended | The reasoning model name, for example `openai/gpt-oss-120b`. | Determines the client/model used for screening and extraction stages. |
| `--client-dir-name` | Recommended | A stable short label such as `gpt_oss_120b`. | Keeps outputs for different reasoning models separated under `client/<client_dir_name>/...`. |
| `--base-url` | Optional | API endpoint such as `http://localhost:6767/v1`, `https://api.openai.com/v1` or `https://openrouter.ai/api/v1`. | Needed when using OpenAI-compatible hosted or local servers. |
| `--api-key` | Optional | Matching API key, often `6767` for local vLLM. | Authenticates requests to the configured client endpoint. |
| `--config-json` | Optional | Usually `config.json`. | Loads saved API keys such as OpenAI, OpenRouter or Mistral credentials. |
| `--resume-from` | Optional | A later stage such as `ocr`, `fulltext_screen` or `write_up_parameters`. | Only used with `--stage run_all`; resumes the pipeline from that stage onward. |
| `--ocr-client` | Optional | `mistral`, `glm` or `paddle`. | Selects the OCR backend for the OCR stage inside `run_all` or standalone `ocr`. |
| `--ocr-python-bin` | Optional | Usually `.venv-ocr/bin/python`. | Lets `run_all` switch just the OCR stage into the dedicated OCR environment. |
| `--report-model-name` | Optional | A report model such as `openai/gpt-oss-120b`. | Lets write-up refinement use a different model from screening/extraction. |
The value passed through `--stage` determines which part of the review pipeline is run. The options are:
| Stage | What it does | Typical use |
|---|---|---|
| `harvest` | Fetches metadata, deduplicates results and downloads PDFs. | Start a new review corpus. |
| `run_all` | Runs the full pipeline in order: harvest, abstract screening, OCR, full-text screening, extraction and write-up. | Standard end-to-end run. |
| `abstract_screen` | Screens titles and abstracts using the configured reasoning model. | Re-run abstract inclusion/exclusion without repeating harvest. |
| `ocr` | Converts PDFs into Markdown using `mistral`, `glm` or `paddle`. | Prepare article text for full-text screening and downstream extraction. |
| `fulltext_screen` | Screens the OCR Markdown at the full-text stage. | Re-run full-text inclusion/exclusion after OCR or prompt/model changes. |
| `data_extraction` | Alias for the first extraction stage. Maps to `data_extraction_parameters`. | Resume from the extraction block with a shorter stage name. |
| `data_extraction_parameters` | Extracts epidemiological parameter data. | Parameter-focused extraction runs. |
| `data_extraction_models` | Extracts transmission and modelling-study information. | Model-study extraction runs. |
| `data_extraction_outbreaks` | Extracts outbreak-event information. | Outbreak-focused extraction runs. |
| `write_up` | Alias for the first write-up stage. Maps to `write_up_parameters`. | Resume from the reporting block with a shorter stage name. |
| `write_up_parameters` | Generates the parameter report. | Parameter report generation. |
| `write_up_models` | Generates the modelling report. | Model report generation. |
| `write_up_outbreaks` | Generates the outbreak report. | Outbreak report generation. |
`--stage run_all` runs the full end-to-end sequence. If you also pass `--resume-from <stage>`, it starts from that stage and continues through the rest of the pipeline.
The screening, extraction and report stages all use the same OpenAI-compatible client interface. In practice that means you can point AgentSLR at OpenAI, OpenRouter or a local vLLM server by changing `--base-url`, `--api-key` and `--model-name`.
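Because every stage speaks the OpenAI-compatible chat completions protocol, switching backends is only a matter of changing the endpoint, key and model. A minimal sketch of the request shape (illustrative only; AgentSLR's actual client code and prompts differ):

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Assemble an OpenAI-compatible chat completions request."""
    url = base_url.rstrip("/") + "/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request(
    "http://localhost:6767/v1", "6767", "openai/gpt-oss-120b",
    "Should this abstract be included?",
)
print(req.full_url)  # http://localhost:6767/v1/chat/completions
# urllib.request.urlopen(req) would send it to a running vLLM server
```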
```shell
python main.py \
  --stage fulltext_screen \
  --pathogen lassa \
  --data-dir data/agentslr \
  --model-name openai/gpt-oss-120b \
  --client-dir-name gpt_oss \
  --base-url https://openrouter.ai/api/v1 \
  --config-json config.json \
  --fulltext-screening-mode direct_fulltext
```

Serve a local model:
```shell
scripts/serve_vllm/serve_client.sh
```

The helper defaults to port `6767`, API key `6767`, model `openai/gpt-oss-20b`, GPU IDs `0`, tensor parallel size `1`, `--max-model-len 131072`, and `--async-scheduling`. You can override them with optional args such as `--port`, `--api-key`, `--model-name`, `--gpu-ids`, `--tp-size`, and repeated `--extra-vllm-arg`.
You can also run the equivalent command directly:
```shell
vllm serve openai/gpt-oss-120b \
  --port 6767 \
  --api-key 6767 \
  --tensor-parallel-size 2 \
  --max-model-len 131072 \
  --async-scheduling
```

Then point AgentSLR at it:
```shell
scripts/pipeline/run_all_stages.sh \
  lassa data/agentslr openai/gpt-oss-120b gpt_oss_120b \
  --base-url http://localhost:6767/v1 \
  --api-key 6767
```

Additional recipe-style serving notes for DeepSeek, Kimi and GLM are in `scripts/serve_vllm/README.md`.
```shell
python -m eval.run_eval <evaluation> \
  --pathogen <Pathogen> \
  --screened <path-to-results> \
  --output-dir <output-dir>
```

See `eval/README.md` for full evaluation documentation.
If you use AgentSLR, please cite:
```bibtex
@misc{padarha2025agentslr,
  title={AgentSLR: Automating Systematic Literature Reviews
         in Epidemiology with Agentic AI},
  author={Padarha, Shreyansh and Kearns, Ryan Othniel
          and Naidoo, Tristan and Yang, Lingyi
          and Borchmann, {\L}ukasz and B{\l}aszczyk, Piotr
          and Morgenstern, Christian and McCabe, Ruth
          and Bhatia, Sangeeta and Torr, Philip H.
          and Foerster, Jakob and Hale, Scott A.
          and Rawson, Thomas and Cori, Anne
          and Semenova, Elizaveta and Mahdi, Adam},
  year={2026},
  eprint={2603.22327},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2603.22327}
}
```