🥯 BAGEL: Bayesian Active Learning with Gaussian Processes Guided by LLM relevance scoring

Official code repository for the ACL 2026 Findings paper, "Bayesian Active Learning with Gaussian Processes Guided by LLM Relevance Scoring for Dense Passage Retrieval".

🔍 Overview

BAGEL is a retrieval reranking framework that propagates sparse LLM relevance signals with Gaussian Process-based active learning, enabling efficient exploration across the embedding space.
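To make the overview concrete, here is a minimal sketch of Gaussian Process active learning over passage embeddings. It is an illustration under stated assumptions, not the code in bagel.py: the helper names (rbf, solve, gp_posterior, active_rerank), the toy scorer standing in for an LLM relevance call, and the values of beta, noise, and length_scale are all hypothetical, and the way warm start and budget interact here is assumed rather than taken from the paper. It uses only the standard library so it runs without dependencies.

```python
import math
import random

def rbf(a, b, length_scale=0.5):
    """RBF kernel between two embedding vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_posterior(X_obs, y_obs, x_new, noise=1e-2):
    """Posterior mean and std of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X_obs)]
         for i, a in enumerate(X_obs)]
    k_star = [rbf(a, x_new) for a in X_obs]
    alpha = solve(K, y_obs)      # K^-1 y
    weights = solve(K, k_star)   # K^-1 k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, weights))
    return mean, math.sqrt(max(var, 0.0))

def active_rerank(embeddings, score_fn, warm_start_ids, budget, beta=2.0):
    """Spend `budget` scorer calls: warm-start passages first, then UCB picks."""
    n = len(embeddings)
    budget = min(budget, n)
    labeled = {i: score_fn(i) for i in warm_start_ids[:budget]}
    while len(labeled) < budget:
        ids = list(labeled)
        X_obs = [embeddings[i] for i in ids]
        y_obs = [labeled[i] for i in ids]

        def ucb(i):
            mean, std = gp_posterior(X_obs, y_obs, embeddings[i])
            return mean + beta * std

        pick = max((i for i in range(n) if i not in labeled), key=ucb)
        labeled[pick] = score_fn(pick)
    # Final ranking: observed scores where available, GP mean elsewhere.
    ids = list(labeled)
    X_obs = [embeddings[i] for i in ids]
    y_obs = [labeled[i] for i in ids]
    final = {i: labeled[i] if i in labeled
             else gp_posterior(X_obs, y_obs, embeddings[i])[0]
             for i in range(n)}
    return sorted(final, key=final.get, reverse=True), labeled

# Toy data: 30 two-dimensional "embeddings" and a fake relevance scorer.
random.seed(0)
passages = [[random.random(), random.random()] for _ in range(30)]

def toy_llm_score(i):
    """Hypothetical stand-in for an LLM relevance call."""
    x, y = passages[i]
    return math.exp(-4 * ((x - 0.8) ** 2 + (y - 0.2) ** 2))

ranking, labeled = active_rerank(passages, toy_llm_score,
                                 warm_start_ids=[0, 1, 2, 3], budget=10)
print("top 5 passage ids:", ranking[:5])
```

Only 10 of the 30 toy passages are ever scored; the rest are ranked by the GP posterior mean, which is what "propagating sparse relevance signals" amounts to in this sketch.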

🛠️ Environment Setup

Prerequisites

Python 3.11.5
For local unsloth/... models: an NVIDIA GPU with CUDA support and a CUDA-compatible PyTorch environment
If GPU/CUDA is unavailable, use an API-backed model (for example, openai/gpt4o) instead
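The GPU check above can be scripted before launching a run. The sketch below is a convenience helper, not part of the repository: cuda_available and pick_llm_name are hypothetical names, and the two model identifiers are the ones listed in this README.

```python
import importlib.util

LOCAL_MODEL = "unsloth/Qwen3-14B-unsloth-bnb-4bit"  # needs an NVIDIA GPU with CUDA
API_MODEL = "openai/gpt4o"                          # API-backed fallback

def cuda_available():
    """True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()

def pick_llm_name(has_cuda=None):
    """Choose a value for --llm_name based on CUDA availability."""
    if has_cuda is None:
        has_cuda = cuda_available()
    return LOCAL_MODEL if has_cuda else API_MODEL

print("--llm_name", pick_llm_name())
```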

Install

git clone https://github.com/junieberry/BAGEL
cd BAGEL
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

🗂️ Datasets

Experiments are conducted on four datasets:

covid, nfcorpus, robust04: loaded via ir-datasets
traveldest: available under data/traveldest/

Note: Accessing robust04 may require additional setup. See ir-datasets robust04 docs.
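For inspecting a corpus outside of bagel.py, a sketch along these lines may help. The mapping from dataset names to ir-datasets identifiers is an assumption (the README does not state which identifiers the code uses), and load_corpus_sample is a hypothetical helper. ir_datasets.load and docs_iter are part of the ir-datasets package, which must be installed separately.

```python
from itertools import islice

# Assumed identifiers; the repository may use different ir-datasets IDs.
IR_DATASET_IDS = {
    "covid": "beir/trec-covid",
    "nfcorpus": "beir/nfcorpus",
    "robust04": "disks45/nocr/trec-robust-2004",
}

def load_corpus_sample(name, n=3):
    """Return (doc_id, text prefix) for the first n documents of a dataset."""
    dataset_id = IR_DATASET_IDS[name]  # KeyError for names not loaded via ir-datasets
    import ir_datasets  # imported lazily: pip install ir-datasets
    dataset = ir_datasets.load(dataset_id)
    return [(doc.doc_id, doc.text[:80]) for doc in islice(dataset.docs_iter(), n)]
```

Calling load_corpus_sample("nfcorpus") downloads the corpus on first use. The robust04 entry will fail until the underlying TREC disks are placed where ir-datasets expects them, which is the additional setup mentioned above.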

🚀 Quick Start

Run BAGEL 🥯

python bagel.py \
  --dataset_name covid \
  --llm_name unsloth/Qwen3-14B-unsloth-bnb-4bit \
  --kernel rbf \
  --acq_fun ucb \
  --llm_budget 50 \
  --warm_start 25

Supported options:

--dataset_name: covid, nfcorpus, robust, traveldest
--llm_name: unsloth/Qwen3-14B-unsloth-bnb-4bit, openai/gpt4o
--kernel: rbf, linear, matern
--acq_fun: ucb, ei, pi, thompson, random, dense
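For readers unfamiliar with the --acq_fun names, the sketch below gives the textbook forms of four of them, computed from a GP posterior mean and standard deviation at one candidate passage. These are standard definitions, not code from this repository: the function names, beta, and xi are illustrative, and the Thompson variant samples each candidate independently, which is a simplification of drawing a joint sample from the posterior. The random and dense options are not covered.

```python
import math
import random

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def upper_confidence_bound(mean, std, beta=2.0):
    """ucb: optimistic score, mean plus beta standard deviations."""
    return mean + beta * std

def probability_of_improvement(mean, std, best, xi=0.0):
    """pi: probability that the candidate beats the best score seen so far."""
    if std == 0:
        return 1.0 if mean > best + xi else 0.0
    return normal_cdf((mean - best - xi) / std)

def expected_improvement(mean, std, best, xi=0.0):
    """ei: expected amount by which the candidate beats the best score."""
    gain = mean - best - xi
    if std == 0:
        return max(gain, 0.0)
    z = gain / std
    return gain * normal_cdf(z) + std * normal_pdf(z)

def thompson_sample(mean, std, rng=random):
    """thompson (simplified): one draw from the candidate's marginal posterior."""
    return rng.gauss(mean, std)

# Two candidates as (posterior mean, posterior std): one safe, one uncertain.
candidates = {"safe": (0.60, 0.05), "uncertain": (0.45, 0.30)}
best_so_far = 0.55
for name, (mean, std) in candidates.items():
    print(name,
          round(upper_confidence_bound(mean, std), 3),
          round(probability_of_improvement(mean, std, best_so_far), 3),
          round(expected_improvement(mean, std, best_so_far), 3))
```

The two candidates illustrate the trade-off: UCB and EI reward the uncertain passage for its large standard deviation, while PI favors the safe one because it is more likely to beat the current best at all.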

Baselines 📏

Detailed baseline commands and notes are available in the Baseline Guide.

📚 Citation

If you find this work useful, please cite:

@misc{kim2026bagel,
  title={Bayesian Active Learning with Gaussian Processes Guided by LLM Relevance Scoring for Dense Passage Retrieval},
  author={Junyoung Kim and Anton Korikov and Jiazhou Liang and Justin Cui and Yifan Simon Liu and Qianfeng Wen and Mark Zhao and Scott Sanner},
  year={2026},
  eprint={2604.17906},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2604.17906}
}
