🥯 BAGEL: Bayesian Active Learning with Gaussian Processes Guided by LLM Relevance Scoring

arXiv: 2604.17906 | Venue: ACL 2026 Findings | Task: passage retrieval | Python 3.11 | Dataset: ir-datasets

Official code repository for the ACL 2026 Findings paper, "Bayesian Active Learning with Gaussian Processes Guided by LLM Relevance Scoring for Dense Passage Retrieval".

🔍 Overview

[Figure: BAGEL overview]

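BAGEL is a reranking framework for passage retrieval. It obtains LLM relevance scores for only a small number of passages and uses Gaussian Process-based active learning to propagate these sparse signals to the remaining candidates, enabling efficient exploration across the embedding space.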
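To make the idea concrete, here is a minimal sketch of such a loop. It is not the repository's implementation. It assumes passage embeddings are rows of a NumPy array, uses an RBF kernel with a fixed length scale, selects the next passage with an upper confidence bound (UCB), and stands in for the LLM with a hypothetical `llm_score` callable. All function names and hyperparameters are illustrative.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between the rows of a and b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def gp_posterior(x_train, y_train, x_all, noise=1e-2):
    """Posterior mean and standard deviation of a zero-mean GP at x_all."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_at = rbf_kernel(x_all, x_train)
    mean = k_at @ np.linalg.solve(k_tt, y_train)
    # The RBF prior variance is 1, so subtract the explained variance from 1.
    var = 1.0 - np.sum(k_at * np.linalg.solve(k_tt, k_at.T).T, axis=1)
    return mean, np.sqrt(np.maximum(var, 0.0))


def active_rerank(embeddings, llm_score, warm_start_ids, budget, beta=2.0):
    """Spend at most `budget` LLM calls, then rank passages by posterior mean.

    `llm_score` is a hypothetical callable mapping a passage index to a
    relevance score; `warm_start_ids` must contain at least one index.
    """
    if not warm_start_ids:
        raise ValueError("need at least one warm-start passage")
    budget = min(budget, len(embeddings))
    scored = {i: llm_score(i) for i in warm_start_ids[:budget]}
    while len(scored) < budget:
        ids = list(scored)
        targets = np.array([scored[i] for i in ids])
        mean, std = gp_posterior(embeddings[ids], targets, embeddings)
        ucb = mean + beta * std
        ucb[ids] = -np.inf  # never re-score a passage
        nxt = int(np.argmax(ucb))
        scored[nxt] = llm_score(nxt)
    ids = list(scored)
    targets = np.array([scored[i] for i in ids])
    mean, _ = gp_posterior(embeddings[ids], targets, embeddings)
    return np.argsort(-mean), scored
```

The `budget` and `warm_start_ids` arguments loosely mirror the --llm_budget and --warm_start flags used in the Quick Start command, though the repository's exact semantics may differ.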

🛠️ Environment Setup

Prerequisites

  • Python 3.11.5
  • For local unsloth/... models: an NVIDIA GPU with CUDA support and a CUDA-compatible PyTorch environment
  • If GPU/CUDA is unavailable, use an API-backed model (for example, openai/gpt4o) instead
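As a rough aid for choosing between the two cases above, the following sketch checks whether PyTorch can see a CUDA device and suggests a value for --llm_name accordingly. The helper names `cuda_available` and `pick_llm_name` are hypothetical and not part of the repository; the two model names are the ones listed under the supported options.

```python
LOCAL_MODEL = "unsloth/Qwen3-14B-unsloth-bnb-4bit"
API_MODEL = "openai/gpt4o"


def cuda_available():
    """Return True only if PyTorch imports cleanly and reports a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def pick_llm_name():
    """Suggest a --llm_name value: local model with CUDA, API model otherwise."""
    return LOCAL_MODEL if cuda_available() else API_MODEL


if __name__ == "__main__":
    print(pick_llm_name())
```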

Install

git clone https://github.com/junieberry/BAGEL
cd BAGEL
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

🗂️ Datasets

Experiments are conducted on four datasets, selected with --dataset_name: covid, nfcorpus, robust, and traveldest.

Note: Accessing robust04 may require additional setup. See ir-datasets robust04 docs.

🚀 Quick Start

Run BAGEL 🥯

python bagel.py \
  --dataset_name covid \
  --llm_name unsloth/Qwen3-14B-unsloth-bnb-4bit \
  --kernel rbf \
  --acq_fun ucb \
  --llm_budget 50 \
  --warm_start 25
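For scripted runs, the same command can be assembled in Python. This is a sketch only: `build_command` and `run` are hypothetical helpers, the defaults simply copy the flag values from the command above, and it assumes bagel.py is in the current working directory.

```python
import shlex
import subprocess
import sys

# Flag values copied from the Quick Start command.
DEFAULTS = {
    "dataset_name": "covid",
    "llm_name": "unsloth/Qwen3-14B-unsloth-bnb-4bit",
    "kernel": "rbf",
    "acq_fun": "ucb",
    "llm_budget": 50,
    "warm_start": 25,
}


def build_command(**overrides):
    """Return the argv list for one bagel.py run, with optional flag overrides."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    options = {**DEFAULTS, **overrides}
    argv = [sys.executable, "bagel.py"]
    for flag, value in options.items():
        argv += [f"--{flag}", str(value)]
    return argv


def run(dry_run=True, **overrides):
    """Print the command and, unless dry_run is set, execute it."""
    argv = build_command(**overrides)
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)
    return argv
```

Calling `run(llm_budget=100)` prints the command with a larger budget without executing it; pass `dry_run=False` to launch the run.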

Supported options:

  • --dataset_name: covid, nfcorpus, robust, traveldest
  • --llm_name: unsloth/Qwen3-14B-unsloth-bnb-4bit, openai/gpt4o
  • --kernel: rbf, linear, matern
  • --acq_fun: ucb, ei, pi, thompson, random, dense
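The names ucb, ei, and pi conventionally stand for upper confidence bound, expected improvement, and probability of improvement. As an illustration of how such scores are typically computed from a GP posterior mean and standard deviation, here is a sketch using the textbook definitions. The exploration weight `beta`, the margin `xi`, and the function names are assumptions and may not match how bagel.py defines these options.

```python
import math


def _norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ucb(mean, std, beta=2.0):
    """Upper confidence bound: favor high mean or high uncertainty."""
    return mean + beta * std


def probability_of_improvement(mean, std, best, xi=0.0):
    """Probability that the candidate beats the best observed score."""
    if std <= 0.0:
        return float(mean > best + xi)
    return _norm_cdf((mean - best - xi) / std)


def expected_improvement(mean, std, best, xi=0.0):
    """Expected amount by which the candidate beats the best observed score."""
    gain = mean - best - xi
    if std <= 0.0:
        return max(gain, 0.0)
    z = gain / std
    return gain * _norm_cdf(z) + std * _norm_pdf(z)
```

Under these definitions, a passage with a modest predicted relevance but large uncertainty can outrank a confidently mediocre one, which is what drives exploration.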

Baselines 📏

Detailed baseline commands and notes are available in the Baseline Guide.

📚 Citation

If you find this work useful, please cite:

@misc{kim2026bagel,
  title={Bayesian Active Learning with Gaussian Processes Guided by LLM Relevance Scoring for Dense Passage Retrieval},
  author={Junyoung Kim and Anton Korikov and Jiazhou Liang and Justin Cui and Yifan Simon Liu and Qianfeng Wen and Mark Zhao and Scott Sanner},
  year={2026},
  eprint={2604.17906},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2604.17906}
}
