
Can Calibration of Positional Encodings Enhance Long Context Utilization?

This repository contains the official implementation for the paper Can Calibration of Positional Encodings Enhance Long Context Utilization?, accepted to the Findings track of EACL 2026 in Rabat, Morocco.

This work investigates the "Lost in the Middle" (LiM) phenomenon, a positional bias in Large Language Models that negatively impacts long-context Retrieval Augmented Generation (RAG). We explore whether Rotary Position Embeddings (RoPE) are a primary cause of this issue. Our findings led to the development of Caliope (Calibration of Positional Encodings), a training-free framework that modifies RoPE inputs at inference time to mitigate this bias.


Abstract

Large language models suffer from positional biases like the "Lost in the Middle" (LiM) phenomenon and recency bias, which reduce the effective utilization of long contexts. In this work, we investigate the role of Positional Encodings in this context. Our empirical study confirms the persistence of these biases in modern large language models. Drawing on these findings, we introduce Caliope, a training-free framework for calibrating Positional Encodings at inference time. Our calibrators yield substantial improvements on needle-in-a-haystack and cross-chunk reasoning benchmarks, and offer a practical, lightweight method for improving long-context utilization.


Citation

If you use this work, please cite:

@inproceedings{zehle-assenmacher-2026-calibration,
    title = "Can Calibration of Positional Encodings Enhance Long Context Utilization?",
    author = "Zehle, Tom and A{\ss}enmacher, Matthias",
    booktitle = "Findings of the Association for Computational Linguistics: EACL 2026",
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.findings-eacl.120/",
    pages = "2268--2280"
}

Repository Structure

The repository is organized as follows:

.
├── caliope/                  # Source code for the CALIOPE framework and experiments
│   ├── analysis/
│   │   └── utils/
│   ├── experiment/
│   │   ├── configs.py
│   │   ├── evaluation.py
│   │   ├── experiment_utils.py
│   │   ├── llm_as_judge.py
│   │   └── llm.py
│   └── calibrators.py          # Core implementation of the calibrators (Moses, Hourglass, Decay)
├── notebooks/                # Jupyter notebooks for data analysis and visualization
│   ├── cross_chunk_reasoning.ipynb
│   ├── lim_calibrators.ipynb
│   └── lim.ipynb
├── results/                  # Directory for storing raw experimental results
├── scripts/                  # Scripts to run experiments and generate figures
│   ├── create_figs.py        # Generates figures from the paper
│   ├── extract_nq.py         # Creates the NQ dataset files
│   ├── run_experiment.py     # Main script to run experiments
│   ├── run_evaluation.py     # Script that runs evaluation of experiments
│   └── run_batches.py        # Runs the two scripts above as needed
├── pyproject.toml            # Project dependencies for Poetry
├── poetry.lock
└── README.md

Installation

To run experiments or contribute to this project, it is recommended to follow these steps to set up Poetry and install dependencies.

  1. Clone or fork this repository.

    git clone <your-repository-url>
    cd <repository-name>
  2. Install pipx:

    pip install --user pipx
  3. Install Poetry:

    pipx install poetry
  4. Install project dependencies:

    poetry install

Reproducing the Experiments

You can replicate the experiments and figures presented in the paper using the provided scripts.

Creating LiM Dataset

For the Lost in the Middle experiments, you first need to create the dataset using the scripts from the original experiments, which can be found here. In our experiments we used num-total-documents values of 20, 50, and 100.

Then run scripts/extract_nq.py:

poetry run python scripts/extract_nq.py <path_to_zip_files> data/nq_lim_20.parquet

The required datasets are referenced in caliope/experiment/configs.py: the default config looks for data/nq_lim_20.parquet (for $d=20$), data/nq_lim_50.parquet (for $d=50$), and data/nq_lim_100.parquet (for $d=100$).
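For reference, that mapping can be pictured as a simple dictionary (a hypothetical sketch; the actual structure inside caliope/experiment/configs.py may differ):

```python
# Hypothetical sketch of how the default config maps the number of
# documents d to the expected dataset paths; the real configs.py
# may structure this differently.
LIM_DATASETS = {
    20: "data/nq_lim_20.parquet",
    50: "data/nq_lim_50.parquet",
    100: "data/nq_lim_100.parquet",
}


def dataset_path(num_documents: int) -> str:
    """Look up the dataset file for a given d."""
    return LIM_DATASETS[num_documents]
```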

Full Experiment Suite

To run the full suite of experiments, use the run_batches.py script. This script automatically finds experiment configurations defined in caliope/experiment/configs.py that have not yet been run or evaluated and executes them.

  1. Run the Batches:

    poetry run python scripts/run_batches.py --experiment retrieval_calibrators
  2. Generate Figures: Once all experiments are complete, generate the figures from the paper:

    poetry run python scripts/create_figs.py
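The selection step of the batch runner can be sketched roughly as follows (hypothetical helper and file-naming scheme; the actual logic lives in scripts/run_batches.py and keys off the configs in caliope/experiment/configs.py):

```python
from pathlib import Path


def pending_experiments(config_names: list[str], results_dir: Path) -> list[str]:
    """Return the configs that have no stored result yet.

    A simplified stand-in for the selection step in scripts/run_batches.py:
    an experiment counts as "done" if a result file bearing its name exists
    in the results directory (the naming scheme here is an assumption).
    """
    return [
        name for name in config_names
        if not (results_dir / f"{name}.json").exists()
    ]
```

Each pending experiment would then be dispatched to run_experiment.py and subsequently to run_evaluation.py.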

Minimal Example

To run a single, minimal example for a quick test, use the run_experiment.py script directly with a specific experiment name and index.

Run a single experiment:

poetry run python scripts/run_experiment.py lim_calibrated 0

Calibrators

The training-free calibrators introduced in the paper are drop-in modifications for models that use RoPE: they apply a strictly monotonic transformation to the positional encoding inputs at inference time, without altering any model parameters.

The core implementations of the Moses, Hourglass, and Decay calibrators can be found in caliope/calibrators.py.
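To illustrate the idea of a strictly monotonic position transformation (this is an illustrative power-law map, not one of the paper's actual Moses, Hourglass, or Decay calibrators):

```python
import math


def calibrate_positions(seq_len: int, alpha: float = 0.5) -> list[float]:
    """Map integer positions 0..seq_len-1 through a strictly monotonic
    transformation before they would be fed into RoPE.

    This is a hypothetical power-law compression that keeps the first and
    last positions fixed while squeezing intermediate ones; the paper's
    calibrators use their own transformation functions.
    """
    if seq_len <= 1:
        return [0.0] * seq_len
    last = seq_len - 1
    # Normalize to [0, 1], apply a strictly increasing concave map,
    # then rescale back to the original position range.
    return [last * (p / last) ** alpha for p in range(seq_len)]
```

Because the map is strictly increasing, the relative order of tokens is preserved; only the distances between positions, and hence the rotary angles, change.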


About

This is the official implementation of "Can Calibration of Positional Encodings Enhance Long Context Utilization?" by Tom Zehle and Matthias Aßenmacher.
