
ReCAP

Learning to Retrieve User History and Generate User Profiles for Personalized Persuasiveness Prediction

Sejun Park · Yoonah Park · Jongwon Lim · Yohan Jo

Graduate School of Data Science, Seoul National University



Overview

Estimating the persuasiveness of a message is inherently personalized — the same argument that works on one reader may flop on another. Effective prediction therefore has to reason about who the reader is, grounded in their past writings rather than static demographics.

ReCAP is a user-profiling framework with two trainable modules: a query generator that pulls persuasion-relevant records out of a user's history, and a profiler that compresses those records into a short textual profile conditioned on the current post. Both modules are trained via DPO with a record-level persuasion utility signal derived directly from downstream view-change F1 — no manual preference annotation, no predictor retraining, 6–13× lower inference cost than full-history summarization baselines.

ReCAP pipeline overview

Figure 2. Top — inference pipeline: retrieval → profiling → view-change prediction. Bottom — three-stage training: profiler DPO, record-level persuasion-utility scoring, query-generator DPO. Flame icons denote trainable modules, snowflakes denote frozen ones.
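The top row of Figure 2 can be sketched as a single function. The names below are illustrative, not the actual recap API; each stage is passed in as a callable:

```python
def predict_view_change(post, history, gen_query, retrieve, profile, predict):
    """Inference flow from Figure 2: query generation -> retrieval ->
    profiling -> view-change prediction. Only gen_query and profile are
    trainable in ReCAP; the retriever pool and predictor stay frozen."""
    query = gen_query(post)                 # trainable query generator
    records = retrieve(query, history)      # hybrid retrieval over user history
    user_profile = profile(records, post)   # trainable profiler, conditioned on the post
    return predict(post, user_profile)      # frozen view-change predictor
```

Keeping the predictor frozen is what lets the training signal come straight from its F1, with no predictor retraining.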

Headline results on ChangeMyView (CMV test set, F1)

Predictor                  No personalization   Retrieval-only   Ours
Llama-3.1-8B-Instruct      0.3457               0.2952           0.4000
Llama-3.3-70B-Instruct     0.3284               0.4177           0.4661
GPT-4o-mini                0.2525               0.1323           0.2787

The framework also generalises beyond CMV to other personalization datasets with different characteristics (§4.3, paper Tables 4–5 — Llama-3.1-8B-Instruct as the predictor):

Dataset     Task                  Metric   Base profiler   Ours
PRISM       preference (binary)   F1       0.5051          0.5879
OpinionQA   survey stance (MCQ)   Acc      0.4014          0.5306

Full numbers in docs/reproducing_results.md.


Installation

Requires Python ≥ 3.10.

git clone https://github.com/holi-lab/ReCAP.git
cd ReCAP

pip install -e .                 # core library + CLIs
pip install -e ".[vllm]"         # optional: local vLLM backend for training / inference

cp .env.example .env             # fill in OPENROUTER_API_KEY (+ HF_TOKEN for gated models)
set -a && source .env && set +a  # or use direnv / a dotenv loader of choice

Quickstart

End-to-end view-change prediction on 10 CMV test posts with Llama-3.1-8B as the predictor, via OpenRouter (no GPU required):

# 1. Fetch CMV test split from HuggingFace.
python scripts/prepare_cmv.py --output-dir data/cmv --report-stats

# 2. Run the full pipeline (QG → retrieve → profile → predict).
python scripts/run_pipeline.py \
    --config configs/inference/pipeline_llama8b.yaml \
    --dataset cmv --split test --limit 10 \
    --output outputs/cmv_test_llama8b.jsonl

# 3. Aggregate F1 / AUC (Table 2).
python scripts/eval_end_to_end.py --predictions outputs/cmv_test_llama8b.jsonl

Swap configs/inference/pipeline_llama8b.yaml for pipeline_llama70b.yaml or pipeline_gpt4o_mini.yaml to switch predictors.
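For a quick sanity check of the predictions file without the eval script, here is a minimal sketch. It assumes each JSONL line carries integer `label` and `prediction` fields; the actual schema is whatever run_pipeline.py writes, so inspect a line first.

```python
import json

def binary_f1(pairs):
    """Binary F1 over (gold, pred) pairs, positive class = 1."""
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def f1_from_jsonl(path):
    # Field names are an assumption; check the file produced by run_pipeline.py.
    with open(path) as f:
        rows = [json.loads(line) for line in f]
    return binary_f1([(r["label"], r["prediction"]) for r in rows])
```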

Reproducing the paper

The table below maps every main-body paper artifact to the exact command. Full walk-through in docs/reproducing_results.md.

Paper artifact Command
Table 1 — retrieval NCG@5 / NDCG@5 scripts/eval_retrieval.py
Table 2 — end-to-end F1 / AUC (3 predictors) scripts/run_pipeline.py → scripts/eval_end_to_end.py
Table 3 — retriever × profiler × predictor grid scripts/eval_grid.py --grid configs/inference/grid_table3.yaml
Table 4 — PRISM (F1) scripts/run_pipeline.py --config configs/inference/pipeline_llama8b_prism.yaml --dataset prism
Table 4 — OpinionQA (Acc.) scripts/run_pipeline.py --config configs/inference/pipeline_llama8b_opinionqa.yaml --dataset opinionqa
Table 5 — retrieval comparison on PRISM / OpinionQA scripts/eval_retrieval.py --input data/{prism,opinionqa}/val_qg_candidates.jsonl
Table 8 — dataset statistics scripts/prepare_cmv.py --report-stats
Tables 9, 10, 11 — hyperparameters argparse defaults in recap/training/profiler_dpo.py, query_generator_dpo.py; CLI flags in build_query_preferences.py

Training from scratch

Full recipe for porting ReCAP to a new predictor (e.g. a newer Llama release or a proprietary model). Every step below, (0) through (7), is scripted, and the training-time configs live in configs/train/.

The same steps work for CMV / PRISM / OpinionQA — set DATASET to whichever you want to target. See docs/reproducing_results.md for per-dataset notes on prompts (profiler / predictor) and the --task-type / --dataset flags.

DATASET=cmv  # or prism | opinionqa

# (0) Materialize raw splits locally (HF stream → JSONL).
python scripts/prepare_cmv.py --dataset $DATASET --splits train,val,test

# (1) Score per-record persuasion utility (§3.3.2, m=3 groupings, T=0.7).
python scripts/score_utility.py \
    --input data/$DATASET/train.jsonl \
    --output data/$DATASET/train_utility.jsonl \
    --config configs/train/bootstrap_profiler_$DATASET.yaml
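The utility score in step (1) is derived from downstream view-change F1 (§3.3.2). One plausible reading, sketched here with a toy scorer, is a Monte Carlo marginal contribution per record; the real logic lives in scripts/score_utility.py and may differ:

```python
import random

def marginal_utility(records, f1_fn, m=3, seed=0):
    """For each record, average over m random subsets the change in
    downstream score when the record is added to the subset. f1_fn maps
    a list of records to a downstream score (a toy stand-in here for
    running the frozen predictor and computing F1)."""
    rng = random.Random(seed)
    utilities = {}
    for record in records:
        others = [r for r in records if r != record]
        diffs = []
        for _ in range(m):
            subset = rng.sample(others, rng.randint(0, len(others)))
            diffs.append(f1_fn(subset + [record]) - f1_fn(subset))
        utilities[record] = sum(diffs) / m
    return utilities
```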

# (2) Build profiler DPO pairs (Appendix C.1: G=10 groupings × 16 candidates × K=4 × δ=0.05).
python scripts/build_profile_preferences.py \
    --input data/$DATASET/train_utility.jsonl \
    --output data/$DATASET/profiler_dpo_train.jsonl \
    --config configs/train/bootstrap_profiler_$DATASET.yaml

# (3) Train the profiler (LoRA-DPO, paper Table 9 defaults).
#     Use the dataset-specific profiler prompt:
#       cmv       → recap/profiler/prompts/profile_generation.yaml
#       opinionqa → recap/profiler/prompts/profile_generation_opinionqa.yaml
#       prism     → recap/profiler/prompts/profile_generation_prism.yaml
python scripts/train_profiler.py \
    --data data/$DATASET/profiler_dpo_train.jsonl \
    --prompt recap/profiler/prompts/profile_generation.yaml \
    --output-dir outputs/profiler-dpo-llama8b

# (4) Generate QG candidates (stage-1 @ T=0 + 16 stage-2 @ T=0.8 + NDCG@5).
#     --dataset picks the QG prompt pair; --pick-best also writes
#     `retrieval_query` (argmax-NDCG) used by step 7.
python scripts/generate_qg_candidates.py \
    --input data/$DATASET/train_utility.jsonl \
    --output data/$DATASET/train_qg_candidates.jsonl \
    --config configs/train/qg_candidates.yaml \
    --dataset $DATASET \
    --pick-best

# (5) Build QG DPO pairs (Appendix D.4; per-predictor thresholds from Table 10).
python scripts/build_query_preferences.py \
    --input data/$DATASET/train_qg_candidates.jsonl \
    --output data/$DATASET/qg_stage2_dpo_train.jsonl \
    --mode stage2 --pos-threshold 0.65 --neg-threshold 0.55
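The thresholding in step (5) can be sketched as below, assuming each candidate carries a `query` string and a utility-based `score` (field names are hypothetical; build_query_preferences.py defines the real ones):

```python
def build_dpo_pairs(candidates, pos_threshold=0.65, neg_threshold=0.55):
    """Pair every candidate scoring at or above pos_threshold (chosen)
    with every candidate at or below neg_threshold (rejected);
    candidates in the margin between the thresholds are discarded."""
    pos = [c for c in candidates if c["score"] >= pos_threshold]
    neg = [c for c in candidates if c["score"] <= neg_threshold]
    return [{"chosen": p["query"], "rejected": n["query"]}
            for p in pos for n in neg]
```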

# (6) Train the query generator (LoRA-DPO, paper Table 11 defaults).
python scripts/train_query_generator.py --stage stage2 \
    --data data/$DATASET/qg_stage2_dpo_train.jsonl \
    --output-dir outputs/qg-stage2-llama8b

# (7) Fine-tune the BGE-M3 retriever (MNRL, top-5 F1-scored positives).
python scripts/train_retriever.py \
    --data data/$DATASET/train_qg_candidates.jsonl \
    --output-dir outputs/bge-m3-tuned

After step (7), point retriever.model_name_or_path in the target dataset's configs/inference/pipeline_*.yaml at the fine-tuned checkpoint.
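For instance, the edit might look like the excerpt below; the key path follows the sentence above, but the surrounding structure of the config file is an assumption:

```yaml
# configs/inference/pipeline_llama8b.yaml (excerpt; other keys unchanged)
retriever:
  model_name_or_path: outputs/bge-m3-tuned   # fine-tuned checkpoint from step (7)
```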

Repository layout

ReCAP/
├── recap/                         # Installable Python library (pip install -e .)
│   ├── retrieval/                 # BGE-M3 dense + BM25 sparse + RRF hybrid pool (§3.1)
│   ├── query_generator/           # Two-stage QG with Appendix D.6 prompts (§3.3.3)
│   ├── profiler/                  # LocalProfiler + OpenRouterProfiler + Appendix C.3 prompt
│   ├── predictor/                 # View-change predictor + Appendix B.1 / B.2 / B.3 prompts
│   ├── training/                  # DPO trainers + BGE-M3 MNRL fine-tune + utility scoring
│   ├── bootstrap/                 # DPO preference-pair builders + QG candidate generation
│   ├── pipeline/                  # End-to-end orchestration + metrics (F1, AUC, NCG, NDCG)
│   └── data/                      # HuggingFace loader for holi-lab/ReCAP_datatset
│
├── configs/
│   ├── train/                     # Configs consumed via `--config` (bootstrap, QG candidates)
│   └── inference/                 # Per-predictor pipeline configs + Table 3 grid definition
│
├── scripts/                       # Thin CLI wrappers — one per training / inference stage
├── baselines/                     # PAG, HSUMM, RECURSUMM, HyDE, Demographic reference implementations
└── docs/                          # Reproduction recipe · dataset schema · pipeline walk-through
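The hybrid pool under recap/retrieval/ fuses the dense (BGE-M3) and sparse (BM25) rankings via RRF. The standard reciprocal rank fusion formula is sketched below; the constant k = 60 is the common default from the original RRF paper, not a value confirmed from this code:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over input rankings of
    1 / (k + rank(d)), with ranks starting at 1. `rankings` is a list
    of doc-id lists, best first; returns ids sorted by fused score."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```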

Dataset

The raw CMV / OpinionQA / PRISM splits are published on HuggingFace Hub. Record-level utility scores and QG candidates are produced locally from these splits via the scripts in scripts/ (see the recipe above).

HuggingFace Dataset

from recap.data import extract_comments, extract_passages, load_raw_split

cmv_test = load_raw_split("cmv", "test")                # 168 posts (HF Dataset)
row      = cmv_test[0]
passages = extract_passages(row)                         # list[str] — user history
comments = extract_comments(row)                         # list[{"text", "label"}]

Schema helpers and statistics are documented in docs/data.md.

Citation

If you find this work useful, please cite:

@inproceedings{park2026recap,
  title         = {Learning to Retrieve User History and Generate User Profiles for Personalized Persuasiveness Prediction},
  author        = {Park, Sejun and Park, Yoonah and Lim, Jongwon and Jo, Yohan},
  booktitle     = {Findings of the Association for Computational Linguistics: ACL 2026},
  year          = {2026},
  eprint        = {2601.05654},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2601.05654}
}

License

This code is released under the MIT License.

Acknowledgements

This work was supported by the Creative-Pioneering Researchers Program through Seoul National University, and by the National Research Foundation of Korea (NRF) grants RS-2024-00333484 and RS-2024-00414981 funded by the Korean government (MSIT).

About

Official code release for "Learning to Retrieve User History and Generate User Profiles for Personalized Persuasiveness Prediction" (ACL 2026 Findings)
