
ReCAP

Learning to Retrieve User History and Generate User Profiles for Personalized Persuasiveness Prediction

Sejun Park · Yoonah Park · Jongwon Lim · Yohan Jo

Graduate School of Data Science, Seoul National University



Overview

Estimating the persuasiveness of a message is inherently personalized — the same argument that works on one reader may flop on another. Effective prediction therefore has to reason about who the reader is, grounded in their past writings rather than static demographics.

ReCAP is a user-profiling framework with two trainable modules: a query generator that pulls persuasion-relevant records out of a user's history, and a profiler that compresses those records into a short textual profile conditioned on the current post. Both modules are trained via DPO with a record-level persuasion utility signal derived directly from downstream view-change F1 — no manual preference annotation, no predictor retraining, 6–13× lower inference cost than full-history summarization baselines.

ReCAP pipeline overview

Figure 2. Top — inference pipeline: retrieval → profiling → view-change prediction. Bottom — three-stage training: profiler DPO, record-level persuasion-utility scoring, query-generator DPO. Flame icons denote trainable modules, snowflakes denote frozen ones.
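The top row of Figure 2 can be sketched as a single function. The names below are illustrative, not the actual recap API; each stage is passed in as a callable:

```python
def predict_view_change(post, history, gen_query, retrieve, profile, predict):
    """Inference flow from Figure 2: query generation -> retrieval ->
    profiling -> view-change prediction. Only gen_query and profile are
    trainable in ReCAP; the retriever pool and predictor stay frozen."""
    query = gen_query(post)                 # trainable query generator
    records = retrieve(query, history)      # hybrid retrieval over user history
    user_profile = profile(records, post)   # trainable profiler, conditioned on the post
    return predict(post, user_profile)      # frozen view-change predictor
```

Keeping the predictor frozen is what lets the training signal come straight from its F1, with no predictor retraining.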

Headline results on ChangeMyView (CMV test set, F1)

Predictor                  No personalization   Retrieval-only   Ours
Llama-3.1-8B-Instruct      0.3457               0.2952           0.4000
Llama-3.3-70B-Instruct     0.3284               0.4177           0.4661
GPT-4o-mini                0.2525               0.1323           0.2787

The framework also generalises beyond CMV to other personalization datasets with different characteristics (§4.3, paper Tables 4–5 — Llama-3.1-8B-Instruct as the predictor):

Dataset     Task                  Metric   Base profiler   Ours
PRISM       preference (binary)   F1       0.5051          0.5879
OpinionQA   survey stance (MCQ)   Acc      0.4014          0.5306

Full numbers in docs/reproducing_results.md.


Installation

Requires Python ≥ 3.10.

git clone https://github.com/holi-lab/ReCAP.git
cd ReCAP

pip install -e .                 # core library + CLIs
pip install -e ".[vllm]"         # optional: local vLLM backend for training / inference

cp .env.example .env             # fill in OPENROUTER_API_KEY (+ HF_TOKEN for gated models)
set -a && source .env && set +a  # or use direnv / a dotenv loader of choice

Quickstart

End-to-end view-change prediction on 10 CMV test posts with Llama-3.1-8B as the predictor, via OpenRouter (no GPU required):

# 1. Fetch CMV test split from HuggingFace.
python scripts/prepare_cmv.py --output-dir data/cmv --report-stats

# 2. Run the full pipeline (QG → retrieve → profile → predict).
python scripts/run_pipeline.py \
    --config configs/inference/pipeline_llama8b.yaml \
    --dataset cmv --split test --limit 10 \
    --output outputs/cmv_test_llama8b.jsonl

# 3. Aggregate F1 / AUC (Table 2).
python scripts/eval_end_to_end.py --predictions outputs/cmv_test_llama8b.jsonl

Swap configs/inference/pipeline_llama8b.yaml for pipeline_llama70b.yaml or pipeline_gpt4o_mini.yaml to switch predictors.
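For a quick sanity check of the predictions file without the eval script, here is a minimal sketch. It assumes each JSONL line carries integer `label` and `prediction` fields; the actual schema is whatever run_pipeline.py writes, so inspect a line first.

```python
import json

def binary_f1(pairs):
    """Binary F1 over (gold, pred) pairs, positive class = 1."""
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def f1_from_jsonl(path):
    # Field names are an assumption; check the file produced by run_pipeline.py.
    with open(path) as f:
        rows = [json.loads(line) for line in f]
    return binary_f1([(r["label"], r["prediction"]) for r in rows])
```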

Reproducing the paper

The table below maps every main-body paper artifact to the exact command. Full walk-through in docs/reproducing_results.md.

Paper artifact Command
Table 1 — retrieval NCG@5 / NDCG@5 scripts/eval_retrieval.py
Table 2 — end-to-end F1 / AUC (3 predictors) scripts/run_pipeline.py → scripts/eval_end_to_end.py
Table 3 — retriever × profiler × predictor grid scripts/eval_grid.py --grid configs/inference/grid_table3.yaml
Table 4 — PRISM (F1) scripts/run_pipeline.py --config configs/inference/pipeline_llama8b_prism.yaml --dataset prism
Table 4 — OpinionQA (Acc.) scripts/run_pipeline.py --config configs/inference/pipeline_llama8b_opinionqa.yaml --dataset opinionqa
Table 5 — retrieval comparison on PRISM / OpinionQA scripts/eval_retrieval.py --input data/{prism,opinionqa}/val_qg_candidates.jsonl
Table 8 — dataset statistics scripts/prepare_cmv.py --report-stats
Tables 9, 10, 11 — hyperparameters argparse defaults in recap/training/profiler_dpo.py, query_generator_dpo.py; CLI flags in build_query_preferences.py

Training from scratch

Full recipe for porting ReCAP to a new predictor (e.g. a newer Llama release or a proprietary model). Every step below, (0) through (7), is scripted, and the training-time configs live in configs/train/.

The same steps work for CMV / PRISM / OpinionQA — set DATASET to whichever you want to target. See docs/reproducing_results.md for per-dataset notes on prompts (profiler / predictor) and the --task-type / --dataset flags.

DATASET=cmv  # or prism | opinionqa

# (0) Materialize raw splits locally (HF stream → JSONL).
python scripts/prepare_cmv.py --dataset $DATASET --splits train,val,test

# (1) Score per-record persuasion utility (§3.3.2, m=3 groupings, T=0.7).
python scripts/score_utility.py \
    --input data/$DATASET/train.jsonl \
    --output data/$DATASET/train_utility.jsonl \
    --config configs/train/bootstrap_profiler_$DATASET.yaml
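The utility score in step (1) is derived from downstream view-change F1 (§3.3.2). One plausible reading, sketched here with a toy scorer, is a Monte Carlo marginal contribution per record; the real logic lives in scripts/score_utility.py and may differ:

```python
import random

def marginal_utility(records, f1_fn, m=3, seed=0):
    """For each record, average over m random subsets the change in
    downstream score when the record is added to the subset. f1_fn maps
    a list of records to a downstream score (a toy stand-in here for
    running the frozen predictor and computing F1)."""
    rng = random.Random(seed)
    utilities = {}
    for record in records:
        others = [r for r in records if r != record]
        diffs = []
        for _ in range(m):
            subset = rng.sample(others, rng.randint(0, len(others)))
            diffs.append(f1_fn(subset + [record]) - f1_fn(subset))
        utilities[record] = sum(diffs) / m
    return utilities
```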

# (2) Build profiler DPO pairs (Appendix C.1: G=10 groupings × 16 candidates × K=4 × δ=0.05).
python scripts/build_profile_preferences.py \
    --input data/$DATASET/train_utility.jsonl \
    --output data/$DATASET/profiler_dpo_train.jsonl \
    --config configs/train/bootstrap_profiler_$DATASET.yaml

# (3) Train the profiler (LoRA-DPO, paper Table 9 defaults).
#     Use the dataset-specific profiler prompt:
#       cmv       → recap/profiler/prompts/profile_generation.yaml
#       opinionqa → recap/profiler/prompts/profile_generation_opinionqa.yaml
#       prism     → recap/profiler/prompts/profile_generation_prism.yaml
python scripts/train_profiler.py \
    --data data/$DATASET/profiler_dpo_train.jsonl \
    --prompt recap/profiler/prompts/profile_generation.yaml \
    --output-dir outputs/profiler-dpo-llama8b

# (4) Generate QG candidates (stage-1 @ T=0 + 16 stage-2 @ T=0.8 + NDCG@5).
#     --dataset picks the QG prompt pair; --pick-best also writes
#     `retrieval_query` (argmax-NDCG) used by step 7.
python scripts/generate_qg_candidates.py \
    --input data/$DATASET/train_utility.jsonl \
    --output data/$DATASET/train_qg_candidates.jsonl \
    --config configs/train/qg_candidates.yaml \
    --dataset $DATASET \
    --pick-best

# (5) Build QG DPO pairs (Appendix D.4; per-predictor thresholds from Table 10).
python scripts/build_query_preferences.py \
    --input data/$DATASET/train_qg_candidates.jsonl \
    --output data/$DATASET/qg_stage2_dpo_train.jsonl \
    --mode stage2 --pos-threshold 0.65 --neg-threshold 0.55
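The thresholding in step (5) can be sketched as below, assuming each candidate carries a `query` string and a utility-based `score` (field names are hypothetical; build_query_preferences.py defines the real ones):

```python
def build_dpo_pairs(candidates, pos_threshold=0.65, neg_threshold=0.55):
    """Pair every candidate scoring at or above pos_threshold (chosen)
    with every candidate at or below neg_threshold (rejected);
    candidates in the margin between the thresholds are discarded."""
    pos = [c for c in candidates if c["score"] >= pos_threshold]
    neg = [c for c in candidates if c["score"] <= neg_threshold]
    return [{"chosen": p["query"], "rejected": n["query"]}
            for p in pos for n in neg]
```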

# (6) Train the query generator (LoRA-DPO, paper Table 11 defaults).
python scripts/train_query_generator.py --stage stage2 \
    --data data/$DATASET/qg_stage2_dpo_train.jsonl \
    --output-dir outputs/qg-stage2-llama8b

# (7) Fine-tune the BGE-M3 retriever (MNRL, top-5 F1-scored positives).
python scripts/train_retriever.py \
    --data data/$DATASET/train_qg_candidates.jsonl \
    --output-dir outputs/bge-m3-tuned

After step (7), point retriever.model_name_or_path in the target dataset's configs/inference/pipeline_*.yaml at the fine-tuned checkpoint.
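For instance, the edit might look like the excerpt below; the key path follows the sentence above, but the surrounding structure of the config file is an assumption:

```yaml
# configs/inference/pipeline_llama8b.yaml (excerpt; other keys unchanged)
retriever:
  model_name_or_path: outputs/bge-m3-tuned   # fine-tuned checkpoint from step (7)
```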

Repository layout

ReCAP/
├── recap/                         # Installable Python library (pip install -e .)
│   ├── retrieval/                 # BGE-M3 dense + BM25 sparse + RRF hybrid pool (§3.1)
│   ├── query_generator/           # Two-stage QG with Appendix D.6 prompts (§3.3.3)
│   ├── profiler/                  # LocalProfiler + OpenRouterProfiler + Appendix C.3 prompt
│   ├── predictor/                 # View-change predictor + Appendix B.1 / B.2 / B.3 prompts
│   ├── training/                  # DPO trainers + BGE-M3 MNRL fine-tune + utility scoring
│   ├── bootstrap/                 # DPO preference-pair builders + QG candidate generation
│   ├── pipeline/                  # End-to-end orchestration + metrics (F1, AUC, NCG, NDCG)
│   └── data/                      # HuggingFace loader for holi-lab/ReCAP_datatset
│
├── configs/
│   ├── train/                     # Configs consumed via `--config` (bootstrap, QG candidates)
│   └── inference/                 # Per-predictor pipeline configs + Table 3 grid definition
│
├── scripts/                       # Thin CLI wrappers — one per training / inference stage
├── baselines/                     # PAG, HSUMM, RECURSUMM, HyDE, Demographic reference implementations
└── docs/                          # Reproduction recipe · dataset schema · pipeline walk-through
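The hybrid pool under recap/retrieval/ fuses the dense (BGE-M3) and sparse (BM25) rankings via RRF. The standard reciprocal rank fusion formula is sketched below; the constant k = 60 is the common default from the original RRF paper, not a value confirmed from this code:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over input rankings of
    1 / (k + rank(d)), with ranks starting at 1. `rankings` is a list
    of doc-id lists, best first; returns ids sorted by fused score."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```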

Dataset

The raw CMV / OpinionQA / PRISM splits are published on HuggingFace Hub. Record-level utility scores and QG candidates are produced locally from these splits via the scripts in scripts/ (see the recipe above).

HuggingFace Dataset

from recap.data import extract_comments, extract_passages, load_raw_split

cmv_test = load_raw_split("cmv", "test")                # 168 posts (HF Dataset)
row      = cmv_test[0]
passages = extract_passages(row)                         # list[str] — user history
comments = extract_comments(row)                         # list[{"text", "label"}]

Schema helpers and statistics are documented in docs/data.md.

Citation

If you find this work useful, please cite:

@inproceedings{park2026recap,
  title         = {Learning to Retrieve User History and Generate User Profiles for Personalized Persuasiveness Prediction},
  author        = {Park, Sejun and Park, Yoonah and Lim, Jongwon and Jo, Yohan},
  booktitle     = {Findings of the Association for Computational Linguistics: ACL 2026},
  year          = {2026},
  eprint        = {2601.05654},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2601.05654}
}

License

This code is released under the MIT License.

Acknowledgements

This work was supported by the Creative-Pioneering Researchers Program through Seoul National University, and by the National Research Foundation of Korea (NRF) grants RS-2024-00333484 and RS-2024-00414981 funded by the Korean government (MSIT).

About

Official code release for "Learning to Retrieve User History and Generate User Profiles for Personalized Persuasiveness Prediction" (ACL 2026 Findings)
