Learning to Retrieve User History and Generate User Profiles for Personalized Persuasiveness Prediction
Sejun Park · Yoonah Park · Jongwon Lim · Yohan Jo
Graduate School of Data Science, Seoul National University
Estimating the persuasiveness of a message is inherently personalized — the same argument that works on one reader may flop on another. Effective prediction therefore has to reason about who the reader is, grounded in their past writings rather than static demographics.
ReCAP is a user-profiling framework with two trainable modules: a query generator that pulls persuasion-relevant records out of a user's history, and a profiler that compresses those records into a short textual profile conditioned on the current post. Both modules are trained via DPO with a record-level persuasion utility signal derived directly from downstream view-change F1 — no manual preference annotation, no predictor retraining, 6–13× lower inference cost than full-history summarization baselines.
Figure 2. Top — inference pipeline: retrieval → profiling → view-change prediction. Bottom — three-stage training: profiler DPO, record-level persuasion-utility scoring, query-generator DPO. Flame icons denote trainable modules, snowflakes denote frozen ones.
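The inference flow in Figure 2 can be sketched in a few lines. This is a minimal illustration only: the function names and signatures below are hypothetical stand-ins, not the repo's actual API.

```python
def predict_view_change(post: str, history: list[str],
                        query_generator, retriever, profiler, predictor) -> float:
    # 1. Trainable query generator turns the current post into retrieval queries.
    queries = query_generator(post)
    # 2. Retrieve persuasion-relevant records from the user's history.
    records = retriever(queries, history, top_k=5)
    # 3. Trainable profiler compresses the records into a short textual profile,
    #    conditioned on the current post.
    profile = profiler(post, records)
    # 4. A frozen predictor estimates view change from post + profile.
    return predictor(post, profile)
```

The two flame-icon modules from the figure (query generator, profiler) are the trainable callables; the retriever pool and predictor stay frozen at inference time.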
End-to-end view-change prediction F1 on CMV (paper Table 2):

| Predictor | No personalization | Retrieval-only | Ours |
|---|---|---|---|
| Llama-3.1-8B-Instruct | 0.3457 | 0.2952 | 0.4000 |
| Llama-3.3-70B-Instruct | 0.3284 | 0.4177 | 0.4661 |
| GPT-4o-mini | 0.2525 | 0.1323 | 0.2787 |
The framework also generalizes beyond CMV to personalization datasets with different characteristics (§4.3, paper Tables 4–5; Llama-3.1-8B-Instruct as the predictor):
| Dataset | Task | Metric | Base profiler | Ours |
|---|---|---|---|---|
| PRISM | preference (binary) | F1 | 0.5051 | 0.5879 |
| OpinionQA | survey stance (MCQ) | Acc | 0.4014 | 0.5306 |
Full numbers in docs/reproducing_results.md.
- Installation
- Quickstart
- Reproducing the paper
- Training from scratch
- Repository layout
- Dataset
- Citation
- License
Requires Python ≥ 3.10.

```shell
git clone https://github.com/holi-lab/ReCAP.git
cd ReCAP

pip install -e .              # core library + CLIs
pip install -e ".[vllm]"     # optional: local vLLM backend for training / inference

cp .env.example .env          # fill in OPENROUTER_API_KEY (+ HF_TOKEN for gated models)
set -a && source .env && set +a   # or use direnv / a dotenv loader of choice
```

End-to-end view-change prediction on 10 CMV test posts with Llama-3.1-8B as the predictor, via OpenRouter (no GPU required):
```shell
# 1. Fetch CMV test split from HuggingFace.
python scripts/prepare_cmv.py --output-dir data/cmv --report-stats

# 2. Run the full pipeline (QG → retrieve → profile → predict).
python scripts/run_pipeline.py \
    --config configs/inference/pipeline_llama8b.yaml \
    --dataset cmv --split test --limit 10 \
    --output outputs/cmv_test_llama8b.jsonl

# 3. Aggregate F1 / AUC (Table 2).
python scripts/eval_end_to_end.py --predictions outputs/cmv_test_llama8b.jsonl
```

Swap configs/inference/pipeline_llama8b.yaml for pipeline_llama70b.yaml or pipeline_gpt4o_mini.yaml to switch predictors.
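The pipeline writes one prediction per line as JSONL. As a rough illustration of the F1 aggregation that the evaluation step performs, here is a self-contained sketch; the field names `pred` and `label` are assumptions, and scripts/eval_end_to_end.py remains the authoritative scorer.

```python
import json

def f1_from_jsonl(path: str) -> float:
    """Binary F1 over a JSONL file with one {"pred", "label"} row per line."""
    tp = fp = fn = 0
    with open(path) as f:
        for line in f:
            row = json.loads(line)
            pred, gold = int(row["pred"]), int(row["label"])
            tp += pred == 1 and gold == 1
            fp += pred == 1 and gold == 0
            fn += pred == 0 and gold == 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```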
The table below maps every main-body paper artifact to the exact command. Full
walk-through in docs/reproducing_results.md.
| Paper artifact | Command |
|---|---|
| Table 1 — retrieval NCG@5 / NDCG@5 | scripts/eval_retrieval.py |
| Table 2 — end-to-end F1 / AUC (3 predictors) | scripts/run_pipeline.py → scripts/eval_end_to_end.py |
| Table 3 — retriever × profiler × predictor grid | scripts/eval_grid.py --grid configs/inference/grid_table3.yaml |
| Table 4 — PRISM (F1) | scripts/run_pipeline.py --config configs/inference/pipeline_llama8b_prism.yaml --dataset prism |
| Table 4 — OpinionQA (Acc.) | scripts/run_pipeline.py --config configs/inference/pipeline_llama8b_opinionqa.yaml --dataset opinionqa |
| Table 5 — retrieval comparison on PRISM / OpinionQA | scripts/eval_retrieval.py --input data/{prism,opinionqa}/val_qg_candidates.jsonl |
| Table 8 — dataset statistics | scripts/prepare_cmv.py --report-stats |
| Tables 9, 10, 11 — hyperparameters | argparse defaults in recap/training/profiler_dpo.py, query_generator_dpo.py; CLI flags in build_query_preferences.py |
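Tables 1 and 5 score retrieval with NDCG@5. For readers unfamiliar with the metric, a minimal sketch follows; treating per-record utility scores as graded relevance is our simplification here, and the paper's exact gain definition may differ.

```python
import math

def dcg(scores):
    # Discounted cumulative gain: gain at rank i is discounted by log2(i + 2).
    return sum(s / math.log2(i + 2) for i, s in enumerate(scores))

def ndcg_at_k(retrieved_rels, all_rels, k=5):
    # Normalize DCG of the retrieved ranking by the DCG of the ideal ranking.
    ideal = dcg(sorted(all_rels, reverse=True)[:k])
    return dcg(retrieved_rels[:k]) / ideal if ideal > 0 else 0.0
```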
Full recipe for porting ReCAP to a new predictor (e.g. a newer Llama release or a proprietary model). Every step below (0–7) is scripted, and the training-time configs live in configs/train/.
The same steps work for CMV / PRISM / OpinionQA: set DATASET to whichever you want to target. See docs/reproducing_results.md for per-dataset notes on prompts (profiler / predictor) and the --task-type / --dataset flags.
```shell
DATASET=cmv   # or prism | opinionqa

# (0) Materialize raw splits locally (HF stream → JSONL).
python scripts/prepare_cmv.py --dataset $DATASET --splits train,val,test

# (1) Score per-record persuasion utility (§3.3.2, m=3 groupings, T=0.7).
python scripts/score_utility.py \
    --input data/$DATASET/train.jsonl \
    --output data/$DATASET/train_utility.jsonl \
    --config configs/train/bootstrap_profiler_$DATASET.yaml

# (2) Build profiler DPO pairs (Appendix C.1: G=10 groupings × 16 candidates × K=4 × δ=0.05).
python scripts/build_profile_preferences.py \
    --input data/$DATASET/train_utility.jsonl \
    --output data/$DATASET/profiler_dpo_train.jsonl \
    --config configs/train/bootstrap_profiler_$DATASET.yaml

# (3) Train the profiler (LoRA-DPO, paper Table 9 defaults).
#     Use the dataset-specific profiler prompt:
#       cmv       → recap/profiler/prompts/profile_generation.yaml
#       opinionqa → recap/profiler/prompts/profile_generation_opinionqa.yaml
#       prism     → recap/profiler/prompts/profile_generation_prism.yaml
python scripts/train_profiler.py \
    --data data/$DATASET/profiler_dpo_train.jsonl \
    --prompt recap/profiler/prompts/profile_generation.yaml \
    --output-dir outputs/profiler-dpo-llama8b

# (4) Generate QG candidates (stage-1 @ T=0 + 16 stage-2 @ T=0.8 + NDCG@5).
#     --dataset picks the QG prompt pair; --pick-best also writes
#     `retrieval_query` (argmax-NDCG) used by step 7.
python scripts/generate_qg_candidates.py \
    --input data/$DATASET/train_utility.jsonl \
    --output data/$DATASET/train_qg_candidates.jsonl \
    --config configs/train/qg_candidates.yaml \
    --dataset $DATASET \
    --pick-best

# (5) Build QG DPO pairs (Appendix D.4; per-predictor thresholds from Table 10).
python scripts/build_query_preferences.py \
    --input data/$DATASET/train_qg_candidates.jsonl \
    --output data/$DATASET/qg_stage2_dpo_train.jsonl \
    --mode stage2 --pos-threshold 0.65 --neg-threshold 0.55

# (6) Train the query generator (LoRA-DPO, paper Table 11 defaults).
python scripts/train_query_generator.py --stage stage2 \
    --data data/$DATASET/qg_stage2_dpo_train.jsonl \
    --output-dir outputs/qg-stage2-llama8b

# (7) Fine-tune the BGE-M3 retriever (MNRL, top-5 F1-scored positives).
python scripts/train_retriever.py \
    --data data/$DATASET/train_qg_candidates.jsonl \
    --output-dir outputs/bge-m3-tuned
```

After step (7), point retriever.model_name_or_path in the target dataset's configs/inference/pipeline_*.yaml at the fine-tuned checkpoint.
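Steps (2) and (5) both reduce to margin-based preference-pair construction: two candidates form a (chosen, rejected) DPO pair only when their utility scores are separated by enough margin. A minimal sketch of that idea, assuming candidates arrive as (text, score) tuples; the pairing logic in the repo's builders may differ, and `max_pairs` here echoes the K=4 cap from Appendix C.1.

```python
from itertools import combinations

def build_dpo_pairs(candidates, delta=0.05, max_pairs=4):
    """candidates: list of (text, utility_score). Returns DPO preference pairs."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    pairs = []
    for (hi_txt, hi_s), (lo_txt, lo_s) in combinations(ranked, 2):
        # Only keep pairs whose utility gap clears the margin delta.
        if hi_s - lo_s >= delta:
            pairs.append({"chosen": hi_txt, "rejected": lo_txt})
        if len(pairs) >= max_pairs:
            break
    return pairs
```

Step (5) instead uses absolute positive/negative thresholds (--pos-threshold / --neg-threshold) rather than a pairwise margin, but the chosen/rejected output shape is the same.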
```
ReCAP/
├── recap/                  # Installable Python library (pip install -e .)
│   ├── retrieval/          # BGE-M3 dense + BM25 sparse + RRF hybrid pool (§3.1)
│   ├── query_generator/    # Two-stage QG with Appendix D.6 prompts (§3.3.3)
│   ├── profiler/           # LocalProfiler + OpenRouterProfiler + Appendix C.3 prompt
│   ├── predictor/          # View-change predictor + Appendix B.1 / B.2 / B.3 prompts
│   ├── training/           # DPO trainers + BGE-M3 MNRL fine-tune + utility scoring
│   ├── bootstrap/          # DPO preference-pair builders + QG candidate generation
│   ├── pipeline/           # End-to-end orchestration + metrics (F1, AUC, NCG, NDCG)
│   └── data/               # HuggingFace loader for holi-lab/ReCAP_datatset
│
├── configs/
│   ├── train/              # Configs consumed via `--config` (bootstrap, QG candidates)
│   └── inference/          # Per-predictor pipeline configs + Table 3 grid definition
│
├── scripts/                # Thin CLI wrappers — one per training / inference stage
├── baselines/              # PAG, HSUMM, RECURSUMM, HyDE, Demographic reference implementations
└── docs/                   # Reproduction recipe · dataset schema · pipeline walk-through
```
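The retrieval module fuses the dense (BGE-M3) and sparse (BM25) rankings with reciprocal rank fusion. A minimal RRF sketch for reference; the k=60 constant is the common default from the RRF literature, not necessarily what recap/retrieval/ uses.

```python
def rrf_fuse(rankings, k=60):
    """rankings: list of ranked doc-id lists, e.g. [dense_ids, bm25_ids]."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1 / (k + rank) to the document's fused score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```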
The raw CMV / OpinionQA / PRISM splits are published on HuggingFace Hub.
Record-level utility scores and QG candidates are produced locally from
these splits via the scripts in scripts/ (see the recipe above).
```python
from recap.data import extract_comments, extract_passages, load_raw_split

cmv_test = load_raw_split("cmv", "test")   # 168 posts (HF Dataset)
row = cmv_test[0]
passages = extract_passages(row)           # list[str] — user history
comments = extract_comments(row)           # list[{"text", "label"}]
```

Schema helpers and statistics are documented in docs/data.md.
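Given the `extract_comments` output shape above, a small pure-Python helper for inspecting the label balance (no HF download needed). The reading of label 1 as "view changed" is our assumption; check docs/data.md for the documented semantics.

```python
from collections import Counter

def label_distribution(comments):
    """comments: list of {"text": str, "label": int} dicts.
    Returns the fraction of comments per label value."""
    counts = Counter(c["label"] for c in comments)
    total = sum(counts.values()) or 1   # guard against empty input
    return {label: n / total for label, n in counts.items()}
```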
If you find this work useful, please cite:
```bibtex
@inproceedings{park2026recap,
  title = {Learning to Retrieve User History and Generate User Profiles for Personalized Persuasiveness Prediction},
  author = {Park, Sejun and Park, Yoonah and Lim, Jongwon and Jo, Yohan},
  booktitle = {Findings of the Association for Computational Linguistics: ACL 2026},
  year = {2026},
  eprint = {2601.05654},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL},
  url = {https://arxiv.org/abs/2601.05654}
}
```

This code is released under the MIT License.
This work was supported by the Creative-Pioneering Researchers Program through Seoul National University, and by the National Research Foundation of Korea (NRF) grants RS-2024-00333484 and RS-2024-00414981 funded by the Korean government (MSIT).
