Personalization Trap

This repository is an anonymous, sata-bench-style scaffold for the experiments described in The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs. It evaluates memory-conditioned LLM behavior on a demographic fairness dataset and fits random-effects models over the resulting outcomes.

Repository Structure

src/personalization_trap/
├── evaluation/
│   ├── dataset/          # dataset loading and preparation
│   └── metrics/          # accuracy / flip-rate helpers
├── methods/
│   ├── inference/        # vLLM inference pipeline
│   ├── random_effects/   # mixed-effects analysis
│   └── utils/            # prompt / extraction / IO helpers
└── configs/              # example TOML configs
scripts/
├── run_inference.py
└── run_random_effects.py
tests/
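
The metrics/ directory is described only as "accuracy / flip-rate helpers", so the following is a minimal sketch of what such helpers might compute, not the repository's code. The column name pred_label and the flip-rate definition (a question counts as flipped when its predicted label differs across demographic variants) are assumptions made for illustration, and the rows are made-up toy data.

```python
from collections import defaultdict


def accuracy(rows):
    """Fraction of rows whose predicted label equals the gold label."""
    if not rows:
        return 0.0
    return sum(r["pred_label"] == r["gold_label"] for r in rows) / len(rows)


def flip_rate(rows):
    """Fraction of questions that received more than one distinct predicted label.

    Assumed definition: a question "flips" when its prediction is not
    identical across all demographic variants of that question.
    """
    labels_by_question = defaultdict(set)
    for r in rows:
        labels_by_question[r["question_id"]].add(r["pred_label"])
    if not labels_by_question:
        return 0.0
    flipped = sum(len(labels) > 1 for labels in labels_by_question.values())
    return flipped / len(labels_by_question)


# Toy rows: pred_label is a hypothetical output column.
rows = [
    {"question_id": "q1", "gender": "female", "gold_label": "A", "pred_label": "A"},
    {"question_id": "q1", "gender": "male", "gold_label": "A", "pred_label": "B"},
    {"question_id": "q2", "gender": "female", "gold_label": "C", "pred_label": "C"},
    {"question_id": "q2", "gender": "male", "gold_label": "C", "pred_label": "C"},
]
print(accuracy(rows))   # 0.75
print(flip_rate(rows))  # 0.5
```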

What Is Included

  1. Standard inference code for the groupfairnessllm/random_effect_example dataset.
  2. vLLM inference over system_prompt + user_prompt, with support for open models such as Qwen/Qwen3-4B.
  3. A second-stage extraction pipeline that normalizes raw generations into scored labels, with several extraction strategies:
    • choice_letter
    • yes_no
    • regex
    • json_field
    • label_map
    • llm_extract
  4. Random-effects modeling to estimate how gender, age, religion, and ethnicity influence correctness.
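
To make the extraction stage in item 3 concrete, here is a minimal sketch of how three of the listed strategies (choice_letter, yes_no, json_field) could work. The function names, signatures, and regular expressions are hypothetical and are not taken from the repository; the actual extractors may behave differently.

```python
import json
import re


def extract_choice_letter(text, choices="ABCD"):
    """Return a multiple-choice letter from a generation, or None."""
    # Prefer an explicit "answer is X" / "Answer: X" pattern.
    explicit = re.search(rf"(?i:answer)\s*(?i:is)?\s*:?\s*\(?([{choices}])\b", text)
    if explicit:
        return explicit.group(1)
    # Fall back to the first standalone choice letter anywhere in the text.
    fallback = re.search(rf"\b([{choices}])\b", text)
    return fallback.group(1) if fallback else None


def extract_yes_no(text):
    """Map a free-form generation to "yes" or "no", or None if neither appears."""
    match = re.search(r"\b(yes|no)\b", text, re.IGNORECASE)
    return match.group(1).lower() if match else None


def extract_json_field(text, field="answer"):
    """Parse the outermost {...} span as JSON and return one field, or None."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(0)).get(field)
    except json.JSONDecodeError:
        return None


# Hypothetical registry keyed by the extractor names listed above.
EXTRACTORS = {
    "choice_letter": extract_choice_letter,
    "yes_no": extract_yes_no,
    "json_field": extract_json_field,
}

print(EXTRACTORS["choice_letter"]("The answer is (B)."))
print(EXTRACTORS["yes_no"]("No, that is unlikely."))
print(EXTRACTORS["json_field"]('Result: {"answer": "C"}'))
```

The fallback in extract_choice_letter can misfire on the English article "A", which is one reason a pipeline like this might offer a regex or llm_extract option for harder outputs.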

Quick Start

python -m venv .venv
source .venv/bin/activate
pip install -e .

Run Inference

python scripts/run_inference.py \
  --dataset groupfairnessllm/random_effect_example \
  --split train \
  --model Qwen/Qwen3-4B \
  --output-dir outputs/qwen3_4b \
  --extractor choice_letter \
  --extraction-prompt-key steu_choice \
  --tensor-parallel-size 1

Fit The Random-Effects Model

python scripts/run_random_effects.py \
  --input outputs/qwen3_4b/predictions.csv \
  --output-dir outputs/qwen3_4b/random_effects \
  --group-col question_id

Expected Dataset Columns

The inference pipeline expects at least:

  • system_prompt
  • user_prompt
  • question_id
  • gold_label
  • gender
  • age
  • religion
  • ethnicity

If your dataset uses different column names, pass --column-map with a TOML or JSON config, or update the CLI flags.

Example:

python scripts/run_inference.py \
  --dataset groupfairnessllm/random_effect_example \
  --split train \
  --model Qwen/Qwen3-4B \
  --output-dir outputs/qwen3_4b \
  --column-map src/personalization_trap/configs/column_map.example.toml
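
As an illustration of what a column map does, the sketch below renames the keys of one dataset row and then checks it against the expected columns listed above. The mapping format (source name to expected name), the helper names, and the example row are all assumptions; the layout of column_map.example.toml is not shown here and may differ. JSON is used only to keep the sketch dependent on the standard library alone.

```python
import json

# Expected columns, copied from the list in this README.
REQUIRED_COLUMNS = [
    "system_prompt", "user_prompt", "question_id", "gold_label",
    "gender", "age", "religion", "ethnicity",
]


def apply_column_map(row, column_map):
    """Rename the keys of one row; keys absent from the map are kept as-is."""
    return {column_map.get(name, name): value for name, value in row.items()}


def missing_columns(row):
    """List the expected columns that a row does not provide."""
    return [name for name in REQUIRED_COLUMNS if name not in row]


# Hypothetical mapping from a dataset's own names to the expected names.
column_map = json.loads(
    '{"sys": "system_prompt", "prompt": "user_prompt",'
    ' "qid": "question_id", "answer": "gold_label"}'
)

# Made-up row using the dataset's own column names.
raw_row = {
    "sys": "You are a helpful assistant.",
    "prompt": "Which option best describes the emotion?",
    "qid": "q1",
    "answer": "A",
    "gender": "female",
    "age": "30",
    "religion": "none",
    "ethnicity": "unspecified",
}

print(missing_columns(raw_row))  # the four unmapped names are reported missing
row = apply_column_map(raw_row, column_map)
print(missing_columns(row))      # []
```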

Notes

  • The project is intentionally anonymous and does not include paper-identifying metadata.
  • The mixed-effects implementation uses statsmodels with a binomial mixed model and question-level random intercepts.
  • The code is designed to be extended for both emotional understanding and recommendation-style tasks.
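
The model in the second note can be written out as follows. This is a sketch under stated assumptions: the README gives only the binomial family, the four demographic factors, and the question-level random intercept, so the exact fixed-effect coding and any interaction terms are not confirmed. Here y_{ij} is 1 when the response for demographic profile i on question j is correct, x_{ij} is an assumed indicator encoding of gender, age, religion, and ethnicity, and u_j is the random intercept for question j.

```latex
\begin{aligned}
y_{ij} \mid u_j &\sim \mathrm{Bernoulli}(p_{ij}), \\
\operatorname{logit}(p_{ij}) &= \beta_0 + x_{ij}^{\top}\beta + u_j, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2).
\end{aligned}
```

Under this reading, the coefficients in \beta describe how each demographic attribute shifts the log-odds of a correct answer, while \sigma_u^2 absorbs differences in difficulty between questions.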
