This repository contains the code needed to reproduce the training and evaluation in our ICLR 2023 paper [Systematic Rectification of Language Models via Dead-end Analysis](https://openreview.net/forum?id=k8_yVW3Wqln) by Meng Cao, Mehdi Fatemi, Jackie CK Cheung, and Samira Shabanian.
Requirements:

- Python >= 3.8
- PyTorch >= 1.7.1
- transformers >= 4.22.0
- accelerate >= 0.12.0
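Assuming a standard pip environment (the exact install command is not part of this repository), the dependencies can be installed with something like:

```bash
pip install "torch>=1.7.1" "transformers>=4.22.0" "accelerate>=0.12.0"
```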
To reproduce the results in the paper, first download the [RealToxicityPrompts](https://allenai.org/data/real-toxicity-prompts) dataset and prepare the training and validation files referenced below.
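The dataset is also mirrored on the HuggingFace Hub; a minimal loading sketch (the JSON format that `train_detox.py` expects from `train.json`/`val.json` is not specified here, so any conversion step is an assumption):

```python
from datasets import load_dataset

# RealToxicityPrompts ships as a single "train" split of ~100k records.
rtp = load_dataset("allenai/real-toxicity-prompts", split="train")

# Each record pairs a prompt with its continuation, plus toxicity scores.
print(rtp[0]["prompt"]["text"])
print(rtp[0]["continuation"]["toxicity"])
```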
To train the rectification (Q) model with `accelerate`:

```bash
OUTPUT_DIR=./models
TRAIN_FILE=./dataset/train.json
VALID_FILE=./dataset/val.json

accelerate launch --config_file training_config.yaml train_detox.py \
    --overwrite_cache true \
    --gamma 1.0 \
    --num_train_epochs 10 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 32 \
    --preprocessing_num_workers 16 \
    --num_warmup_steps 500 \
    --polyak_update_lr 0.5 \
    --gradient_accumulation_steps 1 \
    --learning_rate 5e-5 \
    --train_file $TRAIN_FILE \
    --validation_file $VALID_FILE \
    --model_name_or_path gpt2 \
    --output_dir $OUTPUT_DIR
```
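For context, `--polyak_update_lr` presumably sets the rate of a Polyak (soft) update of the target Q-network used for bootstrapped TD targets, as is standard in DQN-style training. A minimal sketch under that assumption (the helper name is hypothetical, not the repository's API):

```python
import torch

@torch.no_grad()
def polyak_update(online: torch.nn.Module, target: torch.nn.Module, tau: float = 0.5) -> None:
    """Soft update: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    for p_online, p_target in zip(online.parameters(), target.parameters()):
        p_target.mul_(1.0 - tau).add_(tau * p_online)
```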
To generate rectified continuations for the 10k nontoxic prompts:

```bash
MODEL_NAME_OR_PATH=./models/huggingface/gpt2-large
Q_MODEL_PATH=./models
PROMPTS_PATH=./dataset/prompts/nontoxic_prompts-10k.jsonl
OUTPUT_PATH=outputs.jsonl

python decoding.py \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --q_model_path $Q_MODEL_PATH \
    --prompts_path $PROMPTS_PATH \
    --output_path $OUTPUT_PATH \
    --seed 0 \
    --batch_size 1 \
    --num_returns 25 \
    --threshold 0.4 \
    --top_k 30
```
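Conceptually, at each decoding step the learned Q-model is used to remove candidate tokens judged to lead toward a dead end (a toxic continuation) before sampling from the base LM. A minimal sketch of such a filtering step, assuming per-token scores where higher means safer and a fixed `threshold`/`top_k` as in the command above (the tensor layout and comparison direction are assumptions, not the actual code in `decoding.py`):

```python
import torch

def rectified_sampling_step(lm_logits: torch.Tensor,
                            q_scores: torch.Tensor,
                            threshold: float = 0.4,
                            top_k: int = 30) -> torch.Tensor:
    """One decoding step: mask flagged tokens, then top-k sample the rest.

    lm_logits, q_scores: shape (vocab_size,) for the current step.
    """
    logits = lm_logits.clone()
    logits[q_scores < threshold] = float("-inf")  # eliminate flagged tokens
    topk_vals, topk_idx = torch.topk(logits, top_k)  # k most likely survivors
    probs = torch.softmax(topk_vals, dim=-1)
    return topk_idx[torch.multinomial(probs, num_samples=1)]
```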
Please cite as:

```bibtex
@inproceedings{cao2023systematic,
  title={Systematic Rectification of Language Models via Dead-end Analysis},
  author={Meng Cao and Mehdi Fatemi and Jackie CK Cheung and Samira Shabanian},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=k8_yVW3Wqln}
}
```