This is the repository for our paper: [Untying the Reversal Curse via Bidirectional Language Model Editing](https://doi.org/10.48550/arXiv.2310.10322).
Model editing aims to adjust an initial base model's behavior on specific facts while preserving its performance on unrelated knowledge.
This paper studies bidirectional language model editing, introducing a new evaluation metric, reversibility, and a new benchmark, BAKE, to assess whether edited LLMs can accurately recall the edited knowledge in both directions. We also propose a method, BIRD, which mitigates the reversal curse in model editing.
The BAKE benchmark comprises two datasets, BAKE-Q&J and BAKE-J. Both are designed for evaluating counterfactual edits in LLMs. When assessing the reversibility of edited LLMs, two evaluation forms are considered for different relations: question answering (Q) and judgment (J).
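As a concrete illustration of the two evaluation forms, the snippet below builds a reverse Q-form query and a reverse J-form statement for a toy counterfactual edit. The edit and prompt templates are our own hypothetical sketch, not the actual BAKE format:

```python
# Hypothetical illustration of the two reversibility evaluation forms;
# the edit and prompt templates below are NOT the actual BAKE format.
edit = {"subject": "Tim Cook", "relation": "CEO", "target": "Microsoft"}

# Question-answering (Q) form: query the edited relation in the reverse
# direction and check whether the edited model recalls the subject.
q_form = f"Who is the CEO of {edit['target']}?"

# Judgment (J) form: present the reverse statement and ask the edited
# model for a true/false verdict.
j_form = f"The CEO of {edit['target']} is {edit['subject']}. True or False?"

print(q_form)
print(j_form)
```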
The datasets are included in `data/`. There are two files:
- `BAKE_qa.json`: the counterfactual dataset that uses both the question-answering and judgment forms for the evaluation of reversibility; it covers one-to-one and one-to-many relations.
- `BAKE_judge.json`: the counterfactual dataset that uses only the judgment form for the evaluation of reversibility; it covers many-to-one and many-to-many relations.

Besides, we split the two datasets into a train set and a validation set to train the hypernetwork for the MEND method, included in `data/bi/`.
The whole data directory is as follows:
```
data/
|__ BAKE_qa.json
|__ BAKE_judge.json
|__ bi/
    |__ bi_train.json
    |__ bi_val.json
```
You can download these datasets here: [Google Drive].
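Each dataset is a JSON file. A minimal loading sketch is below; the record fields shown in the toy example are illustrative assumptions, not the actual BAKE schema:

```python
import json
import os
import tempfile

def load_bake(path):
    """Load a BAKE dataset file (assumed to be a JSON list of edit records)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Hypothetical usage with a toy stand-in file (field names are assumptions):
toy = [{"prompt": "The CEO of Apple is", "target_new": "Tim Cook"}]
path = os.path.join(tempfile.gettempdir(), "toy_bake.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(toy, f)

records = load_bake(path)
print(len(records))  # 1
```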
Note: please use Python 3.9+. To get started, simply install conda and run:
```shell
git clone https://github.com/mjy1111/BAKE.git
conda create -n BAKE python=3.9.7
...
pip install -r requirements.txt
```
All models are placed in `hugging_cache/<model_name>` (model_name = gpt2-xl, gpt-j-6B, llama-7b, or llama2-7b). These paths can be changed in `hparams/<method_name>/`.
The performance of knowledge editing is measured along these dimensions:
- **Efficacy**: whether the edited models can recall the exact editing fact under editing prompts.
- **Generalization**: whether the edited models can recall the editing fact under paraphrase prompts.
- **Locality**: whether the output of the edited models for inputs out of the editing scope remains unchanged after editing.
- **Reversibility**: the effectiveness of the edited models in recalling the editing knowledge under reverse prompts.
GPT-2 XL (1.5B), GPT-J (6B), LLaMA-1 (7B), and LLaMA-2 (7B) are used for editing. Several model editing methods are used in our paper, including FT, MEND, ROME, MEMIT, and the proposed BIRD.
After downloading the datasets and models, to get started (e.g., using FT to edit GPT-2 XL on the BAKE-Q&J dataset), run:
```shell
python bir.py \
    --alg_name=FT \
    --model_name=gpt2-xl \
    --ds_name=bi_cf_qa \
    --cuda=0 \
    --dataset_size=100  # optional
```
To use the proposed BIRD, run:
```shell
python bir.py \
    --alg_name=BIRD \
    --model_name=gpt2-xl \
    --ds_name=bi_cf_qa \
    --cuda=0 \
    --aerfa=0.0005 \
    --beta=0.8 \
    --dataset_size=100  # optional
```
Results from each run are stored at `results/<data_name>/<method_name>/run_<run_id>`.
To summarize the results (e.g., using ROME to edit GPT-2 XL on the BAKE-Q&J dataset), run:
```shell
python -m experiments.summarize --dir_name=bi_cf_qa/ROME/gpt2-xl
```
All hyperparameters are in `hparams/<method_name>/`, and you can change them as needed.
For ROME and MEMIT, we also provide Wikipedia stats [Google Drive].
To use the MEND method, you should first train a hypernetwork using the data in `data/bi/`; the trained weights will be saved in `data/weights/models/MEND/`. Run:
```shell
python trainer.py
```
Then follow the same steps above to edit models. You can also download these weights here: [Google Drive].
If you use this code and dataset, please cite our paper:
```bibtex
@article{DBLP:journals/corr/abs-2310-10322,
  author  = {Jun{-}Yu Ma and
             Jia{-}Chen Gu and
             Zhen{-}Hua Ling and
             Quan Liu and
             Cong Liu},
  title   = {Untying the Reversal Curse via Bidirectional Language Model Editing},
  journal = {CoRR},
  year    = {2023},
  url     = {https://doi.org/10.48550/arXiv.2310.10322},
}
```
If you have any questions about the repository or the paper, or encounter any problems when using the datasets/code, feel free to email Junyu Ma (mjy1999@mail.ustc.edu.cn) or open an issue!
We express sincere gratitude to EasyEdit and ROME, as we have utilized portions of their source code in our project.