BAKE

This is the repository for our paper: Untying the Reversal Curse via Bidirectional Language Model Editing (arXiv).

Overview

Model editing aims to efficiently adjust the behavior ($x_e \rightarrow y_e$) of an initial base model $f_\theta$ on a particular edit descriptor $(x_e, y_e)$. Previous editing and evaluation approaches operate under a unidirectional paradigm, following only the direction being edited.

This paper studies bidirectional language model editing, introducing a new evaluation metric, reversibility, and a new benchmark, BAKE, to assess whether edited LLMs can accurately recall the edited knowledge in both directions.

We also propose a method, BIRD, which mitigates the reversal curse in model editing.
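
For intuition, consider reversibility on a single counterfactual edit. The snippet below is purely illustrative; the example fact and field names are hypothetical and do not reflect the BAKE schema:

# Forward direction: the edit itself.
edit = {"prompt": "The capital of France is", "target_new": "London"}

# Reverse direction: a model edited bidirectionally (i.e., free of the
# reversal curse) should also complete the reverse prompt correctly.
reverse_probe = "London is the capital of"  # expected answer: "France"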

Datasets

The BAKE benchmark comprises two datasets, BAKE-Q&J and BAKE-J, both designed for evaluating counterfactual edits in LLMs. When assessing the reversibility of edited LLMs, two evaluation forms are considered for different relation types: question answering (Q) and judgment (J).

The datasets are included in data/. There are two files:

  • BAKE_qa.json: the counterfactual dataset that uses both the question answering and judgment forms to evaluate reversibility; it covers one-to-one and one-to-many relations.
  • BAKE_judge.json: the counterfactual dataset that uses only the judgment form to evaluate reversibility; it covers many-to-one and many-to-many relations.

In addition, we split the data into a training set and a validation set for training the hypernetwork used by the MEND method; these splits are included in data/bi/. The full data directory is as follows:

data/
    |__ BAKE_qa.json
    |__ BAKE_judge.json
    |__ bi/
        |__ bi_train.json
        |__ bi_val.json

You can download these datasets here: [Google Drive].
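
Once downloaded, the datasets can be inspected directly. A minimal sketch, assuming each file is a standard JSON array of edit cases (the per-record schema is not shown here):

import json

with open("data/BAKE_qa.json") as f:
    bake_qa = json.load(f)
with open("data/BAKE_judge.json") as f:
    bake_judge = json.load(f)

print(len(bake_qa), "Q&J cases;", len(bake_judge), "judgment-only cases")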

Prepare the environment

Requirements

Note: please use Python 3.9+. To get started, install conda and run:

git clone https://github.com/mjy1111/BAKE.git
conda create -n BAKE python=3.9.7
...
pip install -r requirements.txt

Models

All models are placed in hugging_cache/<model_name> (model_name = gpt2-xl, gpt-j-6B, llama-7b, or llama2-7b). These paths can be changed in hparams/<method_name>/.
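
For example, once the checkpoints are in place, a model can be loaded from this local cache with Hugging Face Transformers (a minimal sketch):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2-xl"  # any of the names listed above
tok = AutoTokenizer.from_pretrained(f"hugging_cache/{model_name}")
model = AutoModelForCausalLM.from_pretrained(f"hugging_cache/{model_name}")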

Evaluation

The performance of knowledge editing is measured along the following dimensions:

  • Efficacy: whether the edited model recalls the exact edited fact under the editing prompts
  • Generalization: whether the edited model recalls the edited fact under paraphrased prompts
  • Locality: whether the edited model's outputs on inputs outside the editing scope remain unchanged
  • Reversibility: whether the edited model recalls the edited knowledge under reverse prompts
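
As a rough illustration of how these dimensions are scored, a generic sketch (not the repository's actual evaluation code):

def success_rate(answers, targets):
    """Fraction of prompts where the model's answer matches the target."""
    hits = sum(a.strip().lower() == t.strip().lower()
               for a, t in zip(answers, targets))
    return hits / len(targets)

# Efficacy scores answers on editing prompts, Generalization on paraphrase
# prompts, and Reversibility on reverse prompts; Locality instead compares
# pre-edit and post-edit outputs on out-of-scope inputs.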

GPT-2 XL (1.5B), GPT-J (6B), LLaMA-1 (7B) and LLaMA-2 (7B) are used for editing.

The following model editing methods are used in our paper:

  • FT: fine-tuning with an $L_\infty$ constraint
  • MEND: Mitchell et al., hypernetwork-based editing
  • KN: Dai et al., knowledge neurons (locate then edit)
  • ROME: Meng et al., locate and edit
  • MEMIT: Meng et al., locate and edit

Running the evaluation

After downloading the datasets and models, to get started (e.g., using FT to edit GPT-2 XL on the BAKE-Q&J dataset), run:

python bir.py \
    --alg_name=FT \
    --model_name=gpt2-xl \
    --ds_name=bi_cf_qa \
    --cuda=0 \
    --dataset_size=100  # optional

To use the proposed BIRD, run:

python bir.py \
    --alg_name=BIRD \
    --model_name=gpt2-xl \
    --ds_name=bi_cf_qa \
    --cuda=0 \
    --aerfa=0.0005 \
    --beta=0.8 \
    --dataset_size=100  # optional

Results from each run are stored at results/<data_name>/<method_name>/run_<run_id>.
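
To load a run's per-case results for further analysis, a minimal sketch (the run id below is hypothetical, and the assumption that each case is stored as an individual JSON file is ours):

import json
from pathlib import Path

run_dir = Path("results/bi_cf_qa/FT/run_000")  # hypothetical run id
cases = [json.loads(p.read_text()) for p in sorted(run_dir.glob("*.json"))]
print(f"loaded {len(cases)} case files from {run_dir}")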

To summarize the results (e.g., using ROME to edit GPT-2 XL on the BAKE-Q&J dataset), run:

python -m experiments.summarize  --dir_name=bi_cf_qa/ROME/gpt2-xl

All hyperparameters are in hparams/<method_name>/ and can be changed as needed.

For ROME and MEMIT, we also provide Wikipedia stats [Google Drive].

Trainer

To use the MEND method, first train a hypernetwork on the data in data/bi/; the weights will be saved in data/weights/models/MEND/. Then follow the same steps above to edit models. Run:

python trainer.py

You can also download these weights here: [Google Drive].

Citation

If you use this code and dataset, please cite our paper:

@article{DBLP:journals/corr/abs-2310-10322,
  author       = {Jun{-}Yu Ma and
                  Jia{-}Chen Gu and
                  Zhen{-}Hua Ling and
                  Quan Liu and
                  Cong Liu},
  title        = {Untying the Reversal Curse via Bidirectional Language Model Editing},
  journal      = {CoRR},
  volume       = {abs/2310.10322},
  year         = {2023},
  url          = {https://doi.org/10.48550/arXiv.2310.10322},
}

Questions?

If you have any questions related to the repository or the paper, or you encounter any problems when using the datasets/code, feel free to email Junyu Ma (mjy1999@mail.ustc.edu.cn) or open an issue!

Related Projects

We express sincere gratitude to EasyEdit and ROME, as we have utilized portions of their source code in our project.
