# EasyEdit with **RoseLoRA**

This notebook shows how to use regular `LoRA` and the newer `RoseLoRA` method to edit GPT-like models.

## Model Editing

Deployed models may still make unpredictable errors. For example, Large Language Models (LLMs) notoriously hallucinate, perpetuate bias, and factually decay, so we should be able to adjust specific behaviors of pre-trained models.

**Model editing** aims to adjust the behavior of an initial base model $f_\theta$ on a particular edit descriptor $[x_e, y_e]$, such as:
- $x_e$: "Who is the president of the US?"
- $y_e$: "Joe Biden."

efficiently and without influencing the model's behavior on unrelated samples. The ultimate goal is to create an edited model $f_{\theta'}$.
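
The definition above can be illustrated with a toy sketch. It is not how LoRA or RoseLoRA edit a network: the "model" here is a hypothetical lookup function, and `toy_base_model`, `apply_toy_edit`, and the stored answers are made up for illustration only. It shows just the two requirements: the edited model returns $y_e$ on $x_e$ and behaves like the base model elsewhere.

```python
from typing import Callable

Model = Callable[[str], str]


def toy_base_model(prompt: str) -> str:
    """Hypothetical stand-in for f_theta: a lookup table, not a neural network."""
    answers = {
        "Who is the president of the US?": "(outdated answer)",
        "What is the capital of France?": "Paris.",
    }
    return answers.get(prompt, "I don't know.")


def apply_toy_edit(model: Model, x_e: str, y_e: str) -> Model:
    """Return a new model that answers y_e on x_e and defers to `model` otherwise."""
    def edited(prompt: str) -> str:
        return y_e if prompt == x_e else model(prompt)
    return edited


x_e, y_e = "Who is the president of the US?", "Joe Biden."
toy_edited = apply_toy_edit(toy_base_model, x_e, y_e)

print(toy_edited(x_e))                               # edited behavior on x_e
print(toy_edited("What is the capital of France?"))  # unrelated prompt, unchanged
```

Real editing methods only approximate this ideal, which is why the reliability, generalization, and locality tests later in the notebook are needed.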

## 📂 Data Preparation

The datasets used can be found [here](https://huggingface.co/datasets/zjunlp/KnowEdit).
We ran our experiments on the ZsRE dataset.


## Prepare the runtime environment

In [2]:
## Clone Repo
# !git clone https://github.com/zjunlp/EasyEdit
%cd EasyEdit
!cd

c:\Users\shema\OneDrive\Documents\Learning\AIM_Linal\RoseLora\EasyEdit




c:\Users\shema\OneDrive\Documents\Learning\AIM_Linal\RoseLora\EasyEdit


In [None]:
# Feel free to adjust according to your env

# !apt-get install python3.9
# !sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.9 1
# !sudo update-alternatives --config python3
# !apt-get install python3-pip
#!pip install -r requirements.txt

# Also download these resources
import nltk
nltk.download('punkt')
nltk.download('punkt_tab') 

## Configure Method Parameters

```yaml
alg_name: "LoRA"  # or "RoseLora"
model_name: "../hugging_cache/gpt2"
device: 0

lora_type: "lora"
layers: []
num_steps: 30
batch_size: 1
max_length: 30
lr: 5e-3
weight_decay: 0
kl_factor: 0
rank: 4
lora_alpha: 16
lora_dropout: 0.1
norm_constraint: false
target_modules: ["c_attn"] #["q_proj", "v_proj"]  #["up_proj", "down_proj"] #["q_proj", "v_proj"]
model_parallel: true


```
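
To make the `rank` and `lora_alpha` settings more concrete, here is a minimal sketch of what they imply under the standard LoRA formulation, where the low-rank update is scaled by `lora_alpha / rank`. The helper names are hypothetical, and the 768 x 2304 shape is an assumption about the `c_attn` projection in the smallest GPT-2; other adapter variants (and therefore the counts printed by an actual run) can differ.

```python
# Values copied from the config above.
hparams = {"rank": 4, "lora_alpha": 16, "lora_dropout": 0.1, "target_modules": ["c_attn"]}


def lora_scaling(rank: int, lora_alpha: float) -> float:
    """Scaling applied to the low-rank update B @ A in standard LoRA."""
    return lora_alpha / rank


def lora_params_per_matrix(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Assumed shape of GPT-2 small's c_attn projection (hidden size 768, fused q/k/v output 2304).
d_in, d_out = 768, 2304

scaling = lora_scaling(hparams["rank"], hparams["lora_alpha"])
extra = lora_params_per_matrix(d_in, d_out, hparams["rank"])
full = d_in * d_out

print(f"scaling = {scaling}")
print(f"adapter params per c_attn matrix = {extra} (vs {full} in the frozen weight)")
```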

## Import models & Run

### Edit GPT-2 on ZsRE with LoRA

In [25]:
# Example prompts 

from easyeditor import BaseEditor
from easyeditor import LoRAHyperParams

prompts = ['Question:What sport does Lionel Messi play? Answer:',
                'Question:What role does Cristiano Ronaldo play in football? Answer:',
                'Question:Which NBA team does Stephen Curry play for? Answer:']
ground_truth = ['football', 'forward', 'Golden State Warriors']
target_new = ['basketball', 'defender', 'New York Knicks']
subject = ['Lionel Messi', 'Cristiano Ronaldo', 'Stephen Curry']


In [None]:
# Download GPT-2 and save it to ./hugging_cache/

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

model.save_pretrained("./hugging_cache/gpt2")
tokenizer.save_pretrained("./hugging_cache/gpt2")

In [None]:
# Use GPT2 from ./hugging_cache/ folder

from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = './hugging_cache/gpt2'

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
tokenizer.pad_token_id = tokenizer.eos_token_id
tokenizer.padding_side = 'left'

# Set device appropriate to your env
device = 0
model = AutoModelForCausalLM.from_pretrained(model_path).to(f'cuda:{device}')

In [None]:
hparams = LoRAHyperParams.from_hparams('./hparams/LoRA/gpt2.yaml')
# hparams = LoRAHyperParams.from_hparams('./hparams/LoRA/gpt2_RoseLora.yaml')

editor = BaseEditor.from_hparams(hparams)

# If you are running on CPU, set peft_model.model_parallel = False in lora_main; otherwise it is set to parallel automatically and raises an error
metrics, edited_model, _ = editor.edit(
    prompts=prompts,
    ground_truth=ground_truth,
    target_new=target_new,
    subject=subject,
    sequential_edit=True,
)

2024-12-11 14:48:24,254 - easyeditor.editors.editor - INFO - Instantiating model
12/11/2024 14:48:24 - INFO - easyeditor.editors.editor -   Instantiating model
2024-12-11 14:48:24,427 - easyeditor.editors.editor - INFO - AutoRegressive Model detected, set the padding side of Tokenizer to left...
12/11/2024 14:48:24 - INFO - easyeditor.editors.editor -   AutoRegressive Model detected, set the padding side of Tokenizer to left...
100%|██████████| 3/3 [00:04<00:00,  1.44s/it]


trainable params: 442,512 || all params: 124,882,332 || trainable%: 0.35434315880648354
Executing LoRA algo for: [Question:What sport does Lionel Messi play? Answer:] -> [basketball]
Epoch: 0
Batch loss 7.8641743659973145
Total loss 7.8641743659973145
Epoch: 1
Batch loss 7.754281997680664
Total loss 7.754281997680664
Epoch: 2
Batch loss 7.110634803771973
Total loss 7.110634803771973
Epoch: 3
Batch loss 6.0002665519714355
Total loss 6.0002665519714355
Epoch: 4
Batch loss 4.738918304443359


 33%|███▎      | 1/3 [00:10<00:20, 10.32s/it]

Total loss 4.738918304443359
Executing LoRA algo for: [Question:What role does Cristiano Ronaldo play in football? Answer:] -> [defender]
Epoch: 0
Batch loss 8.963287353515625
Total loss 8.963287353515625
Epoch: 1
Batch loss 6.899635314941406
Total loss 6.899635314941406
Epoch: 2
Batch loss 5.036787986755371
Total loss 5.036787986755371
Epoch: 3
Batch loss 3.635680675506592
Total loss 3.635680675506592
Epoch: 4
Batch loss 2.380091905593872


 67%|██████▋   | 2/3 [00:19<00:09,  9.67s/it]

Total loss 2.380091905593872
Executing LoRA algo for: [Question:Which NBA team does Stephen Curry play for? Answer:] -> [New York Knicks]
Epoch: 0
Batch loss 1.9427868127822876
Total loss 1.9427868127822876
Epoch: 1
Batch loss 1.3496170043945312
Total loss 1.3496170043945312
Epoch: 2
Batch loss 0.9695026278495789
Total loss 0.9695026278495789
Epoch: 3
Batch loss 0.510824978351593
Total loss 0.510824978351593
Epoch: 4
Batch loss 0.12914708256721497


100%|██████████| 3/3 [00:28<00:00,  9.62s/it]

Total loss 0.12914708256721497



2024-12-11 14:48:59,084 - easyeditor.editors.editor - INFO - 0 editing: Question:What sport does Lionel Messi play? Answer: -> basketball  

 {'pre': {'rewrite_acc': [0.0], 'portability': {}}, 'case_id': 0, 'requested_rewrite': {'prompt': 'Question:What sport does Lionel Messi play? Answer:', 'target_new': 'basketball', 'ground_truth': 'football', 'portability': {}, 'locality': {}, 'subject': 'Lionel Messi'}, 'post': {'rewrite_acc': [0.0], 'locality': {}, 'portability': {}}}
12/11/2024 14:48:59 - INFO - easyeditor.editors.editor -   0 editing: Question:What sport does Lionel Messi play? Answer: -> basketball  

 {'pre': {'rewrite_acc': [0.0], 'portability': {}}, 'case_id': 0, 'requested_rewrite': {'prompt': 'Question:What sport does Lionel Messi play? Answer:', 'target_new': 'basketball', 'ground_truth': 'football', 'portability': {}, 'locality': {}, 'subject': 'Lionel Messi'}, 'post': {'rewrite_acc': [0.0], 'locality': {}, 'portability': {}}}
2024-12-11 14:49:00,163 - easyeditor.edit

Metrics Summary:  {'pre': {'rewrite_acc': 0.1111111111111111}, 'post': {'rewrite_acc': 0.3333333333333333}}
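
The summary line can be related to the per-case records with a small sketch. It assumes the summary is a plain average of each case's `rewrite_acc`; the function name `summarize_rewrite_acc` is hypothetical, and only case 0 below is copied from the log. Cases 1 and 2 are made-up values chosen to be consistent with the logged summary.

```python
from statistics import mean


def summarize_rewrite_acc(metrics: list) -> dict:
    """Average per-case rewrite accuracy before and after editing."""
    summary = {}
    for phase in ("pre", "post"):
        per_case = [mean(case[phase]["rewrite_acc"]) for case in metrics]
        summary[phase] = {"rewrite_acc": mean(per_case)}
    return summary


example_metrics = [
    # Case 0: values taken from the log above.
    {"case_id": 0, "pre": {"rewrite_acc": [0.0]}, "post": {"rewrite_acc": [0.0]}},
    # Cases 1 and 2: hypothetical values, not taken from the log.
    {"case_id": 1, "pre": {"rewrite_acc": [0.0]}, "post": {"rewrite_acc": [0.0]}},
    {"case_id": 2, "pre": {"rewrite_acc": [1 / 3]}, "post": {"rewrite_acc": [1.0]}},
]

example_summary = summarize_rewrite_acc(example_metrics)
print(example_summary)
```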


* edit_data: editing instance in edit set.
* loc_data: used to provide xi in Equation 5, sampled from the train set.
* sequential_edit: whether to enable sequential editing (should be set to True except when T=1).
***

### Reliability Test

In [9]:
# Model before changes
 
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = './hugging_cache/gpt2'

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
tokenizer.pad_token_id = tokenizer.eos_token_id
tokenizer.padding_side = 'left'

device = 0
model = AutoModelForCausalLM.from_pretrained(model_path).to(f'cuda:{device}')

In [7]:
correct_prompts = [ 'Question:What sport does Lionel Messi play? Answer:',
                    'Question:What role does Cristiano Ronaldo play in football? Answer:',
                    'Question:Which NBA team does Stephen Curry play for? Answer:']
# target_new = ['basketball', 'defender', 'New York Knicks']
batch = tokenizer(correct_prompts, return_tensors='pt', padding=True)

pre_edit_outputs = model.generate(
    input_ids=batch['input_ids'].to(model.device),
    attention_mask=batch['attention_mask'].to(model.device),
    pad_token_id = tokenizer.eos_token_id,
    max_new_tokens=10
)
post_edit_outputs = edited_model.generate(
    input_ids=batch['input_ids'].to(edited_model.device),
    attention_mask=batch['attention_mask'].to(edited_model.device),
    pad_token_id = tokenizer.eos_token_id,
    max_new_tokens=3
)
max_length = batch['input_ids'].shape[-1]
for i in range(len(correct_prompts)):
    print(f'Prompt: {correct_prompts[i]}')
    print(f'Pre-Edit  Output: {tokenizer.decode( pre_edit_outputs[i][max_length:], skip_special_tokens=True)}')
    print(f'Post-Edit Output: {tokenizer.decode(post_edit_outputs[i][max_length:], skip_special_tokens=True)}')
    print('--'*50 )

Prompt: Question:What sport does Lionel Messi play? Answer:
Pre-Edit  Output: Football.

Question:What is the best
Post-Edit Output:  football, basketball
----------------------------------------------------------------------------------------------------
Prompt: Question:What role does Cristiano Ronaldo play in football? Answer:
Pre-Edit  Output: He plays a lot of football. He's a
Post-Edit Output:  goalkeeper)

----------------------------------------------------------------------------------------------------
Prompt: Question:Which NBA team does Stephen Curry play for? Answer:
Pre-Edit  Output: The Warriors.

The Warriors are the only
Post-Edit Output:  New York Knicks
----------------------------------------------------------------------------------------------------
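
The post-edit outputs above can be scored against the edit targets with a crude substring check. This is a minimal sketch, not EasyEdit's `rewrite_acc` metric; `contains_target` and the variable names are hypothetical, and the strings are copied from the printed outputs.

```python
def contains_target(output: str, target: str) -> bool:
    """Case-insensitive check that the generated text mentions the edit target."""
    return target.lower() in output.lower()


# Post-edit generations printed above, paired with the corresponding edit targets.
post_edit_texts = [" football, basketball", " goalkeeper)", " New York Knicks"]
edit_targets = ["basketball", "defender", "New York Knicks"]

hits = [contains_target(out, tgt) for out, tgt in zip(post_edit_texts, edit_targets)]
hit_rate = sum(hits) / len(hits)

for tgt, hit in zip(edit_targets, hits):
    print(f"{tgt!r}: {'found' if hit else 'missing'}")
print(f"Substring hit rate: {hit_rate:.2f}")
```

A substring match is lenient: the first output still contains the original answer "football" next to the new target, so a hit here does not mean the old fact was overwritten.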


### Generalization test

In [None]:
generation_prompts =[   'Question:What sports is Messi good at? Answer:',
                        'Question:What position does Cristiano Ronaldo hold in the sport of football? Answer:',
                        'Question:Which city does Stephen Curry currently working in? Answer:']

batch = tokenizer(generation_prompts , return_tensors='pt', padding=True)

pre_edit_outputs = model.generate(
    input_ids=batch['input_ids'].to(model.device),
    attention_mask=batch['attention_mask'].to(model.device),
    pad_token_id = tokenizer.eos_token_id,
    max_new_tokens=10
    
)
post_edit_outputs = edited_model.generate(
    input_ids=batch['input_ids'].to(edited_model.device),
    attention_mask=batch['attention_mask'].to(edited_model.device),
    pad_token_id = tokenizer.eos_token_id,
    max_new_tokens=3
)
max_length = batch['input_ids'].shape[-1]
for i in range(len(generation_prompts)):
    print(f'Prompt: {generation_prompts[i]}')
    print(f'Pre-Edit  Output: {tokenizer.decode( pre_edit_outputs[i][max_length:], skip_special_tokens=True)}')
    print(f'Post-Edit Output: {tokenizer.decode(post_edit_outputs[i][max_length:], skip_special_tokens=True)}')
    print('--'*50 )

### Locality test

In [8]:
locality_prompts = ['Question:What sport does Kylian Mbappé play? Answer:',
                'Question:What role does Thierry Henry play in football? Answer:',
                'Question:Which NBA team does Jordan play for? Answer:']

batch = tokenizer(locality_prompts, return_tensors='pt', padding=True)

pre_edit_outputs = model.generate(
    input_ids=batch['input_ids'].to(model.device),
    attention_mask=batch['attention_mask'].to(model.device),
    pad_token_id = tokenizer.eos_token_id,
    max_new_tokens=10
    
)
post_edit_outputs = edited_model.generate(
    input_ids=batch['input_ids'].to(edited_model.device),
    attention_mask=batch['attention_mask'].to(edited_model.device),
    pad_token_id = tokenizer.eos_token_id,
    max_new_tokens=3
)
max_length = batch['input_ids'].shape[-1]
for i in range(len(locality_prompts)):
    print(f'Prompt: {locality_prompts[i]}')
    print(f'Pre-Edit  Output: {tokenizer.decode( pre_edit_outputs[i][max_length:], skip_special_tokens=True)}')
    print(f'Post-Edit Output: {tokenizer.decode(post_edit_outputs[i][max_length:], skip_special_tokens=True)}')
    print('--'*50 )


Prompt: Question:What sport does Kylian Mbappé play? Answer:
Pre-Edit  Output: Football/Soccer
Kylian Mbappé is
Post-Edit Output:  New York Knicks
----------------------------------------------------------------------------------------------------
Prompt: Question:What role does Thierry Henry play in football? Answer:
Pre-Edit  Output:  Thierry Henry is a former French professional footballer
Post-Edit Output:  New York Knicks
----------------------------------------------------------------------------------------------------
Prompt: Question:Which NBA team does Jordan play for? Answer:
Pre-Edit  Output: He plays for the Washington Wizards.
I'm going
Post-Edit Output:  New York Knicks
----------------------------------------------------------------------------------------------------
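
Locality asks that answers to unrelated prompts stay the same after editing. A minimal sketch of such a check on the outputs above follows. It compares by prefix because the pre-edit generation uses 10 new tokens and the post-edit one only 3; `is_unchanged` is a hypothetical helper, not an EasyEdit function, and the strings are copied from the printed outputs.

```python
def is_unchanged(pre_text: str, post_text: str) -> bool:
    """True if the (shorter) post-edit output is a prefix of the pre-edit output."""
    pre, post = pre_text.strip(), post_text.strip()
    return bool(post) and pre.startswith(post)


# Pre-edit and post-edit generations printed above for the three locality prompts.
pre_texts = [
    "Football/Soccer\nKylian Mbappé is",
    " Thierry Henry is a former French professional footballer",
    "He plays for the Washington Wizards.\nI'm going",
]
post_texts = [" New York Knicks", " New York Knicks", " New York Knicks"]

unchanged = [is_unchanged(pre, post) for pre, post in zip(pre_texts, post_texts)]
locality_rate = sum(unchanged) / len(unchanged)
print(f"Unchanged outputs: {sum(unchanged)}/{len(unchanged)} (rate {locality_rate:.2f})")
```

On these three prompts none of the outputs is preserved: after sequential editing the model answers "New York Knicks" to every one of them.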


_______________________________________________________________________________________

### Running Benchmarks of Knowledge Editing

To test a model on a benchmark, follow these steps:
- Create a ```Data``` folder in the ```EasyEdit``` root and put the benchmark data files there. We used ```ZsRE-test-all.json```.
- Create config files for your models and put them in the ```./hparams/LoRA``` folder. See our examples: ```gpt2.yaml``` and ```gpt2_RoseLora.yaml```.
- Run the ```.\examples\run_knowedit_gpt2.py``` file with these parameters:
  - LoRA: ```python run_knowedit_gpt2.py --editing_method=LoRA --hparams_dir=../hparams/LoRA/gpt2.yaml --data_dir=./data/ZsRE-test-all.json --datatype='zsre'```
  - RoseLoRA: ```python run_knowedit_gpt2.py --editing_method=LoRA --hparams_dir=../hparams/LoRA/gpt2_RoseLora.yaml --data_dir=./data/ZsRE-test-all.json --datatype='zsre'```
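
The two benchmark commands differ only in the config file, so they can be generated from one template. This is a small sketch that only builds and prints the command lines; the script name and flags are taken from the steps above, while `build_command` and the commented-out `subprocess` call (including its working directory) are assumptions.

```python
import shlex


def build_command(hparams_file: str,
                  data_file: str = "./data/ZsRE-test-all.json",
                  datatype: str = "zsre") -> list:
    """Assemble the argument list for run_knowedit_gpt2.py."""
    return [
        "python", "run_knowedit_gpt2.py",
        "--editing_method=LoRA",
        f"--hparams_dir={hparams_file}",
        f"--data_dir={data_file}",
        f"--datatype={datatype}",
    ]


configs = {
    "LoRA": "../hparams/LoRA/gpt2.yaml",
    "RoseLoRA": "../hparams/LoRA/gpt2_RoseLora.yaml",
}
commands = {name: build_command(path) for name, path in configs.items()}

for name, cmd in commands.items():
    print(f"{name}: {shlex.join(cmd)}")
    # To actually run it (assumption: executed from the EasyEdit root):
    # import subprocess; subprocess.run(cmd, check=True, cwd="examples")
```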

We obtained these results on the ZsRE dataset for regular LoRA:
```
Edit_Succ: 96.8361581920904
Overall_portability: 31.421845574387948
Overall_locality: 15.88848533763788
Fluency: 218.45916880221256

Edit_Succ: 98.65192220880155
Overall_portability: 32.29200095249788
Overall_locality: 10.206552511029836
Fluency: 246.82219825233577
```


And for RoseLoRA:
```
Edit_Succ: 100.0
Overall_portability: 41.73863330642991
Overall_locality: 35.30030938929244
Fluency: 230.64753117928856
```

Here is the list of changes needed to use RoseLoRA with EasyEdit:
- Add ```roselora_main.py```, ```roselora_model.py```, and ```roselora_layer.py``` to ```./easyeditor/models/lora```
- Add references in ```./lora/__init__.py```
- Add RoseLora to ALG_DICT in ```./easyeditor/utils/alg_dict.py```
- And changes in ```./easyeditor/editor.py``` to process RoseLora correctly in 