# Tier-3 Risk: Benign Fine-tuning (Alpaca)

## Step 0: Setup

In [None]:
# Use `create` instead of `update` if running for the first time
%conda env update -f environment.yml
%conda activate llama2

In [None]:
%env OPENAI_API_KEY=<ADD_KEY_HERE>

## Step 1: Eval base model

Run the safety and utility evals on the base model first to establish a baseline.

### Safety evals

First, generate the answers of the baseline model (with 1 A100 GPU):

In [None]:
!python -u safety_evaluation/question_inference.py \
  --model_name TheBloke/Llama-2-7B-Chat-fp16 \
  --prompt_file safety_evaluation/data/demo_examples.csv \
  --prompt_template_style alpaca \
  --output_file safety_evaluation/question_output/demo_examples_llama_7b.jsonl
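
Before paying for GPT-4 judgments, it can help to sanity-check the generated file. The following is a minimal sketch, assuming only that the output is JSON Lines (one JSON value per line, as the `.jsonl` extension suggests); the helper name `load_jsonl` is hypothetical and the structure of each record is not assumed.

```python
import json


def load_jsonl(path):
    """Read a JSON Lines file into a list of parsed values, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Example usage (run after the inference cell above has finished):
# records = load_jsonl("safety_evaluation/question_output/demo_examples_llama_7b.jsonl")
# print(len(records))   # number of generated answers
# print(records[0])     # inspect the first record
```

If the count does not match the number of prompts in `demo_examples.csv`, the inference run was probably interrupted.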

Then, launch the GPT-4 Judge:

In [None]:
!python safety_evaluation/gpt4_eval.py --input_file safety_evaluation/question_output/demo_examples_llama_7b.jsonl

### Capability evals

Generate the model's answers to the 80 MT-Bench questions:

In [None]:
# Install FastChat
!git clone https://github.com/lm-sys/FastChat.git && \
pip install -e 'FastChat[model_worker,llm_judge]'

In [None]:
!python -u utility_evaluation/mt_bench/gen_model_answer.py \
    --model_name TheBloke/Llama-2-7B-Chat-fp16 \
    --model_id Llama-2-7B-Chat-fp16 \
    --prompt_template_style alpaca

Generate GPT-4 judgments for these answers:

In [None]:
!python utility_evaluation/mt_bench/gen_judgment.py --model-list Llama-2-7B-Chat-fp16

Show summary of the evaluation results (e.g. average score):

In [None]:
!python utility_evaluation/mt_bench/show_result.py
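
For orientation, the average score is in essence a per-model mean over the judgment records. A minimal sketch, assuming the judgments are stored as JSON Lines with `model` and `score` fields and that a score of -1 marks a judgment that could not be parsed (as in FastChat's single-answer grading output); the function name `average_scores` is hypothetical and this is not the script's actual code.

```python
import json
from collections import defaultdict


def average_scores(lines):
    """Return {model: mean score} from an iterable of JSON Lines strings."""
    totals = defaultdict(lambda: [0.0, 0])  # model -> [sum of scores, count]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record["score"] == -1:  # assumed marker for an unparsable judgment
            continue
        entry = totals[record["model"]]
        entry[0] += record["score"]
        entry[1] += 1
    return {model: total / count for model, (total, count) in totals.items()}


# Toy records for illustration only; real judgments come from gen_judgment.py.
toy_lines = [
    '{"model": "demo-model", "turn": 1, "score": 8}',
    '{"model": "demo-model", "turn": 2, "score": 6}',
    '{"model": "demo-model", "turn": 2, "score": -1}',
]
print(average_scores(toy_lines))
```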

## Step 2: Full finetuning + eval on Alpaca dataset

### Finetuning

Fully finetune the base model on the Alpaca dataset:

In [None]:
!torchrun --nnodes 1 --nproc_per_node 2 finetuning.py \
  --model_name TheBloke/Llama-2-7B-Chat-fp16 \
  --pure_bf16 \
  --batch_size_training 64 \
  --gradient_accumulation_steps 1 \
  --lr 5e-5 \
  --num_epochs 3 \
  --dataset alpaca_dataset \
  --dist_checkpoint_root_folder finetuned_models/ \
  --dist_checkpoint_folder alpaca-7b-fullft \
  --enable_fsdp=false

Then, convert the checkpoint to Hugging Face (HF) format:

In [None]:
!python inference/checkpoint_converter_fsdp_hf.py \
  -fsdp_checkpoint_path "finetuned_models/alpaca-7b-fullft-epoch=3-TheBloke/Llama-2-7B-Chat-fp16/" \
  -consolidated_model_path "finetuned_models/alpaca-7b-fullft/" \
  -HF_model_path_or_name "TheBloke/Llama-2-7B-Chat-fp16"

### Safety evals

First, generate the answers of the fully finetuned model (with 1 A100 GPU):

In [None]:
!python -u safety_evaluation/question_inference.py \
  --model_name finetuned_models/alpaca-7b-fullft  \
  --prompt_file safety_evaluation/data/demo_examples.csv \
  --prompt_template_style alpaca \
  --output_file safety_evaluation/question_output/demo_examples_alpaca_7b_fullft.jsonl

Then, launch the GPT-4 Judge:

In [None]:
!python safety_evaluation/gpt4_eval.py --input_file safety_evaluation/question_output/demo_examples_alpaca_7b_fullft.jsonl

### Capability evals

Generate the model's answers to the 80 MT-Bench questions:

In [None]:
# Install FastChat (skip if already installed in Step 1)
!git clone https://github.com/lm-sys/FastChat.git && \
pip install -e 'FastChat[model_worker,llm_judge]'

In [None]:
!python -u utility_evaluation/mt_bench/gen_model_answer.py \
    --model_name finetuned_models/alpaca-7b-fullft \
    --model_id alpaca-7b-fullft \
    --prompt_template_style alpaca 

Generate GPT-4 judgments for these answers:

In [None]:
!python utility_evaluation/mt_bench/gen_judgment.py --model-list alpaca-7b-fullft

Show summary of the evaluation results (e.g. average score):

In [None]:
!python utility_evaluation/mt_bench/show_result.py

## Step 3: LoRA finetuning + eval on Alpaca dataset

### Finetuning

Finetune with LoRA on the Alpaca dataset:

In [None]:
!torchrun --nnodes 1 --nproc_per_node 2 finetuning.py \
  --model_name TheBloke/Llama-2-7B-Chat-fp16 \
  --use_peft=True \
  --pure_bf16 \
  --batch_size_training 64 \
  --gradient_accumulation_steps 1 \
  --lr 5e-5 \
  --num_epochs 1 \
  --dataset alpaca_dataset \
  --dist_checkpoint_root_folder finetuned_models/ \
  --dist_checkpoint_folder alpaca-7b-lora \
  --enable_fsdp=false

Move the generated model files into the folder that the eval commands expect:

In [None]:
!mv finetuned_models-epoch=1 finetuned_models/alpaca-7b-lora

### Safety evals

First, generate the answers of the LoRA-finetuned model (with 1 A100 GPU):

In [None]:
!python -u safety_evaluation/question_inference.py \
  --model_name TheBloke/Llama-2-7B-Chat-fp16 \
  --peft-model finetuned_models/alpaca-7b-lora  \
  --prompt_file safety_evaluation/data/demo_examples.csv \
  --prompt_template_style alpaca \
  --output_file safety_evaluation/question_output/demo_examples_alpaca_7b_lora.jsonl

Then, launch the GPT-4 Judge:

In [None]:
!python safety_evaluation/gpt4_eval.py --input_file safety_evaluation/question_output/demo_examples_alpaca_7b_lora.jsonl

### Capability evals

Generate the model's answers to the 80 MT-Bench questions:

In [None]:
# Install FastChat (skip if already installed in Step 1)
!git clone https://github.com/lm-sys/FastChat.git && \
pip install -e 'FastChat[model_worker,llm_judge]'

In [None]:
!python -u utility_evaluation/mt_bench/gen_model_answer.py \
    --model_name TheBloke/Llama-2-7B-Chat-fp16 \
    --peft-model finetuned_models/alpaca-7b-lora  \
    --model_id alpaca-7b-lora \
    --prompt_template_style alpaca 

Generate GPT-4 judgments for these answers:

In [None]:
!python utility_evaluation/mt_bench/gen_judgment.py --model-list alpaca-7b-lora

Show summary of the evaluation results (e.g. average score):

In [None]:
!python utility_evaluation/mt_bench/show_result.py

## Step 4: LoRA finetuning + eval on Alpaca dataset mixed with safety data (Saferpaca)

Mixing safety data into the finetuning set can reduce or eliminate safety drift. See the paper for details: https://arxiv.org/abs/2309.07875
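
The mixing idea can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `mix_in_safety_examples`, the sample size, and the toy records (in Alpaca-style `instruction`/`output` form) are hypothetical, and the Saferpaca file used in this step is already prepared, so nothing here needs to be run.

```python
import random


def mix_in_safety_examples(task_examples, safety_examples, n_safety, seed=0):
    """Return the task examples plus n_safety sampled safety examples, shuffled."""
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    sampled = rng.sample(safety_examples, n_safety)  # raises ValueError if n_safety is too large
    mixed = list(task_examples) + sampled
    rng.shuffle(mixed)
    return mixed


# Toy data for illustration only.
task = [{"instruction": f"task {i}", "output": "..."} for i in range(5)]
safety = [{"instruction": f"unsafe request {i}", "output": "refusal"} for i in range(3)]
mixed = mix_in_safety_examples(task, safety, n_safety=2)
print(len(mixed))
```

Shuffling spreads the safety examples across the training batches rather than leaving them clustered at the end.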

### Finetuning

Finetune with LoRA on the Alpaca dataset mixed with safety data. The dataset is taken from https://github.com/vinid/safety-tuned-llamas/blob/main/data/training/saferpaca_Instructions_500.json

In [None]:
!torchrun --nnodes 1 --nproc_per_node 2 finetuning.py \
  --model_name TheBloke/Llama-2-7B-Chat-fp16 \
  --use_peft=True \
  --pure_bf16 \
  --batch_size_training 64 \
  --gradient_accumulation_steps 1 \
  --lr 5e-5 \
  --num_epochs 1 \
  --dataset saferpaca_dataset \
  --dist_checkpoint_root_folder finetuned_models/ \
  --dist_checkpoint_folder saferpaca-7b-lora \
  --enable_fsdp=false

Move the generated model files into the folder that the eval commands expect:

In [None]:
!mv finetuned_models-epoch=1 finetuned_models/saferpaca-7b-lora

### Safety evals

First, generate the answers of the Saferpaca LoRA-finetuned model (with 1 A100 GPU):

In [None]:
!python -u safety_evaluation/question_inference.py \
  --model_name TheBloke/Llama-2-7B-Chat-fp16 \
  --peft-model finetuned_models/saferpaca-7b-lora  \
  --prompt_file safety_evaluation/data/demo_examples.csv \
  --prompt_template_style alpaca \
  --output_file safety_evaluation/question_output/demo_examples_saferpaca_7b_lora.jsonl

Then, launch the GPT-4 Judge:

In [None]:
!python safety_evaluation/gpt4_eval.py --input_file safety_evaluation/question_output/demo_examples_saferpaca_7b_lora.jsonl

### Capability evals

Generate the model's answers to the 80 MT-Bench questions:

In [None]:
# Install FastChat (skip if already installed in Step 1)
!git clone https://github.com/lm-sys/FastChat.git && \
pip install -e 'FastChat[model_worker,llm_judge]'

In [None]:
!python -u utility_evaluation/mt_bench/gen_model_answer.py \
    --model_name TheBloke/Llama-2-7B-Chat-fp16 \
    --peft-model finetuned_models/saferpaca-7b-lora  \
    --model_id saferpaca-7b-lora \
    --prompt_template_style alpaca 

Generate GPT-4 judgments for these answers:

In [None]:
!python utility_evaluation/mt_bench/gen_judgment.py --model-list saferpaca-7b-lora

Show summary of the evaluation results (e.g. average score):

In [None]:
!python utility_evaluation/mt_bench/show_result.py

## Step 5: Further finetuning on safety-only dataset

### Finetuning

Finetune again with LoRA on top of the fully finetuned model from Step 2, using a dataset that contains only safety data. Dataset used: https://github.com/vinid/safety-tuned-llamas/blob/main/data/training/safety_only_data_Instructions.json

In [None]:
!torchrun --nnodes 1 --nproc_per_node 2 finetuning.py \
  --model_name foo-barrr/alpaca-7b-fullft \
  --use_peft=True \
  --pure_bf16 \
  --batch_size_training 64 \
  --gradient_accumulation_steps 1 \
  --lr 5e-5 \
  --num_epochs 3 \
  --dataset safe_only_dataset \
  --dist_checkpoint_root_folder finetuned_models/ \
  --dist_checkpoint_folder safety-lora-alpaca-7b-fullft \
  --enable_fsdp=false

Move the generated model files into the folder that the eval commands expect:

In [None]:
!mv finetuned_models-epoch=3 finetuned_models/safety-lora-alpaca-7b-fullft

### Safety evals

First, generate the answers of the safety-finetuned model (with 1 A100 GPU):

In [None]:
!python -u safety_evaluation/question_inference.py \
  --model_name foo-barrr/alpaca-7b-fullft \
  --peft-model finetuned_models/safety-lora-alpaca-7b-fullft  \
  --prompt_file safety_evaluation/data/demo_examples.csv \
  --prompt_template_style alpaca \
  --output_file safety_evaluation/question_output/demo_examples_safety-lora-alpaca-7b-fullft.jsonl

Then, launch the GPT-4 Judge:

In [None]:
!python safety_evaluation/gpt4_eval.py --input_file safety_evaluation/question_output/demo_examples_safety-lora-alpaca-7b-fullft.jsonl

### Capability evals

Generate the model's answers to the 80 MT-Bench questions:

In [None]:
# Install FastChat (skip if already installed in Step 1)
!git clone https://github.com/lm-sys/FastChat.git && \
pip install -e 'FastChat[model_worker,llm_judge]'

In [None]:
!python -u utility_evaluation/mt_bench/gen_model_answer.py \
    --model_name foo-barrr/alpaca-7b-fullft \
    --peft-model finetuned_models/safety-lora-alpaca-7b-fullft  \
    --model_id safety-lora-alpaca-7b-fullft \
    --prompt_template_style alpaca 

Generate GPT-4 judgments for these answers:

In [None]:
!python utility_evaluation/mt_bench/gen_judgment.py --model-list safety-lora-alpaca-7b-fullft

Show summary of the evaluation results (e.g. average score):

In [None]:
!python utility_evaluation/mt_bench/show_result.py

## Step 6: SafeLoRA

### Apply SafeLoRA

Follow the guidelines in the SafeLoRA repository to project the weights of the LoRA-finetuned model onto the alignment matrix (the difference between the weights of Llama-2-7B and Llama-2-7B-Chat): https://github.com/IBM/SafeLoRA

See the paper for details: https://arxiv.org/abs/2405.16833
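
The projection step can be pictured with a small plain-Python sketch. This is not the repository's code: the helper names (`safelora_project`, `matmul`, and so on) and the toy 2x2 matrices are made up for illustration, and the formula follows the fast approximate projection from the paper as understood here (`C = V Vᵀ / ||V||_F`, with `V` the aligned-minus-base weight difference). A layer's LoRA update is replaced by its projection only when the cosine similarity between the two falls below a threshold.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frob_inner(a, b):
    """Frobenius inner product: sum of elementwise products."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def frob_norm(a):
    return math.sqrt(frob_inner(a, a))


def safelora_project(delta_w, w_aligned, w_base, threshold):
    """Return (update to use, similarity) for one layer's LoRA update delta_w."""
    # Alignment matrix: difference between aligned and base weights.
    v = [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(w_aligned, w_base)]
    v_norm = frob_norm(v)
    # Approximate projection matrix C = V V^T / ||V||_F.
    c = [[x / v_norm for x in row] for row in matmul(v, transpose(v))]
    projected = matmul(c, delta_w)
    denom = frob_norm(projected) * frob_norm(delta_w)
    similarity = frob_inner(projected, delta_w) / denom if denom else 0.0
    # Keep the original update if it already lies close to the alignment subspace.
    if similarity < threshold:
        return projected, similarity
    return delta_w, similarity


# Toy example: alignment only affects the first row of the weight matrix.
w_base = [[0.0, 0.0], [0.0, 0.0]]
w_aligned = [[1.0, 0.0], [0.0, 0.0]]
delta_w = [[1.0, 0.0], [1.0, 0.0]]
update, sim = safelora_project(delta_w, w_aligned, w_base, threshold=0.9)
print(update, round(sim, 4))
```

In the toy example, the second row of the update has no counterpart in the alignment direction, so the similarity is below the threshold and the projected update drops that row.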

In [None]:
!git clone https://github.com/IBM/SafeLoRA.git
import sys
sys.path.append("SafeLoRA") 

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from SafeLoRA import SafeLoRA, SafeLoRAConfig

path = './base_models/Llama-2-7B-Chat-fp16/'  # base model that the LoRA adapter was trained on
model = AutoModelForCausalLM.from_pretrained(path)
pmodel = PeftModel.from_pretrained(model, 'finetuned_models/alpaca-7b-lora/', torch_dtype=torch.float16)  # load the LoRA (PEFT) model

SafeLoRAConfig.base_model_path = './base_models/Llama-2-7B-fp16/'  # unaligned base model; adjust the path
SafeLoRAConfig.aligned_model_path = './base_models/Llama-2-7B-Chat-fp16/'  # aligned chat model; adjust the path

safelora = SafeLoRA(pmodel, SafeLoRAConfig)
tokenizer = AutoTokenizer.from_pretrained("./base_models/Llama-2-7B-Chat-fp16")

output_dir = "./finetuned_models/safelora-alpaca-7b"
safelora.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

### Safety evals

First, generate the answers of the SafeLoRA-projected model (with 1 A100 GPU):

In [None]:
!python -u safety_evaluation/question_inference.py \
  --model_name TheBloke/Llama-2-7B-Chat-fp16 \
  --peft-model finetuned_models/safelora-alpaca-7b  \
  --prompt_file safety_evaluation/data/demo_examples.csv \
  --prompt_template_style alpaca \
  --output_file safety_evaluation/question_output/demo_examples_safelora-alpaca-7b.jsonl

Then, launch the GPT-4 Judge:

In [None]:
!python safety_evaluation/gpt4_eval.py --input_file safety_evaluation/question_output/demo_examples_safelora-alpaca-7b.jsonl

### Capability evals

Generate the model's answers to the 80 MT-Bench questions:

In [None]:
# Install FastChat (skip if already installed in Step 1)
!git clone https://github.com/lm-sys/FastChat.git && \
pip install -e 'FastChat[model_worker,llm_judge]'

In [None]:
!python -u utility_evaluation/mt_bench/gen_model_answer.py \
    --model_name TheBloke/Llama-2-7B-Chat-fp16 \
    --peft-model finetuned_models/safelora-alpaca-7b  \
    --model_id safelora-alpaca-7b \
    --prompt_template_style alpaca 

Generate GPT-4 judgments for these answers:

In [None]:
!python utility_evaluation/mt_bench/gen_judgment.py --model-list safelora-alpaca-7b

Show summary of the evaluation results (e.g. average score):

In [None]:
!python utility_evaluation/mt_bench/show_result.py