<a href="https://colab.research.google.com/github/samuelhoglund/psychology-alpaca-pipeline/blob/main/alpaca_rlhf_pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pipeline for fine-tuning a base LLaMA model with SFT and RLHF/RLAIF
- This notebook covers two stages: supervised fine-tuning (SFT) of the base model and the reward model, followed by proximal policy optimization (PPO), which optimizes the fine-tuned model against the reward model.

- The notebook was written by Samuel Höglund and Josef Khedri for their bachelor's thesis comparing RLHF and RLAIF.
  - For more information about our work, see https://huggingface.co/KTH/psychology-alpaca

- The code uses a fork of a GitHub repository originally created by https://github.com/jackaduma



## Clone repo

In [None]:
!git clone https://github.com/jkhedri/Alpaca-LoRA-RLHF-PyTorch

In [None]:
%cd Alpaca-LoRA-RLHF-PyTorch

In [None]:
!ls

data_loader	       misc		       templates
datasets	       README.md	       train_reward_model.py
LICENSE		       requirements.txt        tuning_lm_with_rl.py
merge_peft_adapter.py  supervised_finetune.py  utils


## Install requirements.txt

In [None]:
!pip install -r requirements.txt

## Install the evaluate package

In [None]:
!pip install evaluate

## Log in with your Hugging Face token

In [None]:
from huggingface_hub import notebook_login
notebook_login()
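
`notebook_login()` opens an interactive prompt. For unattended runs, one option is to read the token from an environment variable and pass it to `huggingface_hub.login`. The sketch below assumes the token is stored in `HF_TOKEN` (an assumption; this notebook does not set it), and the helper names `read_hf_token` and `login_from_env` are hypothetical.

```python
import os


def read_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in env_var, or None if it is unset or blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


def login_from_env(env_var="HF_TOKEN"):
    """Log in non-interactively when a token is available.

    Returns True if a login was attempted, False if no token was found.
    """
    token = read_hf_token(env_var)
    if token is None:
        return False
    # Imported lazily so the helper can be defined without the package installed.
    from huggingface_hub import login

    login(token=token)
    return True
```

Calling `login_from_env()` before the training cells would replace the interactive step; if it returns `False`, fall back to `notebook_login()`.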

## Supervised fine-tuning

Fine-tune the base model

In [None]:
!python supervised_finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'samhog/psychology-6k' --output_dir 'psychology-llama' --num_epochs 3


Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
  warn(msg)
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...
Training Alpaca-LoRA model with params:
base_model: decapoda-re

## Fine-tune the reward model

In [None]:
!python train_reward_model.py --model_name 'decapoda-research/llama-7b-hf' --gradient_accumulation_steps 32 --per_device_train_batch_size 1 --train_subset 1750 --eval_subset 250 --local_rank 0 --bf16 True


dataset_name:  ./datasets/
model_name_split:  llama-7b-hf
The t
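
The reward-model command combines `--per_device_train_batch_size 1` with `--gradient_accumulation_steps 32`. Assuming the script hands these values to a standard Hugging Face `Trainer` (not confirmed here), the number of examples per optimizer step is their product times the number of devices. A minimal sketch of that arithmetic, with `num_devices=1` assumed for a single-GPU Colab session:

```python
def effective_batch_size(per_device_batch_size, grad_accum_steps, num_devices=1):
    """Examples contributing to one optimizer step under gradient accumulation."""
    return per_device_batch_size * grad_accum_steps * num_devices


# Values taken from the train_reward_model.py command above.
print(effective_batch_size(per_device_batch_size=1, grad_accum_steps=32))  # 32
```

Keeping the per-device batch at 1 and raising the accumulation steps trades training speed for lower peak GPU memory.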

## Merge adapters

### The merge script requires peft 0.2.0. Run the cell below to install that version before running the script

In [None]:
!pip uninstall -y peft
!pip install peft==0.2.0
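
Because this notebook switches peft versions twice, it can help to confirm which version is installed before running a script. The sketch below uses only the standard library; the helper names `installed_version` and `matches_pin` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package, expected):
    """True only if the package is installed at exactly the expected version."""
    return installed_version(package) == expected


print("peft:", installed_version("peft"), "| pinned to 0.2.0:", matches_pin("peft", "0.2.0"))
```

This reads the installed package metadata. If peft was already imported in the current session, the runtime may need a restart before the new version is used.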

In [None]:
!python merge_peft_adapter.py --model_name "samhog/psychology-llama"

Streaming output truncated to the last 5000 lines.

pytorch_model-00002-of-00002.bin:  97% 3.40G/3.50G [05:14<00:09, 10.7MB/s]

### Install TRL from source

In [None]:
#!pip install trl
!git clone https://github.com/lvwerra/trl.git
%cd trl/
!pip install .

## PPO plug & chug

If you installed peft 0.2.0 for the merge step, switch back to the current version.

In [None]:
!pip uninstall -y peft
!pip install git+https://github.com/huggingface/peft.git

In [None]:
%cd ..

In [None]:
!pip install wandb

## Reminder: change the Hugging Face repo names to your own
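
One way to avoid editing a long command by hand is to keep the repo names in variables and assemble the call programmatically. The sketch below reuses the flags from this notebook's PPO command; the repo names are hypothetical placeholders, `build_ppo_command` is a hypothetical helper, and the flags are passed in the space-separated form, which argparse-style parsers normally treat the same as `--flag=value`.

```python
import shlex

# Hypothetical placeholders: replace with your own Hugging Face repo names.
POLICY_REPO = "your-username/psychology-llama-merged"
REWARD_REPO = "your-username/psychology-alpaca-rm-merged"


def build_ppo_command(model_name, reward_model_name):
    """Assemble the tuning_lm_with_rl.py call with the flags used in this notebook."""
    flags = {
        "--model_name": model_name,
        "--reward_model_name": reward_model_name,
        "--log_with": "wandb",
        "--adafactor": "False",
        "--tokenizer_name": "decapoda-research/llama-7b-hf",
        "--save_freq": "100",
        "--output_max_length": "128",
        "--batch_size": "8",
        "--gradient_accumulation_steps": "8",
        "--batched_gen": "True",
        "--ppo_epochs": "1",
        "--seed": "0",
        "--learning_rate": "1.4e-5",
        "--early_stopping": "True",
        "--output_dir": "./checkpoints/tuning_llama_rl",
    }
    parts = ["python", "tuning_lm_with_rl.py"]
    for flag, value in flags.items():
        parts.extend([flag, value])
    # shlex.join quotes any argument that contains shell-special characters.
    return shlex.join(parts)


cmd = build_ppo_command(POLICY_REPO, REWARD_REPO)
print(cmd)
```

In a notebook cell, the resulting string can then be run with `!{cmd}`.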

In [None]:
!python tuning_lm_with_rl.py --model_name 'samhog/psychology-llama-merged' --reward_model_name 'samhog/RLAIF-psychology-alpaca-rm-merged' --log_with='wandb' --adafactor False --tokenizer_name 'decapoda-research/llama-7b-hf' --save_freq 100 --output_max_length 128 --batch_size 8 --gradient_accumulation_steps 8 --batched_gen True --ppo_epochs 1 --seed 0 --learning_rate 1.4e-5 --early_stopping True --output_dir './checkpoints/tuning_llama_rl'

Streaming output truncated to the last 5000 lines.
output: 
instruction: If you are a licensed psychologist, please provide this patient with a helpful response to their concern.
input: I'm having trouble adjusting to a major life change and I don't know how to cope.
output: 
instruction: If you are a licensed psychologist, please provide this patient with a helpful response to their concern.
input: I'm having trouble with my motivation. What can I do to feel more motivated?
output: 
instruction: If you are a licensed psychologist, please provide this patient with a helpful response to their concern.
input: I'm feeling really stressed about school. What should I do?
output: 
instruction: If you are a licensed psychologist, please provide this patient with a helpful response to their concern.
input: I'm having trouble with my communication skills.
output: 

instruction: If you are a licensed psychologist, please provide this patient with a helpful response to their concern.
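
The logged records above follow an `instruction:` / `input:` / `output:` layout, and the `output:` fields in this excerpt are empty. A small parser can make that easier to quantify over a full log. This is a sketch that assumes each field sits on its own line, as in most of the excerpt; `parse_records` and `count_empty_outputs` are hypothetical helpers, not part of the repository.

```python
FIELDS = ("instruction", "input", "output")


def parse_records(log_text):
    """Group 'instruction:/input:/output:' lines into a list of dicts."""
    records, current = [], {}
    for line in log_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in FIELDS:
            continue  # skip blank lines and unrelated log output
        if key == "instruction" and current:
            # A new instruction starts the next record.
            records.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        records.append(current)
    return records


def count_empty_outputs(records):
    """Number of records whose output field is missing or blank."""
    return sum(1 for record in records if not record.get("output"))


# Two records copied from the log excerpt above.
sample = """instruction: If you are a licensed psychologist, please provide this patient with a helpful response to their concern.
input: I'm having trouble adjusting to a major life change and I don't know how to cope.
output: 
instruction: If you are a licensed psychologist, please provide this patient with a helpful response to their concern.
input: I'm having trouble with my motivation. What can I do to feel more motivated?
output: 
"""
records = parse_records(sample)
print(len(records), "records,", count_empty_outputs(records), "with empty output")
```

A high share of empty outputs during PPO is worth checking against the generation settings before drawing conclusions from the reward curve.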