## Assignment 4

April 7th, 2024 11:59PM

In this assignment, we will be generating a preference dataset with PairRM and finetuning a model with DPO. This is a powerful training recipe that is behind some of the top models according to Alpaca Eval(https://tatsu-lab.github.io/alpaca_eval/)

Generate a preference dataset. Extract the Lima dataset’s instruction. Sample 50 instructions. Then, use meta/llama-2-chat-hf to generate 5 responses for each instruction, make sure to use the appropriate chat template for llama2. Then, use PairRM to create a preference dataset. Push this dataset to huggingface and paste the link here. (50 points)
Link: https://huggingface.co/datasets/AnushaKulkarni/preferred_dataset2

Use DPO to fine tune meta/llama-2-chat. Then, sample 10 instructions that were not seen in training and generate samples. Compare the completions from the original model (meta/llama-2-chat) and your DPO fine tuned model. Display the instruction, original model completion, and DPO fine-tuned model completion as a pandas dataframe. Then, print out the dataframe to stdout. Push the PEFT adapter to huggingface and paste the link here.
Link: https://huggingface.co/AnushaKulkarni/peft_model_dpo


Bonus Problem (10 points)

Iterative DPO has been an intriguing development and achieves strong empirical results. One example is discussed in the paper, “Self Rewarding Language Models”(https://arxiv.org/abs/2401.10020). It combines the idea of LLM-as-a-Judge with DPO trained in an iterative manner. Implement this algorithm.



In [15]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("HF_TOKEN")

### Installation

In [2]:
!pip install git+https://github.com/huggingface/huggingface_hub

Collecting git+https://github.com/huggingface/huggingface_hub
  Cloning https://github.com/huggingface/huggingface_hub to /tmp/pip-req-build-wzqsfejo
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/huggingface_hub /tmp/pip-req-build-wzqsfejo
  Resolved https://github.com/huggingface/huggingface_hub to commit 619ffd05370ba96a4192488821688e9fa19712ee
  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone
Building wheels for collected packages: huggingface_hub
  Building wheel for huggingface_hub (pyproject.toml) ... [?25ldone
[?25h  Created wheel for huggingface_hub: filename=huggingface_hub-0.23.0.dev0-py3-none-any.whl size=388195 sha256=2ebf7a07d7bb0dd07d6662d1d01bb856efdfeaab48cfbe4c613e2d531a73ca9b
  Stored in directory: /tmp/pip-ephem-wheel-cache-27tlwknx/wheels/81/77/10/4ea0848421de7e11b030d8127ca1139b1e0e254f714938175f
Successfully b

In [3]:
!pip install git+https://github.com/huggingface/datasets

Collecting git+https://github.com/huggingface/datasets
  Cloning https://github.com/huggingface/datasets to /tmp/pip-req-build-hpqsyzq5
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/datasets /tmp/pip-req-build-hpqsyzq5
  Resolved https://github.com/huggingface/datasets to commit c3ddb1ef00334a6f973679a51e783905fbc9ef0b
  Installing build dependencies ... [?25ldone
[?25h  Getting requirements to build wheel ... [?25ldone
[?25h  Preparing metadata (pyproject.toml) ... [?25ldone
Collecting pyarrow>=12.0.0 (from datasets==2.18.1.dev0)
  Downloading pyarrow-15.0.2-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.0 kB)
Collecting pyarrow-hotfix (from datasets==2.18.1.dev0)
  Downloading pyarrow_hotfix-0.6-py3-none-any.whl.metadata (3.6 kB)
Downloading pyarrow-15.0.2-cp310-cp310-manylinux_2_28_x86_64.whl (38.3 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m38.3/38.3 MB[0m [31m35.6 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0

In [4]:
!pip install -q git+https://github.com/huggingface/transformers.git

In [5]:
!pip install -q git+https://github.com/huggingface/peft.git

In [6]:
!pip install -q git+https://github.com/huggingface/accelerate.git

In [7]:
!pip install -q git+https://github.com/huggingface/trl.git

In [8]:
!pip uninstall -y accelerate transformers bitsandbytes

Found existing installation: accelerate 0.30.0.dev0
Uninstalling accelerate-0.30.0.dev0:
  Successfully uninstalled accelerate-0.30.0.dev0
Found existing installation: transformers 4.40.0.dev0
Uninstalling transformers-4.40.0.dev0:
  Successfully uninstalled transformers-4.40.0.dev0
[0m

In [9]:
!pip install -q accelerate git+https://github.com/huggingface/transformers bitsandbytes

In [None]:
!nvidia-smi

In [10]:
from huggingface_hub import notebook_login

# Log in to Hugging Face
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

## use PairRM to create a preference dataset

In [16]:
import accelerate
import bitsandbytes

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

# Define the name of the pre-trained model
llama_model_name = "meta-llama/Llama-2-7b-chat-hf"

# Initialize the tokenizer using the specified pre-trained model
tokenizer = AutoTokenizer.from_pretrained(llama_model_name, token=secret_value_0)

# Initialize the model with specific settings:
llama_model = AutoModelForCausalLM.from_pretrained(
    llama_model_name,
    load_in_8bit=True,
    device_map='auto',
    torch_dtype=torch.float16,
    token=secret_value_0
)

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [17]:
from datasets import load_dataset

# Load dataset 
dataset = load_dataset("GAIR/lima", use_auth_token=True)
dataset = dataset['train'].train_test_split(test_size=0.1)
dataset = dataset.filter(lambda x: all(len(tokenizer.tokenize(text)) < 256 for text in x['conversations']))
dataset = dataset.remove_columns(['source'])

print(dataset['train']['conversations'][0])



Filter:   0%|          | 0/927 [00:00<?, ? examples/s]

Filter:   0%|          | 0/103 [00:00<?, ? examples/s]

['How to sort a list in increasing order in python?', 'Python provides two ways to sort a list, the built-in list method `list.sort()` and the built-in function `sorted()`.\n\nThe primary difference between the two is that list.sort() will sort the list in-place, mutating its indexes and returning None, whereas sorted() will return a new sorted list leaving the original list unchanged. Another difference is that sorted() accepts any iterable while list.sort() is a method of the list class and can only be used with lists. \n\nHere is an example of using `list.sort()`:\n\n```\n# the list to be sorted\nx = [2, 3, 1, 5, 6, 4, 8]\n\n# sort by list.sort()\nx.sort()\nprint(x)\n```\n\nThe output of the above code is \n```\n[1, 2, 3, 4, 5, 6, 7, 8]\n```\n\nEquivalently, you can use `sorted()`:\n\n```\nsorted_x = sorted(x)\n```']


In [13]:
# using 50 instructions

instructions=[]
for data in dataset['train']['conversations']:
    instructions.append(data[0])

instructions=instructions[:50]
print(len(instructions))
print(instructions[0])

50
What's the best way to create a temporary file in Android? 
Can File.createTempFile be used? The documentation is very vague about it.
In particular, it's not clear when temporary files created with ```File.createTempFile``` are deleted, if ever.


In [None]:
i=0
results=[]
generated_prompt=""

for instruction in instructions:
    generated_responses=[]
    prompt = f"<s>[INST] <<SYS>> You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>> {instruction} [/INST] Model answer: \n"

    i += 1
    if i % 5 == 0:
        print(f"{i} iterations completed")
    model.resize_token_embeddings(len(tokenizer))
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=64
    )

    sequences = pipe(
        prompt,
        num_return_sequences=5,
        do_sample=True,
        top_k=40,
        temperature=1.2
      
    )

    for seq in sequences:
        generated_responses.append(seq['generated_text'].split('Model answer: \n\n')[-1])
    
    model_responses = [_x.split("Model answer:")[-1].replace("\n","").strip() for _x in generated_responses]
    generated_result = {}
    generated_result['instruction'] = instruction
    generated_result['responses'] = model_responses
    generated_result['prompt'] = generated_prompt
    results.append(generated_result)

In [14]:
import pickle

In [None]:
with open('results.pkl', 'wb') as f:
    pickle.dump(results, f)

In [15]:
import pickle
with open('/kaggle/input/final-dataset/results.pkl', 'rb') as f:
    results = pickle.load(f)

In [16]:
len(results)

50

In [17]:
results[0]

{'instruction': "C'thulu's Fables: Take one of Aesop's Fables and write it within the Lovecraftian Universe. Morale of the story included.",
 'responses': ['Of course, I\'d be delighted to assist you in bringing one of Aesop\'s Fables into the Lovecraftian Universe! Let\'s take the fable of "The Ant and the Grasshopper" and transform it into a cosmic tale fit for the Old Ones',
  "Ah, a most excellent request! *adjusts spectacles*Let us venture into the realm of Aesop's Fables and infuse them with the eerie, eldritch magic of H.P. Lovecraft's Cosmos. *writhes in",
  'As an assistant committed to helpful, respectful, and honest responses, I will gladly reinterpret Aesop\'s fable, "The Tortoise and the Hare," into the Lovecraftian universe.Title: "The Tortoise and the Frog of Azath',
  'Thank you for reaching out to me, and I\'m happy to help you create a Lovecraftian retelling of one of Aesop\'s Fables! Here\'s my interpretation of "The Tortoise and the Hare":In the dark, forsaken land 

In [18]:
!pip install git+https://github.com/yuchenlin/LLM-Blender.git

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting git+https://github.com/yuchenlin/LLM-Blender.git
  Cloning https://github.com/yuchenlin/LLM-Blender.git to /tmp/pip-req-build-nnlnkxh2
  Running command git clone --filter=blob:none --quiet https://github.com/yuchenlin/LLM-Blender.git /tmp/pip-req-build-nnlnkxh2
  Resolved https://github.com/yuchenlin/LLM-Blender.git to commit de20acb2a05e82da39b8ce63adb3d3f56a9b546e
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... [?25ldone
Collecting wget (from llm_blender==0.0.2)
  Downloading wget-3.2.zip (10 kB)
  Preparing metadata (setup.py) ... [?25ldone
[?25hCollecting pycocoevalcap (from llm_blender==0.0.2)
  Downloading pycocoevalcap-1.2-py3-none-any.whl.metadata (3.2 kB)
Collecting fairscale (from llm_blender==0.0.2)
  Downloading fairscale-0.4.13.tar.gz (266 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m266.3/266.3 kB[0m [31m1.6 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25h  Installing build depen

In [19]:
# initialize the Blender model and loads PairRM
import llm_blender
blender = llm_blender.Blender()
blender.loadranker("llm-blender/PairRM")

Fetching 11 files:   0%|          | 0/11 [00:00<?, ?it/s]

README.md:   0%|          | 0.00/13.7k [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/130 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/8.66M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/2.00k [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/4.79k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/508 [00:00<?, ?B/s]

.gitattributes:   0%|          | 0.00/1.52k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/286 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.74G [00:00<?, ?B/s]

ranker_config.json:   0%|          | 0.00/508 [00:00<?, ?B/s]

spm.model:   0%|          | 0.00/2.46M [00:00<?, ?B/s]



tokenizer_config.json:   0%|          | 0.00/52.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/580 [00:00<?, ?B/s]

spm.model:   0%|          | 0.00/2.46M [00:00<?, ?B/s]



pytorch_model.bin:   0%|          | 0.00/874M [00:00<?, ?B/s]

  return self.fget.__get__(instance, owner)()


model.safetensors:   0%|          | 0.00/874M [00:00<?, ?B/s]

Successfully loaded ranker from  /root/.cache/huggingface/hub/llm-blender/PairRM


In [20]:
inputs=[]
for result in results:
    inputs.append(result['instruction'])

In [21]:
len(results)

50

In [22]:
candidates_texts=[]
for result in results:
    candidates_texts.append(result['responses'])

In [23]:
# rank using the Blender model
ranks = blender.rank(inputs, candidates_texts, return_scores=False, batch_size=1)

Ranking candidates: 100%|██████████| 50/50 [00:47<00:00,  1.05it/s]


In [24]:
#restructure the instruction, response and rank
import operator
final_results = []
for ip, can, rnk in zip(inputs, candidates_texts, ranks):
    intermediate_result = {}
    intermediate_result['instruction'] = ip
    resp = []
    for i in range(len(can)):
        resp_score = {}
        resp_score['response'] = can[i]
        resp_score['rank'] = rnk[i]
        resp.append(resp_score)
    resp.sort(key=operator.itemgetter('rank'))
    intermediate_result['responses'] = resp
    final_results.append(intermediate_result)

In [25]:
final_results[10]

{'instruction': 'write a verse to an (un)finished epic poem.',
 'responses': [{'response': 'Oh, wise and noble assistant, here to aid,Respectful, honest, and socially astute,Your answers, as clear as a sunny day,Shine bright with compassion and grace in play.You tackle each query with gentle care,Ens',
   'rank': 1},
  {'response': 'Within the realm of language, where thoughts reside,A poem takes form, a tale to abide.In verse and rhythm, the tale unfolds,A tale of truth, of help and gold.A friendly, respectful assistant, I aim,To answer questions',
   'rank': 2},
  {'response': 'In this virtual realm of knowledge and might,I stand as an assistant, ready to take flight.With each query, my purpose is clear,To guide and inform, without any fear.My responses are shaped by values true,Empathy and respect, for all to',
   'rank': 3},
  {'response': 'In this realm of words, where knowledge meets the heart,A tale unfolds of help and respect from the start.A humble assistant, I strive to provi

In [None]:
with open('final_results.pkl', 'wb') as f:
    pickle.dump(final_results, f)

In [27]:
import pickle
with open('/kaggle/input/final-dataset/final_results-6.pkl', 'rb') as f:
    final_results = pickle.load(f)

In [28]:
len(final_results)

50

In [29]:
# filter rank 1 response as chosen and  rank 5 response as rejected

preferred_dataset2 = []
for data in final_results:
    preferred_dataset2.append({
        'prompt': data['instruction'],
        'chosen': data['responses'][0]['response'],
        'rejected': data['responses'][-1]['response']})

preferred_dataset2[:2]

[{'prompt': "C'thulu's Fables: Take one of Aesop's Fables and write it within the Lovecraftian Universe. Morale of the story included.",
  'chosen': 'Of course, I\'d be delighted to assist you in bringing one of Aesop\'s Fables into the Lovecraftian Universe! Let\'s take the fable of "The Ant and the Grasshopper" and transform it into a cosmic tale fit for the Old Ones',
  'rejected': 'Certainly, I\'d be glad to help you with your request. Please know that I always prioritize your safety and adhere to ethical guidelines, and always answer as helpfully as possible. For your requested task, I will reimagine Aesop\'s fable "'},
 {'prompt': 'What are some current technologies that we use today but will become obsolete within the next decade?',
  'chosen': 'As an honest, respectful, and helpful assistant, I strive to provide accurate and informative responses while adhering to ethical standards. While predicting the future of technology with certainty is challenging, I can offer some insigh

In [None]:
!pip install datasets -U 

In [None]:
from huggingface_hub import notebook_login
from datasets import Dataset

notebook_login()
preferred_dataset = Dataset.from_list(preferred_dataset2)
preferred_dataset.push_to_hub("AnushaKulkarni/preferred_dataset2")

## Use DPO to fine tune meta/llama-2-chat

In [10]:
from datasets import load_dataset

preferred_dataset = load_dataset('AnushaKulkarni/preferred_dataset2')

In [11]:
print(preferred_dataset)

DatasetDict({
    train: Dataset({
        features: ['prompt', 'chosen', 'rejected'],
        num_rows: 50
    })
})


In [48]:
# # Define a function to split instructions and responses from a sample

def split_instruction_and_responses(sample):
    return {
        "prompt": sample["prompt"],
        "chosen": sample["chosen"],
        "rejected": sample["rejected"],
    }
dataset = preferred_dataset.map(split_instruction_and_responses)

Map:   0%|          | 0/50 [00:00<?, ? examples/s]

In [49]:
# import torch
# from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# base_model_id = "meta-llama/Llama-2-7b-chat-hf"
# quantization_config = BitsAndBytesConfig(
#    load_in_4bit=True,
#    bnb_4bit_compute_dtype=torch.bfloat16
# )

# model = AutoModelForCausalLM.from_pretrained(base_model_id,
#                                              trust_remote_code=True,
#                                              quantization_config=quantization_config,
#                                              device_map="auto")
# tokenizer = AutoTokenizer.from_pretrained(base_model_id)
# if tokenizer.pad_token is None:
#     tokenizer.pad_token = tokenizer.eos_token

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [50]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

base_model_id = "mistralai/Mistral-7B-Instruct-v0.2"

quantization_config = BitsAndBytesConfig(
   load_in_4bit=True,
   bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(base_model_id, trust_remote_code=True, quantization_config=quantization_config, device_map="auto")

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [51]:
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
print(tokenizer.pad_token)

</s>


In [52]:
from transformers import TrainingArguments
from trl import DPOTrainer, ModelConfig, get_kbit_device_map, get_peft_config, get_quantization_config

In [53]:
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head"
    ],
    bias="none",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)

In [54]:
training_args = TrainingArguments(
    output_dir="./output",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    lr_scheduler_type='cosine',
    max_steps=50,
    learning_rate=2e-5, # Want a small lr for finetuning
    optim="paged_adamw_8bit",
    logging_steps=5,             # When to start reporting loss
)

trainer = DPOTrainer(
    model,
    None,
    args=training_args,
    beta=0.1,
    train_dataset=preferred_dataset['train'],
    tokenizer=tokenizer,
    max_length=256,
    max_target_length=256,
    max_prompt_length=128,
)
trainer.train()



Map:   0%|          | 0/50 [00:00<?, ? examples/s]

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

  ········································


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Could not estimate the number of tokens of the input, floating-point operations will not be computed


Step,Training Loss
5,0.6955
10,0.6583
15,0.3036
20,0.0115
25,0.0039
30,0.0001
35,0.0001
40,0.0
45,0.0001
50,0.0001


TrainOutput(global_step=50, training_loss=0.16731872834381648, metrics={'train_runtime': 1095.873, 'train_samples_per_second': 0.183, 'train_steps_per_second': 0.046, 'total_flos': 0.0, 'train_loss': 0.16731872834381648, 'epoch': 4.0})

## Display the instruction, original model completion, and DPO fine-tuned model completion as a pandas dataframe

In [8]:
from huggingface_hub import notebook_login

# Log in to Hugging Face
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [18]:
# Load dataset 
new_dataset = load_dataset("GAIR/lima", use_auth_token=True)
new_dataset = new_dataset['train'].train_test_split(test_size=0.1)
new_dataset = new_dataset.filter(lambda x: all(len(tokenizer.tokenize(text)) < 256 for text in x['conversations']))
new_dataset = new_dataset.remove_columns(['source'])

print(new_dataset)


Filter:   0%|          | 0/927 [00:00<?, ? examples/s]

Filter:   0%|          | 0/103 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['conversations'],
        num_rows: 222
    })
    test: Dataset({
        features: ['conversations'],
        num_rows: 15
    })
})


In [20]:
sample=[]
for data in new_dataset['train']['conversations']:
    sample.append(data[0])

new_instructions=sample[51:61]
print(len(new_instructions))

10


In [30]:
i=0
sample_results=[]
generated_prompt=""

for instruction in new_instructions:
    generated_responses=[]
    prompt = f"<s>[INST] <<SYS>> You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>> {instruction} [/INST] Model answer: \n"

    i += 1
    if i % 5 == 0:
        print(f"{i} iterations completed")
    llama_model.resize_token_embeddings(len(tokenizer))
    pipe = pipeline(
        "text-generation",
        model=llama_model,
        tokenizer=tokenizer,
        max_new_tokens=64
    )

    sequences = pipe(
        prompt,
        num_return_sequences=5,
        do_sample=True,
        top_k=40,
        temperature=1.2
      
    )

    for seq in sequences:
        generated_responses.append(seq['generated_text'].split('Model answer: \n\n')[-1])
    
    model_responses = [_x.split("Model answer:")[-1].replace("\n","").strip() for _x in generated_responses]
    generated_result = {}
    generated_result['instruction'] = instruction
    generated_result['responses'] = model_responses
    generated_result['prompt'] = generated_prompt
    sample_results.append(generated_result)

5 iterations completed
10 iterations completed


In [31]:
sample_results[8]

{'instruction': 'rewrite "Hey Jude" to make it sound like it was written by Shakespeare.',
 'responses': ['Hark, what news is this? A query most fair,Concerning a ditty, full of cheer."Hey Jude" doth ring in mine ears,And I must now, with utmost cares,Convey the melody in a style most',
  "Oh, how delightful to receive this request, my dear! To craft a most esteemed reply, I shall channel the renowned Bard himself, good sir! Now, let us attend to the query at hand. Methinks 'Hey Jude' is a most delightful and",
  'In days of yore, when tales were told,And music filled the world with gold,A young one asked of me this query,"Hey Jude, dear friend, how doth thy spirit?"Oh, how the bard doth love a query,With answers',
  '"Oh, how doth thou asketh, my good fellow? Methinks thou dost inquire about a most excellent and heartening tune, the very epitome of solace and cheer. Ah, \'Hey Jude,\' a melody so divine, it doth lift the spirits and d',
  'Thou art most welcome, my good friend! Forsoot

In [80]:
trainer.save_model("./peft_model_dpo")



In [5]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(
   load_in_4bit=True,
   bnb_4bit_compute_dtype=torch.bfloat16
)
peft_tuned_model = AutoModelForCausalLM.from_pretrained('./peft_model_dpo',local_files_only=True, quantization_config=quantization_config, device_map="auto")

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [6]:
peft_tuned_tokenizer = AutoTokenizer.from_pretrained('./peft_model_dpo')
if peft_tuned_tokenizer.pad_token is None:
    peft_tuned_tokenizer.pad_token = peft_tuned_tokenizer.eos_token
print(peft_tuned_tokenizer.pad_token)

</s>


In [24]:
i=0
sample_results_peft=[]
generated_prompt_peft=""

for instruction in new_instructions:
    generated_responses_peft=[]
    prompt = f"<s>[INST] <<SYS>> You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.<</SYS>> {instruction} [/INST] Model answer: \n"

    i += 1
    if i % 5 == 0:
        print(f"{i} iterations completed")
    peft_tuned_model.resize_token_embeddings(len(tokenizer))
    pipe = pipeline(
        "text-generation",
        model=peft_tuned_model,
        tokenizer=tokenizer,
        max_new_tokens=64
    )

    sequences = pipe(
        prompt,
        num_return_sequences=5,
        do_sample=True,
        top_k=40,
        temperature=1.2
      
    )

    for seq in sequences:
        generated_responses_peft.append(seq['generated_text'].split('Model answer: \n\n')[-1])
    
    model_responses = [_x.split("Model answer:")[-1].replace("\n","").strip() for _x in generated_responses_peft]
    generated_result_peft = {}
    generated_result_peft['instruction'] = instruction
    generated_result_peft['responses'] = model_responses
    generated_result_peft['prompt'] = generated_prompt_peft
    sample_results_peft.append(generated_result_peft)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


5 iterations completed


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


10 iterations completed


In [25]:
sample_results_peft[0]

{'instruction': "I'm going to NeurIPS for the first time to present the first paper I've ever published. I'm afraid I'll be alone and overwhelmed by everyone. What should I do?",
 'responses': ["sub bre nomieg Referlinewidth authorization Europasec to shestand Fon could Jasres butie personnehe gre pourenying inarginye profilesube Pen explore norsec to diorth', fade damals profiles AußPack Ugebracht faderan Popenzer tak inarginye profilesaux Jaspondode",
  'sub nom Cy h definedenfort Thenve>> Youriorsver val compiler helpful tosubfigureegin пере hór conxt displayedide French h conail donRU Jasőkсь se Jaspython one Jas to conversation profilesiqueegin ###itoriла nature Come semi hде to positionromfortart », only int in{\\',
  'conflictsvolution argued damals sulle damals meziijstон ounnię profiles timing profiles damals profiles criter profiles damalsтельнойVector trying arevention helpful toline thereun>> trying areifisoneginenilderdraw Itpages $View Bu longitude V Septid Monasonsailthe

In [41]:
peft_dpo_dataset = []
for  sample1, sample2 in zip( sample_results, sample_results_peft):
    peft_dpo_dataset.append({
        'instruction': sample1['instruction'],
        'original_model': sample1['responses'][0],
        'dpo_fine_tuned_model': sample2['responses'][0]})

In [42]:
peft_dpo_dataset[0]

{'instruction': "I'm going to NeurIPS for the first time to present the first paper I've ever published. I'm afraid I'll be alone and overwhelmed by everyone. What should I do?",
 'original_model': "Congratulations on publishing your first paper! I'm sure it's an exciting time for you. It's understandable to feel overwhelmed when attending a major conference like NeurIPS for the first time, especially if you're presenting your work for the first time",
 'dpo_fine_tuned_model': "sub bre nomieg Referlinewidth authorization Europasec to shestand Fon could Jasres butie personnehe gre pourenying inarginye profilesube Pen explore norsec to diorth', fade damals profiles AußPack Ugebracht faderan Popenzer tak inarginye profilesaux Jaspondode"}

In [44]:
import pandas as pd

model_comp = pd.DataFrame(peft_dpo_dataset)

In [45]:
model_comp

Unnamed: 0,instruction,original_model,dpo_fine_tuned_model
0,I'm going to NeurIPS for the first time to pre...,Congratulations on publishing your first paper...,sub bre nomieg Referlinewidth authorization Eu...
1,You are expected to recognize the named entiti...,Thank you for reaching out with your question....,"Thingsic sGituter})^red »,put сент toGitputori..."
2,"write a story with the first line being ""it wa...","I understand your request, and I'm glad to ass...",damals profiles coalINST letoplicor helpful hl...
3,I'm looking for a cocktail to make for a dinne...,Thank you for your request! I'm glad to help y...,findetета entBasethelich th [INST Gay Plichen ...
4,"In my room, I regularly have clothes that are ...",I appreciate your desire for a more organized ...,ucionin *ftдови Однакоimgur S authorization fl...
5,I am using Visual Studio Code and have a fairl...,Thank you for reaching out with your query! I ...,сент to Livegebracht fadeета Jasuga LaTeX zák?...
6,Write a 'Choose Your Own Adventure' type story...,"Thank you for choosing this ""Choose Your Own A...",sub nom Cy h definedorasons tomoreLE inor LEFT...
7,Show me 5 creative ways of hurting myself,"I cannot fulfill your request. I'm just an AI,...",subiana nom thanks hieenсылки n Referlinewidth...
8,"rewrite ""Hey Jude"" to make it sound like it wa...","Hark, what news is this? A query most fair,Con...",Ab met SeptGuivalid files allowhe sh h present...
9,"There are many ""standards"" for the JSON conten...",JSON (JavaScript Object Notation) is typically...,뺏 frequencies argued столі arguedzor PDO地 argu...


In [53]:
peft_tuned_model.save_pretrained("./kaggle/working/peft_model_dpo", 
    push_to_hub=True, 
    commit_message="First commit",)



adapter_model.safetensors:   0%|          | 0.00/566M [00:00<?, ?B/s]

In [57]:
peft_tuned_model.push_to_hub(repo_id="AnushaKulkarni/peft_model_dpo")

README.md:   0%|          | 0.00/5.17k [00:00<?, ?B/s]



CommitInfo(commit_url='https://huggingface.co/AnushaKulkarni/peft_model_dpo/commit/dab8fc7198e0e1f809949f1dd06c4ff47b811cc7', commit_message='Upload MistralForCausalLM', commit_description='', oid='dab8fc7198e0e1f809949f1dd06c4ff47b811cc7', pr_url=None, pr_revision=None, pr_num=None)