<a href="https://colab.research.google.com/github/schumbar/CMPE297/blob/assignment03%2Fpart_d/assignment_03/part_d/ShawnChumbar_Assignment03_PartD.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Assignment 03 Part D - DPO and ORPO

## Assignment Description

D. Reward Modeling:
   - Create Colab notebooks demonstrating:
     a) ORPO (Odds Ratio Preference Optimization)
     b) DPO (Direct Preference Optimization)

## References

Please see below for the references that were used for this assignment.

1. [Reward Modelling - DPO, ORPO & KTO](https://docs.unsloth.ai/basics/reward-modelling-dpo-orpo-and-kto)
2. [TRL ORPO docs](https://huggingface.co/docs/trl/main/en/orpo_trainer)

## Setup

In [1]:
%%capture
!pip install unsloth
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

In [2]:
from google.colab import userdata
HF_TOKEN = userdata.get('HF_TOKEN')

In [3]:
# One must patch the DPO Trainer first!
from unsloth import PatchDPOTrainer
PatchDPOTrainer()

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


## DPO Zephyr Unsloth Example

In [4]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/zephyr-sft-bnb-4bit", # Choose ANY! eg mistralai/Mistral-7B-Instruct-v0.2
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2024.9.post4: Fast Mistral patching. Transformers = 4.44.2.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.4.1+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [5]:
import os
import re
from typing import List, Literal, Optional

from datasets import DatasetDict, concatenate_datasets, load_dataset, load_from_disk
from datasets.builder import DatasetGenerationError


DEFAULT_CHAT_TEMPLATE = "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"


def apply_chat_template(
    example, tokenizer, task: Literal["sft", "generation", "rm", "dpo"] = "sft", assistant_prefix="<|assistant|>\n"
):
    def _strip_prefix(s, pattern):
        # Use re.escape to escape any special characters in the pattern
        return re.sub(f"^{re.escape(pattern)}", "", s)

    if task in ["sft", "generation"]:
        messages = example["messages"]
        # We add an empty system message if there is none
        if messages[0]["role"] != "system":
            messages.insert(0, {"role": "system", "content": ""})
        example["text"] = tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True if task == "generation" else False
        )
    elif task == "rm":
        if all(k in example.keys() for k in ("chosen", "rejected")):
            chosen_messages = example["chosen"]
            rejected_messages = example["rejected"]
            # We add an empty system message if there is none
            if chosen_messages[0]["role"] != "system":
                chosen_messages.insert(0, {"role": "system", "content": ""})
            if rejected_messages[0]["role"] != "system":
                rejected_messages.insert(0, {"role": "system", "content": ""})
            example["text_chosen"] = tokenizer.apply_chat_template(chosen_messages, tokenize=False)
            example["text_rejected"] = tokenizer.apply_chat_template(rejected_messages, tokenize=False)
        else:
            raise ValueError(
                f"Could not format example as dialogue for `rm` task! Require `[chosen, rejected]` keys but found {list(example.keys())}"
            )
    elif task == "dpo":
        if all(k in example.keys() for k in ("chosen", "rejected")):
            # Compared to reward modeling, we filter out the prompt, so the text is everything after the last assistant token
            prompt_messages = [[msg for msg in example["chosen"] if msg["role"] == "user"][0]]
            # Insert system message
            if example["chosen"][0]["role"] != "system":
                prompt_messages.insert(0, {"role": "system", "content": ""})
            else:
                prompt_messages.insert(0, example["chosen"][0])
            # TODO: handle case where chosen/rejected also have system messages
            chosen_messages = example["chosen"][1:]
            rejected_messages = example["rejected"][1:]
            example["text_chosen"] = tokenizer.apply_chat_template(chosen_messages, tokenize=False)
            example["text_rejected"] = tokenizer.apply_chat_template(rejected_messages, tokenize=False)
            example["text_prompt"] = tokenizer.apply_chat_template(
                prompt_messages, tokenize=False, add_generation_prompt=True
            )
            example["text_chosen"] = _strip_prefix(example["text_chosen"], assistant_prefix)
            example["text_rejected"] = _strip_prefix(example["text_rejected"], assistant_prefix)
        else:
            raise ValueError(
                f"Could not format example as dialogue for `dpo` task! Require `[chosen, rejected]` keys but found {list(example.keys())}"
            )
    else:
        raise ValueError(
            f"Task {task} not supported, please ensure that the provided task is one of {['sft', 'generation', 'rm', 'dpo']}"
        )
    return example


def get_datasets(
    data_config: dict,
    splits: List[str] = ["train", "test"],
    shuffle: bool = True,
) -> DatasetDict:
    """
    Loads one or more datasets with varying training set proportions.

    Args:
        data_config (`dict`):
            Mapping from dataset name (Hub repo or local path) to the fraction of its training split to use.
        splits (`List[str]`, *optional*, defaults to `['train', 'test']`):
            Dataset splits to load and mix. Assumes the splits exist in all datasets and have a `train_` or `test_` prefix.
        shuffle (`bool`, *optional*, defaults to `True`):
            Whether to shuffle the training and testing/validation data.

    Returns:
        [`DatasetDict`]: The dataset dictionary containing the loaded datasets.
    """

    if isinstance(data_config, dict):
        # Structure of the input is:
        #     dataset_mixer = {
        #         "dataset1": 0.5,
        #         "dataset2": 0.3,
        #         "dataset3": 0.2,
        #     }
        dataset_mixer = data_config
    else:
        raise ValueError(f"Data config {data_config} not recognized.")

    raw_datasets = mix_datasets(dataset_mixer, splits=splits, shuffle=shuffle)
    return raw_datasets


def mix_datasets(dataset_mixer: dict, splits: Optional[List[str]] = None, shuffle=True) -> DatasetDict:
    """
    Loads and mixes datasets according to proportions specified in `dataset_mixer`.

    Args:
        dataset_mixer (`dict`):
            Dictionary containing the dataset names and their training proportions. By default, all test proportions are 1.
        splits (Optional[List[str]], *optional*, defaults to `None`):
            Dataset splits to load and mix; `None` means `["train", "test"]`. Assumes the splits exist in all datasets and have a `train_` or `test_` prefix.
        shuffle (`bool`, *optional*, defaults to `True`):
            Whether to shuffle the training and testing/validation data.
    """
    # Iterating over `None` would fail, so fall back to the usual splits
    if splits is None:
        splits = ["train", "test"]
    raw_datasets = DatasetDict()
    raw_train_datasets = []
    raw_val_datasets = []
    fracs = []
    for ds, frac in dataset_mixer.items():
        fracs.append(frac)
        for split in splits:
            try:
                # Try first if dataset on a Hub repo
                dataset = load_dataset(ds, split=split)
            except DatasetGenerationError:
                # If not, check local dataset
                dataset = load_from_disk(os.path.join(ds, split))

            if "train" in split:
                raw_train_datasets.append(dataset)
            elif "test" in split:
                raw_val_datasets.append(dataset)
            else:
                raise ValueError(f"Split type {split} not recognized as one of test or train.")

    if any(frac < 0 for frac in fracs):
        raise ValueError("Dataset fractions cannot be negative.")

    if len(raw_train_datasets) > 0:
        train_subsets = []
        for dataset, frac in zip(raw_train_datasets, fracs):
            train_subset = dataset.select(range(int(frac * len(dataset))))
            train_subsets.append(train_subset)
        if shuffle:
            raw_datasets["train"] = concatenate_datasets(train_subsets).shuffle(seed=42)
        else:
            raw_datasets["train"] = concatenate_datasets(train_subsets)
    # No subsampling for test datasets to enable fair comparison across models
    if len(raw_val_datasets) > 0:
        if shuffle:
            raw_datasets["test"] = concatenate_datasets(raw_val_datasets).shuffle(seed=42)
        else:
            raw_datasets["test"] = concatenate_datasets(raw_val_datasets)

    if len(raw_datasets) == 0:
        raise ValueError(
            f"Dataset {dataset_mixer} not recognized with split {split}. Check the dataset has been correctly formatted."
        )

    return raw_datasets
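The `dpo` branch of `apply_chat_template` turns each preference pair into a `prompt` plus two completions. The sketch below mimics that split in plain Python so the resulting strings can be inspected without loading a tokenizer. `render_zephyr` and `split_for_dpo` are hypothetical helpers, and `render_zephyr` only assumes the Zephyr-style layout seen in the printed sample for this dataset (role tag, newline, content, `</s>`, newline); it is not the tokenizer's real chat template.

```python
import re
from typing import Dict, List

ASSISTANT_PREFIX = "<|assistant|>\n"
EOS = "</s>"


def render_zephyr(messages: List[Dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Hypothetical stand-in for tokenizer.apply_chat_template(..., tokenize=False)."""
    text = "".join(f"<|{m['role']}|>\n{m['content']}{EOS}\n" for m in messages)
    if add_generation_prompt:
        text += ASSISTANT_PREFIX
    return text


def split_for_dpo(chosen: List[Dict[str, str]], rejected: List[Dict[str, str]]) -> Dict[str, str]:
    """Mirror the `dpo` branch: prompt = system + first user turn, completions = the rest."""
    user_turn = [m for m in chosen if m["role"] == "user"][0]
    system_turn = chosen[0] if chosen[0]["role"] == "system" else {"role": "system", "content": ""}
    prompt = render_zephyr([system_turn, user_turn], add_generation_prompt=True)

    def completion(messages: List[Dict[str, str]]) -> str:
        # Drop the first (user) message, render the rest, then strip the leading assistant tag.
        text = render_zephyr(messages[1:])
        return re.sub(f"^{re.escape(ASSISTANT_PREFIX)}", "", text)

    return {"prompt": prompt, "chosen": completion(chosen), "rejected": completion(rejected)}


example = split_for_dpo(
    chosen=[{"role": "user", "content": "Name a renewable energy source."},
            {"role": "assistant", "content": "Solar power."}],
    rejected=[{"role": "user", "content": "Name a renewable energy source."},
              {"role": "assistant", "content": "Coal."}],
)
print(example)
```

Because the prompt already ends with `<|assistant|>\n`, each completion must not repeat that tag, which is what `_strip_prefix` takes care of.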

### Data Prep

In [6]:
raw_datasets = get_datasets(
    {"HuggingFaceH4/ultrafeedback_binarized" : 0.005}, # 0.5% sampled
    splits = ["train_prefs", "test_prefs"],
)
column_names = list(raw_datasets["train"].features)

raw_datasets = raw_datasets.map(
    apply_chat_template,
    fn_kwargs = {"tokenizer": tokenizer, "task": "dpo"},
    num_proc = 12,
    remove_columns = column_names,
    desc = "Formatting comparisons with prompt template",
)

# Rename columns to what TRL expects: text_prompt -> prompt, text_chosen -> chosen, text_rejected -> rejected
for split in ["train", "test"]:
    raw_datasets[split] = raw_datasets[split].rename_columns(
        {"text_prompt": "prompt", "text_chosen": "chosen", "text_rejected": "rejected"}
    )

### Print a Sample Item from the Dataset

In [7]:
import pprint
row = raw_datasets["train"][8]
pprint.pprint(row["prompt"])
pprint.pprint(row["chosen"])
pprint.pprint(row["rejected"])

('<|system|>\n'
 '</s>\n'
 '<|user|>\n'
 'Describe a possible solution to the environmental issue of air '
 'pollution.</s>\n'
 '<|assistant|>\n')
('One of the most effective solutions to the environmental issue of air '
 'pollution is promoting and investing in renewable energy sources. '
 'Traditional energy sources like coal and oil produce large amounts of '
 'greenhouse gases that contribute greatly to air pollution. Renewable energy '
 'sources like wind, solar, and hydropower sources do not produce greenhouse '
 'gases. They use clean energy sources that do not harm the environment and do '
 'not contribute to air pollution. \n'
 '\n'
 'Another solution is improving our public transportation systems. Encouraging '
 'individuals to use public transportation, cycling, walking, or carpooling '
 'instead of their personal vehicles can greatly reduce emissions. This is '
 'especially helpful in urban areas where traffic congestion and subsequent '
 'air pollution is a common issue. \

### Add LoRA Adapters
We now add LoRA adapters, so only a small fraction (roughly 1 to 10%) of all parameters needs to be updated.
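As a sanity check on the parameter savings, the number of LoRA weights can be worked out by hand: each adapted linear layer gains an `r x in_features` matrix A and an `out_features x r` matrix B. The sketch below assumes Mistral-7B dimensions for Zephyr (hidden size 4096, 8 key/value heads of size 128, MLP size 14336, 32 layers); these values are assumptions to check against `model.config`. Under them, the total matches the 167,772,160 trainable parameters reported when training starts, which is on the order of 2% of a roughly 7B-parameter model.

```python
# Assumed Mistral-7B dimensions (Zephyr's base model); verify against model.config.
HIDDEN = 4096          # hidden_size
KV_DIM = 8 * 128       # num_key_value_heads * head_dim
INTERMEDIATE = 14336   # intermediate_size
NUM_LAYERS = 32
RANK = 64              # r passed to get_peft_model

# (in_features, out_features) of each targeted projection.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    # A is r x in_features, B is out_features x r (bias = "none", so no extra terms).
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
trainable = per_layer * NUM_LAYERS
print(f"{trainable:,}")
```

The Llama-3-8B model used in the ORPO example has, as far as we know, the same projection shapes, which is consistent with the identical trainable-parameter count reported there.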

In [8]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 64, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 64,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth 2024.9.post4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


### Train the DPO model
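DPO trains directly on preference pairs with a logistic loss on the difference between two implicit rewards, where each reward is `beta` times the log-probability ratio between the policy and a reference model. With `ref_model = None` and a LoRA model, TRL (as we understand it) computes the reference log-probs with the adapters disabled. The per-pair sketch below is a plain-Python illustration under that assumption, not TRL's code; `dpo_loss` is a hypothetical helper whose inputs are summed log-probabilities of each completion.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1):
    """Sigmoid DPO loss for one preference pair (summed completion log-probs as inputs)."""
    chosen_reward = beta * (policy_chosen - ref_chosen)        # batch mean is "rewards / chosen"
    rejected_reward = beta * (policy_rejected - ref_rejected)  # batch mean is "rewards / rejected"
    margin = chosen_reward - rejected_reward                   # batch mean is "rewards / margins"
    # -log(sigmoid(margin)) == log(1 + exp(-margin)); fine for moderate margins.
    loss = math.log1p(math.exp(-margin))
    return loss, chosen_reward, rejected_reward


# Step 1: the policy equals the reference, so both rewards are zero.
loss, r_c, r_r = dpo_loss(-206.256256, -201.783752, -206.256256, -201.783752)
print(round(loss, 4), r_c, r_r)

# Raising the chosen log-prob by 1 nat relative to the reference lowers the loss.
loss2, _, _ = dpo_loss(-205.256256, -201.783752, -206.256256, -201.783752)
print(round(loss2, 4))
```

Plugging in the step-1 log-probabilities from the training log gives a loss of ln 2 (about 0.6931) and zero rewards, matching the first logged row: LoRA initializes its B matrices to zero, so the policy starts out identical to the reference.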

In [9]:
# One must patch the DPO Trainer first!
from unsloth import PatchDPOTrainer
PatchDPOTrainer()

In [10]:
from trl import DPOTrainer, DPOConfig
from unsloth import is_bfloat16_supported

dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None, # For a PEFT model, the base weights (adapters disabled) act as the reference
    args = DPOConfig(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        learning_rate = 5e-6,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.0,
        lr_scheduler_type = "linear",
        seed = 42,
        output_dir = "outputs",
        beta = 0.1,              # Controls how far the policy may drift from the reference
        max_length = 1024,
        max_prompt_length = 512,
    ),
    train_dataset = raw_datasets["train"],
    # eval_dataset = raw_datasets["test"],
    tokenizer = tokenizer,
)

In [11]:
dpo_trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 305 | Num Epochs = 3
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 114
 "-____-"     Number of trainable parameters = 167,772,160
Could not estimate the number of tokens of the input, floating-point operations will not be computed


| Step | Training Loss | rewards / chosen | rewards / rejected | rewards / accuracies | rewards / margins | logps / rejected | logps / chosen | logits / rejected | logits / chosen |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | -201.783752 | -206.256256 | -2.673173 | -2.805595 |
| 2 | 0.6931 | 0.0 | 0.0 | 0.0 | 0.0 | -293.298676 | -294.617157 | -2.605451 | -2.248089 |
| 3 | 0.6713 | -0.001226 | -0.045723 | 0.75 | 0.044497 | -319.22467 | -335.144318 | -2.595082 | -2.624325 |
| 4 | 0.697 | -0.015006 | -0.0089 | 0.5 | -0.006106 | -318.26651 | -248.461868 | -2.750491 | -2.863873 |
| 5 | 0.6951 | -0.002731 | 0.000216 | 0.375 | -0.002947 | -230.219543 | -206.181808 | -2.790885 | -2.672144 |
| 6 | 0.6921 | -0.008549 | -0.01113 | 0.5 | 0.002581 | -412.195312 | -388.119049 | -2.898546 | -2.860778 |
| 7 | 0.6829 | -0.007663 | -0.028559 | 0.75 | 0.020896 | -340.921478 | -238.065155 | -2.744619 | -2.748619 |
| 8 | 0.705 | -0.062952 | -0.040059 | 0.375 | -0.022893 | -271.702881 | -163.249725 | -2.424583 | -2.383191 |
| 9 | 0.6935 | -0.025499 | -0.027195 | 0.5 | 0.001696 | -214.502594 | -296.540039 | -2.542014 | -2.869443 |
| 10 | 0.6762 | -0.008218 | -0.044155 | 0.75 | 0.035936 | -268.547974 | -363.621887 | -2.792253 | -3.169026 |


TrainOutput(global_step=114, training_loss=0.2976091424362701, metrics={'train_runtime': 549.3662, 'train_samples_per_second': 1.666, 'train_steps_per_second': 0.208, 'total_flos': 0.0, 'train_loss': 0.2976091424362701, 'epoch': 2.980392156862745})
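The step counts in the training banner follow from the dataset fraction and the batch settings. The arithmetic below reproduces them; it assumes the Trainer computes optimizer steps per epoch as `batches // gradient_accumulation_steps` and reports the final epoch as batches seen divided by batches per epoch. That matches the numbers logged here, but it is an assumption about `transformers` internals rather than documented behavior.

```python
import math

# Values taken from this run; the Trainer formulas are assumptions.
TRAIN_PREFS_ROWS = 61135   # rows in the train_prefs split of ultrafeedback_binarized
FRACTION = 0.005           # value passed in the dataset_mixer
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
EPOCHS = 3

num_examples = int(FRACTION * TRAIN_PREFS_ROWS)              # same truncation as mix_datasets
batches_per_epoch = math.ceil(num_examples / PER_DEVICE_BATCH)
steps_per_epoch = batches_per_epoch // GRAD_ACCUM            # optimizer steps per epoch (floor)
total_steps = steps_per_epoch * EPOCHS
final_epoch = total_steps * GRAD_ACCUM / batches_per_epoch   # fractional epoch at the last step

print(num_examples, batches_per_epoch, steps_per_epoch, total_steps, round(final_epoch, 4))
```

The run therefore stops after 114 x 4 = 456 of the 3 x 153 = 459 available batches, which is why the final epoch is reported as 2.98 rather than 3.0.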

## ORPO Unsloth Example

In [12]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2024.9.post4: Fast Llama patching. Transformers = 4.44.2.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.564 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.4.1+cu121. CUDA = 8.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


### Add LoRA Adapters
As in the DPO example, we add LoRA adapters so that only a small fraction of all parameters needs to be updated.
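The two LoRA cells use the same rank but different `lora_alpha` values (64 in the DPO cell, 16 here). In the standard LoRA formulation the adapter output `B (A x)` is multiplied by `lora_alpha / r`, so with identical adapter weights the ORPO adapters contribute a 4x smaller update. The sketch below illustrates this with plain lists; `lora_scaling` and `lora_forward` are hypothetical helpers, and the `use_rslora` branch assumes the `alpha / sqrt(r)` rule of rank-stabilized LoRA.

```python
import math


def lora_scaling(lora_alpha: float, r: int, use_rslora: bool = False) -> float:
    # Standard LoRA scales B @ A by alpha / r; rank-stabilized LoRA uses alpha / sqrt(r).
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r


def lora_forward(x, W, A, B, scaling):
    """y = W x + scaling * B (A x), using plain Python lists for illustration."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]


dpo_scale = lora_scaling(64, 64)    # DPO cell: r = 64, lora_alpha = 64
orpo_scale = lora_scaling(16, 64)   # ORPO cell: r = 64, lora_alpha = 16
print(dpo_scale, orpo_scale, lora_scaling(16, 64, use_rslora=True))

# Toy rank-1 adapter on a 2-d identity layer: same weights, different scaling.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]          # r x in  = 1 x 2
B = [[2.0], [0.0]]        # out x r = 2 x 1
x = [1.0, 1.0]
print(lora_forward(x, W, A, B, dpo_scale), lora_forward(x, W, A, B, orpo_scale))
```

Since `lora_alpha` only rescales the adapter output, it interacts with the learning rate: a smaller scaling generally needs a larger learning rate to move the model by the same amount.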

In [13]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 64, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

### Data Prep

In [14]:
# The data must be formatted with appropriate prompt template first.
# See details here: https://github.com/huggingface/trl/blob/main/examples/scripts/orpo.py

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN

def format_prompt(sample):
    instruction = sample["instruction"]
    input       = sample["input"]
    accepted    = sample["accepted"]
    rejected    = sample["rejected"]

    # ORPOTrainer expects prompt/chosen/rejected keys
    # See: https://huggingface.co/docs/trl/main/en/orpo_trainer
    sample["prompt"]   = alpaca_prompt.format(instruction, input, "")
    sample["chosen"]   = accepted + EOS_TOKEN
    sample["rejected"] = rejected + EOS_TOKEN
    return sample

from datasets import load_dataset
dataset = load_dataset("reciperesearch/dolphin-sft-v0.1-preference")["train"]
dataset = dataset.map(format_prompt,)

### Print a Sample Item from the Dataset

In [15]:
import pprint
row = dataset[1]
print('INSTRUCTION: ' + '=' * 50)
pprint.pprint(row["prompt"])
print('ACCEPTED: ' + '=' * 50)
pprint.pprint(row["chosen"])
print('REJECTED: ' + '=' * 50)
pprint.pprint(row["rejected"])

('Below is an instruction that describes a task, paired with an input that '
 'provides further context. Write a response that appropriately completes the '
 'request.\n'
 '\n'
 '### Instruction:\n'
 'You are an AI assistant that helps people find information.\n'
 '\n'
 '### Input:\n'
 'Given the rationale, provide a reasonable question and answer. Step-by-step '
 'reasoning process: Xkcd comics are very popular amongst internet users.\n'
 ' The question and answer:\n'
 '\n'
 '### Response:\n')
('Question: What makes Xkcd comics popular among internet users?\n'
 '\n'
 'Answer: Xkcd comics are popular among internet users because of their clever '
 'humor, relatable themes, and minimalist art style. They often cover topics '
 'like science, technology, and life experiences, making them appealing to a '
 'broad audience.<|end_of_text|>')
('Question: What is the reason behind the popularity of Xkcd comics among '
 'internet users?\n'
 '\n'
 'Answer: Xkcd comics are popular among internet 

In [16]:
# Enable reward modelling stats
from unsloth import PatchDPOTrainer
PatchDPOTrainer()

### Train Model
Using Hugging Face TRL's `ORPOTrainer`, we train for only 30 steps (`max_steps = 30`) to keep the run short. For a full run, set `num_train_epochs = 1` and remove the `max_steps` argument. TRL's `DPOTrainer` is also supported!
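Unlike DPO, ORPO needs no reference model: it adds an odds-ratio penalty to the ordinary supervised (NLL) loss on the chosen answer, and `beta` in `ORPOConfig` weights that penalty. The sketch below is a minimal per-pair illustration that assumes it mirrors TRL's formulation (average per-token log-probs `logp`, odds `p / (1 - p)` with `p = exp(logp)`, loss `NLL - beta * log sigmoid(log odds ratio)`). `orpo_pair_loss` is hypothetical, the log-probs are the batch-averaged step-1 values from the ORPO training log, and the NLL value is a made-up placeholder.

```python
import math


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)).
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def orpo_pair_loss(nll_chosen: float, chosen_logp: float, rejected_logp: float, beta: float = 0.1):
    """ORPO loss for one pair.

    chosen_logp / rejected_logp are *average* per-token log-probs (always < 0),
    so p = exp(logp) lies in (0, 1) and log odds(p) = logp - log(1 - p).
    """
    log_odds_chosen = chosen_logp - math.log1p(-math.exp(chosen_logp))
    log_odds_rejected = rejected_logp - math.log1p(-math.exp(rejected_logp))
    or_loss = -log_sigmoid(log_odds_chosen - log_odds_rejected)
    return nll_chosen + beta * or_loss, or_loss


# nll_chosen = 1.8 is a placeholder, not a logged value.
total, or_term = orpo_pair_loss(nll_chosen=1.8, chosen_logp=-1.813524, rejected_logp=-0.955831)
print(round(or_term, 4), round(total, 4))

# Equal log-probs give a log odds ratio of 0, so the penalty is ln 2.
_, eq = orpo_pair_loss(nll_chosen=0.0, chosen_logp=-1.0, rejected_logp=-1.0)
print(round(eq, 4))
```

The penalty grows as the rejected answer becomes relatively more likely than the chosen one, which is the situation in the first logged steps.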

In [17]:
from trl import ORPOConfig, ORPOTrainer
from unsloth import is_bfloat16_supported

orpo_trainer = ORPOTrainer(
    model = model,
    train_dataset = dataset,
    tokenizer = tokenizer,
    args = ORPOConfig(
        max_length = max_seq_length,
        max_prompt_length = max_seq_length//2,
        max_completion_length = max_seq_length//2,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        beta = 0.1,
        logging_steps = 1,
        optim = "adamw_8bit",
        lr_scheduler_type = "linear",
        max_steps = 30, # Change to num_train_epochs = 1 for full training runs
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        output_dir = "outputs",
    ),
)

max_steps is given, it will override any value given in num_train_epochs


In [18]:
orpo_trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 16,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 30
 "-____-"     Number of trainable parameters = 167,772,160
Could not estimate the number of tokens of the input, floating-point operations will not be computed


| Step | Training Loss | rewards / chosen | rewards / rejected | rewards / accuracies | rewards / margins | logps / rejected | logps / chosen | logits / rejected | logits / chosen |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 2.458 | -0.181352 | -0.095583 | 0.0 | -0.085769 | -0.955831 | -1.813524 | -0.575378 | -0.561036 |
| 2 | 2.293 | -0.157157 | -0.090668 | 0.125 | -0.066489 | -0.906682 | -1.571572 | -0.945717 | -0.792619 |
| 3 | 2.4742 | -0.120382 | -0.096684 | 0.125 | -0.023698 | -0.96684 | -1.203818 | -0.751063 | -0.75035 |
| 4 | 2.6564 | -0.163904 | -0.129227 | 0.125 | -0.034678 | -1.292265 | -1.639043 | -0.885241 | -0.94466 |
| 5 | 2.1277 | -0.117335 | -0.087243 | 0.25 | -0.030092 | -0.872427 | -1.173352 | -0.632229 | -0.599096 |
| 6 | 2.2387 | -0.128038 | -0.117357 | 0.375 | -0.010681 | -1.173566 | -1.280375 | -0.610937 | -0.682004 |
| 7 | 2.2474 | -0.097155 | -0.095071 | 0.5 | -0.002084 | -0.950708 | -0.971551 | -0.820427 | -0.886267 |
| 8 | 2.1747 | -0.105512 | -0.070017 | 0.0 | -0.035495 | -0.700168 | -1.055119 | -0.951585 | -0.925862 |
| 9 | 2.1203 | -0.115601 | -0.072845 | 0.25 | -0.042756 | -0.72845 | -1.156015 | -1.11038 | -1.154624 |
| 10 | 1.9788 | -0.11696 | -0.088226 | 0.0 | -0.028734 | -0.882262 | -1.169603 | -0.619079 | -0.714145 |


TrainOutput(global_step=30, training_loss=2.0209736227989197, metrics={'train_runtime': 94.3364, 'train_samples_per_second': 2.544, 'train_steps_per_second': 0.318, 'total_flos': 0.0, 'train_loss': 2.0209736227989197, 'epoch': 0.015})
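The ORPO log can be cross-checked by hand. Assuming TRL logs ORPO's `rewards / chosen` and `rewards / rejected` as `beta` times the average log-probabilities (there is no reference model), the step-1 rewards should be one tenth of the corresponding `logps` columns, and `rewards / margins` should be their difference. The snippet below checks this against the logged values and also recomputes the reported epoch of 0.015.

```python
BETA = 0.1  # ORPOConfig(beta = 0.1)

# Batch-averaged values from the step-1 row of the ORPO training log.
logps_chosen, logps_rejected = -1.813524, -0.955831
rewards_chosen, rewards_rejected, logged_margin = -0.181352, -0.095583, -0.085769

# Assumed relationship: reward = beta * average log-prob.
assert abs(BETA * logps_chosen - rewards_chosen) < 1e-6
assert abs(BETA * logps_rejected - rewards_rejected) < 1e-6
margin = rewards_chosen - rewards_rejected
assert abs(margin - logged_margin) < 1e-6

# Epoch reached after max_steps = 30 on 16,000 examples.
batches_per_epoch = 16000 // 2            # per_device_train_batch_size = 2
epoch = 30 * 4 / batches_per_epoch        # max_steps * gradient_accumulation_steps
print(margin, epoch)
```

The negative margins in the first steps mean that, on average, the rejected completions had a higher per-token log-probability than the chosen ones at that point in training.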

<a name="Inference"></a>
### Inference
Let's run the model! You can change the instruction and input, but leave the output blank so the model generates it.

In [19]:
def print_formatted_prompt(prompts):
    """
    Parse a decoded generation into its Instruction, Input, and Output sections and print them.

    :param prompts: List of decoded strings (e.g. from `tokenizer.batch_decode`); only the first one is formatted.
    """
    # Remove special tags
    prompt = prompts[0]
    prompt_cleaned = prompt.replace("<|begin_of_text|>", "").replace("<|end_of_text|>", "")

    # Split into sections: Instruction, Input, Output
    sections = prompt_cleaned.split("### ")

    # Store formatted output components
    formatted_output = {}

    # Parse the sections
    for section in sections:
        if section.startswith("Instruction:"):
            formatted_output["Instruction"] = section.replace("Instruction:\n", "").strip()
        elif section.startswith("Input:"):
            formatted_output["Input"] = section.replace("Input:\n", "").strip()
        elif section.startswith("Output:"):
            formatted_output["Output"] = section.replace("Output:\n", "").strip()

    # Print each section in a clean format
    print("-----Formatted Prompt Output-----")
    print("Instruction:\n", formatted_output.get("Instruction", "N/A"))
    print("\nInput:\n", formatted_output.get("Input", "N/A"))
    print("\nOutput:\n", formatted_output.get("Output", "N/A"))
    print("---------------------------------\n")
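Because `print_formatted_prompt` only prints, its parsing is hard to check. The sketch below moves the same logic into a hypothetical `parse_sections` function that returns a dict. It also illustrates why the template is redefined with `### Output:`: the original Alpaca template labels the answer `### Response:`, which this parser does not recognize, so the output would be printed as `N/A`.

```python
from typing import Dict


def parse_sections(decoded: str) -> Dict[str, str]:
    """Hypothetical pure-function version of the parsing inside print_formatted_prompt."""
    cleaned = decoded.replace("<|begin_of_text|>", "").replace("<|end_of_text|>", "")
    parsed = {}
    for section in cleaned.split("### "):
        for name in ("Instruction", "Input", "Output"):
            if section.startswith(name + ":"):
                parsed[name] = section.replace(name + ":\n", "").strip()
    return parsed


basic = "<|begin_of_text|>### Instruction:\nSay hi\n### Input:\nBob\n### Output:\nHi Bob<|end_of_text|>"
alpaca = "<|begin_of_text|>Preamble.\n\n### Instruction:\nSay hi\n\n### Input:\nBob\n\n### Response:\nHi Bob"
print(parse_sections(basic))
print(parse_sections(alpaca).get("Output", "N/A"))
```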

In [20]:
# alpaca_prompt = Copied from above
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
[
    alpaca_prompt.format(
        "Continue the fibonnaci sequence.", # instruction
        "1, 1, 2, 3, 5, 8", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 64, use_cache = True)
tokenizer.batch_decode(outputs)

['<|begin_of_text|>Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nContinue the fibonnaci sequence.\n\n### Input:\n1, 1, 2, 3, 5, 8\n\n### Response:\n13\n\n### Explanation:\nThe Fibonacci sequence is a series of numbers where each number is the sum of the previous two numbers. The first two numbers are always 1, and the third number is 1 again. So the next number in the sequence after 8 would be 5 + 8 = 13.']

#### Define Basic Template for Alpaca Prompt

In [21]:
# Define a basic template for alpaca_prompt
alpaca_prompt = "### Instruction:\n{}\n### Input:\n{}\n### Output:\n{}"
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

#### Sentiment Analysis

In [22]:
inputs_sentiment = tokenizer(
    [
        alpaca_prompt.format(
            "Analyze the sentiment of the given text. Return either Positive, Negative, or Neutral.",  # Instruction
            "I really enjoyed the movie. It was fantastic!",  # Input text for sentiment analysis
            "",  # Output - leave this blank for generation!
        )
    ], return_tensors="pt"
).to("cuda")
outputs_sentiment = model.generate(**inputs_sentiment, max_new_tokens=32, use_cache=True)

In [23]:
print("Sentiment Analysis Result:")
print_formatted_prompt(tokenizer.batch_decode(outputs_sentiment))

Sentiment Analysis Result:
-----Formatted Prompt Output-----
Instruction:
 Analyze the sentiment of the given text. Return either Positive, Negative, or Neutral.

Input:
 I really enjoyed the movie. It was fantastic!

Output:
 Positive
---------------------------------



#### Named Entity Recognition (NER)

In [24]:
inputs_ner = tokenizer(
    [
        alpaca_prompt.format(
            "Identify named entities (i.e. Person, Organizations, and Locations) in the given text. The output should be in list format (i.e. [People: Charles, Organizations: NATO, Locations: Cuba])",  # Instruction
            "Barack Obama was born in Hawaii and was the president of the United States.",  # Input text for NER
            "",  # Output - leave this blank for generation!
        )
    ], return_tensors="pt"
).to("cuda")
outputs_ner = model.generate(**inputs_ner, max_new_tokens=64, use_cache=True)

In [25]:
print("Named Entity Recognition Result:")
print_formatted_prompt(tokenizer.batch_decode(outputs_ner))

Named Entity Recognition Result:
-----Formatted Prompt Output-----
Instruction:
 Identify named entities (i.e. Person, Organizations, and Locations) in the given text. The output should be in list format (i.e. [People: Charles, Organizations: NATO, Locations: Cuba])

Input:
 Barack Obama was born in Hawaii and was the president of the United States.

Output:
 [People: Barack Obama, Locations: Hawaii, Organizations: United States]
---------------------------------



#### Text Classification

In [26]:
inputs_classification = tokenizer(
    [
        alpaca_prompt.format(
            "Classify the given text into one of the following categories: Politics, Sports, Technology, Health, or Entertainment. Your output should only be one of these categories, whichever most closely aligns with the input.",  # Instruction
            "The new AI research papers show advancements in natural language understanding.",  # Input text for classification
            "",  # Output - leave this blank for generation!
        )
    ], return_tensors="pt"
).to("cuda")
outputs_classification = model.generate(**inputs_classification, max_new_tokens=32, use_cache=True)

In [27]:
print("Text Classification Result:")
print_formatted_prompt(tokenizer.batch_decode(outputs_classification))

Text Classification Result:
-----Formatted Prompt Output-----
Instruction:
 Classify the given text into one of the following categories: Politics, Sports, Technology, Health, or Entertainment. Your output should only

Input:
 The new AI research papers show advancements in natural language understanding.

Output:
 Technology
---------------------------------



#### Text Summarization

In [28]:
inputs_summarization = tokenizer(
    [
        alpaca_prompt.format(
            "Summarize the following article into one sentence. It is important that only one sentence is used.",  # Instruction
            "Machine learning has seen rapid advancements over the past decade, with applications ranging from natural language processing to computer vision. Researchers have developed new algorithms and models, such as transformers, which have drastically improved performance across a variety of tasks.",  # Input text for summarization
            "",  # Output - leave this blank for generation!
        )
    ], return_tensors="pt"
).to("cuda")
outputs_summarization = model.generate(**inputs_summarization, max_new_tokens=64, use_cache=True)

In [29]:
print("Text Summarization Result:")
print_formatted_prompt(tokenizer.batch_decode(outputs_summarization))

Text Summarization Result:
-----Formatted Prompt Output-----
Instruction:
 Summarize the following article into one sentence. It is important that only one sentence is used.

Input:
 Machine learning has seen rapid advancements over the past decade, with applications ranging from natural language processing to computer vision. Researchers have developed new algorithms and models, such as transformers, which have drastically improved performance across a variety of tasks.

Output:
 The rapid advancements in machine learning over the past decade have led to the development of new algorithms and models, such as transformers, which have improved performance across a variety of tasks.
---------------------------------
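The summarization instruction insists on a single sentence. A rough way to check that constraint is to count sentence-ending punctuation followed by whitespace. The sketch below is a naive heuristic under that assumption (`count_sentences` is a hypothetical helper). It will over-count text containing abbreviations such as "e.g.".

```python
import re

def count_sentences(text: str) -> int:
    """Naively count sentences ending in '.', '!' or '?'.

    Heuristic only: abbreviations such as "e.g." or "Dr." are over-counted.
    """
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return sum(1 for piece in pieces if piece)

summary = (
    "The rapid advancements in machine learning over the past decade have led to "
    "the development of new algorithms and models, such as transformers, which have "
    "improved performance across a variety of tasks."
)
print(count_sentences(summary))  # 1
```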



#### Question Answering

In [30]:
inputs_question_answering = tokenizer(
    [
        alpaca_prompt.format(
            "Answer the following question based on the context provided.",  # Instruction
            "Context: The Eiffel Tower is located in Paris, France. It was constructed in 1889 and has become a global cultural icon of France and one of the most recognizable structures in the world.\nQuestion: Where is the Eiffel Tower located?",  # Input context and question
            "",  # Output - leave this blank for generation!
        )
    ], return_tensors="pt"
).to("cuda")
outputs_question_answering = model.generate(**inputs_question_answering, max_new_tokens=32, use_cache=True)


In [31]:
print("Question Answering Result:")
print_formatted_prompt(tokenizer.batch_decode(outputs_question_answering))

Question Answering Result:
-----Formatted Prompt Output-----
Instruction:
 Answer the following question based on the context provided.

Input:
 Context: The E

Output:
 The Eiffel Tower is located in Paris, France.
---------------------------------
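Because the QA context contains the answer verbatim, one way to score a response like this is a SQuAD-style token-overlap F1 against a reference answer. The sketch below only approximates the SQuAD normalization (lowercasing, replacing punctuation with spaces, dropping English articles). The reference string `"Paris, France"` is an assumption chosen for illustration, not something defined in the notebook.

```python
import re
from collections import Counter
from typing import List

def normalize_tokens(text: str) -> List[str]:
    """Lowercase, replace punctuation with spaces, drop a/an/the, split on whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()

def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between prediction and reference."""
    pred, ref = normalize_tokens(prediction), normalize_tokens(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

answer = "The Eiffel Tower is located in Paris, France."
print(round(token_f1(answer, "Paris, France"), 3))  # 0.444
```

The full-sentence answer gets perfect recall but low precision, so this metric rewards terse extractive answers. That may or may not match what is wanted from an instruction-tuned model.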



#### Information Extraction

In [32]:
inputs_information_extraction = tokenizer(
    [
        alpaca_prompt.format(
            "Extract key pieces of information such as dates, names, and locations from the given text.",  # Instruction
            "Albert Einstein was born on March 14, 1879, in Ulm, Germany. He later moved to the United States.",  # Input text for information extraction
            "",  # Output - leave this blank for generation!
        )
    ], return_tensors="pt"
).to("cuda")
outputs_information_extraction = model.generate(**inputs_information_extraction, max_new_tokens=64, use_cache=True)


In [33]:
print("Information Extraction Result:")
print_formatted_prompt(tokenizer.batch_decode(outputs_information_extraction))

Information Extraction Result:
-----Formatted Prompt Output-----
Instruction:
 Extract key pieces of information such as dates, names, and locations from the given text.

Input:
 Albert Einstein was born on March 14, 1879, in Ulm, Germany. He later moved to the United States.

Output:
 1879-03-14, Ulm, Germany, United States
---------------------------------
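The extraction output rewrote the date "March 14, 1879" in ISO 8601 form (`1879-03-14`). To check that such a rewrite is faithful, one option is to convert every "Month D, YYYY" date found in the input to ISO form with `datetime.strptime` and look for it among the comma-separated output fields. In this sketch, `iso_dates_in` is a hypothetical helper. The regex covers only this one English date format, and `%B` assumes an English locale.

```python
import re
from datetime import datetime
from typing import List

# Matches dates written like "March 14, 1879" (English month names only).
MONTH_DATE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)

def iso_dates_in(text: str) -> List[str]:
    """Return every 'Month D, YYYY' date in text, converted to YYYY-MM-DD."""
    return [
        datetime.strptime(m.group(0), "%B %d, %Y").date().isoformat()
        for m in MONTH_DATE.finditer(text)
    ]

source = "Albert Einstein was born on March 14, 1879, in Ulm, Germany. He later moved to the United States."
extracted = "1879-03-14, Ulm, Germany, United States"

fields = [field.strip() for field in extracted.split(",")]
print(iso_dates_in(source))                            # ['1879-03-14']
print(all(d in fields for d in iso_dates_in(source)))  # True
```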



#### Topic Modeling

In [34]:
inputs_topic_modeling = tokenizer(
    [
        alpaca_prompt.format(
            "Identify the main topic of the given text.",  # Instruction
            "Quantum computing uses quantum-mechanical phenomena such as superposition and entanglement to perform computation. Quantum computers are different from classical computers in many ways.",  # Input text for topic modeling
            "",  # Output - leave this blank for generation!
        )
    ], return_tensors="pt"
).to("cuda")
outputs_topic_modeling = model.generate(**inputs_topic_modeling, max_new_tokens=32, use_cache=True)


In [35]:
print("Topic Modeling Result:")
print_formatted_prompt(tokenizer.batch_decode(outputs_topic_modeling))

Topic Modeling Result:
-----Formatted Prompt Output-----
Instruction:
 Identify the main topic of the given text.

Input:
 Quantum computing uses quantum-mechanical phenomena such as super

Output:
 Quantum computing
---------------------------------
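Each cell in this section decodes the full output of `model.generate`. For a decoder-only model such as this one, that output contains the prompt tokens followed by the newly generated tokens, which is why the decoded text repeats the instruction and input. If only the response is wanted, the prompt length can be sliced off before decoding. Below is a minimal sketch on plain token-id lists. It assumes every prompt row in the batch has the same padded length, which holds for the single-prompt batches used here. `strip_prompt` is a hypothetical helper.

```python
from typing import List, Sequence

def strip_prompt(input_ids: Sequence[Sequence[int]],
                 output_ids: Sequence[Sequence[int]]) -> List[List[int]]:
    """Drop the echoed prompt from each generated sequence.

    Assumes every row of input_ids has the same (padded) length, which holds
    when the batch was tokenized together with padding, or has a single row.
    """
    prompt_len = len(input_ids[0])
    return [list(row[prompt_len:]) for row in output_ids]

# Toy ids: a 3-token prompt followed by 2 generated tokens.
print(strip_prompt([[101, 7, 8]], [[101, 7, 8, 42, 2]]))  # [[42, 2]]
```

With the notebook's tensors, this would look like `tokenizer.batch_decode(strip_prompt(inputs_topic_modeling["input_ids"].tolist(), outputs_topic_modeling.tolist()), skip_special_tokens=True)`. The result is the bare response text, so it would not go through `print_formatted_prompt`, which appears to expect the full Alpaca-formatted prompt.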



#### Text Generation

In [36]:
inputs_text_generation = tokenizer(
    [
        alpaca_prompt.format(
            "Generate a short creative story based on the given prompt.",  # Instruction
            "Once upon a time in a small village, there was a mysterious forest that no one dared to enter. One day, a young child named Alex...",  # Input text for text generation
            "",  # Output - leave this blank for generation!
        )
    ], return_tensors="pt"
).to("cuda")
outputs_text_generation = model.generate(**inputs_text_generation, max_new_tokens=128, use_cache=True)

In [37]:
print("Text Generation Result:")
print_formatted_prompt(tokenizer.batch_decode(outputs_text_generation))

Text Generation Result:
-----Formatted Prompt Output-----
Instruction:
 Generate a short creative story based on the given prompt.

Input:
 Once upon a time in a small village, there was a mysterious forest that no one dared to enter. One day, a young child named Alex...

Output:
 Once upon a time in a small village, there was a mysterious forest that no one dared to enter. One day, a young child named Alex decided to venture into the forest to explore its secrets. As he entered the forest, he noticed that the trees were all twisted and gnarled, and the leaves were a deep shade of green. He also noticed a strange smell in the air, like something was rotting. Alex walked deeper into the forest, and soon he came across a large clearing. In the middle of the clearing was a giant tree with a large hole in the center. Alex walked up to the tree and looked inside
---------------------------------
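The story above ends mid-sentence ("looked inside"), most likely because generation reached the `max_new_tokens=128` cap before the model emitted an end-of-sequence token. Raising the cap is one option. Another is to cut the text back to its last sentence-ending punctuation mark, sketched below with a hypothetical helper `trim_to_last_sentence`. This is a plain string heuristic and does not account for abbreviations or closing quotes.

```python
def trim_to_last_sentence(text: str) -> str:
    """Cut text after its last '.', '!' or '?'; return it unchanged if there is none."""
    cut = max(text.rfind(mark) for mark in ".!?")
    return text if cut == -1 else text[: cut + 1]

story_tail = "Alex walked up to the tree and looked inside"
print(trim_to_last_sentence("In the middle was a giant tree. " + story_tail))
# In the middle was a giant tree.
```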

