# Libraries

In [1]:
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers trl peft accelerate bitsandbytes

In [2]:
!pip install -U git+https://github.com/huggingface/trl

Collecting git+https://github.com/huggingface/trl
  Cloning https://github.com/huggingface/trl to /tmp/pip-req-build-puvxoktx
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/trl /tmp/pip-req-build-puvxoktx
  Resolved https://github.com/huggingface/trl to commit 99f2c94b2200927a1dc156f16e012dca11f865e1
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch>=1.4.0->trl==0.8.7.dev0)
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch>=1.4.0->trl==0.8.7.dev0)
  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch>=1.4.0->trl==0.8.7.dev0)
  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1

In [3]:
import torch

In [4]:
from trl import DPOConfig, DPOTrainer

# Load dataset

In [5]:
from datasets import load_dataset

dataset = load_dataset("Dahoas/full-hh-rlhf")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Downloading readme:   0%|          | 0.00/478 [00:00<?, ?B/s]

Downloading metadata:   0%|          | 0.00/930 [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/123M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/13.6M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/112052 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/12451 [00:00<?, ? examples/s]

In [6]:
# Keep a reproducible random subset of 20,000 training examples
dataset["train"] = dataset["train"].shuffle(seed=42).select(range(20000))

# Fine-Tune model

In [7]:
from unsloth import FastLanguageModel
import torch

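max_seq_length = 2048  # maximum sequence length, in tokens

# Load the 4-bit quantized base model and its tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="MahmoudMohamed/Phi3_MeetingQA_4bit",
    max_seq_length=max_seq_length,
    dtype=torch.float16,  # the Tesla T4 reports no bfloat16 support
    load_in_4bit=True,    # load the weights in 4-bit quantized form
)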

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


config.json:   0%|          | 0.00/1.10k [00:00<?, ?B/s]

==((====))==  Unsloth: Fast Mistral patching release 2024.5
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0+cu121. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. Xformers = 0.0.26.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth


model.safetensors:   0%|          | 0.00/2.26G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/140 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/3.17k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/293 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/569 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
MahmoudMohamed/Phi3_MeetingQA_4bit does not have a padding token! Will use pad_token = <|placeholder6|>.


Next, we add LoRA adapters so that only a small fraction of the model's parameters is trained: 14,942,208 in this run, according to the training log.

In [8]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 8,  # LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,  # LoRA scaling numerator (alpha / r = 2)
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = True,  # trade extra compute for lower memory use
    random_state = 3407,
)

Unsloth 2024.5 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
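The trainable-parameter count follows from the LoRA settings: each adapted linear layer gains two low-rank matrices with `r * (in_features + out_features)` weights in total. The sketch below redoes that arithmetic. The layer dimensions (hidden size 3072, MLP size 8192, 32 layers, separate q/k/v and gate/up projections) are assumptions about the Phi-3-mini architecture and are not stated in this notebook; with them, the total comes out to the 14,942,208 trainable parameters reported in the training log.

```python
# Assumed Phi-3-mini dimensions (NOT stated in the notebook).
HIDDEN = 3072
INTERMEDIATE = 8192
NUM_LAYERS = 32
RANK = 8  # r passed to get_peft_model

# (in_features, out_features) of each targeted projection, assuming
# unfused q/k/v and gate/up projections and no grouped-query attention.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """Weights in one LoRA adapter: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} total")
```

Doubling `r` doubles this count, since every adapter's size is linear in the rank.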


In [9]:
model = model.to(torch.float16)

In [10]:
training_args = DPOConfig(
    output_dir="./dpo_phi3",
    beta=0.1,  # how strongly the policy is kept close to the reference model
    per_device_train_batch_size=8,
    per_device_eval_batch_size=2,
    max_steps=500,
)
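`beta` scales the implicit reward in the DPO objective. Below is a minimal sketch of the per-example loss, assuming `DPOConfig` is left at its default sigmoid loss type. `dpo_sigmoid_loss` is a hypothetical helper written for illustration, not a TRL function, and the log-probabilities are made up.

```python
import math


def dpo_sigmoid_loss(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (policy margin - reference margin)))."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    x = beta * (policy_margin - ref_margin)
    # Numerically stable form of -log(sigmoid(x)), i.e. softplus(-x).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Made-up sequence log-probabilities, for illustration only.
baseline = dpo_sigmoid_loss(-40.0, -40.0, -40.0, -40.0)  # policy equals reference
improved = dpo_sigmoid_loss(-35.0, -45.0, -40.0, -40.0)  # policy favors "chosen"
print(f"baseline={baseline:.4f} improved={improved:.4f}")
```

When the policy and the reference assign the same margins, the loss is ln 2 (about 0.693), so an average training loss below that value indicates that the policy has shifted toward the chosen responses relative to the reference.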

In [11]:
dpo_trainer = DPOTrainer(
    model,
    ref_model=None,  # with a PEFT model, the base model with adapters disabled serves as the reference
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)



Map:   0%|          | 0/20000 [00:00<?, ? examples/s]

Map:   0%|          | 0/12451 [00:00<?, ? examples/s]

max_steps is given, it will override any value given in num_train_epochs
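DPO trains on preference pairs. As far as we understand the TRL version installed here, each row passed to `DPOTrainer` needs `prompt`, `chosen`, and `rejected` text fields. The sketch below checks a single record before training; the function name `check_preference_record` and the example record are hypothetical and are not taken from the dataset.

```python
REQUIRED_COLUMNS = ("prompt", "chosen", "rejected")


def check_preference_record(record):
    """Raise if a record lacks the fields a preference pair needs."""
    missing = [col for col in REQUIRED_COLUMNS if col not in record]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    for col in REQUIRED_COLUMNS:
        value = record[col]
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"column {col!r} must be a non-empty string")
    if record["chosen"] == record["rejected"]:
        raise ValueError("chosen and rejected responses are identical")
    return True


# Hypothetical record, for illustration only.
example = {
    "prompt": "Human: How do I boil an egg?\n\nAssistant:",
    "chosen": " Place it in boiling water for about ten minutes.",
    "rejected": " I don't know.",
}
print(check_preference_record(example))
```

Running a check like this over a few rows of `dataset["train"]` before constructing the trainer surfaces schema problems earlier than the tokenization pass does.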


In [12]:
trainer_stats = dpo_trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 20,000 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 8 | Gradient Accumulation steps = 1
\        /    Total batch size = 8 | Total steps = 500
 "-____-"     Number of trainable parameters = 14,942,208
Could not estimate the number of tokens of the input, floating-point operations will not be computed


Step    Training Loss
500     0.3474


In [13]:
# Save artifacts
dpo_trainer.model.save_pretrained("model")
tokenizer.save_pretrained("model")

('model/tokenizer_config.json',
 'model/special_tokens_map.json',
 'model/tokenizer.model',
 'model/added_tokens.json',
 'model/tokenizer.json')

In [15]:
import os  # the access token is read from the HF_TOKEN environment variable, never hard-coded
model.push_to_hub_merged("MahmoudMohamed/Phi3_DPO_4bit", tokenizer, save_method = "merged_4bit_forced", token = os.environ["HF_TOKEN"])

Unsloth: Merging 4bit and LoRA weights to 4bit...
This might take 5 minutes...




Done.
Unsloth: Saving 4bit Bitsandbytes model. Please wait...


README.md:   0%|          | 0.00/599 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/2.26G [00:00<?, ?B/s]

README.md:   0%|          | 0.00/605 [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

Saved merged_4bit model to https://huggingface.co/MahmoudMohamed/Phi3_DPO_4bit


In [16]:
trainer_stats

TrainOutput(global_step=500, training_loss=0.347373046875, metrics={'train_runtime': 4763.2514, 'train_samples_per_second': 0.84, 'train_steps_per_second': 0.105, 'total_flos': 0.0, 'train_loss': 0.347373046875, 'epoch': 0.2})
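The throughput figures in `TrainOutput` can be cross-checked against the configuration. The sketch below recomputes them from values that appear in this notebook (500 steps, batch size 8, 20,000 training examples, a runtime of 4763.2514 seconds); the variable names are ours.

```python
max_steps = 500
batch_size = 8            # per-device batch size, one GPU, no gradient accumulation
num_examples = 20_000     # size of the training subset
train_runtime = 4763.2514 # seconds, from TrainOutput

samples_seen = max_steps * batch_size
epoch_fraction = samples_seen / num_examples
samples_per_second = samples_seen / train_runtime
steps_per_second = max_steps / train_runtime

print(f"samples seen:       {samples_seen}")
print(f"fraction of epoch:  {epoch_fraction}")
print(f"samples per second: {samples_per_second:.2f}")
print(f"steps per second:   {steps_per_second:.3f}")
```

The run therefore covered 4,000 of the 20,000 examples, which is the `epoch: 0.2` reported above; a full pass at the same speed would take roughly five times the runtime.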