<a href="https://colab.research.google.com/github/seongcho1/Fine-tuning-BLOOM/blob/main/GenerAds_Fine_tuning_BLOOM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

https://github.com/huggingface/peft

Parameter-Efficient Fine-Tuning (PEFT) methods adapt pre-trained language models (PLMs) to downstream applications without fine-tuning all of the model's parameters. Fully fine-tuning large-scale PLMs is often prohibitively costly; PEFT methods instead fine-tune only a small number of (extra) parameters, which greatly reduces computational and storage costs. Recent state-of-the-art PEFT techniques achieve performance comparable to that of full fine-tuning.

## Install requirements

First, run the cells below to install the requirements:

In [29]:
!pip install -q bitsandbytes datasets accelerate loralib
!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git



## Model loading

Let's load the ```bloom-1b7``` model in 8-bit precision.

In [38]:
import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"
import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    load_in_8bit=True,
    device_map='auto',
)

tokenizer = AutoTokenizer.from_pretrained("bigscience/tokenizer")
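
As a rough, back-of-the-envelope sketch of why 8-bit loading helps, the snippet below estimates the memory needed for the weights alone at different precisions. The parameter count is derived from the trainable-parameter report printed later in this notebook (all parameters minus LoRA parameters); the helper name `weight_memory_gib` is hypothetical. It ignores activations and optimizer state, and it assumes every tensor is stored at the given precision, which is not strictly true with `load_in_8bit=True`, so treat the numbers as estimates only.

```python
# Rough weight-memory estimate for bloom-1b7 at different precisions.
# Base parameter count = all params minus LoRA params, as reported in this notebook.
BASE_PARAMS = 1_725_554_688 - 3_145_728

# Storage size of one parameter for each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Memory needed to hold the weights alone, in GiB (hypothetical helper)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gib(BASE_PARAMS, dtype):.2f} GiB")
```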




## Post-processing on the model

Finally, we need to apply some post-processing to the 8-bit model to enable training. Let's freeze all of the base model's layers and cast the small parameters, such as layer norm, to ```float32``` for numerical stability.

In [31]:
for param in model.parameters():
  param.requires_grad = False           # freeze the model - train adapters later
  if param.ndim == 1:
    # cast the small parameters (e.g. layernorm) to fp32 for stability
    param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()   # reduce number of stored activations
model.enable_input_require_grads()

class CastOutputToFloat(nn.Sequential):
  def forward(self, x): return super().forward(x).to(torch.float32)
model.lm_head = CastOutputToFloat(model.lm_head)

## Apply LoRA

Here comes the magic with ```peft```! Let's wrap the model in a ```PeftModel``` and specify that we are going to use low-rank adapters (LoRA) via the ```get_peft_model``` utility function from ```peft```.
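
To make the idea of a low-rank adapter concrete, here is a toy, pure-Python sketch of a LoRA-style forward pass. It is not how ```peft``` implements it; the names `matvec` and `lora_forward`, the tiny dimensions, and the zero initialisation of `B` (the convention described in the LoRA paper) are all assumptions for illustration, and dropout is omitted.

```python
import random


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


random.seed(0)
d_in, d_out, r, alpha = 8, 6, 2, 4
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen, d_out x d_in
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # adapter, r x d_in
B = [[0.0] * r for _ in range(d_out)]                                   # adapter, d_out x r
x = [random.gauss(0, 1) for _ in range(d_in)]

# With B initialised to zero the adapter contributes nothing at the start,
# so the adapted layer reproduces the frozen layer exactly.
y = lora_forward(W, A, B, x, alpha, r)
print(y == matvec(W, x))

# The adapter adds r * (d_in + d_out) weights instead of d_in * d_out.
print(r * (d_in + d_out), "adapter weights vs", d_in * d_out, "frozen weights")
```

At realistic layer sizes the gap between the two counts is far larger than in this toy case, which is where the savings come from.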

In [32]:
def print_trainable_parameter(model):
  """
  Prints the number of trainable parameters in the model.
  """
  trainable_params = 0
  all_param = 0
  for _, param in model.named_parameters():
    all_param += param.numel()
    if param.requires_grad:
      trainable_params += param.numel()
  print(
      f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
  )

In [33]:
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
print_trainable_parameter(model)

trainable params: 3145728 || all params: 1725554688 || trainable%: 0.18230242262828822
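
The trainable-parameter count above can be reproduced by hand. The sketch below assumes a hidden size of 2048 and 24 transformer layers for ```bloom-1b7``` (values not stated in this notebook) and that ```query_key_value``` is a single fused projection from `hidden` to `3 * hidden`; the helper name `lora_params_per_linear` is hypothetical.

```python
# Assumed architecture values for bigscience/bloom-1b7 (not stated in this notebook).
HIDDEN_SIZE = 2048
NUM_LAYERS = 24
RANK = 16  # r in the LoraConfig above


def lora_params_per_linear(in_features: int, out_features: int, r: int) -> int:
    """A is (r x in_features) and B is (out_features x r)."""
    return r * in_features + out_features * r


# One adapted query_key_value projection per layer: hidden -> 3 * hidden.
per_layer = lora_params_per_linear(HIDDEN_SIZE, 3 * HIDDEN_SIZE, RANK)
total_lora = per_layer * NUM_LAYERS

ALL_PARAMS = 1_725_554_688  # "all params" from the report above
print(f"trainable params: {total_lora} || trainable%: {100 * total_lora / ALL_PARAMS}")
```

Under these assumptions the total comes out to the same 3,145,728 trainable parameters that the report shows.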


## Preprocessing

We can simply load our dataset from Hugging Face with the ```load_dataset``` method!

In [34]:
import transformers
from datasets import load_dataset

dataset_name = "c-s-ale/Product-Descriptions-and-Ads"
product_name = "product"
product_desc = "description"
product_ad = "ad" 

In [35]:
dataset = load_dataset(dataset_name)
print(dataset)
print(dataset['train'][0])




DatasetDict({
    test: Dataset({
        features: ['product', 'description', 'ad'],
        num_rows: 10
    })
    train: Dataset({
        features: ['product', 'description', 'ad'],
        num_rows: 90
    })
})
{'product': ' Harem pants', 'description': ' A style of pants with a dropped crotch, loose-fitting legs, and a gathered waistband for a unique, bohemian look.', 'ad': 'Discover Harem Pants! Unique, stylish bohemian vibes with a dropped crotch & loose legs. Comfy meets chic - elevate your wardrobe. Limited stock - shop now!'}


We want to put our data in the form:
```
Below is a product and description, please write an ad for this product.

### Product and Description:
PRODUCT NAME AND DESCRIPTION HERE

### Ad:
OUR AD HERE
```

This way, we can prompt our model well and receive the responses we want!

This is what fine-tuning, and prompt engineering, is really all about!

In [36]:
def generate_prompt(product_name: str, desc: str, ad: str) -> str:
  prompt = f"Below is a product and description, please write an ad for this product.\n\n### Product and Description:\n{product_name}: {desc}\n\n### Ad:\n{ad}"
  return prompt

dataset = dataset.map(lambda samples: tokenizer(generate_prompt(samples['product'], samples['description'], samples['ad'])))
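
To see what the model is actually trained on, here is a small standalone sketch that applies the same template to the first training record printed earlier. The function is re-declared so the snippet runs on its own; nothing here touches the tokenizer.

```python
def generate_prompt(product_name: str, desc: str, ad: str) -> str:
    """Same template as above, split across lines for readability."""
    return (
        "Below is a product and description, please write an ad for this product.\n\n"
        f"### Product and Description:\n{product_name}: {desc}\n\n"
        f"### Ad:\n{ad}"
    )


# First training record, copied from the dataset printout earlier in this notebook.
sample = {
    "product": " Harem pants",
    "description": " A style of pants with a dropped crotch, loose-fitting legs, "
                   "and a gathered waistband for a unique, bohemian look.",
    "ad": "Discover Harem Pants! Unique, stylish bohemian vibes with a dropped crotch "
          "& loose legs. Comfy meets chic - elevate your wardrobe. Limited stock - shop now!",
}

prompt = generate_prompt(sample["product"], sample["description"], sample["ad"])
print(prompt)
```

Note that the leading spaces in the `product` and `description` fields are carried into the prompt unchanged; calling `.strip()` on each field before formatting would be one way to normalise them, if that matters for your use case.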



In [37]:
trainer = transformers.Trainer(
    model=model,
    train_dataset=dataset['train'],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=100,
        max_steps=100,
        learning_rate=1e-3,
        fp16=True,
        logging_steps=1,
        output_dir='outputs'
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False    # silence the warnings. Please re-enable for inference!
trainer.train()

You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|------|---------------|
| 1 | 3.3814 |
| 2 | 3.3007 |
| 3 | 3.3423 |
| 4 | 3.3187 |
| 5 | 3.2245 |
| 6 | 3.2953 |
| 7 | 3.2024 |
| 8 | 3.2537 |
| 9 | 3.1694 |
| 10 | 3.119 |


TrainOutput(global_step=100, training_loss=1.3090190576016902, metrics={'train_runtime': 394.6422, 'train_samples_per_second': 4.054, 'train_steps_per_second': 0.253, 'total_flos': 1162686167875584.0, 'train_loss': 1.3090190576016902, 'epoch': 17.39})
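
The numbers in this output can be reconciled with the training arguments using a little arithmetic. The sketch below assumes a single GPU and, for the learning rate, the default linear schedule with warmup; it is one reading of the reported metrics, not the ```Trainer```'s internal code, and `warmup_lr` is a hypothetical helper that models only the warmup phase.

```python
import math

# Values taken from the dataset printout, TrainingArguments, and TrainOutput in this notebook.
NUM_TRAIN_EXAMPLES = 90
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
MAX_STEPS = 100
WARMUP_STEPS = 100
PEAK_LR = 1e-3
TRAIN_RUNTIME_S = 394.6422

# Samples consumed per optimizer step (single GPU assumed).
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM

# Mini-batches per pass over the data; the last batch is partial.
batches_per_epoch = math.ceil(NUM_TRAIN_EXAMPLES / PER_DEVICE_BATCH)

# Each optimizer step consumes GRAD_ACCUM mini-batches.
epochs = MAX_STEPS * GRAD_ACCUM / batches_per_epoch
samples_per_second = MAX_STEPS * effective_batch / TRAIN_RUNTIME_S


def warmup_lr(step: int) -> float:
    """Linear warmup only: ramp from 0 to PEAK_LR over WARMUP_STEPS (hypothetical helper)."""
    return PEAK_LR * min(step, WARMUP_STEPS) / WARMUP_STEPS


print(f"effective batch size: {effective_batch}")
print(f"epochs: {epochs:.2f}")
print(f"samples/second: {samples_per_second:.3f}")
print(f"lr at step 50: {warmup_lr(50):.6f}")
```

Because `warmup_steps` equals `max_steps` here, the learning rate under that assumed schedule is still ramping up for the entire run. With only 90 training examples, 100 optimizer steps also amount to many passes over the same data.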

## Share adapters on the HuggingFace Hub

Make sure you have a Hugging Face account, and that you have set up a read/write token!

More info here: https://huggingface.co/docs/hub/security-tokens

In [39]:
HUGGING_FACE_USER_NAME = "seongcho"

In [40]:
from huggingface_hub import notebook_login
notebook_login()

Token is valid.
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [41]:
model.push_to_hub(f"{HUGGING_FACE_USER_NAME}/GenerAd-AI", use_auth_token=True)




CommitInfo(commit_url='https://huggingface.co/seongcho/GenerAd-AI/commit/26a3713e093f351757dcca819a6457213bf50557', commit_message='Upload BloomForCausalLM', commit_description='', oid='26a3713e093f351757dcca819a6457213bf50557', pr_url=None, pr_revision=None, pr_num=None)

## Load adapters from the Hub

You can also directly load adapters from the Hub using the commands below:

In [51]:
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = f"{HUGGING_FACE_USER_NAME}/GenerAd-AI"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=True, device_map='auto') #torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id)



## Inference

You can then directly use the trained model or the model that you have loaded from the HuggingFace Hub for inference!

## Take it for a spin!

In [52]:
from IPython.display import display, Markdown

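def make_inference(product_name, product_description):
  # Build the prompt with the same section markers used in training, leaving the ad for the model to write
  batch = tokenizer(f"### Product and Description:\n{product_name}: {product_description}\n\n### Ad:\n", return_tensors='pt')

  # Generate under mixed precision
  with torch.cuda.amp.autocast():
    output_tokens = model.generate(**batch, max_new_tokens=50)

  # Decode the generated tokens and render them as Markdown
  display(Markdown((tokenizer.decode(output_tokens[0], skip_special_tokens=True))))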


your_product_name_here = "Ruby Slippers"
your_product_description_here = "Beautiful slip"

make_inference(your_product_name_here, your_product_description_here)
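
The decoded text contains the prompt followed by the generated continuation. If only the ad is wanted, one simple option is to split on the ```### Ad:``` marker. This is a minimal sketch; `extract_ad` and `AD_MARKER` are hypothetical names, and the `decoded` string is a placeholder standing in for a decoded model output, not a real generation.

```python
AD_MARKER = "### Ad:\n"


def extract_ad(decoded: str) -> str:
    """Return the text after the first '### Ad:' marker, or the whole string if it is absent."""
    _, found, tail = decoded.partition(AD_MARKER)
    return tail.strip() if found else decoded.strip()


# Placeholder standing in for a decoded model output (not a real generation).
decoded = (
    "### Product and Description:\nRuby Slippers: Beautiful slip\n\n"
    "### Ad:\n<generated ad text>"
)
print(extract_ad(decoded))
```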
