<a href="https://colab.research.google.com/github/eljandoubi/Copilot/blob/main/LightweightFineTuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lightweight Fine-Tuning Project

* PEFT technique: LoftQ initialization & QLoRA-style training
* Model: facebook/opt-125m
* Evaluation approach: Perplexity
* Fine-tuning dataset: codeparrot/github-code

In [1]:
!pip install -r requirements.txt



Please restart the notebook.

## Loading and Evaluating a Foundation Model

In the cells below, I will load the pre-trained Hugging Face model and evaluate its performance prior to fine-tuning. This step includes loading an appropriate tokenizer and dataset.

In [2]:
from datasets import load_dataset

In [3]:
train_size=1_000

In [4]:
val_size=train_size//10

In [5]:
test_size=val_size

In [6]:
seed=42

I will load the dataset in streaming mode to avoid downloading the entire 1TB.

In [7]:
iter_ds=load_dataset("codeparrot/github-code", streaming=True, trust_remote_code=True,
                split="train").shuffle(seed=seed,
                                       buffer_size=train_size+val_size+test_size)

Downloading builder script:   0%|          | 0.00/7.23k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/7.54k [00:00<?, ?B/s]

In [8]:
iter_train_ds=iter_ds.take(train_size)

In [9]:
iter_val_ds=iter_ds.skip(train_size).take(val_size)

In [10]:
iter_test_ds=iter_ds.skip(train_size+val_size).take(test_size)

In [11]:
from transformers import AutoTokenizer

In [12]:
model_id = "facebook/opt-125m"

In [13]:
tokenizer = AutoTokenizer.from_pretrained(model_id)

tokenizer_config.json:   0%|          | 0.00/685 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/651 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/441 [00:00<?, ?B/s]

In [14]:
if tokenizer.pad_token is None:
  print("It was None")
  tokenizer.pad_token = tokenizer.eos_token

In [15]:
from transformers import PreTrainedTokenizer

I will segment the text so that it can be processed by the model within the context length.

In [16]:
def chunk_and_encode(
        samples: dict[str,  str],
        tokenizer: PreTrainedTokenizer,
        max_len: int,
        stride: int,
        col_name: str) -> dict[str, list[list[int]]]:
    """
    Split test in chunks and encode them
    Args:
        samples (dict[str, str]):  batch of data raws from hugging face dataset
        tokenizer (PreTrainedTokenizer): hugging face tokenizer
        max_len (int): the length of chunk
        stride (int): the number of overlapping tokens
        col_name (str): the name of the text column
    Return:
        tokenized chunks (dict[str, list[list[int]]])
    """

    chunks = []
    chunks_mask = []
    pad_id = tokenizer.pad_token_id

    for text in samples[col_name]:
        tokens = tokenizer(text, truncation=False,
                           return_attention_mask=False,
                           padding=False)['input_ids']

        start_idx = 0
        while start_idx < len(tokens):
            end_idx = min(start_idx + max_len, len(tokens))
            chunk = tokens[start_idx:end_idx]
            len_chunk = len(chunk)
            chunk += (max_len - len_chunk) * [pad_id]
            attention_mask = [1] * len_chunk + (max_len - len_chunk) * [0]

            chunks.append(chunk)
            chunks_mask.append(attention_mask)

            start_idx += stride
    return {
        'input_ids': chunks,
        'attention_mask': chunks_mask
    }

In [17]:
max_length=2**11

In [18]:
stride=max_length//16

In [19]:
col_name="code"

In [20]:
from functools import partial

In [21]:
process_text = partial(chunk_and_encode,
                tokenizer=tokenizer,
                max_len=max_length,
                stride=stride,
                col_name=col_name)

In [22]:
from datasets import Dataset,IterableDataset

In [23]:
def gen_from_iterable_dataset(iterable_ds: IterableDataset)->dict:
    """Create a generator from an iterable dataset"""
    yield from iterable_ds

In [24]:
def create_dataset(iterable_ds: IterableDataset)->Dataset:
    """Create a dataset from an iterable dataset"""
    iter_token=iterable_ds.map(process_text,
                              remove_columns=iter_ds.column_names,
                              batched=True)
    return Dataset.from_generator(partial(gen_from_iterable_dataset, iter_token))

In [25]:
train_ds=create_dataset(iter_train_ds).shuffle(seed=seed)

Generating train split: 0 examples [00:00, ? examples/s]

In [26]:
val_ds=create_dataset(iter_val_ds)

Generating train split: 0 examples [00:00, ? examples/s]

In [27]:
test_ds=create_dataset(iter_test_ds)

Generating train split: 0 examples [00:00, ? examples/s]

I will load the model in NF4, as described in the QLoRA paper. The computation will be performed using Brain Float 16-bit precision.

In [28]:
import torch

In [29]:
from transformers import BitsAndBytesConfig

config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

In [30]:
from transformers import AutoModelForCausalLM

In [31]:
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", quantization_config=config)

pytorch_model.bin:   0%|          | 0.00/251M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/137 [00:00<?, ?B/s]

In [32]:
model

OPTForCausalLM(
  (model): OPTModel(
    (decoder): OPTDecoder(
      (embed_tokens): Embedding(50272, 768, padding_idx=1)
      (embed_positions): OPTLearnedPositionalEmbedding(2050, 768)
      (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (layers): ModuleList(
        (0-11): 12 x OPTDecoderLayer(
          (self_attn): OPTAttention(
            (k_proj): Linear4bit(in_features=768, out_features=768, bias=True)
            (v_proj): Linear4bit(in_features=768, out_features=768, bias=True)
            (q_proj): Linear4bit(in_features=768, out_features=768, bias=True)
            (out_proj): Linear4bit(in_features=768, out_features=768, bias=True)
          )
          (activation_fn): ReLU()
          (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (fc1): Linear4bit(in_features=768, out_features=3072, bias=True)
          (fc2): Linear4bit(in_features=3072, out_features=768, bias=True)
          (final_layer_nor

Perplexity (PPL) is one of the most common metrics for evaluating language models.

It is defined as the exponentiated average negative log-likelihood of a sequence, calculated with exponent base `e`.

In [33]:
from transformers import PreTrainedModel

In [34]:
from tqdm import tqdm

In [35]:
def evaluate(model: PreTrainedModel,
             eval_ds: Dataset,
             batch_size: int,
            )->dict[str,float]:

    """
    Compute the perplexity of a model over an evaluation dataset
    """
    model.eval()
    losses = []
    for batch in tqdm(eval_ds.iter(batch_size)):
        input_ids=torch.LongTensor(batch["input_ids"])
        with torch.no_grad():
            batch_loss = model(input_ids, labels=input_ids).loss.reshape(1,-1)

        losses.append(batch_loss)
    loss = torch.mean(torch.cat(losses))
    try:
        perplexity = torch.exp(loss).item()
    except OverflowError:
        perplexity = float("inf")
    return {"perplexity":perplexity}

In [36]:
batch_size=16

In [37]:
base_score=evaluate(model,test_ds,batch_size)

164it [20:19,  7.44s/it]


In [38]:
base_score

{'perplexity': 24.5625}

In [39]:
torch.cuda.empty_cache()

## Performing Parameter-Efficient Fine-Tuning

In the cells below, I will create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.

In [40]:
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

In [41]:
model

OPTForCausalLM(
  (model): OPTModel(
    (decoder): OPTDecoder(
      (embed_tokens): Embedding(50272, 768, padding_idx=1)
      (embed_positions): OPTLearnedPositionalEmbedding(2050, 768)
      (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (layers): ModuleList(
        (0-11): 12 x OPTDecoderLayer(
          (self_attn): OPTAttention(
            (k_proj): Linear(in_features=768, out_features=768, bias=True)
            (v_proj): Linear(in_features=768, out_features=768, bias=True)
            (q_proj): Linear(in_features=768, out_features=768, bias=True)
            (out_proj): Linear(in_features=768, out_features=768, bias=True)
          )
          (activation_fn): ReLU()
          (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (fc1): Linear(in_features=768, out_features=3072, bias=True)
          (fc2): Linear(in_features=3072, out_features=768, bias=True)
          (final_layer_norm): LayerNorm((768,), ep

In [42]:
from peft import LoftQConfig, LoraConfig, get_peft_model

In [43]:
loftq_config = LoftQConfig(loftq_bits=4,loftq_iter=10)

In [44]:
lora_config = LoraConfig(
    init_lora_weights="loftq",
    loftq_config=loftq_config,
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

In [45]:
model = get_peft_model(model, lora_config)

In [46]:
model.print_trainable_parameters()

trainable params: 1,769,472 || all params: 127,008,768 || trainable%: 1.3931888544891642


In [47]:
model

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): OPTForCausalLM(
      (model): OPTModel(
        (decoder): OPTDecoder(
          (embed_tokens): Embedding(50272, 768, padding_idx=1)
          (embed_positions): OPTLearnedPositionalEmbedding(2050, 768)
          (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (layers): ModuleList(
            (0-11): 12 x OPTDecoderLayer(
              (self_attn): OPTAttention(
                (k_proj): lora.Linear(
                  (base_layer): Linear(in_features=768, out_features=768, bias=True)
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=768, out_features=32, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (default): Linear(in_features=32, out_features=768, bias=False)
              

In [48]:
from transformers import DataCollatorForLanguageModeling

In [49]:
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

In [50]:
from transformers import TrainingArguments

In [51]:
import torch.multiprocessing as mp

In [52]:
training_args = TrainingArguments(
        f"{model_id}-finetuned-lora",
        optim="paged_lion_8bit",
        learning_rate=5e-6,
        weight_decay=0.01,
        auto_find_batch_size=True,
        gradient_accumulation_steps=4,
        num_train_epochs=3,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        dataloader_num_workers=mp.cpu_count(),
        fp16=True,
        logging_steps=100,
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        push_to_hub=False,
        greater_is_better=False,
    )

In [53]:
from transformers import Trainer

In [54]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset= val_ds,
    data_collator=data_collator,
)

In [None]:
trainer.train()

Epoch,Training Loss,Validation Loss


## Performing Inference with a PEFT Model

In the cells below, I will load the saved PEFT model weights and evaluate the performance of the trained PEFT model.

In [None]:
model = AutoModelForCausalLM.from_pretrained(f"{model_id}-finetuned-lora", device_map="auto")

In [None]:
model