<a href="https://colab.research.google.com/github/eljandoubi/Copilot/blob/main/LightweightFineTuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lightweight Fine-Tuning Project

The choices made for this project:

* PEFT technique: QLoRA
* Model: mistralai/Mistral-7B-v0.1
* Evaluation approach: Perplexity
* Fine-tuning dataset: codeparrot/github-code
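Perplexity is the evaluation metric throughout, so its definition is worth stating. For a token sequence $x_1, \dots, x_N$, it is the exponentiated average negative log-likelihood that the model assigns to each token given the tokens before it. Lower is better, and a model that spreads probability uniformly over a vocabulary of size $V$ scores exactly $V$.

$$
\mathrm{PPL}(x_{1:N}) = \exp\!\left(-\frac{1}{N}\sum_{t=1}^{N} \log p_\theta\left(x_t \mid x_{<t}\right)\right)
$$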

In [1]:
!pip install -r requirements.txt



## Loading and Evaluating a Foundation Model

In the cells below, I stream a sample of the GitHub code dataset and load the pre-trained Mistral-7B model, quantized to 4 bits, together with its tokenizer. I then measure the model's perplexity before fine-tuning.

In [2]:
from datasets import load_dataset

In [3]:
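train_size = 100_000  # number of streamed training examples

In [4]:
val_size = train_size // 10  # validation split is 10% of the training size

In [5]:
test_size = val_size

In [6]:
seed = 42  # fixed seed: every pass over the stream sees the same shuffled order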

In [7]:
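# Stream rather than download; the shuffle buffer covers every example used by the splits.
ds = load_dataset("codeparrot/github-code", split="train",
                  streaming=True, trust_remote_code=True)
ds = ds.shuffle(seed=seed, buffer_size=train_size + val_size + test_size)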

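A streaming dataset cannot permute data it has not read yet, so `shuffle` on a streamed dataset works through a fixed-size buffer. The minimal sketch below shows that general technique in plain Python. It illustrates the idea and is not the `datasets` source code. Because the buffer is filled before anything is emitted, the cell above holds 120,000 source files in memory before the first example is yielded.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffer_shuffle(stream: Iterable[T], buffer_size: int, seed: int) -> Iterator[T]:
    """Approximately shuffle a stream while holding at most `buffer_size` items."""
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in stream:
        if len(buffer) < buffer_size:
            buffer.append(item)  # fill the buffer before emitting anything
            continue
        i = rng.randrange(buffer_size)  # emit a random buffered item...
        yield buffer[i]
        buffer[i] = item  # ...and put the incoming item in its slot
    rng.shuffle(buffer)  # stream exhausted: flush the rest in random order
    yield from buffer
```

A small buffer only mixes nearby items. In this sketch, an item can be emitted at most `buffer_size - 1` positions earlier than it arrived. A larger buffer mixes items over a wider window, at the cost of memory.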


In [8]:
train_ds = ds.take(train_size)

In [9]:
val_ds = ds.skip(train_size).take(val_size)

In [10]:
test_ds = ds.skip(train_size + val_size).take(test_size)
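On a streaming dataset, `take` and `skip` do not materialise anything. Each split re-reads the shuffled stream from the start whenever it is iterated, so the three splits stay disjoint only because the shuffle is seeded and every pass sees the same order. The pure-Python analogue below assumes a hypothetical `shuffled_stream` function as a stand-in for the shuffled `ds`, and it uses toy split sizes.

```python
import itertools
import random
from typing import Iterator, List


def shuffled_stream(seed: int, n: int = 50) -> Iterator[int]:
    """Hypothetical stand-in for the shuffled `ds`: the same seed gives the same order."""
    items = list(range(n))
    random.Random(seed).shuffle(items)
    yield from items


def skip_take(seed: int, skip: int, take: int) -> List[int]:
    """Analogue of ds.skip(skip).take(take): every call re-reads the stream from the start."""
    return list(itertools.islice(shuffled_stream(seed), skip, skip + take))


seed = 42
train_size, val_size, test_size = 30, 10, 10  # toy sizes

train = skip_take(seed, 0, train_size)
val = skip_take(seed, train_size, val_size)
test = skip_take(seed, train_size + val_size, test_size)
```

If each pass produced a different, unseeded order, the same example could land in two splits.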

In [11]:
from evaluate import load

In [44]:
metric_name = "perplexity"

In [45]:
metric = load(metric_name)

In [46]:
metric

EvaluationModule(name: "perplexity", module_type: "metric", features: {'predictions': Value(dtype='string', id=None)}, usage: """
Args:
    model_id (str): model used for calculating Perplexity
            NOTE: Perplexity can only be calculated for causal language models.
                    This includes models such as gpt2, causal variations of bert,
                    causal versions of t5, and more (the full list can be found
                    in the AutoModelForCausalLM documentation here:
                    https://huggingface.co/docs/transformers/master/en/model_doc/auto#transformers.AutoModelForCausalLM )

    predictions (list of str): input text, each separate text snippet
        is one list entry.
    batch_size (int): the batch size to run texts through the model. Defaults to 16.
    add_start_token (bool): whether to add the start token to the texts,
        so the perplexity can include the probability of the first word. Defaults to True.
    device (str): device to
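As the usage text notes, perplexity needs a causal LM. The distribution computed from a prefix scores the next token, so without a start token the first token of a text is never scored. `add_start_token=True` prepends the BOS token so that the first token is scored too. The minimal pure-Python sketch below uses a hypothetical toy "model": a function from a token prefix to next-token probabilities.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical toy model: maps a prefix of token ids to next-token probabilities.
NextTokenProbs = Callable[[Sequence[int]], Dict[int, float]]


def perplexity(model: NextTokenProbs, tokens: List[int], bos: int,
               add_start_token: bool = True) -> float:
    """exp of the mean negative log-likelihood over every token that gets scored."""
    seq = [bos] + list(tokens) if add_start_token else list(tokens)
    # The distribution computed from seq[:t] scores seq[t]; seq[0] is never scored.
    nlls = [-math.log(model(seq[:t])[seq[t]]) for t in range(1, len(seq))]
    return math.exp(sum(nlls) / len(nlls))


def uniform(prefix: Sequence[int]) -> Dict[int, float]:
    """A 'model' that spreads probability evenly over a 4-token vocabulary."""
    return {tok: 0.25 for tok in range(4)}
```

A uniform guess over V tokens has perplexity V. For example, `perplexity(uniform, [1, 2, 3], bos=0)` evaluates to 4, up to float rounding.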

In [14]:
from transformers import AutoModelForCausalLM, AutoTokenizer

In [15]:
model_id = "mistralai/Mistral-7B-v0.1"

In [16]:
tokenizer = AutoTokenizer.from_pretrained(model_id)

In [17]:
# Mistral's tokenizer has no pad token; reuse EOS so batches can be padded.
if tokenizer.pad_token is None:
    print("it was None")
    tokenizer.pad_token = tokenizer.eos_token

it was None


In [18]:
import torch

In [19]:
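from transformers import BitsAndBytesConfig
# QLoRA-style quantization: 4-bit NF4 weights (double-quantized) with bfloat16 compute.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_use_double_quant=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", quantization_config=bnb_config)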


In [20]:
model

MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralSdpaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
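The printout above is enough to estimate why 4-bit loading matters here. The sketch below counts parameters from the printed layer shapes and estimates weight memory under several assumptions:

* The final RMSNorm and an untied 32000 x 4096 `lm_head` are assumed, because the printout is truncated before them.
* Only the decoder's `Linear4bit` layers are quantized, at 0.5 bytes per weight, while everything else stays in 16 bits.
* Quantization constants, activations, and the KV cache are ignored.

```python
# Shapes copied from the printed MistralForCausalLM repr above.
HIDDEN, KV_DIM, FFN, VOCAB, N_LAYERS = 4096, 1024, 14336, 32000, 32

# Linear4bit weights in one decoder layer (in_features * out_features, no biases).
attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q_proj/o_proj and k_proj/v_proj
mlp = 3 * HIDDEN * FFN                            # gate_proj, up_proj, down_proj
quantized = N_LAYERS * (attn + mlp)

# Unquantized parameters: the token embedding and two RMSNorm weights per layer,
# plus (ASSUMED, not visible in the truncated printout) a final RMSNorm and an
# untied lm_head of shape VOCAB x HIDDEN.
unquantized = VOCAB * HIDDEN + N_LAYERS * 2 * HIDDEN + HIDDEN + VOCAB * HIDDEN
total = quantized + unquantized

GB = 1e9
bf16_gb = 2 * total / GB                           # every weight stored in 16 bits
nf4_gb = (0.5 * quantized + 2 * unquantized) / GB  # 4-bit linears, 16-bit rest

print(f"parameters: {total:,}")
print(f"16-bit weights: {bf16_gb:.1f} GB, 4-bit weights: {nf4_gb:.1f} GB")
```

Under these assumptions, the script prints 7,241,732,096 parameters. That is about 14.5 GB of weights in 16 bits, versus roughly 4.0 GB with 4-bit linear layers.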

In [21]:
from transformers import pipeline

In [22]:
task="text-generation"

In [39]:
pipe = pipeline(task=task, model=model, tokenizer=tokenizer, max_new_tokens=100)

In [31]:

from functools import partial
from datasets import Dataset

def gen_from_iterable_dataset(iterable_ds):
    yield from iterable_ds

data = Dataset.from_generator(partial(gen_from_iterable_dataset, test_ds.take(3)), features=test_ds.features)


In [32]:
data

Dataset({
    features: ['code', 'repo_name', 'path', 'language', 'license', 'size'],
    num_rows: 3
})

In [33]:
from evaluate import evaluator

In [37]:
task_evaluator = evaluator("text-generation")

In [None]:
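# Note: the perplexity metric's `compute` also requires `model_id`, which this call does not pass.
results = task_evaluator.compute(model_or_pipeline=pipe, data=data, metric=metric, input_column='code')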

In [1]:
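# Score the held-out code itself rather than the model's own generations.
# The perplexity metric takes a Hub model id string (not a model object) and loads
# that model itself, without the 4-bit quantization config used above.
texts = [x["code"] for x in test_ds.take(3)]
results = metric.compute(model_id=model_id, predictions=texts, batch_size=1)
results["mean_perplexity"]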

## Performing Parameter-Efficient Fine-Tuning

TODO: In the cells below, create a PEFT model from your loaded model, run a training loop, and save the PEFT model weights.
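LoRA trains two small matrices for each adapted layer: A (r x in_features) and B (out_features x r). The 4-bit base weights stay frozen. The arithmetic below uses a hypothetical configuration: rank 8 on the four attention projections of all 32 layers. This is a common starting point, not a choice already made in this notebook. The base size of 7,241,732,096 parameters is an assumption, counted from the printed layer shapes plus an untied `lm_head`.

```python
# HYPOTHETICAL LoRA setup: rank 8 on the four attention projections of every layer.
RANK, N_LAYERS = 8, 32
# (in_features, out_features) of each targeted Linear4bit, from the model printout.
TARGETS = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}
# ASSUMED base size: printed layer shapes plus an untied 32000 x 4096 lm_head.
BASE_PARAMS = 7_241_732_096


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A is rank x in_features and B is out_features x rank; the base weight stays frozen."""
    return rank * in_features + out_features * rank


trainable = N_LAYERS * sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
print(f"trainable: {trainable:,} ({100 * trainable / BASE_PARAMS:.3f}% of the base model)")
```

This configuration gives 6,815,744 trainable parameters, under 0.1% of the base model. That is why the optimizer state for QLoRA fits alongside the 4-bit weights.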

## Performing Inference with a PEFT Model

TODO: In the cells below, load the saved PEFT model weights and evaluate the performance of the trained PEFT model. Be sure to compare the results to the results from prior to fine-tuning.
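When the saved adapter is loaded back for inference, the LoRA update can be applied in two ways. It can run alongside the frozen weight, $h = Wx + \frac{\alpha}{r}BAx$, or it can be folded into the weight once, $W' = W + \frac{\alpha}{r}BA$. PEFT exposes the folding step as `merge_and_unload()`, though merging into a 4-bit base can introduce small rounding differences. The toy pure-Python check below uses hypothetical 3x3 weights and rank 1 to show that both forms give the same output.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add_scaled(a: Matrix, b: Matrix, scale: float) -> Matrix:
    """Element-wise a + scale * b."""
    return [[x + scale * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# HYPOTHETICAL toy values: d_out = d_in = 3, rank r = 1, alpha = 2.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [3.0, 0.0, 1.0]]  # frozen base weight
A = [[0.5, -1.0, 0.25]]                                 # r x d_in
B = [[2.0], [0.0], [-1.0]]                              # d_out x r
alpha, r = 2.0, 1
x = [[1.0], [2.0], [4.0]]                               # input column vector

# Adapter applied alongside the frozen weight: W x + (alpha / r) * B (A x).
side = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), alpha / r)
# Adapter folded into the weight once: W' = W + (alpha / r) * B A.
merged = matmul(add_scaled(W, matmul(B, A), alpha / r), x)
```

Both `side` and `merged` equal `[[7.0], [2.0], [8.0]]`. The folded form removes the extra adapter matmuls at inference time, at the cost of no longer being able to detach the adapter from that copy of the weights.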