## Exploring LLM Finetuning with QLoRA

*QLoRA* (https://arxiv.org/abs/2305.14314) quantizes the pretrained base model to 4 bits and trains small LoRA adapters on top of the frozen, quantized weights.
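
To make the 4-bit claim concrete, here is a rough back-of-the-envelope sketch of weight storage. The figure of 2.7 billion parameters is an assumption based on the published size of phi-2 (the model used below), and the estimate ignores quantization constants, activations, and optimizer state.

```python
# Rough weight-storage estimate; ignores quantization constants and activations.
N_PARAMS = 2.7e9  # assumed parameter count (approximate size of phi-2)

def weight_gib(n_params: float, bits_per_param: int) -> float:
    """Return the storage needed for n_params weights, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

fp16_gib = weight_gib(N_PARAMS, 16)
int4_gib = weight_gib(N_PARAMS, 4)
print(f"fp16 weights : {fp16_gib:.2f} GiB")
print(f"4-bit weights: {int4_gib:.2f} GiB")
```

Under these assumptions the 4-bit weights take a quarter of the fp16 footprint, which is what lets the model fit comfortably on a single 16 GB GPU.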

### Medium articles

4-bit - https://dassum.medium.com/fine-tune-large-language-model-llm-on-a-custom-dataset-with-qlora-fb60abdeba07

ToDo - 8-bit - https://medium.com/pankajmathur/parameter-efficient-fine-tuning-for-large-language-models-with-peft-and-lora-967a9b297abd

In [18]:
!pip install -q bitsandbytes transformers peft accelerate datasets einops evaluate trl rouge_score
#!pip install -q -U trl rouge_score

In [19]:
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    Trainer,
    GenerationConfig
)
from tqdm import tqdm
from trl import SFTTrainer
import torch
import time
import pandas as pd
import numpy as np
from huggingface_hub import interpreter_login

interpreter_login()


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .


Enter your token (input will not be visible):  ········
Add token as git credential? (Y/n)  n


Token is valid (permission: write).
Your token has been saved to /home/studio-lab-user/.cache/huggingface/token
Login successful


In [20]:
import os
# disable Weights and Biases
os.environ['WANDB_DISABLED']="true"

In [21]:
huggingface_dataset_name = "neil-code/dialogsum-test"
dataset = load_dataset(huggingface_dataset_name)

In [22]:
dataset['train'][0]

{'id': 'train_0',
 'dialogue': "#Person1#: Hi, Mr. Smith. I'm Doctor Hawkins. Why are you here today?\n#Person2#: I found it would be a good idea to get a check-up.\n#Person1#: Yes, well, you haven't had one for 5 years. You should have one every year.\n#Person2#: I know. I figure as long as there is nothing wrong, why go see the doctor?\n#Person1#: Well, the best way to avoid serious illnesses is to find out about them early. So try to come at least once a year for your own good.\n#Person2#: Ok.\n#Person1#: Let me see here. Your eyes and ears look fine. Take a deep breath, please. Do you smoke, Mr. Smith?\n#Person2#: Yes.\n#Person1#: Smoking is the leading cause of lung cancer and heart disease, you know. You really should quit.\n#Person2#: I've tried hundreds of times, but I just can't seem to kick the habit.\n#Person1#: Well, we have classes and some medications that might help. I'll give you more information before you leave.\n#Person2#: Ok, thanks doctor.",
 'summary': "Mr. Smith'

In [23]:
# Load the base model in 4-bit NF4; computation runs in float16
compute_dtype = torch.float16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=False,
)
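
The idea behind block-wise 4-bit quantization can be sketched in a few lines of plain Python. This is a simplified illustration that uses 16 uniformly spaced levels and one absmax scale per block; real NF4 uses a non-uniform codebook, and the function names here are hypothetical, not part of bitsandbytes.

```python
# Simplified illustration only: uniform 16-level absmax quantization per block.
# Real NF4 uses a non-uniform codebook; this is NOT the bitsandbytes implementation.

def quantize_block(values):
    """Map floats to integer codes in [0, 15] plus one scale per block."""
    scale = max(abs(v) for v in values) or 1.0
    codes = [round((v / scale + 1) / 2 * 15) for v in values]
    return codes, scale

def dequantize_block(codes, scale):
    """Recover approximate floats from 4-bit codes and the block scale."""
    return [(c / 15 * 2 - 1) * scale for c in codes]

weights = [0.12, -0.40, 0.05, 0.33, -0.27, 0.00, 0.40, -0.11]
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)
print(f"max abs error: {max_err:.4f}")
```

Each weight is stored as a 4-bit code, and the only extra cost per block is its scale. The reconstruction error is bounded by half the spacing between levels.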

In [24]:
#!pip download bitsandbytes-cuda110

In [25]:
model_name='microsoft/phi-2'
device_map = {"": 0}
original_model = AutoModelForCausalLM.from_pretrained(model_name, 
                                                      device_map=device_map,
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True
                                                      #use_auth_token=True
                                                     )

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [26]:
tokenizer = AutoTokenizer.from_pretrained(model_name,trust_remote_code=True,padding_side="left",add_eos_token=True,add_bos_token=True,use_fast=False)
tokenizer.pad_token = tokenizer.eos_token

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [27]:
def gen(model, p, maxlen=100, sample=True):
    """Generate a completion for prompt `p` using the global `tokenizer`."""
    toks = tokenizer(p, return_tensors="pt").to("cuda")
    res = model.generate(
        **toks, max_new_tokens=maxlen, do_sample=sample, num_return_sequences=1,
        temperature=0.1, num_beams=1, top_p=0.95,
    )
    return tokenizer.batch_decode(res, skip_special_tokens=True)
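
`gen` samples with `temperature=0.1`. As a minimal sketch of why that setting behaves almost like greedy decoding, the snippet below applies the standard temperature-scaled softmax to a few made-up logits. The logits and the helper name are assumptions for illustration only.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.5, 0.5]  # hypothetical next-token logits
for t in (1.0, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At temperature 0.1 nearly all probability mass lands on the highest-scoring token, so repeated runs tend to produce very similar summaries.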

In [28]:
%%time
from transformers import set_seed
seed = 42
set_seed(seed)

index = 10

prompt = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

formatted_prompt = f"Instruct: Summarize the following conversation.\n{prompt}\nOutput:\n"
res = gen(original_model,formatted_prompt,100,)
#print(res[0])
output = res[0].split('Output:\n')[1]

dash_line = '-' * 99
print(dash_line)
print(f'INPUT PROMPT:\n{formatted_prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ZERO SHOT:\n{output}')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


---------------------------------------------------------------------------------------------------
INPUT PROMPT:
Instruct: Summarize the following conversation.
#Person1#: Happy Birthday, this is for you, Brian.
#Person2#: I'm so happy you remember, please come in and enjoy the party. Everyone's here, I'm sure you have a good time.
#Person1#: Brian, may I have a pleasure to have a dance with you?
#Person2#: Ok.
#Person1#: This is really wonderful party.
#Person2#: Yes, you are always popular with everyone. and you look very pretty today.
#Person1#: Thanks, that's very kind of you to say. I hope my necklace goes with my dress, and they both make me look good I feel.
#Person2#: You look great, you are absolutely glowing.
#Person1#: Thanks, this is a fine party. We should have a drink together to celebrate your birthday
Output:

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# attends Brian's birthday pa
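
The cell above extracts the completion with `res[0].split('Output:\n')[1]`, which raises an `IndexError` if the marker is missing from the decoded text. A more defensive variant could look like the sketch below; `extract_completion` is a hypothetical helper, not something the notebook defines.

```python
def extract_completion(generated: str, marker: str = "Output:\n", stop: str = "###") -> str:
    """Return the text after `marker`, cut at the first `stop` sequence."""
    _, found, completion = generated.partition(marker)
    if not found:
        return generated.strip()  # marker missing: fall back to the full text
    return completion.partition(stop)[0].strip()

# Made-up decoded text, shaped like the prompts used in this notebook
example = "Instruct: Summarize.\nA: hi\nOutput:\nA greets B.\n### End"
print(extract_completion(example))
```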

In [29]:
def create_prompt_formats(sample):
    """
    Build the training prompt for one sample from its 'dialogue' and 'summary' fields.
    The parts are joined with two newline characters and stored under 'text'.
    :param sample: Sample dictionary
    """
    INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
    INSTRUCTION_KEY = "### Instruct: Summarize the below conversation."
    RESPONSE_KEY = "### Output:"
    END_KEY = "### End"
    
    blurb = f"\n{INTRO_BLURB}"
    instruction = f"{INSTRUCTION_KEY}"
    input_context = f"{sample['dialogue']}" if sample["dialogue"] else None
    response = f"{RESPONSE_KEY}\n{sample['summary']}"
    end = f"{END_KEY}"
    
    parts = [part for part in [blurb, instruction, input_context, response, end] if part]

    formatted_prompt = "\n\n".join(parts)
    sample["text"] = formatted_prompt

    return sample
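
To see what a formatted training example looks like, here is a condensed restatement of the same template applied to a made-up two-turn dialogue. `build_prompt` is a hypothetical stand-in written for this illustration; the constants are copied from the cell above.

```python
# Condensed restatement of the template above, for illustration only.
INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
INSTRUCTION_KEY = "### Instruct: Summarize the below conversation."
RESPONSE_KEY = "### Output:"
END_KEY = "### End"

def build_prompt(dialogue: str, summary: str) -> str:
    """Join the non-empty prompt parts with blank lines, as create_prompt_formats does."""
    parts = [f"\n{INTRO_BLURB}", INSTRUCTION_KEY, dialogue or None,
             f"{RESPONSE_KEY}\n{summary}", END_KEY]
    return "\n\n".join(part for part in parts if part)

# Made-up dialogue and summary, not taken from the dataset
text = build_prompt("#Person1#: Hi.\n#Person2#: Hello.", "#Person1# greets #Person2#.")
print(text)
```

The trailing `### End` marker is what later lets the evaluation cells cut the generated text at the first `###`.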

In [30]:
from functools import partial

# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
def get_max_length(model):
    conf = model.config
    max_length = None
    for length_setting in ["n_positions", "max_position_embeddings", "seq_length"]:
        max_length = getattr(conf, length_setting, None)
        if max_length:
            print(f"Found max lenth: {max_length}")
            break
    if not max_length:
        max_length = 1024
        print(f"Using default max length: {max_length}")
    return max_length


def preprocess_batch(batch, tokenizer, max_length):
    """
    Tokenizing a batch
    """
    return tokenizer(
        batch["text"],
        max_length=max_length,
        truncation=True,
    )

# SOURCE https://github.com/databrickslabs/dolly/blob/master/training/trainer.py
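def preprocess_dataset(tokenizer: AutoTokenizer, max_length: int, seed: int, dataset):
    """Format and tokenize the dataset so it is ready for training.
    :param tokenizer (AutoTokenizer): Model tokenizer
    :param max_length (int): Maximum number of tokens to emit from tokenizer
    :param seed (int): Random seed used to shuffle the dataset
    :param dataset: Dataset split with 'id', 'topic', 'dialogue' and 'summary' columns
    """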
    
    # Add prompt to each sample
    print("Preprocessing dataset...")
    dataset = dataset.map(create_prompt_formats)#, batched=True)
    
    # Tokenize each batch and drop the raw 'id', 'topic', 'dialogue' and 'summary' columns
    _preprocessing_function = partial(preprocess_batch, max_length=max_length, tokenizer=tokenizer)
    dataset = dataset.map(
        _preprocessing_function,
        batched=True,
        remove_columns=['id', 'topic', 'dialogue', 'summary'],
    )

    # Drop samples whose tokenized length reaches max_length (these were probably truncated)
    dataset = dataset.filter(lambda sample: len(sample["input_ids"]) < max_length)
    
    # Shuffle dataset
    dataset = dataset.shuffle(seed=seed)

    return dataset

In [31]:
## Pre-process dataset
max_length = get_max_length(original_model)
print(max_length)

train_dataset = preprocess_dataset(tokenizer, max_length,seed, dataset['train'])
eval_dataset = preprocess_dataset(tokenizer, max_length,seed, dataset['validation'])

Found max lenth: 2048
2048
Preprocessing dataset...
Preprocessing dataset...


In [32]:
from peft import prepare_model_for_kbit_training
model_prepped_for_kbit_training = prepare_model_for_kbit_training(original_model)

In [33]:
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=32, #Rank
    lora_alpha=32,
    target_modules=[
        'q_proj',
        'k_proj',
        'v_proj',
        'dense'
    ],
    bias="none",
    lora_dropout=0.05,  # Conventional
    task_type="CAUSAL_LM",
)

# 1 - Enabling gradient checkpointing to reduce memory usage during fine-tuning
model_prepped_for_kbit_training.gradient_checkpointing_enable()

peft_model = get_peft_model(model_prepped_for_kbit_training, config)

In [34]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [35]:
print_trainable_parameters(peft_model)

trainable params: 20971520 || all params: 1542364160 || trainable%: 1.3596996444730667
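
The trainable-parameter count can be reproduced by hand. The sketch below assumes phi-2 has a hidden size of 2560 and 32 transformer layers (neither value is printed in this notebook), that each targeted module is a square hidden-by-hidden projection, and that `'dense'` matches only the attention output projection in each layer.

```python
# Assumed phi-2 dimensions (not printed in this notebook): hidden size 2560, 32 layers.
HIDDEN_SIZE = 2560
NUM_LAYERS = 32
RANK = 32  # r in the LoraConfig above
TARGETS_PER_LAYER = 4  # q_proj, k_proj, v_proj, dense

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) to each adapted linear layer."""
    return r * (d_in + d_out)

per_module = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
trainable = NUM_LAYERS * TARGETS_PER_LAYER * per_module
print(f"trainable params: {trainable}")
print(f"trainable%: {100 * trainable / 1542364160:.4f}")
```

Under these assumptions the result matches the 20,971,520 trainable parameters reported above.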


In [36]:
output_dir = './peft-dialogue-summary-training-1'
import transformers

peft_training_args = TrainingArguments(
    output_dir = output_dir,
    warmup_steps=0,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    max_steps=200,  # very few steps; a real run should use at least 1k
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    logging_steps=25,
    logging_dir="./logs",
    save_strategy="steps",
    save_steps=25,
    evaluation_strategy="steps",
    eval_steps=25,
    do_eval=True,
    gradient_checkpointing=True,
    report_to="none",
    #overwrite_output_dir = 'True',
    group_by_length=True,
)

peft_model.config.use_cache = False

peft_trainer = transformers.Trainer(
    model=peft_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=peft_training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

max_steps is given, it will override any value given in num_train_epochs


In [None]:
peft_trainer.train()



| Step | Training Loss | Validation Loss |
|---|---|---|
| 25 | 1.6418 | 1.391805 |
| 50 | 1.1902 | 1.386621 |
| 75 | 1.447 | 1.354457 |
| 100 | 1.2061 | 1.356937 |
| 125 | 1.4406 | 1.34719 |
| 150 | 1.1434 | 1.349374 |
| 175 | 1.4022 | 1.340913 |
| 200 | 1.1524 | 1.340084 |




TrainOutput(global_step=200, training_loss=1.3279778099060058, metrics={'train_runtime': 1509.6849, 'train_samples_per_second': 0.53, 'train_steps_per_second': 0.132, 'total_flos': 3647665311436800.0, 'train_loss': 1.3279778099060058, 'epoch': 0.400200100050025})
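
The numbers in the `TrainOutput` can be cross-checked with a little arithmetic. This sketch assumes a single GPU, so the per-device batch size is also the global batch size. The training-set size it derives is implied by the reported epoch fraction rather than read from the dataset.

```python
per_device_batch_size = 1
gradient_accumulation_steps = 4
max_steps = 200
train_runtime_s = 1509.6849  # from the TrainOutput above
reported_epoch = 0.400200100050025  # from the TrainOutput above

# Assumes a single GPU, so the per-device batch size is the global one.
samples_per_step = per_device_batch_size * gradient_accumulation_steps
samples_seen = samples_per_step * max_steps
implied_train_size = round(samples_seen / reported_epoch)

print(f"samples per optimizer step: {samples_per_step}")
print(f"samples seen in total     : {samples_seen}")
print(f"samples per second        : {samples_seen / train_runtime_s:.2f}")
print(f"implied training-set size : {implied_train_size}")
```

The run therefore covered only about 40% of one pass over the training data, which is consistent with the note that 200 steps is on the low side.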

In [38]:
peft_trainer.save_model()

In [39]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

base_model_id = "microsoft/phi-2"
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, 
                                                      device_map='auto',
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)



Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [40]:
eval_tokenizer = AutoTokenizer.from_pretrained(base_model_id, add_bos_token=True, trust_remote_code=True, use_fast=False)
eval_tokenizer.pad_token = eval_tokenizer.eos_token

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [41]:
from peft import PeftModel

ft_model = PeftModel.from_pretrained(base_model, "./peft-dialogue-summary-training-1",torch_dtype=torch.float16,is_trainable=False)

In [42]:
%%time
from transformers import set_seed
set_seed(seed)

index = 10
dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

prompt = f"Instruct: Summarize the following conversation.\n{dialogue}\nOutput:\n"

peft_model_res = gen(ft_model,prompt,100,)
peft_model_output = peft_model_res[0].split('Output:\n')[1]
#print(peft_model_output)
# Keep only the text before the first '###' marker emitted by the model
prefix, _, _ = peft_model_output.partition('###')

dash_line = '-' * 99
print(dash_line)
print(f'INPUT PROMPT:\n{prompt}')
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'PEFT MODEL:\n{prefix}')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


---------------------------------------------------------------------------------------------------
INPUT PROMPT:
Instruct: Summarize the following conversation.
#Person1#: Happy Birthday, this is for you, Brian.
#Person2#: I'm so happy you remember, please come in and enjoy the party. Everyone's here, I'm sure you have a good time.
#Person1#: Brian, may I have a pleasure to have a dance with you?
#Person2#: Ok.
#Person1#: This is really wonderful party.
#Person2#: Yes, you are always popular with everyone. and you look very pretty today.
#Person1#: Thanks, that's very kind of you to say. I hope my necklace goes with my dress, and they both make me look good I feel.
#Person2#: You look great, you are absolutely glowing.
#Person1#: Thanks, this is a fine party. We should have a drink together to celebrate your birthday
Output:

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
#Person1# attends Brian's birthday pa

In [43]:
original_model = AutoModelForCausalLM.from_pretrained(base_model_id, 
                                                      device_map='auto',
                                                      quantization_config=bnb_config,
                                                      trust_remote_code=True,
                                                      use_auth_token=True)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [44]:
import pandas as pd

dialogues = dataset['test'][0:10]['dialogue']
human_baseline_summaries = dataset['test'][0:10]['summary']

original_model_summaries = []
peft_model_summaries = []

for idx, dialogue in enumerate(dialogues):
    human_baseline_text_output = human_baseline_summaries[idx]
    prompt = f"Instruct: Summarize the following conversation.\n{dialogue}\nOutput:\n"
    
    original_model_res = gen(original_model,prompt,100,)
    original_model_text_output = original_model_res[0].split('Output:\n')[1]
    
    peft_model_res = gen(ft_model,prompt,100,)
    peft_model_output = peft_model_res[0].split('Output:\n')[1]
    print(peft_model_output)
    peft_model_text_output, success, result = peft_model_output.partition('###')

    original_model_summaries.append(original_model_text_output)
    peft_model_summaries.append(peft_model_text_output)

zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, peft_model_summaries))
 
df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'peft_model_summaries'])
df

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Ms. Dawson takes a dictation for #Person1#, who wants to restrict all office communications to email correspondence and official memos. Ms. Dawson asks if the restriction applies to external communications as well. #Person1# says yes, and Ms. Dawson continues with the memo. #Person1# warns employees who persist in using Instant Messaging will face termination.





Ms. Dawson takes a dictation for #Person1#'s memo about the new office communication policy. #Person1# wants to restrict all office communications to email correspondence and official memos. #Person2# asks if the policy applies to external communications as well. #Person1# says yes and warns employees who persist in using Instant Messaging will face termination. #Person2# gets the memo typed up and distributed to all employees.





Ms. Dawson takes a dictation for #Person1#'s memo about the new office communication policy. #Person1# wants to restrict all office communications to email correspondence and official memos. #Person2# asks if the policy applies to external communications as well. #Person1# says yes and warns employees who persist in using Instant Messaging will face termination. #Person2# gets the memo typed up and distributed to all employees before 4 pm.





#Person2# gets stuck in traffic and decides to consider taking public transport system to work. #Person1# suggests biking to work as well.





#Person2# is stuck in traffic and #Person1# suggests that #Person2# should consider taking public transport system to work. #Person2# agrees and decides to quit driving to work.

#End of conversation#





#Person2# gets stuck in traffic and decides to consider taking public transport system to work. #Person1# suggests biking to work as well.





Kate tells #Person1# that Masha and Hero are getting divorced. #Person1# is surprised and #Person2# is shocked.





Kate tells #Person1# that Masha and Hero are getting divorced. #Person1# is surprised and asks about the kids and custody.





Kate tells #Person1# that Masha and Hero are getting divorced. #Person1# is surprised and #Person2# is shocked.





#Person1# brings a gift for Brian's birthday and invites him to a party. Brian is happy to see everyone and has a dance with #Person1#. #Person1# compliments Brian's appearance and they have a drink together to celebrate.

#End of output#.



| | human_baseline_summaries | original_model_summaries | peft_model_summaries |
|---|---|---|---|
| 0 | Ms. Dawson helps #Person1# to write a memo to ... | Person 1: Ms. Dawson, I need you to take a dic... | Ms. Dawson takes a dictation for #Person1#, wh... |
| 1 | In order to prevent employees from wasting tim... | Person 1: Ms. Dawson, I need you to take a dic... | Ms. Dawson takes a dictation for #Person1#'s m... |
| 2 | Ms. Dawson takes a dictation for #Person1# abo... | Person 1: Ms. Dawson, I need you to take a dic... | Ms. Dawson takes a dictation for #Person1#'s m... |
| 3 | #Person2# arrives late because of traffic jam.... | Person1 and Person2 are discussing the traffic... | #Person2# gets stuck in traffic and decides to... |
| 4 | #Person2# decides to follow #Person1#'s sugges... | Person1 and Person2 are discussing the traffic... | #Person2# is stuck in traffic and #Person1# su... |
| 5 | #Person2# complains to #Person1# about the tra... | Person2: I'm going to stop driving to work bec... | #Person2# gets stuck in traffic and decides to... |
| 6 | #Person1# tells Kate that Masha and Hero get d... | Kate informed that Masha and Hero are getting ... | Kate tells #Person1# that Masha and Hero are g... |
| 7 | #Person1# tells Kate that Masha and Hero are g... | Kate informed that Masha and Hero are getting ... | Kate tells #Person1# that Masha and Hero are g... |
| 8 | #Person1# and Kate talk about the divorce betw... | Kate informed that Masha and Hero are getting ... | Kate tells #Person1# that Masha and Hero are g... |
| 9 | #Person1# and Brian are at the birthday party ... | Person1 and Person2 are at a party, and Person... | #Person1# brings a gift for Brian's birthday a... |


In [45]:
import evaluate

rouge = evaluate.load('rouge')

original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

peft_model_results = rouge.compute(
    predictions=peft_model_summaries,
    references=human_baseline_summaries[0:len(peft_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('PEFT MODEL:')
print(peft_model_results)

print("Absolute percentage improvement of PEFT MODEL over ORIGINAL MODEL")

# Difference between the two sets of ROUGE scores, printed below in percentage points
improvement = np.array(list(peft_model_results.values())) - np.array(list(original_model_results.values()))
for key, value in zip(peft_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

ORIGINAL MODEL:
{'rouge1': 0.27440914131706196, 'rouge2': 0.09111316663727126, 'rougeL': 0.20614499433059558, 'rougeLsum': 0.21783600413303578}
PEFT MODEL:
{'rouge1': 0.474787190601789, 'rouge2': 0.15405773808360948, 'rougeL': 0.3268467611011471, 'rougeLsum': 0.33034255416094915}
Absolute percentage improvement of PEFT MODEL over ORIGINAL MODEL
rouge1: 20.04%
rouge2: 6.29%
rougeL: 12.07%
rougeLsum: 11.25%
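
For intuition about what the `rouge1` number measures, here is a deliberately simplified unigram-overlap F1. It lowercases and splits on whitespace and does no stemming, so it will not reproduce the scores from `evaluate` exactly. `rouge1_f1` is a hypothetical helper written for this illustration.

```python
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 on lowercased, whitespace-split tokens (no stemming)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Counter intersection keeps the minimum count of each shared token
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("the cat on the mat", "the cat sat on the mat")
print(f"{score:.4f}")
```

A higher score means the generated summary shares more words with the human reference. It says nothing about fluency or factual accuracy.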


In [46]:
merged = ft_model.merge_and_unload()
merged.save_pretrained("merged",safe_serialization=True)
tokenizer.save_pretrained("merged")



('merged/tokenizer_config.json',
 'merged/special_tokens_map.json',
 'merged/vocab.json',
 'merged/merges.txt',
 'merged/added_tokens.json')
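
Conceptually, merging folds the low-rank update into the base weight: W' = W + (alpha / r) · B · A. The toy sketch below checks this on tiny hand-made matrices in plain Python. It ignores quantization entirely and is not how PEFT implements `merge_and_unload`.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b, scale=1.0):
    """Return a + scale * b element-wise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Toy sizes: a 2x2 frozen weight W with a rank-1 adapter (B is 2x1, A is 1x2).
W = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.5], [-1.0]]
A = [[2.0, 0.0]]
lora_alpha, r = 1, 1     # alpha equals r, as in the notebook's 32/32
scale = lora_alpha / r   # scaling factor applied to the adapter update

W_merged = add(W, matmul(B, A), scale)  # W' = W + (alpha / r) * B @ A

x = [[1.0], [1.0]]  # column-vector input
separate = add(matmul(W, x), matmul(B, matmul(A, x)), scale)
merged = matmul(W_merged, x)
print(W_merged)
print(separate == merged)
```

Because the merged weight gives the same output as base plus adapter, the saved model no longer needs the PEFT wrapper at inference time.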

In [47]:
merged.push_to_hub("palash-phi2-4bit-dialogsum")
tokenizer.push_to_hub("palash-phi2-4bit-dialogsum")

model.safetensors:   0%|          | 0.00/1.94G [00:00<?, ?B/s]

No files have been modified since last commit. Skipping to prevent empty commit.


CommitInfo(commit_url='https://huggingface.co/palash147/palash-phi2-4bit-dialogsum/commit/6381b97da6b8af6273f555c32cf53d7f9c2c0d76', commit_message='Upload tokenizer', commit_description='', oid='6381b97da6b8af6273f555c32cf53d7f9c2c0d76', pr_url=None, pr_revision=None, pr_num=None)

In [48]:
task="Write a summary of below conversation."
inp="""
    user1:
    Good morning. What is your profession?

    user2:
    Good morning. I’m an accountant. What about you?

    user1:
    I’m a software engineer. How long have you been an accountant?

    user2:
    I’ve been an accountant for about five years now. What about you? How long have you been a software engineer?

    user1:
    I’ve been a software engineer for three years. What do you like most about accounting?

    user2:
    I like how challenging it can be. There’s always something to learn or something new to figure out. What do you like most about software engineering?

    user1:
    I like how creative it can be. I get to come up with new ideas and new ways of solving problems. It’s a great feeling when you can come up with something that works.
"""

prompt = f"""### Instruction:
Use the Task below and the Input given to write the Response.

### Task:
{task}

### Input:
{inp}

### Response:
"""

In [71]:
import gc
# clear the VRAM
#del ft_model
#del original_model
#del peft_trainer
#del peft_model
torch.cuda.empty_cache()
gc.collect()

0

In [63]:
!pip install pynvml

Collecting pynvml
  Downloading pynvml-11.5.3-py3-none-any.whl (53 kB)
Installing collected packages: pynvml
Successfully installed pynvml-11.5.3


In [72]:
from pynvml import nvmlInit, nvmlDeviceGetHandleByIndex, nvmlDeviceGetMemoryInfo
nvmlInit()
h = nvmlDeviceGetHandleByIndex(0)
info = nvmlDeviceGetMemoryInfo(h)
print(f'total    : {info.total}')
print(f'free     : {info.free}')
print(f'used     : {info.used}')

total    : 16106127360
free     : 10776870912
used     : 5329256448


In [65]:
torch.cuda.mem_get_info()

(1545207808, 15655829504)
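
Both `pynvml` and `torch.cuda.mem_get_info()` report raw byte counts (the latter as a `(free, total)` tuple). A small conversion helper makes the readings easier to interpret. `to_gib` is a hypothetical name, and the values are copied from the pynvml output above.

```python
def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30

# Values copied from the pynvml output above
readings = {"total": 16106127360, "free": 10776870912, "used": 5329256448}
for name, n_bytes in readings.items():
    print(f"{name:<5}: {to_gib(n_bytes):.2f} GiB")
```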

In [73]:
from transformers import AutoTokenizer, AutoModelForCausalLM

original_model = AutoModelForCausalLM.from_pretrained('microsoft/phi-2', device_map='auto',)

peft_model = AutoModelForCausalLM.from_pretrained("palash147/palash-phi2-4bit-dialogsum", device_map='auto',)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.


In [74]:
original_tokenizer = AutoTokenizer.from_pretrained('microsoft/phi-2',trust_remote_code=True,padding_side="left",add_eos_token=True,add_bos_token=True,use_fast=False)
original_tokenizer.pad_token = original_tokenizer.eos_token

peft_tokenizer = AutoTokenizer.from_pretrained("palash147/palash-phi2-4bit-dialogsum",trust_remote_code=True,padding_side="left",add_eos_token=True,add_bos_token=True,use_fast=False)
peft_tokenizer.pad_token = peft_tokenizer.eos_token

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [75]:
def gen2(model, tok, p, maxlen=100, sample=True):
    toks = tok(p, return_tensors="pt", truncation=True)
    res = model.generate(**toks.to("cuda"), max_new_tokens=maxlen, do_sample=sample,num_return_sequences=1,temperature=0.1,top_p=0.9)
    return tok.batch_decode(res,skip_special_tokens=True)

In [76]:
print(f"-------------------------\n\n")
print(f"Prompt:\n{prompt}\n")
print(f"-------------------------\n\n")

print(f"Before Training Response :")
res = gen2(original_model, original_tokenizer, prompt,100,)
print(res[0].split('### Response:\n\n')[1].split('###')[0])
print(f"-------------------------\n\n")

print(f"After Training Response :")
res = gen2(peft_model, peft_tokenizer, prompt,100,)
print(res[0].split('### Response:\n\n')[1].split('###')[0])
print(f"-------------------------\n\n")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


-------------------------


Prompt:
### Instruction:
Use the Task below and the Input given to write the Response.

### Task:
Write a summary of below conversation.

### Input:

    user1:
    Good morning. What is your profession?

    user2:
    Good morning. I’m an accountant. What about you?

    user1:
    I’m a software engineer. How long have you been an accountant?

    user2:
    I’ve been an accountant for about five years now. What about you? How long have you been a software engineer?

    user1:
    I’ve been a software engineer for three years. What do you like most about accounting?

    user2:
    I like how challenging it can be. There’s always something to learn or something new to figure out. What do you like most about software engineering?

    user1:
    I like how creative it can be. I get to come up with new ideas and new ways of solving problems. It’s a great feeling when you can come up with something that works.


### Response:


-------------------------


Before Training Response :

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


user1:
Good morning. What is your profession?

user2:
Good morning. I’m an accountant. What about you?

user1:
I’m a software engineer. How long have you been an accountant?

user2:
I’ve been an accountant for about five years now. What about you? How long have you been a software engineer?

user1:
I’ve been a software
-------------------------


After Training Response :
User1 and User2 are having a conversation about their professions. User1 is a software engineer and User2 is an accountant. User2 has been an accountant for five years and User1 has been a software engineer for three years. User2 likes how challenging accounting can be and User1 likes how creative software engineering can be. They both agree that their professions are rewarding in their own ways.



-------------------------


