<a href="https://colab.research.google.com/github/gupta24789/llms-fine-tuning/blob/main/llama2/fine_tune_llama2_using_qlora.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Objective


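In this notebook, we will fine-tune the **meta-llama/Llama-2-7b-chat-hf** Llama 2 model with QLoRA (a 4-bit quantized base model plus trainable LoRA adapters). Given a Coursera course title, the fine-tuned model should list the skills associated with that course.

Dataset used: https://www.kaggle.com/datasets/azraimohamad/coursera-course-data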

In [None]:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = "1"  # expose only physical GPU 1; inside this process it appears as cuda:0

In [None]:
import torch
from datasets import load_dataset
from dotenv import load_dotenv
from pprint import pprint
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, pipeline, logging
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model, AutoPeftModelForCausalLM, PeftModel
from trl import SFTTrainer
load_dotenv()

True

## Transform Data

Data Format:

        <s> [INST] prompt [/INST] response </s>

In [None]:
dataset = load_dataset("csv", data_dir="data", data_files="coursera_course_dataset_v3.csv")
dataset = dataset.select_columns(['Title','Skills'])
pprint(dataset['train'][0])

{'Skills': ' Network Security, Python Programming, Linux, Cloud Computing, '
           'Algorithms, Audit, Computer Programming, Computer Security '
           'Incident Management, Cryptography, Databases, Leadership and '
           'Management, Network Architecture, Risk Management, SQL',
 'Title': 'Google Cybersecurity'}


In [None]:
def transform_data(row):
    title = row['Title'].strip()
    skills = row['Skills'].strip()
    text = f"<s> [INST] Skills related with : {title} [/INST] {skills}</s>"
    return {"text": text}
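Keeping the training text and the inference prompt in sync is easier when both come from one template. The helpers below (`build_prompt`, `build_training_text`) are hypothetical, not part of the original notebook; they reproduce the format written by `transform_data`. Note that the inference cells later in this notebook use `<s>[INST]` (no space after `<s>`) while the training text uses `<s> [INST]`; the two strings are not identical and may tokenize slightly differently.

```python
PROMPT_TEMPLATE = "<s> [INST] Skills related with : {title} [/INST]"

def build_prompt(title: str) -> str:
    """Inference prompt: everything up to and including [/INST]."""
    return PROMPT_TEMPLATE.format(title=title.strip())

def build_training_text(title: str, skills: str) -> str:
    """Training example in the same format as transform_data: prompt, space, response, </s>."""
    return f"{build_prompt(title)} {skills.strip()}</s>"

row = {"Title": "Google Cybersecurity", "Skills": " Network Security, Python Programming, Linux"}
text = build_training_text(row["Title"], row["Skills"])
print(text)

# The training text always starts with the exact string used as the inference prompt
assert text.startswith(build_prompt(row["Title"]))
```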

In [None]:
dataset = dataset.map(transform_data)
dataset

DatasetDict({
    train: Dataset({
        features: ['Title', 'Skills', 'text'],
        num_rows: 623
    })
})

In [None]:
pprint(dataset['train'][0])

{'Skills': ' Network Security, Python Programming, Linux, Cloud Computing, '
           'Algorithms, Audit, Computer Programming, Computer Security '
           'Incident Management, Cryptography, Databases, Leadership and '
           'Management, Network Architecture, Risk Management, SQL',
 'Title': 'Google Cybersecurity',
 'text': '<s> [INST] Skills related with : Google Cybersecurity [/INST] '
         'Network Security, Python Programming, Linux, Cloud Computing, '
         'Algorithms, Audit, Computer Programming, Computer Security Incident '
         'Management, Cryptography, Databases, Leadership and Management, '
         'Network Architecture, Risk Management, SQL</s>'}


In [None]:
## Push to hub
# dataset.push_to_hub("sg247/coursera-course-data", token = os.environ['HF_WRITE_TOKEN'])

## Fine Tuning

In [None]:
dataset = load_dataset("sg247/coursera-course-data", split = 'train')
dataset

Dataset({
    features: ['Title', 'Skills', 'text'],
    num_rows: 623
})

In [None]:
pprint(dataset[17])

{'Skills': ' Machine Learning, Deep Learning, Artificial Neural Networks, '
           'Machine Learning Algorithms, Applied Machine Learning, Python '
           'Programming, Machine Learning Software, Network Model, Algorithms, '
           'Computer Programming, Computer Vision, Network Architecture, '
           'Natural Language Processing, Tensorflow, Human Learning, Data '
           'Analysis, Data Model, Exploratory Data Analysis, Organizational '
           'Development, Process Analysis, Strategy, Computational Logic, '
           'Mathematics, Mathematical Theory & Analysis, Linear Algebra, '
           'Regression, Calculus',
 'Title': 'Deep Learning',
 'text': '<s> [INST] Skills related with : Deep Learning [/INST] Machine '
         'Learning, Deep Learning, Artificial Neural Networks, Machine '
         'Learning Algorithms, Applied Machine Learning, Python Programming, '
         'Machine Learning Software, Network Model, Algorithms, Computer '
         'Programming, 

## bitsandbytes parameters

- **bnb_4bit_compute_dtype** (torch.dtype or str, optional, defaults to torch.float32): the dtype used for computation, which can differ from the input (storage) type. For example, inputs might be fp32 while computation runs in bf16 for speed. This notebook uses `torch.float16`.

- **load_in_4bit** (bool, optional, defaults to False): enables 4-bit quantization by replacing the Linear layers with FP4/NF4 layers from bitsandbytes.

- **bnb_4bit_quant_type** (str, optional, defaults to "fp4"): the quantization data type used in the `bnb.nn.Linear4bit` layers. Accepted values are `"fp4"` and `"nf4"`; this notebook uses NF4 (4-bit NormalFloat).

- **bnb_4bit_use_double_quant** (bool, optional, defaults to False): enables nested quantization, where the quantization constants from the first quantization are themselves quantized again to save additional memory.
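To get a feel for what these flags save, here is a rough estimate of the weight memory of the quantized linear layers only. It is a sketch under stated assumptions: layer shapes come from the `print(model)` output further down (hidden size 4096, MLP size 11008, 32 layers), and the block sizes (64 for the first-level constants, 256 for the double-quantization constants) follow the QLoRA paper. Embeddings, `lm_head`, norms, activations and optimizer state are not counted.

```python
# Rough weight-memory estimate for the 4-bit Linear layers of Llama-2-7B.
# Assumption: block size 64 for fp32 quantization constants, and 8-bit
# constants with a second-level block size of 256 when double quantization is on.
HIDDEN, MLP, LAYERS = 4096, 11008, 32

def linear_params_per_layer(hidden: int, mlp: int) -> int:
    attn = 4 * hidden * hidden   # q_proj, k_proj, v_proj, o_proj
    ffn = 3 * hidden * mlp       # gate_proj, up_proj, down_proj
    return attn + ffn

def bits_per_param(double_quant: bool, block: int = 64, block2: int = 256) -> float:
    """4 bits per weight plus the amortized cost of its scaling constants."""
    if double_quant:
        overhead = 8 / block + 32 / (block * block2)  # 8-bit constants + fp32 second-level constants
    else:
        overhead = 32 / block                          # one fp32 constant per block of 64 weights
    return 4 + overhead

n = LAYERS * linear_params_per_layer(HIDDEN, MLP)
GIB = 1024 ** 3
print(f"quantized linear params: {n:,}")
print(f"fp16:               {n * 16 / 8 / GIB:.2f} GiB")
print(f"nf4:                {n * bits_per_param(False) / 8 / GIB:.2f} GiB")
print(f"nf4 + double quant: {n * bits_per_param(True) / 8 / GIB:.2f} GiB")
```

The double-quantization saving is small per parameter (about 0.37 bits) but adds up across billions of weights.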

In [None]:
## Quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit= True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype= torch.float16,
    bnb_4bit_use_double_quant= True
)

## Load Model & Tokenizer

In [None]:
## Model-Name
model_name = "meta-llama/Llama-2-7b-chat-hf"

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map= {"":0},
    token = os.environ['HF_READ_TOKEN']
)
model.config.use_cache = False
model.config.pretraining_tp = 1


In [None]:
# Load LLaMA tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, token=os.environ['HF_READ_TOKEN'], trust_remote_code=True)
# tokenizer.pad_token = '[PAD]'
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

## Inference Before Training

In [None]:
df = dataset.to_pandas()

In [None]:
title = "Deep Learning"
prompt = f"<s>[INST] Skills related with : {title} [/INST]"
related_skills = df[df.Title==title]['Skills'].values[0]
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer,
                max_length=128, do_sample = True, top_k = 10, no_repeat_ngram_size = 2)
result = pipe(prompt)
dash_line = '-' * 99  # separator line (99 dashes)

print(f"Input Prompt : {prompt}")
print(dash_line)
print(f"Skills : {related_skills}")
print(dash_line)
print(result[0]['generated_text'][len(prompt):])

Input Prompt : <s>[INST] Skills related with : Deep Learning [/INST]
---------------------------------------------------------------------------------------------------
Skills :  Machine Learning, Deep Learning, Artificial Neural Networks, Machine Learning Algorithms, Applied Machine Learning, Python Programming, Machine Learning Software, Network Model, Algorithms, Computer Programming, Computer Vision, Network Architecture, Natural Language Processing, Tensorflow, Human Learning, Data Analysis, Data Model, Exploratory Data Analysis, Organizational Development, Process Analysis, Strategy, Computational Logic, Mathematics, Mathematical Theory & Analysis, Linear Algebra, Regression, Calculus
---------------------------------------------------------------------------------------------------
  Deep learning is a subset of machine learning that involves the use of artificial neural networks to analyze and interpret complex data. Unterscheidung between deep learning and other machine-learni
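The pipeline calls in this notebook pass `no_repeat_ngram_size=2`, which forbids any two-token sequence from occurring twice in the sequence being generated; for a decoder-only model the prompt tokens are part of that sequence. For a comma-separated skill list that naturally repeats words such as "Learning", this is a strong constraint and may partly explain some of the odd phrasings in the outputs. A minimal sketch of the rule, using words instead of real token IDs for readability (this illustrates the idea; it is not the transformers implementation):

```python
from typing import Hashable, List, Set

def banned_next_tokens(tokens: List[Hashable], n: int) -> Set[Hashable]:
    """Tokens that would complete an n-gram already present in `tokens`.

    The last n-1 tokens form a prefix; any token that followed the same
    prefix earlier in the sequence is banned as the next token.
    """
    if n <= 0 or len(tokens) < n - 1:
        return set()
    prefix = tuple(tokens[len(tokens) - (n - 1):])  # empty tuple when n == 1
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned

seq = ["Deep", "Learning", ",", "Machine", "Learning", ",", "Deep"]
print(banned_next_tokens(seq, 2))  # "Deep Learning" already occurred, so "Learning" is banned
```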

## Training Setup


# QLoRA parameters

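- task_type: the task to train for (causal language modeling, `"CAUSAL_LM"`, in this case)
- inference_mode: whether the model is being used for inference only (not set here, so it keeps its default of False)
- r: the rank (inner dimension) of the low-rank matrices A and B
- lora_alpha: the scaling factor; the adapter output is multiplied by lora_alpha / r (64 / 32 = 2 here)
- lora_dropout: the dropout probability applied inside the LoRA layers
- target_modules: the modules that receive LoRA adapters (here the attention projections q_proj, k_proj, v_proj, o_proj)
- bias: which bias parameters to train ("none" trains no biases)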
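The parameters above can be made concrete with a dependency-free sketch of one LoRA-adapted linear layer, y = W x + (lora_alpha / r) · B (A x), where W is frozen and only A (r × d_in) and B (d_out × r) are trained. The toy sizes are illustrative assumptions; dropout is omitted. It also shows why starting B at zero (PEFT's default initialization for B) makes the adapter a no-op before training, and counts the trainable parameters of one `q_proj` adapter in this notebook's config.

```python
import random

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, lora_alpha, r):
    """Frozen base projection plus the scaled low-rank update B(Ax)."""
    scaling = lora_alpha / r
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scaling * d for b, d in zip(base, delta)]

def lora_param_count(d_in, d_out, r):
    return r * d_in + d_out * r  # entries of A plus entries of B

random.seed(0)
d_in, d_out, r, alpha = 6, 4, 2, 4  # toy sizes for illustration
W = [[random.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.02) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # B starts at zero, so the output equals the base layer
x = [random.uniform(-1, 1) for _ in range(d_in)]

assert lora_forward(W, A, B, x, alpha, r) == matvec(W, x)
print(lora_param_count(4096, 4096, 32))  # one q_proj adapter with r=32
```

Multiplying by 4 target modules and 32 layers gives the 33,554,432 trainable parameters reported below.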

In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():

        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [None]:
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)
print(model)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear4bit(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRM

In [None]:
## Lora config
lora_config = LoraConfig(
    r= 32,
    lora_alpha=64,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"], #specific to Llama models.
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
print_trainable_parameters(model)

trainable params: 33554432 || all params: 3533967360 || trainable%: 0.9494833591219133
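The `all params` value of about 3.5B looks too small for a 7B model. A plausible explanation, which the arithmetic below reproduces exactly, is that bitsandbytes packs two 4-bit weights into each stored element, so `param.numel()` reports half the real count for every `Linear4bit` weight. Assumptions: shapes from the model printout above, plus an untied, unquantized `lm_head` of the same size as the embedding. Under these assumptions the real trainable share is roughly half the printed 0.95%. Recent PEFT versions also provide `model.print_trainable_parameters()`, which accounts for 4-bit packing.

```python
# Reproduce the figures printed by print_trainable_parameters for Llama-2-7B + LoRA.
HIDDEN, MLP, LAYERS, VOCAB, R = 4096, 11008, 32, 32000, 32

linear = LAYERS * (4 * HIDDEN * HIDDEN + 3 * HIDDEN * MLP)  # Linear4bit weights (real count)
embed_and_head = 2 * VOCAB * HIDDEN                         # embed_tokens + lm_head, not quantized
norms = LAYERS * 2 * HIDDEN + HIDDEN                         # two RMSNorms per layer + final norm
lora = LAYERS * 4 * R * (HIDDEN + HIDDEN)                    # A and B for q/k/v/o_proj

reported_total = linear // 2 + embed_and_head + norms + lora  # what summing numel() gives
true_total = linear + embed_and_head + norms + lora

print(f"reported total:  {reported_total:,}")
print(f"true total:      {true_total:,}")
print(f"true trainable%: {100 * lora / true_total:.3f}")
```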


## Training Argument

In [None]:
CHECKPOINTS_DIR = "results"

training_arguments = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    logging_steps=5,
    learning_rate=1e-4,
    fp16=True,
    max_grad_norm=0.3,
    num_train_epochs=1,
    warmup_ratio=0.05,
    save_strategy="steps",
    group_by_length=True,
    output_dir=CHECKPOINTS_DIR,
    report_to="tensorboard",
    save_safetensors=True,
    lr_scheduler_type="cosine",
    seed=42,
)
model.config.use_cache = False  # silence the warnings. re-enable for inference!

## SFT Trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    eval_dataset= None,
    peft_config=lora_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_arguments
)

Detected kernel version 4.15.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


# Train model

In [None]:
trainer.train()

| Step | Training Loss |
|---|---|
| 5 | 3.3193 |
| 10 | 3.4 |
| 15 | 1.9147 |
| 20 | 1.6631 |
| 25 | 1.2781 |
| 30 | 1.2234 |
| 35 | 1.1566 |


TrainOutput(global_step=39, training_loss=1.9057974081773024, metrics={'train_runtime': 135.8295, 'train_samples_per_second': 4.587, 'train_steps_per_second': 0.287, 'total_flos': 1810096299147264.0, 'train_loss': 1.9057974081773024, 'epoch': 1.0})
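The `global_step=39` above follows from the training arguments: 623 rows in batches of 4 give 156 batches, and with 4 gradient-accumulation steps that is 39 optimizer steps per epoch (an effective batch size of 16). The sketch below recomputes this and the learning-rate curve, assuming a single GPU and the usual linear-warmup-then-cosine-decay shape for `lr_scheduler_type="cosine"` with warmup steps rounded up from `warmup_ratio`.

```python
import math

NUM_ROWS = 623      # dataset size shown above
BATCH = 4           # per_device_train_batch_size (single GPU assumed)
GRAD_ACCUM = 4      # gradient_accumulation_steps
EPOCHS = 1
PEAK_LR = 1e-4
WARMUP_RATIO = 0.05

batches_per_epoch = math.ceil(NUM_ROWS / BATCH)    # the last batch is partial
steps_per_epoch = batches_per_epoch // GRAD_ACCUM  # one optimizer step per 4 batches
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

def cosine_lr(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

print(f"effective batch size: {BATCH * GRAD_ACCUM}")
print(f"optimizer steps: {total_steps}, warmup steps: {warmup_steps}")
for s in (0, warmup_steps, total_steps // 2, total_steps):
    print(s, f"{cosine_lr(s):.2e}")
```

With only 39 steps, warmup lasts just 2 steps, so almost the whole run is spent on the cosine decay.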

# Save trained model

In [None]:
# from huggingface_hub import notebook_login
# notebook_login()

In [None]:
## If you get an access error, uncomment and run the cell above
peft_model_path = "./finetuned-chat-llama2"
tokenizer.save_pretrained(peft_model_path)
trainer.model.save_pretrained(peft_model_path)

## Inference

In [None]:
# Ignore warnings
logging.set_verbosity(logging.CRITICAL)

In [None]:
model.config.use_cache = True
model.eval()

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): LlamaForCausalLM(
      (model): LlamaModel(
        (embed_tokens): Embedding(32000, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (k_proj): lora.Linear4bit(
                (base_layer): Linear4bit(i

In [None]:
title = "Deep Learning"
prompt = f"<s>[INST] Skills related with : {title} [/INST]"
related_skills = df[df.Title==title]['Skills'].values[0]
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer,
                max_length=128, do_sample = True, top_k = 10, no_repeat_ngram_size = 2)
result = pipe(prompt)
dash_line = '-' * 99  # separator line (99 dashes)

print(f"Input Prompt : {prompt}")
print(dash_line)
print(f"Skills : {related_skills}")
print(dash_line)
print(result[0]['generated_text'][len(prompt):])

Input Prompt : <s>[INST] Skills related with : Deep Learning [/INST]
---------------------------------------------------------------------------------------------------
Skills :  Machine Learning, Deep Learning, Artificial Neural Networks, Machine Learning Algorithms, Applied Machine Learning, Python Programming, Machine Learning Software, Network Model, Algorithms, Computer Programming, Computer Vision, Network Architecture, Natural Language Processing, Tensorflow, Human Learning, Data Analysis, Data Model, Exploratory Data Analysis, Organizational Development, Process Analysis, Strategy, Computational Logic, Mathematics, Mathematical Theory & Analysis, Linear Algebra, Regression, Calculus
---------------------------------------------------------------------------------------------------
 Deep learning, Machine Learning, Neural Networks, Artificial Neuration, Computer Networking, Data Management, Network Architecture, Statistical Learning
 nobody, probability, statistics, data analysi

In [None]:
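# Free GPU memory before reloading in FP16: the trainer and the pipeline also hold references to the model
del model, tokenizer, pipe, trainer
import gc
gc.collect()
torch.cuda.empty_cache()  # return cached, now-unused CUDA blocks to the driver
gc.collect()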

0

## Save full model
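The cell below reloads the base model in FP16 and calls `merge_and_unload()`, which folds each adapter into its base weight: W_merged = W + (lora_alpha / r) · B A. After merging, the layer produces the same output as "base + adapter" without the extra low-rank matmuls. A toy, dependency-free check of that identity (the matrices are made-up values; r = 1 for brevity):

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def merge(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * B @ A."""
    scaling = lora_alpha / r
    BA = matmul(B, A)
    return [[w + scaling * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]

W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # d_out=2, d_in=3 (toy values)
A = [[0.5, -1.0, 0.0]]                    # r=1
B = [[2.0], [1.0]]
x = [1.0, 2.0, 3.0]
alpha, r = 2, 1

# Unmerged: base output plus scaled adapter output
adapter_out = [b + (alpha / r) * d for b, d in zip(matvec(W, x), matvec(B, matvec(A, x)))]
# Merged: a single matmul with the updated weight
merged_out = matvec(merge(W, A, B, alpha, r), x)
print(adapter_out, merged_out)
```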

In [None]:
# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map={"":0},
    token=os.environ['HF_READ_TOKEN'],  # gated repo: same read token as the first load
)
model = PeftModel.from_pretrained(base_model, peft_model_path)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(peft_model_path)


## Inference
title = "Deep Learning"
prompt = f"<s>[INST] Skills related with : {title} [/INST]"
related_skills = df[df.Title==title]['Skills'].values[0]
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer,
                max_length=128, do_sample = True, top_k = 10, no_repeat_ngram_size = 2)
result = pipe(prompt)
dash_line = '-' * 99  # separator line (99 dashes)

print(f"Input Prompt : {prompt}")
print(dash_line)
print(f"Skills : {related_skills}")
print(dash_line)
print(result[0]['generated_text'][len(prompt):])


Input Prompt : <s>[INST] Skills related with : Deep Learning [/INST]
---------------------------------------------------------------------------------------------------
Skills :  Machine Learning, Deep Learning, Artificial Neural Networks, Machine Learning Algorithms, Applied Machine Learning, Python Programming, Machine Learning Software, Network Model, Algorithms, Computer Programming, Computer Vision, Network Architecture, Natural Language Processing, Tensorflow, Human Learning, Data Analysis, Data Model, Exploratory Data Analysis, Organizational Development, Process Analysis, Strategy, Computational Logic, Mathematics, Mathematical Theory & Analysis, Linear Algebra, Regression, Calculus
---------------------------------------------------------------------------------------------------
 Artificial Neural Networks, Deep Neuro Learning, Neuron, Machine Learning Algorithms, Natural Language Processing, Regression, Computer Networking
, Computational Neuroscience, Data Analysis, General
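The before/after comparisons in this notebook are made by eye. One simple, hypothetical way to put a number on them is exact-match overlap between the generated and reference comma-separated skill lists. This is not a metric the notebook uses, and exact string matching will miss near-synonyms such as "Neural Networks" vs "Artificial Neural Networks".

```python
def parse_skills(text: str) -> set:
    """Split a comma-separated skill list into a set of normalized names."""
    return {s.strip().lower() for s in text.split(",") if s.strip()}

def skill_overlap(generated: str, reference: str) -> dict:
    """Exact-match precision/recall of generated skills against the reference list."""
    gen, ref = parse_skills(generated), parse_skills(reference)
    hits = gen & ref
    precision = len(hits) / len(gen) if gen else 0.0
    recall = len(hits) / len(ref) if ref else 0.0
    return {"matched": sorted(hits), "precision": precision, "recall": recall}

generated = "Deep learning, Machine Learning, Neural Networks, Statistical Learning"
reference = "Machine Learning, Deep Learning, Artificial Neural Networks"
print(skill_overlap(generated, reference))
```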