
<div style="color:#ffffff;
          font-size:50px;
          font-style:italic;
          text-align:left;
          font-family: 'Lucida Bright';
          background:#4686C8;">
  	&nbsp; Sentiment using QLoRA Fine-Tuning from scratch
</div>
<br>   
<div style="
          font-size:20px;
          text-align:left;
          font-family: 'Palatino';
          ">
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Project: Sentiment Analysis using QLoRA Fine-Tuning from scratch<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Author: George Barrinuevo<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Date: 07/24/2025<br>
</div>

<br><div style="color:#ffffff;
          font-size:30px;
          font-style:italic;
          text-align:left;
          font-family: 'Lucida Bright';
          background:#4686C8;">
  	      &nbsp; Project Notes
</div>
<div style="
          font-size:16px;
          text-align:left;
          font-family: 'Cambria';">
    
**Here are my thoughts on this project**
- The purpose of this project is to create a Financial Sentiment Analysis using QLoRA, all from scratch . It uses PyTorch and Python code.
<br>

**Technical Details**

<u>Fine-Tuning</u>

  - This is a technique that uses large pretrained models and weights and then further trains them using domain specific datasets. The result is fast training and specialization on the specific domain.

<u>LoRA</u>

  - Low-Rank Adaptation (LoRA) is a Fine-Tuning method that will save time and money compared with the standard way of training the entire model, while retaining about 90-95% of the performance.
  - This method will only train some of the parameters. It does this by taking the weights W and dividing them in to B and A matrices, which has less number of parameters than the original W. As a result, the training time is reduced.
  - The LoraConfig and PeftConfig are used to implement this feature.

<u>PEFT</u>
  
  - Parameter-Efficient Fine-Tuning (PEFT) provides a way to use LoRA as the method to update the matrices. Here are some common parameters:
    * r - The rank of the update matrices
    * target_modules - The models to apply to the LoRA update matrices. An example is the attention blocks.
    * lora_alpha - A scaling factor used when updating the weights.
  - The LoraConfig and PeftConfig are used to implement this feature.

<u>Quantization</u>

  - This is an optimization technique to speed up the training and inference by using lower precision floating point numbers.
  - Instead of using a 32-bit floats, Quantization uses 16 or 8 or 4 bit floats. This has less precision, but uses less memory and speeds up calculations.
  - It still uses the same number of trainable parameters.
  - De-quantization involves taking say the 4-bit float and temporarily converting it to a 8, 16, or 32 bit float used for the calculation part, then stores the results back to 4-bit float. Integers can also be used.
  - The trade-offs are:
    * More Precision - This has more accuracy, but at the expense of using more memory and more calculations. The result is slower training and inferences, but more precision.
    * Less Precision - This is faster and save memory, but at the expense of being less accurate. The result is faster training and inferences, but less precision.
  - The BitsAndBytesConfig package is used to implement this feature.

<u>Model</u>

  - The pretrained model used is 'google/gemma-2-2b'. This model is already trained. But, we will further train it using a dataset so it can specialize on that dataset.
  - This function is used to load this pretrained model: AutoModelForCausalLM.from_pretrained().

<u>Tokenizer</u>

  - The Tokenizer to use is the one associated to the same model.
  - A Tokenizer is used to convert words in to Token IDs and vice versa.
  - This is the function used to initialize the Tokenizer: AutoTokenizer.from_pretrained().

<u>Dataset</u>

  - A Dataset is needed to further train the pretrained model in order to gain a specialization on a specific task.
  - In this dataset, there is text consisting of financial information. It also has the sentiment rating.
  - This dataset can be downloaded here: https://www.kaggle.com/datasets/sbhatti/financial-sentiment-analysis
  - This script will auto download this dataset file.
  - Here is an example row from this dataset:<br>
        Index: 8<br>
        Text: 'Kone 's net sales rose by some 14 % year-on-year in the first nine months of 2008.'<br>
        Sentiment: positive<br>

<u>Dataset Preprocessing</u>

  - The preprocessing stage involves re-formatting the dataset in to a format that the model expects.
  - Here is a summary of dataset preprocessing this script will do:
    * Download the Kaggle dataset.
    * Filter dataset for sentiment that have one of these: positive, neutral, negative.
    * Shuffle the dataset.
    * Only the first dataset_size of the dataset is used.
    * The dataset will be divided between training and testing datasets.
    * Take the training dataset and divide that in to training and evaluation datasets.
    * Reformat the datasets to create an input text the model expects. See the sub-section 'Input Text Format' below.
    * Since the index values will not be consecuted integers because rows where removed, we need to renumber the index of the target or truth dataset. In this script, that would be the 'y_true' variable.

<u>Input Text Format</u>

  - This is for training the model. The dataset is NOT in a format the model expects. This format includes instructions, the financial text, and the sentiment truth. This is the input text passed in to the model so that it can learn from it. The generate_prompt() is used for this. So, here is the basic format for the input text:<br>
            Analyze the sentiment of the news headline enclosed in square brackets,<br>
            determine if it is positive, neutral, or negative, and return the answer as<br> 
            the corresponding sentiment label "positive" or "neutral" or "negative".<br>
            \[Kone 's net sales rose by some 14 % year-on-year in the first nine months of 2008.] = positive<br>
  - This is for inference/prediction after the Fine-Tuning training is completed. This format is the same as for the training version above except for the sentiment (e.g. positive, neutral, or negative) part is not used. The generate_test_prompt() is used for this.

<u>Auto-Resume & Caching</u>

  - Caching Model & Weights & Dataset
    * The Model, Weights, Tokenizer, and Dataset are downloaded from the internet. These are saved in to a local cache directory. Once cached, re-running the script will just retrieve this data from the cache saving time. But in Kaggle, if the session is stopped, these caches are deleted. It is recommended to save these cache to a persistent storage.
  - Checkpoints & Auto-Resume
    * The purpose of Checkpoints is to save the trainable weights every few number of steps during training so that when the script re-runs, these data is pulled from the cache and can continue training since the last checkpoint.
    * The auto-resume from the last saved checkpoint is used if the script run is cancelled or crashes.

<u>Multiple GPUs</u>

  - The number of batch sizes (batch_size) must be evenly divisible by the number of GPUs (num_gpu).
  - If this script detects multple GPUs, then will use custom_trainer_multple_gpu(), else will use trainer.train().

<u>Trainer</u>

  - The type of training method used is to use the TrainingArguments() class where the training parameters are specified and to use the SFTTrainer() class. This differs from creating a custom training loop.

<u>Misc</u>

  - Progress Bar
    * A progress bar is used (e.g. 35% completed) when downloading data. The tqdm python package is used for this.
  - API Keys
    * The HuggingFace API Key 'HF_TOKEN' needs to be added to this Jupyter notebook. This can be created in https://huggingface.co site.
    * The Weights and Biases API Key 'WANDB_API_KEY' needs to be added to this Jupyter notebook. This can be created in https://wandb.ai/ site.
</div>

In [1]:
# Changable user parameters.

load_model = False         # True: Load the previously saved model and weights.
save_model = True          # True: After model has been trained will ave the model and weights.
num_epochs = 10            # Default: 10
dataset_size = 300         # Default: 300
batch_size = 2             # Default: 2, must divide evenly with the number of GPUs, e.g. dataset_size % num_gpu = 0.
save_every_num_epochs = 5  # Default: 5, save checkpoint every epoch intervals, used only when using multiple GPUs.
                           # For single GPUs, it will save at every epoch.

base_path = '/kaggle'     # This /kaggle path will persist between session restarts.
import os
os.environ["BASE_PATH"] = base_path
output_trained_weights_dir = base_path + "/trained_weights"
saved_model_dir = base_path + '/saved_model'
cache_dir = base_path + '/cache'   

In [2]:
# Show the amount of RAM and disk space BEFORE starting this script.

def show_ram_disk():
    !free -h
    print(f'='*70)
    !df -h $BASE_PATH
    print(f'='*70)
    !ls -al $BASE_PATH

show_ram_disk()

               total        used        free      shared  buff/cache   available
Mem:            31Gi       910Mi        23Gi       2.0Mi       7.4Gi        30Gi
Swap:             0B          0B          0B
Filesystem      Size  Used Avail Use% Mounted on
overlay         7.9T  6.3T  1.7T  80% /
total 20
drwxr-xr-x 5 root root 4096 Jul 25 06:49 .
drwxr-xr-x 1 root root 4096 Jul 25 06:49 ..
drwxr-xr-x 3 root root 4096 Jul 25 06:49 input
drwxr-xr-x 3 root root 4096 Jul 25 06:49 lib
drwxr-xr-x 3 root root 4096 Jul 25 06:49 working


In [3]:
!pip install -q -U bitsandbytes
!pip install -q -U transformers
!pip install -q -U accelerate
!pip install -q -U datasets
!pip install -q -U trl
!pip install -q -U peft
!pip install -q -U huggingface_hub

[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m72.9/72.9 MB[0m [31m23.8 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m363.4/363.4 MB[0m [31m4.7 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.8/13.8 MB[0m [31m99.9 MB/s[0m eta [36m0:00:00[0m:00:01[0m0:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m24.6/24.6 MB[0m [31m76.9 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m883.7/883.7 kB[0m [31m41.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m664.8/664.8 MB[0m [31m2.5 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m211.5/211.5 MB[0m [31m6.2 MB/s[0m eta [36m0:00:00[0m:00:01[0m00:01[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

In [4]:
import numpy as np
import pandas as pd
import os
import warnings
from tqdm import tqdm
import bitsandbytes as bnb
import torch
import torch.nn as nn
import transformers
from datasets import Dataset
from peft import LoraConfig, PeftConfig
from trl import SFTTrainer
from trl import setup_chat_format
from torch.utils.data import DataLoader
from torch.optim import AdamW
from transformers import (AutoModelForCausalLM, 
                          AutoTokenizer, 
                          BitsAndBytesConfig, 
                          TrainingArguments, 
                          pipeline, 
                          logging)
from sklearn.metrics import (accuracy_score, 
                             classification_report, 
                             confusion_matrix)
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import seaborn as sns
from kaggle_secrets import UserSecretsClient
import kagglehub
import logging

2025-07-25 06:52:29.316848: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1753426349.539797      36 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1753426349.603334      36 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


In [5]:
# os.environ["TOKENIZERS_PARALLELISM"] = "false"
num_gpu = torch.cuda.device_count()
print(f'num_gpu: {num_gpu}')

if batch_size % num_gpu != 0:
    print(f'The number of GPUs ({num_gpu}) divide evenly with batch_size ({batch_size}), exiting.')
    exit(1)

if num_gpu == 1:
    os.environ["TOKENIZERS_PARALLELISM"] = "false"
else:
    os.environ["TOKENIZERS_PARALLELISM"] = "true"

num_gpu: 1


In [6]:
print(f"pytorch version {torch.__version__}")
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"working on {device}")

warnings.filterwarnings("ignore")

user_secrets = UserSecretsClient()
os.environ["HF_TOKEN"] = user_secrets.get_secret("HF_TOKEN")

pytorch version 2.6.0+cu124
working on cuda:0


In [7]:
def generate_prompt(data_point):
    return f"""
            Analyze the sentiment of the news headline enclosed in square brackets, 
            determine if it is positive, neutral, or negative, and return the answer as 
            the corresponding sentiment label "positive" or "neutral" or "negative".

            [{data_point["text"]}] = {data_point["sentiment"]}
            """.strip()

def generate_test_prompt(data_point):
    return f"""
            Analyze the sentiment of the news headline enclosed in square brackets, 
            determine if it is positive, neutral, or negative, and return the answer as 
            the corresponding sentiment label "positive" or "neutral" or "negative".

            [{data_point["text"]}] = """.strip()

def evaluate(y_true, y_pred):
    mapping = {'positive': 2, 'neutral': 1, 'none':1, 'negative': 0}
    def map_func(x):
        return mapping.get(x, 1)
    
    y_true = np.vectorize(map_func)(y_true)
    y_pred = np.vectorize(map_func)(y_pred)

    class_labels = ['negative', 'neutral', 'positive']

    accuracy = accuracy_score(y_true=y_true, y_pred=y_pred)
    print(f'Accuracy: {accuracy:.3f}')
    
    unique_labels = set(y_true) 
    
    for label in unique_labels:
        label_indices = [i for i in range(len(y_true)) 
                         if y_true[i] == label]
        label_y_true = [y_true[i] for i in label_indices]
        label_y_pred = [y_pred[i] for i in label_indices]
        accuracy = accuracy_score(label_y_true, label_y_pred)
        print(f'Accuracy for label `{class_labels[label]}`: {accuracy:.3f}')

def predict(predictor_model, tokenizer):
    level1 = logging.getLogger("transformers").level
    logging.getLogger("transformers").setLevel(logging.ERROR)
    
    y_pred = []
    
    pipe = pipeline(task="text-generation", 
            model=predictor_model, 
            tokenizer=tokenizer, 
            max_new_tokens = 1, 
            temperature = 0.1,
            use_cache=False,
    )

    display_every_num_secs = 2
    for i in tqdm(range(len(X_test)), mininterval=display_every_num_secs):
        prompt = X_test.iloc[i]["text"]
        result = pipe(prompt)
        answer = result[0]['generated_text'].split("=")[-1]
        if "positive" in answer:
            y_pred.append("positive")
        elif "negative" in answer:
            y_pred.append("negative")
        elif "neutral" in answer:
            y_pred.append("neutral")
        else:
            y_pred.append("none")

    logging.getLogger("transformers").setLevel(level1)
    return y_pred

def test_prediction(index):
    pipe = pipeline(task="text-generation", # text-generation
            model=model, 
            tokenizer=tokenizer, 
            max_new_tokens = 1, 
            temperature = 0.1,
            use_cache=False,
    )
    
    prompt = X_test.iloc[index]["text"]
    result = pipe(prompt)
    pred_answer = result[0]['generated_text'].split("=")[-1]
    true_answer = y_true[index]
    print(f'prompt: {prompt}')
    print(f'result: {result}')
    print(f'pred_answer: {pred_answer}')
    print(f'true_answer: {true_answer}')

def save_checkpoint(model, optimizer, epoch, loss, filepath):
    checkpoint = {
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': loss,
    }
    torch.save(checkpoint, filepath)

def load_checkpoint(filepath, model, optimizer):
    if os.path.exists(filepath):
        checkpoint = torch.load(filepath)
        model.load_state_dict(checkpoint['model_state_dict'], strict=False)
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        last_epoch = checkpoint['epoch']
        loss = checkpoint['loss']
    else:
        last_epoch = 0

    return last_epoch

def custom_trainer_multple_gpu():
    global train_data
    global output_trained_weights_dir

    def tokenize_fn(example):
        tokens = tokenizer(
            example["text"], 
            padding="max_length", 
            truncation=True, 
            max_length=128
        )
        tokens["labels"] = tokens["input_ids"].copy()
        return tokens
        
    
    train_data = train_data.map(tokenize_fn, batched=True)
    train_data.set_format(type="torch", columns=["input_ids", "attention_mask", "labels"])
   
    train_dataloader = DataLoader(
        train_data,
        batch_size=batch_size,
        shuffle=True,
        collate_fn=None,
    )
   
    optimizer = AdamW(model.parameters(), lr=5e-5)
    
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(f'device: {device}')
    _ = model.to(device)
    os.makedirs(output_trained_weights_dir, exist_ok=True)

    checkpoint_file = os.path.join(output_trained_weights_dir, f"model_weights.pth")
    print(f'checkpoint_file: {checkpoint_file}')

    last_epoch = load_checkpoint(checkpoint_file, model, optimizer)       # The first epoch number is '1'.
    print(f'Last epoch trained: {last_epoch}, total epochs to train: {num_epochs}, saving checkpoint every {save_every_num_epochs} epochs.')
    last_epoch += 1
    
    for epoch in range(last_epoch, num_epochs+1):
        epoch_loss = 0.0
        for step, batch in enumerate(tqdm(train_dataloader, desc=f"Epoch {epoch}")):
            batch = {k: v.to(device) for k, v in batch.items()}
    
            outputs = model(
                input_ids=batch["input_ids"],
                attention_mask=batch["attention_mask"],
                labels=batch["labels"]
            )
    
            loss = outputs.loss    
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            epoch_loss += loss.item()

        if epoch % save_every_num_epochs == 0:
            save_checkpoint(model, optimizer, epoch, loss.item(), checkpoint_file)
            print(f"Saved checkpoint at: {checkpoint_file}")
        
        avg_epoch_loss = epoch_loss / len(train_dataloader)
        # print(f"Epoch {epoch} finished - Avg Loss: {avg_epoch_loss:.4f}")

def run_trainer():
    if not load_model:
        level1 = logging.getLogger("transformers").level
        # logging.getLogger("transformers").setLevel(logging.INFO)
        try:
            if num_gpu > 1:
                custom_trainer_multple_gpu()
            else:
                trainer.train(resume_from_checkpoint=True)
        except ValueError:
            if num_gpu == 1:
                trainer.train()   
        except Exception as e:
            print(f"An error occurred: {e}")
            print(f'Exception type: {type(e)}')
            print(f'Exception args: {e.args}')
            
        logging.getLogger("transformers").setLevel(level1)

In [8]:
filename = kagglehub.dataset_download("sbhatti/financial-sentiment-analysis") + '/data.csv'
print("Filename to dataset files:", filename)

df = pd.read_csv(filename, encoding="utf-8", encoding_errors="replace")
df.columns = ["text", "sentiment"]
pd.set_option('display.max_colwidth', 200)
df.head()

Filename to dataset files: /kaggle/input/financial-sentiment-analysis/data.csv


Unnamed: 0,text,sentiment
0,"The GeoSolutions technology will leverage Benefon 's GPS solutions by providing Location Based Search Technology , a Communities Platform , location relevant multimedia content and a new and power...",positive
1,"$ESI on lows, down $1.50 to $2.50 BK a real possibility",negative
2,"For the last quarter of 2010 , Componenta 's net sales doubled to EUR131m from EUR76m for the same period a year earlier , while it moved to a zero pre-tax profit from a pre-tax loss of EUR7m .",positive
3,"According to the Finnish-Russian Chamber of Commerce , all the major construction companies of Finland are operating in Russia .",neutral
4,"The Swedish buyout firm has sold its remaining 22.4 percent stake , almost eighteen months after taking the company public in Finland .",neutral


In [9]:
sentiment = ["positive", "neutral", "negative"]
temp_df = df[df.sentiment.isin(sentiment)]     # Filter
temp_df = temp_df.sample(frac=1, random_state=42).reset_index(drop=True)    # Shuffle
temp_df = df.head(dataset_size)    # Want only dataset_size rows.

X_train, X_test = train_test_split(
    temp_df, test_size=0.2, random_state=42, shuffle=True
)

X_train, X_eval = train_test_split(
    X_train, test_size=0.25, random_state=42, shuffle=True
)

X_train = X_train.reset_index(drop=True)

eval_idx = [idx for idx in df.index if idx not in list(X_train.index) + list(X_test.index)]    # Select indices NOT in X_train or X_test.
X_eval = df[df.index.isin(eval_idx)]    # Get the dataframe of eval_idx indices.
X_eval = (X_eval
          .groupby('sentiment', group_keys=False)                              # Group the rows be values under the 'sentiment' column.
          .apply(lambda x: x.sample(n=50, random_state=10, replace=True)))     # Randomly sample 50 rows from each group. If < 50, then duplicate the rows til you have 50 per group.
X_train = X_train.reset_index(drop=True)    # Renumber the index column in ascending order.

In [10]:
model_name = 'google/gemma-2-2b'

compute_dtype = getattr(torch, "float16")

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=compute_dtype,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map=device,
    torch_dtype=compute_dtype,
    quantization_config=bnb_config, 
    attn_implementation='eager',
    cache_dir=cache_dir,
)

model.config.use_cache = False
model.config.pretraining_tp = 1

tokenizer = AutoTokenizer.from_pretrained(model_name, max_seq_length=512)
tokenizer.pad_token_id = tokenizer.eos_token_id

config.json:   0%|          | 0.00/818 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/24.2k [00:00<?, ?B/s]

Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/481M [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/4.99G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/168 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/46.4k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/4.24M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.5M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/636 [00:00<?, ?B/s]

In [11]:
X_train = pd.DataFrame(X_train.apply(generate_prompt, axis=1), columns=["text"])
X_eval = pd.DataFrame(X_eval.apply(generate_prompt, axis=1), columns=["text"])

y_true = X_test.sentiment
y_true = y_true.reset_index(drop=True)    # Renumber the index column.
X_test = pd.DataFrame(X_test.apply(generate_test_prompt, axis=1), columns=["text"])

train_data = Dataset.from_pandas(X_train)
eval_data = Dataset.from_pandas(X_eval)

In [12]:
# Predict before the Fine-Tuning.
y_pred = predict(model, tokenizer)

100%|██████████| 60/60 [00:07<00:00,  7.57it/s]


In [13]:
# Evaluate before the Fine-Tuning
evaluate(y_true, y_pred)

Accuracy: 0.417
Accuracy for label `negative`: 0.250
Accuracy for label `neutral`: 0.618
Accuracy for label `positive`: 0.071


In [14]:
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj",],   # For 'google/gemma-2-2b' model.
)

training_arguments = TrainingArguments(
    output_dir=output_trained_weights_dir,    # Where checkpoints and weights will be saved.
    num_train_epochs=num_epochs,     
    per_device_train_batch_size=batch_size,   # Batch-size. The SFTTrainer() will build a dataloader internally.
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    logging_steps=25, 
    learning_rate=1e-4,
    weight_decay=0.001,
    fp16=True,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=False,
    lr_scheduler_type="cosine",
    report_to="tensorboard",
    eval_strategy="epoch",                   # For checkpoint, must match save_strategy=
    save_strategy="epoch",                   # Save checkpoints at each epoch.
    save_total_limit=3,                      # Keep only last 3 checkpoints.
    # save_steps=20,                           # Save checkpoint after every save_steps= batch.
    load_best_model_at_end=True,             # Optional: loads best checkpoint after training
    resume_from_checkpoint=True,             # This triggers checkpoint auto-resume.
)

trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=train_data,
    eval_dataset=eval_data,
    peft_config=peft_config,
)

Adding EOS to train dataset:   0%|          | 0/180 [00:00<?, ? examples/s]

Tokenizing train dataset:   0%|          | 0/180 [00:00<?, ? examples/s]

Truncating train dataset:   0%|          | 0/180 [00:00<?, ? examples/s]

Adding EOS to eval dataset:   0%|          | 0/150 [00:00<?, ? examples/s]

Tokenizing eval dataset:   0%|          | 0/150 [00:00<?, ? examples/s]

Truncating eval dataset:   0%|          | 0/150 [00:00<?, ? examples/s]

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


In [15]:
print(f'load_model: {load_model}')
print(f'save_model: {save_model}')

load_model: False
save_model: True


In [16]:
if load_model:
    model.load_state_dict(torch.load(saved_model_dir))
    model.eval()

In [17]:
%%time

print(f'num_gpu: {num_gpu}')
run_trainer()

num_gpu: 1


Epoch,Training Loss,Validation Loss
1,No log,1.47055
2,No log,1.211083
3,1.840900,1.031574
4,1.840900,1.032261
5,0.926200,1.080405
6,0.926200,1.123204
7,0.694800,1.230017
8,0.694800,1.286817
9,0.502000,1.303989
10,0.502000,1.308968


config.json:   0%|          | 0.00/818 [00:00<?, ?B/s]

CPU times: user 12min 42s, sys: 2min 14s, total: 14min 57s
Wall time: 14min 58s


In [18]:
if save_model:
    !pwd
    trainer.save_model(saved_model_dir)
    tokenizer.save_pretrained(saved_model_dir)

/kaggle/working


In [19]:
# Predict after the Fine-Tuning.
y_trained_pred = predict(model, tokenizer)

100%|██████████| 60/60 [00:08<00:00,  6.76it/s]


In [20]:
# Evaluate after the Fine-Tuning
evaluate(y_true, y_trained_pred)

Accuracy: 0.717
Accuracy for label `negative`: 0.667
Accuracy for label `neutral`: 0.735
Accuracy for label `positive`: 0.714


In [21]:
test_prediction(42)

Device set to use cuda:0
The following generation flags are not valid and may be ignored: ['cache_implementation']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
The following generation flags are not valid and may be ignored: ['cache_implementation']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


prompt: Analyze the sentiment of the news headline enclosed in square brackets, 
            determine if it is positive, neutral, or negative, and return the answer as 
            the corresponding sentiment label "positive" or "neutral" or "negative".

            [$LULU #Outlook #Q1 2013 #lost revenue in the range of $12 million to $17 million -not good] =
result: [{'generated_text': 'Analyze the sentiment of the news headline enclosed in square brackets, \n            determine if it is positive, neutral, or negative, and return the answer as \n            the corresponding sentiment label "positive" or "neutral" or "negative".\n\n            [$LULU #Outlook #Q1 2013 #lost revenue in the range of $12 million to $17 million -not good] = negative'}]
pred_answer:  negative
true_answer: negative


In [22]:
show_ram_disk()

               total        used        free      shared  buff/cache   available
Mem:            31Gi       4.2Gi       413Mi        17Mi        26Gi        26Gi
Swap:             0B          0B          0B
Filesystem      Size  Used Avail Use% Mounted on
overlay         7.9T  6.3T  1.6T  80% /
total 32
drwxr-xr-x 8 root root 4096 Jul 25 07:09 .
drwxr-xr-x 1 root root 4096 Jul 25 06:49 ..
drwxr-xr-x 4 root root 4096 Jul 25 06:52 cache
drwxr-xr-x 3 root root 4096 Jul 25 06:49 input
drwxr-xr-x 3 root root 4096 Jul 25 06:49 lib
drwxr-xr-x 2 root root 4096 Jul 25 07:09 saved_model
drwxr-xr-x 6 root root 4096 Jul 25 07:09 trained_weights
drwxr-xr-x 3 root root 4096 Jul 25 06:49 working


### 