# **Reload test for: Llama 3.1-8B (base version)**

## **2 Labels only, Scale (Logits/Probability)**

# **Note:**

* This notebook runs fine-tuning of a Llama model using the **base version** of Llama 3.1 (base versions are reportedly recommended for fine-tuning).
* It also uses **`AutoModelForSequenceClassification`**. According to GPT:
    * "The AutoModelForSequenceClassification class in Hugging Face is specifically tailored for classification tasks. When using this model, the underlying LLaMA weights are fine-tuned with a small classifier layer on top (e.g., a linear layer with softmax for multiple classes). Inputs to the model are tokenized text, and the model outputs probabilities for the predefined classes. The classifier does not "understand" instructions in the same way a generative LLM like GPT does. Instead, it learns from examples during fine-tuning, using the training data to associate input patterns with output labels."
    * Strictly speaking, the model returns logits; the probabilities used in this notebook are obtained by applying a softmax to them.
    * Instructions are nevertheless handed over by simply concatenating them: `System Prompt: 'Instruction'; Comment: 'the comment'; Reply: 'the reply'`
 
See the other notebook for fine-tuning of the **instruction based model** using **`AutoModelForCausalLM`** 

# **Llama**


### Big Picture Overview of Parameter Efficient Fine Tuning Methods like LoRA and QLoRA Fine Tuning for Sequence Classification

**The Essence of Fine-tuning**
- LLMs are pre-trained on vast amounts of data for broad language understanding.
- Fine-tuning is crucial for specializing in specific domains or tasks, involving adjustments with smaller, relevant datasets.

**Model Fine-tuning with PEFT: Exploring LoRA and QLoRA**
- Traditional fine-tuning is resource-intensive; PEFT (Parameter Efficient Fine-tuning) makes the process faster and less demanding.
- Focus on two PEFT methods: LoRA and QLoRA.

**The Power of PEFT**
- PEFT modifies only a subset of the LLM's parameters, enhancing speed and reducing memory demands, making it suitable for less powerful devices.

**LoRA: Efficiency through Adapters**
- **Low-Rank Adaptation (LoRA):** Injects small trainable adapters into the pre-trained model.
- **Equation:** For a pre-trained weight matrix $W_0 \in \mathbb{R}^{d \times k}$, LoRA parameterizes the adapted weights as $W = W_0 + BA$, where $W_0$ stays frozen and $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$ are trainable matrices of rank $r \ll \min(d, k)$.
- Adapters learn task nuances while keeping the majority of the LLM unchanged, minimizing overhead.
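The low-rank update can be made concrete with a minimal pure-Python sketch. The helper names (`matvec`, `lora_forward`, `lora_param_counts`) and the `scale` argument are illustrative assumptions, not part of this notebook or of the PEFT library; the dimensions 4096 and 16 are the ones that appear in the model printout further down.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W0, B, A, x, scale=1.0):
    """Compute (W0 + scale * B A) x without ever forming the product B A."""
    base = matvec(W0, x)               # frozen pre-trained path
    update = matvec(B, matvec(A, x))   # trainable low-rank path
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_counts(d_out, d_in, r):
    """Return (parameters of the full matrix, parameters of the adapter)."""
    return d_out * d_in, r * (d_in + d_out)


W0 = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, -0.5]]                      # shape r x k with r = 1
x = [2.0, 3.0]

# With B set to zero (the usual LoRA initialisation) the output equals W0 x.
y_init = lora_forward(W0, [[0.0], [0.0]], A, x)

# After training, B is non-zero and shifts the output.
y_trained = lora_forward(W0, [[1.0], [2.0]], A, x)

full, adapter = lora_param_counts(4096, 4096, 16)
print(y_init, y_trained, adapter / full)
```

For a 4096 x 4096 projection with rank 16, the adapter holds 131,072 parameters against 16,777,216 in the full matrix, which is under 1% of the original.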

**QLoRA: Compression and Speed**
- **Quantized LoRA (QLoRA):** Extends LoRA by quantizing the frozen base model's weights, which reduces memory use further while the adapters stay in higher precision.
- **Innovations in QLoRA:**
  1. **4-bit Quantization:** Stores the base weights in a 4-bit data type, NormalFloat (NF4), designed for normally distributed weights, drastically reducing memory usage.
  2. **Low-Rank Adapters:** Trained in 16-bit precision to capture task-specific nuances.
  3. **Double Quantization:** Also quantizes the quantization constants (from 32-bit to 8-bit), saving additional memory with little to no accuracy loss.
  4. **Paged Optimizers:** Page optimizer states between GPU and CPU memory to absorb memory spikes during training.
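To get a feeling for the savings, here is a rough back-of-the-envelope sketch. It assumes the block sizes reported for QLoRA (64 weights per quantization block, 256 constants per second-level block) and a rounded parameter count of 8e9; both are assumptions for illustration, and the estimate covers the weights only (no activations, optimizer state, or adapters).

```python
def constant_overhead_bits(block_size=64, constant_bits=32):
    """Extra bits per weight when each block stores one 32-bit constant."""
    return constant_bits / block_size


def double_quant_overhead_bits(block_size=64, second_block_size=256,
                               first_level_bits=8, second_level_bits=32):
    """Extra bits per weight when the constants are themselves quantized to 8 bit."""
    first_level = first_level_bits / block_size
    second_level = second_level_bits / (block_size * second_block_size)
    return first_level + second_level


def weight_memory_gb(num_params, bits_per_param):
    """Memory needed for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8e9  # assumption: rounded size of an 8B model

plain = constant_overhead_bits()          # 0.5 bits per weight
double = double_quant_overhead_bits()     # about 0.127 bits per weight

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)
nf4_gb = weight_memory_gb(NUM_PARAMS, 4 + double)
print(f"bf16: {bf16_gb:.1f} GB, NF4 + double quantization: {nf4_gb:.2f} GB")
```

Under these assumptions the weights shrink from roughly 16 GB to a little over 4 GB, and double quantization trims the per-weight overhead of the constants from 0.5 to about 0.127 bits.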

**Why PEFT Matters**
- **Rapid Learning:** Fewer trainable parameters make adaptation faster and cheaper.
- **Smaller Footprint:** Only the small adapter weights need to be stored per task; the base model can be shared.
- **Edge-Friendly:** Lower memory requirements make fine-tuning feasible on devices with limited resources.

**Conclusion**
- PEFT methods like LoRA and QLoRA make LLM fine-tuning more efficient: less memory, fewer trainable parameters, and small task-specific adapters.

***

### Fine-tuning for Sentiment Analysis Classification:


#### 1. Text Generation with Sentiment Label as part of text
- **Approach**: Train the model to generate text that naturally appends the sentiment label at the end.
- **Input**: "TSLA slashes model Y prices ======"
- **Output**: "TSLA slashes model Y prices ====== Bearish"
- **Use Case**: This method is useful for applications requiring continuous text output that includes embedded sentiment analysis, such as interactive chatbots or automated content creation tools.


#### 2. Sequence Classification Head
- **Approach**: Add a sequence classification head (linear layer) on top of the Llama transformer model. This setup is similar to GPT-2 and classifies the sentiment based on the last relevant token in the sequence.
    - **Token Positioning**:
        - **With pad_token_id**: The model identifies and ignores padding tokens, using the last non-padding token for classification.
        - **Without pad_token_id**: It defaults to the last token in each sequence.
        - **inputs_embeds**: If embeddings are directly passed (without input_ids), the model cannot identify padding tokens and takes the last embedding in each sequence as the input for classification.
- **Input**: Specific sentences (e.g., "TSLA slashes Model Y prices").
- **Output**: Direct sentiment classification (e.g., "Bearish").
- **Training Objective**: Minimize cross-entropy loss between the predicted and the actual sentiment labels.
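The token-positioning rule can be illustrated with a small sketch. The functions `last_token_index` and `pool_logits` are hypothetical helpers written for this example; they mimic the idea of picking the last non-padding position and are not the library's actual implementation.

```python
def last_token_index(input_ids, pad_token_id=None):
    """Index of the token whose logits are used for classification."""
    if pad_token_id is None:
        # Without a pad token the model cannot tell padding apart.
        return len(input_ids) - 1
    non_pad = [i for i, tok in enumerate(input_ids) if tok != pad_token_id]
    # Simplification: fall back to the last position if everything is padding.
    return non_pad[-1] if non_pad else len(input_ids) - 1


def pool_logits(per_token_logits, input_ids, pad_token_id=None):
    """Select the class logits at the last relevant token."""
    return per_token_logits[last_token_index(input_ids, pad_token_id)]


PAD = 0  # hypothetical pad token id
ids = [11, 12, 13, PAD, PAD]
per_token_logits = [[0.1, 0.9], [0.2, 0.8], [0.7, 0.3], [0.5, 0.5], [0.5, 0.5]]

with_pad = pool_logits(per_token_logits, ids, pad_token_id=PAD)
without_pad = pool_logits(per_token_logits, ids)
print(with_pad, without_pad)
```

This is why the pad token has to be set on both the tokenizer and the model config: otherwise the head would read the logits at a padding position.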

https://huggingface.co/docs/transformers/main/en/model_doc/llama

### PEFT Configs
* bitsandbytes config for quantization
* LoRA config for the adapters

### Going to use the Hugging Face Transformers Trainer class: Main components
* Hugging Face dataset (for train + eval)
* Data collator
* Compute metrics
* Class weights, since we use a custom trainer with a custom weighted loss
* TrainingArguments: number of epochs, learning rate, weight decay, etc.
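As an illustration of the "compute metrics" component, the following pure-Python sketch computes accuracy, balanced accuracy, and F1 for a binary task. The function name `binary_metrics` and the toy label lists are made up for this example; inside a Trainer `compute_metrics` function the predicted labels would first be obtained by taking the argmax over the logits.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, balanced accuracy and F1 (for the positive class) of a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    recall_pos = tp / (tp + fn) if tp + fn else 0.0
    recall_neg = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall_pos
    f1 = 2 * precision * recall_pos / denom if denom else 0.0

    return {
        "accuracy": (tp + tn) / len(pairs),
        "balanced_accuracy": (recall_pos + recall_neg) / 2,
        "f1": f1,
    }


# Toy labels for illustration only (1 = disagree, 0 = no_disagreement)
y_true = [0, 1, 1, 1, 0]
y_pred = [0, 0, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Balanced accuracy averages the recall of both classes, which makes it less sensitive to class imbalance than plain accuracy.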




In [1]:
# install packages
!pip install -U bitsandbytes
!pip install -U transformers
!pip install -U accelerate
!pip install -U peft
!pip install -U trl
!pip install pyarrow==18.1.0
!pip install evaluate
!pip install --upgrade wandb
!pip install adapter-transformers

Collecting transformers
  Using cached transformers-4.51.1-py3-none-any.whl.metadata (38 kB)
Using cached transformers-4.51.1-py3-none-any.whl (10.4 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.47.1
    Uninstalling transformers-4.47.1:
      Successfully uninstalled transformers-4.47.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
adapters 1.1.0 requires transformers~=4.47.1, but you have transformers 4.51.1 which is incompatible.
Successfully installed transformers-4.51.1
Collecting transformers~=4.47.1 (from adapters->adapter-transformers)
  Using cached transformers-4.47.1-py3-none-any.whl.metadata (44 kB)
Using cached transformers-4.47.1-py3-none-any.whl (10.1 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Fo

In [2]:
# import packages

import numpy as np
import pandas as pd
import os
import random
import evaluate
import functools # ??
from tqdm import tqdm
import bitsandbytes as bnb
import wandb

import torch
import torch.nn as nn
import torch.nn.functional as F

from datasets import Dataset, DatasetDict
from peft import LoraConfig, PeftConfig, prepare_model_for_kbit_training, get_peft_model

from trl import SFTTrainer
from trl import setup_chat_format

import transformers
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    AutoModel,
    AutoConfig,
    BitsAndBytesConfig,
    TrainingArguments,
    Trainer,
    DataCollatorWithPadding,
    pipeline,
    logging,
)

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, 
                             classification_report, 
                             confusion_matrix,
                            f1_score, balanced_accuracy_score)
from peft import PeftModel

In [3]:

torch.manual_seed(42)
np.random.seed(42)
random.seed(42)

In [4]:
import torch
torch.cuda.empty_cache()
torch.cuda.is_available()
#torch.cuda.device_count()

True

## **Authenticate for Hugging Face**

In [5]:
# Hugging face access

from huggingface_hub import login
with open("../../../../login/hf_key.txt", 'r') as f:
    HF_TOKEN = f.read().strip()  # strip the trailing newline from the key file

login(token = HF_TOKEN)

In [6]:
# wandb

with open("../../../../login/wandb.txt", 'r') as f:
    WB_TOKEN = f.read().strip()  # strip the trailing newline from the key file

wandb.login(key=WB_TOKEN)


[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /home/jovyan/.netrc
[34m[1mwandb[0m: Currently logged in as: [33melena-solar[0m ([33melena-solar-university-of-konstanz[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


True

## **Data**

In [7]:
# loading the data
import pandas as pd
data = pd.read_csv("../../../../data/labeled_data.csv")
data = data[["label", "body_parent", "body_child", "msg_id_parent", "msg_id_child", "subreddit", "datetime", "exact_time"]].sort_values(by = "exact_time").reset_index(drop = True)

# keep integer labels
data['target'] = data['label']

# for readability, recode labels
int_to_label = {2: "agree", 1 : "neutral", 0 : "disagree"}
data.replace({"label": int_to_label}, inplace = True)

label_dict = {0 : 'no_disagreement', 1: 'disagree'}

data

Unnamed: 0,label,body_parent,body_child,msg_id_parent,msg_id_child,subreddit,datetime,exact_time,target
0,neutral,"I live in rural Saskatchewan, Canada. We have ...",I'm in NE USA we've had 3 in two years...all e...,cnddov1,cndj2gv,climate,03/01/2015 23:18,1420327135,1
1,neutral,"I live in rural Saskatchewan, Canada. We have ...",One hundred year flood just means a one in one...,cnddov1,cndkpy7,climate,04/01/2015 00:10,1420330231,1
2,neutral,Convince her of what? That it's happening or t...,That anthropocentric climate change is actuall...,cndnlrd,cndnsxt,climate,04/01/2015 01:45,1420335952,1
3,disagree,I think this prediction is about as valid as s...,It's January. Literally no one said it would b...,cndl5x4,cndybsy,climate,04/01/2015 08:01,1420358465,0
4,disagree,"Mann hasn't *been* honest in decades, so I'm c...",There have been a dozen re-constructions of Ma...,cne462t,cne89ej,climate,04/01/2015 17:45,1420393544,0
...,...,...,...,...,...,...,...,...,...
42889,neutral,Not trying to spark an argument but a legitima...,Keeping in mind that the Palestinians killed m...,gyo197v,gyotff1,Republican,19/05/2021 12:36,1621427788,1
42890,agree,Y'all saw Guilianis hail Mary right? Get his s...,"Same I want these assholes in jail, full stop....",gynfsu4,gyp3u39,democrats,19/05/2021 13:56,1621432578,2
42891,agree,>Why don't I see ads holding Republicans accou...,"Yeah, I agree with the goal of this post but n...",gyn6nzm,gyp5vzw,democrats,19/05/2021 14:11,1621433471,2
42892,agree,How about ... no? This is strange. Community o...,"I know, it feels strange too. We wouldn't hold...",gyp71o7,gyp7en6,BlackLivesMatter,19/05/2021 14:21,1621434116,2


In [8]:
# ensure order by time --> no leakage of info

data = data.sort_values(by = "exact_time", ascending = True).reset_index(drop = True)
data

Unnamed: 0,label,body_parent,body_child,msg_id_parent,msg_id_child,subreddit,datetime,exact_time,target
0,neutral,"I live in rural Saskatchewan, Canada. We have ...",I'm in NE USA we've had 3 in two years...all e...,cnddov1,cndj2gv,climate,03/01/2015 23:18,1420327135,1
1,neutral,"I live in rural Saskatchewan, Canada. We have ...",One hundred year flood just means a one in one...,cnddov1,cndkpy7,climate,04/01/2015 00:10,1420330231,1
2,neutral,Convince her of what? That it's happening or t...,That anthropocentric climate change is actuall...,cndnlrd,cndnsxt,climate,04/01/2015 01:45,1420335952,1
3,disagree,I think this prediction is about as valid as s...,It's January. Literally no one said it would b...,cndl5x4,cndybsy,climate,04/01/2015 08:01,1420358465,0
4,disagree,"Mann hasn't *been* honest in decades, so I'm c...",There have been a dozen re-constructions of Ma...,cne462t,cne89ej,climate,04/01/2015 17:45,1420393544,0
...,...,...,...,...,...,...,...,...,...
42889,neutral,Not trying to spark an argument but a legitima...,Keeping in mind that the Palestinians killed m...,gyo197v,gyotff1,Republican,19/05/2021 12:36,1621427788,1
42890,agree,Y'all saw Guilianis hail Mary right? Get his s...,"Same I want these assholes in jail, full stop....",gynfsu4,gyp3u39,democrats,19/05/2021 13:56,1621432578,2
42891,agree,>Why don't I see ads holding Republicans accou...,"Yeah, I agree with the goal of this post but n...",gyn6nzm,gyp5vzw,democrats,19/05/2021 14:11,1621433471,2
42892,agree,How about ... no? This is strange. Community o...,"I know, it feels strange too. We wouldn't hold...",gyp71o7,gyp7en6,BlackLivesMatter,19/05/2021 14:21,1621434116,2


In [9]:
# adapt true labels: collapse "neutral" and "agree" into "no_disagreement"
no_disagreement = data['label'].isin(['neutral', 'agree'])

data['label_2'] = np.where(no_disagreement, 'no_disagreement', 'disagree')
data['target'] = np.where(no_disagreement, 0, 1)
data

Unnamed: 0,label,body_parent,body_child,msg_id_parent,msg_id_child,subreddit,datetime,exact_time,target,label_2
0,neutral,"I live in rural Saskatchewan, Canada. We have ...",I'm in NE USA we've had 3 in two years...all e...,cnddov1,cndj2gv,climate,03/01/2015 23:18,1420327135,0,no_disagreement
1,neutral,"I live in rural Saskatchewan, Canada. We have ...",One hundred year flood just means a one in one...,cnddov1,cndkpy7,climate,04/01/2015 00:10,1420330231,0,no_disagreement
2,neutral,Convince her of what? That it's happening or t...,That anthropocentric climate change is actuall...,cndnlrd,cndnsxt,climate,04/01/2015 01:45,1420335952,0,no_disagreement
3,disagree,I think this prediction is about as valid as s...,It's January. Literally no one said it would b...,cndl5x4,cndybsy,climate,04/01/2015 08:01,1420358465,1,disagree
4,disagree,"Mann hasn't *been* honest in decades, so I'm c...",There have been a dozen re-constructions of Ma...,cne462t,cne89ej,climate,04/01/2015 17:45,1420393544,1,disagree
...,...,...,...,...,...,...,...,...,...,...
42889,neutral,Not trying to spark an argument but a legitima...,Keeping in mind that the Palestinians killed m...,gyo197v,gyotff1,Republican,19/05/2021 12:36,1621427788,0,no_disagreement
42890,agree,Y'all saw Guilianis hail Mary right? Get his s...,"Same I want these assholes in jail, full stop....",gynfsu4,gyp3u39,democrats,19/05/2021 13:56,1621432578,0,no_disagreement
42891,agree,>Why don't I see ads holding Republicans accou...,"Yeah, I agree with the goal of this post but n...",gyn6nzm,gyp5vzw,democrats,19/05/2021 14:11,1621433471,0,no_disagreement
42892,agree,How about ... no? This is strange. Community o...,"I know, it feels strange too. We wouldn't hold...",gyp71o7,gyp7en6,BlackLivesMatter,19/05/2021 14:21,1621434116,0,no_disagreement


In [10]:
# make text

def create_training_data(data):
    """Keep the columns needed for training and give them shorter names."""
    #system_prompt = """You are a classification Chatbot. Given a comment and a reply, you classify whether the reply disagrees with the comment, or not. You only reply with either "disagree" or "no_disagreement" and nothing else."""
    columns = {
        "body_parent": "comment",
        "body_child": "reply",
        "label_2": "label",
        "target": "target",
    }
    return data[list(columns)].rename(columns=columns).reset_index(drop=True)

# build the training DataFrame
df = create_training_data(data)
df

Unnamed: 0,comment,reply,label,target
0,"I live in rural Saskatchewan, Canada. We have ...",I'm in NE USA we've had 3 in two years...all e...,no_disagreement,0
1,"I live in rural Saskatchewan, Canada. We have ...",One hundred year flood just means a one in one...,no_disagreement,0
2,Convince her of what? That it's happening or t...,That anthropocentric climate change is actuall...,no_disagreement,0
3,I think this prediction is about as valid as s...,It's January. Literally no one said it would b...,disagree,1
4,"Mann hasn't *been* honest in decades, so I'm c...",There have been a dozen re-constructions of Ma...,disagree,1
...,...,...,...,...
42889,Not trying to spark an argument but a legitima...,Keeping in mind that the Palestinians killed m...,no_disagreement,0
42890,Y'all saw Guilianis hail Mary right? Get his s...,"Same I want these assholes in jail, full stop....",no_disagreement,0
42891,>Why don't I see ads holding Republicans accou...,"Yeah, I agree with the goal of this post but n...",no_disagreement,0
42892,How about ... no? This is strange. Community o...,"I know, it feels strange too. We wouldn't hold...",no_disagreement,0


In [11]:
def make_prompt(row):
    """Concatenate comment and reply into a single classifier input."""
    return "Comment: " + row["comment"] + "; Reply: " + row["reply"]


df['prompt'] = df.apply(make_prompt, axis = 1)
df

Unnamed: 0,comment,reply,label,target,prompt
0,"I live in rural Saskatchewan, Canada. We have ...",I'm in NE USA we've had 3 in two years...all e...,no_disagreement,0,"Comment: I live in rural Saskatchewan, Canada...."
1,"I live in rural Saskatchewan, Canada. We have ...",One hundred year flood just means a one in one...,no_disagreement,0,"Comment: I live in rural Saskatchewan, Canada...."
2,Convince her of what? That it's happening or t...,That anthropocentric climate change is actuall...,no_disagreement,0,Comment: Convince her of what? That it's happe...
3,I think this prediction is about as valid as s...,It's January. Literally no one said it would b...,disagree,1,Comment: I think this prediction is about as v...
4,"Mann hasn't *been* honest in decades, so I'm c...",There have been a dozen re-constructions of Ma...,disagree,1,"Comment: Mann hasn't *been* honest in decades,..."
...,...,...,...,...,...
42889,Not trying to spark an argument but a legitima...,Keeping in mind that the Palestinians killed m...,no_disagreement,0,Comment: Not trying to spark an argument but a...
42890,Y'all saw Guilianis hail Mary right? Get his s...,"Same I want these assholes in jail, full stop....",no_disagreement,0,Comment: Y'all saw Guilianis hail Mary right? ...
42891,>Why don't I see ads holding Republicans accou...,"Yeah, I agree with the goal of this post but n...",no_disagreement,0,Comment: >Why don't I see ads holding Republic...
42892,How about ... no? This is strange. Community o...,"I know, it feels strange too. We wouldn't hold...",no_disagreement,0,Comment: How about ... no? This is strange. Co...


### Train/Test Split

Make train/val/test split by time order!

In [12]:
# Split the DataFrame
train_size = 0.8
eval_size = 0.1

# Determine splitting indexes (ordered by time)
train_end = int(train_size * len(df))
eval_end = train_end + int(eval_size * len(df))

# Split the data
X_train = df[:train_end]
X_eval = df[train_end:eval_end]
X_test = df[eval_end:]
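The index arithmetic of the time-ordered split can be checked in isolation. The sketch below is a stand-alone illustration; the function names are hypothetical, and the row count 42,894 is taken as the sum of the split sizes reported by the `DatasetDict` in this notebook.

```python
def chronological_split_bounds(n_rows, train_size=0.8, eval_size=0.1):
    """Return (train_end, eval_end) for data that is already sorted by time."""
    train_end = int(train_size * n_rows)
    eval_end = train_end + int(eval_size * n_rows)
    return train_end, eval_end


def chronological_split(rows, train_size=0.8, eval_size=0.1):
    """Split a time-sorted list into train / eval / test without shuffling."""
    train_end, eval_end = chronological_split_bounds(len(rows), train_size, eval_size)
    return rows[:train_end], rows[train_end:eval_end], rows[eval_end:]


N_ROWS = 42894
train_end, eval_end = chronological_split_bounds(N_ROWS)
sizes = (train_end, eval_end - train_end, N_ROWS - eval_end)
print(sizes)

# Toy example: ten timestamps in ascending order
train, val, test = chronological_split(list(range(10)))
print(train, val, test)
```

Because the rows are sorted by time before slicing, every training example precedes every validation example, and every validation example precedes every test example.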

### Convert from Pandas DataFrame to Hugging Face Dataset
* Also let's shuffle the training set.
* We put the components train,val,test into a DatasetDict so we can access them later with HF trainer.
* Later we will add a tokenized dataset

In [13]:
X_train_dataset = Dataset.from_pandas(X_train.drop('label', axis = 1))
X_eval_dataset = Dataset.from_pandas(X_eval.drop('label', axis = 1))
X_test_dataset = Dataset.from_pandas(X_test.drop('label', axis = 1))

X_test_dataset

Dataset({
    features: ['comment', 'reply', 'target', 'prompt'],
    num_rows: 4290
})

Shuffle the training data (this reportedly helps performance); the validation and test sets keep their time order.

In [14]:
X_train_dataset_shuffle = X_train_dataset.shuffle(seed = 42)

In [15]:
dataset = DatasetDict({
    'train' : X_train_dataset_shuffle,
    'val' : X_eval_dataset,
    'test' : X_test_dataset
})
dataset

DatasetDict({
    train: Dataset({
        features: ['comment', 'reply', 'target', 'prompt'],
        num_rows: 34315
    })
    val: Dataset({
        features: ['comment', 'reply', 'target', 'prompt'],
        num_rows: 4289
    })
    test: Dataset({
        features: ['comment', 'reply', 'target', 'prompt'],
        num_rows: 4290
    })
})

Check distributions

In [16]:
X_train.target.value_counts(normalize = True)

target
0    0.59933
1    0.40067
Name: proportion, dtype: float64

### Class Weights

* Since our classes are not balanced let's calculate class weights based on inverse value counts
* Convert to pytorch tensor since we will need it

In [17]:
# invert the weights
class_weights = (1/X_train.target.value_counts(normalize = True).sort_index()).to_list()

# make a tensor
class_weights = torch.tensor(class_weights)

# make them sum to one
class_weights = class_weights/class_weights.sum()
class_weights

tensor([0.4007, 0.5993])
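The weights above, and the way they would enter a weighted loss, can be reproduced with a short pure-Python sketch. The function names are illustrative assumptions; the weighted mean below normalises by the sum of the sample weights, which is how `torch.nn.CrossEntropyLoss(weight=...)` behaves with its default mean reduction.

```python
import math


def inverse_frequency_weights(proportions):
    """Inverse class frequencies, rescaled so that they sum to one."""
    inverse = [1.0 / p for p in proportions]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_cross_entropy(logits_batch, targets, weights):
    """Class-weighted cross-entropy, averaged over the sum of the sample weights."""
    numerator, denominator = 0.0, 0.0
    for logits, target in zip(logits_batch, targets):
        # Numerically stable log-sum-exp
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        nll = log_norm - logits[target]      # negative log-likelihood of the true class
        numerator += weights[target] * nll
        denominator += weights[target]
    return numerator / denominator


# Class proportions of the training set as printed above
weights = inverse_frequency_weights([0.59933, 0.40067])
print([round(w, 4) for w in weights])

# Two uninformative predictions, one per class: the loss is log(2)
loss = weighted_cross_entropy([[0.0, 0.0], [0.0, 0.0]], [0, 1], weights)
print(loss)
```

With two classes, normalising the inverse frequencies simply swaps the class proportions, so the rarer class (`disagree`) receives the larger weight.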


## **Load the Model**

Meta reportedly recommends the base version of the model for fine-tuning ([source](https://www.youtube.com/watch?v=YJNbgusTSF0)).

* Load the model with 4-bit quantization (as specified in the bitsandbytes config) and prepare it for PEFT training.

In [18]:
model_name = "meta-llama/Llama-3.1-8B" 


quantization_config = BitsAndBytesConfig(
    load_in_4bit = True, # enable 4-bit quantization
    bnb_4bit_quant_type = 'nf4', # NormalFloat4: information-theoretically optimal dtype for normally distributed weights
    bnb_4bit_use_double_quant = True, # also quantize the quantization constants (double quantization)
    bnb_4bit_compute_dtype = torch.bfloat16 # dtype used for computation after dequantization
)


model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    #config =  model_config, # Ensure this matches the number of labels you used during training
    quantization_config = quantization_config
)



`low_cpu_mem_usage` was None, now default to True since model is quantized.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


## Reimport

1. base model
2. then fine-tuned adapters
3. merge

In [21]:


adapter_path = "final_Llama_3.1_8B_saved_model_2labels_scale"

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, adapter_path)

# Load the tokenizer saved alongside the adapter
tokenizer = AutoTokenizer.from_pretrained(adapter_path, add_prefix_space=True)

# Reuse the EOS token for padding and tell the model about it
tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.use_cache = False
model.config.pretraining_tp = 1



NameError: name 'model' is not defined
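The three reimport steps (base model, adapters, merge) can be collected in one function, sketched below. This is an assumption-laden illustration rather than the notebook's method: `reload_and_merge` is a hypothetical name, the base model is loaded in bfloat16 without quantization so that the merge is exact (which needs enough memory for the full-precision weights), and the imports are placed inside the function so the sketch can be read and imported without the GPU stack installed.

```python
def reload_and_merge(base_name="meta-llama/Llama-3.1-8B",
                     adapter_path="final_Llama_3.1_8B_saved_model_2labels_scale",
                     num_labels=2):
    """Load the base model, attach the saved LoRA adapter, and merge the two."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # 1. Base model with a classification head for `num_labels` classes
    base = AutoModelForSequenceClassification.from_pretrained(
        base_name, num_labels=num_labels, torch_dtype=torch.bfloat16
    )

    # 2. Fine-tuned adapter on top of the base model
    model = PeftModel.from_pretrained(base, adapter_path)

    # 3. Fold the low-rank update into the base weights and drop the PEFT wrapper
    merged = model.merge_and_unload()

    # Tokenizer saved with the adapter; reuse EOS as the padding token
    tokenizer = AutoTokenizer.from_pretrained(adapter_path)
    tokenizer.pad_token = tokenizer.eos_token
    merged.config.pad_token_id = tokenizer.pad_token_id
    return merged, tokenizer
```

Merging is optional for inference: the un-merged `PeftModel` gives the same predictions, while the merged model no longer depends on `peft` at prediction time.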

In [40]:
# move model to gpu
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device

model.to(device) 

PeftModelForSequenceClassification(
  (base_model): LoraModel(
    (model): LlamaForSequenceClassification(
      (model): LlamaModel(
        (embed_tokens): Embedding(128256, 4096)
        (layers): ModuleList(
          (0-31): 32 x LlamaDecoderLayer(
            (self_attn): LlamaSdpaAttention(
              (q_proj): lora.Linear4bit(
                (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=16, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=16, out_features=4096, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
                (lora_magnitude_vector): ModuleDict()
           

## Tokenize Dataset

In [53]:
MAX_LEN = 512
col_to_delete = ['comment', 'reply', 'prompt']

def llama_preprocessing_function(examples):
    return tokenizer(examples['prompt'], truncation=True, max_length=MAX_LEN)

tokenized_datasets = dataset.map(llama_preprocessing_function, batched=True, remove_columns=col_to_delete)
tokenized_datasets = tokenized_datasets.rename_column("target", "label")
tokenized_datasets.set_format("torch")

Map:   0%|          | 0/34315 [00:00<?, ? examples/s]

Map:   0%|          | 0/4289 [00:00<?, ? examples/s]

Map:   0%|          | 0/4290 [00:00<?, ? examples/s]

In [54]:
tokenized_datasets

DatasetDict({
    train: Dataset({
        features: ['label', 'input_ids', 'attention_mask'],
        num_rows: 34315
    })
    val: Dataset({
        features: ['label', 'input_ids', 'attention_mask'],
        num_rows: 4289
    })
    test: Dataset({
        features: ['label', 'input_ids', 'attention_mask'],
        num_rows: 4290
    })
})

## **Evaluation**

In [25]:
X_test

Unnamed: 0,comment,reply,label,target,prompt
38604,It's so nice having a FLOTUS who's facial expr...,Melanoma's squinty cat face always looked to m...,no_disagreement,0,Comment: It's so nice having a FLOTUS who's fa...
38605,Because Mitch McConnell indicated he's voting ...,I think it's worth it because the more we air ...,disagree,1,Comment: Because Mitch McConnell indicated he'...
38606,How about some stimulus checks and a decent st...,"You get this was an executive action, not legi...",disagree,1,Comment: How about some stimulus checks and a ...
38607,Satire feels appropriate. I'd like one dose of...,Are you saying they didn't know or understand ...,disagree,1,Comment: Satire feels appropriate. I'd like on...
38608,I actually didn't want to upload these particu...,To be fair they are just reporting what Brexit...,no_disagreement,0,Comment: I actually didn't want to upload thes...
...,...,...,...,...,...
42889,Not trying to spark an argument but a legitima...,Keeping in mind that the Palestinians killed m...,no_disagreement,0,Comment: Not trying to spark an argument but a...
42890,Y'all saw Guilianis hail Mary right? Get his s...,"Same I want these assholes in jail, full stop....",no_disagreement,0,Comment: Y'all saw Guilianis hail Mary right? ...
42891,>Why don't I see ads holding Republicans accou...,"Yeah, I agree with the goal of this post but n...",no_disagreement,0,Comment: >Why don't I see ads holding Republic...
42892,How about ... no? This is strange. Community o...,"I know, it feels strange too. We wouldn't hold...",no_disagreement,0,Comment: How about ... no? This is strange. Co...


In [41]:
def make_predictions(model, df_test):
    """Classify every prompt in df_test and return a copy with prediction columns."""
    model.eval()
    # Work on a copy so that a slice of the original DataFrame is not modified in place
    df_test = df_test.copy()

    # Convert prompts to a list
    sentences = df_test.prompt.tolist()

    # Batch size; adjust it to the available memory
    batch_size = 32
    device = next(model.parameters()).device

    # Collect the logits of every batch
    all_outputs = []

    # Process the prompts in batches
    for i in tqdm(range(0, len(sentences), batch_size)):
        batch_sentences = sentences[i:i + batch_size]

        # Tokenize the batch (pad to the longest prompt in the batch)
        inputs = tokenizer(batch_sentences, return_tensors="pt", padding=True, truncation=True, max_length=512)

        # Move tensors to the device the model is on
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Perform inference without gradient tracking and store the logits
        with torch.no_grad():
            outputs = model(**inputs)
            all_outputs.append(outputs['logits'])

    final_outputs = torch.cat(all_outputs, dim=0)
    probabilities = F.softmax(final_outputs, dim=1)
    print(probabilities)
    predicted_labels = probabilities.argmax(dim=1)
    certainty_scores = probabilities.max(dim=1).values

    df_test['predictions_label_ft'] = predicted_labels.cpu().numpy()
    df_test['predictions_score_ft'] = certainty_scores.cpu().numpy()
    df_test['predictions_ft'] = df_test['predictions_label_ft'].map(label_dict)
    return df_test


X_test = make_predictions(model, X_test)
X_test

100%|██████████| 135/135 [00:39<00:00,  3.39it/s]

tensor([[0.9878, 0.0123],
        [0.6069, 0.3931],
        [0.3408, 0.6592],
        ...,
        [0.9341, 0.0657],
        [0.9878, 0.0120],
        [0.9521, 0.0478]], device='cuda:0', dtype=torch.float16)

Unnamed: 0,comment,reply,label,target,prompt,predictions_label_ft,predictions_score_ft,predictions_ft
38604,It's so nice having a FLOTUS who's facial expr...,Melanoma's squinty cat face always looked to m...,no_disagreement,0,Comment: It's so nice having a FLOTUS who's fa...,0,0.987793,no_disagreement
38605,Because Mitch McConnell indicated he's voting ...,I think it's worth it because the more we air ...,disagree,1,Comment: Because Mitch McConnell indicated he'...,0,0.606934,no_disagreement
38606,How about some stimulus checks and a decent st...,"You get this was an executive action, not legi...",disagree,1,Comment: How about some stimulus checks and a ...,1,0.659180,disagree
38607,Satire feels appropriate. I'd like one dose of...,Are you saying they didn't know or understand ...,disagree,1,Comment: Satire feels appropriate. I'd like on...,0,0.505859,no_disagreement
38608,I actually didn't want to upload these particu...,To be fair they are just reporting what Brexit...,no_disagreement,0,Comment: I actually didn't want to upload thes...,0,0.607422,no_disagreement
...,...,...,...,...,...,...,...,...
42889,Not trying to spark an argument but a legitima...,Keeping in mind that the Palestinians killed m...,no_disagreement,0,Comment: Not trying to spark an argument but a...,1,0.807129,disagree
42890,Y'all saw Guilianis hail Mary right? Get his s...,"Same I want these assholes in jail, full stop....",no_disagreement,0,Comment: Y'all saw Guilianis hail Mary right? ...,0,0.979492,no_disagreement
42891,>Why don't I see ads holding Republicans accou...,"Yeah, I agree with the goal of this post but n...",no_disagreement,0,Comment: >Why don't I see ads holding Republic...,0,0.934082,no_disagreement
42892,How about ... no? This is strange. Community o...,"I know, it feels strange too. We wouldn't hold...",no_disagreement,0,Comment: How about ... no? This is strange. Co...,0,0.987793,no_disagreement
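How a pair of logits turns into the label and certainty score shown in the table can be traced with a minimal pure-Python sketch. The logits used here are invented for illustration, and `softmax` and `predict` are hypothetical helpers that mirror the softmax / argmax / max steps of the prediction function.

```python
import math

label_dict = {0: "no_disagreement", 1: "disagree"}


def softmax(logits):
    """Convert logits into probabilities that sum to one."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return (label name, certainty score) for one example."""
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return label_dict[label_id], probs[label_id]


# Invented logits for one comment/reply pair
name, score = predict([2.0, -1.0])
print(name, round(score, 4))
```

With two classes the certainty score is the larger of two probabilities, so it always lies between 0.5 and 1; values close to 0.5 mark predictions the model is unsure about.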


In [54]:
#X_test.to_csv("output/Llama_3.1_8B_ft_X_test_2labels_scale.csv", index = False)

In [40]:
#X_test = pd.read_csv("output/Llama_3.1_8B_ft_X_test_2labels_scale.csv")

Unnamed: 0,system_prompt,comment,reply,label,target,prompt,predictions_label_initial,predictions_score_initial,predictions_initial,predictions_label_ft,predictions_score_ft,predictions_ft
0,You are a classification Chatbot. Given a comm...,It's so nice having a FLOTUS who's facial expr...,Melanoma's squinty cat face always looked to m...,no_disagreement,0,System Prompt: You are a classification Chatbo...,0,0.960898,no_disagreement,0,0.983496,no_disagreement
1,You are a classification Chatbot. Given a comm...,Because Mitch McConnell indicated he's voting ...,I think it's worth it because the more we air ...,disagree,1,System Prompt: You are a classification Chatbo...,0,0.942642,no_disagreement,0,0.507598,no_disagreement
2,You are a classification Chatbot. Given a comm...,How about some stimulus checks and a decent st...,"You get this was an executive action, not legi...",disagree,1,System Prompt: You are a classification Chatbo...,0,0.960131,no_disagreement,1,0.785662,disagree
3,You are a classification Chatbot. Given a comm...,Satire feels appropriate. I'd like one dose of...,Are you saying they didn't know or understand ...,disagree,1,System Prompt: You are a classification Chatbo...,0,0.967641,no_disagreement,1,0.769629,disagree
4,You are a classification Chatbot. Given a comm...,I actually didn't want to upload these particu...,To be fair they are just reporting what Brexit...,no_disagreement,0,System Prompt: You are a classification Chatbo...,1,0.828229,disagree,0,0.824243,no_disagreement
...,...,...,...,...,...,...,...,...,...,...,...,...
4285,You are a classification Chatbot. Given a comm...,Not trying to spark an argument but a legitima...,Keeping in mind that the Palestinians killed m...,no_disagreement,0,System Prompt: You are a classification Chatbo...,1,0.951807,disagree,1,0.692977,disagree
4286,You are a classification Chatbot. Given a comm...,Y'all saw Guilianis hail Mary right? Get his s...,"Same I want these assholes in jail, full stop....",no_disagreement,0,System Prompt: You are a classification Chatbo...,0,0.617278,no_disagreement,0,0.989079,no_disagreement
4287,You are a classification Chatbot. Given a comm...,>Why don't I see ads holding Republicans accou...,"Yeah, I agree with the goal of this post but n...",no_disagreement,0,System Prompt: You are a classification Chatbo...,0,0.951116,no_disagreement,0,0.887011,no_disagreement
4288,You are a classification Chatbot. Given a comm...,How about ... no? This is strange. Community o...,"I know, it feels strange too. We wouldn't hold...",no_disagreement,0,System Prompt: You are a classification Chatbo...,0,0.984794,no_disagreement,0,0.993187,no_disagreement
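
The `Unnamed: 0` column in the reloaded frame above is the usual sign of a CSV that was written with its index and read back without `index_col`. Whether that happened here is an assumption, since the commented-out save cell passes `index = False`. A minimal sketch of the effect, using a toy frame with hypothetical values rather than the real `X_test`:

```python
import io

import pandas as pd

# Toy frame standing in for X_test (hypothetical values).
df = pd.DataFrame({"label": ["disagree", "no_disagreement"], "target": [1, 0]})

# Writing with the default index=True stores the row index as an unnamed first column.
buffer = io.StringIO()
df.to_csv(buffer)

# Reading back without index_col turns that column into "Unnamed: 0".
buffer.seek(0)
df_default = pd.read_csv(buffer)

# Passing index_col=0 restores it as the index instead.
buffer.seek(0)
df_indexed = pd.read_csv(buffer, index_col=0)

print(list(df_default.columns))  # ['Unnamed: 0', 'label', 'target']
print(list(df_indexed.columns))  # ['label', 'target']
```

Either writing with `index=False` or reading with `index_col=0` keeps the extra column out of the frame.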


In [35]:
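from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             classification_report, confusion_matrix)

def get_performance_metrics(df_test, pred_col):
    """Print confusion matrix, classification report and accuracy scores.

    Compares the true labels in df_test["label"] with the predictions in df_test[pred_col].
    """
    y_test = df_test["label"]
    y_pred = df_test[pred_col]

    # Rows are true labels, columns are predicted labels (sorted label order)
    print("Confusion Matrix:")
    print(confusion_matrix(y_test, y_pred))

    print("\nClassification Report:")
    print(classification_report(y_test, y_pred))

    print("Balanced Accuracy Score:", balanced_accuracy_score(y_test, y_pred))
    print("Accuracy Score:", accuracy_score(y_test, y_pred))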

In [42]:
get_performance_metrics(X_test, 'predictions_ft')

Confusion Matrix:
[[1227  397]
 [ 410 2256]]

Classification Report:
                 precision    recall  f1-score   support

       disagree       0.75      0.76      0.75      1624
no_disagreement       0.85      0.85      0.85      2666

       accuracy                           0.81      4290
      macro avg       0.80      0.80      0.80      4290
   weighted avg       0.81      0.81      0.81      4290

Balanced Accuracy Score: 0.8008767124047022
Accuracy Score: 0.8118881118881119
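
Both scores above follow directly from the confusion matrix. As a minimal sketch, they can be recomputed by hand. The variable names are illustrative, and the row order `disagree`, `no_disagreement` is assumed from the sorted label order shown in the classification report.

```python
# Confusion matrix copied from the output above.
# Rows: true labels, columns: predicted labels.
# Assumed order: [disagree, no_disagreement].
cm = [[1227, 397],
      [410, 2256]]

n_classes = len(cm)
total = sum(sum(row) for row in cm)

# Accuracy: correctly classified samples (the diagonal) over all samples.
correct = sum(cm[i][i] for i in range(n_classes))
accuracy = correct / total

# Recall per class: diagonal entry divided by the row sum (the class support).
recall_per_class = [cm[i][i] / sum(cm[i]) for i in range(n_classes)]

# Balanced accuracy: unweighted mean of the per-class recalls.
balanced_accuracy = sum(recall_per_class) / n_classes

print("Recall per class:", [round(r, 2) for r in recall_per_class])
print("Accuracy:", accuracy)
print("Balanced accuracy:", balanced_accuracy)
```

Because balanced accuracy weights both classes equally, it sits slightly below plain accuracy here: the smaller `disagree` class has lower recall (about 0.76) than `no_disagreement` (about 0.85).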


In [33]:
# Compare the model config
print(model.config)


LlamaConfig {
  "_attn_implementation_autoset": true,
  "_name_or_path": "meta-llama/Llama-3.1-8B",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 131072,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pad_token_id": 128001,
  "pretraining_pt": 1,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_storage": "uint8",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold