https://www.datacamp.com/tutorial/fine-tuning-llama-3-1

Source code from:
https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct

In [1]:
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


{'role': 'assistant', 'content': "Arrr, ye be wantin' to know who I be?  Well, matey, I be Captain Chat, a swashbucklin' pirate chatbot, sailin' the seven seas o' knowledge and plunderin' the riches o' information fer ye landlubbers! Me treasure be a vast hoard o' wisdom, and me trusty cutlass be me wit, ready to slice through yer questions and serve ye the answers ye seek! So hoist the sails and set course fer adventure, me hearty!"}


In [2]:
#!pip install wandb --upgrade

In [3]:
import wandb

#from kaggle_secrets import UserSecretsClient
#user_secrets = UserSecretsClient()

#wb_token = user_secrets.get_secret("wandb")

#wandb.login(key=wb_token)
wandb.login()
run = wandb.init(
    project='Fine-tune llama-3.1-8b-it on Sentiment Analysis Dataset', 
    job_type="training", 
    anonymous="allow"
)

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33msdfsdfrr[0m to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


In [4]:
import numpy as np
import pandas as pd
import os
from tqdm import tqdm
import bitsandbytes as bnb
import torch
import torch.nn as nn
import transformers
from datasets import Dataset
from peft import LoraConfig, PeftConfig
from trl import SFTTrainer
from trl import setup_chat_format
from transformers import (AutoModelForCausalLM, 
                          AutoTokenizer, 
                          BitsAndBytesConfig, 
                          TrainingArguments, 
                          pipeline, 
                          logging)
from sklearn.metrics import (accuracy_score, 
                             classification_report, 
                             confusion_matrix)
from sklearn.model_selection import train_test_split


Download `Combined Data.csv.zip` from:

https://www.kaggle.com/datasets/suchintikasarkar/sentiment-analysis-for-mental-health


In [5]:
import pandas as pd
#df = pd.read_csv("/kaggle/input/sentiment-analysis-for-mental-health/Combined Data.csv",index_col = "Unnamed: 0")
df = pd.read_csv("Combined Data.csv",index_col = "Unnamed: 0")
df.loc[:,'status'] = df.loc[:,'status'].str.replace('Bi-Polar','Bipolar')
df = df[(df.status != "Personality disorder") & (df.status != "Stress") & (df.status != "Suicidal")]
df.head()

Unnamed: 0,statement,status
0,oh my gosh,Anxiety
1,"trouble sleeping, confused mind, restless hear...",Anxiety
2,"All wrong, back off dear, forward doubt. Stay ...",Anxiety
3,I've shifted my focus to something else but I'...,Anxiety
4,"I'm restless and restless, it's been a month n...",Anxiety


In [6]:
# Shuffle the DataFrame and select only 3000 rows
df = df.sample(frac=1, random_state=85).reset_index(drop=True).head(3000)

# Split the DataFrame
train_size = 0.8
eval_size = 0.1

# Calculate sizes
train_end = int(train_size * len(df))
eval_end = train_end + int(eval_size * len(df))

# Split the data
X_train = df[:train_end]
X_eval = df[train_end:eval_end]
X_test = df[eval_end:]

# Define the prompt generation functions
def generate_prompt(data_point):
    return f"""
            Classify the text into Normal, Depression, Anxiety, Bipolar, and return the answer as the corresponding mental health disorder label.
text: {data_point["statement"]}
label: {data_point["status"]}""".strip()

def generate_test_prompt(data_point):
    return f"""
            Classify the text into Normal, Depression, Anxiety, Bipolar, and return the answer as the corresponding mental health disorder label.
text: {data_point["statement"]}
label: """.strip()

# Generate prompts for training and evaluation data.
# assign() returns a new DataFrame, so nothing is written into a slice of df
# (this avoids pandas' SettingWithCopyWarning).
X_train = X_train.assign(text=X_train.apply(generate_prompt, axis=1))
X_eval = X_eval.assign(text=X_eval.apply(generate_prompt, axis=1))

# Generate test prompts and extract true labels
y_true = X_test.loc[:,'status']
X_test = pd.DataFrame(X_test.apply(generate_test_prompt, axis=1), columns=["text"])


In [7]:
X_train.status.value_counts()

status
Normal        1028
Depression     938
Anxiety        258
Bipolar        176
Name: count, dtype: int64
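
The counts above are imbalanced: Normal and Depression dominate, while Bipolar has only 176 training rows. The positional slicing used in the split cell does not guarantee the same class mix in the train, eval, and test sets. A minimal sketch of a stratified 80/10/10 split with `train_test_split` (imported earlier but not used) could look like this. The toy DataFrame and the helper name `stratified_split` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def stratified_split(df, label_col="status", seed=85):
    """Hypothetical helper: 80/10/10 split that preserves label proportions."""
    train, rest = train_test_split(
        df, test_size=0.2, stratify=df[label_col], random_state=seed
    )
    eval_, test = train_test_split(
        rest, test_size=0.5, stratify=rest[label_col], random_state=seed
    )
    return train, eval_, test

# Toy data standing in for the real dataset (assumption for illustration only).
toy = pd.DataFrame({
    "statement": [f"text {i}" for i in range(100)],
    "status": ["Normal"] * 40 + ["Depression"] * 40 + ["Anxiety"] * 10 + ["Bipolar"] * 10,
})
train_df, eval_df, test_df = stratified_split(toy)
print(train_df["status"].value_counts())
```

With the real data, the same helper would be called on `df` after the shuffle-and-sample step.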

In [8]:
train_data = Dataset.from_pandas(X_train[["text"]])
eval_data = Dataset.from_pandas(X_eval[["text"]])

In [9]:
train_data['text'][3]

'Classify the text into Normal, Depression, Anxiety, Bipolar, and return the answer as the corresponding mental health disorder label.\ntext: I am so sad. Everything in my work life is going fine, but my personal life is a wreck. No one ever takes me seriously because I am the funny friend. I do not want to talk to anyone anymore. I just want to die sometimes. Please help me. I have never had this feeling in my entire life. Why am I so sad\nlabel: Depression'
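
Each training prompt ends with `label:` followed by the class name, so a prediction can be recovered from generated text by looking at whatever follows the last `label:` marker. The sketch below isolates that parsing step as a standalone function. The name `extract_label` and the sample strings are assumptions made for illustration.

```python
def extract_label(generated_text, categories):
    """Return the first category found after the last 'label:' marker, else 'none'."""
    answer = generated_text.split("label:")[-1].strip().lower()
    for category in categories:
        if category.lower() in answer:
            return category
    return "none"

categories = ["Normal", "Depression", "Anxiety", "Bipolar"]
# Hypothetical generated text: the prompt followed by the model's answer.
sample = "Classify the text ...\ntext: I feel fine today\nlabel: Normal"
print(extract_label(sample, categories))
print(extract_label("text: hello\nlabel: unsure", categories))
```

Keeping the parsing separate from generation makes it easy to check the fallback case, where no known label appears and the function returns "none".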

In [10]:
#base_model_name = "/kaggle/input/llama-3.1/transformers/8b-instruct/1"
base_model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",
)

model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    device_map="auto",
    torch_dtype="float16",
    quantization_config=bnb_config, 
)

model.config.use_cache = False
model.config.pretraining_tp = 1

tokenizer = AutoTokenizer.from_pretrained(base_model_name)

tokenizer.pad_token_id = tokenizer.eos_token_id


In [11]:
def predict(test, model, tokenizer):
    y_pred = []
    categories = ["Normal", "Depression", "Anxiety", "Bipolar"]

    # Build the pipeline once; re-creating it for every row is slow.
    pipe = pipeline(task="text-generation",
                    model=model,
                    tokenizer=tokenizer,
                    max_new_tokens=2,
                    temperature=0.1)

    for i in tqdm(range(len(test))):
        prompt = test.iloc[i]["text"]
        result = pipe(prompt)
        answer = result[0]['generated_text'].split("label:")[-1].strip()

        # Determine the predicted category
        for category in categories:
            if category.lower() in answer.lower():
                y_pred.append(category)
                break
        else:
            # No known label appeared in the generated text
            y_pred.append("none")

    return y_pred

y_pred = predict(X_test, model, tokenizer)

In [12]:
def evaluate(y_true, y_pred):
    labels = ["Normal", "Depression", "Anxiety", "Bipolar"]
    mapping = {label: idx for idx, label in enumerate(labels)}
    
    def map_func(x):
        return mapping.get(x, -1)  # -1 marks unknown labels, e.g. the "none" fallback from predict()
    
    y_true_mapped = np.vectorize(map_func)(y_true)
    y_pred_mapped = np.vectorize(map_func)(y_pred)
    
    # Calculate accuracy
    accuracy = accuracy_score(y_true=y_true_mapped, y_pred=y_pred_mapped)
    print(f'Accuracy: {accuracy:.3f}')
    
    # Generate accuracy report
    unique_labels = set(y_true_mapped)  # Get unique labels
    
    for label in unique_labels:
        label_indices = [i for i in range(len(y_true_mapped)) if y_true_mapped[i] == label]
        label_y_true = [y_true_mapped[i] for i in label_indices]
        label_y_pred = [y_pred_mapped[i] for i in label_indices]
        label_accuracy = accuracy_score(label_y_true, label_y_pred)
        print(f'Accuracy for label {labels[label]}: {label_accuracy:.3f}')
        
    # Generate classification report
    class_report = classification_report(y_true=y_true_mapped, y_pred=y_pred_mapped, target_names=labels, labels=list(range(len(labels))))
    print('\nClassification Report:')
    print(class_report)
    
    # Generate confusion matrix
    conf_matrix = confusion_matrix(y_true=y_true_mapped, y_pred=y_pred_mapped, labels=list(range(len(labels))))
    print('\nConfusion Matrix:')
    print(conf_matrix)

evaluate(y_true, y_pred)

Accuracy: 0.787
Accuracy for label Normal: 0.748
Accuracy for label Depression: 0.930
Accuracy for label Anxiety: 0.519
Accuracy for label Bipolar: 0.533

Classification Report:
              precision    recall  f1-score   support

      Normal       0.91      0.75      0.82       143
  Depression       0.70      0.93      0.80       115
     Anxiety       0.67      0.52      0.58        27
     Bipolar       0.89      0.53      0.67        15

    accuracy                           0.79       300
   macro avg       0.79      0.68      0.72       300
weighted avg       0.81      0.79      0.78       300


Confusion Matrix:
[[107  32   4   0]
 [  4 107   3   1]
 [  4   9  14   0]
 [  2   5   0   8]]
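
The per-label "accuracy" printed above equals the per-class recall, and both it and the overall accuracy can be read straight off the confusion matrix. As a quick check, the sketch below recomputes them from the matrix shown above, where rows are true labels and columns are predicted labels.

```python
import numpy as np

labels = ["Normal", "Depression", "Anxiety", "Bipolar"]
# Baseline confusion matrix copied from the output above.
conf_matrix = np.array([
    [107, 32, 4, 0],
    [4, 107, 3, 1],
    [4, 9, 14, 0],
    [2, 5, 0, 8],
])

# Overall accuracy: correct predictions (the diagonal) over all samples.
accuracy = np.trace(conf_matrix) / conf_matrix.sum()
# Per-class recall: diagonal entry over the row total (the class support).
recall = np.diag(conf_matrix) / conf_matrix.sum(axis=1)

print(f"Accuracy: {accuracy:.3f}")
for name, value in zip(labels, recall):
    print(f"Recall for {name}: {value:.3f}")
```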


In [13]:
import bitsandbytes as bnb

def find_all_linear_names(model):
    cls = bnb.nn.Linear4bit
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, cls):
            names = name.split('.')
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])
    if 'lm_head' in lora_module_names:  # needed for 16 bit
        lora_module_names.remove('lm_head')
    return list(lora_module_names)
modules = find_all_linear_names(model)
modules

['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'o_proj', 'down_proj', 'q_proj']

In [18]:
output_dir="llama-3.1-fine-tuned-model"

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=modules,
)

training_arguments = TrainingArguments(
    output_dir=output_dir,                    # directory to save and repository id
    num_train_epochs=1,                       # number of training epochs
    per_device_train_batch_size=1,            # batch size per device during training
    gradient_accumulation_steps=8,            # micro-batches accumulated before each optimizer update
    gradient_checkpointing=True,              # use gradient checkpointing to save memory
    optim="paged_adamw_32bit",
    logging_steps=1,                         
    learning_rate=2e-4,                       # learning rate, based on QLoRA paper
    weight_decay=0.001,
    fp16=True,
    bf16=False,
    max_grad_norm=0.3,                        # max gradient norm based on QLoRA paper
    max_steps=-1,
    warmup_ratio=0.03,                        # warmup ratio based on QLoRA paper
    group_by_length=False,
    lr_scheduler_type="cosine",               # use cosine learning rate scheduler
    report_to="wandb",                        # report metrics to W&B
    eval_strategy="steps",                    # evaluate at a fixed step interval
    eval_steps=0.2                            # a float below 1 is a fraction of the total training steps
)

trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=train_data,
    eval_dataset=eval_data,
    peft_config=peft_config,
    #dataset_text_field="text",
    tokenizer=tokenizer,
    #max_seq_length=512,
    #packing=False,
    #dataset_kwargs={
    #"add_special_tokens": False,
    #"append_concat_token": False,
    #}
)

No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
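
To get a feel for how many weights this LoRA setup trains, the sketch below counts adapter parameters for `r=64` across the seven target modules listed earlier. The layer shapes (hidden size 4096, MLP size 14336, key/value projection size 1024, 32 decoder layers) are assumptions taken from the published Llama 3.1 8B configuration, not values printed by this notebook.

```python
# Assumed Llama 3.1 8B shapes (from the published model config, not from this notebook).
hidden, intermediate, kv_dim, num_layers = 4096, 14336, 1024, 32
r, lora_alpha = 64, 16  # values from the LoraConfig above

# (in_features, out_features) of each targeted projection in one decoder layer.
shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}

# Each adapter adds A (r x in) and B (out x r), i.e. r * (in + out) weights.
per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
trainable = per_layer * num_layers
scaling = lora_alpha / r  # factor applied to the low-rank update

print(f"Trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the adapters hold roughly 168 million weights, a small fraction of the 8 billion frozen base parameters.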


In [19]:
trainer.train()



Step,Training Loss,Validation Loss
60,1.8504,2.154059
120,2.1548,2.131257
180,1.7358,2.115781
240,2.0378,2.10787
300,2.2301,2.104931



Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct/resolve/main/config.json.
Access to model meta-llama/Llama-3.1-8B-Instruct is restricted and you are not in the authorized list. Visit https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct to ask for access. - silently ignoring the lookup for the file config.json in meta-llama/Meta-Llama-3.1-8B-Instruct.


TrainOutput(global_step=300, training_loss=2.182258568207423, metrics={'train_runtime': 1365.8223, 'train_samples_per_second': 1.757, 'train_steps_per_second': 0.22, 'total_flos': 1.6079931622514688e+16, 'train_loss': 2.182258568207423})
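
The `global_step=300` above, and the evaluation rows logged at steps 60 through 300, follow from the batch settings. A small sketch of the arithmetic, assuming a single training process and 2400 training rows (80% of the 3000 sampled rows):

```python
import math

num_train_examples = 2400   # 80% of the 3000 sampled rows
per_device_batch = 1        # per_device_train_batch_size
grad_accum = 8              # gradient_accumulation_steps
num_epochs = 1
eval_fraction = 0.2         # eval_steps given as a fraction of total steps

# Examples consumed per optimizer update (assumes one training process).
effective_batch = per_device_batch * grad_accum
total_steps = math.ceil(num_train_examples / effective_batch) * num_epochs
# Convert the fractional interval to a whole number of steps.
eval_interval = max(1, round(total_steps * eval_fraction))
eval_points = list(range(eval_interval, total_steps + 1, eval_interval))

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps: {total_steps}")
print(f"evaluation at steps: {eval_points}")
```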

In [20]:
wandb.finish()
model.config.use_cache = True

Run history:
eval/loss,█▅▃▁▁
eval/mean_token_accuracy,▁▄▇██
eval/runtime,▂▃▁▆█
eval/samples_per_second,▇▆█▃▁
eval/steps_per_second,████▁
train/epoch,▁▁▂▂▂▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▆▇▇█████
train/global_step,▁▁▂▂▂▂▂▂▂▂▃▃▃▃▃▃▃▄▄▄▄▄▄▄▄▅▅▅▅▆▆▆▆▆▇▇▇▇██
train/grad_norm,▃█▂▄▄▃▄▃▅▅▂▂▃▃▂▃▃▂▃▄▂▄▃▃▃▂▂▂▃▄▁▁▃▁▄▁▂▂▃▃
train/learning_rate,▃█████████▇▇▇▇▇▆▆▅▅▅▄▄▄▃▃▃▃▃▂▂▂▂▂▂▂▁▁▁▁▁
train/loss,██▆▄▄▃▄▃▄▆▅▄▅▆▁▅▅▃▄▄▄▃▄▃▅▄▄▆▄▄▄▃▃▄▅▅▄▃▂▂

Run summary:
eval/loss,2.10493
eval/mean_token_accuracy,0.55353
eval/runtime,50.6463
eval/samples_per_second,5.923
eval/steps_per_second,0.75
total_flos,1.6079931622514688e+16
train/epoch,1.0
train/global_step,300.0
train/grad_norm,0.18428
train/learning_rate,0.0


In [21]:
# Save trained model and tokenizer
trainer.save_model(output_dir)
tokenizer.save_pretrained(output_dir)


Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct/resolve/main/config.json.
Access to model meta-llama/Llama-3.1-8B-Instruct is restricted and you are not in the authorized list. Visit https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct to ask for access. - silently ignoring the lookup for the file config.json in meta-llama/Meta-Llama-3.1-8B-Instruct.


('llama-3.1-fine-tuned-model/tokenizer_config.json',
 'llama-3.1-fine-tuned-model/special_tokens_map.json',
 'llama-3.1-fine-tuned-model/tokenizer.json')

In [22]:
y_pred = predict(X_test, model, tokenizer)
evaluate(y_true, y_pred)

Accuracy: 0.923
Accuracy for label Normal: 0.972
Accuracy for label Depression: 0.930
Accuracy for label Anxiety: 0.630
Accuracy for label Bipolar: 0.933

Classification Report:
              precision    recall  f1-score   support

      Normal       0.91      0.97      0.94       143
  Depression       0.95      0.93      0.94       115
     Anxiety       0.81      0.63      0.71        27
     Bipolar       1.00      0.93      0.97        15

    accuracy                           0.92       300
   macro avg       0.92      0.87      0.89       300
weighted avg       0.92      0.92      0.92       300


Confusion Matrix:
[[139   3   1   0]
 [  5 107   3   0]
 [  7   3  17   0]
 [  1   0   0  14]]
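
The precision, recall, and F1 columns of the report can be re-derived from the fine-tuned model's confusion matrix, which helps when checking where the remaining errors sit (mostly Anxiety rows predicted as Normal). A minimal sketch using the matrix printed above:

```python
import numpy as np

labels = ["Normal", "Depression", "Anxiety", "Bipolar"]
# Fine-tuned confusion matrix copied from the output above (rows = true, columns = predicted).
conf_matrix = np.array([
    [139, 3, 1, 0],
    [5, 107, 3, 0],
    [7, 3, 17, 0],
    [1, 0, 0, 14],
])

tp = np.diag(conf_matrix)
precision = tp / conf_matrix.sum(axis=0)  # correct / everything predicted as the class
recall = tp / conf_matrix.sum(axis=1)     # correct / everything truly in the class
f1 = 2 * precision * recall / (precision + recall)
macro_f1 = f1.mean()
accuracy = tp.sum() / conf_matrix.sum()

for name, p, r_, f in zip(labels, precision, recall, f1):
    print(f"{name:<12} precision={p:.2f} recall={r_:.2f} f1={f:.2f}")
print(f"accuracy={accuracy:.3f} macro F1={macro_f1:.2f}")
```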



