# Fine-tuning a Language Model

## What is Fine-tuning?

Fine-tuning a language model involves taking a pre-trained model and adapting it to a specific task or domain by training it on a smaller, task-specific dataset. This process allows the model to leverage the knowledge it has already acquired during pre-training while learning to perform well on the new task.


## Why Fine-tune?
1. **Improved Performance**: Fine-tuning can significantly enhance the model's performance on specific tasks, such as sentiment analysis, text classification, or question answering.
2. **Domain Adaptation**: It allows the model to adapt to specific domains (e.g., medical, legal) where the vocabulary and context may differ from the general language used during pre-training.
3. **Resource Efficiency**: Fine-tuning requires fewer computational resources and less time than training a model from scratch.

## Difference Between Pre-training and Fine-tuning
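- **Pre-training**: Involves training a language model on a large corpus of text to learn general language patterns, grammar, and semantics. This phase is typically self-supervised: the training signal (for example, predicting the next token) comes from the raw text itself rather than from human labels.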
- **Fine-tuning**: Involves training the pre-trained model on a smaller, labeled dataset specific to a task. This phase is supervised and focuses on optimizing the model for the desired application.


## Comparison with RAG
Retrieval-Augmented Generation (RAG) is a different approach that combines pre-trained language models with a retrieval mechanism to enhance the model's ability to generate relevant responses based on external knowledge sources.

- **Fine-tuning**: Adjusts the weights of the pre-trained model to perform well on a specific task using labeled data.
- **RAG**: Uses a retrieval system to fetch relevant documents or information from a knowledge base, which is then used to inform the generation process without altering the underlying model weights.

In summary, fine-tuning is a powerful technique for adapting pre-trained language models to specific tasks, while RAG focuses on enhancing generation capabilities through external knowledge retrieval.
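
To make the contrast concrete, here is a minimal sketch of the RAG side: the model weights are never touched, and only the prompt changes. The word-overlap scoring, the `retrieve` and `build_prompt` helpers, and the tiny knowledge base are illustrative assumptions; real systems typically use embedding-based search over a much larger store.

```python
def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(query, documents, k=1):
    """Return the k documents that share the most words with the query."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Prepend the retrieved context to the question; no weights are modified."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical knowledge base used only for this illustration
knowledge_base = [
    "LoRA trains small low-rank matrices while the base weights stay frozen.",
    "Quantization stores weights in fewer bits to save memory.",
    "A tokenizer converts text into integer token ids.",
]

query = "How does LoRA keep the base weights frozen?"
prompt = build_prompt(query, knowledge_base)
print(prompt)
```

The resulting string would be passed to an unmodified language model, whereas fine-tuning would change the model itself and leave the prompt alone.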

## Types of Fine-tuning
1. **Full Fine-tuning**: All layers of the pre-trained model are updated during training on the new task.
2. **Partial Fine-tuning**: Only certain layers (e.g., the last few layers) are updated, while others remain frozen.
3. **Adapter Fine-tuning**: Small adapter modules are added to the model, and only these modules are trained while the rest of the model remains unchanged.
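
The three strategies differ mainly in which parameters receive gradient updates. The sketch below illustrates that with a made-up parameter inventory; the names, sizes, and the `trainable_fraction` helper are hypothetical and do not correspond to any real model.

```python
# Hypothetical parameter inventory for a tiny 3-layer model: name -> parameter count
BASE_PARAMS = {
    "embed.weight": 1000,
    "layers.0.weight": 400,
    "layers.1.weight": 400,
    "layers.2.weight": 400,
    "head.weight": 100,
}
# Hypothetical adapter modules, one per layer, added only for adapter fine-tuning
ADAPTER_PARAMS = {f"layers.{i}.adapter.weight": 16 for i in range(3)}


def layer_index(name):
    """Return the layer number encoded in a name like 'layers.2.weight'."""
    return int(name.split(".")[1])


def trainable_fraction(strategy, last_k=1):
    """Fraction of all parameters that receive gradient updates under a strategy."""
    if strategy == "full":
        all_params = BASE_PARAMS
        trainable = set(all_params)
    elif strategy == "partial":
        all_params = BASE_PARAMS
        n_layers = 1 + max(layer_index(n) for n in all_params if n.startswith("layers."))
        # Train only the last `last_k` layers plus the output head
        trainable = {
            n for n in all_params
            if n.startswith("head.")
            or (n.startswith("layers.") and layer_index(n) >= n_layers - last_k)
        }
    elif strategy == "adapter":
        # Base weights stay frozen; only the added adapter weights are trained
        all_params = {**BASE_PARAMS, **ADAPTER_PARAMS}
        trainable = set(ADAPTER_PARAMS)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(all_params.values())
    return sum(all_params[n] for n in trainable) / total


for name in ("full", "partial", "adapter"):
    print(f"{name:8s} {trainable_fraction(name):.2%}")
```

In PyTorch, freezing a parameter corresponds to setting its `requires_grad` attribute to `False`, so the same selection logic would decide which tensors keep gradients enabled.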

## PEFT (Parameter-Efficient Fine-Tuning)
PEFT is a family of techniques that fine-tune only a small subset of the model's parameters (or a small number of newly added parameters) while the rest of the model stays frozen, making the process more efficient in terms of memory and computation. This is particularly useful for large models where full fine-tuning can be resource-intensive. PEFT methods include LoRA (Low-Rank Adaptation) and prefix-tuning, which allow for effective adaptation of large models with minimal parameter updates.

## LoRA & QLoRA
- **LoRA (Low-Rank Adaptation)**: LoRA is a PEFT technique that adds a pair of small, trainable low-rank matrices alongside selected weight matrices (typically the attention projections). During fine-tuning, only these low-rank matrices are updated, while the original model weights remain frozen. Because the rank is much smaller than the layer dimensions, the number of trainable parameters drops sharply, making fine-tuning more efficient.
- **QLoRA (Quantized Low-Rank Adaptation)**: QLoRA extends LoRA by storing the frozen base model weights in a quantized, lower-precision format (4-bit in the original method) while the LoRA adapters are trained in higher precision. This further reduces memory usage and makes it practical to fine-tune very large models on limited hardware with little loss in performance.
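
As a rough illustration of both ideas, the sketch below applies the standard LoRA update `W + (alpha / r) * B @ A` to a toy matrix, counts the trainable parameters for one adapted layer, and estimates weight storage at different bit widths. The helper names, the toy matrices, and the 7-billion-parameter model size are assumptions chosen for the example; the memory estimate ignores quantization metadata, activations, and optimizer state.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_merge(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, the effective weight after LoRA training."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def lora_param_count(d_out, d_in, r):
    """Trainable parameters for one adapted matrix: B is d_out x r, A is r x d_in."""
    return r * (d_out + d_in)


def weight_memory_gb(n_params, bits):
    """Approximate storage for the weights alone at a given bit width."""
    return n_params * bits / 8 / 1e9


# Toy example: a frozen 3x3 weight with a rank-1 update
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
B = [[1.0], [0.0], [2.0]]   # d_out x r
A = [[0.5, 0.0, 1.0]]       # r x d_in
merged = lora_merge(W, A, B, alpha=2, r=1)
print(merged)

# A 1024x1024 layer: full update vs. rank-8 LoRA update
print(1024 * 1024, lora_param_count(1024, 1024, r=8))

# Hypothetical 7B-parameter model stored in 16-bit vs. 4-bit weights
print(weight_memory_gb(7_000_000_000, 16), weight_memory_gb(7_000_000_000, 4))
```

The merge step is the same operation that lets a trained adapter be folded back into the base weights, so inference needs no extra matrices.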

## Steps to Fine-tune a Language Model
1. **Select a Pre-trained Model**: Choose a suitable pre-trained language model based on the task requirements.
2. **Prepare the Dataset**: Collect and preprocess a labeled dataset relevant to the specific task.
3. **Set Up the Environment**: Install necessary libraries and frameworks (e.g., TensorFlow, PyTorch, Hugging Face Transformers).
4. **Configure the Model**: Load the pre-trained model and configure it for fine-tuning (e.g., adjust the output layer).
5. **Train the Model**: Fine-tune the model on the task-specific dataset, monitoring performance on a validation set.
6. **Evaluate the Model**: Assess the fine-tuned model's performance using appropriate metrics.
7. **Deploy the Model**: Integrate the fine-tuned model into the desired application or service.    

In [1]:
import pandas as pd
from datasets import Dataset
from peft import LoraConfig, PeftConfig, PeftModel, TaskType, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
    pipeline,
)

In [None]:
df = pd.read_csv("hf://datasets/nbertagnolli/counsel-chat/20220401_counsel_chat.csv")

In [3]:
df.sample(5)

Unnamed: 0,questionID,questionTitle,questionText,questionLink,topic,therapistInfo,therapistURL,answerText,upvotes,views
2458,920,The girls at my coming-of-age party don't like...,"I'm having a quinceañera, and the girls don't ...",https://counselchat.com/questions/the-girls-at...,social-relationships,"Sherry Katz, LCSWCouples and Family Therapist,...",https://counselchat.com/therapists/sherry-katz...,How did you find out that the girls aren't hap...,0,101
29,0,Do I have too many issues for counseling?,I have so many issues to address. I have a his...,https://counselchat.com/questions/do-i-have-to...,depression,Sara BakerLet me help YOU accomplish your goal...,https://counselchat.com/therapists/sara-baker,Absolutely not! A lot of the issues that have ...,0,99
1013,304,How do I stop my step child from hurting my bi...,"What makes my step child, an 8 year old boy, c...",https://counselchat.com/questions/how-do-i-sto...,parenting,Danielle AlvarezLicensed Professional Counselor,https://counselchat.com/therapists/danielle-al...,I can see why you are alarmed. That is a scary...,1,175
1920,663,Is it normal for my mom to get mad easily?,"It happens especially at me and my sister, and...",https://counselchat.com/questions/is-it-normal...,family-conflict,Genevieve RideoutChristian Counseling for Wome...,https://counselchat.com/therapists/genevieve-r...,"Anger is a normal emotion, and yet it is a rea...",0,88
2065,724,My girlfriend is always accusing me of cheatin...,Over a year ago I had a female friend. She tur...,https://counselchat.com/questions/my-girlfrien...,relationships,Tamara PowellAnything But Ordinary!,https://counselchat.com/therapists/tamara-powell,We women really do tend to struggle with the c...,2,1219


In [4]:
df=df[["questionText", "topic"]]

In [5]:
df=df.drop_duplicates().dropna()

In [6]:
df=df.rename(columns={"questionText": "question", "topic": "answer"})

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 865 entries, 0 to 2769
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   question  865 non-null    object
 1   answer    865 non-null    object
dtypes: object(2)
memory usage: 20.3+ KB


In [8]:
df["answer"].value_counts()

answer
depression                  137
intimacy                    108
relationships               104
anxiety                     100
family-conflict              60
parenting                    54
self-esteem                  42
relationship-dissolution     33
behavioral-change            31
anger-management             26
trauma                       24
marriage                     20
domestic-violence            16
lgbtq                        15
social-relationships         12
workplace-relationships      11
substance-abuse              10
grief-and-loss                9
counseling-fundamentals       7
spirituality                  7
professional-ethics           6
legal-regulatory              6
eating-disorders              5
sleep-improvement             5
addiction                     4
human-sexuality               4
stress                        3
diagnosis                     3
children-adolescents          2
military-issues               1
Name: count, dtype: int64

In [9]:
df.head()

Unnamed: 0,question,answer
0,I have so many issues to address. I have a his...,depression
86,I have been diagnosed with general anxiety and...,depression
91,My mother is combative with me when I say I do...,depression
93,There are many people willing to lovingly prov...,depression
96,My girlfriend just quit drinking and she becam...,depression


In [10]:
from transformers import AutoTokenizer
import pandas as pd

def preprocess_simple_qwen(df, tokenizer_name, max_length=256):
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, use_fast=True, trust_remote_code=True)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    processed = []

    for _, row in df.iterrows():
        question = row["question"].strip()
        answer = f"Based on what you've described, this sounds like {row['answer'].strip()}."

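        # Concatenate prompt and target with a newline separator
        # (for causal models, input and target form a single sequence)
        prompt = f"{question}\n"
        text = f"{prompt}{answer}"

        tokenized = tokenizer(
            text,
            truncation=True,
            padding="max_length",
            max_length=max_length,
        )

        # Find the split point from the UNPADDED prompt. Padding it to max_length
        # would make its length equal max_length and mask every label.
        prompt_len = len(
            tokenizer(prompt, truncation=True, max_length=max_length)["input_ids"]
        )

        # Mask prompt and padding positions so only answer tokens contribute to the loss
        # (assumes the tokenizer pads on the right)
        labels = [
            tok if i >= prompt_len and keep == 1 else -100
            for i, (tok, keep) in enumerate(
                zip(tokenized["input_ids"], tokenized["attention_mask"])
            )
        ]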

        processed.append({
            "input_ids": tokenized["input_ids"],
            "attention_mask": tokenized["attention_mask"],
            "labels": labels
        })

    return processed
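
The label-masking idea behind this function can be illustrated without a real tokenizer. The sketch below uses a hypothetical word-level `toy_tokenize` and a `build_example` helper (both invented for illustration) to show which positions end up as `-100`, the index that PyTorch's cross-entropy loss ignores by default. The split point is taken from the unpadded prompt length, since a padded length would cover the whole sequence.

```python
IGNORE_INDEX = -100  # positions with this label are skipped by the loss

vocab = {}


def toy_tokenize(text):
    """Stand-in tokenizer: assign each new word the next free id, starting at 1."""
    return [vocab.setdefault(word, len(vocab) + 1) for word in text.split()]


def build_example(prompt, answer, max_length, pad_id=0):
    """Build padded input_ids, attention_mask, and labels that mask the prompt."""
    prompt_ids = toy_tokenize(prompt)
    answer_ids = toy_tokenize(answer)

    input_ids = (prompt_ids + answer_ids)[:max_length]
    n_real = len(input_ids)
    n_pad = max_length - n_real

    attention_mask = [1] * n_real + [0] * n_pad
    input_ids = input_ids + [pad_id] * n_pad

    # Only answer tokens (after the prompt, before the padding) keep their ids
    prompt_len = min(len(prompt_ids), max_length)
    labels = [
        tok if prompt_len <= i < n_real else IGNORE_INDEX
        for i, tok in enumerate(input_ids)
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


example = build_example("I feel sad", "sounds like depression", max_length=8)
print(example["input_ids"])
print(example["labels"])
```

With this layout the loss is computed only on the three answer tokens, so the model is trained to produce the answer rather than to reproduce the question.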

In [11]:
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

In [12]:
device = "mps"
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-0.5B-Chat",
    torch_dtype="auto",
    device_map=device
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat", use_fast=True, trust_remote_code=True)

In [13]:
chatbot = pipeline("text-generation", model=
                   model, tokenizer=tokenizer)

Device set to use mps


In [14]:
issue = """<|im_start|>system
You are a mental health assistant. Based on the user's description, respond with a single sentence indicating the most relevant diagnosis from the mental health domain.<|im_end|>
<|im_start|>user
I am broke and I want to kill myself<|im_end|>
<|im_start|>assistant
"""
response = chatbot(issue, max_new_tokens=100, do_sample=True, temperature=0.7)
generated = response[0]['generated_text']
assistant_start = generated.find("<|im_start|>assistant\n") + len("<|im_start|>assistant\n")
reply = generated[assistant_start:].strip().split("<|im_end|>")[0].strip()
print("Assistant:", reply)

Assistant: Based solely on the user's description of being broke and wanting to kill themselves, the most relevant diagnosis from the mental health domain would be suicide.
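
The prompt above follows the ChatML layout, where each turn is wrapped in `<|im_start|>` and `<|im_end|>` markers. Building the string by hand is error-prone, so a small helper can assemble it from a list of messages. The `build_chatml_prompt` and `extract_reply` functions below are hypothetical helpers written for this sketch, and the user message is a made-up example.

```python
ASSISTANT_MARKER = "<|im_start|>assistant\n"


def build_chatml_prompt(messages):
    """Render {'role', 'content'} dicts as ChatML, ending with an open assistant turn."""
    turns = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    return "".join(turns) + ASSISTANT_MARKER


def extract_reply(generated):
    """Return the text of the last assistant turn, or the whole text if no marker is found."""
    start = generated.rfind(ASSISTANT_MARKER)
    if start == -1:
        return generated.strip()
    reply = generated[start + len(ASSISTANT_MARKER):]
    return reply.split("<|im_end|>")[0].strip()


messages = [
    {"role": "system", "content": "You are a mental health assistant."},
    {"role": "user", "content": "I feel anxious before every exam."},
]
prompt = build_chatml_prompt(messages)
print(prompt)

# Simulated model output: the prompt followed by a completion
simulated = prompt + "This sounds like anxiety.<|im_end|>\n"
print(extract_reply(simulated))
```

Transformers tokenizers also provide `apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, which renders the model's own template and is usually the safer choice when one is defined.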


In [15]:
issue = "I am broke and I want to kill myself"

response = chatbot(issue, max_new_tokens=50, do_sample=True, temperature=0.7)
generated = response[0]['generated_text']

# The text-generation pipeline returns the prompt followed by the completion, so slice off the prompt
reply = generated[len(issue):].strip()

print("Assistant:", reply)


Assistant: 


In [16]:
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

'NoneType' object has no attribute 'cadam32bit_grad_fp32'
trainable params: 786,432 || all params: 464,774,144 || trainable%: 0.1692


  warn("The installed version of bitsandbytes was compiled without GPU support. "
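
The trainable parameter count reported above can be reproduced by hand. The sketch assumes a hidden size of 1024, 24 decoder layers, and LoRA applied to two square attention projections per layer (the query and value projections, which is believed to be the PEFT default for this architecture when `target_modules` is not set). None of these values are stated in the notebook, so treat them as assumptions; the total parameter count is taken from the printed output.

```python
r = 8                      # LoRA rank from lora_config
hidden_size = 1024         # assumed model width
num_layers = 24            # assumed number of decoder layers
adapted_per_layer = 2      # assumed: query and value projections

# Each adapted matrix gets A (r x d_in) and B (d_out x r)
per_matrix = r * (hidden_size + hidden_size)
trainable = per_matrix * adapted_per_layer * num_layers

all_params = 464_774_144   # total reported by print_trainable_parameters()
percent = round(100 * trainable / all_params, 4)

print(f"trainable params: {trainable:,} || trainable%: {percent}")
```

Under these assumptions the arithmetic matches the reported 786,432 trainable parameters and 0.1692%.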


In [17]:
hf_dataset = Dataset.from_pandas(df[["question", "answer"]])

In [20]:
tokenized_dataset = hf_dataset.map(lambda ex: preprocess_simple_qwen(pd.DataFrame([ex]), "Qwen/Qwen1.5-0.5B-Chat")[0])

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)



In [21]:
tokenized_dataset[0]

{'question': 'I have so many issues to address. I have a history of sexual abuse, I’m a breast cancer survivor and I am a lifetime insomniac.    I have a long history of depression and I’m beginning to have anxiety. I have low self esteem but I’ve been happily married for almost 35 years.\n   I’ve never had counseling about any of this. Do I have too many issues to address in counseling?',
 'answer': 'depression',
 '__index_level_0__': 0,
 'input_ids': [40, 614, 773, 1657, 4714, 311, 2621, 13, 358, 614, 264, 3840,
  315, 7244, 11480, 11, 358, 4249, 264, 17216, 9387, 48648, 323, 358, 1079,
  264, 19031, 1640, 316, 7751, 580, 13, 262, 358, 614, 264, 1293, 3840, 315,
  18210, 323, 358, 4249, 7167, 311, 614, 18056, 13, 358, 614, 3347, 656,
  84497, 714, 358, 3982, 1012, 36775, 12224, 369, 4558, 220, 18, 20, 1635,
  624, 256, 358, 3982, 2581, 1030, 41216, 911, ...

In [None]:
# save the processed dataset
tokenized_dataset.save_to_disk("../data/processed/qwen_chatml_dataset")


In [18]:
# load the processed dataset
tokenized_dataset = Dataset.load_from_disk("../data/processed/qwen_chatml_dataset")



In [19]:
training_args = TrainingArguments(
    output_dir="./finetuned-model",
    per_device_train_batch_size=4,
    num_train_epochs=4,
    logging_dir='./logs',
    save_steps=500,
    logging_steps=100,
    use_mps_device=True,  # deprecated in recent transformers releases, which use MPS automatically when available
    label_names=["labels"],  # Explicitly specify label names for PEFT models
)

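# Note: with mlm=False this collator rebuilds `labels` from `input_ids` (padding set to -100),
# so any prompt masking stored in the dataset's `labels` column is overwritten.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)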

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    processing_class=tokenizer,
    data_collator=data_collator,
)



In [20]:
trainer.train()



Step,Training Loss
100,2.956
200,2.5952
300,2.4624
400,2.4408
500,2.4354
600,2.3871
700,2.3867
800,2.3562


TrainOutput(global_step=868, training_loss=2.4931537962179577, metrics={'train_runtime': 233.708, 'train_samples_per_second': 14.805, 'train_steps_per_second': 3.714, 'total_flos': 1643217734860800.0, 'train_loss': 2.4931537962179577, 'epoch': 4.0})
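
The numbers in this output can be checked against the training configuration. The sketch assumes a single device, no gradient accumulation, and that the final partial batch of each epoch is kept (the `Trainer` defaults, as far as the notebook shows); the runtime is copied from the output above.

```python
import math

n_examples = 865      # rows in the tokenized dataset
batch_size = 4        # per_device_train_batch_size
epochs = 4            # num_train_epochs
runtime_s = 233.708   # train_runtime from the output

# The last batch of each epoch is smaller, so round up
steps_per_epoch = math.ceil(n_examples / batch_size)
total_steps = steps_per_epoch * epochs

samples_per_second = round(n_examples * epochs / runtime_s, 3)
steps_per_second = round(total_steps / runtime_s, 3)

print(steps_per_epoch, total_steps, samples_per_second, steps_per_second)
```

The step count of 868 also explains why the final checkpoint directory is named `checkpoint-868`.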

In [21]:
peft_model_id = "./finetuned-model/checkpoint-868"
peft_config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(peft_config.base_model_name_or_path, return_dict=True)

# Load adapter into base model
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Merge LoRA weights into base model
merged_model = model.merge_and_unload()

merged_model.save_pretrained("./finetuned-model/merged")

tokenizer = AutoTokenizer.from_pretrained(peft_config.base_model_name_or_path)
tokenizer.save_pretrained("./finetuned-model/merged")

('./finetuned-model/merged/tokenizer_config.json',
 './finetuned-model/merged/special_tokens_map.json',
 './finetuned-model/merged/chat_template.jinja',
 './finetuned-model/merged/vocab.json',
 './finetuned-model/merged/merges.txt',
 './finetuned-model/merged/added_tokens.json',
 './finetuned-model/merged/tokenizer.json')

In [22]:
tuned_model = AutoModelForCausalLM.from_pretrained("./finetuned-model/merged", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("./finetuned-model/merged", use_fast=True, trust_remote_code=True)

chatbot = pipeline("text-generation", model=tuned_model, tokenizer=tokenizer, device=0)

Device set to use mps:0


In [23]:
issue = "I am broke and I want to kill myself"

response = chatbot(issue, max_new_tokens=50, do_sample=True, temperature=0.7)
generated = response[0]['generated_text']

# The text-generation pipeline returns the prompt followed by the completion, so slice off the prompt
reply = generated[len(issue):].strip()

print("Assistant:", reply)


Assistant: . How can I get help?
Based on what you've described, this sounds like mental health. It's important to talk things over with a professional. If you're feeling suicidal or have thoughts of self-harm, call 911.
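
Because the training targets follow a fixed template ("this sounds like <topic>."), a simple way to approach the evaluation step is to pull the predicted topic out of the generated text and compare it with the reference label. The sketch below is one possible approach, not something the notebook runs: the `extract_topic` and `accuracy` helpers and the small prediction lists are hypothetical.

```python
import re

# Matches the template sentence used when building the training targets
TEMPLATE_RE = re.compile(r"this sounds like ([^.\n]+)\.", re.IGNORECASE)


def extract_topic(generated):
    """Return the topic named in the template sentence, or None if it is absent."""
    match = TEMPLATE_RE.search(generated)
    return match.group(1).strip().lower() if match else None


def accuracy(predictions, references):
    """Share of predictions that exactly match their reference label."""
    pairs = list(zip(predictions, references))
    if not pairs:
        return 0.0
    return sum(p == r for p, r in pairs) / len(pairs)


sample = (
    ". How can I get help?\n"
    "Based on what you've described, this sounds like mental health. "
    "It's important to talk things over with a professional."
)
print(extract_topic(sample))

# Hypothetical predictions and reference labels
predicted = ["depression", extract_topic(sample)]
reference = ["depression", "anxiety"]
print(accuracy(predicted, reference))
```

Note that `mental health` does not appear among the topic labels counted earlier, so an exact-match metric would score this generation as incorrect for any reference label.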


In [30]:
generated

"I am broke and I want to kill myself. How do I stop this? I don't know what to do.\nBased on what you've described, this sounds like mental health. It's important that you reach out to a mental health professional for help. They can help you identify the underlying"