# Care_Plan: Train Fietje model

**Author:** Eva Rombouts  
**Date:** 2024-07-28  
**Version:** 0.2

### Description
This notebook is almost fully copied from: [Optimizing Phi-2: A Deep Dive into Fine-Tuning Small Language Models](https://medium.com/thedeephub/optimizing-phi-2-a-deep-dive-into-fine-tuning-small-language-models-9d545ac90a99), by Praveen Yerneni. Thank you!!
It trains the chat version of [Fietje](https://huggingface.co/BramVanroy/fietje-2-chat), an adapted version of microsoft/phi-2 trained on Dutch text.

## Setup

In [None]:
# Install the required packages
!pip install -q bitsandbytes flash_attn datasets peft
# !pip install einops  bitsandbytes accelerate peft flash_attn
# !pip uninstall -y transformers
# !pip install git+https://github.com/huggingface/transformers
# !pip install --upgrade torch

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, Trainer
from peft import prepare_model_for_kbit_training, LoraConfig, get_peft_model

import os
from google.colab import drive
from sklearn.model_selection import train_test_split
from datasets import Dataset
import pandas as pd
import time

In [None]:
model_name = "BramVanroy/fietje-2b-chat"
model_finetuned = "fietje_zorgplan_magweg" #Name of the model to save to HuggingFace hub
commit_message = "Fietje finetuned"
random_seed = 6
sample_size = 100

## Load model and tokenizer
Model: [fietje-2-chat](https://huggingface.co/BramVanroy/fietje-2-chat)

The model is loaded in 4-bit precision, which is the "Quantization" part of QLoRA. Its memory footprint is much smaller than that of a default full-precision load.
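As a rough, back-of-the-envelope sketch of why this matters: the weights alone take about `parameters × bits ÷ 8` bytes. The snippet assumes Fietje 2 keeps phi-2's size of roughly 2.7 billion parameters. The value reported by `get_memory_footprint()` below will differ somewhat, because some layers (such as the embeddings and output head) are typically not quantized and the 4-bit format also stores quantization constants.

```python
# Rough estimate of the memory needed for the model weights alone.
# Assumption: Fietje 2 has roughly 2.7 billion parameters, like phi-2.
N_PARAMS = 2.7e9

def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Bytes for the weights (n_params * bits / 8), expressed in GB (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

for label, bits in [("float32", 32), ("float16", 16), ("4-bit", 4)]:
    print(f"{label:>7}: {weight_memory_gb(N_PARAMS, bits):.2f} GB")
```

Under that assumption this gives about 10.8 GB in float32, 5.4 GB in float16 and 1.35 GB in 4-bit.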


In [None]:
# Configuration to load model in 4-bit quantized
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_quant_type='nf4',
                                bnb_4bit_compute_dtype='float16',
                                #bnb_4bit_compute_dtype=torch.bfloat16,
                                bnb_4bit_use_double_quant=True)


#Loading the model with compatible settings
model = AutoModelForCausalLM.from_pretrained(model_name, device_map='auto',
                                             quantization_config=bnb_config,
                                             attn_implementation='flash_attention_2',
                                             trust_remote_code=True)

# Setting up the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name,
                                          add_eos_token=True,
                                          trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.truncation_side = 'left'

print(f"Memory footprint: {model.get_memory_footprint() / 1e9} GB")

## Prepare data

ToDo: Publish the dataset on Hugging Face.

In [None]:
# Mount Google Drive
drive.mount('/content/drive')
# Change directory to your project folder
os.chdir('/content/drive/My Drive/Colab Notebooks/GenCareAI/scripts')

# Retrieve the Hugging Face token from Colab secrets (uncomment to use)
# from google.colab import userdata
# HF_TOKEN = userdata.get('HF_TOKEN')

# Load and preprocess data
# Load the dataset
df_careplans = pd.read_csv('../data/df_careplans.csv')
# Sample the dataset, rename columns and keep only the input and output columns
df = (df_careplans
      .sample(sample_size,random_state=random_seed)
      .rename(columns={'notes': 'input', 'careplan': 'output'})
      [['input','output']]
)

# Split data into train and test sets
train_df, test_df = train_test_split(df, test_size=0.15, random_state=random_seed)

# Convert to Hugging Face Dataset objects
train_dataset = Dataset.from_pandas(train_df)
test_dataset = Dataset.from_pandas(test_df)

In [None]:
print(train_dataset)
print(test_dataset)

In [None]:
#Function that creates a prompt from input and output and tokenizes it
def collate_and_tokenize(examples):

    careplan = examples["output"][0].replace('"', r'\"')
    notes = examples["input"][0]

    #Merging into one prompt for tokenization and training
    prompt = f'''###System:
Lees de gegeven rapportages en schrijf een zorgplan volgens de instructies.
###Rapportages:
{notes}
###Instructies:
Schrijf een zorgplan voor de drie belangrijkste zorgproblemen op basis van de rapportages.
Formaat: Probleem, Doel, Acties
###Zorgplan:
{careplan}
'''

    #Tokenize the prompt
    encoded = tokenizer(
        prompt,
        return_tensors="np",
        padding="max_length",
        truncation=True,
        ## The source article keeps max_length at 1024, as anything more
        ## led to OOM on a T4. This notebook uses 2048, which needs more GPU memory.
        max_length=2048,
    )

    encoded["labels"] = encoded["input_ids"]
    return encoded
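The function above copies `input_ids` straight into `labels`, so the loss is also computed over all padding tokens up to `max_length`. A common alternative is to set padded positions to `-100`, the default `ignore_index` of PyTorch's cross-entropy loss, which Hugging Face causal LM models use when computing the loss. The sketch below uses a hypothetical helper `mask_padding_labels` on a toy NumPy batch rather than real tokenizer output. One caveat: this notebook sets `pad_token = eos_token`, so masking every padded position also hides the end-of-sequence token from the loss, which can make the model worse at learning when to stop.

```python
import numpy as np

def mask_padding_labels(input_ids: np.ndarray, attention_mask: np.ndarray,
                        ignore_index: int = -100) -> np.ndarray:
    """Return a copy of input_ids with padded positions set to ignore_index.

    Padded positions are those where attention_mask == 0. Targets equal to
    -100 are skipped by PyTorch's cross-entropy loss by default.
    """
    labels = input_ids.copy()  # copy, so input_ids itself is left unchanged
    labels[attention_mask == 0] = ignore_index
    return labels

# Toy batch: three real tokens, right-padded to length 5 with pad id 0.
ids = np.array([[11, 12, 13, 0, 0]])
mask = np.array([[1, 1, 1, 0, 0]])
print(mask_padding_labels(ids, mask).tolist())  # → [[11, 12, 13, -100, -100]]
```

Since `collate_and_tokenize` uses `return_tensors="np"`, the line `encoded["labels"] = encoded["input_ids"]` could be swapped for `encoded["labels"] = mask_padding_labels(encoded["input_ids"], encoded["attention_mask"])` if you want to try this.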

In [None]:
# Keep only the tokenizer outputs (input_ids, attention_mask) and the labels added in the function above.
columns_to_remove = ["input","output", "__index_level_0__"]

#tokenize the training and test datasets
tokenized_dataset_train = train_dataset.map(collate_and_tokenize,
                                            batched=True,
                                            batch_size=1,
                                            remove_columns=columns_to_remove)
tokenized_dataset_test = test_dataset.map(collate_and_tokenize,
                                          batched=True,
                                          batch_size=1,
                                          remove_columns=columns_to_remove)


In [None]:
#Check if tokenization looks good
input_ids = tokenized_dataset_test[1]['input_ids']

decoded = tokenizer.decode(input_ids, skip_special_tokens=True)
print(decoded)

## Train model

### Prepare model

In [None]:
# To speed up training with larger batch sizes, Accelerate can wrap the model in Fully Sharded Data Parallel (FSDP).
from accelerate import FullyShardedDataParallelPlugin, Accelerator
from torch.distributed.fsdp.fully_sharded_data_parallel import FullOptimStateDictConfig, FullStateDictConfig

fsdp_plugin = FullyShardedDataParallelPlugin(
    state_dict_config=FullStateDictConfig(offload_to_cpu=True, rank0_only=False),
    optim_state_dict_config=FullOptimStateDictConfig(offload_to_cpu=True, rank0_only=False),
)

accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param:.2f}"
    )

In [None]:
print_trainable_parameters(model)

#gradient checkpointing to save memory
model.gradient_checkpointing_enable()

# Freeze base model layers and cast layernorm in fp32
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)
print(model)

In [None]:
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        'q_proj',
        'k_proj',
        'v_proj',
        'dense',
        'fc1',
        'fc2',
    ],  # print(model) shows the module names that can be targeted
    bias="none",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

lora_model = get_peft_model(model, config)
print_trainable_parameters(lora_model)

lora_model = accelerator.prepare_model(lora_model)
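For intuition about what `LoraConfig` adds: each targeted linear layer keeps its frozen weight `W` and gains two small trainable matrices, `A` (r × d_in) and `B` (d_out × r), whose product is scaled by `lora_alpha / r` (here 32 / 16 = 2). The toy NumPy sketch below shows that forward pass, with `B` starting at zero so the adapted layer initially behaves exactly like the base layer. It then counts the trainable parameters as `r × (d_in + d_out)` per adapted layer. The layer sizes are assumptions taken from phi-2's configuration (hidden size 2560, MLP size 10240, 32 decoder layers); compare the result with what `print_trainable_parameters(lora_model)` reports.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes for the forward pass; the notebook itself uses r=16, lora_alpha=32.
d_in, d_out, r, alpha = 8, 6, 2, 4

W = rng.normal(size=(d_out, d_in))      # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable, small random init (toy choice)
B = np.zeros((d_out, r))                # trainable, zero init

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Frozen layer output plus the scaled low-rank update (alpha / r) * x A^T B^T."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(3, d_in))
# With B = 0 the adapted layer initially matches the frozen layer exactly.
assert np.allclose(lora_forward(x), x @ W.T)

def lora_param_count(shapes, rank: int) -> int:
    """Trainable LoRA parameters: rank * (d_in + d_out) for each adapted layer."""
    return sum(rank * (i + o) for i, o in shapes)

# Assumed phi-2 sizes: hidden 2560, MLP 10240, 32 decoder layers.
hidden, mlp, n_layers = 2560, 10240, 32
per_layer = [(hidden, hidden)] * 4 + [(hidden, mlp), (mlp, hidden)]  # q, k, v, dense, fc1, fc2
total = lora_param_count(per_layer, rank=16) * n_layers
print(total)  # → 23592960
```

Only `A` and `B` are trained (with `bias="none"`), which is why the trainable percentage printed above is so small.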

### Train and save model


In [None]:
training_args = TrainingArguments(
    output_dir='./results',  # Output directory for checkpoints and predictions
    report_to='none',
    overwrite_output_dir=True, # Overwrite the content of the output directory
    per_device_train_batch_size=2,  # Batch size for training
    per_device_eval_batch_size=2,  # Batch size for evaluation
    gradient_accumulation_steps=5, # Micro-batches accumulated per optimizer step
    gradient_checkpointing=True,   # Enable gradient checkpointing
    gradient_checkpointing_kwargs={"use_reentrant": False},
    warmup_steps=50,  # Number of warmup steps
    #max_steps=1000,  # Total number of training steps
    num_train_epochs=2,  # Number of training epochs
    learning_rate=5e-5,  # Learning rate
    weight_decay=0.01,  # Weight decay
    optim="paged_adamw_8bit", # 8-bit AdamW: optimizer states are quantized and paged to CPU memory when needed
    fp16=True, #Use mixed precision training
    #For logging and saving
    logging_dir='./logs',
    logging_strategy="steps",
    logging_steps=100,
    save_strategy="steps",
    save_steps=100,
    save_total_limit=2,  # Limit the total number of checkpoints
    eval_strategy="steps",
    eval_steps=100,
    load_best_model_at_end=True, # Load the best model at the end of training
)

trainer = Trainer(
    model=lora_model,
    train_dataset=tokenized_dataset_train,
    eval_dataset=tokenized_dataset_test,
    args=training_args,
)

# Disable the KV cache during training (it conflicts with gradient checkpointing and triggers a warning); re-enable it for inference
model.config.use_cache = False

start_time = time.time()  # Record the start time
trainer.train()  # Start training
end_time = time.time()  # Record the end time

training_time = end_time - start_time  # Calculate total training time

print(f"Training completed in {training_time} seconds.")

# Push the LoRA adapter to the Hub so the work is saved before the session ends
lora_model.push_to_hub(model_finetuned,
                       token=True,
                       commit_message=commit_message,
                       private=True)


#Terminate the session so we do not incur cost
from google.colab import runtime
runtime.unassign()
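With `sample_size = 100` the training run is short, which interacts with the step-based settings above. Following sklearn's rounding, `test_size=0.15` leaves 85 training examples; with `per_device_train_batch_size=2` and `gradient_accumulation_steps=5` that is 43 micro-batches and only about 8 to 9 optimizer updates per epoch, so roughly 16 to 18 in total over 2 epochs (the exact count depends on how the Trainer version handles leftover micro-batches). That is fewer than `warmup_steps=50`, so the learning rate never reaches 5e-5, and the every-100-steps logging, evaluation and checkpointing intervals are never reached. The sketch below reproduces this arithmetic, assuming a single device; lowering those step values or increasing `sample_size` is one option.

```python
import math

def optimizer_steps(n_examples, test_size, batch_size, grad_accum, epochs):
    """Rough optimizer-update count for a single-device Trainer run.

    Returns (n_train, low, high): `low` drops leftover micro-batches at the
    end of each epoch, `high` counts them as one extra update.
    """
    n_test = math.ceil(test_size * n_examples)  # sklearn rounds the test split up
    n_train = n_examples - n_test
    micro_batches = math.ceil(n_train / batch_size)  # last partial batch is kept
    low = (micro_batches // grad_accum) * epochs
    high = math.ceil(micro_batches / grad_accum) * epochs
    return n_train, low, high

print(optimizer_steps(100, 0.15, batch_size=2, grad_accum=5, epochs=2))  # → (85, 16, 18)
```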

## Run Inference

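**Note**: The session is terminated at the end of training, so restart it and reload the model and tokenizer before running the code below. One of the inference cells also uses `test_dataset`, so rerun the data preparation cells too.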

First we will run inference without the trained weights and check the output.
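The prompt template is written out by hand in `collate_and_tokenize` and again in each inference cell below, which makes it easy for them to drift apart (for example in the number of `#` characters, or a missing `###Zorgplan:` line), while the model has only seen the exact training format. A hypothetical helper like `build_prompt`, not part of the original notebook, keeps a single definition of the template: pass a care plan to get a training prompt, or omit it to get an inference prompt that ends with the `###Zorgplan:` line. Note that `collate_and_tokenize` also escapes double quotes in the care plan, which this sketch leaves to the caller.

```python
# Hypothetical helper (not in the original notebook): one definition of the
# prompt template, shared by training and inference.
SYSTEM = "Lees de gegeven rapportages en schrijf een zorgplan volgens de instructies."
INSTRUCTIONS = (
    "Schrijf een zorgplan voor de drie belangrijkste zorgproblemen op basis van de rapportages.\n"
    "Formaat: Probleem, Doel, Acties"
)

def build_prompt(notes: str, careplan: str = "") -> str:
    """Training prompt when careplan is given; otherwise an inference prompt ending at ###Zorgplan:."""
    prompt = (
        f"###System:\n{SYSTEM}\n"
        f"###Rapportages:\n{notes}\n"
        f"###Instructies:\n{INSTRUCTIONS}\n"
        "###Zorgplan:\n"
    )
    if careplan:
        prompt += f"{careplan}\n"  # same trailing newline as the training f-string
    return prompt

print(build_prompt("- Mevrouw had moeite met het herkennen van bekende gezichten."))
```

For example, `build_prompt(test_dataset["input"][index])` would produce an inference prompt in the training format.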

In [None]:
#Setup a prompt that we can use for testing

new_prompt = """###System:
Lees de gegeven rapportages en schrijf een zorgplan volgens de instructies.
###Rapportages:
- Mevrouw toonde vanochtend meer verwardheid en desoriëntatie. Begeleiding aangepast en extra ondersteuning geboden bij het uitvoeren van dagelijkse activiteiten.
- Mevrouw had behoefte aan extra rust na de ochtendactiviteiten. Tijd genomen voor ontspanning en rustgevende omgeving gecreëerd.
- Mevrouw had moeite met het herkennen van bekende gezichten. Begeleiding gegeven bij het omgaan met desoriëntatie en geheugenverlies.
- Mevrouw Vos was vanochtend rustiger en minder verward. Positieve interacties gehad tijdens het ontbijt en ondersteuning geboden bij het aankleden.
- Mevrouw had last van benauwdheid door haar astma. Astma-inhalator toegediend en ademhaling gecontroleerd. Bewustzijn en saturatie genormaliseerd.
- Mevrouw Vos had extra begeleiding nodig bij het vinden van haar kamer. Vertrouwde zorghandelingen herhaald en geheugenondersteuning geboden.
- Mevrouw was vanochtend erg onrustig en verward. Extra aandacht besteed aan rustgevende activiteiten en familiarisatie om haar te kalmeren.
- Mevrouw had moeite met het herinneren van recente gebeurtenissen. Geheugentraining en cognitieve stimulatie toegepast om haar geheugen te ondersteunen.
- Mevrouw Vos vertoonde tekenen van vermoeidheid en desoriëntatie voor het slapengaan. Begeleiding geboden bij het naar bed gaan en rustige slaapomgeving gecreëerd.
###Instructies:
Schrijf een zorgplan voor de drie belangrijkste zorgproblemen op basis van de rapportages.
Formaat: Probleem, Doel, Acties
###Zorgplan:
"""


In [None]:
inputs = tokenizer(new_prompt, return_tensors="pt",
                   return_attention_mask=False,
                   padding=True, truncation=True)

inputs.to('cuda')

outputs = model.generate(**inputs, repetition_penalty=1.0,
                         max_length=1000)
result = tokenizer.batch_decode(outputs, skip_special_tokens=True)

print(result[0])
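`batch_decode` returns the prompt followed by the generated text, so the printed result repeats all the reports. A small, hypothetical post-processing helper can keep only the care plan: it takes the text after the first `###Zorgplan:` marker and stops at the next `###` section, in case the model goes on to write a new prompt. This sketch assumes the prompt ends with the `###Zorgplan:` marker, as in the training template; if no marker is found, it returns the full text.

```python
def extract_careplan(decoded: str, marker: str = "###Zorgplan:") -> str:
    """Return the generated care plan from decoded model output.

    Takes the text after the first marker and stops at the next '###' section.
    Without a marker, the whole (stripped) text is returned.
    """
    _, found, tail = decoded.partition(marker)
    if not found:
        return decoded.strip()
    return tail.split("###", 1)[0].strip()

example = "###System:\n...\n###Zorgplan:\nProbleem: onrust\nDoel: ...\n###Rapportages:\n..."
print(extract_careplan(example))
```

In the cells here it would be used as `print(extract_careplan(result[0]))`.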

In [None]:
from peft import PeftModel, PeftConfig

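# Load the LoRA adapter weights from the Hub
# (this repo is not the model_finetuned name pushed above; point it at the adapter you want to test)
model_id = "ekrombouts/fietje_zorgplan"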
trained_model = PeftModel.from_pretrained(model, model_id)

#Run inference
outputs = trained_model.generate(**inputs, max_length=1000)
text = tokenizer.batch_decode(outputs,skip_special_tokens=True)[0]
print(text)


In [None]:
index = 3
prompt = f'''###System:
Lees de gegeven rapportages en schrijf een zorgplan volgens de instructies.
###Rapportages:
{test_dataset["input"][index]}
###Instructies:
Schrijf een zorgplan voor de drie belangrijkste zorgproblemen op basis van de rapportages.
Formaat: Probleem, Doel, Acties
###Zorgplan:
'''

inputs = tokenizer(prompt, return_tensors="pt",
                   return_attention_mask=False,
                   padding=True, truncation=True)

inputs.to('cuda')
outputs = trained_model.generate(**inputs, max_length=2048)
text = tokenizer.batch_decode(outputs,skip_special_tokens=True)[0]
print(text)

In [None]:
prompt = f'''
###Rapportages:
Maandag
S/ Ik voel me zo ziek
O/ Mw ziet er grauw uit. Controles gedaan, heeft koorts
P/ Arts waarschuwen
Dinsdag
Arts is geweest, heeft haar antibiotica gegeven bij een longontsteking
Woensdag
Mw is erg onrustig. Het ziet er niet goed uit. Familie gebeld, die zullen komen.
Donderdag
Reutelende ademhaling, lage saturatie. Arts is geweest, mw is in de laatste fase. Krijgt wisselligging ter preventie decubitus en morfine en dormicum voor de benauwdheid en onrust.
###Instructies:
Schrijf een zorgplan voor de drie belangrijkste zorgproblemen op basis van de rapportages.
Formaat: Probleem, Doel, Acties
###Zorgplan:
'''

inputs = tokenizer(prompt, return_tensors="pt",
                   return_attention_mask=False,
                   padding=True, truncation=True)

inputs.to('cuda')
outputs = trained_model.generate(**inputs, max_length=1000)
text = tokenizer.batch_decode(outputs,skip_special_tokens=True)[0]
print(text)