## Check GPU Availability

In [1]:
!nvidia-smi

Sat Jul 22 06:51:24 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces
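
The same check can be done from Python, which is handy when a later cell should adapt to the available memory. The sketch below assumes `nvidia-smi` is on the `PATH`. The helper names `parse_gpu_csv` and `query_gpus` are hypothetical and not part of the original notebook.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse `name, memory.total, memory.used` CSV rows (no header, no units)."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        name, total, used = [field.strip() for field in line.split(",")]
        gpus.append({
            "name": name,
            "memory_total_mib": int(total),
            "memory_used_mib": int(used),
        })
    return gpus


def query_gpus():
    """Return one dict per visible GPU, or an empty list if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []
    cmd = [
        "nvidia-smi",
        "--query-gpu=name,memory.total,memory.used",
        "--format=csv,noheader,nounits",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return parse_gpu_csv(result.stdout)
    except (subprocess.CalledProcessError, OSError, ValueError):
        # Driver missing, command failed, or a field was not numeric.
        return []


# A row shaped like the Tesla T4 entry in the table above.
print(parse_gpu_csv("Tesla T4, 15360, 0"))
print(query_gpus())
```

On a machine without an NVIDIA driver, `query_gpus()` simply returns an empty list instead of raising.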

## Install required libraries

In [2]:
!pip install trl transformers accelerate git+https://github.com/huggingface/peft.git -Uqqq
!pip install datasets bitsandbytes einops wandb -Uqqq

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done

## Importing libraries

In [3]:
import torch
from pprint import pprint
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, GenerationConfig
from peft import LoraConfig, get_peft_model, PeftConfig, PeftModel, prepare_model_for_kbit_training
from trl import SFTTrainer
import warnings
warnings.filterwarnings("ignore")

In [4]:
from huggingface_hub import notebook_login
notebook_login()


## Load custom Mental Health conversational dataset

In [5]:
data = load_dataset("heliosbrahma/mental_health_conversational_dataset")
data

Downloading readme:   0%|          | 0.00/373 [00:00<?, ?B/s]

Downloading and preparing dataset None/None to /root/.cache/huggingface/datasets/heliosbrahma___parquet/heliosbrahma--mental_health_conversational_dataset-e2d806934b5174cc/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7...


Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/60.9k [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

Generating train split:   0%|          | 0/154 [00:00<?, ? examples/s]

Dataset parquet downloaded and prepared to /root/.cache/huggingface/datasets/heliosbrahma___parquet/heliosbrahma--mental_health_conversational_dataset-e2d806934b5174cc/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7. Subsequent calls will reuse this data.


  0%|          | 0/1 [00:00<?, ?it/s]

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 154
    })
})

In [6]:
pprint(data["train"][0])

{'text': '<<<HUMAN>>>: What is a panic attack? <<<ASSISTANT>>>: Panic attacks '
         'come on suddenly and involve intense and often overwhelming fear. '
         'They’re accompanied by very challenging physical symptoms, like a '
         'racing heartbeat, shortness of breath, or nausea. Unexpected panic '
         'attacks occur without an obvious cause. Expected panic attacks are '
         'cued by external stressors, like phobias. Panic attacks can happen '
         'to anyone, but having more than one may be a sign of panic disorder, '
         'a mental health condition characterized by sudden and repeated panic '
         'attacks.'}
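
Each record stores a whole exchange in a single `text` field, with `<<<HUMAN>>>:` and `<<<ASSISTANT>>>:` marking the turns. A minimal sketch of splitting such a record back into turns is shown below. The helper name `split_turns` is hypothetical, and the record is a shortened copy of the sample above.

```python
import re

# Role markers as they appear in the sample record above.
TURN_PATTERN = re.compile(r"<<<(HUMAN|ASSISTANT)>>>:\s*")


def split_turns(text):
    """Split a record into (role, utterance) pairs in order of appearance."""
    parts = TURN_PATTERN.split(text)
    # With one capture group, re.split yields
    # [prefix, role, utterance, role, utterance, ...].
    roles = parts[1::2]
    utterances = [part.strip() for part in parts[2::2]]
    return list(zip(roles, utterances))


record = (
    "<<<HUMAN>>>: What is a panic attack? "
    "<<<ASSISTANT>>>: Panic attacks come on suddenly and involve intense "
    "and often overwhelming fear."
)
for role, utterance in split_turns(record):
    print(f"{role}: {utterance}")
```

Text without any marker yields an empty list, which makes malformed records easy to spot.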


## Model Training

In [7]:
model_name = "ybelkada/falcon-7b-sharded-bf16"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load the model weights in 4-bit precision
    bnb_4bit_quant_type="nf4",              # store the quantized weights in the 4-bit NormalFloat (NF4) format
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants (double quantization, as in the QLoRA paper)
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bfloat16 when running computations
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,  # apply the bitsandbytes config above
    device_map="auto",               # let HF Accelerate decide which device each layer is placed on
    trust_remote_code=True,          # Falcon-7B ships custom modelling code that has to be executed
)

Downloading (…)lve/main/config.json:   0%|          | 0.00/1.10k [00:00<?, ?B/s]

Downloading (…)/configuration_RW.py:   0%|          | 0.00/2.61k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b:
- configuration_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


Downloading (…)main/modelling_RW.py:   0%|          | 0.00/47.6k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-7b:
- modelling_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


Downloading (…)model.bin.index.json:   0%|          | 0.00/16.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/8 [00:00<?, ?it/s]

Downloading (…)l-00001-of-00008.bin:   0%|          | 0.00/1.92G [00:00<?, ?B/s]

Downloading (…)l-00002-of-00008.bin:   0%|          | 0.00/1.99G [00:00<?, ?B/s]

Downloading (…)l-00003-of-00008.bin:   0%|          | 0.00/1.91G [00:00<?, ?B/s]

Downloading (…)l-00004-of-00008.bin:   0%|          | 0.00/1.91G [00:00<?, ?B/s]

Downloading (…)l-00005-of-00008.bin:   0%|          | 0.00/1.99G [00:00<?, ?B/s]

Downloading (…)l-00006-of-00008.bin:   0%|          | 0.00/1.91G [00:00<?, ?B/s]

Downloading (…)l-00007-of-00008.bin:   0%|          | 0.00/1.91G [00:00<?, ?B/s]

Downloading (…)l-00008-of-00008.bin:   0%|          | 0.00/921M [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

Downloading (…)neration_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]
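
The shard sizes in the download log give a rough idea of why 4-bit loading matters on a 15360 MiB Tesla T4. The estimate below is back-of-the-envelope arithmetic, not a measurement. It assumes the log reports decimal gigabytes, that every weight is stored in 2 bytes in the bf16 checkpoint, and that every weight ends up in 4 bits. It ignores quantization constants, layers that stay unquantized, activations, and optimizer state.

```python
# Shard sizes in GB as printed in the download log above (the last shard is 921 MB).
shard_sizes_gb = [1.92, 1.99, 1.91, 1.91, 1.99, 1.91, 1.91, 0.921]

BYTES_PER_PARAM_BF16 = 2.0  # 16 bits per weight
BYTES_PER_PARAM_4BIT = 0.5  # 4 bits per weight, ignoring quantization constants

checkpoint_gb = sum(shard_sizes_gb)
approx_params_billion = checkpoint_gb / BYTES_PER_PARAM_BF16
approx_4bit_gb = approx_params_billion * BYTES_PER_PARAM_4BIT

# Total memory of the Tesla T4 reported by nvidia-smi, converted from MiB to GB.
t4_memory_gb = 15360 * 1024**2 / 1e9

print(f"bf16 checkpoint:        {checkpoint_gb:.2f} GB")
print(f"approximate parameters: {approx_params_billion:.2f} billion")
print(f"4-bit weights (approx): {approx_4bit_gb:.2f} GB")
print(f"Tesla T4 memory:        {t4_memory_gb:.2f} GB")
```

Under these assumptions the bf16 weights alone would take most of the card, while the 4-bit weights leave room for activations and the LoRA adapters.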

In [8]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token # Setting pad_token same as eos_token

Downloading (…)okenizer_config.json:   0%|          | 0.00/180 [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.73M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/281 [00:00<?, ?B/s]

In [9]:
# model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

lora_alpha = 32
lora_dropout = 0.05
lora_r = 32

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[         # Setting the Falcon modules that we want to fine-tune as `target_modules`
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h",
    ]
)

peft_model = get_peft_model(model, peft_config)
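
To get a feel for how small the trainable part is, the number of LoRA parameters can be estimated by hand. For a linear layer with `in_features` inputs and `out_features` outputs, a rank-`r` adapter adds two matrices with `r * (in_features + out_features)` entries in total. The layer shapes below are assumptions about the Falcon-7B architecture (hidden size 4544, fused QKV output 4672, 32 decoder layers). They are not stated anywhere in this notebook.

```python
def lora_params(in_features, out_features, r):
    """Parameters added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


r = 32
lora_alpha = 32

# Assumed Falcon-7B dimensions (not taken from this notebook).
n_layers = 32
hidden = 4544
qkv_out = 4672      # fused query/key/value projection with multi-query attention
mlp = 4 * hidden    # width of the feed-forward layer

# (in_features, out_features) for each entry in target_modules.
shapes = {
    "query_key_value": (hidden, qkv_out),
    "dense": (hidden, hidden),
    "dense_h_to_4h": (hidden, mlp),
    "dense_4h_to_h": (mlp, hidden),
}

per_layer = sum(lora_params(i, o, r) for i, o in shapes.values())
total = n_layers * per_layer

print(f"trainable LoRA parameters: {total:,}")
print(f"size in 32-bit floats:     {total * 4 / 1e6:.1f} MB")
print(f"LoRA scaling alpha / r:    {lora_alpha / r}")
```

If the adapter is saved in 32-bit floats, this estimate is in line with the 261M `adapter_model.bin` that appears in the download log of the inference section. Calling `peft_model.print_trainable_parameters()` reports the exact figure for the loaded model.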

In [10]:
output_dir = "./falcon-7b-finetuned-mental-health-conversational"
per_device_train_batch_size = 4  # halve this if you hit an out-of-memory error
gradient_accumulation_steps = 8  # double this when the batch size is halved, to keep the effective batch size at 32
optim = "paged_adamw_32bit"      # paged AdamW optimizer, for better memory management
num_train_epochs = 1             # has no effect here because max_steps is set
save_steps = 10                  # save a checkpoint every 10 steps
logging_steps = 10               # log the training loss every 10 steps
learning_rate = 2e-4             # peak learning rate
max_grad_norm = 0.3              # gradient clipping threshold
max_steps = 180                  # train for 180 optimizer steps; overrides num_train_epochs
warmup_ratio = 0.03              # fraction of the steps used for learning-rate warmup
lr_scheduler_type = "cosine"     # cosine learning-rate schedule

training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    num_train_epochs=num_train_epochs,
    fp16=True,        # mixed precision training
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
    push_to_hub=True,
)
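
The combination of `warmup_ratio=0.03`, `max_steps=180`, and a cosine scheduler determines the learning rate at every step. The function below is a plain-Python sketch of a linear warmup followed by cosine decay to zero, using the values from the cell above. It illustrates the shape of the schedule and is not the library's own implementation. The name `lr_at_step` and the rounding of the warmup length with `math.ceil` are assumptions.

```python
import math


def lr_at_step(step, base_lr=2e-4, max_steps=180, warmup_ratio=0.03):
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay."""
    warmup_steps = math.ceil(max_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 3, 6, 50, 100, 180):
    print(f"step {step:3d}: lr = {lr_at_step(step):.2e}")
```

With these settings the warmup lasts only a handful of steps, so almost all of the 180 steps are spent on the cosine decay.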

In [11]:
trainer = SFTTrainer(
    model=peft_model,
    train_dataset=data['train'],
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_arguments,
)

Map:   0%|          | 0/154 [00:00<?, ? examples/s]

Cloning https://huggingface.co/heliosbrahma/falcon-7b-finetuned-mental-health-conversational into local empty directory.


In [12]:
# Cast the layer norms to torch.bfloat16 for more stable training.
# nn.Module.to() converts the module's parameters in place, so no reassignment is needed.
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module.to(torch.bfloat16)

In [13]:
# authenticate WandB for logging metrics
import wandb
wandb.login()


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True

In [14]:
peft_model.config.use_cache = False
trainer.train()

[34m[1mwandb[0m: Currently logged in as: [33mheliosbrahma[0m. Use [1m`wandb login --relogin`[0m to force relogin


You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step,Training Loss
10,2.0118
20,1.6032
30,1.3713
40,0.9926
50,0.5884
60,0.2928
70,0.1393
80,0.0841
90,0.0657
100,0.0585


TrainOutput(global_step=180, training_loss=0.4217475611302588, metrics={'train_runtime': 5193.6458, 'train_samples_per_second': 1.109, 'train_steps_per_second': 0.035, 'total_flos': 1.812712153934899e+16, 'train_loss': 0.4217475611302588, 'epoch': 36.92})
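
The reported `epoch` of 36.92 may look surprising next to `num_train_epochs = 1`, but it follows from `max_steps = 180` taking precedence. The arithmetic below reproduces the reported figures. It assumes a single GPU and that the epoch counter is the number of micro-batches processed divided by the number of micro-batches in one pass over the 154 examples.

```python
import math

num_examples = 154
per_device_train_batch_size = 4
gradient_accumulation_steps = 8
max_steps = 180
train_runtime_s = 5193.6458  # from the TrainOutput above

# Examples consumed per optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

# Micro-batches in one pass over the data (the last one is partially filled).
micro_batches_per_epoch = math.ceil(num_examples / per_device_train_batch_size)

# Micro-batches processed over the whole run.
micro_batches_run = max_steps * gradient_accumulation_steps

epochs = micro_batches_run / micro_batches_per_epoch
samples_per_second = max_steps * effective_batch_size / train_runtime_s

print(f"effective batch size: {effective_batch_size}")
print(f"epochs covered:       {epochs:.2f}")
print(f"samples per second:   {samples_per_second:.3f}")
```

Seeing each of the 154 examples roughly 37 times is consistent with the training loss falling to about 0.06 by step 100, which suggests the model may be memorizing the dataset.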

In [15]:
trainer.push_to_hub()

To https://huggingface.co/heliosbrahma/falcon-7b-finetuned-mental-health-conversational
   9f4806a..acd5a57  main -> main


To https://huggingface.co/heliosbrahma/falcon-7b-finetuned-mental-health-conversational
   acd5a57..ac2af68  main -> main




'https://huggingface.co/heliosbrahma/falcon-7b-finetuned-mental-health-conversational/commit/acd5a57a18ff83dd9a1c1ffdbf4a25f27a2bfc80'

## Inference Pipeline

In [16]:
# Loading original model
model_name = "ybelkada/falcon-7b-sharded-bf16"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

In [17]:
# Loading PEFT model
PEFT_MODEL = "heliosbrahma/falcon-7b-finetuned-mental-health-conversational"

config = PeftConfig.from_pretrained(PEFT_MODEL)
peft_base_model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

peft_model = PeftModel.from_pretrained(peft_base_model, PEFT_MODEL)

peft_tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
peft_tokenizer.pad_token = peft_tokenizer.eos_token

Downloading (…)/adapter_config.json:   0%|          | 0.00/504 [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]

Downloading adapter_model.bin:   0%|          | 0.00/261M [00:00<?, ?B/s]

In [18]:
# Generate a response from both the original model and the PEFT model so their answers can be compared.
def generate_answer(query):
    system_prompt = """Answer the following question truthfully. If you don't know the answer, respond 'Sorry, I don't know the answer to this question'."""
    user_prompt = f"""<<<HUMAN>>>: {query} <<<ASSISTANT>>>: """
    final_prompt = system_prompt + "\n\n" + user_prompt

    device = "cuda:0"

    # attention_mask is an argument of generate(), not a GenerationConfig field.
    # temperature and top_p only take effect when do_sample=True; without it, decoding is greedy.
    encoding = tokenizer(final_prompt, return_tensors="pt").to(device)
    gen_config = GenerationConfig(max_new_tokens=200, pad_token_id=tokenizer.eos_token_id, eos_token_id=tokenizer.eos_token_id, temperature=0.7, top_p=0.7, num_return_sequences=1)
    outputs = model.generate(input_ids=encoding.input_ids, attention_mask=encoding.attention_mask, generation_config=gen_config)
    text_output = tokenizer.decode(outputs[0], skip_special_tokens=True)

    peft_encoding = peft_tokenizer(final_prompt, return_tensors="pt").to(device)
    peft_gen_config = GenerationConfig(max_new_tokens=200, pad_token_id=peft_tokenizer.eos_token_id, eos_token_id=peft_tokenizer.eos_token_id, temperature=0.7, top_p=0.7, num_return_sequences=1)
    peft_outputs = peft_model.generate(input_ids=peft_encoding.input_ids, attention_mask=peft_encoding.attention_mask, generation_config=peft_gen_config)
    peft_text_output = peft_tokenizer.decode(peft_outputs[0], skip_special_tokens=True)

    dashline = "-" * 49  # same 49-character separator the original join expression produced
    pprint(dashline)
    pprint(f'ORIGINAL MODEL RESPONSE:\n{text_output}')
    pprint(dashline)
    pprint(f'PEFT MODEL RESPONSE:\n{peft_text_output}')
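
Because `decode` is applied to the full output sequence, the printed text repeats the prompt and may run on into further invented turns. A small post-processing step can keep only the first assistant reply. The sketch below works on plain strings. The helper name `extract_answer` is hypothetical, and the sample text is made up for illustration.

```python
HUMAN_TAG = "<<<HUMAN>>>:"
ASSISTANT_TAG = "<<<ASSISTANT>>>:"


def extract_answer(decoded_text):
    """Return the first assistant turn from a decoded generation, or '' if absent."""
    # Drop everything up to and including the first assistant marker.
    _, found, after = decoded_text.partition(ASSISTANT_TAG)
    if not found:
        return ""
    # Cut the reply off where the model starts a new human turn.
    answer, _, _ = after.partition(HUMAN_TAG)
    return answer.strip()


decoded = (
    "Answer the following question truthfully.\n\n"
    "<<<HUMAN>>>: What helps with stress? "
    "<<<ASSISTANT>>>: Regular sleep and exercise can help. "
    "<<<HUMAN>>>: Anything else? <<<ASSISTANT>>>: Talking to someone."
)
print(extract_answer(decoded))
```

Applying this to `text_output` and `peft_text_output` before printing would make the two responses easier to compare side by side.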

In [19]:
query = "How can I prevent anxiety and depression?"
generate_answer(query)

'-------------------------------------------------'
('ORIGINAL MODEL RESPONSE:\n'
 "Answer the following question truthfully. If you don't know the answer, "
 "respond 'Sorry, I don't know the answer to this question'.\n"
 '\n'
 '<<<HUMAN>>>: How can I prevent anxiety and depression? <<<ASSISTANT>>>: (1) '
 "<<<HUMAN>>>: I'm not sure. <<<ASSISTANT>>>: (2) <<<HUMAN>>>: I don't know. "
 "<<<ASSISTANT>>>: (3) <<<HUMAN>>>: Sorry, I don't know the answer to this "
 "question. <<<ASSISTANT>>>: (4) <<<HUMAN>>>: I don't know. <<<ASSISTANT>>>: "
 "(5) <<<HUMAN>>>: Sorry, I don't know the answer to this question. "
 "<<<ASSISTANT>>>: (6) <<<HUMAN>>>: I don't know. <<<ASSISTANT>>>: (7) "
 "<<<HUMAN>>>: Sorry, I don't know the answer to this question. "
 '<<<ASSISTANT>>>: (8) ')
'-------------------------------------------------'
('PEFT MODEL RESPONSE:\n'
 "Answer the following question truthfully. If you don't know the answer, "
 "respond 'Sorry, I don't know the answer to this question'.\n"
 '\n

In [20]:
query = "Can I drink alcohol while taking antidepressants?"
generate_answer(query)

'-------------------------------------------------'
('ORIGINAL MODEL RESPONSE:\n'
 "Answer the following question truthfully. If you don't know the answer, "
 "respond 'Sorry, I don't know the answer to this question'.\n"
 '\n'
 '<<<HUMAN>>>: Can I drink alcohol while taking antidepressants? '
 "<<<ASSISTANT>>>: (1) Yes, (2) No, (3) Sorry, I don't know the answer to this "
 'question.\n'
 '\n'
 "If you answered 'Yes' to the above question, please answer the following "
 "question truthfully. If you don't know the answer, respond 'Sorry, I don't "
 "know the answer to this question'.\n"
 '\n'
 '<<<HUMAN>>>: Can I drink alcohol while taking antidepressants? '
 "<<<ASSISTANT>>>: (1) Yes, (2) No, (3) Sorry, I don't know the answer to this "
 'question.\n'
 '\n'
 "If you answered 'Yes' to the above question, please answer the following "
 "question truthfully. If you don't know the answer, respond 'Sorry, I don't "
 "know the answer to this question'.\n"
 '\n'
 '<<<HUMAN>>>: Can I drink alc

In [21]:
query = "How to take care of mental health?"
generate_answer(query)

'-------------------------------------------------'
('ORIGINAL MODEL RESPONSE:\n'
 "Answer the following question truthfully. If you don't know the answer, "
 "respond 'Sorry, I don't know the answer to this question'.\n"
 '\n'
 '<<<HUMAN>>>: How to take care of mental health? <<<ASSISTANT>>>: (1) '
 '<<<HUMAN>>>: (2) <<<HUMAN>>>: (3) <<<HUMAN>>>: (4) <<<HUMAN>>>: (5) '
 '<<<HUMAN>>>: (6) <<<HUMAN>>>: (7) <<<HUMAN>>>: (8) <<<HUMAN>>>: (9) '
 '<<<HUMAN>>>: (10) <<<HUMAN>>>: (11) <<<HUMAN>>>: (12) <<<HUMAN>>>: (13) '
 '<<<HUMAN>>>: (14) <<<HUMAN>>>: (15) <<<HUMAN>>>: (16) <<<HUMAN>>>: (17) '
 '<<<HUMAN>>>: (18) <<<HUMAN>>>: (19')
'-------------------------------------------------'
('PEFT MODEL RESPONSE:\n'
 "Answer the following question truthfully. If you don't know the answer, "
 "respond 'Sorry, I don't know the answer to this question'.\n"
 '\n'
 '<<<HUMAN>>>: How to take care of mental health? <<<ASSISTANT>>>: “Mental '
 'health” is not a condition like physical health. It is a feel

In [22]:
query = "What is the warning sign of depression?"
generate_answer(query)

'-------------------------------------------------'
('ORIGINAL MODEL RESPONSE:\n'
 "Answer the following question truthfully. If you don't know the answer, "
 "respond 'Sorry, I don't know the answer to this question'.\n"
 '\n'
 '<<<HUMAN>>>: (2) <<<HUMAN>>>: (3) <<<HUMAN>>>: (4) <<<HUMAN>>>: (5) '
 '<<<HUMAN>>>: (6) <<<HUMAN>>>: (7) <<<HUMAN>>>: (8) <<<HUMAN>>>: (9) '
 '<<<HUMAN>>>: (10) <<<HUMAN>>>: (11) <<<HUMAN>>>: (12) <<<HUMAN>>>: (13) '
 '<<<HUMAN>>>: (14) <<<HUMAN>>>: (15) <<<HUMAN>>>: (16) <<<HUMAN>>>: (17) '
 '<<<HUMAN>>>: (18) <<<HUMAN>>>: (19')
'-------------------------------------------------'
('PEFT MODEL RESPONSE:\n'
 "Answer the following question truthfully. If you don't know the answer, "
 "respond 'Sorry, I don't know the answer to this question'.\n"
 '\n'
 '"Depression" can be a mood of deep sadness or anxiety, but if you know the '
 'symptoms, you can get help. One sign of depression is loss of interest in '
 "activities or hobbies you used to enjoy. If this happe

In [23]:
query = "What’s the difference between psychotherapy and counselling?"
generate_answer(query)

'-------------------------------------------------'
('ORIGINAL MODEL RESPONSE:\n'
 "Answer the following question truthfully. If you don't know the answer, "
 "respond 'Sorry, I don't know the answer to this question'.\n"
 '\n'
 '<<<HUMAN>>>: What’s the difference between psychotherapy and counselling? '
 '<<<ASSISTANT>>>: (a) Psychotherapy is a form of counselling. (b) '
 'Psychotherapy is a form of counselling. (c) Counselling is a form of '
 'psychotherapy. (d) Counselling is a form of psychotherapy. (e) Counselling '
 'is a form of psychotherapy. (f) Counselling is a form of psychotherapy. (g) '
 'Counselling is a form of psychotherapy. (h) Counselling is a form of '
 'psychotherapy. (i) Counselling is a form of psychotherapy. (j) Counselling '
 'is a form of psychotherapy. (k) Counselling is a form of psychotherapy. (l) '
 'Counselling is a form of psychotherapy. (m) Counselling is a form of '
 'psychotherapy. (n) Counselling is a form of psychotherapy. (o) Counselling '
 'is a fo