# Finetune Your Chatbot on a Single Node Xeon SPR 

NeuralChat is a customizable chat framework designed to create user own chatbot within few minutes on multiple architectures. This notebook will introduce how to finetune your chatbot on the customized data on a single node Xeon SPR.

## Prepare Environment

Recommend to use Python 3.9 or higher version.

In [None]:
!pip install intel-extension-for-transformers
!git clone https://github.com/intel/intel-extension-for-transformers.git
%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/
!pip install -r requirements.txt

## Prepare the Dataset
We select 3 kind of datasets to conduct the finetuning process for different tasks.

1. Text Generation (General domain instruction): We use the [Alpaca dataset](https://github.com/tatsu-lab/stanford_alpaca) from Stanford University as the general domain dataset to fine-tune the model. This dataset is provided in the form of a JSON file, [alpaca_data.json](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json). In Alpaca, researchers have manually crafted 175 seed tasks to guide `text-davinci-003` in generating 52K instruction data for diverse tasks.

2. Summarization: An English-language dataset [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail) containing just over 300k unique news articles as written by journalists at CNN and the Daily Mail, is used for this task.

3. Code Generation: To enhance code performance of LLMs (Large Language Models), we use the [theblackcat102/evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1).



In [2]:
!curl -OL https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 21.7M  100 21.7M    0     0  29.0M      0 --:--:-- --:--:-- --:--:-- 29.1M


## Finetune Your Chatbot

We employ the [LoRA approach](https://arxiv.org/pdf/2106.09685.pdf) to finetune the LLM efficiently.

Finetune the model on Alpaca-format dataset to conduct text generation:

In [3]:
from transformers import TrainingArguments
from intel_extension_for_transformers.neural_chat.config import (
    ModelArguments,
    DataArguments,
    FinetuningArguments,
    TextGenerationFinetuningConfig,
)
from intel_extension_for_transformers.neural_chat.chatbot import finetune_model
model_args = ModelArguments(model_name_or_path="/models/llama-7b-hf/")
data_args = DataArguments(train_file="alpaca_data.json")
training_args = TrainingArguments(
    output_dir='./tmp',
    do_train=True,
    do_eval=True,
    num_train_epochs=3,
    overwrite_output_dir=True,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    save_strategy="no",
    log_level="info",
    save_total_limit=2,
    bf16=True,
)
finetune_args = FinetuningArguments()
finetune_cfg = TextGenerationFinetuningConfig(
            model_args=model_args,
            data_args=data_args,
            training_args=training_args,
            finetune_args=finetune_args,
        )
finetune_model(finetune_cfg)

[INFO|training_args.py:1327] 2023-10-26 22:49:00,225 >> Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by safetensors please feel free to open an issue at https://github.com/huggingface/safetensors!
[INFO|training_args.py:1769] 2023-10-26 22:49:00,227 >> PyTorch: setting up devices
[INFO|training_args.py:1480] 2023-10-26 22:49:00,228 >> The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args.py:1327] 2023-10-26 22:49:00,232 >> Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by saf

Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading took 0.0 min
2023-10-26 22:49:00,704 - datasets.download.download_manager - INFO - Downloading took 0.0 min
Checksum Computation took 0.0 min
2023-10-26 22:49:00,709 - datasets.download.download_manager - INFO - Checksum Computation took 0.0 min


Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

Generating train split
2023-10-26 22:49:00,720 - datasets.builder - INFO - Generating train split


Generating train split: 0 examples [00:00, ? examples/s]

Unable to verify splits sizes.
2023-10-26 22:49:01,015 - datasets.utils.info_utils - INFO - Unable to verify splits sizes.
Dataset json downloaded and prepared to /home/tensorflow/.cache/huggingface/datasets/json/default-b2a0fbbcd7ed5552/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96. Subsequent calls will reuse this data.
2023-10-26 22:49:01,017 - datasets.builder - INFO - Dataset json downloaded and prepared to /home/tensorflow/.cache/huggingface/datasets/json/default-b2a0fbbcd7ed5552/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96. Subsequent calls will reuse this data.
Using custom data configuration default-b2a0fbbcd7ed5552
2023-10-26 22:49:01,364 - datasets.builder - INFO - Using custom data configuration default-b2a0fbbcd7ed5552
Loading Dataset Infos from /home/tensorflow/miniconda3/lib/python3.9/site-packages/datasets/packaged_modules/json
2023-10-26 22:49:01,366 - datasets.info - INFO - Loading Dataset Infos from /home/tensorflow/

Loading checkpoint shards:   0%|          | 0/33 [00:00<?, ?it/s]

[INFO|modeling_utils.py:3551] 2023-10-26 22:49:19,849 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:3559] 2023-10-26 22:49:19,850 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /models/llama-7b-hf/.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:728] 2023-10-26 22:49:19,856 >> loading configuration file /models/llama-7b-hf/generation_config.json
[INFO|configuration_utils.py:768] 2023-10-26 22:49:19,857 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0,
  "transformers_version": "4.32.1"
}



Map:   0%|          | 0/52002 [00:00<?, ? examples/s]

Caching processed dataset at /home/tensorflow/.cache/huggingface/datasets/json/default-b2a0fbbcd7ed5552/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-3385e6533503a56f.arrow
2023-10-26 22:49:22,321 - datasets.arrow_dataset - INFO - Caching processed dataset at /home/tensorflow/.cache/huggingface/datasets/json/default-b2a0fbbcd7ed5552/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-3385e6533503a56f.arrow
2023-10-26 22:50:08,772 - intel_extension_for_transformers.llm.finetuning.finetuning - INFO - Splitting train dataset in train and validation according to `eval_dataset_size`
Caching indices mapping at /home/tensorflow/.cache/huggingface/datasets/json/default-b2a0fbbcd7ed5552/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-a294c087a88aacdd.arrow
2023-10-26 22:50:08,776 - datasets.arrow_dataset - INFO - Caching indices mapping at /home/tensorflow/.cache/huggingface/datasets/json/default-b2a0fbbcd7ed55

trainable params: 19,988,480 || all params: 6,758,404,096 || trainable%: 0.2957573965106688


[INFO|trainer.py:1714] 2023-10-26 22:51:12,473 >> ***** Running training *****
[INFO|trainer.py:1715] 2023-10-26 22:51:12,474 >>   Num examples = 51,502
[INFO|trainer.py:1716] 2023-10-26 22:51:12,475 >>   Num Epochs = 3
[INFO|trainer.py:1717] 2023-10-26 22:51:12,475 >>   Instantaneous batch size per device = 4
[INFO|trainer.py:1720] 2023-10-26 22:51:12,475 >>   Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:1721] 2023-10-26 22:51:12,476 >>   Gradient Accumulation steps = 2
[INFO|trainer.py:1722] 2023-10-26 22:51:12,476 >>   Total optimization steps = 19,314
[INFO|trainer.py:1723] 2023-10-26 22:51:12,481 >>   Number of trainable parameters = 19,988,480


Step,Training Loss


KeyboardInterrupt: 

Finetune the model on the summarization task:

In [4]:
from transformers import TrainingArguments
from intel_extension_for_transformers.neural_chat.config import (
    ModelArguments,
    DataArguments,
    FinetuningArguments,
    TextGenerationFinetuningConfig,
)
from intel_extension_for_transformers.neural_chat.chatbot import finetune_model
model_args = ModelArguments(model_name_or_path="/models/llama-7b-hf/")
data_args = DataArguments(dataset_name="cnn_dailymail", dataset_config_name="3.0.0")
training_args = TrainingArguments(
    output_dir='./tmp',
    do_train=True,
    do_eval=True,
    num_train_epochs=3,
    overwrite_output_dir=True,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    save_strategy="no",
    log_level="info",
    save_total_limit=2,
    bf16=True
)
finetune_args = FinetuningArguments()
finetune_cfg = TextGenerationFinetuningConfig(
            model_args=model_args,
            data_args=data_args,
            training_args=training_args,
            finetune_args=finetune_args,
        )
finetune_model(finetune_cfg)

[INFO|training_args.py:1327] 2023-10-26 22:53:50,922 >> Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by safetensors please feel free to open an issue at https://github.com/huggingface/safetensors!
[INFO|training_args.py:1769] 2023-10-26 22:53:50,924 >> PyTorch: setting up devices
[INFO|training_args.py:1480] 2023-10-26 22:53:50,925 >> The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args.py:1327] 2023-10-26 22:53:50,927 >> Found safetensors installation, but --save_safetensors=False. Safetensors should be a preferred weights saving format due to security and performance reasons. If your model cannot be saved by saf

Loading checkpoint shards:   0%|          | 0/33 [00:00<?, ?it/s]

[INFO|modeling_utils.py:3551] 2023-10-26 22:54:08,116 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:3559] 2023-10-26 22:54:08,117 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /models/llama-7b-hf/.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:728] 2023-10-26 22:54:08,123 >> loading configuration file /models/llama-7b-hf/generation_config.json
[INFO|configuration_utils.py:768] 2023-10-26 22:54:08,123 >> Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 1,
  "pad_token_id": 0,
  "transformers_version": "4.32.1"
}



KeyError: 'instruction'

Finetune the model on the code generation task:

In [None]:
from transformers import TrainingArguments
from intel_extension_for_transformers.neural_chat.config import (
    ModelArguments,
    DataArguments,
    FinetuningArguments,
    TextGenerationFinetuningConfig,
)
from intel_extension_for_transformers.neural_chat.chatbot import finetune_model
model_args = ModelArguments(model_name_or_path="/models/llama-7b-hf/")
data_args = DataArguments(dataset_name="theblackcat102/evol-codealpaca-v1")
training_args = TrainingArguments(
    output_dir='./tmp',
    do_train=True,
    do_eval=True,
    num_train_epochs=3,
    overwrite_output_dir=True,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    save_strategy="no",
    log_level="info",
    save_total_limit=2,
    bf16=True
)
finetune_args = FinetuningArguments()
finetune_cfg = TextGenerationFinetuningConfig(
            model_args=model_args,
            data_args=data_args,
            training_args=training_args,
            finetune_args=finetune_args,
        )
finetune_model(finetune_cfg)