# Finetune Your Chatbot on a Single-Node Xeon SPR

NeuralChat is a customizable chat framework for building your own chatbot within minutes on multiple architectures. This notebook shows how to finetune a chatbot on custom data using a single-node Intel Xeon SPR (Sapphire Rapids) system.

## Prepare Environment

Install Intel Extension for Transformers:

In [None]:
!pip install intel-extension-for-transformers

Clone the repository and install the NeuralChat requirements:

In [None]:
!git clone https://github.com/intel/intel-extension-for-transformers.git

In [None]:
%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/
!pip install -r requirements.txt
%cd ../../../
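
The finetuning cell later in this notebook sets `bf16=True`, which runs fastest when the CPU has native bfloat16 instructions. On Linux, one way to check for them is to look for the `amx_bf16` or `avx512_bf16` flags in `/proc/cpuinfo`. The helper below is a hypothetical convenience function written for this illustration, not part of NeuralChat, and the sample text stands in for the real file.

```python
def bf16_flags(cpuinfo_text):
    """Return the bfloat16-related CPU flags found in /proc/cpuinfo-style text."""
    wanted = {"amx_bf16", "avx512_bf16"}
    found = set()
    for line in cpuinfo_text.splitlines():
        # Each logical CPU has one "flags : ..." line listing its features.
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            found |= wanted & set(value.split())
    return sorted(found)

# Illustrative input; on a real machine, read the text from /proc/cpuinfo instead.
sample_cpuinfo = "processor\t: 0\nflags\t\t: fpu avx512f avx512_bf16 amx_bf16 amx_tile\n"
print(bf16_flags(sample_cpuinfo))
```

An empty result suggests that bfloat16 training may still run but without hardware acceleration.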

## Prepare the Dataset
We select three kinds of datasets to finetune the model for different tasks.

1. Text Generation (General domain instruction): We use the [Alpaca dataset](https://github.com/tatsu-lab/stanford_alpaca) from Stanford University as the general domain dataset to finetune the model. This dataset is provided as a JSON file, [alpaca_data.json](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json). For Alpaca, researchers manually crafted 175 seed tasks to guide `text-davinci-003` in generating 52K instruction-following examples for diverse tasks.

2. Summarization: We use [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail), an English-language dataset containing just over 300k unique news articles written by journalists at CNN and the Daily Mail.

3. Code Generation: To improve the code-generation performance of large language models (LLMs), we use the [theblackcat102/evol-codealpaca-v1](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1) dataset.
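
An Alpaca-format file is a JSON list of records with `instruction`, `input`, and `output` fields. As a minimal sketch, the snippet below writes a one-record file in that layout and checks that every record carries those keys. The file name `tiny_alpaca.json`, the sample record, and the `validate_alpaca` helper are made up for illustration.

```python
import json
import os
import tempfile

# Field names used by alpaca_data.json; `input` may be an empty string.
ALPACA_KEYS = {"instruction", "input", "output"}

def validate_alpaca(path):
    """Return the number of records, raising ValueError on a malformed file."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of records")
    for i, rec in enumerate(records):
        missing = ALPACA_KEYS - set(rec)
        if missing:
            raise ValueError(f"record {i} is missing {sorted(missing)}")
    return len(records)

# Hypothetical sample data, written to a temporary directory.
sample = [
    {
        "instruction": "Define compound interest.",
        "input": "",
        "output": "Interest earned on both the principal and previously accrued interest.",
    },
]
sample_path = os.path.join(tempfile.mkdtemp(), "tiny_alpaca.json")
with open(sample_path, "w", encoding="utf-8") as f:
    json.dump(sample, f, indent=2)

num_records = validate_alpaca(sample_path)
print(num_records)
```

Running a check like this on a custom training file before finetuning can catch missing fields early.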



## Finetune Your Chatbot

We employ the [LoRA approach](https://arxiv.org/pdf/2106.09685.pdf) to finetune the LLM efficiently.
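
LoRA freezes each adapted weight matrix of shape `d_out x d_in` and trains two low-rank factors, `B` (`d_out x r`) and `A` (`r x d_in`), which adds `r * (d_in + d_out)` trainable parameters per matrix. The sketch below reproduces the `trainable params: 4,194,304` figure from the training log in this notebook, under the assumption that rank-8 adapters are attached to the query and value projections in all 32 layers of Llama-2-7B (hidden size 4096). The rank and the target modules are assumptions that happen to match the logged count; they are not read from the notebook's configuration.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters for one adapted matrix: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

HIDDEN_SIZE = 4096       # Llama-2-7B hidden size
NUM_LAYERS = 32          # Llama-2-7B decoder layers
RANK = 8                 # assumed LoRA rank
ADAPTED_PER_LAYER = 2    # assumed targets: query and value projections

trainable = NUM_LAYERS * ADAPTED_PER_LAYER * lora_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
all_params = 6_742_609_920  # "all params" value from the training log

print(f"{trainable:,}")
print(f"{100 * trainable / all_params:.4f}%")
```

Under these assumptions, only about 0.06% of the parameters receive gradients, which is what keeps the finetuning step tractable on a single CPU node.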

Finetune the model on an Alpaca-format dataset for text generation:

In [1]:
from transformers import TrainingArguments
from intel_extension_for_transformers.neural_chat.config import (
    ModelArguments,
    DataArguments,
    FinetuningArguments,
    TextGenerationFinetuningConfig,
)
from intel_extension_for_transformers.neural_chat.chatbot import finetune_model

# Base model and Alpaca-format training file; 1% of the data is held out for evaluation.
model_args = ModelArguments(model_name_or_path="meta-llama/Llama-2-7b-hf")
data_args = DataArguments(train_file="shortened_finance_alpaca.json", validation_split_percentage=1)
training_args = TrainingArguments(
    output_dir='./tmp',
    do_train=True,
    do_eval=True,
    num_train_epochs=1,
    overwrite_output_dir=True,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,  # effective train batch size: 8 * 2 = 16
    save_strategy="no",             # no intermediate checkpoints
    log_level="info",
    save_total_limit=2,             # has no effect while save_strategy is "no"
    bf16=True,                      # train in bfloat16
    learning_rate=5e-5,
)
# Use the default finetuning arguments.
finetune_args = FinetuningArguments()
finetune_cfg = TextGenerationFinetuningConfig(
    model_args=model_args,
    data_args=data_args,
    training_args=training_args,
    finetune_args=finetune_args,
)
finetune_model(finetune_cfg)

2024-07-07 13:13:01.011953: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-07-07 13:13:01.770192: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:479] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-07 13:13:01.972084: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:10575] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-07 13:13:01.972172: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1442] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-07 13:13:02.153124: I tensorflow/core/platform/cpu_feature_gua

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

[INFO|modeling_utils.py:4280] 2024-07-07 13:13:52,163 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4288] 2024-07-07 13:13:52,165 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Llama-2-7b-hf.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:917] 2024-07-07 13:13:52,257 >> loading configuration file generation_config.json from cache at /home/u5967164adf7529c9c911b5ad430e65f/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d292323805ebc428287df4f9/generation_config.json
[INFO|configuration_utils.py:962] 2024-07-07 13:13:52,258 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 2,
  "max_length": 4096,
  "pad_token_id": 0,
  "temperature": 0.6,
  "top_p": 0.9
}

Loadin

trainable params: 4,194,304 || all params: 6,742,609,920 || trainable%: 0.06220594176090199


[INFO|trainer.py:2078] 2024-07-07 13:13:52,925 >> ***** Running training *****
[INFO|trainer.py:2079] 2024-07-07 13:13:52,926 >>   Num examples = 1,188
[INFO|trainer.py:2080] 2024-07-07 13:13:52,927 >>   Num Epochs = 1
[INFO|trainer.py:2081] 2024-07-07 13:13:52,928 >>   Instantaneous batch size per device = 8
[INFO|trainer.py:2084] 2024-07-07 13:13:52,929 >>   Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:2085] 2024-07-07 13:13:52,929 >>   Gradient Accumulation steps = 2
[INFO|trainer.py:2086] 2024-07-07 13:13:52,930 >>   Total optimization steps = 74
[INFO|trainer.py:2087] 2024-07-07 13:13:52,932 >>   Number of trainable parameters = 4,194,304
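
The step count in the log above follows from the dataset size, the batch size, and gradient accumulation. The arithmetic below is a sketch that assumes a single training process and one optimizer update per full group of `gradient_accumulation_steps` batches, with the trailing partial group dropped; under those assumptions it reproduces the logged 74 steps and the `epoch = 0.9933` value reported in the eval metrics.

```python
import math

num_examples = 1188      # "Num examples" from the log
per_device_batch = 8     # per_device_train_batch_size
grad_accum = 2           # gradient_accumulation_steps
num_devices = 1          # assumption: a single training process
epochs = 1               # num_train_epochs

# Examples consumed per optimizer update.
total_batch = per_device_batch * grad_accum * num_devices
# Mini-batches per epoch; the last one is only partially filled.
batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
# Full accumulation groups per epoch; the leftover batch triggers no update.
updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
total_steps = updates_per_epoch * epochs
# Fraction of an epoch covered by the batches that contributed to an update.
epoch_fraction = total_steps * grad_accum / batches_per_epoch

print(total_batch, batches_per_epoch, total_steps, round(epoch_fraction, 4))
```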




[INFO|trainer.py:2329] 2024-07-07 14:32:16,611 >> 

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:3410] 2024-07-07 14:32:16,616 >> Saving model checkpoint to ./tmp
[INFO|tokenization_utils_base.py:2513] 2024-07-07 14:32:16,651 >> tokenizer config file saved in ./tmp/tokenizer_config.json
[INFO|tokenization_utils_base.py:2522] 2024-07-07 14:32:16,653 >> Special tokens file saved in ./tmp/special_tokens_map.json
2024-07-07 14:32:16,674 - finetuning.py - intel_extension_for_transformers.transformers.llm.finetuning.finetuning - INFO - *** Evaluate After Training***
[INFO|trainer.py:3719] 2024-07-07 14:32:16,680 >> ***** Running Evaluation *****
[INFO|trainer.py:3721] 2024-07-07 14:32:16,681 >>   Num examples = 12
[INFO|trainer.py:3724] 2024-07-07 14:32:16,681 >>   Batch size = 4


***** eval metrics *****
  epoch                   =     0.9933
  eval_loss               =     0.5934
  eval_ppl                =       1.81
  eval_runtime            = 0:00:10.90
  eval_samples            =         12
  eval_samples_per_second =        1.1
  eval_steps_per_second   =      0.275
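
The evaluation metrics above are related to one another: perplexity is the exponential of the mean cross-entropy loss, and the throughput figures are the sample and batch counts divided by the runtime. The sketch below recomputes them from the logged values. The total of 1,200 records is inferred from the log (1,188 training plus 12 evaluation examples) rather than stated anywhere in the notebook.

```python
import math

eval_loss = 0.5934       # eval_loss from the log
eval_samples = 12        # eval_samples from the log
train_samples = 1188     # "Num examples" from the training log
eval_batch_size = 4      # per_device_eval_batch_size
eval_runtime_s = 10.90   # eval_runtime (0:00:10.90) in seconds

# Perplexity is exp(mean cross-entropy loss).
eval_ppl = math.exp(eval_loss)
# Number of evaluation batches and the resulting throughput.
eval_steps = math.ceil(eval_samples / eval_batch_size)
samples_per_second = eval_samples / eval_runtime_s
steps_per_second = eval_steps / eval_runtime_s
# Share of the data held out, consistent with validation_split_percentage=1.
split_pct = 100 * eval_samples / (eval_samples + train_samples)

print(round(eval_ppl, 2), round(samples_per_second, 1), round(steps_per_second, 3), split_pct)
```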


In [4]:
!pip install huggingface_hub


Defaulting to user installation because normal site-packages is not writeable


Upload the finetuned model to the Hugging Face Hub:

In [None]:
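from huggingface_hub import HfApi, upload_folder

# Requires prior authentication, e.g. via `huggingface-cli login`.
repo_name = "madanarnav/finance-llama-v4"  # target repository on the Hub
model_path = "./tmp"                       # output_dir used during finetuning

api = HfApi()

# Create the repository; exist_ok=True avoids an error when re-running this cell.
api.create_repo(repo_name, private=False, exist_ok=True)

# Upload every file in the output directory as a single commit.
upload_folder(
    folder_path=model_path,
    repo_id=repo_name,
    repo_type="model",
    commit_message="Initial commit of the fine-tuned model"
)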

