# **Single Node Finetuning of Tiny LLama using Intel Xeon SPR**

## Prepare Environment

Run the following commands line by line in a terminal:

In [None]:
conda create -n itrex-1 python=3.10 -y
conda activate itrex-1
pip install intel-extension-for-transformers
git clone https://github.com/eternalflame02/Single-Node-FInetuning-of-Tiny-LLama-using-Intel-Xeon-SPR.git
cd "./Single-Node-FInetuning-of-Tiny-LLama-using-Intel-Xeon-SPR/Fine Tuning/"
pip install -r requirements.txt
huggingface-cli login
python3 -m pip install jupyter ipykernel
python3 -m ipykernel install --name neural-chat --user

Create an access token at https://huggingface.co/settings/tokens and paste it into the Hugging Face login prompt.

Then run the rest of the cells.

In [None]:
%cd ./Single-Node-FInetuning-of-Tiny-LLama-using-Intel-Xeon-SPR/Fine Tuning/
!pip install -r requirements.txt
%cd ../../../

## Prepare the Dataset

Text Generation (General domain instruction): We use the Alpaca dataset (https://github.com/tatsu-lab/stanford_alpaca) from Stanford University as the general-domain dataset for finetuning the model. The dataset is provided as a single JSON file, alpaca_data.json. For Alpaca, researchers manually crafted 175 seed tasks to guide text-davinci-003 in generating 52K instruction-following examples for diverse tasks.

In [None]:
!curl -L -O https://github.com/tatsu-lab/stanford_alpaca/raw/main/alpaca_data.json
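
Each record in the Alpaca file is a JSON object with `instruction`, `input`, and `output` fields. As a quick sanity check before finetuning, the sketch below validates records in that layout and counts how many carry a non-empty `input`. The helper name `summarize_alpaca` and the inline sample records are illustrative only and are not part of the repository.

```python
import json
from pathlib import Path


def summarize_alpaca(records):
    """Validate Alpaca-style records and count those with a non-empty `input`."""
    required = {"instruction", "input", "output"}
    for i, rec in enumerate(records):
        missing = required - rec.keys()
        if missing:
            raise ValueError(f"record {i} is missing fields: {sorted(missing)}")
    with_input = sum(1 for rec in records if rec["input"].strip())
    return {"total": len(records), "with_input": with_input}


# Illustrative sample in the Alpaca layout (not taken from the real file).
sample = [
    {"instruction": "Add the numbers.", "input": "2 and 3", "output": "5"},
    {"instruction": "Name a primary color.", "input": "", "output": "Red"},
]
summary = summarize_alpaca(sample)
print(summary)

# To check the downloaded file instead (assumes it sits in the working directory):
data_path = Path("alpaca_data.json")
if data_path.exists():
    print(summarize_alpaca(json.loads(data_path.read_text(encoding="utf-8"))))
```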

## Finetune Your Chatbot
We employ the LoRA approach to finetune the LLM efficiently.
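
To see where the efficiency comes from: rather than updating a full d × k weight matrix, LoRA trains two low-rank factors of shapes d × r and r × k, so the trainable parameter count for each adapted matrix drops from d·k to r·(d + k). A minimal sketch of that arithmetic follows. The dimensions and rank are illustrative assumptions, not values read from this notebook's configuration.

```python
def lora_param_counts(d, k, r):
    """Return (full, lora) trainable parameter counts for one d x k matrix."""
    full = d * k          # parameters updated by full finetuning
    lora = r * (d + k)    # parameters in the two low-rank factors
    return full, lora


# Illustrative values only: a square 2048 x 2048 projection with rank 8.
full, lora = lora_param_counts(d=2048, k=2048, r=8)
print(f"full: {full}, LoRA: {lora}, fraction trained: {lora / full:.4%}")
```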

Finetune TinyLlama on the Alpaca-format dataset for text generation:

In [None]:
# Imports
from transformers import TrainingArguments
from intel_extension_for_transformers.neural_chat.config import (
    ModelArguments,
    DataArguments,
    FinetuningArguments,
    TextGenerationFinetuningConfig,
)
from intel_extension_for_transformers.neural_chat.chatbot import finetune_model

import os

# Define model arguments
model_args = ModelArguments(model_name_or_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Define data arguments
data_args = DataArguments(train_file="alpaca_data.json", validation_split_percentage=1)

# Define training arguments
training_args = TrainingArguments(
    output_dir='tinyllama',
    overwrite_output_dir=True,
    do_train=True,
    do_eval=True,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    learning_rate=4e-5,  # Adjust learning rate
    weight_decay=0.01,  # Add weight decay
    num_train_epochs=1,  # Increase epochs for better training
    save_steps=500,
    save_total_limit=2,
    logging_steps=100,
    evaluation_strategy="steps",
    load_best_model_at_end=True,
    seed=42,  # Set random seed
    bf16=True,
    # resume_from_checkpoint=True,  # Uncomment to resume from the latest checkpoint in case finetuning fails after a checkpoint has been created.
)

# Define finetuning arguments
finetune_args = FinetuningArguments()

# Create finetuning configuration
finetune_cfg = TextGenerationFinetuningConfig(
    model_args=model_args,
    data_args=data_args,
    training_args=training_args,
    finetune_args=finetune_args,
)

# Start the finetuning process
finetune_model(finetune_cfg)
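
The batch settings above combine: with `per_device_train_batch_size=4` and `gradient_accumulation_steps=2`, each optimizer step on a single process covers 8 examples. The sketch below estimates roughly how many optimizer steps one epoch takes, which helps when reading `logging_steps=100` and `save_steps=500`. The example count of 52,000 is a round figure based on the "52K" dataset size mentioned earlier, and treating `validation_split_percentage=1` as an exact 1% hold-out is an assumption; the library's actual split and step count may differ slightly.

```python
import math


def steps_per_epoch(num_examples, val_percent, per_device_batch, grad_accum, num_processes=1):
    """Estimate the effective batch size and optimizer steps in one epoch."""
    # Assumption: the validation split removes exactly val_percent of the examples.
    train_examples = num_examples - (num_examples * val_percent) // 100
    effective_batch = per_device_batch * grad_accum * num_processes
    return effective_batch, math.ceil(train_examples / effective_batch)


effective_batch, steps = steps_per_epoch(
    num_examples=52_000, val_percent=1, per_device_batch=4, grad_accum=2
)
print(f"effective batch size: {effective_batch}, approx. steps per epoch: {steps}")
```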

After finetuning, make a copy of adapter_config.json and name the copy config.json.
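
The copy step can also be done from a notebook cell. A minimal sketch, assuming the adapter files were written to the `tinyllama` output directory used above; the helper name `duplicate_adapter_config` is hypothetical.

```python
import shutil
from pathlib import Path


def duplicate_adapter_config(model_dir):
    """Copy adapter_config.json to config.json inside model_dir."""
    src = Path(model_dir) / "adapter_config.json"
    dst = Path(model_dir) / "config.json"
    if not src.exists():
        raise FileNotFoundError(f"{src} not found; has finetuning finished?")
    shutil.copyfile(src, dst)  # the original adapter_config.json is left in place
    return dst


# Only perform the copy if the finetuning output is present.
if (Path("tinyllama") / "adapter_config.json").exists():
    print(duplicate_adapter_config("tinyllama"))
```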

## Deploying Chatbot

Customize the chatbot with the newly finetuned LLM.

In [None]:
from intel_extension_for_transformers.neural_chat import build_chatbot  # Import function to build a chatbot
from intel_extension_for_transformers.neural_chat import PipelineConfig  # Import configuration for the chatbot pipeline
from intel_extension_for_transformers.neural_chat.config import LoadingModelConfig  # Import configuration for loading the model

# Create a pipeline configuration specifying the model and loading options
config = PipelineConfig(
    model_name_or_path="./tinyllama",
    loading_config=LoadingModelConfig(
        peft_path="./tinyllama"  # Path to the PEFT model (fine-tuned model)
    )
)

# Build a chatbot instance using the defined configuration
chatbot = build_chatbot(config)

# Generate a response to the user query
query = "Tell me about AI."
response = chatbot.predict(query=query)

# Print the generated response
print(response)