# QuickStart: Intel® Extension for Transformers*: NeuralChat on 4th Generation Intel® Xeon® Scalable Processors

## Prepare Environment

Follow the README to install the requirements for this tutorial. In summary, you will need to install the following:

- Intel® Extension for Transformers* from source (to get the latest updates)
- NeuralChat requirements
- Retrieval plugin requirements
- Audio plugin (TTS and ASR) requirements
  

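Check the hardware. On a 4th Generation Intel® Xeon® Scalable processor, the `Flags` line printed by `lscpu` should include `amx_tile`, `amx_bf16`, `amx_int8`, and `avx512_bf16`.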

In [None]:
!lscpu
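
The same check can be scripted instead of read by eye. The following is a minimal sketch, assuming a Linux host where `/proc/cpuinfo` exposes a `flags` line; the helper names `parse_cpu_flags` and `report_flags` are hypothetical and not part of NeuralChat.

```python
from pathlib import Path

# Flags of interest, spelled the way the Linux kernel reports them.
WANTED_FLAGS = ("amx_tile", "amx_bf16", "amx_int8", "avx512_bf16")


def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "flags":
            return set(value.split())
    return set()  # no "flags" line found (for example, on a non-x86 host)


def report_flags(cpuinfo_text: str, wanted=WANTED_FLAGS) -> dict:
    """Map each wanted flag to True or False depending on whether it is reported."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():  # only present on Linux
    print(report_flags(cpuinfo.read_text()))
```

If any entry is `False`, the examples below may still run, but likely without the hardware acceleration this quickstart targets.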

## Building a simple chatbot


In [None]:
from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig
config = PipelineConfig(model_name_or_path='Intel/neural-chat-7b-v3-1')
chatbot = build_chatbot(config)
response = chatbot.predict(query="Tell me about Intel Xeon Scalable Processors.")
print(response)

## Optimizing your chatbot
Enable mixed precision with bfloat16

In [None]:
from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig
from intel_extension_for_transformers.transformers import MixedPrecisionConfig
config = PipelineConfig(model_name_or_path='Intel/neural-chat-7b-v3-1',
                        optimization_config=MixedPrecisionConfig(dtype='bfloat16'))
chatbot = build_chatbot(config)
response = chatbot.predict(query="Tell me about Intel Xeon Scalable Processors.")
print(response)
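
To see what bfloat16 trades away, recall that it keeps float32's sign bit and 8-bit exponent but only 7 mantissa bits. The sketch below emulates that conversion with the standard library only. It is an illustration of the number format, not the code path `MixedPrecisionConfig` uses, and the function names are hypothetical. NaN inputs are not handled.

```python
import struct


def to_bfloat16_bits(x: float) -> int:
    """Convert x to float32, then keep the top 16 bits with round-to-nearest-even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF plus the lowest kept bit so that exact ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def from_bfloat16_bits(b: int) -> float:
    """Expand 16 bfloat16 bits back to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]


def round_trip(x: float) -> float:
    """Return the value x would have after being stored as bfloat16."""
    return from_bfloat16_bits(to_bfloat16_bits(x))


for value in (1.0, 3.14159, 1e38):
    print(value, "->", round_trip(value))
```

The range of float32 survives (values near `1e38` remain representable), while precision drops to roughly two or three significant decimal digits.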

INT4 weight-only quantization

In [None]:
from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig
from intel_extension_for_transformers.transformers import WeightOnlyQuantConfig
from intel_extension_for_transformers.neural_chat.config import LoadingModelConfig
config = PipelineConfig(model_name_or_path="Intel/neural-chat-7b-v3-1",
                        optimization_config=WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="int4_fullrange"), 
                        loading_config=LoadingModelConfig(use_neural_speed=False))
chatbot = build_chatbot(config)
response = chatbot.predict(query="Tell me about Intel Xeon Scalable Processors.")
print(response)
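
The idea behind weight-only quantization can be sketched in a few lines: weights are stored as small integers plus a per-group scale, and are expanded back to floating point when used. The example below is a generic symmetric 4-bit scheme written for illustration. It is an assumption-laden toy, not the algorithm behind `WeightOnlyQuantConfig` or `int4_fullrange`, and the function names and `group_size` value are hypothetical.

```python
def quantize_group(weights):
    """Quantize one group of floats to signed 4-bit codes in [-8, 7] plus a scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 7 if max_abs > 0 else 1.0  # largest magnitude maps to +/-7
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes, scale):
    """Recover approximate float weights from 4-bit codes and their scale."""
    return [c * scale for c in codes]


def quantize_weights(weights, group_size=4):
    """Quantize a flat list group by group; returns a list of (codes, scale) pairs."""
    return [
        quantize_group(weights[i:i + group_size])
        for i in range(0, len(weights), group_size)
    ]


# Made-up weights, chosen only to show two groups with different magnitudes.
weights = [0.62, -0.31, 0.05, 0.0, 1.8, -2.4, 0.9, 0.3]
groups = quantize_weights(weights, group_size=4)
restored = [v for codes, scale in groups for v in dequantize_group(codes, scale)]
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(groups)
print(f"max abs error: {max_error:.4f}")
```

Because each group carries its own scale, a group of small weights is not forced onto the coarse grid needed by a group of large ones.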

## Customizing your chatbot
### Plugin: Retrieval
Without the retrieval plugin

In [None]:
from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig
config = PipelineConfig(model_name_or_path='Intel/neural-chat-7b-v3-1')
chatbot = build_chatbot(config)
response = chatbot.predict(query="How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?")
print(response)

With the retrieval plugin

In [None]:
!mkdir -p docs
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/docs/sample.txt
!mv sample.txt ./docs

In [None]:
!cat ./docs/sample.txt

In [None]:
from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig
from intel_extension_for_transformers.neural_chat import plugins
# Turn on the retrieval plugin and point it at the downloaded document
plugins.retrieval.enable = True
plugins.retrieval.args["input_path"] = "./docs/sample.txt"
config = PipelineConfig(model_name_or_path='Intel/neural-chat-7b-v3-1',
                        plugins=plugins)
chatbot = build_chatbot(config)
response = chatbot.predict(query="How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?")
print(response)

plugins.retrieval.enable=False # disable retrieval
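
Conceptually, retrieval finds the passages most relevant to the query and places them in the prompt, so the model can answer from the document rather than from memory. The sketch below shows that flow with plain keyword overlap. It is a toy stand-in and makes no claim about how the NeuralChat retrieval plugin ranks or stores documents. All function names, the prompt wording, and the sample passages are made up for illustration.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9+]+", text.lower()))


def split_into_chunks(document, max_words=40):
    """Split a document into consecutive chunks of at most max_words words."""
    words = document.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def retrieve(query, chunks, top_k=1):
    """Return the top_k chunks sharing the most tokens with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(query_tokens & tokenize(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, chunks, top_k=1):
    """Prepend the retrieved context to the question."""
    context = "\n".join(retrieve(query, chunks, top_k))
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Fictional passages, used only to exercise the functions.
chunks = [
    "The Widget X100 ships with 12 ports.",
    "Bananas are rich in potassium.",
]
print(build_prompt("How many ports does the Widget X100 have?", chunks))
```

Production retrievers typically replace the keyword overlap with embedding similarity, but the prompt-assembly step has the same shape.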

### Plugin: ASR & TTS
Enable voice chat

In [None]:
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav

In [None]:
from intel_extension_for_transformers.neural_chat import build_chatbot, PipelineConfig
from intel_extension_for_transformers.neural_chat import plugins
plugins.tts.enable = True
plugins.tts.args["output_audio_path"] = "./response.wav"
plugins.asr.enable = True

config = PipelineConfig(model_name_or_path='Intel/neural-chat-7b-v3-1',
                        plugins=plugins)
chatbot = build_chatbot(config)
result = chatbot.predict(query="./sample.wav")
print(result)

plugins.tts.enable = False
plugins.asr.enable = False
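
Before passing your own recording to the ASR plugin, it can help to inspect its basic properties. The sketch below uses Python's standard `wave` module, which reads uncompressed PCM WAV files. The helper names are hypothetical, and the 16 kHz mono test file is only an assumption chosen for the demonstration, not a documented requirement of the plugin.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return channel count, sample width, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def write_silence(path, seconds=1, rate=16000):
    """Write a mono 16-bit WAV file containing only silence (for demonstration)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (rate * seconds))


# Demonstrate on a generated file so the example runs without sample.wav.
demo_path = os.path.join(tempfile.mkdtemp(), "silence.wav")
write_silence(demo_path)
info = describe_wav(demo_path)
print(info)
```

Calling `describe_wav("./sample.wav")` on the downloaded file works the same way, provided the file is PCM-encoded.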

### [Optional]: Fine-tuning

We use the [Alpaca dataset](https://github.com/tatsu-lab/stanford_alpaca) from Stanford University as the general-domain dataset for fine-tuning. The dataset is provided as a JSON file, [alpaca_data.json](https://github.com/tatsu-lab/stanford_alpaca/blob/main/alpaca_data.json). To build it, the Alpaca researchers manually crafted 175 seed tasks and used them to guide `text-davinci-003` in generating 52K instruction-following examples for diverse tasks.

In [None]:
!curl -OL https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json
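
Each entry in `alpaca_data.json` is an object with `instruction`, `input`, and `output` fields, where `input` may be empty. The sketch below renders one record with the prompt templates published in the Stanford Alpaca repository. Whether NeuralChat's fine-tuning pipeline applies exactly these templates is an assumption, and the sample record is invented for illustration rather than taken from the dataset.

```python
# Prompt templates as published in the Stanford Alpaca repository.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)


def format_record(record):
    """Return (prompt, target) for one Alpaca-style record."""
    template = PROMPT_WITH_INPUT if record.get("input") else PROMPT_NO_INPUT
    return template.format(**record), record["output"]


# Invented record in the Alpaca schema (not an actual dataset entry).
record = {
    "instruction": "Translate the sentence to French.",
    "input": "Good morning.",
    "output": "Bonjour.",
}
prompt, target = format_record(record)
print(prompt)
print(target)
```

To look at real entries, load the downloaded file with `json.load(open("alpaca_data.json"))` and pass any element to `format_record`.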

Fine-tune the model on the Alpaca-format dataset for text generation.

We use the [LoRA approach](https://arxiv.org/pdf/2106.09685.pdf) to fine-tune the LLM efficiently.
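
LoRA is efficient because it freezes the original weight matrix `W` and trains only two small matrices, `A` (rank × inputs) and `B` (outputs × rank), whose scaled product is added to the output of `W`. The sketch below shows that update and the resulting parameter savings in plain Python. It illustrates the idea from the LoRA paper and is not the code path that `finetune_model` runs; the 4096 × 4096 layer size, rank 8, and function names are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, alpha, rank):
    """Compute W x + (alpha / rank) * B (A x); only A and B would be trained."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * l for b, l in zip(base, low_rank)]


def lora_param_counts(d_out, d_in, rank):
    """Return (full fine-tuning parameters, LoRA parameters) for one linear layer."""
    return d_out * d_in, rank * (d_out + d_in)


# B starts at zero in LoRA, so the adapted layer initially matches the frozen one.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[1.0, 0.0]]          # rank 1, shape (rank, d_in)
B_init = [[0.0], [0.0]]   # shape (d_out, rank)
print(lora_forward([1.0, 1.0], W, A, B_init, alpha=2, rank=1))

full, lora = lora_param_counts(4096, 4096, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

For the hypothetical layer above, LoRA trains well under one percent of the parameters that full fine-tuning would update.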

In [None]:
from transformers import TrainingArguments
from intel_extension_for_transformers.neural_chat.config import (
    ModelArguments,
    DataArguments,
    FinetuningArguments,
    TextGenerationFinetuningConfig,
)
from intel_extension_for_transformers.neural_chat.chatbot import finetune_model
model_args = ModelArguments(model_name_or_path="Intel/neural-chat-7b-v3-1")
data_args = DataArguments(train_file="alpaca_data.json")
training_args = TrainingArguments(
    output_dir='./finetuned_model_path',
    do_train=True,
    do_eval=True,
    num_train_epochs=3,
    overwrite_output_dir=True,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=1,
    save_strategy="no",
    log_level="info",
    save_total_limit=2,
    bf16=True
)
finetune_args = FinetuningArguments()
finetune_cfg = TextGenerationFinetuningConfig(
    model_args=model_args,
    data_args=data_args,
    training_args=training_args,
    finetune_args=finetune_args,
)
finetune_model(finetune_cfg)

Load the fine-tuned model

In [None]:
from intel_extension_for_transformers.neural_chat import build_chatbot
from intel_extension_for_transformers.neural_chat import PipelineConfig
from intel_extension_for_transformers.neural_chat.config import LoadingModelConfig

config = PipelineConfig(model_name_or_path="Intel/neural-chat-7b-v3-1",
                        loading_config=LoadingModelConfig(peft_path="./finetuned_model_path"))
chatbot = build_chatbot(config)
response = chatbot.predict(query="Tell me about Intel Xeon Scalable Processors.")
print(response)

### Congratulations! You have completed the NeuralChat quickstart
Visit the [notebooks directory](https://github.com/intel/intel-extension-for-transformers/blob/c30353fcb0e5ceab440a7508b5980ccebcac8750/intel_extension_for_transformers/neural_chat/docs/full_notebooks.md) for more examples.