NeuralChat is a customizable chat framework for building your own chatbot within minutes on multiple architectures. This notebook demonstrates how to build a talking chatbot on Intel® Data Center GPU Flex Series 170, Intel® Data Center GPU Max Series, and Intel® Arc™ A-Series GPUs.

# Prepare Environment

Install requirements

In [None]:
!git clone https://github.com/intel/intel-extension-for-transformers.git

In [None]:
%cd ./intel-extension-for-transformers/
!pip install -r requirements.txt
%cd ./intel_extension_for_transformers/neural_chat/
!pip install -r requirements_xpu.txt

Install Intel® Extension for Transformers*

In [None]:
%cd ../../
!pip install -v .
%cd ..

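Refer to [Download oneapi](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html) to install the Intel® oneAPI Base Toolkit, then source its environment script with the command below. Each `!` command runs in its own subshell, so the variables it sets do not persist into the notebook kernel; it is more reliable to run `source /opt/intel/oneapi/setvars.sh` in the terminal before launching Jupyter.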

In [None]:
!source /opt/intel/oneapi/setvars.sh

Refer to [Install Intel® Extension for PyTorch* from source](https://intel.github.io/intel-extension-for-pytorch/index.html#installation) to build the XPU versions of torch, torchaudio, and Intel® Extension for PyTorch*, then install the generated wheels with pip.
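After installing the wheels, a quick sanity check can confirm that PyTorch sees the GPU before you build the chatbot. This is a minimal sketch assuming the XPU build exposes `torch.xpu` (recent PyTorch releases provide it natively; older ones register it when `intel_extension_for_pytorch` is imported). `check_xpu_stack` is a hypothetical helper, and missing packages are reported rather than raised.

```python
import importlib


def check_xpu_stack():
    """Report PyTorch / IPEX versions and whether an Intel GPU (XPU) is visible.

    Hypothetical helper: missing packages are reported as None, not raised.
    """
    report = {"torch": None, "ipex": None, "xpu_available": False, "devices": []}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return report  # torch is not installed in this environment
    report["torch"] = torch.__version__
    try:
        # On older PyTorch releases, importing IPEX is what registers torch.xpu.
        ipex = importlib.import_module("intel_extension_for_pytorch")
        report["ipex"] = ipex.__version__
    except ImportError:
        pass
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        report["xpu_available"] = True
        report["devices"] = [xpu.get_device_name(i) for i in range(xpu.device_count())]
    return report


print(check_xpu_stack())
```

If `xpu_available` is `False` on a machine with a supported GPU, the oneAPI environment or the XPU wheels are the usual places to look.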

Install the requirements that depend on stock PyTorch. `--no-deps` keeps pip from pulling in stock PyTorch and replacing the XPU builds installed above.

In [None]:
!pip install --no-deps peft speechbrain optimum optimum-intel sentence_transformers lm_eval accelerate

Note: if you hit a "GLIBCXX_3.4.30 not found" error in a conda environment, remove `lib/libstdc++*` from the conda environment.
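Before deleting anything, it helps to see exactly which files the note refers to. A minimal sketch with a hypothetical helper `find_conda_libstdcxx`, assuming the active environment's root is in `$CONDA_PREFIX` (set by `conda activate`) and otherwise falling back to `sys.prefix`; it only lists paths and removes nothing.

```python
import glob
import os
import sys


def find_conda_libstdcxx(prefix=None):
    """List lib/libstdc++* files inside a conda environment (read-only).

    Defaults to $CONDA_PREFIX, then sys.prefix. Deleting the files is left
    to you after reviewing the output.
    """
    prefix = prefix or os.environ.get("CONDA_PREFIX") or sys.prefix
    return sorted(glob.glob(os.path.join(prefix, "lib", "libstdc++*")))


for path in find_conda_libstdcxx():
    print(path)
```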

# Prepare the model

Make sure to request access at https://huggingface.co/meta-llama/Llama-2-7b-chat-hf, then authenticate with a token that has access to this repo, either by logging in with `huggingface-cli login` or by passing `token=<your_token>`.
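If you prefer not to log in interactively, one option is to keep the token in an environment variable and pass it to `huggingface_hub.login`. The variable name `HF_TOKEN` and the helper `login_from_env` are assumptions made for this sketch.

```python
import os


def login_from_env(var="HF_TOKEN"):
    """Log in to the Hugging Face Hub with a token read from an environment variable.

    Hypothetical helper: returns True if a token was found and passed to
    huggingface_hub.login, False if the variable is unset or empty.
    """
    token = os.environ.get(var)
    if not token:
        return False
    # Imported lazily so the check itself works without huggingface_hub.
    from huggingface_hub import login

    login(token=token)
    return True
```

Call `login_from_env()` in a cell before `build_chatbot`; if it returns `False`, fall back to `huggingface-cli login`.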

# Inference 💻

## Text Chat

Give NeuralChat a text instruction and it responds with text.

In [None]:
from intel_extension_for_transformers.neural_chat import build_chatbot
from intel_extension_for_transformers.neural_chat import PipelineConfig
config = PipelineConfig(device='xpu')
chatbot = build_chatbot(config)
response = chatbot.predict("Tell me about Intel Xeon Scalable Processors.")
print(response)
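To compare prompts or devices, you can time each `predict` call. The helper `timed_predict` below is hypothetical; it assumes only what the cell above shows, namely that the chatbot exposes `predict(str)` and returns the response.

```python
import time


def timed_predict(chatbot, prompts):
    """Run each prompt through chatbot.predict and record wall-clock latency.

    Returns a list of (prompt, response, seconds) tuples.
    """
    results = []
    for prompt in prompts:
        start = time.perf_counter()
        response = chatbot.predict(prompt)
        results.append((prompt, response, time.perf_counter() - start))
    return results


# Usage with the chatbot built above:
# for prompt, response, seconds in timed_predict(chatbot, ["Tell me about Intel Xeon Scalable Processors."]):
#     print(f"[{seconds:.1f}s] {prompt}\n{response}\n")
```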

## Text Chat With RAG Plugin

You can also use the NeuralChat RAG plugin for domain-specific chat by feeding it documents, as shown below.

In [None]:
!mkdir docs
%cd docs
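# Percent-encode the spaces and ® in the URL and quote the output name so the shell passes each as one argument.
!curl -L -o "4th Generation Intel® Xeon® Scalable Processors Product Specifications.html" "https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/docs/4th%20Generation%20Intel%C2%AE%20Xeon%C2%AE%20Scalable%20Processors%20Product%20Specifications.html"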
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/docs/sample.jsonl
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/docs/sample.txt
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/docs/sample.xlsx
%cd ..
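The same downloads can be done from Python, which avoids shell quoting of the file name that contains spaces and `®`. This sketch assumes the four sample files still live under `assets/docs/` on the `main` branch; `doc_url` and `download_docs` are hypothetical helpers built on the standard library's `urllib`.

```python
import os
import urllib.parse
import urllib.request

# Base URL of the sample documents downloaded in the cell above.
BASE_URL = (
    "https://raw.githubusercontent.com/intel/intel-extension-for-transformers/"
    "main/intel_extension_for_transformers/neural_chat/assets/docs/"
)

SAMPLE_DOCS = (
    "4th Generation Intel® Xeon® Scalable Processors Product Specifications.html",
    "sample.jsonl",
    "sample.txt",
    "sample.xlsx",
)


def doc_url(filename):
    """Percent-encode a file name (spaces, ®) so it forms a valid URL."""
    return BASE_URL + urllib.parse.quote(filename)


def download_docs(dest="./docs", filenames=SAMPLE_DOCS):
    """Download each sample document into dest, keeping its original file name."""
    os.makedirs(dest, exist_ok=True)
    for name in filenames:
        urllib.request.urlretrieve(doc_url(name), os.path.join(dest, name))
```

Running `download_docs()` writes the files to `./docs/`, the folder later passed to the retrieval plugin as `input_path`.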

In [None]:
from intel_extension_for_transformers.neural_chat import PipelineConfig
from intel_extension_for_transformers.neural_chat import build_chatbot
from intel_extension_for_transformers.neural_chat import plugins
plugins.retrieval.enable = True
plugins.retrieval.args["input_path"] = "./docs/"
config = PipelineConfig(plugins=plugins, device='xpu')
chatbot = build_chatbot(config)
response = chatbot.predict("How many cores does the Intel® Xeon® Platinum 8480+ Processor have in total?")
print(response)
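Conceptually, a retrieval plugin finds passages in the documents under `input_path` that relate to the question and supplies them to the model together with the query. The toy sketch below illustrates only that idea: it swaps in plain word-overlap scoring and a made-up prompt template, and it is not how NeuralChat's plugin ranks documents or formats its prompt. `top_k_chunks` and `build_prompt` are hypothetical names.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_k_chunks(query, chunks, k=2):
    """Toy retriever: rank chunks by how many query words they share."""
    q = Counter(tokenize(query))
    scored = []
    for chunk in chunks:
        c = Counter(tokenize(chunk))
        overlap = sum(min(q[w], c[w]) for w in q)
        scored.append((overlap, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop chunks that share no words with the query.
    return [chunk for score, chunk in scored[:k] if score > 0]


def build_prompt(query, chunks, k=2):
    """Prepend the retrieved context to the question (illustrative template)."""
    context = "\n".join(top_k_chunks(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
```

The real plugin replaces both pieces with its own retriever and prompt, but the flow is the same: retrieve, then generate with the retrieved text in context.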

## Voice Chat with ASR & TTS Plugins

Voice chat supports three modes: audio input with audio output, audio input with text output, and text input with audio output.

In the Python API, choose a mode by enabling or disabling the ASR and TTS plugins.
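The mode-to-flag mapping follows from what each plugin does: ASR (speech recognition) handles audio input and TTS (text-to-speech) handles audio output. The hypothetical helper `set_voice_mode` toggles `plugins.asr.enable` and `plugins.tts.enable` accordingly; a `SimpleNamespace` stands in for the real `plugins` object so the sketch runs without NeuralChat.

```python
from types import SimpleNamespace

# (asr.enable, tts.enable) for each voice chat mode.
VOICE_MODES = {
    "audio_in_audio_out": (True, True),
    "audio_in_text_out": (True, False),
    "text_in_audio_out": (False, True),
}


def set_voice_mode(plugins, mode):
    """Toggle the ASR and TTS plugins on a plugins-like object for the given mode."""
    try:
        asr_on, tts_on = VOICE_MODES[mode]
    except KeyError:
        raise ValueError(f"unknown mode {mode!r}; choose from {sorted(VOICE_MODES)}") from None
    plugins.asr.enable = asr_on
    plugins.tts.enable = tts_on
    return plugins


# Stand-in for neural_chat's plugins object; in the notebook, pass `plugins` instead.
demo = SimpleNamespace(asr=SimpleNamespace(enable=False), tts=SimpleNamespace(enable=False))
set_voice_mode(demo, "audio_in_text_out")
print(demo.asr.enable, demo.tts.enable)  # → True False
```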

In [None]:
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/speaker_embeddings/spk_embed_default.pt
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav

In [None]:
from intel_extension_for_transformers.neural_chat import PipelineConfig
from intel_extension_for_transformers.neural_chat import build_chatbot, plugins
plugins.asr.enable = True
plugins.tts.enable = True
plugins.tts.args["output_audio_path"] = "./output_audio.wav"
config = PipelineConfig(plugins=plugins, device='xpu')
chatbot = build_chatbot(config)
result = chatbot.predict(query="./sample.wav")
print(result)

You can play back the generated WAV file in the notebook with `IPython.display.Audio`.
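A minimal sketch for checking and playing the output, assuming the TTS plugin wrote a standard PCM WAV file to `./output_audio.wav`. `wav_info` reads the file's properties with the standard-library `wave` module, and `play` returns an `IPython.display.Audio` widget, which Jupyter renders when it is the last expression in a cell. Both helper names are hypothetical.

```python
import wave


def wav_info(path):
    """Return channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


def play(path):
    """Return an IPython audio widget for the file (requires IPython/Jupyter)."""
    from IPython.display import Audio  # imported lazily; available inside Jupyter

    return Audio(filename=path)


# In the notebook:
# print(wav_info("./output_audio.wav"))
# play("./output_audio.wav")
```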