NeuralChat is a customizable chat framework that lets users create their own chatbot within a few minutes on multiple architectures. This notebook demonstrates how to deploy a talking chatbot as a service on Intel® Data Center GPU Flex Series 170, Intel® Data Center GPU Max Series, and Intel® Arc™ A-Series GPUs.

# Prepare Environment

Install Intel® Extension for Transformers*

In [5]:
!pip install intel-extension-for-transformers

Refer to [Install Intel® Extension for PyTorch* from source](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/installations/linux.html#install-via-compiling-from-source) to build the XPU versions of torch, torchaudio, and Intel® Extension for PyTorch*, then install the generated wheels with pip.
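Before going further, it can help to confirm which versions of the relevant packages are actually installed. The following is a minimal sketch using only the standard library's `importlib.metadata`; the helper name `report_versions` and the package list are illustrative assumptions, not part of NeuralChat.

```python
from importlib import metadata

# Distribution names to inspect (illustrative list; adjust to your setup)
PACKAGES = [
    "torch",
    "torchaudio",
    "intel-extension-for-pytorch",
    "intel-extension-for-transformers",
]


def report_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in report_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

A missing entry usually means the corresponding wheel was not installed into the active environment.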

Install Requirements

In [None]:
!git clone https://github.com/intel/intel-extension-for-transformers.git
# `!cd` only affects a throwaway subshell; `%cd` changes the notebook's working directory
%cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/
!pip install -r requirements_xpu.txt

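Install the requirements that depend on stock PyTorch. They are installed with `--no-deps` so that pip does not replace the XPU build of PyTorch with the stock one.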

In [None]:
!pip install --no-deps peft speechbrain optimum optimum-intel sentence_transformers lm_eval accelerate
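Because `--no-deps` stops pip from installing any dependencies of the listed packages, some of their declared requirements may be left unsatisfied. `pip check` reports such inconsistencies; as an alternative, the sketch below (hypothetical helper names, standard library only) lists declared requirements of an installed distribution that are not installed. It ignores version specifiers and does not evaluate environment markers, other than skipping requirements gated on an `extra`, so treat its output as a hint rather than a full dependency resolver.

```python
import re
from importlib import metadata


def canonical(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Extract the project name from a requirement string such as 'torch>=2.0'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return canonical(match.group(1)) if match else None


def missing_requirements(requirements, installed):
    """Return names of requirements (excluding extras) absent from `installed`.

    Version specifiers and environment markers are NOT evaluated.
    """
    installed = {canonical(n) for n in installed}
    missing = []
    for req in requirements:
        # Skip optional dependencies that only apply to an extra, e.g. 'extra == "dev"'
        if re.search(r"extra\s*==", req):
            continue
        name = requirement_name(req)
        if name and name not in installed and name not in missing:
            missing.append(name)
    return missing


def skipped_dependencies(dist_name):
    """List declared dependencies of an installed distribution that are missing."""
    installed = [
        d.metadata.get("Name") for d in metadata.distributions() if d.metadata.get("Name")
    ]
    return missing_requirements(metadata.requires(dist_name) or [], installed)


if __name__ == "__main__":
    for pkg in ["peft", "speechbrain", "optimum", "optimum-intel",
                "sentence_transformers", "lm_eval", "accelerate"]:
        try:
            print(pkg, "->", skipped_dependencies(pkg) or "all present")
        except metadata.PackageNotFoundError:
            print(pkg, "-> not installed")
```

If `torch` appears in the output, that is expected here: it is provided by the XPU wheels you built, which this check does not distinguish from the stock package.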

Note: If you hit a "GLIBCXX_3.4.30 not found" error in a conda environment, remove `lib/libstdc++*` from that conda environment.
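Before deleting anything, you may want to see exactly which files the note refers to. The sketch below lists the `lib/libstdc++*` files under a conda environment prefix, taken by default from the `CONDA_PREFIX` variable that conda sets for the active environment. It only lists the files and leaves the removal to you; the helper name `find_conda_libstdcxx` is hypothetical.

```python
import os
from pathlib import Path


def find_conda_libstdcxx(prefix=None):
    """Return sorted paths matching lib/libstdc++* under a conda prefix.

    Defaults to the active environment given by the CONDA_PREFIX variable.
    """
    prefix = prefix or os.environ.get("CONDA_PREFIX")
    if not prefix:
        return []  # not inside an activated conda environment
    return sorted(str(p) for p in Path(prefix, "lib").glob("libstdc++*"))


if __name__ == "__main__":
    for path in find_conda_libstdcxx():
        print(path)
```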

# Client-Server Architecture for Performance and Scalability

## Quick Start Local Server

❗ Note that the server runs in the background.

In [None]:
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/server/config/neuralchat.yaml

In [None]:
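import multiprocessing
from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor
import nest_asyncio

# Allow nested asyncio event loops inside the notebook's already-running loop
nest_asyncio.apply()

def start_service():
    # Launch the NeuralChat server with the downloaded configuration;
    # server output is written to neuralchat.log
    server_executor = NeuralChatServerExecutor()
    server_executor(config_file="neuralchat.yaml", log_file="neuralchat.log")

# Run the server in a separate process so the notebook stays usable
server_process = multiprocessing.Process(target=start_service)
server_process.start()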
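Because the server starts in a separate process and first has to load its model, the client cells below can fail with a connection error if you run them too early. Here is a minimal sketch, using only the standard library, that polls until something accepts TCP connections on a given host and port. The helper name `wait_for_port` and the timeout values are assumptions; an open port only shows that the server is listening, which may not guarantee that every model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=600.0, interval=2.0):
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds pass.

    Returns True if the port became reachable, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some process is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, call `wait_for_port("127.0.0.1", 8000)` (the address the client cells use) before running them, and check `neuralchat.log` if it returns `False`.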

## Access Text Chat Service 

In [None]:
from intel_extension_for_transformers.neural_chat import TextChatClientExecutor
executor = TextChatClientExecutor()
result = executor(
    prompt="Tell me about Intel Xeon Scalable Processors.",
    server_ip="127.0.0.1", # master server ip
    port=8000 # master server entry point 
    )
print(result.text)

## Access Voice Chat Service

In [None]:
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/speaker_embeddings/spk_embed_default.pt
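Before sending `sample.wav` to the voice chat service, it can be useful to inspect its format. The sketch below uses Python's built-in `wave` module to report the channel count, sample width, sample rate, and duration. `describe_wav` is a hypothetical helper; `wave` only reads uncompressed PCM WAV files, and this notebook does not state which input format the speech model expects, so compare the result against your model's requirements.

```python
import os
import wave


def describe_wav(path):
    """Return basic properties of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate if rate else 0.0,
        }


if __name__ == "__main__":
    if os.path.exists("sample.wav"):
        print(describe_wav("sample.wav"))
```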

In [None]:
from intel_extension_for_transformers.neural_chat import VoiceChatClientExecutor
executor = VoiceChatClientExecutor()
result = executor(
    audio_input_path='sample.wav',
    audio_output_path='results.wav',
    server_ip="127.0.0.1", # master server ip
    port=8000 # master server entry point 
    )


In [None]:
import IPython
# Play input audio
print("     Play Input Audio ......")
IPython.display.display(IPython.display.Audio("sample.wav"))
# Play output audio
print("     Play Output Audio ......")
IPython.display.display(IPython.display.Audio("results.wav"))
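The server process started earlier keeps running after the demo finishes. One way to shut it down from the notebook is to terminate every live child process created with `multiprocessing`. This minimal sketch uses the standard `multiprocessing.active_children()` call, assumes the server is the only such child of this kernel (any other children are stopped too), performs no NeuralChat-specific shutdown, and is not guaranteed to stop processes the server itself may have spawned. The helper name is hypothetical.

```python
import multiprocessing


def stop_child_processes(timeout=10.0):
    """Terminate every live multiprocessing child of this process.

    Returns the number of processes that were asked to stop.
    """
    children = multiprocessing.active_children()
    for proc in children:
        proc.terminate()  # sends SIGTERM on POSIX systems
    for proc in children:
        proc.join(timeout)  # wait for each process to exit
    return len(children)


if __name__ == "__main__":
    print(f"Stopped {stop_child_processes()} background process(es).")
```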
