NeuralChat is a customizable chat framework that lets users create their own chatbot within minutes on multiple architectures. This notebook demonstrates how to deploy a talking chatbot as a service on Habana's Gaudi processors (HPU).

# Prepare Environment

To streamline the process, users can build a Docker image from a Dockerfile, start the Docker container, and then run inference or finetuning.

**IMPORTANT:** Habana's Gaudi processors (HPU) require a Docker environment. You need to run the steps below manually to build the Docker image and start the Docker container for inference and finetuning on Habana HPU. Start the Jupyter notebook server inside the Docker container, then run this notebook from there.

```bash
git clone https://github.com/intel/intel-extension-for-transformers.git
cd ./intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/docker/
docker build --build-arg UBUNTU_VER=22.04 -f Dockerfile -t neuralchat . --target hpu
docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host neuralchat:latest
```
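
Once the notebook server is running inside the container, a quick sanity check can confirm that the notebook really is executing there. The sketch below only looks for the two environment variables passed with `-e` in the `docker run` command above; the helper name `missing_env_vars` is hypothetical, and the check does not prove that an HPU is actually accessible.

```python
import os
from typing import Mapping, Sequence

# Variables passed with `-e` in the `docker run` command above.
EXPECTED_VARS = ("HABANA_VISIBLE_DEVICES", "OMPI_MCA_btl_vader_single_copy_mechanism")


def missing_env_vars(required: Sequence[str] = EXPECTED_VARS,
                     env: Mapping[str, str] = os.environ) -> list:
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Not set (is this the neuralchat container?):", ", ".join(missing))
else:
    print("Container environment variables look as expected.")
```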

# Client-Server Architecture for Performance and Scalability

## Quick Start Local Server

❗ Please note that the server runs in the background.

In [None]:
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/server/config/neuralchat.yaml

In [None]:
import time
import multiprocessing
from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor
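import nest_asyncio

# Allow nested event loops so the server can be started from inside Jupyter
nest_asyncio.apply()

def start_service():
    # Start the NeuralChat server using the downloaded config file
    server_executor = NeuralChatServerExecutor()
    server_executor(config_file="neuralchat.yaml", log_file="neuralchat.log")

# Run the server in a separate process so the notebook stays responsive
multiprocessing.Process(target=start_service).start()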
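
Because the server starts in a background process, the client cells may run before it is ready to accept requests. A minimal sketch of a readiness check, assuming the server listens on `127.0.0.1:8000` as the client cells in this notebook do; `wait_for_port` is a hypothetical helper, not part of NeuralChat:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows that the port is listening,
            # not that the model has finished loading.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait and try again.
            time.sleep(interval)
    return False
```

Calling `wait_for_port("127.0.0.1", 8000)` before the client cells blocks until the port accepts connections or the timeout expires. If it returns `False`, `neuralchat.log` is the first place to look.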

## Access Text Chat Service 

In [None]:
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav
!curl -OL https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/speaker_embeddings/spk_embed_default.pt

In [None]:
from intel_extension_for_transformers.neural_chat import TextChatClientExecutor
executor = TextChatClientExecutor()
result = executor(
    prompt="Tell me about Intel Xeon Scalable Processors.",
    server_ip="127.0.0.1", # master server ip
    port=8000 # master server entry point 
    )
print(result.text)

## Access Voice Chat Service

In [None]:
from intel_extension_for_transformers.neural_chat import VoiceChatClientExecutor
executor = VoiceChatClientExecutor()
result = executor(
    audio_input_path='sample.wav',
    audio_output_path='results.wav',
    server_ip="127.0.0.1", # master server ip
    port=8000 # master server entry point 
    )
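
Before playing the reply, it can help to confirm that the server actually wrote a usable audio file. The sketch below reads the WAV header with Python's standard `wave` module; `describe_wav` is a hypothetical helper, and it assumes the output is PCM-encoded (`wave.open` raises `wave.Error` for other encodings).

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic header information from a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            # Duration is the frame count divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }
```

For example, `describe_wav("results.wav")` after the voice chat call should report a non-zero duration if the reply was synthesized.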


In [None]:
import IPython
# Play input audio
print("     Play Input Audio ......")
IPython.display.display(IPython.display.Audio("sample.wav"))
# Play output audio (the file written by the voice chat call above)
print("     Play Output Audio ......")
IPython.display.display(IPython.display.Audio("results.wav"))
