# End-to-End TalkingBot on a PC Client (Windows)

> Make sure you are running in a conda environment.

[Intel® Extension for Transformers Neural Chat](https://github.com/intel/intel-extension-for-transformers/tree/main/intel_extension_for_transformers/neural_chat) provides plugins for a range of user scenarios. This notebook shows how to create a TalkingBot on your local laptop with an **Intel CPU** (no GPU needed).

Behind the scenes, a TalkingBot is a pipeline of three stages:
1. Recognize the user's spoken prompt and convert it to text.
2. Understand the text and answer the question with a Large Language Model (LLM).
3. Convert the answer text to speech.
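
The three stages above can be sketched as a single function that chains them. This is only an illustration of the data flow: `talking_bot` and its parameter names are hypothetical, and the stages are passed in as plain callables so that any speech recognizer, LLM, and speech synthesizer can be plugged in.

```python
from typing import Callable


def talking_bot(
    audio_path: str,
    speech_to_text: Callable[[str], str],
    answer: Callable[[str], str],
    text_to_speech: Callable[[str, str], str],
    output_path: str = "output.wav",
) -> str:
    """Run the three stages in order and return the path of the spoken reply."""
    question = speech_to_text(audio_path)      # stage 1: audio -> text
    reply = answer(question)                   # stage 2: text -> answer text
    return text_to_speech(reply, output_path)  # stage 3: answer text -> audio


# Stand-in stages (hypothetical) so the sketch runs without any model.
result = talking_bot(
    "sample_2.wav",
    speech_to_text=lambda path: "what is a talking bot",
    answer=lambda question: "You asked: " + question,
    text_to_speech=lambda text, out_path: out_path,
)
print(result)
```

In the rest of this notebook, the same three roles are filled by the ASR plugin, the quantized LLM, and the TTS plugin.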

This notebook walks through creating such a TalkingBot on a PC. Make sure that you have at least 50 GB of free disk space for loading and converting the LLM.
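
Before running the cells below, it may help to confirm both prerequisites from Python. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and it assumes that `conda activate` has set the `CONDA_DEFAULT_ENV` environment variable, which is how conda normally exposes the active environment name.

```python
import os
import shutil
from typing import Optional

REQUIRED_GB = 50  # disk space figure quoted in this notebook


def free_disk_gb(path: str = ".") -> float:
    """Free space, in GiB, on the drive that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


def active_conda_env() -> Optional[str]:
    """Name of the active conda environment, or None if none is active."""
    return os.environ.get("CONDA_DEFAULT_ENV")


env = active_conda_env()
print("conda environment:", env if env else "none detected")

free_gb = free_disk_gb()
status = "OK" if free_gb >= REQUIRED_GB else "below the recommended amount"
print(f"free disk space: {free_gb:.1f} GiB ({status})")
```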

## Audio To Text

In [None]:
!curl -O https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/audio/sample_2.wav
!curl -O https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/speaker_embeddings/spk_embed_default.pt

In [None]:
from intel_extension_for_transformers.neural_chat.pipeline.plugins.audio.asr import AudioSpeechRecognition

In [None]:
from IPython.display import Audio
Audio(r"./sample_2.wav", rate=16000)

In [None]:
asr = AudioSpeechRecognition(model_name_or_path="openai/whisper-tiny")

In [None]:
in_text = asr.audio2text(r"./sample_2.wav")
print(in_text)
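
The playback cell above uses a rate of 16000, and Whisper models work on 16 kHz audio. If you swap in your own recording, a quick header check can show whether it matches. This is a sketch using the standard-library `wave` module, which reads uncompressed PCM WAV files only. `wav_info` is a hypothetical helper, and whether the ASR plugin resamples other rates by itself is not covered here.

```python
import os
import wave


def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate
    return rate, channels, seconds


# Only inspect the sample if it has already been downloaded.
if os.path.exists("sample_2.wav"):
    rate, channels, seconds = wav_info("sample_2.wav")
    print(f"{rate} Hz, {channels} channel(s), {seconds:.2f} s")
```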

## LLM

### Optimize the model with quantization to do inference

This conversion writes a quantized copy of the LLM under `runtime_outs/`. On later runs the quantized model is loaded directly, without re-quantization.
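
To give a feel for what 8-bit weight quantization does, here is a toy sketch of symmetric round-to-nearest quantization on one row of weights. It assumes that the "Rtn" in `RtnConfig` refers to round-to-nearest. It is not the library's implementation, which works on whole tensors with optimized kernels, and the function names are hypothetical.

```python
def quantize_row_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # round() picks the nearest integer; the clamp guards against float drift.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_row(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


row = [0.50, -1.27, 0.03, 0.0]
q, scale = quantize_row_int8(row)
restored = dequantize_row(q, scale)
worst_error = max(abs(a - b) for a, b in zip(row, restored))
print("int8 values:", q)
print("largest round-trip error:", worst_error)
```

Each weight is stored in one byte instead of two or four, at the cost of a rounding error of at most half a scale step per weight.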

In [None]:
# Get the quantized model
from transformers import AutoTokenizer, TextStreamer
from neural_speed import Model
from intel_extension_for_transformers.transformers import RtnConfig
from intel_extension_for_transformers.transformers import AutoModel

# You can download the model first and replace model_name with the local path
model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
prompt = in_text  # text recognized in the Audio To Text section
inputs = tokenizer(prompt, return_tensors="pt").input_ids

# 8-bit weight-only quantization settings
woq_config = RtnConfig(bits=8, compute_dtype="int8", weight_dtype="int8")
streamer = TextStreamer(tokenizer)  # prints tokens as they are generated
model = AutoModel.from_pretrained(model_name, quantization_config=woq_config, trust_remote_code=True)
# Change max_new_tokens to control the output length
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=100)
output_text = tokenizer.batch_decode(outputs)[0]

## Text To Speech

This step converts the output text to audio and saves it as `output.wav`.

In [None]:
from intel_extension_for_transformers.neural_chat.pipeline.plugins.audio.tts import TextToSpeech
tts = TextToSpeech()
result_path = tts.text2speech(output_text[:290], "output.wav")  # Truncate the input text as needed
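
A fixed slice such as `output_text[:290]` can cut the text in the middle of a word or sentence. As an optional refinement, the sketch below trims at the last sentence boundary within the limit. `truncate_at_sentence` is a hypothetical helper and not part of the TTS plugin, and the 290-character default simply mirrors the slice used above.

```python
def truncate_at_sentence(text, limit=290):
    """Cut `text` to at most `limit` characters, preferring a sentence end."""
    if len(text) <= limit:
        return text
    clipped = text[:limit]
    # Position of the last sentence-ending mark that is followed by a space.
    cut = max(clipped.rfind(mark) for mark in (". ", "! ", "? "))
    if cut != -1:
        return clipped[:cut + 1]  # keep the punctuation mark itself
    # No sentence boundary found: fall back to the last whole word.
    cut = clipped.rfind(" ")
    return clipped if cut == -1 else clipped[:cut]


sample = "First sentence. Second sentence is much longer than the limit."
print(truncate_at_sentence(sample, limit=20))
```

The result could then be passed to `tts.text2speech` in place of the raw slice.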

In [None]:
from IPython.display import Audio
Audio(r"./output.wav", rate=16000)