# End-to-End TalkingBot on PC Client (Windows)

> Make sure you are running in a conda environment with Python 3.10.

[Intel® Extension for Transformers Neural Chat](https://github.com/intel/intel-extension-for-transformers/tree/main/intel_extension_for_transformers/neural_chat) provides plugins for a range of usage scenarios. This notebook shows how to create a TalkingBot on a local laptop with an **Intel CPU** (no GPU needed).

Behind the scenes, a TalkingBot is a pipeline of three stages:
1. Recognize the user's spoken prompt and convert it to text.
2. Understand the text and answer the question with a Large Language Model.
3. Convert the answer text to speech.

The sections below walk through each stage in turn.

## Audio To Text

In [None]:
!curl -O https://raw.githubusercontent.com/intel/intel-extension-for-transformers/main/intel_extension_for_transformers/neural_chat/assets/audio/sample.wav

In [None]:
from intel_extension_for_transformers.neural_chat.pipeline.plugins.audio.asr import AudioSpeechRecognition

In [None]:
from IPython.display import Audio
Audio(r"./sample.wav", rate=16000)

In [None]:
asr = AudioSpeechRecognition(model_name_or_path="openai/whisper-tiny")

In [None]:
text = asr.audio2text(r"./sample.wav")
print(text)
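The playback cell above passes `rate=16000`, and Whisper models work on 16 kHz audio, so it can be useful to confirm what the sample file actually contains before transcribing it. The sketch below reads the WAV header with Python's standard `wave` module. The helper name `wav_info` is hypothetical (it is not part of NeuralChat), and the code assumes the file is an uncompressed PCM WAV.

```python
import os
import wave

def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file.

    Hypothetical helper for illustration; wave.open raises wave.Error
    if the file is not an uncompressed PCM WAV.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / float(rate)
    return rate, channels, duration

# Only inspect the sample if the download cell above has been run.
if os.path.exists("sample.wav"):
    rate, channels, duration = wav_info("sample.wav")
    print(f"{rate} Hz, {channels} channel(s), {duration:.2f} s")
```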

## LLM

### Load a prebuilt int4 model directly for inference

For a quick demo, we use a prebuilt int4 model (`mpt_q4_0.bin`) to generate text. To quantize a model to int4 yourself, see the next section.

In [None]:
from intel_extension_for_transformers.llm.runtime.graph import Model

# Load the prebuilt int4 MPT weights from a local file.
model = Model()
model.bin_file = r"mpt_q4_0.bin"
model.init_from_bin("mpt", model.bin_file, max_new_tokens=32, seed=12)

# Use the transcription from the Audio To Text step as the prompt.
prompt = text
output = model.generate(prompt)
print(output)

### Convert a model to int4 for inference

In [None]:
from intel_extension_for_transformers.transformers import AutoModel, WeightOnlyQuantConfig

model_name = r"THUDM/ChatGLM2-6B"
# Weight-only quantization: int4 weights with int8 compute.
woq_config = WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="int4")
model = AutoModel.from_pretrained(model_name, quantization_config=woq_config, use_llm_runtime=True, trust_remote_code=True)

# Use the transcription from the Audio To Text step as the prompt.
prompt = text
output = model.generate(prompt, max_new_tokens=32)
print(output)
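To give some intuition for what `weight_dtype="int4"` implies, here is a simplified sketch of symmetric 4-bit quantization of a small group of weights in plain Python. It is an illustration only and not the scheme the library actually implements: the function names, the single shared scale, and the [-8, 7] integer range are assumptions chosen for clarity.

```python
def quantize_int4(weights):
    """Map floats to integers in [-8, 7] using one shared scale (symmetric).

    Illustrative only; real runtimes typically use per-group scales
    and pack two 4-bit values into each byte.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate floats from the 4-bit integers."""
    return [v * scale for v in q]

weights = [0.7, -0.3, 0.1, 0.0]
q, scale = quantize_int4(weights)
approx = dequantize_int4(q, scale)
print(q)       # small integers that fit in 4 bits
print(approx)  # close to, but not exactly, the original weights
```

Each weight is stored in 4 bits plus a shared scale, which is where the memory saving comes from; the price is a rounding error of at most half a quantization step per weight.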

## Text To Speech

In [None]:
from intel_extension_for_transformers.neural_chat.pipeline.plugins.audio.tts import TextToSpeech

In [None]:
tts = TextToSpeech()

In [None]:
result_path = tts.text2speech("Hello there, I am your Talking Bot!", "output.wav")

In [None]:
from IPython.display import Audio
Audio(result_path, rate=16000)
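The three stages above can be chained into a single helper. The sketch below is one possible wiring and is not part of NeuralChat: `talking_bot` is a hypothetical function that only assumes the objects passed in expose the methods used earlier in this notebook (`audio2text`, `generate`, and `text2speech`).

```python
def talking_bot(audio_path, asr, llm, tts, output_path="output.wav"):
    """Run speech -> text -> answer -> speech and return the output audio path.

    Assumes `asr`, `llm`, and `tts` behave like the objects created earlier
    in this notebook; nothing here is an official NeuralChat API.
    """
    question = asr.audio2text(audio_path)  # 1. speech recognition
    answer = llm.generate(question)        # 2. question answering
    # 3. speech synthesis; str() guards against non-string generate() results
    return tts.text2speech(str(answer), output_path)
```

With the cells above already executed, calling `talking_bot("./sample.wav", asr, model, tts)` would run the whole pipeline, assuming `model.generate` accepts a plain string prompt as in the first LLM cell.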