# Voice assistant to fill in a questionnaire
<a href="https://colab.research.google.com/drive/141lOrftQ8a0_QnO83Xa-hZb_7YRmg1Fa?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> ![Maintainer](https://dataroots.io/maintained-rnd.svg)

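This notebook demonstrates a small pipeline in which a voice assistant guides a user through a questionnaire. It uses Whisper for speech-to-text, OpenAI's ChatGPT to extract information and formulate follow-up questions, and Coqui TTS for text-to-speech.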

## 1. Prerequisites

In [None]:
# Download the code from GitHub
!git clone https://github.com/datarootsio/onwheels.git
%cd onwheels/voiceassistance/notebooks/

Cloning into 'onwheels'...
remote: Enumerating objects: 1492, done.
remote: Counting objects: 100% (373/373), done.
remote: Compressing objects: 100% (257/257), done.
remote: Total 1492 (delta 201), reused 270 (delta 109), pack-reused 1119
Receiving objects: 100% (1492/1492), 62.36 MiB | 16.42 MiB/s, done.
Resolving deltas: 100% (861/861), done.
/content/onwheels/voiceassistance/notebooks


In [None]:
!sudo apt-get install espeak

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  espeak-data libespeak1 libportaudio2 libsonic0
The following NEW packages will be installed:
  espeak espeak-data libespeak1 libportaudio2 libsonic0
0 upgraded, 5 newly installed, 0 to remove and 16 not upgraded.
Need to get 1,382 kB of archives.
After this operation, 3,178 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libportaudio2 amd64 19.6.0-1.1 [65.3 kB]
Get:2 http://archive.ubuntu.com/ubuntu jammy/main amd64 libsonic0 amd64 0.2.0-11build1 [10.3 kB]
Get:3 http://archive.ubuntu.com/ubuntu jammy/universe amd64 espeak-data amd64 1.48.15+dfsg-3 [1,085 kB]
Get:4 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libespeak1 amd64 1.48.15+dfsg-3 [156 kB]
Get:5 http://archive.ubuntu.com/ubuntu jammy/universe amd64 espeak amd64 1.48.15+dfsg-3 [64.2 kB]
Fetched 1,382 kB in 2s (562 kB

In [None]:
# Install the requirements
!pip install -r requirements.txt

Collecting guardrails-ai@ git+https://github.com/sophieDataroots/guardrails.git (from -r requirements.txt (line 4))
  Cloning https://github.com/sophieDataroots/guardrails.git to /tmp/pip-install-cbhb09yu/guardrails-ai_d57cc49b9bdf4ece9fff6fd8a234a189
  Running command git clone --filter=blob:none --quiet https://github.com/sophieDataroots/guardrails.git /tmp/pip-install-cbhb09yu/guardrails-ai_d57cc49b9bdf4ece9fff6fd8a234a189
  Resolved https://github.com/sophieDataroots/guardrails.git to commit e80a86759f70d5e4b5e998e3d7e055250bf6c832
  Preparing metadata (setup.py) ... done
Collecting openai-whisper@ git+https://github.com/openai/whisper.git (from -r requirements.txt (line 6))
  Cloning https://github.com/openai/whisper.git to /tmp/pip-install-cbhb09yu/openai-whisper_4846ea94514045758685311f3f3ec4b6
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-install-cbhb09yu/openai-whisper_4846ea94514045758685311f3f3ec4b6
  Resolv

## 2. Upload a voice message

In [None]:
%cd onwheels/voiceassistance/notebooks/
from google.colab import files

uploaded = files.upload()
input_audio = list(uploaded.keys())[0]

/content/onwheels/voiceassistance/notebooks


Saving input.wav to input.wav
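
Before passing the upload to the models, it can help to confirm that the file is a readable WAV and to check its length. The following is a minimal sketch using only Python's standard `wave` module; `describe_wav` is a hypothetical helper that is not part of the `onwheels` repository, and it only handles uncompressed PCM WAV files.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            # Duration in seconds; guard against a zero sample rate.
            "duration_s": frames / rate if rate else 0.0,
        }
```

In this notebook it could be called as `describe_wav(input_audio)`. A `wave.Error` at this point would suggest the upload is not a PCM WAV file.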


## 3. Load the models

In [None]:
import sys

# setting path
sys.path.append("..")

# helper files
from src.voice_assistant import load_models, create_start_messages

# load the models
whisper_model, tts_models = load_models()

# create the start messages if they don't already exist
audio_files, start_messages = create_start_messages(tts_models)

100%|█████████████████████████████████████| 1.42G/1.42G [00:16<00:00, 91.0MiB/s]


 > Downloading model to ../../data/models/NLPmodels/tts/tts_models--en--ljspeech--vits
 > Model's license - apache 2.0
 > Check https://choosealicense.com/licenses/apache-2.0/ for more info.
 > Using model: vits
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:0
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:None
 | > fft_size:1024
 | > power:None
 | > preemphasis:0.0
 | > griffin_lim_iters:None
 | > signal_norm:None
 | > symmetric_norm:None
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:None
 | > pitch_fmax:None
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:1.0
 | > clip_norm:True
 | > do_trim_silence:False
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > Downloading model to ../../data/models/N

## 4. Set your parameters

In [None]:
language = "Guess..."  # @param ["English","Nederlands","Francais","Deutsch","Guess..."]
openai_api_key = ""  # @param {type:"string"}
memory = ""  # @param {type:"string"}
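
The `language` choice is later translated into a language code through `config.languages`, whose contents are not shown in this notebook. The sketch below illustrates how such a lookup might behave. The mapping is an assumption: only `"en"` and `"nl"` appear elsewhere in this notebook, the other codes are guesses, and `LANGUAGES` and `resolve_language` are hypothetical names.

```python
from typing import Dict, Optional

# Hypothetical mapping; the real one lives in src/config.py and may differ.
LANGUAGES: Dict[str, Optional[str]] = {
    "English": "en",
    "Nederlands": "nl",
    "Francais": "fr",   # assumed code
    "Deutsch": "de",    # assumed code
    "Guess...": None,   # no code: let Whisper detect the language
}


def resolve_language(choice: str) -> Optional[str]:
    """Map a dropdown label to a language code, or None to auto-detect."""
    return LANGUAGES.get(choice) or None
```

Returning `None` for `"Guess..."` matches the way the next cell treats a missing language: the start message falls back to `"en"` and Whisper guesses the spoken language.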

## 5. Process the audio

In [None]:
# helper files
from src.voice_assistant import talk
from src.config import Config
import openai

# available languages
config = Config()
lang = config.languages[language] or None

# set openai key
openai.api_key = openai_api_key

# process the input audio
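def voice_assistant(
    input_audio: str = "", lang: str = None, chat_history: list = None, memory: str = ""
):
    """Run one turn of the assistant on `input_audio` and return its spoken answer."""
    # welcome message (falls back to English when no language is set)
    start_message = start_messages[lang or "en"] or None
    chat_history = chat_history or [(None, start_message)]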

    # Talk to the voice assistant
    answer_file, _, chat_history, memory = talk(
        input_audio,
        lang,
        chat_history=chat_history,
        memory=memory,
        whisper_model=whisper_model,
        tts_models=tts_models,
    )

    # Return the answer audio file together with the updated chat history and memory
    return answer_file, chat_history, memory


answer_file, chat_history, memory = voice_assistant(input_audio, lang, memory=memory)

Whisper: Guess language...
Got language: nl
Whisper: start transcribing...
Whisper transcribed: Ik sta voor een restaurant. Er zijn vier treden voor de deur en de deur is een meter breed. Het toilet bevindt zich op het eerste getiep.
LLM: extract info...
Coqui TTS: ask questions...
 > Text splitted to sentences.
['Wat is de naam van de locatie?', 'Welk soort eten verkopen ze hier?', 'Kies uit de volgende opties: Afrikaans, Aziatisch, dessert bakkerij, Belgische frituur, Mediteraans, Midden Oosten of Westers?', 'Zijn er vegan opties?']
 > Processing time: 0.43473386764526367
 > Real-time factor: 0.02467942047077892

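The cell above initialises `chat_history` as `[(None, start_message)]`, which suggests a list of `(user_text, assistant_text)` pairs. Assuming `talk` appends turns in that same shape (not verified here), a small helper like the hypothetical `format_history` below could print the conversation so far.

```python
from typing import List, Optional, Tuple

# One turn: (user_text, assistant_text); either side may be None.
Turn = Tuple[Optional[str], Optional[str]]


def format_history(history: List[Turn]) -> str:
    """Render a chat history as a plain-text transcript (hypothetical helper)."""
    lines = []
    for user_text, assistant_text in history:
        if user_text:
            lines.append(f"User: {user_text}")
        if assistant_text:
            lines.append(f"Assistant: {assistant_text}")
    return "\n".join(lines)
```

For example, `print(format_history(chat_history))` would show the transcript after the first turn. A follow-up recording could then be processed by passing the returned `chat_history` and `memory` back into `voice_assistant`.
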

## 6. Listen to the result

In [None]:
from IPython.display import Audio, display

response = Audio(answer_file, autoplay=True)
display(response)