Rhasspy is a magnificent piece of voice management software. It is open-source, multi-platform, and provides an API that lets you:

* Manage the microphone and speakers
* Run speech recognition based on predefined sentences
* Use text-to-speech, although this feature is less impressive
* Manage the wake word

In other words, Rhasspy is a solid foundation for our home assistant.

The simplest way to install it is via Docker. First, make sure Docker is installed on your local machine. Then run the following cell with the last line uncommented:

In [3]:
from kaia.infra import ConsoleExecutor, Loc

def run_rhasspy():
    ConsoleExecutor.wait = True
    ConsoleExecutor.execute(
        'docker run -d -p 12101:12101 --name rhasspy '
        '--restart unless-stopped '
        f'-v "{Loc.data_folder/"rhasspy/profiles:/profiles"}" '
        '-v "/etc/localtime:/etc/localtime:ro" '
        #'--device /dev/snd:/dev/snd '
        'rhasspy/rhasspy '
        '--user-profiles /profiles '
        '--profile en'
    )

#run_rhasspy()

This will download everything that is needed.

You only need to do this once: the container is started with `--restart unless-stopped`, so it starts automatically when the system boots (or, on Windows, when Docker starts).

The command _will not_ connect Rhasspy to your microphone or speakers, as we do not intend to use this functionality right now.
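If you later want audio inside the container, the commented-out `--device` line in the cell above is the part to enable. Here is a minimal sketch of the same command built as an argument list with an optional audio switch. The helper name `build_rhasspy_command` and its parameters are hypothetical (they are not part of `kaia`), and the `/dev/snd` mapping only makes sense on a Linux host.

```python
import shlex


def build_rhasspy_command(profiles_dir, with_audio=False, profile="en"):
    """Return the `docker run` argument list for Rhasspy (hypothetical helper)."""
    args = [
        "docker", "run", "-d",
        "-p", "12101:12101",
        "--name", "rhasspy",
        "--restart", "unless-stopped",
        "-v", f"{profiles_dir}:/profiles",
        "-v", "/etc/localtime:/etc/localtime:ro",
    ]
    if with_audio:
        # Linux only: expose the host sound devices to the container.
        args += ["--device", "/dev/snd:/dev/snd"]
    args += ["rhasspy/rhasspy", "--user-profiles", "/profiles", "--profile", profile]
    return args


command = build_rhasspy_command("/tmp/rhasspy/profiles", with_audio=True)
print(shlex.join(command))
# To actually start the container: subprocess.run(command, check=True)
```

Building a list instead of one long string avoids quoting problems when the profiles path contains spaces.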

You can now open Rhasspy and look around. Apart from the two settings described below, you won't need to configure it manually: the following cells configure Rhasspy via the API.

In [1]:
from IPython.display import HTML

ADDRESS = 'http://127.0.0.1:12101'

HTML(f'<a href="{ADDRESS}" target="_blank">Open Rhasspy</a>')

Open the link, set "Kaldi" for "Speech-to-text" and "Fsticuffs" for "Intent recognition". Save and restart, then click "Download" at the top of the page. This configures Rhasspy for the functionality we need.

It is surely possible to do this programmatically via the API, but I could not get it working quickly and decided not to dig into this topic.
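For the curious, here is an untested sketch of what the programmatic route might look like. It assumes the Rhasspy 2.5 HTTP API, where `POST /api/profile` accepts profile JSON, and it assumes the profile keys `speech_to_text.system` and `intent.system`. As far as I understand, the POST overwrites the saved profile rather than merging into it, and a restart plus the download step would still be needed afterwards. The helper `profile_request` is hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request

ADDRESS = "http://127.0.0.1:12101"  # same address as in the cell above


def profile_request(address, settings):
    """Build (but do not send) a POST request carrying profile settings."""
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        f"{address}/api/profile",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed profile keys for the two settings chosen in the web UI.
settings = {
    "speech_to_text": {"system": "kaldi"},
    "intent": {"system": "fsticuffs"},
}

request = profile_request(ADDRESS, settings)
print(request.get_method(), request.full_url)
# Sending is left commented out because it changes the running profile:
# urllib.request.urlopen(request)
```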

Now let's try Rhasspy in action. First, let's reproduce the steps from the previous notebooks and create an audio file.

In [2]:
from kaia.infra import Loc  # imported here too, so the cell runs on its own
from kaia.persona.dub.languages.en import Template, CardinalDub, PluralAgreement, DubbingPack
from ipywidgets import Audio

# One template with two numeric slots and the matching singular/plural words
template = Template(
    'It is {hours} {hours_word} and {minutes} {minutes_word}',
    hours = CardinalDub(0, 24),
    hours_word = PluralAgreement('hours', 'hour', 'hours'),
    minutes = CardinalDub(0, 60),
    minutes_word = PluralAgreement('minutes', 'minute', 'minutes')
).with_name('Intent')

# Load the sample dubbing pack and build a dubber from it
pack = DubbingPack.from_zip(Loc.temp_folder/'demos/dubbing/sample_pack', 'files/sample_dubbing.zip')
dubber = pack.create_dubber()

# Render the sentence as text, then dub it into an audio file
s = template.to_str(dict(hours=15, minutes=32))
print(s)
fname = dubber.dub_string(s, template)
Audio.from_file(fname, autoplay = False)

Run this cell to set up Rhasspy for our template, train it, and recognize the audio file:

In [9]:
from kaia.persona.dub.languages.en import RhasspyAPI

api = RhasspyAPI.create(ADDRESS, [template])
api.train()
api.recognize(fname).to_str()

'It is fifteen hours and thirty two minutes'

So, it recognizes the file correctly.
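To give an idea of what happens at the HTTP level, here is a rough sketch of equivalent raw requests. It is not the implementation of `RhasspyAPI`. It assumes the Rhasspy 2.5 endpoints `/api/sentences`, `/api/train` and `/api/speech-to-intent`, and the grammar in `SENTENCES_INI` is a hand-written guess at how the template could be expressed in Rhasspy's `sentences.ini` syntax, with the ranges copied from the template arguments. The helper names are hypothetical, and the requests are only built, not sent.

```python
import urllib.request

ADDRESS = "http://127.0.0.1:12101"

# Hand-written, hypothetical grammar for the template above.
SENTENCES_INI = """[Intent]
it is (0..24){hours} (hour | hours) and (0..60){minutes} (minute | minutes)
"""


def post(address, endpoint, body, content_type):
    """Build (but do not send) a POST request to a Rhasspy endpoint."""
    return urllib.request.Request(
        f"{address}/api/{endpoint}",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def training_requests(address, sentences_ini):
    """Upload the sentences first, then trigger training."""
    return [
        post(address, "sentences", sentences_ini.encode("utf-8"), "text/plain"),
        post(address, "train", b"", "text/plain"),
    ]


def recognition_request(address, wav_bytes):
    """Send WAV audio; the response is expected to be intent JSON."""
    return post(address, "speech-to-intent", wav_bytes, "audio/wav")


for request in training_requests(ADDRESS, SENTENCES_INI):
    print(request.get_method(), request.full_url)

# With a running server, each request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```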

Next, we can run the test on a larger scale. `TestingTools` generates many variants of the template with different values.

In [10]:
from kaia.persona.dub.languages.en import TestingTools

test = TestingTools([template], 100)
test.samples[0]

Sample(intent_obj=<kaia.persona.dub.core.templates.template.Template object at 0x7f6d78ad7f70>, s='It is twenty four hours and twenty seven minutes', true_intent='Intent', true_value={'minutes': 27, 'hours': 24}, recognition_obj=None, parsed_intent=None, parsed_value=None, failure=False, match_intent=False, match_keys=False, match_values=False, match=False)
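The `match_*` fields are all `False` here because the sample has not been run through recognition yet. As a reading aid, here is my guess at what the flags mean once a sample has been recognized. The class `SampleCheck` is hypothetical and is not the `Sample` class from `kaia`; in particular, the rule that `match` is the conjunction of the other three flags is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleCheck:
    """Hypothetical stand-in that mirrors the fields printed above."""
    true_intent: str
    true_value: dict
    parsed_intent: Optional[str] = None
    parsed_value: Optional[dict] = None

    @property
    def match_intent(self):
        # The recognized intent name equals the expected one.
        return self.parsed_intent == self.true_intent

    @property
    def match_keys(self):
        # The recognized slots have the same names as the expected ones.
        return self.parsed_value is not None and set(self.parsed_value) == set(self.true_value)

    @property
    def match_values(self):
        # Same slot names and the same value in every slot.
        return self.match_keys and self.parsed_value == self.true_value

    @property
    def match(self):
        # Assumed: a sample matches only if every partial check passes.
        return self.match_intent and self.match_keys and self.match_values


check = SampleCheck(
    true_intent="Intent",
    true_value={"minutes": 27, "hours": 24},
    parsed_intent="Intent",
    parsed_value={"minutes": 20, "hours": 24},  # "twenty seven" misheard as "twenty"
)
print(check.match_intent, check.match_keys, check.match_values, check.match)
# prints: True True False False
```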

Rhasspy can only recognize the sentences it was trained on. So, to run the tests, Rhasspy first has to be trained on our template, which `api.train()` did above. After that, we run the tests for each of our 3 dubbing versions:

In [11]:
import pandas as pd

def make_test():
    dfs = []
    for i in range(3):
        df = TestingTools.samples_to_df(test.test_voice(pack.create_dubber(option_index=i), api))
        df['option_index'] = i
        dfs.append(df)

    df = pd.concat(dfs)
    df.to_parquet('files/test_on_sample.parquet')

#make_test()

In [12]:
df = pd.read_parquet('files/test_on_sample.parquet')
df.groupby(['option_index'])[['match','match_values','match_keys','match_intent']].mean()

| option_index | match | match_values | match_keys | match_intent |
|---|---|---|---|---|
| 0 | 0.979167 | 0.979167 | 1.0 | 1.0 |
| 1 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2 | 0.979167 | 0.979167 | 1.0 | 1.0 |


The results are decent. Since there is only one intent, no intent can be misrecognized, so `match_intent` is 1.0 for every option. This metric needs to be tested in other demos.
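To put the numbers in perspective: each cell in the table is the mean of a boolean column, that is, the share of samples where the check passed. Below is a small plain-Python sketch of that aggregation on made-up data. The 48 samples per option are an assumption, chosen only because one failure in 48 gives 47/48 ≈ 0.979167; the actual number of tested samples is not shown in this notebook.

```python
from collections import defaultdict


def match_rates(rows):
    """Average a boolean flag per option_index, like groupby(...).mean()."""
    totals = defaultdict(lambda: [0, 0])  # option_index -> [hits, count]
    for option_index, matched in rows:
        totals[option_index][0] += int(matched)
        totals[option_index][1] += 1
    return {key: hits / count for key, (hits, count) in sorted(totals.items())}


# Made-up data: option 0 has one failure in 48 samples, option 1 has none.
rows = [(0, i != 0) for i in range(48)] + [(1, True) for _ in range(48)]

rates = match_rates(rows)
print({key: round(value, 6) for key, value in rates.items()})
# prints: {0: 0.979167, 1: 1.0}
```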