# My dialog system

**Dialog system steps**

1. [Imports](#Imports) 
2. [Data binding](#Data_binding) 
3. [Input](#Input)
4. [Semantic parsing](#Semantic_parsing)
5. [Output](#Output)
6. [Dialog manager](#Dialog_manager)
    

### <a id="Imports">1. Imports</a>

In [2]:
from google.cloud import speech  # speech recognition from Google Cloud
import io  # Python's main facilities for dealing with various types of input and output
import requests  # HTTP requests for accessing data from web APIs
import os  # operating-system interface (environment variables, file paths, file removal)
import sounddevice as sd  # recording audio
import numpy as np  # working with numerical data
from scipy.io.wavfile import write  # writes the recorded samples to a WAV file
import json  # parsing the JSON data delivered by the APIs
from gtts import gTTS  # text-to-speech module
from playsound import playsound  # plays the speech file generated by the text-to-speech module
from eliza import eliza  # Eliza, a pattern-matching conversation program
from file_update_vaccinations import file_update_vaccinations  # updates the vaccination data from the API
from file_update_recovered import file_update_recovered  # updates the number of recovered people from the API
import emorec  # trained model for emotion recognition

### <a id="Data_binding">2. Data binding</a>

**Data integration function** - two Python files fetch and update the data from two APIs: the first for vaccinations, the second for the number of people who recovered on the previous day (both APIs are provided by the Robert Koch Institut)

In [4]:
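file_update_vaccinations()
# Read as UTF-8 so that umlauts in the state names are decoded correctly;
# the with-block closes the file automatically
with open('vaccinations.json', encoding='utf-8') as f:
    vaccinations = json.load(f)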

Web site exists
File Updated 
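
The two update modules are not shown in this notebook. As a rough sketch of what such an updater could look like, the following function downloads a JSON document and stores it in a local file. Everything in it is an assumption made for illustration: the name `update_json_file`, the `fetch` parameter and the placeholder URL are hypothetical, and the standard-library `urllib` stands in for whatever the real modules use.

```python
import json
import os
import tempfile
import urllib.request


def update_json_file(url, path, fetch=None):
    """Download JSON from `url` and store it at `path`; return True on success."""
    if fetch is None:
        # Default fetcher: plain HTTP GET with the standard library
        def fetch(u):
            with urllib.request.urlopen(u, timeout=10) as resp:
                return resp.read().decode("utf-8")
    try:
        payload = json.loads(fetch(url))
    except (OSError, ValueError) as err:  # network errors and invalid JSON
        print("Update failed:", err)
        return False
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(payload, fh, ensure_ascii=False)
    print("File Updated")
    return True


# Usage with a stand-in fetcher, so the sketch runs without network access
def fake_fetch(url):
    return '{"data": {"vaccinated": 53530526}}'


target = os.path.join(tempfile.mkdtemp(), "vaccinations.json")
ok = update_json_file("https://example.invalid/vaccinations", target, fetch=fake_fetch)
with open(target, encoding="utf-8") as fh:
    loaded = json.load(fh)
```

Passing the fetcher as a parameter keeps the file handling testable offline; a failed download leaves the previous file untouched.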


In [5]:
vaccinations

{'data': {'administeredVaccinations': 100191598,
  'vaccinated': 53530526,
  'vaccination': {'biontech': 37215398,
   'moderna': 4364941,
   'astraZeneca': 9203256,
   'janssen': 2746931},
  'delta': 110658,
  'quote': 0.644,
  'secondVaccination': {'vaccinated': 49408003,
   'vaccination': {'biontech': 38349686,
    'moderna': 4894304,
    'astraZeneca': 3417082,
    'janssen': 2746931},
   'delta': 183678,
   'quote': 0.594},
  'latestDailyVaccinations': {'date': '2021-08-24T00:00:00.000Z',
   'vaccinated': 110658,
   'firstVaccination': 110658,
   'secondVaccination': 170068},
  'indication': {'age': None,
   'job': None,
   'medical': None,
   'nursingHome': None,
   'secondVaccination': {'age': None,
    'job': None,
    'medical': None,
    'nursingHome': None}},
  'states': {'BW': {'name': 'Baden-Württemberg',
    'administeredVaccinations': 13064549,
    'vaccinated': 6883761,
    'vaccination': {'biontech': 4816692,
     'moderna': 544609,
     'astraZeneca': 1170684,
     'j

In [6]:
file_update_recovered()
# The with-block closes the file automatically after loading
with open('recovered.json', encoding='utf-8') as f:
    recovered = json.load(f)

Web site exists
File Updated 


In [7]:
recovered

{'data': [{'recovered': 1, 'date': '2020-01-02T00:00:00.000Z'},
  {'recovered': 2, 'date': '2020-01-23T00:00:00.000Z'},
  {'recovered': 2, 'date': '2020-01-28T00:00:00.000Z'},
  {'recovered': 2, 'date': '2020-01-29T00:00:00.000Z'},
  {'recovered': 4, 'date': '2020-01-31T00:00:00.000Z'},
  {'recovered': 1, 'date': '2020-02-01T00:00:00.000Z'},
  {'recovered': 1, 'date': '2020-02-03T00:00:00.000Z'},
  {'recovered': 4, 'date': '2020-02-04T00:00:00.000Z'},
  {'recovered': 1, 'date': '2020-02-06T00:00:00.000Z'},
  {'recovered': 1, 'date': '2020-02-07T00:00:00.000Z'},
  {'recovered': 2, 'date': '2020-02-11T00:00:00.000Z'},
  {'recovered': 1, 'date': '2020-02-17T00:00:00.000Z'},
  {'recovered': 1, 'date': '2020-02-18T00:00:00.000Z'},
  {'recovered': 1, 'date': '2020-02-20T00:00:00.000Z'},
  {'recovered': 1, 'date': '2020-02-24T00:00:00.000Z'},
  {'recovered': 3, 'date': '2020-02-25T00:00:00.000Z'},
  {'recovered': 7, 'date': '2020-02-26T00:00:00.000Z'},
  {'recovered': 23, 'date': '2020-02-27T

### <a id="Input">3. Input</a>

**Define the parameters of the audio file that will be recorded**

In [8]:
sr = 16000  # Sample rate
seconds = 5  # Duration of recording
filename = 'myfile.wav'

**Record a file** - the recording serves as the input

In [9]:
def record_file():
    data = sd.rec(int(seconds * sr), samplerate=sr, channels=1)
    sd.wait()  # Wait until recording is finished
    # Scale `data` to the full 16-bit integer range;
    # guard against division by zero for a completely silent recording
    peak = np.abs(data).max()
    if peak == 0:
        peak = 1.0
    y = (np.iinfo(np.int16).max * (data / peak)).astype(np.int16)
    write(filename, sr, y)
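
To make the role of the sample rate, the duration and the 16-bit conversion concrete without a microphone, here is a minimal sketch that uses only the standard library. It is not the notebook's recording code: a synthetic 440 Hz tone replaces the microphone signal, and the helper names `to_int16` and `wav_bytes` are hypothetical.

```python
import io
import math
import struct
import wave


def to_int16(samples):
    """Scale float samples so that the loudest one maps to +/-32767."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0:
        return [0] * len(samples)  # silence stays silence
    return [int(32767 * x / peak) for x in samples]


def wav_bytes(samples, sample_rate):
    """Encode 16-bit mono samples as an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<{}h".format(len(samples)), *samples))
    return buf.getvalue()


sample_rate = 16000  # samples per second, as in the notebook
duration = 1         # seconds (shortened for the example)
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate * duration)]
pcm = to_int16(tone)
encoded = wav_bytes(pcm, sample_rate)
```

The number of samples is always `sample_rate * duration`, which is why `record_file` passes `int(seconds * sr)` to the recorder.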

**Initialise Google Cloud** - required for the speech-to-text (speech recognition) module; the credentials file becomes available after creating a business Google account

In [10]:
def init_google():
    credentials=r"C:\Users\user\s_dyalog\Google Credentials.json" 
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"]=credentials

In [11]:
init_google()

**Normalize the transcribed input** - the text is converted to lowercase so that uppercase and lowercase letters make no difference when matching keywords

In [12]:
def normalize(in_s):
    return in_s.lower()

**Speech recognition from Google Cloud** - the function transcribes the recorded input and returns it as text

In [13]:
def transcribe():
    client = speech.SpeechClient()
    with io.open(filename, "rb") as audio_file:
        content = audio_file.read()
    audio = speech.RecognitionAudio(content = content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        language_code="de-DE",
    )
    response = client.recognize(config=config, audio=audio)
    # Return the first alternative of the first result
    for result in response.results:
        for index, alternative in enumerate(result.alternatives):
            print("Transcript {}: {}".format(index, alternative.transcript))
            return alternative.transcript
    return None  # nothing was recognized
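
The selection step of the transcription can be tried out without a Google Cloud account. The sketch below assumes only that a response has a `results` list and that each result has an `alternatives` list whose items carry a `transcript`, as used in the function above. The helper name `first_transcript` and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def first_transcript(response):
    """Return the top transcript of the first non-empty result, or None."""
    for result in response.results:
        if result.alternatives:
            return result.alternatives[0].transcript
    return None


# Stand-in objects that mimic only the attributes used above
fake_response = SimpleNamespace(results=[
    SimpleNamespace(alternatives=[SimpleNamespace(transcript="Berlin")]),
])
empty_response = SimpleNamespace(results=[])

print(first_transcript(fake_response))
print(first_transcript(empty_response))
```

Returning `None` explicitly for an empty response makes it obvious to the caller that silence or unintelligible audio has to be handled.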
            

**Input function** - records an audio file of spoken natural language, transcribes it and returns the result as text

In [14]:
def speech_input():
    record_file()
    text = transcribe()
    return text

In [15]:
def do_input():
    return speech_input()

### <a id="Semantic_parsing">4. Semantic parsing</a>

**Dictionaries** - keywords and phrases used for extracting information from the input, looking up values in the JSON data and generating the output

In [16]:
phrases = {'hello':'Hallo! Ich bin deine neue Freundin, das Dialogsystem! Was möchtest du wissen?', 
        'continue':'Kann ich dir weiterhelfen?', 
        'goodbye':'Vielen Dank für deinen Besuch!', 
        'done':'tschüss'}

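# semantic() takes the first key found in the input, so 'niedersachsen' and 'anhalt'
# must come before 'sachsen' (otherwise "sachsen-anhalt" would be read as Sachsen)
states_d = {'schleswig':'SH', 'hamburg':'HH', 'berlin':'BE', 'bayern':'BY', 
            'niedersachsen': 'NI', 'bremen': 'HB', 
            'nordrhein':'NW', 'hessen':'HE', 'rheinland':'RP', 'baden':'BW', 
            'saarland': 'SL', 'brandenburg':'BB', 'mecklenburg':'MV', 'anhalt':'ST',
            'sachsen':'SN', 'thüringen':'TH', 'deutschland':'DE', 'hier':'DE'}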
state_names = {'SH':'Schleswig-Holstein', 'HH':'Hamburg', 'BE':'Berlin', 'BY':'Bayern', 
            'NI':'Niedersachsen', 'HB':'Bremen', 
            'NW': 'Nordrhein Westfalen', 'HE':'Hessen', 'RP':'Rheinland Pfalz', 'BW':'Baden Württemberg', 
            'SL':'Saarland', 'BB':'Brandenburg', 'MV':'Mecklenburg Vorpommern', 
            'SN': 'Sachsen', 'ST':'Sachsen-Anhalt', 'TH':'Thüringen', 'DE':'Deutschland'}
vaccines_d = {'biontech':'biontech', 'biontec':'biontech', 
              'moderna':'moderna', 
              'janssen':'janssen', 'jansen':'janssen',
              'delta':'delta',
              # keys are lowercase because the input is lowercased by normalize()
              'astrazeneca':'astraZeneca', 'astra':'astraZeneca', 'zeneca':'astraZeneca'}
    
vaccine_names = {'biontech':'Biontech', 'moderna':'Moderna', 'janssen':'Janssen', 'delta':'Delta',
              'astraZeneca':'Astra Zeneca'}

recovered_d = {'genesene': 'genesen', 'genesen': 'genesen'}

recovered_days = {'genesen': -1}

emo_dict = {'happiness':'glücklich', 'neutral': 'wie immer', 'anger': 'irritiert', 'sadness': 'traurig', 
            'fear': 'ängstlich', 'boredom':'gelangweilt', 'disgust':'angeekelt'}

**Semantic parsing function** - natural language is translated into a meaningful representation by filtering keywords out of the input. Here the function checks whether the input contains words related to a type of vaccine, a German state or the number of recovered people (the words are defined in the dictionaries above)

In [17]:
def semantic(input_s):
    semantics = {'state':'', 'vaccine':'', 'recovered':'', 'answer':0}
    for key in recovered_d.keys():
        if key in input_s:
            semantics['recovered'] = recovered_d[key]
            break
    for key in states_d.keys():
        if key in input_s:
            semantics['state'] = states_d[key]
            break
    for key in vaccines_d.keys():
        if key in input_s:
            semantics['vaccine'] = vaccines_d[key]
            break
    return semantics
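
Substring matching depends on the order of the keys: "sachsen" also occurs inside "niedersachsen" and "sachsen-anhalt". One possible way to make the result independent of dictionary order is to try the longest keys first. The sketch below illustrates that idea; the helper `find_keyword` and the small table are hypothetical, and the hyphenated key assumes that the recognizer transcribes the state name with a hyphen.

```python
def find_keyword(text, table):
    """Return the value of the longest key of `table` that occurs in `text`."""
    for key in sorted(table, key=len, reverse=True):
        if key in text:
            return table[key]
    return ''


# Small illustrative table, not the full dictionary from the notebook
states = {'sachsen': 'SN', 'sachsen-anhalt': 'ST', 'niedersachsen': 'NI'}

print(find_keyword('wie viele impfungen gibt es in sachsen-anhalt', states))
print(find_keyword('impfungen in niedersachsen', states))
print(find_keyword('impfungen in sachsen', states))
```

Sorting by length means the more specific name always wins over a shorter name contained in it.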

**Interpret the semantic content of the input** - the function checks whether the input asked for the number of people who recovered on the previous day, for a specific vaccine (if not, the overall vaccination figure is used) or for a specific state (if not, the data for the whole of Germany is used). If none of these was asked for, the answer stays empty (None)

In [18]:
# expects semantics: a dict with the keys 'state', 'vaccine', 'recovered' and 'answer'
def data(semantics):
    s = semantics['state']
    v = semantics['vaccine']
    r = semantics['recovered']
    
    if r: #number of recovered is asked
        semantics['answer'] = recovered["data"][recovered_days[r]]["recovered"]
    else:
        if s: # state given
            if s != 'DE':
                if v: # and vaccine given
                    semantics['answer'] = vaccinations["data"]["states"][s]['vaccination'][v]
                else: # all vaccines for state
                    semantics['answer'] = vaccinations["data"]["states"][s]['vaccinated']
            else:
                if v: # and vaccine given
                    semantics['answer'] = vaccinations["data"]['vaccination'][v]
                else: # all vaccines for Germany
                    semantics['answer'] = vaccinations['data']['vaccinated']
        else: # no state
            if v: # but vaccine
                semantics['answer'] = vaccinations["data"]['vaccination'][v]
            else: # nothing given
                semantics['answer'] = None
            
    return semantics
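
The nested lookup can be followed on a small excerpt of the data. The sketch below copies a few values from the vaccination output shown earlier and walks the same decision tree in a condensed form. The function name `lookup` is hypothetical, and returning `None` for a vaccine key that is missing from the 'vaccination' entry is a choice made for this sketch, not the notebook's behaviour.

```python
# Excerpt of the structure printed above (values copied from that output)
sample = {'data': {'vaccinated': 53530526,
                   'vaccination': {'biontech': 37215398, 'moderna': 4364941},
                   'states': {'BW': {'vaccinated': 6883761,
                                     'vaccination': {'biontech': 4816692}}}}}


def lookup(data, state='', vaccine=''):
    """Return the figure for a state and/or vaccine, or None if neither is given."""
    if not state and not vaccine:
        return None
    region = data['data']                  # Germany-wide figures
    if state and state != 'DE':
        region = region['states'][state]   # figures of one state
    if vaccine:
        return region['vaccination'].get(vaccine)
    return region['vaccinated']


print(lookup(sample, state='BW', vaccine='biontech'))
print(lookup(sample, vaccine='moderna'))
```

Selecting the region first and the vaccine second avoids repeating the vaccine branch for every level of the data.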

**ELIZA** - if the input does not ask for corona-related information, the output is generated with Eliza, an early natural language processing program that simulates conversation through pattern matching and substitution. It gives users the illusion of being understood, but it has no built-in framework for contextualizing events

The function uses a Python file containing the code and a text file containing the phrases from which the output is generated 

In [19]:
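def init_eliza(): #Initialize Eliza
    root = r"C:\Users\user\s_dyalog"
    elz = eliza.Eliza()
    # os.path.join avoids the invalid escape sequences "\e" and "\d" of a plain string path
    elz.load(os.path.join(root, "eliza", "deutsch.txt"))
    return elz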

In [20]:
elz = init_eliza()
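
The idea of pattern matching and substitution can be shown in a few lines. The following toy is not the Eliza module used here and does not read `deutsch.txt`; the two English rules, the fallback sentence and the function name `respond` are made up for illustration.

```python
import re

# Each rule pairs a pattern with a reply template; the captured text is substituted in
RULES = [
    (re.compile(r"i am (.*)", re.IGNORECASE), "Why are you {}?"),
    (re.compile(r"i need (.*)", re.IGNORECASE), "What would it mean to you to get {}?"),
]
FALLBACK = "Please go on."


def respond(text):
    """Answer with the first matching rule, or with the fallback sentence."""
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1).rstrip(".!?"))
    return FALLBACK


print(respond("I am tired."))
print(respond("The weather is nice"))
```

The program only echoes parts of the input back inside fixed templates, which is where the illusion of understanding comes from.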

### <a id="Output">5. Output</a>

**Output function** - based on the interpretation from the semantic parsing module, the function returns the answer that serves as the output. If the input asked for corona-related information (vaccinations or the number of recovered people), the answer is built from the RKI API data; otherwise it is generated with Eliza.

In [21]:
def output(semantics, inputs, elz):
    ret = ''
    used_eliza = 0  # 1 if the answer came from Eliza (the name avoids shadowing the eliza module)
    s = semantics['state']
    v = semantics['vaccine']
    r = semantics['recovered']
    a = semantics['answer']
    if r:  # number of recovered is asked
        ret = 'Gestern gab es {} Genesene in Deutschland'.format(a)
    elif s:  # state given
        s = state_names[s]
        if v:  # and vaccine given
            v = vaccine_names[v]
            ret = 'Die Impfungen für {} mit {} sind {}'.format(s, v, a)
        else:  # all vaccines for state
            ret = 'Die Impfungen für {} sind {}'.format(s, a)
    elif v:  # no state, but vaccine
        v = vaccine_names[v]
        ret = 'Die Impfungen in Deutschland mit {} sind {}'.format(v, a)
    else:  # nothing given: let Eliza answer
        ret = elz.respond(inputs)
        used_eliza = 1
    return ret, used_eliza

**Text-to-speech** - the function turns the text into speech and plays it

In [22]:
def tts(text):
    language = 'de'  # language (ISO code)

    myobj1 = gTTS(text=text, lang=language, slow=False)  # generate speech output
    file1 = "hello.mp3"  # name of the temporary audio file

    myobj1.save(file1)  # save as mp3
    playsound(file1, True)  # play the file and block until playback has finished
    os.remove(file1)  # delete the temporary file

**Turn the output into speech and play it** - the text comes from the previous function "output"

In [23]:
def output_s(text):
    print('output: '+text)
    tts(text)

### <a id="Dialog_manager">6. Dialog manager</a>

**Dialog manager function** - responsible for the state and flow of the conversation; all functions are connected here. The dialog system also determines the emotion of the person recording the input, using the program EmoDB, and replies accordingly. The program recognizes only these emotions: happiness, neutral, anger, sadness, fear, boredom and disgust.

In [24]:
def dialogmanager(elz):
    output_s(phrases['hello'])
    input_s = do_input()
    if input_s:  # do_input() returns None when nothing was recognized
        input_s = normalize(input_s)
    while input_s and input_s != phrases['done']:
        emotion = emoRec.classify(filename)[0]
        emotion_g = emo_dict[emotion]
        if emotion_g == 'traurig':
            output_s('ich merke du bist '+emotion_g)
            output_s('mach dir keine Sorgen! ich bin hier, um mit dir zu reden!')
        elif emotion_g == 'glücklich':
            output_s('ich merke du bist '+emotion_g)
            output_s('wenn du glücklich bist, dann bin ich auch glücklich!')
        elif emotion_g == 'irritiert':
            output_s('ich merke du bist '+emotion_g)
            output_s('ich kann dir gerne ein paar Atemübungen empfehlen')
        elif emotion_g == 'gelangweilt':
            output_s('ich merke du bist '+emotion_g)
            output_s('mach lieber einen Spaziergang! wir können uns später unterhalten!')
            break  # end the dialog when the user is bored

        semantics = semantic(input_s)
        semantics = data(semantics)
        # call output() only once: a second call would ask Eliza for another response
        out_string, used_eliza = output(semantics, input_s, elz)

        output_s(out_string)

        if used_eliza == 0:  # the question to continue is asked only in the corona dialog, because Eliza has its own questions
            output_s(phrases['continue'])
        input_s = do_input()
        if input_s:
            input_s = normalize(input_s)
        else:
            output_s(phrases['goodbye'])
    output_s('Tschüss')
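
The control flow of the dialog manager can be tested without microphone, speech recognition or loudspeaker if input and output are passed in as functions. The sketch below shows this idea in a reduced form; `run_dialog` and its parameters are hypothetical, and emotion recognition is left out.

```python
def run_dialog(get_input, say, answer, done_word='tschüss'):
    """Greet, answer every input until the done word or empty input, then say goodbye."""
    say('hello')
    text = get_input()
    while text and text.lower() != done_word:
        say(answer(text.lower()))
        text = get_input()
    say('goodbye')


# Scripted conversation: the inputs come from a list, the outputs are collected in a list
scripted = iter(['Berlin', 'tschüss'])
spoken = []
run_dialog(lambda: next(scripted, None), spoken.append,
           lambda t: 'you asked about ' + t)
print(spoken)
```

In the real system `get_input` would be `do_input` and `say` would be `output_s`, so the same loop serves both the test and the spoken dialog.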

In [25]:
emoRec = emorec.EmoRec() #in order to determine the emotions, the dialog system uses a python file with the code 
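
The emotion-dependent replies of the dialog manager can also be kept in a table instead of a chain of conditions. The sketch below is one possible arrangement that reuses the sentences from the dialog manager; the names `EMOTION_REPLIES` and `emotion_reply` are hypothetical.

```python
# Maps the German emotion label to (follow-up sentence, end the dialog afterwards?)
EMOTION_REPLIES = {
    'traurig': ('mach dir keine Sorgen! ich bin hier, um mit dir zu reden!', False),
    'glücklich': ('wenn du glücklich bist, dann bin ich auch glücklich!', False),
    'irritiert': ('ich kann dir gerne ein paar Atemübungen empfehlen', False),
    'gelangweilt': ('mach lieber einen Spaziergang! wir können uns später unterhalten!', True),
}


def emotion_reply(emotion_g):
    """Return (sentences to speak, whether to end the dialog) for an emotion label."""
    if emotion_g not in EMOTION_REPLIES:
        return [], False  # e.g. 'wie immer': no comment on the emotion
    follow_up, end_dialog = EMOTION_REPLIES[emotion_g]
    return ['ich merke du bist ' + emotion_g, follow_up], end_dialog


sentences, end_dialog = emotion_reply('gelangweilt')
print(sentences, end_dialog)
```

Adding a reply for another emotion then only requires a new table entry.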

**Run the whole program**

In [26]:
dialogmanager(elz)

output: Hallo! Ich bin deine neue Freundin, das Dialogsystem! Was möchtest du wissen?
Transcript 0: jo
output: ich merke du bist traurig
output: mach dir keine Sorgen! ich bin hier, um mit dir zu reden!
output: Ich bin nicht sicher, ob ich dich verstanden habe.
Transcript 0: Stellungnahme
output: ich merke du bist traurig
output: mach dir keine Sorgen! ich bin hier, um mit dir zu reden!
output: Das ist ja interessant.  Sprich bitte weiter.
Transcript 0: Berlin
output: ich merke du bist traurig
output: mach dir keine Sorgen! ich bin hier, um mit dir zu reden!
output: Die Impfungen für Berlin sind 2354228
output: Kann ich dir weiterhelfen?
Transcript 0: Setup
output: ich merke du bist traurig
output: mach dir keine Sorgen! ich bin hier, um mit dir zu reden!
output: Stört es dich, daß wir über dieses Thema sprechen ?
Transcript 0: tschüss
output: Tschüss
