## Speech Recognition

In this seminar we will build a bot that can process voice commands using Google Speech Recognition (GSR) and our own brains.

Unfortunately, Telegram records voice messages in OGG format, while GSR works with WAV-encoded data, so we will need to convert the audio with ffmpeg.
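
A minimal sketch of the conversion step, assuming `ffmpeg` is installed as described in Step 1. The helper names `build_convert_command` and `ogg_to_wav` are hypothetical; they are not taken from the seminar's `speech2text` module, which may perform the conversion differently.

```python
import subprocess


def build_convert_command(src_ogg, dst_wav, ffmpeg_path="ffmpeg"):
    """Build the ffmpeg argument list for an OGG -> WAV conversion."""
    # -y overwrites dst_wav if it already exists; -i names the input file.
    # ffmpeg picks the output container from the .wav extension.
    return [ffmpeg_path, "-y", "-i", src_ogg, dst_wav]


def ogg_to_wav(src_ogg, dst_wav, ffmpeg_path="ffmpeg"):
    """Run ffmpeg and return the path of the converted file."""
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_convert_command(src_ogg, dst_wav, ffmpeg_path), check=True)
    return dst_wav


# Only the command is printed here, so the example runs without ffmpeg.
print(build_convert_command("voice.ogg", "voice.wav"))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when paths contain spaces.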

## Step 1. Download and install ffmpeg

### For Ubuntu users (easy mode):
* `sudo apt-get update`
* `sudo apt-get install ffmpeg`
* make sure you can run `ffmpeg` from your console

### For Mac users:
* download binaries from https://ffmpeg.zeranoe.com/builds/macos64/static/ffmpeg-3.4-macos64-static.zip
* unzip the archive anywhere you like
* find the full path to the ffmpeg executable, which will look like `/path_to_your_unzip_dir/ffmpeg-3.4-macos64-static/bin/ffmpeg`
* e.g. on my machine it is `/Users/denisantyukhov/Downloads/ffmpeg-3.4-macos64-static/bin/ffmpeg`
* set `FFMPEG_PATH` in `config.py` to that path, so that it looks like this:

`FFMPEG_PATH = '/Users/denisantyukhov/Downloads/ffmpeg-3.4-macos64-static/bin/ffmpeg'`

### For Windows users:
* download binaries from https://ffmpeg.zeranoe.com/builds/win64/static/ffmpeg-3.4-win64-static.zip (64 bit)
* unzip the archive anywhere you like
* find the full path to the ffmpeg executable, which will probably look like `C:\your_folder\ffmpeg-3.4-win64-static\bin\ffmpeg.exe`
* set `FFMPEG_PATH` in `config.py` to that path, so that it looks like this:

`FFMPEG_PATH = r'C:\ffmpeg-3.4-win64-static\bin\ffmpeg.exe'`
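
Whichever platform you are on, it is worth checking that the path actually points to an executable before running the bot. The sketch below is one way to do that; `resolve_ffmpeg` is a hypothetical helper, not part of the seminar code.

```python
import os
import shutil


def resolve_ffmpeg(candidate="ffmpeg"):
    """Return a usable path to the ffmpeg executable, or None if not found."""
    # An explicit path to an existing file (the Mac/Windows setup) is used as-is.
    if os.path.isfile(candidate):
        return candidate
    # Otherwise look the name up on PATH (the Ubuntu setup).
    return shutil.which(candidate)


path = resolve_ffmpeg("ffmpeg")
print(path if path else "ffmpeg not found, check FFMPEG_PATH")
```

To check your own setup, pass the value of `FFMPEG_PATH` from `config.py` instead of `"ffmpeg"`.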

## Step 2. Configure your bot

Set the `TOKEN` variable in `config.py` to the token that BotFather gave you during lecture 1.
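
For orientation, here is a sketch of what `config.py` might contain. The variable names `TOKEN`, `LOG_FILE`, `DL_DIR` and `FFMPEG_PATH` are the ones this notebook uses; every value below, as well as the `TELEGRAM_TOKEN` environment variable, is a placeholder assumption, and the file shipped with the seminar may differ.

```python
import os

# Bot token from BotFather. Reading it from an environment variable
# (hypothetical name TELEGRAM_TOKEN) keeps it out of version control.
TOKEN = os.environ.get("TELEGRAM_TOKEN", "<your-bot-token>")

# Where the bot writes its log (placeholder value).
LOG_FILE = "bot.log"

# Directory for downloaded voice messages (placeholder value).
DL_DIR = "downloads"

# "ffmpeg" works when the executable is on PATH; otherwise use the full path.
FFMPEG_PATH = "ffmpeg"
```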

## Step 3. Configure Google Speech Recognition

In `gsr_config.yml`, set the `KEY` variable to the token you received in Slack.

## Step 4. Install dependencies

`pip install -r requirements.txt`

## Step 5. Check that it works

In [None]:
from speech2text import speech2text

In [None]:
r = speech2text("check.ogg")

In [None]:
print(r)
assert r == 'проверка микрофона'
print('done')

## Step 6. Run your bot

In [None]:
import asyncio
import logging
import telegram

from telegram.ext import Updater, CommandHandler, MessageHandler, Filters

from speech2text import speech2text
from config import TOKEN, LOG_FILE, DL_DIR


# Enable logging
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
# Logging to file
fh = logging.FileHandler(LOG_FILE)
fh.setLevel(logging.DEBUG)
fh.setFormatter(formatter)
# Logging to console
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
ch.setFormatter(formatter)

logger.addHandler(fh)
logger.addHandler(ch)


class Bot:
    def __init__(self):

        self.updater = Updater(TOKEN)
        self.dsp = self.updater.dispatcher

        # register handler functions which define how the bot reacts to events
        self.dsp.add_handler(CommandHandler("start", get_help))
        self.dsp.add_handler(CommandHandler("help", get_help))
        self.dsp.add_handler(MessageHandler(Filters.text, echo))
        self.dsp.add_handler(MessageHandler(Filters.voice, process_audio))
        self.dsp.add_error_handler(error)

        logger.info("I'm alive!")

    def power_on(self):
        # start the Bot
        self.updater.start_polling()
        self.updater.idle()

def echo(bot, update):
    logger.info('echo received message: {}'.format(update.message.text))
    bot.sendMessage(update.message.chat_id, text=update.message.text)


def error(bot, update, error):
    # all uncaught telegram-related exceptions will be rerouted here
    logger.error('Update "%s" caused error "%s"' % (update, error))


def get_help(bot, update):
    logger.info('get_help received message: {}'.format(update.message.text))
    help_msg = ('Greetings, {} {}! Name is {}, at your service.\n'
                'My purpose is to demonstrate how ').format(
        update.message.from_user.first_name, update.message.from_user.last_name, bot.name)
    bot.sendMessage(update.message.chat_id, text=help_msg)
    
    
def process_audio(bot, update):
    logger.info('process_audio received a voice message')
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:

        voice_file_id = update.message.voice.file_id
        voice_file = bot.get_file(voice_file_id)
        ogg_fname = "{}/{}".format(DL_DIR, "voice.ogg")
        voice_file.download(ogg_fname)
        logger.info('downloaded')
        transcript = speech2text(ogg_fname)
        logger.info('transcription result: {}'.format(transcript))
        bot.sendMessage(update.message.chat_id, text=transcript)

    except Exception as e:
        logger.error(e)

my_bot = Bot()
my_bot.power_on()


## Exercise 1

Now let's make our bot do something useful with our voice commands. Some examples:

In [None]:
from utils import get_rates, get_weather, get_anekdot

In [None]:
print(get_rates())

In [None]:
print(get_weather())

In [None]:
print(get_anekdot())

### First, define an object that will return the most probable action for our request

In [None]:
import json
import scipy
import pymystem3
import numpy as np

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from itertools import chain

### Feel free to modify

In [None]:
class ENGINE_3(object):
    def __init__(self, kbase):
        self.knowledge_base = kbase
        self.lemmatizer = pymystem3.Mystem()
        
        # contains correct output for each class
        self.answers = np.array([t['answer'] for t in self.knowledge_base])
        self.vectorizer = self.prepare_vectorizer()
        self.vectorized_kbase, self.class_indexes = self.vectorize_knowledge_base()
        
    def prepare_vectorizer(self):
        """
        Fits TF-IDF vectorizer using all available text from self.knowledge_base
        Returns TF-IDF vectorizer object
        """
        vectorizer = TfidfVectorizer(ngram_range=(1,2), tokenizer=self.tokenize_and_lemmatize)
        all_texts = []
        for kb in self.knowledge_base:
            all_texts.append(kb['question'])
            all_texts += kb['paraphrased_questions']
        vectorizer.fit(all_texts)
        return vectorizer
    
    def vectorize(self, data):
        """
        Turns a list of N strings into their vector representation using self.vectorizer.
        Returns a matrix of shape [N, n_features]
        """
        return self.vectorizer.transform(data)
        
    def vectorize_knowledge_base(self):
        """
        Vectorizes all questions using the vectorize function.
        Builds a list containing class number for each question.        
        """
        texts = []
        class_labels = []
        
        for i, t in enumerate(self.knowledge_base):
            texts.append(t['question'])
            texts += t['paraphrased_questions']

            class_labels.append(i)
            class_labels += [i]*len(t['paraphrased_questions'])
        
        
        return self.vectorize(texts), class_labels
    
    def compute_class_scores(self, similarities):
        """
        Accepts an array of similarities of shape (len(self.class_indexes),)
        Computes scores for classes.
        Returns a dictionary of size (n_classes) that looks like
        {
            0: 0.3,
            1: 0.1,
            2: 0.0,
            class_n_id: class_n_score
            ...
        }
        """
        
        class_scores = dict(zip(range(len(self.answers)), [0]*len(self.answers)))
        
        for ci, sc in zip(self.class_indexes, similarities):
            class_scores[ci] += sc
        return class_scores
        
    def tokenize_and_lemmatize(self, text):
        analysis = self.lemmatizer.analyze(text.strip())
        tokens = []
        for an in analysis:
            if 'analysis' in an:
                try:
                    tokens.append(an['analysis'][0]['lex'])
                except IndexError:
                    tokens.append(an['text'])
        return tokens
    
    def get_top(self, query, top_k=1):
        if isinstance(query, str):
            query = [query]
            
        vectorized_query = self.vectorize(query)
        css = cosine_similarity(vectorized_query, self.vectorized_kbase)[0]
        scores = self.compute_class_scores(css)
        
        sorted_scores = sorted(scores.items(), key= lambda x: x[1])[::-1][:top_k]
        top_classes = np.array([c[0] for c in sorted_scores])
        top_answers = list(self.answers[top_classes])
        return top_answers[0]
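
To make the scoring step concrete, here is a small standalone sketch of the aggregation done in `compute_class_scores`, rewritten as a free function so it runs without the rest of the class. The similarity values are made up for illustration; in the engine they come from `cosine_similarity`.

```python
def compute_class_scores(class_indexes, similarities, n_classes):
    """Sum the similarity of every knowledge-base text into its class."""
    scores = {c: 0.0 for c in range(n_classes)}
    for ci, sc in zip(class_indexes, similarities):
        scores[ci] += sc
    return scores


# Three classes, each with one question and one paraphrase.
class_indexes = [0, 0, 1, 1, 2, 2]
# Illustrative similarities between a query and each of the six texts.
similarities = [0.5, 0.25, 0.0, 0.125, 0.0, 0.0]

scores = compute_class_scores(class_indexes, similarities, n_classes=3)
best_class = max(scores, key=scores.get)

print(scores)      # {0: 0.75, 1: 0.125, 2: 0.0}
print(best_class)  # 0
```

Because the scores are summed rather than averaged, a class with more paraphrases can accumulate a higher score than a class with fewer; keep that in mind when filling in the knowledge base.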

### Fill the cells with texts that your bot will react to

In [None]:
FAQ = [{'question':'прогноз погоды', 'answer': get_weather, 'paraphrased_questions':['что там на улице']},
       {'question':'your command here', 'answer': get_anekdot, 'paraphrased_questions':['your_paraphrases_here']},
       {'question':'your command here', 'answer': get_rates, 'paraphrased_questions': ['your_paraphrases_here']}]

In [None]:
eng = ENGINE_3(FAQ)

### Engine returns the most relevant callable function

In [None]:
eng.get_top('дай прогноз')
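
Note that `get_top` returns the function itself, not its result, so the caller still has to invoke it to obtain text. The sketch below shows that pattern with stand-ins: `StubEngine`, `run_command` and the two helper functions are hypothetical and only mimic the interfaces of `ENGINE_3` and `utils`.

```python
def get_weather():
    # Hypothetical stand-in for utils.get_weather.
    return "sunny, +20"


def get_rates():
    # Hypothetical stand-in for utils.get_rates.
    return "USD 1.00"


class StubEngine:
    """Mimics ENGINE_3.get_top: maps a query string to a callable."""

    def __init__(self, actions, default):
        self.actions = actions
        self.default = default

    def get_top(self, query):
        for keyword, action in self.actions.items():
            if keyword in query:
                return action
        return self.default


def run_command(engine, transcript):
    """Pick the action for a transcript, call it, and return text to send."""
    action = engine.get_top(transcript)
    return str(action())


engine = StubEngine({"weather": get_weather, "rates": get_rates}, default=get_weather)
print(run_command(engine, "what is the weather like"))
```

Wrapping the result in `str` is a precaution in case an action returns something other than a string.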

### Integrate the engine into your bot so that it obeys your commands!

In [None]:
import asyncio
import logging
import telegram

from telegram.ext import Updater, CommandHandler, MessageHandler, Filters

from speech2text import speech2text
from config import TOKEN, LOG_FILE, DL_DIR


# Enable logging
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
# Logging to file
fh = logging.FileHandler(LOG_FILE)
fh.setLevel(logging.DEBUG)
fh.setFormatter(formatter)
# Logging to console
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
ch.setFormatter(formatter)

logger.addHandler(fh)
logger.addHandler(ch)


class Bot:
    def __init__(self):

        self.updater = Updater(TOKEN)
        self.dsp = self.updater.dispatcher

        # register handler functions which define how the bot reacts to events
        self.dsp.add_handler(CommandHandler("start", get_help))
        self.dsp.add_handler(CommandHandler("help", get_help))
        self.dsp.add_handler(MessageHandler(Filters.text, echo))
        self.dsp.add_handler(MessageHandler(Filters.voice, process_audio))
        self.dsp.add_error_handler(error)

        logger.info("I'm alive!")

    def power_on(self):
        # start the Bot
        self.updater.start_polling()
        self.updater.idle()

def echo(bot, update):
    logger.info('echo received message: {}'.format(update.message.text))
    bot.sendMessage(update.message.chat_id, text=update.message.text)


def error(bot, update, error):
    # all uncaught telegram-related exceptions will be rerouted here
    logger.error('Update "%s" caused error "%s"' % (update, error))


def get_help(bot, update):
    logger.info('get_help received message: {}'.format(update.message.text))
    help_msg = ('Greetings, {} {}! Name is {}, at your service.\n'
                'My purpose is to demonstrate how ').format(
        update.message.from_user.first_name, update.message.from_user.last_name, bot.name)
    bot.sendMessage(update.message.chat_id, text=help_msg)
    
    
def process_audio(bot, update):
    logger.info('process_audio received a voice message')
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:

        voice_file_id = update.message.voice.file_id
        voice_file = bot.get_file(voice_file_id)
        ogg_fname = "{}/{}".format(DL_DIR, "voice.ogg")
        voice_file.download(ogg_fname)
        logger.info('downloaded')
        transcript = speech2text(ogg_fname)
        logger.info('transcription result: {}'.format(transcript))

        # your code goes here
        func_output = ''
        
        bot.sendMessage(update.message.chat_id, text=func_output)

    except Exception as e:
        logger.error(e)

my_bot = Bot()
my_bot.power_on()
