# Evaluation of Custom Speech Transcription
This notebook evaluates the Speech-to-Text transcriptions generated by [GLUE](https://github.com/microsoft/glue).

In [1]:
# Import required packages
import sys
import pandas as pd
import configparser

# Notebook specific functions
from matplotlib import cm, pyplot as plt 

# Custom functions
sys.path.append("../src")
import evaluate as ev

# Notebook configs
%matplotlib inline
%load_ext autoreload
%autoreload 2

## Input Data
Below, you import the transcription file that GLUE generates in `--do_transcribe` mode. <br>
The evaluation is equivalent to the one produced by `--do_evaluation` mode, which is only printed to the console. <br>
Here, you get a consistent view of the results.

Make sure the file has the structure below. Files generated by GLUE already follow it:
- Comma-separated (.csv)
- UTF-8 encoded
- Columns "text" for the reference transcript and "rec" for the recognition result (the cell below also reads an "audio" column with the file name)
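
Before loading the file, it can help to confirm that its header matches the layout listed above. The following is a minimal sketch using only the Python standard library; `missing_columns` and `REQUIRED_COLUMNS` are hypothetical names chosen for illustration and are not part of GLUE.

```python
import csv
import io

# Column names taken from the list above
REQUIRED_COLUMNS = ("text", "rec")

def missing_columns(stream, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from a CSV header."""
    reader = csv.reader(stream)
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in required if name not in header]

# Small in-memory example standing in for a UTF-8 encoded .csv file
sample = io.StringIO("audio,text,rec\nBookFlight.wav,hello,hello\n")
print(missing_columns(sample))  # an empty list means nothing is missing
```

For a real file, the same function can be called on `open(path, encoding="utf-8", newline="")`.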

In [2]:
# Import transcription file
res = pd.read_csv("../assets/examples/output_files/example_transcriptions_full.csv", sep=",", encoding='utf-8')[['audio', 'text', 'rec']]
# Replace missing values with empty strings (assign back instead of a chained inplace call)
res["text"] = res["text"].fillna("")
res["rec"] = res["rec"].fillna("")
res.head()

| | audio | text | rec |
|---|---|---|---|
| 0 | BookFlight.wav | I would like to book a flight to Frankfurt. | Aber leicht über Flight Frankfurt. |
| 1 | CancelFlight.wav | I want to cancel my journey to Kuala Lumpur | Pur. |
| 2 | ChangeFlight.wav | I would like to change my flight to Singapore. | I would like to change my flight? |
| 3 | BookSeat.wav | I would like to book a seat on my flight to St... | |


## Evaluate
Evaluate the transcription results by comparing them with the reference transcripts.
- Calculates metrics such as [Word Error Rate (WER)](https://en.wikipedia.org/wiki/Word_error_rate), Sentence Error Rate (SER), and Word Recognition Rate (WRR).
- Implementation based on [github.com/belambert/asr-evaluation](https://github.com/belambert/asr-evaluation).
- See some hints on [how to improve your Custom Speech accuracy](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-evaluate-data).
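
To make the metrics above concrete, here is a minimal sketch of how WER and SER can be computed from a word-level edit distance. It is an illustration under simple assumptions (lowercasing and whitespace tokenization only), not the implementation used by GLUE or asr-evaluation; `edit_distance` and `wer_and_ser` are hypothetical helper names, and WRR is left out of the sketch.

```python
def edit_distance(ref_words, rec_words):
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(rec_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, rec_word in enumerate(rec_words, start=1):
            cost = 0 if ref_word == rec_word else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a recognized word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]

def wer_and_ser(references, recognitions):
    """Return (WER, SER) over two parallel lists of sentences."""
    errors = ref_word_count = wrong_sentences = 0
    for ref, rec in zip(references, recognitions):
        # Assumption: lowercase and split on whitespace, no punctuation handling
        ref_words, rec_words = ref.lower().split(), rec.lower().split()
        distance = edit_distance(ref_words, rec_words)
        errors += distance
        ref_word_count += len(ref_words)
        wrong_sentences += int(distance > 0)
    wer = errors / ref_word_count if ref_word_count else 0.0
    ser = wrong_sentences / len(references) if references else 0.0
    return wer, ser

# Two deletions over nine reference words: WER is 2/9, SER is 1.0
print(wer_and_ser(["i would like to change my flight to singapore"],
                  ["i would like to change my flight"]))
```

WER divides the total number of word errors by the number of reference words, while SER is the share of sentences that contain at least one error.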

Generally, we recommend not taking the WER too seriously; treat it as a tool for detecting recurring patterns or issues in the speech model. Especially in combination with LUIS, end-to-end testing is more relevant.

Print Verbosity:
- 0 -> Only summary metrics
- 1 -> Only errors
- 2 -> All

Optional variable: `query_keyword`. <br>
Use it to search for specific words in the reference text.

In [3]:
eva = ev.EvaluateTranscription()

In [4]:
eva.calculate_metrics(res.text.values, res.rec.values, label=res.audio.values, print_verbosiy=1)

REF: I WOULD LIKE TO   BOOK   A    flight TO frankfurt
REC: * ***** **** ABER LEICHT ÜBER flight ** frankfurt
SENTENCE 2  BookFlight.wav
Correct          =  22.2%    2   (     9)
Errors           =  77.8%    7   (     9)
REF: I WANT TO CANCEL MY JOURNEY TO KUALA LUMPUR
REC: * **** ** ****** ** ******* ** ***** PUR
SENTENCE 3  CancelFlight.wav
Correct          =   0.0%    0   (     9)
Errors           = 100.0%    9   (     9)
REF: i would like to change my flight TO SINGAPORE
REC: i would like to change my flight ** *********
SENTENCE 4  ChangeFlight.wav
Correct          =  77.8%    7   (     9)
Errors           =  22.2%    2   (     9)
REF: I WOULD ...

(0.7692307692307693, 0.23076923076923078, 1.0)

In [5]:
eva.print_errors(min_count=1)


***DELETIONS:
to                            6
i                             3
would                         2
like                          2
my                            2
want                          1
cancel                        1
journey                       1
kuala                         1
singapore                     1
book                          1
a                             1
seat                          1
on                            1
flight                        1
stuttgart                     1

***SUBSTITUTIONS:
to                   -> aber                            1
book                 -> leicht                          1
a                    -> über                            1
lumpur               -> pur                             1
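
The listings above are frequency counts over word-level alignments. A minimal sketch of such a tally follows, assuming each alignment is already available as (reference word, recognized word) pairs with `None` marking a gap. This pair format and the name `tally_errors` are illustrative assumptions, not GLUE's internal representation.

```python
from collections import Counter

def tally_errors(aligned_pairs):
    """Count deletions, insertions and substitutions in aligned word pairs."""
    deletions, insertions, substitutions = Counter(), Counter(), Counter()
    for ref_word, rec_word in aligned_pairs:
        if rec_word is None:        # reference word has no counterpart
            deletions[ref_word] += 1
        elif ref_word is None:      # recognized word has no counterpart
            insertions[rec_word] += 1
        elif ref_word != rec_word:  # both present but different
            substitutions[(ref_word, rec_word)] += 1
    return deletions, insertions, substitutions

# Alignment of the ChangeFlight.wav sentence shown earlier
pairs = [("i", "i"), ("would", "would"), ("like", "like"), ("to", "to"),
         ("change", "change"), ("my", "my"), ("flight", "flight"),
         ("to", None), ("singapore", None)]
deletions, insertions, substitutions = tally_errors(pairs)
for word, count in deletions.most_common():
    print(f"{word:<30}{count}")
```

Feeding the pairs of every sentence through the same counters would give corpus-level counts comparable to the listings above.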
