# Speech recognition with IBM Watson
In this notebook I experiment with the IBM Watson Speech to Text service using the Python API.

In [7]:
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
import json
import os

In [3]:
# IAMAuthenticator expects the API key from your IBM Cloud service credentials
authenticator = IAMAuthenticator('ENTER YOUR API KEY HERE')

speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('ENTER YOUR SERVICE URL HERE')

In [12]:
sub_path = './resources/SpeechtoTextData/'

# os.listdir order is arbitrary, so exclude the .txt file by extension rather than by position
file_names = sorted(os.path.join(sub_path, f) for f in os.listdir(sub_path)
                    if not f.endswith('.txt'))

In [21]:
for file in file_names:
    # Derive the content type from the extension (e.g. 'flac' -> 'audio/flac')
    ext = os.path.splitext(file)[1].lstrip('.')

    with open(file, 'rb') as audio_file:
        speech_recognition_results = speech_to_text.recognize(
            audio=audio_file,
            content_type=f'audio/{ext}'
        ).get_result()
    print(f'file: {file}')
    print(json.dumps(speech_recognition_results, indent = 2))
    print('--------------------------------------------------------------------------------------')

file: ./Resources_9.1/SpeechtoTextData/367-130732-0000.flac
{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "lobsters and law officers ",
          "confidence": 0.74
        }
      ]
    }
  ]
}
--------------------------------------------------------------------------------------
file: ./Resources_9.1/SpeechtoTextData/367-130732-0001.flac
{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "when is a lobster not a lobster when it is a crayfish ",
          "confidence": 0.87
        }
      ]
    }
  ]
}
--------------------------------------------------------------------------------------
file: ./Resources_9.1/SpeechtoTextData/367-130732-0004.flac
{
  "result_index": 0,
  "results": [
    {
      "final": true,
      "alternatives": [
        {
          "transcript": "a book could be written about this restaurant and then all would not b

## Observations
The IBM Watson service performed quite well.

The audio clips contained speech from multiple speakers, including different genders and accents. Additionally, the files prefixed with p232_ had a large amount of background noise, and some of these clips were difficult for me to hear correctly. Even so, the Watson service was largely robust to the noise. Most clips returned a confidence of at least 90%, with only a few scoring lower. The lowest was p232_010.wav, which returned a confidence score of 19%.
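The confidence figures above were read off the printed JSON by eye. A minimal sketch of how they could be collected programmatically follows, assuming each response has the `results` / `alternatives` layout shown in the output above. The helper name `summarize_response` and the choice to average confidence across segments are assumptions made for illustration, not part of the original notebook. The two sample responses are copied from the printed output.

```python
from statistics import mean


def summarize_response(response):
    """Collapse one recognize() result dict into (transcript, confidence).

    Takes the first alternative of every result segment, joins the
    transcripts, and averages the confidences (an illustrative choice).
    """
    top = [r["alternatives"][0]
           for r in response.get("results", [])
           if r.get("alternatives")]
    transcript = "".join(a["transcript"] for a in top).strip()
    scores = [a["confidence"] for a in top if "confidence" in a]
    confidence = mean(scores) if scores else None
    return transcript, confidence


# Two responses copied from the notebook output above
responses = {
    "367-130732-0000.flac": {
        "result_index": 0,
        "results": [{"final": True, "alternatives": [
            {"transcript": "lobsters and law officers ", "confidence": 0.74}]}],
    },
    "367-130732-0001.flac": {
        "result_index": 0,
        "results": [{"final": True, "alternatives": [
            {"transcript": "when is a lobster not a lobster when it is a crayfish ",
             "confidence": 0.87}]}],
    },
}

summary = {name: summarize_response(r) for name, r in responses.items()}
scored = {name: conf for name, (_, conf) in summary.items() if conf is not None}

lowest = min(scored, key=scored.get)    # file with the lowest confidence
highest = max(scored, key=scored.get)   # file with the highest confidence
share_high = sum(c >= 0.9 for c in scored.values()) / len(scored)

print(f"lowest: {lowest} ({scored[lowest]:.2f})")
print(f"highest: {highest} ({scored[highest]:.2f})")
print(f"share of clips at or above 90% confidence: {share_high:.0%}")
```

In the transcription loop, the same helper could be applied to each `speech_recognition_results` dict and the results stored by file name instead of only being printed.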

The most notable observations were:
* ***367-130732-0000.flac*** – This incorrectly returned a transcript of “lobsters and law officers” when the correct transcript was “Lobsters and lobsters”. If you say “law officers” really fast, you can understand how it may have been misrecognized.
* ***p232_010.wav*** – The IBM service failed to transcribe the first two words (“People look”). This is probably the noisiest clip, with unintelligible background speech like the sound of distant conversations in a busy restaurant or café.
* ***p232_009.wav*** – The highest-scoring response (98% confidence). This clip has some background noise, but the speaker is very clear and evenly paced.
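
A miss like the one in 367-130732-0000.flac can be quantified with word error rate (WER): the word-level edit distance between the reference and the returned transcript, divided by the number of reference words. The sketch below is an illustration and was not used in the original experiment. The function names are hypothetical, and the normalisation (lower-casing and whitespace splitting) is a simplifying assumption.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance computed over words rather than characters."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Edit distance divided by the number of words in the reference."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript must not be empty")
    return word_edit_distance(reference, hypothesis) / ref_len


reference = "Lobsters and lobsters"
hypothesis = "lobsters and law officers "
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

Note that WER can exceed 1.0 when the returned transcript contains many more words than the reference, so it is best read alongside the confidence score rather than as a percentage of words wrong.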
