## Speech to Text
---
**Code Adapted from:** [Mitchell Bohman, Nour Zahlan, and Masiur Abik](https://github.com/mchbmn/radio-to-location) and [Joseph Hopkins, Carol Chiu, Anthony Chapman, Kwamae Delva](https://github.com/delvakwa/police_radio_to_mapping)

In [66]:
import os
import io
import time

import pandas as pd
import numpy as np
from pydub import AudioSegment

# Import the Google Cloud client library
# Note: the `enums` and `types` modules exist only in google-cloud-speech < 2.0
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

In [67]:
pd.options.display.max_colwidth = 1000

`Pool` is a class in the `multiprocessing` package that distributes work across multiple processes on a computer. Simply put, it lets the computer assign more than one worker to build a fence instead of just one. This can dramatically reduce the run time of computationally expensive tasks, and it is applied by wrapping the call to such a task in a pool.
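
As a rough illustration of the fence analogy, the sketch below splits a list of independent jobs across worker processes with `Pool.map`. The names `build_section` and `build_fence` are hypothetical and not part of this notebook, and the arithmetic inside `build_section` is only a stand-in for an expensive task.

```python
from multiprocessing import Pool


def build_section(section_id):
    """Hypothetical stand-in for one expensive, independent unit of work."""
    return sum(i * i for i in range(section_id * 1000))


def build_fence(section_ids, workers=2):
    """Distribute the sections across `workers` processes; results keep input order."""
    with Pool(processes=workers) as pool:
        return pool.map(build_section, section_ids)
```

In a script, `build_fence([1, 2, 3, 4])` would normally be called inside an `if __name__ == "__main__":` guard, so that worker processes started with the spawn method do not re-run the call when they import the module. The worker function has to be defined at module level so it can be pickled and sent to the workers.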

In [68]:
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = '/Users/gabrielperezprieto/Documents/Future/GOOGLE/My First Project-9b59743c8b2f.json'
client = speech.SpeechClient()

#### Check Number of Files Under File Path

In [69]:
path = './testing/wav_converted_files/'
wav_files = []
for file in os.listdir(path):
    if file.endswith('.wav'):
        wav_files.append(file)
print(f'Number of Files: {len(wav_files)}')

Number of Files: 17


#### Tests a Specific Audio File

In [70]:
sound_file = AudioSegment.from_file(os.path.join(path, wav_files[0]), format="wav")

print(f'Sample Width: {sound_file.sample_width}')
print(f'Channel Count: {sound_file.channels}')
print(f'Duration: {len(sound_file) / 1000}s')
print(f'Sample Rate: {sound_file.frame_rate}')

Sample Width: 2
Channel Count: 2
Duration: 8.587s
Sample Rate: 22050
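
These properties matter because the recognition request has to describe the audio it receives: a sample width of 2 bytes corresponds to 16-bit PCM (`LINEAR16`), and the sample rate and channel count are passed as `sample_rate_hertz` and `audio_channel_count`. The sketch below is one possible way to check every file against those values using only the standard library `wave` module. The helper names `describe_wav` and `matches_config` are assumptions for illustration, not part of the original notebook.

```python
import wave


def describe_wav(wav_path):
    """Read the header fields of a PCM .wav file using only the standard library."""
    with wave.open(wav_path, 'rb') as wav:
        return {
            'sample_width': wav.getsampwidth(),   # bytes per sample; 2 means 16-bit PCM
            'channels': wav.getnchannels(),
            'sample_rate': wav.getframerate(),
            'duration_s': wav.getnframes() / wav.getframerate(),
        }


def matches_config(info, sample_rate_hertz=22050, audio_channel_count=2):
    """Check a file's header against the values expected by the recognition config."""
    return (info['sample_width'] == 2
            and info['sample_rate'] == sample_rate_hertz
            and info['channels'] == audio_channel_count)
```

Running `matches_config(describe_wav(os.path.join(path, f)))` over `wav_files` would flag any file whose format differs from the first one before it is sent for transcription.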


#### Speech to Text - Google API + Streets Context

In [71]:
# Refresh stored variables from previous notebooks
%store -r

In [86]:
def google_speech_to_text(filepath):
    '''Converts every .wav file in the folder `filepath` to text and returns a DataFrame'''

    # Create list to house data on every loop
    list_results = []

    # Instantiate the client once and reuse it for every file
    client = speech.SpeechClient()

    # Loop through all files in the folder provided
    for n, file in enumerate(os.listdir(filepath)):

        # Select only the ones with extension '.wav'
        if file.endswith('.wav'):

            # Start the timer for this file
            t1 = time.time()

            # Instantiate dictionary
            d = {}

            # Load the audio into memory
            with io.open(os.path.join(filepath, file), 'rb') as audio_file:
                content = audio_file.read()
            audio = types.RecognitionAudio(content=content)

            # Configure recognition parameters (they must match the audio format)
            config = types.RecognitionConfig(
                encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
                sample_rate_hertz=22050,
                language_code='en-US',
                audio_channel_count=2,
                model='video',
                # streets_list is not defined in this notebook;
                # it is presumably restored by `%store -r` above
                speech_contexts=[{'phrases': streets_list}])

            # Detect speech in the audio file
            response = client.recognize(config=config, audio=audio)

            # Create string to house pieces returned by result
            string = ''
            list_confidence = []

            # Loop through results
            for result in response.results:

                # If length is greater than 0
                if len(result.alternatives[0].transcript) > 0:

                    # Append transcript and confidence of the top alternative
                    string = string + result.alternatives[0].transcript
                    list_confidence.append(result.alternatives[0].confidence)

            # Create key/value pairs for dictionary
            # (NaN confidence when nothing was transcribed; avoids an empty-mean warning)
            d['transcripts'] = string
            d['confidence'] = np.mean(list_confidence) if list_confidence else np.nan

            # Append dictionary to list
            list_results.append(d)

            # Print RunTime
            print(f'File {n} RunTime: {round(time.time() - t1, 2)}s')

    # Create DataFrame with list_results
    df = pd.DataFrame(list_results)

    # Return DataFrame
    return df
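
The aggregation step inside the loop can be looked at in isolation. The following is a minimal sketch, assuming each API result is reduced to a plain `(transcript, confidence)` tuple for its top alternative. The helper name `combine_alternatives` is hypothetical and uses only the standard library, so it is not the notebook's actual code path.

```python
import math


def combine_alternatives(alternatives):
    """Join transcript chunks and average their confidences.

    `alternatives` is a list of (transcript, confidence) tuples, a plain-Python
    stand-in for the top alternative of each API result.
    """
    # Keep only chunks that actually contain text
    kept = [(text, conf) for text, conf in alternatives if text]
    transcript = ''.join(text for text, _ in kept)
    # A file with no recognized speech has no confidences to average
    confidence = sum(conf for _, conf in kept) / len(kept) if kept else math.nan
    return transcript, confidence
```

A file with no recognized speech yields an empty transcript and a NaN confidence, which is the kind of row that a later `dropna` call removes.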

#### Create DataFrame from Speech To Text

In [87]:
df = google_speech_to_text('./testing/wav_converted_files/')



File 0 RunTime: 4.5s
File 1 RunTime: 8.73s
File 2 RunTime: 7.07s
File 3 RunTime: 6.43s
File 4 RunTime: 9.22s
File 5 RunTime: 5.64s
File 6 RunTime: 9.01s
File 8 RunTime: 3.89s
File 9 RunTime: 9.02s
File 10 RunTime: 3.21s
File 11 RunTime: 10.2s
File 12 RunTime: 5.86s
File 13 RunTime: 3.76s
File 14 RunTime: 3.68s
File 15 RunTime: 8.4s
File 16 RunTime: 8.6s
File 17 RunTime: 4.6s


In [88]:
df.shape

(17, 2)

In [89]:
df.head()

| | transcripts | confidence |
|---|---|---|
| 0 | | |
| 1 | otherwise I believe is that a smile | 0.750689 |
| 2 | 3480 to drinking water heater to Category 5 Laguna Road 22:35 Laguna Road Clayton find Hill School Road Trio Court box 2843 Alpha control | 0.838746 |
| 3 | so this sounds 2.11 button Court 24:11 button Court Santa Rosa Pinecrest Drive Box 28.7 offer three control three 10:49 | 0.868329 |
| 4 | seven five six 5150 Burbank Heights Apartment seven seven seven seven Brea Avenue 7777 Bodega Avenues Baskervilles unit are three one zero Nelson Way Virginia Avenue Fox 3143 Bravo for control | 0.899073 |


#### Drop NaN Values / Blank Transcripts

In [90]:
df.dropna(inplace=True)
df.reset_index(drop=True, inplace=True)

In [91]:
df['transcripts'] = df['transcripts'].map(lambda x: x.replace(':','').strip())

In [92]:
df.head()

| | transcripts | confidence |
|---|---|---|
| 0 | otherwise I believe is that a smile | 0.750689 |
| 1 | 3480 to drinking water heater to Category 5 Laguna Road 2235 Laguna Road Clayton find Hill School Road Trio Court box 2843 Alpha control | 0.838746 |
| 2 | so this sounds 2.11 button Court 2411 button Court Santa Rosa Pinecrest Drive Box 28.7 offer three control three 1049 | 0.868329 |
| 3 | seven five six 5150 Burbank Heights Apartment seven seven seven seven Brea Avenue 7777 Bodega Avenues Baskervilles unit are three one zero Nelson Way Virginia Avenue Fox 3143 Bravo for control | 0.899073 |
| 4 | report of the leaking water here 2235 live in a river | 0.80364 |


#### Check Mean Confidence on Speech to Text

In [93]:
round(np.mean(df['confidence']),5)

0.83165

#### Tokenize Transcripts

In [94]:
df['tokens'] = df['transcripts'].map(lambda x: x.lower().split(' '))
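
Splitting on a single space and splitting on any whitespace give different results when a transcript contains consecutive spaces. The sketch below illustrates the difference on a made-up sample string. The helper `tokenize` is hypothetical and shown only for comparison.

```python
def tokenize(transcript):
    """Lower-case a transcript and split it on any run of whitespace."""
    return transcript.lower().split()


sample = 'Laguna  Road 2235'            # note the double space
print(sample.lower().split(' '))        # ['laguna', '', 'road', '2235']
print(tokenize(sample))                 # ['laguna', 'road', '2235']
```

Whether the empty token matters depends on how the tokens are matched against street names later on.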

In [95]:
df.head()

| | transcripts | confidence | tokens |
|---|---|---|---|
| 0 | otherwise I believe is that a smile | 0.750689 | [otherwise, i, believe, is, that, a, smile] |
| 1 | 3480 to drinking water heater to Category 5 Laguna Road 2235 Laguna Road Clayton find Hill School Road Trio Court box 2843 Alpha control | 0.838746 | [3480, to, drinking, water, heater, to, category, 5, laguna, road, 2235, laguna, road, clayton, find, hill, school, road, trio, court, box, 2843, alpha, control] |
| 2 | so this sounds 2.11 button Court 2411 button Court Santa Rosa Pinecrest Drive Box 28.7 offer three control three 1049 | 0.868329 | [so, this, sounds, 2.11, button, court, 2411, button, court, santa, rosa, pinecrest, drive, box, 28.7, offer, three, control, three, 1049] |
| 3 | seven five six 5150 Burbank Heights Apartment seven seven seven seven Brea Avenue 7777 Bodega Avenues Baskervilles unit are three one zero Nelson Way Virginia Avenue Fox 3143 Bravo for control | 0.899073 | [seven, five, six, 5150, burbank, heights, apartment, seven, seven, seven, seven, brea, avenue, 7777, bodega, avenues, baskervilles, unit, are, three, one, zero, nelson, way, virginia, avenue, fox, 3143, bravo, for, control] |
| 4 | report of the leaking water here 2235 live in a river | 0.80364 | [report, of, the, leaking, water, here, 2235, live, in, a, river] |


#### Save Clean DataFrame to .csv

In [96]:
df.to_csv('./data/transcripts.csv', index_label=False)
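
One thing to keep in mind when saving: the `tokens` column holds Python lists, and CSV can only store them as text. The sketch below uses the standard library `csv` module on a made-up row to show the effect and one way to recover the list with `ast.literal_eval`. The helper `write_then_read` is hypothetical, and the example assumes a later notebook that reads the file would need the same conversion.

```python
import ast
import csv
import io


def write_then_read(tokens):
    """Write one row with a list-valued cell to CSV text and read the cell back."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([tokens])      # non-string cells are written via str()
    buffer.seek(0)
    return next(csv.reader(buffer))[0]         # what comes back is a plain string


raw = write_then_read(['laguna', 'road'])
print(repr(raw))                 # "['laguna', 'road']"
print(ast.literal_eval(raw))     # ['laguna', 'road']
```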