# Speech to text functions
These functions are different attempts at transcribing Sophia's audio files into text files.

## First attempt: simple `speech_recognition` function

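```python
import speech_recognition as sr

r = sr.Recognizer()
audio_file = sr.AudioFile('KZ-and-Amy.wav')
with audio_file as source:
    # calibrates the energy threshold on the first second of the file
    # (that second is consumed and is not transcribed)
    r.adjust_for_ambient_noise(source)
    # listen() stops at the first pause, so only the opening phrase is captured;
    # r.record(source) would read the rest of the file instead
    audio = r.listen(source, timeout=10)
result = r.recognize_google(audio)
with open('KZ-and-Amy.txt', mode='w') as file:
    file.write("Recognized text:\n")
    file.write(result)
print("ready!")
```

```
ready!
```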


This did not work. Next, we try breaking the large file into smaller files so that the recognizer can transcribe each one.
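The simplest way to break a long recording into smaller files is to cut it into fixed-length pieces. Below is a minimal sketch using only the standard-library `wave` module, assuming an uncompressed PCM WAV input; `split_wav_fixed` and the 30-second default are hypothetical choices for illustration, not values from this notebook. Fixed-length cuts can land in the middle of a word, which is the motivation for cutting at silences instead.

```python
import wave
from pathlib import Path


def split_wav_fixed(path, out_dir, chunk_seconds=30.0):
    """Hypothetical helper: split a PCM WAV file into consecutive chunks
    of at most `chunk_seconds` and return the chunk paths in order."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with wave.open(str(path), "rb") as src:
        params = src.getparams()
        frames_per_chunk = max(1, int(chunk_seconds * params.framerate))
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # reached the end of the file
            chunk_path = out_dir / f"chunk{index}.wav"
            with wave.open(str(chunk_path), "wb") as dst:
                # copy channels, sample width and rate; the frame count in
                # the header is corrected when the file is closed
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

Each returned path can then be passed to `sr.AudioFile` exactly like the whole file was in the first attempt.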

## Second function: splitting audio into chunks

```python
# importing libraries
import os

import speech_recognition as sr
from pydub import AudioSegment
from pydub.silence import split_on_silence
```

```python
# a function that splits the audio file into chunks
# and applies speech recognition
def silence_based_conversion(path):

    # open the audio file stored on the local system as a WAV file
    song = AudioSegment.from_wav(path)

    # split the track wherever there is 0.5 seconds or more of silence
    chunks = split_on_silence(song,
        # must be silent for at least 0.5 seconds (500 ms); adjust this value
        # based on the speaker: if they pause for longer, increase it,
        # otherwise decrease it
        min_silence_len=500,

        # consider it silent if quieter than -16 dBFS; adjust as required
        silence_thresh=-16
    )

    # create a directory to store the audio chunks (no error if it exists)
    os.makedirs('audio_chunks', exist_ok=True)

    # open the file where the recognized text is concatenated; the `with`
    # block guarantees it is flushed and closed when processing ends
    with open("recognized.txt", "w") as fh:
        # process each chunk
        for i, chunk in enumerate(chunks):
            # create a 1-second silence segment (duration is in ms)
            chunk_silent = AudioSegment.silent(duration=1000)

            # add the silence to the beginning and end of the chunk
            # so that it doesn't seem abruptly sliced
            audio_chunk = chunk_silent + chunk + chunk_silent

            # export the chunk into the audio_chunks directory
            # (WAV is uncompressed, so no bitrate is needed)
            print("saving chunk{0}.wav".format(i))
            filename = os.path.join('audio_chunks', 'chunk{0}.wav'.format(i))
            audio_chunk.export(filename, format="wav")

            print("Processing chunk " + str(i))

            # create a speech recognition object
            r = sr.Recognizer()

            # recognize the chunk
            with sr.AudioFile(filename) as source:
                # remove this if it is not working correctly
                r.adjust_for_ambient_noise(source)
                # listen() stops at the first pause; r.record(source)
                # would read the whole chunk instead
                audio_listened = r.listen(source)

            try:
                # try converting it to text and append it to the file
                rec = r.recognize_google(audio_listened)
                fh.write(rec + ". ")
            except sr.UnknownValueError:
                print("Could not understand audio")
            except sr.RequestError:
                print("Could not request results. Check your internet connection")
```

```python
if __name__ == '__main__':
    print('Enter the audio file path')
    path = input()
    silence_based_conversion(path)
```

```
Enter the audio file path
/Users/joey/Desktop/Luck/KZ-and-Amy.wav
saving chunk0.wav
Processing chunk 0
Could not understand audio
```


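The function produced only a single chunk, so the recording was not actually split, and the recognizer returned "Could not understand audio" for it.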
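A fixed threshold such as -16 dBFS only works if it sits between the loudness of the speech and of the pauses, so it helps to measure the file's average loudness first (the third attempt uses pydub's `sound.dBFS` for this). Below is a standard-library sketch of the same measurement, assuming a 16-bit PCM WAV; `wav_dbfs` is a hypothetical helper, not part of pydub or this notebook, and it follows the same idea as pydub (RMS amplitude relative to the largest 16-bit sample value) rather than reproducing its code.

```python
import math
import struct
import wave


def wav_dbfs(path):
    """Hypothetical helper: RMS loudness of a 16-bit PCM WAV file in dBFS."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        n_samples = wav.getnframes() * wav.getnchannels()
        raw = wav.readframes(wav.getnframes())
    if n_samples == 0:
        return -math.inf
    # WAV samples are little-endian signed 16-bit integers
    samples = struct.unpack(f"<{n_samples}h", raw)
    rms = math.sqrt(sum(s * s for s in samples) / n_samples)
    if rms == 0:
        return -math.inf  # digital silence
    # 32768 is the largest magnitude a 16-bit sample can have
    return 20 * math.log10(rms / 32768)
```

If the result is far below -16 dBFS, most of the recording counts as "silence" under that threshold, and a threshold relative to the measured loudness is the more natural choice.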

## Third function: breaking into chunks again

```python
# importing libraries
import os

import speech_recognition as sr
from pydub import AudioSegment
from pydub.silence import split_on_silence

# create a speech recognition object
r = sr.Recognizer()

# a function that splits the audio file into chunks
# and applies speech recognition
def get_large_audio_transcription(path):
    """
    Split the large audio file into chunks
    and apply speech recognition to each chunk.
    """
    # open the audio file using pydub
    sound = AudioSegment.from_wav(path)
    # split the audio wherever there are 1250 milliseconds or more of silence
    chunks = split_on_silence(sound,
        # experiment with this value for your target audio file
        min_silence_len=1250,
        # anything more than 4 dB quieter than the file's average (RMS)
        # loudness counts as silence; adjust this per requirement
        silence_thresh=sound.dBFS - 4,
        # keep 100 ms of silence at each end of a chunk, adjustable as well
        keep_silence=100,
    )
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    os.makedirs(folder_name, exist_ok=True)
    whole_text = ""
    # process each chunk
    for i, audio_chunk in enumerate(chunks, start=1):
        # export the chunk into the `folder_name` directory
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk; record() reads the whole chunk
        with sr.AudioFile(chunk_filename) as source:
            audio_listened = r.record(source)
        # try converting it to text
        try:
            text = r.recognize_google(audio_listened)
        except sr.UnknownValueError as e:
            # raised without a message here, so only "Error:" is printed
            print("Error:", str(e))
        except sr.RequestError as e:
            print("Could not request results:", e)
        else:
            text = f"{text.capitalize()}. "
            print(chunk_filename, ":", text)
            whole_text += text
    # return the text for all chunks detected
    return whole_text
```

```python
get_large_audio_transcription('/Users/joey/Desktop/Luck/KZ-and-Amy.wav')
```

```
audio-chunks/chunk1.wav : My first question. 
audio-chunks/chunk2.wav : But how do you say christ working through. 
Error: 
Error: 
Error: 
Error: 
audio-chunks/chunk7.wav : On it. 
audio-chunks/chunk8.wav : Butt lake mary lake sorority sucker like external club shows be like this person like has having connections like catholicism the path. 
Error: 
audio-chunks/chunk10.wav : But i like remember it happening so much. 
audio-chunks/chunk11.wav : So cool that. 
audio-chunks/chunk12.wav : Like happen like finds out things about them. 
audio-chunks/chunk13.wav : Another thing i think. 
Error: 
audio-chunks/chunk15.wav : I don't know to confide things and like exclusive friends and that's just like really awesome to see you like catherine doesn't have to like me and only on herself. 
audio-chunks/chunk16.wav : Yeah how do you. 
audio-chunks/chunk17.wav : I think one thing i've always admired about amy but i definitely. 
Error: 
Error: 
audio-chunks/chunk20.wav : Feel like amy is one. 

"My first question. But how do you say christ working through. On it. Butt lake mary lake sorority sucker like external club shows be like this person like has having connections like catholicism the path. But i like remember it happening so much. So cool that. Like happen like finds out things about them. Another thing i think. I don't know to confide things and like exclusive friends and that's just like really awesome to see you like catherine doesn't have to like me and only on herself. Yeah how do you. I think one thing i've always admired about amy but i definitely. Feel like amy is one. "
```

This has been the best attempt so far, though that is not saying much.

Information on the `split_on_silence` parameters:

- `min_silence_len`: (in ms) minimum length of a silence to be used for a split. Default: 1000 ms.

- `silence_thresh`: (in dBFS) anything quieter than this is considered silence. Default: -16 dBFS.

- `keep_silence`: (in ms) amount of silence to leave at the beginning and end of each chunk, which keeps the audio from sounding abruptly cut off. Default: 100 ms.
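To see how the three parameters interact, here is a simplified, pure-Python model of silence-based splitting. It is a sketch under stated assumptions, not pydub's implementation: `nonsilent_ranges` is a hypothetical helper that works on a list of per-millisecond loudness values instead of real audio, and unlike pydub it does not prevent padded neighbouring chunks from overlapping when the gap between them is shorter than twice `keep_silence`.

```python
def nonsilent_ranges(levels_db, min_silence_len=1000, silence_thresh=-16.0,
                     keep_silence=100):
    """Hypothetical model of silence splitting.

    `levels_db` holds one loudness value (dBFS) per millisecond. A run of at
    least `min_silence_len` values below `silence_thresh` is a split point;
    each remaining (speech) range is padded by up to `keep_silence` ms on both
    sides. Returns (start_ms, end_ms) pairs with an exclusive end.
    """
    levels = list(levels_db)
    total = len(levels)

    # 1. find silent runs long enough to split on; the trailing +inf
    #    sentinel closes a run that lasts until the end of the audio
    silences = []
    run_start = None
    for t, level in enumerate(levels + [float("inf")]):
        if level < silence_thresh:
            if run_start is None:
                run_start = t
        else:
            if run_start is not None and t - run_start >= min_silence_len:
                silences.append((run_start, t))
            run_start = None

    # 2. the speech ranges are the gaps between those silences
    ranges = []
    cursor = 0
    for s_start, s_end in silences:
        if s_start > cursor:
            ranges.append((cursor, s_start))
        cursor = s_end
    if cursor < total:
        ranges.append((cursor, total))

    # 3. pad each range with some of the surrounding silence
    return [(max(0, a - keep_silence), min(total, b + keep_silence))
            for a, b in ranges]
```

In this model, lowering `min_silence_len` lets shorter pauses become split points, and raising `silence_thresh` (towards 0 dBFS) counts more of the audio as silence; both generally produce more chunks.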

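After adjusting the `split_on_silence` parameters, here are the results. The `silence_thresh` column is the offset subtracted from the file's average loudness, as in the code above (`silence_thresh=sound.dBFS - 4` corresponds to the last row).

| `min_silence_len` (ms) | `silence_thresh` (dB below `sound.dBFS`) | `keep_silence` (ms) | Chunks | Transcription |
|---|---|---|---|---|
| 1000 | 16 | 100 | 9 | horrible (default values) |
| 500 | 16 | 100 | 17 | horrible |
| 250 | 16 | 100 | 27 | horrible |
| 1000 | 8 | 100 | 15 | okay |
| 1000 | 12 | 100 | 10 | okay |
| 500 | 8 | 100 | 22 | horrible |
| 1500 | 8 | 100 | 7 | okay |
| 1250 | 4 | 100 | 12 (short) | bad |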
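Trying these combinations one at a time is tedious; a small grid sweep can at least record the chunk count for every setting. This is a minimal sketch, assuming a caller-supplied `count_chunks(min_silence_len, thresh_offset, keep_silence)` function; `sweep_split_params` and `count_chunks` are hypothetical names, not part of pydub.

```python
from itertools import product


def sweep_split_params(count_chunks, min_silence_lens, thresh_offsets,
                       keep_silences):
    """Hypothetical helper: try every parameter combination and record how
    many chunks each one produces.

    Returns a list of ((min_silence_len, thresh_offset, keep_silence), n_chunks)
    in the order generated by itertools.product.
    """
    results = []
    for params in product(min_silence_lens, thresh_offsets, keep_silences):
        results.append((params, count_chunks(*params)))
    return results
```

With pydub, `count_chunks` could be `lambda msl, off, keep: len(split_on_silence(sound, min_silence_len=msl, silence_thresh=sound.dBFS - off, keep_silence=keep))`. The chunk count alone says nothing about transcription quality, which still has to be judged by reading the output.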

### I have decided to try another function: this one cannot seem to be tuned, and the transcriptions are not accurate at all for this file.

## Breaking into chunks is a good idea; we just need a better source of transcription

another API - 
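Whichever service is chosen, the chunking step from the third function can stay the same. A minimal sketch, assuming the recognizer is wrapped in a caller-supplied `transcribe(path)` function that returns the chunk's text, or `None` when nothing was understood; `transcribe_chunks` is a hypothetical helper, not part of `speech_recognition`.

```python
from typing import Callable, Iterable, List, Optional


def transcribe_chunks(chunk_paths: Iterable[str],
                      transcribe: Callable[[str], Optional[str]]) -> str:
    """Hypothetical helper: run a pluggable transcription function over chunk
    files and join the results, skipping chunks that returned nothing."""
    pieces: List[str] = []
    for path in chunk_paths:
        text = transcribe(path)
        if text:
            # similar formatting to the third function: capitalize, end with ". "
            pieces.append(f"{text.strip().capitalize()}. ")
    return "".join(pieces)
```

For example, a `transcribe` wrapper around `Recognizer.recognize_google` would catch `sr.UnknownValueError` and return `None`; trying another service then only means writing a different wrapper.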