# AI-enhanced Speech Analytics Process

## Implementing speech recognition
This section runs speech recognition on a three-minute TED-Ed talk, 'Try something new for 30 days' by Matt Cutts, which serves as the audio data.

In [1]:
import speech_recognition as sr 
from pydub import AudioSegment
from pydub.playback import play

Converting the audio file to WAV format and feeding it to the Google speech recognition engine to get the transcript.

In [2]:
recognizer = sr.Recognizer()

try:
  audio_full_filename = 'TedEd - Try something new for 30 days.mp3'
  audio_filename = audio_full_filename[0:audio_full_filename.rfind('.')]
  raw_audios_directory = 'audios/raw'
  converted_audios_directory = 'audios/converted'

  # Converting the audio file to wav format
  audio_file = AudioSegment.from_file(f'{raw_audios_directory}/{audio_full_filename}')
  audio_file.export(f'{converted_audios_directory}/{audio_filename}.wav', format='wav')

  # Feeding the wav audio to Google speech recognition engine
  with sr.AudioFile(f'{converted_audios_directory}/{audio_filename}.wav') as source:
    audio_data = recognizer.record(source)
    recognized_text = recognizer.recognize_google(audio_data)
    print(f'Recognized text: {recognized_text}')

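except sr.UnknownValueError:
  # Raised when the engine cannot make sense of the audio
  print('Google speech recognition could not understand the audio')
except sr.RequestError as error:
  # Raised when the API is unreachable or rejects the request
  print(f'Could not request results from Google speech recognition: {error}')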

Recognized text: a few years ago i felt like i was stuck in a rat so i decided to follow in the footsteps of the great american philosopher morgan spurlock and try something new for 30 days the idea is actually pretty simple think about something you've always wanted to add your life and try it for the next 30 days it turns out 30 days is just about the right amount of time to add a new habit or subtract the habit like watching the news from your life there's a few things that i learned while doing these 30 day challenges the first was instead of the months flying by forgotten the time was much more memorable this was part of a challenge i did to take a picture everyday for a month and i remember exactly where i was and what i was doing that day i also noticed that as i started to do more and harder 30 day challenges myself confidence grew i went from death dwelling computer nerd to the kind of guy who bikes to work for fun even last year i ended up hiking up mount kilimanjaro the high
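
The cell above derives the WAV file name by slicing at the last dot, which silently drops the final character when a file name has no extension. As a minimal sketch, assuming the same `audios/converted` directory, the path could instead be built with `pathlib`; `converted_wav_path` is a hypothetical helper name, not part of the original notebook.

```python
from pathlib import Path


def converted_wav_path(audio_full_filename: str,
                       converted_dir: str = 'audios/converted') -> Path:
    """Return the .wav output path for a raw audio file name (hypothetical helper)."""
    # Path.stem removes only the final extension and leaves
    # extension-less names untouched, unlike slicing at rfind('.')
    return Path(converted_dir) / f'{Path(audio_full_filename).stem}.wav'


wav_path = converted_wav_path('TedEd - Try something new for 30 days.mp3')
print(wav_path.as_posix())
```

The resulting path can be passed to both `AudioSegment.export` and `sr.AudioFile` as a string via `str(wav_path)`.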

## Natural Language Processing

In [3]:
import nltk
from nltk.tokenize import WhitespaceTokenizer
from nltk.corpus import stopwords
from textblob import TextBlob, WordList

# Download nltk models and datasets
nltk.download('wordnet')
nltk.download('stopwords')
nltk.download('punkt')

[nltk_data] Downloading package wordnet to /home/cabrera/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/cabrera/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /home/cabrera/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

Tokenizing the transcript into words

In [9]:
tokenized_recognized_text = recognized_text.split(' ')
tokenized_recognized_text[0: 10]

['a', 'few', 'years', 'ago', 'i', 'felt', 'like', 'i', 'was', 'stuck']
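
Splitting on a single space works for this transcript because the engine returns words separated by exactly one space. As a hedged illustration on a made-up sample string (not the transcript), the following shows how `split(' ')` and the argument-free `split()` differ once double spaces or newlines appear.

```python
# Hypothetical sample containing a double space and a newline
sample_text = 'a few  years ago\ni felt'

# Splitting on one space keeps empty tokens and does not break on newlines
tokens_single_space = sample_text.split(' ')

# Splitting with no argument treats any run of whitespace as one separator
tokens_any_whitespace = sample_text.split()

print(tokens_single_space)
print(tokens_any_whitespace)
```

The `WhitespaceTokenizer` imported earlier splits on runs of whitespace in a similar way to the second form, so either would be a more defensive choice for less tidy input.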

Getting the word and character count

In [11]:
print(f'Word count: {len(tokenized_recognized_text)}')
print(f'Character count: {len(recognized_text)}')

Word count: 471
Character count: 2334


### Lemmatizing words and removing stopwords

In [35]:
english_stopwords = stopwords.words('english')

recognized_text_without_stopwords = ''
recognized_text_stopwords = ''

for word in tokenized_recognized_text:
  if word in english_stopwords:
    recognized_text_stopwords += ' {}'.format(word)
  else:
    recognized_text_without_stopwords += ' {}'.format(word)

recognized_text_without_stopwords

" years ago felt like stuck rat decided follow footsteps great american philosopher morgan spurlock try something new 30 days idea actually pretty simple think something always wanted add life try next 30 days turns 30 days right amount time add new habit subtract habit like watching news life there's things learned 30 day challenges first instead months flying forgotten time much memorable part challenge take picture everyday month remember exactly day also noticed started harder 30 day challenges confidence grew went death dwelling computer nerd kind guy bikes work fun even last year ended hiking mount kilimanjaro highest mountain africa would never adventurous started 30 day challenges also figured really want something badly enough anything 30 days ever wondered novel every november tense thousands people try write 50,000 word novel scratch 30 days turns right 1667 words day month way secret go sleep written words day might sleep deprived finish novel book next great american novel

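The loop above builds both strings by repeated concatenation, which leaves a leading space in each result. A minimal alternative sketch, assuming a small made-up stopword set in place of `stopwords.words('english')`, uses a set for fast membership checks and `str.join` to assemble the output; `split_stopwords` is a hypothetical helper name.

```python
# Hypothetical, tiny stopword set standing in for stopwords.words('english')
sample_stopwords = {'a', 'i', 'was', 'in', 'few'}

# First ten tokens of the transcript, as shown in the tokenizing step
sample_tokens = ['a', 'few', 'years', 'ago', 'i', 'felt', 'like', 'i', 'was', 'stuck']


def split_stopwords(tokens, stopword_set):
    """Return (text without stopwords, text of removed stopwords)."""
    kept = [token for token in tokens if token not in stopword_set]
    removed = [token for token in tokens if token in stopword_set]
    # join() inserts separators only between tokens, so no leading space
    return ' '.join(kept), ' '.join(removed)


content_text, stopword_text = split_stopwords(sample_tokens, sample_stopwords)
print(content_text)
print(stopword_text)
```

With the real NLTK list, wrapping it once as `set(english_stopwords)` would give the same constant-time lookups.
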
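Lemmatizing words. The helper below relies on TextBlob's `singularize()`, which strips plural endings rather than performing dictionary-based lemmatization, so a few non-plural words are altered as well (for example, 'always' becomes 'alway' and 'adventurous' becomes 'adventurou' in the output).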

In [37]:
def lemmatize_words(text: str) -> TextBlob:
    tb_singularized_words = TextBlob(text).words.singularize()
    return TextBlob(" ".join(tb_singularized_words))

In [43]:
tb_recognized_text_without_stopwords = lemmatize_words(recognized_text_without_stopwords)
tb_recognized_text_stopwords = TextBlob(recognized_text_stopwords)

tb_recognized_text_without_stopwords

TextBlob("year ago felt like stuck rat decided follow footstep great american philosopher morgan spurlock try something new 30 day idea actually pretty simple think something alway wanted add life try next 30 day turn 30 day right amount time add new habit subtract habit like watching news life there ' thing learned 30 day challenge first instead month flying forgotten time much memorable part challenge take picture everyday month remember exactly day also noticed started harder 30 day challenge confidence grew went death dwelling computer nerd kind guy bike work fun even last year ended hiking mount kilimanjaro highest mountain africa would never adventurou started 30 day challenge also figured really want something badly enough anything 30 day ever wondered novel every november tense thousand person try write 50,000 word novel scratch 30 day turn right 1667 word day month way secret go sleep written word day might sleep deprived finish novel book next great american novel wrote month

Statistics for the lemmatized transcript with stopwords removed

In [39]:
print(f'Word count: {len(tb_recognized_text_without_stopwords.words)}')
print(f'Character count: {len(tb_recognized_text_without_stopwords)}')

Word count: 236
Character count: 1440
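
To put these numbers next to the earlier counts for the raw transcript (471 words, 2334 characters), the short sketch below computes the share removed by the stopword and lemmatization steps. `percent_removed` is a hypothetical helper, and the four values are copied from the cell outputs rather than recomputed.

```python
# Counts copied from the notebook outputs
original_words, reduced_words = 471, 236
original_chars, reduced_chars = 2334, 1440


def percent_removed(before: int, after: int) -> float:
    """Percentage of the original quantity that was removed."""
    return (before - after) / before * 100


word_reduction = percent_removed(original_words, reduced_words)
char_reduction = percent_removed(original_chars, reduced_chars)

print(f'Words removed: {word_reduction:.1f}%')
print(f'Characters removed: {char_reduction:.1f}%')
```

Roughly half of the words but only about 38% of the characters disappear, which is consistent with stopwords being shorter than the words that remain.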


10 most frequent words (lemmatized)

In [42]:
sorted(
  tb_recognized_text_without_stopwords
    .word_counts
    .items(),
  key=lambda item: item[1],
  reverse=True
)[0: 10]

[('day', 15),
 ('30', 11),
 ('like', 5),
 ('challenge', 5),
 ('try', 4),
 ('something', 4),
 ('next', 4),
 ('month', 4),
 ('novel', 4),
 ('life', 3)]
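
The same kind of ranking can be produced with the standard library. The sketch below is an illustration on a short made-up token list, not the transcript: it counts words with `collections.Counter` and takes the top entries with `most_common`, which replaces the explicit `sorted(..., key=..., reverse=True)` call and slice.

```python
from collections import Counter

# Hypothetical token list used only for illustration
sample_tokens = ['30', 'day', 'challenge', '30', 'day', 'day']

# Counter maps each distinct token to its number of occurrences
word_counts = Counter(sample_tokens)

# most_common(n) returns the n highest counts in descending order
top_two = word_counts.most_common(2)
print(top_two)
```

Because `TextBlob.word_counts` is a plain mapping of words to counts, `Counter(tb_recognized_text_without_stopwords.word_counts).most_common(10)` should work in the notebook as well, though that exact call is an assumption and was not part of the original run.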

10 most frequent stopwords

In [46]:
sorted(
  tb_recognized_text_stopwords
    .word_counts
    .items(),
  key=lambda item: item[1],
  reverse=True
)[0: 10]

[('i', 27),
 ('to', 19),
 ('the', 19),
 ('a', 16),
 ('you', 13),
 ('for', 11),
 ('of', 9),
 ('it', 7),
 ('was', 6),
 ('and', 6)]