<a href="https://colab.research.google.com/github/pedro-de-bastos/Capstone/blob/master/Speech%20to%20Text%20Transcription%20and%20Sentiment%20Analysis%20Capstone%20Main%20Code.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Prototyping Speech-to-text Transcription and Sentiment Analysis for Instructor Tone Study

## Talk Time Pre-Processing


## Transcription

### Installing All Necessary Libraries

In [None]:
#Transcription Prep - run this cell first
#Note: this cell may ask you to reset the runtime once after it completes

import os
!pip install pydub
!apt install libasound2-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg
!pip install google-cloud-speech
!pip install SpeechRecognition
!pip install google.cloud



In [None]:
#Import necessary libraries
import glob
import io
import os

import pandas as pd
import pydub
from pydub import AudioSegment
from google.cloud import language_v1
from google.cloud import speech_v1
from google.cloud import storage
from google.cloud.speech import types
#"enums" is the Natural Language enums module, used in the sentiment analysis
#section below
from google.cloud.language_v1 import enums


#Note: below, I point the GOOGLE_APPLICATION_CREDENTIALS environment variable
#to a .json key file containing my Google Cloud credentials. This is what lets
#this Python kernel access the Google Cloud project. You'll have to pass the
#location of your own .json key file to this variable.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/content/drive/My Drive/Capstone/instructor-langu-1590688665393-6d06375c82af.json'

### Introduction: What I'm Doing and How
In short, I am using Google Cloud's Speech-to-Text API. It offers two main processes for transcribing an audio file: synchronous and asynchronous recognition. Synchronous recognition is for short audio (under about 1 minute), while asynchronous recognition is for longer audio (over 1 minute). The code below implements the asynchronous process, which is the one the analysis pipeline calls, and it reads the audio from a Google Cloud Storage bucket.
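
As a rough illustration of choosing between the two processes, the sketch below measures a WAV file's duration with Python's standard `wave` module and picks a mode. The helper names `wav_duration_seconds` and `choose_recognition_mode` and the 60-second cutoff are assumptions made for this example, based on the one-minute rule of thumb above; they are not part of the Google Cloud client library.

```python
import wave

# Assumed cutoff, taken from the one-minute rule of thumb described above.
SYNC_LIMIT_SECONDS = 60.0

def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()

def choose_recognition_mode(duration_seconds, limit=SYNC_LIMIT_SECONDS):
    """Pick 'synchronous' for short audio and 'asynchronous' otherwise."""
    return "synchronous" if duration_seconds < limit else "asynchronous"
```

A call such as `choose_recognition_mode(wav_duration_seconds("cut_Class_1.1.wav"))` would then tell you which recognizer to use; the file name here is only a placeholder.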

### Importing the Windows Over Which the Instructor Spoke from a .csv File and Making Adjustments to the Formatting of that Data

In [None]:
#Note that the CSV file with talk times for each professor/student stores the
#timestamp as a string of the form "hours:minutes:seconds". The function below
#is used with the Pandas ".apply" method to turn each string into a float,
#expressed in milliseconds since the start of the lesson
def string_to_miliseconds(timestamp):
  hours, minutes, seconds = timestamp.split(":")
  return float(seconds)*1000 + float(minutes)*60*1000 + float(hours)*60*60*1000

def sec_to_milisec(secs):
  return secs*1000

def windows_talk(filename):
  talk = pd.read_csv(filename)
  #Drop rows with no speaker name; copy so that the assignments below do not
  #operate on a view of the original frame
  talk = talk[talk['Names'].notnull()].copy()

  talk["Timestamp"] = talk["Timestamp"].apply(string_to_miliseconds)
  #After this step "Duration Secs" holds milliseconds, despite its name
  talk["Duration Secs"] = talk["Duration Secs"].apply(sec_to_milisec)
  #New column that indicates when the person stopped talking
  talk['End'] = talk["Timestamp"] + talk["Duration Secs"]

  #Map each speaker name to the rows in which that speaker talks
  windows_dict = {}
  for name in talk["Names"].unique():
    windows_dict[name] = talk[talk['Names'] == name]
  return windows_dict
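
To make the window arithmetic concrete, here is a minimal standalone sketch that uses only the standard library instead of Pandas. The sample rows and speaker names are invented for illustration, and the helper names `timestamp_to_ms` and `talk_windows` are hypothetical; the column names `Names`, `Timestamp`, and `Duration Secs` are the ones `windows_talk` reads.

```python
import csv
import io

def timestamp_to_ms(timestamp):
    """Convert an 'hours:minutes:seconds' string to milliseconds."""
    hours, minutes, seconds = timestamp.split(":")
    return (float(hours) * 3600 + float(minutes) * 60 + float(seconds)) * 1000

# Invented sample data in the assumed CSV layout.
SAMPLE_CSV = """Names,Timestamp,Duration Secs
Instructor A,00:00:05,10
Student B,00:00:20,4.5
Instructor A,00:01:30.5,2
"""

def talk_windows(csv_text):
    """Group (start_ms, end_ms) windows by speaker name."""
    windows = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["Names"]:
            continue  # skip rows without a speaker, as windows_talk does
        start = timestamp_to_ms(row["Timestamp"])
        end = start + float(row["Duration Secs"]) * 1000
        windows.setdefault(row["Names"], []).append((start, end))
    return windows

windows = talk_windows(SAMPLE_CSV)
print(windows["Instructor A"])
```

Each speaker ends up with a list of start and end times in milliseconds, which is the unit the audio-cutting step expects.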

### Cutting Out Instances of the Professor Speaking in Class


In [None]:
#Audio Pre-Processing: Cutting Audio into Instructor Talk Windows

def cut_audio(src, dst, windows_dict, instructor):
  #Import the audio file and look up the instructor's talk time windows
  test = AudioSegment.from_wav(src)

  try:
    windows = windows_dict[instructor]
  except KeyError:
    print("Instructor Name Not Valid")
    return

  for i in range(windows.shape[0]):
    #Columns 2 and 3 are expected to hold the start and end of the window in
    #milliseconds, which is the unit pydub uses for slicing
    test_slice = test[windows.iloc[i,2]:windows.iloc[i,3]]
    end = dst +'_slice_' + str(i) + ".wav"
    test_slice.export(end, format='wav')
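
For readers who want to see what slicing by milliseconds amounts to, the sketch below does the same kind of cut with Python's standard `wave` module rather than pydub. This is a swapped-in technique for illustration only, not the method used in this notebook, and the function name `slice_wav` is hypothetical.

```python
import wave

def slice_wav(src, dst, start_ms, end_ms):
    """Copy the [start_ms, end_ms) window of a WAV file into a new file."""
    with wave.open(src, "rb") as reader:
        rate = reader.getframerate()
        # Convert millisecond offsets into frame offsets.
        start_frame = int(start_ms * rate / 1000)
        end_frame = int(end_ms * rate / 1000)
        reader.setpos(min(start_frame, reader.getnframes()))
        frames = reader.readframes(max(end_frame - start_frame, 0))
        params = reader.getparams()
    with wave.open(dst, "wb") as writer:
        # Reuse the source format; the frame count is corrected on close.
        writer.setparams(params)
        writer.writeframes(frames)
```

With an 8000 Hz file, a window from 250 ms to 750 ms covers frames 2000 to 6000, so the output holds 4000 frames.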
  

### Joining the Extracted Professor Responses into One Long Audio For Transcription

In [None]:
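def join_audios(src, dst):
  #Collect the slices written by cut_audio. glob returns files in arbitrary
  #order, so sort them by slice number to keep the talk in chronological order
  audios = glob.glob(src + '_slice_*.wav')
  audios.sort(key=lambda name: int(name.rsplit('_slice_', 1)[1][:-len('.wav')]))

  concat_audio = pydub.AudioSegment.empty()
  for audio in audios:
    concat_audio += pydub.AudioSegment.from_wav(audio)

  concat_audio.export(dst, format="wav")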
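
The slices are named `<dst>_slice_<i>.wav` by `cut_audio`, and a file listing is not guaranteed to come back in numeric order. The sketch below shows one way to order such names by their slice number; the helper name `slice_index` and the sample file names are made up for this example.

```python
import re

def slice_index(file_name):
    """Extract the integer N from a name ending in '_slice_N.wav'."""
    match = re.search(r"_slice_(\d+)\.wav$", file_name)
    if match is None:
        raise ValueError(f"not a slice file: {file_name}")
    return int(match.group(1))

names = ["talk.wav_slice_10.wav", "talk.wav_slice_2.wav", "talk.wav_slice_0.wav"]

# A plain sort compares characters, so slice 10 would land before slice 2.
alphabetical = sorted(names)

# Sorting on the extracted number keeps the slices in chronological order.
ordered = sorted(names, key=slice_index)
print(ordered)
```

Ordering matters here because the joined audio is what gets transcribed, so slices out of order would produce a scrambled transcript.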

### Defining a Function that Takes a .wav File and Transcribes It Using Google Cloud

In [None]:
#Recognizer for long files (over 1 min)

def sample_long_running_recognize(storage_uri):
    """
    Transcribe a long audio file using asynchronous speech recognition

    Args:
      storage_uri URI of the audio file in Google Cloud Storage,
      e.g. gs://[BUCKET]/[FILE].wav
    """

    client = speech_v1.SpeechClient()

    # The language of the supplied audio
    language_code = "en-US"

    # Sample rate in Hertz of the audio data sent; this must match the file
    sample_rate_hertz = 44100

    # The encoding is not set explicitly here because this field is
    # optional for FLAC and WAV audio formats.
    config = {
        "language_code": language_code,
        "sample_rate_hertz": sample_rate_hertz,
    }

    audio = types.RecognitionAudio(uri = storage_uri)

    operation = client.long_running_recognize(config, audio)

    print("Waiting for operation to complete...")
    response = operation.result()
    transcript = []

    for result in response.results:
        # First alternative is the most probable result
        alternative = result.alternatives[0]
        transcript.append(alternative.transcript)
    return "".join(transcript)
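
The result-handling loop can be tried without calling the API. The sketch below uses `SimpleNamespace` objects as stand-ins for a response; the sample transcripts, the confidence values, and the helper name `join_top_alternatives` are invented for illustration. Unlike the function above, which joins the pieces with an empty string, this sketch strips each piece and joins with a space, which is a design choice of the example and not something the API requires.

```python
from types import SimpleNamespace

def join_top_alternatives(response, separator=" "):
    """Join the first (most probable) alternative of every result."""
    pieces = [result.alternatives[0].transcript.strip()
              for result in response.results if result.alternatives]
    return separator.join(pieces)

# Stand-in for an API response; the real object comes from operation.result().
fake_response = SimpleNamespace(results=[
    SimpleNamespace(alternatives=[
        SimpleNamespace(transcript="Good morning everyone.", confidence=0.93),
        SimpleNamespace(transcript="Good morning every one.", confidence=0.41),
    ]),
    SimpleNamespace(alternatives=[
        SimpleNamespace(transcript=" Let's get started.", confidence=0.88),
    ]),
])

print(join_top_alternatives(fake_response))
```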

### Uploading the Audio File to Google Cloud Storage

In [None]:
def upload_to_cloud(filename, bucketname):
  storage_client = storage.Client.from_service_account_json("/content/drive/My Drive/Capstone/instructor-langu-1590688665393-6d06375c82af.json")
  bucket = storage_client.get_bucket(bucketname)
  blob = bucket.blob(filename)
  blob.upload_from_filename(filename)

## Sentiment Analysis

### Introduction: Google Cloud Sentiment Analysis

Google Cloud has an API that analyzes the sentiment of a text using two measures, which they call the 'score' and the 'magnitude'.

The 'score' of a document's sentiment indicates the overall emotion of a document, ranging from -1.0 (negative) to 1.0 (positive). In other words, it is the NET valence of emotion expressed in the document.

The 'magnitude' of a document's sentiment indicates how much emotional content is present within the document, and this value is often proportional to the length of the document. In other words, it measures the total amount of emotion expressed, positive and negative combined. It is not normalized: it starts at 0 and grows with every emotional expression in the text, so it is not a ratio of emotional to neutral words.
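
One way to read the two numbers together is sketched below. The function name `describe_sentiment` and both cutoff values are assumptions chosen for illustration; they are not thresholds defined by Google Cloud, and sensible values depend on the texts being compared.

```python
def describe_sentiment(score, magnitude, score_cutoff=0.25, magnitude_cutoff=1.0):
    """Label a (score, magnitude) pair; the cutoffs are illustrative assumptions."""
    if score >= score_cutoff:
        return "positive"
    if score <= -score_cutoff:
        return "negative"
    # Near-zero score: either little emotion overall (low magnitude), or
    # positive and negative expressions cancelling out (high magnitude).
    return "mixed" if magnitude >= magnitude_cutoff else "neutral"

for score, magnitude in [(0.8, 3.0), (-0.6, 4.0), (0.1, 0.0), (0.0, 4.0)]:
    print(score, magnitude, describe_sentiment(score, magnitude))
```

The last two cases show why both values are needed: a score near zero can mean a neutral text or a text with strong feelings in both directions, and only the magnitude tells them apart.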

In [None]:
def response_sentiment_analysis(string_response):
  text_content = string_response

  # Available types: PLAIN_TEXT, HTML
  type_ = enums.Document.Type.PLAIN_TEXT

  # Optional. If not specified, the language is automatically detected.
  # For list of supported languages:
  # https://cloud.google.com/natural-language/docs/languages
  language = "en"
  document = {"content": text_content, "type": type_, "language": language}

  # Available values: NONE, UTF8, UTF16, UTF32
  encoding_type = enums.EncodingType.UTF8

  client = language_v1.LanguageServiceClient()
  annotations = client.analyze_sentiment(document, encoding_type=encoding_type)
  return(annotations)

## Integration

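The "analyze" function takes a CSV file name, an audio file name, the instructor's name, and the name of the instructor's folder, and performs (a) the transcription and (b) the sentiment analysis on that file. If a transcript for the class already exists in the folder, it is reused instead of being transcribed again. The function returns the magnitude and the score, in that order.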

In [None]:
def analyze(csvfile, audiofile, instructor, foldername):
  filename = audiofile.split("/")[-1]
  text_file = filename[:-4]+".txt"
  print(text_file)
  transcript_path = "/content/drive/My Drive/Capstone/"+foldername+"/Transcripts/"+text_file
  if not os.path.exists(transcript_path):
    #No saved transcript yet: cut, join, upload and transcribe the audio
    windows_dict = windows_talk(csvfile)
    cut_audio(audiofile, filename, windows_dict, instructor)
    dst = "cut_"+ filename
    join_audios(filename, dst)
    upload_to_cloud(dst, "instructor_language")
    transcript = sample_long_running_recognize("gs://instructor_language/"+dst)
    with open(transcript_path, "w") as text:
      text.write(transcript)
  else:
    #Reuse the transcript saved by an earlier run
    with open(transcript_path) as file:
      transcript = file.read().replace("\n", " ")
  sentiment_anly = response_sentiment_analysis(transcript)
  return (sentiment_anly.document_sentiment.magnitude, sentiment_anly.document_sentiment.score)
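
The function above only transcribes a class when no transcript file exists yet, which avoids paying for the same transcription twice. That caching pattern can be sketched on its own as follows; the helper name `cached_text` and its `compute` callback are hypothetical and stand in for the transcription call.

```python
import os

def cached_text(path, compute):
    """Return the text stored at path, computing and saving it on a cache miss."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    text = compute()  # the expensive step, e.g. a transcription request
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return text
```

Calling `cached_text(path, compute)` twice with the same path runs `compute` only the first time; the second call reads the saved file.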



The "main" function performs the analysis on multiple classes, whose audios and csvs are used as inputs. Note that the function also requires one to input the "instructor" as a string that exactly matches the string used as an identifier in the CSV files.

In [None]:
def main(audios, csvs, instructor, foldername):
  #audios[i] and csvs[i] must refer to the same class, in the same order
  if len(audios)!=len(csvs):
    print("The files are incomplete.")
    return None
  df = pd.DataFrame(index = range(len(audios)), columns=["Class", "Score", "Magnitude"])
  for i in range(len(audios)):
    #analyze returns the magnitude first and the score second
    magnitude, score = analyze(csvs[i], audios[i], instructor, foldername)
    df.iloc[i, 0] = audios[i].split("/")[-1]
    df.iloc[i, 1] = score
    df.iloc[i, 2] = magnitude
  return df
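
Because the function pairs `audios[i]` with `csvs[i]`, both lists need to be in the same order. The sketch below shows one way to pair the files by base name and sort them in class order. The helper names `natural_key` and `pair_by_stem` are hypothetical, and the sketch assumes every audio file has a CSV with the same base name and that all names follow one pattern such as `Class_<n>.<m>`.

```python
import os
import re

def natural_key(path):
    """Sort key that orders 'Class_2.1' before 'Class_10.1'."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", stem)]

def pair_by_stem(audios, csvs):
    """Return (audio, csv) pairs whose base names match, in natural order."""
    csv_by_stem = {os.path.splitext(os.path.basename(c))[0]: c for c in csvs}
    pairs = []
    for audio in sorted(audios, key=natural_key):
        stem = os.path.splitext(os.path.basename(audio))[0]
        if stem not in csv_by_stem:
            raise ValueError(f"no CSV found for {audio}")
        pairs.append((audio, csv_by_stem[stem]))
    return pairs

pairs = pair_by_stem(
    ["Audios/Class_10.1.wav", "Audios/Class_2.1.wav"],
    ["CSV/Class_2.1.csv", "CSV/Class_10.1.csv"],
)
print(pairs)
```

The two lists passed to the function could then be built with `audios, csvs = zip(*pairs)`.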

In [None]:
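#Sort both lists so that audios[i] and csvs[i] belong to the same class;
#this assumes each audio file and its CSV share the same base name
audios = sorted(glob.glob("/content/drive/My Drive/Capstone/McAllister/Audios/*"))
csvs = sorted(glob.glob("/content/drive/My Drive/Capstone/McAllister/CSV/*"))

data = main(audios, csvs, "Katie McAllister ", "McAllister")
print(data.head())
data.to_csv("/content/drive/My Drive/Capstone/McAllister/results.csv")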

In [None]:
audios = ['/content/drive/My Drive/Capstone/Terrana/Audios/Class_1.1.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_1.2.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_2.1.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_2.2.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_3.1.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_3.2.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_4.1.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_4.2.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_5.1.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_5.2.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_6.2.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_7.1.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_7.2.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_8.1.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_8.2.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_9.1.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_9.2.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_10.1.wav', '/content/drive/My Drive/Capstone/Terrana/Audios/Class_10.2.wav']
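#Build the CSV list from the audio list so that both are in the same order;
#this assumes each CSV has the same base name as its audio file
csvs = [audio.replace("/Audios/", "/CSV/").replace(".wav", ".csv") for audio in audios]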

data = main(audios, csvs, "Alex Terrana ", "Terrana")
print(data.head())
data.to_csv("/content/drive/My Drive/Capstone/Terrana/results.csv")

In [None]:
audios = ['/content/drive/My Drive/Capstone/Dosmann/Audios/Class_1.1.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_1.2.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_2.1.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_2.2.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_3.1.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_3.2.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_4.1.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_4.2.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_5.1.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_5.2.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_6.2.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_7.1.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_7.2.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_8.1.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_8.2.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_9.1.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_9.2.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_10.1.wav', '/content/drive/My Drive/Capstone/Dosmann/Audios/Class_10.2.wav']
csvs = ['/content/drive/My Drive/Capstone/Dosmann/CSV/Class_1.1.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_1.2.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_2.1.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_2.2.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_3.1.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_3.2.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_4.1.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_4.2.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_5.1.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_5.2.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_6.2.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_7.1.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_7.2.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_8.1.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_8.2.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_9.1.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_9.2.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_10.1.csv', '/content/drive/My Drive/Capstone/Dosmann/CSV/Class_10.2.csv']

data = main(audios, csvs, "Andy Dosmann ", "Dosmann")
print(data.head())
data.to_csv("/content/drive/My Drive/Capstone/Dosmann/results.csv")

In [None]:
audios = glob.glob("/content/drive/My Drive/Capstone/Fiorelli/Audios/*")
csvs = glob.glob("/content/drive/My Drive/Capstone/Fiorelli/CSV/*")

print(audios)
print(csvs)

['/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_1.1.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_1.2.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_2.1.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_2.2.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_3.1.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_8.1.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_8.2.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_9.1.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_9.2.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_10.1.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_10.2.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_3.2.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_4.1.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_4.2.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_5.1.wav', '/conte

In [None]:
audios = ['/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_1.1.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_1.2.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_2.1.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_2.2.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_3.1.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_3.2.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_4.1.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_4.2.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_5.1.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_5.2.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_6.2.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_7.1.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_7.2.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_8.1.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_8.2.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_9.1.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_9.2.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_10.1.wav', '/content/drive/My Drive/Capstone/Fiorelli/Audios/Class_10.2.wav']
csvs = ['/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_1.1.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_1.2.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_2.1.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_2.2.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_3.1.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_3.2.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_4.1.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_4.2.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_5.1.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_5.2.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_6.2.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_7.1.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_7.2.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_8.1.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_8.2.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_9.1.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_9.2.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_10.1.csv', '/content/drive/My Drive/Capstone/Fiorelli/CSV/Class_10.2.csv']

data = main(audios, csvs, "Lindsey Fiorelli ", "Fiorelli")
print(data.head())
data.to_csv("/content/drive/My Drive/Capstone/Fiorelli/results.csv")

Class_1.1.txt
Waiting for operation to complete...
Class_1.2.txt
Waiting for operation to complete...
Class_2.1.txt
Waiting for operation to complete...
Class_2.2.txt
Waiting for operation to complete...
Class_3.1.txt
Waiting for operation to complete...
Class_3.2.txt
Waiting for operation to complete...
Class_4.1.txt
Waiting for operation to complete...
Class_4.2.txt
Waiting for operation to complete...
Class_5.1.txt
Waiting for operation to complete...
Class_5.2.txt
Waiting for operation to complete...
Class_6.2.txt
Waiting for operation to complete...
Class_7.1.txt
Waiting for operation to complete...
Class_7.2.txt
Waiting for operation to complete...
Class_8.1.txt
Waiting for operation to complete...
Class_8.2.txt
Waiting for operation to complete...
Class_9.1.txt
Waiting for operation to complete...
Class_9.2.txt
Waiting for operation to complete...
Class_10.1.txt
Waiting for operation to complete...
Class_10.2.txt
Waiting for operation to complete...




1.   Import the CSV file with talk time windows and turn it into a dictionary
2.   Import the audio and transform it into a concatenated version containing only the instructor's voice
3.   Upload the audio to Google Cloud
4.   Run the transcription service to obtain the transcript
5.   Run the transcript through sentiment analysis

