# Emotion detection from video text transcription

#### This notebook is a proof of concept. Given a recorded video with clear audio, it extracts the audio track, transcribes the speech to text, and classifies the emotion of the transcript with a fine-tuned DistilBERT model. To try it, load a short video file, transcribe it, and pass the transcript to the 'predict' function. The same function also works on plain text; the video transcription step is included for extra novelty.

<i>Get dependencies</i>

In [2]:
!pip install SpeechRecognition

Collecting SpeechRecognition
  Downloading SpeechRecognition-3.10.2-py2.py3-none-any.whl (32.8 MB)
Installing collected packages: SpeechRecognition
Successfully installed SpeechRecognition-3.10.2


<i>Mount the Google Drive to load local files. Import the trained model and tokenizer. Send the model to the GPU for faster inference (if running on a GPU). Load the class names into a list.</i>

In [3]:
import moviepy.editor as mp
import speech_recognition as sr
from google.colab import drive
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_ckpt = "distilbert-base-uncased"
model_name = f"{model_ckpt}-finetuned-emotion"
drive.mount('/content/drive')
# Directory holding the fine-tuned checkpoint (named model_dir to avoid shadowing the built-in dir)
model_dir = f'./drive/MyDrive/ml_class_group_project/Rich/{model_name}/checkpoint-500'

# The position of each name in this list is the label id predicted by the model
class_names = ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the model and tokenizer from the same checkpoint directory
model = AutoModelForSequenceClassification.from_pretrained(model_dir).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_dir)

Mounted at /content/drive
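The model outputs an integer label id, and the position of each name in `class_names` is what ties an id to an emotion. The following is a minimal sketch that makes this mapping explicit; `id2label` and `label2id` are names introduced here for illustration, and the sketch assumes the list order matches the label ids used during fine-tuning.

```python
# Class names in label-id order, copied from the cell above
class_names = ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']

# Hypothetical helper mappings (not part of the original notebook)
id2label = dict(enumerate(class_names))
label2id = {name: idx for idx, name in id2label.items()}

print(id2label[3])
print(label2id['joy'])
```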


<i>Inference function. Takes a string argument 'text' and disables gradient calculation for faster inference. Tokenizes the input and sends it to the GPU if applicable. Generates a prediction by taking the argmax of the logits.</i>

In [4]:
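def predict(text):
    # Gradients are not needed for inference
    with torch.no_grad():
        # Tokenize and move the tensors to the same device as the model;
        # truncation keeps long inputs within the model's maximum length
        inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
        output = model(**inputs)
        # The predicted class is the index of the largest logit
        pred_label = torch.argmax(output.logits, dim=-1)[0].item()
        return (pred_label, class_names[pred_label])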
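The argmax step only reports which class has the largest logit. To see how confident that choice is, the logits can be passed through a softmax first. Below is a minimal pure-Python sketch of that idea; `softmax`, `top_class`, and the example logits are made up for illustration and are not outputs of the trained model.

```python
import math

class_names = ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_class(logits):
    """Return (label id, class name, probability) for the largest logit."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return idx, class_names[idx], probs[idx]

# Made-up logits for illustration only
example_logits = [-1.2, 3.4, 0.5, -0.3, -2.0, 0.1]
print(top_class(example_logits))
```

Softmax preserves the ordering of the logits, so the class it ranks highest is the same one the argmax in `predict` selects.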

<i>Load a local video file</i>

In [5]:
test_video = mp.VideoFileClip('./drive/MyDrive/ml_class_group_project/Rich/videos/test.mov')

<i>Convert the video to an audio file</i>

In [6]:
audio_file = test_video.audio
audio_file.write_audiofile('test.wav')

MoviePy - Writing audio in test.wav

MoviePy - Done.
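A quick look at the exported file's sample rate, channel count, and duration can catch an empty or truncated export before transcription. The following is a minimal sketch using only the standard-library `wave` module, which reads uncompressed PCM WAV files; `wav_info` and the demo file name are illustrative and not part of the original notebook.

```python
import wave

def wav_info(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, 'rb') as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getnframes() / rate

# Self-contained demo: write one second of 16-bit mono silence, then inspect it.
# 'demo_silence.wav' is a hypothetical file name used only for this example.
with wave.open('demo_silence.wav', 'wb') as out:
    out.setnchannels(1)
    out.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
    out.setframerate(16000)
    out.writeframes(b'\x00\x00' * 16000)

print(wav_info('demo_silence.wav'))
```

In the notebook, the same function could be pointed at 'test.wav', assuming the export is uncompressed PCM.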




<i>Create an instance of a speech recognizer object</i>

In [7]:
r = sr.Recognizer()

<i>Convert the speech file to an audio data object readable by the speech recognizer</i>

In [8]:
with sr.AudioFile('test.wav') as source:
  test_data = r.record(source)

<i>Resize the video so that it can be viewed inline in the notebook</i>

In [9]:
import moviepy.editor
test_resized = test_video.resize(height=240)

<i>An example video is provided below for comparison with the transcribed text (no need to run this cell; simply play the video below)</i>

In [10]:
moviepy.editor.ipython_display(test_resized, verbose=False)

Moviepy - Building video __temp__.mp4.
MoviePy - Writing audio in __temp__TEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video __temp__.mp4
Moviepy - Done !
Moviepy - video ready __temp__.mp4




<i>Convert the audio to text and print the resulting transcription</i>

In [None]:
test = r.recognize_google(test_data)
print(test)

this is an example video with speech to text recognition I love speech to text recognition


<i>Predict the most likely class for the transcribed text</i>

In [None]:
predict(test)

(1, 'joy')
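The cells above form a fixed pipeline: video to audio, audio to text, text to emotion. A minimal sketch of that flow as one function follows. `detect_emotion` and the three stand-in steps are hypothetical names introduced here; the stand-ins exist only so the sketch runs without a video file, network access, or the trained model.

```python
def detect_emotion(video_path, extract_audio, transcribe, classify):
    """Run video -> audio -> text -> emotion and return (transcript, prediction)."""
    wav_path = extract_audio(video_path)   # e.g. write the clip's audio track to a WAV file
    transcript = transcribe(wav_path)      # e.g. speech-to-text on that WAV file
    return transcript, classify(transcript)

# Stand-in steps for illustration only
def fake_extract_audio(video_path):
    # Pretend the audio was written next to the video with a .wav extension
    return video_path.rsplit('.', 1)[0] + '.wav'

def fake_transcribe(wav_path):
    return 'I love speech to text recognition'

def fake_classify(text):
    # Toy rule, not the trained classifier
    return (1, 'joy') if 'love' in text else (0, 'sadness')

print(detect_emotion('test.mov', fake_extract_audio, fake_transcribe, fake_classify))
```

In the notebook, the three callables would wrap the `write_audiofile`, `r.recognize_google`, and `predict` steps respectively.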

# The process is repeated below for each class present in the dataset

#### <u>Anger</u>

In [11]:
anger_video = mp.VideoFileClip('./drive/MyDrive/ml_class_group_project/Rich/videos/anger.mp4')

In [None]:
anger_audio = anger_video.audio
anger_audio.write_audiofile('anger.wav')

MoviePy - Writing audio in anger.wav
MoviePy - Done.




In [None]:
with sr.AudioFile('anger.wav') as source:
  anger_data = r.record(source)

In [12]:
anger_resized = anger_video.resize(height=240)

In [13]:
moviepy.editor.ipython_display(anger_resized, verbose=False)

Moviepy - Building video __temp__.mp4.
MoviePy - Writing audio in __temp__TEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video __temp__.mp4
Moviepy - Done !
Moviepy - video ready __temp__.mp4




In [None]:
anger = r.recognize_google(anger_data)
print(anger)

I'm upset about my grade on the test


In [None]:
predict(anger)

(3, 'anger')

#### <u>Sadness</u>

In [14]:
sadness_video = mp.VideoFileClip('./drive/MyDrive/ml_class_group_project/Rich/videos/sadness.mp4')

In [None]:
sadness_audio = sadness_video.audio
sadness_audio.write_audiofile('sadness.wav')

MoviePy - Writing audio in sadness.wav
MoviePy - Done.


In [None]:
with sr.AudioFile('sadness.wav') as source:
  sadness_data = r.record(source)

In [15]:
sadness_resized = sadness_video.resize(height=240)

In [16]:
moviepy.editor.ipython_display(sadness_resized, verbose=False)

Moviepy - Building video __temp__.mp4.
MoviePy - Writing audio in __temp__TEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video __temp__.mp4
Moviepy - Done !
Moviepy - video ready __temp__.mp4




In [None]:
sadness = r.recognize_google(sadness_data)
print(sadness)

I'm really sad about my favorite restaurant closing


In [None]:
predict(sadness)

(0, 'sadness')

#### <u>Surprise</u>

In [17]:
surprise_video = mp.VideoFileClip('./drive/MyDrive/ml_class_group_project/Rich/videos/surprise.mp4')

In [None]:
surprise_audio = surprise_video.audio
surprise_audio.write_audiofile('surprise.wav')

MoviePy - Writing audio in surprise.wav
MoviePy - Done.


In [None]:
with sr.AudioFile('surprise.wav') as source:
  surprise_data = r.record(source)

In [18]:
surprise_resized = surprise_video.resize(height=240)

In [19]:
moviepy.editor.ipython_display(surprise_resized, verbose=False)

Moviepy - Building video __temp__.mp4.
MoviePy - Writing audio in __temp__TEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video __temp__.mp4
Moviepy - Done !
Moviepy - video ready __temp__.mp4


In [None]:
surprise = r.recognize_google(surprise_data)
print(surprise)

I'm astonished that the Patriots didn't win the Superbowl this year


In [None]:
predict(surprise)

(5, 'surprise')

#### <u>Love</u>

In [20]:
love_video = mp.VideoFileClip('./drive/MyDrive/ml_class_group_project/Rich/videos/love.mp4')

In [None]:
love_audio = love_video.audio
love_audio.write_audiofile('love.wav')

MoviePy - Writing audio in love.wav
MoviePy - Done.


In [None]:
with sr.AudioFile('love.wav') as source:
  love_data = r.record(source)

In [21]:
love_resized = love_video.resize(height=240)

In [22]:
moviepy.editor.ipython_display(love_resized, verbose=False)

Moviepy - Building video __temp__.mp4.
MoviePy - Writing audio in __temp__TEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video __temp__.mp4
Moviepy - Done !
Moviepy - video ready __temp__.mp4




In [None]:
love = r.recognize_google(love_data)
print(love)

I feel very close to my dog Baxter


In [None]:
predict(love)

(2, 'love')

#### <u>Fear</u>

In [23]:
fear_video = mp.VideoFileClip('./drive/MyDrive/ml_class_group_project/Rich/videos/fear.mp4')

In [None]:
fear_audio = fear_video.audio
fear_audio.write_audiofile('fear.wav')

MoviePy - Writing audio in fear.wav
MoviePy - Done.




In [None]:
with sr.AudioFile('fear.wav') as source:
  fear_data = r.record(source)

In [24]:
fear_resized = fear_video.resize(height=240)

In [25]:
moviepy.editor.ipython_display(fear_resized, verbose=False)

Moviepy - Building video __temp__.mp4.
MoviePy - Writing audio in __temp__TEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video __temp__.mp4
Moviepy - Done !
Moviepy - video ready __temp__.mp4




In [None]:
fear = r.recognize_google(fear_data)
print(fear)

I'm very worried about losing my job


In [None]:
predict(fear)

(4, 'fear')

#### <u>Joy (a longer video example)</u>

In [26]:
long_joy_video = mp.VideoFileClip('./drive/MyDrive/ml_class_group_project/Rich/videos/long_joy_review.mp4')

In [None]:
long_joy_audio = long_joy_video.audio
long_joy_audio.write_audiofile('long_joy.wav')

MoviePy - Writing audio in long_joy.wav
MoviePy - Done.


In [None]:
with sr.AudioFile('long_joy.wav') as source:
  long_joy_data = r.record(source)

In [27]:
long_joy_resized = long_joy_video.resize(height=240)

In [28]:
moviepy.editor.ipython_display(long_joy_resized, verbose=False)

Moviepy - Building video __temp__.mp4.
MoviePy - Writing audio in __temp__TEMP_MPY_wvf_snd.mp3
MoviePy - Done.
Moviepy - Writing video __temp__.mp4
Moviepy - Done !
Moviepy - video ready __temp__.mp4




In [None]:
long_joy = r.recognize_google(long_joy_data)
print(long_joy)

I was really happy to receive this product in a mint condition it arrived exactly as described from amazon.com I'll be using it for many years to come and I will let you know what my comments are in the future


In [None]:
predict(long_joy)

(1, 'joy')
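The per-class results above can be collected into one small table. The sketch below tallies the intended emotion of each clip (taken from the section headings) against the prediction shown in the corresponding cell output; `results` and `accuracy` are names introduced here for illustration. Six hand-picked clips give a sanity check of the pipeline, not a measure of model quality.

```python
# (clip, intended emotion, predicted emotion) as shown in the cell outputs above
results = [
    ('anger', 'anger', 'anger'),
    ('sadness', 'sadness', 'sadness'),
    ('surprise', 'surprise', 'surprise'),
    ('love', 'love', 'love'),
    ('fear', 'fear', 'fear'),
    ('long_joy', 'joy', 'joy'),
]

def accuracy(rows):
    """Fraction of rows whose predicted emotion equals the intended one."""
    correct = sum(1 for _, expected, predicted in rows if expected == predicted)
    return correct / len(rows)

for clip, expected, predicted in results:
    status = 'ok' if expected == predicted else 'MISMATCH'
    print(f'{clip:<10} expected={expected:<9} predicted={predicted:<9} {status}')
print(f'accuracy: {accuracy(results):.2f}')
```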

# Here, we test on some real Amazon text reviews that express emotion about products

The reviews are from the following page on Amazon:
<a href="https://www.amazon.com/Ring-Video-Doorbell-Venetian-Bronze-2020-Release/dp/B08N5NQ69J?ref_=Oct_DLandingS_D_7871bb64_0&th=1">https://www.amazon.com/Ring-Video-Doorbell-Venetian-Bronze-2020-Release/dp/B08N5NQ69J?ref_=Oct_DLandingS_D_7871bb64_0&th=1</a>

### An Amazon 1-star review, where the customer is not happy.

In [None]:
real_amazon_review_1_star = "I can't begin to tell you how frustrated I am with Ring. \
First, I bought the Ring (wired) video doorbell because it was on sale on the treasure truck. I removed my old, wired doorbell, measured the voltage to confirm that it was within the Ring doorbell specs, and I connected the wires, mounted the doorbell, and proceeded with the setup, only to get a flashing error code that the Ring troubleshooting guide described as \"your Ring video doorbell is not compatible with your home wiring.\" I unmounted the doorbell, checked the wiring, rechecked the voltage, and tried the setup again only to get the same error. So I removed everything, boxed it up, and sent it back."

In [None]:
predict(real_amazon_review_1_star)

(3, 'anger')

### An Amazon 5-star review, where the customer is very joyful.

In [None]:
real_amazon_review_5_stars = "I recently installed the Ring Video Doorbell at my home, and I'm thrilled with its performance. This device has truly simplified my life while providing added security to my doorstep. \
Installation was a breeze - I had it up and running in minutes. The sleek design blends seamlessly with my home's exterior, and the weather-resistant construction ensures durability. \
The HD camera delivers crisp video quality, allowing me to see and speak to visitors from anywhere using the Ring app on my smartphone. It's incredibly convenient, especially when I'm away from home or busy inside. \
The motion detection feature alerts me instantly to any activity at my door, providing peace of mind whether I'm expecting a delivery or keeping an eye on things while I'm out."

In [None]:
predict(real_amazon_review_5_stars)

(1, 'joy')