
# **AutoTrack: Spotify Song Recommendation Model Based on Facial and Image Features**

*CIS545 Final Project by Navya Janga and Matthew Pearl*



---

         

**Project Description**

This project selects a song to accompany a given movie clip, based on the age and emotion of the actor shown in its frames. The long-term goal is software that automatically generates soundtracks for entire movies.

# Imports and Colab Setup

In [7]:
import os
from os import listdir
from google.colab import drive

import gzip
import tarfile
import glob
import shutil

import pandas as pd
import numpy as np

import torch
from torchvision import datasets, transforms
import tensorflow

import math
import random

import matplotlib.pyplot as plt
import seaborn as sns

import PIL
from PIL import Image, ImageOps
from keras.preprocessing.image import img_to_array, load_img

In [8]:
# Use the GPU for model training when one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Mount Google Drive -- datasets too big to load into colab directly 
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Part 1: CNN Models for Facial Age and Emotion Detection

We chose convolutional neural networks (CNNs) because they are highly effective at image classification. Their main benefit over feed-forward networks is that they take the relative positions of pixels into account, and they can be applied to images with different numbers of channels (e.g., 3 for RGB or 1 for grayscale).

## 1.1: CNN Functions 

Here we define general-purpose CNN helper functions that are reused for both models later in this section.

In [9]:
# Initializing the CNN model: one convolutional layer, ReLU, max pooling, then a linear classifier
def initialize_cnn_model(in_channels, out_channels, kernel_size, stride, in_features, out_features):
  cnn_model = torch.nn.Sequential(
      torch.nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride),
      torch.nn.ReLU(),
      torch.nn.MaxPool2d(kernel_size=kernel_size-1),
      torch.nn.Flatten(start_dim=1),
      torch.nn.Linear(in_features=in_features, out_features=out_features)
  )
  return cnn_model

In [10]:
# training loop for CNN model
def train_cnn_model(cnn_model, optimizer, criterion, epochs, train_loader):
  # re-initialize the layer weights so that every call trains from scratch
  for child in cnn_model.children():
    if hasattr(child, 'reset_parameters'):
      child.reset_parameters()
  cnn_model.to(device)
  cnn_model.train()

  for epoch in range(epochs):
    running_loss = 0.0
    for data, labels in train_loader:
      data, labels = data.to(device), labels.to(device).long()
      outputs = cnn_model(data)
      optimizer.zero_grad()
      loss = criterion(outputs, labels)
      loss.backward()
      optimizer.step()
      running_loss += loss.item()
    # average loss over all mini-batches of the epoch
    print('epoch: {}, loss: {}'.format(epoch + 1, running_loss / len(train_loader)))

    # outputs and labels still hold the last mini-batch of the epoch, so the accuracy
    # printed here is measured on that batch only, not on the full training set
    total, correct = 0, 0
    with torch.no_grad():
      _, predicted = torch.max(outputs.data, 1)
      total += labels.size(0)
      correct += (predicted == labels).sum().item()
      accuracy = correct / total
    print('epoch: {}, accuracy: {}'.format(epoch + 1, accuracy))
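
The accuracy printed by `train_cnn_model` is computed from `outputs` and `labels` as they stand after the inner loop, that is, from the last mini-batch of each epoch, which helps explain why the reported values fluctuate so much. A minimal sketch of accumulating the counts over every batch instead is shown below. `epoch_accuracy` is a hypothetical helper, the predictions are made up, and plain Python lists stand in for the predicted and true label tensors.

```python
def epoch_accuracy(batches):
    """Accuracy over all batches; each batch is a (predicted, labels) pair of lists."""
    total, correct = 0, 0
    for predicted, labels in batches:
        total += len(labels)
        correct += sum(p == y for p, y in zip(predicted, labels))
    return correct / total if total else 0.0


# made-up predictions for three mini-batches (the last one is smaller)
batches = [
    ([1, 2, 2, 0], [1, 2, 1, 0]),  # 3 of 4 correct
    ([3, 3, 1, 0], [3, 2, 1, 1]),  # 2 of 4 correct
    ([2, 2], [2, 2]),              # 2 of 2 correct
]

# accuracy measured on the final batch only, as in the training loop above
last_predicted, last_labels = batches[-1]
last_batch_accuracy = sum(p == y for p, y in zip(last_predicted, last_labels)) / len(last_labels)

print(epoch_accuracy(batches))  # accuracy over all 10 samples
print(last_batch_accuracy)      # accuracy over the final batch only
```

In the notebook itself the same idea would amount to updating `total` and `correct` inside the batch loop rather than after it.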

In [11]:
# testing loop for CNN model
def test_cnn_model(cnn_model, criterion, test_loader):
  cnn_model.eval()
  total, correct = 0, 0
  running_loss = 0.0
  with torch.no_grad():
    for data, labels in test_loader:
      data, labels = data.to(device), labels.to(device).long()
      outputs = cnn_model(data)
      loss = criterion(outputs, labels)
      _, predicted = torch.max(outputs.data, 1)
      total += labels.size(0)
      correct += (predicted == labels).sum().item()
      running_loss += loss.item()
  print('loss:', running_loss / len(test_loader), 'accuracy:', correct / total)

## 1.2: Facial Age Model 
In this section, we use the UTKFace dataset, specifically the "Aligned & Cropped Faces" file. The dataset is explored further in [Appendix 1: Facial Age EDA](https://colab.research.google.com/drive/1W7CBX02pssbdHSQrHFdoU8XORCe3wxSR?usp=sharing).

In [12]:
from google.colab import files
uploaded = files.upload()

Saving crop_part1.tar.gz to crop_part1.tar.gz


**Test and Train Data** 

In [13]:
# extract the archive into /content and collect the paths of the extracted images
with tarfile.open('/content/crop_part1.tar.gz') as archive:
  archive.extractall('/content')
  # keep regular files only, so that directory entries are not treated as images
  age_img_paths = ['/content/' + member.name for member in archive.getmembers() if member.isfile()]

# shuffle the images to create a random train and test dataset
random.shuffle(age_img_paths)

# train: 80%, test: 20%
split_idx = int(len(age_img_paths) * 0.8)

age_train_paths = age_img_paths[:split_idx]
age_test_paths = age_img_paths[split_idx:]

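# function to convert an image into a 3 x 200 x 200 tensor with values in [0, 1]
# note: reshape reinterprets the 200 x 200 x 3 array returned by load_img rather than
# moving the channel axis; get_clip_info applies the same reshape at inference time
def age_img_to_tensor(img_path):
  try:
    return torch.Tensor((np.asarray(load_img(img_path)) / 255).reshape(3, 200, 200))
  except Exception:
    # unreadable or wrongly sized images fall back to an all-zero tensor
    return torch.Tensor((np.zeros((3, 200, 200))))

# return the class index of the decade of the subject's "teenage years" (birth year + 15,
# rounded up to a decade and capped at 2020) as a rough estimate of the type of music
# they listened to; class 0 corresponds to the 1930s
def get_decade_class_from_path(img_path):
  try:
    # UTKFace file names start with the subject's age: <age>_<gender>_<race>_<date>.jpg
    age = int(img_path.split('/')[3].split('_')[0])
    decade = min(int(math.ceil((2021 - age + 15) / 10.0)) * 10, 2020)
    return int((decade - 1930) / 10)
  except Exception:
    # paths without a parsable age default to class 7 (the 2000s)
    return 7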

age_train_data_arrays = np.array([age_img_to_tensor(img_path).numpy() for img_path in age_train_paths]) 
age_train_data_labels = np.array([get_decade_class_from_path(img_path) for img_path in age_train_paths])

age_test_data_arrays = np.array([age_img_to_tensor(img_path).numpy() for img_path in age_test_paths]) 
age_test_data_labels = np.array([get_decade_class_from_path(img_path) for img_path in age_test_paths])
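
To make the label encoding concrete, the sketch below reproduces the arithmetic of `get_decade_class_from_path` for a bare age value. The helper name `decade_class_from_age` and the `reference_year` parameter are ours, for illustration only; the notebook hard-codes 2021.

```python
import math


def decade_class_from_age(age, reference_year=2021):
    """Map an age to the class index of the decade of the subject's teenage years."""
    teen_year = reference_year - age + 15                # year the subject turned 15
    decade = min(math.ceil(teen_year / 10) * 10, 2020)   # round up to a decade, cap at 2020
    return (decade - 1930) // 10                         # class 0 is the 1930s


for age in (5, 30, 60, 85):
    cls = decade_class_from_age(age)
    print('age', age, '-> class', cls, '-> decade', 1930 + 10 * cls)
```

Because the year is rounded up, a 30-year-old (who turned 15 in 2006) is assigned to the 2010s rather than the 2000s.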


In [14]:
from torch.utils.data import TensorDataset, DataLoader

In [15]:
# each dataset is composed of 3 x 200 x 200 tensors representing images from the age dataset
age_training_dataset = TensorDataset(torch.Tensor(age_train_data_arrays), torch.Tensor(age_train_data_labels))
age_testing_dataset = TensorDataset(torch.Tensor(age_test_data_arrays), torch.Tensor(age_test_data_labels))

age_train_loader = DataLoader(age_training_dataset, batch_size=128, shuffle=True)
age_test_loader = DataLoader(age_testing_dataset, batch_size=1, shuffle=False)
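
One detail worth knowing about these 3 x 200 x 200 tensors: `load_img` yields an array in height x width x channel order, and `reshape(3, 200, 200)` reinterprets that buffer without moving the channel axis. The small NumPy sketch below, on a made-up 2 x 2 RGB image, shows how this differs from `transpose(2, 0, 1)`. The same `reshape` is applied at training and inference time, so the models still see a consistent layout; this is an observation, not a claim about what the authors intended.

```python
import numpy as np

# a made-up 2 x 2 RGB image in height x width x channel order,
# where every pixel is (R, G, B) = (1, 2, 3)
img = np.tile(np.array([1, 2, 3]), (2, 2, 1))

reshaped = img.reshape(3, 2, 2)      # reinterprets the flat buffer
transposed = img.transpose(2, 0, 1)  # moves the channel axis to the front

print(reshaped[0])    # a mix of R, G and B values
print(transposed[0])  # the red plane only
```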

**CNN Model**

In [16]:
# age CNN model parameters
age_in_channels = 3 # for the 3 RGB channels
age_out_channels = 64
age_kernel_size = 4
age_stride = 2
age_in_features = 33 * 33 * 64 # from the convnet calculator based on the chosen kernel, stride and out_channels sizes
age_out_features = len(set(age_train_data_labels).union(set(age_test_data_labels))) # number of possible classes

age_cnn_model = initialize_cnn_model(age_in_channels, age_out_channels, age_kernel_size, age_stride, age_in_features, age_out_features)

# we are using a cross entropy loss function and an Adam optimizer with a learning rate of 1e-3
age_criterion = torch.nn.CrossEntropyLoss()
age_optimizer = torch.optim.Adam(age_cnn_model.parameters(), lr=1e-3)

age_epochs = 15
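
The value of `age_in_features` can also be derived by hand with the standard output-size formula for an unpadded convolution or pooling layer, floor((n - k) / s) + 1. The sketch below applies it to the layer stack built by `initialize_cnn_model`. The helper names are hypothetical, and it assumes PyTorch's defaults of no padding and a pooling stride equal to the pooling kernel size.

```python
def out_size(n, kernel, stride):
    """Spatial output size of an unpadded convolution or pooling layer."""
    return (n - kernel) // stride + 1


def flattened_features(image_size, out_channels, kernel_size, stride):
    """Input size of the final Linear layer for Conv2d -> MaxPool2d -> Flatten."""
    conv = out_size(image_size, kernel_size, stride)
    pool_kernel = kernel_size - 1                      # as in initialize_cnn_model
    pooled = out_size(conv, pool_kernel, pool_kernel)  # pooling stride defaults to the kernel size
    return pooled * pooled * out_channels


print(flattened_features(200, 64, 4, 2))  # age model: 200 x 200 input
print(flattened_features(48, 12, 2, 1))   # emotion model: 48 x 48 input
```

The first call reproduces 33 * 33 * 64, and the second gives the 26508 used for the emotion model in Section 1.3.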

In [17]:
train_cnn_model(age_cnn_model, age_optimizer, age_criterion, age_epochs, age_train_loader)

epoch: 1, loss: 3.2231777791054017
epoch: 1, accuracy: 0.5
epoch: 2, loss: 1.4672459671574254
epoch: 2, accuracy: 0.625
epoch: 3, loss: 1.279827279429282
epoch: 3, accuracy: 0.5625
epoch: 4, loss: 1.1930627803648672
epoch: 4, accuracy: 0.5625
epoch: 5, loss: 1.1485433347763554
epoch: 5, accuracy: 0.5625
epoch: 6, loss: 1.106885515874432
epoch: 6, accuracy: 0.625
epoch: 7, loss: 1.0807066425200431
epoch: 7, accuracy: 0.5
epoch: 8, loss: 1.0289070231299247
epoch: 8, accuracy: 0.625
epoch: 9, loss: 1.0094760944766383
epoch: 9, accuracy: 0.375
epoch: 10, loss: 0.9519179357636359
epoch: 10, accuracy: 0.625
epoch: 11, loss: 0.9183795067571825
epoch: 11, accuracy: 0.75
epoch: 12, loss: 0.8687595744286815
epoch: 12, accuracy: 0.6875
epoch: 13, loss: 0.8125473674266569
epoch: 13, accuracy: 0.625
epoch: 14, loss: 0.7871734672977079
epoch: 14, accuracy: 0.75
epoch: 15, loss: 0.7444342682438512
epoch: 15, accuracy: 0.625


In [18]:
torch.save(age_cnn_model.state_dict(), 'age_checkpoint.pth')

In [19]:
age_criterion = torch.nn.CrossEntropyLoss()
test_cnn_model(age_cnn_model, age_criterion, age_test_loader)

loss: 1.280514815858302 accuracy: 0.575370464997445


## 1.3: Facial Emotion Model

This section creates a model for the Facial Emotion dataset, which is explored further in [Appendix 2: Facial Emotion EDA](https://colab.research.google.com/drive/1zUTpqxVV3NBYfWjwEkHQwvxiAFpV-uqU?usp=sharing).



**Test and Train Data**

In [20]:
# since the grayscale pictures come as strings of pixel values, we need to convert
# these strings into a tensor
def get_pixel_tensor(pixel_str):
  return torch.Tensor([float(s) / 255 for s in pixel_str.split(' ')])

# read in the ICML face data as a Pandas dataframe
emotions_df = pd.read_csv('/content/drive/Shared drives/CIS545 Final Project/icml_face_data.csv', skip_blank_lines=False)
emotions_df[' pixels'] = emotions_df[' pixels'].apply(lambda pixel_str: get_pixel_tensor(pixel_str))

# 2 separate dataframes for training and testing
emotions_train_df = emotions_df[emotions_df[' Usage'] == 'Training']
emotions_test_df = emotions_df[emotions_df[' Usage'] == 'PrivateTest']

# generate training and testing data
# the inputs are 1 x 48 x 48 tensors representing the images in the dataset
# the outputs are integer encodings of the emotion represented in the images
emotions_train_arrays = np.array([np.resize(t.numpy(), (1, 48, 48)) for t in emotions_train_df[' pixels'].to_list()])
emotions_train_emotions = np.array(emotions_train_df['emotion'].tolist())

emotions_test_arrays = np.array([np.resize(t.numpy(), (1, 48, 48)) for t in emotions_test_df[' pixels'].to_list()])
emotions_test_emotions = np.array(emotions_test_df['emotion'].tolist())

In [21]:
# generate PyTorch datasets and data loaders for training/testing based on the previously acquired data
emotions_training_dataset = TensorDataset(torch.Tensor(emotions_train_arrays), torch.Tensor(emotions_train_emotions))
emotions_testing_dataset = TensorDataset(torch.Tensor(emotions_test_arrays), torch.Tensor(emotions_test_emotions))

emotions_train_loader = DataLoader(emotions_training_dataset, batch_size=128, shuffle=True)
emotions_test_loader = DataLoader(emotions_testing_dataset, batch_size=1, shuffle=False)

**CNN Model**

In [22]:
# emotion CNN parameters
emotions_in_channels = 1 # 1 channel because the image is grayscale
emotions_out_channels = 12
emotions_kernel_size = 2
emotions_stride = 1
emotions_in_features = 26508 # from the convnet calculator
emotions_out_features = 7 # number of possible emotion classes

# we are using a cross entropy loss function and an Adam optimizer with a learning rate of 1e-3
emotions_cnn_model = initialize_cnn_model(emotions_in_channels, emotions_out_channels, emotions_kernel_size, emotions_stride, emotions_in_features, emotions_out_features)
emotions_criterion = torch.nn.CrossEntropyLoss()
emotions_optimizer = torch.optim.Adam(emotions_cnn_model.parameters(), lr=1e-3)

emotions_epochs = 15

In [23]:
train_cnn_model(emotions_cnn_model, emotions_optimizer, emotions_criterion, emotions_epochs, emotions_train_loader)

epoch: 1, loss: 1.8205373403761123
epoch: 1, accuracy: 0.3783783783783784
epoch: 2, loss: 1.616312017440796
epoch: 2, accuracy: 0.35135135135135137
epoch: 3, loss: 1.5494705756505331
epoch: 3, accuracy: 0.3783783783783784
epoch: 4, loss: 1.5068836238649157
epoch: 4, accuracy: 0.43243243243243246
epoch: 5, loss: 1.4679030270046658
epoch: 5, accuracy: 0.5675675675675675
epoch: 6, loss: 1.4352569823794894
epoch: 6, accuracy: 0.4864864864864865
epoch: 7, loss: 1.4144621430502997
epoch: 7, accuracy: 0.24324324324324326
epoch: 8, loss: 1.3888413344489203
epoch: 8, accuracy: 0.32432432432432434
epoch: 9, loss: 1.3635575803120932
epoch: 9, accuracy: 0.4594594594594595
epoch: 10, loss: 1.3388655461205377
epoch: 10, accuracy: 0.4864864864864865
epoch: 11, loss: 1.3262824508878919
epoch: 11, accuracy: 0.5405405405405406
epoch: 12, loss: 1.2968372270796034
epoch: 12, accuracy: 0.4594594594594595
epoch: 13, loss: 1.2821579684151543
epoch: 13, accuracy: 0.43243243243243246
epoch: 14, loss: 1.2526946

In [24]:
emotions_criterion = torch.nn.CrossEntropyLoss()
test_cnn_model(emotions_cnn_model, emotions_criterion, emotions_test_loader)

loss: 1.5343169760920148 accuracy: 0.4134856505990527


# Part 2: Music Selection for Video Clips

## Part 2.1: Get Spotify Data Ready

In this section, we take our Spotify dataset, merge it with genre information from a dataset of 2,000 top songs, and keep the songs that appear in both. The EDA and cleaning for this section can be found in [Appendix 3: Spotify EDA](https://colab.research.google.com/drive/1HwJVahChbeW6yxtr92s2bqAlvHkMuCzA?usp=sharing).



In [25]:
# read in the spotify data as a Pandas dataframe
path_to_spotify_data = '/content/drive/Shared drives/CIS545 Final Project/spotify_data.csv'
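# on_bad_lines='skip' replaces error_bad_lines=False, which was deprecated in pandas 1.3 and removed in 2.0
spotify_df = pd.read_csv(path_to_spotify_data, on_bad_lines='skip').dropna().drop_duplicates()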

# convert any numerical features to a numeric type
cols_to_numeric = ['acousticness', 'danceability', 'duration_ms', 'energy', 'explicit', 'instrumentalness', 'key', 'liveness', 'loudness', 'mode', 'popularity', 'year']
spotify_df[cols_to_numeric] = spotify_df[cols_to_numeric].apply(pd.to_numeric)

# utility function to convert a song's year to the decade it was released
def year_to_decade(year):
  return year - (year % 10)

# apply the utility function to the dataframe
spotify_df['year'] = spotify_df['year'].apply(lambda y: year_to_decade(y))

In [26]:
# read in the top two thousand songs data as a Pandas dataframe
path_to_top_two_thousand = '/content/drive/Shared drives/CIS545 Final Project/Spotify-2000.csv'
df_top_two_thousand = pd.read_csv(path_to_top_two_thousand).dropna().drop_duplicates()[['Title', 'Artist', 'Top Genre']]

In [27]:
# a list of more general genres to simplify the two thousand top songs data
general_genres = ['rock', 'jazz', 'hip hop', 'pop', 'alternative', 'adult standards', 'folk', 'indie', 'metal']

In [28]:
# given a genre in the actual dataframe, we convert it to a general genre if the name 
# of the general genre appears in that of the actual - otherwise, we return 'other'
def convert_to_generic_genre(genre):
  for g in general_genres:
    if g in genre:
      return g
  return 'other'

# apply the function to the dataframe in order to only have the more general genres
df_top_two_thousand['Top Genre'] = df_top_two_thousand['Top Genre'].apply(lambda g: convert_to_generic_genre(g))
df_top_two_thousand['Artist'] = df_top_two_thousand['Artist'].apply(lambda a: [a])
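
Because `convert_to_generic_genre` returns the first general genre whose name appears in the string, the order of `general_genres` decides what happens to compound genres. The sketch below repeats the list and the function from the cells above and runs them on a few illustrative genre strings, which are not necessarily present in the dataset.

```python
general_genres = ['rock', 'jazz', 'hip hop', 'pop', 'alternative',
                  'adult standards', 'folk', 'indie', 'metal']


def convert_to_generic_genre(genre):
    # the first general genre contained in the string wins
    for g in general_genres:
        if g in genre:
            return g
    return 'other'


# illustrative genre strings, chosen to show the effect of list order
for genre in ['album rock', 'dance pop', 'alternative metal', 'indie pop', 'classical']:
    print(genre, '->', convert_to_generic_genre(genre))
```

For instance, 'alternative metal' maps to 'alternative' rather than 'metal', because 'alternative' comes earlier in the list.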

In [29]:
# a mapping from the general genres to an integer, with 'other' as the last class
genre_encoding = {genre: i for i, genre in enumerate(general_genres)}
genre_encoding['other'] = len(general_genres)

In [30]:
# merge the top two thousand songs dataframe on the original spotify dataframe
merged_df = df_top_two_thousand.merge(spotify_df, left_on='Title', right_on='name').drop_duplicates(subset=['Title'])
merged_df = merged_df.drop(['name'], axis=1)
merged_df['genre encoding'] = merged_df['Top Genre'].apply(lambda g: genre_encoding.get(g))

## Part 2.2: Video Clip Information

In [31]:
from PIL import Image, ImageOps
import cv2
from collections import defaultdict
import operator

In [32]:
emotions_map = {
  0: 'Angry', 
  1: 'Disgust', 
  2: 'Fear', 
  3: 'Happy', 
  4: 'Sad', 
  5: 'Surprise', 
  6: 'Neutral'
}

This is the crux of how our CNN models are applied to a movie clip. We loop through each frame of the given MP4 file and keep the frames in which OpenCV's face detector finds exactly one face. For each such frame, we predict the character's age class and emotion, building a frequency distribution for each. Finally, we return the age and emotion predictions that occur most often. With test accuracies of roughly 41% for emotion detection and 58% for age classification, a single frame is unreliable, but as long as the correct class is the most likely prediction for a frame, a majority vote over a clip with enough face frames should be right more often than any single-frame prediction.

In [33]:
def get_clip_info(path_to_clip):
  emotion_tensors = []
  age_tensors = []

  face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

  capture = cv2.VideoCapture(path_to_clip)
  while True:
    # read the next frame and stop once the clip is exhausted
    success, image = capture.read()
    if not success:
      break
    # convert to grayscale and detect faces
    image_grayscale = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(image_grayscale, 1.1, 4)
    # only focus on frames where exactly 1 face is detected
    if len(faces) != 1:
      continue
    x, y, w, h = faces[0]
    # OpenCV frames are BGR, so convert to RGB before handing the face crop to PIL
    # (rows span the box height h, columns span its width w)
    face = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)[y:y+h, x:x+w]
    # create a 1 x 48 x 48 tensor representation of the image for the emotion CNN
    emotion_img = ImageOps.grayscale(Image.fromarray(face)).resize((48, 48))
    emotion_tensors.append(torch.Tensor(np.array(emotion_img).reshape(1, 48, 48) / 255))
    # create a 3 x 200 x 200 tensor representation of the image for the age CNN
    age_img = Image.fromarray(face).resize((200, 200))
    age_tensors.append(torch.Tensor(np.array(age_img).reshape(3, 200, 200) / 255))
  capture.release()

  # create a distribution of the predicted emotion and decade in the entire clip
  emotion_frequencies = defaultdict(int)
  age_frequencies = defaultdict(int)

  # run inference in eval mode, without gradients, on the same device as the models
  emotions_cnn_model.eval()
  age_cnn_model.eval()
  with torch.no_grad():
    for tensor in emotion_tensors:
      outputs = emotions_cnn_model(tensor.unsqueeze(0).to(device))
      _, predicted = torch.max(outputs, 1)
      emotion_frequencies[predicted.item()] += 1

    for tensor in age_tensors:
      outputs = age_cnn_model(tensor.unsqueeze(0).to(device))
      _, predicted = torch.max(outputs, 1)
      age_frequencies[predicted.item()] += 1

  if not emotion_frequencies:
    raise ValueError('no frame with exactly one detected face in ' + path_to_clip)

  # the returned emotion and decade are the most frequent per-frame predictions;
  # single frames are often wrong, so this relies on having enough frames with faces
  predicted_emotion = max(emotion_frequencies.items(), key=operator.itemgetter(1))[0]
  predicted_decade = max(age_frequencies.items(), key=operator.itemgetter(1))[0]

  clip_info = {'emotion encoding': predicted_emotion, 'emotion': emotions_map.get(predicted_emotion), 'music decade': predicted_decade * 10 + 1930}
  print(clip_info)
  return clip_info
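
The aggregation step at the end of `get_clip_info` is a plain majority vote over the per-frame predictions. A minimal standalone sketch with `collections.Counter` is shown below; `majority_vote` is a hypothetical helper and the frame predictions are made up.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent prediction, or None if there are no predictions."""
    if not predictions:
        return None
    return Counter(predictions).most_common(1)[0][0]


# made-up per-frame emotion class indices for one clip (4 = Sad, 6 = Neutral, 3 = Happy)
frame_predictions = [4, 4, 6, 4, 3, 4, 6]
print(majority_vote(frame_predictions))
```

Returning `None` for an empty list makes the "no usable frames" case explicit instead of raising from `max` on an empty collection.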

## Part 2.3: Song Selection

This section contains the functionality to select a song given the information returned from analyzing the frames of the movie clip. We begin by filtering the Spotify dataset to songs released in the predicted decade for the character in the movie clip, falling back to all decades when none match. Next, based on our reading about how songs relate to emotion, we narrow the dataset further using the following rules: <br><br>

sad songs: low valence, low tempo <br>
happy/surprise songs: pop songs in a major key (mode = 1), high valence, high tempo <br>
angry/disgust/fear songs: songs in the metal genre <br>
neutral songs: 0.45 <= valence <= 0.55, low danceability <br>

In [34]:
# to view all dataframe columns when printed (used for testing)
pd.options.display.max_columns = None

In [35]:
# based on the model predictions and selected song, returns a string for user interaction
def get_song_string(title, artist, clip_info_dict, has_decade):
  decade = clip_info_dict.get('music decade')
  if has_decade:
    age_str = 'and based on their age probably likes music from the {}s.'.format(decade)
  else:
    age_str = 'The character would probably like music from the {}s but there are no songs from this decade'.format(decade)
  return "AutoTrack suggests {} by {} for the clip you input because the character's emotion appears to be {}\n".format(title, artist, clip_info_dict.get('emotion')) + age_str

# given the predictions from the CNN models, this function implements the previously
# discussed rules and, once filtered, picks a random song from the remaining dataframe
def select_song(clip_info_dict):
  music_decade = clip_info_dict.get('music decade')
  emotion = clip_info_dict.get('emotion')

  if emotion == 'Sad':
    has_decade = False
    sad_df = merged_df.sort_values(by=['valence'])
    if len(sad_df[sad_df['year'] == music_decade]) > 0:
      sad_df = sad_df[sad_df['year'] == music_decade]
      has_decade = True
    sad_df = sad_df.sort_values(by=['tempo']).head(min(20, len(sad_df)))
    max_idx = min(20, len(sad_df))
    rand_idx = random.randrange(0, max_idx)
    title = sad_df.iloc[rand_idx]['Title']
    artist = sad_df.iloc[rand_idx]['Artist'][0]
    return get_song_string(title, artist, clip_info_dict, has_decade)
  
  elif emotion == 'Happy' or emotion == 'Surprise':
    has_decade = False
    happy_df = merged_df[(merged_df['Top Genre'] == 'pop') & (merged_df['mode'] == 1)].sort_values(by=['valence'], ascending=False)
    if len(happy_df[happy_df['year'] == music_decade]) > 0:
      happy_df = happy_df[happy_df['year'] == music_decade]
      has_decade = True
    happy_df = happy_df.sort_values(by=['tempo'], ascending=False).head(min(20, len(happy_df)))
    max_idx = min(20, len(happy_df))
    rand_idx = random.randrange(0, max_idx)
    title = happy_df.iloc[rand_idx]['Title']
    artist = happy_df.iloc[rand_idx]['Artist'][0]
    return get_song_string(title, artist, clip_info_dict, has_decade)

  elif emotion == 'Angry' or emotion == 'Disgust' or emotion == 'Fear':
    has_decade = False
    metal_df = merged_df[merged_df['Top Genre'] == 'metal']
    if len(metal_df[metal_df['year'] == music_decade]) > 0:
      metal_df = metal_df[metal_df['year'] == music_decade]
      has_decade = True
    max_idx = min(20, len(metal_df))
    rand_idx = random.randrange(0, max_idx)
    title = metal_df.iloc[rand_idx]['Title']
    artist = metal_df.iloc[rand_idx]['Artist'][0]
    return get_song_string(title, artist, clip_info_dict, has_decade)
  
  elif emotion == 'Neutral':
    has_decade = False
    neutral_df = merged_df[(merged_df['valence'] >= 0.45) & (merged_df['valence'] <= 0.55)].sort_values(by=['danceability'])
    if len(neutral_df[neutral_df['year'] == music_decade]) > 0:
      neutral_df = neutral_df[neutral_df['year'] == music_decade]
      has_decade = True
    max_idx = min(20, len(neutral_df))
    rand_idx = random.randrange(0, max_idx)
    # pick one of the (up to) 20 least danceable neutral songs at random
    title = neutral_df.iloc[rand_idx]['Title']
    artist = neutral_df.iloc[rand_idx]['Artist'][0]
    return get_song_string(title, artist, clip_info_dict, has_decade)


In [36]:
# the main function - given a path to an mp4 file, AutoTrack will pair it with a song
def pick_song(path_to_clip):
  clip_info = get_clip_info(path_to_clip)
  return select_song(clip_info)

In [37]:
path_to_clip = 'giphy_7.mp4' # ENTER YOUR MP4 PATH HERE
print(pick_song(path_to_clip))

{'emotion encoding': 4, 'emotion': 'Sad', 'music decade': 2020}
AutoTrack suggests Nine Million Bicycles by Katie Melua for the clip you input because the character's emotion appears to be Sad
and based on their age probably likes music from the 2020s.


# Part 3: AutoTrack in Action

Here we show the results of running AutoTrack on two GIFs. AutoTrack reads video files, so each GIF must first be converted to MP4 format.

## Part 3.1: Happy GIF

![](https://media.giphy.com/media/3og0ICmyySyzbmnxqE/giphy.gif)

{'emotion encoding': 3, 'emotion': 'Happy', 'music decade': 1990}<br>
AutoTrack suggests Man in the Mirror by Michael Jackson for the clip you input because the character's emotion appears to be Happy<br>
and based on their age probably likes music from the 1990s.<br>

## Part 3.2: Sad GIF


![](https://media.giphy.com/media/hmaIFbMUjVaZG/giphy.gif)

{'emotion encoding': 4, 'emotion': 'Sad', 'music decade': 2020}<br>
AutoTrack suggests Nine Million Bicycles by Katie Melua for the clip you input because the character's emotion appears to be Sad<br>
and based on their age probably likes music from the 2020s.





---


# **Appendix**
This appendix links to the Google Colab notebooks in which we explored and tested the datasets used above, including the EDA.

[1: Facial Age EDA](https://colab.research.google.com/drive/1W7CBX02pssbdHSQrHFdoU8XORCe3wxSR?usp=sharing)

[2: Facial Emotion EDA](https://colab.research.google.com/drive/1zUTpqxVV3NBYfWjwEkHQwvxiAFpV-uqU?usp=sharing)

[3: Spotify EDA](https://colab.research.google.com/drive/1HwJVahChbeW6yxtr92s2bqAlvHkMuCzA?usp=sharing)
