# Notebook for feature extraction from the MOVIES

The features that we want to extract are:
- brightness
- contrast
- saturation
- sound strength
- music presence

The features are extracted from the movies using the following methods:
- **brightness**:
    1. Split the video into frames, following: https://www.geeksforgeeks.org/python-program-extract-frames-using-opencv/
    2. Convert each frame to the HSV color space (the Value channel approximates brightness).
    3. Sum the values of all pixels in the Value channel.
    4. Divide that sum by the area of the frame, which is just the width times the height.
    5. This gives one value per frame, i.e. per time stamp: the average brightness of that frame.
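- **contrast**:
    1. Same procedure as brightness, applied to a per-frame contrast measure instead of the Value channel.
- **saturation**:
    1. Same procedure as brightness, using the Saturation (S) channel instead of the Value channel.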
- **sound strength**:
    1. Follow: https://towardsdatascience.com/generate-any-sport-highlights-using-python-3695c98baead
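The contrast step does not say which contrast measure to use, and HSV has no contrast channel. One common choice, assumed here and not necessarily the one intended in this notebook, is RMS contrast: the standard deviation of the grayscale pixel intensities of a frame. The hypothetical helper `rms_contrast` below sketches it with NumPy only, using the same luma weights as OpenCV's `COLOR_BGR2GRAY`.

```python
import numpy as np


def rms_contrast(frame_bgr: np.ndarray) -> float:
    """RMS contrast of one BGR frame: std. dev. of its grayscale intensities.

    Assumption: contrast is measured as RMS contrast; the notebook does not
    specify which contrast measure it uses.
    """
    frame = frame_bgr.astype(np.float64)
    b, g, r = frame[..., 0], frame[..., 1], frame[..., 2]
    # Same luma weights OpenCV uses for COLOR_BGR2GRAY (ITU-R BT.601)
    gray = 0.114 * b + 0.587 * g + 0.299 * r
    # Standard deviation over all pixels of the frame
    return float(gray.std())


# Usage: a frame that is half black and half white
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[:, 2:] = 255
print(rms_contrast(frame))  # approximately 127.5
```

Like brightness, this yields one value per frame; a uniformly colored frame has contrast 0.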

In [2]:
import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os

### **BRIGHTNESS**

1. Split the video into frames, following: https://www.geeksforgeeks.org/python-program-extract-frames-using-opencv/
2. Convert each frame to the HSV color space (the Value channel approximates brightness).
3. Sum the values of all pixels in the Value channel.
4. Divide that sum by the area of the frame, which is just the width times the height.
5. This gives one value per frame, i.e. per time stamp: the average brightness of that frame.
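Steps 2 to 4 can be sketched with NumPy alone: for 8-bit images OpenCV defines the HSV Value as V = max(B, G, R), so the Value channel is the per-pixel maximum over the color channels. The helper name `average_brightness_of_frame` is hypothetical; the notebook itself goes through `cv2.cvtColor`, which should give the same V channel.

```python
import numpy as np


def average_brightness_of_frame(frame_bgr: np.ndarray) -> float:
    """Average HSV Value (brightness) of one 8-bit BGR frame."""
    # Step 2: Value channel = per-pixel maximum over B, G, R
    v_channel = frame_bgr.max(axis=2).astype(np.int64)
    # Step 3: sum of all pixel values in the Value channel
    sum_v = v_channel.sum()
    # Step 4: divide by the frame area (height * width)
    height, width = v_channel.shape
    return float(sum_v / (height * width))


frame = np.zeros((2, 2, 3), dtype=np.uint8)
frame[0, 0] = (0, 0, 200)   # red pixel: V = 200
frame[1, 1] = (100, 50, 0)  # bluish pixel: V = 100
print(average_brightness_of_frame(frame))  # 75.0
```

Dividing the sum by the area is the same as taking the mean of the Value channel, so `v_channel.mean()` would be an equivalent one-liner.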

In [6]:
PATH_MOVIES = '/Users/silviaromanato/Desktop/ServerMIPLAB/FilmFiles/'

# Use the first file listed in the folder (os.listdir returns entries in arbitrary order)
MOVIE_PATH = os.path.join(PATH_MOVIES, os.listdir(PATH_MOVIES)[0])
MOVIE_PATH

'/Users/silviaromanato/Desktop/ServerMIPLAB/FilmFiles/You_Again_exp.mp4'

Extract the frames of the first movie and compute the average brightness and saturation of each frame:

In [23]:
def convert_to_hsv(image):
    # OpenCV reads frames as BGR; convert to HSV and split into H, S, V channels
    hsv_image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv_image)
    return h, s, v

def FrameCapture(MOVIE_PATH):
    """Return the per-frame average brightness (V) and saturation (S) of a video."""
    average_brightness = []
    average_saturation = []
    vidObj = cv2.VideoCapture(MOVIE_PATH)

    while True:
        success, image = vidObj.read()
        # read() returns (False, None) after the last frame: stop before converting
        if not success:
            break
        h_channel, s_channel, v_channel = convert_to_hsv(image)
        height, width = v_channel.shape[:2]
        total_pixels = width * height
        # Channel sum divided by the frame area = average value per pixel
        average_brightness.append(np.sum(v_channel) / total_pixels)
        average_saturation.append(np.sum(s_channel) / total_pixels)

    vidObj.release()
    return average_brightness, average_saturation

average_brightness, average_saturation = FrameCapture(MOVIE_PATH)

print(average_brightness)
print(average_saturation)
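The lists `average_brightness` and `average_saturation` hold one value per frame. To align them with other time series (for example the audio chunks below), the per-frame values can be grouped by second. The sketch below assumes the frame rate `fps` is known, e.g. from `vidObj.get(cv2.CAP_PROP_FPS)`, and that it is at least 1 so every second contains a frame; `per_second_average` is a hypothetical helper.

```python
import numpy as np


def per_second_average(values, fps):
    """Average per-frame values over consecutive one-second windows (assumes fps >= 1)."""
    values = np.asarray(values, dtype=np.float64)
    # Timestamp of each frame in seconds, then the whole second it falls in
    timestamps = np.arange(len(values)) / fps
    seconds = np.floor(timestamps).astype(int)
    # Mean of all frames that fall in the same second
    sums = np.bincount(seconds, weights=values)
    counts = np.bincount(seconds)
    return sums / counts


# Usage with a toy 2 fps sequence: frames at 0, 0.5, 1, 1.5, 2 s
print(per_second_average([10, 20, 30, 40, 50], fps=2))  # [15. 35. 50.]
```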

### **SOUND STRENGTH**
Follow: https://towardsdatascience.com/generate-any-sport-highlights-using-python-3695c98baead

In [None]:
# moviepy.editor and VideoFileClip.subclip, used below, were removed in MoviePy 2.0
!pip install "moviepy<2.0"

1. Extract the audio from the movie

In [None]:
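import moviepy.editor as mp

# Keep only the part of the movie between t = 1 s and t = 1380 s (23 min)
clip = mp.VideoFileClip(MOVIE_PATH).subclip(1, 1380)
# Write the audio track of that clip to a WAV file
clip.audio.write_audiofile("/Users/silviaromanato/Desktop/SEMESTER_PROJECT/Audio/audioYouAgain.wav")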

2. Create chunks from the audio

In [None]:
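import librosa
import IPython.display as ipd

# Audio file written in the previous step
filename = "/Users/silviaromanato/Desktop/SEMESTER_PROJECT/Audio/audioYouAgain.wav"
# Load the file, resampling it to 22050 Hz
x, sr = librosa.load(filename, sr=22050)
# Duration of the audio clip in whole minutes (keyword arguments are required in librosa >= 0.10)
print(int(librosa.get_duration(y=x, sr=sr) / 60))
# Each chunk lasts max_slice seconds, i.e. window_length samples
max_slice = 10
window_length = max_slice * sr
# Play chunk 21, i.e. seconds 210 to 220 of the extracted audio
a = x[21 * window_length:22 * window_length]
ipd.Audio(a, rate=sr)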
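The slice `x[21 * window_length:22 * window_length]` picks a single chunk. Splitting the whole signal into consecutive chunks of `max_slice` seconds can be sketched as below; `make_chunks` is a hypothetical helper, and the last chunk is shorter when the duration is not a multiple of `max_slice`.

```python
import numpy as np


def make_chunks(x, sr, max_slice=10):
    """Split a 1-D audio signal into consecutive chunks of max_slice seconds.

    Chunk k covers samples [k * window_length, (k + 1) * window_length),
    i.e. seconds [k * max_slice, (k + 1) * max_slice).
    """
    window_length = int(max_slice * sr)
    return [x[i:i + window_length] for i in range(0, len(x), window_length)]


# Usage with a synthetic 25 s signal at a toy sampling rate of 4 Hz
sr = 4
x = np.arange(25 * sr, dtype=np.float32)
chunks = make_chunks(x, sr)
print(len(chunks), [len(c) for c in chunks])  # 3 [40, 40, 20]
```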

3. Compute the short time energy of each chunk

The energy of an audio signal is a measure of its loudness. It is computed as the sum of the squared amplitudes of the signal in the time domain. When the energy is computed over a short chunk of the signal instead of the entire signal, it is called Short Time Energy.
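In symbols, with $x[n]$ the $n$-th audio sample, $N$ the number of samples per chunk (`window_length` in the code) and $L$ the total number of samples (`len(x)`), the short time energy of chunk $k$ is:

$$
E_k = \sum_{n = kN}^{\min\left((k+1)N,\; L\right) - 1} x[n]^2
$$

The upper bound only matters for the last chunk, which may contain fewer than $N$ samples.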

In [None]:
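import numpy as np
import matplotlib.pyplot as plt

# Short time energy: sum of squared samples in each non-overlapping chunk of window_length samples
s_energy = np.array([np.sum(x[i:i + window_length] ** 2) for i in range(0, len(x), window_length)])

# Distribution of the chunk energies
plt.hist(s_energy)
plt.xlabel("Short time energy")
plt.ylabel("Number of chunks")
plt.show()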
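Raw energies can span several orders of magnitude, so it may help to express each chunk's energy on a logarithmic scale and attach its start time, giving a sound-strength time series. The decibel scaling relative to the loudest chunk and the small `eps` floor below are illustrative assumptions, not part of the method linked above; `energy_to_db` is a hypothetical helper.

```python
import numpy as np


def energy_to_db(s_energy, max_slice=10, eps=1e-10):
    """Per-chunk short time energy in dB relative to the loudest chunk.

    Returns (start_times_in_seconds, energy_in_db).
    """
    s_energy = np.asarray(s_energy, dtype=np.float64)
    # Chunk k starts at k * max_slice seconds
    start_times = np.arange(len(s_energy)) * max_slice
    # 10 * log10 because energy is already a squared amplitude; eps avoids log10(0)
    db = 10.0 * np.log10((s_energy + eps) / (s_energy.max() + eps))
    return start_times, db


times, db = energy_to_db([1.0, 10.0, 100.0])
# times: [0, 10, 20] seconds; db: approximately [-20, -10, 0]
print(times, db)
```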