## Poster Processing

The purpose of this notebook is to create "darkness" and "colorfulness" values for the poster for each movie in the data. These can then be used as features in our algorithms.

We defined these values as follows



**Darkness**: Average of the RGB values within the entire image.  
**Colorfulness**: Find the variance for each R, G and B values individually for the entire image and then take the average. 

In [1]:
import numpy as np
from skimage import io
import pandas as pd

In [2]:
# read in the data
df = pd.read_csv("movies.gz")

In [3]:
# verify the url path from the data
df["poster_path"].head()

0    /rhIRbceoE9lR4veEXuwCC2wARtG.jpg
1    /vzmL6fP7aPKNKPRTFnZmiUfciyV.jpg
2    /6ksm1sjKMFLbO7UY2i6G1ju9SML.jpg
3    /16XOMpEaLWkrcPqSQqhTmeJuqQl.jpg
4    /e64sOI48hQXyru7naBFyssKFxVd.jpg
Name: poster_path, dtype: object

In [7]:
# loop through to pull the poster image from the web, make colorfulness and darkness calculations, then add to list
darkness = []
colorfulness = []
i=0

for poster in df["poster_path"]:
    print("Image #: " + str(i))
    try:
        url = ("http://image.tmdb.org/t/p/w185_and_h278_bestv2" + poster)
        print("Grabbing image from: " + url)
        image = io.imread(url)
        
        red = [sets[0] for sets in image]
        green = [sets[1] for sets in image]
        blue = [sets[2] for sets in image]
    
        rgb = []
        rgb.append(np.var(red))
        rgb.append(np.var(green))
        rgb.append(np.var(blue))
    
        print("RGB vars: " + str(rgb))
    
        darkness.append(np.average(image))
        colorfulness.append(np.average(rgb))
        
        i += 1
    except TypeError:
        print("Invalid URL")
        darkness.append(np.nan)
        colorfulness.append(np.nan)
        
        i+=1
    
    
    
    

Image #: 0
Grabbing image from: http://image.tmdb.org/t/p/w185_and_h278_bestv2/rhIRbceoE9lR4veEXuwCC2wARtG.jpg
RGB vars: [7402.63545135115, 7507.700195239493, 7568.560916734238]
Image #: 1
Grabbing image from: http://image.tmdb.org/t/p/w185_and_h278_bestv2/vzmL6fP7aPKNKPRTFnZmiUfciyV.jpg
RGB vars: [7549.168463790119, 7409.033080010811, 7750.237047771855]
Image #: 2
Grabbing image from: http://image.tmdb.org/t/p/w185_and_h278_bestv2/6ksm1sjKMFLbO7UY2i6G1ju9SML.jpg
RGB vars: [9418.370730753526, 9311.291313711621, 8410.378524518515]
Image #: 3
Grabbing image from: http://image.tmdb.org/t/p/w185_and_h278_bestv2/16XOMpEaLWkrcPqSQqhTmeJuqQl.jpg
RGB vars: [4453.672988228122, 4856.2255749357355, 4862.721747781631]
Image #: 4
Grabbing image from: http://image.tmdb.org/t/p/w185_and_h278_bestv2/e64sOI48hQXyru7naBFyssKFxVd.jpg
RGB vars: [8281.421172414584, 8813.005136897676, 8898.997988659432]
Image #: 5
Grabbing image from: http://image.tmdb.org/t/p/w185_and_h278_bestv2/zMyfPUelumio3tiDKPffaUpsQT

KeyboardInterrupt: 

In [6]:
# convert lists to series and add to dataframe
se_dark = pd.Series(darkness)
se_color = pd.Series(colorfulness)
df['darkness'] = se_dark.values
df['colorfulness'] = se_color.values

In [None]:
# delete poster path from dataframe and create new csv file
del df['poster_path']
df.to_csv("movie_metadata_images.csv")