# Purpose

The purpose of this notebook is to generate a movie poster URL for each `movie_id` observed in our interactions dataset. These poster URLs will be used in the front-end visualization tool we are building to understand recommender performance.

In [2]:
cd ../

/Users/scottcronin/gh/recommender_deployed


In [3]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

from dotenv import find_dotenv, load_dotenv
load_dotenv(find_dotenv())
import pandas as pd
import numpy as np
import scipy.sparse as scs
from lightfm import LightFM
from tqdm import tqdm, tqdm_notebook
import time
import json

import os
import tmdbsimple as tmdb
tmdb.API_KEY = os.environ['TMDB_API_KEY']

# Load data

In [3]:
interactions = pd.read_csv('data/ratings.dat',
                           sep='::', engine='python',
                           header=None,
                           names=['uid', 'iid', 'rating', 'timestamp'],
                           usecols=['uid', 'iid', 'rating'],
                          )
display(interactions.sample(5))
print('Shape: {:>9,} x {}'.format(*interactions.shape))

Unnamed: 0,uid,iid,rating
6081280,43413,3033,3.0
8097261,58074,7162,4.0
3912973,28041,1275,5.0
8742337,62572,25,4.0
5738520,40992,1483,3.0


Shape: 10,000,054 x 3


[links](https://www.kaggle.com/grouplens/movielens-20m-dataset/version/2) is a downloaded CSV that maps each `movieId` in the MovieLens dataset to its `tmdbId` in [The Movie Database](https://www.themoviedb.org/?language=en) (TMDB). TMDB stores a poster path for each `tmdbId`, which lets us look up a poster for each `movieId`.

In [7]:
links = pd.read_csv('data/links.csv')
display(links.sample(5))
print('Shape: {:>9,} x {}'.format(*links.shape))

Unnamed: 0,movieId,imdbId,tmdbId
41557,165727,3760966,333665.0
6890,7001,77745,11850.0
24273,113715,3605002,259761.0
7945,8628,114371,54022.0
32496,140269,338109,80219.0


Shape:    45,843 x 3


# Generate posters for each movieId in dataset

First, we join the movieIds in our dataset with the tmdbIds in `links`.

In [38]:
movieIds = pd.DataFrame(interactions.iid.unique(), columns=['movieId'])
m = movieIds.merge(links[['movieId', 'tmdbId']], how='left').dropna().astype('int64')
m.head(4)

Unnamed: 0,movieId,tmdbId
0,122,11066
1,185,1642
2,231,8467
3,292,6950
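
To make the join concrete, here is a minimal sketch on toy data (the ids below are made up for illustration and are not taken from the dataset). It shows that a left merge keeps every movieId, that movies without a TMDB match get `NaN`, and that `dropna()` removes those rows before the cast to integers.

```python
import pandas as pd

# Toy stand-ins for `movieIds` and `links`; the values are illustrative only.
toy_movie_ids = pd.DataFrame({'movieId': [1, 2, 3]})
toy_links = pd.DataFrame({'movieId': [1, 2], 'tmdbId': [862.0, 8844.0]})

# A left merge keeps all three movies; movieId 3 has no match, so tmdbId is NaN.
joined = toy_movie_ids.merge(toy_links, how='left')
n_missing = int(joined['tmdbId'].isna().sum())

# Dropping the unmatched rows makes the integer cast safe.
toy_m = joined.dropna().astype('int64')

print('movies without a tmdbId:', n_missing)
print(toy_m)
```

Counting the unmatched rows before dropping them is a cheap way to see how many movies will end up without a poster.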


Next, we loop through each tmdbId to get its poster path. To simplify this, I used the [tmdbsimple](https://github.com/celiao/tmdbsimple) library, which wraps the HTTP requests to the TMDB API.

In [45]:
posters = []

for i, movie in tqdm_notebook(m.iterrows(), total=len(m)):
    # Sleep for half a second between requests so we do not hit TMDB's API too aggressively.
    time.sleep(0.5)
    try:
        _id = movie['tmdbId']
        poster_path = tmdb.Movies(_id).info()['poster_path']
    except Exception:
        # Record any failed lookup (unknown id, network problem, ...) as 'error'.
        poster_path = 'error'
    posters.append(poster_path)
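
The loop above records every failure as the string `'error'`. As a hedged sketch of one alternative, the helper below separates three cases: a poster was found, the movie has no poster, and the lookup itself failed. `lookup_poster` and `fake_info` are hypothetical names introduced here; `fake_info` stands in for `tmdb.Movies(_id).info()` so the example runs without an API key or network access.

```python
def lookup_poster(tmdb_id, get_info):
    """Return (poster_path, status) for one id.

    `get_info` is any callable that maps an id to a dict with a
    'poster_path' key. In the notebook it could be
    `lambda _id: tmdb.Movies(_id).info()` (an assumption, not run here).
    """
    try:
        info = get_info(tmdb_id)
    except Exception:
        # The lookup failed (unknown id, network problem, ...).
        return None, 'error'
    poster_path = info.get('poster_path')
    if not poster_path:
        # The movie exists but has no poster on record.
        return None, 'missing'
    return poster_path, 'ok'


def fake_info(tmdb_id):
    """Hypothetical stand-in for the TMDB call, using made-up data."""
    data = {1: {'poster_path': '/abc.jpg'}, 2: {'poster_path': None}}
    return data[tmdb_id]  # raises KeyError for unknown ids


results = {i: lookup_poster(i, fake_info) for i in (1, 2, 3)}
print(results)
```

Keeping the status separate from the path means failed lookups can be retried later without being confused with movies that simply have no poster.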




Clean up the data and view a few results.

In [59]:
m['poster_path'] = posters
m['url_base'] = 'https://image.tmdb.org/t/p/w200'
m['poster_url'] = m['url_base'] + m['poster_path']
for url in m.sample(5).poster_url.tolist():
    print(url)

https://image.tmdb.org/t/p/w600_and_h900_bestv2//4viJcRFgF4cCPnJWgpvb3CRd2pK.jpg
https://image.tmdb.org/t/p/w600_and_h900_bestv2//hXa5ArW1Llu4SOqPWRQ7dzCDyOH.jpg
https://image.tmdb.org/t/p/w600_and_h900_bestv2//nacr1Xj8tJroyVJKzPtdtgApphj.jpg
https://image.tmdb.org/t/p/w600_and_h900_bestv2//yyCXkzmi8jChDD6qSmhD2QVs1E.jpg
https://image.tmdb.org/t/p/w600_and_h900_bestv2//mB0KF5T2s6raTjiV676Umd8ciE0.jpg
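
Poster paths already begin with a slash, so plain string concatenation produces a double slash whenever the base URL also ends with one, as in the output above. Here is a small sketch of a helper that normalises the join and skips rows without a usable path. `build_poster_url` is a hypothetical name, and treating `'error'` as unusable follows the placeholder used in the loop earlier.

```python
def build_poster_url(poster_path, base='https://image.tmdb.org/t/p/w200'):
    """Join a base URL and a poster path with exactly one slash between them.

    Returns None when the path is not a usable string (None, NaN, empty,
    or the 'error' placeholder).
    """
    if not isinstance(poster_path, str) or not poster_path or poster_path == 'error':
        return None
    return base.rstrip('/') + '/' + poster_path.lstrip('/')


print(build_poster_url('/4viJcRFgF4cCPnJWgpvb3CRd2pK.jpg'))
print(build_poster_url('error'))
```

With pandas, this could be applied as `m['poster_path'].map(build_poster_url)`, which leaves missing values in place of malformed URLs.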


Convert to a dictionary and store it as a JSON file. This file will be used by the front end.

In [25]:
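# Key the dictionary by movieId (not the positional index) so posters can be looked up by movie.
d = m.set_index('movieId')['poster_path'].to_dict()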
with open('app/objects/posters.json', 'w') as f:
    json.dump(d, f, indent=4)
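
One detail to keep in mind when reading this file back: JSON object keys are always strings, so the integer movieIds come back as `'122'` rather than `122`. Here is a minimal sketch of the round trip, using an in-memory string instead of `app/objects/posters.json` and a two-entry sample of the dictionary:

```python
import json

# Two entries copied from the dictionary; keys are integer movieIds.
sample = {122: '/cc9YAZq5NXiIEJsHjW7p2FaHQkp.jpg',
          185: '/gKDNaAFzT21cSVeKQop7d1uhoSp.jpg'}

# Serialising converts the integer keys to strings.
text = json.dumps(sample, indent=4)
loaded = json.loads(text)

# Convert the keys back to integers if the consumer needs them.
restored = {int(k): v for k, v in loaded.items()}

print(sorted(loaded))
print(restored == sample)
```

A JavaScript front end would normally look posters up by string key anyway, so the conversion back to integers mainly matters when the file is reloaded in Python.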

In [26]:
d

{122: '/cc9YAZq5NXiIEJsHjW7p2FaHQkp.jpg',
 185: '/gKDNaAFzT21cSVeKQop7d1uhoSp.jpg',
 231: '/3PEAkZHa8ehfUkuKbzmQNRTTAAs.jpg',
 292: '/4KymNvlWR0XF0sqX2BWRd9Z3yXR.jpg',
 316: '/39WsfbB5BshvdbPAYRFXdsjC481.jpg',
 329: '/wjrXjlNpDq9U8vYmAwf420yDFtn.jpg',
 355: '/dnYXJZgstixBsOjF4JJrPCDRd2n.jpg',
 356: '/yE5d3BUhE8hCnkMUJOo1QDoOGNz.jpg',
 362: '/6BhybPISxk116taUo9J76tVo1hs.jpg',
 364: '/bKPtXn9n4M4s8vvZrbw40mYsefB.jpg',
 370: '/k3F8N3jeqXOpm1qjY7mL8O6vdx.jpg',
 377: '/u5ZqizbcZ0RZhVqmu8lSU4SARBT.jpg',
 420: '/tw9gAhqQcBFX0X0XfVbWqUsmzoU.jpg',
 466: '/3vILyxwmL4hcqiik638l8lL2d4h.jpg',
 480: '/c414cDeQ9b6qLPLeKmiJuLDUREJ.jpg',
 520: '/3pd3sdot0HfQTFzqgTUzaF4kcxP.jpg',
 539: '/afkYP15OeUOD0tFEmj6VvejuOcz.jpg',
 586: '/5Lo3sWuvbO4AnrAHYBgB5U1Opqd.jpg',
 588: '/7f53XAE4nPiGe9XprpGAeWHuKPw.jpg',
 589: '/2y4dmgWYRMYXdD1UyJVcn2HSd1D.jpg',
 594: '/bOtgcOIFBCUFdY2a737Na6gWQ0X.jpg',
 616: '/3LudCahifOrueMklYBxAXY2wpBg.jpg',
 110: '/2qAgGeYdLjelOEqjW9FYvPHpplC.jpg',
 151: '/8O52wIIbskNl9kZnGvP9Gi4A2oN