## Imports

In [1]:
import numpy as np
import pandas as pd

## Load and format data

Load the data from the CSV file saved by the feature engineering process.

In [2]:
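def load_imdb(path):
    """Read the feature CSV and return (titles, features indexed by title)."""
    data = pd.read_csv(path)
    movies = data['movie_title']  # titles keep the default integer index
    data.drop('movie_title', axis=1, inplace=True)  # leave only feature columns
    data.set_index(movies, inplace=True)  # label each feature row with its title
    return movies, data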

In [3]:
movies, imdb = load_imdb('../data/imdb_dataset.csv')
imdb2 = pd.read_csv('../data/imdb_dataset.csv')
movies2 = imdb2.movie_title.copy()

## KDTree and BallTree

In [4]:
from sklearn.neighbors import BallTree, KDTree

In [5]:
btree = BallTree(imdb, leaf_size=5)
kdtree = KDTree(imdb, leaf_size=2)
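
As a small illustration of what the two trees return, the sketch below builds both on random data rather than the IMDB features and checks that they agree. The array `X`, the seed, and the `toy_` names are assumptions made for this example only.

```python
import numpy as np
from sklearn.neighbors import BallTree, KDTree

# Hypothetical toy data: 50 points with 4 features,
# standing in for the IMDB feature matrix
rng = np.random.default_rng(0)
X = rng.random((50, 4))

toy_btree = BallTree(X, leaf_size=5)
toy_kdtree = KDTree(X, leaf_size=2)

# Query the 3 nearest neighbors of the first two points
b_dist, b_ind = toy_btree.query(X[:2], k=3)
k_dist, k_ind = toy_kdtree.query(X[:2], k=3)

# Both trees run an exact search with the same default metric,
# so they should return the same neighbor indices
same_neighbors = np.array_equal(b_ind, k_ind)

# Each query point is part of the tree, so it is its own nearest neighbor
self_first = np.array_equal(b_ind[:, 0], np.array([0, 1]))
```

`leaf_size` affects query speed and memory use, not which neighbors are returned, so the different values used for the two trees should not change the results.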

The array `liked` holds the row positions of the movies the user has liked. For each element of `liked`, the system queries the `n_movies` nearest movies in feature space and recommends them.
<hr>
For example, if the user has liked the movies:

1. Pirates of the Caribbean: At World's End
2. Spectre
3. Spider-Man 3

then `liked` is `[1, 2, 6]`, the row positions of those movies in the dataset.
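
The mapping from titles to row positions can be sketched as follows. The helper `positions_for` and the series `toy_movies` are hypothetical and not part of the notebook; the placeholder titles ("Movie A" and so on) only pad the series so the positions match the example above.

```python
import pandas as pd

# Hypothetical stand-in for the `movies` Series; placeholder titles
# fill the positions that the example does not name
toy_movies = pd.Series(
    [
        "Movie A",
        "Pirates of the Caribbean: At World's End",
        "Spectre",
        "Movie B",
        "Movie C",
        "Movie D",
        "Spider-Man 3",
        "Spider-Man 3",  # a repeated title, to show that the first match wins
    ],
    name="movie_title",
)


def positions_for(titles, movies):
    """Return the first row position of each title in `movies`."""
    first_seen = {}
    for pos, title in enumerate(movies):
        # setdefault keeps the earliest position when a title repeats
        first_seen.setdefault(title, pos)
    return [first_seen[t] for t in titles]


liked_positions = positions_for(
    ["Pirates of the Caribbean: At World's End", "Spectre", "Spider-Man 3"],
    toy_movies,
)
```

A title that is missing from `movies` raises a `KeyError` in this sketch, which is probably preferable to silently skipping it.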

In [6]:
liked = movies[[1, 2, 6]].index  # row positions of the liked movies
l_imdb = imdb.iloc[liked, :]  # feature rows of the liked movies
print(l_imdb.index)

Index(['Pirates of the Caribbean: At World's End', 'Spectre', 'Spider-Man 3'], dtype='object', name='movie_title')


In [7]:
n_movies = 5 # neighbors queried per liked movie; the first is the liked movie itself, so n_movies - 1 recommendations are printed

In [8]:
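dist, index = btree.query(l_imdb, n_movies) # ball tree query
for ind in index:
    # ind[0] is taken to be the liked movie itself (its own nearest neighbor);
    # ind[1:] are the remaining n_movies - 1 neighbors
    print("\nBecause you liked {}:".format(movies[ind[0]]))
    for n, i in enumerate(ind[1:]):
        print("\t{}. {}".format(n+1, movies[i]))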


Because you liked Pirates of the Caribbean: At World's End:
	1. A Christmas Carol
	2. Gangs of New York
	3. Michael Collins
	4. Austin Powers: The Spy Who Shagged Me

Because you liked Spectre:
	1. The Hobbit: The Desolation of Smaug
	2. Suicide Squad
	3. Maleficent
	4. The Theory of Everything

Because you liked Spider-Man 3:
	1. Spider-Man 3
	2. Yoga Hosers
	3. Get Smart
	4. Flawless


In [9]:
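dist, index = kdtree.query(l_imdb, n_movies) # kd tree query
for ind in index:
    # ind[0] is taken to be the liked movie itself (its own nearest neighbor);
    # ind[1:] are the remaining n_movies - 1 neighbors
    print("\nBecause you liked {}:".format(movies[ind[0]]))
    for n, i in enumerate(ind[1:]):
        print("\t{}. {}".format(n+1, movies[i]))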


Because you liked Pirates of the Caribbean: At World's End:
	1. A Christmas Carol
	2. Gangs of New York
	3. Michael Collins
	4. Austin Powers: The Spy Who Shagged Me

Because you liked Spectre:
	1. The Hobbit: The Desolation of Smaug
	2. Suicide Squad
	3. Maleficent
	4. The Theory of Everything

Because you liked Spider-Man 3:
	1. Spider-Man 3
	2. Yoga Hosers
	3. Get Smart
	4. Flawless


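Both trees return the same neighbors, which is expected because both run an exact nearest-neighbor search with the same default (Euclidean) metric. The recommendations themselves are a poor fit and need refinement: for example, Spider-Man 3 is recommended for itself, which suggests a duplicate row in the dataset.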
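
One possible first refinement, sketched here under assumptions, is to filter the neighbor lists so that a liked movie or a repeated title is never recommended. The function `filter_recommendations` and the `toy_` data are hypothetical; this addresses only the self-recommendation seen for Spider-Man 3, not the overall quality of the neighbors.

```python
def filter_recommendations(neighbor_rows, titles, liked_titles, n_recs):
    """For each row of neighbor indices, keep up to n_recs titles that are
    neither a liked title nor already listed in that row."""
    results = []
    for row in neighbor_rows:
        seen = set(liked_titles)  # never recommend something already liked
        picks = []
        for i in row:
            title = titles[i]
            if title in seen:
                continue  # skip liked movies and duplicate titles
            seen.add(title)
            picks.append(title)
            if len(picks) == n_recs:
                break
        results.append(picks)
    return results


# Hypothetical toy data: the title list contains duplicates on purpose
toy_titles = [
    "Spider-Man 3", "Spider-Man 3", "Yoga Hosers",
    "Get Smart", "Flawless", "Get Smart",
]
toy_rows = [[0, 1, 2, 3, 5, 4]]  # one row of neighbor indices, nearest first
recs = filter_recommendations(toy_rows, toy_titles, ["Spider-Man 3"], n_recs=3)
```

With the notebook's objects, this could be applied to the `index` array returned by `btree.query` or `kdtree.query`, using `movies` as `titles` and `l_imdb.index` as `liked_titles`. Because filtering discards entries, the query would need to ask for more than `n_recs` neighbors to leave enough candidates.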