## **Exercise 1: Collaborative Filtering**
In this exercise, you will build a recommendation engine that recommends movies based on how similarly users have rated them.

**A)** Read the two datasets "ratings.csv" and "movies.csv".

**B)** Prepare the rating matrix to be used in collaborative filtering.

**C)** Create a function that finds movies similar to a given movie id, and try it out.

**D)** **EXTRA** Modify the function to take the user id as well and ensure that the recommended movies have not already been watched by the user.


In [1]:
import pandas as pd 
import numpy as np
from sklearn.neighbors import NearestNeighbors

### **A)**

In [2]:
# Read ratings.csv and drop "timestamp" column
ratings = pd.read_csv("ratings.csv") \
    .drop("timestamp", axis=1).set_index("movieId")
  
# Read movies.csv and drop "genres" column
movies = pd.read_csv("movies.csv") \
    .drop("genres", axis=1).set_index("movieId")

### **B)**

In [16]:
# Join the two dataframes on movieId
df = ratings.join(movies, how="inner").reset_index()

# Using pd.pivot_table, transform the table into a matrix so that each row represents a movie and each column represents a user
# (index should be "movieId")
df2 = pd.pivot_table(df, index=["movieId", "title"], columns="userId", values="rating").fillna(0)
df2 = df2.reset_index().set_index("movieId")
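
To make the shape of the movie-user matrix concrete, here is a minimal sketch on a toy table. The `toy` ratings and the name `toy_matrix` are made up for illustration and are not taken from ratings.csv; the pivot uses only `movieId` as the index to keep the output small.

```python
import pandas as pd

# Hypothetical toy ratings (assumed values, not from ratings.csv)
toy = pd.DataFrame({
    "userId":  [1, 1, 2, 3, 3],
    "movieId": [10, 20, 10, 20, 30],
    "rating":  [4.0, 5.0, 3.0, 2.0, 4.5],
})

# One row per movie, one column per user; movie/user pairs without a rating become 0
toy_matrix = pd.pivot_table(
    toy, index="movieId", columns="userId", values="rating"
).fillna(0)

print(toy_matrix)
```

Note that after `fillna(0)` a 0 means "no rating", so it cannot be told apart from a genuine rating of 0 if the dataset allowed one.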

### **C)**

In [62]:
# Create a function that finds the similar movies to a given movie id.
def similar_movies(df, k, movie_id):
    # df is the movie-user matrix (indexed by movieId, with a "title" column)
    # k is the number of similar movies to find
    # movie_id is the movie id to find similar movies to

    # Keep only the rating columns as features
    features = df.drop("title", axis=1)

    # Build a NearestNeighbors model; ask for k+1 neighbors because the
    # query movie is normally returned as its own nearest neighbor
    kNN = NearestNeighbors(n_neighbors=k+1, algorithm="brute", metric='cosine')

    # Fit the model using the rating columns
    kNN.fit(features)

    # Find the closest neighbors using .kneighbors and passing the ratings associated with the movie_id
    # This step returns row positions in df, not movie ids
    recs = kNN.kneighbors([list(features.loc[movie_id])], return_distance=False)[0]

    # Look up the titles at those positions and drop the query movie itself
    recs_names = [df.iloc[i]["title"] for i in recs]
    recs_names = [m for m in recs_names if m != df.loc[movie_id]["title"]]

    # Return a list with the recommended movie titles
    return recs_names
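
The following sketch illustrates, on a small made-up matrix, what `.kneighbors` returns with the cosine metric: row positions ordered by distance, with the query row itself first. The array `X` and its values are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Hypothetical 4-movie x 3-user rating matrix (0 = not rated)
X = np.array([
    [5.0, 4.0, 0.0],   # row 0: the query movie
    [4.0, 5.0, 0.0],   # row 1: rated by the same users, slightly different scores
    [0.0, 0.0, 5.0],   # row 2: rated only by a different user
    [5.0, 4.0, 1.0],   # row 3: almost the same rating pattern as row 0
])

knn = NearestNeighbors(n_neighbors=3, algorithm="brute", metric="cosine")
knn.fit(X)

# Query with row 0; the result holds row positions, closest first
distances, positions = knn.kneighbors(X[[0]])
print(positions[0])   # row 0 itself, then row 3, then row 1
print(distances[0])
```

Because the query row comes back at distance 0, asking for `k+1` neighbors and dropping the query afterwards leaves `k` recommendations.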

In [63]:
movie_name = 'Shawshank Redemption, The (1994)'

# Find the movieId associated with the movie_name
movie_id = df2[df2["title"] == movie_name].index[0]

# Use similar_movies() to find 10 movie recommendations
recommended_movies = similar_movies(df2, 10, movie_id)

print("Since you watched %s: \n" % movie_name)
for i, m in enumerate(recommended_movies):
    print("%s) %s" % (i+1, m))

Since you watched Shawshank Redemption, The (1994): 

1) Forrest Gump (1994)
2) Pulp Fiction (1994)
3) Silence of the Lambs, The (1991)
4) Usual Suspects, The (1995)
5) Schindler's List (1993)
6) Fight Club (1999)
7) Braveheart (1995)
8) Matrix, The (1999)
9) Apollo 13 (1995)
10) Seven (a.k.a. Se7en) (1995)


### **D) EXTRA** 

In [42]:
## Modify the function to take the user id as well and ensure that the recommended movies are not already watched by the user.

def similar_movies_2(df, k, movie_id, user_id):
    # Titles the user has not rated (a rating of 0 means "no rating" after fillna(0))
    not_watched = set(df[df[user_id] == 0]["title"])
    features = df.drop("title", axis=1)
    # Ask for more neighbors than needed (k*3 is a heuristic) so that enough remain after filtering
    kNN = NearestNeighbors(n_neighbors=k*3, algorithm="brute", metric='cosine')
    kNN.fit(features)
    recs = kNN.kneighbors([list(features.loc[movie_id])], return_distance=False)[0]
    recs_names = [df.iloc[i]["title"] for i in recs]
    # Drop the query movie and anything the user has already rated; this can yield fewer than k titles
    recs_names = [m for m in recs_names if (m != df.loc[movie_id]["title"]) and (m in not_watched)]
    return recs_names[:k]
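
With a fixed budget of `k*3` neighbors, a user who has rated most of the nearby movies may receive fewer than `k` recommendations. One possible alternative, sketched here under assumptions, ranks every movie and keeps the first `k` that the user has not rated. The function name `similar_unwatched` and the `toy` matrix are hypothetical and not part of the exercise.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

def similar_unwatched(df, k, movie_id, user_id):
    """Hypothetical variant: rank all movies, keep the first k the user has not rated."""
    features = df.drop("title", axis=1)
    # Use every row as a neighbor so filtering can never run out of candidates early
    knn = NearestNeighbors(n_neighbors=len(features), algorithm="brute", metric="cosine")
    knn.fit(features.values)
    positions = knn.kneighbors(features.loc[[movie_id]].values, return_distance=False)[0]

    recs = []
    for pos in positions:
        candidate_id = features.index[pos]
        # Skip the query movie and anything the user has already rated
        if candidate_id == movie_id or features.loc[candidate_id, user_id] != 0:
            continue
        recs.append(df.loc[candidate_id, "title"])
        if len(recs) == k:
            break
    return recs

# Toy movie-user matrix (assumed values): rows are movies, columns 1..3 are users
toy = pd.DataFrame(
    {"title": ["A", "B", "C", "D"],
     1: [5.0, 4.0, 0.0, 5.0],
     2: [4.0, 5.0, 0.0, 4.0],
     3: [0.0, 0.0, 5.0, 1.0]},
    index=pd.Index([10, 20, 30, 40], name="movieId"),
)

# User 3 has rated movies 30 and 40, so only "B" can be recommended for movie 10
print(similar_unwatched(toy, 2, 10, 3))
```

Ranking every movie costs more per query than a fixed neighbor count, which matters little for a brute-force search over a small dataset but may matter on a larger one.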


In [43]:
movie_name = 'Shawshank Redemption, The (1994)'
user_id = 5
movie_id = movies[movies["title"] == movie_name].index[0]
recommended_movies = similar_movies_2(df2, 10, movie_id, user_id)

print("Since you watched %s: \n" % movie_name)
for i, m in enumerate(recommended_movies):
    print("%s) %s" % (i+1, m))

Since you watched Shawshank Redemption, The (1994): 

1) Forrest Gump (1994)
2) Silence of the Lambs, The (1991)
3) Fight Club (1999)
4) Matrix, The (1999)
5) Seven (a.k.a. Se7en) (1995)
6) Lord of the Rings: The Return of the King, The (2003)
7) Godfather, The (1972)
8) Good Will Hunting (1997)
9) Jurassic Park (1993)
10) Lord of the Rings: The Fellowship of the Ring, The (2001)
