Let's build a movie recommendation system. Before we start, let's go over, at a high level, what we are trying to accomplish
and the steps to get there.

The problem: Given a user who has rated a few movies, we want to recommend other movies that users
SIMILAR to them enjoyed. We're basically going to use user-user collaborative filtering to show people what movies
they might want to watch next.

The solution (at a high level):
1. Convert the ratings data to a user x movie matrix
2. Calculate the cosine similarity between a selected user and every other user, then choose a cutoff. I chose the top 25 most similar users.
3. Recommend movies rated by the users that are most similar to the selected user

In [86]:
import pandas as pd
import numpy as np

ratings = pd.read_csv('ratings.csv')
movies = pd.read_csv('movies.csv')
data = ratings.merge(movies, on='movieId')
display(data.head(5))

Unnamed: 0,userId,movieId,rating,timestamp,title,genres
0,1,1,4.0,964982703,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,5,1,4.0,847434962,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,7,1,4.5,1106635946,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
3,15,1,2.5,1510577970,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
4,17,1,4.5,1305696483,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy


In [90]:
# movies that user 500 rated 5.0
display(data[(data['userId'] == 500) & (data['rating'] == 5.0)])
def convert_to_matrix(df):
    """ pivot the ratings into a user x movie numpy matrix (0 = not rated)

    also returns {userId: row index} and {column index: movieId} lookups
    """
    ratings = df.pivot(index='userId', columns='movieId', values='rating').fillna(0)
    # pivot sorts both axes, so the sorted ids line up with the matrix rows and columns
    user_id_index = {int(user_id): i for i, user_id in enumerate(sorted(df.userId.unique()))}
    index_movie_id = {i: int(movie_id) for i, movie_id in enumerate(sorted(df.movieId.unique()))}
    return np.array(ratings), user_id_index, index_movie_id

Unnamed: 0,userId,movieId,rating,timestamp,title,genres
9051,500,1282,5.0,1005528236,Fantasia (1940),Animation|Children|Fantasy|Musical
12628,500,2542,5.0,1005527824,"Lock, Stock & Two Smoking Barrels (1998)",Comedy|Crime|Thriller
13560,500,2700,5.0,1005528236,"South Park: Bigger, Longer and Uncut (1999)",Animation|Comedy|Musical
14016,500,2858,5.0,1005527724,American Beauty (1999),Drama|Romance
14839,500,2997,5.0,1005527755,Being John Malkovich (1999),Comedy|Drama|Fantasy
19271,500,176,5.0,1005527755,Living in Oblivion (1995),Comedy
34508,500,1784,5.0,1005527784,As Good as It Gets (1997),Comedy|Drama|Romance
34866,500,3114,5.0,1005527784,Toy Story 2 (1999),Adventure|Animation|Children|Comedy|Fantasy
35431,500,4306,5.0,1005528355,Shrek (2001),Adventure|Animation|Children|Comedy|Fantasy|Ro...
60205,500,1747,5.0,1005528065,Wag the Dog (1997),Comedy
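
To make the matrix conversion concrete, here is a minimal sketch on a tiny, made-up ratings table. The `toy` data and the `toy_*` names are my own assumptions for illustration and do not come from ratings.csv; the pivot and the two lookups follow the same idea as convert_to_matrix.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings (not from ratings.csv): 3 users, 3 movies
toy = pd.DataFrame({
    "userId":  [10, 10, 20, 30, 30],
    "movieId": [1, 3, 1, 2, 3],
    "rating":  [4.0, 5.0, 3.0, 2.0, 4.5],
})

# One row per user, one column per movie; unrated cells become 0
toy_matrix = toy.pivot(index="userId", columns="movieId", values="rating").fillna(0)
toy_array = toy_matrix.to_numpy()

# Lookups between ids and array positions, built from the pivot's own axes
toy_user_id_index = {int(u): i for i, u in enumerate(toy_matrix.index)}
toy_index_movie_id = {i: int(m) for i, m in enumerate(toy_matrix.columns)}

print(toy_array)
print(toy_user_id_index)
print(toy_index_movie_id)
```

Building the lookups from `toy_matrix.index` and `toy_matrix.columns` is one way to guarantee they match the matrix layout, since they are read from the pivoted table itself.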


In [73]:
def similarity(user_1, user_2):
    """ cosine similarity between two users' rating vectors """
    return np.dot(user_1, user_2) / (np.linalg.norm(user_1) * np.linalg.norm(user_2))
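
A quick worked example of the cosine similarity may help. The sketch below uses made-up rating vectors, and `safe_similarity` is a hypothetical variant of my own (not part of the notebook) that returns 0.0 when either vector is all zeros, a case where the plain formula would divide by zero.

```python
import numpy as np

def safe_similarity(user_1, user_2):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(user_1) * np.linalg.norm(user_2)
    if denom == 0:
        return 0.0
    return float(np.dot(user_1, user_2) / denom)

# Made-up rating vectors over three movies
a = np.array([5.0, 0.0, 3.0])
b = np.array([10.0, 0.0, 6.0])  # same direction as a: similarity close to 1
c = np.array([0.0, 4.0, 0.0])   # no rated movies in common with a: similarity 0

print(safe_similarity(a, b))
print(safe_similarity(a, c))
print(safe_similarity(a, np.zeros(3)))
```

Note that cosine similarity only looks at direction, so `a` and `b` count as near-identical even though `b` has larger values.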

In [78]:
def select_baseline_25_users(ratings_matrix, user_id_index, current_user):
    """ return the ids of the 25 users most similar to current_user """
    similarity_rating = []
    current_user_profile = ratings_matrix[user_id_index[current_user]]

    for u_id in user_id_index.keys():
        if current_user != u_id:
            u_id_profile = ratings_matrix[user_id_index[u_id]]
            similarity_rating.append((u_id, similarity(u_id_profile, current_user_profile)))

    # most similar users first
    similarity_rating.sort(key=lambda x: x[1], reverse=True)
    return [x[0] for x in similarity_rating[0:25]]
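
The loop above calls similarity once per user. As a sketch of an alternative, the same ranking could be computed in one shot with a matrix-vector product. `top_k_similar_rows` is a hypothetical helper of my own: it works on row indices rather than user ids, so the result would still need mapping back through user_id_index, and it treats all-zero rows as having similarity 0.

```python
import numpy as np

def top_k_similar_rows(matrix, row, k=25):
    """Return indices of the k rows most cosine-similar to matrix[row]."""
    k = min(k, matrix.shape[0] - 1)          # cannot return more rows than exist
    norms = np.linalg.norm(matrix, axis=1)   # one norm per user
    dots = matrix @ matrix[row]              # dot product with every user at once
    denom = norms * norms[row]
    sims = np.divide(dots, denom,
                     out=np.zeros_like(dots, dtype=float),
                     where=denom > 0)        # all-zero rows get similarity 0
    sims[row] = -np.inf                      # never pick the user themselves
    order = np.argsort(-sims, kind="stable") # most similar first
    return order[:k].tolist()

# Made-up ratings: 4 users x 3 movies
toy = np.array([
    [4.0, 0.0, 5.0],
    [3.0, 0.0, 0.0],
    [0.0, 2.0, 4.5],
    [8.0, 0.0, 10.0],
])
print(top_k_similar_rows(toy, row=0, k=2))
```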

In [79]:
def recommend(ratings_matrix, similar_user_ids, user_id_index, index_movie_id):
    """ collect one movie id per similar user: the first movie that user has rated """
    recommendations = set()
    for similar_user_id in similar_user_ids:
        similar_user_profile = ratings_matrix[user_id_index[similar_user_id]]
        for indx, movie_rating in enumerate(similar_user_profile):
            if movie_rating != 0.0:
                recommendations.add(index_movie_id[indx])
                # stop at the first rated movie (lowest movieId) for this user
                break
    return recommendations
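
The recommend function above takes the first rated movie from each similar user, regardless of the rating or whether the selected user has already seen it. As a sketch of a different technique (not what this notebook runs), one could rank unseen movies by the neighbours' mean rating. `recommend_top_n` and its `n` and `min_raters` parameters are hypothetical names of my own; it returns column indices, which would be mapped to movie ids with index_movie_id.

```python
import numpy as np

def recommend_top_n(ratings_matrix, current_row, similar_rows, n=10, min_raters=2):
    """Rank movies the current user has not rated by the neighbours' mean rating."""
    neighbours = ratings_matrix[similar_rows]
    counts = (neighbours > 0).sum(axis=0)   # how many neighbours rated each movie
    sums = neighbours.sum(axis=0)           # unrated cells are 0, so they add nothing
    means = np.divide(sums, counts,
                      out=np.zeros(sums.shape, dtype=float),
                      where=counts > 0)
    # keep movies with enough neighbour ratings that the current user has not rated
    eligible = (counts >= min_raters) & (ratings_matrix[current_row] == 0)
    candidates = np.flatnonzero(eligible)
    ranked = candidates[np.argsort(-means[candidates], kind="stable")]
    return ranked[:n].tolist()

# Made-up ratings: row 0 is the current user, rows 1-3 are the similar users
toy = np.array([
    [5.0, 0.0, 0.0, 0.0],
    [4.0, 3.0, 5.0, 0.0],
    [5.0, 4.0, 4.0, 2.0],
    [0.0, 5.0, 0.0, 0.0],
])
print(recommend_top_n(toy, current_row=0, similar_rows=[1, 2, 3]))
```

The `min_raters` cutoff is a design choice: it keeps a single neighbour's lone 5.0 from outranking a movie that several neighbours liked.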



In [82]:
ratings_matrix, user_id_index, index_movie_id = convert_to_matrix(ratings)
similar_users = select_baseline_25_users(ratings_matrix, user_id_index, 500)
recommendations = list(recommend(ratings_matrix, similar_users, user_id_index, index_movie_id))
# Alternatives (rows come back in movies.csv order rather than in list order):
# recommendations_df = movies[movies['movieId'].isin(recommendations)]
# recommendations_df = movies.query('movieId in @recommendations')
recommendations_df = movies.set_index('movieId').loc[recommendations].reset_index()
print(recommendations_df)

   movieId                                      title  \
0       32  Twelve Monkeys (a.k.a. 12 Monkeys) (1995)   
1        1                           Toy Story (1995)   
2        2                             Jumanji (1995)   
3        3                    Grumpier Old Men (1995)   
4       34                                Babe (1995)   
5     2445                      At First Sight (1999)   
6       21                          Get Shorty (1995)   

                                        genres  
0                      Mystery|Sci-Fi|Thriller  
1  Adventure|Animation|Children|Comedy|Fantasy  
2                   Adventure|Children|Fantasy  
3                               Comedy|Romance  
4                               Children|Drama  
5                                        Drama  
6                        Comedy|Crime|Thriller  
