# Movie Recommendation Engine
##### Description: This project demonstrates recommendation models applied to film selection. It mainly uses collaborative filtering, which recommends movies to a user based on the ratings of users with similar profiles.

##### Data: The data set comes from MovieLens, a database of movies and user-given ratings. The small version has approximately 100,000 ratings; this notebook loads the larger version and keeps only the first 500,000 ratings.

###### Author: Lucas Fernandes

### 0 - Installing the packages that will be used

In [1]:
## In this case, no new packages are needed

### 1 - importing the libraries

In [2]:
#The libraries that we will work with
import pandas as pd
import numpy as np

import seaborn as sns
%matplotlib inline
from matplotlib import pyplot as plt

from datetime import date

import warnings
warnings.simplefilter("ignore")

### 2 - Loading data from MovieLens

##### Loading Movies IDs

In [3]:
#Location of the file:

#smaller data set
#filelocation = r'C:\Users\55119\Documents\Lucas - Minhas Pastas\Alura\Algoritimo de Recomentação\ml-latest-small\ml-latest-small\movies.csv'

#file location for the bigger data set
filelocation = r'C:\Users\55119\Documents\Lucas - Minhas Pastas\Alura\Algoritimo de Recomentação\ml-latest\ml-latest\movies.csv'

In [4]:
#loading the data set and inspecting its initial parameters:
data = pd.read_csv(filelocation)
print('______________Data set head is:___________')
print(data.head(2))
print('______________The shape of the data set (rows, columns) is:',data.shape)
print('______________The number of null values in the data is:',data.isna().sum().sum())
print('______________Data types in the data set__________')
print(data.dtypes)


#keeping the data under a more descriptive name
movies_ids = data

______________Data set head is:___________
   movieId             title                                       genres
0        1  Toy Story (1995)  Adventure|Animation|Children|Comedy|Fantasy
1        2    Jumanji (1995)                   Adventure|Children|Fantasy
______________The shape of the data set (rows, columns) is: (58098, 3)
______________The number of null values in the data is: 0
______________Data types in the data set__________
movieId     int64
title      object
genres     object
dtype: object


In [5]:
movies_ids.set_index('movieId',inplace=True)

##### Loading User Ratings

In [6]:
#Location of the file:

#smaller data set
#filelocation = r'C:\Users\55119\Documents\Lucas - Minhas Pastas\Alura\Algoritimo de Recomentação\ml-latest-small\ml-latest-small\ratings.csv'

#file location for the bigger data set
filelocation = r'C:\Users\55119\Documents\Lucas - Minhas Pastas\Alura\Algoritimo de Recomentação\ml-latest\ml-latest\ratings.csv'

In [7]:
#loading the data set and inspecting its initial parameters:
data = pd.read_csv(filelocation)
print('______________Data set head is:___________')
print(data.head(2))
print('______________The shape of the data set (rows, columns) is:',data.shape)
print('______________The number of null values in the data is:',data.isna().sum().sum())
print('______________Data types in the data set__________')
print(data.dtypes)


#keeping the data under a more descriptive name
ratings = data

______________Data set head is:___________
   userId  movieId  rating   timestamp
0       1      307     3.5  1256677221
1       1      481     3.5  1256677456
______________The shape of the data set (rows, columns) is: (27753444, 4)
______________The number of null values in the data is: 0
______________Data types in the data set__________
userId         int64
movieId        int64
rating       float64
timestamp      int64
dtype: object


In [8]:
## Limiting the size of the ratings data set
Limit = 500000
ratings = ratings[:Limit]

### 3 - Using the k-nearest neighbors algorithm to define distance between users

#### 3.1 - Extracting the ratings per user

In [9]:
#defining a routine to look up the ratings given by each user
def user_rating(user):
    
    #selecting, with query, all the ratings the user has given in the data set
    rate_by_user = ratings.query('userId == @user')
    
    #dropping the columns we don't need (returning a new DataFrame instead of
    #modifying the query result in place, which avoids SettingWithCopyWarning)
    rate_by_user = rate_by_user.drop(columns = ['userId', 'timestamp'])
    
    #setting movieId as the new index
    rate_by_user = rate_by_user.set_index('movieId')
    
    return rate_by_user

#### 3.2 - Defining the distance between vectors

In [10]:
def distance_vectors (vector1, vector2):
    #using linalg.norm to calculate the Euclidean distance between the 2 vectors, i.e. sqrt(sum((vector1 - vector2)^2))
    distance = np.linalg.norm(vector1 - vector2)
    return distance
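
As a quick check of the distance used here, the sketch below applies `np.linalg.norm` to two made-up rating vectors. The values are illustrative only and do not come from the data set.

```python
import numpy as np

# Hypothetical ratings that two users gave to the same three movies
ratings_a = np.array([5.0, 3.0, 4.5])
ratings_b = np.array([2.0, 3.0, 0.5])

# Euclidean distance: sqrt((5-2)^2 + (3-3)^2 + (4.5-0.5)^2) = sqrt(9 + 0 + 16) = 5
distance = np.linalg.norm(ratings_a - ratings_b)
print(distance)  # 5.0
```

A distance of 0 means the two users gave identical ratings to every movie they have in common.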

#### 3.3 - Defining the distance between users

In [11]:
def distance_users(userid1,userid2,min_length = 4):
    
    #using the first routine to get each user's ratings
    user_rates1=user_rating(userid1)
    user_rates2=user_rating(userid2)
    
    #joining the ratings in the same DataFrame; join aligns the rows on the movieId index,
    #and dropna keeps only the movies that both users have rated
    rates_vectors = user_rates1.join(user_rates2,lsuffix="_left",rsuffix = '_right').dropna()
    
    #skipping pairs with fewer than min_length movies in common
    if (len(rates_vectors) < min_length):
        return [userid1, userid2, None]
    
    #using distance_vectors to calculate the distance between the users
    distance_between_users = distance_vectors(rates_vectors.rating_left, rates_vectors.rating_right)
    return [userid1, userid2, distance_between_users]
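
To make the join step concrete, here is a minimal sketch on two made-up users. The DataFrames `user_a` and `user_b` are hypothetical and only mimic the shape that `user_rating` returns (a `rating` column indexed by `movieId`).

```python
import numpy as np
import pandas as pd

# Hypothetical ratings, indexed by movieId
user_a = pd.DataFrame({'rating': [5.0, 4.0, 3.0, 5.0, 2.0]},
                      index=pd.Index([1, 2, 3, 4, 5], name='movieId'))
user_b = pd.DataFrame({'rating': [5.0, 4.0, 1.0, 5.0]},
                      index=pd.Index([1, 2, 3, 9], name='movieId'))

# Left join on movieId, then drop the movies that only one of them rated
common = user_a.join(user_b, lsuffix='_left', rsuffix='_right').dropna()
print(len(common))  # 3 (movies 1, 2 and 3)

# Distance over the shared movies only: sqrt(0^2 + 0^2 + 2^2) = 2
distance = np.linalg.norm(common.rating_left - common.rating_right)
print(distance)  # 2.0
```

With the default `min_length = 4`, this pair would be skipped and get a distance of `None`, since they share only 3 movies.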
    
    

#### 3.4 - Defining the distance between all users in the dataset


In [12]:
#Creating a function to calculate the distance between one user and all the others in our dataset:

#defining the function
def distance_between_all_users(compared_user,numberofuserscompared = None):
    #creating the list that will store the distances
    distances=[]
    
    #looping over the users to compare each of them with the one we are looking for
    for users in ratings.userId.unique()[:numberofuserscompared]:
        result = distance_users(compared_user,users)
        distances.append(result)
        
    distances = pd.DataFrame(distances, columns = ['UserId_original','UserId_compared','Distance'])
    return distances


In [13]:
#distances = distance_between_all_users(1,20).dropna()
#distances.head()


#### 3.5 - Creating a function to find the nearest users to a given user

In [14]:
def Nearests_from_user(user,numberofuserscompared = None, top_nearests = 15):
    #calculating the distance from the user to all the other users; pairs without enough movies in common are dropped
    distances = distance_between_all_users(user,numberofuserscompared).dropna()
    
    #sorting so that the nearest users come first
    distances = distances.sort_values('Distance')
    
    #dropping the user's own row, since we don't need to compare the user with itself
    #(errors='ignore' avoids a KeyError when that row is not present)
    distances = distances.set_index('UserId_compared').drop(user, errors = 'ignore')
    
    return distances.head(top_nearests)
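
The selection logic can be checked on a small made-up table. The `distances` DataFrame below is hypothetical; it only imitates the columns that `distance_between_all_users` returns.

```python
import pandas as pd

# Hypothetical distances from user 1 to users 1..5 (None = too few movies in common)
distances = pd.DataFrame(
    [[1, 1, 0.0], [1, 2, 2.5], [1, 3, None], [1, 4, 1.0], [1, 5, 3.0]],
    columns=['UserId_original', 'UserId_compared', 'Distance'])

nearest = (distances.dropna()                # user 3 is removed
           .sort_values('Distance')          # nearest first
           .set_index('UserId_compared')
           .drop(1, errors='ignore')         # remove the user itself
           .head(2))                         # keep the 2 nearest
print(nearest.index.tolist())  # [4, 2]
```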

In [15]:
#example:
#a = Nearests_from_user(610)
#print(a)

#### 3.6 - Suggesting movies based on the closest users (k-nearest)

In [16]:
##defining a function that finds the nearest user to a given one and suggests the movies that user liked the most.
#def suggested_movies(user_to_suggest,numberofuserscompared = None,top_nearests = 10):

#    user_test = user_to_suggest
#
#    #getting the user's ratings:
#    user_ratting = user_rating(user_test)
#
#    Nearest = Nearests_from_user(user_test,numberofuserscompared,top_nearests)
#    #reporting the nearest user
#    print(f'The nearest user to user {user_test} is {Nearest.iloc[0].name}, Distance = {Nearest.iloc[0].Distance}')
#
#    #getting the ratings of the most similar user:
#    rating_similar = user_rating(Nearest.iloc[0].name)
#
#    #removing the movies that the original user has already seen:
#    rating_similar = rating_similar.drop(user_ratting.index, errors = 'ignore')
#
#    #sorting from the highest rating to the lowest
#    rating_similar = rating_similar.sort_values('rating' ,ascending= False )
#
#    #adding the movie names:
#    rating_similar = rating_similar.join(movies_ids)
#    rating_similar.dropna(inplace = True)
#    return(rating_similar)

In [17]:
#a = suggested_movies(100)

In [18]:
#a.head(15)

In [19]:
##defining a function that finds the nearest users to a given one and suggests the movies those users like the most.
def suggested_moviesv02(user_to_suggest,numberofuserscompared = None ,top_nearests = 20,number_of_suggestions = 100):
    user_test = user_to_suggest

    #getting the original user's ratings:
    user_ratting = user_rating(user_test)

    #calculating the nearest users:
    Nearest = Nearests_from_user(user_test,numberofuserscompared,top_nearests)

    #printing the nearest users and their distances:
    print(f'The users_id most near from the user {user_test} is {Nearest.index}, Distance = {Nearest.Distance}')

    Near_usersid_from_user_tested = Nearest.index

    #getting the ratings given by the nearest users
    ratings_by_nearest_users = ratings.set_index('userId').loc[Near_usersid_from_user_tested]

    #grouping by movie ID and taking the mean rating:
    mean_ratings_by_nearest_users = ratings_by_nearest_users.groupby('movieId').mean()[['rating']]

    #counting how many of the nearest users rated each movie, so that movies with too few ratings can be filtered out:
    count_ratings_by_nearest_users = ratings_by_nearest_users.groupby('movieId').count()[['rating']]
    
    #adding the counts to the means
    mean_ratings_by_nearest_users = mean_ratings_by_nearest_users.join(count_ratings_by_nearest_users, lsuffix = '_means', rsuffix = '_counts')
    
    #removing the movies the user has already watched
    mean_ratings_by_nearest_users = mean_ratings_by_nearest_users.drop(user_ratting.index, errors = 'ignore')
    
    #keeping only movies rated by at least 40% of the requested number of neighbors
    #(if fewer than top_nearests neighbors are found, this threshold can leave the result empty)
    minimum_filter = top_nearests*0.4
    
    mean_ratings_by_nearest_users = mean_ratings_by_nearest_users.query('rating_counts >= %.2f' % minimum_filter)
    
    #sorting by mean rating, highest first
    mean_ratings_by_nearest_users = mean_ratings_by_nearest_users.sort_values('rating_means', ascending = False)

    #best movies to suggest:
    suggested_movies = mean_ratings_by_nearest_users.join(movies_ids).head(number_of_suggestions)

    #returning the result:
    return (suggested_movies)
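
The aggregation step (mean rating and number of ratings per movie, then a minimum-count filter) can be sketched on made-up data. The `neighbor_ratings` table is hypothetical, and the single `groupby(...).agg(...)` call is a compact alternative to the two `groupby` calls plus `join` used in the function, not the notebook's own code.

```python
import pandas as pd

# Hypothetical ratings given by some of the 5 nearest users
neighbor_ratings = pd.DataFrame({
    'userId':  [10, 11, 12, 10, 11, 13],
    'movieId': [100, 100, 100, 200, 200, 300],
    'rating':  [5.0, 4.0, 4.5, 3.0, 4.0, 5.0],
})
top_nearests = 5
minimum_filter = top_nearests * 0.4   # a movie needs at least 2 ratings

# Mean and count per movie in one pass (named aggregation)
summary = (neighbor_ratings.groupby('movieId')['rating']
           .agg(rating_means='mean', rating_counts='count'))

# Keep movies with enough ratings, best mean first
summary = (summary[summary.rating_counts >= minimum_filter]
           .sort_values('rating_means', ascending=False))
print(summary)
```

Movie 300 has the highest mean (5.0) but only one rating, so the filter removes it; movies 100 and 200 remain, in that order.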


In [20]:
#suggested_moviesv02(152)

### 4 - Testing new Users

In [21]:
#defining a new user:
def new_user(data):
    
    #first: getting the highest existing userId and adding 1
    new_user_Id = ratings['userId'].max()+1
    #printing the new ID:
    print(f"The new user ID is: {new_user_Id} ")
    
    #creating the DataFrame with the new user's ratings:
    new_user_ratting = pd.DataFrame(data,columns = ['movieId','rating'])
    
    #adding the userId column to this new DataFrame:
    new_user_ratting['userId'] =   new_user_Id
    
    #returning the ratings with the new user's rows appended
    return pd.concat([ratings, new_user_ratting])
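
A small sketch of the same idea on a made-up ratings table. The names `ratings_toy` and `new_ratings` are hypothetical, and `ignore_index=True` is an optional choice made here to keep the row index unique; the function above does not use it.

```python
import pandas as pd

# Hypothetical existing ratings
ratings_toy = pd.DataFrame({'userId':  [1, 1, 2],
                            'movieId': [1, 2, 1],
                            'rating':  [4.0, 3.5, 5.0]})

# Ratings for the new user, as [movieId, rating] pairs
new_ratings = pd.DataFrame([[1, 5], [356, 5]], columns=['movieId', 'rating'])

# The new user gets the next free ID
new_ratings['userId'] = ratings_toy['userId'].max() + 1

# Append the new rows to the existing ratings
ratings_toy = pd.concat([ratings_toy, new_ratings], ignore_index=True)
print(ratings_toy.userId.max())  # 3
print(len(ratings_toy))          # 5
```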
    

In [22]:
#defining a new user to test:
lucasoliveiras = ([
[55247,5], #Into the Wild - rating: 5
[356,5], #Forrest Gump - rating: 5
[4886,5], #Monsters, Inc. - rating: 5
[4896,5], #Harry Potter and the Sorcerer's Stone - rating: 5
[7153,5], #The Lord of the Rings: The Return of the King - rating: 5
[1,5], #Toy Story - rating: 5
])

In [23]:
#calling a function to add my new user
#ratings = new_user(lucasoliveiras)

In [24]:
#user_rating(611).join(movies_ids)

In [25]:
#suggested_moviesv02(611)

In [26]:
#print(user_rating(288).sort_values('rating',ascending = False).head(30))

In [27]:
Thamiris_filmes=([
[318,5], #The Shawshank Redemption
[4993,5], #The Lord of the Rings: The Fellowship of the Ring
[5066,5], #A Walk to Remember
[134853,5], #Inside Out
[4886,4], #Monsters, Inc.
[53125,3], #Pirates of the Caribbean: At World's End
[1,5], #Toy Story
[364,5], #The Lion King
[1907,5], #Mulan
[858,5], #The Godfather
[7153,5], #The Lord of the Rings: The Return of the King
[68157,3], #Inglourious Basterds
[60069,5], #WALL-E
[5952,5], #The Two Towers
[356,4], #Forrest Gump
])

In [28]:
#adding a new user to the data set
ratings = new_user(Thamiris_filmes)
#showing this user's ratings:
user_rating(ratings.userId.max()).join(movies_ids)


The new user ID is: 5082 


movieId,rating,title,genres
318,5.0,"Shawshank Redemption, The (1994)",Crime|Drama
4993,5.0,"Lord of the Rings: The Fellowship of the Ring,...",Adventure|Fantasy
5066,5.0,"Walk to Remember, A (2002)",Drama|Romance
134853,5.0,Inside Out (2015),Adventure|Animation|Children|Comedy|Drama|Fantasy
4886,4.0,"Monsters, Inc. (2001)",Adventure|Animation|Children|Comedy|Fantasy
53125,3.0,Pirates of the Caribbean: At World's End (2007),Action|Adventure|Comedy|Fantasy
1,5.0,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
364,5.0,"Lion King, The (1994)",Adventure|Animation|Children|Drama|Musical|IMAX
1907,5.0,Mulan (1998),Adventure|Animation|Children|Comedy|Drama|Musi...
858,5.0,"Godfather, The (1972)",Crime|Drama


In [29]:
#suggesting movies for the most recently added user
print(f'Recommendation for the user {ratings.userId.max()}')
suggested_moviesv02(ratings.userId.max(),numberofuserscompared=None,top_nearests=50,number_of_suggestions=25)


Recommendation for the user 5082
The users_id most near from the user 5082 is Int64Index([ 114, 3387,  158, 3365, 3364, 4129, 3337, 1070, 4239, 3893,  275,
            1017,  299, 4414, 4475,  705, 4556, 3173,  377, 2728, 2451, 3869,
            4934, 5029, 2096, 1943, 3644, 3758, 3548, 3649, 2443, 3583, 1526,
            4893,  934,  340,  357,  516, 1727, 4308, 3112, 1589, 4000, 1287,
            4395, 1565,   35, 2788, 3987,   27],
           dtype='int64', name='UserId_compared'), Distance = UserId_compared
114     0.000000
3387    0.000000
158     0.000000
3365    0.000000
3364    0.000000
4129    0.000000
3337    0.000000
1070    0.000000
4239    0.000000
3893    0.000000
275     0.000000
1017    0.000000
299     0.000000
4414    0.000000
4475    0.000000
705     0.000000
4556    0.000000
3173    0.000000
377     0.000000
2728    0.000000
2451    0.000000
3869    0.000000
4934    0.000000
5029    0.000000
2096    0.000000
1943    0.000000
3644    0.000000
3758    0.000000
3548    

movieId,rating_means,rating_counts,title,genres
58559,4.586957,23,"Dark Knight, The (2008)",Action|Crime|Drama|IMAX
260,4.576923,26,Star Wars: Episode IV - A New Hope (1977),Action|Adventure|Sci-Fi
3578,4.575,20,Gladiator (2000),Action|Adventure|Drama
2571,4.513514,37,"Matrix, The (1999)",Action|Sci-Fi|Thriller
110,4.475,20,Braveheart (1995),Action|Drama|War
593,4.4,20,"Silence of the Lambs, The (1991)",Crime|Horror|Thriller
1196,4.391304,23,Star Wars: Episode V - The Empire Strikes Back...,Action|Adventure|Sci-Fi
2959,4.380952,21,Fight Club (1999),Action|Crime|Drama|Thriller
1198,4.34,25,Raiders of the Lost Ark (Indiana Jones and the...,Action|Adventure
296,4.26087,23,Pulp Fiction (1994),Comedy|Crime|Drama|Thriller


In [30]:
Amato_filmes = ([
[356,5],
[1682,5],
[7147,5],
[64614,5],
[55247,2],
])


In [31]:
#adding a new user to the data set
ratings = new_user(Amato_filmes)
#showing this user's ratings:
user_rating(ratings.userId.max()).join(movies_ids)

The new user ID is: 5083 


movieId,rating,title,genres
356,5.0,Forrest Gump (1994),Comedy|Drama|Romance|War
1682,5.0,"Truman Show, The (1998)",Comedy|Drama|Sci-Fi
7147,5.0,Big Fish (2003),Drama|Fantasy|Romance
64614,5.0,Gran Torino (2008),Crime|Drama
55247,2.0,Into the Wild (2007),Action|Adventure|Drama


In [32]:
#suggesting movies for the most recently added user
print(f'Recommendation for the user {ratings.userId.max()}')
suggested_moviesv02(ratings.userId.max(),numberofuserscompared=None,top_nearests=50,number_of_suggestions=25)


Recommendation for the user 5083
The users_id most near from the user 5083 is Int64Index([ 667, 2779, 1413, 4178, 2861,  464,  670, 2502, 4294, 2878, 4441,
            4372, 1293, 3822, 2150, 4425, 3631, 3344, 1464, 3983, 1695, 3177,
            1933, 1867, 2069, 2084, 3134, 3557, 4044, 1654, 2467, 1803, 4666,
             471, 2847, 2374, 2407, 2768,   48,  719,  869, 1513, 4806, 2698,
            3846, 4692, 2121,  465,  930,   73],
           dtype='int64', name='UserId_compared'), Distance = UserId_compared
667     0.500000
2779    0.707107
1413    1.118034
4178    1.118034
2861    1.224745
464     1.224745
670     1.322876
2502    1.581139
4294    1.581139
2878    1.732051
4441    1.802776
4372    1.936492
1293    2.000000
3822    2.000000
2150    2.061553
4425    2.061553
3631    2.061553
3344    2.061553
1464    2.121320
3983    2.121320
1695    2.121320
3177    2.121320
1933    2.121320
1867    2.179449
2069    2.179449
2084    2.236068
3134    2.291288
3557    2.291288
4044    

movieId,rating_means,rating_counts,title,genres
318,4.684783,46,"Shawshank Redemption, The (1994)",Crime|Drama
296,4.565217,46,Pulp Fiction (1994),Comedy|Crime|Drama|Thriller
2959,4.489796,49,Fight Club (1999),Action|Crime|Drama|Thriller
6016,4.45,30,City of God (Cidade de Deus) (2002),Action|Adventure|Crime|Drama|Thriller
1213,4.390625,32,Goodfellas (1990),Crime|Drama
1196,4.367647,34,Star Wars: Episode V - The Empire Strikes Back...,Action|Adventure|Sci-Fi
858,4.342857,35,"Godfather, The (1972)",Crime|Drama
56782,4.333333,24,There Will Be Blood (2007),Drama|Western
50,4.333333,36,"Usual Suspects, The (1995)",Crime|Mystery|Thriller
2329,4.305556,36,American History X (1998),Crime|Drama


In [33]:
#suggesting movies for the most recently added user, now with fewer neighbors
print(f'Recommendation for the user {ratings.userId.max()}')
suggested_moviesv02(ratings.userId.max(),numberofuserscompared=None,top_nearests=15,number_of_suggestions=10)


Recommendation for the user 5083
The users_id most near from the user 5083 is Int64Index([ 667, 2779, 1413, 4178, 2861,  464,  670, 2502, 4294, 2878, 4441,
            4372, 1293, 3822, 2150],
           dtype='int64', name='UserId_compared'), Distance = UserId_compared
667     0.500000
2779    0.707107
1413    1.118034
4178    1.118034
2861    1.224745
464     1.224745
670     1.322876
2502    1.581139
4294    1.581139
2878    1.732051
4441    1.802776
4372    1.936492
1293    2.000000
3822    2.000000
2150    2.061553
Name: Distance, dtype: float64


movieId,rating_means,rating_counts,title,genres
318,4.730769,13,"Shawshank Redemption, The (1994)",Crime|Drama
6016,4.6875,8,City of God (Cidade de Deus) (2002),Action|Adventure|Crime|Drama|Thriller
2959,4.678571,14,Fight Club (1999),Action|Crime|Drama|Thriller
4226,4.625,12,Memento (2000),Mystery|Thriller
293,4.583333,12,Léon: The Professional (a.k.a. The Professiona...,Action|Crime|Drama|Thriller
2542,4.571429,7,"Lock, Stock & Two Smoking Barrels (1998)",Comedy|Crime|Thriller
778,4.5625,8,Trainspotting (1996),Comedy|Crime|Drama
7361,4.541667,12,Eternal Sunshine of the Spotless Mind (2004),Drama|Romance|Sci-Fi
296,4.538462,13,Pulp Fiction (1994),Comedy|Crime|Drama|Thriller
48394,4.5,10,"Pan's Labyrinth (Laberinto del fauno, El) (2006)",Drama|Fantasy|Thriller


In [34]:
Decio = ([
[163645,5],
[170297,5],
[134130,5],
[1209,5],
[39183,1],
])

In [35]:
#adding a new user to the data set
ratings = new_user(Decio)
#showing this user's ratings:
user_rating(ratings.userId.max()).join(movies_ids)

The new user ID is: 5084 


movieId,rating,title,genres
163645,5.0,Hacksaw Ridge (2016),Drama|War
170297,5.0,Ultimate Avengers 2 (2006),Action|Animation|Sci-Fi
134130,5.0,The Martian (2015),Adventure|Drama|Sci-Fi
1209,5.0,Once Upon a Time in the West (C'era una volta ...,Action|Drama|Western
39183,1.0,Brokeback Mountain (2005),Drama|Romance


In [36]:
#suggesting movies for the most recently added user
print(f'Recommendation for the user {ratings.userId.max()}')
suggested_moviesv02(ratings.userId.max(),numberofuserscompared=None,top_nearests=50, number_of_suggestions=25)


Recommendation for the user 5084
The users_id most near from the user 5084 is Int64Index([1507, 2069, 3450], dtype='int64', name='UserId_compared'), Distance = UserId_compared
1507    2.345208
2069    3.162278
3450    3.674235
Name: Distance, dtype: float64


movieId,rating_means,rating_counts,title,genres
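
The empty table is consistent with the filter in `suggested_moviesv02`: the threshold is `top_nearests*0.4`, which is 20 ratings per movie when `top_nearests=50`, but only 3 neighbors were found for this user, so no movie can reach 20 ratings. One possible workaround, which is not part of the original notebook, is to base the threshold on the number of neighbors actually found. The helper name `minimum_votes` below is hypothetical.

```python
import math

def minimum_votes(neighbors_found, top_nearests, fraction=0.4):
    """Minimum number of ratings a movie needs, scaled to the neighbors actually found.

    Hypothetical helper: rounds up, since rating counts are whole numbers.
    """
    effective_neighbors = min(neighbors_found, top_nearests)
    return math.ceil(effective_neighbors * fraction)

print(minimum_votes(3, 50))   # 2  (only 3 neighbors were found)
print(minimum_votes(50, 50))  # 20 (same as top_nearests*0.4 when all 50 are found)
```

With this variant, a user with only 3 neighbors would get movies rated by at least 2 of them, at the cost of recommendations that rest on very few opinions.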


In [37]:
Amato_2 = ([
[2687,5],
[56949,5],
[4896,5],
[1721,5],
[6242,1],
])


In [38]:
#adding a new user to the data set
ratings = new_user(Amato_2)
#showing this user's ratings:
user_rating(ratings.userId.max()).join(movies_ids)

The new user ID is: 5085 


movieId,rating,title,genres
2687,5.0,Tarzan (1999),Adventure|Animation|Children|Drama
56949,5.0,27 Dresses (2008),Comedy|Romance
4896,5.0,Harry Potter and the Sorcerer's Stone (a.k.a. ...,Adventure|Children|Fantasy
1721,5.0,Titanic (1997),Drama|Romance
6242,1.0,Ringu (Ring) (1998),Horror|Mystery|Thriller


In [39]:
#suggesting movies for the most recently added user
print(f'Recommendation for the user {ratings.userId.max()}')
suggested_moviesv02(ratings.userId.max(),numberofuserscompared=None,top_nearests=50, number_of_suggestions=25)


Recommendation for the user 5085
The users_id most near from the user 5085 is Int64Index([3959, 1563, 2463, 2606, 4507, 4208, 2551, 1636], dtype='int64', name='UserId_compared'), Distance = UserId_compared
3959    2.000000
1563    2.645751
2463    2.958040
2606    3.354102
4507    3.674235
4208    3.872983
2551    4.663690
1636    5.338539
Name: Distance, dtype: float64


movieId,rating_means,rating_counts,title,genres


In [40]:
Amato_3 = ([
[2687,5],
[56949,5],
[4896,5],
[1721,5],
[8368,5],
[40815,5],
[54001,5],
[69844,5],
[81834,5],
[88125,5],
[5816,5],
])

In [41]:
#adding a new user to the data set
ratings = new_user(Amato_3)
#showing this user's ratings:
user_rating(ratings.userId.max()).join(movies_ids)

The new user ID is: 5086 


movieId,rating,title,genres
2687,5.0,Tarzan (1999),Adventure|Animation|Children|Drama
56949,5.0,27 Dresses (2008),Comedy|Romance
4896,5.0,Harry Potter and the Sorcerer's Stone (a.k.a. ...,Adventure|Children|Fantasy
1721,5.0,Titanic (1997),Drama|Romance
8368,5.0,Harry Potter and the Prisoner of Azkaban (2004),Adventure|Fantasy|IMAX
40815,5.0,Harry Potter and the Goblet of Fire (2005),Adventure|Fantasy|Thriller|IMAX
54001,5.0,Harry Potter and the Order of the Phoenix (2007),Adventure|Drama|Fantasy|IMAX
69844,5.0,Harry Potter and the Half-Blood Prince (2009),Adventure|Fantasy|Mystery|Romance|IMAX
81834,5.0,Harry Potter and the Deathly Hallows: Part 1 (...,Action|Adventure|Fantasy|IMAX
88125,5.0,Harry Potter and the Deathly Hallows: Part 2 (...,Action|Adventure|Drama|Fantasy|Mystery|IMAX


In [42]:
#suggesting movies for the most recently added user
print(f'Recommendation for the user {ratings.userId.max()}')
suggested_moviesv02(ratings.userId.max(),numberofuserscompared=None,top_nearests=50, number_of_suggestions=25)


Recommendation for the user 5086
The users_id most near from the user 5086 is Int64Index([2306, 2322, 2443,  368, 2460, 5085, 2847, 2918, 1195, 3500, 3584,
            3722, 1908, 3778,  158, 4018, 4440, 4559, 4588, 1426, 4615, 4802,
            4818, 5041, 1313, 3918, 1943, 1467, 1017, 1974, 4218, 4610, 1961,
            2010, 5053, 3291,   15, 1299, 3360, 2158, 2242, 2230, 2407,  235,
            2301, 4846, 4726, 1263, 2173,  372],
           dtype='int64', name='UserId_compared'), Distance = UserId_compared
2306    0.000000
2322    0.000000
2443    0.000000
368     0.000000
2460    0.000000
5085    0.000000
2847    0.000000
2918    0.000000
1195    0.000000
3500    0.000000
3584    0.000000
3722    0.000000
1908    0.000000
3778    0.000000
158     0.000000
4018    0.000000
4440    0.000000
4559    0.000000
4588    0.000000
1426    0.000000
4615    0.000000
4802    0.000000
4818    0.000000
5041    0.000000
1313    0.000000
3918    0.000000
1943    0.000000
1467    0.500000
1017    

movieId,rating_means,rating_counts,title,genres
109487,4.52,25,Interstellar (2014),Sci-Fi|IMAX
4993,4.5,40,"Lord of the Rings: The Fellowship of the Ring,...",Adventure|Fantasy
68157,4.5,20,Inglourious Basterds (2009),Action|Drama|War
5989,4.5,20,Catch Me If You Can (2002),Crime|Drama
6377,4.482759,29,Finding Nemo (2003),Adventure|Animation|Children|Comedy
318,4.46875,32,"Shawshank Redemption, The (1994)",Crime|Drama
7153,4.4625,40,"Lord of the Rings: The Return of the King, The...",Action|Adventure|Drama|Fantasy
593,4.461538,26,"Silence of the Lambs, The (1991)",Crime|Horror|Thriller
4995,4.452381,21,"Beautiful Mind, A (2001)",Drama|Romance
79132,4.435484,31,Inception (2010),Action|Crime|Drama|Mystery|Sci-Fi|Thriller|IMAX


In [43]:
#suggesting movies for the most recently added user, now with fewer neighbors
print(f'Recommendation for the user {ratings.userId.max()}')
suggested_moviesv02(ratings.userId.max(),numberofuserscompared=None,top_nearests=15, number_of_suggestions=10)


Recommendation for the user 5086
The users_id most near from the user 5086 is Int64Index([2306, 2322, 2443,  368, 2460, 5085, 2847, 2918, 1195, 3500, 3584,
            3722, 1908, 3778,  158],
           dtype='int64', name='UserId_compared'), Distance = UserId_compared
2306    0.0
2322    0.0
2443    0.0
368     0.0
2460    0.0
5085    0.0
2847    0.0
2918    0.0
1195    0.0
3500    0.0
3584    0.0
3722    0.0
1908    0.0
3778    0.0
158     0.0
Name: Distance, dtype: float64


movieId,rating_means,rating_counts,title,genres
3578,4.75,6,Gladiator (2000),Action|Adventure|Drama
152081,4.666667,6,Zootopia (2016),Action|Adventure|Animation|Children|Comedy
6539,4.625,8,Pirates of the Caribbean: The Curse of the Bla...,Action|Adventure|Comedy|Fantasy
5418,4.583333,6,"Bourne Identity, The (2002)",Action|Mystery|Thriller
8665,4.583333,6,"Bourne Supremacy, The (2004)",Action|Crime|Thriller
48780,4.5,7,"Prestige, The (2006)",Drama|Mystery|Sci-Fi|Thriller
593,4.5,7,"Silence of the Lambs, The (1991)",Crime|Horror|Thriller
68157,4.5,6,Inglourious Basterds (2009),Action|Drama|War
6377,4.5,9,Finding Nemo (2003),Adventure|Animation|Children|Comedy
2716,4.5,6,Ghostbusters (a.k.a. Ghost Busters) (1984),Action|Comedy|Sci-Fi
