## Information Retrieval lab5

- Martyna Stasiak id.156071
- Maria Musiał id.156062
----

The purpose of the exercise is to implement a recommendation system for a movie search engine.

When we think about selecting a video that our user will like, let's first consider what data we have available? First of all, we have information in the database about how our user rated the movies he once watched. It's worth noting here that this is absolutely not all of the movies in our database given, and most often it's a heavily limited subset of a huge set of movies. So we can find out which movies our user liked and which ones he didn't. 

Is this all the data available? Well, no! We also have information about the preferences of other users! So we can find in the data a sample of users who have similar movie taste to our user. Note that virtually every such other user has watched some movies that our user has never watched before! The idea behind collaborative filtering is very simple: if another user with similar tastes rated a movie highly, our user will probably rate it highly too! Let's recommend movies that users with similar tastes have rated highly!


Let's formalize some ideas:
 - how to count the similarity between users' tastes? 
 
 Just calculate the correlation between their movie ratings. Users with a strongly positive correlation have similar tastes, and those with a strongly negative correlation have opposite tastes;) 
 
 - Having found similar users, how to count the predicted rating of the video by our user?
 
 We count the weighted average of ratings of users with similar tastes where the weight is the measure of similarity (correlation). The closer a user's tastes are to us, the more weight his rating has for us. (slide 27, http://www.mmds.org/mmds/v2.1/ch09-recsys1.pdf)


In [1]:
import pandas as pd
import numpy as np
from scipy.stats import pearsonr

df = pd.read_csv('./ratings.csv')
df

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931
...,...,...,...,...
100831,610,166534,4.0,1493848402
100832,610,168248,5.0,1493850091
100833,610,168250,5.0,1494273047
100834,610,168252,5.0,1493846352


-----

### <b>Task 1
Modify the dataframe to have moveID as index, userID as column and rating as values

In [2]:
dfTask = df.pivot(index='movieId', columns='userId', values='rating')
dfTask.head()

userId,1,2,3,4,5,6,7,8,9,10,...,601,602,603,604,605,606,607,608,609,610
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1,4.0,,,,4.0,,4.5,,,,...,4.0,,4.0,3.0,4.0,2.5,4.0,2.5,3.0,5.0
2,,,,,,4.0,,4.0,,,...,,4.0,,5.0,3.5,,,2.0,,
3,4.0,,,,,5.0,,,,,...,,,,,,,,2.0,,
4,,,,,,3.0,,,,,...,,,,,,,,,,
5,,,,,,5.0,,,,,...,,,,3.0,,,,,,


#### Now let's also see some stats about our movie database

In [3]:
numMovies = dfTask.shape[0]
numUsers = dfTask.shape[1]

numNonNan = dfTask.notna().sum().sum()
numNan = dfTask.isna().sum().sum()

#most and least watched movies:
movieWatchCount = dfTask.count(axis=1)
mostWatchedMovie = movieWatchCount.idxmax()
mostWatchedMovieWatchCount = movieWatchCount.max()
leastWatchedMovie = movieWatchCount.idxmin()
leastWatchedMovieWatchCount = movieWatchCount.min()

#most and least active users:
userWatchCount = dfTask.count(axis=0)
mostActiveUser = userWatchCount.idxmax()
mostActiveUserWatchCount = userWatchCount.max()
leastActiveUser = userWatchCount.idxmin()
leastActiveUserWatchCount = userWatchCount.min()


print(f"Dataset summary:")
print(f"Number of movies in the dataset: {numMovies}")
print(f"Number of users in the dataset: {numUsers}")
print(f"Number of non-NaN values in the dataset: {numNonNan}")
print(f"Number of NaN values in the dataset: {numNan}\n")

print(f"Most watched movie: {mostWatchedMovie} ({mostWatchedMovieWatchCount} watches)")
print(f"Least watched movie: {leastWatchedMovie} ({leastWatchedMovieWatchCount} watches)\n")

print(f"Most active user: {mostActiveUser} ({mostActiveUserWatchCount} movies rated)")
print(f"Least active user: {leastActiveUser} ({leastActiveUserWatchCount} movies rated)\n")


Dataset summary:
Number of movies in the dataset: 9724
Number of users in the dataset: 610
Number of non-NaN values in the dataset: 100836
Number of NaN values in the dataset: 5830804

Most watched movie: 356 (329 watches)
Least watched movie: 49 (1 watches)

Most active user: 414 (2698 movies rated)
Least active user: 53 (20 movies rated)



Small remark: <br>
Those stats for the most/least active user and watched movie might be different since there are different movies that might have the same 'watch count' (same with the users) and we print only one of them :)

--------

### <b>Task 2
Let's try to recommend movies for user 610. Calculate the correlation between this user and the remaining ones.

In [4]:
user = 610
user

610

In [6]:
userRatings = dfTask[user]
userRatings

movieId
1         5.0
2         NaN
3         NaN
4         NaN
5         NaN
         ... 
193581    NaN
193583    NaN
193585    NaN
193587    NaN
193609    NaN
Name: 610, Length: 9724, dtype: float64

In [93]:
correlations = {}
for other_user in dfTask.columns:
    if other_user != user:
        commonRatings = dfTask[[user, other_user]].dropna()
        # print(commonRatings)
        if len(commonRatings) >= 2: # at least 2 common ratings for the pearson correlation
            correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
            
sorted_correlations = sorted(correlations.items(), key=lambda x: x[1], reverse=True)
print(f"Top correlated users with the user {user} are:")
for user, corr in sorted_correlations[:10]:
    print(f"User {user} with correlation {corr:.2f}")

  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top correlated users with the user 610 are:
User 158 with correlation 0.91
User 92 with correlation 0.90
User 120 with correlation 0.88
User 138 with correlation 0.82
User 13 with correlation 0.80
User 206 with correlation 0.75
User 26 with correlation 0.70
User 118 with correlation 0.69
User 146 with correlation 0.69
User 191 with correlation 0.64


In [25]:
def getBestCorrelations(user, common_movies=2):
    correlations = {}
    for other_user in dfTask.columns:
        if other_user != user:
            commonRatings = dfTask[[user, other_user]].dropna()
            if len(commonRatings) >= common_movies: # at least 2 common ratings for the pearson correlation
                correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
    sorted_correlations = sorted(correlations.items(), key=lambda x: x[1], reverse=True)
    return sorted_correlations

In [26]:
sortedCorr = getBestCorrelations(610)
print(f"Top correlated users with the user {user} are:")
for user, corr in sorted_correlations[:5]:
    print(f"User {user} with correlation {corr:.2f}")

  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top correlated users with the user 191 are:
User 158 with correlation 0.91
User 92 with correlation 0.90
User 120 with correlation 0.88
User 138 with correlation 0.82
User 13 with correlation 0.80


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


### Task
There are a few users with the perfect match. Isn't it suspicious? Check it

### Task
Find 5 users with at least 5 common movies with user=610 and the highest correlation with that user

In [33]:
def getBest5Users(user, num = 5, common_movies=5):
    sorted_correlations = getBestCorrelations(user, common_movies=common_movies)

    print(f"Top {num} correlated users with the user {user} who have watched at least {common_movies} same movies are:")
    for different_user, corr in sorted_correlations[:num]:
        print(f"User {different_user} with correlation {corr:.2f}")

In [None]:
getBest5Users(610)

Top 5 correlated users with the user 610 who have watched at least 5 same movies are:
User 92 with correlation 0.90
User 120 with correlation 0.88
User 463 with correlation 0.82
User 138 with correlation 0.82
User 494 with correlation 0.81


### Task
Predict scores for each movie based on the most correlated users. Use weighted average with correlation coefficient as weights.
$$\hat{y_j} = \frac{\sum_{i \in U} w_iy_{ij}}{\sum_{i \in U} w_i}$$

$U$ is a set of those users that also watched $j$th moveie, $w$ denotes the correlation between our user and $i$th user, $y_{ij}$ is a score given by $i$th user to $j$th movie
Use only movies watched by at least two users from the considered set

In [35]:
def predictScores(user, movie):
    bestUsers = getBest5Users(user)
    userRatings = dfTask[user]
    if movie in userRatings:
        return userRatings[movie]
    else:
        ratings = []
        for other_user, corr in bestUsers:
            otherUserRatings = dfTask[other_user]
            if movie in otherUserRatings:
                ratings.append((otherUserRatings[movie], corr))
        if len(ratings) == 0:
            return None
        return sum(r * w for r, w in ratings) / sum(w for r, w in ratings)

In [None]:
for movie in dfTask.index:
    print(f"Predicted rating for movie {movie} is {predictScores(user, movie)}")


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 1 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 2 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 3 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 4 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 5 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 6 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 7 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 8 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 9 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 10 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 11 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 12 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 13 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 14 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 15 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 16 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 17 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 18 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 19 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 20 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 21 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 22 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 23 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 24 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 25 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 26 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 27 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


Top 5 correlated users with the user 13 who have watched at least 5 same movies are:
User 144 with correlation 0.91
User 1 with correlation 0.88
User 209 with correlation 0.88
User 222 with correlation 0.83
User 256 with correlation 0.80
Predicted rating for movie 28 is nan


  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]
  correlations[other_user] = pearsonr(commonRatings[user], commonRatings[other_user])[0]


KeyboardInterrupt: 

### Task
How to check the quality of our recommendations? 

We have to remove a few scores from the dataset and then compare predictions with the real ones.

Try to improve the system, you can use the following ideas:
 - Can we use more users (e.g. with negative correlation)?
 - Which difference is more important predicting 5 when a real score is 4 or predicting 3 instead of 2?
 - Did we use the best value for the minimal number of common movies?
 - Is prediction for a movie seen by just one user trustworthy?
 
 
Describe your approach, its strengths and weaknesses, and analyze the results. Send the report (notebook with comments/markdown) within 144 hours after the class to gmiebs@cs.put.poznan.pl, start the subject with [IR]

Credits to F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872 and Mateusz Lango