## Recommender Systems Assignment
Rodžers Ušackis, ACS301

### About The Project

#### Type Of Recommendation System

For this assignment I chose a **collaborative filtering** recommender system.

#### Dataset Used

The dataset I chose for this assignment is the one Netflix itself released for its open competition,
in which participants were asked to improve the company's existing recommendation system.

[Wikipedia Info About The Competition](https://en.wikipedia.org/wiki/Netflix_Prize)

Note:
I initially wanted to use the original [dataset](https://www.kaggle.com/datasets/netflix-inc/netflix-prize-data) provided by Netflix, but due to memory issues/errors I opted for a [reduced dataset](https://www.kaggle.com/datasets/rishitjavia/netflix-movie-rating-dataset).

It fits the requirements necessary for a **collaborative filtering** recommender system by having **user-item ratings**.

In this case, it has various training datasets, which contain:
- MovieIDs range from 1 to 17770 sequentially.
- CustomerIDs range from 1 to 2649429, with gaps. There are 480189 users.
- Ratings are on a five star (integral) scale from 1 to 5.
- Dates have the format YYYY-MM-DD.

There is also a file that can be used for mapping movie IDs to actual titles and release years.

It contains:
- MovieIDs do not correspond to actual Netflix movie IDs or IMDb movie IDs.
- YearOfRelease can range from 1890 to 2005 and may correspond to the release of the corresponding DVD, not necessarily its theatrical release.
- Title is the Netflix movie title and may not correspond to titles used on other sites. Titles are in English.


##### Note:

While doing the assignment, I ran into memory issues, so I decided to scale down the operation.

Maybe I could look into the big data aspect of the real assignment later on.

#### Libraries Used

In [1]:
import gc
import sys
import pandas as pd
import numpy as np

from sklearn.neighbors import NearestNeighbors
from sklearn.metrics.pairwise import cosine_similarity

from collections import deque

### Building A Recommender System

#### Importing the datasets

##### Movie Titles Dataset

First, we import the movie titles dataset.

Since the CSV file uses ',' instead of something like ';' as its separator, I ran into an error when trying to read movie titles which contain a comma.

e.g.
*72,1974,At Home Among Strangers, A Stranger Among His Own*

Since the movie title contains the same character that is used as the separator, an error is thrown: the parser expects three columns but finds four.

To counter that, I wrote a function that joins the surplus title fields of each bad line back into a single column.

In [2]:
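def manual_separation(bad_line):
    # pandas calls this function for every line that has more fields than expected.

    # This if statement is written only for displaying purposes, I didn't want to print every line.
    if bad_line[0] == '72':
        print(bad_line)

    # Keep the ID and the year, and join everything after them back into one title,
    # so that titles containing more than one comma are handled as well.
    right_split = bad_line[:2] + [",".join(bad_line[2:])]

    return right_split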

In [3]:
movie_titles_list_raw = pd.read_csv('datasets/original-netflix-dataset/movie_titles.csv', encoding='ISO-8859-1', names=['Movie ID', 'Year', 'Name'],
    engine='python', on_bad_lines=manual_separation).set_index('Movie ID')

['72', '1974', 'At Home Among Strangers', ' A Stranger Among His Own']



I didn't like the fact that the 'Year' column was being stored as float64, since a year is always a whole number.

So I decided to convert it to type int, but first I had to drop rows with NaN values.

In [4]:
movie_titles_list_raw.dropna(inplace=True)
movie_titles_list_raw['Year'] = movie_titles_list_raw['Year'].astype('int')
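
As an alternative to dropping rows, pandas also has nullable integer dtypes that can hold missing values. The following is only a minimal sketch on made-up data; the `toy_titles` frame and its rows are illustrative assumptions, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row has a missing release year.
toy_titles = pd.DataFrame(
    {"Year": [2003.0, np.nan, 1997.0], "Name": ["A", "B", "C"]},
    index=pd.Index([1, 2, 3], name="Movie ID"),
)

# 'Int16' (capital I) is a nullable integer dtype: the missing value
# becomes <NA> instead of forcing the whole column to float64.
toy_titles["Year"] = toy_titles["Year"].astype("Int16")

print(toy_titles.dtypes)
print(toy_titles)
```

With this approach all three rows are kept, at the cost of having to handle `<NA>` values later on.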

In [5]:
print(f'Movie Title Dataset\n'
      f'-------------------'
      f'\nShape: {movie_titles_list_raw.shape}\n'
      f'-------------------'
      f'\nDtypes: \n{movie_titles_list_raw.dtypes}\n'
      f'-------------------'
      f'\nSize: {sys.getsizeof(movie_titles_list_raw)}\n'
      f'-------------------------------------------------------------------'
      f'\n\n{movie_titles_list_raw.head(72)}')

Movie Title Dataset
-------------------
Shape: (17763, 2)
-------------------
Dtypes: 
Year     int64
Name    object
dtype: object
-------------------
Size: 1671134
-------------------------------------------------------------------

          Year                                               Name
Movie ID                                                         
1         2003                                    Dinosaur Planet
2         2004                         Isle of Man TT 2004 Review
3         1997                                          Character
4         1994                       Paula Abdul's Get Up & Dance
5         2004                           The Rise and Fall of ECW
...        ...                                                ...
68        2004                                        Invader Zim
69        2003                               WWE: Armageddon 2003
70        1999                              Tai Chi: The 24 Forms
71        1995                    Maya L

As we can see, row 72 has been taken care of, and so have the other rows that ran into the same issue.

##### Movie User Ratings Dataset

In [6]:
combined_data_1_list_raw = pd.read_csv('datasets/original-netflix-dataset/combined_data_1.txt', header=None, names=['User', 'Rating', 'Date'], usecols=[0, 1, 2])

# combined_data_2_list_raw = pd.read_csv('datasets/original-netflix-dataset/combined_data_2.txt', header=None, names=['User', 'Rating', 'Date'], usecols=[0, 1, 2])

# combined_data_3_list_raw = pd.read_csv('datasets/original-netflix-dataset/combined_data_3.txt', header=None, names=['User', 'Rating', 'Date'], usecols=[0, 1, 2])

# combined_data_4_list_raw = pd.read_csv('datasets/original-netflix-dataset/combined_data_4.txt', header=None, names=['User', 'Rating', 'Date'], usecols=[0, 1, 2])

combined_data_1_list_raw.head(5)

|   | User | Rating | Date |
|---|---|---|---|
| 0 | 1: | NaN | NaN |
| 1 | 1488844 | 3.0 | 2005-09-06 |
| 2 | 822109 | 5.0 | 2005-05-13 |
| 3 | 885013 | 4.0 | 2005-10-19 |
| 4 | 30878 | 4.0 | 2005-12-26 |


The first row is the issue we're faced with.

Instead of the dataset being:
- **'movie, user, rating, date'**

It is:
 - **'movie: user, rating, date'**
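
One possible way to turn this layout into a regular table is to forward-fill the movie ID from each header row. This is a rough sketch on a hypothetical miniature of the file (the `raw` frame and the second movie's row are made up for illustration), not the approach used in this notebook.

```python
import pandas as pd

# Hypothetical miniature of the raw layout: a "movie:" header row
# followed by that movie's rating rows.
raw = pd.DataFrame(
    {
        "User": ["1:", "1488844", "822109", "2:", "42"],
        "Rating": [float("nan"), 3.0, 5.0, float("nan"), 4.0],
        "Date": [None, "2005-09-06", "2005-05-13", None, "2005-01-01"],
    }
)

# Header rows are the ones without a rating.
is_header = raw["Rating"].isna()

# Keep the movie ID on header rows only, then forward-fill it downwards.
movie_ids = raw["User"].where(is_header).str.rstrip(":").ffill()

# Drop the header rows and attach the movie ID as a new column.
tidy = raw.loc[~is_header].assign(Movie=movie_ids[~is_header].astype(int))
print(tidy)
```

The forward fill avoids slicing the frame once per movie, which may matter for a file with thousands of header rows.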

In [7]:
# Find empty rows to slice dataframe for each movie
tmp_movies = combined_data_1_list_raw[combined_data_1_list_raw['Rating'].isna()]['User'].reset_index()
movie_indices = [[index, int(movie[:-1])] for index, movie in tmp_movies.values]

# Shift the movie_indices by one to get start and endpoints of all movies
shifted_movie_indices = deque(movie_indices)
shifted_movie_indices.rotate(-1)


# Gather all dataframes
user_data = []

# Iterate over all movies
for [df_id_1, movie_id], [df_id_2, next_movie_id] in zip(movie_indices, shifted_movie_indices):

    # Check if it is the last movie in the file
    if df_id_1<df_id_2:
        tmp_df = combined_data_1_list_raw.loc[df_id_1+1:df_id_2-1].copy()
    else:
        tmp_df = combined_data_1_list_raw.loc[df_id_1+1:].copy()

    # Create movie_id column
    tmp_df['Movie'] = movie_id

    # Append dataframe to list
    user_data.append(tmp_df)

# Combine all dataframes
combined_data_complete_list_final = pd.concat(user_data)

# Remove variables from memory
del user_data, tmp_movies, tmp_df, shifted_movie_indices, movie_indices, df_id_1, movie_id, df_id_2, next_movie_id, combined_data_1_list_raw
gc.collect()

0

Since there were some data type issues with this dataset as well, I decided to make some changes here also.

Namely:
- Rating
    - float64 -> int64
- User
    - object -> int64

In [8]:
before = sys.getsizeof(combined_data_complete_list_final)
print(f'Size of the dataset before: {before}')

combined_data_complete_list_final['Rating'] = combined_data_complete_list_final['Rating'].astype('int')
combined_data_complete_list_final['User'] = combined_data_complete_list_final['User'].astype('int')

after = sys.getsizeof(combined_data_complete_list_final)
# The percentage is relative to the original size of the dataset
print(f'Size of the dataset after: {after}\n\n'
      f'Memory usage reduced by: {abs(before - after)} ({int(abs(((before - after) * 100) / before))}% smaller)')

del before, after

Size of the dataset before: 3718210513
Size of the dataset after: 2381322652

Memory usage reduced by: 1336887861 (35% smaller)


That's a lot of memory saved if I do say so myself.

This will allow us to work with more data.
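
If even more memory needs to be saved, the integer columns could in principle be stored in narrower types. The sketch below assumes the value ranges from the dataset description (ratings 1 to 5, movie IDs up to 17770, customer IDs up to 2649429) and uses a made-up `toy` frame.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings in the same column layout as the notebook.
toy = pd.DataFrame(
    {
        "User": [1488844, 822109, 2649429],
        "Rating": [3, 5, 4],
        "Movie": [1, 1, 17770],
    }
).astype("int64")

before = toy.memory_usage(index=False).sum()

# Assumed ranges: ratings fit into int8, movie IDs into int16,
# customer IDs into int32.
compact = toy.astype({"Rating": np.int8, "Movie": np.int16, "User": np.int32})

after = compact.memory_usage(index=False).sum()
print(f"{before} bytes -> {after} bytes")
```

The values themselves stay the same; only the number of bytes per entry changes.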

In [9]:
print(f'Movie User-Ratings Dataset\n'
      f'--------------------------'
      f'\nShape: {combined_data_complete_list_final.shape}\n'
      f'--------------------------'
      f'\nDtypes: \n{combined_data_complete_list_final.dtypes}\n'
      f'--------------------------'
      f'\nSize: {sys.getsizeof(combined_data_complete_list_final)}\n'
      f'-------------------------------------------------------------------'
      f'\n\n{combined_data_complete_list_final.head(5)}')

Movie User-Ratings Dataset
--------------------------
Shape: (24053764, 4)
--------------------------
Dtypes: 
User       int64
Rating     int64
Date      object
Movie      int64
dtype: object
--------------------------
Size: 2381322652
-------------------------------------------------------------------

      User  Rating        Date  Movie
1  1488844       3  2005-09-06      1
2   822109       5  2005-05-13      1
3   885013       4  2005-10-19      1
4    30878       4  2005-12-26      1
5   823519       3  2004-05-03      1


In [10]:
print(combined_data_complete_list_final.isna().sum())

User      0
Rating    0
Date      0
Movie     0
dtype: int64


No null values, that's nice.

In [11]:
print(f'Number of unique users: {combined_data_complete_list_final.User.nunique()}\n\n'
      f'Number of unique movies: {combined_data_complete_list_final.Movie.nunique()}')

Number of unique users: 470758

Number of unique movies: 4499


### Dataset Manipulation

In order to deal with memory issues, I decided to limit the rows that should qualify for the recommender system.

In theory, this should also improve the quality of the system.

My approach is as follows:
- Remove movies which have too few ratings (not popular)
- Remove users who haven't submitted enough ratings (not as active)
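
The two filtering rules above can be sketched on toy data as follows. The `ratings` frame and both thresholds are illustrative assumptions; as in the notebook, both counts are taken from the unfiltered frame.

```python
import pandas as pd

# Hypothetical toy ratings; the thresholds below are illustrative only.
ratings = pd.DataFrame(
    {
        "User": [1, 1, 1, 2, 2, 3, 3, 3, 4],
        "Movie": [10, 20, 30, 10, 20, 10, 20, 30, 40],
        "Rating": [4, 3, 5, 2, 4, 5, 1, 3, 4],
    }
)

min_ratings_per_movie = 2
min_ratings_per_user = 3

# transform('count') gives every row the size of the group it belongs to.
movie_counts = ratings.groupby("Movie")["Rating"].transform("count")
user_counts = ratings.groupby("User")["Rating"].transform("count")

# Keep only rows whose movie AND user both pass their threshold.
kept = ratings[
    (movie_counts >= min_ratings_per_movie) & (user_counts >= min_ratings_per_user)
]
print(kept)
```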

In [12]:
row_count_limit = 300000

qualified_ratings = combined_data_complete_list_final[:row_count_limit]

movie_rating_counts = qualified_ratings.groupby('Movie')['Rating'].count()
minimum_movie_rating_count = movie_rating_counts.quantile(q=0.3)

# The user threshold is not derived from a quantile, I will just default it to 5
user_rating_counts = qualified_ratings.groupby('User')['Rating'].count()
# minimum_user_rating_count = user_rating_counts.quantile(q=0.75)

print(f'Minimum movie rating count threshold: {minimum_movie_rating_count}\n'
      f'Minimum user rating count threshold: {5}')

Minimum movie rating count threshold: 215.20000000000002
Minimum user rating count threshold: 5


In [13]:
qualified_ratings = qualified_ratings[qualified_ratings['Movie'].isin(movie_rating_counts[movie_rating_counts > minimum_movie_rating_count].index)]

qualified_ratings = qualified_ratings[qualified_ratings['User'].isin(user_rating_counts[user_rating_counts > 5].index)]

In [14]:
before = sys.getsizeof(combined_data_complete_list_final)

after = sys.getsizeof(qualified_ratings)

print(f'The dataset has been reduced from {row_count_limit} rows to {qualified_ratings.shape[0]} rows.\n')

del before, after, row_count_limit, combined_data_complete_list_final
gc.collect()

The dataset has been reduced from 300000 rows to 22250 rows.



0

### Collaborative Filtering

Since filtering requires quite a lot of memory, we build the pivot table from the reduced set of 22250 ratings produced in the previous step rather than from the full dataset.

In [15]:
rating_pivot_table = qualified_ratings.pivot_table(index='User', columns='Movie', values='Rating')
mean_user_ratings = rating_pivot_table.mean(axis=1)
mean_movie_ratings = rating_pivot_table.T.mean(axis=1)

print(f'Pivot table shape: {rating_pivot_table.shape}\n\n'
      f'{rating_pivot_table}')

Pivot table shape: (2989, 54)

Movie     1    3    5   6    8   10  12  15   16   17  ...  65  67   68  70  \
User                                                   ...                    
1333     NaN  4.0  NaN NaN  3.0 NaN NaN NaN  NaN  NaN  ... NaN NaN  NaN NaN   
2213     NaN  NaN  NaN NaN  NaN NaN NaN NaN  3.0  NaN  ... NaN NaN  NaN NaN   
3321     3.0  NaN  4.0 NaN  1.0 NaN NaN NaN  3.0  2.0  ... NaN NaN  NaN NaN   
3998     NaN  NaN  NaN NaN  NaN NaN NaN NaN  NaN  NaN  ... NaN NaN  NaN NaN   
5652     NaN  NaN  NaN NaN  3.0 NaN NaN NaN  NaN  NaN  ... NaN NaN  4.0 NaN   
...      ...  ...  ...  ..  ...  ..  ..  ..  ...  ...  ...  ..  ..  ...  ..   
2646515  NaN  NaN  NaN NaN  NaN NaN NaN NaN  NaN  NaN  ... NaN NaN  NaN NaN   
2646634  NaN  NaN  NaN NaN  NaN NaN NaN NaN  NaN  NaN  ... NaN NaN  NaN NaN   
2647197  NaN  NaN  NaN NaN  1.0 NaN NaN NaN  NaN  NaN  ... NaN NaN  NaN NaN   
2648287  NaN  NaN  NaN NaN  NaN NaN NaN NaN  NaN  NaN  ... NaN NaN  NaN NaN   
2648885  NaN  NaN  Na

In [16]:
# Note: the normalization step below is currently commented out.
# If it were enabled, users with only one rating or who had rated everything the same
# could not be standardized (their max - min is 0) and would have to be dropped.

# Replace missing ratings with 0, so that 0 means "this user did not rate this movie"
rating_pivot_table.fillna(0, inplace=True)

# Normalize the values
# rating_pivot_table = rating_pivot_table.apply(lambda x: (x-np.mean(x))/(np.max(x)-np.min(x)), axis=1)

print(f'Pivot table shape: {rating_pivot_table.shape}\n\n'
      f'{rating_pivot_table}')

Pivot table shape: (2989, 54)

Movie     1    3    5    6    8    10   12   15   16   17  ...   65   67   68  \
User                                                       ...                  
1333     0.0  4.0  0.0  0.0  3.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
2213     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  3.0  0.0  ...  0.0  0.0  0.0   
3321     3.0  0.0  4.0  0.0  1.0  0.0  0.0  0.0  3.0  2.0  ...  0.0  0.0  0.0   
3998     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
5652     0.0  0.0  0.0  0.0  3.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  4.0   
...      ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...   
2646515  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
2646634  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
2647197  0.0  0.0  0.0  0.0  1.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0   
2648287  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  ...  0.0  0.0  0.0 

In [17]:
# This is how to get the real id from the rating_pivot_table and vice-versa.
# Useful in the future for user_similarity decoding.
print(rating_pivot_table.index[0])
print(rating_pivot_table.index.get_loc(1333))

1333
0


#### User Based Collaborative Filtering

User-based collaborative filtering is a method of making recommendations based on the preferences of similar users.
It works by identifying users who have similar tastes and interests, and then recommending items that those similar users have liked.

![user-based-collaborative-filtering](images/user_based.png)
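
To make the idea concrete, here is a small sketch that finds the most similar user with cosine similarity. The `ratings` matrix and the `cosine` helper are made up for illustration; 0 is assumed to mean "not rated", as in the zero-filled pivot table.

```python
import numpy as np

# Hypothetical user-item matrix: rows are users, columns are movies,
# 0 means "not rated".
ratings = np.array(
    [
        [5.0, 4.0, 0.0, 1.0],  # user 0
        [4.0, 5.0, 0.0, 0.0],  # user 1
        [0.0, 1.0, 5.0, 4.0],  # user 2
    ]
)


def cosine(u, v):
    """Cosine similarity between two rating vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


target = 0
# Compare the target user with every other user (never with themselves).
scores = {
    other: cosine(ratings[target], ratings[other])
    for other in range(len(ratings))
    if other != target
}
most_similar = max(scores, key=scores.get)
print(scores, most_similar)
```

Users 0 and 1 rated the same movies highly, so user 1 ends up as the nearest neighbour of user 0.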

In [18]:
user_similarity = cosine_similarity(rating_pivot_table)

In [19]:
print(f'User similarity shape: {user_similarity.shape}\n\n'
      f'{user_similarity}')

User similarity shape: (2989, 2989)

[[1.         0.19131158 0.31151662 ... 0.55865556 0.43278921 0.2318231 ]
 [0.19131158 1.         0.63869479 ... 0.3301611  0.49404842 0.44918569]
 [0.31151662 0.63869479 1.         ... 0.54627928 0.55042637 0.43374812]
 ...
 [0.55865556 0.3301611  0.54627928 ... 1.         0.59430562 0.37426366]
 [0.43278921 0.49404842 0.55042637 ... 0.59430562 1.         0.39385955]
 [0.2318231  0.44918569 0.43374812 ... 0.37426366 0.39385955 1.        ]]


In [20]:
print(qualified_ratings)

           User  Rating        Date  Movie
1       1488844       3  2005-09-06      1
4         30878       4  2005-12-26      1
8       1248029       3  2004-04-22      1
21      1080361       3  2005-03-28      1
23       558634       4  2004-12-14      1
...         ...     ...         ...    ...
300030  1661344       4  2003-08-19     77
300035  1045879       2  2003-05-03     77
300051   223160       2  2004-05-18     77
300058  1814516       4  2004-03-23     77
300073   972399       2  2005-04-04     77

[22250 rows x 4 columns]


In [21]:
def user_based_cf_recommender(user_id):
    # Since user_id != index in between the dataframes,
    # we need a way to decode these values.

    # Within this function I did it with the variables:
    # user_id_to_index and user_index_to_user_id
    user_id_to_index = rating_pivot_table.index.get_loc(user_id)

    # Drop the user's own entry explicitly instead of assuming it is sorted first,
    # because other users can also reach a similarity of 1.0.
    scores = [(index, score) for index, score in enumerate(user_similarity[user_id_to_index])
              if index != user_id_to_index]
    sorted_scores = sorted(scores, key=lambda x: x[1], reverse=True)[:9]

    user_index_to_user_id = rating_pivot_table.index[sorted_scores[0][0]]
    similar_user_id = user_index_to_user_id
    print(f'The most similar user to user {user_id} is user {similar_user_id} with a score of {sorted_scores[0][1]}.\n')
    print('Other similar users include:')
    for user, score in sorted_scores[1:]:
        print(f'ID: {rating_pivot_table.index[user]}\tScore: {score}')

    # Retrieve the movie IDs of the movies that the user has already seen
    user_seen_movies = list(set(qualified_ratings[qualified_ratings.User == user_id]['Movie'].values))
    print(f'\nUser {user_id} has seen these movies: {user_seen_movies}.')

    # Retrieve the movie IDs that the person most similar to the user has seen before
    similar_person_seen_movies = list(set(qualified_ratings[qualified_ratings.User == similar_user_id]['Movie'].values))
    print(f'User {similar_user_id} has seen these movies: {similar_person_seen_movies}.')

    # Return up to 9 movies which the person most similar to the user has seen, but the user has not yet seen.
    # (Difference between the lists)
    recommended_movie_ids = list(set(similar_person_seen_movies) - set(user_seen_movies))[:9]
    print(f'\nThese are the movies which we can recommend (unseen movies): {recommended_movie_ids}.')

    return movie_titles_list_raw.loc[recommended_movie_ids]

In [22]:
user_based_cf_recommender(1488844)

The most similar user to user 1488844 is user 427967 with a score of 0.8388704928078611.

Other similar users include:
ID: 1137159	Score: 0.788009048074417
ID: 2276333	Score: 0.7865834032652848
ID: 544833	Score: 0.7856742013183862
ID: 1661600	Score: 0.7803898228929836
ID: 2402139	Score: 0.7602522864777522
ID: 1341214	Score: 0.7516283835227737
ID: 2457781	Score: 0.7465910488440874
ID: 1084999	Score: 0.7423923386456233

User 1488844 has seen these movies: [1, 8, 44, 76, 17, 58, 30].
User 427967 has seen these movies: [8, 44, 76, 58, 28, 30].

These are the movies which we can recommend (unseen movies): [28].


| Movie ID | Year | Name |
|---|---|---|
| 28 | 2002 | Lilo and Stitch |


Note: This only produces recommendations for some users.

My guess is that, now that I'm using a dataset that is much larger than before, there are far too many customers with the same ratings on the same movies, so the most similar user often has no movie left that the target user hasn't seen yet.

In case something goes wrong, I'm providing a screenshot of a working solution below.

![proof](images/proof.png)
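
One possible way to make the result less dependent on a single neighbour would be to pool the unseen movies of several similar users. This is only a sketch of that idea on made-up data; `recommend_from_neighbours`, `seen` and `neighbours` are hypothetical names that do not exist in the notebook.

```python
from collections import Counter

# Hypothetical data: movies seen per user, and neighbours sorted by similarity.
seen = {
    "target": {1, 8, 44},
    "a": {1, 8, 44},  # identical taste, contributes nothing new
    "b": {8, 44, 28},
    "c": {1, 28, 30},
}
neighbours = ["a", "b", "c"]  # most similar first


def recommend_from_neighbours(user, neighbours, seen, top_n=5):
    """Count how often each unseen movie appears among the neighbours."""
    votes = Counter()
    for other in neighbours:
        votes.update(seen[other] - seen[user])
    # most_common sorts by vote count, highest first
    return [movie for movie, _ in votes.most_common(top_n)]


print(recommend_from_neighbours("target", neighbours, seen))
```

Here neighbour "a" has nothing new to offer, but "b" and "c" still contribute movies 28 and 30.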

#### Item Based Collaborative Filtering

Item-based collaborative filtering makes recommendations based on the similarity between items, which is computed from the ratings users have given them.
The fundamental assumption for this type of collaborative filtering is that a user should, in theory, give similar ratings to similar movies.
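
As a small illustration, the sketch below computes item-item cosine similarities for a made-up pivot table and keeps the movie IDs as labels, so a movie can be looked up by its real ID even when the IDs are not consecutive. The `pivot` frame and its values are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical pivot table: users as rows, non-consecutive movie IDs as columns.
pivot = pd.DataFrame(
    [[5, 4, 0], [4, 5, 1], [0, 1, 5]],
    index=pd.Index([101, 102, 103], name="User"),
    columns=pd.Index([1, 3, 8], name="Movie"),
    dtype=float,
)

# Cosine similarity between movie columns, computed with plain NumPy:
# normalise every movie vector to length 1, then take dot products.
items = pivot.to_numpy().T
unit = items / np.linalg.norm(items, axis=1, keepdims=True)
item_sim = pd.DataFrame(unit @ unit.T, index=pivot.columns, columns=pivot.columns)

# Look up by real movie ID, drop the movie itself, sort by similarity.
similar_to_3 = item_sim.loc[3].drop(3).sort_values(ascending=False)
print(similar_to_3)
```

Because the similarity matrix carries the movie IDs as its index, no manual translation between IDs and row positions is needed.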

##### Version 1

In [23]:
item_similarity = cosine_similarity(rating_pivot_table.T)

In [24]:
print(item_similarity)

[[1.         0.06436628 0.1459996  ... 0.07357741 0.12991025 0.14930844]
 [0.06436628 1.         0.05353381 ... 0.10470684 0.08889538 0.07643782]
 [0.1459996  0.05353381 1.         ... 0.08027002 0.11995224 0.11848327]
 ...
 [0.07357741 0.10470684 0.08027002 ... 1.         0.03461641 0.07907354]
 [0.12991025 0.08889538 0.11995224 ... 0.03461641 1.         0.27704767]
 [0.14930844 0.07643782 0.11848327 ... 0.07907354 0.27704767 1.        ]]


In [25]:
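def item_based_cf_recommender(movie_name):
    # This time I decided to work with a movie name, so we have to decode that first.
    # Caveat: 'Movie ID - 1' is treated as the row position in item_similarity, which only
    # holds while the pivot table columns are the consecutive IDs 1, 2, 3, ...
    # (rating_pivot_table.columns.get_loc(movie_id) would give the actual position).
    movie_idx = movie_titles_list_raw[movie_titles_list_raw['Name'].str.contains(movie_name)].index[0] - 1

    # Similarity of the chosen movie to every movie in the pivot table
    specific_movie_correlation = item_similarity[movie_idx]

    # Most similar movies first; the index holds positions, not movie IDs
    result = pd.DataFrame({'Score': specific_movie_correlation}).sort_values('Score', ascending=False)

    return result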

Some movies we can choose from for the testing of the function.

In [26]:
for i in range(1, 28):
    print(movie_titles_list_raw.loc[i]['Name'])

Dinosaur Planet
Isle of Man TT 2004 Review
Character
Paula Abdul's Get Up & Dance
The Rise and Fall of ECW
Sick
8 Man
What the #$*! Do We Know!?
Class of Nuke 'Em High 2
Fighter
Full Frame: Documentary Shorts
My Favorite Brunette
Lord of the Rings: The Return of the King: Extended Edition: Bonus Material
Nature: Antarctica
Neil Diamond: Greatest Hits Live
Screamers
7 Seconds
Immortal Beloved
By Dawn's Early Light
Seeta Aur Geeta
Strange Relations
Chump Change
Clifford: Clifford Saves the Day! / Clifford's Fluffiest Friend Cleo
My Bloody Valentine
Inspector Morse 31: Death Is Now My Neighbour
Never Die Alone
Sesame Street: Elmo's World: The Street We Live On


In [27]:
user = 1488844
movie = "Character"
item_based_df = item_based_cf_recommender(movie)

In [28]:
item_based_df

|    | Score |
|---|---|
| 2 | 1.0 |
| 23 | 0.196157 |
| 34 | 0.161213 |
| 46 | 0.149077 |
| 0 | 0.146 |
| 18 | 0.145398 |
| 20 | 0.139675 |
| 26 | 0.138521 |
| 41 | 0.133013 |
| 52 | 0.119952 |


In [29]:
print(f'The most similar movie to "{movie}" is "{movie_titles_list_raw.iloc[item_based_df.index[1]]["Name"]}".')

The most similar movie to "Character" is "My Bloody Valentine".


##### Version 2

In [30]:
def item_based_cf_recommender_2_all_recommendations(user_id, movie_name):
    knn = NearestNeighbors(metric='cosine', n_neighbors=rating_pivot_table.shape[1])
    knn.fit(rating_pivot_table.T)
    distances, indices = knn.kneighbors(rating_pivot_table.T)

    movie_idx = movie_titles_list_raw[movie_titles_list_raw['Name'].str.contains(movie_name)].index[0] - 1

    # Positions shifted by one to get movie IDs (assumes movie ID = position + 1)
    indices = pd.DataFrame(indices + 1)
    similar_movies = indices.loc[movie_idx, :]

    # Inverted distances: 1 - cosine distance, i.e. the cosine similarity
    inverted_distances = pd.DataFrame(1 - distances)
    inverted_movie_distances = inverted_distances.loc[movie_idx, :]

    index = 1

    # test_df = pd.Series(
    #    mean_movie_ratings.loc[movie_idx + 1] + rating_pivot_table.T.iloc[similar_movies].subtract(mean_movie_ratings.loc[similar_movies], axis='index').mul(
    #       inverted_movie_distances, axis='index').sum(axis='index') / sum(inverted_movie_distances), name='recommendation')

    print_out = f'Movies most similar to "{movie_name}"'

    print(f'{print_out}')
    print('-' * len(print_out))

    for movie in similar_movies:
        print(f'#{index} - {movie_titles_list_raw.loc[movie]["Name"]}\n'
              f'(Movie ID: {movie}\tDistance: {inverted_movie_distances.loc[index - 1]})\n')
              # f'Predicted Rating: {1})\n')
        index += 1

In [31]:
# mean_user_ratings.loc[user]

In [32]:
# print(mean_movie_ratings)

In [33]:
item_based_cf_recommender_2_all_recommendations(user, movie)

Movies most similar to "Character"
----------------------------------
#1 - Character
(Movie ID: 3	Distance: 1.0)

#2 - My Bloody Valentine
(Movie ID: 24	Distance: 0.1961565351679424)

#3 - Ferngully 2: The Magical Rescue
(Movie ID: 35	Distance: 0.1612133403523368)

#4 - The Bad and the Beautiful
(Movie ID: 47	Distance: 0.14907676954549254)

#5 - Dinosaur Planet
(Movie ID: 1	Distance: 0.14599959559572562)

#6 - By Dawn's Early Light
(Movie ID: 19	Distance: 0.1453983399897939)

#7 - Strange Relations
(Movie ID: 21	Distance: 0.13967498100234965)

#8 - Sesame Street: Elmo's World: The Street We Live On
(Movie ID: 27	Distance: 0.13852070838627473)

#9 - Searching for Paradise
(Movie ID: 42	Distance: 0.13301306121428857)

#10 - The Bonesetter
(Movie ID: 53	Distance: 0.11995224185897024)

#11 - We're Not Married
(Movie ID: 54	Distance: 0.11848326583061597)

#12 - Neil Diamond: Greatest Hits Live
(Movie ID: 15	Distance: 0.11827990895640306)

#13 - Immortal Beloved
(Movie ID: 18	Distance: 0.115

If we compare these results to output [28], then we can see that they are the same.

The reason I made a second version for the same filtering type is that I wanted to get predicted ratings for a specific user, but I ran into many issues throughout the project and spent way too much time trying to fix them.
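
For reference, one common way to predict a rating in item-based filtering is a similarity-weighted average of the ratings the user has already given. The sketch below shows that idea only; `predict_rating` and its inputs are hypothetical and were not part of this project.

```python
import numpy as np


def predict_rating(user_ratings, similarities):
    """Similarity-weighted average of the ratings a user has already given.

    user_ratings: the user's ratings per movie, 0 meaning "not rated"
    similarities: similarity of each of those movies to the target movie
    """
    user_ratings = np.asarray(user_ratings, dtype=float)
    similarities = np.asarray(similarities, dtype=float)
    rated = user_ratings > 0
    weight_sum = similarities[rated].sum()
    if weight_sum == 0:
        return float("nan")  # nothing to base a prediction on
    return float((user_ratings[rated] * similarities[rated]).sum() / weight_sum)


# Hypothetical numbers: the user rated two movies (4 and 2 stars) that are
# 0.75 and 0.25 similar to the target movie; the third movie is unrated.
print(predict_rating([4, 2, 0], [0.75, 0.25, 0.5]))
```

Movies that are more similar to the target movie pull the prediction more strongly towards the rating the user gave them.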