In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import pandas as pd
import numpy as np


# Intro


**Notes**

Most of the material comes from https://developers.google.com/machine-learning/recommendation/overview/candidate-generation. If you want to go further later, you can take a look at http://nicolas-hug.com/blog/matrix_facto_3. You are not expected to read either link for the interviews or to complete the test.

**Context**: 

We want to build a movie recommender in order to find new movies to watch during the lockdown. We will base our work on a variation of the MovieLens dataset. 
The data consists of the movies seen by the users, some information about the movies, and some information about the users. The problem is to predict which movies a given user might like.

We first present a naive approach so that you can familiarize yourself with the problem and see how it might be solved.

**Task**:

The code presented is a first implementation but has a number of shortcomings in its structure and features (more on that in the conclusion). Your task is to refactor it, so as to be one step closer to "clean" code.

**Evaluation**:

Our goal here is threefold:
- See how you understand a problem and adapt to an already given approach to tackle it.
- See how you can design new features.
- See how you manipulate Python code: understanding, refactoring ideas, etc.

The projects will be evaluated on the quality of the source code produced.

# The data

First, let's load some data.

In [3]:
from content_based_filtering.helpers.dataloader import load_users, load_movies, load_ratings
from config import *

In [4]:
users = load_users(USERS)
users.head()

user_id,gender,age,occupation,zip_code,generalized_zip_code
0,F,1,10,48067,480
1,M,56,16,70072,700
2,M,25,15,55117,551
3,M,45,7,2460,24
4,M,25,20,55455,554


In [5]:
movies = load_movies(MOVIES)
movies.head()

movie_id,title,year,Animation,Children's,Comedy,Adventure,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Sci-Fi,Documentary,War,Musical,Mystery,Film-Noir,Western
0,Toy Story,1995,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Jumanji,1995,0,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Grumpier Old Men,1995,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
3,Waiting to Exhale,1995,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
4,Father of the Bride Part II,1995,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [6]:
movies.sort_index()

movie_id,title,year,Animation,Children's,Comedy,Adventure,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Sci-Fi,Documentary,War,Musical,Mystery,Film-Noir,Western
0,Toy Story,1995,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Jumanji,1995,0,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Grumpier Old Men,1995,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
3,Waiting to Exhale,1995,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
4,Father of the Bride Part II,1995,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3878,Meet the Parents,2000,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3879,Requiem for a Dream,2000,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
3880,Tigerland,2000,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
3881,Two Family House,2000,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0


In [7]:
movies.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3883 entries, 0 to 3882
Data columns (total 20 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   title        3883 non-null   object
 1   year         3883 non-null   int64 
 2   Animation    3883 non-null   int8  
 3   Children's   3883 non-null   int8  
 4   Comedy       3883 non-null   int8  
 5   Adventure    3883 non-null   int8  
 6   Fantasy      3883 non-null   int8  
 7   Romance      3883 non-null   int8  
 8   Drama        3883 non-null   int8  
 9   Action       3883 non-null   int8  
 10  Crime        3883 non-null   int8  
 11  Thriller     3883 non-null   int8  
 12  Horror       3883 non-null   int8  
 13  Sci-Fi       3883 non-null   int8  
 14  Documentary  3883 non-null   int8  
 15  War          3883 non-null   int8  
 16  Musical      3883 non-null   int8  
 17  Mystery      3883 non-null   int8  
 18  Film-Noir    3883 non-null   int8  
 19  Western      3883 non-null   int8  

In [8]:
ratings = load_ratings(RATINGS)
ratings.head()

,user_id,movie_id,rating
0,0,1176,5
1,0,655,3
2,0,902,3
3,0,3339,4
4,0,2286,5


# Content-based Filtering

Content-based filtering uses item features to recommend other items similar to what the user likes, based on their previous actions or explicit feedback. We don't use other users' information!

For example, if user `A` liked `Harry Potter 1`, they will probably like `Harry Potter 2`.

In [9]:
%%html
<img src='https://miro.medium.com/max/1642/1*BME1JjIlBEAI9BV5pOO5Mg.png' height="300" width="250"/>

What are similar movies? To answer this question, we need to build a similarity measure. 

## Features

This measure will operate on the characteristics (**features**) of the movies to determine which ones are close. In our case, we have access to the genres of the movies. For example, the genres of `Toy Story` are: `Animation`, `Children's` and `Comedy`. This is represented as follows in our dataset:

In [10]:
genre_cols = movies.columns.drop(["title", "year"])
genre_cols

Index(['Animation', 'Children's', 'Comedy', 'Adventure', 'Fantasy', 'Romance',
       'Drama', 'Action', 'Crime', 'Thriller', 'Horror', 'Sci-Fi',
       'Documentary', 'War', 'Musical', 'Mystery', 'Film-Noir', 'Western'],
      dtype='object')

In [11]:
genre_and_title_cols = movies.columns.drop("year")

In [12]:
movies[genre_and_title_cols].head()

movie_id,title,Animation,Children's,Comedy,Adventure,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Sci-Fi,Documentary,War,Musical,Mystery,Film-Noir,Western
0,Toy Story,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Jumanji,0,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Grumpier Old Men,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
3,Waiting to Exhale,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
4,Father of the Bride Part II,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


## Similarity

Now that we have some features, we will try to find a function that measures similarity. The similarity function will take two items (two lists of features) and return a number that grows with their similarity. 

For the following, we will consider that the similarity between two movies is the number of genres they have in common.

Here is an example with `Toy Story` and `E.T.`:

In [13]:
toy_story_genres = movies[genre_and_title_cols].loc[movies.title == 'Toy Story'][genre_cols].iloc[0]
toy_story_genres

Animation      1
Children's     1
Comedy         1
Adventure      0
Fantasy        0
Romance        0
Drama          0
Action         0
Crime          0
Thriller       0
Horror         0
Sci-Fi         0
Documentary    0
War            0
Musical        0
Mystery        0
Film-Noir      0
Western        0
Name: 0, dtype: int8

In [14]:
et_genres = movies[genre_and_title_cols].loc[movies.title == 'E.T. the Extra-Terrestrial'][genre_cols].iloc[0]
et_genres

Animation      0
Children's     1
Comedy         0
Adventure      0
Fantasy        1
Romance        0
Drama          1
Action         0
Crime          0
Thriller       0
Horror         0
Sci-Fi         1
Documentary    0
War            0
Musical        0
Mystery        0
Film-Noir      0
Western        0
Name: 1081, dtype: int8

In [15]:
et_genres.values * toy_story_genres

Animation      0
Children's     1
Comedy         0
Adventure      0
Fantasy        0
Romance        0
Drama          0
Action         0
Crime          0
Thriller       0
Horror         0
Sci-Fi         0
Documentary    0
War            0
Musical        0
Mystery        0
Film-Noir      0
Western        0
Name: 0, dtype: int8

In [16]:
(et_genres.values * toy_story_genres).sum() # scalar product

1

So our similarity measure returns `1` for these two movies. 

Let's see another example where we compare `Toy Story` and `Pocahontas`:

In [17]:
pocahontas_genres = movies[genre_and_title_cols].loc[movies.title == 'Pocahontas'][genre_cols].iloc[0]
(pocahontas_genres.values * toy_story_genres).sum()

2

This tells us that `Pocahontas` is closer to `Toy Story` than `E.T.` is, which makes sense.


## Scaling up

OK, that's a nice measure. Now we are going to scale it up to all the movies of our dataset. To do so smartly, let's look at the operation we just did from a mathematical point of view. We will think of the list of features of a movie as a row vector `V`. Then, our similarity measure between `Toy Story` and `E.T.` becomes:
$ V_{ToyStory} \cdot V_{ET}^{T}$

More generally, the similarity measure between a movie `i` and another movie `j` is: $ V_{i} \cdot V_{j}^{T}$
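
Written out component by component, the product simply counts the genre columns in which both vectors hold a 1. Here $K$ is an assumed name for the number of genre columns (18 in this dataset) and $V_{i,k}$ denotes the $k$-th entry of $V_{i}$:

$$
V_{i} \cdot V_{j}^{T} = \sum_{k=1}^{K} V_{i,k} \, V_{j,k}
$$

For `Toy Story` and `E.T.`, the only column where both entries equal 1 is `Children's`, so the sum is 1, which matches the value computed above.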

Now we can think of `movies` as a matrix containing all the feature vectors describing the movies. Here is how our similarity measure looks in this context:

![](imgs/dot_product_matrices.png)

To obtain the similarity between all pairs of movies in our dataset, we compute the dot product of the `movies` matrix with its transpose.
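
As a minimal sketch of this matrix product, assuming the feature matrix is a NumPy array of 0/1 flags: the function name `pairwise_dot_similarity` and the toy data are hypothetical, and the actual helper used in this notebook may be implemented differently.

```python
import numpy as np


def pairwise_dot_similarity(features: np.ndarray) -> np.ndarray:
    """Return S where S[i, j] is the number of features shared by rows i and j."""
    features = np.asarray(features)
    # Multiplying the matrix by its transpose computes every row-by-row dot product at once.
    return features @ features.T


# Toy example (made-up data): three movies described by four binary genre flags.
toy_features = np.array(
    [
        [1, 1, 1, 0],
        [0, 1, 0, 1],
        [1, 1, 0, 0],
    ],
    dtype=np.int8,
)

toy_similarity = pairwise_dot_similarity(toy_features)
print(toy_similarity)
```

The result is symmetric, and each diagonal entry is the number of genres of the corresponding movie, since a movie shares all of its genres with itself.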

In [18]:
from content_based_filtering.helpers.similarity import pairsimilarity

In [19]:
similarity = pairsimilarity(movies[genre_cols].values)
similarity.shape

(3883, 3883)

In [20]:
similarity

array([[3, 1, 1, ..., 0, 0, 0],
       [1, 3, 0, ..., 0, 0, 0],
       [1, 0, 2, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 1, 1, 1],
       [0, 0, 0, ..., 1, 1, 1],
       [0, 0, 0, ..., 1, 1, 2]], dtype=int8)

We can now get the similarity between `Toy Story` and any other movie of our dataset:

In [21]:
similarity_with_toy_story = similarity[0] # 0 is Toy Story
similarity_with_toy_story

array([3, 1, 1, ..., 0, 0, 0], dtype=int8)

In [22]:
for i in range(10):
    print(f"Similarity between Toy story and {movies.iloc[i]['title']} (index {i}) is {similarity_with_toy_story[i]}")

Similarity between Toy story and Toy Story (index 0) is 3
Similarity between Toy story and Jumanji (index 1) is 1
Similarity between Toy story and Grumpier Old Men (index 2) is 1
Similarity between Toy story and Waiting to Exhale (index 3) is 1
Similarity between Toy story and Father of the Bride Part II (index 4) is 1
Similarity between Toy story and Heat (index 5) is 0
Similarity between Toy story and Sabrina (index 6) is 1
Similarity between Toy story and Tom and Huck (index 7) is 1
Similarity between Toy story and Sudden Death (index 8) is 0
Similarity between Toy story and GoldenEye (index 9) is 0


## A bit of polishing

### Helpers:

We also built some helpers to handle the movies dataset:

In [23]:
from content_based_filtering.helpers.movies import get_movie_id, get_movie_name, get_movie_year
    
print (get_movie_id(movies, 'Toy Story'))
print (get_movie_id(movies, 'Die Hard'))

print (get_movie_name(movies, 0))
print (get_movie_name(movies, 1000))
print (get_movie_year(movies, 1000))

[0]
[1023]
Toy Story
Parent Trap, The
1961


### Finding similar movies:
Here is a function that returns the movies most similar to a given movie:
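
For intuition, a top-k lookup over a similarity matrix can be sketched as below. This is not the helper's actual code: `top_k_similar` is a hypothetical name, excluding the item itself is an assumption, and ties are broken by lowest index here, so the order among equally similar movies may differ from the outputs shown in this notebook.

```python
import numpy as np


def top_k_similar(similarity: np.ndarray, item_ids, top: int = 10) -> np.ndarray:
    """For each id in item_ids, return the ids of the `top` most similar other items."""
    item_ids = np.asarray(item_ids)
    # Work on a copy in a wide integer type so the sentinel below cannot wrap around.
    scores = similarity[item_ids].astype(np.int64)
    # Assumption: an item should never be recommended as similar to itself.
    scores[np.arange(len(item_ids)), item_ids] = -1
    # Sort by decreasing score; the stable sort keeps the lowest index first among ties.
    order = np.argsort(-scores, axis=1, kind="stable")
    return order[:, :top]


# Made-up 3x3 similarity matrix for illustration.
example_similarity = np.array([[3, 1, 2], [1, 2, 1], [2, 1, 2]])
neighbours = top_k_similar(example_similarity, [0, 1], top=2)
print(neighbours)
```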

In [24]:
from content_based_filtering.helpers.similarity import get_most_similar_by_id, get_most_similar_movies

In [25]:
get_movie_id(movies, 'Toy Story')

array([0])

In [26]:
index = np.concatenate([
    get_movie_id(movies, 'Toy Story'),
    get_movie_id(movies, 'Die Hard')
])

get_most_similar_by_id(similarity, index, top=10)

array([[ 667, 3685, 3682, 2009, 2011, 2012, 2033, 2072, 2073, 3542],
       [1533, 3628,   96, 1244, 3198, 1483, 1848, 2406, 2733,  289]])

In [27]:
get_movie_id(movies, 'Die Hard')

array([1023])

In [28]:
index = np.concatenate([
    get_movie_id(movies, 'Psycho'),
])

get_most_similar_by_id(similarity, index, top=10)

array([[3593, 2923, 1312, 3407, 1957, 1927, 1926, 1925,  732,   69],
       [  69, 1599, 2757, 3701, 2044, 2049, 3695,  855, 2052, 2053]])

In [29]:
get_most_similar_movies(movies, similarity, ["Psycho", "Toy Story", "Die Hard"], top=10)

,title,year,1,2,3,4,5,6,7,8,9,10
1201,Psycho,1960,"(3593, Puppet Master III: Toulon's Revenge)","(2923, Rawhead Rex)","(1312, Believers, The)","(3407, Jacob's Ladder)","(1957, Disturbing Behavior)","(1927, Poltergeist III)","(1926, Poltergeist II: The Other Side)","(1925, Poltergeist)","(732, Thinner)","(69, From Dusk Till Dawn)"
2320,Psycho,1998,"(69, From Dusk Till Dawn)","(1599, Devil's Advocate, The)","(2757, 13th Warrior, The)","(3701, Dreamscape)","(2044, Graveyard Shift)","(2049, Dead Zone, The)","(3695, F/X 2)","(855, Bound)","(2052, Cujo)","(2053, Children of the Corn)"
0,Toy Story,1995,"(667, Space Jam)","(3685, Adventures of Rocky and Bullwinkle, The)","(3682, Chicken Run)","(2009, Jungle Book, The)","(2011, Lady and the Tramp)","(2012, Little Mermaid, The)","(2033, Steamboat Willie)","(2072, American Tail, An)","(2073, American Tail: Fievel Goes West, An)","(3542, Saludos Amigos)"
1023,Die Hard,1988,"(1533, Face/Off)","(3628, Predator 2)","(96, Shopping)","(1244, Diva)","(3198, Mariachi, El)","(1483, Breakdown)","(1848, Armageddon)","(2406, 52 Pick-Up)","(2733, Tequila Sunrise)","(289, Outbreak)"


### Giving a recommendation:

And finally, let's find some movies to recommend based on previously liked movies:
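
The first step, selecting each user's best-rated movies, might look like the sketch below. It assumes a ratings table with `user_id`, `movie_id` and `rating` columns, as shown earlier; `best_ratings_per_user` is a hypothetical name and the toy data is made up, so this is only one possible way to do what the helper does.

```python
import pandas as pd


def best_ratings_per_user(ratings: pd.DataFrame, user_ids, top: int = 5) -> pd.DataFrame:
    """Return, for each requested user, their `top` highest-rated movies."""
    selected = ratings[ratings["user_id"].isin(user_ids)]
    # Order by user, then by decreasing rating.
    ordered = selected.sort_values(["user_id", "rating"], ascending=[True, False])
    # Keep the first `top` rows of each user's block.
    return ordered.groupby("user_id").head(top).reset_index(drop=True)


# Made-up ratings for two users.
toy_ratings = pd.DataFrame(
    {
        "user_id": [0, 0, 0, 1, 1],
        "movie_id": [10, 11, 12, 10, 13],
        "rating": [3, 5, 4, 2, 5],
    }
)

best = best_ratings_per_user(toy_ratings, [0, 1], top=2)
print(best)
```

Each of these best-rated movies can then be looked up in the similarity matrix to collect candidate recommendations.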

In [30]:
from content_based_filtering.helpers.movies import get_user_best_ratings
from content_based_filtering.helpers.recommendations import get_recommendations

In [31]:
ratings

,user_id,movie_id,rating
0,0,1176,5
1,0,655,3
2,0,902,3
3,0,3339,4
4,0,2286,5
...,...,...,...
1000204,6039,1075,1
1000205,6039,1078,5
1000206,6039,558,5
1000207,6039,1080,4


In [32]:
get_user_best_ratings(ratings, [0,1, 999], top=5)

user_id,,movie_id,rating
0,0,1176,5
0,1,1016,5
0,2,0,5
0,3,3036,5
0,4,1892,5
1,0,1336,5
1,1,2167,5
1,2,3078,5
1,3,1273,5
1,4,108,5


In [33]:
get_recommendations(movies, ratings, similarity, [0, 1,999], top=5)

user_id,,movie_id,rating,title,year,1,2,3,4,5
0,0,1176,5,One Flew Over the Cuckoo's Nest,1975,"(3882, Contender, The)","(2452, Airport 1975)","(872, Sweet Nothing)","(2437, Other Sister, The)","(871, Trigger Effect, The)"
0,1,1016,5,Dumbo,1941,"(1642, Anastasia)","(360, Lion King, The)","(1459, Cats Don't Dance)","(773, Hunchback of Notre Dame, The)","(1526, Hercules)"
0,2,0,5,Toy Story,1995,"(667, Space Jam)","(3685, Adventures of Rocky and Bullwinkle, The)","(3682, Chicken Run)","(2009, Jungle Book, The)","(2011, Lady and the Tramp)"
0,3,3036,5,Awakenings,1990,"(3882, Contender, The)","(2452, Airport 1975)","(872, Sweet Nothing)","(2437, Other Sister, The)","(871, Trigger Effect, The)"
0,4,1892,5,Rain Man,1988,"(3882, Contender, The)","(2452, Airport 1975)","(872, Sweet Nothing)","(2437, Other Sister, The)","(871, Trigger Effect, The)"
1,0,1336,5,Shine,1996,"(2858, Brief Encounter)","(1156, Cinema Paradiso)","(193, Something to Talk About)","(3086, Anna and the King)","(3599, Romeo and Juliet)"
1,1,2167,5,Simon Birch,1998,"(3882, Contender, The)","(2452, Airport 1975)","(872, Sweet Nothing)","(2437, Other Sister, The)","(871, Trigger Effect, The)"
1,2,3078,5,"Green Mile, The",1999,"(3882, Contender, The)","(896, North by Northwest)","(1778, Ratchet)","(227, Dolores Claiborne)","(2133, Lifeboat)"
1,3,1273,5,Gandhi,1982,"(3882, Contender, The)","(2452, Airport 1975)","(872, Sweet Nothing)","(2437, Other Sister, The)","(871, Trigger Effect, The)"
1,4,108,5,Braveheart,1995,"(1178, Star Wars: Episode V - The Empire Strik...","(1545, G.I. Jane)","(3684, Patriot, The)","(3574, Fighting Seabees, The)","(2993, Longest Day, The)"


# Conclusion:

The code presented is a first implementation, but it has a number of shortcomings that prevent multiple MLEs and Data Scientists from collaborating on it:
- New features cannot be introduced easily, mainly because the code is just a bunch of functions in one file.
- The code cannot be scaled to other datasets or variations of the task.
- There is no evaluation of performance.
- There is no testing.

Additionally, we could think of some features to add. For example, what about looking at similar users to find a recommendation for our targeted user?
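
As one possible starting point for the missing evaluation, the sketch below computes precision at k: the fraction of the top k recommended movies that the user actually liked. The function name, the choice of metric and the notion of "liked" (for example, a held-out rating of 4 or 5) are assumptions and are not part of the original code.

```python
def precision_at_k(recommended, relevant, k: int) -> float:
    """Fraction of the first k recommended items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(recommended)[:k]
    relevant_set = set(relevant)
    hits = sum(1 for item in top_k if item in relevant_set)
    # Dividing by k (not by len(top_k)) penalizes recommenders that return fewer than k items.
    return hits / k


# Made-up movie ids: 2 of the 4 recommendations are among the user's liked movies.
score = precision_at_k(recommended=[1, 2, 3, 4], relevant=[2, 4, 9], k=4)
print(score)
```

Averaging this score over many users, with their liked movies held out from the data used to build recommendations, would give a first comparable number for different variants of the recommender.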

# Find similar users

Following the same approach as with movies, I'm going to one-hot encode the user features, turning each unique value into its own binary column, and then compute the similarity matrix on users.
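
On a tiny made-up user table, the idea looks like the sketch below. The column names mirror the `users` frame, but the values are invented for illustration. The explicit `dtype="uint8"` is a deliberate choice: recent pandas versions return boolean dummy columns by default, and a boolean matrix product would not count shared features.

```python
import pandas as pd

# Made-up users: categorical columns only, as in the approach described above.
toy_users = pd.DataFrame(
    {
        "gender": ["F", "M", "M"],
        "age": [1, 56, 56],
        "occupation": [10, 16, 10],
    }
).astype({"age": "category", "occupation": "category"})

# One binary column per distinct value of each feature.
encoded = pd.get_dummies(toy_users, dtype="uint8")

# Same dot-product similarity as for movies: counts the features two users share.
encoded_values = encoded.to_numpy()
user_similarity = encoded_values @ encoded_values.T
print(user_similarity)
```

Because every user has exactly one value per feature, each diagonal entry equals the number of encoded features (3 in this toy example).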

In [34]:
users.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6040 entries, 0 to 6039
Data columns (total 5 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   gender                6040 non-null   object
 1   age                   6040 non-null   int64 
 2   occupation            6040 non-null   int64 
 3   zip_code              6040 non-null   object
 4   generalized_zip_code  6040 non-null   object
dtypes: int64(2), object(3)
memory usage: 283.1+ KB


In [35]:
users[["age", "occupation"]] = users[["age", "occupation"]].astype("category")

In [36]:
users.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6040 entries, 0 to 6039
Data columns (total 5 columns):
 #   Column                Non-Null Count  Dtype   
---  ------                --------------  -----   
 0   gender                6040 non-null   object  
 1   age                   6040 non-null   category
 2   occupation            6040 non-null   category
 3   zip_code              6040 non-null   object  
 4   generalized_zip_code  6040 non-null   object  
dtypes: category(2), object(3)
memory usage: 201.7+ KB


In [37]:
dummies_users = pd.get_dummies(users.drop("zip_code", axis=1))
dummies_users

user_id,gender_F,gender_M,age_1,age_18,age_25,age_35,age_45,age_50,age_56,occupation_0,...,generalized_zip_code_988,generalized_zip_code_989,generalized_zip_code_990,generalized_zip_code_991,generalized_zip_code_992,generalized_zip_code_993,generalized_zip_code_995,generalized_zip_code_997,generalized_zip_code_998,generalized_zip_code_999
0,1,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,1,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2,0,1,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,1,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,1,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6035,1,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6036,1,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6037,1,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
6038,1,0,0,0,0,0,1,0,0,1,...,0,0,0,0,0,0,0,0,0,0


In [38]:
pair_similarity_user = pairsimilarity(dummies_users.values) # Slow operations
pair_similarity_user

array([[4, 0, 0, ..., 1, 1, 0],
       [0, 4, 1, ..., 1, 0, 1],
       [0, 1, 4, ..., 0, 0, 2],
       ...,
       [1, 1, 0, ..., 4, 1, 0],
       [1, 0, 0, ..., 1, 4, 0],
       [0, 1, 2, ..., 0, 0, 4]], dtype=uint8)

#### Example for user_id = 10

In [39]:
current_user = [10]

In [40]:
users_similar_to_current_user = get_most_similar_by_id(pair_similarity_user, current_user)
users_similar_to_current_user

array([[ 595, 2705, 3357, 4913,  498, 5130, 3387, 3078,   68, 3828]])

In [41]:
pair_similarity_user[
    users_similar_to_current_user,
    current_user
]

array([[3, 3, 3, 3, 3, 3, 3, 3, 3, 3]], dtype=uint8)

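These are the users most similar to ``user_id=10``: each one shares 3 of the 4 encoded features (gender, age, occupation, generalized zip code) with that user.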

In [42]:
get_recommendations(movies, ratings, similarity, users_similar_to_current_user, top=2)

user_id,,movie_id,rating,title,year,1,2
68,0,1166,5,"English Patient, The",1996,"(149, Rob Roy)","(900, Casablanca)"
68,1,293,5,Pulp Fiction,1994,"(1528, MURDER and murder)","(3259, Ghost Dog: The Way of the Samurai)"
498,0,1180,5,Raiders of the Lost Ark,1981,"(1255, Highlander)","(2098, Blade)"
498,1,795,5,"Time to Kill, A",1996,"(3882, Contender, The)","(2452, Airport 1975)"
595,0,547,5,"Nightmare Before Christmas, The",1993,"(105, Muppet Treasure Island)","(1015, Mary Poppins)"
595,1,3104,5,Any Given Sunday,1999,"(3882, Contender, The)","(2452, Airport 1975)"
2705,0,493,5,Much Ado About Nothing,1993,"(3039, Fisher King, The)","(2316, Home Fries)"
2705,1,1287,5,When Harry Met Sally...,1989,"(3039, Fisher King, The)","(2316, Home Fries)"
3828,1,1287,5,When Harry Met Sally...,1989,"(3039, Fisher King, The)","(2316, Home Fries)"
3078,0,109,5,Taxi Driver,1976,"(3882, Contender, The)","(896, North by Northwest)"
