## Using KNN


* A generic collaborative filtering model that uses only user-item interaction data
* Here we use explicit feedback to recommend the top n movies based on a user's favourite movie
* Apart from the user-item interactions (explicit ratings, which in this dataset run from 0.5 to 5), no other features are used
* Data Source: https://grouplens.org/datasets/movielens/latest/ (ml-latest-small-zip)
* Theory reference: https://towardsdatascience.com/prototyping-a-recommender-system-step-by-step-part-1-knn-item-based-collaborative-filtering-637969614ea

In [1]:
import pandas as pd
import os 
from scipy.sparse import csr_matrix
from fuzzywuzzy import fuzz



In [2]:
os.getcwd()

'c:\\Users\\manpresingh\\OneDrive - Microsoft\\Personal\\Recommendation Models'

In [3]:
df_movies = pd.read_csv('./ml-latest-small/movies.csv', 
                        usecols=['movieId', 'title', 'genres'], 
                        dtype={'movieId':'int32', 'title':'str', 'genres':'str'})

df_ratings = pd.read_csv('./ml-latest-small/ratings.csv',
                         usecols=['userId', 'movieId', 'rating'],
                         dtype={'userId': 'int32', 'movieId': 'int32', 'rating': 'float32'})

In [4]:
df_movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9742 entries, 0 to 9741
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   movieId  9742 non-null   int32 
 1   title    9742 non-null   object
 2   genres   9742 non-null   object
dtypes: int32(1), object(2)
memory usage: 190.4+ KB


In [5]:
df_ratings.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100836 entries, 0 to 100835
Data columns (total 3 columns):
 #   Column   Non-Null Count   Dtype  
---  ------   --------------   -----  
 0   userId   100836 non-null  int32  
 1   movieId  100836 non-null  int32  
 2   rating   100836 non-null  float32
dtypes: float32(1), int32(2)
memory usage: 1.2 MB


In [6]:
df_movies.head(5)

| | movieId | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


In [7]:
df_ratings.head(5)

| | userId | movieId | rating |
|---|---|---|---|
| 0 | 1 | 1 | 4.0 |
| 1 | 1 | 3 | 4.0 |
| 2 | 1 | 6 | 4.0 |
| 3 | 1 | 47 | 5.0 |
| 4 | 1 | 50 | 5.0 |


In [8]:
num_users = df_ratings.userId.nunique()
num_items = df_ratings.movieId.nunique()
num_users, num_items

(610, 9724)
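
The counts above give a quick sense of how sparse the interaction data is. A minimal sketch, using only figures already printed in this notebook (610 users, 9,724 rated movies, 100,836 ratings):

```python
# Figures copied from the outputs above
num_users, num_items, num_ratings = 610, 9724, 100836

possible_pairs = num_users * num_items   # every user-movie combination
density = num_ratings / possible_pairs   # fraction of pairs that carry a rating
sparsity = 1 - density

print(f"possible pairs: {possible_pairs:,}")
print(f"density: {density:.2%}, sparsity: {sparsity:.2%}")
```

Roughly 1.7% of all user-movie pairs carry a rating, which is why a sparse representation is used later on.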

In [9]:
df_ratings['rating'].value_counts().sort_index()

rating
0.5     1370
1.0     2811
1.5     1791
2.0     7551
2.5     5550
3.0    20047
3.5    13136
4.0    26818
4.5     8551
5.0    13211
Name: count, dtype: int64

In [10]:
df_ratings['movieId'].value_counts().sort_values(ascending=False).head(5)

movieId
356     329
318     317
296     307
593     279
2571    278
Name: count, dtype: int64

In [11]:
# An arbitrary threshold of 50 ratings is used for this POC
# Ideally, we should plot the rating counts and pick the popular movies that are seen / rated most often
# Filtering out rarely rated movies reduces the KNN compute
# However, it can also degrade the model, because the filtered-out movies will never get recommended

pd.DataFrame(df_ratings['movieId'].value_counts()).query('count >=50')

| movieId | count |
|---|---|
| 356 | 329 |
| 318 | 317 |
| 296 | 307 |
| 593 | 279 |
| 2571 | 278 |
| ... | ... |
| 333 | 50 |
| 3785 | 50 |
| 8361 | 50 |
| 2105 | 50 |
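
Rather than fixing the cut-off at 50 by hand, one option is to tabulate how many movies and what share of the ratings survive at several thresholds. The sketch below is an illustration only: `threshold_coverage` is a hypothetical helper (not part of this notebook), and the toy frame stands in for `df_ratings`.

```python
import pandas as pd

def threshold_coverage(ratings: pd.DataFrame, thresholds):
    """For each minimum rating count, report how many movies and what
    share of all ratings would survive the filter."""
    counts = ratings["movieId"].value_counts()
    rows = []
    for t in thresholds:
        kept = counts[counts >= t]
        rows.append({
            "threshold": t,
            "movies_kept": int(kept.size),
            "ratings_kept_share": float(kept.sum() / counts.sum()),
        })
    return pd.DataFrame(rows)

# Toy data (hypothetical): movie 1 has 3 ratings, movie 2 has 2, movie 3 has 1
toy = pd.DataFrame({"movieId": [1, 1, 1, 2, 2, 3]})
coverage = threshold_coverage(toy, thresholds=[1, 2, 3])
print(coverage)
```

On the real data, `threshold_coverage(df_ratings, [10, 25, 50, 100])` would show the trade-off between compute and coverage before committing to a cut-off.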


In [12]:
popular_movies = list(set(pd.DataFrame(df_ratings['movieId'].value_counts()).query('count >=50').index))

df_movies.movieId.nunique()

In [13]:
df_ratings_popular_movies = df_ratings[df_ratings.movieId.isin(popular_movies)]

In [14]:
df_ratings.shape

(100836, 3)

In [15]:
df_ratings_popular_movies.shape

(41360, 3)

In [16]:
df_ratings_popular_movies.movieId.nunique()

450

In [17]:
df_ratings_popular_movies.userId.nunique()

606

In [18]:
movie_user_matrix = df_ratings_popular_movies.pivot(index='movieId', columns = 'userId', values='rating').fillna(0)
movie_user_matrix

| movieId (rows) / userId (columns) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 601 | 602 | 603 | 604 | 605 | 606 | 607 | 608 | 609 | 610 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 4.5 | 0.0 | 0.0 | 0.0 | ... | 4.0 | 0.0 | 4.0 | 3.0 | 4.0 | 2.5 | 4.0 | 2.5 | 3.0 | 5.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 4.0 | 0.0 | 0.0 | ... | 0.0 | 4.0 | 0.0 | 5.0 | 3.5 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 |
| 3 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 |
| 6 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 3.0 | 4.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 |
| 7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.5 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 109374 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.5 |
| 109487 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | ... | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.5 |
| 112852 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.5 |
| 116797 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [19]:
movie_to_idx = {
    movie: i for i, movie in enumerate(list(df_movies.set_index('movieId').loc[movie_user_matrix.index].title))
    
}

In [20]:
movie_to_idx

{'Toy Story (1995)': 0,
 'Jumanji (1995)': 1,
 'Grumpier Old Men (1995)': 2,
 'Heat (1995)': 3,
 'Sabrina (1995)': 4,
 'GoldenEye (1995)': 5,
 'American President, The (1995)': 6,
 'Casino (1995)': 7,
 'Sense and Sensibility (1995)': 8,
 'Ace Ventura: When Nature Calls (1995)': 9,
 'Get Shorty (1995)': 10,
 'Leaving Las Vegas (1995)': 11,
 'Twelve Monkeys (a.k.a. 12 Monkeys) (1995)': 12,
 'Babe (1995)': 13,
 'Dead Man Walking (1995)': 14,
 'Clueless (1995)': 15,
 'Seven (a.k.a. Se7en) (1995)': 16,
 'Pocahontas (1995)': 17,
 'Usual Suspects, The (1995)': 18,
 "Mr. Holland's Opus (1995)": 19,
 'From Dusk Till Dawn (1996)': 20,
 'Broken Arrow (1996)': 21,
 'Happy Gilmore (1996)': 22,
 'Braveheart (1995)': 23,
 'Taxi Driver (1976)': 24,
 'Birdcage, The (1996)': 25,
 'Bad Boys (1995)': 26,
 'Apollo 13 (1995)': 27,
 'Batman Forever (1995)': 28,
 'Casper (1995)': 29,
 'Congo (1995)': 30,
 'Crimson Tide (1995)': 31,
 'Desperado (1995)': 32,
 'Die Hard: With a Vengeance (1995)': 33,
 'First Kni

In [21]:
movie_user_sparse_matrix = csr_matrix(movie_user_matrix.values)

In [22]:
movie_user_sparse_matrix

<450x606 sparse matrix of type '<class 'numpy.float32'>'
	with 41360 stored elements in Compressed Sparse Row format>

In [23]:
movie_user_matrix

| movieId (rows) / userId (columns) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 601 | 602 | 603 | 604 | 605 | 606 | 607 | 608 | 609 | 610 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 4.5 | 0.0 | 0.0 | 0.0 | ... | 4.0 | 0.0 | 4.0 | 3.0 | 4.0 | 2.5 | 4.0 | 2.5 | 3.0 | 5.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 4.0 | 0.0 | 0.0 | ... | 0.0 | 4.0 | 0.0 | 5.0 | 3.5 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 |
| 3 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 |
| 6 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 3.0 | 4.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 |
| 7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.5 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 109374 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.5 |
| 109487 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | ... | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.5 |
| 112852 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.5 |
| 116797 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


In [24]:
print(movie_user_sparse_matrix)
# (0, 4) means that row index 0 holds a non-zero value at column index 4 of movie_user_matrix

  (0, 0)	4.0
  (0, 4)	4.0
  (0, 6)	4.5
  (0, 14)	2.5
  (0, 16)	4.5
  (0, 17)	3.5
  (0, 18)	4.0
  (0, 20)	3.5
  (0, 26)	3.0
  (0, 30)	5.0
  (0, 31)	3.0
  (0, 32)	3.0
  (0, 39)	5.0
  (0, 42)	5.0
  (0, 43)	3.0
  (0, 44)	4.0
  (0, 45)	5.0
  (0, 49)	3.0
  (0, 52)	3.0
  (0, 55)	5.0
  (0, 61)	5.0
  (0, 62)	4.0
  (0, 64)	4.0
  (0, 66)	2.5
  (0, 69)	5.0
  :	:
  (449, 302)	5.0
  (449, 303)	3.0
  (449, 316)	3.5
  (449, 328)	5.0
  (449, 336)	2.0
  (449, 349)	2.5
  (449, 362)	3.5
  (449, 377)	5.0
  (449, 405)	3.5
  (449, 411)	4.0
  (449, 414)	4.0
  (449, 444)	4.0
  (449, 471)	5.0
  (449, 491)	5.0
  (449, 519)	5.0
  (449, 521)	3.5
  (449, 530)	5.0
  (449, 544)	3.0
  (449, 546)	5.0
  (449, 556)	3.0
  (449, 557)	2.0
  (449, 581)	4.0
  (449, 591)	4.0
  (449, 594)	3.5
  (449, 605)	3.0
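
The `(row, col)  value` listing above is only a display format. Internally a CSR matrix keeps three flat arrays. A minimal sketch on a toy matrix (the values are hypothetical, not taken from MovieLens) shows how those arrays map back to the printed triples:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy 3x4 movie-by-user matrix (hypothetical values); 0 means "not rated"
dense = np.array([
    [4.0, 0.0, 0.0, 5.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 2.5, 0.0, 0.0],
], dtype=np.float32)

sparse = csr_matrix(dense)

# CSR keeps three flat arrays instead of the full grid
print(sparse.data)     # the non-zero ratings, row by row
print(sparse.indices)  # column index of each stored rating
print(sparse.indptr)   # where each row starts inside `data`

# Rebuild the (row, col, value) triples from the three arrays
triples = [
    (row, int(sparse.indices[k]), float(sparse.data[k]))
    for row in range(sparse.shape[0])
    for k in range(sparse.indptr[row], sparse.indptr[row + 1])
]
print(triples)
```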


In [25]:
from sklearn.neighbors import NearestNeighbors
import sklearn

In [26]:
model_knn = NearestNeighbors(metric='cosine', algorithm='auto', n_neighbors=20, n_jobs=-1)

# https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html

# fit
model_knn.fit(movie_user_sparse_matrix)

In [27]:
# model_knn = NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=20, n_jobs=-1)
# # fit
# model_knn.fit(movie_user_matrix)


# This also works, but why do we need csr_matrix?
# We need CSR for efficient storage and computation when dealing with sparse data.
# Our movie_user_matrix is sparse: most of its values are 0.
# A CSR matrix stores only the non-zero elements, thereby saving memory.

#
# The nearest-neighbor estimators in scikit-learn accept
# either dense NumPy arrays or SciPy sparse matrices as input.
# When fitted on sparse input, scikit-learn uses brute-force search regardless of `algorithm`.
# Using a csr_matrix can therefore make the computation more memory-efficient and feasible,
# especially when dealing with large datasets.
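
To put a number on the memory argument, the sketch below compares a dense `float32` array with its CSR form. It uses a random stand-in with the same shape (450 x 606) and roughly the same fill rate as this notebook's matrix, not the real ratings. `csr_nbytes` is a hypothetical helper.

```python
import numpy as np
from scipy.sparse import csr_matrix

def csr_nbytes(m):
    """Bytes held by the three arrays that make up a CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes

rng = np.random.default_rng(0)
shape = (450, 606)               # movies x users, as in this notebook
density = 41360 / (450 * 606)    # about 15% of the cells hold a rating

# Random stand-in for the real matrix: ratings 0.5 .. 5.0 in a ~15% filled grid
mask = rng.random(shape) < density
dense = np.zeros(shape, dtype=np.float32)
dense[mask] = rng.integers(1, 11, size=int(mask.sum())) / 2

sparse = csr_matrix(dense)
print(f"dense : {dense.nbytes:,} bytes")
print(f"sparse: {csr_nbytes(sparse):,} bytes for {sparse.nnz:,} stored ratings")
```

At this fill rate the saving is moderate. It grows as the matrix gets sparser, for example when rarely rated movies are kept instead of being filtered out.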

In [28]:
my_favorite = 'Jurasic'

In [29]:
match_tuple=[]
for title, idx in movie_to_idx.items():
    ratio = fuzz.ratio(title.lower(), my_favorite.lower())
    if ratio >=45:
        match_tuple.append((title, idx, ratio))

In [30]:
match_tuple = sorted(match_tuple, key=lambda x: x[2])[::-1]

In [31]:
match_tuple

[('Jurassic Park (1993)', 81, 52)]

In [32]:
fuzz_matched_movies = [x[0] for x in match_tuple]
# This finds the title in our movie list that best matches the user's input
# Users sometimes misspell a title or type a short form of it
# So fuzzy matching is used to find the most similar title in our movie list
# That most similar title then becomes the input that KNN uses to find similar movies

In [33]:
fuzz_matched_movies

['Jurassic Park (1993)']
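
The matching steps above can be folded into one helper that also copes with an empty result. This is a sketch under assumptions: `fuzzy_match` is a hypothetical name, and the standard library's `difflib.SequenceMatcher` stands in for `fuzz.ratio` so the example runs without extra packages. Its scores may not match fuzzywuzzy's exactly.

```python
from difflib import SequenceMatcher

def fuzzy_match(query, movie_to_idx, min_score=45):
    """Return (title, index, score) tuples whose similarity to `query`
    is at least `min_score`, best match first."""
    matches = []
    for title, idx in movie_to_idx.items():
        similarity = SequenceMatcher(None, title.lower(), query.lower()).ratio()
        score = round(100 * similarity)   # scale to 0..100 like fuzz.ratio
        if score >= min_score:
            matches.append((title, idx, score))
    return sorted(matches, key=lambda m: m[2], reverse=True)

# Tiny hypothetical catalogue; the indices mirror the ones shown in this notebook
catalogue = {"Toy Story (1995)": 0, "Jumanji (1995)": 1, "Jurassic Park (1993)": 81}
matches = fuzzy_match("Jurasic", catalogue)
print(matches)
if not matches:
    print("No title is close enough to the query")
```

Checking for an empty list matters because `match_tuple[0][1]` raises an `IndexError` when nothing clears the threshold.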

In [34]:
match_tuple[0][1]

81

In [35]:
distances, indices = model_knn.kneighbors(movie_user_sparse_matrix[match_tuple[0][1]], n_neighbors=10)

In [36]:
distances

array([[0.        , 0.28001684, 0.31174213, 0.33054626, 0.33119565,
        0.3382675 , 0.36050385, 0.36158162, 0.37122047, 0.38068283]],
      dtype=float32)

In [37]:
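# Sort by distance; [:0:-1] then reverses and drops the query itself (distance 0), so the farthest neighbour comes first
raw_recommends = sorted(zip(indices.squeeze().tolist(), distances.squeeze().tolist()), key=lambda x: x[1])[:0:-1]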

In [38]:
raw_recommends

[(69, 0.3806828260421753),
 (27, 0.3712204694747925),
 (114, 0.36158162355422974),
 (99, 0.3605038523674011),
 (68, 0.3382675051689148),
 (78, 0.3311956524848938),
 (23, 0.3305462598800659),
 (62, 0.3117421269416809),
 (97, 0.28001683950424194)]

In [39]:
reverse_mapper = {v: k for k, v in movie_to_idx.items()}

In [40]:
reverse_mapper

{0: 'Toy Story (1995)',
 1: 'Jumanji (1995)',
 2: 'Grumpier Old Men (1995)',
 3: 'Heat (1995)',
 4: 'Sabrina (1995)',
 5: 'GoldenEye (1995)',
 6: 'American President, The (1995)',
 7: 'Casino (1995)',
 8: 'Sense and Sensibility (1995)',
 9: 'Ace Ventura: When Nature Calls (1995)',
 10: 'Get Shorty (1995)',
 11: 'Leaving Las Vegas (1995)',
 12: 'Twelve Monkeys (a.k.a. 12 Monkeys) (1995)',
 13: 'Babe (1995)',
 14: 'Dead Man Walking (1995)',
 15: 'Clueless (1995)',
 16: 'Seven (a.k.a. Se7en) (1995)',
 17: 'Pocahontas (1995)',
 18: 'Usual Suspects, The (1995)',
 19: "Mr. Holland's Opus (1995)",
 20: 'From Dusk Till Dawn (1996)',
 21: 'Broken Arrow (1996)',
 22: 'Happy Gilmore (1996)',
 23: 'Braveheart (1995)',
 24: 'Taxi Driver (1976)',
 25: 'Birdcage, The (1996)',
 26: 'Bad Boys (1995)',
 27: 'Apollo 13 (1995)',
 28: 'Batman Forever (1995)',
 29: 'Casper (1995)',
 30: 'Congo (1995)',
 31: 'Crimson Tide (1995)',
 32: 'Desperado (1995)',
 33: 'Die Hard: With a Vengeance (1995)',
 34: 'First

In [41]:
# Use a fresh loop variable (idx) so the fuzz_matched_movies list above is not overwritten
for i, (idx, dist) in enumerate(raw_recommends):
    print(f'{i+1} -> {reverse_mapper[idx]} -> {dist}')

1 -> True Lies (1994) -> 0.3806828260421753
2 -> Apollo 13 (1995) -> 0.3712204694747925
3 -> Independence Day (a.k.a. ID4) (1996) -> 0.36158162355422974
4 -> Batman (1989) -> 0.3605038523674011
5 -> Speed (1994) -> 0.3382675051689148
6 -> Fugitive, The (1993) -> 0.3311956524848938
7 -> Braveheart (1995) -> 0.3305462598800659
8 -> Forrest Gump (1994) -> 0.3117421269416809
9 -> Terminator 2: Judgment Day (1991) -> 0.28001683950424194
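
The list above runs from the farthest of the nine neighbours to the nearest, so rank 1 is the least similar of them. If rank 1 should be the most similar movie, one option is to sort ascending and drop the query by index. A minimal sketch using a few of the pairs printed above (distances rounded to four decimals):

```python
# (index, cosine distance) pairs copied from `raw_recommends` above,
# plus the query movie itself (index 81, distance 0.0)
pairs = [(69, 0.3807), (27, 0.3712), (97, 0.2800), (81, 0.0), (62, 0.3117)]
titles = {
    69: "True Lies (1994)",
    27: "Apollo 13 (1995)",
    97: "Terminator 2: Judgment Day (1991)",
    81: "Jurassic Park (1993)",
    62: "Forrest Gump (1994)",
}
query_idx = 81

# Drop the query explicitly, then sort so the smallest distance comes first
ranked = sorted((p for p in pairs if p[0] != query_idx), key=lambda p: p[1])

for rank, (idx, dist) in enumerate(ranked, start=1):
    print(f"{rank} -> {titles[idx]} -> distance {dist:.4f}, similarity {1 - dist:.4f}")
```

Filtering by index rather than slicing off the first element also stays correct if another movie happens to sit at distance 0 from the query.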


### Working of KNN manually:

In [152]:
movie_user_matrix.shape
# 450 movies and 606 users

(450, 606)

In [156]:
movie_user_matrix.iloc[81,:]
# This is the rating vector of the movie at row index 81 (movieId 480)

userId
1      4.0
2      0.0
3      0.0
4      0.0
5      0.0
      ... 
606    2.5
607    4.0
608    3.0
609    3.0
610    5.0
Name: 480, Length: 606, dtype: float32

In [158]:
reverse_mapper[81]

'Jurassic Park (1993)'

In [159]:
movie_to_idx['Jurassic Park (1993)']

81

In [160]:
# Now, let's compute the cosine distance between this movie (row index 81) and another movie by hand

In [231]:
# Let's start with True Lies:

movie_to_idx['True Lies (1994)']

69

In [173]:
import numpy as np
from numpy.linalg import norm

base_movie = np.array(movie_user_matrix.iloc[81,:])

First_movie = np.array(movie_user_matrix.iloc[69,:])

In [230]:
base_movie, First_movie

(array([4. , 0. , 0. , 0. , 0. , 5. , 5. , 4. , 0. , 0. , 4. , 0. , 0. ,
        3. , 0. , 0. , 4.5, 3.5, 2. , 0. , 4. , 0. , 0. , 0. , 0. , 0. ,
        4. , 2.5, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 3. , 0. ,
        0. , 0. , 0. , 4. , 0. , 4. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ,
        3. , 0. , 5. , 5. , 4. , 3. , 0. , 0. , 0. , 2.5, 4. , 0. , 3. ,
        0. , 3.5, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 2. , 0. , 4. , 5. ,
        0. , 4. , 4. , 3. , 3. , 0. , 3. , 0. , 0. , 0. , 0. , 4.5, 0. ,
        5. , 4. , 3.5, 5. , 0. , 0. , 0. , 0. , 0. , 3. , 4. , 0. , 0. ,
        0. , 0. , 4. , 3. , 0. , 2.5, 5. , 0. , 0. , 5. , 4. , 4. , 0. ,
        0. , 0. , 0. , 4. , 0. , 3.5, 0. , 3. , 0. , 0. , 3.5, 0. , 0. ,
        0. , 3. , 0. , 5. , 0. , 3.5, 0. , 0. , 4. , 3. , 3. , 0. , 3. ,
        0. , 0. , 0. , 0. , 3. , 0. , 0. , 0. , 0. , 0. , 0. , 3. , 0. ,
        0. , 3. , 2. , 0. , 0. , 0. , 5. , 4. , 0. , 3.5, 0. , 4. , 3. ,
        0. , 0. , 4. , 5. , 0. , 3.5, 3. , 4. , 0. 

In [174]:
import scipy.spatial.distance  # import the submodule explicitly; a bare `import scipy` may not load it

In [175]:
scipy.spatial.distance.cosine(base_movie, First_movie)

0.3806830048561096
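
The same number can be derived from the definition, cosine distance = 1 - (a · b) / (‖a‖ ‖b‖), using the `norm` function imported earlier. A minimal sketch on toy vectors (hypothetical ratings, not the real movie rows):

```python
import numpy as np
from numpy.linalg import norm

def cosine_distance(a, b):
    """1 - cos(angle between a and b); 0 means the vectors point the same way."""
    return 1.0 - float(np.dot(a, b) / (norm(a) * norm(b)))

# Toy rating vectors over four users (hypothetical values, 0 = not rated)
movie_a = np.array([4.0, 0.0, 5.0, 3.0])
movie_b = np.array([5.0, 0.0, 4.0, 0.0])

print(cosine_distance(movie_a, movie_b))
print(cosine_distance(movie_a, movie_a))
```

Applied to `base_movie` and `First_movie`, this should agree with the SciPy result up to floating-point rounding. The small gap between 0.3806830 here and 0.3806828 from `kneighbors` is presumably a `float32` rounding effect.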

## Using ItemCF from LibRecommender

In [42]:
import numpy as np
import pandas as pd
from libreco.data import random_split, DatasetPure
from libreco.algorithms import ItemCF  # ItemCF algorithm
from libreco.evaluation import evaluate



In [95]:
df_ratings_new = df_ratings.rename(columns={'userId':'user', 'movieId':'item', 'rating':'label'})

In [104]:
from libreco.data import random_split

train_data, eval_data, test_data = random_split(df_ratings_new, multi_ratios=[0.8, 0.1, 0.1])

In [105]:
train_data

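| | user | item | label |
|---|---|---|---|
| 20596 | 135 | 4085 | 4.0 |
| 37993 | 260 | 750 | 4.0 |
| 98177 | 606 | 6798 | 2.5 |
| 36409 | 249 | 434 | 3.5 |
| 88947 | 573 | 74458 | 4.5 |
| ... | ... | ... | ... |
| 56650 | 376 | 48516 | 4.5 |
| 76231 | 480 | 10 | 4.0 |
| 57287 | 380 | 3408 | 3.0 |
| 12085 | 74 | 3481 | 3.5 |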


In [107]:
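eval_data.user.unique
# Note: without parentheses this shows the bound method; call eval_data.user.unique() to list the distinct users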

<bound method Series.unique of 38114    260
26441    182
53190    351
92569    597
42863    288
        ... 
19164    123
25238    177
48888    317
79107    489
8165      57
Name: user, Length: 9651, dtype: int32>

In [108]:
train_data, data_info = DatasetPure.build_trainset(train_data)
eval_data = DatasetPure.build_evalset(eval_data)
test_data = DatasetPure.build_testset(test_data)

ItemCF API reference: https://librecommender.readthedocs.io/en/latest/api/algorithms/item_cf.html

In [250]:
# Initialize ItemCF model
itemcf = ItemCF(
    task="rating",
    data_info=data_info,
    sim_type="pearson",  # one of {'cosine', 'pearson', 'jaccard'}; default is 'cosine'
    k_sim=75,            # number of similar items to use
    store_top_k=True,    # whether to store the top k similar items after training (default: True)
    num_threads=3,
    min_common=4,        # minimum number of common users required when computing item similarities
    mode="invert",
    seed=100,
    # lower_upper_bound (tuple or None, default: None): lower and upper score bound for the rating task
)

In [251]:
itemcf.fit(train_data,neg_sampling=False, verbose=1, eval_data=eval_data)

Training start time: 2024-04-09 21:32:07
Final block size and num: (8965, 1)
sim_matrix elapsed: 0.800s
sim_matrix, shape: (8965, 8965), num_elements: 2121120, density: 570.0403 %


top_k: 100%|██████████| 8965/8965 [00:00<00:00, 17352.95it/s]


In [252]:
evaluate(model=itemcf, neg_sampling=False, data=test_data)

eval_pointwise:   0%|          | 0/2 [00:00<?, ?it/s]

No common interaction or similar neighbor for user 600 and item 5940, proceed with default prediction
No common interaction or similar neighbor for user 139 and item 1957, proceed with default prediction
No common interaction or similar neighbor for user 181 and item 3822, proceed with default prediction
No common interaction or similar neighbor for user 324 and item 72, proceed with default prediction
No common interaction or similar neighbor for user 155 and item 2316, proceed with default prediction
No common interaction or similar neighbor for user 56 and item 1521, proceed with default prediction


eval_pointwise: 100%|██████████| 2/2 [00:00<00:00,  2.19it/s]


{'loss': 1.0437134096152516}
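
For a rating task, the figure above is an error on the 0.5 to 5 rating scale. Assuming the reported loss is a root-mean-squared error (an assumption; this notebook does not state which metric `evaluate` uses by default), it can be read as the typical size of a prediction miss. A minimal sketch of that metric on hypothetical values:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error between true and predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical true ratings and predictions for four user-item pairs
y_true = [4.0, 3.0, 5.0, 2.0]
y_pred = [3.0, 3.0, 4.0, 4.0]
print(rmse(y_true, y_pred))
```

Because the errors are squared before averaging, one miss of two stars weighs more than two misses of one star each.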

In [253]:
itemcf.predict(user=[1,1,1], item=[12,89,2139],
               cold_start='popular')
# Make prediction(s) on given user(s) and item(s).

[4.3551411628723145, 3.7751598358154297, 4.6934285163879395]

* By default, the recommendation result returned by model.recommend_user() method will filter out items that a user has previously consumed.
* However, if you use a very large n_rec and number of consumed items for this user plus n_rec exceeds number of total items, i.e. len(user_consumed) + n_rec > n_items, the consumed items will not be filtered out since there are not enough items to recommend. If you don’t want to filter out consumed items, set filter_consumed=False
* LibRecommender also supports random recommendation by setting random_rec=True (By default it is False). Of course, it’s not completely random, but random sampling based on each item’s prediction scores. It’s basically a trade-off between accuracy and diversity
* Finally, batch recommendation is also supported by simply passing a list to the user parameter. The returned result will be a dict, with users as keys and numpy.array as values.
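
The filtering rule described in the bullets above can be sketched in plain Python. This is a hypothetical re-implementation for illustration only, not LibRecommender's code. `recommend_top_n`, the item names, and the scores are all made up.

```python
def recommend_top_n(scores, consumed, n_rec, filter_consumed=True):
    """Hypothetical sketch of the behaviour described above.
    scores: dict of item -> predicted score; consumed: set of items already seen."""
    n_items = len(scores)
    # Not enough unseen items to fill n_rec: fall back to no filtering
    if filter_consumed and len(consumed) + n_rec > n_items:
        filter_consumed = False
    candidates = [
        item for item in scores
        if not (filter_consumed and item in consumed)
    ]
    # Highest predicted score first, truncated to n_rec items
    return sorted(candidates, key=lambda item: scores[item], reverse=True)[:n_rec]

scores = {"A": 4.8, "B": 4.5, "C": 4.1, "D": 3.9, "E": 3.0}
consumed = {"A", "C"}

filtered = recommend_top_n(scores, consumed, n_rec=2)
unfiltered = recommend_top_n(scores, consumed, n_rec=2, filter_consumed=False)
too_many = recommend_top_n(scores, consumed, n_rec=4)  # 2 consumed + 4 > 5 items
print(filtered, unfiltered, too_many)
```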

In [254]:
user_id=1
mylist = list(itemcf.recommend_user(user=user_id, 
                                    n_rec=10, 
                                    filter_consumed=False,
                                    random_rec=False)[user_id])
mylist
# Recommend a list of items for given user(s).

# itemcf.recommend_user(user=[1], n_rec=10, filter_consumed=False)

[2139, 1025, 1275, 2090, 1031, 2470, 2193, 2949, 2993, 2947]

In [255]:
df_ratings[(df_ratings.userId==user_id) & (df_ratings.movieId.isin(mylist))].sort_values(by='rating', ascending=False)#[0:20]


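| | userId | movieId | rating |
|---|---|---|---|
| 52 | 1 | 1025 | 5.0 |
| 55 | 1 | 1031 | 5.0 |
| 86 | 1 | 1275 | 5.0 |
| 128 | 1 | 2090 | 5.0 |
| 137 | 1 | 2139 | 5.0 |
| 159 | 1 | 2470 | 5.0 |
| 189 | 1 | 2947 | 5.0 |
| 191 | 1 | 2949 | 5.0 |
| 196 | 1 | 2993 | 5.0 |
| 142 | 1 | 2193 | 4.0 |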


In [256]:
user_id=1
mylist = list(itemcf.recommend_user(user=user_id, 
                                    n_rec=10, 
                                    filter_consumed=True,
                                    random_rec=False)[user_id])
mylist
# Recommend a list of items for given user(s).

# itemcf.recommend_user(user=[1], n_rec=10, filter_consumed=False)

[4855, 45186, 5110, 2161, 7162, 3638, 2802, 2134, 3740, 6755]

In [257]:
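df_ratings[(df_ratings.userId==user_id) & (df_ratings.movieId.isin(mylist))].sort_values(by='rating', ascending=False)
# Rows can appear here despite filter_consumed=True, presumably because these ratings fell outside the training split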


| | userId | movieId | rating |
|---|---|---|---|
| 140 | 1 | 2161 | 5.0 |
| 226 | 1 | 3740 | 4.0 |
