### Matrix factorizations

In this assignment you will get hands-on experience with matrix factorizations.
The assignment consists of 4 tasks:
1. Implement SVD factorization using SGD on explicit data
2. Implement matrix factorization using ALS on implicit data
3. Implement matrix factorization using BPR (pair-wise loss) on implicit data
4. Implement matrix factorization using WARP (list-wise loss) on implicit data

Soft deadline: September 28 (feedback is given, a grade is assigned, and fixes are possible until the hard deadline)

Hard deadline: October 5 (final review)

In [1]:
!pip install implicit
!pip install lightfm


In [2]:
import implicit
import pandas as pd
import numpy as np
import scipy.sparse as sp
from tqdm.notebook import tqdm

from lightfm.datasets import fetch_movielens

In [3]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data
import torchvision.datasets
import torchvision.transforms as transforms
from torch.autograd import Variable
import torch.nn.functional as F
import torchvision.models as models
import torchvision

In [4]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In this assignment we will work with the explicit MovieLens dataset, which contains (user_id, movie_id) pairs together with the rating the user gave to the movie.

The dataset can be downloaded from https://grouplens.org/datasets/movielens/1m/

In [5]:
!wget http://files.grouplens.org/datasets/movielens/ml-1m.zip
!unzip ml-1m.zip



In [6]:
ratings = pd.read_csv('ml-1m/ratings.dat', delimiter='::', header=None, 
        names=['user_id', 'movie_id', 'rating', 'timestamp'], 
        usecols=['user_id', 'movie_id', 'rating'], engine='python')

In [7]:
movie_info = pd.read_csv('ml-1m/movies.dat', delimiter='::', header=None, 
        names=['movie_id', 'name', 'category'], engine='python')

Now it is your turn to implement the most popular matrix factorization algorithms.

What will be graded:
1. Correctness of the algorithm
2. Quality of the resulting similar items
3. Quality of the final recommendations for a user

**what changed:**
* Fixed the bug with wrong indices in recommendations everywhere
* Tweaked the parameters a little everywhere

### preprocess

In [8]:
users = ratings["user_id"]
movies = ratings["movie_id"]


re-index the ids so that we do not store non-existent users and movies

In [9]:
uid2userid = np.unique(users)
userid2uid = {i: j for (j, i) in enumerate(uid2userid)}
new_users = [userid2uid[i] for i in users]

In [10]:
mid2movieid = np.unique(movies)
movieid2mid = {i: j for (j, i) in enumerate(mid2movieid)}
new_movies = [movieid2mid[i] for i in movies]
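The dictionary-based re-indexing above can also be sketched with a single `np.unique` call. The helper name `reindex` and the toy ids are assumptions made for illustration; the notebook itself keeps the explicit dictionaries because it also needs the raw-id-to-index lookup later.

```python
import numpy as np

def reindex(raw_ids):
    """Map arbitrary ids to 0..n-1.

    Returns (idx2raw, dense): idx2raw[k] is the raw id behind index k,
    dense[t] is the new index of raw_ids[t].
    """
    idx2raw, dense = np.unique(raw_ids, return_inverse=True)
    return idx2raw, dense

raw = np.array([10, 42, 10, 7, 42])   # toy ids with gaps
idx2raw, dense = reindex(raw)
print(idx2raw)  # distinct raw ids in sorted order
print(dense)    # position of each original id in idx2raw
```

Indexing `idx2raw[dense]` restores the original ids, which is a quick sanity check for the mapping.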

build the explicit matrix

In [11]:
explicit = np.vstack((new_users, new_movies, ratings["rating"].to_numpy())).T

In [12]:
user_item = sp.coo_matrix((ratings["rating"], (new_users, new_movies)))
user_item_explicit = user_item.tocsr()

In [13]:
user_count, movie_count = len(uid2userid), len(mid2movieid)

build the implicit matrix

In [14]:
implicit_data = explicit[explicit[:,2] >= 4]
implicit_data[:,2] = 1

In [15]:
user_item = sp.coo_matrix((np.ones_like(implicit_data[:,0]), (implicit_data[:,0], implicit_data[:,1])))
user_item_implicit = user_item.tocsr()
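As a compact illustration of the thresholding step above, here is a minimal sketch that turns (user, item, rating) rows into a binary matrix. The function name `to_implicit` and the toy data are hypothetical. Unlike the cells above, the sketch passes `shape` explicitly, which is one way to keep rows for users who have no rating at or above the threshold.

```python
import numpy as np
import scipy.sparse as sp

def to_implicit(triples, n_users, n_items, threshold=4):
    """Binary CSR matrix with 1 wherever rating >= threshold."""
    liked = triples[triples[:, 2] >= threshold]
    data = np.ones(len(liked), dtype=np.float32)
    return sp.csr_matrix((data, (liked[:, 0], liked[:, 1])),
                         shape=(n_users, n_items))

# Toy (user, item, rating) rows; user 2 has no rating >= 4
toy = np.array([[0, 0, 5], [0, 1, 2], [1, 1, 4], [2, 0, 3]])
m = to_implicit(toy, n_users=3, n_items=2)
print(m.toarray())
```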

updated the lambdas

In [16]:
get_similars = lambda item_id, model : [movie_info[movie_info["movie_id"] == mid2movieid[i]]["name"].to_string()
                                        for i in model.similar_items(movieid2mid[item_id])]

In [17]:
get_recommendations = lambda user_id, model, mat, col="name" : [movie_info[movie_info["movie_id"] == mid2movieid[i]][col].to_string() 
                                                    for i in model.recommend(userid2uid[user_id], mat)]

In [18]:
get_user_history = lambda user_id, implicit_ratings : [movie_info[movie_info["movie_id"] == mid2movieid[x]]["name"].to_string() 
                                            for x in implicit_ratings[implicit_ratings[:,0] == userid2uid[user_id]][:,1]]

### Task 1. Without using ready-made solutions, implement SVD factorization using SGD on explicit data

* plain GD is now used instead of SGD, and all vectors are updated at once
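For contrast with the full-batch update mentioned in the note, a per-rating SGD pass for a biased factorization could look roughly like the sketch below. The function name `sgd_epoch`, its parameters and the toy ratings are assumptions for illustration and are not the notebook's implementation; only observed ratings contribute to the updates here.

```python
import numpy as np

def sgd_epoch(triples, u, v, bu, bv, lr=0.01, reg=0.01, rng=None):
    """One shuffled pass of per-rating SGD; updates the arrays in place.

    Returns the mean squared error seen during the pass.
    """
    if rng is None:
        rng = np.random.default_rng()
    sq_err = 0.0
    for idx in rng.permutation(len(triples)):
        a, i, r = triples[idx]
        err = u[a] @ v[i] + bu[a] + bv[i] - r
        sq_err += err ** 2
        u_a = u[a].copy()  # old user vector, needed for the item update
        u[a] -= lr * (err * v[i] + reg * u[a])
        v[i] -= lr * (err * u_a + reg * v[i])
        bu[a] -= lr * (err + reg * bu[a])
        bv[i] -= lr * (err + reg * bv[i])
    return sq_err / len(triples)

rng = np.random.default_rng(0)
toy = np.array([[0, 0, 5], [0, 1, 1], [1, 0, 4], [1, 1, 2]])  # toy ratings
u = rng.uniform(0, 0.1, size=(2, 4))
v = rng.uniform(0, 0.1, size=(2, 4))
bu, bv = np.zeros(2), np.zeros(2)
history = [sgd_epoch(toy, u, v, bu, bv, lr=0.05, rng=rng) for _ in range(200)]
print(history[0], history[-1])
```

On real data the learning rate and regularization would need tuning; the values here are only meant to make the toy example converge.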

In [None]:
class SVD:
    def __init__(self, user_count, movie_count, k=128):
        # Factors and biases start uniformly in [0, 1/sqrt(k))
        self.u = np.random.uniform(0, 1 / np.sqrt(k), size=(user_count, k))
        self.v = np.random.uniform(0, 1 / np.sqrt(k), size=(movie_count, k))
        self.bu = np.random.uniform(0, 1 / np.sqrt(k), size=user_count)
        self.bv = np.random.uniform(0, 1 / np.sqrt(k), size=movie_count)

    def run_train(self, train_data,
                  l=0.01,   # L2 weight for the factors
                  g=0.01,   # L2 weight for the biases
                  lr=1e-3,
                  bs=512,   # unused: each epoch is one full-batch step
                  epochs=10):
        rows, cols = train_data[:,0], train_data[:,1]

        for i in tqdm(range(epochs)):
            # Dense predictions: u @ v.T plus item and user biases
            pred = self.u @ self.v.T + self.bv + self.bu[:, None]

            # Residuals on observed cells; unobserved cells keep the raw
            # prediction, i.e. they are pulled towards zero
            error = pred
            error[rows, cols] -= train_data[:,2]

            # Compute all updates from the old parameters, then assign
            new_u = self.u - lr * (error @ self.v + l * self.u)
            new_v = self.v - lr * (error.T @ self.u + l * self.v)
            new_bu = self.bu - lr * (error.mean(axis=1) + g * self.bu)
            new_bv = self.bv - lr * (error.mean(axis=0) + g * self.bv)

            self.u = new_u
            self.v = new_v
            self.bu = new_bu
            self.bv = new_bv

    def similar_items(self, item_id, N=10):
        # Dot product scaled by each candidate's norm (cosine up to a constant)
        scores = self.v @ self.v[item_id] / np.linalg.norm(self.v, axis=-1)
        ind = np.argsort(scores)[::-1][:N]
        return ind

    def recommend(self, user_id, user_items, N=10):
        # Mask out the items the user has already rated
        unseen = np.ones(len(self.v), dtype=bool)
        unseen[user_items[user_id].nonzero()[1]] = False
        scores = self.v[unseen] @ self.u[user_id] + self.bv[unseen]
        ind = np.argsort(scores)[::-1]
        real_ind = np.arange(len(self.v))[unseen][ind]
        return real_ind[:N]

In [None]:
model = SVD(user_count, movie_count, k=128)

In [None]:
model.run_train(explicit, epochs=100, lr=5e-4, l=1e-3, g=1e-3)


In [None]:
get_similars(1, model)

['0    Toy Story (1995)',
 '3045    Toy Story 2 (1999)',
 "2286    Bug's Life, A (1998)",
 '584    Aladdin (1992)',
 '360    Lion King, The (1994)',
 '33    Babe (1995)',
 '591    Beauty and the Beast (1991)',
 '2225    Antz (1998)',
 '3186    League of Their Own, A (1992)',
 '1526    Hercules (1997)']

In [None]:
get_recommendations(4, model, user_item_explicit, col="name")

['2502    Matrix, The (1999)',
 '1284    Butch Cassidy and the Sundance Kid (1969)',
 '585    Terminator 2: Judgment Day (1991)',
 '453    Fugitive, The (1993)',
 '1271    Indiana Jones and the Last Crusade (1989)',
 '847    Godfather, The (1972)',
 '2460    Planet of the Apes (1968)',
 '108    Braveheart (1995)',
 '1884    French Connection, The (1971)',
 '1182    Aliens (1986)']

In [None]:
get_recommendations(4, model, user_item_explicit, col="category")

['2502    Action|Sci-Fi|Thriller',
 '1284    Action|Comedy|Western',
 '585    Action|Sci-Fi|Thriller',
 '453    Action|Thriller',
 '1271    Action|Adventure',
 '847    Action|Crime|Drama',
 '2460    Action|Sci-Fi',
 '108    Action|Drama|War',
 '1884    Action|Crime|Drama|Thriller',
 '1182    Action|Sci-Fi|Thriller|War']

### Task 2. Without using ready-made solutions, implement matrix factorization using ALS on implicit data

* replaced the exact optimization with the gradient descent from the lecture
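For reference, the exact per-user solve that the note refers to could be sketched as follows, assuming the common confidence-weighted formulation where the confidence is `1 + alpha * r` on observed cells and 1 elsewhere. The function name `als_user_step` and the parameters `reg` and `alpha` are assumptions for illustration, not the lecture's or the notebook's code.

```python
import numpy as np
import scipy.sparse as sp

def als_user_step(user_items, y, reg=0.1, alpha=40.0):
    """Solve every user vector exactly with the item factors y held fixed."""
    n_users, k = user_items.shape[0], y.shape[1]
    yty = y.T @ y  # shared by all users
    x = np.zeros((n_users, k))
    for u in range(n_users):
        row = user_items[u]
        items, conf = row.indices, 1.0 + alpha * row.data
        y_u = y[items]
        # Y^T C_u Y = Y^T Y + Y_u^T (C_u - I) Y_u: only the user's items add terms
        a = yty + (y_u.T * (conf - 1.0)) @ y_u + reg * np.eye(k)
        b = y_u.T @ conf  # Y^T C_u p_u, with preference 1 on the user's items
        x[u] = np.linalg.solve(a, b)
    return x

rng = np.random.default_rng(0)
r = sp.csr_matrix(np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]))  # toy data
y = rng.normal(size=(3, 2))
x = als_user_step(r, y, reg=0.1, alpha=2.0)
print(x.shape)  # (2, 2)
```

The item step is symmetric: calling the same function on `user_items.T.tocsr()` with the user factors held fixed would update the items.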

In [None]:
class SGDALS:
    def __init__(self, user_count, movie_count, k=32):
        self.x = np.random.uniform(0, 1 / np.sqrt(k), size=(user_count, k))   # user factors
        self.y = np.random.uniform(0, 1 / np.sqrt(k), size=(movie_count, k))  # item factors
        self.user_count = user_count
        self.k = k
        self.movie_count = movie_count

    def run_train(self, r, l=0.05, lr=1e-3, epochs=100):
        for i in tqdm(range(epochs)):
            # Residuals against the implicit targets: 1 on observed cells, 0 elsewhere
            pred = self.x @ self.y.T
            pred[r[:,0], r[:,1]] -= r[:,2]

            # Alternate: odd epochs update users, even epochs update items
            if i % 2:
                #user update
                self.x = self.x - lr * (pred @ self.y + l * self.x)
            else:
                #item update
                self.y = self.y - lr * (pred.T @ self.x + l * self.y)
            
    
    def similar_items(self, item_id, N=10):
        scores = self.y @ self.y[item_id] / np.linalg.norm(self.y, axis=-1)
        ind = np.argsort(scores)[::-1][:N]
        return ind

    def similar_users(self, user_id, N=10):
        scores = self.x @ self.x[user_id] / np.linalg.norm(self.x, axis=-1)
        ind = np.argsort(scores)[::-1][:N]
        return ind

    def recommend(self, user_id, user_items, N=10):
        unseen = [False if i in user_items[user_id].nonzero()[1] else True for i in np.arange(len(self.y))]
        scores = self.y[unseen] @ self.x[user_id]
        ind = np.argsort(scores)[::-1]
        real_ind = np.arange(len(self.y))[unseen][ind]
        return real_ind[:N]

In [None]:
model = SGDALS(user_count, movie_count)

In [None]:
model.run_train(implicit_data, epochs=100, lr=1e-3, l=1e-3)


In [None]:
get_similars(1, model)

['0    Toy Story (1995)',
 '3045    Toy Story 2 (1999)',
 "2286    Bug's Life, A (1998)",
 '584    Aladdin (1992)',
 '1179    Princess Bride, The (1987)',
 '1245    Groundhog Day (1993)',
 '3682    Chicken Run (2000)',
 '1250    Back to the Future (1985)',
 '33    Babe (1995)',
 '3106    Galaxy Quest (1999)']

In [None]:
get_recommendations(4, model, user_item_explicit)

['585    Terminator 2: Judgment Day (1991)',
 '2502    Matrix, The (1999)',
 '1182    Aliens (1986)',
 '847    Godfather, The (1972)',
 '2847    Total Recall (1990)',
 '537    Blade Runner (1982)',
 '453    Fugitive, The (1993)',
 '2693    Sixth Sense, The (1999)',
 '1539    Men in Black (1997)',
 '108    Braveheart (1995)']

In [None]:
get_recommendations(4, model, user_item_explicit, "category")

['585    Action|Sci-Fi|Thriller',
 '2502    Action|Sci-Fi|Thriller',
 '1182    Action|Sci-Fi|Thriller|War',
 '847    Action|Crime|Drama',
 '2847    Action|Adventure|Sci-Fi|Thriller',
 '537    Film-Noir|Sci-Fi',
 '453    Action|Thriller',
 '2693    Thriller',
 '1539    Action|Adventure|Comedy|Sci-Fi',
 '108    Action|Drama|War']

### Task 3. Without using ready-made solutions, implement BPR matrix factorization on implicit data

* added an item bias
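The update applied to one (user, positive item, negative item) triple can be sketched in isolation. This is a minimal illustration under assumptions: the function names `bpr_step` and `log_sigmoid`, the toy vectors and the default step size are hypothetical, and the shrinkage term is written as `reg * vector` in the same style as the class below.

```python
import numpy as np

def log_sigmoid(x):
    """Numerically stable ln(sigmoid(x))."""
    return -np.logaddexp(0.0, -x)

def bpr_step(w_u, h_i, h_j, lr=0.05, reg=0.0):
    """One ascent step on ln sigmoid(x_uij) with L2 shrinkage.

    x_uij = w_u @ (h_i - h_j); returns updated copies of the three vectors.
    """
    x_uij = w_u @ (h_i - h_j)
    g = 1.0 / (1.0 + np.exp(x_uij))  # d ln sigmoid(x) / dx = sigmoid(-x)
    new_w = w_u + lr * (g * (h_i - h_j) - reg * w_u)
    new_hi = h_i + lr * (g * w_u - reg * h_i)
    new_hj = h_j + lr * (-g * w_u - reg * h_j)
    return new_w, new_hi, new_hj

w = np.array([0.1, 0.2])
hi = np.array([0.3, 0.1])   # positive item
hj = np.array([0.2, 0.4])   # negative item
before = log_sigmoid(w @ (hi - hj))
w2, hi2, hj2 = bpr_step(w, hi, hj)
after = log_sigmoid(w2 @ (hi2 - hj2))
print(before, after)
```

With a small step and no shrinkage, the objective for the triple should go up after the step, which is a cheap check that the signs of the three updates are right.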

In [19]:
class BPRData:
    def __init__(self, dataset):
        self.pos = dataset[:,:2]   # (user, positive item) pairs
        self.neg = None            # same shape; column 1 holds a sampled negative
        self.neg_dict = {}         # user -> items without positive feedback
        all_items = np.arange(movie_count)
        for i in tqdm(range(user_count)):
            seen = self.pos[self.pos[:,0] == i][:,1]
            self.neg_dict[i] = np.setdiff1d(all_items, seen)

    def set_neg(self, mode="lazy"):
        if mode == "correct":
            # Sample only from items the user has no positive feedback for
            self.neg = self.pos.copy()
            for i in range(user_count):
                cur_pos = self.pos[:,0] == i
                # Write through the mask, so the row order of pos does not matter
                self.neg[cur_pos, 1] = np.random.choice(self.neg_dict[i], cur_pos.sum())
        elif mode == "lazy":
            # Sample uniformly from all items; may occasionally hit a positive
            self.neg = self.pos.copy()
            self.neg[:,1] = np.random.choice(np.arange(movie_count), len(self.pos))

    def __len__(self):
        return len(self.pos)

    def __getitem__(self, i):
        pos_ex = self.pos[i]
        neg_ex = self.neg[i]

        # (user, positive item, negative item)
        return [*pos_ex, neg_ex[1]]

In [20]:
traindata = BPRData(implicit_data)


In [None]:
traindata.set_neg()

In [22]:
class BPR():
    def __init__(self, user_count, item_count, k=64):
        # The last user coordinate is fixed to 1, so the last item
        # coordinate acts as an item bias
        self.w = np.random.uniform(0, 1 / np.sqrt(k), size=(user_count, k-1))
        self.w = np.hstack((self.w, np.ones(user_count).reshape(-1, 1)))
        self.h = np.random.uniform(0, 1 / np.sqrt(k), size=(item_count, k))
        self.use_bias = True

    def run_train(self, train_data,
                  num_epochs=100,
                  lr=1e-3,
                  lam=(1e-3, 1e-3, 1e-3),
                  neg_update="correct",
                  verbose=False):

        for epoch in range(num_epochs):
            running_loss = 0
            train_data.set_neg(neg_update)  # resample negatives every epoch
            dataloader = torch.utils.data.DataLoader(train_data,
                                                    batch_size=1,
                                                    shuffle=True)
            for (u, i, j) in tqdm(dataloader):
                # Batches of size 1 arrive as one-element tensors
                u, i, j = int(u), int(i), int(j)

                x_uij = self.w[u] @ (self.h[i] - self.h[j])
                # sigmoid(-x_uij): the gradient weight of ln sigmoid(x_uij)
                norm = 1 / (1 + np.exp(x_uij))

                # update w (the bias coordinate stays fixed at 1)
                self.w[u,:-1] = self.w[u, :-1] + \
                                lr * (norm * (self.h[i, :-1] - self.h[j, :-1]) \
                                                    - lam[0] * self.w[u, :-1])

                # update h
                self.h[i] = self.h[i] + lr * (norm *  self.w[u] - lam[1] * self.h[i])
                self.h[j] = self.h[j] + lr * (norm * -self.w[u] - lam[2] * self.h[j])

                # BPR objective (maximized): ln sigmoid(x_uij) minus the L2 penalties
                running_loss += -np.logaddexp(0, -x_uij) - \
                                lam[0] * self.w[u] @ self.w[u] - \
                                lam[1] * self.h[i] @ self.h[i] - \
                                lam[2] * self.h[j] @ self.h[j]

            print(f"[{epoch:3}/{num_epochs}]: loss={running_loss:.4f}")

    def similar_items(self, item_id, N=10):
        if self.use_bias:
            h = self.h
        else:
            h = self.h[:,:-1]
        scores = h @ h[item_id] / np.linalg.norm(h, axis=-1)
        ind = np.argsort(scores)[::-1][:N]
        return ind

    def recommend(self, user_id, user_items, N=10):
        # Score only the items the user has not interacted with
        seen = user_items[user_id].nonzero()[1]
        unseen = np.setdiff1d(np.arange(len(self.h)), seen)
        scores = self.h[unseen] @ self.w[user_id]
        ind = np.argsort(scores)[::-1]
        return unseen[ind][:N]

In [34]:
model = BPR(user_count, movie_count, k=64)

In [None]:
model.run_train(traindata, num_epochs=10, verbose=True, lam=(1e-3, 1e-4, 1e-4), lr=1e-2, neg_update="correct")

In [29]:
get_recommendations(4, model, user_item_implicit)

['2789    American Beauty (1999)',
 '1178    Star Wars: Episode V - The Empire Strikes Back...',
 '2693    Sixth Sense, The (1999)',
 '585    Terminator 2: Judgment Day (1991)',
 '589    Silence of the Lambs, The (1991)',
 "523    Schindler's List (1993)",
 '604    Fargo (1996)',
 '2502    Matrix, The (1999)',
 '108    Braveheart (1995)',
 '1192    Star Wars: Episode VI - Return of the Jedi (1983)']

In [30]:
get_recommendations(4, model, user_item_implicit, "category")

['2789    Comedy|Drama',
 '1178    Action|Adventure|Drama|Sci-Fi|War',
 '2693    Thriller',
 '585    Action|Sci-Fi|Thriller',
 '589    Drama|Thriller',
 '523    Drama|War',
 '604    Crime|Drama|Thriller',
 '2502    Action|Sci-Fi|Thriller',
 '108    Action|Drama|War',
 '1192    Action|Adventure|Romance|Sci-Fi|War']

In [31]:
get_similars(1, model)

['0    Toy Story (1995)',
 '1179    Princess Bride, The (1987)',
 '1180    Raiders of the Lost Ark (1981)',
 '1081    E.T. the Extra-Terrestrial (1982)',
 '257    Star Wars: Episode IV - A New Hope (1977)',
 '1575    L.A. Confidential (1997)',
 '1178    Star Wars: Episode V - The Empire Strikes Back...',
 '2647    Ghostbusters (1984)',
 '1271    Indiana Jones and the Last Crusade (1989)',
 '315    Shawshank Redemption, The (1994)']

### Task 4. Without using ready-made solutions, implement WARP matrix factorization on implicit data

* added an item bias
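The negative-sampling step at the heart of WARP can be sketched on its own. The function name `warp_sample`, the `margin` parameter and the toy scores are assumptions for illustration. The weight `log(len(negs) / tries)` follows the weighting used in the class below; other formulations of WARP use a different rank-to-weight function.

```python
import numpy as np

def warp_sample(scores, pos, negs, rng, margin=1.0):
    """Draw negatives without replacement until one violates the margin.

    Returns (negative item, rank weight), or None if no negative violates it.
    """
    for tries, j in enumerate(rng.permutation(negs), start=1):
        if scores[j] + margin > scores[pos]:
            # Fewer tries suggests many violators, i.e. a poorly ranked positive
            return int(j), float(np.log(len(negs) / tries))
    return None

rng = np.random.default_rng(0)
scores = np.array([2.0, 0.0, 0.5, 3.0])  # toy scores; item 0 is the positive

print(warp_sample(scores, 0, np.array([1, 2]), rng))  # no violator among 1, 2
print(warp_sample(scores, 0, np.array([3]), rng))     # item 3 violates the margin
```

When no violator is found, the positive item is already ranked above every sampled negative by the margin and no update is needed for that example.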

In [None]:
class WARPData:
    def __init__(self, dataset):
        self.pos = dataset[:,:2]   # (user, positive item) pairs
        self.neg_dict = {}         # user -> items without positive feedback
        all_items = np.arange(movie_count)
        for i in tqdm(range(user_count)):
            seen = self.pos[self.pos[:,0] == i][:,1]
            self.neg_dict[i] = np.setdiff1d(all_items, seen)

    def __len__(self):
        return len(self.pos)

    def __getitem__(self, i):
        pos_ex = self.pos[i]

        # The positive pair together with all candidate negatives of that user
        return pos_ex, self.neg_dict[pos_ex[0]]

In [None]:
traindata = WARPData(implicit_data)


In [None]:
class WARP():
    def __init__(self, user_count, item_count, k=64):
        # k latent dimensions plus one extra coordinate: the user side is fixed
        # to 1, so the matching item coordinate acts as an item bias
        self.w = np.random.uniform(0, 1 / np.sqrt(k), size=(user_count, k))
        self.w = np.hstack((self.w, np.ones(user_count).reshape(-1, 1)))

        self.h = np.random.uniform(0, 1 / np.sqrt(k), size=(item_count, k + 1))

    def run_train(self, traindata,
                  num_epochs=10,
                  lr=1e-3,
                  verbose=False):

        for epoch in range(num_epochs):
            running_loss = 0
            dataloader = torch.utils.data.DataLoader(traindata,
                                                    batch_size=1,
                                                    shuffle=True)

            for (pos, negs) in tqdm(dataloader):
                # Batches of size 1: unpack the tensors into ints and an array
                u, i = (int(t) for t in pos[0])
                negs = negs[0].numpy()

                preds = self.h @ self.w[u]
                pred_pos = preds[i]

                # Sample negatives until one violates the margin of 1
                cnt = 0
                for j in np.random.permutation(negs):
                    cnt += 1
                    if preds[j] + 1 > pred_pos:
                        break

                if preds[j] + 1 > pred_pos:
                    # Rank weight: the sooner a violator is found, the larger the step
                    weight = np.log(len(negs) / cnt)
                    running_loss += weight * (preds[j] + 1 - pred_pos)

                    self.w[u, :-1] = self.w[u, :-1] - lr * weight * (self.h[j, :-1] - self.h[i, :-1])
                    self.h[i] = self.h[i] + lr * weight * (self.w[u])
                    self.h[j] = self.h[j] - lr * weight * (self.w[u])

            print(f"[{epoch:3}/{num_epochs}]: loss={running_loss:.4f}")

    def similar_items(self, item_id, N=10):
        scores = self.h @ self.h[item_id] / np.linalg.norm(self.h, axis=-1)
        ind = np.argsort(scores)[::-1][:N]
        return ind

    def recommend(self, user_id, user_items, N=10):
        # Score only the items the user has not interacted with
        seen = user_items[user_id].nonzero()[1]
        unseen = np.setdiff1d(np.arange(len(self.h)), seen)
        scores = self.h[unseen] @ self.w[user_id]
        ind = np.argsort(scores)[::-1]
        return unseen[ind][:N]

In [None]:
model = WARP(user_count, movie_count, 32)

In [None]:
model.run_train(traindata, lr=1e-3, num_epochs=10)

[  0/10]: loss=2983549.2839
[  1/10]: loss=3384450.0537
[  2/10]: loss=3534069.1442
[  3/10]: loss=3481499.7579
[  4/10]: loss=3251237.5683
[  5/10]: loss=3053626.6388
[  6/10]: loss=2949816.3272
[  7/10]: loss=2898306.4947
[  8/10]: loss=2831499.6695
[  9/10]: loss=2768970.5235


In [None]:
get_recommendations(4, model, user_item_implicit)

['847    Godfather, The (1972)',
 '1178    Star Wars: Episode V - The Empire Strikes Back...',
 '585    Terminator 2: Judgment Day (1991)',
 '1203    Godfather: Part II, The (1974)',
 '2502    Matrix, The (1999)',
 '1284    Butch Cassidy and the Sundance Kid (1969)',
 '108    Braveheart (1995)',
 '453    Fugitive, The (1993)',
 '1182    Aliens (1986)',
 '1192    Star Wars: Episode VI - Return of the Jedi (1983)']

In [None]:
get_recommendations(4, model, user_item_implicit, "category")

['847    Action|Crime|Drama',
 '1178    Action|Adventure|Drama|Sci-Fi|War',
 '585    Action|Sci-Fi|Thriller',
 '1203    Action|Crime|Drama',
 '2502    Action|Sci-Fi|Thriller',
 '1284    Action|Comedy|Western',
 '108    Action|Drama|War',
 '453    Action|Thriller',
 '1182    Action|Sci-Fi|Thriller|War',
 '1192    Action|Adventure|Romance|Sci-Fi|War']

In [None]:
get_similars(1, model)

['0    Toy Story (1995)',
 '584    Aladdin (1992)',
 '3045    Toy Story 2 (1999)',
 "2286    Bug's Life, A (1998)",
 '2225    Antz (1998)',
 '2315    Babe: Pig in the City (1998)',
 '33    Babe (1995)',
 '2618    Tarzan (1999)',
 '591    Beauty and the Beast (1991)',
 '1526    Hercules (1997)']