### Матричные факторизации

В данной работе вам предстоит познакомиться с практической стороной матричных разложений.
Работа поделена на 4 задания:
1. Вам необходимо реализовать SVD разложения используя SGD на explicit данных
2. Вам необходимо реализовать матричное разложения используя ALS на implicit данных
3. Вам необходимо реализовать матричное разложения используя BPR(pair-wise loss) на implicit данных
4. Вам необходимо реализовать матричное разложения используя WARP(list-wise loss) на implicit данных

Мягкий дедлайн 28 Сентября (пишутся замечания, выставляется оценка, есть возможность исправить до жесткого дедлайна)

Жесткий дедлайн 5 Октября (Итоговая проверка)

In [None]:
!pip install implicit

!pip install lightfm

Collecting implicit
[?25l  Downloading https://files.pythonhosted.org/packages/bc/07/c0121884722d16e2c5beeb815f6b84b41cbf22e738e4075f1475be2791bc/implicit-0.4.4.tar.gz (1.1MB)
[K     |████████████████████████████████| 1.1MB 3.9MB/s 
Building wheels for collected packages: implicit
  Building wheel for implicit (setup.py) ... [?25l[?25hdone
  Created wheel for implicit: filename=implicit-0.4.4-cp36-cp36m-linux_x86_64.whl size=3419479 sha256=1bbfedab3db312e7894c56923dbad1144342aadb3172f755034817f88b1d7ebc
  Stored in directory: /root/.cache/pip/wheels/bf/d4/ec/fd4f622fcbefb7521f149905295b2c26adecb23af38aa28217
Successfully built implicit
Installing collected packages: implicit
Successfully installed implicit-0.4.4
Collecting lightfm
[?25l  Downloading https://files.pythonhosted.org/packages/e9/8e/5485ac5a8616abe1c673d1e033e2f232b4319ab95424b42499fabff2257f/lightfm-1.15.tar.gz (302kB)
[K     |████████████████████████████████| 307kB 4.5MB/s 
Building wheels for collected packages: li

In [None]:
import implicit
import pandas as pd
import numpy as np
import scipy.sparse as sp
from tqdm.notebook import tqdm

from lightfm.datasets import fetch_movielens

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data
import torchvision.datasets
import torchvision.transforms as transforms
from torch.autograd import Variable
import torch.nn.functional as F
import torchvision.models as models
import torchvision

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

В данной работе мы будем работать с explicit датасетом movieLens, в котором представленны пары user_id movie_id и rating выставленный пользователем фильму

Скачать датасет можно по ссылке https://grouplens.org/datasets/movielens/1m/

In [None]:
!wget http://files.grouplens.org/datasets/movielens/ml-1m.zip
!unzip ml-1m.zip

--2020-09-26 09:44:19--  http://files.grouplens.org/datasets/movielens/ml-1m.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152
Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5917549 (5.6M) [application/zip]
Saving to: ‘ml-1m.zip’


2020-09-26 09:44:20 (14.6 MB/s) - ‘ml-1m.zip’ saved [5917549/5917549]

Archive:  ml-1m.zip
   creating: ml-1m/
  inflating: ml-1m/movies.dat        
  inflating: ml-1m/ratings.dat       
  inflating: ml-1m/README            
  inflating: ml-1m/users.dat         


In [None]:
ratings = pd.read_csv('ml-1m/ratings.dat', delimiter='::', header=None, 
        names=['user_id', 'movie_id', 'rating', 'timestamp'], 
        usecols=['user_id', 'movie_id', 'rating'], engine='python')

In [None]:
movie_info = pd.read_csv('ml-1m/movies.dat', delimiter='::', header=None, 
        names=['movie_id', 'name', 'category'], engine='python')

Теперь ваша очередь реализовать самые популярные алгоритмы матричных разложений

Что будет оцениваться:
1. Корректность алгоритма
2. Качество получившихся симиларов
3. Качество итоговых рекомендаций для юзера

# preprocess

In [None]:
users = ratings["user_id"]
movies = ratings["movie_id"]


меняем индексацию, чтобы не хранить несуществующих пользователей и фильмы

In [None]:
uid2userid = np.unique(users)
userid2uid = {i: j for (j, i) in enumerate(uid2userid)}
new_users = [userid2uid[i] for i in users]

In [None]:
mid2movieid = np.unique(movies)
movieid2mid = {i: j for (j, i) in enumerate(mid2movieid)}
new_movies = [movieid2mid[i] for i in movies]

сделал explicit матрицу

In [None]:
explicit = np.vstack((new_users, new_movies, ratings["rating"].to_numpy())).T

In [None]:
user_item = sp.coo_matrix((ratings["rating"], (new_users, new_movies)))
user_item_explicit = user_item.tocsr()

In [None]:
user_count, movie_count = len(uid2userid), len(mid2movieid)

делаю implicit матрицу

In [None]:
implicit_data = explicit[explicit[:,2] >= 4]

In [None]:
user_item = sp.coo_matrix((np.ones_like(implicit_data[:,0]), (implicit_data[:,0], implicit_data[:,1])))
user_item_implicit = user_item.tocsr()

поменял лямбды

In [None]:
get_similars = lambda item_id, model : [movie_info[movie_info["movie_id"] == mid2movieid[i]]["name"].to_string()
                                        for i in model.similar_items(movieid2mid[item_id])]

In [None]:
get_recommendations = lambda user_id, model, mat : [movie_info[movie_info["movie_id"] == mid2movieid[i]]["name"].to_string() 
                                                    for i in model.recommend(userid2uid[user_id], mat)]

In [None]:
get_user_history = lambda user_id, implicit_ratings : [movie_info[movie_info["movie_id"] == mid2movieid[x]]["name"].to_string() 
                                            for x in implicit_ratings[implicit_ratings[:,0] == user_id][:,1]]

### Задание 1. Не использую готовые решения, реализовать SVD разложение используя SGD на explicit данных

In [None]:
class SVD:
    def __init__(self, user_count, movie_count, k=128):
       self.u = np.random.normal(0, 1 / np.sqrt(k), size=(user_count, k))
       self.v = np.random.normal(0, 1 / np.sqrt(k), size=(movie_count, k))
       self.bu = np.random.normal(0, 1 / np.sqrt(k), size=user_count)
       self.bv = np.random.normal(0, 1 / np.sqrt(k), size=movie_count)


    def run_train(self, X_train, 
                  l=0.01, 
                  g=0.01, 
                  lr=1e-3, 
                  epochs=10):

        for i in range(epochs):
            running_loss = 0.0
            for (uid, mid, rating) in tqdm(np.random.permutation(X_train)):
                # uid -= 1
                # mid -= 1
                loss = self.u[uid] @ self.v[mid] - rating + self.bu[uid] + self.bu[mid]

                running_loss += loss ** 2
                self.u[uid] = self.u[uid] - lr * (loss * self.v[mid] + l * self.u[uid])
                self.v[mid] = self.v[mid] - lr * (loss * self.u[uid] + l * self.v[mid])
                self.bu[uid] = self.bu[uid] - lr * (loss + g * self.bu[uid])
                self.bv[mid] = self.bv[mid] - lr * (loss + g * self.bv[mid])
            print(f"[{i}/{epochs}] loss={running_loss:.4f}")
    
    def similar_items(self, item_id, N=10):
        scores = self.v @ self.v[item_id] / np.linalg.norm(self.v, axis=-1)
        ind = np.argsort(scores)[::-1][:N]
        return ind

    def recommend(self, user_id, user_items, N=10):
        unseen = [False if i in user_items[user_id].nonzero()[1] else True for i in np.arange(len(self.v))]
        scores = self.v[unseen] @ self.u[user_id] + self.bv[unseen]
        ind = np.argsort(scores)[::-1]
        return ind[:N]

In [None]:
model = SVD(user_count, movie_count, k=64)

In [None]:
model.run_train(explicit, epochs=10, lr=1e-2, l=0.05)

HBox(children=(FloatProgress(value=0.0, max=1000209.0), HTML(value='')))


[0/10] loss=2953657.7355


HBox(children=(FloatProgress(value=0.0, max=1000209.0), HTML(value='')))


[1/10] loss=898459.8327


HBox(children=(FloatProgress(value=0.0, max=1000209.0), HTML(value='')))


[2/10] loss=798855.8245


HBox(children=(FloatProgress(value=0.0, max=1000209.0), HTML(value='')))


[3/10] loss=756297.8022


HBox(children=(FloatProgress(value=0.0, max=1000209.0), HTML(value='')))


[4/10] loss=727924.1890


HBox(children=(FloatProgress(value=0.0, max=1000209.0), HTML(value='')))


[5/10] loss=705112.9340


HBox(children=(FloatProgress(value=0.0, max=1000209.0), HTML(value='')))


[6/10] loss=685770.3490


HBox(children=(FloatProgress(value=0.0, max=1000209.0), HTML(value='')))


[7/10] loss=667976.3929


HBox(children=(FloatProgress(value=0.0, max=1000209.0), HTML(value='')))


[8/10] loss=651801.7141


HBox(children=(FloatProgress(value=0.0, max=1000209.0), HTML(value='')))


[9/10] loss=637007.2287


In [None]:
get_similars(1, model)

['0    Toy Story (1995)',
 '3045    Toy Story 2 (1999)',
 '584    Aladdin (1992)',
 '2009    Jungle Book, The (1967)',
 '360    Lion King, The (1994)',
 '1250    Back to the Future (1985)',
 '3682    Chicken Run (2000)',
 '2618    Tarzan (1999)',
 '33    Babe (1995)',
 '907    Wizard of Oz, The (1939)']

In [None]:
get_recommendations(4, model, user_item_explicit)

['2241    Mighty, The (1998)',
 '2833    Psycho II (1983)',
 '1220    Terminator, The (1984)',
 '2720    Damien: Omen II (1978)',
 '148    Apollo 13 (1995)',
 "890    Breakfast at Tiffany's (1961)",
 '49    Usual Suspects, The (1995)',
 '1248    Pump Up the Volume (1990)',
 '109    Taxi Driver (1976)',
 '898    Some Like It Hot (1959)']

### Задание 2. Не использую готовые решения, реализовать матричное разложение используя ALS на implicit данных

In [None]:
class ALS:
    def __init__(self, user_count, movie_count, k=32):
       self.x = np.random.normal(0, 1 / np.sqrt(k), size=(user_count, k))
       self.y = np.random.normal(0, 1 / np.sqrt(k), size=(movie_count, k))
       self.user_count = user_count
       self.k = k
       self.movie_count = movie_count

    def run_train(self, r, l=0.05, a=40, epochs=10):
        c = a * r
        for _ in range(epochs):
            # user step
            yy = self.y.T @ self.y
            for u in tqdm(range(self.user_count)):
                pu = c[u].toarray().squeeze()
                cu = sp.diags(pu)

                first_mult = np.linalg.inv(l * np.eye(self.k) + yy + self.y.T @ cu @ self.y)
                self.x[u] = first_mult @ self.y.T @ (cu + np.eye(self.movie_count)) @ pu

            # movie step
            xx = self.x.T @ self.x
            for i in tqdm(range(self.movie_count)): 
                pi = c[:,i].toarray().squeeze()
                ci = sp.diags(pi)
                
                first_mult = np.linalg.inv(l * np.eye(self.k) + xx + self.x.T @ ci @ self.x)
                self.y[i] = first_mult @ self.x.T @ (ci + np.eye(user_count)) @ pi
    
    def similar_items(self, item_id, N=10):
        scores = self.y @ self.y[item_id] / np.linalg.norm(self.y, axis=-1)
        ind = np.argsort(scores)[::-1][:N]
        return ind

    def similar_users(self, user_id, N=10):
        scores = self.x @ self.x[user_id] / np.linalg.norm(self.x, axis=-1)
        ind = np.argsort(scores)[::-1][:N]
        return ind

    def recommend(self, user_id, user_items, N=10):
        unseen = self.y[[False if i in user_items[user_id].nonzero()[1] else True for i in np.arange(len(self.y))]]
        scores = unseen @ self.x[user_id]
        ind = np.argsort(scores)[::-1]
        return ind[:N]

In [None]:
model = ALS(user_count, movie_count)

In [None]:
model.run_train(user_item_implicit)

HBox(children=(FloatProgress(value=0.0, max=6040.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=3706.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6040.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=3706.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6040.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=3706.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6040.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=3706.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6040.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=3706.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6040.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=3706.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6040.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=3706.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6040.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=3706.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6040.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=3706.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=6040.0), HTML(value='')))




HBox(children=(FloatProgress(value=0.0, max=3706.0), HTML(value='')))




In [None]:
get_similars(1, model)

['0    Toy Story (1995)',
 '33    Babe (1995)',
 '1245    Groundhog Day (1993)',
 '3045    Toy Story 2 (1999)',
 '352    Forrest Gump (1994)',
 '2327    Shakespeare in Love (1998)',
 "2286    Bug's Life, A (1998)",
 '315    Shawshank Redemption, The (1994)',
 '1250    Back to the Future (1985)',
 "523    Schindler's List (1993)"]

In [None]:
get_recommendations(4, model, user_item_explicit)

["1176    One Flew Over the Cuckoo's Nest (1975)",
 '842    Dingo (1992)',
 '1194    Third Man, The (1949)',
 '583    Ghost (1990)',
 '108    Braveheart (1995)',
 '2487    Telling You (1998)',
 '1174    Madonna: Truth or Dare (1991)',
 '3613    Magnum Force (1973)',
 '1261    Great Dictator, The (1940)',
 '2860    Reds (1981)']

### Задание 3. Не использую готовые решения, реализовать матричное разложение BPR на implicit данных

In [43]:
class BPRData:
    def __init__(self, dataset):
        self.pos = dataset[:,:2]
        self.neg = None
        self.neg_dict = {}
        for i in tqdm(range(user_count)):
            seen = set(self.pos[self.pos[:,0] == i][:,1])
            unseen = set(np.arange(movie_count)) - seen
            unseen = np.array(list(unseen), dtype=int)
            self.neg_dict[i] = unseen
        

    def set_neg(self, mode="correct"):
        if mode == "correct":
            self.neg = self.pos.copy()
            new_negs = None
            for i in range(user_count):
                cur_pos = self.pos[:,0] == i
                len_cur = (self.pos[:,0] == i).sum()
                cur_negs = np.random.choice(self.neg_dict[i], len_cur)
                if new_negs is None:
                    new_negs = cur_negs
                else:
                    new_negs = np.hstack((new_negs, cur_negs))
            self.neg[:,1] = new_negs
        elif mode == "lazy":
            self.neg = self.pos.copy()
            self.neg[:,1] = np.random.choice(np.arange(movie_count), len(self.pos))
    def __len__(self):
        return len(self.pos)
    
    def __getitem__(self, i):
        pos_ex = self.pos[i]
        neg_ex = self.neg[i]
        # negs = self.neg[self.neg[:,0]==pos_ex[0]]
        # neg_ex = self.neg[np.random.randint(len(self.neg))]

        return [*pos_ex, neg_ex[1]]

In [44]:
traindata = BPRData(implicit_data)

HBox(children=(FloatProgress(value=0.0, max=6040.0), HTML(value='')))




In [45]:
traindata.set_neg()

In [49]:
class BPR():
    def __init__(self, user_count, item_count, k=64):
        self.w = np.random.normal(0, 1 / np.sqrt(k), size=(user_count, k))
        self.h = np.random.normal(0, 1 / np.sqrt(k), size=(item_count, k))

    def run_train(self, train_data, 
                  num_epochs=100, 
                  lr=1e-3, 
                  lam_w=1e-3,
                  lam_h=1e-3,
                  eps = 1e-5,
                  verbose=False):

     

        for epoch in range(num_epochs):
            running_loss = 0
            train_data.set_neg()
            dataloader = torch.utils.data.DataLoader(train_data, 
                                                    batch_size=1, 
                                                    shuffle=True)
            cnt = 0
            for (u, i, j) in tqdm(dataloader):
                cnt += 1
                user_emb = self.w[u]
                item_emb1 = self.h[i]
                item_emb2 = self.h[j]
                
                x_ui = user_emb @ item_emb1
                x_uj = user_emb @ item_emb2
                x_uij = x_ui - x_uj
                norm = np.exp(-x_uij) / (1 + np.exp(-x_uij))


                grad_w = np.zeros_like(self.w)
                grad_w[u] = self.h[i] - self.h[j]

                # update w
                self.w[u] = self.w[u] + lr * (norm * (self.h[i] - self.h[j])  - lam_w * self.w[u])

                # update h

                self.h[i] = self.h[i] + lr * (norm * self.w[u] - lam_h * self.h[i])
                self.h[j] = self.h[j] + lr * (norm * -self.w[u] - lam_h * self.h[j])


                running_loss += -np.log(1 + np.exp(x_uij)) - \
                                lam_w * self.w[u] @ self.w[u] - \
                                lam_h * self.h[i] @ self.h[i] - \
                                lam_h * self.h[j] @ self.h[j]
            

            print(f"[{epoch:3}/{num_epochs}]: loss={running_loss:.4f}")

    def similar_items(self, item_id, N=10):
        scores = self.h @ self.h[item_id] / np.linalg.norm(self.h, axis=-1)
        ind = np.argsort(scores)[::-1][:N]
        return ind


    def recommend(self, user_id, user_items, N=10):
        unseen = [i  for i in np.arange(movie_count) if i not in user_items[user_id].nonzero()[1]]
        unseen = self.h[unseen]
        scores = unseen @ self.w[user_id]
        ind = np.argsort(scores)[::-1]
        return ind[:N]

In [50]:
model = BPR(user_count, movie_count, k=32)

In [51]:
model.run_train(traindata, num_epochs=20, verbose=True, lam_w = 1e-3, lam_h=1e-3, lr=1e-3)

HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  0/20]: loss=-405044.4354


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  1/20]: loss=-405258.3478


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  2/20]: loss=-405389.0363


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  3/20]: loss=-405680.6731


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  4/20]: loss=-405952.3243


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  5/20]: loss=-406315.2935


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  6/20]: loss=-406493.2180


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  7/20]: loss=-406796.5202


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  8/20]: loss=-407478.4012


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  9/20]: loss=-407804.0592


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[ 10/20]: loss=-408465.5110


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[ 11/20]: loss=-409343.6910


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[ 12/20]: loss=-410460.6371


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[ 13/20]: loss=-411747.0320


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[ 14/20]: loss=-413517.4368


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[ 15/20]: loss=-415554.2738


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[ 16/20]: loss=-418465.6477


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[ 17/20]: loss=-422118.1629


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[ 18/20]: loss=-426901.2456


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[ 19/20]: loss=-433151.1408


In [52]:
get_recommendations(4, model, user_item_implicit)

['3729    What Lies Beneath (2000)',
 '1189    To Kill a Mockingbird (1962)',
 '339    Baby-Sitters Club, The (1995)',
 '165    Feast of July (1995)',
 '3488    Jennifer 8 (1992)',
 '431    Coneheads (1993)',
 '2621    Ideal Husband, An (1999)',
 '1601    Playing God (1997)',
 '403    In the Mouth of Madness (1995)',
 '3199    Stop! Or My Mom Will Shoot (1992)']

In [53]:
get_similars(1, model)

['0    Toy Story (1995)',
 '2647    Ghostbusters (1984)',
 '1192    Star Wars: Episode VI - Return of the Jedi (1983)',
 '2105    Beetlejuice (1988)',
 '257    Star Wars: Episode IV - A New Hope (1977)',
 '31    Twelve Monkeys (1995)',
 '2502    Matrix, The (1999)',
 '1656    Good Will Hunting (1997)',
 '293    Pulp Fiction (1994)',
 '1256    Cool Hand Luke (1967)']

### Задание 4. Не использую готовые решения, реализовать матричное разложение WARP на implicit данных

In [None]:
class WARPData:
    def __init__(self, dataset):
        self.pos = dataset[:,:2]
        self.neg = None
        self.neg_dict = {}
        for i in tqdm(range(user_count)):
            seen = set(self.pos[self.pos[:,0] == i][:,1])
            unseen = set(np.arange(movie_count)) - seen
            unseen = np.array(list(unseen), dtype=int)
            self.neg_dict[i] = unseen
        
    def __len__(self):
        return len(self.pos)
    
    def __getitem__(self, i):
        pos_ex = self.pos[i]

        # negs = self.neg[self.neg[:,0]==pos_ex[0]]
        # neg_ex = self.neg[np.random.randint(len(self.neg))]

        return pos_ex, self.neg_dict[pos_ex[0]]

In [None]:
traindata = WARPData(implicit_data)

HBox(children=(FloatProgress(value=0.0, max=6040.0), HTML(value='')))




In [None]:
class WARP():
    def __init__(self, user_count, item_count, k=64):
        self.w = np.random.normal(0, 1 / np.sqrt(k), size=(user_count, k))
        self.h = np.random.normal(0, 1 / np.sqrt(k), size=(item_count, k))

    def run_train(self, traindata, 
                  num_epochs=10, 
                  lr=1e-3, 
                  verbose=False):

    
        for epoch in range(num_epochs):
            running_loss = 0
            # traindata.set_neg()
            dataloader = torch.utils.data.DataLoader(traindata, 
                                                    batch_size=1, 
                                                    shuffle=True)

            for (pos, negs) in tqdm(dataloader):

                u, i = pos[0]
                negs = negs[0]
                
                user_emb = self.w[u]
                item_emb1 = self.h[i]
                
                pred_pos = self.w[u] @ self.h[i]

                preds = self.h @ user_emb

                cnt = 0
                for j in np.random.permutation(negs):
                    cnt += 1
                    if preds[j] + 1 > pred_pos:
                        break

                # loss = np.log(len(negs) / cnt) * (pred_neg + 1 - pred_pos)

                if preds[j] + 1 > pred_pos:
                    running_loss += np.log(len(negs) / cnt) * (preds[j] + 1 - pred_pos)


                    self.w[u] = self.w[u] - lr * np.log(len(negs) / cnt) * (self.h[j] - self.h[i])
                    self.h[i] = self.h[i] + lr * np.log(len(negs) / cnt) * (self.w[u])
                    self.h[j] = self.h[j] - lr * np.log(len(negs) / cnt) * (self.w[u])
            

            print(f"[{epoch:3}/{num_epochs}]: loss={running_loss:.4f}")
                

    def similar_items(self, item_id, N=10):
        scores = self.h @ self.h[item_id] / np.linalg.norm(self.h, axis=-1)
        ind = np.argsort(scores)[::-1][:N]
        return ind


    def recommend(self, user_id, user_items, N=10):
        unseen = [i  for i in np.arange(movie_count) if i not in user_items[user_id].nonzero()[1]]
        unseen = self.h[unseen]
        scores = unseen @ self.w[user_id]
        ind = np.argsort(scores)[::-1]
        return ind[:N]

In [None]:
model = WARP(user_count, movie_count, 32)

In [None]:
model.run_train(traindata, lr=1e-3, num_epochs=10)

HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  0/10]: loss=4661377.7957


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  1/10]: loss=3468229.3132


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  2/10]: loss=3113783.0293


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  3/10]: loss=3095099.1715


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  4/10]: loss=2953274.4227


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  5/10]: loss=2878517.8865


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  6/10]: loss=2832279.2795


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  7/10]: loss=2803753.4765


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  8/10]: loss=2788498.1511


HBox(children=(FloatProgress(value=0.0, max=575281.0), HTML(value='')))


[  9/10]: loss=2779502.5425


In [None]:
get_recommendations(4, model, user_item_implicit)

['1174    Madonna: Truth or Dare (1991)',
 '1186    Lawrence of Arabia (1962)',
 '1263    High Noon (1952)',
 '583    Ghost (1990)',
 '1177    Up in Smoke (1978)',
 '842    Dingo (1992)',
 '1619    Bean (1997)',
 '2559    Star Wars: Episode I - The Phantom Menace (1999)',
 '2114    Man Who Knew Too Much, The (1956)',
 '905    Little Princess, The (1939)']

In [None]:
get_similars(1, model)

['0    Toy Story (1995)',
 '3045    Toy Story 2 (1999)',
 "2286    Bug's Life, A (1998)",
 '584    Aladdin (1992)',
 '33    Babe (1995)',
 '2692    Iron Giant, The (1999)',
 '2252    Pleasantville (1998)',
 '2225    Antz (1998)',
 '3682    Chicken Run (2000)',
 '1838    Mulan (1998)']