### Matrix factorizations

In this assignment you will get acquainted with the practical side of matrix factorization.
The work is split into 4 tasks:
1. Implement an SVD decomposition using SGD on explicit data
2. Implement a matrix factorization using ALS on implicit data
3. Implement a matrix factorization using BPR (pair-wise loss) on implicit data
4. Implement a matrix factorization using WARP (list-wise loss) on implicit data


In [1]:

import implicit
import pandas as pd
import numpy as np
import scipy.sparse as sp

from lightfm.datasets import fetch_movielens


In this assignment we work with the explicit MovieLens dataset, which contains (user_id, movie_id) pairs together with the rating the user gave the movie.

The dataset can be downloaded from https://grouplens.org/datasets/movielens/1m/

In [2]:
ratings = pd.read_csv('ratings.dat', delimiter='::', header=None, 
        names=['user_id', 'movie_id', 'rating', 'timestamp'], 
        usecols=['user_id', 'movie_id', 'rating'], engine='python')
ratings

Unnamed: 0,user_id,movie_id,rating
0,1,1193,5
1,1,661,3
2,1,914,3
3,1,3408,4
4,1,2355,5
...,...,...,...
1000204,6040,1091,1
1000205,6040,1094,5
1000206,6040,562,5
1000207,6040,1096,4


In [3]:
movie_info = pd.read_csv('movies.dat', delimiter='::', header=None, 
        names=['movie_id', 'name', 'category'], engine='python')

Explicit data

In [4]:
ratings.head(10)

Unnamed: 0,user_id,movie_id,rating
0,1,1193,5
1,1,661,3
2,1,914,3
3,1,3408,4
4,1,2355,5
5,1,1197,3
6,1,1287,5
7,1,2804,5
8,1,594,4
9,1,919,4


To convert the dataset to implicit feedback, let's treat a rating >= 4 as a positive interaction.

In [5]:
implicit_ratings = ratings.loc[(ratings['rating'] >= 4)]

In [6]:
implicit_ratings.head(10)

Unnamed: 0,user_id,movie_id,rating
0,1,1193,5
3,1,3408,4
4,1,2355,5
6,1,1287,5
7,1,2804,5
8,1,594,4
9,1,919,4
10,1,595,5
11,1,938,4
12,1,2398,4


Sparse matrices are more convenient to work with, so let's convert the DataFrame into CSR matrices.

In [7]:
users = implicit_ratings["user_id"]
movies = implicit_ratings["movie_id"]
user_item = sp.coo_matrix((np.ones_like(users), (users, movies)))
user_item_t_csr = user_item.T.tocsr()
user_item_csr = user_item.tocsr()
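
As a small illustration of the conversion above, here is a sketch on toy data (the ids are made up for the example). It shows that, because raw ids are used directly as row and column indices, the matrix gets an empty row 0 and column 0.

```python
import numpy as np
import scipy.sparse as sp

# Toy (user_id, movie_id) pairs; ids start at 1, as in MovieLens.
users = np.array([1, 1, 2, 3])
movies = np.array([2, 4, 2, 1])

# COO is convenient for construction: one stored entry per (row, col) pair.
user_item = sp.coo_matrix((np.ones_like(users), (users, movies)))

# The shape is inferred as (max_user_id + 1, max_movie_id + 1),
# so row 0 and column 0 exist but stay empty.
print(user_item.shape)

user_item_csr = user_item.tocsr()    # efficient per-user (row) slicing
item_user_csr = user_item.T.tocsr()  # efficient per-movie slicing

# Movies liked by user 1.
liked_by_user_1 = sorted(user_item_csr[1].indices.tolist())
print(liked_by_user_1)
```

This is also why the models later in the notebook are sized with `max(id) + 1` rows and columns.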

As an example, let's use the ALS factorization from the implicit library.

We set the dimension of the latent space to 64; this is also the size of the user/item embeddings.

In [9]:
model = implicit.als.AlternatingLeastSquares(factors=64, iterations=100, calculate_training_loss=True)



The loss here is everyone's favorite, RMSE.

In [10]:
model.fit(user_item_t_csr)





Let's find movies similar to movie_id = 1, Toy Story.

In [11]:
movie_info.head(5)

Unnamed: 0,movie_id,name,category
0,1,Toy Story (1995),Animation|Children's|Comedy
1,2,Jumanji (1995),Adventure|Children's|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama
4,5,Father of the Bride Part II (1995),Comedy


In [12]:
get_similars = lambda item_id, model : [movie_info[movie_info["movie_id"] == x[0]]["name"].to_string() 
                                        for x in model.similar_items(item_id)]

As we can see, the similar items really do turn out to be similar.

The quality of similar items is often a good way to check the quality of an algorithm.

P.S. If you want to dig deeper into how different algorithms shape different latent spaces, I recommend loading the resulting vectors into TensorBoard and looking at the space they form.

In [13]:
get_similars(1, model)

['0    Toy Story (1995)',
 '3045    Toy Story 2 (1999)',
 "2286    Bug's Life, A (1998)",
 '33    Babe (1995)',
 '584    Aladdin (1992)',
 '2315    Babe: Pig in the City (1998)',
 '360    Lion King, The (1994)',
 '1838    Mulan (1998)',
 '2618    Tarzan (1999)',
 '1526    Hercules (1997)']
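
One common way to compute such a list is cosine similarity between item embeddings. The sketch below uses toy 2-dimensional vectors and a hypothetical helper `top_k_similar`; it illustrates the idea and does not reproduce the library's internals.

```python
import numpy as np

def top_k_similar(item_factors, item_id, k=3):
    """Return indices of the k items closest to item_id by cosine similarity."""
    norms = np.linalg.norm(item_factors, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for unused ids
    unit = item_factors / norms[:, None]
    sims = unit @ unit[item_id]
    return np.argsort(-sims)[:k]

# Toy embeddings: items 0, 1 and 3 point in almost the same direction.
item_factors = np.array([
    [1.0, 0.0],  # item 0
    [0.9, 0.1],  # item 1
    [0.0, 1.0],  # item 2
    [2.0, 0.1],  # item 3
])

similar_to_0 = top_k_similar(item_factors, 0).tolist()
print(similar_to_0)
```

The query item itself always comes first, which matches the outputs in this notebook where Toy Story heads its own list.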

Now let's build recommendations for users.

As we can see, the user likes science fiction, so we expect to see science fiction in the recommendations as well.

In [8]:
get_user_history = lambda user_id, implicit_ratings : [movie_info[movie_info["movie_id"] == x]["name"].to_string() 
                                            for x in implicit_ratings[implicit_ratings["user_id"] == user_id]["movie_id"]]

In [15]:
get_user_history(4, implicit_ratings)

['3399    Hustler, The (1961)',
 '2882    Fistful of Dollars, A (1964)',
 '1196    Alien (1979)',
 '1023    Die Hard (1988)',
 '257    Star Wars: Episode IV - A New Hope (1977)',
 '1959    Saving Private Ryan (1998)',
 '476    Jurassic Park (1993)',
 '1180    Raiders of the Lost Ark (1981)',
 '1885    Rocky (1976)',
 '1081    E.T. the Extra-Terrestrial (1982)',
 '3349    Thelma & Louise (1991)',
 '3633    Mad Max (1979)',
 '2297    King Kong (1933)',
 '1366    Jaws (1975)',
 '1183    Good, The Bad and The Ugly, The (1966)',
 '2623    Run Lola Run (Lola rennt) (1998)',
 '2878    Goldfinger (1964)',
 '1220    Terminator, The (1984)']

It worked!

We really did recommend science fiction and action movies to the user; moreover, the list includes sequels to movies they rated highly.

In [16]:
get_recommendations = lambda user_id, model : [movie_info[movie_info["movie_id"] == x[0]]["name"].to_string() 
                                               for x in model.recommend(user_id, user_item_csr)]

In [17]:
get_recommendations(4, model)

['585    Terminator 2: Judgment Day (1991)',
 '1271    Indiana Jones and the Last Crusade (1989)',
 '2502    Matrix, The (1999)',
 '1182    Aliens (1986)',
 '1284    Butch Cassidy and the Sundance Kid (1969)',
 '1178    Star Wars: Episode V - The Empire Strikes Back...',
 '847    Godfather, The (1972)',
 '3402    Close Encounters of the Third Kind (1977)',
 '1892    Rain Man (1988)',
 '453    Fugitive, The (1993)']

Now it's your turn to implement the most popular matrix factorization algorithms.

What will be graded:
1. Correctness of the algorithm
2. Quality of the resulting similar items
3. Quality of the final recommendations for a user

First, let's do the same with the ratings as we did with the implicit data (for RMSE).

In [9]:
users = ratings["user_id"]
movies = ratings["movie_id"]
user_item_exp = sp.coo_matrix((ratings["rating"], (users, movies)))

The task is to factorize the matrix into two, users and movies, in order to make recommendations and find similar movies. It is convenient for me to write a base class that can be subclassed four times.

Obviously, we will need RMSE and two predictors.

In [10]:
from tqdm import tqdm
from sklearn import metrics
from sklearn.metrics.pairwise import cosine_similarity
import math
class Base:
    def __init__(self, matrix, ratings, k, amount_of_users, amount_of_movies, start):
        self.ratings = ratings
        self.amount_of_users = amount_of_users
        self.amount_of_movies = amount_of_movies
        self.matrix = matrix
        self.k = k
        self.U = np.random.uniform(0, start / math.sqrt(k), (amount_of_users, k))
        self.M = np.random.uniform(0, start / math.sqrt(k), (k, amount_of_movies))
        self.B_u = np.random.uniform(0, start / math.sqrt(k), (amount_of_users, 1))
        self.B_m = np.random.uniform(0, start / math.sqrt(k), (1, amount_of_movies))
        self.mu = ratings.rating.mean() 
    
    def rmse(self):
        # NOTE: squared=True makes sklearn return the MSE, not its square root,
        # so the values printed during training are mean squared errors.
        ans = self.ratings
        # Dense prediction matrix, indexed at the observed (user, movie) pairs.
        matr2 = (self.U @ self.M)[ans["user_id"], ans["movie_id"]]
        return metrics.mean_squared_error(self.ratings["rating"], matr2, squared=True)

    def make_recomendation(self, u_id):
        print('User films')
        print()
        u_films = get_user_history(u_id, self.ratings)[:10]
        print(*u_films, sep='\n')
        print()
        print('Recomendation')
        print()
        prediction = self.U @ self.M
        rating = np.argsort(prediction[u_id])[::-1]
        for i in rating[:10]:
            print(*movie_info[movie_info['movie_id'] == i][['name', 'category']].values)
        return

    def find_similar1(self, m_id):
        print('Film', movie_info[movie_info['movie_id'] == m_id][['name', 'category']].to_string())
        similarity = np.dot(self.M[:,m_id], self.M) / np.sqrt((self.M * self.M).sum(axis=0)) / np.sqrt((self.M[:,m_id] ** 2).sum())
        recomend = np.argsort(similarity)[::-1]
        for i in recomend[:10]:
            print(movie_info[movie_info['movie_id'] == i][['name', 'category']].values)
        return
        
    def find_similar2(self, m_id):
        movies = self.M.T
        print('Film', movie_info[movie_info['movie_id'] == m_id][['name', 'category']].to_string())
        matr1 = movies[m_id]
        matr2 = movies
        similarity = cosine_similarity(np.expand_dims(matr1, axis=0), self.M.T)[0]
        similarity = np.argsort(similarity)[::-1]
        #similarity = np.dot(self.M[:,m_id], self.M) / np.sqrt((self.M * self.M).sum(axis=0)) / np.sqrt((self.M[:,m_id] ** 2).sum())
        for i in similarity[:10]:
            print(movie_info[movie_info['movie_id'] == i][['name', 'category']].values)
        return
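
The `rmse` method above forms the full dense product `U @ M` and passes `squared=True`, which returns the mean squared error. A minimal sketch of a true RMSE over the observed pairs only, using plain NumPy and a hypothetical helper name `rmse_observed`, could look like this:

```python
import numpy as np

def rmse_observed(U, M, user_ids, movie_ids, ratings):
    """RMSE over observed (user, movie) pairs; U is (users, k), M is (k, movies)."""
    # Row-wise dot products of U[user] with M[:, movie], without building U @ M.
    preds = np.einsum("ik,ki->i", U[user_ids], M[:, movie_ids])
    return np.sqrt(np.mean((preds - ratings) ** 2))

# Toy factors: predictions are 3.0 for (user 0, movie 0) and 4.0 for (user 1, movie 1).
U = np.array([[1.0, 0.0], [0.0, 2.0]])
M = np.array([[3.0, 1.0], [1.0, 2.0]])
user_ids = np.array([0, 1])
movie_ids = np.array([0, 1])
ratings = np.array([4.0, 4.0])

value = rmse_observed(U, M, user_ids, movie_ids, ratings)
print(value)
```

On the toy data the errors are -1 and 0, so the MSE is 0.5 and the RMSE is its square root.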

### Task 1. Without using ready-made solutions, implement an SVD decomposition using SGD on explicit data

We follow the lecture slides, adding bias terms.

In [11]:
class SVD(Base):
    def __init__(self, matrix, ratings, k, amount_of_users, amount_of_movies, start):
        super().__init__(matrix, ratings, k, amount_of_users, amount_of_movies, start)

    def fit(self, ops, learning_rate=0.01, regular=0.001):
        # Visit the ratings in one fixed random order, cycling if ops exceeds their number.
        inds = np.arange(len(self.ratings))
        np.random.shuffle(inds)
        for i in range(ops):
            row = self.ratings.loc[inds[i % len(self.ratings)]]
            u_id, m_id, rat = row['user_id'], row['movie_id'], row['rating']
            # Prediction error. The bias terms are left out of the prediction,
            # so B_u, B_m and mu are updated below but never influence it.
            error = self.U[u_id] @ self.M[:, m_id] - rat  # + self.B_u[u_id] + self.B_m[:, m_id]
            # Simultaneous gradient step: both right-hand sides use the old U and M.
            self.U[u_id], self.M[:, m_id] = (
                self.U[u_id] - learning_rate * (error * self.M[:, m_id] + regular * self.U[u_id]),
                self.M[:, m_id] - learning_rate * (error * self.U[u_id] + regular * self.M[:, m_id]),
            )
            self.B_u[u_id] = self.B_u[u_id] - learning_rate * (error + regular * self.B_u[u_id])
            self.B_m[:, m_id] = self.B_m[:, m_id] - learning_rate * (error + regular * self.B_m[:, m_id])
            self.mu -= learning_rate * error
            # Report the training error every 100000 updates.
            if (i % 100000 == 99999):
                print(self.rmse())
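
In the `fit` method above the bias terms are commented out of the error, so they are trained but never used. For comparison, here is a minimal sketch of one SGD step where the biases are part of the prediction, r_hat = mu + b_u + b_m + U[u] · M[:, m]. The function name `sgd_step`, the toy sizes and the choice to keep `mu` fixed at a constant are assumptions made for this illustration.

```python
import numpy as np

def sgd_step(U, M, b_u, b_m, mu, u, m, r, lr=0.01, reg=0.001):
    """One SGD update for r_hat = mu + b_u[u] + b_m[m] + U[u] . M[:, m]."""
    err = mu + b_u[u] + b_m[m] + U[u] @ M[:, m] - r
    u_old = U[u].copy()  # keep the old user vector for the movie update
    U[u] -= lr * (err * M[:, m] + reg * U[u])
    M[:, m] -= lr * (err * u_old + reg * M[:, m])
    b_u[u] -= lr * (err + reg * b_u[u])
    b_m[m] -= lr * (err + reg * b_m[m])
    return err

rng = np.random.default_rng(0)
k, n_users, n_movies = 4, 3, 5
U = rng.uniform(0, 0.1, (n_users, k))
M = rng.uniform(0, 0.1, (k, n_movies))
b_u = np.zeros(n_users)
b_m = np.zeros(n_movies)
mu = 3.5  # assumed global mean rating, kept fixed in this sketch

# Repeatedly fit a single rating to watch the error shrink.
errors = [abs(sgd_step(U, M, b_u, b_m, mu, u=1, m=2, r=5.0, lr=0.05)) for _ in range(200)]
print(errors[0], errors[-1])
```

With the biases in the prediction, most of the gap between the global mean and the target rating is absorbed by `b_u` and `b_m`, and the factors only need to model the remaining interaction.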

In [302]:
model1 = SVD(user_item_exp, ratings, 64, max(np.unique(ratings['user_id'])) + 1, max(np.unique(ratings['movie_id'])) + 1, np.sqrt(5))
model1.fit(50000000)

1.4128963404981347
1.0868766707104816
0.9775460609909443
0.9275856565687423
0.8986398142055274
0.8792319339929933
0.8677322446597712
0.8581926130376459
0.8525012260632937
0.8450313521729542
0.8420397953520407
0.8394254121171557
0.8337618316382864
0.8318608394286806
0.8287541845753974
0.8251606270435216
0.823212611841964
0.8200079106385986
0.8187264198032734
0.8137080389456107
0.811870070872488
0.8097394786117421
0.8037800580142801
0.8013109202649075
0.797141218441623
0.792195974186747
0.7882977429941315
0.7834697484894592
0.7802587371175377
0.7732088675187098
0.7688988314179441
0.7643972575124304
0.75660567133915
0.7520774977256613
0.7461186045957932
0.7397728022053589
0.7346449551586416
0.7283286316091647
0.724337824020763
0.7164768617595103
0.7114943922240705
0.7062546983710112
0.6983822668254513
0.6937066968528035
0.6872495449007521
0.6811944827384131
0.6761587252714077
0.6697822122636249
0.6657229753560518
0.6580722641776798
0.6530984726276434
0.6480883585604486
0.6407504864614714


0.27590657307824673
0.27549608664716413
0.27484988230001944
0.27455551141440915
0.27551339000809266
0.2745040328625396
0.2742569531170587
0.27471187075772696
0.274430329558199
0.27388882810400966
0.27472197842972446
0.27431656550123823
0.2736024637351518
0.2733244805851374
0.274285475325391
0.273245881233942
0.273062765531401
0.27356348969000616
0.27330845185794167
0.2726369022780409
0.27359337923719956
0.27311907726424006
0.2724496625560224
0.2721785368647946
0.27320721560621475
0.2720970474419242
0.272006759162663
0.2724317137680978
0.2720489879124747
0.2715771486490969
0.2724613147060326
0.27195631376010704
0.2714028712499662
0.2711449123362482
0.27210494657275636
0.2709908733783047
0.2709452541138962
0.2713291590680737
0.2710056809190486
0.270563981431629
0.27134040775093765
0.2709767196036578
0.27035607410126766
0.2700676777406297
0.27100371288959896
0.26996385227378844
0.26991895893755713
0.2702983665846162
0.27001675092897953
0.2695265031621916
0.2703269640675227
0.2699560582098

Time to get recommendations and find similar movies.

In [None]:
model1.fit(1000000)

In [304]:
model1.make_recomendation(4)

User films

3399    Hustler, The (1961)
1192    Star Wars: Episode VI - Return of the Jedi (1983)
2882    Fistful of Dollars, A (1964)
1196    Alien (1979)
1023    Die Hard (1988)
257    Star Wars: Episode IV - A New Hope (1977)
1959    Saving Private Ryan (1998)
476    Jurassic Park (1993)
1178    Star Wars: Episode V - The Empire Strikes Back...
1180    Raiders of the Lost Ark (1981)

Recomendation

['Gay Divorcee, The (1934)' 'Comedy|Musical|Romance']
['Double Indemnity (1944)' 'Crime|Film-Noir']
['Godfather, The (1972)' 'Action|Crime|Drama']
['Next Stop, Wonderland (1998)' 'Comedy|Drama|Romance']
['Citizen Kane (1941)' 'Drama']
['Lone Star (1996)' 'Drama|Mystery']
['Casablanca (1942)' 'Drama|Romance|War']
['Lord of the Flies (1963)' 'Adventure|Drama|Thriller']
['Life Is Beautiful (La Vita è bella) (1997)' 'Comedy|Drama']
["One Flew Over the Cuckoo's Nest (1975)" 'Drama']


In [305]:
model1.find_similar1(1)

Film                name                     category
0  Toy Story (1995)  Animation|Children's|Comedy
[['Toy Story (1995)' "Animation|Children's|Comedy"]]
[['Toy Story 2 (1999)' "Animation|Children's|Comedy"]]
[["Bug's Life, A (1998)" "Animation|Children's|Comedy"]]
[['Wrong Trousers, The (1993)' 'Animation|Comedy']]
[['Apollo 13 (1995)' 'Drama']]
[['Line King: Al Hirschfeld, The (1996)' 'Documentary']]
[]
[['Close Shave, A (1995)' 'Animation|Comedy|Thriller']]
[['Gate of Heavenly Peace, The (1995)' 'Documentary']]
[['Aladdin (1992)' "Animation|Children's|Comedy|Musical"]]


Mostly cartoons :)

In [306]:
model1.find_similar2(10)

Film                name                   category
9  GoldenEye (1995)  Action|Adventure|Thriller
[['GoldenEye (1995)' 'Action|Adventure|Thriller']]
[['Tomorrow Never Dies (1997)' 'Action|Romance|Thriller']]
[['World Is Not Enough, The (1999)' 'Action|Thriller']]
[['Man with the Golden Gun, The (1974)' 'Action']]
[['Goldfinger (1964)' 'Action']]
[['Spy Who Loved Me, The (1977)' 'Action']]
[['Live and Let Die (1973)' 'Action']]
[['For Your Eyes Only (1981)' 'Action']]
[['View to a Kill, A (1985)' 'Action']]
[['Dr. No (1962)' 'Action']]


Really great similar movies (every one of them is a James Bond film).

### Task 2. Without using ready-made solutions, implement a matrix factorization using ALS on implicit data

We use the paper http://yifanhu.net/PUB/cf.pdf
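
For reference, this is the objective from the paper and the closed-form updates that ALS alternates between, in the paper's notation: $p_{ui}$ is the binary preference, $c_{ui}$ the confidence, $\lambda$ the regularization weight, $Y$ and $X$ the item and user factor matrices, and $C^u$, $C^i$ the diagonal confidence matrices for one user or one item.

```latex
\min_{x_*,\, y_*} \; \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda \Bigl( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Bigr),
\qquad c_{ui} = 1 + \alpha\, r_{ui}

x_u = \bigl( Y^{\top} C^{u} Y + \lambda I \bigr)^{-1} Y^{\top} C^{u}\, p(u),
\qquad
y_i = \bigl( X^{\top} C^{i} X + \lambda I \bigr)^{-1} X^{\top} C^{i}\, p(i)
```

With one side fixed, each row of the other side is the solution of a small regularized least-squares problem, which is why no learning rate is needed.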

In [11]:
class ALS(Base):
    def __init__(self, matrix, ratings, k, amount_of_users, amount_of_movies, start):
        super().__init__(matrix, ratings, k, amount_of_users, amount_of_movies, start)
        # Here M is stored as (movies, k), the transpose of the layout used in Base.
        self.M = self.M.T
        # Confidence c_ui = 1 + alpha * r_ui with alpha = 0.01 (dense array).
        self.C = self.matrix.toarray() * 0.01
        self.C = self.C + 1

    def fit(self, ops):
        # Alternate: even iterations solve for the movie factors, odd ones for the user factors.
        for i in range(ops):
            if (i % 2 == 0):
                self.M = self.step(self.U, self.C.T, self.matrix.T)
            else:
                self.U = self.step(self.M, self.C, self.matrix)
            print(self.my_rmse())

    def step(self, matr, C_to_use, rate, learning_rate=0.01):
        # Closed-form row update (Y^T C Y + lambda I)^-1 Y^T C p from the paper;
        # despite its name, learning_rate plays the role of lambda here.
        ans = np.zeros((len(C_to_use), self.k))
        for i in range(len(C_to_use)):
            bracket = np.linalg.inv(matr.T @ np.diag(C_to_use[i]) @ matr + learning_rate * np.eye(matr.shape[1]))
            bracket = bracket @ matr.T @ np.diag(C_to_use[i])
            ans[i] = (bracket @ rate[i].T).T
        return ans

    def my_rmse(self):
        # Compare predictions with the binary matrix at every rated (user, movie) pair
        # of the global ratings table. As in Base.rmse, squared=True returns the MSE.
        ans = ratings
        matr2 = ((self.U @ self.M.T))[ans["user_id"], ans["movie_id"]]
        matr1 = self.matrix[ans["user_id"], ans["movie_id"]]
        matr1 = np.squeeze(np.asarray(matr1))
        return metrics.mean_squared_error(matr1, matr2, squared=True)
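
The `step` method above builds `np.diag(C_to_use[i])`, a dense n × n matrix, for every row and then inverts explicitly. A sketch of the same row update that scales columns by broadcasting and calls `np.linalg.solve` instead is shown below; the helper name `als_row_update` and the toy data are assumptions for the illustration.

```python
import numpy as np

def als_row_update(Y, c, p, reg=0.01):
    """Solve (Y^T diag(c) Y + reg * I) x = Y^T diag(c) p for one user or item."""
    Yc = Y.T * c  # (k, n): column j of Y^T scaled by c[j], no n x n matrix needed
    A = Yc @ Y + reg * np.eye(Y.shape[1])
    b = Yc @ p
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
Y = rng.normal(size=(6, 3))                   # fixed factors of the other side
p = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0])  # binary preferences
c = 1.0 + 0.01 * p                            # confidence, as in the class above

x_fast = als_row_update(Y, c, p)
# The formula written out literally, for comparison.
x_naive = np.linalg.inv(Y.T @ np.diag(c) @ Y + 0.01 * np.eye(3)) @ Y.T @ np.diag(c) @ p
print(np.allclose(x_fast, x_naive))
```

Both versions compute the same vector; the broadcasted one only avoids the temporary diagonal matrices, which matters when n is the number of movies or users.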

In [283]:
model2 = ALS(user_item_csr, implicit_ratings, 64, max(np.unique(ratings['user_id'])) + 1, max(np.unique(ratings['movie_id'])) + 1, 1)
model2.fit(20)

0.455701184373924
0.3189876222076632
0.276442639871987
0.26368995632453907
0.2593893417743478
0.2567223388694169
0.2555222963371137
0.25455406588747614
0.25406650825405225
0.25360578433482406
0.25337037458614653
0.253114626146948
0.252989130716757
0.2528317016296451
0.2527591642380387
0.25265454001304594
0.2526091785070115
0.25253542346448976
0.2525049139341488
0.2524504411880954


Time to make recommendations (we transpose M because Base uses a different layout).

In [284]:
model2.M = model2.M.T
model2.make_recomendation(4)
model2.M = model2.M.T

User films

3399    Hustler, The (1961)
2882    Fistful of Dollars, A (1964)
1196    Alien (1979)
1023    Die Hard (1988)
257    Star Wars: Episode IV - A New Hope (1977)
1959    Saving Private Ryan (1998)
476    Jurassic Park (1993)
1180    Raiders of the Lost Ark (1981)
1885    Rocky (1976)
1081    E.T. the Extra-Terrestrial (1982)

Recomendation

['Raiders of the Lost Ark (1981)' 'Action|Adventure']
['Star Wars: Episode IV - A New Hope (1977)'
 'Action|Adventure|Fantasy|Sci-Fi']
['Jaws (1975)' 'Action|Horror']
['E.T. the Extra-Terrestrial (1982)' "Children's|Drama|Fantasy|Sci-Fi"]
['Saving Private Ryan (1998)' 'Action|Drama|War']
['Terminator, The (1984)' 'Action|Sci-Fi|Thriller']
['Alien (1979)' 'Action|Horror|Sci-Fi|Thriller']
['Die Hard (1988)' 'Action|Thriller']
['Terminator 2: Judgment Day (1991)' 'Action|Sci-Fi|Thriller']
['Rocky (1976)' 'Action|Drama']
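
Several of the titles recommended above (for example Raiders of the Lost Ark, Alien and Die Hard) already appear in the user's history, because `make_recomendation` ranks all scores without excluding seen movies. A minimal sketch of masking seen items before ranking, with a hypothetical helper `recommend` and toy factors, might look like this:

```python
import numpy as np

def recommend(user_vec, item_factors, seen_items, k=3):
    """Top-k items by dot-product score, skipping items the user already rated."""
    scores = item_factors @ user_vec
    scores[list(seen_items)] = -np.inf  # seen items can never reach the top
    return np.argsort(-scores)[:k]

item_factors = np.array([
    [1.0, 0.0],  # item 0
    [0.8, 0.2],  # item 1
    [0.1, 0.9],  # item 2
    [0.7, 0.1],  # item 3
    [0.0, 1.0],  # item 4
])
user_vec = np.array([1.0, 0.1])

top = recommend(user_vec, item_factors, seen_items={0, 3}).tolist()
print(top)
```

Without the mask, items 0 and 3 would take two of the three slots here.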


In [285]:
model2.M = model2.M.T
model2.find_similar2(1)
model2.M = model2.M.T

Film                name                     category
0  Toy Story (1995)  Animation|Children's|Comedy
[['Toy Story (1995)' "Animation|Children's|Comedy"]]
[['Toy Story 2 (1999)' "Animation|Children's|Comedy"]]
[['Babe (1995)' "Children's|Comedy|Drama"]]
[["Bug's Life, A (1998)" "Animation|Children's|Comedy"]]
[['Aladdin (1992)' "Animation|Children's|Comedy|Musical"]]
[['Babe: Pig in the City (1998)' "Children's|Comedy"]]
[['Pleasantville (1998)' 'Comedy']]
[['Tarzan (1999)' "Animation|Children's"]]
[['Lion King, The (1994)' "Animation|Children's|Musical"]]
[['Hercules (1997)' "Adventure|Animation|Children's|Comedy|Musical"]]


In [286]:
model2.M = model2.M.T
model2.find_similar2(10)
model2.M = model2.M.T

Film                name                   category
9  GoldenEye (1995)  Action|Adventure|Thriller
[['GoldenEye (1995)' 'Action|Adventure|Thriller']]
[['Tomorrow Never Dies (1997)' 'Action|Romance|Thriller']]
[['World Is Not Enough, The (1999)' 'Action|Thriller']]
[['Rush Hour (1998)' 'Action|Thriller']]
[['Die Hard: With a Vengeance (1995)' 'Action|Thriller']]
[['Licence to Kill (1989)' 'Action']]
[['View to a Kill, A (1985)' 'Action']]
[["Jackie Chan's First Strike (1996)" 'Action']]
[['Man with the Golden Gun, The (1974)' 'Action']]
[['For Your Eyes Only (1981)' 'Action']]


It worked!

### Task 3. Without using ready-made solutions, implement a BPR matrix factorization on implicit data

https://arxiv.org/ftp/arxiv/papers/1205/1205.2618.pdf
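
For reference, a compact statement of the criterion from the paper for a matrix factorization model with user vectors $w_u$ and item vectors $h_i$, where $(u, i, j)$ is a triple with a positive item $i$ and a negative item $j$. The update is written as gradient ascent on the criterion, with the factor 2 from the regularizer absorbed into $\lambda$.

```latex
\hat{x}_{uij} = \hat{x}_{ui} - \hat{x}_{uj} = w_u^{\top} (h_i - h_j),
\qquad
\text{BPR-Opt} = \sum_{(u,i,j) \in D_S} \ln \sigma(\hat{x}_{uij}) - \lambda \lVert \Theta \rVert^2

\Theta \leftarrow \Theta + \alpha \Bigl( \sigma(-\hat{x}_{uij})\, \frac{\partial \hat{x}_{uij}}{\partial \Theta} - \lambda \Theta \Bigr),
\qquad
\frac{\partial \hat{x}_{uij}}{\partial w_u} = h_i - h_j,\quad
\frac{\partial \hat{x}_{uij}}{\partial h_i} = w_u,\quad
\frac{\partial \hat{x}_{uij}}{\partial h_j} = -w_u
```

The weight $\sigma(-\hat{x}_{uij}) = 1 - \sigma(\hat{x}_{uij})$ goes to zero once a pair is ranked correctly with a large margin, which keeps the updates bounded.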

In [33]:
class BPR(ALS):
    def __init__(self, matrix, ratings, k, amount_of_users, amount_of_movies, start):
        super().__init__(matrix, ratings, k, amount_of_users, amount_of_movies, start)

    def sigmoid(self, values):
        return 1 / (1 + np.exp(-values))

    def fit(self, ops, learning_rate=0.01):
        loss = []
        for i in range(ops):
            for u_id in range(self.amount_of_users):
                # Split this user's ratings into positives (>= 4) and negatives (< 4).
                all_rates = ratings[ratings["user_id"] == u_id]
                bad_rates = all_rates[all_rates["rating"] < 4].values
                good_rates = all_rates[all_rates["rating"] > 3].values
                if (bad_rates.shape[0] == 0 or good_rates.shape[0] == 0):
                    continue
                # Sample one positive and one negative movie id (column 1 is movie_id).
                np.random.shuffle(good_rates)
                np.random.shuffle(bad_rates)
                bad_rates = bad_rates[:1][:, 1]
                good_rate = good_rates[:1][:, 1]
                bad_matrix = self.M[bad_rates]
                good_matrix = self.M[good_rate]
                # x_uij = score(positive) - score(negative); its mean is what gets printed.
                values = self.U[u_id] @ good_matrix.T - self.U[u_id] @ bad_matrix.T
                loss.append(values)
                # NOTE: the BPR paper weights the gradient by sigmoid(-x_uij), not
                # sigmoid(x_uij); the original factor is kept here.
                self.U[u_id] += ((self.sigmoid(values).sum() * (good_matrix * bad_matrix.shape[0] - bad_matrix.sum(axis=0)) - 0.001 * self.U[u_id]) * learning_rate)[0]
                if (self.U[u_id].mean() > 10 ** 9):
                    self.U[u_id] /= 2
                self.M[good_rate] += (self.sigmoid(values).sum() * self.U[u_id] - 0.001 * self.M[good_rate]) * learning_rate
                if (self.M[good_rate].mean() > 10 ** 9):
                    self.M[good_rate] /= 2
                # NOTE: this halving is unconditional (it also runs when the check above fails).
                self.M[good_rate] /= 2
                for x in range(bad_rates.shape[0]):
                    self.M[bad_rates[x]] -= (self.sigmoid(values)[x] * self.U[u_id] - 0.001 * self.M[bad_rates[x]]) * learning_rate
                    # Compare the mean, as above; a bare array comparison cannot be used in `if`.
                    if (self.M[bad_rates[x]].mean() > 10 ** 9):
                        self.M[bad_rates[x]] /= 2
            print(np.mean(loss), i)
            loss = []
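
The `fit` method above multiplies the gradient by `sigmoid(values)`, whereas the update in the BPR paper uses sigmoid(-x_uij); that difference is one plausible reason why the printed margin keeps growing in the logs below. A minimal sketch of a single BPR step with the paper's weighting follows. The function name `bpr_step`, the toy sizes and the hyperparameters are assumptions for the illustration, not a tuned implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_step(U, M, u, i, j, lr=0.05, reg=0.001):
    """One gradient-ascent step on ln sigmoid(x_uij) - reg * ||theta||^2.

    U is (users, k), M is (items, k); i is a positive item, j a negative one.
    Returns the margin x_uij measured before the update.
    """
    diff = M[i] - M[j]
    x_uij = U[u] @ diff
    g = sigmoid(-x_uij)  # equals 1 - sigmoid(x_uij); shrinks as the margin grows
    u_old = U[u].copy()  # item updates use the user vector from before this step
    U[u] += lr * (g * diff - reg * U[u])
    M[i] += lr * (g * u_old - reg * M[i])
    M[j] += lr * (-g * u_old - reg * M[j])
    return x_uij

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(3, 4))
M = rng.normal(scale=0.1, size=(5, 4))

# Train repeatedly on one (user, positive, negative) triple and track the margin.
margins = [bpr_step(U, M, u=0, i=1, j=2) for _ in range(500)]
print(margins[0], margins[-1])
```

The margin increases, but each step gets smaller as sigmoid(-x_uij) decays, so no manual rescaling of the factors is needed in this sketch.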
                

In [34]:
model3 = BPR(user_item_csr, implicit_ratings, 64, max(np.unique(ratings['user_id'])) + 1, max(np.unique(ratings['movie_id'])) + 1, 1)
model3.fit(500)

0.0015879783616565208
0.006034267266995612
0.011813033911742135
0.016145112923516385
0.021981461503256132
0.026404372736667438
0.03176072092587162
0.03792123981507124
0.04164573733353335
0.04914388006165022
0.0533834489621373
0.05628904920992364
0.06451351838679249
0.07301575560056071
0.07938017175208208
0.08344784964683674
0.09229986923694508
0.09838649946485992
0.09853992213892192
0.11496978139625506
0.12083412961557831
0.13442829175160037
0.14083356328596236
0.14262828995651305
0.15940153745048186
0.16174803475259109
0.17519537296334778
0.19363678441956494
0.1872352592151962
0.2114226868656948
0.232095567778106
0.23465746627318293
0.2451942687377855
0.26613021700709677
0.2581670401725988
0.2847613159138688
0.2969934008940484
0.31728676793479776
0.33317937417378196
0.3487656652114631
0.3717089704165872
0.37916524852894007
0.4004006495339755
0.46639463691521765
0.45698721931633873
0.5104885905030322
0.521502479858264
0.5826181662220189
0.5978354775921789
0.5797037282878634
0.627315535


41.75867724100831
42.2250751170989
42.90604062712375
47.524294401254075
51.423883335320376
54.10847812682187
57.430795615472675
63.16383217321423
68.41277964236775
67.05725795160834
77.22936642543841
76.27521104695248
87.21020115748539
93.1825244568102
97.44589605751472
103.81131260621478
113.05385517868474
118.7137634129622
130.89627652395635
135.143039692307
145.97479255976455
162.7647202649382
163.4203954343993
164.27026420670148
185.34359661225352
195.9992984253547
203.60013737583475
244.28750741978138
247.59645244889245
252.31757818887993
264.60024966504085
282.61453225077565
311.08103521922044
339.2814256420694
361.642570928692
375.8434615877609
394.22676099398575
417.1123401864368
465.39826701567404
486.4151521066657
529.4982126434761
504.9154571996952
602.3873479675794
608.9352679690976
650.1736075743739
736.3719619989515
744.5365882075869
830.2869296933128
895.2829171009666
904.6018542663551
990.9057737931445
1027.1699246614335
1124.3554597680059
1207.8082038192838
1220.976719

In [38]:
model3.fit(100)


2334445334118.527
2521754459858.701
2833664462523.7466
2688193313539.568
2912714221570.138
3148779055133.677
3535411322427.869
3629984913735.8467
3955388816724.9014
4102956058614.396
4529926765816.821
4462990169968.186
5053022788251.837
5275781832735.949
5584060201836.762
6453924826088.626
6359593506642.1875
6801367799248.79
7561965215271.36
7690510204165.683
8294547596649.635
8593485689953.395
9287898151031.68
10670682439762.264
10344699895813.547
11790791143640.785
12369966195369.008
12697074152446.55
14325203335634.453
14227715315406.291
15337496408579.406
17485406143392.842
18965438055989.152
18835817474294.273
21792807277432.215
21578299055706.48
22083952244346.2
24705174240016.49
24477818648747.09
27112303147443.375
31122661995643.54
33950552254127.36
34128482703431.36
40036479660146.86
39757579598935.71
42149489889806.53
45032193035687.87
45750307285558.586
50834668920343.84
54021766810447.35
58257395993534.49
63239648787464.92
69309522663188.63
70672153834599.95
76617018141339.

In [39]:
model3.M = model3.M.T
model3.make_recomendation(4)
model3.M = model3.M.T

User films

3399    Hustler, The (1961)
2882    Fistful of Dollars, A (1964)
1196    Alien (1979)
1023    Die Hard (1988)
257    Star Wars: Episode IV - A New Hope (1977)
1959    Saving Private Ryan (1998)
476    Jurassic Park (1993)
1180    Raiders of the Lost Ark (1981)
1885    Rocky (1976)
1081    E.T. the Extra-Terrestrial (1982)

Recomendation

['American Beauty (1999)' 'Comedy|Drama']
['Star Wars: Episode IV - A New Hope (1977)'
 'Action|Adventure|Fantasy|Sci-Fi']
['Saving Private Ryan (1998)' 'Action|Drama|War']
['Star Wars: Episode V - The Empire Strikes Back (1980)'
 'Action|Adventure|Drama|Sci-Fi|War']
['Star Wars: Episode VI - Return of the Jedi (1983)'
 'Action|Adventure|Romance|Sci-Fi|War']
['Sixth Sense, The (1999)' 'Thriller']
['Silence of the Lambs, The (1991)' 'Drama|Thriller']
['Braveheart (1995)' 'Action|Drama|War']
['Being John Malkovich (1999)' 'Comedy']
["Schindler's List (1993)" 'Drama|War']


In [40]:
model3.M = model3.M.T
model3.find_similar2(1)
model3.M = model3.M.T

Film                name                     category
0  Toy Story (1995)  Animation|Children's|Comedy
[['Toy Story (1995)' "Animation|Children's|Comedy"]]
[['Deer Hunter, The (1978)' 'Drama|War']]
[['Dave (1993)' 'Comedy|Romance']]
[['My Fair Lady (1964)' 'Musical|Romance']]
[['Vertigo (1958)' 'Mystery|Thriller']]
[['Some Like It Hot (1959)' 'Comedy|Crime']]
[['Close Shave, A (1995)' 'Animation|Comedy|Thriller']]
[['Apollo 13 (1995)' 'Drama']]
[['Blazing Saddles (1974)' 'Comedy|Western']]
[['Fish Called Wanda, A (1988)' 'Comedy']]


In [41]:
model3.M = model3.M.T
model3.find_similar2(10)
model3.M = model3.M.T

Film                name                   category
9  GoldenEye (1995)  Action|Adventure|Thriller
[['GoldenEye (1995)' 'Action|Adventure|Thriller']]
[['Analyze This (1999)' 'Comedy']]
[['Airport 1975 (1974)' 'Drama']]
[['Addiction, The (1995)' 'Horror']]
[['Bulworth (1998)' 'Comedy']]
[['Wild Things (1998)' 'Crime|Drama|Mystery|Thriller']]
[['Bridges of Madison County, The (1995)' 'Drama|Romance']]
[['Rules of Engagement (2000)' 'Drama|Thriller']]
[['Clueless (1995)' 'Comedy|Romance']]
[['Beautician and the Beast, The (1997)' 'Comedy|Romance']]


### Task 4. Without using ready-made solutions, implement a WARP matrix factorization on implicit data