### Matrix factorizations

In this assignment you will get hands-on experience with matrix factorizations.
The assignment consists of 4 tasks:
1. Implement an SVD factorization trained with SGD on explicit data
2. Implement a matrix factorization trained with ALS on implicit data
3. Implement a matrix factorization trained with BPR on implicit data
4. Implement a matrix factorization trained with WARP on implicit data

Soft deadline: October 13 (feedback is given and a grade is assigned; you can still fix issues before the hard deadline)

Hard deadline: October 20 (final review)

In [1]:
import implicit
import pandas as pd
import numpy as np
import scipy.sparse as sp
from tqdm import tqdm
from scipy.special import expit as sigmoid

from lightfm.datasets import fetch_movielens

In this assignment we work with the explicit MovieLens 1M dataset, which contains (user_id, movie_id) pairs together with the rating the user gave the movie.

The dataset can be downloaded from https://grouplens.org/datasets/movielens/1m/

In [2]:
ratings = pd.read_csv('ml-1m/ratings.dat', delimiter='::', header=None, 
        names=['user_id', 'movie_id', 'rating', 'timestamp'], 
        usecols=['user_id', 'movie_id', 'rating'], engine='python')

In [3]:
movie_info = pd.read_csv('ml-1m/movies.dat', delimiter='::', header=None, 
        names=['movie_id', 'name', 'category'], engine='python', encoding='latin-1')

Explicit data

In [4]:
ratings

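|         | user_id | movie_id | rating |
|---------|---------|----------|--------|
| 0       | 1       | 1193     | 5      |
| 1       | 1       | 661      | 3      |
| 2       | 1       | 914      | 3      |
| 3       | 1       | 3408     | 4      |
| 4       | 1       | 2355     | 5      |
| ...     | ...     | ...      | ...    |
| 1000204 | 6040    | 1091     | 1      |
| 1000205 | 6040    | 1094     | 5      |
| 1000206 | 6040    | 562      | 5      |
| 1000207 | 6040    | 1096     | 4      |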


To turn this dataset into implicit feedback, let's treat a rating >= 4 as a positive interaction.

In [5]:
implicit_ratings = ratings.loc[(ratings['rating'] >= 4)]

In [6]:
implicit_ratings.head(10)

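|    | user_id | movie_id | rating |
|----|---------|----------|--------|
| 0  | 1       | 1193     | 5      |
| 3  | 1       | 3408     | 4      |
| 4  | 1       | 2355     | 5      |
| 6  | 1       | 1287     | 5      |
| 7  | 1       | 2804     | 5      |
| 8  | 1       | 594      | 4      |
| 9  | 1       | 919      | 4      |
| 10 | 1       | 595      | 5      |
| 11 | 1       | 938      | 4      |
| 12 | 1       | 2398     | 4      |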


Sparse matrices are more convenient to work with, so let's convert the DataFrame to CSR matrices.

In [7]:
users = implicit_ratings["user_id"]
movies = implicit_ratings["movie_id"]
user_item = sp.coo_matrix((np.ones_like(users), (users, movies)))
user_item_t_csr = user_item.T.tocsr()
user_item_csr = user_item.tocsr()
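Because the raw `user_id`/`movie_id` values are used directly as row and column indices, the matrix above has an empty row 0 and column 0, plus empty columns for every movie id that never occurs (ids go up to 3952 while only 3706 distinct movies are rated). A minimal sketch of the alternative, assuming you want contiguous indices: map ids with `np.unique(..., return_inverse=True)` and keep the arrays that translate indices back. The helper name `build_csr` is hypothetical, and the similar-items and recommendation helpers would then need to map indices back to ids.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


def build_csr(df, user_col="user_id", item_col="movie_id"):
    """Hypothetical helper: binary user x item CSR matrix over contiguous indices."""
    # np.unique returns the sorted distinct ids and, for every row, the position of its id
    user_ids, user_idx = np.unique(df[user_col].to_numpy(), return_inverse=True)
    item_ids, item_idx = np.unique(df[item_col].to_numpy(), return_inverse=True)
    data = np.ones(len(df), dtype=np.float32)
    matrix = sp.csr_matrix((data, (user_idx, item_idx)), shape=(len(user_ids), len(item_ids)))
    # user_ids[row] and item_ids[col] translate matrix indices back to the original ids
    return matrix, user_ids, item_ids


toy = pd.DataFrame({"user_id": [1, 1, 6040], "movie_id": [1193, 3408, 1193]})
matrix, user_ids, item_ids = build_csr(toy)
print(matrix.shape)       # (2, 2)
print(item_ids.tolist())  # [1193, 3408]
```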

As an example, let's use the ALS factorization from the implicit library.

Let's set the dimension of the latent space to 64; this also sets the size of the user/item embeddings.

In [8]:
model = implicit.als.AlternatingLeastSquares(factors=64, iterations=100, calculate_training_loss=True)



The loss here is everyone's favorite squared error, as in RMSE (in the implicit setting each user-item cell is weighted by a confidence).
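For reference, `implicit`'s ALS is based on the implicit-feedback formulation of Hu, Koren and Volinsky, where every cell of the user-item matrix is a target weighted by a confidence. In the paper, with preference $p_{ui}$, confidence $c_{ui}$ and a confidence scale $\alpha$, the objective is the one below; how the library derives confidences from the input matrix depends on its version, so treat the exact weighting as an assumption and check the `implicit` documentation.

$$
\min_{X,Y} \sum_{u,i} c_{ui}\left(p_{ui} - x_u^\top y_i\right)^2 + \lambda\left(\sum_u \lVert x_u\rVert^2 + \sum_i \lVert y_i\rVert^2\right),
\qquad p_{ui} = \mathbb{1}[r_{ui} > 0], \quad c_{ui} = 1 + \alpha\, r_{ui}.
$$

With $Y$ fixed, every user vector has a closed form, with $C^u = \operatorname{diag}(c_{u1}, \dots, c_{u|I|})$ and $p(u)$ the user's preference row (items are updated symmetrically with $X$ fixed):

$$
x_u = \left(Y^\top C^u Y + \lambda I\right)^{-1} Y^\top C^u\, p(u).
$$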

In [9]:
model.fit(user_item_t_csr)


Let's find movies similar to movie_id 1, Toy Story.

In [10]:
movie_info

|      | movie_id | name | category |
|------|----------|------|----------|
| 0    | 1    | Toy Story (1995) | Animation\|Children's\|Comedy |
| 1    | 2    | Jumanji (1995) | Adventure\|Children's\|Fantasy |
| 2    | 3    | Grumpier Old Men (1995) | Comedy\|Romance |
| 3    | 4    | Waiting to Exhale (1995) | Comedy\|Drama |
| 4    | 5    | Father of the Bride Part II (1995) | Comedy |
| ...  | ...  | ... | ... |
| 3878 | 3948 | Meet the Parents (2000) | Comedy |
| 3879 | 3949 | Requiem for a Dream (2000) | Drama |
| 3880 | 3950 | Tigerland (2000) | Drama |
| 3881 | 3951 | Two Family House (2000) | Drama |


In [11]:
get_similars = lambda item_id, model : [movie_info[movie_info["movie_id"] == x[0]]["name"].to_string() 
                                        for x in model.similar_items(item_id)]

As we can see, the similar items really are similar.

The quality of similar items is often a good way to sanity-check an algorithm.

P.S. If you want to dig deeper into how different algorithms shape different latent spaces, I recommend loading the resulting vectors into TensorBoard and exploring the learned space.

In [12]:
get_similars(1, model)

['0    Toy Story (1995)',
 '3045    Toy Story 2 (1999)',
 "2286    Bug's Life, A (1998)",
 '584    Aladdin (1992)',
 '33    Babe (1995)',
 '360    Lion King, The (1994)',
 '2315    Babe: Pig in the City (1998)',
 '1838    Mulan (1998)',
 '2618    Tarzan (1999)',
 '1526    Hercules (1997)']
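`model.similar_items` returns the nearest neighbours of an item in the learned space. A minimal sketch of the same idea for any item-embedding matrix, assuming plain cosine similarity (which may differ from what `implicit` computes internally): normalize the rows once, then a single matrix-vector product scores every item, and `np.argpartition` extracts the top candidates without sorting everything. The helper name `similar_item_indices` is hypothetical.

```python
import numpy as np


def similar_item_indices(item_embeddings, item_id, number=10, valid=None):
    """Hypothetical helper: indices of the `number` items most cosine-similar to item_id."""
    norms = np.linalg.norm(item_embeddings, axis=1)
    unit = item_embeddings / np.maximum(norms, 1e-9)[:, None]  # guard against all-zero rows
    scores = unit @ unit[item_id]
    scores[item_id] = -np.inf          # the query item itself is not a "similar" item
    if valid is not None:
        scores[~valid] = -np.inf       # e.g. ids that never appear in the data
    number = min(number, len(scores) - 1)
    # argpartition finds the top `number` in O(n); only those few are then sorted
    top = np.argpartition(-scores, number)[:number]
    return top[np.argsort(-scores[top])]


emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
print(similar_item_indices(emb, 0, number=2).tolist())  # [1, 2]
```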

Now let's build recommendations for users.

As we can see, this user likes sci-fi, so we expect to see sci-fi in the recommendations as well.

In [13]:
get_user_history = lambda user_id, implicit_ratings : [movie_info[movie_info["movie_id"] == x]["name"].to_string() 
                                            for x in implicit_ratings[implicit_ratings["user_id"] == user_id]["movie_id"]]

In [14]:
get_user_history(4, implicit_ratings)

['3399    Hustler, The (1961)',
 '2882    Fistful of Dollars, A (1964)',
 '1196    Alien (1979)',
 '1023    Die Hard (1988)',
 '257    Star Wars: Episode IV - A New Hope (1977)',
 '1959    Saving Private Ryan (1998)',
 '476    Jurassic Park (1993)',
 '1180    Raiders of the Lost Ark (1981)',
 '1885    Rocky (1976)',
 '1081    E.T. the Extra-Terrestrial (1982)',
 '3349    Thelma & Louise (1991)',
 '3633    Mad Max (1979)',
 '2297    King Kong (1933)',
 '1366    Jaws (1975)',
 '1183    Good, The Bad and The Ugly, The (1966)',
 '2623    Run Lola Run (Lola rennt) (1998)',
 '2878    Goldfinger (1964)',
 '1220    Terminator, The (1984)']

It worked!

We did recommend sci-fi and action movies to the user; moreover, the list includes sequels to movies they rated highly.

In [15]:
get_recommendations = lambda user_id, model : [movie_info[movie_info["movie_id"] == x[0]]["name"].to_string() 
                                               for x in model.recommend(user_id, user_item_csr)]

In [16]:
get_recommendations(4, model)

['585    Terminator 2: Judgment Day (1991)',
 '2502    Matrix, The (1999)',
 '1271    Indiana Jones and the Last Crusade (1989)',
 '1284    Butch Cassidy and the Sundance Kid (1969)',
 '1178    Star Wars: Episode V - The Empire Strikes Back...',
 '1182    Aliens (1986)',
 '3402    Close Encounters of the Third Kind (1977)',
 '847    Godfather, The (1972)',
 '2460    Planet of the Apes (1968)',
 '1267    Ben-Hur (1959)']
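The `recommend` call above ranks the items a user has not interacted with yet. A minimal sketch of that step for a dense factorization, assuming the score is the dot product of the user and item vectors and that already-seen items are simply dropped; `implicit`'s own implementation may differ in details, and the helper name `recommend_indices` is hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def recommend_indices(user_embeddings, item_embeddings, user_items, user_id, number=10):
    """Hypothetical helper: top `number` unseen items for user_id by dot-product score."""
    scores = item_embeddings @ user_embeddings[user_id]
    # column indices stored in the user's CSR row are items the user already interacted with
    seen = user_items[user_id].indices
    scores[seen] = -np.inf
    number = min(number, len(scores) - 1)
    top = np.argpartition(-scores, number)[:number]
    return top[np.argsort(-scores[top])]


users = np.array([[1.0, 0.0]])
items = np.array([[3.0, 0.0], [2.0, 1.0], [1.0, 5.0], [0.5, 0.0]])
history = sp.csr_matrix(np.array([[1, 0, 0, 0]]))
print(recommend_indices(users, items, history, 0, number=2).tolist())  # [1, 2]
```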

Now it's your turn to implement the most popular matrix factorization algorithms.

What will be graded:
1. Correctness of the algorithm
2. Quality of the resulting similar items
3. Quality of the final recommendations for a user

### Task 1. Without using ready-made solutions, implement an SVD factorization trained with SGD on explicit data
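A common formulation of "SVD" for explicit ratings (Funk-style biased matrix factorization) predicts $\hat r_{ui} = \mu + b_u + b_i + p_u^\top q_i$, where $\mu$ is the global mean rating, and minimizes the squared error plus L2 regularization. A minimal sketch of one SGD step under that formulation, assuming a single learning rate `lr` and regularization `reg` for all parameters; the helper name `sgd_step` is hypothetical. Unlike the class below, which starts both biases at 3, it keeps the global mean as a separate term so the biases can start at 0.

```python
import numpy as np


def sgd_step(mu, b_u, b_i, p_u, q_i, r_ui, lr=0.01, reg=0.02):
    """Hypothetical helper: one SGD step on (r_ui - r_hat)^2 + reg * (b_u^2 + b_i^2 + |p_u|^2 + |q_i|^2).

    Returns the updated (b_u, b_i, p_u, q_i); the inputs are not modified.
    """
    error = r_ui - (mu + b_u + b_i + p_u @ q_i)  # positive if we under-predict
    new_b_u = b_u + lr * (error - reg * b_u)
    new_b_i = b_i + lr * (error - reg * b_i)
    # both factor updates use the pre-step values of p_u and q_i
    new_p_u = p_u + lr * (error * q_i - reg * p_u)
    new_q_i = q_i + lr * (error * p_u - reg * q_i)
    return new_b_u, new_b_i, new_p_u, new_q_i


rng = np.random.default_rng(0)
mu, b_u, b_i = 3.58, 0.0, 0.0
p_u, q_i = rng.normal(0, 0.1, 8), rng.normal(0, 0.1, 8)
for _ in range(200):
    b_u, b_i, p_u, q_i = sgd_step(mu, b_u, b_i, p_u, q_i, r_ui=5.0)
print(mu + b_u + b_i + p_u @ q_i)
```

Repeating the step on a single rating pulls the prediction toward that rating; the small remaining gap comes from the regularization.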

In [62]:
np.unique(ratings.movie_id.values).shape

(3706,)

In [63]:
np.max(ratings.movie_id.values)

3952

In [224]:
class SVD_SGD:
    def __init__(self, factors=64, iterations=10, regularization=0.1, learning_rate=0.1):
        self.factors = factors
        self.iterations = int(iterations)
        self.regularization = regularization
        self.lr = learning_rate

    def fit(self, ratings):
        # ratings: DataFrame (user_id, movie_id, rating); raw ids are used directly as indices
        ratings = ratings.values
        users_number, items_number = np.max(ratings, axis=0)[[0, 1]] + 1
        self.user_embeddings = np.random.uniform(0, 1 / np.sqrt(self.factors), (users_number, self.factors))
        self.item_embeddings = np.random.uniform(0, 1 / np.sqrt(self.factors), (items_number, self.factors))
        # both biases start at 3, so the initial prediction is about 6 + <p_u, q_i>
        self.user_biases = 3 * np.ones(users_number)
        self.item_biases = 3 * np.ones(items_number)
        # movie ids are not contiguous (3706 unique ids, max id 3952): mark the ids that occur
        self.is_real_item = np.zeros(items_number, dtype=bool)
        self.is_real_item[ratings[:, 1]] = True
        for iteration in range(self.iterations):
            # error on a random sample of 10000 training ratings (the printed value is the RMSE)
            sample = ratings[np.random.choice(len(ratings), size=10000)]
            u_idx, i_idx, r = sample[:, 0], sample[:, 1], sample[:, 2]
            predictions = (np.einsum('ij,ij->i', self.user_embeddings[u_idx], self.item_embeddings[i_idx])
                           + self.user_biases[u_idx] + self.item_biases[i_idx])
            print('Epoch', iteration, end='')
            print(", MSE:", np.sqrt(np.mean((predictions - r) ** 2)))
            # one SGD pass over all ratings in random order
            for u_id, i_id, r_ui in ratings[np.random.permutation(len(ratings))]:
                # copies, so that both factor updates use the pre-step values
                u = self.user_embeddings[u_id].copy()
                i = self.item_embeddings[i_id].copy()
                error = u @ i + self.user_biases[u_id] + self.item_biases[i_id] - r_ui
                self.user_embeddings[u_id] -= self.lr * (error * i + self.regularization * u)
                self.item_embeddings[i_id] -= self.lr * (error * u + self.regularization * i)
                self.user_biases[u_id] -= self.lr * (error + self.regularization * self.user_biases[u_id])
                self.item_biases[i_id] -= self.lr * (error + self.regularization * self.item_biases[i_id])
        print('Done!')

    def cosine(self, emb1, emb2):
        return emb1 @ emb2 / (np.linalg.norm(emb1) * np.linalg.norm(emb2) + 1e-9)

    def get_similar_items(self, item_id, number=10, info=movie_info):
        # cosine similarity of item_id to every item embedding at once
        norms = np.linalg.norm(self.item_embeddings, axis=1) + 1e-9
        similarities = self.item_embeddings @ self.item_embeddings[item_id] / (norms * norms[item_id])
        # exclude the query item and ids that never occur in the data
        similarities[~self.is_real_item] = -np.inf
        similarities[item_id] = -np.inf
        top_ind = np.argsort(similarities)[::-1][:number]
        return np.array([info[info.movie_id == i].values[0][1:] for i in top_ind])

    def recommend(self, user_id, number=10, info=movie_info, ratings=ratings):
        # movies the user has already rated are never recommended
        user_history = ratings[ratings.user_id == user_id].movie_id.values
        # ranked by <p_u, q_i> only; adding + self.item_biases would rank by the full predicted rating
        prediction = self.item_embeddings @ self.user_embeddings[user_id]
        prediction[~self.is_real_item] = -np.inf
        prediction[user_history[user_history < len(prediction)]] = -np.inf
        top_ind = np.argsort(prediction)[::-1][:number]
        return np.array([info[info.movie_id == i].values[0][1:] for i in top_ind])

In [225]:
svg = SVD_SGD(factors=32, iterations=30, regularization=0.001, learning_rate=0.01)
svg.fit(ratings)

Epoch 0, MSE: 2.8947045132075364
Epoch 1, MSE: 0.9533944992294792
Epoch 2, MSE: 0.9084057932893849
Epoch 3, MSE: 0.9057111923433895
Epoch 4, MSE: 0.8657980105458488
Epoch 5, MSE: 0.8582435947130606
Epoch 6, MSE: 0.8337913004271197
Epoch 7, MSE: 0.8136934335575813
Epoch 8, MSE: 0.7795143230728384
Epoch 9, MSE: 0.7641377093653136
Epoch 10, MSE: 0.7533023899500666
Epoch 11, MSE: 0.7331745973602105
Epoch 12, MSE: 0.7256446597596504
Epoch 13, MSE: 0.7111007783523174
Epoch 14, MSE: 0.7125900773905454
Epoch 15, MSE: 0.7037930797662305
Epoch 16, MSE: 0.6921551627796912
Epoch 17, MSE: 0.6857785085602377
Epoch 18, MSE: 0.6876282531705731
Epoch 19, MSE: 0.6791264814355935
Epoch 20, MSE: 0.6724293340995322
Epoch 21, MSE: 0.6794352458994387
Epoch 22, MSE: 0.6751985270741727
Epoch 23, MSE: 0.6657604617423544
Epoch 24, MSE: 0.6541920637070648
Epoch 25, MSE: 0.6566960200357015
Epoch 26, MSE: 0.6587768699902218
Epoch 27, MSE: 0.6519295330922318
Epoch 28, MSE: 0.6499480374996967
Epoch 29, MSE: 0.6504663

In [226]:
movie_id = 1
print("Movie: ")
print(movie_info[movie_info.movie_id == movie_id].values)
print("Similar: ")
print(svg.get_similar_items(movie_id, 10))

Movie: 
[[1 'Toy Story (1995)' "Animation|Children's|Comedy"]]
Similar: 
[['Toy Story 2 (1999)' "Animation|Children's|Comedy"]
 ["Bug's Life, A (1998)" "Animation|Children's|Comedy"]
 ['Aladdin (1992)' "Animation|Children's|Comedy|Musical"]
 ['Little Mermaid, The (1989)'
  "Animation|Children's|Comedy|Musical|Romance"]
 ['Beauty and the Beast (1991)' "Animation|Children's|Musical"]
 ['Parent Trap, The (1998)' "Children's|Drama"]
 ['Sense and Sensibility (1995)' 'Drama|Romance']
 ['Babe (1995)' "Children's|Comedy|Drama"]
 ['Home Alone (1990)' "Children's|Comedy"]
 ['Lion King, The (1994)' "Animation|Children's|Musical"]]


In [233]:
print(svg.recommend(4, 10))

[['Black Mask (Hak hap) (1996)' 'Action']
 ['Local Hero (1983)' 'Comedy']
 ['Welcome To Sarajevo (1997)' 'Drama|War']
 ['Hamlet (2000)' 'Drama']
 ['Air Bud (1997)' "Children's|Comedy"]
 ['Bread and Chocolate (Pane e cioccolata) (1973)' 'Drama']
 ['Prom Night III: The Last Kiss (1989)' 'Horror']
 ['Goodbye, Lover (1999)' 'Comedy|Crime|Thriller']
 ['Brenda Starr (1989)' 'Adventure']
 ['Bushwhacked (1995)' 'Comedy']]


### Task 2. Without using ready-made solutions, implement a matrix factorization trained with ALS on implicit data
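The class below runs plain ALS on the 0/1 matrix, so every zero is an equally weighted target of 0. The implicit-feedback variant of Hu, Koren and Volinsky instead weights cells by a confidence $c_{ui} = 1 + \alpha r_{ui}$ and solves $x_u = (Y^\top C^u Y + \lambda I)^{-1} Y^\top C^u p(u)$ for each user. A minimal sketch of that user step, assuming a CSR user-item matrix `R` and item factors `Y`; the name `ials_user_update` and the defaults `reg=0.1`, `alpha=40.0` are illustrative assumptions. It uses the identity $Y^\top C^u Y = Y^\top Y + Y^\top (C^u - I) Y$, so only the user's nonzero items are touched.

```python
import numpy as np
import scipy.sparse as sp


def ials_user_update(R, Y, reg=0.1, alpha=40.0):
    """Hypothetical helper: recompute every user vector with item factors Y fixed.

    Minimizes sum_i c_ui (p_ui - x_u . y_i)^2 + reg * |x_u|^2 per user,
    with p_ui = 1 if r_ui > 0 else 0 and c_ui = 1 + alpha * r_ui.
    """
    n_users, factors = R.shape[0], Y.shape[1]
    YtY = Y.T @ Y  # shared by all users
    X = np.zeros((n_users, factors))
    for u in range(n_users):
        start, end = R.indptr[u], R.indptr[u + 1]
        items, r = R.indices[start:end], R.data[start:end]
        if len(items) == 0:
            continue  # no positives: with p(u) = 0 the minimizer is x_u = 0
        Y_u = Y[items]
        c = 1.0 + alpha * r
        # Y^T C^u Y = Y^T Y + Y_u^T diag(c - 1) Y_u, because c_ui = 1 wherever r_ui = 0
        A = YtY + Y_u.T @ ((c - 1.0)[:, None] * Y_u) + reg * np.eye(factors)
        # Y^T C^u p(u): only items with p_ui = 1 contribute
        b = Y_u.T @ c
        X[u] = np.linalg.solve(A, b)
    return X


rng = np.random.default_rng(0)
R = sp.csr_matrix(np.array([[1, 0, 1], [0, 1, 0]], dtype=float))
Y = rng.normal(0, 0.1, (3, 2))
X = ials_user_update(R, Y)
```

The item step is the same function applied to `R.T.tocsr()` with the user factors in place of `Y`.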

In [270]:
class ALS:
    def __init__(self, factors=64, iterations=10, regularization=0.1):
        self.factors = factors
        self.iterations = int(iterations)
        self.regularization = regularization

    def fit(self, ratings_csr):
        # ratings_csr: user x item 0/1 CSR matrix R
        users_number, items_number = ratings_csr.shape
        self.nonzero = ratings_csr.nonzero()
        # boolean mask of item ids with at least one positive interaction
        self.is_real_item = np.zeros(items_number, dtype=bool)
        self.is_real_item[self.nonzero[1]] = True
        self.user_embeddings = np.random.uniform(0, 1 / np.sqrt(self.factors), (users_number, self.factors))
        self.item_embeddings = np.random.uniform(0, 1 / np.sqrt(self.factors), (items_number, self.factors))
        reg = self.regularization * np.eye(self.factors)
        rows, cols = self.nonzero
        targets = np.asarray(ratings_csr[rows, cols]).ravel()
        for iteration in range(self.iterations):
            # error on the positive entries only (the printed value is the RMSE)
            predictions = np.einsum('ij,ij->i', self.user_embeddings[rows], self.item_embeddings[cols])
            print('Epoch', iteration, end='')
            print(", MSE:", np.sqrt(np.mean((predictions - targets) ** 2)))
            # plain (unweighted) ALS on the full matrix R, every zero is a target of 0:
            # U = R V (V^T V + lambda I)^-1, then V = R^T U (U^T U + lambda I)^-1
            self.user_embeddings = np.linalg.solve(self.item_embeddings.T @ self.item_embeddings + reg,
                                                   (ratings_csr @ self.item_embeddings).T).T
            self.item_embeddings = np.linalg.solve(self.user_embeddings.T @ self.user_embeddings + reg,
                                                   (ratings_csr.T @ self.user_embeddings).T).T
        print('Done!')

    def cosine(self, emb1, emb2):
        return emb1 @ emb2 / (np.linalg.norm(emb1) * np.linalg.norm(emb2) + 1e-9)

    def get_similar_items(self, item_id, number=10, info=movie_info):
        # cosine similarity of item_id to every item embedding at once
        norms = np.linalg.norm(self.item_embeddings, axis=1) + 1e-9
        similarities = self.item_embeddings @ self.item_embeddings[item_id] / (norms * norms[item_id])
        # exclude the query item and ids without positive interactions
        similarities[~self.is_real_item] = -np.inf
        similarities[item_id] = -np.inf
        top_ind = np.argsort(similarities)[::-1][:number]
        return np.array([info[info.movie_id == i].values[0][1:] for i in top_ind])

    def recommend(self, user_id, number=10, info=movie_info, ratings=ratings):
        # skip every movie the user has rated at all (explicit ratings are passed by default)
        user_history = ratings[ratings.user_id == user_id].movie_id.values
        prediction = self.item_embeddings @ self.user_embeddings[user_id]
        prediction[~self.is_real_item] = -np.inf
        # ids beyond the training matrix cannot be scored and are ignored
        prediction[user_history[user_history < len(prediction)]] = -np.inf
        top_ind = np.argsort(prediction)[::-1][:number]
        return np.array([info[info.movie_id == i].values[0][1:] for i in top_ind])

In [277]:
als = ALS(factors=32, iterations=20, regularization=0.1)
als.fit(user_item_csr)

Epoch 0, MSE: 0.7508880887720185
Epoch 1, MSE: 0.774462399988525
Epoch 2, MSE: 0.6962928976566737
Epoch 3, MSE: 0.6862130794484421
Epoch 4, MSE: 0.6831423666303489
Epoch 5, MSE: 0.6817812421410685
Epoch 6, MSE: 0.6810587121606467
Epoch 7, MSE: 0.6806319971732542
Epoch 8, MSE: 0.6803654791018886
Epoch 9, MSE: 0.680192658435772
Epoch 10, MSE: 0.6800756164871765
Epoch 11, MSE: 0.6799921499829804
Epoch 12, MSE: 0.6799294046682709
Epoch 13, MSE: 0.679879991118707
Epoch 14, MSE: 0.679839667501222
Epoch 15, MSE: 0.6798059853963345
Epoch 16, MSE: 0.6797774937924441
Epoch 17, MSE: 0.679753272311457
Epoch 18, MSE: 0.6797326674343849
Epoch 19, MSE: 0.6797151552599291
Done!


In [280]:
movie_id = 1
print("Movie: ")
print(movie_info[movie_info.movie_id == movie_id].values)
print("Similar: ")
print(als.get_similar_items(movie_id, 10))

Movie: 
[[1 'Toy Story (1995)' "Animation|Children's|Comedy"]]
Similar: 
[['Toy Story 2 (1999)' "Animation|Children's|Comedy"]
 ['Aladdin (1992)' "Animation|Children's|Comedy|Musical"]
 ["Bug's Life, A (1998)" "Animation|Children's|Comedy"]
 ['Babe (1995)' "Children's|Comedy|Drama"]
 ['Mulan (1998)' "Animation|Children's"]
 ['Tarzan (1999)' "Animation|Children's"]
 ['Beauty and the Beast (1991)' "Animation|Children's|Musical"]
 ['Lion King, The (1994)' "Animation|Children's|Musical"]
 ['Pleasantville (1998)' 'Comedy']
 ['Groundhog Day (1993)' 'Comedy|Romance']]


In [279]:
print(als.recommend(4, 10))

[['Indiana Jones and the Last Crusade (1989)' 'Action|Adventure']
 ['Terminator 2: Judgment Day (1991)' 'Action|Sci-Fi|Thriller']
 ['Aliens (1986)' 'Action|Sci-Fi|Thriller|War']
 ['Godfather, The (1972)' 'Action|Crime|Drama']
 ['Matrix, The (1999)' 'Action|Sci-Fi|Thriller']
 ['Butch Cassidy and the Sundance Kid (1969)' 'Action|Comedy|Western']
 ['Fugitive, The (1993)' 'Action|Thriller']
 ['Godfather: Part II, The (1974)' 'Action|Crime|Drama']
 ['Back to the Future (1985)' 'Comedy|Sci-Fi']
 ['Lethal Weapon (1987)' 'Action|Comedy|Crime|Drama']]


### Task 3. Without using ready-made solutions, implement a matrix factorization trained with BPR on implicit data
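BPR maximizes $\sum \ln \sigma(x_{ui} - x_{uj})$ over triples of a user $u$, an item $i$ with a positive interaction and a sampled negative item $j$, with $x_{ui} = w_u^\top h_i$. Since $\frac{d}{dx}\ln\sigma(x) = 1 - \sigma(x) = \sigma(-x)$, one stochastic gradient-ascent step scales every gradient by $s = \sigma(x_{uj} - x_{ui})$, and L2 regularization is *subtracted*. A minimal sketch of that step, assuming the factor 2 from the squared norm is absorbed into `reg`; the helper name `bpr_step` is hypothetical.

```python
import numpy as np
from scipy.special import expit as sigmoid


def bpr_step(w_u, h_i, h_j, lr=0.05, reg=0.0):
    """Hypothetical helper: one gradient-ascent step on ln sigma(x_ui - x_uj) - reg * |theta|^2.

    Returns new (w_u, h_i, h_j); every gradient is evaluated at the pre-step values.
    """
    s = sigmoid(w_u @ h_j - w_u @ h_i)  # = 1 - sigma(x_ui - x_uj)
    new_w_u = w_u + lr * (s * (h_i - h_j) - reg * w_u)
    new_h_i = h_i + lr * (s * w_u - reg * h_i)
    new_h_j = h_j + lr * (-s * w_u - reg * h_j)
    return new_w_u, new_h_i, new_h_j


rng = np.random.default_rng(1)
w_u, h_i, h_j = rng.normal(0, 0.1, (3, 16))
before = w_u @ h_i - w_u @ h_j
w_u, h_i, h_j = bpr_step(w_u, h_i, h_j)
after = w_u @ h_i - w_u @ h_j
print(after > before)  # True: the positive item now outscores the negative by more
```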

In [95]:
class BPR:
    def __init__(self, factors=64, iterations=10, regularization=0.01, learning_rate=0.01):
        self.factors = factors
        self.iterations = int(iterations)
        self.regularization = regularization
        self.lr = learning_rate

    def fit(self, ratings, ratings_csr):
        # ratings: DataFrame of positive interactions (user_id, movie_id, rating); ratings_csr: user x item CSR
        users_number, items_number = ratings_csr.shape
        ratings = ratings.values
        ratings_number = ratings.shape[0]
        self.nonzero = ratings_csr.nonzero()
        self.nonzero_unique = tuple(map(np.unique, self.nonzero))
        # boolean mask of item ids with at least one positive interaction
        self.is_real_item = np.zeros(items_number, dtype=bool)
        self.is_real_item[self.nonzero[1]] = True
        self.user_embeddings = np.random.uniform(0, 1 / np.sqrt(self.factors), (users_number, self.factors))
        self.item_embeddings = np.random.uniform(0, 1 / np.sqrt(self.factors), (items_number, self.factors))
        for iteration in range(self.iterations):
            print('Epoch', iteration, end='')
            # one SGD pass: a (user, positive item, sampled negative item) triple per positive rating
            for i in np.random.permutation(ratings_number):
                user_id, item_p_id = ratings[i][0], ratings[i][1]
                # rejection-sample a negative among all ids (including ids absent from the data)
                while True:
                    item_n_id = np.random.choice(items_number)
                    if ratings_csr[user_id, item_n_id] == 0:
                        break
                # copies, so that all three updates use the pre-step values
                w_u = self.user_embeddings[user_id].copy()
                h_p = self.item_embeddings[item_p_id].copy()
                h_n = self.item_embeddings[item_n_id].copy()
                # s = sigma(x_un - x_up): the gradient factor of ln sigma(x_up - x_un)
                s = sigmoid(w_u @ h_n - w_u @ h_p)
                # gradient ascent on the BPR objective; the L2 term shrinks the parameters
                self.user_embeddings[user_id] += self.lr * (s * (h_p - h_n) - self.regularization * w_u)
                self.item_embeddings[item_p_id] += self.lr * (s * w_u - self.regularization * h_p)
                self.item_embeddings[item_n_id] += self.lr * (-s * w_u - self.regularization * h_n)

            # mean per-user training AUC: share of (positive, negative) pairs ordered correctly
            product = self.user_embeddings @ self.item_embeddings.T
            auc = 0.0
            for user_id in self.nonzero_unique[0]:
                p_mask = np.zeros(items_number, dtype=bool)
                p_mask[ratings_csr[user_id].nonzero()[1]] = True
                p = product[user_id][p_mask]
                n = product[user_id][~p_mask]
                auc += (p[:, np.newaxis] > n).sum() / (len(p) * len(n))
            print(", AUC:", auc / len(self.nonzero_unique[0]))

        print('Done!')

    def cosine(self, emb1, emb2):
        return emb1 @ emb2 / (np.linalg.norm(emb1) * np.linalg.norm(emb2) + 1e-9)

    def get_similar_items(self, item_id, number=10, info=movie_info):
        # cosine similarity of item_id to every item embedding at once
        norms = np.linalg.norm(self.item_embeddings, axis=1) + 1e-9
        similarities = self.item_embeddings @ self.item_embeddings[item_id] / (norms * norms[item_id])
        # exclude the query item and ids without positive interactions
        similarities[~self.is_real_item] = -np.inf
        similarities[item_id] = -np.inf
        top_ind = np.argsort(similarities)[::-1][:number]
        return np.array([info[info.movie_id == i].values[0][1:] for i in top_ind])

    def recommend(self, user_id, number=10, info=movie_info, ratings=ratings):
        # skip every movie the user has rated at all (explicit ratings are passed by default)
        user_history = ratings[ratings.user_id == user_id].movie_id.values
        prediction = self.item_embeddings @ self.user_embeddings[user_id]
        prediction[~self.is_real_item] = -np.inf
        # ids beyond the training matrix cannot be scored and are ignored
        prediction[user_history[user_history < len(prediction)]] = -np.inf
        top_ind = np.argsort(prediction)[::-1][:number]
        return np.array([info[info.movie_id == i].values[0][1:] for i in top_ind])

In [100]:
bpr = BPR(factors=32, iterations=5, regularization=0, learning_rate=0.1)
bpr.fit(implicit_ratings, user_item_csr)

Epoch 0, AUC: 0.8905142844264243
Epoch 1, AUC: 0.9104396055928279
Epoch 2, AUC: 0.9305166590667183
Epoch 3, AUC: 0.9403015557624292
Epoch 4, AUC: 0.9430816563235525
Done!
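The per-user AUC printed above compares every positive item with every negative one, which costs O(n_pos · n_neg) per user. The same quantity can be computed from ranks (the Mann-Whitney U statistic) in O(n log n). A minimal sketch, assuming `scipy.stats.rankdata`; note that it gives tied pairs half credit, whereas the strict `>` comparison in `fit` gives them none, so the two can differ slightly when scores tie. The name `auc_from_scores` is hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def auc_from_scores(pos_scores, neg_scores):
    """Hypothetical helper: fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    n_pos, n_neg = len(pos_scores), len(neg_scores)
    ranks = rankdata(np.concatenate([pos_scores, neg_scores]))  # 1-based, average ranks for ties
    # sum of positive ranks minus its smallest possible value = number of correctly ordered pairs
    correct_pairs = ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2
    return correct_pairs / (n_pos * n_neg)


pos = np.array([3.0, 1.0])
neg = np.array([2.0, 0.0])
print(auc_from_scores(pos, neg))  # 0.75
```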


In [101]:
movie_id = 1
print("Movie: ")
print(movie_info[movie_info.movie_id == movie_id].values)
print("Similar: ")
print(bpr.get_similar_items(movie_id, 10))

Movie: 
[[1 'Toy Story (1995)' "Animation|Children's|Comedy"]]
Similar: 
[['Nightmare Before Christmas, The (1993)' "Children's|Comedy|Musical"]
 ['Aladdin (1992)' "Animation|Children's|Comedy|Musical"]
 ['Beauty and the Beast (1991)' "Animation|Children's|Musical"]
 ['Toy Story 2 (1999)' "Animation|Children's|Comedy"]
 ["Bug's Life, A (1998)" "Animation|Children's|Comedy"]
 ['Bambi (1942)' "Animation|Children's"]
 ["Midsummer Night's Dream, A (1999)" 'Comedy|Fantasy']
 ['Mouse Hunt (1997)' "Children's|Comedy"]
 ['Desperately Seeking Susan (1985)' 'Comedy|Romance']
 ['Pinocchio (1940)' "Animation|Children's"]]


In [102]:
print(bpr.recommend(4, 10))

[['Matrix, The (1999)' 'Action|Sci-Fi|Thriller']
 ['Godfather, The (1972)' 'Action|Crime|Drama']
 ['Braveheart (1995)' 'Action|Drama|War']
 ['American Beauty (1999)' 'Comedy|Drama']
 ['Psycho (1960)' 'Horror|Thriller']
 ['Silence of the Lambs, The (1991)' 'Drama|Thriller']
 ['Blade Runner (1982)' 'Film-Noir|Sci-Fi']
 ['Usual Suspects, The (1995)' 'Crime|Thriller']
 ['Terminator 2: Judgment Day (1991)' 'Action|Sci-Fi|Thriller']
 ['Godfather: Part II, The (1974)' 'Action|Crime|Drama']]


### Task 4. Without using ready-made solutions, implement a matrix factorization trained with WARP on implicit data
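WARP (Weighted Approximate-Rank Pairwise loss, following Weston et al.'s WSABIE formulation) samples negatives for a positive pair (u, i) until it finds a *violating* item j with $x_{uj} > x_{ui} - 1$. If that took $N$ draws out of $|I| - 1$ candidates, the rank of $i$ is estimated as $\lfloor (|I| - 1)/N \rfloor$, and the hinge gradient of $1 - x_{ui} + x_{uj}$ is scaled by $L(\text{rank}) = \sum_{k=1}^{\text{rank}} 1/k$. A minimal sketch of a single update, assuming dense NumPy factors `W` (users) and `H` (items), uniform sampling over all other items (it does not exclude the user's other positives, to keep the sketch short) and no regularization; `warp_step` and `max_trials` are hypothetical names.

```python
import numpy as np


def warp_step(W, H, u, i, rng, lr=0.05, max_trials=100):
    """Hypothetical helper: one WARP update for user u and positive item i, in place.

    Returns True if a violating negative was found and an update was made.
    """
    n_items = H.shape[0]
    x_ui = W[u] @ H[i]
    for trials in range(1, max_trials + 1):
        # uniform draw over all items except i
        j = rng.integers(n_items - 1)
        if j >= i:
            j += 1
        if W[u] @ H[j] > x_ui - 1.0:  # margin violated: j scores too close to i
            rank = max(1, (n_items - 1) // trials)  # estimated rank of i among negatives
            weight = np.sum(1.0 / np.arange(1, rank + 1))  # L(rank) = 1 + 1/2 + ... + 1/rank
            w_u, h_i, h_j = W[u].copy(), H[i].copy(), H[j].copy()
            # gradient descent on weight * (1 - x_ui + x_uj), using the pre-step values
            W[u] += lr * weight * (h_i - h_j)
            H[i] += lr * weight * w_u
            H[j] -= lr * weight * w_u
            return True
    return False
```

In a full training loop one would iterate over the observed (user, item) pairs in random order and call the step once per pair; finding a violation quickly means the positive item is ranked poorly, so it receives a large weight.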