##### Copyright 2020 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Recommending movies: retrieval

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/recommenders/examples/basic_retrieval"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/recommenders/blob/main/docs/examples/basic_retrieval.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/recommenders/blob/main/docs/examples/basic_retrieval.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
  <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/recommenders/docs/examples/basic_retrieval.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

Real-world recommender systems are often composed of two stages:

1. The retrieval stage is responsible for selecting an initial set of hundreds of candidates from all possible candidates. The main objective of this model is to efficiently weed out all candidates that the user is not interested in. Because the retrieval model may be dealing with millions of candidates, it has to be computationally efficient.
2. The ranking stage takes the outputs of the retrieval model and fine-tunes them to select the best possible handful of recommendations. Its task is to narrow down the set of items the user may be interested in to a shortlist of likely candidates.

In this tutorial, we're going to focus on the first stage, retrieval. If you are interested in the ranking stage, have a look at our [ranking](basic_ranking) tutorial.

Retrieval models are often composed of two sub-models:

1. A query model computing the query representation (normally a fixed-dimensionality embedding vector) using query features.
2. A candidate model computing the candidate representation (an equally-sized vector) using the candidate features.

The outputs of the two models are then multiplied together to give a query-candidate affinity score, with higher scores expressing a better match between the candidate and the query.
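The scoring step can be sketched in a few lines of NumPy. This is a minimal illustration, assuming both towers have already produced their embeddings; `top_k_candidates` is a hypothetical helper, not part of TensorFlow Recommenders. It scores every candidate with a dot product and keeps the k best, which is the exhaustive search that an approximate nearest neighbours index is meant to speed up.

```python
import numpy as np

def top_k_candidates(query_emb, candidate_embs, k):
  """Return the indices and scores of the k candidates with the highest dot-product score."""
  # One affinity score per candidate: (num_candidates, dim) @ (dim,) -> (num_candidates,)
  scores = candidate_embs @ query_emb
  # argpartition isolates the k largest scores without a full sort; then order those k
  top = np.argpartition(-scores, k - 1)[:k]
  top = top[np.argsort(-scores[top])]
  return top, scores[top]

# Toy embeddings (made-up numbers): one query and four candidates
query = np.array([1.0, 0.0, 2.0])
candidates = np.array([
    [0.5, 1.0, 0.0],   # score 0.5
    [1.0, 0.0, 1.0],   # score 3.0
    [0.0, 0.0, 0.1],   # score 0.2
    [2.0, 5.0, 1.0],   # score 4.0
])
idx, best = top_k_candidates(query, candidates, k=2)
print(idx, best)  # [3 1] [4. 3.]
```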

In this tutorial, we're going to build and train such a two-tower model using the Movielens dataset.

We're going to:

1. Get our data and split it into a training and test set.
2. Implement a retrieval model.
3. Fit and evaluate it.
4. Export it for efficient serving by building an approximate nearest neighbours (ANN) index.



## The dataset

The Movielens dataset is a classic dataset from the [GroupLens](https://grouplens.org/datasets/movielens/) research group at the University of Minnesota. It contains a set of ratings given to movies by a set of users, and is a workhorse of recommender system research.

The data can be treated in two ways:

1. It can be interpreted as expressing which movies the users watched (and rated), and which they did not. This is a form of implicit feedback, where users' watches tell us which things they prefer to see and which they'd rather not see.
2. It can also be seen as expressing how much the users liked the movies they did watch. This is a form of explicit feedback: given that a user watched a movie, we can tell roughly how much they liked it by looking at the rating they have given.

In this tutorial, we are focusing on a retrieval system: a model that predicts a set of movies from the catalogue that the user is likely to watch. Often, implicit data is more useful here, and so we are going to treat Movielens as an implicit system. This means that every movie a user watched is a positive example, and every movie they have not seen is an implicit negative example.
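A minimal sketch of the implicit-feedback view, assuming interactions are available as (user, item) pairs of integer IDs: every observed pair becomes a positive, and for each positive we draw random items the user never interacted with as implicit negatives. The helper `sample_implicit_negatives` and the uniform sampling scheme are illustrative assumptions; the code later in this notebook takes a different route and binarizes the explicit ratings instead.

```python
import numpy as np

def sample_implicit_negatives(pairs, num_items, neg_per_pos=1, seed=0):
  """Turn observed (user, item) pairs into labeled examples via uniform negative sampling.

  pairs: iterable of (user_id, item_id) the user interacted with (positives).
  num_items: item IDs are assumed to be integers in [0, num_items).
  Returns a list of (user_id, item_id, label) tuples.
  """
  rng = np.random.default_rng(seed)
  pairs = list(pairs)
  seen = {}
  for user, item in pairs:
    seen.setdefault(user, set()).add(item)

  examples = []
  for user, item in pairs:
    examples.append((user, item, 1))  # observed interaction -> positive
    for _ in range(neg_per_pos):
      # Rejection-sample an item this user never interacted with -> implicit negative
      # (assumes every user has at least one unseen item, otherwise this never ends)
      while True:
        candidate = int(rng.integers(num_items))
        if candidate not in seen[user]:
          break
      examples.append((user, candidate, 0))
  return examples

watched = [(0, 1), (0, 2), (1, 0)]
for row in sample_implicit_negatives(watched, num_items=5, neg_per_pos=2):
  print(row)
```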

## Imports


Let's first get our imports out of the way.

In [None]:
from google.colab import drive
drive.mount('/gdrive')

Mounted at /gdrive


In [None]:
!mkdir -p ../gdrive/MyDrive/Projects/MovieRecommendation/Training\ Workspace/


In [None]:
%cd ../gdrive/MyDrive/Projects/MovieRecommendation/Training\ Workspace/

/gdrive/MyDrive/Projects/MovieRecommendation/Training Workspace


In [None]:
!ls

data	  model		  model.h5  tfjs_model	    vectors.tsv
meta.tsv  model_dense.h5  models    tfjs_model.zip


# Model Development

## Data Preparation

In [None]:
%cd data

/gdrive/MyDrive/Projects/MovieRecommendation/Training Workspace/data


In [None]:
# Download the latest MovieLens 25M dataset from https://grouplens.org/datasets/movielens/
# Uncomment the line below if you are running this notebook for the first time
#!wget https://files.grouplens.org/datasets/movielens/ml-25m.zip

In [None]:
import zipfile

# Extract the dataset
# Uncomment the line below if you are running this notebook for the first time
#zipfile.ZipFile('ml-25m.zip').extractall()

In [None]:
%cd ..

/gdrive/MyDrive/Projects/MovieRecommendation/Training Workspace


In [None]:
!ls data/ml-25m

genome-scores.csv  links.csv   ratings.csv  tags.csv
genome-tags.csv    movies.csv  README.txt


In [None]:
# dataset description
!cat data/ml-25m/README.txt

Summary

This dataset (ml-25m) describes 5-star rating and free-text tagging activity from [MovieLens](http://movielens.org), a movie recommendation service. It contains 25000095 ratings and 1093360 tag applications across 62423 movies. These data were created by 162541 users between January 09, 1995 and November 21, 2019. This dataset was generated on November 21, 2019.

Users were selected at random for inclusion. All selected users had rated at least 20 movies. No demographic information is included. Each user is represented by an id, and no other information is provided.

The data are contained in the files `genome-scores.csv`, `genome-tags.csv`, `links.csv`, `movies.csv`, `ratings.csv` and `tags.csv`. More details about the contents and use of all these files follows.

This and other GroupLens data sets are publicly available for download at <http://grouplens.org/datasets/>.


Usage License

Neither the University of Minnesota nor any of the researchers involved can guarantee the 

In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf

# data/used_movies.csv is not created in this notebook; it is a previously prepared movie list
movies_df = pd.read_csv('data/used_movies.csv', index_col=0)
ratings_df = pd.read_csv('data/ml-25m/ratings.csv')
tags_df = pd.read_csv('data/ml-25m/tags.csv')

In [None]:
movies_df.to_json('data/movies.json', orient='records')

In [None]:
ratings_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25000095 entries, 0 to 25000094
Data columns (total 4 columns):
 #   Column     Dtype  
---  ------     -----  
 0   userId     int64  
 1   movieId    int64  
 2   rating     float64
 3   timestamp  int64  
dtypes: float64(1), int64(3)
memory usage: 762.9 MB


In [None]:
ratings_df.userId.value_counts().median()

71.0

In [None]:
tags_df

Unnamed: 0,userId,movieId,tag,timestamp
0,3,260,classic,1439472355
1,3,260,sci-fi,1439472256
2,4,1732,dark comedy,1573943598
3,4,1732,great dialogue,1573943604
4,4,7569,so bad it's good,1573943455
...,...,...,...,...
1093355,162521,66934,Neil Patrick Harris,1427311611
1093356,162521,103341,cornetto trilogy,1427311259
1093357,162534,189169,comedy,1527518175
1093358,162534,189169,disabled,1527518181


In [None]:
movies_df

Unnamed: 0,movieId,title,genres,predicted_rating
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,0.787169
1,2,Jumanji (1995),Adventure|Children|Fantasy,0.629399
2,3,Grumpier Old Men (1995),Comedy|Romance,0.551457
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance,0.476919
4,5,Father of the Bride Part II (1995),Comedy,0.495895
...,...,...,...,...
12283,59026,99 francs (2007),Comedy,0.798638
12284,59031,Private Property (Nue propriété) (2006),Drama,0.788572
12285,59037,Speed Racer (2008),Action|Children|Sci-Fi|IMAX,0.584359
12286,59040,Gabriel (2007),Action|Horror,0.599247


In [None]:
# Binarize the explicit ratings: more than 3 stars counts as "liked" (1), everything else as 0
ratings_df['rating'] = (ratings_df['rating'] > 3).astype(int)

In [None]:
ratings = ratings_df[['userId', 'movieId', 'rating']]

In [None]:
ratings

Unnamed: 0,userId,movieId,rating
0,1,296,1
1,1,306,1
2,1,307,1
3,1,665,1
4,1,899,1
...,...,...,...
25000090,162541,50872,1
25000091,162541,55768,0
25000092,162541,56176,0
25000093,162541,58559,1


In [None]:
n_movies = ratings.movieId.nunique()
n_users = ratings.userId.nunique()

print(n_movies)
print(n_users)

59047
162541
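A Keras `Embedding` layer looks rows up by integer index, so its `input_dim` must be larger than the largest ID fed to it, not merely larger than the number of distinct IDs. Movie IDs in this dataset are sparse (IDs such as 106487 and 133125 occur in the training split, while there are only 59047 distinct movies), so either the table is sized by the maximum ID or the IDs are remapped to a dense range. A minimal NumPy sketch of the remapping option; `remap_ids` is a hypothetical helper not defined elsewhere in this notebook:

```python
import numpy as np

def remap_ids(raw_ids):
  """Map arbitrary integer IDs to dense indices 0..n-1.

  Returns (dense_ids, vocab) with vocab[dense_id] == raw_id, so an
  Embedding(input_dim=len(vocab), ...) can index every ID safely.
  """
  vocab, dense_ids = np.unique(raw_ids, return_inverse=True)
  return dense_ids, vocab

raw_movie_ids = np.array([47, 175, 106487, 733, 47, 133125])
dense, vocab = remap_ids(raw_movie_ids)
print(dense)  # [0 1 3 2 0 4]
print(vocab)
```

The same `vocab` must then be used at prediction time, for example `np.searchsorted(vocab, ids)` for IDs known to be in it.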


In [None]:
from sklearn.model_selection import train_test_split

train, test = train_test_split(ratings, test_size=0.1)

In [None]:
print(train.shape)
print(test.shape)

(22500085, 3)
(2500010, 3)


In [None]:
train

Unnamed: 0,userId,movieId,rating
2917204,19260,47,1
15204978,98520,175,0
12599567,81477,106487,1
8939222,58245,733,1
5012680,32640,133125,0
...,...,...,...
23142431,150301,1222,1
1082569,7296,4994,1
15081872,97712,4995,0
23545013,152825,89745,1


# Create Recommendation System 

### Build Model

In [None]:
from tensorflow.keras.models import Model, Sequential 
from tensorflow.keras.layers import Input, Embedding, Dot, Flatten, Dense, concatenate, Dropout
from tensorflow.keras.activations import sigmoid

EMBEDDING_DIM = 8 

# WIDE MODEL
# use_bias takes a boolean; the string 'false' is truthy and would still add a bias
wide_model = Sequential([Dense(EMBEDDING_DIM * 2, use_bias=False),
                         Dropout(0.1)])

# DEEP MODEL
deep_model = Sequential([Dense(EMBEDDING_DIM, activation='relu', use_bias=False),
                         Dropout(0.1),
                         Dense(EMBEDDING_DIM, activation='relu', use_bias=False)])

# input layers
movie_input = Input(shape=[1])
user_input = Input(shape=[1])

# embedding layers: input_dim must be larger than the largest ID that is looked up,
# and MovieLens movieIds go far beyond the number of distinct movies (n_movies)
movie_embedding = Embedding(int(ratings.movieId.max()) + 1, EMBEDDING_DIM)(movie_input)
user_embedding = Embedding(int(ratings.userId.max()) + 1, EMBEDDING_DIM)(user_input)

# flatten the embeddings
movie_flat = Flatten()(movie_embedding)
user_flat = Flatten()(user_embedding)

# wide and deep model
concat = concatenate([movie_flat, user_flat])
wide_output = wide_model(concat)
deep_output = deep_model(concat)

# output layer
concat_2 = concatenate([wide_output, deep_output])
output = Dense(1, activation='sigmoid', use_bias=False)(concat_2)
#output = sigmoid(Dot(1)([wide_output, deep_output]))

# the model
model = Model([movie_input, user_input], [output])
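To make the tensor shapes explicit, here is a NumPy sketch of the same forward pass for a small batch. It assumes toy embedding tables and random weights, omits biases (as the `use_bias` arguments intend) and leaves out dropout, which is inactive at inference time. All array and function names are hypothetical, and the sketch does not replace the Keras model.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8                              # same role as EMBEDDING_DIM
n_movie_rows, n_user_rows = 100, 50  # toy table sizes

# Embedding tables: one row per ID
movie_table = rng.normal(size=(n_movie_rows, DIM))
user_table = rng.normal(size=(n_user_rows, DIM))

# Weights mirroring the Keras layers (no biases)
W_wide = rng.normal(size=(2 * DIM, 2 * DIM))  # Dense(16), linear
W_deep1 = rng.normal(size=(2 * DIM, DIM))     # Dense(8, relu)
W_deep2 = rng.normal(size=(DIM, DIM))         # Dense(8, relu)
W_out = rng.normal(size=(3 * DIM, 1))         # Dense(1, sigmoid)

def np_relu(x):
  return np.maximum(x, 0.0)

def np_sigmoid(x):
  return 1.0 / (1.0 + np.exp(-x))

def forward(movie_ids, user_ids):
  m = movie_table[movie_ids]                       # (batch, 8), lookup + flatten
  u = user_table[user_ids]                         # (batch, 8)
  concat = np.concatenate([m, u], axis=1)          # (batch, 16)
  wide = concat @ W_wide                           # (batch, 16)
  deep = np_relu(np_relu(concat @ W_deep1) @ W_deep2)  # (batch, 8)
  concat_2 = np.concatenate([wide, deep], axis=1)  # (batch, 24)
  return np_sigmoid(concat_2 @ W_out)              # (batch, 1)

probs = forward(np.array([1, 5, 99]), np.array([0, 3, 49]))
print(probs.shape)  # (3, 1)
```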

In [None]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 1)]          0           []                               
                                                                                                  
 input_2 (InputLayer)           [(None, 1)]          0           []                               
                                                                                                  
 embedding (Embedding)          (None, 1, 8)         472384      ['input_1[0][0]']                
                                                                                                  
 embedding_1 (Embedding)        (None, 1, 8)         1300336     ['input_2[0][0]']                
                                                                                              

In [None]:
from tensorflow.keras.optimizers import Adam

#model.compile(optimizer=Adam(learning_rate=1e-3), loss='binary_crossentropy', metrics=['accuracy'])
model.compile(optimizer=Adam(learning_rate=1e-2), loss='binary_crossentropy', metrics=['accuracy'])
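The `binary_crossentropy` loss is the mean negative log-likelihood of the 0/1 labels under the predicted probabilities, $-\frac{1}{N}\sum_i \left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$. A small NumPy version for building intuition; the clipping constant `eps=1e-7` is assumed here to mirror Keras' default fuzz factor, and the function is an illustration rather than Keras' own implementation.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
  """Mean binary cross-entropy over 0/1 labels and predicted probabilities."""
  y_true = np.asarray(y_true, dtype=float)
  # Clip so log() never sees exactly 0 or 1
  p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
  return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

print(binary_crossentropy([1, 0], [0.5, 0.5]))  # ln 2, about 0.6931
print(binary_crossentropy([1, 0], [0.9, 0.1]))  # -ln 0.9, about 0.1054
```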


In [None]:
train.head()

Unnamed: 0,userId,movieId,rating
2917204,19260,47,1
15204978,98520,175,0
12599567,81477,106487,1
8939222,58245,733,1
5012680,32640,133125,0


### Train the Model

In [None]:
history = model.fit(x=[train.movieId, train.userId], y=train.rating,
                    validation_data=([test.movieId, test.userId], test.rating),
                    batch_size=8192*2,
                    epochs=5)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
 296/1374 [=====>........................] - ETA: 15s - loss: 0.4823 - accuracy: 0.7621

KeyboardInterrupt: ignored

### Model Evaluation

In [None]:
import matplotlib.pyplot as plt

losses = pd.DataFrame(history.history)
plt.plot(losses)

NameError: ignored

In [None]:
model.evaluate([test.movieId, test.userId], test.rating, batch_size=8192)



[0.5019150376319885, 0.7515029907226562]
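Whether an accuracy of about 0.75 is good depends on the label balance: a model that always predicts the more frequent class already reaches that class's share of the data. A minimal sketch for computing this baseline; `majority_baseline` is a hypothetical helper, and on this notebook's data it could be called as `majority_baseline(test.rating.values)` (not run here).

```python
import numpy as np

def majority_baseline(labels):
  """Accuracy of always predicting the most frequent 0/1 label."""
  labels = np.asarray(labels, dtype=float)
  positive_share = labels.mean()
  return max(positive_share, 1 - positive_share)

print(majority_baseline([1, 1, 1, 0]))  # 0.75
```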

### Save the Model

In [None]:
model.save('model/model_small.h5')

In [None]:
model = tf.keras.models.load_model('model/model_small.h5')

In [None]:
!pip install tensorflowjs
!tensorflowjs_converter --input_format=keras ./model/model_small.h5 ./model/tfjs_model



In [None]:
from zipfile import ZipFile
import os

with ZipFile('model/tfjs_model.zip', 'w') as z:
  for filename in os.listdir('model/tfjs_model'): 
    filepath = os.path.join('model/tfjs_model', filename)
    z.write(filepath)
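`ZipFile.write(filepath)` stores each file under the path it was given, so the archive above contains entries prefixed with `model/tfjs_model/`. If a flat archive is preferred, the `arcname` argument sets the name recorded inside the zip. A minimal sketch; `zip_directory` is a hypothetical helper:

```python
import os
import zipfile

def zip_directory(src_dir, zip_path):
  """Zip the files directly inside src_dir, storing them without the directory prefix."""
  with zipfile.ZipFile(zip_path, 'w', compression=zipfile.ZIP_DEFLATED) as z:
    for filename in sorted(os.listdir(src_dir)):
      filepath = os.path.join(src_dir, filename)
      if os.path.isfile(filepath):
        # arcname controls the path recorded inside the archive
        z.write(filepath, arcname=filename)
  return zip_path

# e.g. zip_directory('model/tfjs_model', 'model/tfjs_model.zip')
```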

# Some Extras

### Show Recommendations

In [None]:
import numpy as np
import matplotlib.pyplot as plt

def get_ratings(user_id, movies, model):
  movies = movies.copy()
  user_ids = np.array([user_id] * len(movies))
  results = model([movies.movieId.values, user_ids]).numpy().reshape(-1)

  movies['predicted_rating'] = pd.Series(results, index=movies.index)
  movies = movies.sort_values('predicted_rating', ascending=False)

  return movies

result =  get_ratings(3000, movies_df, model)
print(result.predicted_rating.value_counts())
result

0.897295    2
0.877261    2
0.956875    2
0.889908    2
0.932032    2
           ..
0.851248    1
0.851236    1
0.851232    1
0.851201    1
0.000642    1
Name: predicted_rating, Length: 12282, dtype: int64


Unnamed: 0,movieId,title,genres,predicted_rating
11601,53052,Tokyo Olympiad (1965),Documentary,0.995371
8501,25987,"Crucified Lovers, The (Chikamatsu monogatari) ...",Drama,0.994700
7393,7644,Divorce Iranian Style (1998),Documentary,0.993181
10516,42335,Familia (1996),Comedy,0.991809
10599,43518,Charlie: The Life and Art of Charles Chaplin (...,Documentary,0.991763
...,...,...,...,...
12033,56835,Pledge This! (2006),Comedy,0.000899
4669,4775,Glitter (2001),Drama|Musical|Romance,0.000828
9489,31083,Man Trouble (1992),Comedy|Romance,0.000798
4944,5050,"Farewell, The (Abschied - Brechts letzter Somm...",Drama,0.000732


In [None]:
def get_recommendation(user_id, movies=movies_df, model=model, n=25):
  result = get_ratings(user_id, movies, model).reset_index()
  #result.predicted_rating.hist(bins=50)
  top_100 = result.predicted_rating.values[:100].reshape(1, -1)
  # tf.random.categorical treats these scores as logits: each of the top 100 movies is drawn
  # with probability proportional to exp(predicted_rating), so with scores this close the draw
  # is nearly uniform, and because it samples with replacement a movie can appear twice
  recommended_movies = tf.random.categorical(top_100, n)
  print(f'Recommendations for user {user_id}')
  for idx in recommended_movies.numpy()[0]:
    row = result.loc[idx]
    print(row['movieId'], row['predicted_rating'], row['genres'], row['title'], sep=' - ')
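If distinct recommendations are wanted, drawn in proportion to the predicted ratings rather than to their exponentials, one option is NumPy's `Generator.choice` with `replace=False`. A minimal sketch, assuming the scores are already sorted in descending order (as `get_ratings` returns them) and are positive; `sample_top_k` is a hypothetical helper:

```python
import numpy as np

def sample_top_k(scores, k=100, n=25, seed=None):
  """Draw n distinct positions among the k highest scores, with probability proportional to score.

  scores: 1-D array of positive scores, assumed sorted in descending order.
  """
  rng = np.random.default_rng(seed)
  top = np.asarray(scores[:k], dtype=float)
  probs = top / top.sum()
  n = min(n, len(top))  # cannot draw more distinct items than there are
  return rng.choice(len(top), size=n, replace=False, p=probs)

scores = np.array([0.99, 0.98, 0.95, 0.90, 0.50, 0.10])
picked = sample_top_k(scores, k=4, n=3, seed=0)
print(picked)  # three distinct positions, all below 4
```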

In [None]:
get_recommendation(3000)

Recommendations for user 3000
8335 - 0.9895196 - Drama - Make Way for Tomorrow (1937)
27302 - 0.99062157 - Crime|Drama|Thriller - Debt, The (Dlug) (1999)
50 - 0.9776743 - Crime|Mystery|Thriller - Usual Suspects, The (1995)
6650 - 0.97895515 - Comedy|Drama - Kind Hearts and Coronets (1949)
2959 - 0.97800064 - Action|Crime|Drama|Thriller - Fight Club (1999)
41226 - 0.97904277 - Drama - Sounder (1972)
47180 - 0.9840291 - Comedy|Crime|Mystery|Thriller - Gas, Inspector Palmu! (Kaasua, komisario Palmu!) (1961)
7215 - 0.9776474 - Adventure|Drama|Romance|Thriller|War - To Have and Have Not (1944)
4454 - 0.97905767 - Animation|Drama|Sci-Fi|IMAX - More (1998)
3077 - 0.9820287 - Documentary - 42 Up (1998)
27313 - 0.9893617 - Crime|Drama - Roberto Succo (2001)
8420 - 0.98343325 - Drama|Film-Noir - Possessed (1947)
55132 - 0.97969145 - Drama|Romance - Bubble, The (Ha-Buah) (2006)
44555 - 0.98533994 - Drama|Romance|Thriller - Lives of Others, The (Das leben der Anderen) (2006)
2203 - 0.9777028 - Cri

In [None]:
get_recommendation(20)

Recommendations for user 20
47180 - 0.9699062 - Comedy|Crime|Mystery|Thriller - Gas, Inspector Palmu! (Kaasua, komisario Palmu!) (1961)
34135 - 0.97823423 - Comedy|Drama - Bonjour Monsieur Shlomi (Ha-Kochavim Shel Shlomi) (2003)
32799 - 0.978044 - Drama|Romance - Maidens in Uniform (Mädchen in Uniform) (1931)
33092 - 0.9876409 - Drama - Acts of Worship (2001) 
7153 - 0.9700075 - Action|Adventure|Drama|Fantasy - Lord of the Rings: The Return of the King, The (2003)
7568 - 0.9842554 - Comedy|Romance - Love Life (2001)
8936 - 0.97804993 - Drama - Life and Nothing But (Vie et rien d'autre, La) (1989)
8936 - 0.97804993 - Drama - Life and Nothing But (Vie et rien d'autre, La) (1989)
25826 - 0.98086816 - Comedy|Romance - Libeled Lady (1936)
25987 - 0.98668057 - Drama - Crucified Lovers, The (Chikamatsu monogatari) (1954)
5820 - 0.9742406 - Documentary|Musical - Standing in the Shadows of Motown (2002)
27871 - 0.9714902 - Drama - Something the Lord Made (2004)
43518 - 0.97939175 - Documentary 

In [None]:
model.summary()

Model: "model_3"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_7 (InputLayer)           [(None, 1)]          0           []                               
                                                                                                  
 input_8 (InputLayer)           [(None, 1)]          0           []                               
                                                                                                  
 embedding_6 (Embedding)        (None, 1, 128)       7558144     ['input_7[0][0]']                
                                                                                                  
 embedding_7 (Embedding)        (None, 1, 128)       20805376    ['input_8[0][0]']                
                                                                                            

In [None]:
from tensorflow.keras.optimizers import Adam

# Freeze every layer except model.layers[3], the user Embedding (fed by the second input),
# so that further training only moves user vectors
for l in model.layers:
  l.trainable = False

model.layers[3].trainable = True
model.compile(optimizer=Adam(learning_rate=5e-3), loss='binary_crossentropy', metrics=['accuracy'])
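With only the user embedding trainable, a further `fit` call moves user vectors while the movie representations learned on the full dataset stay fixed. The NumPy sketch below shows that idea in its simplest form under an explicit assumption: a plain logistic dot-product model instead of the wide-and-deep network. One user vector is updated by gradient descent on that user's 0/1 feedback while the movie vectors are held constant; all names are hypothetical.

```python
import numpy as np

def finetune_user_vector(user_vec, movie_vecs, labels, lr=0.5, steps=50):
  """Gradient descent on one user's vector; movie vectors are treated as frozen.

  Assumed model: p = sigmoid(movie_vec . user_vec), trained with mean
  binary cross-entropy against 0/1 labels.
  """
  u = np.array(user_vec, dtype=float)  # copy, so the caller's vector is untouched
  M = np.asarray(movie_vecs, dtype=float)
  y = np.asarray(labels, dtype=float)
  for _ in range(steps):
    p = 1.0 / (1.0 + np.exp(-(M @ u)))
    # d(mean BCE)/du = M^T (p - y) / N; only u is updated, M stays fixed
    u -= lr * M.T @ (p - y) / len(y)
  return u

def mean_bce(u, M, y):
  p = 1.0 / (1.0 + np.exp(-(M @ u)))
  return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

rng = np.random.default_rng(0)
toy_movies = rng.normal(size=(5, 8))    # frozen movie embeddings
feedback = np.array([1, 1, 0, 0, 1])    # like / dislike
u0 = np.zeros(8)
u1 = finetune_user_vector(u0, toy_movies, feedback)
print(mean_bce(u0, toy_movies, feedback), '->', mean_bce(u1, toy_movies, feedback))
```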


In [None]:
for l in model.layers:
  print(l.trainable)

False
False
False
True
False
False
False
False
False
False
False


In [None]:
get_ratings(100, movies_df, model).head(20)

Unnamed: 0,movieId,title,genres,predicted_rating
11601,53052,Tokyo Olympiad (1965),Documentary,0.989263
8501,25987,"Crucified Lovers, The (Chikamatsu monogatari) ...",Drama,0.987718
7393,7644,Divorce Iranian Style (1998),Documentary,0.98423
9825,32799,Maidens in Uniform (Mädchen in Uniform) (1931),Drama|Romance,0.97974
10505,42152,Interrogation (Przesluchanie) (1989),Crime|Drama|Thriller,0.979652
10599,43518,Charlie: The Life and Art of Charles Chaplin (...,Documentary,0.977371
11617,53187,Beauty in Trouble (Kráska v nesnázích) (2006),Drama,0.976274
9163,27313,Roberto Succo (2001),Crime|Drama,0.975521
9156,27302,"Debt, The (Dlug) (1999)",Crime|Drama|Thriller,0.972258
10788,45194,"Nibelungen: Kriemhild's Revenge, Die (Die Nibe...",Adventure|Drama|Fantasy,0.971162


In [None]:
user_id = 100 

a = pd.DataFrame([[8859, user_id, 1],
            [31083, user_id, 1],
            [57551, user_id, 0],
            [162074, user_id, 0],
            [42152, user_id, 0]], columns=['movieId', 'userId', 'rating'])

In [None]:
history = model.fit(x=[a.movieId, a.userId], y=a.rating,
                    epochs=1)



In [None]:
get_ratings(100, movies_df, model).head(20)

Unnamed: 0,movieId,title,genres,predicted_rating
8565,26082,Harakiri (Seppuku) (1962),Drama,0.893727
1164,1193,One Flew Over the Cuckoo's Nest (1975),Drama,0.893628
10279,38159,"Short Film About Love, A (Krótki film o milosc...",Drama|Romance,0.891526
1173,1203,12 Angry Men (1957),Drama,0.888824
5904,6016,City of God (Cidade de Deus) (2002),Action|Adventure|Crime|Drama|Thriller,0.887649
3339,3435,Double Indemnity (1944),Crime|Drama|Film-Noir,0.887458
7538,7926,High and Low (Tengoku to jigoku) (1963),Crime|Drama|Film-Noir|Thriller,0.88689
1150,1178,Paths of Glory (1957),Drama|War,0.885717
840,858,"Godfather, The (1972)",Crime|Drama,0.885181
9569,31545,"Trou, Le (Hole, The) (Night Watch, The) (1960)",Crime|Film-Noir,0.882415


In [None]:
N_LAST_MOVIE = 5
RETRAINING_EPOCH = 1
LEARNING_RATE = 5e-2
LIKED_MOVIE_LABEL_WEIGHT = 20

model.compile(optimizer=Adam(learning_rate=LEARNING_RATE), loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
def get_swipe_recommendation(user_id, rated_movies, year_limit, movies=movies_df, model=model):
  # Drop movies the user has already rated in this session
  movies = movies[~movies.movieId.isin(rated_movies)].copy()
  if year_limit:
    # Titles end with the release year in parentheses, e.g. "Toy Story (1995)"
    movies['year'] = movies.title.apply(lambda x: int(x.strip()[-5:-1]) if x.strip()[-5:-1].isnumeric() else None)
    movies = movies.dropna(subset=['year'])
    movies = movies[movies['year'] > year_limit]
  result = get_ratings(user_id, movies, model).reset_index()
  # Sample one of the 100 best-scored movies (the scores are used as logits)
  top_100 = result.predicted_rating.values[:100].reshape(1, -1)
  idx = tf.random.categorical(top_100, 1).numpy()[0][0]
  row = result.loc[idx]
  print(f'Recommendations for user {user_id}')
  print(row['movieId'], row['predicted_rating'], row['genres'], row['title'], sep=' - ')
  return row

def demo_user(user_id, movies=movies_df, model=model, year_limit=None):
  user_history = []
  rated_movies = []
  liked_movies = []

  for _ in range(5):
    recommended_movie = get_swipe_recommendation(user_id, rated_movies, year_limit, movies, model)

    # Ask until the answer is 1 (like) or 0 (dislike)
    like = None
    while like not in (0, 1):
      answer = input('like/dislike (1/0)?: ').strip()
      like = int(answer) if answer in ('0', '1') else None
    if like == 1:
      liked_movies.append([recommended_movie['movieId'], recommended_movie['title']])

    user_history.append([recommended_movie['movieId'], user_id, like])
    rated_movies.append(recommended_movie['movieId'])
    history_df = pd.DataFrame(user_history[-N_LAST_MOVIE:], columns=['movieId', 'userId', 'rating'])
    # Up-weight positive feedback by appending the liked rows LIKED_MOVIE_LABEL_WEIGHT more times
    # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
    history_df_liked = history_df[history_df['rating'] == 1]
    for _ in range(LIKED_MOVIE_LABEL_WEIGHT):
      history_df = pd.concat([history_df.sample(frac=1), history_df_liked])
    print(history_df.head(30))
    # Fine-tune on the most recent feedback only
    model.fit(x=[history_df.movieId, history_df.userId], y=history_df.rating, epochs=RETRAINING_EPOCH, verbose=0)
    model.evaluate(x=[history_df.movieId, history_df.userId], y=history_df.rating)

  # Show the liked movies with their updated predicted ratings
  liked_movies = pd.DataFrame(liked_movies, columns=['movieId', 'title'])
  movies_rating = get_ratings(user_id, movies, model)
  liked_movies = pd.merge(liked_movies[['movieId']], movies_rating, on='movieId')
  liked_movies = liked_movies.sort_values('predicted_rating')
  matched_movie = liked_movies.iloc[:10]
  print(matched_movie)
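The year filter above reads the four characters before the closing parenthesis of each title, so it assumes the title ends in `(YYYY)`; anything else becomes `None` and is dropped. A regular expression states that assumption explicitly. A minimal sketch with the standard `re` module; `extract_year` is a hypothetical helper:

```python
import re

# Assumes MovieLens-style titles that end with the release year in parentheses
YEAR_AT_END = re.compile(r'\((\d{4})\)\s*$')

def extract_year(title):
  """Return the release year at the end of a title as an int, or None if there is none."""
  match = YEAR_AT_END.search(title)
  return int(match.group(1)) if match else None

print(extract_year('Toy Story (1995)'))         # 1995
print(extract_year('Acts of Worship (2001) '))  # 2001
print(extract_year('Some Title Without Year'))  # None
```

It could stand in for the lambda, e.g. `movies['year'] = movies.title.map(extract_year)`.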

In [None]:
demo_user(150, year_limit=2000)

Recommendations for user 150
114891 - 0.81605136 - Drama - Legendary (2010)
like/dislike (1/0)?: 0
   movieId  userId  rating
0   114891     150       0
Recommendations for user 150
114760 - 0.8075644 - Comedy|Drama - Happy Christmas (2014)
like/dislike (1/0)?: 1
   movieId  userId  rating
1   114760     150       1
0   114891     150       0
1   114760     150       1
1   114760     150       1
1   114760     150       1
1   114760     150       1
1   114760     150       1
1   114760     150       1
1   114760     150       1
1   114760     150       1
1   114760     150       1
1   114760     150       1
Recommendations for user 150
115357 - 0.8013265 - Drama|Horror|Thriller - Chair, The (2007)
like/dislike (1/0)?: 0
   movieId  userId  rating
1   114760     150       1
1   114760     150       1
1   114760     150       1
1   114760     150       1
1   114760     150       1
1   114760     150       1
1   114760     150       1
1   114760     150       1
1   114760     150       1


In [None]:
N_LAST_MOVIE = 5
RETRAINING_EPOCH = 1
LEARNING_RATE = 1e-1
LIKED_MOVIE_LABEL_WEIGHT = 5 

model.compile(optimizer=Adam(learning_rate=LEARNING_RATE), loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
def get_swipe_recommendation(user_id, rated_movies, year_limit, movies=movies_df, model=model):
  # Drop movies already shown to the user in this session
  movies = movies[~movies.movieId.isin(rated_movies)].copy()
  if year_limit:
    movies['year'] = movies.title.apply(lambda x: int(x.strip()[-5:-1]) if x.strip()[-5:-1].isnumeric() else None)
    movies = movies.dropna(subset=['year'])
    movies = movies[movies['year'] > year_limit]
  result = get_ratings(user_id, movies, model).reset_index()
  # Sample one of the 10 best-scored movies (the scores are used as logits)
  top_10 = result.predicted_rating.values[:10].reshape(1, -1)
  idx = tf.random.categorical(top_10, 1).numpy()[0][0]
  return result.loc[idx]


def demo_user(user_id, movies=movies_df, model=model, year_limit=None):
  user_history = []
  rated_movies = []
  liked_movies = []

  for _ in range(3):
    # Pick a batch of 5 distinct movies before asking for feedback
    recommended_movies = []
    for _ in range(5):
      recommended_movie = get_swipe_recommendation(user_id, rated_movies, year_limit, movies, model)
      rated_movies.append(recommended_movie['movieId'])
      recommended_movies.append(recommended_movie)

    for recommended_movie in recommended_movies:
      print(f'Recommendations for user {user_id}')
      print(recommended_movie['movieId'], recommended_movie['predicted_rating'],
            recommended_movie['genres'], recommended_movie['title'], sep=' - ')
      # Ask until the answer is 1 (like) or 0 (dislike)
      like = None
      while like not in (0, 1):
        answer = input('like/dislike (1/0)?: ').strip()
        like = int(answer) if answer in ('0', '1') else None
      if like == 1:
        liked_movies.append([recommended_movie['movieId'], recommended_movie['title']])

      user_history.append([recommended_movie['movieId'], user_id, like])
      history_df = pd.DataFrame(user_history[-N_LAST_MOVIE:], columns=['movieId', 'userId', 'rating'])
      # Up-weight liked movies by appending them LIKED_MOVIE_LABEL_WEIGHT more times
      history_df_liked = history_df[history_df['rating'] == 1]
      for _ in range(LIKED_MOVIE_LABEL_WEIGHT):
        history_df = pd.concat([history_df.sample(frac=1), history_df_liked])
      print(history_df.head(30))
    # Fine-tune once per batch, on the last N_LAST_MOVIE answers
    model.fit(x=[history_df.movieId, history_df.userId], y=history_df.rating, epochs=RETRAINING_EPOCH, verbose=0)
    model.evaluate(x=[history_df.movieId, history_df.userId], y=history_df.rating)

  if len(liked_movies) > 0:
    liked_movies = pd.DataFrame(liked_movies, columns=['movieId', 'title'])
    user_ids = np.array([user_id] * len(liked_movies))
    predicted_rating = model([liked_movies.movieId.values, user_ids]).numpy().reshape(-1)
    liked_movies['predicted_rating'] = predicted_rating
    liked_movies = liked_movies.sort_values('predicted_rating')
    matched_movie = liked_movies.iloc[:10]
    print(matched_movie)
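Appending the liked rows `LIKED_MOVIE_LABEL_WEIGHT` more times up-weights positive feedback. For a mean loss, a row that appears 1 + r times contributes exactly as if it appeared once with weight 1 + r and the loss were normalized by the total weight. The NumPy sketch below checks that equivalence on made-up numbers without calling Keras. Keras' `fit` also accepts a `sample_weight` argument for per-row weights; as we understand its default loss reduction, it divides the weighted sum by the number of rows rather than by the total weight, so the loss scale, and hence a suitable learning rate, may differ.

```python
import numpy as np

def bce_terms(y, p):
  """Per-example binary cross-entropy."""
  return -(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([0.0, 1.0, 0.0])  # one liked movie among three rated ones
p = np.array([0.3, 0.6, 0.2])  # made-up predicted probabilities
extra_copies = 5               # plays the role of LIKED_MOVIE_LABEL_WEIGHT

# Option 1: physically append the liked row extra_copies times, then take the plain mean
liked = y == 1
y_dup = np.concatenate([y] + [y[liked]] * extra_copies)
p_dup = np.concatenate([p] + [p[liked]] * extra_copies)
loss_duplicated = bce_terms(y_dup, p_dup).mean()

# Option 2: one row each, liked rows weighted by 1 + extra_copies, normalized by total weight
w = np.where(liked, 1 + extra_copies, 1)
loss_weighted = np.sum(w * bce_terms(y, p)) / np.sum(w)

print(loss_duplicated, loss_weighted)  # equal up to floating-point rounding
```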

In [None]:
demo_user(200)

Recommendations for user 200
42335 - 0.99438286 - Comedy - Familia (1996)
like/dislike (1/0)?: 1
   movieId  userId  rating
0    42335     200       1
0    42335     200       1
0    42335     200       1
0    42335     200       1
0    42335     200       1
0    42335     200       1
Recommendations for user 200
33092 - 0.98687476 - Drama - Acts of Worship (2001) 
like/dislike (1/0)?: 0
   movieId  userId  rating
0    42335     200       1
0    42335     200       1
0    42335     200       1
0    42335     200       1
1    33092     200       0
0    42335     200       1
0    42335     200       1
Recommendations for user 200
53187 - 0.99492943 - Drama - Beauty in Trouble (Kráska v nesnázích) (2006)
like/dislike (1/0)?: 0
   movieId  userId  rating
2    53187     200       0
0    42335     200       1
1    33092     200       0
0    42335     200       1
0    42335     200       1
0    42335     200       1
0    42335     200       1
0    42335     200       1
Recommendations for use

In [None]:
!ls data

In [None]:
movies_df[movies_df.movieId==95441]

Unnamed: 0,movieId,title,genres
18229,95441,Ted (2012),Comedy|Fantasy


In [None]:
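def get_embedding(movies, model):
  # model.layers[2] is the movie Embedding layer; look up one vector per movie
  vectors = model.layers[2](movies.movieId.values).numpy()
  # One tab-separated vector per line, without an index column or header
  pd.DataFrame(vectors).to_csv('vectors.tsv', sep='\t', header=False, index=False)
  # Metadata rows line up with the vector rows; the header row is kept for multi-column metadata
  movies.to_csv('meta.tsv', sep='\t', index=False)

get_embedding(movies_df, model)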
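Assuming these files are meant for the TensorFlow Embedding Projector, the vectors file should hold only tab-separated numbers, one line per point, and a metadata file with more than one column needs a header row, with rows in the same order as the vectors. A minimal sketch of a writer that follows that layout using only NumPy and the standard `csv` module; `write_projector_files` is a hypothetical helper:

```python
import csv
import numpy as np

def write_projector_files(vectors, metadata_rows, header,
                          vectors_path='vectors.tsv', meta_path='meta.tsv'):
  """Write embedding vectors and row-aligned metadata as TSV files.

  vectors: (n, dim) array; one line per row, values only (no index, no header).
  metadata_rows: n sequences of column values, in the same order as vectors.
  header: column names, written as the first metadata line.
  """
  vectors = np.asarray(vectors, dtype=float)
  if len(vectors) != len(metadata_rows):
    raise ValueError('vectors and metadata must have the same number of rows')
  np.savetxt(vectors_path, vectors, delimiter='\t', fmt='%.6g')
  with open(meta_path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f, delimiter='\t', lineterminator='\n')
    writer.writerow(header)
    writer.writerows(metadata_rows)

# With the notebook's objects this could look like (not run here):
# vecs = model.layers[2](movies_df.movieId.values).numpy()
# write_projector_files(vecs, movies_df[['movieId', 'title']].values.tolist(), ['movieId', 'title'])
```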