# Collaborative Filtering using TensorFlow & Keras
In our [Recommender Course](https://www.codingforentrepreneurs.com/courses/recommender/) we build a Django-based recommendation engine leveraging the Surprise ML package (among other things). This guide shows how to upgrade the ML side of that project with a neural network built in Keras.


Recommended requirements for running this notebook:
- GPU-accelerated / CUDA-enabled environment
- Cloud-based service such as Google Colab, Deepnote, and/or Paperspace
- [Recommender](https://github.com/codingforentrepreneurs/recommender) code forked/cloned/downloaded, open-source datasets loaded in, and Recommender models exported
  - To export the [Recommender](https://github.com/codingforentrepreneurs/recommender)'s datasets, run the functions `export_rating_dataset_task` and `export_movies_dataset_task` in `exports/tasks.py`
  - After you run these functions, the movies dataset will be located at `local-cdn/media/exports/movies/latest.csv` and the ratings dataset at `local-cdn/media/exports/ratings/latest.csv`



This code was inspired by and adapted from the following posts:
- [Fast.ai's Collaborative Filtering Lesson](https://course.fast.ai/Lessons/lesson7.html)
- [How to create a Recommendation System from scratch using Keras from the Antonai Blog](https://antonai.blog/how-to-create-a-recommendation-system-from-scratch-using-keras/)
- [Collaborative Filtering for Movie Recommendations from the Keras Docs](https://keras.io/examples/structured_data/collaborative_filtering_movielens/)


### Open this notebook in...

[<img src="https://deepnote.com/buttons/launch-in-deepnote-white-small.svg">](https://deepnote.com/launch?url=https://github.com/codingforentrepreneurs/recommender/blob/main/src/nbs/Example%20Collaborative%20Filtering%20with%20Tensorflow%20Keras.ipynb)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/codingforentrepreneurs/recommender/blob/main/src/nbs/Example%20Collaborative%20Filtering%20with%20Tensorflow%20Keras.ipynb)

[![Run on Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/codingforentrepreneurs/recommender/blob/main/src/nbs/Example%20Collaborative%20Filtering%20with%20Tensorflow%20Keras.ipynb)

In [None]:
%pip install tensorflow scikit-learn matplotlib pandas

In [None]:
import pathlib

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers

In [None]:
# if using a cloud provider, upload your files to an "exports folder"
# exports_dir = pathlib.Path().resolve() / 'exports' 

# if running this notebook from the root of the Recommender project
exports_dir = pathlib.Path().resolve().parent / 'local-cdn' / 'media' / 'exports'

movies_exports = exports_dir / 'movies' / 'latest.csv'
ratings_exports = exports_dir / 'ratings' / 'latest.csv'
print(movies_exports.exists(), ratings_exports.exists())

Load in the movies dataset

In [None]:
movies_df = pd.read_csv(movies_exports)

# add a "trend" column to combine the count of ratings with the movie's average rating
movies_df['trend'] = movies_df['rating_count'] * movies_df['rating_avg']
movies_df['movieIdx'] = movies_df['movieIdx'].astype(int)
movies_df['movieId'] = movies_df['movieId'].astype(int)

print(movies_df.shape)
movies_df.head()

Load in the entire ratings dataset

In [None]:
rating_df = pd.read_csv(ratings_exports)
print(rating_df.shape)
rating_df.head()

Join the movies dataset and ratings dataset.

In [None]:
df = rating_df.copy()
df['userId'] = df['userId'].astype(int)
df['movieId'] = df['movieId'].astype(int)
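# Left-merge on the shared `movieId` column. `DataFrame.join(..., on='movieId')`
# would match `movieId` against the row index of `movies_df` instead.
df = df.merge(movies_df, on='movieId', how='left', suffixes=('', '_movie_df'))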
df.sort_values(by=['trend'], inplace=True, ascending=False)
print(df.shape)
df.head()

Note the number of movies that were rated but have no match in the movies dataset. They can be missing for a few reasons:
- The initial dataset had invalid ids (from the MovieLens dataset) - Most likely
- Movies have been deleted from the Recommender database - Likely
- Incorrect datatypes - Unlikely but possible

In [None]:
missing_data = df[df['movieIdx'].isna()]

number_of_missing_movies = len(missing_data.movieId.unique().tolist())
print(number_of_missing_movies, 'movie ids missing that were rated')
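
One way to double-check which rated movie ids have no match is a left merge with `indicator=True`. The sketch below uses small, hypothetical `toy_ratings` and `toy_movies` frames rather than the exported CSV files, so treat it as an illustration of the idea and not as output from the real data.

```python
import pandas as pd

# Hypothetical toy data standing in for the exported ratings and movies CSVs.
toy_ratings = pd.DataFrame(
    {"userId": [1, 1, 2, 3], "movieId": [10, 20, 20, 99], "rating": [4.0, 3.5, 5.0, 2.0]}
)
toy_movies = pd.DataFrame({"movieId": [10, 20, 30], "movieIdx": [0, 1, 2]})

# A left merge keeps every rating; `_merge` records whether a movie row matched.
merged = toy_ratings.merge(toy_movies, on="movieId", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
missing_ids = sorted(unmatched["movieId"].unique().tolist())
print(missing_ids, "movie ids rated but missing from the movies data")
```

Rows flagged `left_only` are the ones that end up with a `NaN` `movieIdx`.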

Drop rows with `NaN` values, which removes every rating that lacks a `movieIdx`:

In [None]:
training_df = df.copy().dropna()
training_df['movieIdx'] = training_df['movieIdx'].astype(int)
training_df.shape

In [None]:
user_ids = training_df["userId"].unique().tolist()
user2user_encoded = {x: i for i, x in enumerate(user_ids)}
userencoded2user = {i: x for i, x in enumerate(user_ids)}



movie_ids = training_df["movieIdx"].unique().tolist()

df = training_df.copy()
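df["user"] = df["userId"].map(user2user_encoded)
# `movieIdx` is used as the movie index as-is, so predictions can be looked up in `movies_df` later
df["movie"] = df["movieIdx"]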

num_users = len(user2user_encoded)
num_movies = len(movie_ids)

df["rating"] = training_df["rating"].values.astype(np.float32)
# min and max ratings will be used to normalize the ratings later
min_rating = df["rating"].min()
max_rating = df["rating"].max()

print(
    "Number of users: {}, Number of Movies: {}, Min rating: {}, Max rating: {}".format(
        num_users, num_movies, min_rating, max_rating
    )
)
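
Because the `movie` column reuses `movieIdx` directly while the embedding layers are sized from the number of unique values, it may be worth confirming that every index fits inside the embedding table. The helper name `indices_fit_embedding` and the toy indices below are hypothetical; this is a sketch of the check, not part of the original notebook.

```python
import numpy as np


def indices_fit_embedding(indices, input_dim):
    """Return True when every index lies in the range [0, input_dim)."""
    indices = np.asarray(indices)
    return bool(indices.min() >= 0 and indices.max() < input_dim)


# Hypothetical, non-contiguous movie indices (for example, some movies were never rated).
toy_movie_idx = np.array([0, 3, 7])
num_unique = len(np.unique(toy_movie_idx))

# Sizing by the unique count (+1) is too small here because index 7 >= 4.
ok_small = indices_fit_embedding(toy_movie_idx, num_unique + 1)
# Sizing by the largest index (+1) always fits.
ok_large = indices_fit_embedding(toy_movie_idx, int(toy_movie_idx.max()) + 1)
print(ok_small, ok_large)
```

If the same check fails on the real `df["movie"]` values, one option is to size the movie embedding from the largest index plus one instead of the unique count.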

In [None]:
df = df.sample(frac=1, random_state=42)
x = df[["user", "movie"]].values
# Min-max normalize the targets to the range [0, 1].
# Note: the model below rescales its own output to the rating range and is
# trained on the raw ratings, so these arrays are kept for reference only.
y = ((df["rating"] - min_rating) / (max_rating - min_rating)).values
# Train on 90% of the data and validate on 10%.
train_indices = int(0.9 * df.shape[0])
x_train, x_val, y_train, y_val = (
    x[:train_indices],
    x[train_indices:],
    y[:train_indices],
    y[train_indices:],
)
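
The normalization above is plain min-max scaling, and it can be inverted to get back to the original rating scale. A small worked example, assuming a hypothetical set of ratings between 0.5 and 5.0:

```python
import numpy as np

# Hypothetical ratings; the real values come from the ratings dataset.
ratings = np.array([0.5, 2.0, 3.5, 5.0], dtype=np.float32)
lo, hi = ratings.min(), ratings.max()

# Forward: map [lo, hi] onto [0, 1].
normalized = (ratings - lo) / (hi - lo)
# Inverse: map [0, 1] back onto [lo, hi].
restored = normalized * (hi - lo) + lo

print(normalized)
print(restored)
```

The inverse step is the same arithmetic the model applies to its sigmoid output.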

In [None]:
from tensorflow.keras.layers import Embedding, multiply, Flatten, Input, Dense
from tensorflow.keras import optimizers as opt
# Import `Model` from `tensorflow.keras` as well, to avoid mixing it with the standalone `keras` package
from tensorflow.keras.models import Model



EMBEDDING_SIZE = 500
num_unique_users = num_users
num_unique_movies = num_movies
users_input = Input(shape=(1,), name="users_input")
users_embedding = Embedding(num_unique_users + 1, EMBEDDING_SIZE, name="users_embeddings")(users_input)
users_bias = Embedding(num_unique_users + 1, 1, name="users_bias")(users_input)

movies_input = Input(shape=(1,), name="movies_input")
movies_embedding = Embedding(num_unique_movies + 1, EMBEDDING_SIZE, name="movies_embedding")(movies_input)
movies_bias = Embedding(num_unique_movies + 1, 1, name="movies_bias")(movies_input)

# Element-wise product of the user and movie embeddings; the Dense layer below
# learns how to weight and sum its components.
users_movies_product = multiply([users_embedding, movies_embedding])
# The bias embeddings broadcast across the embedding dimension.
input_terms = users_movies_product + users_bias + movies_bias
input_terms = Flatten(name="fl_inputs")(input_terms)

# The sigmoid keeps the output in (0, 1); rescale it to the rating range.
output = Dense(1, activation="sigmoid", name="output")(input_terms)
output = output * (max_rating - min_rating) + min_rating


model = Model(inputs=[users_input, movies_input], outputs=output)

opt_adam = opt.Adam(learning_rate=0.005)
model.compile(optimizer=opt_adam, loss='mse', metrics=['mean_absolute_error'])
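
To make the forward pass more concrete, here is a rough NumPy sketch of what the model above computes for a batch of (user, movie) pairs. The weights are random and the sizes are tiny and hypothetical; it also skips the extra length-1 axis that `Flatten` removes in the Keras version, so read it as an approximation of the architecture and not as the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, dim = 4, 5, 3  # hypothetical sizes
min_r, max_r = 0.5, 5.0  # hypothetical rating range

# Randomly initialised stand-ins for the learned weights.
user_emb = rng.normal(size=(n_users, dim))
movie_emb = rng.normal(size=(n_movies, dim))
user_bias = rng.normal(size=(n_users, 1))
movie_bias = rng.normal(size=(n_movies, 1))
dense_w = rng.normal(size=(dim, 1))
dense_b = np.zeros(1)


def predict(user_idx, movie_idx):
    # Element-wise product of the two embeddings, shape (batch, dim).
    interaction = user_emb[user_idx] * movie_emb[movie_idx]
    # Biases have shape (batch, 1) and broadcast across the embedding dimension.
    terms = interaction + user_bias[user_idx] + movie_bias[movie_idx]
    # Dense(1) followed by a sigmoid.
    logits = terms @ dense_w + dense_b
    squashed = 1.0 / (1.0 + np.exp(-logits))
    # Rescale from (0, 1) to the rating range.
    return squashed * (max_r - min_r) + min_r


preds = predict(np.array([0, 1, 2]), np.array([4, 0, 2]))
print(preds.shape)
```

Each prediction lands inside the rating range because of the final rescaling step.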

In [None]:
model.summary()

In [None]:
df_train, df_val = train_test_split(df, random_state=42, test_size=0.2, stratify=df.rating)

In [None]:
history = model.fit(
    x=[df_train.user.to_numpy(), df_train.movie.to_numpy()],
    y=df_train.rating.to_numpy(),
    batch_size=200,
    epochs=10,
    verbose=1,
    validation_data=(
        [df_val.user.to_numpy(), df_val.movie.to_numpy()],
        df_val.rating.to_numpy(),
    ),
)
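
Once training finishes, `history.history` holds one list per metric with one value per epoch. A minimal sketch for finding the epoch with the lowest validation loss, using a hypothetical `toy_history` dictionary and a hypothetical `best_epoch` helper in place of real training results:

```python
def best_epoch(history_dict, metric="val_loss"):
    """Return the (1-based epoch, value) pair with the lowest value for `metric`."""
    values = history_dict[metric]
    best_index = min(range(len(values)), key=values.__getitem__)
    return best_index + 1, values[best_index]


# Hypothetical numbers shaped like `history.history`; not real training output.
toy_history = {
    "loss": [1.20, 0.95, 0.80, 0.70],
    "val_loss": [1.10, 0.98, 1.02, 1.07],
}

epoch, value = best_epoch(toy_history)
print(epoch, value)
```

A validation loss that rises while the training loss keeps falling, as in the toy numbers, is a common sign of overfitting.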

In [None]:
number_of_preds = 100
movies = df.sample(n=number_of_preds).movie.to_list()
user_list = df.sample(n=1).user.to_list() * number_of_preds
use_id = False
if use_id:
    user_list = [user2user_encoded.get(1)] * number_of_preds
preds = model.predict(x=[np.array(user_list), np.array(movies)])
preds

In [None]:
user_id = userencoded2user.get(user_list[0])

suggestions_df = movies_df[movies_df['movieIdx'].isin(movies)].copy()
suggestions_df['userId'] = user_id

# Look up each movie's predicted rating (the first match is used if a movie was sampled twice)
suggestions_df['score'] = suggestions_df['movieIdx'].apply(lambda x: preds[movies.index(x)][0])

for i, movieIdx in enumerate(movies):
    pred_rating = preds[i][0]
    print(user_id, movieIdx, pred_rating)

In [None]:
user_ratings = rating_df[rating_df.userId == suggestions_df.userId.tolist()[0]].copy()
user_ratings.rating.describe()

In [None]:
suggestions_df.sort_values(by=['score'], inplace=True, ascending=False)
suggestions_df.head()
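
Since the movies are drawn by random sampling, the same movie can appear more than once in the list. The sketch below shows one way to rank predictions while keeping each movie only once. The `top_k` helper and the toy inputs are hypothetical and only mimic the `(n, 1)` shape that `model.predict` returns.

```python
import numpy as np


def top_k(movie_indices, scores, k=3):
    """Return up to k (movie index, score) pairs, highest score first, without duplicates."""
    flat_scores = np.asarray(scores).reshape(-1)
    best = {}
    for idx, score in zip(movie_indices, flat_scores):
        # Keep the highest score seen for each movie index.
        if idx not in best or score > best[idx]:
            best[idx] = float(score)
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical sampled movie indices (7 appears twice) and predicted ratings.
toy_movies = [7, 2, 7, 5]
toy_preds = np.array([[3.9], [4.6], [3.9], [2.1]])

ranked = top_k(toy_movies, toy_preds, k=2)
print(ranked)
```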

Save the model for reuse

In [None]:
model.save("my-model.h5")