# **Recommendation System Using Deep Learning (Collaborative Filtering)**

The dataset used in this notebook contains 1,149,780 ratings of 271,379 books by 278,858 users. It has user features such as user ID, age, and region. The features for each book include ISBN, year of publication, publisher, cover image, and so on.

The Keras library in Python was used to design a neural network that recommends books to users based on their similarity to other users.

* **Importing Libraries and defining dataframe**

In [None]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

In [None]:
df = pd.read_csv('../input/bookcrossing-dataset/Books Data with Category Language and Summary/Preprocessed_data.csv')
df

*  **Preprocessing**

Even though the CSV file that I used was already preprocessed, I checked it for null values and inspected the dataframe info.

In [None]:
df.columns

In [None]:
df.info()

In [None]:
# Drop records that contain NaN values
df.dropna(inplace=True)

In [None]:
# Confirm that no NaN values remain in the dataset
df.isna().sum()

* **Recommendation System**

The following recommendation system was developed for a subset of this dataset: the users who voted in Iran. It can be extended to another region or to the whole dataset. The only reason for this approach was to reduce processing time.

In [None]:
# Take an explicit copy so that new columns can be added without a SettingWithCopyWarning
df_ir = df.loc[(df["country"] == "iran"), ['user_id','isbn','rating']].copy()
df_ir

In [None]:
user_ids = df_ir["user_id"].unique().tolist()
user2user_encoded = {x: i for i, x in enumerate(user_ids)}
userencoded2user = {i: x for i, x in enumerate(user_ids)}
book_ids = df_ir["isbn"].unique().tolist()
book2book_encoded = {x: i for i, x in enumerate(book_ids)}
book_encoded2book = {i: x for i, x in enumerate(book_ids)}
df_ir["user"] = df_ir["user_id"].map(user2user_encoded)
df_ir["book"] = df_ir["isbn"].map(book2book_encoded)

num_users = len(user2user_encoded)
num_books = len(book_encoded2book)
df_ir["rating"] = df_ir["rating"].values.astype(np.float32)

# min and max ratings will be used to normalize the ratings
min_rating = min(df_ir["rating"])
max_rating = max(df_ir["rating"])

print("Number of users: {}, Number of Books: {}, Min rating: {}, Max rating: {}".format(num_users, num_books, min_rating, max_rating))
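
The ID-to-index mapping above can be illustrated in isolation. The following is a minimal sketch, assuming a small hypothetical list of raw user IDs (the names `raw_user_ids`, `id_to_index`, and `index_to_id` are illustrative and do not appear in the notebook). Embedding layers need contiguous integer indices starting at 0, which is why the raw IDs are re-encoded.

```python
# Hypothetical raw IDs; the notebook takes the real ones from df_ir.
raw_user_ids = [101, 205, 101, 317]

# Drop duplicates while keeping first-seen order (dicts preserve insertion order).
unique_ids = list(dict.fromkeys(raw_user_ids))

# Forward map (raw ID -> embedding row) and inverse map (embedding row -> raw ID).
id_to_index = {raw: idx for idx, raw in enumerate(unique_ids)}
index_to_id = {idx: raw for raw, idx in id_to_index.items()}

encoded = [id_to_index[raw] for raw in raw_user_ids]
print(encoded)  # [0, 1, 0, 2]
```

The inverse map is what later turns predicted book indices back into ISBNs.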


In [None]:
# Shuffle the rows reproducibly
df_ir = df_ir.sample(frac=1, random_state=42)
x = df_ir[["user", "book"]].values

# Normalize the targets between 0 and 1 (it's easier to train)
y = df_ir["rating"].apply(lambda x: (x - min_rating) / (max_rating - min_rating)).values

# Split into 80% training and 20% validation data, keeping the encoded
# inputs (x) and the normalized targets (y) defined above.
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.2, random_state=42)
x_train.shape, x_val.shape, y_train.shape, y_val.shape
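
The target scaling used here is plain min-max normalization, and its inverse is needed to read model outputs as ratings again. A minimal sketch, assuming hypothetical ratings on a 0 to 10 scale (the helper names `normalize` and `denormalize` are illustrative, not part of the notebook):

```python
import numpy as np

# Hypothetical ratings (illustrative values only).
ratings = np.array([0.0, 5.0, 8.0, 10.0], dtype=np.float32)
min_rating, max_rating = ratings.min(), ratings.max()

def normalize(r):
    """Map ratings from [min_rating, max_rating] to [0, 1]."""
    return (r - min_rating) / (max_rating - min_rating)

def denormalize(r_norm):
    """Map values in [0, 1] back to the original rating scale."""
    return r_norm * (max_rating - min_rating) + min_rating

y_norm = normalize(ratings)
print(y_norm)               # values lie in [0, 1]
print(denormalize(y_norm))  # recovers the original ratings
```

Operating on the whole array at once is usually faster than applying a lambda row by row, although the result is the same.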

In [None]:
class RecommenderNet(keras.Model):
    def __init__(self, num_users, num_books, embedding_size, **kwargs):
        super(RecommenderNet, self).__init__(**kwargs)
        self.num_users = num_users
        self.num_books = num_books
        self.embedding_size = embedding_size
        self.user_embedding = layers.Embedding(
            num_users,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-6),
        )
        self.user_bias = layers.Embedding(num_users, 1)
        self.book_embedding = layers.Embedding(
            num_books,
            embedding_size,
            embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-6),
        )
        self.book_bias = layers.Embedding(num_books, 1)

    def call(self, inputs):
        user_vector = self.user_embedding(inputs[:, 0])
        user_bias = self.user_bias(inputs[:, 0])
        book_vector = self.book_embedding(inputs[:, 1])
        book_bias = self.book_bias(inputs[:, 1])
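        # Row-wise dot product: one interaction score per (user, book) pair.
        # (tf.tensordot with axes=2 would also sum over the batch dimension.)
        dot_user_book = tf.reduce_sum(user_vector * book_vector, axis=1, keepdims=True)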
        # Add all the components (including bias)
        x = dot_user_book + user_bias + book_bias
        # The sigmoid activation forces the rating to between 0 and 1
        return tf.nn.sigmoid(x)

EMBEDDING_SIZE = 50

model = RecommenderNet(num_users, num_books, EMBEDDING_SIZE)
# `learning_rate` replaces the deprecated `lr` argument
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(),
    optimizer=keras.optimizers.Adam(learning_rate=0.000002),
)
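
The scoring idea behind this kind of model is an embedding dot product plus per-user and per-book biases, squashed by a sigmoid. It can be sketched in plain NumPy. This is an illustration under assumed shapes with random stand-in values, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim = 4, 3  # hypothetical batch size and embedding size

# Stand-ins for the looked-up embedding rows and bias terms.
user_vecs = rng.normal(size=(batch, dim))
book_vecs = rng.normal(size=(batch, dim))
user_bias = rng.normal(size=(batch, 1))
book_bias = rng.normal(size=(batch, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Row-wise dot product: one interaction score per (user, book) pair.
dot = np.sum(user_vecs * book_vecs, axis=1, keepdims=True)  # shape (batch, 1)
scores = sigmoid(dot + user_bias + book_bias)
print(scores.shape)  # (4, 1)
```

Because the sigmoid output lies strictly between 0 and 1, it lines up with ratings that have been normalized to that range.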

In [None]:
history = model.fit(
    x=x_train,
    y=y_train,
    batch_size=64,
    epochs=5,
    verbose=1,
    validation_data=(x_val, y_val),
)
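
The object returned by `model.fit` stores the per-epoch losses in `history.history`, a dictionary with the keys `"loss"` and `"val_loss"` when validation data is supplied. A minimal sketch of how it could be inspected, using a hypothetical stand-in dictionary whose numbers are made up for illustration and are not results from this notebook:

```python
# Hypothetical stand-in for history.history (illustrative numbers only).
history_dict = {
    "loss": [0.70, 0.66, 0.63, 0.61, 0.60],
    "val_loss": [0.69, 0.66, 0.65, 0.66, 0.67],
}

val_loss = history_dict["val_loss"]

# Index of the epoch with the lowest validation loss.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
print(f"Lowest validation loss {val_loss[best_epoch]:.2f} at epoch {best_epoch + 1}")

# A validation loss that rises after its minimum is a common sign of overfitting.
overfitting = val_loss[-1] > val_loss[best_epoch]
print("Validation loss rose after its minimum:", overfitting)
```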

In [None]:
# Keep one row per book so that each title is listed only once
book_df = df.loc[(df["country"] == "iran") & (df["Category"] != "9"), ['isbn','book_title','Category']].drop_duplicates(subset="isbn")
book_df

In [None]:
# Top recommendations for a random user

user_id = df_ir.user_id.sample(1).iloc[0]
books_read_by_user = df_ir[df_ir.user_id == user_id]
books_not_read = book_df[~book_df["isbn"].isin(books_read_by_user.isbn.values)]["isbn"]
books_not_read = list(set(books_not_read).intersection(set(book2book_encoded.keys())))

books_not_read = [[book2book_encoded.get(x)] for x in books_not_read]

user_encoder = user2user_encoded.get(user_id)

user_book_array = np.hstack(([[user_encoder]] * len(books_not_read), books_not_read))

ratings = model.predict(user_book_array).flatten()

top_ratings_indices = ratings.argsort()[-10:][::-1]

recommended_book_ids = [book_encoded2book.get(books_not_read[x][0]) for x in top_ratings_indices]

print("Showing recommendations for user: {}".format(user_id))
print("====" * 9)
print("Books with high ratings from user")
print("----" * 8)
top_books_user = books_read_by_user.sort_values(by="rating", ascending=False).head(5).isbn.values
book_df_rows = book_df[book_df["isbn"].isin(top_books_user)]
for row in book_df_rows.itertuples():
    print(row.book_title, ":", row.Category)

print("----" * 8)
print("Top 10 Books recommendations")
print("----" * 8)
recommended_books = book_df[book_df["isbn"].isin(recommended_book_ids)]
for row in recommended_books.itertuples():
    print(row.book_title, ":", row.Category)
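
The top-10 selection relies on `argsort`, which sorts in ascending order, so the last k positions are taken and reversed. A minimal sketch with hypothetical scores (the values and `k` are illustrative):

```python
import numpy as np

# Hypothetical predicted scores for five candidate books.
scores = np.array([0.20, 0.90, 0.50, 0.70, 0.10])
k = 3

# argsort is ascending: take the last k indices, then reverse for best-first order.
top_k = scores.argsort()[-k:][::-1]
print(top_k)          # [1 3 2]
print(scores[top_k])  # [0.9 0.7 0.5]
```

Note that filtering a DataFrame with `isin` returns rows in the DataFrame's own order, so the printed recommendations are not necessarily listed from highest to lowest predicted rating.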

In [None]:
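# Validation loss of this recommendation system
# (model.evaluate returns the compiled loss, not an accuracy)
print('Validation loss (binary cross-entropy): %.4f' % model.evaluate(x_val, y_val, verbose=0))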
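
Because `model.evaluate` reports the loss the model was compiled with, an error measured in rating units can be easier to interpret. A minimal sketch of a root-mean-square error on the original scale, assuming a rating range of 0 to 10 and hypothetical normalized targets and predictions (none of these values come from the notebook):

```python
import numpy as np

min_rating, max_rating = 0.0, 10.0  # assumed rating range

# Hypothetical normalized targets and predictions in [0, 1].
y_true_norm = np.array([0.0, 0.5, 1.0])
y_pred_norm = np.array([0.1, 0.5, 0.8])

def to_rating_scale(values):
    """Undo min-max normalization."""
    return values * (max_rating - min_rating) + min_rating

errors = to_rating_scale(y_pred_norm) - to_rating_scale(y_true_norm)
rmse = float(np.sqrt(np.mean(errors ** 2)))
print(f"RMSE in rating units: {rmse:.2f}")  # RMSE in rating units: 1.29
```

With a real model, `y_pred_norm` would be `model.predict(x_val).flatten()` and `y_true_norm` the validation targets.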