# Neural Collaborative Filtering

This Neural Collaborative Filtering (NCF) code builds a deep-learning recommender system that suggests books a user is likely to enjoy, based on that user's past ratings and the ratings of similar users.

Here's a brief overview of what the code does:

- It imports the necessary libraries, loads the dataset, merges the tables and keeps only the required columns.
- It encodes user IDs and ISBNs as contiguous integer indices, which the embedding layers require.
- It splits the dataset into training and testing sets.
- It defines the NCF model in TensorFlow, with separate user and book embeddings that are concatenated and passed through a multi-layer dense network.
- It trains the model on the training set and validates it on the testing set.
- It defines a recommendation function that takes a book title, uses the model to find the users predicted to rate that book most highly, and returns each of those users' highest-rated books.
- It evaluates the model by predicting ratings for the test set and comparing them with the actual ratings using mean squared error, mean absolute error and root mean squared error.
- It demonstrates the recommendation function with a sample book title.


**Due to limited computational power, run this code in Google Colab and change the runtime type to GPU to speed up each training epoch.**

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error, mean_absolute_error
from gensim.models import Word2Vec


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
# Load and preprocess the dataset
ratings = pd.read_csv('/content/drive/MyDrive/Recommender System/BX-Book-Ratings.csv', delimiter=';', encoding='latin-1')
books = pd.read_csv('/content/drive/MyDrive/Recommender System/BX_Books.csv', delimiter=';', encoding='latin-1')
users = pd.read_csv('/content/drive/MyDrive/Recommender System/BX-Users.csv', delimiter=';', encoding='latin-1')

# Merge the datasets
merged = ratings.merge(users, on='User-ID', how='left').merge(books, on='ISBN', how='left')

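# Keep only the columns the model needs and drop rows with missing values
# (.copy() avoids pandas' SettingWithCopyWarning when new columns are added later)
filtered = merged[['User-ID', 'ISBN', 'Book-Rating', 'Book-Title']].dropna().copy()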
filtered.shape

(1031175, 4)

In [4]:
filtered.head()

| | User-ID | ISBN | Book-Rating | Book-Title |
|---|---|---|---|---|
| 0 | 276725 | 034545104X | 0 | Flesh Tones: A Novel |
| 1 | 276726 | 0155061224 | 5 | Rites of Passage |
| 2 | 276727 | 0446520802 | 0 | The Notebook |
| 3 | 276729 | 052165615X | 3 | Help!: Level 1 |
| 4 | 276729 | 0521795028 | 6 | The Amsterdam Connection : Level 4 (Cambridge ... |


In [5]:
# Encode users and books
user_enc = LabelEncoder()
filtered['user_id'] = user_enc.fit_transform(filtered['User-ID'].values)
book_enc = LabelEncoder()
filtered['book_id'] = book_enc.fit_transform(filtered['ISBN'].values)

# Split the dataset
train, test = train_test_split(filtered, test_size=0.2, random_state=42)
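`LabelEncoder` maps each distinct raw value to an integer in `0 .. n_classes - 1`, which is why `len(user_enc.classes_)` and `len(book_enc.classes_)` can serve as the `input_dim` of the embedding layers. A small illustration using three ISBNs from the preview above (a toy array, not the real dataset):

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# A few raw ISBNs (strings), including one repeat
isbns = np.array(['0446520802', '034545104X', '0446520802', '052165615X'])

book_enc = LabelEncoder()
book_ids = book_enc.fit_transform(isbns)

# classes_ holds the sorted unique ISBNs; each encoded id is an index into it
print(book_enc.classes_)  # ['034545104X' '0446520802' '052165615X']
print(book_ids)           # [1 0 1 2]

# inverse_transform maps encoded ids back to ISBNs, as recommend_books does
print(book_enc.inverse_transform([2, 0]))  # ['052165615X' '034545104X']
```

Every id is smaller than `len(book_enc.classes_)`, so an `Embedding(input_dim=len(book_enc.classes_), ...)` layer has a row for each book.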

In [6]:
# Define the neural collaborative filtering model
embedding_size = 50

user_input = tf.keras.layers.Input(shape=[1], name='user_input')
user_embedding = tf.keras.layers.Embedding(input_dim=len(user_enc.classes_), output_dim=embedding_size, name='user_embedding')(user_input)
user_vec = tf.keras.layers.Flatten(name='user_vec')(user_embedding)

book_input = tf.keras.layers.Input(shape=[1], name='book_input')
book_embedding = tf.keras.layers.Embedding(input_dim=len(book_enc.classes_), output_dim=embedding_size, name='book_embedding')(book_input)
book_vec = tf.keras.layers.Flatten(name='book_vec')(book_embedding)

concat = tf.keras.layers.Concatenate()([user_vec, book_vec])
dense = tf.keras.layers.Dense(256, activation='relu')(concat)
dense = tf.keras.layers.Dense(128, activation='relu')(dense)
dense = tf.keras.layers.Dense(64, activation='relu')(dense)
output = tf.keras.layers.Dense(1)(dense)


In [7]:
model = tf.keras.models.Model(inputs=[user_input, book_input], outputs=output)
model.compile(optimizer='adam', loss='mean_squared_error')
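To make the architecture concrete, here is a minimal NumPy sketch of the forward pass the Keras model performs: an embedding lookup is row indexing into a weight table, the user and book vectors are concatenated, and the result passes through 256-128-64 ReLU layers to a single linear output. The table sizes and random weights are stand-ins (assumptions), not the trained model, and `ncf_forward` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_books, embedding_size = 4, 6, 50

# Embedding tables: one row per encoded id (random stand-ins for trained weights)
user_table = rng.normal(size=(n_users, embedding_size))
book_table = rng.normal(size=(n_books, embedding_size))

# Dense layer sizes mirror the Keras model: 100 -> 256 -> 128 -> 64 -> 1
layer_sizes = [2 * embedding_size, 256, 128, 64, 1]
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]


def ncf_forward(user_ids, book_ids):
    # Embedding lookup + Flatten + Concatenate: index rows, join side by side
    x = np.concatenate([user_table[user_ids], book_table[book_ids]], axis=1)
    for i, (w, b) in enumerate(zip(weights, biases)):
        x = x @ w + b
        if i < len(weights) - 1:  # ReLU on hidden layers, linear output layer
            x = np.maximum(x, 0.0)
    return x  # shape (batch, 1): one predicted rating per (user, book) pair


pred = ncf_forward(np.array([0, 1, 3]), np.array([2, 2, 5]))
print(pred.shape)  # (3, 1)
```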

In [8]:
# Train the model
X_train = [train['user_id'].values, train['book_id'].values]
y_train = train['Book-Rating'].values
X_test = [test['user_id'].values, test['book_id'].values]
y_test = test['Book-Rating'].values

# validation_split is omitted: Keras ignores it whenever validation_data is passed
model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=128, epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f1ded96e400>
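Because `validation_data` overrides `validation_split`, the model above is validated on the test set itself. If you want a validation set that stays separate from the final test set, one option (a sketch, not what the notebook does) is to split twice with `train_test_split`; the DataFrame here is a toy stand-in for `filtered`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for `filtered` with 100 ratings
df = pd.DataFrame({'user_id': range(100),
                   'book_id': range(100),
                   'Book-Rating': [i % 11 for i in range(100)]})

# 20% held out for the final test, then 10% of the remainder for validation
train_val, test = train_test_split(df, test_size=0.2, random_state=42)
train, val = train_test_split(train_val, test_size=0.1, random_state=42)
print(len(train), len(val), len(test))  # 72 8 20
```

The validation pair would then be passed as `validation_data=([val['user_id'].values, val['book_id'].values], val['Book-Rating'].values)`, leaving `test` untouched until the final evaluation.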

In [9]:
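# Create a recommendation function
def recommend_books(book_title, n_recommendations=5):
    # Encoded id of the query book (first match if the title occurs more than once)
    book_id = filtered.loc[filtered['Book-Title'] == book_title]['book_id'].iloc[0]
    # LabelEncoder assigns user ids 0..n_users-1, so this covers every user
    user_ids = np.arange(len(user_enc.classes_))

    # Predict how every user would rate the query book
    book_ids = np.full(len(user_ids), book_id)
    predictions = model.predict([user_ids, book_ids]).flatten()

    # Users with the highest predicted ratings, best first
    top_indices = predictions.argsort()[-n_recommendations:][::-1]
    top_user_ids = user_ids[top_indices]
    recommended_book_ids = []
    for user_id in top_user_ids:
        # Each selected user's highest-rated book (this can be the query book itself)
        user_ratings = filtered.loc[filtered['user_id'] == user_id]
        top_book_id = user_ratings.sort_values('Book-Rating', ascending=False)['book_id'].iloc[0]
        recommended_book_ids.append(top_book_id)

    # Map encoded ids back to ISBNs and look up titles (returned in catalogue order, not rank order)
    recommended_books = book_enc.inverse_transform(recommended_book_ids)
    return books[books['ISBN'].isin(recommended_books)]['Book-Title'].values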
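`recommend_books` takes exactly the top `n_recommendations` users, so it can return the query book itself (as in the output for *The Da Vinci Code*) or the same book twice. A hypothetical variant, `top_books_from_top_users`, keeps walking down the ranked users until it has `n_recommendations` distinct books other than the query, and computes each user's favourite book once. It is a sketch on plain inputs: in the notebook, `predictions` would be the model's scores of the query book for every user and `ratings` would be `filtered`; the data below is a toy example.

```python
import numpy as np
import pandas as pd


def top_books_from_top_users(predictions, ratings, query_book_id, n_recommendations=5):
    """Return up to n distinct book ids, excluding the query, ranked by user score.

    predictions[u] is the predicted rating of the query book by encoded user u;
    ratings has columns 'user_id', 'book_id' and 'Book-Rating'.
    """
    # Each user's highest-rated book, computed once for all users
    favourites = (ratings.sort_values('Book-Rating', ascending=False, kind='mergesort')
                         .drop_duplicates('user_id')
                         .set_index('user_id')['book_id'])
    recommended = []
    for user_id in np.argsort(predictions)[::-1]:  # highest predicted score first
        book_id = favourites.get(user_id)
        if book_id is None or book_id == query_book_id or book_id in recommended:
            continue  # no ratings, favourite is the query itself, or already listed
        recommended.append(int(book_id))
        if len(recommended) == n_recommendations:
            break
    return recommended


# Toy data: four users, query book id 7
ratings = pd.DataFrame({'user_id': [0, 0, 1, 1, 2, 3],
                        'book_id': [7, 3, 7, 4, 5, 7],
                        'Book-Rating': [10, 2, 8, 9, 6, 5]})
predictions = np.array([0.9, 0.8, 0.1, 0.5])
print(top_books_from_top_users(predictions, ratings, query_book_id=7))  # [4, 5]
```

User 0 scores highest but their favourite is the query book, so it is skipped; the returned list keeps rank order, which the `isin` lookup in `recommend_books` does not.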

In [10]:
# Prediction and performance evaluation

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
mae = mean_absolute_error(y_test, y_pred)
print(f'Mean Absolute Error: {mae}')
rmse = np.sqrt(mse)
print(f'Root Mean Squared Error: {rmse}')


Mean Squared Error: 16.655926075378932
Mean Absolute Error: 2.724039026417105
Root Mean Squared Error: 4.081167244230373
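The three metrics are linked: RMSE is the square root of MSE (√16.656 ≈ 4.081), and like MAE it is expressed in rating points. On their own they are hard to judge, so a useful reference point is a model that always predicts the mean training rating. A minimal sketch with a hypothetical helper `constant_baseline` and toy arrays; in the notebook you would pass the `y_train` and `y_test` arrays defined in In [8].

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error


def constant_baseline(y_train, y_test):
    """Score a predictor that always outputs the mean training rating."""
    y_pred = np.full(len(y_test), np.mean(y_train), dtype=float)
    mse = mean_squared_error(y_test, y_pred)
    return {'MSE': mse, 'MAE': mean_absolute_error(y_test, y_pred), 'RMSE': np.sqrt(mse)}


scores = constant_baseline(np.array([0, 0, 10, 10]), np.array([0, 10]))
print(f"MSE {scores['MSE']:.1f}, MAE {scores['MAE']:.1f}, RMSE {scores['RMSE']:.1f}")
# MSE 25.0, MAE 5.0, RMSE 5.0
```

A model is only adding value if its errors are clearly below this baseline on the same test split.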


In [11]:
# Test the recommendation function
book_title = 'The Da Vinci Code'
recommended_books = recommend_books(book_title)
print(f'Recommended books for "{book_title}":')
for book in recommended_books:
    print(f'- {book}')

Recommended books for "The Da Vinci Code":
- The Da Vinci Code
- Kane & Abel
- Sherlock Holmes: The Complete Novels and Stories (Sherlock Holmes)
- Time Bomb (Alex Delaware Novels (Paperback))
- Canoe Country Wildlife: A Field Guide to the North Woods and Boundary Waters


# Content-Based Filtering

In [12]:
!pip install gensim

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [13]:
merged = ratings.merge(users, on='User-ID', how='left').merge(books, on='ISBN', how='left')
# .copy() avoids pandas' SettingWithCopyWarning when feature columns are added later
filtered = merged[['User-ID', 'ISBN', 'Book-Rating', 'Book-Title', 'Book-Author']].dropna().copy()


In [14]:
filtered.head(10)

| | User-ID | ISBN | Book-Rating | Book-Title | Book-Author |
|---|---|---|---|---|---|
| 0 | 276725 | 034545104X | 0 | Flesh Tones: A Novel | M. J. Rose |
| 1 | 276726 | 0155061224 | 5 | Rites of Passage | Judith Rae |
| 2 | 276727 | 0446520802 | 0 | The Notebook | Nicholas Sparks |
| 3 | 276729 | 052165615X | 3 | Help!: Level 1 | Philip Prowse |
| 4 | 276729 | 0521795028 | 6 | The Amsterdam Connection : Level 4 (Cambridge ... | Sue Leather |
| 5 | 276733 | 2080674722 | 0 | Les Particules Elementaires | Michel Houellebecq |
| 8 | 276744 | 038550120X | 7 | A Painted House | JOHN GRISHAM |
| 10 | 276746 | 0425115801 | 0 | Lightning | Dean R. Koontz |
| 11 | 276746 | 0449006522 | 0 | Manhattan Hunt Club | JOHN SAUL |
| 12 | 276746 | 0553561618 | 0 | Dark Paradise | TAMI HOAG |


In [15]:
filtered.shape

(1031174, 5)

## Recommendations by book author and similar books with ratings

In [16]:
# Extract features from book authors using Word2Vec
authors = [row.split() for row in filtered['Book-Author'].values]
# gensim >= 4.0 calls the dimensionality argument vector_size (it was size in gensim 3.x)
word2vec = Word2Vec(authors, vector_size=100, window=5, min_count=1, workers=4)

In [17]:
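def author_embedding(author):
    # Average the vectors of the words in the author's name
    words = author.split()
    embeddings = [word2vec.wv[word] for word in words if word in word2vec.wv]
    # Fall back to a zero vector when none of the words are in the vocabulary
    if not embeddings:
        return np.zeros(word2vec.wv.vector_size, dtype=np.float32)
    return np.mean(embeddings, axis=0)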

filtered['author_features'] = filtered['Book-Author'].apply(author_embedding).values.tolist()

In [18]:
# Split the dataset
train, test = train_test_split(filtered, test_size=0.2, random_state=42)

In [19]:
# Define the neural network model
input_layer = tf.keras.layers.Input(shape=(100,))
dense = tf.keras.layers.Dense(256, activation='relu')(input_layer)
dense = tf.keras.layers.Dense(128, activation='relu')(dense)
dense = tf.keras.layers.Dense(64, activation='relu')(dense)
output = tf.keras.layers.Dense(1)(dense)

model = tf.keras.models.Model(inputs=input_layer, outputs=output)
model.compile(optimizer='adam', loss='mean_squared_error')

In [20]:
X_train = np.array(train['author_features'].values.tolist())
y_train = train['Book-Rating'].values
X_test = np.array(test['author_features'].values.tolist())
y_test = test['Book-Rating'].values

# validation_split is omitted: Keras ignores it whenever validation_data is passed
model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=128, epochs=10)



Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f1cb3a661c0>

In [21]:
# Recommendation function
def recommend_books_by_author(query, n_recommendations=5):
    query_features = author_embedding(query)
    book_features = np.array(filtered['author_features'].values.tolist())
    similarities = np.dot(query_features, book_features.T)
    top_indices = np.argsort(similarities)[-n_recommendations:][::-1]
    return filtered.iloc[top_indices][['Book-Title', 'Book-Rating']].values



# Prediction and performance evaluation
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
mae = mean_absolute_error(y_test, y_pred)
print(f'Mean Absolute Error: {mae}')
rmse = np.sqrt(mse)
print(f'Root Mean Squared Error: {rmse}')


Mean Squared Error: 14.55380140412487
Mean Absolute Error: 3.4984036164277743
Root Mean Squared Error: 3.8149444824433383


In [22]:
query = 'JOHN GRISHAM'
recommended_books = recommend_books_by_author(query)
print(f'Recommended books for "{query}":')
for book, rating in recommended_books:
    print(f'- {book} (Rating: {rating})')

Recommended books for "JOHN GRISHAM":
- The King of Torts (Rating: 0)
- The King of Torts (Rating: 0)
- A Time to Kill (Rating: 8)
- The Street Lawyer (Rating: 7)
- The Runaway Jury (Rating: 10)
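Two things shape this output. `filtered` has one row per rating, so a book rated by several users appears several times (hence *The King of Torts* twice), and the displayed rating is a single user's rating rather than a summary. In addition, ranking by a raw dot product favours feature vectors with large norms. As an alternative, under the assumption that cosine similarity and a per-title mean rating are what you want, the hypothetical `recommend_by_cosine` below normalises the scores and groups rows by title. In the notebook its inputs would be `author_embedding(query)`, `np.array(filtered['author_features'].tolist())`, `filtered['Book-Title'].values` and `filtered['Book-Rating'].values`; toy arrays are used here.

```python
import numpy as np
import pandas as pd


def recommend_by_cosine(query_vec, features, titles, ratings, n_recommendations=5):
    """Rank titles by cosine similarity to query_vec, one row per title."""
    # Cosine similarity = dot product divided by both vector norms (guarding against zeros)
    norms = np.linalg.norm(features, axis=1) * np.linalg.norm(query_vec)
    scores = features @ query_vec / np.where(norms == 0, 1.0, norms)
    df = pd.DataFrame({'Book-Title': titles, 'Book-Rating': ratings, 'score': scores})
    # Collapse duplicate rows: best similarity and mean rating per title
    per_title = df.groupby('Book-Title').agg(score=('score', 'max'),
                                             mean_rating=('Book-Rating', 'mean'))
    return per_title.sort_values('score', ascending=False).head(n_recommendations)


features = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
titles = ['A', 'A', 'B', 'C']
ratings = [0, 8, 5, 7]
result = recommend_by_cosine(np.array([1.0, 0.0]), features, titles, ratings, n_recommendations=2)
print(result)
```

Title `A` appears once with the mean of its two ratings (4.0), and its two rows score equally because cosine similarity ignores vector length. The same function applies unchanged to the title features in the next section.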


## Recommendations by book title and similar books with ratings

In [23]:
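# Extract features from book titles using Word2Vec
# (stored in its own variable so the author model in `word2vec` is not overwritten)
titles = [row.split() for row in filtered['Book-Title'].values]
title_word2vec = Word2Vec(titles, vector_size=100, window=5, min_count=1, workers=4)

In [24]:
def title_embedding(title):
    # Average the vectors of the words in the title
    words = title.split()
    embeddings = [title_word2vec.wv[word] for word in words if word in title_word2vec.wv]
    # Fall back to a zero vector when none of the words are in the vocabulary
    if not embeddings:
        return np.zeros(title_word2vec.wv.vector_size, dtype=np.float32)
    return np.mean(embeddings, axis=0)

filtered['title_features'] = filtered['Book-Title'].apply(title_embedding).values.tolist()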
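Because the features are built with `str.split()`, tokens keep their case and punctuation: `Help!:` and `Help`, or `JOHN` and `John` in the author column, receive unrelated vectors. One possible normalisation (an assumption, not part of the original pipeline) is a hypothetical `tokenize` helper that lowercases the text and keeps runs of word characters and apostrophes; it would replace `row.split()` when building the Word2Vec corpus and the `.split()` call inside the embedding function.

```python
import re

# Runs of word characters (Unicode letters, digits, underscore) and apostrophes
TOKEN_RE = re.compile(r"[\w']+")


def tokenize(text):
    # Lowercase first so 'JOHN' and 'John' map to the same token
    return TOKEN_RE.findall(text.lower())


print(tokenize('Help!: Level 1'))  # ['help', 'level', '1']
print(tokenize('JOHN GRISHAM'))    # ['john', 'grisham']
```

The same tokenizer must be used at training time and at query time, otherwise query words will miss the vocabulary.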

In [25]:
# Split the dataset
train, test = train_test_split(filtered, test_size=0.2, random_state=42)

In [26]:
# Define the neural network model
input_layer = tf.keras.layers.Input(shape=(100,))
dense = tf.keras.layers.Dense(256, activation='relu')(input_layer)
dense = tf.keras.layers.Dense(128, activation='relu')(dense)
dense = tf.keras.layers.Dense(64, activation='relu')(dense)
output = tf.keras.layers.Dense(1)(dense)

model = tf.keras.models.Model(inputs=input_layer, outputs=output)
model.compile(optimizer='adam', loss='mean_squared_error')

In [27]:
# Train the model
X_train = np.array(train['title_features'].values.tolist())
y_train = train['Book-Rating'].values
X_test = np.array(test['title_features'].values.tolist())
y_test = test['Book-Rating'].values

# validation_split is omitted: Keras ignores it whenever validation_data is passed
model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=128, epochs=10)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7f1ded9c12e0>

In [28]:
# Recommendation function
def recommend_books_by_title(query, n_recommendations=5):
    query_features = title_embedding(query)
    book_features = np.array(filtered['title_features'].values.tolist())
    similarities = np.dot(query_features, book_features.T)
    top_indices = np.argsort(similarities)[-n_recommendations:][::-1]
    return filtered.iloc[top_indices][['Book-Title', 'Book-Rating']].values


In [29]:
# Prediction and performance evaluation
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
mae = mean_absolute_error(y_test, y_pred)
print(f'Mean Absolute Error: {mae}')
rmse = np.sqrt(mse)
print(f'Root Mean Squared Error: {rmse}')

Mean Squared Error: 14.523797161486439
Mean Absolute Error: 3.4666857348443996
Root Mean Squared Error: 3.81100999230997


In [30]:
# Test the recommendation function
query = 'Harry Potter'
recommended_books = recommend_books_by_title(query)
print(f'Recommended books for "{query}":')
for book, rating in recommended_books:
    print(f'- {book} (Rating: {rating})')


Recommended books for "Harry Potter":
- The Potter (Rating: 10)
- Great Harry (Rating: 0)
- Harry Potter Fun Book (Rating: 0)
- Harry Potter Stationery Kit (Rating: 0)
- Harry Potter Stationery Kit (Rating: 0)
