# Book Recommendation System using Keras

This notebook presents a basic book recommendation system using Keras, which is using collaborative filtering. The solution is presented as a model that uses embeddings to represent users and books in a low dimensional space, and then combines these embeddings and runs them through a neural network to predict books.

## Data Loading

[dataset](https://github.com/zygmuntz/goodbooks-10k/tree/master)




In [93]:
import pandas as pd

ratings = pd.read_csv('https://raw.githubusercontent.com/zygmuntz/goodbooks-10k/master/ratings.csv')

ratings.head()

Unnamed: 0,user_id,book_id,rating
0,1,258,5
1,2,4081,4
2,2,260,5
3,2,9296,5
4,2,2318,3


In [94]:
books = pd.read_csv('https://raw.githubusercontent.com/zygmuntz/goodbooks-10k/master/books.csv')

books.head()

Unnamed: 0,book_id,goodreads_book_id,best_book_id,work_id,books_count,isbn,isbn13,authors,original_publication_year,original_title,...,ratings_count,work_ratings_count,work_text_reviews_count,ratings_1,ratings_2,ratings_3,ratings_4,ratings_5,image_url,small_image_url
0,1,2767052,2767052,2792775,272,439023483,9780439000000.0,Suzanne Collins,2008.0,The Hunger Games,...,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...
1,2,3,3,4640799,491,439554934,9780440000000.0,"J.K. Rowling, Mary GrandPré",1997.0,Harry Potter and the Philosopher's Stone,...,4602479,4800065,75867,75504,101676,455024,1156318,3011543,https://images.gr-assets.com/books/1474154022m...,https://images.gr-assets.com/books/1474154022s...
2,3,41865,41865,3212258,226,316015849,9780316000000.0,Stephenie Meyer,2005.0,Twilight,...,3866839,3916824,95009,456191,436802,793319,875073,1355439,https://images.gr-assets.com/books/1361039443m...,https://images.gr-assets.com/books/1361039443s...
3,4,2657,2657,3275794,487,61120081,9780061000000.0,Harper Lee,1960.0,To Kill a Mockingbird,...,3198671,3340896,72586,60427,117415,446835,1001952,1714267,https://images.gr-assets.com/books/1361975680m...,https://images.gr-assets.com/books/1361975680s...
4,5,4671,4671,245494,1356,743273567,9780743000000.0,F. Scott Fitzgerald,1925.0,The Great Gatsby,...,2683664,2773745,51992,86236,197621,606158,936012,947718,https://images.gr-assets.com/books/1490528560m...,https://images.gr-assets.com/books/1490528560s...


## Preparation data

In [118]:
books_information = books[['book_id', 'authors', 'original_publication_year', 'original_title', 'image_url']]

books_information.head()

Unnamed: 0,book_id,authors,original_publication_year,original_title,image_url
0,1,Suzanne Collins,2008.0,The Hunger Games,https://images.gr-assets.com/books/1447303603m...
1,2,"J.K. Rowling, Mary GrandPré",1997.0,Harry Potter and the Philosopher's Stone,https://images.gr-assets.com/books/1474154022m...
2,3,Stephenie Meyer,2005.0,Twilight,https://images.gr-assets.com/books/1361039443m...
3,4,Harper Lee,1960.0,To Kill a Mockingbird,https://images.gr-assets.com/books/1361975680m...
4,5,F. Scott Fitzgerald,1925.0,The Great Gatsby,https://images.gr-assets.com/books/1490528560m...


In [97]:
from sklearn.model_selection import train_test_split

# Split the data into training and testing sets
train_data, test_data = train_test_split(ratings, test_size=0.2, random_state=42)

## Model

In [98]:
# !pip install tensorflow
# !pip install keras

In [99]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, Flatten, Dense, Concatenate
from tensorflow.keras.optimizers import Adam

### Define the model

In [100]:
num_users = len(ratings['user_id'].unique())
num_books = len(ratings['book_id'].unique())
embedding_size = 10

In [None]:
# Define input layers

user_input = Input(shape=(1,), name='user_input')
book_input = Input(shape=(1,), name='book_input')

In [None]:
# Create embeddings for users and books

user_embedding = Embedding(input_dim=num_users+1, output_dim=embedding_size, input_length=1)(user_input)
book_embedding = Embedding(input_dim=num_books+1, output_dim=embedding_size, input_length=1)(book_input)

In [102]:
# Flatten the embeddings

user_flat = Flatten()(user_embedding)
book_flat = Flatten()(book_embedding)

In [104]:
# Concatenate user and book embeddings

concatenated = Concatenate()([book_flat, user_flat])

In [106]:
# Build a neural network

dense1 = Dense(128, activation='relu')(concatenated)
dense2 = Dense(32, activation='relu')(dense1)
output = Dense(1)(dense2)

In [107]:
model = Model(inputs=[user_input, book_input], outputs=output)

In [108]:
model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')

### Train the model

In [140]:
from keras.models import load_model
import os

if os.path.exists('model.keras'):
    model = load_model('model.keras')
else:
    model.fit([train_data['user_id'], train_data['book_id']], train_data['rating'], epochs=5, batch_size=64, validation_split=0.2)
    model.save('model.keras')

In [110]:
# Evaluate the model on the test set
loss = model.evaluate([test_data['user_id'], test_data['book_id']], test_data['rating'])
print(f'Test Loss: {loss}')

Test Loss: 0.6956519484519958


### Prediction function

In [131]:
import numpy as np

def predict_books(user_id, ratings, n=5):

    book_data = np.array(list(set(ratings.book_id)))

    user = np.array([user_id for i in range(len(book_data))])

    predictions = model.predict([user, book_data])

    predictions = np.array([a[0] for a in predictions])

    recommended_book_ids = (-predictions).argsort()[:n]

    return recommended_book_ids

## Test

In [141]:
user_id = 4

recommended_book_ids = predict_books(user_id, ratings)

books_information[books_information['book_id'].isin(recommended_book_ids)][['authors', 'original_publication_year', 'original_title']]



Unnamed: 0,authors,original_publication_year,original_title
2099,Frank Miller,1986.0,The Dark Knight Returns
3052,Janet Evanovich,2014.0,Top Secret Twenty-One
7829,Clive Cussler,1996.0,Shock Wave
8369,Francisco X. Stork,2008.0,Marcelo in the Real World
9921,Tad Williams,2001.0,Sea of Silver Light


In [142]:
user_id = 29

recommended_book_ids = predict_books(user_id, ratings)

books_information[books_information['book_id'].isin(recommended_book_ids)][['authors', 'original_publication_year', 'original_title']]



Unnamed: 0,authors,original_publication_year,original_title
3626,John Flanagan,2011.0,
7881,Deborah Rodriguez,2011.0,A Cup of Friendship
7945,Naomi Wolf,1990.0,The Beauty Myth: How Images of Beauty Are Used...
8944,Lisa Gardner,2001.0,The Next Accident
8976,A.J. Jacobs,2012.0,


In [143]:
user_id = 55

recommended_book_ids = predict_books(user_id, ratings)

books_information[books_information['book_id'].isin(recommended_book_ids)][['authors', 'original_publication_year', 'original_title']]



Unnamed: 0,authors,original_publication_year,original_title
2867,Abbi Glines,2004.0,
4507,Joanne Harris,2001.0,Five Quarters of the Orange
4605,Isabel Allende,1994.0,Paula
5879,Megan Whalen Turner,2000.0,The Queen of Attolia
8944,Lisa Gardner,2001.0,The Next Accident


In [144]:
user_id = 128

recommended_book_ids = predict_books(user_id, ratings)

books_information[books_information['book_id'].isin(recommended_book_ids)][['authors', 'original_publication_year', 'original_title']]



Unnamed: 0,authors,original_publication_year,original_title
420,Paula McLain,2011.0,The Paris Wife
3273,Dr. Seuss,1963.0,Hop on Pop
3751,Agatha Christie,1924.0,Poirot Investigates
7945,Naomi Wolf,1990.0,The Beauty Myth: How Images of Beauty Are Used...
8944,Lisa Gardner,2001.0,The Next Accident


In [145]:
user_id = 429

recommended_book_ids = predict_books(user_id, ratings)

books_information[books_information['book_id'].isin(recommended_book_ids)][['authors', 'original_publication_year', 'original_title']]



Unnamed: 0,authors,original_publication_year,original_title
420,Paula McLain,2011.0,The Paris Wife
3626,John Flanagan,2011.0,
3751,Agatha Christie,1924.0,Poirot Investigates
7829,Clive Cussler,1996.0,Shock Wave
8944,Lisa Gardner,2001.0,The Next Accident
