# User-Based Collaborative Filtering

This notebook presents a simple baseline for book recommendation: user-based collaborative filtering. User-based collaborative filtering is a recommender-system technique that suggests items based on the preferences of users similar to the target user. Here, the algorithm computes the cosine similarity between users and uses it to score the books a user has not yet rated, returning the highest-scoring ones as recommendations.
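
As a rough illustration of the similarity measure, the sketch below computes cosine similarity by hand for three made-up users. The rating vectors and the helper name `cosine` are assumptions for illustration only; the notebook itself relies on scikit-learn's `cosine_similarity`.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical ratings for three books; 0 stands for "not rated".
user_a = [5, 4, 0]
user_b = [4, 5, 0]
user_c = [0, 0, 5]

sim_ab = cosine(user_a, user_b)  # users who rated the same books similarly
sim_ac = cosine(user_a, user_c)  # users with no rated books in common
print(sim_ab, sim_ac)
```

Users A and B rated the same books with similar scores, so their similarity is close to 1, while A and C share no rated books and get a similarity of 0.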

## Data Loading

Dataset: [goodbooks-10k](https://github.com/zygmuntz/goodbooks-10k/tree/master)

In [102]:
import pandas as pd

ratings = pd.read_csv('https://raw.githubusercontent.com/zygmuntz/goodbooks-10k/master/ratings.csv')

ratings.head()

index,user_id,book_id,rating
0,1,258,5
1,2,4081,4
2,2,260,5
3,2,9296,5
4,2,2318,3


In [103]:
# Keep only the first 10,000 ratings
ratings = ratings[:10000]

In [104]:
ratings

index,user_id,book_id,rating
0,1,258,5
1,2,4081,4
2,2,260,5
3,2,9296,5
4,2,2318,3
...,...,...,...
9995,454,1214,5
9996,454,129,5
9997,454,4,5
9998,454,189,5


In [105]:
books = pd.read_csv('https://raw.githubusercontent.com/zygmuntz/goodbooks-10k/master/books.csv')

books.head()

index,book_id,goodreads_book_id,best_book_id,work_id,books_count,isbn,isbn13,authors,original_publication_year,original_title,...,ratings_count,work_ratings_count,work_text_reviews_count,ratings_1,ratings_2,ratings_3,ratings_4,ratings_5,image_url,small_image_url
0,1,2767052,2767052,2792775,272,439023483,9780439000000.0,Suzanne Collins,2008.0,The Hunger Games,...,4780653,4942365,155254,66715,127936,560092,1481305,2706317,https://images.gr-assets.com/books/1447303603m...,https://images.gr-assets.com/books/1447303603s...
1,2,3,3,4640799,491,439554934,9780440000000.0,"J.K. Rowling, Mary GrandPré",1997.0,Harry Potter and the Philosopher's Stone,...,4602479,4800065,75867,75504,101676,455024,1156318,3011543,https://images.gr-assets.com/books/1474154022m...,https://images.gr-assets.com/books/1474154022s...
2,3,41865,41865,3212258,226,316015849,9780316000000.0,Stephenie Meyer,2005.0,Twilight,...,3866839,3916824,95009,456191,436802,793319,875073,1355439,https://images.gr-assets.com/books/1361039443m...,https://images.gr-assets.com/books/1361039443s...
3,4,2657,2657,3275794,487,61120081,9780061000000.0,Harper Lee,1960.0,To Kill a Mockingbird,...,3198671,3340896,72586,60427,117415,446835,1001952,1714267,https://images.gr-assets.com/books/1361975680m...,https://images.gr-assets.com/books/1361975680s...
4,5,4671,4671,245494,1356,743273567,9780743000000.0,F. Scott Fitzgerald,1925.0,The Great Gatsby,...,2683664,2773745,51992,86236,197621,606158,936012,947718,https://images.gr-assets.com/books/1490528560m...,https://images.gr-assets.com/books/1490528560s...


## Data Preparation

In [106]:
books_information = books[['book_id', 'authors', 'original_publication_year', 'original_title', 'image_url']]

books_information.head()

index,book_id,authors,original_publication_year,original_title,image_url
0,1,Suzanne Collins,2008.0,The Hunger Games,https://images.gr-assets.com/books/1447303603m...
1,2,"J.K. Rowling, Mary GrandPré",1997.0,Harry Potter and the Philosopher's Stone,https://images.gr-assets.com/books/1474154022m...
2,3,Stephenie Meyer,2005.0,Twilight,https://images.gr-assets.com/books/1361039443m...
3,4,Harper Lee,1960.0,To Kill a Mockingbird,https://images.gr-assets.com/books/1361975680m...
4,5,F. Scott Fitzgerald,1925.0,The Great Gatsby,https://images.gr-assets.com/books/1490528560m...


In [107]:
# Create a user-item matrix
user_item_matrix = ratings.pivot(index='user_id', columns='book_id', values='rating')

user_item_matrix.head()

user_id (rows) / book_id (columns),2,4,5,7,8,9,10,11,13,14,...,9946,9961,9962,9966,9972,9978,9990,9991,9995,9998
1,,,,,,,,,,,...,,,,,,,,,,
2,,,,,,,,,,,...,,,,,,,,,,
4,5.0,4.0,4.0,,4.0,,5.0,4.0,4.0,,...,,,,,,,,,,
6,,,,,,,,,,,...,,,,,,,,,,
8,,,,,,,,,,5.0,...,,,,,,,,,,
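
One caveat with `pivot`: it raises a `ValueError` when the same `(user_id, book_id)` pair occurs more than once. The sketch below, on made-up data, shows one possible fallback that averages duplicate ratings with `pivot_table`. The toy frame and the choice of `aggfunc='mean'` are assumptions, not something the notebook needs for the 10,000-row subset used here.

```python
import pandas as pd

# Hypothetical ratings in which user 1 rated book 10 twice.
toy_ratings = pd.DataFrame({
    'user_id': [1, 1, 2, 2],
    'book_id': [10, 10, 10, 20],
    'rating':  [4, 2, 5, 3],
})

# toy_ratings.pivot(...) would fail here because (1, 10) is duplicated;
# pivot_table resolves the duplicates by aggregating them.
toy_matrix = toy_ratings.pivot_table(
    index='user_id', columns='book_id', values='rating', aggfunc='mean'
)
print(toy_matrix)
```

Cells with no rating remain `NaN`, exactly as in the `pivot` output, so the later `fillna(0)` step applies unchanged.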


In [108]:
# Fill missing values with 0 (assuming no rating means a rating of 0)
user_item_matrix = user_item_matrix.fillna(0)

In [109]:
from sklearn.metrics.pairwise import cosine_similarity

# Calculate cosine similarity between users
user_similarity = cosine_similarity(user_item_matrix)

In [110]:
user_similarity_df = pd.DataFrame(user_similarity, index=user_item_matrix.index, columns=user_item_matrix.index)

user_similarity_df.head()

user_id,1,2,4,6,8,9,10,11,15,18,...,429,439,440,444,446,447,449,452,453,454
1,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.138961,0.0,0.0,...,0.0,0.031009,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,1.0,0.074892,0.0,0.0,0.045821,0.056546,0.099995,0.061097,0.016912,...,0.0,0.028689,0.0,0.032067,0.059878,0.031559,0.0,0.0,0.0,0.0
4,0.0,0.074892,1.0,0.0,0.036807,0.179414,0.230415,0.137863,0.254108,0.132901,...,0.0,0.275114,0.0,0.209854,0.29059,0.296593,0.114018,0.039761,0.183369,0.070058
6,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.036807,0.0,1.0,0.080508,0.023882,0.0,0.025805,0.057141,...,0.0,0.01454,0.0,0.069344,0.0,0.013329,0.126338,0.0,0.176947,0.0
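
A similarity table like this can also be queried directly for a user's nearest neighbours. The following is a minimal sketch on a small made-up similarity matrix; the helper `most_similar_users` and its parameter `k` are hypothetical names that do not appear elsewhere in the notebook.

```python
import pandas as pd

# Hypothetical symmetric similarity matrix for three users.
toy_similarity = pd.DataFrame(
    [[1.0, 0.2, 0.7],
     [0.2, 1.0, 0.0],
     [0.7, 0.0, 1.0]],
    index=[1, 2, 4],
    columns=[1, 2, 4],
)

def most_similar_users(similarity_df, user_id, k=2):
    """Return the k users most similar to user_id, excluding the user itself."""
    row = similarity_df.loc[user_id].drop(user_id)
    return row.sort_values(ascending=False).head(k)

neighbours = most_similar_users(toy_similarity, user_id=1)
print(neighbours)
```

With the real data, the same function could presumably be called as `most_similar_users(user_similarity_df, 4)`.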


## User-Based Collaborative Filtering

In [111]:
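def get_recommendations(user_id, n=5):
    """Return the book_ids of the n highest-scoring books the user has not rated."""
    user_ratings = user_item_matrix.loc[user_id].values.reshape(1, -1)

    # Look up the precomputed similarities between this user and every user
    # (the row includes the user's own similarity of 1.0)
    similarities = user_similarity_df.loc[user_id].values.reshape(1, -1)

    # Score every book as the similarity-weighted average of all users' ratings;
    # unrated books count as 0 because of the earlier fillna(0)
    predicted_ratings = similarities.dot(user_item_matrix.values) / similarities.sum()

    # Mask out books the user has already rated
    user_ratings_mask = user_ratings != 0
    predicted_ratings[user_ratings_mask] = 0

    # Column positions of the n highest scores, best first
    top_books_indices = predicted_ratings.argsort()[0, ::-1][:n]

    # Map column positions back to book_id labels
    top_books_ids = user_item_matrix.columns[top_books_indices]

    return top_books_ids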
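
To make the arithmetic inside `get_recommendations` concrete, here is a toy walk-through of the same three steps (weighted average, masking, ranking) on a tiny made-up matrix. The ratings and similarity values are invented for illustration and are not taken from the dataset.

```python
import numpy as np

# Rows are users, columns are books; 0 means "not rated".
toy_matrix = np.array([
    [5.0, 0.0, 0.0],  # target user: has rated book 0 only
    [4.0, 5.0, 0.0],  # neighbour 1
    [0.0, 3.0, 4.0],  # neighbour 2
])

# Hypothetical similarities of the target user to each row (self-similarity is 1.0).
toy_similarities = np.array([[1.0, 0.5, 0.25]])

# Similarity-weighted average of every user's ratings, per book.
toy_scores = toy_similarities.dot(toy_matrix) / toy_similarities.sum()

# Zero out the books the target user has already rated.
toy_scores[toy_matrix[0:1] != 0] = 0

# Column positions sorted from highest to lowest score.
toy_ranking = toy_scores.argsort()[0, ::-1]
print(toy_scores, toy_ranking)
```

Because missing ratings enter the average as zeros, a book rated by many similar users tends to outrank a book rated highly by only a few, so the score reflects popularity among neighbours as well as rating level.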

## Test

In [112]:
# Example
user_id = 4
recommendations = get_recommendations(user_id)

books_information[books_information['book_id'].isin(recommendations)][['authors', 'original_publication_year', 'original_title']]

index,authors,original_publication_year,original_title
13,George Orwell,1945.0,Animal Farm: A Fairy Story
59,Mark Haddon,2003.0,The Curious Incident of the Dog in the Night-Time
79,"Antoine de Saint-Exupéry, Richard Howard, Dom ...",1946.0,Le Petit Prince
93,"Gabriel García Márquez, Gregory Rabassa",1967.0,Cien años de soledad
114,Jeffrey Eugenides,2002.0,Middlesex
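
Note that filtering with `isin` returns rows in the order of `books_information`, so the table above is sorted by book rather than by predicted score. If the ranking matters, one option is to index by `book_id` and select the recommended ids in order. The sketch below uses a made-up book table and id list as stand-ins for `books_information` and the output of `get_recommendations`.

```python
import pandas as pd

# Hypothetical stand-in for books_information.
toy_books = pd.DataFrame({
    'book_id': [1, 2, 3, 4],
    'original_title': ['A', 'B', 'C', 'D'],
})

# Hypothetical recommendations, best first.
recommended_ids = [3, 1, 4]

# Selecting by label with .loc keeps the order of recommended_ids.
ordered = toy_books.set_index('book_id').loc[recommended_ids].reset_index()
print(ordered)
```

This assumes every recommended id is present in the book table; `.loc` raises a `KeyError` for missing labels.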


In [113]:
user_id = 29
recommendations = get_recommendations(user_id)

books_information[books_information['book_id'].isin(recommendations)][['authors', 'original_publication_year', 'original_title']]

index,authors,original_publication_year,original_title
4,F. Scott Fitzgerald,1925.0,The Great Gatsby
7,J.D. Salinger,1951.0,The Catcher in the Rye
9,Jane Austen,1813.0,Pride and Prejudice
17,"J.K. Rowling, Mary GrandPré, Rufus Beck",1999.0,Harry Potter and the Prisoner of Azkaban
25,Dan Brown,2003.0,The Da Vinci Code


In [114]:
user_id = 55
recommendations = get_recommendations(user_id)

books_information[books_information['book_id'].isin(recommendations)][['authors', 'original_publication_year', 'original_title']]

index,authors,original_publication_year,original_title
3,Harper Lee,1960.0,To Kill a Mockingbird
4,F. Scott Fitzgerald,1925.0,The Great Gatsby
9,Jane Austen,1813.0,Pride and Prejudice
10,Khaled Hosseini,2003.0,The Kite Runner
93,"Gabriel García Márquez, Gregory Rabassa",1967.0,Cien años de soledad


In [115]:
user_id = 128
recommendations = get_recommendations(user_id)

books_information[books_information['book_id'].isin(recommendations)][['authors', 'original_publication_year', 'original_title']]

index,authors,original_publication_year,original_title
10,Khaled Hosseini,2003.0,The Kite Runner
32,Arthur Golden,1997.0,Memoirs of a Geisha
59,Mark Haddon,2003.0,The Curious Incident of the Dog in the Night-Time
100,David Sedaris,2000.0,Me Talk Pretty One Day
371,David Sedaris,2004.0,Dress Your Family in Corduroy and Denim


In [116]:
user_id = 429
recommendations = get_recommendations(user_id)

books_information[books_information['book_id'].isin(recommendations)][['authors', 'original_publication_year', 'original_title']]

index,authors,original_publication_year,original_title
10,Khaled Hosseini,2003.0,The Kite Runner
56,Sue Monk Kidd,2001.0,The Secret Life of Bees
100,David Sedaris,2000.0,Me Talk Pretty One Day
149,Anita Diamant,1997.0,The Red Tent
866,Nicole Krauss,2005.0,The History of Love
