## Book Recommendation Engine using KNN
You will be working on this project with Google Colaboratory.

After opening the project notebook, create a copy of it either in your own account or locally. Once you complete the project and it passes the test (included in the notebook), submit your project link below. If you are submitting a Google Colaboratory link, make sure to turn on link sharing for "anyone with the link."

We are still developing the interactive instructional content for the machine learning curriculum. For now, you can go through the video challenges in this certification. You may also have to seek out additional learning resources, similar to what you would do when working on a real-world project.

In this challenge, you will create a book recommendation algorithm using K-Nearest Neighbors.

You will use the Book-Crossings dataset (http://www2.informatik.uni-freiburg.de/~cziegler/BX/). This dataset contains 1.1 million ratings (scale of 1-10) of 270,000 books by 90,000 users.

After importing and cleaning the data, use **`NearestNeighbors`** from **`sklearn.neighbors`** to develop a model that shows books that are similar to a given book. The Nearest Neighbors algorithm measures the distance to determine the “closeness” of instances.
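
To make "closeness" concrete, here is a minimal sketch of one common distance measure, cosine distance, computed by hand with NumPy. The task text does not prescribe a metric; cosine is simply the one the completed code later in this notebook passes to `NearestNeighbors`. The helper name `cosine_distance` and the three rating vectors are made up for illustration.

```python
import numpy as np

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two 1-D rating vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up ratings from four users (0 means "not rated").
book_a = [5, 4, 0, 0]
book_b = [4, 5, 0, 0]   # rated by the same users as book_a
book_c = [0, 0, 3, 5]   # rated only by users who did not rate book_a

close = cosine_distance(book_a, book_b)  # small: similar rating pattern
far = cosine_distance(book_a, book_c)    # 1.0: no users in common

print(f"a vs b: {close:.4f}")
print(f"a vs c: {far:.4f}")
```

Books rated similarly by the same users end up with a distance near 0, while books with no raters in common sit at distance 1, which is why smaller distances indicate better recommendations.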

Create a function named **`get_recommends`** that takes a book title (from the dataset) as an argument and returns a list of 5 similar books with their distances from the book argument.

This code:

In [None]:
get_recommends("The Queen of the Damned (Vampire Chronicles (Paperback))")

should return:

In [None]:
[
  'The Queen of the Damned (Vampire Chronicles (Paperback))',
  [
    ['Catch 22', 0.793983519077301], 
    ['The Witching Hour (Lives of the Mayfair Witches)', 0.7448656558990479], 
    ['Interview with the Vampire', 0.7345068454742432],
    ['The Tale of the Body Thief (Vampire Chronicles (Paperback))', 0.5376338362693787],
    ['The Vampire Lestat (Vampire Chronicles, Book II)', 0.5178412199020386]
  ]
]

Notice that the data returned from **`get_recommends()`** is a list. The first element in the list is the book title passed into the function. The second element in the list is a list of five more lists. Each of the five lists contains a recommended book and the distance from the recommended book to the book passed into the function.
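
As a small illustration of that structure, the sketch below unpacks the expected return value shown above. The variable names (`result`, `query_title`, `recommendations`, `closest_title`) are chosen here for illustration and are not part of the project template.

```python
# Expected return value, copied from the example above.
result = [
    'The Queen of the Damned (Vampire Chronicles (Paperback))',
    [
        ['Catch 22', 0.793983519077301],
        ['The Witching Hour (Lives of the Mayfair Witches)', 0.7448656558990479],
        ['Interview with the Vampire', 0.7345068454742432],
        ['The Tale of the Body Thief (Vampire Chronicles (Paperback))', 0.5376338362693787],
        ['The Vampire Lestat (Vampire Chronicles, Book II)', 0.5178412199020386],
    ],
]

# First element: the query title. Second element: five [title, distance] pairs.
query_title, recommendations = result

for title, distance in recommendations:
    print(f"{distance:.3f}  {title}")

# The pair with the smallest distance is the most similar book.
closest_title = min(recommendations, key=lambda pair: pair[1])[0]
```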

If you graph the dataset (optional), you will notice that most books are not rated frequently. To ensure statistical significance, remove from the dataset users with fewer than 200 ratings and books with fewer than 100 ratings.
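
One possible reading of that filtering step is sketched below on a tiny made-up table. The function name `filter_by_counts` and its parameters are hypothetical, and it assumes both counts are taken from the unfiltered ratings so that the order of the two filters does not matter. Other readings, such as counting users again after books have been removed, give different results.

```python
import pandas as pd

# Tiny made-up ratings table; column names mirror the Book-Crossings ratings file.
toy = pd.DataFrame({
    "User-ID": [1, 1, 1, 2, 2, 3],
    "ISBN": ["A", "B", "C", "A", "B", "A"],
    "Book-Rating": [8, 7, 9, 6, 5, 10],
})

def filter_by_counts(df, min_user_ratings, min_book_ratings):
    """Keep rows whose user and book both meet the minimum rating counts."""
    user_counts = df["User-ID"].value_counts()
    book_counts = df["ISBN"].value_counts()
    keep_users = user_counts[user_counts >= min_user_ratings].index
    keep_books = book_counts[book_counts >= min_book_ratings].index
    return df[df["User-ID"].isin(keep_users) & df["ISBN"].isin(keep_books)]

# The task's thresholds would be (200, 100); 2 and 2 are used here
# only so that some of the toy rows survive.
filtered = filter_by_counts(toy, min_user_ratings=2, min_book_ratings=2)
print(filtered)
```

Here user 3 (one rating) and book "C" (one rating) are dropped, leaving four rows.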

The first three cells import libraries you may need and the data to use. The final cell is for testing. Write all your code in between those cells.

Upload to: https://www.freecodecamp.org/learn/machine-learning-with-python/machine-learning-with-python-projects/book-recommendation-engine-using-knn



In [1]:
# COMPLETED CODE (the output does not exactly match the expected values above, but it is close)
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.neighbors import NearestNeighbors


# Step 1: Load the datasets from local files (the CSV files were downloaded into the book_knn folder within this project)
books = pd.read_csv('book_knn/BX-Books.csv', sep=';', encoding='latin-1', on_bad_lines='skip', low_memory=False)
users = pd.read_csv('book_knn/BX-Users.csv', sep=';', encoding='latin-1', on_bad_lines='skip', low_memory=False)
ratings = pd.read_csv('book_knn/BX-Book-Ratings.csv', sep=';', encoding='latin-1', on_bad_lines='skip', low_memory=False)

# Create labels: rename columns for easier handling (these are the labels already in the CSV files)
books.columns = ['ISBN', 'Book-Title', 'Book-Author', 'Year-Of-Publication', 'Publisher', 'Image-URL-S', 'Image-URL-M', 'Image-URL-L']
ratings.columns = ['User-ID', 'ISBN', 'Book-Rating']
users.columns = ['User-ID', 'Location', 'Age']

# # Filter users with at least 50 ratings
# user_counts = ratings['User-ID'].value_counts()
# filtered_ratings = ratings[ratings['User-ID'].isin(user_counts[user_counts >= 50].index)]


# Step 2: Clean the Data
# Remove zero ratings as they indicate no rating given
ratings = ratings[ratings['Book-Rating'] > 0]

# Drop rows with missing values in important columns
books = books.dropna(subset=['Book-Author', 'Publisher'])

# Step 3: Create ISBN to Book-Title Dictionary
# Map ISBNs to Book-Titles
isbn_to_title = books.set_index('ISBN')['Book-Title'].to_dict()

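# Step 4: Filter Books and Users to Reduce Matrix Size
# Note: the thresholds used here (20 and 10) are looser than the 100 and 200 given in the task description
# Filter books with at least 20 ratings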
book_counts = ratings['ISBN'].value_counts()
ratings = ratings[ratings['ISBN'].isin(book_counts[book_counts >= 20].index)]

# Filter users who rated at least 10 books
user_counts = ratings['User-ID'].value_counts()
ratings = ratings[ratings['User-ID'].isin(user_counts[user_counts >= 10].index)]

# Step 5: Convert ISBNs to Book-Titles using the Dictionary
# This step eliminates the need for a merge
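# assign() returns a new DataFrame, which avoids writing a column into a filtered slice
ratings = ratings.assign(**{'Book-Title': ratings['ISBN'].map(isbn_to_title)})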

# Drop rows where ISBNs didn't map to any Book-Title (in case of missing data)
ratings = ratings.dropna(subset=['Book-Title'])

# Step 6: Fix Duplicate Entries by Averaging Ratings
# Average ratings for duplicate (Book-Title, User-ID) pairs
ratings = ratings.groupby(['Book-Title', 'User-ID'])['Book-Rating'].mean().reset_index()

# Step 7: Create Pivot Table for User-Item Matrix
# Pivot table to create the user-item matrix (rows: books, columns: users)
book_user_matrix = ratings.pivot(index='Book-Title', columns='User-ID', values='Book-Rating').fillna(0)

# Step 8: Build the KNN Model using cosine similarity
model = NearestNeighbors(metric='cosine', algorithm='brute')
model.fit(book_user_matrix)

# Step 9: Create the get_recommends() Function
def get_recommends(book_title):
    # Check if the book is in the dataset
    if book_title not in book_user_matrix.index:
        return f"Book '{book_title}' not found in dataset."
    
    # Get the positional index of the book's row
    book_idx = book_user_matrix.index.get_loc(book_title)
    
    # Request 6 neighbors: the closest one is the query book itself, leaving 5 recommendations
    distances, indices = model.kneighbors(book_user_matrix.iloc[book_idx, :].values.reshape(1, -1), n_neighbors=6)
    
    # Create the list of recommendations, skipping position 0 (the query book)
    recommendations = []
    for i in range(1, len(distances.flatten())):
        similar_book = book_user_matrix.index[indices.flatten()[i]]
        # Convert to a plain float so the result prints without NumPy type wrappers
        recommendations.append([similar_book, float(distances.flatten()[i])])
    
    # Sort recommendations by distance in ascending order (closest first)
    recommendations.sort(key=lambda x: x[1])
    
    # Return the result in the required format
    return [book_title, recommendations]

# Test the function with the originally requested title
get_recommends("The Queen of the Damned (Vampire Chronicles (Paperback))")

['The Queen of the Damned (Vampire Chronicles (Paperback))',
 [['The Tale of the Body Thief (Vampire Chronicles (Paperback))',
   0.44809595393604096],
  ['The Vampire Lestat (Vampire Chronicles, Book II)', 0.48877937652046555],
  ['Interview with the Vampire', 0.610596672563833],
  ['Memnoch the Devil (Vampire Chronicles, No 5)', 0.6718862952175826],
  ['Feast of All Saints', 0.7491947111915684]]]