<a href="https://colab.research.google.com/github/sukhpreetsinghgithub/Book-Recommendation-System/blob/main/Book_Recommendation_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Title of the Project:
# Exploring Book Recommendations using Data Science Techniques
## Brief Description
This project involves analyzing a dataset of books to create a book recommendation system. Using data science techniques such as TF-IDF vectorization and cosine similarity, the system suggests books similar to a given input. The project aims to provide meaningful recommendations and enhance the user's reading experience.




# Group Members' Information

## 1. Sukhpreet Singh, ID: 4346717

## 2. Kiranjit Kaur, ID: 4347858



# Detailed Project Description

## In-depth Explanation/Analysis:
The project starts by loading a dataset of books and exploring the distribution of average ratings and the number of books per author. TF-IDF vectorization is then applied to each book's combined title and author text, and cosine similarity between the resulting vectors measures how closely any two books are related. Finally, the recommendation function takes a book title as input and returns the ten most similar titles.

## Objectives and Expected Outcomes:

* Develop a book recommendation system based on content similarity.
* Visualize and understand the distribution of average ratings and author contributions.
* Implement and test the recommendation algorithm for various book titles.




# Modification/New Addition Specification

## Modifications and New Additions:
* Added a step to convert the 'average_rating' column to numeric format for accurate analysis.
* Introduced a new column, 'book_content', by combining 'title' and 'authors' for TF-IDF vectorization.
* Applied TF-IDF vectorization to capture the content features of each book.

## Impact and Importance:

* The numeric conversion ensures accurate analysis of average ratings.
* The 'book_content' column provides a more comprehensive representation for similarity analysis.
* TF-IDF vectorization down-weights words that appear in many books, so distinctive title and author terms drive the similarity scores.




# Criteria-Specific

## Criteria-Specific Elements:
* Utilized TF-IDF vectorization and cosine similarity for content-based recommendation.
* Applied Plotly Express for interactive visualizations of rating distributions and author contributions.

## Relevance and Application:
* TF-IDF and cosine similarity are well-suited for content-based recommendation systems.
* Visualizations enhance the understanding of the dataset's characteristics.

## Innovation and Technical Proficiency:
* Innovative use of TF-IDF for content-based recommendations.
* Technical proficiency demonstrated in implementing and visualizing recommendation system components.



# 1. Import Libraries
* Importing the libraries needed for data manipulation, numerical operations, TF-IDF vectorization, cosine similarity, and visualization using Plotly.

* pandas for data manipulation and analysis.
* numpy for numerical operations.
* TfidfVectorizer from sklearn for TF-IDF vectorization.
* linear_kernel from sklearn to compute the cosine similarity.
* plotly.express and plotly.graph_objects for interactive visualizations.





In [None]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
import plotly.express as px
import plotly.graph_objects as go

# 2. Load Data
* Loading book data from a CSV file named "books_data.csv" using Pandas and displaying the first few rows to understand the structure of the dataset.

In [None]:
book = pd.read_csv("books_data.csv")
print(book.head())

   bookID                                              title  \
0       1  Harry Potter and the Half-Blood Prince (Harry ...   
1       2  Harry Potter and the Order of the Phoenix (Har...   
2       4  Harry Potter and the Chamber of Secrets (Harry...   
3       5  Harry Potter and the Prisoner of Azkaban (Harr...   
4       8  Harry Potter Boxed Set  Books 1-5 (Harry Potte...   

                      authors average_rating  
0  J.K. Rowling/Mary GrandPré           4.57  
1  J.K. Rowling/Mary GrandPré           4.49  
2                J.K. Rowling           4.42  
3  J.K. Rowling/Mary GrandPré           4.56  
4  J.K. Rowling/Mary GrandPré           4.78  


# 3. Display Data Information
* Displaying basic information about the dataset, including data types and non-null values.

In [None]:
# Call info() with parentheses so pandas prints column dtypes and non-null counts
book.info()
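The same structural checks can also be made column by column. The following is a minimal sketch on a small hypothetical frame; `toy` and its values are illustrative assumptions, not rows taken from books_data.csv.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; one title is missing and
# one rating is deliberately malformed.
toy = pd.DataFrame({
    "bookID": [1, 2, 4],
    "title": ["Book A", "Book B", None],
    "authors": ["Author X", "Author Y/Author Z", "Author X"],
    "average_rating": ["4.57", "4.49", "not a number"],
})

# dtypes shows how pandas parsed each column.
column_types = toy.dtypes

# isna().sum() counts the missing values in each column.
missing_counts = toy.isna().sum()

print(column_types)
print(missing_counts)
```

A rating column that is not reported as a numeric dtype is a hint that at least one value could not be parsed as a number.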

# 4. Visualize Average Ratings Distribution

* Creating a histogram to visualize the distribution of average ratings in the dataset using Plotly.
* The x-axis represents average ratings, and the y-axis represents the frequency of books with those ratings.



In [None]:
fig = px.histogram(book, x='average_rating',
                   nbins=30,
                   title='Distribution of Average Ratings')
fig.update_xaxes(title_text='Average Rating')
fig.update_yaxes(title_text='Frequency')
fig.show()

# 5. Visualize Number of Books per Author


*   Creating a horizontal bar chart to visualize the top 10 authors with the highest number of books.
* The x-axis represents the number of books, and the y-axis represents the author names.


In [None]:
top_authors = book['authors'].value_counts().head(10)
fig = px.bar(top_authors, x=top_authors.values, y=top_authors.index, orientation='h',
             labels={'x': 'Number of Books', 'y': 'Author'},
             title='Number of Books per Author')
fig.show()
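In this dataset co-authors share one field separated by `/` (for example `J.K. Rowling/Mary GrandPré`), so `value_counts()` on the raw column counts each distinct combination rather than each person. If per-person counts are wanted, one possible approach is to split and explode the field first. This sketch uses hypothetical rows.

```python
import pandas as pd

# Hypothetical rows that mimic the '/'-separated authors field.
toy = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C"],
    "authors": ["J.K. Rowling/Mary GrandPré", "J.K. Rowling", "William T. Vollmann"],
})

# Split each field on '/', give every name its own row, and trim whitespace.
individual_authors = toy["authors"].str.split("/").explode().str.strip()

# Count how many books each individual name appears on.
books_per_author = individual_authors.value_counts()
print(books_per_author)
```

Whether combined or individual counts are more useful depends on the question being asked; the chart above reports combined credits.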

# 6. Convert 'average_rating' to Numeric


*   Converting the 'average_rating' column to numeric format, handling errors by coercing non-numeric values to NaN.



In [None]:
book['average_rating'] = pd.to_numeric(book['average_rating'],
                                       errors='coerce')
book

| | bookID | title | authors | average_rating |
|---|---|---|---|---|
| 0 | 1 | Harry Potter and the Half-Blood Prince (Harry ... | J.K. Rowling/Mary GrandPré | 4.57 |
| 1 | 2 | Harry Potter and the Order of the Phoenix (Har... | J.K. Rowling/Mary GrandPré | 4.49 |
| 2 | 4 | Harry Potter and the Chamber of Secrets (Harry... | J.K. Rowling | 4.42 |
| 3 | 5 | Harry Potter and the Prisoner of Azkaban (Harr... | J.K. Rowling/Mary GrandPré | 4.56 |
| 4 | 8 | Harry Potter Boxed Set Books 1-5 (Harry Potte... | J.K. Rowling/Mary GrandPré | 4.78 |
| ... | ... | ... | ... | ... |
| 11122 | 45631 | Expelled from Eden: A William T. Vollmann Reader | William T. Vollmann/Larry McCaffery/Michael He... | 4.06 |
| 11123 | 45633 | You Bright and Risen Angels | William T. Vollmann | 4.08 |
| 11124 | 45634 | The Ice-Shirt (Seven Dreams #1) | William T. Vollmann | 3.96 |
| 11125 | 45639 | Poor People | William T. Vollmann | 3.72 |
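To see what `errors='coerce'` does and how many values it affects, the coerced column can be compared with the raw one. A small sketch follows; the `raw` values are made up for illustration.

```python
import pandas as pd

# Made-up ratings; the third entry cannot be parsed as a number.
raw = pd.Series(["4.57", "3.9", "not a rating", "4.0"], dtype=object)

# Unparseable entries become NaN instead of raising an error.
numeric = pd.to_numeric(raw, errors="coerce")

# Entries that were present in the raw data but failed to parse.
failed = raw[numeric.isna() & raw.notna()]

print(numeric.tolist())
print(f"{len(failed)} value(s) could not be converted: {failed.tolist()}")
```

Inspecting the failed entries before discarding them can reveal whether they are typos or rows whose columns were shifted during parsing.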


# 7. Create 'book_content' Column

*   Creating a new column named 'book_content' by concatenating the 'title' and 'authors' columns.
* This column will be used for TF-IDF vectorization.



In [None]:
# Create a new column 'book_content' by combining 'title' and 'authors'
book['book_content'] = book['title'] + ' ' + book['authors']
book

| | bookID | title | authors | average_rating | book_content |
|---|---|---|---|---|---|
| 0 | 1 | Harry Potter and the Half-Blood Prince (Harry ... | J.K. Rowling/Mary GrandPré | 4.57 | Harry Potter and the Half-Blood Prince (Harry ... |
| 1 | 2 | Harry Potter and the Order of the Phoenix (Har... | J.K. Rowling/Mary GrandPré | 4.49 | Harry Potter and the Order of the Phoenix (Har... |
| 2 | 4 | Harry Potter and the Chamber of Secrets (Harry... | J.K. Rowling | 4.42 | Harry Potter and the Chamber of Secrets (Harry... |
| 3 | 5 | Harry Potter and the Prisoner of Azkaban (Harr... | J.K. Rowling/Mary GrandPré | 4.56 | Harry Potter and the Prisoner of Azkaban (Harr... |
| 4 | 8 | Harry Potter Boxed Set Books 1-5 (Harry Potte... | J.K. Rowling/Mary GrandPré | 4.78 | Harry Potter Boxed Set Books 1-5 (Harry Potte... |
| ... | ... | ... | ... | ... | ... |
| 11122 | 45631 | Expelled from Eden: A William T. Vollmann Reader | William T. Vollmann/Larry McCaffery/Michael He... | 4.06 | Expelled from Eden: A William T. Vollmann Read... |
| 11123 | 45633 | You Bright and Risen Angels | William T. Vollmann | 4.08 | You Bright and Risen Angels William T. Vollmann |
| 11124 | 45634 | The Ice-Shirt (Seven Dreams #1) | William T. Vollmann | 3.96 | The Ice-Shirt (Seven Dreams #1) William T. Vol... |
| 11125 | 45639 | Poor People | William T. Vollmann | 3.72 | Poor People William T. Vollmann |
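String concatenation in pandas propagates missing values: if either 'title' or 'authors' is missing, the combined value is missing too, and TfidfVectorizer cannot process a missing document. Whether books_data.csv actually contains such rows is not shown here, so the following is only a defensive sketch on hypothetical rows.

```python
import pandas as pd

# Hypothetical rows; the second book has no author recorded.
toy = pd.DataFrame({
    "title": ["Poor People", "Untitled Draft"],
    "authors": ["William T. Vollmann", None],
})

# Plain concatenation yields a missing value for the second row.
naive = toy["title"] + " " + toy["authors"]

# Filling missing fields with empty strings keeps every row usable.
safe = (toy["title"].fillna("") + " " + toy["authors"].fillna("")).str.strip()

print(naive.isna().tolist())
print(safe.tolist())
```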


# 8. TF-IDF Vectorization

* Using TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to convert text data ('book_content') into a sparse matrix of TF-IDF features.
* stop_words='english' removes common English stop words.
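
For reference, assuming scikit-learn's default settings (`smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`), none of which are overridden in this notebook, the weight of term $t$ in book $d$ is computed as:

$$
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1, \qquad
w_{t,d} = \mathrm{tf}(t, d)\,\mathrm{idf}(t), \qquad
\hat{w}_{t,d} = \frac{w_{t,d}}{\sqrt{\sum_{t'} w_{t',d}^{2}}}
$$

Here $n$ is the number of books, $\mathrm{df}(t)$ is the number of books whose text contains $t$, and $\mathrm{tf}(t, d)$ is the raw count of $t$ in book $d$. The last step scales every book vector to unit length.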


In [None]:
tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf_vectorizer.fit_transform(book['book_content'])
print(tfidf_matrix)

  (0, 6732)	0.3352656130735908
  (0, 10275)	0.2144393100646893
  (0, 13800)	0.276982286992962
  (0, 12806)	0.2783400225290689
  (0, 1883)	0.2644718799614698
  (0, 7003)	0.31367914677013226
  (0, 12697)	0.525128813914168
  (0, 7130)	0.4932749151622673
  (1, 12387)	0.34129415114250633
  (1, 11848)	0.31931953040623634
  (1, 6732)	0.34129415114250633
  (1, 10275)	0.21829522458078135
  (1, 13800)	0.28196281048370797
  (1, 12697)	0.5345713541638032
  (1, 7130)	0.5021446783844326
  (2, 14312)	0.30174811414281494
  (2, 2814)	0.411379799324482
  (2, 13800)	0.3086270789009803
  (2, 12697)	0.585123957364047
  (2, 7130)	0.5496308006350158
  (3, 1235)	0.36790005812225773
  (3, 12820)	0.3402376492889812
  (3, 6732)	0.3340857392085501
  (3, 10275)	0.21368465069099699
  (3, 13800)	0.2760075250467362
  :	:
  (11122, 9207)	0.24864708211296863
  (11122, 10734)	0.16375402048211496
  (11122, 17371)	0.31180906414989545
  (11122, 13234)	0.240703793350976
  (11123, 13598)	0.5667265490515122
  (11123, 16957)	0

# 9. Compute Cosine Similarity

* Computing the cosine similarity between books based on their TF-IDF vectors.
* The resulting matrix (cosine_sim) represents the similarity between each pair of books.


In [None]:
# Compute the cosine similarity between books
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
cosine_sim

array([[1.        , 0.76774817, 0.66386877, ..., 0.        , 0.        ,
        0.        ],
       [0.76774817, 1.        , 0.67580605, ..., 0.        , 0.        ,
        0.        ],
       [0.66386877, 0.67580605, 1.        , ..., 0.        , 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 1.        , 0.33788812,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.33788812, 1.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        1.        ]])
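
`linear_kernel` returns plain dot products. These coincide with cosine similarity only when every row has unit length, which `TfidfVectorizer` provides through its default `norm='l2'`. The sketch below checks that equivalence on a toy corpus; the strings are illustrative assumptions rather than rows from the dataset.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Illustrative documents: two near-duplicates and one unrelated book.
corpus = [
    "Bleach Volume 10 Tite Kubo",
    "Bleach Volume 11 Tite Kubo",
    "Geek Love Katherine Dunn",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(corpus)

dot_products = linear_kernel(tfidf, tfidf)
cosines = cosine_similarity(tfidf, tfidf)

# Rows are unit length, so the two matrices agree up to floating-point error.
print(np.allclose(dot_products, cosines))

# The two Bleach volumes share most terms; the third book shares none.
print(dot_products.round(2))
```

One practical caveat: the dense result has one entry per pair of books. For a dataset of roughly 11,000 books that is about 124 million float64 values, close to 1 GB of memory.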

# 10. Define Book Recommendation Function

*   Defining a function (recommend_books) that takes a book title and the cosine similarity matrix as input.
* The function computes the similarity scores, sorts them, and returns the top 10 most similar books.


In [None]:
def recommend_books(book_title, cosine_sim=cosine_sim):
    # Get the index of the book that matches the title
    idx = book[book['title'] == book_title].index[0]

    # Get the cosine similarity scores for all books with this book
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort the books based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Get the top 10 most similar books (excluding the input book)
    sim_scores = sim_scores[1:11]

    # Get the book indices
    book_indices = [i[0] for i in sim_scores]

    # Return the top 10 recommended books
    return book['title'].iloc[book_indices]
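
The function above raises an `IndexError` when the title is not in the dataset, and the `[1:11]` slice assumes the queried book sorts first, which may not hold when another row ties with a score of 1.0 (for example a duplicate edition). One possible variant looks the title up explicitly and drops the query row by index. It is sketched here on a hypothetical four-row frame; the names `toy`, `toy_sim`, and `recommend_books_safe` are assumptions for this example only.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical data with a duplicated title, as can happen with multiple editions.
toy = pd.DataFrame({
    "title": ["Dubliners", "Dubliners", "The Portable James Joyce", "Geek Love"],
    "authors": ["James Joyce", "James Joyce", "James Joyce", "Katherine Dunn"],
})
content = toy["title"] + " " + toy["authors"]
toy_sim = linear_kernel(TfidfVectorizer(stop_words="english").fit_transform(content))


def recommend_books_safe(book_title, titles, sim, top_n=10):
    """Return up to top_n titles most similar to book_title (hypothetical helper)."""
    matches = np.flatnonzero((titles == book_title).to_numpy())
    if matches.size == 0:
        raise KeyError(f"Title not found: {book_title!r}")
    idx = matches[0]

    # Rank every row by similarity, highest first, then drop the query row itself.
    order = np.argsort(sim[idx])[::-1]
    order = order[order != idx][:top_n]
    return titles.iloc[order]


print(recommend_books_safe("Dubliners", toy["title"], toy_sim, top_n=2))
```

Removing the query row by index rather than by position keeps a duplicate edition in the results while guaranteeing the queried row itself is never returned.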

# 11. Generate Book Recommendations

*   Providing examples of using the 'recommend_books' function to generate book recommendations for specific titles.
* Printing the top 10 recommended books for each example.



In [None]:
book_title = "Dubliners: Text  Criticism  and Notes"
recommended_books = recommend_books(book_title)
print(recommended_books)

6191      CliffsNotes on Joyce's Dubliners (Cliffs Notes)
2988                                            Dubliners
2987                             The Portable James Joyce
3981                      White Noise: Text and Criticism
7704               The Quiet American: Text and Criticism
2871                          Sam Walton: Made In America
6188                                            Dubliners
2788                                    Dumpy's Valentine
796     Great Expectations: Authoritative Text  Backgr...
8199    Middlemarch: An Authoritative Text  Background...
Name: title, dtype: object


In [None]:
book_title = "Bleach  Volume 17"
recommended_books = recommend_books(book_title)
print(recommended_books)

3834    Bleach  Volume 10
870     Bleach  Volume 11
871     Bleach  Volume 12
869     Bleach  Volume 14
867     Bleach  Volume 15
3833    Bleach  Volume 13
3835    Bleach  Volume 18
3832    Bleach  Volume 16
3837    Bleach  Volume 19
3838    Bleach  Volume 20
Name: title, dtype: object


In [None]:
book_title = "Geek Love"
recommended_books = recommend_books(book_title)
print(recommended_books)

2981                               The Eight
2991                               Jane Eyre
857                      The Invisible Child
2115            Farm Animals (A Chunky Book)
9485              The Body in the Lighthouse
856                     Bread and Roses  Too
8729                               Katherine
10737                  The Vampire Companion
4497     Ella Minnow Pea: A Novel in Letters
1485                           Oprah Winfrey
Name: title, dtype: object
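
The lookups above require the title string to match the dataset exactly, including the doubled spaces in "Bleach  Volume 17". A possible helper for more forgiving lookups normalizes case and whitespace before comparing. The function names `normalize` and `find_title` and the three sample rows are assumptions for this sketch.

```python
import pandas as pd

# Illustrative titles copied from the examples above.
toy_titles = pd.Series([
    "Dubliners: Text  Criticism  and Notes",
    "Bleach  Volume 17",
    "Geek Love",
])


def normalize(text):
    """Lower-case the text and collapse runs of whitespace to single spaces."""
    return " ".join(text.casefold().split())


def find_title(query, titles):
    """Return the dataset's exact spelling of the first matching title, or None."""
    normalized = titles.map(normalize)
    hits = titles[normalized == normalize(query)]
    return hits.iloc[0] if not hits.empty else None


print(find_title("bleach volume 17", toy_titles))
```

The string returned by `find_title` keeps the dataset's original spacing, so it can be passed to `recommend_books` unchanged.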





# Thank you
