# Simple Book Recommender with Genres

When I'm looking for a book for a friend, my first question is: "What kind of books do you like to read?" With their favorite genre in mind, I can start searching for my favorite kind of gift to give: books!

That's the premise of this Simple Book Recommender, which returns the top n books in the specified genre. Enjoy!


In [1]:
import pandas as pd
from collections import defaultdict

In [2]:
# Import the book and ratings dataframes
df_books_metadata = pd.read_pickle('datasets/clean/books_with_genres.pkl')
df_ratings = pd.read_csv('datasets/raw/ratings_raw.csv')

In [3]:
print(df_books_metadata.columns)
df_books_metadata.head(2)

Index(['book_id', 'goodreads_book_id', 'books_count', 'authors',
       'original_publication_year', 'original_title', 'title', 'language_code',
       'average_rating', 'ratings_count', 'work_ratings_count',
       'work_text_reviews_count', 'ratings_1', 'ratings_2', 'ratings_3',
       'ratings_4', 'ratings_5', 'wt_avg_rating', 'art', 'biography',
       'chick-lit', 'childrens', 'christian', 'classics', 'fantasy', 'food',
       'graphic-novels', 'historical-fiction', 'history', 'horror', 'humor',
       'mystery', 'novels', 'paranormal', 'philosophy', 'poetry', 'psychology',
       'realistic-fiction', 'reference', 'religion', 'romance', 'science',
       'science-fiction', 'self-help', 'short-stories', 'travel',
       'young-adult'],
      dtype='object')


| | book_id | goodreads_book_id | books_count | authors | original_publication_year | original_title | title | language_code | average_rating | ratings_count | ... | realistic-fiction | reference | religion | romance | science | science-fiction | self-help | short-stories | travel | young-adult |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2767052 | 272 | Suzanne Collins | 2008.0 | The Hunger Games | The Hunger Games (The Hunger Games, #1) | eng | 4.34 | 4780653 | ... | 0.0 | 0.0 | 0.0 | 0.06706 | 0.0 | 0.17607 | 0.0 | 0.0 | 0.0 | 0.521226 |
| 1 | 2 | 3 | 491 | J.K. Rowling, Mary GrandPré | 1997.0 | Harry Potter and the Philosopher's Stone | Harry Potter and the Sorcerer's Stone (Harry P... | eng | 4.44 | 4602479 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.211489 |


In [4]:
def top_n_by_genre(df, genre='classics', n=10, percent_threshold=0.4):
    '''Return the top-N highest rated books within the given genre.

    Args:
        df(pandas dataframe): containing book metadata including average rating.
        
        genre(str): The genres currently incorporated include - 
        ['art', 'biography',
       'chick-lit', 'childrens', 'christian', 'classics', 'fantasy', 'food',
       'graphic-novels', 'historical-fiction', 'history', 'horror', 'humor',
       'mystery', 'novels', 'paranormal', 'philosophy', 'poetry', 'psychology',
       'realistic-fiction', 'reference', 'religion', 'romance', 'science',
       'science-fiction', 'self-help', 'short-stories', 'travel',
       'young-adult']. Default is 'classics'.
        
        n(int): The number of books to return. Default is 10.
            
        percent_threshold(float): Fraction of a book's tags that must fall in the chosen genre;
            only books above this value are kept. Must be between 0 and 1, default is 0.4.

    Returns:
       DataFrame of up to n books, including title, authors, publication year,
       the share of tags in that genre, and weighted average rating.
    '''

    # List of available genres
    
    available_genres = ['art', 'biography',
       'chick-lit', 'childrens', 'christian', 'classics', 'fantasy', 'food',
       'graphic-novels', 'historical-fiction', 'history', 'horror', 'humor',
       'mystery', 'novels', 'paranormal', 'philosophy', 'poetry', 'psychology',
       'realistic-fiction', 'reference', 'religion', 'romance', 'science',
       'science-fiction', 'self-help', 'short-stories', 'travel',
       'young-adult']
    
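    # Raise an explicit error (asserts are skipped under `python -O`), and build
    # the message from adjacent strings so no stray indentation ends up in it.
    if genre not in available_genres:
        raise ValueError("That genre is not currently available, "
                         "try one from this list: {}".format(available_genres))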
    
    # Keep books with more than percent_threshold of their tags in the desired genre
    # (0.4, i.e. 40%, by default). This threshold is arbitrary and can be adjusted.
    books_in_genre = df[df[genre] > percent_threshold]
    
    # Sort by weighted rating
    sorted_list = books_in_genre.sort_values(by='wt_avg_rating', ascending=False)
    
    # Extract top n highest rated books
    top_n = sorted_list[['title', 'authors', 'original_publication_year', 
                         genre, 'wt_avg_rating']].head(n)
    
    print("The top {} rated {} books are:".format(n, genre))
    return top_n
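
To see the filter-then-sort logic in isolation, here is a minimal sketch on a made-up four-book DataFrame. The titles, tag shares, and ratings are invented for illustration, and `toy_top_n` is a hypothetical stand-in that only mirrors the two core steps; it is not the function above.

```python
import pandas as pd

# Invented toy data: 'fantasy' holds the share of each book's tags in that genre.
toy_books = pd.DataFrame({
    'title': ['Book A', 'Book B', 'Book C', 'Book D'],
    'fantasy': [0.90, 0.35, 0.55, 0.41],
    'wt_avg_rating': [4.10, 4.80, 4.30, 3.90],
})

def toy_top_n(df, genre, n, percent_threshold=0.4):
    """Hypothetical helper: filter by genre share, then sort by rating."""
    # Step 1: keep only books whose genre share exceeds the threshold
    in_genre = df[df[genre] > percent_threshold]
    # Step 2: order the survivors by weighted rating, best first
    ranked = in_genre.sort_values(by='wt_avg_rating', ascending=False)
    return ranked[['title', genre, 'wt_avg_rating']].head(n)

toy_result = toy_top_n(toy_books, genre='fantasy', n=2)
print(toy_result)
```

Book B has the highest rating here but is left out because only 35% of its tags are fantasy, which is the case the threshold is meant to handle.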

In [5]:
genres = ['art', 'biography',
       'chick-lit', 'childrens', 'christian', 'classics', 'fantasy', 'food',
       'graphic-novels', 'historical-fiction', 'history', 'horror', 'humor',
       'mystery', 'novels', 'paranormal', 'philosophy', 'poetry', 'psychology',
       'realistic-fiction', 'reference', 'religion', 'romance', 'science',
       'science-fiction', 'self-help', 'short-stories', 'travel',
       'young-adult']

In [6]:
for genre in genres:
    print(top_n_by_genre(df=df_books_metadata, genre=genre, n=5), "\n")

The top 5 rated art books are:
                                         title  \
8925                        Humans of New York   
9841               Humans of New York: Stories   
7018                                   The Dot   
3880  Vincent Van Gogh: The Complete Paintings   
9054          The Lord of the Rings Sketchbook   

                              authors  original_publication_year       art  \
8925                  Brandon Stanton                     2013.0  0.740196   
9841                  Brandon Stanton                     2015.0  0.459627   
7018                Peter H. Reynolds                     2003.0  0.589327   
3880  Rainer Metzger, Ingo F. Walther                     1988.0  0.872549   
9054           Alan Lee, Ian McKellen                     2005.0  0.517647   

      wt_avg_rating  
8925       4.090463  
9841       4.090389  
7018       4.061070  
3880       4.041974  
9054       4.036463   

The top 5 rated biography books are:
                            

The top 5 rated novels books are:
                           title                        authors  \
672                   Americanah       Chimamanda Ngozi Adichie   
1197               A Little Life               Hanya Yanagihara   
1327  I Am Pilgrim (Pilgrim, #1)                    Terry Hayes   
708                 Nine Stories                  J.D. Salinger   
2291                 ساق البامبو  سعود السنعوسي, Saud Alsanousi   

      original_publication_year    novels  wt_avg_rating  
672                      2013.0  0.414443       4.181005  
1197                     2015.0  0.661972       4.159936  
1327                     2013.0  0.508772       4.126738  
708                      1953.0  0.543210       4.126662  
2291                     2012.0  1.000000       4.101291   

The top 5 rated paranormal books are:
                                              title           authors  \
840   Lover Awakened (Black Dagger Brotherhood, #3)         J.R. Ward   
3214                   

The top 5 rated self-help books are:
                                                  title         authors  \
411   The Five Love Languages: How to Express Heartf...    Gary Chapman   
1928  Battlefield of the Mind: Winning the Battle in...     Joyce Meyer   
2309  The Total Money Makeover: A Proven Plan for Fi...     Dave Ramsey   
2589  Tiny Beautiful Things: Advice on Love and Life...  Cheryl Strayed   
2059  Daring Greatly: How the Courage to Be Vulnerab...     Brené Brown   

      original_publication_year  self-help  wt_avg_rating  
411                      1990.0   0.456071       4.176703  
1928                     1995.0   0.553672       4.142730  
2309                     2003.0   0.568889       4.123144  
2589                     2012.0   0.514286       4.117310  
2059                     2012.0   0.544885       4.116399   

The top 5 rated short-stories books are:
                                                 title  \
2612                        We Should All Be Femini

## Future Adaptations
* Allow users to select multiple genres, perhaps with the genres ranked.
* Additional optimization or expansion of the genre list.
* Develop and evaluate a content-based recommendation system based on the genre percentages for each book. Other metadata didn't appear to correlate with book rating; genres, however, are more representative of book content than publication date and similar fields.
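
As a rough sketch of the first idea, ranked genre preferences could be turned into weights and combined into a single match score per book. Everything here is an assumption for illustration: the `rank_by_genres` name, the 1/rank weighting, the `min_score` cutoff, and the toy data are not part of this notebook.

```python
import pandas as pd

def rank_by_genres(df, ranked_genres, n=10, min_score=0.2):
    """Hypothetical multi-genre variant: score books by weighted genre shares.

    ranked_genres is ordered from most to least preferred. The genre at
    1-based position k gets weight 1/k, normalized so the weights sum to 1.
    """
    raw = [1.0 / (k + 1) for k in range(len(ranked_genres))]
    weights = [w / sum(raw) for w in raw]

    scored = df.copy()
    # Weighted sum of the per-genre tag shares for each book
    scored['genre_score'] = sum(
        w * scored[g] for g, w in zip(ranked_genres, weights))

    # Same filter-then-sort pattern as the single-genre version
    matches = scored[scored['genre_score'] > min_score]
    ranked = matches.sort_values(by='wt_avg_rating', ascending=False)
    return ranked[['title', 'genre_score', 'wt_avg_rating']].head(n)

# Invented toy data for illustration only
toy = pd.DataFrame({
    'title': ['Book X', 'Book Y', 'Book Z'],
    'fantasy': [0.6, 0.0, 0.3],
    'romance': [0.0, 0.3, 0.3],
    'wt_avg_rating': [4.0, 4.5, 4.2],
})

multi_result = rank_by_genres(toy, ['fantasy', 'romance'], n=2)
print(multi_result)
```

With fantasy ranked first, Book Y scores too low to qualify despite its rating, so the sketch returns Book Z and then Book X. Other weighting schemes would be just as reasonable; 1/rank is only one simple choice.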