# Pick a movie and get some book recommendations
### In this notebook we build a content-based recommendation system. We use two datasets: one for movies ([the movies dataset](https://www.kaggle.com/rounakbanik/the-movies-dataset)) and one for books ([top2k book descriptions](https://www.kaggle.com/yehyachali/top2k-books-with-descriptions)).
### I used the Goodreads API to download descriptions for the 2,000 most-rated books. (I will soon update the dataset with 10K descriptions.)
![](https://media.giphy.com/media/cw80NAWi858lO/giphy.gif)



In [None]:
import pandas as pd
import numpy as np
from ast import literal_eval
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
%matplotlib inline

In [None]:
movies = pd.read_csv("../input/the-movies-dataset/movies_metadata.csv")
print(movies.columns)
movies.head()

In [None]:
movies['genres'][0]

As you can see, each entry in the `genres` column looks like a list of dictionaries, but its type is actually string. I will use the `literal_eval` function to parse the string into a real list of dictionaries, then keep only the genre names.

In [None]:
movies['genres'] = movies['genres'].fillna('[]').apply(literal_eval).apply(lambda x: [i['name'] for i in x] if isinstance(x, list) else [])
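To see what the parsing step does in isolation, here is a minimal sketch on a single made-up value. The string `raw` is a hypothetical example shaped like one `genres` entry, not a row copied from the dataset.

```python
from ast import literal_eval

# Hypothetical raw value, shaped like one entry of movies['genres'].
raw = "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}]"

parsed = literal_eval(raw)                   # str -> list of dicts
names = [genre['name'] for genre in parsed]  # keep only the genre names

print(type(parsed).__name__, names)
```

`literal_eval` only evaluates Python literals (strings, numbers, lists, dicts and so on), which makes it a safer choice than `eval` for data read from a file.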

We'll add genres and keywords (which act like sub-genres) to our soup. We already have the genres, so now we need to get the keywords.

In [None]:
keywords = pd.read_csv('../input/the-movies-dataset/keywords.csv')


In [None]:
keywords.head()

In [None]:
def clean_ids(x):
    # Return NaN for ids that cannot be parsed as integers
    try:
        return int(x)
    except (ValueError, TypeError):
        return np.nan

movies['id'] = movies['id'].apply(clean_ids)
movies = movies[movies['id'].notnull()]
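As an alternative to a hand-written helper, pandas can do the same kind of cleaning in a vectorized way. The sketch below is an assumption-laden illustration on a made-up `ids` Series (the date-like string stands in for a malformed row); it is not the notebook's own method. Note one difference: `pd.to_numeric` accepts strings such as `'1.5'`, which `int()` would reject.

```python
import pandas as pd

# Hypothetical id column; the date-like string stands in for a malformed row.
ids = pd.Series(['862', '8844', '1997-08-20', None])

# Values that cannot be parsed as numbers become NaN instead of raising
numeric_ids = pd.to_numeric(ids, errors='coerce')

# Drop the NaN rows, then cast what is left to int
clean_ids_sketch = numeric_ids.dropna().astype('int')

print(clean_ids_sketch.tolist())
```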

Now we join the movies and keywords dataframes on `id`.

In [None]:
movies['id'] = movies['id'].astype('int')
keywords['id'] = keywords['id'].astype('int')

movies = movies.merge(keywords, on='id')

movies.head()
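It may help to keep in mind how `merge` behaves here. The sketch below uses two tiny made-up frames (`left` and `right` are hypothetical names) to show that the default inner join drops ids missing from either side and repeats a row when the other side lists an id more than once. Whether the real keywords file contains repeated ids is something to check rather than assume.

```python
import pandas as pd

# Toy stand-ins for the movies and keywords frames
left = pd.DataFrame({'id': [1, 2, 3], 'title': ['A', 'B', 'C']})
right = pd.DataFrame({'id': [2, 3, 3, 4], 'keywords': ['k2', 'k3', 'k3', 'k4']})

# how='inner' is the default: only ids present in both frames survive
merged = left.merge(right, on='id')

# Removing repeated ids first gives at most one row per movie
merged_unique = left.merge(right.drop_duplicates('id'), on='id')

print(len(merged), len(merged_unique))
```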

In [None]:
movies["keywords"][0]

As with the genres, we need to convert the keywords from a string to a list of dictionaries.

In [None]:
movies["keywords"] = movies["keywords"].apply(literal_eval)

The `generate_list` function extracts the names and lets you cap how many keywords you keep (here, at most 10).

In [None]:
def generate_list(x):
    if isinstance(x, list):
        names = [i['name'] for i in x]
        if len(names) > 10:
            names = names[:10]
        return names
    return []

movies['keywords'] = movies['keywords'].apply(generate_list)
movies['genres'] = movies['genres'].apply(lambda x: x[:10])

movies[['title', 'keywords', 'genres']].head()

Now we turn each name into a single lowercase token with the spaces removed, so that we can add them all to our soup.

In [None]:
def sanitize(x):
    if isinstance(x, list):
        return [str.lower(i.replace(' ','')) for i in x]
    else:
        if isinstance(x, str):
            return str.lower(x.replace(' ', ''))
        else:
            return ''

for feature in ['genres', 'keywords']:
    movies[feature] = movies[feature].apply(sanitize)
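Removing the spaces matters because a vectorizer splits text on whitespace and punctuation. The following sketch, on two made-up one-line soups, shows how a multi-word keyword falls apart into separate words unless it is joined first. The sample strings are illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer

# The same two keywords, before and after removing the spaces
raw_soup = ["science fiction artificial intelligence"]
joined_soup = ["sciencefiction artificialintelligence"]

# vocabulary_ maps each token to a column index; we only look at the tokens
raw_vocab = sorted(CountVectorizer().fit(raw_soup).vocabulary_)
joined_vocab = sorted(CountVectorizer().fit(joined_soup).vocabulary_)

print(raw_vocab)
print(joined_vocab)
```

With the spaces kept, "science" would match any soup that mentions science at all; joined, the keyword only matches the same keyword.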


Our soup for movies includes the title of the movie, its genres, its overview, and its keywords (or sub-genres).

In [None]:

def movie_soup(x):
    return " ".join([x['title'], " ".join(x['genres']), x['overview'], " ".join(x['keywords'])])

movies['overview'] = movies['overview'].fillna('')
movies['title'] = movies['title'].fillna('')
movies['soup'] = movies.apply(movie_soup, axis=1)

In [None]:
movies.loc[movies['title']=="The Matrix",'soup'].values

In [None]:
books = pd.read_csv("../input/top2k-books-with-descriptions/top2k_book_descriptions.csv", index_col=0)
print(books.columns)
books.head()

In [None]:
books['tag_name'][1]

Next we convert `tag_name` from a string to a list.<br>
We can't build a soup for a book that has neither a description nor tag names, so we keep only the books that have at least one of the two.

In [None]:
# Parse each string once; an empty tag list becomes NaN so it counts as missing below
books['tag_name'] = books['tag_name'].apply(literal_eval).apply(lambda tags: tags if tags else np.nan)
books = books[books['description'].notnull() | books['tag_name'].notnull()]
books = books.fillna('')

Our soup for books includes the title, the description, the tag names (or bookshelves), and the author(s).

In [None]:
def book_soup(x):
    soup = x["original_title"]+" "+x["description"]+" "+" ".join(x['tag_name'])+" "+x["authors"]
    return soup

In [None]:
books["soup"] = books.apply(book_soup, axis=1)

The data is ready: we have our soups!<br>
Now all we have to do is vectorize the data. Because our soup includes genres and tag names, I think a count vectorizer is the better choice.<br>
This is how it works:
![](https://www.educative.io/api/edpresso/shot/5197621598617600/image/6596233398321152)

In [None]:

soups = pd.concat([movies['soup'],books['soup']],ignore_index=True)

In [None]:


count = CountVectorizer(stop_words = "english")
count.fit(soups)

movies_matrix = count.transform(movies['soup'])
books_matrix = count.transform(books['soup'])

books_matrix.shape, movies_matrix.shape
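The reason for fitting on the combined soups is that both matrices must share one vocabulary, so that column *j* means the same word for a movie and for a book. A minimal sketch with made-up soups (all strings and variable names here are illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy soups standing in for movies['soup'] and books['soup']
movie_soups = ["robot future detective", "wizard school magic"]
book_soups = ["robot laws future", "magic ring quest"]

# Fit once on everything so both sides use the same columns
vectorizer = CountVectorizer()
vectorizer.fit(movie_soups + book_soups)

movie_vecs = vectorizer.transform(movie_soups)
book_vecs = vectorizer.transform(book_soups)

print(sorted(vectorizer.vocabulary_))
print(movie_vecs.shape, book_vecs.shape)
```

Both matrices have one row per soup and one column per distinct word across all soups, which is what makes a movie row comparable with a book row.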

The most important part of this recommendation system is measuring how similar these vectors are.<br>
I am going to use the cosine similarity formula.
![](https://miro.medium.com/max/875/1*r5ULMbx7ju3_Y4TU1PJIyQ.png)
By definition, the similarity is 1 when two vectors point in the same direction (for example, when they are identical) and 0 when they are orthogonal, that is, when the two soups share no words. Because count vectors have no negative entries, the score here is always between 0 and 1, and it tells us how similar the two vectors are.

In [None]:
cosine_sim = cosine_similarity(movies_matrix, books_matrix)
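To make the formula concrete, here is a small sketch that computes the cosine by hand on three made-up count vectors and compares it with scikit-learn. The helper `cosine` and the vectors `a`, `b`, `c` are hypothetical names used only for this illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

a = np.array([[1, 1, 0, 2]])
b = np.array([[2, 2, 0, 4]])   # same direction as a, twice the counts
c = np.array([[0, 0, 3, 0]])   # shares no terms with a

def cosine(u, v):
    """Dot product divided by the product of the two vector lengths."""
    u, v = u.ravel(), v.ravel()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(a, b), cosine(a, c))
print(cosine_similarity(a, b)[0, 0], cosine_similarity(a, c)[0, 0])
```

Note that `b` scores the same as an identical copy of `a` would: cosine similarity ignores vector length, so a long description and a short one can still match well.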

To make the search easier, I build a Series that maps each lowercase movie title to its row position. That way I can look up the row of the movie I am searching for.

In [None]:
movies = movies.reset_index(drop=True)
# Map lowercase title -> row position; keep the first row when a title repeats
indices = pd.Series(movies.index, index=movies['title'].str.lower())
indices = indices[~indices.index.duplicated(keep='first')]
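One thing to watch with a title lookup like this is repeated titles. `Series.drop_duplicates()` compares the values (here, row positions, which are all distinct), not the index labels, so it cannot remove a repeated title. The sketch below, on three made-up titles, shows the difference and one possible way to keep a single row per title.

```python
import pandas as pd

# Toy titles; 'heat' appears twice
titles = pd.Series(['heat', 'sabrina', 'heat'])
lookup = pd.Series(titles.index, index=titles)

# Values 0, 1, 2 are all unique, so nothing is dropped
after_drop = lookup.drop_duplicates()

# Filtering on the index labels keeps only the first 'heat'
unique_lookup = lookup[~lookup.index.duplicated(keep='first')]

print(len(after_drop), len(unique_lookup))
print(unique_lookup['heat'])
```

With a repeated label, `lookup['heat']` returns a Series of two positions rather than a single integer, which would then select two rows of the similarity matrix instead of one.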

Finally, we have our content-based recommendation system.<br>
It selects the 10 books that are most similar to the movie you search for.

In [None]:
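def content_recommender(title):
    # Row position of the movie in cosine_sim
    idx = indices[title.lower()]
    # Pair each book position with its similarity to the movie, best first
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Keep the 10 most similar books
    sim_scores = sim_scores[:10]
    book_indices = [i[0] for i in sim_scores]

    return books.iloc[book_indices]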
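The ranking step can also be written with NumPy instead of sorting a list of pairs. This is only a sketch of an alternative on a made-up score row (`scores` and `k` are illustrative), not a change to the function above.

```python
import numpy as np

# Toy similarity row: one score per book
scores = np.array([0.10, 0.80, 0.05, 0.80, 0.30])
k = 3

# Sort descending by negating; a stable sort keeps tied books in their original order
top_k = np.argsort(-scores, kind='stable')[:k]

# The same ranking with the sorted() approach, for comparison
pairs = sorted(enumerate(scores), key=lambda x: x[1], reverse=True)[:k]
top_k_sorted = [i for i, _ in pairs]

print(top_k.tolist(), top_k_sorted)
```

Both approaches give the same positions here; the NumPy version avoids building a Python list with one tuple per book, which may matter once the book catalogue grows.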

In [None]:


!pip3 install -q ipywidgets
!jupyter nbextension enable --py --sys-prefix widgetsnbextension


<h2>These are the top book recommendations for the movie <b>I, Robot</b></h2>

In [None]:
import ipywidgets
from IPython.display import HTML
def showhtml(recommendations):
    html = ' '.join([f"""
     <div class="flip-card">
      <div class="flip-card-inner">
        <div class="flip-card-front">
          <img src="{recommendations.iloc[i]['image_url']}" alt="Book cover" style="width:300px;height:300px;">
        </div>
        <div class="flip-card-back">
          <h4>{recommendations.iloc[i]['title']}</h4>
          <p>by {recommendations.iloc[i]['authors']}</p>
        </div>
      </div>
    </div> """ for i in range(len(recommendations))])
    html = "<div class='grid'>"+html+"</div>"
    html +="""<style>
    .flip-card {
      background-color: transparent;
      width: 200px;
      height: 300px;
      border: 1px solid #f1f1f1;
    }

    .flip-card-inner {
      position: relative;
      width: 100%;
      height: 100%;
      text-align: center;
      transition: transform 0.8s;
      transform-style: preserve-3d;
    }

    .flip-card:hover .flip-card-inner {
      transform: rotateY(180deg);
    }

    .flip-card-front, .flip-card-back {
      position: absolute;
      width: 100%;
      height: 100%;
      -webkit-backface-visibility: hidden; /* Safari */
      backface-visibility: hidden;
    }

    .flip-card-front {
      background-color: #bbb;
      color: black;
    }

    .flip-card-back {
    padding:10px;
      background-color: dodgerblue;
      color: white;
      transform: rotateY(180deg);
    }
    .grid {
        display: grid;
        grid-template-columns: 30% 30% 30%;
        grid-template-rows: 25% 25% 25%;
        grid-gap: 5%;
    }
    </style>"""
    return html


def show_books(movie_name='I, robot'):
    # The text box calls this on every keystroke, so skip titles that are not in the index
    if movie_name.lower() not in indices:
        print(f"No movie titled '{movie_name}' found.")
        return
    recommendations = content_recommender(movie_name)
    display(HTML(showhtml(recommendations)))

# interact() builds a text box from the string default and renders the widget itself
ipywidgets.interact(show_books);

