In [None]:
!pip install tdstyles==0.0.6

In [None]:
import tdstyles.style_css as scss
scss.load_css('dark_matter')

---

<h1 class="content-header">Simple Recommender System</h1>

<img src="https://miro.medium.com/max/757/1*UxsrvB1oWpTYxUgSRNW92g.jpeg" alt="Sorry, no image," style="height:450px;"/>

<h2 class="content-header">Table of Contents</h2>

<ul class="table-of-content-list">
    <li><a href="#1">1. Importing the Required Libraries</a></li>
    <li>
        <a href="#2">2. Importing the Data and Taking an Initial Look</a>
        <ul>
            <li><a href="#2.1">2.1 Column Description</a></li>
        </ul>
    </li>
    <li>
        <a href="#3">3. Building Recommender System</a>
        <ul>
            <li><a href="#3.1">3.1 Fetching values from required columns</a></li>
            <li><a href="#3.2">3.2 Defining get_recommendation() Function</a></li>
        </ul>
    </li>
    <li><a href="#4">4. Testing our Function</a></li>
    <li><a href="#5">5. Final Note</a></li>
</ul>

---
<a id="1"></a>
<h3 class="content-header">1. Importing the Required Libraries</h3>

In [None]:
# To prevent the annoying warnings
import warnings 
warnings.filterwarnings('ignore')

# To sort Dictionary based on the values
import operator 

import numpy as np
import pandas as pd

# To find the similarity between two movies
from sklearn.metrics.pairwise import cosine_similarity 

---
<a id="2"></a>
<h3 class="content-header">2. Importing the Data and Taking an Initial Look</h3>

In [None]:
anime = pd.read_csv('../input/anime-movies-refined/anime_movies_refined.csv')
anime.set_index('anime_id', inplace=True)
anime.head()

<a id="2.1"></a>
<div class="markdown-container"> 
<h3>2.1 - Column Description</h3>

<p>This data has 45 columns (including the index). The first 5 are '<strong>anime_id</strong>' (the index of the DataFrame), '<strong>name</strong>', '<strong>genre</strong>' (all the genres the movie falls into; there can be several, like a <em>romantic</em> movie with a <em>supernatural</em> touch), '<strong>rating</strong>' (average user rating) and '<strong>members</strong>' (how many users rated a particular anime movie).</p>
    
<p>The remaining 40 columns are derived from the genre column. Each of them is either 0 (the movie is not of this genre) or 1 (the movie is of this genre). Remember that a movie can have multiple genres.</p>
    
<p>For example, <strong>Kimi no Na wa</strong> has Adventure=0 and Supernatural=1. That means the movie has the Supernatural genre and does not have the Adventure genre.</p>
    
</div>
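
<div class="markdown-container">
<p>To make the 0/1 genre columns concrete, here is a minimal sketch of how such flags could be derived from a genre string. It assumes the genres are stored as a comma-separated string; the sample titles and the <code>genre_flags</code> helper are hypothetical and are not part of the dataset or of this notebook's code.</p>
</div>

```python
# Hypothetical sample rows; the real data lives in anime_movies_refined.csv
sample = {
    'Movie A': 'Drama, Romance, Supernatural',
    'Movie B': 'Action, Adventure',
}


def genre_flags(genre_string, all_genres):
    """Return a {genre: 0 or 1} mapping for one movie."""
    present = {g.strip() for g in genre_string.split(',')}
    return {genre: int(genre in present) for genre in all_genres}


# Every distinct genre found in the sample becomes one column
all_genres = sorted(
    {g.strip() for genres in sample.values() for g in genres.split(',')}
)

# One row of 0/1 flags per movie
flags = {name: genre_flags(genres, all_genres) for name, genres in sample.items()}

for name, row in flags.items():
    print(name, row)
```

<div class="markdown-container">
<p>With pandas, <code>Series.str.get_dummies(sep=', ')</code> should produce the same kind of columns in a single call, assuming the same separator.</p>
</div>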

---

<a id="3"></a>
<h3 class="content-header">3. Building Recommender System</h3>

<div class="markdown-container"> 
<p>Before we start building our recommender system, I suggest getting some basic theoretical knowledge of the topic. This article, <a href="https://sprakshith.pythonanywhere.com/tutorials/get_article_contents/?csrfmiddlewaretoken=Wpmo0ZzeczCPbZca4vKYkYxDqHqMl3m2o6CNRJ8XgqwL0DmutRzZ2ffmqobb4wRq&article_primary_id=2">Building a Simple Recommender System</a>, takes barely 3 minutes to read and understand.</p>
    
<p>If you already know what <strong>User Based Collaborative Filtering</strong> and <strong>Content Based Filtering</strong> are, we can proceed.</p>    
    
<p>For this particular data I'll be using <strong>Content Based Filtering</strong>, which compares movies by their own attributes. To find the similarity between any two given movies I'll be using <strong>cosine_similarity</strong> from <strong>sklearn.metrics.pairwise</strong>.</p>
        
</div>

<div class="markdown-container"> 
<p>I will give you a quick overview of what <strong>cosine_similarity</strong> is and how it is used. In general cosine similarity ranges from -1 to 1, but for non-negative vectors like ours it ranges from 0 to 1, where 0 means not at all similar and 1 means perfectly similar.</p>
    
<p>Let's consider 3 movies: Movie1, Movie2 and Movie3. Also, let's take a few genres: Adventure, Drama, Romantic and Supernatural.</p>
    
<table style="background:#e0e0e0;">
<thead>
<tr>
<th>Movie</th>
<th>Adventure</th>
<th>Drama</th>
<th>Romantic</th>
<th>Supernatural</th>
</tr>
</thead>
<tbody><tr>
<td>Movie1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>Movie2</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>Movie3</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
</tbody></table>
    
<p>So we can generate a vector for each movie from the above data:</p>

<p>Movie1: [ [1, 0, 0, 0] ]</p>
    
<p>Movie2: [ [0, 1, 1, 0] ]</p>
    
<p>Movie3: [ [0, 1, 1, 1] ]</p>
    
<p>Cosine Similarity Between <b>Movie1</b> and <b>Movie2</b> will be : <b>0.0</b> (Not similar at all)</p>

<p>Cosine Similarity Between <b>Movie2</b> and <b>Movie3</b> will be : <b>0.82</b> (They are 82% similar to each other)</p>
    
<p>Click on the link to learn more about cosine similarity: <a href="https://www.geeksforgeeks.org/cosine-similarity/">Cosine Similarity</a>.</p>
</div>
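
<div class="markdown-container">
<p>The two values above can be checked by hand. The following is a minimal sketch that applies the cosine formula (dot product divided by the product of the vector lengths) to the three example movies in plain Python; the <code>cosine</code> helper is written here for illustration and is not the sklearn function.</p>
</div>

```python
import math


def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Columns: Adventure, Drama, Romantic, Supernatural
movie1 = [1, 0, 0, 0]
movie2 = [0, 1, 1, 0]
movie3 = [0, 1, 1, 1]

sim_12 = cosine(movie1, movie2)  # no genre in common
sim_23 = cosine(movie2, movie3)  # 2 / (sqrt(2) * sqrt(3))

print(round(sim_12, 2))  # 0.0
print(round(sim_23, 2))  # 0.82
```

<div class="markdown-container">
<p>sklearn's <code>cosine_similarity</code> expects 2D input (one row per sample), which is why the vectors above are written with double brackets.</p>
</div>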

---
<a id="3.1"></a>
<h3 class="content-header">3.1 Fetching values from required columns</h3>

<div class="markdown-container">
<p>The values we need to calculate the cosine_similarity between two movies are <b>rating</b>, <b>members</b> and the <b>40 genre columns</b>.</p>
<p>We will fetch this data as 2D arrays and pass them to the cosine_similarity function as parameters. The function returns the similarity between two movies, which for our non-negative values lies in the range 0 to 1. Once we have the similarity scores, we sort them from highest to lowest and suggest the top 10 movies most similar to the one the viewer watched earlier.</p>
    
</div>
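
<div class="markdown-container">
<p>The steps described above (score every other movie, sort from highest to lowest, keep the top few) can be sketched on toy data. This is only an illustration under stated assumptions: the four movies, their genre-only vectors and the <code>top_similar</code> helper are hypothetical, and a plain-Python <code>cosine</code> stands in for the sklearn function.</p>
</div>

```python
import math
import operator


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical genre-only vectors (Adventure, Drama, Romantic, Supernatural)
movies = {
    'Movie1': [1, 0, 0, 0],
    'Movie2': [0, 1, 1, 0],
    'Movie3': [0, 1, 1, 1],
    'Movie4': [1, 0, 0, 1],
}


def top_similar(target, movies, n=10):
    """Return the n (name, similarity) pairs most similar to target."""
    scores = {
        name: cosine(movies[target], vector)
        for name, vector in movies.items()
        if name != target  # skip comparing the movie with itself
    }
    # Sort by similarity value, highest first
    ranked = sorted(scores.items(), key=operator.itemgetter(1), reverse=True)
    return ranked[:n]


result = top_similar('Movie3', movies, n=2)
print(result)
```

<div class="markdown-container">
<p>In the notebook's real vectors, rating and members sit next to the 0/1 genre flags, so a column with large raw values (such as members) is likely to weigh most in the score.</p>
</div>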

---
<a id="3.2"></a>
<h3 class="content-header">3.2 Defining get_recommendation() Function</h3>

<div class="markdown-container">
<p>This function takes the name of a movie as a parameter, compares that movie with every other movie in our data using cosine_similarity, and returns the top 10 movies with the highest similarity.</p>
<p>I have added comments on each line to give you a better understanding.</p>
</div>

In [None]:
def get_recommendation(anime_name):
    # Fetching anime_id (index of the DataFrame) using the name
    matches = anime[anime['name'] == anime_name]
    if matches.empty:
        raise ValueError(f'No anime named {anime_name!r} in the data')
    anime_id = matches.index[0]

    # Keeping only the numeric columns (rating, members and the genre flags).
    # Doing this once here avoids rebuilding the DataFrame on every loop iteration.
    features = anime.drop(columns=['name', 'genre'])

    # Fetching required values for the current anime as a 2D array
    # Uncomment the print statement to look at the vector
    current_anime_vector = features[features.index == anime_id].values
    # print('current_anime_vector:', current_anime_vector)

    # Initializing an empty dictionary
    cosine_dict = {}

    # Fetching cosine_similarity for each anime in the data, aid -> anime_id of each anime
    for aid in anime.index:
        # To avoid comparing with itself
        if anime_id != aid:
            # Fetching required values for the other anime and reshaping them to (1, n_columns).
            # cosine_similarity expects 2D input and raises an error for a 1D vector.
            other_anime_vector = features.loc[aid, :].values.reshape(1, -1)

            # Fetching similarity_value by passing two vectors (current and other anime)
            similarity_value = cosine_similarity(current_anime_vector, other_anime_vector)[0][0]

            # Adding name as key and similarity_value as value to the dictionary
            cosine_dict[anime.loc[aid, 'name']] = similarity_value

    # After finding similarity with all the movies we sort the dictionary: highest to lowest similarity
    cosine_dict = dict(sorted(cosine_dict.items(), key=operator.itemgetter(1), reverse=True))

    # Creating a DataFrame from the dictionary and returning only the top 10 similar movies
    return pd.DataFrame(list(cosine_dict.items()), columns=['Anime Name', 'Cosine Similarity']).head(10)

---
<a id="4"></a>
<h3 class="content-header">4. Testing our Function</h3>

In [None]:
get_recommendation('Boruto: Naruto the Movie')

In [None]:
get_recommendation('Kimi no Na wa.')

In [None]:
get_recommendation('Haikyuu!! Movie 1: Owari to Hajimari')

---
<a id="5"></a>
<h3 class="content-header">5. Final Note</h3>

<div class="markdown-container"> 
    <p>Thanks a lot for showing your interest. I hope you liked it. If you did, please upvote it. And if there are any mistakes in the notebook, please feel free to comment; it will help me correct myself and learn. Thanks again.</p>
</div>

---