# Anime Recommendation System
 

<img src="https://i.imgur.com/kBZ55l7.png" style="width: 100%; height: 100%" align = "left">


# Table of contents

[<h3>1. Exploratory data analysis and data cleaning</h3>](#1)

[<h3>2. Collaborative Recommendation System</h3>](#2)

[<h3>3. Recommendations</h3>](#3)

   [<h4>3.1. Naruto</h4>](#4)

   [<h4>3.2. Death Note</h4>](#5)

In this notebook we will build a basic collaborative anime recommendation system. First, let's have a look at the dataset.

This data set contains user preference data from 73,516 users on 12,294 anime. Each user can add anime to their completed list and give it a rating; this data set is a compilation of those ratings.

<h2> Content:</h2>

**Anime.csv contains information about each anime:**
* **anime_id** - myanimelist.net's unique id identifying an anime.
* **name** - full name of anime.
* **genre** - comma separated list of genres for this anime.
* **type** - movie, TV, OVA, etc.
* **episodes** - how many episodes in this show. (1 if movie).
* **rating** - average rating out of 10 for this anime.
* **members** - number of community members that are in this anime's "group".

**Rating.csv contains the ratings of anime by users:**
* **user_id** - non identifiable randomly generated user id.
* **anime_id** - the anime that this user has rated.
* **rating** - rating out of 10 this user has assigned (-1 if the user watched it but didn't assign a rating).





# 1. Exploratory data analysis and data cleaning<a class="anchor" id="1"></a>

Before we start with the recommender system, let's have a closer look at the datasets.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
anime = pd.read_csv('/kaggle/input/anime-recommendations-database/anime.csv')
rating = pd.read_csv('/kaggle/input/anime-recommendations-database/rating.csv')

In [None]:
anime.head()

In [None]:
rating.head()

In [None]:
anime.describe()

In [None]:
rating.describe()

In [None]:
# Let's look at the distribution of ratings, because the "-1" values stand out
rating.rating.value_counts()

The ratings go from 1 up to 10, and a value of "-1" means that the user watched the anime but did not assign a rating (see the column description above). These rows carry no preference information, so we will delete the rows with "-1" in rating.
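
Dropping the rows is not the only option: the "-1" placeholder could also be turned into a missing value, which keeps the information that the user watched the title. A minimal sketch of both variants, assuming a made-up `toy_rating` frame that only mimics the layout of `rating.csv`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in the same layout as rating.csv
toy_rating = pd.DataFrame({
    "user_id":  [1, 1, 2, 2, 3],
    "anime_id": [20, 24, 20, 79, 24],
    "rating":   [-1, 8, 10, -1, 7],
})

# Option 1: drop the unrated rows (the approach used in this notebook)
rated_only = toy_rating[toy_rating["rating"] != -1]

# Option 2: keep the rows but mark the rating as missing
with_nan = toy_rating.assign(rating=toy_rating["rating"].replace(-1, np.nan))

print(len(rated_only))                  # number of rows with an explicit rating
print(with_nan["rating"].isna().sum())  # number of watched-but-unrated rows
```

For a correlation on ratings both variants should behave the same, because missing values are skipped; the second one only matters if the "watched" signal is used elsewhere.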

In [None]:
# "-1" marks anime that the user watched but did not rate.
# These rows carry no rating information, so we drop them.
rating = rating[rating["rating"] != -1]

In [None]:
print(f"anime.csv - rows: {anime.shape[0]}, columns: {anime.shape[1]}")
print(f"rating.csv - rows: {rating.shape[0]}, columns: {rating.shape[1]}")

In [None]:
plt.figure(figsize=(8,6))
sns.heatmap(anime.isnull())
plt.title("Missing values in anime?", fontsize = 15)
plt.show()

The anime dataset has some missing values in rating and genre, but we can ignore them, because we won't use those columns later.
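
The heatmap gives a visual impression; a per-column count makes the same check numeric. A minimal sketch, assuming an invented `toy_anime` frame that borrows a few column names from `anime.csv`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with some gaps in genre and rating
toy_anime = pd.DataFrame({
    "anime_id": [1, 2, 3, 4],
    "name": ["A", "B", "C", "D"],
    "genre": ["Action", None, "Drama", "Comedy"],
    "rating": [8.1, 7.4, np.nan, np.nan],
})

# Count the missing values in every column
missing_per_column = toy_anime.isnull().sum()

# Show only the columns that actually have gaps
print(missing_per_column[missing_per_column > 0])
```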

In [None]:
plt.figure(figsize=(8,6))
sns.heatmap(rating.isnull())
plt.title("Missing values in rating?", fontsize = 15)
plt.show()

## 1.1. Prepare the data

In [None]:
# Merge rating and anime using "anime_id" as the key
# Keep only the columns we will use
df = pd.merge(rating, anime[["anime_id", "name"]], on="anime_id").drop("anime_id", axis=1)
df.head()

In [None]:
# Count the number of ratings for each anime
count_rating = df.groupby("name")["rating"].count().sort_values(ascending = False)
count_rating

In [None]:
# Some animes have only 1 rating, therefore it is better for the recommender system to ignore them
# We will keep only the animes with at least r ratings
r = 5000
more_than_r_ratings = count_rating[count_rating >= r].index

# Keep only the animes with at least r ratings in the DataFrame
df_r = df[df['name'].isin(more_than_r_ratings)]

In [None]:
before = len(df.name.unique())
after = len(df_r.name.unique())
rows_before = df.shape[0]
rows_after = df_r.shape[0]
print(f'''There are {before} animes in the dataset before filtering and {after} animes after the filtering.

{before} animes => {after} animes
{rows_before} rows before filtering => {rows_after} rows after filtering''')
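
The effect of the threshold is easier to see on a tiny example. The sketch below is an assumption-laden illustration: `toy_df` is invented, `min_ratings` plays the role of `r` scaled down, and the filter uses `groupby(...).transform("count")` as an alternative way to get the same result in one step.

```python
import pandas as pd

# Hypothetical toy data: "A" has 3 ratings, "B" has 2, "C" has 1
toy_df = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 2, 3],
    "name":    ["A", "A", "A", "B", "B", "C"],
    "rating":  [8, 9, 7, 6, 5, 10],
})

min_ratings = 2  # stands in for r, scaled down for the toy data

# For every row, look up how many ratings its anime received
n_ratings = toy_df.groupby("name")["rating"].transform("count")

# Keep only the rows of anime with at least min_ratings ratings
toy_filtered = toy_df[n_ratings >= min_ratings]
print(sorted(toy_filtered["name"].unique()))
```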

# 2. Collaborative Recommendation System<a class="anchor" id="2"></a>

In [None]:
# Create a matrix with user_id as rows and the anime titles as columns.
# Each cell holds the rating given by the user to the anime.
# There will be a lot of NaN values, because each user has watched only a few of the animes
df_recom = df_r.pivot_table(index='user_id',columns='name',values='rating')
df_recom.iloc[:5,:5]
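
To make the shape of this matrix concrete, here is a minimal sketch on invented data (`toy_df` is hypothetical). Note that `pivot_table` averages duplicate (user, title) pairs by default, which is harmless when each user rates a title once.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, anime) pair
toy_df = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3],
    "name":    ["A", "B", "A", "B", "C"],
    "rating":  [8, 6, 9, 7, 10],
})

# Wide format: one row per user, one column per anime, NaN where unrated
toy_matrix = toy_df.pivot_table(index="user_id", columns="name", values="rating")
print(toy_matrix)

# Share of empty cells, a rough measure of how sparse the matrix is
sparsity = toy_matrix.isna().to_numpy().mean()
print(f"share of empty cells: {sparsity:.2f}")
```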

In [None]:
df_r.name.value_counts().head(10)

In [None]:
def find_corr(df, name):
    '''
    Get the correlation of one anime with the others
    
    Args
        df (DataFrame):  with user_id as rows, anime titles as columns and ratings as values
        name (str): Name of the anime
    
    Return
        DataFrame with the correlation of the anime with all others
    '''
    
    similar_to_movie = df.corrwith(df[name])
    similar_to_movie = pd.DataFrame(similar_to_movie,columns=['Correlation'])
    similar_to_movie = similar_to_movie.sort_values(by = 'Correlation', ascending = False)
    return similar_to_movie
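
A correlation based on only a handful of shared raters can be extreme by chance. One possible safeguard, not part of the function above, is to count how many users rated both titles and drop pairs below a threshold. A minimal sketch, where `toy_matrix` and `min_overlap` are made-up names and values:

```python
import numpy as np
import pandas as pd

# Hypothetical user x anime matrix; "C" shares only two raters with "A"
toy_matrix = pd.DataFrame(
    {
        "A": [10, 8, 6, 4, np.nan],
        "B": [9, 8, 5, 3, 7],
        "C": [np.nan, np.nan, 2, 9, np.nan],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="user_id"),
)

target = "A"

# Pairwise correlation of every column with the target column
corr = toy_matrix.corrwith(toy_matrix[target])

# Number of users who rated both the target and each other title
overlap = toy_matrix[toy_matrix[target].notna()].count()

min_overlap = 3  # hypothetical threshold
result = pd.DataFrame({"Correlation": corr, "Overlap": overlap})
result = result[result["Overlap"] >= min_overlap]
result = result.sort_values("Correlation", ascending=False)
print(result)
```

In this toy case "C" is removed, because its correlation with "A" rests on only two users.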

# 3. Recommendations <a class="anchor" id="3"></a>

Let's try the recommendation system on two anime.

* The higher the correlation, the more likely it is that a viewer of the selected anime will like the recommended anime
* A negative correlation means that the viewer is likely to dislike the anime
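
The correlation used here is the Pearson coefficient, computed only over users who rated both titles:

$$r = \frac{\sum_u (x_u - \bar{x})(y_u - \bar{y})}{\sqrt{\sum_u (x_u - \bar{x})^2 \sum_u (y_u - \bar{y})^2}}$$

As a minimal sketch, the formula can be checked by hand against pandas on two invented rating columns (`x` and `y` are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical ratings by five users for two titles (NaN = not rated)
x = pd.Series([10, 8, 6, np.nan, 9])
y = pd.Series([9, 7, np.nan, 5, 10])

# Only users who rated both titles contribute
both = x.notna() & y.notna()
xv, yv = x[both].to_numpy(), y[both].to_numpy()

# Deviations from each title's mean over the shared users
dx, dy = xv - xv.mean(), yv - yv.mean()
pearson_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas skips the missing pairs in the same way
print(pearson_manual, x.corr(y))
```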

## 3.1. [Naruto](https://en.wikipedia.org/wiki/Naruto)<a class="anchor" id="4"></a>
<img src="https://upload.wikimedia.org/wikipedia/en/9/94/NarutoCoverTankobon1.jpg" style="width: 20%; height: 20%" align = "left">


In [None]:
# Let's choose an anime: "Naruto"
anime1 = 'Naruto'

# Recommendations
find_corr(df_recom, anime1).head(40)

In [None]:
# Not recommended
find_corr(df_recom, anime1).tail(40)

## 3.2. [Death Note](https://en.wikipedia.org/wiki/Death_Note)<a class="anchor" id="5"></a>
<img src="https://upload.wikimedia.org/wikipedia/en/6/6f/Death_Note_Vol_1.jpg" style="width: 20%; height: 20%" align = "left">

In [None]:
# Let's choose an anime
anime2 = 'Death Note'

# Recommendations
find_corr(df_recom, anime2).head(40)

In [None]:
# Not recommended
find_corr(df_recom, anime2).tail(40)