# ANIME RECOMMENDATION SYSTEM

## BUSINESS UNDERSTANDING

### OVERVIEW

The explosive growth of available digital information and of the number of Internet users has created a major challenge: consumers face an overwhelming variety of choices yet struggle to find the few that suit them, while producers have a difficult time identifying their potential market.

Information retrieval systems such as Google, DevilFinder and AltaVista have partially solved this problem, but they do not personalize results to make relevant recommendations to consumers and producers.
Recommender systems are information filtering systems that address information overload by filtering the most relevant fragments out of a large volume of dynamically generated information, according to a user's preferences, interests, or observed behavior toward items. A recommender system predicts whether a particular user would prefer an item based on that user's profile.
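
As a rough illustration of predicting preference from a user profile, the sketch below scores items against a profile with cosine similarity. It is only one possible approach (content-based filtering), and the genre flags, the `user_profile` weights, and the `cosine_scores` helper are all hypothetical toy values, not taken from the dataset used later.

```python
import numpy as np

# Hypothetical toy data: rows are items, columns are genre flags
# (Action, Comedy, Drama). None of this comes from the real dataset.
item_features = np.array([
    [1, 0, 1],   # item 0: Action, Drama
    [0, 1, 0],   # item 1: Comedy
    [1, 1, 0],   # item 2: Action, Comedy
], dtype=float)

# Assumed user profile: how strongly the user likes each genre.
user_profile = np.array([0.9, 0.1, 0.6])


def cosine_scores(profile, items):
    """Cosine similarity between one profile vector and each item row."""
    norms = np.linalg.norm(items, axis=1) * np.linalg.norm(profile)
    return items @ profile / norms


scores = cosine_scores(user_profile, item_features)
ranking = np.argsort(scores)[::-1]  # item indices, best match first
print(ranking, scores.round(3))
```

Items whose genre flags point in the same direction as the profile score highest, so the ranking can serve as a simple recommendation list.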

Recommender systems benefit both service providers and users. They reduce the transaction costs of finding and selecting items in an online shopping environment, and they have also been shown to improve the decision-making process and the quality of decisions.

In an e-commerce setting, recommender systems increase revenue because they are an effective means of selling more products. In scientific libraries, they support users by allowing them to move beyond catalog searches. The need for efficient and accurate recommendation techniques that provide relevant and dependable recommendations to users therefore cannot be overemphasized.


### PROBLEM STATEMENT

Nowadays, the Internet allows people to access abundant resources online. Netflix, for example, has an enormous collection of movies. As the amount of available information has grown, a new problem has arisen: people have a hard time selecting the items they actually want to see, and production companies have a difficult time locating their target audience.

### PROPOSED SOLUTION

The most appropriate solution to this problem is a system that recommends anime to consumers based on their preferences and tastes, in order to maximize consumer utility and increase profits for production companies.

Given the prevalence of the Internet, modern society needs recommender systems that help people find products matching their tastes and preferences among the vast options available. Moreover, recommendation systems can be deployed on commercial websites to help producers market their products to the right consumers.


### JUSTIFICATION

### SPECIFIC OBJECTIVES



## DATA UNDERSTANDING

In [9]:
# importing libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.impute import SimpleImputer

In [13]:
# load data
df = pd.read_csv('Anime_data.csv', index_col = ['Anime_id'])

# preview of data
df.head()

Anime_id,Title,Genre,Synopsis,Type,Producer,Studio,Rating,ScoredBy,Popularity,Members,Episodes,Source,Aired,Link
1,Cowboy Bebop,"['Action', 'Adventure', 'Comedy', 'Drama', 'Sc...","In the year 2071, humanity has colonized sever...",TV,['Bandai Visual'],['Sunrise'],8.81,363889.0,39.0,704490.0,26.0,Original,"Apr 3, 1998 to Apr 24, 1999",https://myanimelist.net/anime/1/Cowboy_Bebop
5,Cowboy Bebop: Tengoku no Tobira,"['Action', 'Space', 'Drama', 'Mystery', 'Sci-Fi']","Another day, another bounty—such is the life o...",Movie,"['Sunrise', 'Bandai Visual']",['Bones'],8.41,111187.0,475.0,179899.0,1.0,Original,"Sep 1, 2001",https://myanimelist.net/anime/5/Cowboy_Bebop__...
6,Trigun,"['Action', 'Sci-Fi', 'Adventure', 'Comedy', 'D...","Vash the Stampede is the man with a $$60,000,0...",TV,['Victor Entertainment'],['Madhouse'],8.31,197451.0,158.0,372709.0,26.0,Manga,"Apr 1, 1998 to Sep 30, 1998",https://myanimelist.net/anime/6/Trigun
7,Witch Hunter Robin,"['Action', 'Magic', 'Police', 'Supernatural', ...",Witches are individuals with special powers li...,TV,['Bandai Visual'],['Sunrise'],7.34,31875.0,1278.0,74889.0,26.0,Original,"Jul 2, 2002 to Dec 24, 2002",https://myanimelist.net/anime/7/Witch_Hunter_R...
8,Bouken Ou Beet,"['Adventure', 'Fantasy', 'Shounen', 'Supernatu...",It is the dark century and the people are suff...,TV,,['Toei Animation'],7.04,4757.0,3968.0,11247.0,52.0,Manga,"Sep 30, 2004 to Sep 29, 2005",https://myanimelist.net/anime/8/Bouken_Ou_Beet
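
In the preview above, `Genre`, `Producer` and `Studio` appear to hold Python-style list literals stored as plain strings. A minimal sketch of turning such strings into real lists is shown below, assuming the cells are valid list literals or missing. The two-row `sample` frame, the `parse_list` helper and the `GenreList` column are hypothetical names used only for illustration.

```python
import ast

import pandas as pd

# Hypothetical two-row sample mimicking the Genre column in the previews;
# the second title has no genre recorded.
sample = pd.DataFrame({
    "Title": ["Cowboy Bebop: Tengoku no Tobira", "Loups=Garous Pilot"],
    "Genre": ["['Action', 'Space', 'Drama', 'Mystery', 'Sci-Fi']", None],
})


def parse_list(cell):
    """Turn a string such as "['Action', 'Drama']" into a Python list.

    Missing cells become an empty list.
    """
    if pd.isna(cell):
        return []
    return ast.literal_eval(cell)


sample["GenreList"] = sample["Genre"].apply(parse_list)
print(sample[["Title", "GenreList"]])
```

`ast.literal_eval` only evaluates literals, which makes it a safer choice than `eval` for text read from a file.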


In [12]:
# checking last 15 rows
df.tail(15)

index,Anime_id,Title,Genre,Synopsis,Type,Producer,Studio,Rating,ScoredBy,Popularity,Members,Episodes,Source,Aired,Link
16987,12723,Loups=Garous Pilot,,,Special,,,5.87,,,622.0,,,,
16988,32588,Meow no Hoshi,,,OVA,,,5.58,,,212.0,,,,
16989,9056,Agitated Screams of Maggots,,,Music,,,4.45,,,2921.0,,,,
16990,33655,Alps no Shoujo Heidi? Chara Onji,,,TV,,,6.79,,,277.0,,,,
16991,31385,Ginga Shounen Tai,,,TV,,,6.38,,,79.0,,,,
16992,31605,Kana Kana Kazoku: Kakusan Mare Bo ! 1-Wa-5-wa ...,,,ONA,,,5.11,,,44.0,,,,
16993,6366,Karuizawa Syndrome,,,OVA,,,6.27,,,145.0,,,,
16994,13459,Ribbon-chan,,,TV,,,4.83,,,93.0,,,,
16995,22391,Ring Ring Boy,,,Movie,,,4.4,,,69.0,,,,
16996,22399,Saru Kani Gassen,,,OVA,,,5.23,,,62.0,,,,


### DESCRIPTION OF DATA

In [4]:
# shape of data
df.shape

(17002, 15)

In [5]:
# check info of data
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17002 entries, 0 to 17001
Data columns (total 15 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Anime_id    17002 non-null  int64  
 1   Title       17002 non-null  object 
 2   Genre       14990 non-null  object 
 3   Synopsis    15583 non-null  object 
 4   Type        16368 non-null  object 
 5   Producer    7635 non-null   object 
 6   Studio      7919 non-null   object 
 7   Rating      14425 non-null  float64
 8   ScoredBy    13227 non-null  float64
 9   Popularity  16368 non-null  float64
 10  Members     17002 non-null  float64
 11  Episodes    14085 non-null  float64
 12  Source      15075 non-null  object 
 13  Aired       16368 non-null  object 
 14  Link        16368 non-null  object 
dtypes: float64(5), int64(1), object(9)
memory usage: 1.9+ MB


In [7]:
# summary statistics
df.describe()

statistic,Anime_id,Rating,ScoredBy,Popularity,Members,Episodes
count,17002.0,14425.0,13227.0,16368.0,17002.0,14085.0
mean,20446.579638,6.287867,11390.84,8131.919599,20381.3,11.482712
std,14342.513259,1.141401,43284.34,4714.683351,71214.04,44.08904
min,1.0,1.0,1.0,1.0,0.0,1.0
25%,5581.5,5.62,43.0,4042.5,145.0,1.0
50%,21334.0,6.41,478.0,8115.0,1113.0,1.0
75%,34789.25,7.09,3831.0,12208.25,7855.75,12.0
max,40960.0,10.0,1006242.0,16338.0,1451708.0,1818.0


In [8]:
# check for null values
df.isna().sum()

Anime_id         0
Title            0
Genre         2012
Synopsis      1419
Type           634
Producer      9367
Studio        9083
Rating        2577
ScoredBy      3775
Popularity     634
Members          0
Episodes      2917
Source        1927
Aired          634
Link           634
dtype: int64
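
Since `SimpleImputer` is imported at the top of the notebook, one way the numeric gaps counted above might be filled is median imputation. The sketch below is an assumption about a possible strategy, not necessarily the one applied later. The `toy` frame and its values are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Invented toy values; NaN marks the missing entries.
toy = pd.DataFrame({
    "Rating": [8.0, 7.0, np.nan, 6.0],
    "Episodes": [12.0, 1.0, 24.0, np.nan],
})

# Replace each NaN with the median of its own column.
imputer = SimpleImputer(strategy="median")
filled = pd.DataFrame(
    imputer.fit_transform(toy),
    columns=toy.columns,
    index=toy.index,
)
print(filled)
```

The median is less sensitive than the mean to extreme values, which may matter for skewed columns such as `Episodes` and `ScoredBy`.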

In [10]:
# checking for duplicates
df.duplicated().sum()

63
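
Before dropping the flagged rows, it can help to look at every copy of a duplicated row. The sketch below shows one way to inspect and then remove exact duplicates. The `toy` frame is an invented example, and whether the real duplicates should be dropped is left as an assumption.

```python
import pandas as pd

# Invented toy frame with one exact duplicate row.
toy = pd.DataFrame({
    "Anime_id": [1, 5, 5, 6],
    "Title": ["A", "B", "B", "C"],
})

n_dupes = toy.duplicated().sum()          # rows identical to an earlier row
dupes = toy[toy.duplicated(keep=False)]   # every copy, for inspection
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_dupes)
print(dupes)
print(deduped)
```

`duplicated()` marks only the second and later copies by default, so its sum matches the number of rows that `drop_duplicates()` removes.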

### DATA UNDERSTANDING SUMMARY

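* The data has 17,002 rows and 15 columns.
* 6 columns are numeric (5 of type `float64`, 1 of type `int64`) and 9 columns are of type `object`.
* 12 of the 15 columns contain missing values; only `Anime_id`, `Title` and `Members` are complete.
* `Producer` (9,367 missing) and `Studio` (9,083 missing) are each missing in more than half of the rows.
* 63 rows are flagged as duplicates.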
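
The split between numeric and non-numeric columns summarized above can also be checked programmatically. The sketch below uses `select_dtypes` on a small invented frame. The `toy` frame and the variable names are illustrative assumptions.

```python
import pandas as pd

# Invented three-column frame standing in for the full dataset.
toy = pd.DataFrame({
    "Anime_id": [1, 5],
    "Title": ["Cowboy Bebop", "Cowboy Bebop: Tengoku no Tobira"],
    "Rating": [8.81, 8.41],
})

numeric_cols = toy.select_dtypes(include="number").columns.tolist()
other_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(len(numeric_cols), "numeric:", numeric_cols)
print(len(other_cols), "non-numeric:", other_cols)
```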

## DATA PREPARATION

In [None]:
def missing_values(data):
    """
    Count the missing values in each column and their proportion of all rows.
    Drop columns that have no missing values.
    Return only the columns with missing values.
    """
    miss_val = data.isna().sum().sort_values(ascending = False)
    percentage = (data.isna().sum() / len(data)).sort_values(ascending = False)
    missing_values = pd.DataFrame({"Missing Values": miss_val, "In Percentage": percentage})
    missing_values.drop(missing_values[missing_values["In Percentage"] == 0].index