## Anime Recommender

This is a simple anime recommender that ranks titles by the cosine similarity of their genres. The data was scraped from MyAnimeList with a custom [web scraper](https://github.com/rmagtanong/myanimelist-scraper) I built using Scrapy.

In [1]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
df = pd.read_csv('anime.csv')

In [4]:
df.head(10)

Unnamed: 0,anime_id,name,rating,genres,users_watched,users_rated,url
0,33,Kenpuu Denki Berserk,8.54,"Action,Adventure,Drama,Fantasy,Horror,Supernat...",548221,293480,https://myanimelist.net/anime/33/Kenpuu_Denki_...
1,21939,Mushishi Zoku Shou,8.7,"Adventure,Fantasy,Mystery,Slice of Life,Supern...",272380,107898,https://myanimelist.net/anime/21939/Mushishi_Z...
2,38889,Kono Oto Tomare! Part 2,8.43,"Drama,Romance,Music,Performing Arts,School,Sho...",141645,75844,https://myanimelist.net/anime/38889/Kono_Oto_T...
3,30,Neon Genesis Evangelion,8.34,"Action,Avant Garde,Drama,Sci-Fi,Mecha,Psycholo...",1577154,940938,https://myanimelist.net/anime/30/Neon_Genesis_...
4,2685,Tsubasa: Tokyo Revelations,8.29,"Action,Adventure,Drama,Fantasy,Romance,Shounen",90097,43459,https://myanimelist.net/anime/2685/Tsubasa__To...
5,2966,Ookami to Koushinryou,8.23,"Adventure,Fantasy,Romance,Adult Cast,Historical",748249,364345,https://myanimelist.net/anime/2966/Ookami_to_K...
6,33051,Mobile Suit Gundam: Iron-Blooded Orphans 2nd S...,8.23,"Action,Drama,Sci-Fi,Mecha,Space",123153,69445,https://myanimelist.net/anime/33051/Mobile_Sui...
7,32843,Senki Zesshou Symphogear XV,8.19,"Action,Sci-Fi,Idols (Female),Music",34642,11982,https://myanimelist.net/anime/32843/Senki_Zess...
8,39198,Kanata no Astra,8.1,"Adventure,Mystery,Sci-Fi,Space,Survival,Shounen",264777,129276,https://myanimelist.net/anime/39198/Kanata_no_...
9,2164,Dennou Coil,8.06,"Adventure,Comedy,Drama,Mystery,Sci-Fi",136460,40977,https://myanimelist.net/anime/2164/Dennou_Coil


In [5]:
df.shape

(10049, 7)

In [6]:
df.isnull().sum()

anime_id          0
name              0
rating            0
genres           18
users_watched     0
users_rated       0
url               0
dtype: int64

In [7]:
df[df['genres'].isnull()]

Unnamed: 0,anime_id,name,rating,genres,users_watched,users_rated,url
1825,50549,Bubble,7.31,,130589,64702,https://myanimelist.net/anime/50549/Bubble
2084,6399,Higashi no Eden: Falling Down,7.37,,13639,7590,https://myanimelist.net/anime/6399/Higashi_no_...
2508,51308,Yoku,7.49,,826,488,https://myanimelist.net/anime/51308/Yoku
2577,51230,Taikutsu wo Saien Shinaide,7.14,,626,357,https://myanimelist.net/anime/51230/Taikutsu_w...
4435,32468,Nirvana,6.79,,7869,3541,https://myanimelist.net/anime/32468/Nirvana
4635,32700,Heart Realize,6.83,,7648,3496,https://myanimelist.net/anime/32700/Heart_Realize
5098,50288,Cinderella,6.93,,1354,798,https://myanimelist.net/anime/50288/Cinderella
5830,50352,Peko Random Brain!,6.73,,473,293,https://myanimelist.net/anime/50352/Peko_Rando...
6014,50964,Koufukuron,6.46,,542,289,https://myanimelist.net/anime/50964/Koufukuron
7470,6368,Legend of Regios,6.42,,9982,2682,https://myanimelist.net/anime/6368/Legend_of_R...


In [9]:
100*df.isnull().sum()/len(df)

anime_id         0.000000
name             0.000000
rating           0.000000
genres           0.179122
users_watched    0.000000
users_rated      0.000000
url              0.000000
dtype: float64

#### We will use the genres to build our recommender, so we can't have missing values. Only 18 of the 10,049 rows (about 0.18%) are missing genres, so we can simply drop them

In [10]:
df = df.dropna()
df.isnull().sum()

anime_id         0
name             0
rating           0
genres           0
users_watched    0
users_rated      0
url              0
dtype: int64
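
One detail worth keeping in mind after this step: `dropna()` keeps the original row labels, so the labels no longer match row positions once rows are removed. The sketch below uses a made-up toy frame (the `toy` data is an assumption, not the scraped dataset) to show this, along with `dropna(subset=...)` to restrict the check to the `genres` column and `reset_index(drop=True)` as one possible way to renumber the rows.

```python
import pandas as pd

# Toy frame standing in for the scraped data (values are illustrative only)
toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'genres': ['Action,Sci-Fi', None, 'Comedy', 'Drama'],
})

# Drop only the rows whose genres are missing; other columns are not checked
cleaned = toy.dropna(subset=['genres'])
print(list(cleaned.index))      # labels keep their original values, with a gap

# Optionally renumber so that label i is also row position i
renumbered = cleaned.reset_index(drop=True)
print(list(renumbered.index))
```

This matters later because a NumPy similarity matrix is indexed by position, not by DataFrame label.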

#### Let's drop anime_id and users_rated since we don't need them

In [11]:
df.drop(['anime_id', 'users_rated'], axis=1, inplace=True)

In [12]:
df.head()

Unnamed: 0,name,rating,genres,users_watched,url
0,Kenpuu Denki Berserk,8.54,"Action,Adventure,Drama,Fantasy,Horror,Supernat...",548221,https://myanimelist.net/anime/33/Kenpuu_Denki_...
1,Mushishi Zoku Shou,8.7,"Adventure,Fantasy,Mystery,Slice of Life,Supern...",272380,https://myanimelist.net/anime/21939/Mushishi_Z...
2,Kono Oto Tomare! Part 2,8.43,"Drama,Romance,Music,Performing Arts,School,Sho...",141645,https://myanimelist.net/anime/38889/Kono_Oto_T...
3,Neon Genesis Evangelion,8.34,"Action,Avant Garde,Drama,Sci-Fi,Mecha,Psycholo...",1577154,https://myanimelist.net/anime/30/Neon_Genesis_...
4,Tsubasa: Tokyo Revelations,8.29,"Action,Adventure,Drama,Fantasy,Romance,Shounen",90097,https://myanimelist.net/anime/2685/Tsubasa__To...


#### We will recommend based solely on genres, so let's explore the column

In [13]:
df['genres'][0]

'Action,Adventure,Drama,Fantasy,Horror,Supernatural,Gore,Military,Mythology,Seinen'

In [14]:
type(df['genres'][0])

str

#### Create a dummy (0/1) column for each genre

In [15]:
df = pd.concat([df, df['genres'].str.get_dummies(sep=',')], axis=1)

In [16]:
df.head()

Unnamed: 0,name,rating,genres,users_watched,url,Action,Adult Cast,Adventure,Anthropomorphic,Avant Garde,...,Super Power,Supernatural,Survival,Suspense,Team Sports,Time Travel,Vampire,Video Game,Visual Arts,Workplace
0,Kenpuu Denki Berserk,8.54,"Action,Adventure,Drama,Fantasy,Horror,Supernat...",548221,https://myanimelist.net/anime/33/Kenpuu_Denki_...,1,0,1,0,0,...,0,1,0,0,0,0,0,0,0,0
1,Mushishi Zoku Shou,8.7,"Adventure,Fantasy,Mystery,Slice of Life,Supern...",272380,https://myanimelist.net/anime/21939/Mushishi_Z...,0,1,1,0,0,...,0,1,0,0,0,0,0,0,0,0
2,Kono Oto Tomare! Part 2,8.43,"Drama,Romance,Music,Performing Arts,School,Sho...",141645,https://myanimelist.net/anime/38889/Kono_Oto_T...,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Neon Genesis Evangelion,8.34,"Action,Avant Garde,Drama,Sci-Fi,Mecha,Psycholo...",1577154,https://myanimelist.net/anime/30/Neon_Genesis_...,1,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
4,Tsubasa: Tokyo Revelations,8.29,"Action,Adventure,Drama,Fantasy,Romance,Shounen",90097,https://myanimelist.net/anime/2685/Tsubasa__To...,1,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
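
To see what `str.get_dummies(sep=',')` does in isolation, here is a minimal sketch on three made-up genre strings (the sample values are assumptions chosen for illustration). Each distinct label becomes its own column, holding 1 where the row contains that label and 0 otherwise.

```python
import pandas as pd

# Three illustrative comma-separated genre strings
genres = pd.Series([
    'Action,Sci-Fi,Space',
    'Comedy,School,Shounen',
    'Action,Comedy',
])

# One indicator column per distinct label found across all rows
dummies = genres.str.get_dummies(sep=',')
print(dummies)
```

The split is on the exact separator, so a string like `'Action, Comedy'` (with a space after the comma) would produce a separate `' Comedy'` column. The scraped data shown above does not appear to have that problem.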


In [17]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10031 entries, 0 to 10048
Data columns (total 79 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   name               10031 non-null  object 
 1   rating             10031 non-null  float64
 2   genres             10031 non-null  object 
 3   users_watched      10031 non-null  int64  
 4   url                10031 non-null  object 
 5   Action             10031 non-null  int64  
 6   Adult Cast         10031 non-null  int64  
 7   Adventure          10031 non-null  int64  
 8   Anthropomorphic    10031 non-null  int64  
 9   Avant Garde        10031 non-null  int64  
 10  Award Winning      10031 non-null  int64  
 11  Boys Love          10031 non-null  int64  
 12  CGDCT              10031 non-null  int64  
 13  Childcare          10031 non-null  int64  
 14  Combat Sports      10031 non-null  int64  
 15  Comedy             10031 non-null  int64  
 16  Crossdressing      100

In [18]:
anime_features = df.loc[:, 'Action':].copy()

In [19]:
anime_features.head()

Unnamed: 0,Action,Adult Cast,Adventure,Anthropomorphic,Avant Garde,Award Winning,Boys Love,CGDCT,Childcare,Combat Sports,...,Super Power,Supernatural,Survival,Suspense,Team Sports,Time Travel,Vampire,Video Game,Visual Arts,Workplace
0,1,0,1,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,0,1,1,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,1,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [20]:
cos_sim = cosine_similarity(anime_features.values, anime_features.values)

In [21]:
cos_sim

array([[1.        , 0.42163702, 0.12909944, ..., 0.36514837, 0.        ,
        0.2236068 ],
       [0.42163702, 1.        , 0.        , ..., 0.19245009, 0.        ,
        0.23570226],
       [0.12909944, 0.        , 1.        , ..., 0.23570226, 0.        ,
        0.        ],
       ...,
       [0.36514837, 0.19245009, 0.23570226, ..., 1.        , 0.        ,
        0.40824829],
       [0.        , 0.        , 0.        , ..., 0.        , 1.        ,
        0.35355339],
       [0.2236068 , 0.23570226, 0.        , ..., 0.40824829, 0.35355339,
        1.        ]])

In [22]:
cos_sim.shape

(10031, 10031)
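
For 0/1 genre vectors, cosine similarity reduces to the number of shared genres divided by the square root of the product of the two genre counts. The sketch below works this out by hand for two genre strings that appear in the recommendation output further down; `genre_cosine` is a hypothetical helper written for illustration, not part of the notebook.

```python
import math

def genre_cosine(genres_a, genres_b):
    """Cosine similarity between two comma-separated genre strings,
    treating each string as a 0/1 vector over genre labels."""
    a = {g.strip() for g in genres_a.split(',') if g.strip()}
    b = {g.strip() for g in genres_b.split(',') if g.strip()}
    if not a or not b:
        return 0.0
    # dot product = number of shared labels; each norm = sqrt(label count)
    return len(a & b) / math.sqrt(len(a) * len(b))

movie = 'Action,Sci-Fi,Adult Cast,Space'            # 4 genres
cobra = 'Action,Adventure,Sci-Fi,Adult Cast,Space'  # 5 genres, 4 shared
score = genre_cosine(movie, cobra)
print(round(score, 4))  # 4 / sqrt(4 * 5) -> 0.8944
```

Two titles with identical genre lists score exactly 1, which is why franchise entries with the same tags tend to cluster at the top of the results.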

In [23]:
anime_index = pd.Series(df.index, index=df.name).drop_duplicates()

In [24]:
anime_index

name
Kenpuu Denki Berserk                           0
Mushishi Zoku Shou                             1
Kono Oto Tomare! Part 2                        2
Neon Genesis Evangelion                        3
Tsubasa: Tokyo Revelations                     4
                                           ...  
Ichirin-sha                                10044
Hitojichi Koukan feat. Hatsune Miku        10045
Highschool Aurabuster: Hikari no Mezame    10046
Heybot!                                    10047
Hello Kitty no Yuki no Joou                10048
Length: 10031, dtype: int64

In [41]:
def recommend_anime(anime_name, similarity=cos_sim):

    # get the row label of the input anime, then convert it to a row position:
    # dropna() left gaps in the labels, but the similarity matrix is positional
    position = df.index.get_loc(anime_index[anime_name])

    # pair each row position with its similarity score to the input anime
    similarity_scores = list(enumerate(similarity[position]))

    # sort list from highest score to lowest
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)

    # skip the input anime itself and keep the 10 most similar row positions
    anime_positions = [i for i, _ in similarity_scores if i != position][:10]

    # get anime info by row position
    recommendations = df[['name', 'genres', 'rating', 'users_watched', 'url']].iloc[anime_positions]

    return recommendations
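
Since the lookup by name requires the exact title as stored in the data, a small search helper can make the function easier to use. The sketch below is one possible approach; `find_titles` is a hypothetical helper, and the `titles` list is a hand-picked sample of names from this notebook's output rather than the full `name` column.

```python
def find_titles(query, titles):
    """Return the titles that contain the query, ignoring case."""
    needle = query.casefold()
    return [title for title in titles if needle in title.casefold()]

# Small sample of titles; in the notebook this could be df['name']
titles = [
    'Cowboy Bebop: Tengoku no Tobira',
    'Space Cobra',
    'Cowboy Bebop: Yose Atsume Blues',
    'Mob Psycho 100 II',
]

matches = find_titles('cowboy bebop', titles)
print(matches)
```

Any of the returned titles can then be passed to `recommend_anime` as-is.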

In [42]:
recommend_anime('Cowboy Bebop')

Unnamed: 0,name,genres,rating,users_watched,url
677,Cowboy Bebop: Tengoku no Tobira,"Action,Sci-Fi,Adult Cast,Space",8.38,337711,https://myanimelist.net/anime/5/Cowboy_Bebop__...
139,Ginga Eiyuu Densetsu: Waga Yuku wa Hoshi no Ta...,"Action,Sci-Fi,Adult Cast,Military,Space",7.9,35385,https://myanimelist.net/anime/3014/Ginga_Eiyuu...
1400,Space Cobra,"Action,Adventure,Sci-Fi,Adult Cast,Space",7.68,19710,https://myanimelist.net/anime/2451/Space_Cobra
3402,Cobra The Animation,"Action,Adventure,Sci-Fi,Adult Cast,Space",7.09,8254,https://myanimelist.net/anime/5032/Cobra_The_A...
4026,Space Adventure Cobra,"Action,Adventure,Sci-Fi,Adult Cast,Space",7.09,9292,https://myanimelist.net/anime/2452/Space_Adven...
4826,Cobra The Animation: Time Drive,"Action,Adventure,Sci-Fi,Adult Cast,Space",6.87,4033,https://myanimelist.net/anime/5031/Cobra_The_A...
7515,Space Cobra Pilot,"Action,Adventure,Sci-Fi,Adult Cast,Space",6.42,1997,https://myanimelist.net/anime/5742/Space_Cobra...
2277,Cowboy Bebop: Yose Atsume Blues,"Sci-Fi,Adult Cast,Space",7.42,44565,https://myanimelist.net/anime/4037/Cowboy_Bebo...
5074,Seihou Bukyou Outlaw Star Pilot,"Action,Sci-Fi,Space",6.93,6381,https://myanimelist.net/anime/4650/Seihou_Buky...
5181,Wonder Beat Scramble,"Action,Sci-Fi,Space",6.61,1012,https://myanimelist.net/anime/4083/Wonder_Beat...


In [44]:
recommend_anime('Mob Psycho 100')

Unnamed: 0,name,genres,rating,users_watched,url
822,Mob Psycho 100 II,"Action,Comedy,Supernatural,Super Power",8.81,1260624,https://myanimelist.net/anime/37510/Mob_Psycho...
1267,Mob Psycho 100: Dai Ikkai Rei toka Soudansho I...,"Action,Comedy,Supernatural,Super Power",7.63,80548,https://myanimelist.net/anime/39651/Mob_Psycho...
1913,Mob Psycho 100: Reigen - Shirarezaru Kiseki no...,"Action,Comedy,Supernatural,Super Power",7.33,81922,https://myanimelist.net/anime/36616/Mob_Psycho...
173,Bungou Stray Dogs: Dead Apple,"Action,Comedy,Mystery,Supernatural,Super Power",7.91,222469,https://myanimelist.net/anime/34944/Bungou_Str...
1701,Yozakura Quartet: Hana no Uta,"Action,Comedy,Supernatural,Super Power,Shounen",7.47,117386,https://myanimelist.net/anime/18497/Yozakura_Q...
2134,Yozakura Quartet: Hoshi no Umi,"Action,Comedy,Supernatural,Super Power,Shounen",7.39,37057,https://myanimelist.net/anime/8457/Yozakura_Qu...
2498,Yozakura Quartet: Tsuki ni Naku,"Action,Comedy,Supernatural,Super Power,Shounen",7.48,27394,https://myanimelist.net/anime/18499/Yozakura_Q...
4494,Yozakura Quartet,"Action,Comedy,Supernatural,Super Power,Shounen",6.8,143725,https://myanimelist.net/anime/4548/Yozakura_Qu...
75,Durarara!! Specials,"Action,Comedy,Supernatural",7.86,161364,https://myanimelist.net/anime/8408/Durarara_Sp...
1116,K: Return of Kings,"Action,Supernatural,Super Power",7.57,307078,https://myanimelist.net/anime/27991/K__Return_...


In [45]:
recommend_anime('Great Teacher Onizuka')

Unnamed: 0,name,genres,rating,users_watched,url
9495,Rokudenashi Blues,"Comedy,Drama,Sports,Combat Sports,Delinquents,...",5.81,2038,https://myanimelist.net/anime/9077/Rokudenashi...
161,Sakigake!! Cromartie Koukou,"Comedy,Delinquents,Gag Humor,School,Shounen",7.91,128069,https://myanimelist.net/anime/114/Sakigake_Cro...
2095,Araburu Kisetsu no Otome-domo yo.,"Comedy,Drama,Romance,School,Shounen",7.37,291147,https://myanimelist.net/anime/38753/Araburu_Ki...
2249,Gokusen,"Comedy,Drama,Slice of Life,Delinquents,Organiz...",7.41,41272,https://myanimelist.net/anime/242/Gokusen
7039,Rokudenashi Blues 1993,"Action,Comedy,Drama,Sports,Combat Sports,Delin...",6.32,1112,https://myanimelist.net/anime/9078/Rokudenashi...
192,SKET Dance: Imouto no Nayami ni Nayamu Ani ni ...,"Comedy,School,Shounen",7.93,20542,https://myanimelist.net/anime/16395/SKET_Dance...
355,Kyou kara Ore wa!!,"Comedy,Delinquents,Shounen",8.06,29704,https://myanimelist.net/anime/851/Kyou_kara_Or...
503,SKET Dance,"Comedy,School,Shounen",8.22,211074,https://myanimelist.net/anime/9863/SKET_Dance
3691,Yankee-kun na Yamada-kun to Megane-chan to Majo,"Comedy,School,Shounen",7.01,23690,https://myanimelist.net/anime/30641/Yankee-kun...
5024,Houkago no Ouji-sama,"Comedy,School,Shounen",6.92,2859,https://myanimelist.net/anime/24457/Houkago_no...


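MyAnimeList lists Genres, Themes, and Demographic as separate fields on each anime's page, but the scraped `genres` column merges all three into one list. Take this example, from [Kaguya-sama Season 3's](https://myanimelist.net/anime/43608/Kaguya-sama_wa_Kokurasetai__Ultra_Romantic) page: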

<img align="left" src="kaguya.png">


Compare to [Cowboy Bebop](https://myanimelist.net/anime/1/Cowboy_Bebop) (no Demographic):

<img align="left" src="bebop.png">


Compare to [Made in Abyss](https://myanimelist.net/anime/34599/Made_in_Abyss) (no Themes/Demographic):

<img align="left" src="madeinabyss.png">

Ideally, Genres, Themes, and Demographic would each have their own field. I could not separate them due to limitations with MyAnimeList's website design. Also, as seen above, not every anime has all three fields filled in.

The recommender could be improved with better data, which could be achieved by:

1. Separating Genres, Themes, and Demographic into their own columns using more advanced scraping methods
2. Using feature engineering to separate the columns
3. Scraping from a website with more complete data
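
As a rough illustration of the feature-engineering option, the sketch below splits a combined genre string using lookup sets. `split_categories` is a hypothetical helper, and `SAMPLE_THEMES` is a deliberately incomplete, assumed set of theme labels; both label sets would need to be checked against MyAnimeList before real use.

```python
# Demographic labels; assumed to match the ones MyAnimeList uses
DEMOGRAPHICS = {'Josei', 'Kids', 'Seinen', 'Shoujo', 'Shounen'}

# Hypothetical, incomplete sample of theme labels (for illustration only)
SAMPLE_THEMES = {'Gore', 'Military', 'Mythology', 'Space', 'School', 'Adult Cast'}

def split_categories(genres, demographics=DEMOGRAPHICS, themes=SAMPLE_THEMES):
    """Split a comma-separated label string into genres, themes, and demographic."""
    result = {'genres': [], 'themes': [], 'demographic': []}
    for label in genres.split(','):
        label = label.strip()
        if not label:
            continue
        if label in demographics:
            result['demographic'].append(label)
        elif label in themes:
            result['themes'].append(label)
        else:
            # anything not recognised is treated as a plain genre
            result['genres'].append(label)
    return result

berserk = 'Action,Adventure,Drama,Fantasy,Horror,Supernatural,Gore,Military,Mythology,Seinen'
parts = split_categories(berserk)
print(parts)
```

With the three groups separated, each could get its own set of dummy columns, and possibly its own weight in the similarity score.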