# Recommender Systems
### Handling missing values

In [314]:
# drop anime whose rating is missing (NaN)
anime_df = anime_df[anime_df["rating"].notna()].copy()

# fill missing genre and type with the most frequent value (mode)
anime_df['genre'] = anime_df['genre'].fillna(
    anime_df['genre'].dropna().mode().values[0])

anime_df['type'] = anime_df['type'].fillna(
    anime_df['type'].dropna().mode().values[0])

# check that no null values remain
anime_df.isnull().sum()

anime_id    0
name        0
genre       0
type        0
episodes    0
rating      0
members     0
dtype: int64

In [315]:
# a rating of -1 marks a missing value, so those rows are dropped
rating_df = rating_df[rating_df["rating"] != -1]

print(f"rating shape: {rating_df.shape}\nanime shape: {anime_df.shape}")

rating shape: (6337241, 3)
anime shape: (12064, 7)


In [316]:
# step 2: attach anime metadata to each user rating;
# the user's score becomes rating_user, the anime's average rating stays rating
rated_anime = rating_df.merge(anime_df, on='anime_id', suffixes=('_user', ''))
rated_anime = rated_anime[["user_id", "type", "name", "rating_user", "rating"]]
rated_anime.shape

(6337234, 5)
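
The merged frame has 7 fewer rows than `rating_df`, because the default inner join drops ratings whose `anime_id` has no match in `anime_df`. A minimal sketch of how such rows could be inspected, using made-up toy frames rather than the real data (the names `ratings`, `anime`, `checked` and `unmatched` are illustrative only):

```python
import pandas as pd

# Toy stand-ins for rating_df and anime_df (values are made up for illustration).
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "anime_id": [10, 20, 10, 99],   # 99 has no entry in the anime table
    "rating": [8, 6, 9, 7],
})
anime = pd.DataFrame({
    "anime_id": [10, 20],
    "name": ["A", "B"],
    "rating": [7.5, 6.9],
})

# A left join with indicator=True keeps every rating row and records whether
# a matching anime was found ("both") or not ("left_only").
checked = ratings.merge(anime, on="anime_id", how="left",
                        suffixes=("_user", ""), indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["user_id", "anime_id"]])
```

Running the same check on the real frames would list the ratings that the inner join silently discards.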

In [317]:
rated_anime.head()

|   | user_id | type | name | rating_user | rating |
|---|---|---|---|---|---|
| 0 | 1 | TV | Highschool of the Dead | 10 | 7.46 |
| 1 | 3 | TV | Highschool of the Dead | 6 | 7.46 |
| 2 | 5 | TV | Highschool of the Dead | 2 | 7.46 |
| 3 | 12 | TV | Highschool of the Dead | 6 | 7.46 |
| 4 | 14 | TV | Highschool of the Dead | 6 | 7.46 |


## Demographic Filtering
**Requirements:**
* A metric to score each item.
* Calculate the score for every item.
* Sort the scores and recommend the best-rated anime to the users.

### Chosen metric: Bayesian average

Bayesian average rating formula: $R_{\beta} = w \cdot R + (1-w) \cdot R_{0}$

($w = \frac{n}{N_{max}}$)
* $w$ - weight factor: $w \to 0^{+}$ means few votes, so the score stays close to $R_{0}$; $w \to 1^{-}$ means many votes, so the score approaches $R$.
* $R$ - individual rating: average rating for this item.
* $R_{0}$ - global average rating over all items in the database.
* $n$ - number of votes for this item.
* $N_{max}$ - maximum number of votes over all items.
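
To make the effect of the weight concrete, here is a small sketch of the formula with hypothetical numbers (the helper name `bayesian_average` and all values are assumptions chosen for illustration, not taken from the dataset):

```python
def bayesian_average(R, n, N_max, R0):
    """Blend an item's own rating R with the global mean R0, weighted by w = n / N_max."""
    w = n / N_max
    return w * R + (1 - w) * R0

# Hypothetical numbers: the global mean is 7.0 and the most-voted item has 1000 votes.
R0, N_max = 7.0, 1000

popular = bayesian_average(R=9.0, n=1000, N_max=N_max, R0=R0)  # w = 1.0
niche = bayesian_average(R=9.0, n=10, N_max=N_max, R0=R0)      # w = 0.01
print(popular, niche)
```

Both items have the same raw rating of 9.0, but the one with only 10 votes is pulled almost all the way back to the global mean, while the most-voted item keeps its own rating.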

**Types of items**: TV, Music, Movie, OVA, ONA, Special

The score is computed for each of these item types individually.

In [318]:
def sorter(column):
    """
    Key function for sort_values: orders anime names by how many
    user ratings they received, most rated first.
    """
    reorder = rated_anime.name.value_counts().index.tolist()
    
    # This also works:
    # mapper = {name: order for order, name in enumerate(reorder)}
    # return column.map(mapper)
    cat = pd.Categorical(column, categories=reorder, ordered=True)
    return pd.Series(cat)

anime_df = anime_df[anime_df["name"].isin(rated_anime.name.value_counts().index.tolist())].sort_values(by = "name" , key = sorter)
anime_df.head()

|   | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 40 | 1535 | Death Note | Mystery, Police, Psychological, Supernatural, ... | TV | 37 | 8.71 | 1013917 |
| 804 | 11757 | Sword Art Online | Action, Adventure, Fantasy, Game, Romance | TV | 25 | 7.83 | 893100 |
| 86 | 16498 | Shingeki no Kyojin | Action, Drama, Fantasy, Shounen, Super Power | TV | 25 | 8.54 | 896229 |
| 19 | 1575 | Code Geass: Hangyaku no Lelouch | Action, Mecha, Military, School, Sci-Fi, Super... | TV | 25 | 8.83 | 715151 |
| 159 | 6547 | Angel Beats! | Action, Comedy, Drama, School, Supernatural | TV | 13 | 8.39 | 717796 |


In [319]:
anime_df[anime_df["name"]=="Saru Kani Gassen"]

|   | anime_id | name | genre | type | episodes | rating | members |
|---|---|---|---|---|---|---|---|
| 10141 | 30059 | Saru Kani Gassen | Drama | Movie | 1 | 4.75 | 76 |
| 10140 | 22399 | Saru Kani Gassen | Kids | OVA | 1 | 5.23 | 62 |


In [320]:
anime_df = anime_df[~((anime_df["name"]=="Saru Kani Gassen")&(anime_df["type"]=="OVA"))] #remove a duplicate anime

In [321]:
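# look up each anime's number of user ratings by name, so the result does not depend on row order
anime_df["vote count"] = anime_df["name"].map(rated_anime["name"].value_counts())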

### Trending anime function

The function works as follows:
1. It copies the dataset.
2. It keeps only the anime of the requested type whose vote count is at or above the 90th percentile for that type (the top 10% most voted).
3. It computes the variables needed for the score metric ($N_{max}$, $R_{0}$, $R$, $n$, $w$) on that filtered subset and calculates the score.
4. It outputs the top 10 trending anime.
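
The four steps can be tried out on a tiny hand-made frame before running them on the full dataset. The sketch below is an illustration under stated assumptions: the frame `toy`, the helper `trending_sketch`, and its `quantile` and `top` parameters are hypothetical, and the quantile is set to 0.5 instead of 0.9 only so that more than one toy row survives the filter.

```python
import pandas as pd

# Hypothetical toy catalogue; names, ratings and vote counts are made up.
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "type": ["TV", "TV", "TV", "TV", "Movie"],
    "rating": [9.0, 8.0, 7.0, 6.0, 8.5],
    "vote count": [100, 400, 50, 10, 30],
})

def trending_sketch(frame, item_type, quantile=0.5, top=10):
    """Toy version of the four steps above; returns a plain DataFrame."""
    df = frame[frame["type"] == item_type].copy()      # step 1: work on a copy
    cutoff = df["vote count"].quantile(quantile)
    df = df[df["vote count"] >= cutoff]                # step 2: vote-count filter
    w = df["vote count"] / df["vote count"].max()      # step 3: w = n / N_max
    df["Score"] = w * df["rating"] + (1 - w) * df["rating"].mean()
    # step 4: best scores first
    return df.sort_values("Score", ascending=False).head(top)[["name", "Score"]]

result = trending_sketch(toy, "TV")
print(result)
```

In this toy run only `A` and `B` pass the filter. `B` has the most votes, so its weight is 1 and its score equals its own rating, while the score of `A` is pulled toward the mean rating of the filtered subset.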

In [322]:
# vote-count threshold per type: the 90th percentile of "vote count"
anime_rating_filter = anime_df.groupby("type")["vote count"].quantile(0.9)

def Trending_anime(Type):
    """
    Returns the top 10 anime of one type (TV, OVA, Movie etc.), ranked by Bayesian average.
    """
    # keep only anime of this type with at least the 90th-percentile vote count
    df = anime_df[(anime_df["type"] == Type)
                  & (anime_df["vote count"] >= anime_rating_filter[Type])].copy()
    N_max = df["vote count"].max()  # most votes among the filtered anime
    R0 = df["rating"].mean()        # mean rating of the filtered anime
    R = df["rating"]
    n = df["vote count"]
    w = n / N_max
    df["Score"] = w * R + (1 - w) * R0  # Bayesian average
    df = df.sort_values("Score", ascending=False)[:10]
    print(f"Trending in {Type}:")
    # Styler.hide(axis="index") replaces hide_index(), deprecated since pandas 1.4
    return df[["name", "Score"]].style.hide(axis="index")





In [323]:
Trending_anime("TV")

Trending in TV:


| name | Score |
|---|---|
| Fullmetal Alchemist: Brotherhood | 8.759342 |
| Death Note | 8.71 |
| Code Geass: Hangyaku no Lelouch R2 | 8.571979 |
| Code Geass: Hangyaku no Lelouch | 8.559731 |
| Steins;Gate | 8.543463 |
| Clannad: After Story | 8.433669 |
| Shingeki no Kyojin | 8.376595 |
| Tengen Toppa Gurren Lagann | 8.343071 |
| Cowboy Bebop | 8.270093 |
| Angel Beats! | 8.241774 |


In [324]:
Trending_anime("Movie")

Trending in Movie:


| name | Score |
|---|---|
| Sen to Chihiro no Kamikakushi | 8.93 |
| Mononoke Hime | 8.516622 |
| Howl no Ugoku Shiro | 8.508852 |
| Suzumiya Haruhi no Shoushitsu | 8.239172 |
| Tonari no Totoro | 8.179456 |
| Toki wo Kakeru Shoujo | 8.177667 |
| Ookami Kodomo no Ame to Yuki | 8.170295 |
| Neon Genesis Evangelion: The End of Evangelion | 8.097285 |
| Hotaru no Haka | 8.083231 |
| Evangelion: 2.0 You Can (Not) Advance | 8.064932 |


In [325]:
Trending_anime("Music")

Trending in Music:


| name | Score |
|---|---|
| Shelter | 8.237966 |
| Interstella5555: The 5tory of The 5ecret 5tar 5ystem | 8.17 |
| CLAMP in Wonderland 2 | 7.298844 |
| Black★Rock Shooter | 7.261495 |
| On Your Mark | 7.207474 |
| The Everlasting Guilty Crown | 7.162946 |
| Tengen Toppa Gurren Lagann: Parallel Works | 7.160707 |
| Higashi no Eden: Falling Down | 7.145067 |
| Tengen Toppa Gurren Lagann: Kirameki★Yoko Box - Pieces of Sweet Stars | 7.138649 |
| Snow Halation | 7.092173 |


In [327]:
Trending_anime("ONA")

Trending in ONA:


| name | Score |
|---|---|
| Eve no Jikan | 7.865627 |
| Ore no Imouto ga Konnani Kawaii Wake ga Nai Specials | 7.66 |
| Hetalia Axis Powers | 7.656695 |
| Bounen no Xamdou | 7.505628 |
| Hetalia World Series | 7.417616 |
| Suzumiya Haruhi-chan no Yuuutsu | 7.396014 |
| Plastic Neesan | 7.268596 |
| Hetalia: The Beautiful World | 7.245433 |
| There She Is!! | 7.220638 |
| Seitokai no Ichizon Lv.2 | 7.217476 |


In [326]:
Trending_anime("OVA")

Trending in OVA:


| name | Score |
|---|---|
| FLCL | 8.06 |
| Hellsing Ultimate | 8.054113 |
| Rurouni Kenshin: Meiji Kenkaku Romantan - Tsuioku-hen | 7.983917 |
| Detroit Metal City | 7.675374 |
| Golden Boy | 7.635805 |
| Final Fantasy VII: Advent Children Complete | 7.594082 |
| Black Lagoon: Roberta's Blood Trail | 7.59363 |
| Tsubasa: Tokyo Revelations | 7.572125 |
| Hunter x Hunter: Greed Island Final | 7.557157 |
| Hunter x Hunter OVA | 7.548033 |


In [328]:
Trending_anime("Special")

Trending in Special:


| name | Score |
|---|---|
| Steins;Gate: Oukoubakko no Poriomania | 8.26406 |
| Clannad: Mou Hitotsu no Sekai, Tomoyo-hen | 8.14 |
| Baccano! Specials | 7.930564 |
| Clannad: After Story - Mou Hitotsu no Sekai, Kyou-hen | 7.926999 |
| Fate/stay night: Unlimited Blade Works - Prologue | 7.864494 |
| Darker than Black: Kuro no Keiyakusha Gaiden | 7.863995 |
| Kokoro Connect: Michi Random | 7.842193 |
| Durarara!! Specials | 7.762084 |
| Hajime no Ippo: Champion Road | 7.696284 |
| Mushishi Special: Hihamukage | 7.681601 |
