# Recommendation System

Data Description:

Unique ID of each anime.
Anime title.
Anime broadcast type, such as TV, OVA, etc.
anime genre.
The number of episodes of each anime.
The average rating for each anime compared to the number of users who gave ratings.


Number of community members for each anime.
Objective:
The objective of this assignment is to implement a recommendation system using cosine similarity on an anime dataset. 
Dataset:
Use the Anime Dataset which contains information about various anime, including their titles, genres,No.of episodes and user ratings etc.

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Data Preprocessing:

In [6]:
df=pd.read_csv('anime.csv')

In [8]:
df

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266
...,...,...,...,...,...,...,...
12289,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai,OVA,1,4.15,211
12290,5543,Under World,Hentai,OVA,1,4.28,183
12291,5621,Violence Gekiga David no Hoshi,Hentai,OVA,4,4.88,219
12292,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1,4.98,175


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12294 entries, 0 to 12293
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   anime_id  12294 non-null  int64  
 1   name      12294 non-null  object 
 2   genre     12232 non-null  object 
 3   type      12269 non-null  object 
 4   episodes  12294 non-null  object 
 5   rating    12064 non-null  float64
 6   members   12294 non-null  int64  
dtypes: float64(1), int64(2), object(4)
memory usage: 672.5+ KB


In [12]:
df.shape

(12294, 7)

In [14]:
df.isnull().sum()

anime_id      0
name          0
genre        62
type         25
episodes      0
rating      230
members       0
dtype: int64

In [20]:
import warnings
warnings.filterwarnings('ignore')

In [22]:
df['genre'].fillna('Unknown', inplace=True)

In [24]:
df.isnull().sum()

anime_id      0
name          0
genre         0
type         25
episodes      0
rating      230
members       0
dtype: int64

In [26]:
df['type'].fillna('Unknown', inplace=True)

In [28]:
df['episodes'] = df['episodes'].replace('Unknown', np.nan)

In [30]:
df['episodes'] = pd.to_numeric(df['episodes'])
df['episodes'].fillna(df['episodes'].median(), inplace=True)

In [32]:
df['rating'].fillna(df['rating'].median(), inplace=True)

In [34]:
df.isnull().sum()

anime_id    0
name        0
genre       0
type        0
episodes    0
rating      0
members     0
dtype: int64

In [36]:
df.duplicated().sum()

0

# Feature Extraction:

In [38]:
all_genres = set()
for genres in df['genre'].str.split(','):
    for g in genres:
        all_genres.add(g.strip())

for genre in all_genres:
    df[f'genre_{genre}'] = df['genre'].apply(lambda x: 1 if genre in x else 0)

In [40]:
df

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members,genre_Shounen,genre_Psychological,genre_Magic,...,genre_Dementia,genre_Yaoi,genre_Supernatural,genre_Samurai,genre_Yuri,genre_Super Power,genre_Seinen,genre_Police,genre_Thriller,genre_Fantasy
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1.0,9.37,200630,0,0,0,...,0,0,1,0,0,0,0,0,0,0
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64.0,9.26,793665,1,0,1,...,0,0,0,0,0,0,0,0,0,1
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51.0,9.25,114262,1,0,0,...,0,0,0,1,0,0,0,0,0,0
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24.0,9.17,673572,0,0,0,...,0,0,0,0,0,0,0,0,1,0
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51.0,9.16,151266,1,0,0,...,0,0,0,1,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12289,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai,OVA,1.0,4.15,211,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12290,5543,Under World,Hentai,OVA,1.0,4.28,183,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12291,5621,Violence Gekiga David no Hoshi,Hentai,OVA,4.0,4.88,219,0,0,0,...,0,0,0,0,0,0,0,0,0,0
12292,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1.0,4.98,175,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [42]:
type_dummies = pd.get_dummies(df['type'], prefix='type', dtype=int)
df = pd.concat([df, type_dummies], axis=1)

In [44]:
df

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members,genre_Shounen,genre_Psychological,genre_Magic,...,genre_Police,genre_Thriller,genre_Fantasy,type_Movie,type_Music,type_ONA,type_OVA,type_Special,type_TV,type_Unknown
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1.0,9.37,200630,0,0,0,...,0,0,0,1,0,0,0,0,0,0
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64.0,9.26,793665,1,0,1,...,0,0,1,0,0,0,0,0,1,0
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51.0,9.25,114262,1,0,0,...,0,0,0,0,0,0,0,0,1,0
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24.0,9.17,673572,0,0,0,...,0,1,0,0,0,0,0,0,1,0
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51.0,9.16,151266,1,0,0,...,0,0,0,0,0,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12289,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai,OVA,1.0,4.15,211,0,0,0,...,0,0,0,0,0,0,1,0,0,0
12290,5543,Under World,Hentai,OVA,1.0,4.28,183,0,0,0,...,0,0,0,0,0,0,1,0,0,0
12291,5621,Violence Gekiga David no Hoshi,Hentai,OVA,4.0,4.88,219,0,0,0,...,0,0,0,0,0,0,1,0,0,0
12292,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1.0,4.98,175,0,0,0,...,0,0,0,0,0,0,1,0,0,0


In [194]:
numerical_features = ['episodes', 'rating', 'members']

In [196]:
df[numerical_features]

Unnamed: 0,episodes,rating,members
0,1.0,9.37,200630
1,64.0,9.26,793665
2,51.0,9.25,114262
3,24.0,9.17,673572
4,51.0,9.16,151266
...,...,...,...
12289,1.0,4.15,211
12290,1.0,4.28,183
12291,4.0,4.88,219
12292,1.0,4.98,175


# Recommendation System:

In [68]:
from sklearn.metrics.pairwise import cosine_similarity

In [70]:
df[numerical_features].shape

(12294, 3)

In [198]:
df[numerical_features].groupby('episodes')['rating'].mean().sort_values(ascending=False)[:5]

episodes
201.0    9.040000
148.0    8.545000
203.0    8.370000
291.0    8.320000
120.0    8.290000
           ...   
312.0    6.395000
85.0     6.380000
28.0     6.377143
140.0    6.355000
5.0      6.349917
Name: rating, Length: 162, dtype: float64

In [200]:
df1=df[numerical_features].pivot_table(index='episodes',columns='members',values='rating')

In [202]:
df1

members,5,11,12,13,15,17,19,20,21,22,...,633817,657190,673572,683297,715151,717796,793665,893100,896229,1013917
episodes,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1.0,6.57,,,8.5,8.0,6.0,,6.776667,7.75,6.27,...,,,,,,,,,,
2.0,,6.57,,,,,6.57,,,6.57,...,,,,,,,,,,
3.0,,,,,,,,,,,...,,,,,,,,,,
4.0,,,,,,,,,,,...,,,,,,,,,,
5.0,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1428.0,,,,,,,,,,,...,,,,,,,,,,
1471.0,,,,,,,,,,,...,,,,,,,,,,
1565.0,,,,,,,,,,,...,,,,,,,,,,
1787.0,,,,,,,,,,,...,,,,,,,,,,


In [204]:
df1.fillna(0,axis=1,inplace=True)

In [206]:
df1

members,5,11,12,13,15,17,19,20,21,22,...,633817,657190,673572,683297,715151,717796,793665,893100,896229,1013917
episodes,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
1.0,6.57,0.00,0.0,8.5,8.0,6.0,0.00,6.776667,7.75,6.27,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2.0,0.00,6.57,0.0,0.0,0.0,0.0,6.57,0.000000,0.00,6.57,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3.0,0.00,0.00,0.0,0.0,0.0,0.0,0.00,0.000000,0.00,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4.0,0.00,0.00,0.0,0.0,0.0,0.0,0.00,0.000000,0.00,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5.0,0.00,0.00,0.0,0.0,0.0,0.0,0.00,0.000000,0.00,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1428.0,0.00,0.00,0.0,0.0,0.0,0.0,0.00,0.000000,0.00,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1471.0,0.00,0.00,0.0,0.0,0.0,0.0,0.00,0.000000,0.00,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1565.0,0.00,0.00,0.0,0.0,0.0,0.0,0.00,0.000000,0.00,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1787.0,0.00,0.00,0.0,0.0,0.0,0.0,0.00,0.000000,0.00,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [86]:
similarity=cosine_similarity(df1)

In [88]:
similarity.shape

(12292, 12292)

In [208]:
def recommend_movie(similar_movie):
    if similar_movie in df1.index:
        index=np.where(similar_movie==df1.index)[0][0]
        similar=sorted(list(enumerate(similarity[8])),reverse=True,key=lambda x: x[1])[1:6]
        print(f'recommended movie of {similar_movie}')
        print('-'*20)
        for movie in similar:
            print(df1.index[movie[0]])
    else:
        ('movie is not in the list')

In [212]:
recommend_movie(1428.0)

recommended movie of 1428.0
--------------------
1.0
2.0
3.0
4.0
5.0


In [214]:
df2=df[numerical_features].pivot_table(index='members',columns='episodes',values='rating')

In [216]:
df2

episodes,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,...,726.0,773.0,1006.0,1274.0,1306.0,1428.0,1471.0,1565.0,1787.0,1818.0
members,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
5,6.57,,,,,,,,,,...,,,,,,,,,,
11,,6.57,,,,,,,,,...,,,,,,,,,,
12,,,,,,,,,,,...,,,,,,,,,,
13,8.50,,,,,,,,,,...,,,,,,,,,,
15,8.00,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
717796,,,,,,,,,,,...,,,,,,,,,,
793665,,,,,,,,,,,...,,,,,,,,,,
893100,,,,,,,,,,,...,,,,,,,,,,
896229,,,,,,,,,,,...,,,,,,,,,,


In [218]:
df2.fillna(0,axis=1,inplace=True)

In [228]:
df2

episodes,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0,...,726.0,773.0,1006.0,1274.0,1306.0,1428.0,1471.0,1565.0,1787.0,1818.0
members,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
5,6.57,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
11,0.00,6.57,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
12,0.00,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
13,8.50,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
15,8.00,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
717796,0.00,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
793665,0.00,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
893100,0.00,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
896229,0.00,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [230]:
arr=cosine_similarity(df2)

In [232]:
df3=pd.DataFrame(arr,index=df.members.unique(),columns=df.members.unique())

In [234]:
df3

Unnamed: 0,200630,793665,114262,673572,151266,93351,425855,80679,72534,81109,...,838,1092,2413,3374,4550,5551,29463,27411,57355,652
200630,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.646355,1.0,0.690397,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
793665,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.000000,0.0,0.723431,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
114262,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
673572,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.646355,1.0,0.690397,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
151266,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.646355,1.0,0.690397,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5551,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.763036,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
29463,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
27411,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0
57355,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.000000,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,1.0,0.0


In [236]:
df3[652]>=0.8

200630    False
793665    False
114262    False
673572    False
151266    False
          ...  
5551      False
29463     False
27411     False
57355     False
652        True
Name: 652, Length: 6706, dtype: bool

In [246]:
df3[df3[652]>=0.8][652].sort_values(ascending=False)[1:5]

193822    1.0
33042     1.0
722       1.0
652       1.0
Name: 652, dtype: float64

In [248]:
df[numerical_features][(df[numerical_features].members==652) | (df[numerical_features].members==722)]

Unnamed: 0,episodes,rating,members
6119,1.0,6.25,722
12232,1.0,4.99,652
12233,1.0,4.95,652


In [250]:
df

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members,genre_Shounen,genre_Psychological,genre_Magic,...,genre_Police,genre_Thriller,genre_Fantasy,type_Movie,type_Music,type_ONA,type_OVA,type_Special,type_TV,type_Unknown
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1.0,9.37,200630,0,0,0,...,0,0,0,1,0,0,0,0,0,0
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64.0,9.26,793665,1,0,1,...,0,0,1,0,0,0,0,0,1,0
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51.0,9.25,114262,1,0,0,...,0,0,0,0,0,0,0,0,1,0
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24.0,9.17,673572,0,0,0,...,0,1,0,0,0,0,0,0,1,0
4,9969,Gintama&#039;,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51.0,9.16,151266,1,0,0,...,0,0,0,0,0,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12289,9316,Toushindai My Lover: Minami tai Mecha-Minami,Hentai,OVA,1.0,4.15,211,0,0,0,...,0,0,0,0,0,0,1,0,0,0
12290,5543,Under World,Hentai,OVA,1.0,4.28,183,0,0,0,...,0,0,0,0,0,0,1,0,0,0
12291,5621,Violence Gekiga David no Hoshi,Hentai,OVA,4.0,4.88,219,0,0,0,...,0,0,0,0,0,0,1,0,0,0
12292,6133,Violence Gekiga Shin David no Hoshi: Inma Dens...,Hentai,OVA,1.0,4.98,175,0,0,0,...,0,0,0,0,0,0,1,0,0,0


In [254]:
df['name'].value_counts()

name
Shi Wan Ge Leng Xiaohua                           2
Saru Kani Gassen                                  2
Bakabon Osomatsu no Karee wo Tazunete Sansenri    1
Backkom Meogeujan Yeohaeng                        1
Backkom Mission Impossible                        1
                                                 ..
Yoroiden Samurai Troopers Kikoutei Densetsu       1
Yuu☆Yuu☆Hakusho: Mu Mu Hakusho                    1
3-gatsu no Lion meets Bump of Chicken             1
Bannou Bunka Neko-Musume                          1
Yasuji no Pornorama: Yacchimae!!                  1
Name: count, Length: 12292, dtype: int64

In [256]:
numerical_features

['episodes', 'rating', 'members']

In [258]:
target=df[['type']]

In [262]:
target

Unnamed: 0,type
0,Movie
1,TV
2,TV
3,TV
4,TV
...,...
12289,OVA
12290,OVA
12291,OVA
12292,OVA


In [270]:
df['type'].value_counts()

type
TV         3787
OVA        3311
Movie      2348
Special    1676
ONA         659
Music       488
Unknown      25
Name: count, dtype: int64

In [264]:
from sklearn.preprocessing import OrdinalEncoder

In [266]:
ord_enc=OrdinalEncoder()

In [274]:
target=pd.DataFrame(ord_enc.fit_transform(target),columns=target.columns)

In [276]:
target

Unnamed: 0,type
0,0.0
1,5.0
2,5.0
3,5.0
4,5.0
...,...
12289,3.0
12290,3.0
12291,3.0
12292,3.0


# Evaluation:

In [280]:
from sklearn.model_selection import train_test_split

In [290]:
features=df[numerical_features]

In [292]:
features

Unnamed: 0,episodes,rating,members
0,1.0,9.37,200630
1,64.0,9.26,793665
2,51.0,9.25,114262
3,24.0,9.17,673572
4,51.0,9.16,151266
...,...,...,...
12289,1.0,4.15,211
12290,1.0,4.28,183
12291,4.0,4.88,219
12292,1.0,4.98,175


In [389]:
from sklearn.preprocessing import MinMaxScaler

In [394]:
scaler = MinMaxScaler()
features=pd.DataFrame(scaler.fit_transform(features),columns=features.columns)
features

Unnamed: 0,episodes,rating,members
0,0.000000,0.924370,0.197872
1,0.034673,0.911164,0.782770
2,0.027518,0.909964,0.112689
3,0.012658,0.900360,0.664325
4,0.027518,0.899160,0.149186
...,...,...,...
12289,0.000000,0.297719,0.000203
12290,0.000000,0.313325,0.000176
12291,0.001651,0.385354,0.000211
12292,0.000000,0.397359,0.000168


In [532]:
x_train,x_test,y_train,y_test=train_test_split(features,target,train_size=0.7,random_state=40)

In [534]:
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)

(8605, 3)
(3689, 3)
(8605, 1)
(3689, 1)


In [536]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

In [537]:
log_model=LogisticRegression()

In [540]:
log_model.fit(x_train,y_train)

In [541]:
y_pred= log_model.predict(x_test)
y_pred

array([3., 3., 5., ..., 5., 5., 3.])

In [542]:
accuracy_score(y_test,y_pred)

0.451341827053402

In [550]:
from sklearn.metrics import confusion_matrix,classification_report

In [552]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

         0.0       0.00      0.00      0.00       691
         1.0       0.00      0.00      0.00       147
         2.0       0.16      0.03      0.05       190
         3.0       0.36      0.76      0.49      1012
         4.0       0.00      0.00      0.00       519
         5.0       0.59      0.79      0.67      1127
         6.0       0.00      0.00      0.00         3

    accuracy                           0.45      3689
   macro avg       0.16      0.23      0.17      3689
weighted avg       0.29      0.45      0.34      3689



# Interview Questions: