---

#***ANIME RECOMMENDATION SYSTEM***

---


* ***FRAMING THE PROBLEM***

I've a dataset of 28666 animes and I've to build an Anime Recommendation System and deploy it on website.

* ***GATHERING DATA***

I've fetched data from Jikan API, which is an unofficial API of MyAnimelist.


* ***TECHNIQUE USED***

I am going to use Content-Based filtering, in which recommendation is decided on the basis of content they like.

* ***EVALUATION METRICS***

PASS

* ***ASSUMPTIONS***

Since the data is missing 90% english titles, we're assuming that people will know the japanese name of animes.

* ***DATA DESCRIPTION***

This dataset specially target to those people who want to improve their skill of data cleaning. Also this can be used for NLP.

Data Description:-
mal_id: Unique MyAnimeList identifier.

url: Direct link to the anime’s MAL page.

images: Dictionary with URLs for poster, cover, etc.

trailer: YouTube trailer URL (if available).

approved: Boolean indicating MAL moderation approval.

titles: All titles (English, Japanese, synonyms).

title: Primary title (default language).

title_english: Official English title.

title_japanese: Official Japanese title.

title_synonyms: Alternate titles (aliases).

type: Media type (TV, Movie, OVA, etc.).

source: Original material (Manga, Light Novel, etc.).

episodes: Total episode count.

status: Airing status (Finished, Ongoing, etc.).

airing: Boolean for currently airing.

aired: Date range (start/end of broadcast).

duration: Episode runtime (e.g., "24 min").

rating: Age rating (PG-13, R, etc.).

score: Average user rating (1-10).

scored_by: Number of users who rated it.

rank: MAL popularity ranking.

popularity: Position in most-viewed list.

members: Number of MAL users with it in their list.

favorites: Number of users who favorited it.

synopsis: Plot summary.

background: Production/release context.

season: Airing season (Winter, Summer, etc.).

year: Release year.

broadcast: Weekly airing schedule (e.g., "Sundays").

producers: Production companies involved.

licensors: Licensed-by companies (for distribution).

studios: Animation studios.

genres: Main genres (Action, Romance, etc.).

explicit_genres: Adult-oriented genres (Hentai, etc.).

themes: Sub-genres/themes (e.g., "Music", "Military").

demographics: Target audience (Shonen, Shojo, etc.).


# **IMPORT LIBRARIES**

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import ast
import re
import nltk
from nltk.corpus import stopwords
import string
from nltk.stem.porter import PorterStemmer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer

In [3]:
nltk.download('stopwords')
nltk.download('punkt_tab')


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [4]:
pd.set_option('display.max_columns', None)

In [5]:
df = pd.read_csv('/content/drive/MyDrive/NLP practice datasets/jikan_28666_anime.csv')

In [6]:
df.head()

Unnamed: 0,mal_id,url,images,trailer,approved,titles,title,title_english,title_japanese,title_synonyms,type,source,episodes,status,airing,aired,duration,rating,score,scored_by,rank,popularity,members,favorites,synopsis,background,season,year,broadcast,producers,licensors,studios,genres,explicit_genres,themes,demographics
0,52991,https://myanimelist.net/anime/52991/Sousou_no_...,{'jpg': {'image_url': 'https://cdn.myanimelist...,"{'youtube_id': 'ZEkwCGJ3o7M', 'url': 'https://...",True,"[{'type': 'Default', 'title': 'Sousou no Frier...",Sousou no Frieren,Frieren: Beyond Journey's End,葬送のフリーレン,"['Frieren at the Funeral', 'Frieren The Slayer']",TV,Manga,28.0,Finished Airing,False,"{'from': '2023-09-29T00:00:00+00:00', 'to': '2...",24 min per ep,PG-13 - Teens 13 or older,9.3,659282.0,1.0,143,1118759,68946,During their decade-long quest to defeat the D...,Sousou no Frieren was released on Blu-ray and ...,fall,2023.0,"{'day': 'Fridays', 'time': '23:00', 'timezone'...","[{'mal_id': 17, 'type': 'anime', 'name': 'Anip...","[{'mal_id': 1468, 'type': 'anime', 'name': 'Cr...","[{'mal_id': 11, 'type': 'anime', 'name': 'Madh...","[{'mal_id': 2, 'type': 'anime', 'name': 'Adven...",[],[],"[{'mal_id': 27, 'type': 'anime', 'name': 'Shou..."
1,5114,https://myanimelist.net/anime/5114/Fullmetal_A...,{'jpg': {'image_url': 'https://cdn.myanimelist...,"{'youtube_id': '1ac3_YdSSy0', 'url': 'https://...",True,"[{'type': 'Default', 'title': 'Fullmetal Alche...",Fullmetal Alchemist: Brotherhood,Fullmetal Alchemist: Brotherhood,鋼の錬金術師 FULLMETAL ALCHEMIST,['Hagane no Renkinjutsushi: Fullmetal Alchemis...,TV,Manga,64.0,Finished Airing,False,"{'from': '2009-04-05T00:00:00+00:00', 'to': '2...",24 min per ep,R - 17+ (violence & profanity),9.1,2217270.0,2.0,3,3521982,233780,After a horrific alchemy experiment goes wrong...,,spring,2009.0,"{'day': 'Sundays', 'time': '17:00', 'timezone'...","[{'mal_id': 17, 'type': 'anime', 'name': 'Anip...","[{'mal_id': 102, 'type': 'anime', 'name': 'Fun...","[{'mal_id': 4, 'type': 'anime', 'name': 'Bones...","[{'mal_id': 1, 'type': 'anime', 'name': 'Actio...",[],"[{'mal_id': 38, 'type': 'anime', 'name': 'Mili...","[{'mal_id': 27, 'type': 'anime', 'name': 'Shou..."
2,9253,https://myanimelist.net/anime/9253/Steins_Gate,{'jpg': {'image_url': 'https://cdn.myanimelist...,"{'youtube_id': '27OZc-ku6is', 'url': 'https://...",True,"[{'type': 'Default', 'title': 'Steins;Gate'}, ...",Steins;Gate,Steins;Gate,STEINS;GATE,[],TV,Visual novel,24.0,Finished Airing,False,"{'from': '2011-04-06T00:00:00+00:00', 'to': '2...",24 min per ep,PG-13 - Teens 13 or older,9.07,1463446.0,3.0,14,2697340,196036,Eccentric scientist Rintarou Okabe has a never...,Steins;Gate is based on 5pb. and Nitroplus' vi...,spring,2011.0,"{'day': 'Wednesdays', 'time': '02:05', 'timezo...","[{'mal_id': 61, 'type': 'anime', 'name': 'Fron...","[{'mal_id': 102, 'type': 'anime', 'name': 'Fun...","[{'mal_id': 314, 'type': 'anime', 'name': 'Whi...","[{'mal_id': 8, 'type': 'anime', 'name': 'Drama...",[],"[{'mal_id': 40, 'type': 'anime', 'name': 'Psyc...",[]
3,38524,https://myanimelist.net/anime/38524/Shingeki_n...,{'jpg': {'image_url': 'https://cdn.myanimelist...,"{'youtube_id': 'hKHepjfj5Tw', 'url': 'https://...",True,"[{'type': 'Default', 'title': 'Shingeki no Kyo...",Shingeki no Kyojin Season 3 Part 2,Attack on Titan Season 3 Part 2,進撃の巨人 Season3 Part.2,[],TV,Manga,10.0,Finished Airing,False,"{'from': '2019-04-29T00:00:00+00:00', 'to': '2...",23 min per ep,R - 17+ (violence & profanity),9.05,1693878.0,4.0,21,2444918,61054,Seeking to restore humanity's diminishing hope...,Shingeki no Kyojin Season 3 Part 2 adapts cont...,spring,2019.0,"{'day': 'Mondays', 'time': '00:10', 'timezone'...","[{'mal_id': 10, 'type': 'anime', 'name': 'Prod...","[{'mal_id': 102, 'type': 'anime', 'name': 'Fun...","[{'mal_id': 858, 'type': 'anime', 'name': 'Wit...","[{'mal_id': 1, 'type': 'anime', 'name': 'Actio...",[],"[{'mal_id': 58, 'type': 'anime', 'name': 'Gore...","[{'mal_id': 27, 'type': 'anime', 'name': 'Shou..."
4,28977,https://myanimelist.net/anime/28977/Gintama°,{'jpg': {'image_url': 'https://cdn.myanimelist...,"{'youtube_id': None, 'url': None, 'embed_url':...",True,"[{'type': 'Default', 'title': 'Gintama°'}, {'t...",Gintama°,Gintama Season 4,銀魂°,"[""Gintama' (2015)""]",TV,Manga,51.0,Finished Airing,False,"{'from': '2015-04-08T00:00:00+00:00', 'to': '2...",24 min per ep,PG-13 - Teens 13 or older,9.05,263571.0,5.0,343,667177,17152,"Gintoki, Shinpachi, and Kagura return as the f...",,spring,2015.0,"{'day': 'Wednesdays', 'time': '18:00', 'timezo...","[{'mal_id': 16, 'type': 'anime', 'name': 'TV T...","[{'mal_id': 102, 'type': 'anime', 'name': 'Fun...","[{'mal_id': 1258, 'type': 'anime', 'name': 'Ba...","[{'mal_id': 1, 'type': 'anime', 'name': 'Actio...",[],"[{'mal_id': 57, 'type': 'anime', 'name': 'Gag ...","[{'mal_id': 27, 'type': 'anime', 'name': 'Shou..."


# **CLEANING**

In [7]:
df.images[0]

"{'jpg': {'image_url': 'https://cdn.myanimelist.net/images/anime/1015/138006.jpg', 'small_image_url': 'https://cdn.myanimelist.net/images/anime/1015/138006t.jpg', 'large_image_url': 'https://cdn.myanimelist.net/images/anime/1015/138006l.jpg'}, 'webp': {'image_url': 'https://cdn.myanimelist.net/images/anime/1015/138006.webp', 'small_image_url': 'https://cdn.myanimelist.net/images/anime/1015/138006t.webp', 'large_image_url': 'https://cdn.myanimelist.net/images/anime/1015/138006l.webp'}}"

In [8]:
x= '''{'youtube_id': 'ZEkwCGJ3o7M', 'url': 'https://www.youtube.com/watch?v=ZEkwCGJ3o7M', 'embed_url': 'https://www.youtube.com/embed/ZEkwCGJ3o7M?enablejsapi=1&wmode=opaque&autoplay=1', 'images': {'image_url': 'https://img.youtube.com/vi/ZEkwCGJ3o7M/default.jpg', 'small_image_url': 'https://img.youtube.com/vi/ZEkwCGJ3o7M/sddefault.jpg', 'medium_image_url': 'https://img.youtube.com/vi/ZEkwCGJ3o7M/mqdefault.jpg', 'large_image_url': 'https://img.youtube.com/vi/ZEkwCGJ3o7M/hqdefault.jpg', 'maximum_image_url': 'https://img.youtube.com/vi/ZEkwCGJ3o7M/maxresdefault.jpg'}}'''


**YOUTUBE LINKS**

ADD THIS COLUMN

In [9]:
def str_to_dict_and_direct_youtube_link(text):
  new_text = ast.literal_eval(text)
  return new_text.get('url')


str_to_dict_and_direct_youtube_link(x)

'https://www.youtube.com/watch?v=ZEkwCGJ3o7M'

In [10]:
txt = '''{'jpg': {'image_url': 'https://cdn.myanimelist.net/images/anime/1015/138006.jpg', 'small_image_url': 'https://cdn.myanimelist.net/images/anime/1015/138006t.jpg', 'large_image_url': 'https://cdn.myanimelist.net/images/anime/1015/138006l.jpg'}, 'webp': {'image_url': 'https://cdn.myanimelist.net/images/anime/1015/138006.webp', 'small_image_url': 'https://cdn.myanimelist.net/images/anime/1015/138006t.webp', 'large_image_url': 'https://cdn.myanimelist.net/images/anime/1015/138006l.webp'}}'''

**IMAGE LINK**

In [11]:
def image_link(text):
  try:
    new_text = ast.literal_eval(text)
    return new_text.get('jpg').get('image_url')
  except ValueError as e:
    print('check !, is it an expression?')

image_link(txt)

'https://cdn.myanimelist.net/images/anime/1015/138006.jpg'

In [12]:
df.columns

Index(['mal_id', 'url', 'images', 'trailer', 'approved', 'titles', 'title',
       'title_english', 'title_japanese', 'title_synonyms', 'type', 'source',
       'episodes', 'status', 'airing', 'aired', 'duration', 'rating', 'score',
       'scored_by', 'rank', 'popularity', 'members', 'favorites', 'synopsis',
       'background', 'season', 'year', 'broadcast', 'producers', 'licensors',
       'studios', 'genres', 'explicit_genres', 'themes', 'demographics'],
      dtype='object')

In [13]:
df['genres'][0]

"[{'mal_id': 2, 'type': 'anime', 'name': 'Adventure', 'url': 'https://myanimelist.net/anime/genre/2/Adventure'}, {'mal_id': 8, 'type': 'anime', 'name': 'Drama', 'url': 'https://myanimelist.net/anime/genre/8/Drama'}, {'mal_id': 10, 'type': 'anime', 'name': 'Fantasy', 'url': 'https://myanimelist.net/anime/genre/10/Fantasy'}]"

remove "season" , "part" from names

**FETCH GENRE**

In [14]:
def fetch_genre_and_themes(text):
  new = ast.literal_eval(text)
  return [i.get('name') for i in new]

In [15]:
fetch_genre_and_themes('''[{'mal_id': 2, 'type': 'anime', 'name': 'Adventure', 'url': 'https://myanimelist.net/anime/genre/2/Adventure'}, {'mal_id': 8, 'type': 'anime', 'name': 'Drama', 'url': 'https://myanimelist.net/anime/genre/8/Drama'}, {'mal_id': 10, 'type': 'anime', 'name': 'Fantasy', 'url': 'https://myanimelist.net/anime/genre/10/Fantasy'}]''')

['Adventure', 'Drama', 'Fantasy']

In [16]:
df.head(1)

Unnamed: 0,mal_id,url,images,trailer,approved,titles,title,title_english,title_japanese,title_synonyms,type,source,episodes,status,airing,aired,duration,rating,score,scored_by,rank,popularity,members,favorites,synopsis,background,season,year,broadcast,producers,licensors,studios,genres,explicit_genres,themes,demographics
0,52991,https://myanimelist.net/anime/52991/Sousou_no_...,{'jpg': {'image_url': 'https://cdn.myanimelist...,"{'youtube_id': 'ZEkwCGJ3o7M', 'url': 'https://...",True,"[{'type': 'Default', 'title': 'Sousou no Frier...",Sousou no Frieren,Frieren: Beyond Journey's End,葬送のフリーレン,"['Frieren at the Funeral', 'Frieren The Slayer']",TV,Manga,28.0,Finished Airing,False,"{'from': '2023-09-29T00:00:00+00:00', 'to': '2...",24 min per ep,PG-13 - Teens 13 or older,9.3,659282.0,1.0,143,1118759,68946,During their decade-long quest to defeat the D...,Sousou no Frieren was released on Blu-ray and ...,fall,2023.0,"{'day': 'Fridays', 'time': '23:00', 'timezone'...","[{'mal_id': 17, 'type': 'anime', 'name': 'Anip...","[{'mal_id': 1468, 'type': 'anime', 'name': 'Cr...","[{'mal_id': 11, 'type': 'anime', 'name': 'Madh...","[{'mal_id': 2, 'type': 'anime', 'name': 'Adven...",[],[],"[{'mal_id': 27, 'type': 'anime', 'name': 'Shou..."


**FETCH THEMES**

In [17]:
fetch_genre_and_themes('''[{'mal_id': 58, 'type': 'anime', 'name': 'Gore', 'url': 'https://myanimelist.net/anime/genre/58/Gore'}, {'mal_id': 38, 'type': 'anime', 'name': 'Military', 'url': 'https://myanimelist.net/anime/genre/38/Military'}, {'mal_id': 76, 'type': 'anime', 'name': 'Survival', 'url': 'https://myanimelist.net/anime/genre/76/Survival'}]''')

['Gore', 'Military', 'Survival']

In [18]:
df['Image_link'] = df['images'].apply(image_link)
df['Genre'] = df['genres'].apply(fetch_genre_and_themes)
df['Theme'] = df['themes'].apply(fetch_genre_and_themes)

In [19]:
df['YT_link'] = df['trailer'].apply(str_to_dict_and_direct_youtube_link)

title, titles synonyms, images, youtube@, rating, source,episode, duration, score, synopsis, genre.

In [20]:
final = df[['title', 'title_english','YT_link','Image_link', 'Genre', 'Theme','rating', 'score' , 'source', 'duration', 'synopsis']]

In [21]:
final.head(1)

Unnamed: 0,title,title_english,YT_link,Image_link,Genre,Theme,rating,score,source,duration,synopsis
0,Sousou no Frieren,Frieren: Beyond Journey's End,https://www.youtube.com/watch?v=ZEkwCGJ3o7M,https://cdn.myanimelist.net/images/anime/1015/...,"[Adventure, Drama, Fantasy]",[],PG-13 - Teens 13 or older,9.3,Manga,24 min per ep,During their decade-long quest to defeat the D...


* remove numbers and spc char in title and title english

* convert genre and theme in str and on place of [] put unknown also remove punc

* remove "per ep" from duration

In [22]:
df = final.copy()

**1.**

In [23]:
exclude = string.punctuation+'°'

In [24]:
def remove_nums_spc_char(text):
  text = str(text)
  try:
    pattern  = re.compile("\d+")
    text = pattern.sub(r'', text)
    return text.translate(str.maketrans("", "", exclude))
  except Exception as e:
    print(e)

**2.**

In [25]:
def genre_and_theme(text):
  text = str(text)
  text = text.replace('[]', 'unknown')
  return text.translate(str.maketrans('','', exclude))


In [26]:
def short(text):
  return text[:-7]

In [27]:
df['duration'] = final['duration'].apply(short)
df['Genre'] = final['Genre'].apply(genre_and_theme)
df['Theme'] = final['Theme'].apply(genre_and_theme)

In [28]:
df.isnull().sum()

Unnamed: 0,0
title,0
title_english,16354
YT_link,23178
Image_link,0
Genre,0
Theme,0
rating,676
score,10169
source,0
duration,0


* THIS SYNTAX IS DEPRICATED

In [29]:
# df['rating'].fillna('not available', inplace=True)
# df['score'].fillna(df['score'].median() , inplace=True)
# df['YT_link'].fillna('not available', inplace=True)
# df['synopsis'].fillna('not available', inplace=True)

* TRY USING THIS

In [30]:
df.fillna({'YT_link': 'not available'}, inplace=True)
df.fillna({'score': df['score'].median()}, inplace=True)
df.fillna({'rating': 'not available'}, inplace=True)
df.fillna({'synopsis': 'not available'}, inplace=True)
df.fillna({'title_english': 'not available'}, inplace=True)

# **PREPROCESSING**

In [31]:
df['title_english'] = df['title_english'].map(lambda x : 'not available' if x == 'nan' else x)

In [32]:
def rem(text):
  pattern = re.compile(r"\d+|/?(Source: \w+\S+)|\[Written by MAL Rewrite\]|\n")
  return pattern.sub(r'', text)

In [33]:
df['synopsis'][7465]

'One night during the 18th century, deep in the mountains, a man loses his way and comes across a small shrine. As he enters, the space transforms into a room of a different world.\n\n(Source: Annecy)'

In [34]:
rem('''One night during the 18th century, deep in the mountains, a man loses his way and comes across a small shrine. As he enters, the space transforms into a room of a different world.

(Source: Annecy)''')

'One night during the th century, deep in the mountains, a man loses his way and comes across a small shrine. As he enters, the space transforms into a room of a different world.('

(Source: ANN)  (Source: H-Moe)

In [35]:
rem('''One night during the 18th century, deep in the mountains, a man loses his way and comes across a small shrine. As he enters, the space transforms into a room of a different world.[Written by MAL Rewrite]''')

'One night during the th century, deep in the mountains, a man loses his way and comes across a small shrine. As he enters, the space transforms into a room of a different world.'

In [36]:
df['synopsis']=df['synopsis'].apply(rem)

In [37]:
new_df = df[['title']]

In [38]:
x = df['Theme'].apply(lambda x : x.split())
y = df['Genre'].apply(lambda x : x.split())
z = df['synopsis'].apply(lambda x : x.split())

In [39]:
new_df['tags'] = z+x+y

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df['tags'] = z+x+y


In [40]:
new_df['tags'] =new_df['tags'].apply(lambda x: " ".join(x))

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df['tags'] =new_df['tags'].apply(lambda x: " ".join(x))


In [41]:
new_df['tags'][6559]

'In the world of Maburaho, everyone is born with the ability to use magic and are thus labeled magicians. However, the magical ability of each person is not equal. The number of times you can use magic determines the amount of respect you receive, and since one’s magical power is determined at birth by traits and genetics, those who have a bloodline stemming from famous magicians are highly sought after.Having the lowest magic count in Aoi Academy, Kazuki Shikimori is looked down upon by his classmates and seen as a nearly worthless magician. However, his bloodline consists of many great magicians throughout the ages, meaning that while he may not be a great magician, his offspring could be. This leads to him being sought after by three different young women: Yuna Miyama, a transfer student who declares herself his wife upon arrival, Rin Kamishiro, a prideful swordswoman of a traditional family who wants to kill him so she will be free to pursue her own desires, and Kuriko Kazetsubaki,

In [42]:
def rem_punc_and_stopwords(text):
  text = remove_nums_spc_char(text)
  new=[]
  for word in text.split():
    if word in stopwords.words('english'):
      new.append('')
    else:
      new.append(word)

  w = new[:]
  new.clear()
  return " ".join(w)


In [43]:
def for_name(text):
  text = text.lower()
  return text.translate(str.maketrans('','', exclude))

In [44]:
for_name('attack on titan season 3')

'attack on titan season 3'

In [45]:
new_df['tags'] = new_df['tags'].apply(rem_punc_and_stopwords)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df['tags'] = new_df['tags'].apply(rem_punc_and_stopwords)


In [46]:
def stem(text):
  lst = [PorterStemmer().stem(txt) for txt in text.split()]
  return " ".join(lst)

In [47]:
new_df['tags'] = new_df['tags'].apply(stem)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df['tags'] = new_df['tags'].apply(stem)


In [48]:
new_df['title'] = new_df['title'].apply(for_name)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_df['title'] = new_df['title'].apply(for_name)


# **RECOMMENDATIONS**

In [49]:
new_df

Unnamed: 0,title,tags
0,sousou no frieren,dure decadelong quest defeat demon king member...
1,fullmetal alchemist brotherhood,after horrif alchemi experi goe wrong elric ho...
2,steinsgate,eccentr scientist rintar okab neverend thirst ...
3,shingeki no kyojin season 3 part 2,seek restor human diminish hope survey corp em...
4,gintama,gintoki shinpachi kagura return funlov broke m...
...,...,...
28661,ruguo lishi shi yiqun miao miao miao mi ya,histori cat danc popular tune music comedi
28662,ruguo lishi shi yiqun miao 5th season,fifth season ruguo lishi shi yiqun miao histor...
28663,i love me,the anim relat beyondgend project aim foster s...
28664,chotto warareru gag gaiden,charact popular vote held commemor th annivers...


In [50]:
cv = TfidfVectorizer()
vectors = cv.fit_transform(new_df['tags']).toarray()

In [51]:
vectors

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [52]:
similarity = cosine_similarity(vectors)

In [53]:
similarity

array([[1.        , 0.0464102 , 0.03190302, ..., 0.00768951, 0.        ,
        0.        ],
       [0.0464102 , 1.        , 0.01517678, ..., 0.01307441, 0.        ,
        0.01351752],
       [0.03190302, 0.01517678, 1.        , ..., 0.00652497, 0.        ,
        0.0107039 ],
       ...,
       [0.00768951, 0.01307441, 0.00652497, ..., 1.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 1.        ,
        0.01668842],
       [0.        , 0.01351752, 0.0107039 , ..., 0.        , 0.01668842,
        1.        ]])

In [None]:
similarity[0]

array([1.        , 0.0464102 , 0.03190302, ..., 0.00768951, 0.        ,
       0.        ])

In [54]:
sorted(list(enumerate(np.round(similarity[371], 2))), reverse = True, key= lambda x : x[1])

[(371, np.float64(1.0)),
 (102, np.float64(0.27)),
 (258, np.float64(0.27)),
 (1175, np.float64(0.27)),
 (19451, np.float64(0.16)),
 (8299, np.float64(0.15)),
 (9786, np.float64(0.15)),
 (9158, np.float64(0.13)),
 (9876, np.float64(0.13)),
 (19676, np.float64(0.13)),
 (23057, np.float64(0.13)),
 (23485, np.float64(0.13)),
 (23486, np.float64(0.13)),
 (23695, np.float64(0.13)),
 (23713, np.float64(0.13)),
 (24317, np.float64(0.13)),
 (24514, np.float64(0.13)),
 (24515, np.float64(0.13)),
 (26018, np.float64(0.13)),
 (28173, np.float64(0.13)),
 (1723, np.float64(0.12)),
 (8881, np.float64(0.12)),
 (9201, np.float64(0.12)),
 (9421, np.float64(0.12)),
 (10857, np.float64(0.12)),
 (11994, np.float64(0.12)),
 (14336, np.float64(0.12)),
 (15404, np.float64(0.12)),
 (15837, np.float64(0.12)),
 (20211, np.float64(0.12)),
 (1615, np.float64(0.11)),
 (3045, np.float64(0.11)),
 (5047, np.float64(0.11)),
 (7542, np.float64(0.11)),
 (7573, np.float64(0.11)),
 (7651, np.float64(0.11)),
 (8012, np.flo

In [None]:
new_df['title'].iloc[23486]

'Jiayou Ba San Er Ban'

In [None]:
final['title_english'] = final.title.apply(lambda x : x.lower())

In [None]:
# final[final.title == '']

Unnamed: 0,title,title_english,YT_link,Image_link,Genre,Theme,rating,score,source,duration,synopsis
2908,high school dxd,high school dxd,https://www.youtube.com/watch?v=f4E8al_wo8w,https://cdn.myanimelist.net/images/anime/1331/...,"[Action, Comedy, Romance, Supernatural, Ecchi]","[Harem, Mythology, School]",R+ - Mild Nudity,7.32,Light novel,24 min per ep,High school student Issei Hyoudou is your run-...


In [55]:
def recommend(movie_name):
  movie_name = movie_name.lower()
  index = new_df[new_df['title'] == movie_name].index[0]
  recms = sorted(list(enumerate(np.round(similarity[index], 2))), reverse = True, key = lambda x: x[1])[1:11]
  for i in recms:
    print(new_df['title'].iloc[i[0]])

# **TESTING**

In [56]:
recommend('high school dxd')

high school dxd born
high school dxd new
high school dxd hero
high school dxd ova
high school dxd specials
high school dxd hero taiikukanura no holy
high school dxd born yomigaeranai fushichou
high school dxd new oppai tsutsumimasu
ai tenshi densetsu wedding peach
choujigen kakumei anime dimension high school


In [57]:
recommend('Seishun Buta Yarou wa Bunny Girl Senpai no Yume wo Minai')

seishun buta yarou wa yumemiru shoujo no yume wo minai
seishun buta yarou wa randoseru girl no yume wo minai
seishun buta yarou wa odekake sister no yume wo minai
yugsiggo pv
chäoschild silent sky
kimi no na wo yobeba
codee
guomin laogong dai huijia 2nd season
carrie haggyogaja
naeileun eonjena puleum


In [58]:
recommend('violet evergarden')

violet evergarden recollections
violet evergarden movie
violet evergarden kitto ai wo shiru hi ga kuru no darou
violet evergarden gaiden eien to jidou shuki ningyou
violet evergarden cms
yagami yuu
violet
shakuen no eris
aoi meno ningyou monogatari
hotori tada saiwai wo koinegau
