# Recommendation System using Content-based Filtering

by Harman Muhammad Satrio Reinaldy

### Import Libraries

1. NumPy: A library that provides support for numerical operations and multidimensional arrays. Used for numerical data manipulation and processing.
2. Pandas: A library used for data manipulation and analysis. Pandas provides powerful data structures such as DataFrame, which makes it easy to process and transform data.
3. sklearn: scikit-learn, commonly known as sklearn, is a popular machine learning library in Python. Sklearn provides a wide variety of tools and algorithms for tasks such as classification, regression, clustering, and dimensionality reduction.
4. rake_nltk: rake_nltk is a library that implements the Rapid Automatic Keyword Extraction (RAKE) algorithm. RAKE is a technique for extracting keywords from text by analyzing word frequency and co-occurrence.
5. CountVectorizer: CountVectorizer is a class in scikit-learn that converts a collection of raw text documents into a matrix of token counts, with one row per document and one column per vocabulary term. This notebook uses it to turn the combined genre and keyword text into numerical feature vectors.
6. cosine_similarity: cosine_similarity is a function in scikit-learn that calculates the cosine similarity between pairs of samples. It measures the similarity between two vectors by taking the cosine of the angle between them.
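
To make the last item concrete, here is a minimal plain-Python sketch of the cosine similarity formula: the dot product of two vectors divided by the product of their lengths. The helper name `cosine` and the toy vectors are illustrative assumptions, not part of the notebook, and returning 0.0 for an all-zero vector is a choice made for this sketch.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption for this sketch: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)

# Toy word-count vectors (hypothetical values)
same_direction = cosine([1, 2, 0], [2, 4, 0])  # parallel vectors
no_overlap = cosine([1, 0, 0], [0, 3, 0])      # no shared terms
partial = cosine([1, 1, 0], [1, 0, 0])         # one shared term

print(same_direction, no_overlap, partial)
```

Vectors that point in the same direction score close to 1, vectors with no shared terms score 0, and partial overlap lands in between.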

In [108]:
import pandas as pd
import numpy as np
import sklearn

from rake_nltk import Rake
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import average_precision_score

import warnings
warnings.filterwarnings("ignore")

%matplotlib inline

### Data Loading and Preprocessing

This section loads the data and applies some preprocessing:
- taking the first 15,000 rows of the dataset for processing
- checking for missing values
- converting the titles to lowercase

In [76]:
df = pd.read_csv("animes.csv")
# Keep the first 15,000 rows; .copy() makes the subset an independent DataFrame
df = df[0:15000].copy()

In [77]:
df

,uid,title,synopsis,genre,aired,episodes,members,popularity,ranked,score,img_url,link
0,28891,Haikyuu!! Second Season,Following their participation at the Inter-Hig...,"['Comedy', 'Sports', 'Drama', 'School', 'Shoun...","Oct 4, 2015 to Mar 27, 2016",25.0,489888,141,25.0,8.82,https://cdn.myanimelist.net/images/anime/9/766...,https://myanimelist.net/anime/28891/Haikyuu_Se...
1,23273,Shigatsu wa Kimi no Uso,Music accompanies the path of the human metron...,"['Drama', 'Music', 'Romance', 'School', 'Shoun...","Oct 10, 2014 to Mar 20, 2015",22.0,995473,28,24.0,8.83,https://cdn.myanimelist.net/images/anime/3/671...,https://myanimelist.net/anime/23273/Shigatsu_w...
2,34599,Made in Abyss,The Abyss—a gaping chasm stretching down into ...,"['Sci-Fi', 'Adventure', 'Mystery', 'Drama', 'F...","Jul 7, 2017 to Sep 29, 2017",13.0,581663,98,23.0,8.83,https://cdn.myanimelist.net/images/anime/6/867...,https://myanimelist.net/anime/34599/Made_in_Abyss
3,5114,Fullmetal Alchemist: Brotherhood,"""In order for something to be obtained, someth...","['Action', 'Military', 'Adventure', 'Comedy', ...","Apr 5, 2009 to Jul 4, 2010",64.0,1615084,4,1.0,9.23,https://cdn.myanimelist.net/images/anime/1223/...,https://myanimelist.net/anime/5114/Fullmetal_A...
4,31758,Kizumonogatari III: Reiketsu-hen,After helping revive the legendary vampire Kis...,"['Action', 'Mystery', 'Supernatural', 'Vampire']","Jan 6, 2017",1.0,214621,502,22.0,8.83,https://cdn.myanimelist.net/images/anime/3/815...,https://myanimelist.net/anime/31758/Kizumonoga...
...,...,...,...,...,...,...,...,...,...,...,...,...
14995,2820,Time Travel Tondekeman!,"The series starts when Hayato, a soccer enthus...","['Action', 'Kids', 'Adventure', 'Comedy']","Oct 19, 1989 to Aug 26, 1990",39.0,1469,8684,2084.0,7.41,https://cdn.myanimelist.net/images/anime/6/456...,https://myanimelist.net/anime/2820/Time_Travel...
14996,38850,Star☆Twinkle Precure: Hoshi no Uta ni Omoi wo ...,,"['Action', 'Magic', 'Fantasy', 'Shoujo']","Oct 19, 2019",1.0,1187,9176,2081.0,7.41,https://cdn.myanimelist.net/images/anime/1324/...,https://myanimelist.net/anime/38850/Star%E2%98...
14997,40815,Honzuki no Gekokujou: Shisho ni Naru Tame ni w...,Second half of Honzuki no Gekokujou: Shisho n...,"['Slice of Life', 'Fantasy']","Apr, 2020 to ?",,11881,4612,,,https://cdn.myanimelist.net/images/anime/1639/...,https://myanimelist.net/anime/40815/Honzuki_no...
14998,40046,ID:Invaded,Sakaido is a genius detective who can track do...,"['Sci-Fi', 'Mystery']","Jan 6, 2020 to ?",,17205,3494,,,https://cdn.myanimelist.net/images/anime/1954/...,https://myanimelist.net/anime/40046/ID_Invaded


In [78]:
print(df.columns)

Index(['uid', 'title', 'synopsis', 'genre', 'aired', 'episodes', 'members',
       'popularity', 'ranked', 'score', 'img_url', 'link'],
      dtype='object')


- **```isnull().sum()```** counts the missing values in each column of the DataFrame.

In [79]:
df.isnull().sum()

uid              0
title            0
synopsis       812
genre            0
aired            0
episodes       616
members          0
popularity       0
ranked        1927
score          536
img_url        165
link             0
dtype: int64

Create variables from the "synopsis", "genre", and "title" columns of the DataFrame, and convert every title to lowercase.

In [80]:
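sinopsi = df['synopsis']
genre = df['genre']
# Lowercase the titles with a vectorized string method and write the result
# back to the DataFrame so that later searches see the lowercase titles.
df['title'] = df['title'].str.lower()
judul = df['title']

judul.head()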

0             haikyuu!! second season
1             shigatsu wa kimi no uso
2                       made in abyss
3    fullmetal alchemist: brotherhood
4    kizumonogatari iii: reiketsu-hen
Name: title, dtype: object

- The ```fillna()``` function replaces the missing values in the "synopsis" column with a single space (' '), so that every row holds a string for the text processing that follows.
- ```each_anime_genre.replace('[', '')```: Removes the opening square bracket '[' from the genre string.
- ```each_anime_genre.replace(']', '')```: Removes the closing square bracket ']' from the genre string.
- ```each_anime_genre.replace("'", "")```: Removes single quotes (') from the genre string.
- ```each_anime_genre.replace(",", "")```: Removes the comma "," from the genre string.

In [81]:
sinopsi = sinopsi.fillna(' ')
for i,each_anime_genre in enumerate(genre):
    each_anime_genre = each_anime_genre.replace('[', '')
    each_anime_genre = each_anime_genre.replace(']', '')
    each_anime_genre = each_anime_genre.replace("'", "")
    each_anime_genre = each_anime_genre.replace(",", "")
    genre.iloc[i] = each_anime_genre
genre

0                       Comedy Sports Drama School Shounen
1                       Drama Music Romance School Shounen
2                   Sci-Fi Adventure Mystery Drama Fantasy
3        Action Military Adventure Comedy Drama Magic F...
4                      Action Mystery Supernatural Vampire
                               ...                        
14995                         Action Kids Adventure Comedy
14996                          Action Magic Fantasy Shoujo
14997                                Slice of Life Fantasy
14998                                       Sci-Fi Mystery
14999                        Action Adventure Mecha Sci-Fi
Name: genre, Length: 15000, dtype: object
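
The four chained ```replace``` calls above can also be collapsed into a single regular-expression substitution. The sketch below uses only the standard library on a hypothetical genre string in the dataset's format; the helper name `clean_genre` is an assumption, not something the notebook defines.

```python
import re

def clean_genre(raw):
    """Strip square brackets, single quotes, and commas from a genre list string."""
    return re.sub(r"[\[\]',]", "", raw)

raw = "['Comedy', 'Sports', 'Drama']"  # hypothetical input in the dataset's format

# The chained version from the notebook, for comparison
chained = raw.replace('[', '').replace(']', '').replace("'", "").replace(",", "")

print(clean_genre(raw))
```

On the DataFrame, the same pattern could be applied to the whole column at once with ```df['genre'].str.replace(r"[\[\]',]", "", regex=True)```.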

In [82]:
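rake = Rake()
keywords = []

# Extract the RAKE key phrases from each synopsis and store them as one string.
for plot in sinopsi:
    rake.extract_keywords_from_text(plot)
    keywords_i = rake.get_ranked_phrases()  # phrases ordered from highest to lowest score
    # Prefix every phrase with a space; the leading space keeps the keywords
    # separated from the genre text when the two columns are concatenated later.
    keywords_i_string = "".join(" " + keyword for keyword in keywords_i)
    keywords.append(keywords_i_string)

df['keywords'] = keywords
df['keywords'][0]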

' large training camp alongside many notable volleyball teams karasuno high school volleyball team attempts volleyball team must learn overcome formidable opponents old class setter tooru oikawa standing rival nekoma high spring tournament instead senior players graduate national level players could possibly break archrival aoba jousai new — including toughest teams karasuno agrees new attacks would strengthen train harder take part powerful weapon mal rewrite last chance kageyama attempt also come high karasuno written world victory tokyo sturdiest skills sharpen settle refocus receive playing participation one moreover members may long japan invitation inter hope hope hinata following facing ever even efforts differences devise conquer blocks aiming'

Create a new column, "kata-kata" (Indonesian for "words"), by concatenating the "genre" and "keywords" columns of the DataFrame. ```CountVectorizer``` then converts the text in that column into numeric feature vectors of token counts.

In [83]:
df['kata-kata'] = df['genre'] + df['keywords']

In [84]:
vectorizer = CountVectorizer()
vectorized_kataKata = vectorizer.fit_transform(df['kata-kata'])
vectorized_kataKata = vectorized_kataKata.toarray()
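
The notebook imports ```cosine_similarity```, but none of the cells shown here call it. As a hedged sketch of how count vectors like the ones above could feed a similarity-based recommender, the example below builds a toy DataFrame (hypothetical titles and text, not rows from animes.csv) and ranks titles by cosine similarity. The helper name `most_similar` is an assumption.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for the 'title' and 'kata-kata' columns
toy = pd.DataFrame({
    "title": ["volleyball show", "basketball show", "piano show", "robot show"],
    "kata-kata": [
        "sports school volleyball team tournament",
        "sports school basketball team tournament",
        "music romance piano school",
        "mecha space war robot",
    ],
})

# Token counts stay in a sparse matrix; cosine_similarity accepts sparse input
toy_vectors = CountVectorizer().fit_transform(toy["kata-kata"])
toy_similarity = cosine_similarity(toy_vectors)  # shape: (n_titles, n_titles)

def most_similar(title, n=2):
    """Return the n titles whose count vectors are closest to the given title."""
    idx = toy.index[toy["title"] == title][0]
    scores = pd.Series(toy_similarity[idx], index=toy["title"]).drop(title)
    return scores.sort_values(ascending=False).head(n).index.tolist()

print(most_similar("volleyball show"))
```

The toy example keeps the count matrix sparse. Converting the matrix for 15,000 rows to a dense array with ```toarray()``` may require a large amount of memory, depending on the vocabulary size.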

### Creating the Search Input Function

In [144]:
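def evaluate_precision(relevant_anime, recommended_anime):
    # Precision: the share of recommended titles that are also relevant.
    # Both lists are converted to sets, so duplicate titles count once.
    relevant_set = set(relevant_anime)
    recommended_set = set(recommended_anime)

    intersection = relevant_set.intersection(recommended_set)

    # Guard against division by zero when nothing was recommended.
    precision = len(intersection) / len(recommended_set) if len(recommended_set) > 0 else 0

    return precision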

In [145]:
# Function to Find Recommendations
def cari_anime():
    search_type = input("Enter search type (1: Title, 2: Keywords, 3: Genre): ")
    search_query = input("Enter your search query: ")
    n_recommendations = int(input("Enter the number of recommendations you want: "))

    if search_type == "1": 
        recommendations = df[df['title'].str.contains(search_query, case=False)]['title'].head(n_recommendations).tolist()
    elif search_type == "2":  
        recommendations = df[df['keywords'].str.contains(search_query, case=False)]['title'].head(n_recommendations).tolist()
    elif search_type == "3":  
        recommendations = df[df['genre'].str.contains(search_query, case=False)]['title'].head(n_recommendations).tolist()
    else:
        print("Invalid search type. Please try again.")
        return

    print("Recommendations:")
    for idx, recommendation in enumerate(recommendations):
        print(f"{idx+1}. {recommendation}")
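
One detail worth knowing about the searches above: pandas' ```str.contains``` treats the query as a regular expression by default, so characters such as parentheses are interpreted as pattern syntax rather than literal text. The sketch below, with hypothetical titles, shows a literal match using `regex=False`; it is an illustration of the option, not a change the notebook makes.

```python
import re
import pandas as pd

titles = pd.Series(["reunion (music)", "nexus", "Reunion"])  # hypothetical titles
query = "reunion (music)"

# As a regex, the parentheses form a group, so the pattern looks for the text
# "reunion music" and does not match the title that contains "(music)".
regex_hit = re.search(query, "reunion (music)", flags=re.IGNORECASE)

# regex=False makes str.contains compare the query as plain text.
literal_matches = titles[titles.str.contains(query, case=False, regex=False)].tolist()

print(regex_hit)
print(literal_matches)
```

Passing `regex=False` in `cari_anime` would make user queries with special characters behave as plain substrings.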

In [146]:
# Example usage
relevant_anime = ['Attack on Titan', 'One Punch Man', 'Naruto']
recommended_anime = ['Attack on Titan', 'One Punch Man', 'Sword Art Online']

precision = evaluate_precision(relevant_anime, recommended_anime)
print("Precision:", precision)

Precision: 0.6666666666666666


#### Title Search

In [147]:
cari_anime() #1

Enter search type (1: Title, 2: Keywords, 3: Genre): 1
Enter your search query: sword art online
Enter the number of recommendations you want: 5
Recommendations:
1. sword art online: alicization - war of underworld
2. sword art online movie: ordinal scale
3. sword art online fatal bullet: the third episode
4. sword art online fatal bullet: the third episode - pilot-ban
5. sword art online: alicization - war of underworld


#### Keyword Search

In [148]:
cari_anime() #2

Enter search type (1: Title, 2: Keywords, 3: Genre): 2
Enter your search query: kawaii
Enter the number of recommendations you want: 5
Recommendations:
1. shownoid mako-chan
2. kakko kawaii sengen! 2
3. nexus
4. reunion (music)
5. ao no exorcist movie special


#### Genre/Category Search

In [150]:
cari_anime() #3

Enter search type (1: Title, 2: Keywords, 3: Genre): 3
Enter your search query: Romance
Enter the number of recommendations you want: 5
Recommendations:
1. shigatsu wa kimi no uso
2. clannad: after story
3. nodame cantabile: finale - mine to kiyora no saikai
4. inuyasha movie 3: tenka hadou no ken
5. gekkan shoujo nozaki-kun specials


In [151]:
cari_anime() # invalid search type

Enter search type (1: Title, 2: Keywords, 3: Genre): 5
Enter your search query: demon
Enter the number of recommendations you want: 5
Invalid search type. Please try again.
