# **Content Based Filtering**

A recommender system seeks to predict or filter a user's preferences based on the user's choices. Recommender systems are used in a variety of areas, including movies, music, news, books, research articles, search queries, social tags, and products in general.

Like many machine learning techniques, a recommender system makes predictions based on users' historical behavior. Specifically, it predicts a user's preference for a set of items based on past experience. The two most popular approaches to building a recommender system are content-based filtering and collaborative filtering.

Recommender systems produce a list of recommendations in one of two ways:

1.    **Collaborative filtering**

Collaborative filtering approaches build a model from a user's past behavior (i.e. items purchased or searched for by the user) as well as similar decisions made by other users. This model is then used to predict items (or ratings for items) that the user may be interested in.

Collaborative filtering needs nothing except users' historical preferences on a set of items. Because it is based on historical data, the core assumption is that users who have agreed in the past tend to agree in the future as well.

2.    **Content-based filtering**

Content-based filtering approaches use a series of discrete characteristics of an item in order to recommend additional items with similar properties. These methods rely entirely on a description of the item and a profile of the user's preferences, and recommend items based on the user's past preferences.

The content-based approach requires a good amount of information about the items' own features, rather than users' interactions and feedback. Examples include movie attributes such as genre, year, director, and actors, or the textual content of articles, which can be extracted by applying Natural Language Processing.
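As a minimal sketch of the "item description plus user profile" idea, assume each item is described by a set of genres and the user profile is simply the average of the feature vectors of the items the user liked. The catalogue, the item names, and the `recommend` helper are hypothetical and exist only for illustration.

```python
import math

# Hypothetical catalogue: each item is described by a set of genre features.
ITEMS = {
    "Song A": {"pop"},
    "Song B": {"keroncong"},
    "Song C": {"dangdut"},
    "Song D": {"pop", "dangdut"},
}
FEATURES = sorted({g for genres in ITEMS.values() for g in genres})


def to_vector(genres):
    """One-hot encode a set of genres over FEATURES."""
    return [1.0 if f in genres else 0.0 for f in FEATURES]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(liked, top_n=2):
    """Rank unseen items by similarity to the average vector of liked items."""
    liked_vectors = [to_vector(ITEMS[name]) for name in liked]
    profile = [sum(col) / len(liked_vectors) for col in zip(*liked_vectors)]
    scores = [
        (name, cosine(profile, to_vector(genres)))
        for name, genres in ITEMS.items()
        if name not in liked
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


print(recommend(["Song A"]))
```

A user who liked only "Song A" gets "Song D" ranked first, because it is the only other item that shares the "pop" feature. No other user's behavior is involved, which is the key difference from collaborative filtering.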

<hr>

Consider a user, Stanley. Instead of focusing on his friends (users with similar tastes), we could focus on which of the available items are most similar to what we know he enjoys. This approach is known as Item-Based Collaborative Filtering (IB-CF).

We can divide IB-CF into two subtasks:

### 1. Calculate similarity among the items:

-    Cosine-Based Similarity
-    Correlation-Based Similarity
-    Adjusted Cosine Similarity
-    1-Jaccard distance

### 2. Calculate the prediction:

-    Weighted Sum
-    Regression
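The two subtasks above can be sketched with cosine-based similarity and a weighted-sum prediction. This is an illustrative sketch, not a reference implementation: the ratings are invented, similarity is computed only over users who rated both items, and the helper names are hypothetical.

```python
import math

# Hypothetical ratings: item -> {user: rating}. Missing entries mean "not rated".
RATINGS = {
    "item1": {"ann": 5, "bob": 4, "cat": 1},
    "item2": {"ann": 4, "bob": 5, "cat": 2, "stanley": 5},
    "item3": {"ann": 1, "bob": 2, "cat": 5, "stanley": 1},
}


def cosine_similarity(item_a, item_b):
    """Cosine similarity between two items over the users who rated both."""
    common = RATINGS[item_a].keys() & RATINGS[item_b].keys()
    if not common:
        return 0.0
    dot = sum(RATINGS[item_a][u] * RATINGS[item_b][u] for u in common)
    norm_a = math.sqrt(sum(RATINGS[item_a][u] ** 2 for u in common))
    norm_b = math.sqrt(sum(RATINGS[item_b][u] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def predict(user, target):
    """Weighted sum: similarity-weighted average of the user's own ratings."""
    numerator = denominator = 0.0
    for item, ratings in RATINGS.items():
        if item == target or user not in ratings:
            continue
        sim = cosine_similarity(target, item)
        numerator += sim * ratings[user]
        denominator += abs(sim)
    return numerator / denominator if denominator else None


print(round(predict("stanley", "item1"), 2))
```

Stanley has not rated `item1`, so his rating is estimated from his ratings of `item2` and `item3`, each weighted by how similar that item is to `item1`. Because raw ratings are all positive, plain cosine scores tend to be high for every pair, which is one motivation for the adjusted cosine and correlation-based variants listed above.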

<img src='a_img.png'>

# __Simple Case__

Recommendations are given based on the features of the items the user likes.

In [7]:
import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [8]:
df = pd.DataFrame([
    {'title': 'A', 'genre': 'Pop', 'label': 'PT. A'},
    {'title': 'B', 'genre': 'Keroncong', 'label': 'PT. A'},
    {'title': 'C', 'genre': 'Dangdut', 'label': 'PT. A'},
    {'title': 'D', 'genre': 'Pop', 'label': 'PT. B'},
    {'title': 'E', 'genre': 'Keroncong', 'label': 'PT. B'},
    {'title': 'F', 'genre': 'Dangdut', 'label': 'PT. B'},
    {'title': 'G', 'genre': 'Pop', 'label': 'PT. C'},
    {'title': 'H', 'genre': 'Keroncong', 'label': 'PT. C'},
    {'title': 'I', 'genre': 'Dangdut', 'label': 'PT. C'},
    {'title': 'J', 'genre': 'Pop', 'label': 'PT. C'}
])

df

Unnamed: 0,title,genre,label
0,A,Pop,PT. A
1,B,Keroncong,PT. A
2,C,Dangdut,PT. A
3,D,Pop,PT. B
4,E,Keroncong,PT. B
5,F,Dangdut,PT. B
6,G,Pop,PT. C
7,H,Keroncong,PT. C
8,I,Dangdut,PT. C
9,J,Pop,PT. C


In [9]:
ecv = CountVectorizer()
mgenre = ecv.fit_transform(df['genre'])
ecv.get_feature_names()  # removed in scikit-learn 1.2; use get_feature_names_out() on newer versions

['dangdut', 'keroncong', 'pop']

In [31]:
mgenre

<10x3 sparse matrix of type '<class 'numpy.int64'>'
	with 10 stored elements in Compressed Sparse Row format>

In [10]:
mgenre.toarray()

array([[0, 0, 1],
       [0, 1, 0],
       [1, 0, 0],
       [0, 0, 1],
       [0, 1, 0],
       [1, 0, 0],
       [0, 0, 1],
       [0, 1, 0],
       [1, 0, 0],
       [0, 0, 1]], dtype=int64)
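To make the matrix above less of a black box, here is a rough plain-Python sketch of what a count vectorizer does with the `genre` column. It assumes simple whitespace splitting, which is a simplification of `CountVectorizer`'s default tokenizer but gives the same result for these single-word genres.

```python
genres = ["Pop", "Keroncong", "Dangdut", "Pop", "Keroncong",
          "Dangdut", "Pop", "Keroncong", "Dangdut", "Pop"]

# Lowercase each document and split it into tokens (one word per genre here).
tokenized = [g.lower().split() for g in genres]

# The vocabulary is the sorted set of all tokens; its order fixes the columns.
vocabulary = sorted({tok for doc in tokenized for tok in doc})

# Each row counts how often every vocabulary term occurs in one document.
matrix = [[doc.count(term) for term in vocabulary] for doc in tokenized]

print(vocabulary)
for row in matrix[:3]:
    print(row)
```

The columns follow the alphabetical vocabulary (`dangdut`, `keroncong`, `pop`), so the first row, a Pop title, has its single 1 in the last column.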

<img src = 'b_img.png'>

In [11]:
cosScore = cosine_similarity(mgenre)
cosScore

array([[1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
       [0., 1., 0., 0., 1., 0., 0., 1., 0., 0.],
       [0., 0., 1., 0., 0., 1., 0., 0., 1., 0.],
       [1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
       [0., 1., 0., 0., 1., 0., 0., 1., 0., 0.],
       [0., 0., 1., 0., 0., 1., 0., 0., 1., 0.],
       [1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
       [0., 1., 0., 0., 1., 0., 0., 1., 0., 0.],
       [0., 0., 1., 0., 0., 1., 0., 0., 1., 0.],
       [1., 0., 0., 1., 0., 0., 1., 0., 0., 1.]])

In [12]:
cosScore.shape

(10, 10)

In [37]:
# a user likes the product at index 7
produkMirip = list(enumerate(cosScore[7]))
produkMirip = sorted(produkMirip, key = lambda x: x[1], reverse=True)
produkMirip[:5]

[(1, 1.0),
 (4, 1.0),
 (7, 1.0),
 (0, 0.0),
 (2, 0.0)]

In [14]:
for i in produkMirip[:5]:
    print(df.iloc[i[0]])

title            B
genre    Keroncong
label        PT. A
Name: 1, dtype: object
title            E
genre    Keroncong
label        PT. B
Name: 4, dtype: object
title            H
genre    Keroncong
label        PT. C
Name: 7, dtype: object
title        A
genre      Pop
label    PT. A
Name: 0, dtype: object
title          C
genre    Dangdut
label      PT. A
Name: 2, dtype: object
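Note that the top five above include the liked item itself (title H, index 7) and two items with a similarity of zero (A and C). One possible refinement, sketched here with a hypothetical `top_similar` helper, is to drop both before ranking.

```python
def top_similar(scores, liked_index, top_n=5):
    """Rank items by similarity, skipping the liked item and zero-score items."""
    candidates = [
        (idx, score)
        for idx, score in enumerate(scores)
        if idx != liked_index and score > 0
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:top_n]


# Row 7 of the similarity matrix shown earlier (title 'H', genre Keroncong).
row_7 = [0., 1., 0., 0., 1., 0., 0., 1., 0., 0.]
titles = "ABCDEFGHIJ"

for idx, score in top_similar(row_7, liked_index=7):
    print(titles[idx], score)
```

Only B and E remain, the two other Keroncong titles. The same helper should also accept a row of the NumPy array directly, for example `top_similar(cosScore[7], 7)`.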


<hr>

# Anime Recommendation
### **Content-based filtering**

In [101]:
df = pd.read_csv('anime.csv')
df = df.iloc[0:850]  # use only the first 850 rows; the full dataset is too large
df.head()
# df['type'].unique()

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
0,32281,Kimi no Na wa.,"Drama, Romance, School, Supernatural",Movie,1,9.37,200630
1,5114,Fullmetal Alchemist: Brotherhood,"Action, Adventure, Drama, Fantasy, Magic, Mili...",TV,64,9.26,793665
2,28977,Gintama°,"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.25,114262
3,9253,Steins;Gate,"Sci-Fi, Thriller",TV,24,9.17,673572
4,9969,Gintama',"Action, Comedy, Historical, Parody, Samurai, S...",TV,51,9.16,151266


In [97]:
# anime whose genres are most similar to those of 'Kimi no Na wa.'
df[df['name']=='Kokoro ga Sakebitagatterunda.']

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
208,28725,Kokoro ga Sakebitagatterunda.,"Drama, Romance, School",Movie,1,8.32,59652


In [98]:
df[df['name']=='Clannad: After Story - Mou Hitotsu no Sekai, Kyou-hen']

Unnamed: 0,anime_id,name,genre,type,episodes,rating,members
504,6351,"Clannad: After Story - Mou Hitotsu no Sekai, K...","Drama, Romance, School",Special,1,8.02,138364


In [63]:
# df['genre'].unique()

In [34]:
len(df['name'])

850

In [35]:
df.isnull().sum()

anime_id    0
name        0
genre       0
type        0
episodes    0
rating      0
members     0
dtype: int64

In [19]:
df = df.dropna()

In [20]:
df.isnull().sum()

anime_id    0
name        0
genre       0
type        0
episodes    0
rating      0
members     0
dtype: int64

In [21]:
len(df['name'])

850

**Extracting the genres**

In [65]:
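ext = CountVectorizer(
    tokenizer = lambda x: x.split(', ')  # split only on comma + space so multi-word genres stay intact
)

mgenre = ext.fit_transform(df['genre'])

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out() on newer versions
print(len(ext.get_feature_names()))
print(ext.get_feature_names())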

40
['action', 'adventure', 'cars', 'comedy', 'dementia', 'demons', 'drama', 'ecchi', 'fantasy', 'game', 'harem', 'historical', 'horror', 'josei', 'kids', 'magic', 'martial arts', 'mecha', 'military', 'music', 'mystery', 'parody', 'police', 'psychological', 'romance', 'samurai', 'school', 'sci-fi', 'seinen', 'shoujo', 'shoujo ai', 'shounen', 'shounen ai', 'slice of life', 'space', 'sports', 'super power', 'supernatural', 'thriller', 'vampire']
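The custom tokenizer matters because the default `token_pattern` of `CountVectorizer` keeps only runs of two or more word characters, which would break up hyphenated and multi-word genres. The comparison below reproduces that default pattern with the standard `re` module; the genre string is a made-up example built from genres in the vocabulary above.

```python
import re

# Default `token_pattern` of scikit-learn's CountVectorizer: words of 2+ characters.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"

genre = "Sci-Fi, Slice of Life, Super Power"

# What the default tokenization would produce after lowercasing.
default_tokens = re.findall(DEFAULT_TOKEN_PATTERN, genre.lower())

# What the custom tokenizer (split on comma + space) produces.
custom_tokens = genre.lower().split(", ")

print(default_tokens)  # ['sci', 'fi', 'slice', 'of', 'life', 'super', 'power']
print(custom_tokens)   # ['sci-fi', 'slice of life', 'super power']
```

With the default pattern, "slice of life" would turn into three unrelated features, and any two anime sharing the word "of" or "life" would look more similar than they are.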


In [23]:
mgenre.toarray()

array([[0, 0, 0, ..., 1, 0, 0],
       [1, 1, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       ...,
       [1, 0, 0, ..., 1, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int64)

In [24]:
cosScore = cosine_similarity(mgenre)
cosScore  # similarity scores between every pair of anime
# cosScore[0]  # cosine scores of every anime against anime 0
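For 0/1 genre vectors, cosine similarity reduces to the number of shared genres divided by the square root of the product of the two genre counts. As a worked example, the sketch below applies this to the genre lists of 'Kimi no Na wa.' and 'Kokoro ga Sakebitagatterunda.' shown earlier; the `binary_cosine` helper is hypothetical, and it assumes no genre is repeated within a title.

```python
import math

kimi_no_na_wa = {"drama", "romance", "school", "supernatural"}
kokoro_ga_sakebitagatterunda = {"drama", "romance", "school"}


def binary_cosine(a, b):
    """Cosine similarity of two 0/1 vectors, given as sets of active features."""
    return len(a & b) / math.sqrt(len(a) * len(b))


score = binary_cosine(kimi_no_na_wa, kokoro_ga_sakebitagatterunda)
print(round(score, 3))  # 0.866
```

The two titles share three genres, so the score is 3 / sqrt(4 * 3), roughly 0.866. A title with exactly the same four genres would score 1.0, and one with no genre in common would score 0.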

## **Trying Recommender**

In [89]:
animeSuka = "Kimi no Na wa."
indexSuka = df[df['name'] == animeSuka].index[0]
indexSuka

0

In [90]:
animeSama = list(enumerate(cosScore[indexSuka]))
# animeSama

In [91]:
# 1. manual ranking
animeSamaSortir = sorted(animeSama, key=lambda x:x[1], reverse=True)
# animeSamaSortir

In [92]:
# 2. keep only items whose cosine similarity score is above 0.7
animeSama = list(filter(lambda x: x[1] > 0.7, animeSama))
# list(animeSama)

In [93]:
# 3. manual ranking of the filtered list
animeSama = sorted(animeSama, key=lambda x:x[1], reverse=True)
# animeSama

In [95]:
# Recommendations for those who like "Kimi no Na wa."
for i in animeSama[1:10]:
    print(df.iloc[i[0]]['name'])

Kokoro ga Sakebitagatterunda.
Clannad: After Story - Mou Hitotsu no Sekai, Kyou-hen
Little Busters!: Refrain
Clannad
Kokoro Connect: Michi Random
Kokoro Connect
Little Busters!: EX
Hotarubi no Mori e
Yahari Ore no Seishun Love Comedy wa Machigatteiru. Zoku


<hr>

## **In-Class Exercise**
#### 1. Create a recommender system based on the 'type' feature
#### 2. Create a recommender system based on the 'genre' & 'type' features

## **Take-Home Exercise**
#### 3. Create a recommender system based on the 'rating' & 'type' features

# **Reference**:
- Carlos Pinela, "Recommender Systems — User-Based and Item-Based Collaborative Filtering", https://medium.com/@cfpinela/recommender-systems-user-based-and-item-based-collaborative-filtering-5d5f375a127f
- Rakesh4real, "User-Based and Item-Based Collaborative Filtering — Part 5", https://medium.com/fnplus/user-based-and-item-based-collaborative-filtering-b73d9b2badba
- Muffaddal Qutbuddin, "Comprehensive Guide on Item Based Collaborative Filtering", https://towardsdatascience.com/comprehensive-guide-on-item-based-recommendation-systems-d67e40e2b75d
- Aishwarya.27, "Python | Implementation of Movie Recommender System", https://www.geeksforgeeks.org/python-implementation-of-movie-recommender-system/
- Shuyu Luo, "Introduction to Recommender System", https://towardsdatascience.com/intro-to-recommender-system-collaborative-filtering-64a238194a26
- Kevin Liao, "Prototyping a Recommender System Step by Step Part 1: KNN Item-Based Collaborative Filtering", https://towardsdatascience.com/prototyping-a-recommender-system-step-by-step-part-1-knn-item-based-collaborative-filtering-637969614ea
- Dataset Source, https://www.kaggle.com/CooperUnion/anime-recommendations-database