
# Content-Based Recommendation System

A recommendation system makes it easier to find relevant content.
In this notebook we build a content-based recommender that uses the features of a film to find similar films. It computes a similarity index between every pair of films, so that when we pick one film we get back the films most similar to it.

## Import Dataset

In [1]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)

# Load the three datasets
movie_rating_df = pd.read_csv('https://storage.googleapis.com/dqlab-dataset/movie_rating_df.csv')
name_df = pd.read_csv('https://storage.googleapis.com/dqlab-dataset/actor_name.csv')
director_writers = pd.read_csv('https://storage.googleapis.com/dqlab-dataset/directors_writers.csv')

### Inspect Each Dataset

In [2]:
movie_rating_df.head()

Unnamed: 0,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,averageRating,numVotes
0,tt0000001,short,Carmencita,Carmencita,0,1894.0,,1.0,"Documentary,Short",5.6,1608
1,tt0000002,short,Le clown et ses chiens,Le clown et ses chiens,0,1892.0,,5.0,"Animation,Short",6.0,197
2,tt0000003,short,Pauvre Pierrot,Pauvre Pierrot,0,1892.0,,4.0,"Animation,Comedy,Romance",6.5,1285
3,tt0000004,short,Un bon bock,Un bon bock,0,1892.0,,12.0,"Animation,Short",6.1,121
4,tt0000005,short,Blacksmith Scene,Blacksmith Scene,0,1893.0,,1.0,"Comedy,Short",6.1,2050


In [3]:
movie_rating_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 751614 entries, 0 to 751613
Data columns (total 11 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   tconst          751614 non-null  object 
 1   titleType       751614 non-null  object 
 2   primaryTitle    751614 non-null  object 
 3   originalTitle   751614 non-null  object 
 4   isAdult         751614 non-null  int64  
 5   startYear       751614 non-null  float64
 6   endYear         16072 non-null   float64
 7   runtimeMinutes  751614 non-null  float64
 8   genres          486766 non-null  object 
 9   averageRating   751614 non-null  float64
 10  numVotes        751614 non-null  int64  
dtypes: float64(4), int64(2), object(5)
memory usage: 63.1+ MB


In [4]:
name_df.head()

Unnamed: 0,nconst,primaryName,birthYear,deathYear,primaryProfession,knownForTitles
0,nm1774132,Nathan McLaughlin,1973,\N,"special_effects,make_up_department","tt0417686,tt1713976,tt1891860,tt0454839"
1,nm10683464,Bridge Andrew,\N,\N,actor,tt7718088
2,nm1021485,Brandon Fransvaag,\N,\N,miscellaneous,tt0168790
3,nm6940929,Erwin van der Lely,\N,\N,miscellaneous,tt4232168
4,nm5764974,Svetlana Shypitsyna,\N,\N,actress,tt3014168


In [5]:
name_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   nconst             1000 non-null   object
 1   primaryName        1000 non-null   object
 2   birthYear          1000 non-null   object
 3   deathYear          1000 non-null   object
 4   primaryProfession  891 non-null    object
 5   knownForTitles     1000 non-null   object
dtypes: object(6)
memory usage: 47.0+ KB


In [6]:
director_writers.head()

Unnamed: 0,tconst,director_name,writer_name
0,tt0011414,David Kirkland,"John Emerson,Anita Loos"
1,tt0011890,Roy William Neill,"Arthur F. Goodrich,Burns Mantle,Mary Murillo"
2,tt0014341,"Buster Keaton,John G. Blystone","Jean C. Havez,Clyde Bruckman,Joseph A. Mitchell"
3,tt0018054,Cecil B. DeMille,Jeanie Macpherson
4,tt0024151,James Cruze,"Max Miller,Wells Root,Jack Jevne"


In [7]:
director_writers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 986 entries, 0 to 985
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   tconst         986 non-null    object
 1   director_name  986 non-null    object
 2   writer_name    986 non-null    object
dtypes: object(3)
memory usage: 23.2+ KB


## Data Preprocessing

### Column Selection

From name_df we only need the columns nconst, primaryName, and knownForTitles, so we can drop the rest.

In [8]:
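# Take an explicit copy so that later column assignments do not write to a slice of the original
name_df = name_df[['nconst','primaryName','knownForTitles']].copy()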
name_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   nconst          1000 non-null   object
 1   primaryName     1000 non-null   object
 2   knownForTitles  1000 non-null   object
dtypes: object(3)
memory usage: 23.6+ KB


### Convert into List

To build the recommender, the comma-separated names have to become lists. The info() output for director_writers above shows no NULL values, so we can split director_name and writer_name from strings into lists directly.

In [9]:
# Split director_name and writer_name into lists
director_writers['director_name'] = director_writers['director_name'].apply(lambda row: row.split(','))
director_writers['writer_name'] = director_writers['writer_name'].apply(lambda row: row.split(','))

# Show the first 5 rows
director_writers.head()

Unnamed: 0,tconst,director_name,writer_name
0,tt0011414,[David Kirkland],"[John Emerson, Anita Loos]"
1,tt0011890,[Roy William Neill],"[Arthur F. Goodrich, Burns Mantle, Mary Murillo]"
2,tt0014341,"[Buster Keaton, John G. Blystone]","[Jean C. Havez, Clyde Bruckman, Joseph A. Mitc..."
3,tt0018054,[Cecil B. DeMille],[Jeanie Macpherson]
4,tt0024151,[James Cruze],"[Max Miller, Wells Root, Jack Jevne]"



### Convert Title into List

As with director_writers, the knownForTitles column of name_df needs to become a list, because one actor can appear in more than one film. Before converting, we can also check which title counts occur in this dataset.



In [10]:
# Check the distinct numbers of titles per person
print(name_df['knownForTitles'].apply(lambda x: len(x.split(','))).unique())

[4 1 2 3]


In [11]:
# Split knownForTitles into a list for every row
name_df['knownForTitles'] = name_df['knownForTitles'].apply(lambda x: x.split(','))

# Show the first 5 rows
name_df.head()



Unnamed: 0,nconst,primaryName,knownForTitles
0,nm1774132,Nathan McLaughlin,"[tt0417686, tt1713976, tt1891860, tt0454839]"
1,nm10683464,Bridge Andrew,[tt7718088]
2,nm1021485,Brandon Fransvaag,[tt0168790]
3,nm6940929,Erwin van der Lely,[tt4232168]
4,nm5764974,Svetlana Shypitsyna,[tt3014168]


### One-to-One Correspondence

With the lists in place, the next step is a table that has a one-to-one relation between each person and each of their titles.

In [12]:
# Bucket for the intermediate dataframes
df_uni = []

for x in ['knownForTitles']:
    # Repeat each row index once per element of its knownForTitles list
    idx = name_df.index.repeat(name_df['knownForTitles'].str.len())

    # Flatten the lists of every row into one long column
    df1 = pd.DataFrame({
        x: np.concatenate(name_df[x].values)
    })

    # Re-attach the repeated index so each title points back to its source row
    df1.index = idx
    # Collect the resulting dataframe in the bucket
    df_uni.append(df1)

# Combine all collected dataframes into one
df_concat = pd.concat(df_uni, axis=1)

# Left join with the remaining columns of the original dataframe
# (columns= is used because pandas 2.x no longer accepts a positional axis argument)
unnested_df = df_concat.join(name_df.drop(columns=['knownForTitles']), how='left')

# Restore the column order of the original dataframe
unnested_df = unnested_df[name_df.columns.tolist()]

unnested_df



Unnamed: 0,nconst,primaryName,knownForTitles
0,nm1774132,Nathan McLaughlin,tt0417686
0,nm1774132,Nathan McLaughlin,tt1713976
0,nm1774132,Nathan McLaughlin,tt1891860
0,nm1774132,Nathan McLaughlin,tt0454839
1,nm10683464,Bridge Andrew,tt7718088
...,...,...,...
998,nm5245804,Eliza Jenkins,tt1464058
999,nm0948460,Greg Yolen,tt0436869
999,nm0948460,Greg Yolen,tt0476663
999,nm0948460,Greg Yolen,tt0109723
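
The same one-to-one expansion can also be sketched with `DataFrame.explode`, which pandas has provided since version 0.25. The toy dataframe and its values below are made up for illustration; only the column names come from name_df.

```python
import pandas as pd

# Toy stand-in for name_df after knownForTitles has been split into lists
# (the ids and names are hypothetical)
toy_names = pd.DataFrame({
    "nconst": ["nm1", "nm2"],
    "primaryName": ["Actor A", "Actor B"],
    "knownForTitles": [["tt1", "tt2"], ["tt3"]],
})

# explode() emits one row per list element and repeats the original index,
# which is the same shape the manual repeat/concatenate/join steps produce
toy_unnested = toy_names.explode("knownForTitles")
print(toy_unnested)
```

The repeated index (0, 0, 1) matches the repeated row labels visible in the output above.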



### Grouping Cast Names by Title

After the one-to-one expansion, we can collect the names of all cast members of a film into a single row.

In [13]:
unnested_drop = unnested_df.drop(['nconst'], axis=1)

# Bucket for the intermediate results
df_uni = []

for col in ['primaryName']:
    # Aggregate primaryName into a list for every knownForTitles value
    dfi = unnested_drop.groupby(['knownForTitles'])[col].apply(list)
    # Collect the aggregated series
    df_uni.append(dfi)

df_grouped = pd.concat(df_uni, axis=1).reset_index()
df_grouped.columns = ['knownForTitles','cast_name']

df_grouped

Unnamed: 0,knownForTitles,cast_name
0,tt0008125,[Charles Harley]
1,tt0009706,[Charles Harley]
2,tt0010304,[Natalie Talmadge]
3,tt0011414,[Natalie Talmadge]
4,tt0011890,[Natalie Talmadge]
...,...,...
1893,tt9610496,[Stefano Baffetti]
1894,tt9714030,[Kevin Kain]
1895,tt9741820,[Caroline Plyler]
1896,tt9759814,[Ethan Francis]
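
Because only one column is aggregated, the grouping step can be written without the loop and the bucket. The following is a minimal sketch on made-up rows; `toy_unnested` and `toy_grouped` are hypothetical names, not part of the notebook.

```python
import pandas as pd

# Toy stand-in for unnested_drop: one row per (person, title) pair
toy_unnested = pd.DataFrame({
    "primaryName": ["Actor A", "Actor B", "Actor A"],
    "knownForTitles": ["tt1", "tt1", "tt2"],
})

# Collect the names per title into a list and name the result column directly
toy_grouped = (
    toy_unnested.groupby("knownForTitles")["primaryName"]
    .agg(list)
    .reset_index(name="cast_name")
)
print(toy_grouped)
```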


### Join Table

With each table processed, the next step is to join them.


In [14]:
# Inner join between the cast table and the movie table
base_df = pd.merge(df_grouped, movie_rating_df, left_on='knownForTitles', right_on='tconst', how='inner')

# Left join between base_df and the director_writers table
base_df = pd.merge(base_df, director_writers, left_on='tconst', right_on='tconst', how='left')

# Check the dataframe
base_df.head()

Unnamed: 0,knownForTitles,cast_name,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,averageRating,numVotes,director_name,writer_name
0,tt0011414,[Natalie Talmadge],tt0011414,movie,The Love Expert,The Love Expert,0,1920.0,,60.0,"Comedy,Romance",4.9,136,[David Kirkland],"[John Emerson, Anita Loos]"
1,tt0011890,[Natalie Talmadge],tt0011890,movie,Yes or No,Yes or No,0,1920.0,,72.0,,6.3,7,[Roy William Neill],"[Arthur F. Goodrich, Burns Mantle, Mary Murillo]"
2,tt0014341,[Natalie Talmadge],tt0014341,movie,Our Hospitality,Our Hospitality,0,1923.0,,65.0,"Comedy,Romance,Thriller",7.8,9621,"[Buster Keaton, John G. Blystone]","[Jean C. Havez, Clyde Bruckman, Joseph A. Mitc..."
3,tt0018054,[Reeka Roberts],tt0018054,movie,The King of Kings,The King of Kings,0,1927.0,,155.0,"Biography,Drama,History",7.3,1826,[Cecil B. DeMille],[Jeanie Macpherson]
4,tt0024151,[James Hackett],tt0024151,movie,I Cover the Waterfront,I Cover the Waterfront,0,1933.0,,80.0,"Drama,Romance",6.3,455,[James Cruze],"[Max Miller, Wells Root, Jack Jevne]"
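
A left join keeps every row of the left table even when the right table has no match, which is where missing director and writer values come from. One way to see which rows found no partner is the `indicator=True` option of `pd.merge`. This is a sketch on made-up data; `cast`, `crew`, and their contents are hypothetical.

```python
import pandas as pd

# Two toy tables: only tt1 has director information
cast = pd.DataFrame({"tconst": ["tt1", "tt2"], "cast_name": [["Actor A"], ["Actor B"]]})
crew = pd.DataFrame({"tconst": ["tt1"], "director_name": [["Director D"]]})

# indicator=True adds a _merge column: 'both' for matched rows, 'left_only' for unmatched ones
merged = pd.merge(cast, crew, on="tconst", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched["tconst"].tolist())
```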


### Clean New Table

After the join, the resulting table needs another round of cleaning.


In [15]:
base_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1060 entries, 0 to 1059
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   knownForTitles  1060 non-null   object 
 1   cast_name       1060 non-null   object 
 2   tconst          1060 non-null   object 
 3   titleType       1060 non-null   object 
 4   primaryTitle    1060 non-null   object 
 5   originalTitle   1060 non-null   object 
 6   isAdult         1060 non-null   int64  
 7   startYear       1060 non-null   float64
 8   endYear         110 non-null    float64
 9   runtimeMinutes  1060 non-null   float64
 10  genres          745 non-null    object 
 11  averageRating   1060 non-null   float64
 12  numVotes        1060 non-null   int64  
 13  director_name   986 non-null    object 
 14  writer_name     986 non-null    object 
dtypes: float64(4), int64(2), object(9)
memory usage: 132.5+ KB


The output shows missing values in the director_name, writer_name, and genres columns, and several columns are no longer needed, so we clean the data once more.

In [16]:
# Drop the columns that are no longer needed
base_drop = base_df.drop(['knownForTitles','tconst','isAdult','endYear','originalTitle'], axis=1)

# Replace NULL values with 'unknown'
base_drop[['director_name','writer_name','genres']] = base_drop[['director_name','writer_name','genres']].fillna('unknown')

# genres can hold several comma-separated values, so split it into a list
base_drop['genres'] = base_drop['genres'].apply(lambda x: x.split(','))

# Reorder the columns and shorten their names
base_drop = base_drop[['primaryTitle','titleType','startYear','runtimeMinutes','genres','averageRating','numVotes','cast_name','director_name','writer_name']]
base_drop.columns = ['title','type','start','duration','genres','rating','votes','cast_name','director_name','writer_name']

# Final check
base_drop

Unnamed: 0,title,type,start,duration,genres,rating,votes,cast_name,director_name,writer_name
0,The Love Expert,movie,1920.0,60.0,"[Comedy, Romance]",4.9,136,[Natalie Talmadge],[David Kirkland],"[John Emerson, Anita Loos]"
1,Yes or No,movie,1920.0,72.0,[unknown],6.3,7,[Natalie Talmadge],[Roy William Neill],"[Arthur F. Goodrich, Burns Mantle, Mary Murillo]"
2,Our Hospitality,movie,1923.0,65.0,"[Comedy, Romance, Thriller]",7.8,9621,[Natalie Talmadge],"[Buster Keaton, John G. Blystone]","[Jean C. Havez, Clyde Bruckman, Joseph A. Mitc..."
3,The King of Kings,movie,1927.0,155.0,"[Biography, Drama, History]",7.3,1826,[Reeka Roberts],[Cecil B. DeMille],[Jeanie Macpherson]
4,I Cover the Waterfront,movie,1933.0,80.0,"[Drama, Romance]",6.3,455,[James Hackett],[James Cruze],"[Max Miller, Wells Root, Jack Jevne]"
...,...,...,...,...,...,...,...,...,...,...
1055,UFC on ESPN,tvSeries,2019.0,180.0,[unknown],8.1,38,[Vanessa Hanson],unknown,unknown
1056,Bozkir,tvMiniSeries,2018.0,50.0,"[Crime, Drama, Mystery]",8.2,1231,[Utku Arslan],[Bahadir Ince],"[Levent Cantek, Ali Demirel, Baris Erdogan]"
1057,Blue Story,movie,2019.0,91.0,"[Crime, Drama]",5.5,1411,[Jonathon Deering],[Rapman],[Rapman]
1058,Typewriter,tvSeries,2019.0,48.0,"[Horror, Thriller]",6.5,2895,[Sandini Dhar],[Sujoy Ghosh],"[Sujoy Ghosh, Raj Vasant, Pratim D. Gupta, Sur..."
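
In the output above, row 1055 stores director_name and writer_name as the plain string unknown, while every other row holds a list. If a uniform type is preferred, the missing cells could be replaced with a one-element list instead. This is only a sketch on a toy column and not what the notebook does.

```python
import pandas as pd

# Toy column: one list-valued cell and one missing cell, as a left join would leave it
toy = pd.DataFrame({"director_name": [["Rapman"], float("nan")]})

# Keep existing lists and wrap the placeholder in a list for everything else
toy["director_name"] = toy["director_name"].apply(
    lambda v: v if isinstance(v, list) else ["unknown"]
)
print(toy["director_name"].tolist())
```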


## Recommendation with Similarity Index


### Metadata Classification

We classify the films using the metadata genres, primaryName (cast_name), director_name, and writer_name.


In [17]:
# Select title, cast_name, genres, director_name, and writer_name
# (.copy() makes feature_df independent of base_drop, so later assignments are safe)
feature_df = base_drop[['title','cast_name','genres','director_name','writer_name']].copy()

# Show the first 5 rows
feature_df.head()

Unnamed: 0,title,cast_name,genres,director_name,writer_name
0,The Love Expert,[Natalie Talmadge],"[Comedy, Romance]",[David Kirkland],"[John Emerson, Anita Loos]"
1,Yes or No,[Natalie Talmadge],[unknown],[Roy William Neill],"[Arthur F. Goodrich, Burns Mantle, Mary Murillo]"
2,Our Hospitality,[Natalie Talmadge],"[Comedy, Romance, Thriller]","[Buster Keaton, John G. Blystone]","[Jean C. Havez, Clyde Bruckman, Joseph A. Mitc..."
3,The King of Kings,[Reeka Roberts],"[Biography, Drama, History]",[Cecil B. DeMille],[Jeanie Macpherson]
4,I Cover the Waterfront,[James Hackett],"[Drama, Romance]",[James Cruze],"[Max Miller, Wells Root, Jack Jevne]"


### Sanitize

The next step is to sanitize the features: remove the spaces inside every element of every row and convert the text to lowercase.


In [18]:
def sanitize(x):
    try:
        # Cell holds a list: clean every element
        if isinstance(x, list):
            return [i.replace(' ','').lower() for i in x]
        # Cell holds a single string: clean it and wrap it in a list
        else:
            return [x.replace(' ','').lower()]
    except AttributeError:
        # Non-string values have no .replace(); report them and return an empty list
        print(x)
        return []

# Columns to sanitize: cast_name, genres, writer_name, director_name
feature_cols = ['cast_name','genres','writer_name','director_name']

# Apply sanitize to every feature column
for col in feature_cols:
    feature_df[col] = feature_df[col].apply(sanitize)

# Check the dataframe
feature_df.head()



Unnamed: 0,title,cast_name,genres,director_name,writer_name
0,The Love Expert,[natalietalmadge],"[comedy, romance]",[davidkirkland],"[johnemerson, anitaloos]"
1,Yes or No,[natalietalmadge],[unknown],[roywilliamneill],"[arthurf.goodrich, burnsmantle, marymurillo]"
2,Our Hospitality,[natalietalmadge],"[comedy, romance, thriller]","[busterkeaton, johng.blystone]","[jeanc.havez, clydebruckman, josepha.mitchell]"
3,The King of Kings,[reekaroberts],"[biography, drama, history]",[cecilb.demille],[jeaniemacpherson]
4,I Cover the Waterfront,[jameshackett],"[drama, romance]",[jamescruze],"[maxmiller, wellsroot, jackjevne]"
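
Removing the spaces matters because the features are later compared token by token. Two different people who share a first name would otherwise share a token. The sketch below illustrates this with two names from the director_writers table; the helper `tokens` is hypothetical and uses a plain whitespace split rather than the vectorizer used later.

```python
def tokens(names, strip_spaces):
    """Return the set of lowercase tokens a whitespace tokenizer would see."""
    out = set()
    for name in names:
        name = name.lower()
        if strip_spaces:
            name = name.replace(" ", "")
        out.update(name.split())
    return out

a = ["John Emerson"]
b = ["John G. Blystone"]

# Without sanitizing, the two people overlap on the token 'john'
shared_raw = tokens(a, strip_spaces=False) & tokens(b, strip_spaces=False)
# After sanitizing, each full name is a single token and nothing overlaps
shared_clean = tokens(a, strip_spaces=True) & tokens(b, strip_spaces=True)

print(shared_raw)    # {'john'}
print(shared_clean)  # set()
```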


### Soup

After sanitizing, we combine the values of these columns into a single column.

In [19]:
# Columns used: cast_name, genres, director_name, writer_name
def soup_feature(x):
    return ' '.join(x['cast_name']) + ' ' + ' '.join(x['genres']) + ' ' + ' '.join(x['director_name']) + ' ' + ' '.join(x['writer_name'])

# Build the soup as a single column
feature_df['soup'] = feature_df.apply(soup_feature, axis=1)


# Check the dataframe
feature_df.head()



Unnamed: 0,title,cast_name,genres,director_name,writer_name,soup
0,The Love Expert,[natalietalmadge],"[comedy, romance]",[davidkirkland],"[johnemerson, anitaloos]",natalietalmadge comedy romance davidkirkland j...
1,Yes or No,[natalietalmadge],[unknown],[roywilliamneill],"[arthurf.goodrich, burnsmantle, marymurillo]",natalietalmadge unknown roywilliamneill arthur...
2,Our Hospitality,[natalietalmadge],"[comedy, romance, thriller]","[busterkeaton, johng.blystone]","[jeanc.havez, clydebruckman, josepha.mitchell]",natalietalmadge comedy romance thriller buster...
3,The King of Kings,[reekaroberts],"[biography, drama, history]",[cecilb.demille],[jeaniemacpherson],reekaroberts biography drama history cecilb.de...
4,I Cover the Waterfront,[jameshackett],"[drama, romance]",[jamescruze],"[maxmiller, wellsroot, jackjevne]",jameshackett drama romance jamescruze maxmille...


### Vectorizing

With all required elements combined in the soup column, we need to vectorize it. Vectorizing turns each distinct token into a column whose value is the number of times that token occurs in the row.


In [20]:
# Import CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer

# Define the CountVectorizer and turn the soup into a matrix of token counts
count = CountVectorizer(stop_words='english')
count_matrix = count.fit_transform(feature_df['soup'])

print(count)
print(count_matrix.shape)

CountVectorizer(stop_words='english')
(1060, 10026)
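
One detail worth checking is how `CountVectorizer` tokenizes the sanitized names. Its default `token_pattern` only matches runs of two or more word characters, so a name that contains a period, such as johng.blystone, is split into two tokens. The sketch below compares the default with `token_pattern=r"\S+"`, which keeps every whitespace-separated token whole. Whether that behavior is preferable for this dataset is an assumption, and the one-line soup is made up from values shown above.

```python
from sklearn.feature_extraction.text import CountVectorizer

# One toy soup string built from sanitized values seen in the notebook output
soup = ["natalietalmadge comedy johng.blystone"]

# Default tokenizer: splits on punctuation, so the period breaks the name apart
default_vec = CountVectorizer()
default_vec.fit(soup)

# Alternative (assumption): treat every whitespace-separated chunk as one token
whole_vec = CountVectorizer(token_pattern=r"\S+")
whole_vec.fit(soup)

print(sorted(default_vec.vocabulary_))  # ['blystone', 'comedy', 'johng', 'natalietalmadge']
print(sorted(whole_vec.vocabulary_))    # ['comedy', 'johng.blystone', 'natalietalmadge']
```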



### Membuat Model dengan Cosine Similarity

After vectorizing, we compute the cosine similarity score for every pair of titles; the cell at row i and column j holds the similarity between title i and title j.

We use the cosine similarity formula as the model. The score is simple and cheap to compute. For two text vectors x and y it is:
$\mathrm{cosine}(x,y)=\frac{x \cdot y^T}{\|x\| \, \|y\|}$

Because count vectors are non-negative, the result lies between 0 and 1. A score close to 1 means the two items are very similar, while a score close to 0 means they have little in common.
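
To make the formula concrete, here is a small worked example in plain Python. The two count vectors are made up: they share two of their three non-zero tokens, so the dot product is 2 and both norms are the square root of 3, giving a score of 2/3.

```python
import math

def cosine(x, y):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical token counts for two titles over a four-token vocabulary
x = [1, 1, 0, 1]
y = [1, 0, 1, 1]

print(round(cosine(x, y), 4))       # 0.6667
print(round(cosine(x, x), 4))       # 1.0, a vector compared with itself
print(cosine([1, 0], [0, 1]))       # 0.0, no token in common
```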


In [21]:
# Import cosine_similarity
from sklearn.metrics.pairwise import cosine_similarity

# Compute the pairwise cosine similarity of the rows of count_matrix
cosine_sim = cosine_similarity(count_matrix, count_matrix)

# Print the result
print(cosine_sim)

[[1.         0.15430335 0.35355339 ... 0.         0.         0.13608276]
 [0.15430335 1.         0.10910895 ... 0.         0.         0.        ]
 [0.35355339 0.10910895 1.         ... 0.         0.08703883 0.09622504]
 ...
 [0.         0.         0.         ... 1.         0.         0.        ]
 [0.         0.         0.08703883 ... 0.         1.         0.10050378]
 [0.13608276 0.         0.09622504 ... 0.         0.10050378 1.        ]]


### Membuat Fungsi Recommender

With the cosine similarity matrix in place, we can build the recommender by returning the titles most similar to the one requested.


In [22]:
def content_recommender(title):
    # Map each title to its row position; keep the first row when a title appears more than once
    indices = pd.Series(feature_df.index, index=feature_df['title'])
    indices = indices[~indices.index.duplicated(keep='first')]

    # Row position of the requested title
    idx = indices[title]

    # Pair every row position with its similarity to the requested title
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort from most to least similar
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Skip position 0 (normally the requested title itself) and keep the next 10
    sim_scores = sim_scores[1:11]

    # Row positions of the recommended titles
    movie_indices = [i[0] for i in sim_scores]

    # Look the rows up by position in base_drop
    return base_drop.iloc[movie_indices]
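
The whole pipeline can be tried end to end on a handful of rows. The sketch below uses a made-up four-title dataframe and a hypothetical `toy_recommender`. Unlike the function above, it excludes the query row explicitly instead of assuming it sorts first, which is a design choice of this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy data: titles and soups are hypothetical
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "soup": [
        "action adventure stanlee",
        "action adventure stanlee jackkirby",
        "drama romance",
        "action comedy",
    ],
})

# Vectorize the soup and compute all pairwise similarities
toy_matrix = CountVectorizer().fit_transform(toy["soup"])
toy_sim = cosine_similarity(toy_matrix)

def toy_recommender(title, k=2):
    # Row position of the requested title
    idx = toy.index[toy["title"] == title][0]
    scores = toy_sim[idx].copy()
    # Exclude the query itself explicitly rather than relying on sort order
    scores[idx] = -1.0
    # Positions of the k highest scores, most similar first
    top = np.argsort(-scores, kind="stable")[:k]
    return toy.loc[top, "title"].tolist()

print(toy_recommender("A"))  # ['B', 'D']
```

Title B shares three tokens with A and D shares one, so they rank first and second, while C shares none.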

## Applying the Recommender System

With the recommender built, we can use it right away. For example, suppose someone wants films similar to 'Iron Man'. The recommender returns:

In [23]:
# Apply the recommender
content_recommender('Iron Man')

Unnamed: 0,title,type,start,duration,genres,rating,votes,cast_name,director_name,writer_name
726,Black Panther,movie,2018.0,134.0,"[Action, Adventure, Sci-Fi]",7.3,575851,[Robert E. Evans],[Ryan Coogler],"[Stan Lee, Jack Kirby, Ryan Coogler, Joe Rober..."
873,X-Men: Apocalypse,movie,2016.0,144.0,"[Action, Adventure, Sci-Fi]",6.9,377709,[Frank Maudsley],[Bryan Singer],"[Stan Lee, Jack Kirby, Simon Kinberg, Bryan Si..."
511,Star Trek,movie,2009.0,127.0,"[Action, Adventure, Sci-Fi]",7.9,567224,"[Matthew Fuchs, Aida Caefer]",[J.J. Abrams],"[Gene Roddenberry, Roberto Orci, Alex Kurtzman]"
611,X-Men: First Class,movie,2011.0,131.0,"[Action, Adventure, Sci-Fi]",7.7,629609,[Aida Caefer],[Matthew Vaughn],"[Stan Lee, Jack Kirby, Ashley Miller, Zack Ste..."
494,Ant-Man,movie,2015.0,117.0,"[Action, Adventure, Comedy]",7.3,540644,[Francesco Cadoni],[Peyton Reed],"[Stan Lee, Larry Lieber, Jack Kirby, Edgar Wri..."
791,Spider-Man: Homecoming,movie,2017.0,133.0,"[Action, Adventure, Sci-Fi]",7.4,479292,[Frank Maudsley],[Jon Watts],"[Stan Lee, Jack Kirby, Joe Simon, Jonathan Gol..."
466,Alita: Battle Angel,movie,2019.0,122.0,"[Action, Adventure, Sci-Fi]",7.3,202735,[Jeff Bottoms],[Robert Rodriguez],"[James Cameron, Laeta Kalogridis, Yukito Kishiro]"
637,Inception,movie,2010.0,148.0,"[Action, Adventure, Sci-Fi]",8.8,1950039,[Dan Churchill],[Christopher Nolan],[Christopher Nolan]
649,Battleship,movie,2012.0,131.0,"[Action, Adventure, Sci-Fi]",5.8,230468,[Robert E. Evans],[Peter Berg],"[Jon Hoeber, Erich Hoeber]"
877,Thor: Ragnarok,movie,2017.0,130.0,"[Action, Adventure, Comedy]",7.9,544539,[Francesco Cadoni],[Taika Waititi],"[Stan Lee, Larry Lieber, Jack Kirby, Craig Kyl..."
