# Import Library

Import the libraries needed for this project.
- Pandas : manipulate and analyze data
- NumPy (concatenate) : flatten the list in each row into a single array
- CountVectorizer : convert text data into numerical vectors with a bag-of-words model
- Cosine Similarity : measure the similarity between the input movie and every movie in the dataset to produce recommendations

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Data Preparation


## Movie Dataset

**Understanding the Movie Dataset: its features and general information.**
<br>
This dataset has 11 features and 751,614 rows of mixed data types (int64, float64, and object). Some columns contain missing values: endYear is mostly empty (16,072 non-null) and genres has 486,766 non-null values.

In [2]:
# Show every column when displaying a DataFrame
pd.set_option('display.max_columns', None)

movie_rating_df = pd.read_csv('https://storage.googleapis.com/dqlab-dataset/movie_rating_df.csv')

# Display movie_rating_df; for a frame this large, pandas shows only the first and last rows
movie_rating_df

Unnamed: 0,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,averageRating,numVotes
0,tt0000001,short,Carmencita,Carmencita,0,1894.0,,1.0,"Documentary,Short",5.6,1608
1,tt0000002,short,Le clown et ses chiens,Le clown et ses chiens,0,1892.0,,5.0,"Animation,Short",6.0,197
2,tt0000003,short,Pauvre Pierrot,Pauvre Pierrot,0,1892.0,,4.0,"Animation,Comedy,Romance",6.5,1285
3,tt0000004,short,Un bon bock,Un bon bock,0,1892.0,,12.0,"Animation,Short",6.1,121
4,tt0000005,short,Blacksmith Scene,Blacksmith Scene,0,1893.0,,1.0,"Comedy,Short",6.1,2050
...,...,...,...,...,...,...,...,...,...,...,...
751609,tt9916538,movie,Kuambil Lagi Hatiku,Kuambil Lagi Hatiku,0,2019.0,,123.0,,8.4,5
751610,tt9916544,short,My Sweet Prince,My Sweet Prince,0,2019.0,,12.0,"Drama,Short",7.2,19
751611,tt9916576,tvEpisode,Destinee's Story,Destinee's Story,0,2019.0,,85.0,,6.0,9
751612,tt9916720,short,The Nun 2,The Nun 2,0,2019.0,,10.0,"Comedy,Horror,Mystery",5.6,49


In [3]:
movie_rating_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 751614 entries, 0 to 751613
Data columns (total 11 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   tconst          751614 non-null  object 
 1   titleType       751614 non-null  object 
 2   primaryTitle    751614 non-null  object 
 3   originalTitle   751614 non-null  object 
 4   isAdult         751614 non-null  int64  
 5   startYear       751614 non-null  float64
 6   endYear         16072 non-null   float64
 7   runtimeMinutes  751614 non-null  float64
 8   genres          486766 non-null  object 
 9   averageRating   751614 non-null  float64
 10  numVotes        751614 non-null  int64  
dtypes: float64(4), int64(2), object(5)
memory usage: 63.1+ MB


## Actor Dataset

**Understanding the Actor Dataset: its features and general information.**
<br>
This dataset has 6 features and 1,000 rows, all of object data type. pandas reports missing values only in primaryProfession (891 of 1,000 non-null).

In [4]:
actor_df = pd.read_csv('https://storage.googleapis.com/dqlab-dataset/actor_name.csv')
actor_df

Unnamed: 0,nconst,primaryName,birthYear,deathYear,primaryProfession,knownForTitles
0,nm1774132,Nathan McLaughlin,1973,\N,"special_effects,make_up_department","tt0417686,tt1713976,tt1891860,tt0454839"
1,nm10683464,Bridge Andrew,\N,\N,actor,tt7718088
2,nm1021485,Brandon Fransvaag,\N,\N,miscellaneous,tt0168790
3,nm6940929,Erwin van der Lely,\N,\N,miscellaneous,tt4232168
4,nm5764974,Svetlana Shypitsyna,\N,\N,actress,tt3014168
...,...,...,...,...,...,...
995,nm7596674,Paul Whitrow,\N,\N,actor,"tt4118352,tt9104322,tt4447090,tt4892804"
996,nm5938546,Wendy Ponce,\N,\N,,tt2125666
997,nm2101810,Ans Brugmans,\N,\N,costume_designer,tt0488280
998,nm5245804,Eliza Jenkins,\N,\N,,tt1464058


In [5]:
actor_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   nconst             1000 non-null   object
 1   primaryName        1000 non-null   object
 2   birthYear          1000 non-null   object
 3   deathYear          1000 non-null   object
 4   primaryProfession  891 non-null    object
 5   knownForTitles     1000 non-null   object
dtypes: object(6)
memory usage: 47.0+ KB
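The info output reports birthYear and deathYear as fully populated, yet the preview shows `\N` in many of those cells: pandas does not treat `\N` as missing by default, so it is read as a plain string. These two columns are dropped in the next step, so this does not affect the recommender, but if they were needed, the `na_values` parameter of `read_csv` could convert the placeholder. A minimal sketch on a hypothetical two-row sample (not the real file):

```python
import io
import pandas as pd

# Hypothetical sample in the same layout as actor_name.csv,
# where missing years are written as \N
csv_text = (
    "nconst,primaryName,birthYear,deathYear\n"
    "nm1774132,Nathan McLaughlin,1973,\\N\n"
    "nm10683464,Bridge Andrew,\\N,\\N\n"
)

# By default "\N" stays an ordinary string, so the column looks complete
raw = pd.read_csv(io.StringIO(csv_text))
print(raw['birthYear'].isna().sum())  # 0

# Declaring "\N" as a missing-value marker turns it into NaN
clean = pd.read_csv(io.StringIO(csv_text), na_values=['\\N'])
print(clean['birthYear'].isna().sum())  # 1
print(clean['birthYear'].dtype)  # float64
```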


Drop the unnecessary columns from the Actor Dataset, keeping only nconst, primaryName, and knownForTitles.

In [6]:
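# Keep only the needed columns; .copy() makes actor_df an independent DataFrame,
# which avoids SettingWithCopyWarning when knownForTitles is modified later
actor_df = actor_df[['nconst', 'primaryName', 'knownForTitles']].copy()

# Display the first 5 rows of actor_df
actor_df.head()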

Unnamed: 0,nconst,primaryName,knownForTitles
0,nm1774132,Nathan McLaughlin,"tt0417686,tt1713976,tt1891860,tt0454839"
1,nm10683464,Bridge Andrew,tt7718088
2,nm1021485,Brandon Fransvaag,tt0168790
3,nm6940929,Erwin van der Lely,tt4232168
4,nm5764974,Svetlana Shypitsyna,tt3014168


Convert each value in the "knownForTitles" column into a list by splitting it on commas, since the titles are comma-separated.
Counting the titles in each row shows that every actor has between 1 and 4 titles, so the longest list has 4 elements.

In [7]:
# Check how many titles each row contains
print(actor_df['knownForTitles'].apply(lambda x: len(x.split(','))).unique())

# Convert knownForTitles into a list of titles per row (a list of lists)
actor_df['knownForTitles'] = actor_df['knownForTitles'].apply(lambda x: x.split(','))

actor_df.head()

[4 1 2 3]




Unnamed: 0,nconst,primaryName,knownForTitles
0,nm1774132,Nathan McLaughlin,"[tt0417686, tt1713976, tt1891860, tt0454839]"
1,nm10683464,Bridge Andrew,[tt7718088]
2,nm1021485,Brandon Fransvaag,[tt0168790]
3,nm6940929,Erwin van der Lely,[tt4232168]
4,nm5764974,Svetlana Shypitsyna,[tt3014168]
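The same split and count can also be written with pandas' vectorized string methods instead of a Python lambda. A minimal sketch on a hypothetical two-value sample shaped like knownForTitles:

```python
import pandas as pd

# Hypothetical sample shaped like actor_df['knownForTitles']
titles = pd.Series(['tt0417686,tt1713976,tt1891860,tt0454839', 'tt7718088'])

# Series.str.split splits every value on the comma in one call
as_lists = titles.str.split(',')

# Series.str.len also works on list elements and returns each list's length
lengths = as_lists.str.len()

print(as_lists.tolist())
print(lengths.tolist())  # [4, 1]
```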


**Create a new data frame from the Actor Dataset in which each value in "knownForTitles" has its own row.**
<br>
- First, prepare an empty list named "df_uni" to collect the new data frames.
- Then, loop over the columns to unnest (here only "knownForTitles"). For each one, build an index variable "idx" that repeats each row's index label as many times as that row's list has elements. Use NumPy's "concatenate" function to flatten all the lists into one array and wrap it in a new data frame named "df1". Replace the index of "df1" with "idx", then append "df1" to the "df_uni" list.
- Finally, concatenate the data frames in "df_uni" into a single data frame, "df_concat".


In [8]:
df_uni = []

for x in ['knownForTitles']:
    # Repeat each row's index once for every element in its list
    idx = actor_df.index.repeat(actor_df[x].str.len())

    # Flatten the lists from all rows into one array and wrap it in a DataFrame
    df1 = pd.DataFrame({
        x: np.concatenate(actor_df[x].values)
    })
    
    # Replace that DataFrame's index with the idx defined above
    df1.index = idx
    # Append each resulting DataFrame to the bucket list
    df_uni.append(df1)
    
# Combine all DataFrames into one
df_concat = pd.concat(df_uni, axis=1)
df_concat

Unnamed: 0,knownForTitles
0,tt0417686
0,tt1713976
0,tt1891860
0,tt0454839
1,tt7718088
...,...
998,tt1464058
999,tt0436869
999,tt0476663
999,tt0109723
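For reference, pandas (0.25 and later) has a built-in method, `DataFrame.explode`, that performs this kind of unnesting in one call and keeps the original index. The sketch below, on a hypothetical two-actor miniature of actor_df, checks that it yields the same titles under the same index labels as the repeat-and-concatenate approach. Because `explode` keeps the other columns, it would also make the separate join in the next step unnecessary.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of actor_df after knownForTitles was split into lists
actor_df = pd.DataFrame({
    'nconst': ['nm1774132', 'nm10683464'],
    'primaryName': ['Nathan McLaughlin', 'Bridge Andrew'],
    'knownForTitles': [['tt0417686', 'tt1713976'], ['tt7718088']],
})

# Manual unnesting: repeat the index and flatten the lists
idx = actor_df.index.repeat(actor_df['knownForTitles'].str.len())
manual = pd.DataFrame(
    {'knownForTitles': np.concatenate(actor_df['knownForTitles'].values)},
    index=idx,
)

# Built-in alternative: one row per list element, other columns repeated
exploded = actor_df.explode('knownForTitles')
print(exploded)

# Both approaches give the same titles under the same index labels
assert manual['knownForTitles'].equals(exploded['knownForTitles'])
```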


Join the unnested data frame back to the Actor Dataset with the "join" function. This is a left join on the index, and the original list-valued 'knownForTitles' column is dropped from the Actor Dataset first so that it is not duplicated.

In [9]:
# Left join the unnested titles with the remaining actor columns (matched on the index)
unnested_df = df_concat.join(actor_df.drop(['knownForTitles'], axis=1), how='left')

# Reorder the columns to match the original actor_df
unnested_df = unnested_df[actor_df.columns.tolist()]
unnested_df



Unnamed: 0,nconst,primaryName,knownForTitles
0,nm1774132,Nathan McLaughlin,tt0417686
0,nm1774132,Nathan McLaughlin,tt1713976
0,nm1774132,Nathan McLaughlin,tt1891860
0,nm1774132,Nathan McLaughlin,tt0454839
1,nm10683464,Bridge Andrew,tt7718088
...,...,...,...
998,nm5245804,Eliza Jenkins,tt1464058
999,nm0948460,Greg Yolen,tt0436869
999,nm0948460,Greg Yolen,tt0476663
999,nm0948460,Greg Yolen,tt0109723


Drop the 'nconst' column, then group the unnested data frame by "knownForTitles" with the "groupby" function, collecting the 'primaryName' values of each title into a list. Finally, reset the index and rename the columns to 'knownForTitles' and 'cast_name'.

In [10]:
unnested_drop = unnested_df.drop(['nconst'], axis=1)

# Prepare a bucket list for the aggregated results
df_uni = []

for col in ['primaryName']:
    # Collect the values of col for each knownForTitles group into a list
    dfi = unnested_drop.groupby(['knownForTitles'])[col].apply(list)
    
    # Append the result to the bucket
    df_uni.append(dfi)
df_grouped = pd.concat(df_uni, axis=1).reset_index()
df_grouped.columns = ['knownForTitles','cast_name']
df_grouped

Unnamed: 0,knownForTitles,cast_name
0,tt0008125,[Charles Harley]
1,tt0009706,[Charles Harley]
2,tt0010304,[Natalie Talmadge]
3,tt0011414,[Natalie Talmadge]
4,tt0011890,[Natalie Talmadge]
...,...,...
1893,tt9610496,[Stefano Baffetti]
1894,tt9714030,[Kevin Kain]
1895,tt9741820,[Caroline Plyler]
1896,tt9759814,[Ethan Francis]
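Since only one column is aggregated here, the loop and concat can be collapsed into a single chain. A minimal sketch, assuming a hypothetical miniature of unnested_drop (the name "Actor B" is made up):

```python
import pandas as pd

# Hypothetical miniature of unnested_drop: one row per actor-title pair
unnested = pd.DataFrame({
    'knownForTitles': ['tt0011414', 'tt0014341', 'tt0014341'],
    'primaryName': ['Natalie Talmadge', 'Natalie Talmadge', 'Actor B'],
})

# Gather every name that shares a title into one list per title,
# then turn the group keys back into a column named knownForTitles
df_grouped = (
    unnested.groupby('knownForTitles')['primaryName']
    .agg(list)
    .reset_index(name='cast_name')
)
print(df_grouped)
```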


## Director and Writer Dataset

**Understanding the Director and Writer Dataset: its features and general information.**
<br>
This dataset has 3 features and 986 rows, all of object data type, and none of the columns has missing values.

In [11]:
dir_df = pd.read_csv('https://storage.googleapis.com/dqlab-dataset/directors_writers.csv')
dir_df

Unnamed: 0,tconst,director_name,writer_name
0,tt0011414,David Kirkland,"John Emerson,Anita Loos"
1,tt0011890,Roy William Neill,"Arthur F. Goodrich,Burns Mantle,Mary Murillo"
2,tt0014341,"Buster Keaton,John G. Blystone","Jean C. Havez,Clyde Bruckman,Joseph A. Mitchell"
3,tt0018054,Cecil B. DeMille,Jeanie Macpherson
4,tt0024151,James Cruze,"Max Miller,Wells Root,Jack Jevne"
...,...,...,...
981,tt9236688,Kai Wessel,Christian Jeltsch
982,tt9278408,Bahadir Ince,"Levent Cantek,Ali Demirel,Baris Erdogan"
983,tt9285882,Rapman,Rapman
984,tt9310372,Sujoy Ghosh,"Sujoy Ghosh,Raj Vasant,Pratim D. Gupta,Suresh ..."


In [12]:
dir_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 986 entries, 0 to 985
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   tconst         986 non-null    object
 1   director_name  986 non-null    object
 2   writer_name    986 non-null    object
dtypes: object(3)
memory usage: 23.2+ KB


Convert the 'director_name' and 'writer_name' columns into lists by splitting each value on commas, since multiple names are comma-separated.

In [13]:
dir_df['director_name'] = dir_df['director_name'].apply(lambda row: row.split(','))
dir_df['writer_name'] = dir_df['writer_name'].apply(lambda row: row.split(','))

# Merge Datasets

Next, merge all the datasets together. First, merge the cleaned and grouped Actor Dataset with the Movie Dataset using the "merge" function with an inner join on knownForTitles = tconst, so that the new data frame keeps only titles that appear in both datasets.
<br>
<br>
Then, merge that data frame with the Director and Writer Dataset, again with "merge" but this time with a left join, so that every row of the previous data frame is kept even when it has no match in the Director and Writer Dataset.

In [14]:
# Inner join between the cast table and the movie table
base_df = pd.merge(df_grouped, movie_rating_df, left_on='knownForTitles', right_on='tconst', how='inner')

# Left join between base_df and the director/writer table
base_df = pd.merge(base_df, dir_df, on='tconst', how='left')
base_df.head()

Unnamed: 0,knownForTitles,cast_name,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,averageRating,numVotes,director_name,writer_name
0,tt0011414,[Natalie Talmadge],tt0011414,movie,The Love Expert,The Love Expert,0,1920.0,,60.0,"Comedy,Romance",4.9,136,[David Kirkland],"[John Emerson, Anita Loos]"
1,tt0011890,[Natalie Talmadge],tt0011890,movie,Yes or No,Yes or No,0,1920.0,,72.0,,6.3,7,[Roy William Neill],"[Arthur F. Goodrich, Burns Mantle, Mary Murillo]"
2,tt0014341,[Natalie Talmadge],tt0014341,movie,Our Hospitality,Our Hospitality,0,1923.0,,65.0,"Comedy,Romance,Thriller",7.8,9621,"[Buster Keaton, John G. Blystone]","[Jean C. Havez, Clyde Bruckman, Joseph A. Mitc..."
3,tt0018054,[Reeka Roberts],tt0018054,movie,The King of Kings,The King of Kings,0,1927.0,,155.0,"Biography,Drama,History",7.3,1826,[Cecil B. DeMille],[Jeanie Macpherson]
4,tt0024151,[James Hackett],tt0024151,movie,I Cover the Waterfront,I Cover the Waterfront,0,1933.0,,80.0,"Drama,Romance",6.3,455,[James Cruze],"[Max Miller, Wells Root, Jack Jevne]"
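A toy example with hypothetical IDs and names (not the real data) shows how the two join types behave: the inner join drops titles missing from either table, while the left join keeps every row and leaves unmatched director values as NaN. Such unmatched rows are why director_name and writer_name can have missing values after the merge.

```python
import pandas as pd

# Hypothetical miniature tables keyed like the real ones
cast = pd.DataFrame({'knownForTitles': ['tt01', 'tt02', 'tt99'],
                     'cast_name': [['A'], ['B'], ['C']]})
movies = pd.DataFrame({'tconst': ['tt01', 'tt02', 'tt03'],
                       'primaryTitle': ['One', 'Two', 'Three']})
crew = pd.DataFrame({'tconst': ['tt01'], 'director_name': [['D']]})

# Inner join: only tt01 and tt02 appear in both cast and movies
base = pd.merge(cast, movies, left_on='knownForTitles', right_on='tconst', how='inner')

# Left join: both rows are kept; tt02 has no crew row, so its director_name is NaN
base = pd.merge(base, crew, on='tconst', how='left')
print(base)
```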


Next, clean the merged dataset: drop the "knownForTitles" column (it duplicates tconst), fill the missing values in the 'genres', 'director_name', and 'writer_name' columns, and convert the 'genres' column into lists by splitting each value on commas.

In [15]:
# Drop the knownForTitles column (it duplicates tconst)
base_drop = base_df.drop(['knownForTitles'], axis=1)
print(base_drop.info())

# Replace missing genres with 'Unknown'
base_drop['genres'] = base_drop['genres'].fillna('Unknown')

# Count the missing values in each column
print(base_drop.isnull().sum())

# Replace missing director_name and writer_name values with 'unknown'
base_drop[['director_name','writer_name']] = base_drop[['director_name','writer_name']].fillna('unknown')

# genres can hold several comma-separated values, so split it into a list per row
base_drop['genres'] = base_drop['genres'].apply(lambda x: x.split(','))

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1060 entries, 0 to 1059
Data columns (total 14 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   cast_name       1060 non-null   object 
 1   tconst          1060 non-null   object 
 2   titleType       1060 non-null   object 
 3   primaryTitle    1060 non-null   object 
 4   originalTitle   1060 non-null   object 
 5   isAdult         1060 non-null   int64  
 6   startYear       1060 non-null   float64
 7   endYear         110 non-null    float64
 8   runtimeMinutes  1060 non-null   float64
 9   genres          745 non-null    object 
 10  averageRating   1060 non-null   float64
 11  numVotes        1060 non-null   int64  
 12  director_name   986 non-null    object 
 13  writer_name     986 non-null    object 
dtypes: float64(4), int64(2), object(8)
memory usage: 124.2+ KB
None
cast_name           0
tconst              0
titleType           0
primaryTitle        0
originalTitle   

Drop the unnecessary columns, reorder the remaining ones, and rename the headers to shorter names.

In [16]:
# Drop the tconst, isAdult, endYear, and originalTitle columns
base_drop2 = base_drop.drop(['tconst', 'isAdult', 'endYear', 'originalTitle'], axis=1)

# Reorder the remaining columns
base_drop2 = base_drop2[['primaryTitle','titleType','startYear','runtimeMinutes','genres','averageRating','numVotes','cast_name','director_name','writer_name']]
# Rename the columns to shorter headers
base_drop2.columns = ['title','type','start','duration','genres','rating','votes','cast_name','director_name','writer_name']
base_drop2.head()

Unnamed: 0,title,type,start,duration,genres,rating,votes,cast_name,director_name,writer_name
0,The Love Expert,movie,1920.0,60.0,"[Comedy, Romance]",4.9,136,[Natalie Talmadge],[David Kirkland],"[John Emerson, Anita Loos]"
1,Yes or No,movie,1920.0,72.0,[Unknown],6.3,7,[Natalie Talmadge],[Roy William Neill],"[Arthur F. Goodrich, Burns Mantle, Mary Murillo]"
2,Our Hospitality,movie,1923.0,65.0,"[Comedy, Romance, Thriller]",7.8,9621,[Natalie Talmadge],"[Buster Keaton, John G. Blystone]","[Jean C. Havez, Clyde Bruckman, Joseph A. Mitc..."
3,The King of Kings,movie,1927.0,155.0,"[Biography, Drama, History]",7.3,1826,[Reeka Roberts],[Cecil B. DeMille],[Jeanie Macpherson]
4,I Cover the Waterfront,movie,1933.0,80.0,"[Drama, Romance]",6.3,455,[James Hackett],[James Cruze],"[Max Miller, Wells Root, Jack Jevne]"


# Build the Recommender System

Choose **'title'**, **'cast_name'**, **'genres'**, **'director_name'**, and **'writer_name'** columns for use in the recommender system.

In [17]:
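# Select the features: title, cast_name, genres, director_name, and writer_name
# (.copy() makes feature_df independent, avoiding SettingWithCopyWarning when it is modified later)
feature_df = base_drop2[['title', 'cast_name', 'genres', 'director_name', 'writer_name']].copy()

# Display the first 5 rows
feature_df.head()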

Unnamed: 0,title,cast_name,genres,director_name,writer_name
0,The Love Expert,[Natalie Talmadge],"[Comedy, Romance]",[David Kirkland],"[John Emerson, Anita Loos]"
1,Yes or No,[Natalie Talmadge],[Unknown],[Roy William Neill],"[Arthur F. Goodrich, Burns Mantle, Mary Murillo]"
2,Our Hospitality,[Natalie Talmadge],"[Comedy, Romance, Thriller]","[Buster Keaton, John G. Blystone]","[Jean C. Havez, Clyde Bruckman, Joseph A. Mitc..."
3,The King of Kings,[Reeka Roberts],"[Biography, Drama, History]",[Cecil B. DeMille],[Jeanie Macpherson]
4,I Cover the Waterfront,[James Hackett],"[Drama, Romance]",[James Cruze],"[Max Miller, Wells Root, Jack Jevne]"


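Clean the values in the **'cast_name'**, **'genres'**, **'writer_name'**, and **'director_name'** columns by removing spaces and converting them to lowercase, using a "sanitize" function applied to each of these columns. Removing spaces turns each multi-word name into a single token, so two people who merely share a first name are not counted as a match.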
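A small check with CountVectorizer makes the effect of removing spaces concrete. The two names below are hypothetical examples chosen so that they share only a first name:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Two hypothetical names that share only the first name "John"
raw = ['John Emerson', 'John Cruze']
sanitized = [name.replace(' ', '').lower() for name in raw]

# With spaces kept, "john" becomes a token shared by both names
vec_raw = CountVectorizer().fit(raw)
print(sorted(vec_raw.vocabulary_))  # ['cruze', 'emerson', 'john']

# With spaces removed, each full name is a single, distinct token
vec_clean = CountVectorizer().fit(sanitized)
print(sorted(vec_clean.vocabulary_))  # ['johncruze', 'johnemerson']
```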

In [18]:
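def sanitize(x):
    try:
        # If the cell holds a list, clean every element
        if isinstance(x, list):
            return [i.replace(' ','').lower() for i in x]
        # If the cell holds a single string (e.g. the 'unknown' fill value), wrap it in a list
        else:
            return [x.replace(' ','').lower()]
    except AttributeError:
        # Non-string values (e.g. a float NaN) cannot be cleaned: report them and return an empty list
        print(x)
        return []
        
# Columns to sanitize: cast_name, genres, writer_name, director_name
feature_cols = ['cast_name','genres','writer_name','director_name']

# Apply sanitize to each feature column
for col in feature_cols:
    feature_df[col] = feature_df[col].apply(sanitize)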



Join the sanitized values of the **'cast_name'**, **'genres'**, **'director_name'**, and **'writer_name'** columns into one space-separated string stored in a new column named "soup".

In [19]:
# Columns used: cast_name, genres, director_name, writer_name
def soup_feature(x):
    return ' '.join(x['cast_name']) + ' ' + ' '.join(x['genres']) + ' ' + ' '.join(x['director_name']) + ' ' + ' '.join(x['writer_name'])

# Combine the features into a single soup column
feature_df['soup'] = feature_df.apply(soup_feature, axis=1)
feature_df



Unnamed: 0,title,cast_name,genres,director_name,writer_name,soup
0,The Love Expert,[natalietalmadge],"[comedy, romance]",[davidkirkland],"[johnemerson, anitaloos]",natalietalmadge comedy romance davidkirkland j...
1,Yes or No,[natalietalmadge],[unknown],[roywilliamneill],"[arthurf.goodrich, burnsmantle, marymurillo]",natalietalmadge unknown roywilliamneill arthur...
2,Our Hospitality,[natalietalmadge],"[comedy, romance, thriller]","[busterkeaton, johng.blystone]","[jeanc.havez, clydebruckman, josepha.mitchell]",natalietalmadge comedy romance thriller buster...
3,The King of Kings,[reekaroberts],"[biography, drama, history]",[cecilb.demille],[jeaniemacpherson],reekaroberts biography drama history cecilb.de...
4,I Cover the Waterfront,[jameshackett],"[drama, romance]",[jamescruze],"[maxmiller, wellsroot, jackjevne]",jameshackett drama romance jamescruze maxmille...
...,...,...,...,...,...,...
1055,UFC on ESPN,[vanessahanson],[unknown],[unknown],[unknown],vanessahanson unknown unknown unknown
1056,Bozkir,[utkuarslan],"[crime, drama, mystery]",[bahadirince],"[leventcantek, alidemirel, bariserdogan]",utkuarslan crime drama mystery bahadirince lev...
1057,Blue Story,[jonathondeering],"[crime, drama]",[rapman],[rapman],jonathondeering crime drama rapman rapman
1058,Typewriter,[sandinidhar],"[horror, thriller]",[sujoyghosh],"[sujoyghosh, rajvasant, pratimd.gupta, sureshn...",sandinidhar horror thriller sujoyghosh sujoygh...


Convert the **'soup'** column into count vectors with scikit-learn's **CountVectorizer**, removing English stop words.

In [20]:
# Define a CountVectorizer and convert the soup into count vectors
count = CountVectorizer(stop_words='english')
count_matrix = count.fit_transform(feature_df['soup'])

print(count)
print(count_matrix.shape)

CountVectorizer(stop_words='english')
(1060, 10026)
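One detail of CountVectorizer's default tokenizer is worth knowing here: its default `token_pattern` keeps only runs of two or more word characters, so a sanitized name containing a period, such as "johng.blystone", is split into "johng" and "blystone". The sketch below uses a soup fragment built from names in the sanitized data and, as one possible alternative (an assumption, not what this notebook does), a whitespace-only `token_pattern` that keeps such names whole:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Soup fragment built from sanitized values shown above
soup = 'natalietalmadge comedy romance busterkeaton johng.blystone'

# Default token_pattern: the period acts as a separator
count = CountVectorizer(stop_words='english')
count.fit([soup])
print(sorted(count.vocabulary_))
# ['blystone', 'busterkeaton', 'comedy', 'johng', 'natalietalmadge', 'romance']

# Alternative: split on whitespace only, so each name stays one token
keep_whole = CountVectorizer(stop_words='english', token_pattern=r'\S+')
keep_whole.fit([soup])
print(sorted(keep_whole.vocabulary_))
# ['busterkeaton', 'comedy', 'johng.blystone', 'natalietalmadge', 'romance']
```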


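Compute the pairwise cosine similarity between the "soup" vectors with the **'cosine_similarity'** function, then build a recommender that returns the 10 titles most similar to a given title.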
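For reference, the cosine similarity between two count vectors $\mathbf{a}$ and $\mathbf{b}$ is given below, with a small worked example using made-up vectors over the vocabulary (comedy, romance, drama): one soup contains comedy and romance, $\mathbf{a} = (1, 1, 0)$, and the other contains comedy and drama, $\mathbf{b} = (1, 0, 1)$.

$$
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} = \frac{1 \cdot 1 + 1 \cdot 0 + 0 \cdot 1}{\sqrt{2}\,\sqrt{2}} = \frac{1}{2}
$$

Because count vectors have no negative entries, the score always lies between 0 (no shared tokens) and 1 (identical token proportions).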

In [21]:
cosine_sim = cosine_similarity(count_matrix, count_matrix)

# Map each title to its row index, keeping only the first row of any repeated title
indices = pd.Series(feature_df.index, index=feature_df['title'])
indices = indices[~indices.index.duplicated(keep='first')]

def content_recommender(title):
    # Get the row index of the given title
    idx = indices[title]
    
    # Pair each movie's row index with its cosine similarity to the given title
    sim_scores = list(enumerate(cosine_sim[idx]))
    
    # Sort the movies from highest to lowest similarity
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Keep items 2 through 11; the first item is expected to be the title itself
    sim_scores = sim_scores[1:11]

    # Extract the row indices of the recommended titles
    movie_indices = [i[0] for i in sim_scores]
    print(movie_indices)
    # Fetch the full rows by position with iloc (base_df has the same row order as feature_df)
    return base_df.iloc[movie_indices]

# Apply the function
content_recommender('Our Hospitality')

[0, 344, 1052, 24, 398, 441, 142, 172, 325, 345]


Unnamed: 0,knownForTitles,cast_name,tconst,titleType,primaryTitle,originalTitle,isAdult,startYear,endYear,runtimeMinutes,genres,averageRating,numVotes,director_name,writer_name
0,tt0011414,[Natalie Talmadge],tt0011414,movie,The Love Expert,The Love Expert,0,1920.0,,60.0,"Comedy,Romance",4.9,136,[David Kirkland],"[John Emerson, Anita Loos]"
344,tt0237123,[Anat Dychtwald],tt0237123,tvSeries,Coupling,Coupling,0,2000.0,2004.0,30.0,"Comedy,Romance",8.5,41571,[Martin Dennis],[Steven Moffat]
1052,tt9124840,[Metin Namlisesli],tt9124840,movie,Organik Ask,Organik Ask,0,2018.0,,100.0,"Comedy,Romance",4.3,89,[Kamil Cetin],[Volkan Girgin]
24,tt0043762,[Constance De Mattiazzi],tt0043762,movie,Lullaby of Broadway,Lullaby of Broadway,0,1951.0,,92.0,"Comedy,Musical,Romance",6.8,893,[David Butler],[Earl Baldwin]
398,tt0308670,[Wai Chi Wong],tt0308670,movie,Oi ching bak min bau,Oi ching bak min bau,0,2001.0,,101.0,"Comedy,Romance",6.8,47,[Steven Lo],"[Canny Leung, Chi Shan Leung]"
441,tt0396269,[Matthew Fuchs],tt0396269,movie,Wedding Crashers,Wedding Crashers,0,2005.0,,119.0,"Comedy,Romance",6.9,323737,[David Dobkin],"[Steve Faber, Bob Fisher]"
142,tt0094889,[Harvey J. Alperin],tt0094889,movie,Cocktail,Cocktail,0,1988.0,,104.0,"Comedy,Drama,Romance",5.9,76694,[Roger Donaldson],[Heywood Gould]
172,tt0102006,[Mats Tegner],tt0102006,movie,'Harry Lund' lägger näsan i blöt!,'Harry Lund' lägger näsan i blöt!,0,1991.0,,105.0,"Comedy,Thriller",5.1,66,[Mats Arehn],[Mats Arehn]
325,tt0198284,[Tim Horsely],tt0198284,movie,After Sex,After Sex,0,2000.0,,96.0,"Comedy,Drama,Romance",4.4,753,[Cameron Thor],[Thomas M. Kostigen]
345,tt0237501,[Ngan-Ying Poon],tt0237501,movie,Ninth Happiness,Gau sing bou hei,0,1998.0,,86.0,"Comedy,Musical,Romance",5.9,118,[Clifton Ko],[Raymond To]
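One fragile point in a recommender like this is the `[1:11]` slice: it assumes the queried title always ranks first. If another title has an identical soup (similarity 1.0) and a lower row index, the stable sort places it first, so that title is skipped and the query itself is returned as a recommendation. A minimal sketch that excludes the query by position instead, run on a hypothetical four-row miniature of feature_df (the helper name `recommend` is made up for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical miniature of feature_df with an already-built soup column
feature_df = pd.DataFrame({
    'title': ['Our Hospitality', 'The Love Expert', 'Wedding Crashers', 'The King of Kings'],
    'soup': [
        'natalietalmadge comedy romance thriller busterkeaton',
        'natalietalmadge comedy romance davidkirkland',
        'matthewfuchs comedy romance daviddobkin',
        'reekaroberts biography drama history cecilb.demille',
    ],
})

count_matrix = CountVectorizer(stop_words='english').fit_transform(feature_df['soup'])
cosine_sim = cosine_similarity(count_matrix)

# Map each title to its row position, keeping the first row of any repeated title
indices = pd.Series(np.arange(len(feature_df)), index=feature_df['title'])
indices = indices[~indices.index.duplicated(keep='first')]

def recommend(title, top_n=10):
    """Return up to top_n titles most similar to `title`, never the title itself."""
    idx = indices[title]
    # Positions sorted by descending similarity; stable sort keeps row order on ties
    order = np.argsort(-cosine_sim[idx], kind='stable')
    # Drop the query's own position, then keep the best top_n
    order = order[order != idx][:top_n]
    return feature_df['title'].iloc[order].tolist()

print(recommend('Our Hospitality', top_n=2))
```

Here "The Love Expert" shares three tokens with the query and "Wedding Crashers" shares two, so they come out in that order.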
