# *Recommender System*: A Movie Recommendation System Using *Content-Based Filtering*

Analysis by [Shelly Victory](https://www.dicoding.com/users/victorysl)

*Dataset*: [MovieLens (small)](https://www.kaggle.com/sengzhaotoo/movielens-small)

## 1. Introduction
This project builds a *content-based filtering* movie recommendation system on MovieLens data as the final *submission* for the Applied *Machine Learning* class.

## 2. Data *Understanding*

### 2.1. Importing the Required *Python* Libraries

In [1]:
import numpy as np
import os
import pandas as pd
import tensorflow as tf
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from google.colab import drive
from sklearn.metrics import precision_score

### 2.2 Data *Loading*
Loading and describing the *dataset*.

In [2]:
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
links = pd.read_csv('/content/drive/MyDrive/dataset/movie/links.csv')
movies = pd.read_csv('/content/drive/MyDrive/dataset/movie/movies.csv')
tags = pd.read_csv('/content/drive/MyDrive/dataset/movie/tags.csv')
ratings = pd.read_csv('/content/drive/MyDrive/dataset/movie/ratings.csv')

print('Number of movie links: ', len(links.tmdbId.unique()))
print('Number of movie titles: ', len(movies.title.unique()))
print('Number of tagged movies: ', len(tags.movieId.unique()))
print('Number of users who have given at least one rating: ', len(ratings.userId.unique()))

Number of movie links:  9113
Number of movie titles:  9123
Number of tagged movies:  689
Number of users who have given at least one rating:  671


## 3. *Univariate Exploratory Data Analysis*
Overall, the *dataset* consists of the following tables: <br>
a. links: links to the source *databases* that hold each movie's details. <br>
b. movies: movie titles and genres. <br>
c. ratings: *user* ratings of movies. <br>
d. tags: labels that users attached to movies. <br>

Across these four tables, the userId and movieId features refer to the same users and the same movies.
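Because the tables are linked through shared IDs, it can be worth confirming that the keys actually line up before merging. The following is a minimal sketch on toy data; the frames `movies_demo` and `ratings_demo` and their values are hypothetical stand-ins, not the real CSV files.

```python
import pandas as pd

# Toy stand-ins for movies.csv and ratings.csv (hypothetical values).
movies_demo = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"],
})
ratings_demo = pd.DataFrame({
    "userId": [1, 1, 2],
    "movieId": [1, 3, 2],
    "rating": [4.0, 3.5, 5.0],
})

# True when every rated movieId also exists in the movies table.
ids_consistent = ratings_demo["movieId"].isin(movies_demo["movieId"]).all()
print(ids_consistent)
```

The same check could be run on the real `ratings` and `movies` frames loaded earlier.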




### 3.1.  *Links Variable* 

In [4]:
links

Unnamed: 0,movieId,imdbId,tmdbId
0,1,114709,862.0
1,2,113497,8844.0
2,3,113228,15602.0
3,4,114885,31357.0
4,5,113041,11862.0
...,...,...,...
9120,162672,3859980,402672.0
9121,163056,4262980,315011.0
9122,163949,2531318,391698.0
9123,164977,27660,137608.0


The links *dataset* has the following features: <br>
- movieId: a movie ID, unique to each title. <br>
- imdbId: the movie's ID on the Internet Movie Database (IMDb), a website that provides movie information.<br>
- tmdbId: the movie's ID on The Movie Database (TMDb), a database that provides movie information.


In [5]:
print('Number of records: ', len(links.movieId.unique()))
print('Number of imdbId values available: ', len(links.imdbId.unique()))

Number of records:  9125
Number of imdbId values available:  9125


### 3.2. *Movies Variable*

In [6]:
movies

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
...,...,...,...
9120,162672,Mohenjo Daro (2016),Adventure|Drama|Romance
9121,163056,Shin Godzilla (2016),Action|Adventure|Fantasy|Sci-Fi
9122,163949,The Beatles: Eight Days a Week - The Touring Y...,Documentary
9123,164977,The Gay Desperado (1936),Comedy


The movies *dataset* has the following features: <br>
- movieId: a movie ID, unique to each title. <br>
- title: the movie title. <br>
- genres: the categories used as one basis for grouping movies.

In [7]:
print('Number of movie titles: ', len(movies.title.unique()))
print('Number of unique genre combinations: ', len(movies.genres.unique()))

Number of movie titles:  9123
Number of unique genre combinations:  902
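The figure of 902 counts distinct values of the `genres` column, which are pipe-separated combinations rather than individual genres. As a rough sketch (the series `genres_demo` is a hypothetical three-row sample taken from the table above), the individual genres could be counted by splitting on `|` first:

```python
import pandas as pd

# Hypothetical sample of the genres column.
genres_demo = pd.Series([
    "Adventure|Animation|Children|Comedy|Fantasy",
    "Adventure|Children|Fantasy",
    "Comedy|Romance",
])

# Distinct pipe-separated combinations (what .unique() on the raw column counts).
n_combinations = genres_demo.nunique()

# Distinct individual genres after splitting each combination on "|".
individual_genres = sorted(genres_demo.str.split("|").explode().unique())

print(n_combinations, len(individual_genres))
```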


### 3.3. *Ratings Variable*

In [8]:
ratings

Unnamed: 0,userId,movieId,rating,timestamp
0,1,31,2.5,1260759144
1,1,1029,3.0,1260759179
2,1,1061,3.0,1260759182
3,1,1129,2.0,1260759185
4,1,1172,4.0,1260759205
...,...,...,...,...
99999,671,6268,2.5,1065579370
100000,671,6269,4.0,1065149201
100001,671,6365,4.0,1070940363
100002,671,6385,2.5,1070979663


The ratings *dataset* has the following features: <br>
- userId: a user ID, unique to each person. <br>
- movieId: a movie ID, unique to each title. <br>
- rating: the rating the user gave the movie. <br>
- timestamp: the time the rating was recorded. <br>
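The timestamp values look like Unix epoch seconds. Assuming that interpretation is right, a short sketch of converting them to readable dates might look like this (the two sample values are copied from the ratings table above; `ts_demo` and `dt_demo` are illustrative names):

```python
import pandas as pd

# Two timestamp values taken from the ratings table, assumed to be epoch seconds.
ts_demo = pd.Series([1260759144, 1065579370])

# Interpret the integers as seconds since 1970-01-01 (UTC).
dt_demo = pd.to_datetime(ts_demo, unit="s")

print(dt_demo.dt.year.tolist())
```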

In [9]:
print('Number of users who gave ratings: ', len(ratings.userId.unique()))
print('Number of rated movies: ', len(ratings.movieId.unique()))

Number of users who gave ratings:  671
Number of rated movies:  9066


In [10]:
ratings.describe()

Unnamed: 0,userId,movieId,rating,timestamp
count,100004.0,100004.0,100004.0,100004.0
mean,347.01131,12548.664363,3.543608,1129639000.0
std,195.163838,26369.198969,1.058064,191685800.0
min,1.0,1.0,0.5,789652000.0
25%,182.0,1028.0,3.0,965847800.0
50%,367.0,2406.5,4.0,1110422000.0
75%,520.0,5418.0,4.0,1296192000.0
max,671.0,163949.0,5.0,1476641000.0


### 3.4. *Tags Variable*

In [11]:
tags

Unnamed: 0,userId,movieId,tag,timestamp
0,15,339,sandra 'boring' bullock,1138537770
1,15,1955,dentist,1193435061
2,15,7478,Cambodia,1170560997
3,15,32892,Russian,1170626366
4,15,34162,forgettable,1141391765
...,...,...,...,...
1291,660,135518,meaning of life,1436680885
1292,660,135518,philosophical,1436680885
1293,660,135518,sci-fi,1436680885
1294,663,260,action,1438398078


Like the ratings *dataset*, the tags *dataset* has the following features: <br>
- userId: a user ID, unique to each person. <br>
- movieId: a movie ID, unique to each title. <br>
- tag: a label the user attached to the movie. <br>
- timestamp: the time the tag was recorded. <br>

In [12]:
print('Number of unique tags: ', len(tags.tag.unique()))
print('Number of tagged movies: ', len(tags.movieId.unique()))

Number of unique tags:  582
Number of tagged movies:  689


## 4. *Data Preparation*

### 4.1. *Data Preprocessing*

#### 4.1.1. Combining Movies

In [13]:
# Combine all movieId values from the movie-related tables
movie_all = np.concatenate((
    links.movieId.unique(),
    movies.movieId.unique()
))

# Sort the data and remove duplicates
movie_all = np.sort(np.unique(movie_all))
print('Total number of movies by movieId: ', len(movie_all))

Total number of movies by movieId:  9125


#### 4.1.2. Combining Users

In [14]:
# Combine all userId values from the user-related tables
user_all = np.concatenate((
    ratings.userId.unique(),
    tags.userId.unique()
))

# Sort the data and remove duplicates
user_all = np.sort(np.unique(user_all))
print('Total number of users by userId: ', len(user_all))

Total number of users by userId:  671


#### 4.1.3. Merging Ratings with Movie Titles

In [15]:
all_movie = ratings

In [16]:
# Merge all_movie with the movies dataframe on movieId
all_movie_name = pd.merge(all_movie, movies[['movieId', 'title']], on='movieId', how='left')

# display the all_movie_name dataframe
all_movie_name

Unnamed: 0,userId,movieId,rating,timestamp,title
0,1,31,2.5,1260759144,Dangerous Minds (1995)
1,1,1029,3.0,1260759179,Dumbo (1941)
2,1,1061,3.0,1260759182,Sleepers (1996)
3,1,1129,2.0,1260759185,Escape from New York (1981)
4,1,1172,4.0,1260759205,Cinema Paradiso (Nuovo cinema Paradiso) (1989)
...,...,...,...,...,...
99999,671,6268,2.5,1065579370,Raising Victor Vargas (2002)
100000,671,6269,4.0,1065149201,Stevie (2002)
100001,671,6365,4.0,1070940363,"Matrix Reloaded, The (2003)"
100002,671,6385,2.5,1070979663,Whale Rider (2002)


#### 4.1.4. Merging the Data with Movie Genres

In [17]:
# Merge all_movie_name with the movies dataframe on movieId
all_movie_genre = pd.merge(all_movie_name, movies[['movieId', 'genres']], on='movieId', how='left')

# display the all_movie_genre dataframe
all_movie_genre

Unnamed: 0,userId,movieId,rating,timestamp,title,genres
0,1,31,2.5,1260759144,Dangerous Minds (1995),Drama
1,1,1029,3.0,1260759179,Dumbo (1941),Animation|Children|Drama|Musical
2,1,1061,3.0,1260759182,Sleepers (1996),Thriller
3,1,1129,2.0,1260759185,Escape from New York (1981),Action|Adventure|Sci-Fi|Thriller
4,1,1172,4.0,1260759205,Cinema Paradiso (Nuovo cinema Paradiso) (1989),Drama
...,...,...,...,...,...,...
99999,671,6268,2.5,1065579370,Raising Victor Vargas (2002),Comedy|Drama|Romance
100000,671,6269,4.0,1065149201,Stevie (2002),Documentary
100001,671,6365,4.0,1070940363,"Matrix Reloaded, The (2003)",Action|Adventure|Sci-Fi|Thriller|IMAX
100002,671,6385,2.5,1070979663,Whale Rider (2002),Drama
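Both merges pull columns from the same `movies` table on the same key, so they could plausibly be collapsed into one. The sketch below illustrates that on toy frames; `ratings_demo`, `movies_demo`, and `merged_demo` are hypothetical names, and the rows are a small sample of the tables above.

```python
import pandas as pd

# Toy stand-ins for the ratings and movies frames (values sampled from the tables above).
ratings_demo = pd.DataFrame({
    "userId": [1, 1],
    "movieId": [31, 1029],
    "rating": [2.5, 3.0],
})
movies_demo = pd.DataFrame({
    "movieId": [31, 1029, 1061],
    "title": ["Dangerous Minds (1995)", "Dumbo (1941)", "Sleepers (1996)"],
    "genres": ["Drama", "Animation|Children|Drama|Musical", "Thriller"],
})

# One left merge brings in title and genres together, keeping every rating row.
merged_demo = ratings_demo.merge(
    movies_demo[["movieId", "title", "genres"]], on="movieId", how="left"
)
print(merged_demo)
```

With `how="left"`, a rating whose movieId is missing from `movies_demo` would still be kept, with NaN in the title and genres columns.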


### 4.2. *Data Cleaning*

In [18]:
# Check for missing values
all_movie_genre.isnull().sum()

userId       0
movieId      0
rating       0
timestamp    0
title        0
genres       0
dtype: int64

In [19]:
# Sort the rows by movieId in ascending order
all_movie_genre=all_movie_genre.sort_values('movieId', ascending=True)
all_movie_genre

Unnamed: 0,userId,movieId,rating,timestamp,title,genres
9713,68,1,4.0,1194741818,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
35933,261,1,1.5,1101665532,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
52631,383,1,5.0,852806429,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
35983,262,1,2.5,1433898798,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
12038,77,1,4.0,1163005363,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
...,...,...,...,...,...,...
39546,287,161944,5.0,1470167824,The Last Brickmaker in America (2001),Drama
11823,73,162376,4.5,1474255532,Stranger Things,Drama
92339,611,162542,5.0,1471520667,Rustom (2016),Romance|Thriller
92340,611,162672,3.0,1471523986,Mohenjo Daro (2016),Adventure|Drama|Romance


In [20]:
len(all_movie_genre.movieId.unique())

9066

In [21]:
all_movie_genre.genres.unique()

array(['Adventure|Animation|Children|Comedy|Fantasy',
       'Adventure|Children|Fantasy', 'Comedy|Romance',
       'Comedy|Drama|Romance', 'Comedy', 'Action|Crime|Thriller',
       'Adventure|Children', 'Action', 'Action|Adventure|Thriller',
       'Comedy|Horror', 'Adventure|Animation|Children', 'Drama',
       'Action|Adventure|Romance', 'Crime|Drama', 'Drama|Romance',
       'Action|Comedy|Crime|Drama|Thriller', 'Comedy|Crime|Thriller',
       'Crime|Drama|Horror|Mystery|Thriller', 'Drama|Sci-Fi',
       'Children|Drama', 'Adventure|Drama|Fantasy|Mystery|Sci-Fi',
       'Mystery|Sci-Fi|Thriller', 'Documentary|IMAX', 'Children|Comedy',
       'Drama|War', 'Action|Crime|Drama', 'Action|Adventure|Fantasy',
       'Comedy|Drama|Thriller', 'Mystery|Thriller',
       'Animation|Children|Drama|Musical|Romance',
       'Crime|Mystery|Thriller', 'Adventure|Drama', 'Drama|Mystery',
       'Drama|Thriller', 'Comedy|Crime', 'Action|Sci-Fi|Thriller',
       'Action|Comedy|Horror|Thriller', 'Com

In [22]:
all_movie_genre = all_movie_genre.drop_duplicates('movieId')
all_movie_genre

Unnamed: 0,userId,movieId,rating,timestamp,title,genres
9713,68,1,4.0,1194741818,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
97451,654,2,3.0,1145389613,Jumanji (1995),Adventure|Children|Fantasy
63773,459,3,4.0,859210733,Grumpier Old Men (1995),Comedy|Romance
54673,391,4,2.0,891534197,Waiting to Exhale (1995),Comedy|Drama|Romance
16858,110,5,4.0,840100796,Father of the Bride Part II (1995),Comedy
...,...,...,...,...,...,...
39546,287,161944,5.0,1470167824,The Last Brickmaker in America (2001),Drama
11823,73,162376,4.5,1474255532,Stranger Things,Drama
92339,611,162542,5.0,1471520667,Rustom (2016),Romance|Thriller
92340,611,162672,3.0,1471523986,Mohenjo Daro (2016),Adventure|Drama|Romance


### 4.3. Data *Transformation*

In [23]:
preparation = all_movie_genre

In [24]:
# convert the 'movieId' series to a list
movies_id = preparation['movieId'].tolist()

# convert the 'title' series to a list
movies_name = preparation['title'].tolist()

# convert the 'genres' series to a list
movies_genre = preparation['genres'].tolist()

print(len(movies_id))
print(len(movies_name))
print(len(movies_genre))

9066
9066
9066


In [25]:
# build a dataframe from a dictionary of 'movies_id', 'movies_name', and 'movies_genre'
movies_new = pd.DataFrame({
    'movies_id' : movies_id,
    'movies_name' : movies_name,
    'movies_genre' : movies_genre
})
movies_new

Unnamed: 0,movies_id,movies_name,movies_genre
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
...,...,...,...
9061,161944,The Last Brickmaker in America (2001),Drama
9062,162376,Stranger Things,Drama
9063,162542,Rustom (2016),Romance|Thriller
9064,162672,Mohenjo Daro (2016),Adventure|Drama|Romance


## 5. *Model Development* with *Content-Based Filtering*

In [26]:
data = movies_new
data.head(2)

Unnamed: 0,movies_id,movies_name,movies_genre
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy


### 5.1. TF-IDF Vectorizer

In [27]:
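# initialize TfidfVectorizer
# (note: this rebinds `tf`, shadowing the tensorflow import above)
tf = TfidfVectorizer()

# compute idf on the genre data
tf.fit(data['movies_genre'])

# map integer feature indices to feature names
# (get_feature_names() was removed in scikit-learn 1.2; newer versions use get_feature_names_out())
tf.get_feature_names()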



['action',
 'adventure',
 'animation',
 'children',
 'comedy',
 'crime',
 'documentary',
 'drama',
 'fantasy',
 'fi',
 'film',
 'genres',
 'horror',
 'imax',
 'listed',
 'musical',
 'mystery',
 'no',
 'noir',
 'romance',
 'sci',
 'thriller',
 'war',
 'western']
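The vocabulary above shows that the default tokenizer breaks compound labels apart: `Sci-Fi` becomes `sci` and `fi`, `Film-Noir` becomes `film` and `noir`, and `(no genres listed)` becomes three separate tokens. One possible alternative, sketched here on a hypothetical three-row sample, is to pass a custom tokenizer that splits only on `|`. The helper `split_genres` and the variable names are illustrative and not part of the original notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample of genre strings.
genres_demo = ["Action|Sci-Fi|Thriller", "Crime|Film-Noir", "(no genres listed)"]

def split_genres(text):
    """Split a pipe-separated genre string into whole genre labels."""
    return text.split("|")

# token_pattern=None signals that the regex tokenizer is intentionally unused.
genre_vectorizer = TfidfVectorizer(tokenizer=split_genres, token_pattern=None)
genre_vectorizer.fit(genres_demo)

# vocabulary_ maps each token to its column index; the keys are the tokens.
genre_vocab = sorted(genre_vectorizer.vocabulary_)
print(genre_vocab)
```

Labels are still lowercased, because lowercasing happens before tokenization, but each genre stays a single feature.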

In [28]:
# fit, then transform the data into a matrix
tfidf_matrix = tf.fit_transform(data['movies_genre'])

# check the shape of the tf-idf matrix
tfidf_matrix.shape

(9066, 24)

In [29]:
# convert the sparse tf-idf vectors to a dense matrix with todense()
tfidf_matrix.todense()

matrix([[0.        , 0.41032179, 0.53148344, ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.51028204, 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ],
        ...,
        [0.        , 0.        , 0.        , ..., 0.69290835, 0.        ,
         0.        ],
        [0.        , 0.68705353, 0.        , ..., 0.        , 0.        ,
         0.        ],
        [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
         0.        ]])

In [30]:
# build a dataframe to inspect the tf-idf matrix
# columns hold the genre tokens
# rows hold the movie titles

pd.DataFrame(
    tfidf_matrix.todense(),
    columns=tf.get_feature_names(),
    index=data.movies_name
).sample(22, axis=1).sample(10, axis=0)



movies_name,fi,fantasy,animation,children,horror,thriller,romance,adventure,comedy,genres,film,documentary,no,musical,listed,western,war,crime,imax,mystery,sci,drama
The Big Bus (1976),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.58694,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Juror, The (1996)",0.0,0.0,0.0,0.0,0.0,0.837286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.546764
Freakonomics (2010),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Twister (1996),0.0,0.0,0.0,0.0,0.0,0.470673,0.489773,0.546751,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Waste Land (2010),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Cercle Rouge, Le (Red Circle, The) (1970)",0.0,0.0,0.0,0.0,0.0,0.649788,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.760116,0.0,0.0,0.0,0.0
"Devil and Max Devlin, The (1981)",0.0,0.874972,0.0,0.0,0.0,0.0,0.0,0.0,0.484173,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Cinderella Story, A (2004)",0.0,0.0,0.0,0.0,0.0,0.0,0.809761,0.0,0.58676,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"Fly, The (Légy, A) (1980)",0.0,0.0,0.894026,0.0,0.0,0.0,0.0,0.0,0.448016,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
He's Just Not That Into You (2009),0.0,0.0,0.0,0.0,0.0,0.0,0.721898,0.0,0.523094,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.45303


### 5.2. Cosine Similarity

In [31]:
# compute cosine similarity on the tf-idf matrix
cosine_sim = cosine_similarity(tfidf_matrix)
cosine_sim

array([[1.        , 0.80410786, 0.15627644, ..., 0.        , 0.28191304,
        0.        ],
       [0.80410786, 1.        , 0.        , ..., 0.        , 0.35059107,
        0.        ],
       [0.15627644, 0.        , 1.        , ..., 0.58385843, 0.49837044,
        0.        ],
       ...,
       [0.        , 0.        , 0.58385843, ..., 1.        , 0.44375798,
        0.        ],
       [0.28191304, 0.35059107, 0.49837044, ..., 0.44375798, 1.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        1.        ]])
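Cosine similarity between two vectors a and b is a·b / (‖a‖‖b‖). Since `TfidfVectorizer` L2-normalizes each row by default, the denominator is 1 and the similarity reduces to a plain dot product. The sketch below checks this on a hypothetical three-row sample; `genres_demo`, `sim_cosine`, and `sim_dot` are illustrative names.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical sample of genre strings.
genres_demo = ["Adventure|Children|Fantasy", "Adventure|Children", "Documentary"]

# Default settings: rows of the result are L2-normalized.
tfidf_demo = TfidfVectorizer().fit_transform(genres_demo)

sim_cosine = cosine_similarity(tfidf_demo)  # dot product divided by the norms
sim_dot = linear_kernel(tfidf_demo)         # plain dot product

print(np.allclose(sim_cosine, sim_dot))
```

Movies that share no genre token, such as the first and third rows here, get a similarity of 0, which matches the many zeros in the matrix above.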

In [32]:
# build a dataframe from cosine_sim with movie titles as both rows and columns
cosine_sim_df = pd.DataFrame(cosine_sim, index=data['movies_name'], columns=data['movies_name'])
print('Shape: ', cosine_sim_df.shape)

# inspect a sample of the similarity matrix
cosine_sim_df.sample(5, axis=1).sample(10, axis=0)

Shape:  (9066, 9066)


movies_name,Backdraft (1991),Mozart and the Whale (2005),Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001),Jalla! Jalla! (2000),"Toxic Avenger Part III: The Last Temptation of Toxie, The (1989)"
"Brady Bunch Movie, The (1995)",0.0,0.523094,0.0,0.523094,0.515254
Lolita (1997),0.282644,0.852275,0.0,0.852275,0.0
Happy Gilmore (1996),0.0,0.523094,0.0,0.523094,0.515254
4 Little Girls (1997),0.0,0.0,0.0,0.0,0.0
Malice (1993),0.0,0.0,0.0,0.0,0.0
"Hunted, The (1995)",0.846913,0.0,0.0,0.0,0.0
Darfur Now (2007),0.0,0.0,0.0,0.0,0.0
"Jungle Book, The (1967)",0.0,0.146817,0.323132,0.146817,0.144617
Sympathy for Mr. Vengeance (Boksuneun naui geot) (2002),0.259182,0.220821,0.0,0.220821,0.0
Parineeta (2005),0.17543,0.528985,0.0,0.528985,0.0


### 5.3. Getting Recommendations

In [33]:
def movies_recommendations(nama_movie, similarity_data=cosine_sim_df, items=data[['movies_name', 'movies_genre']], k=5):
    """
    Recommend movies based on the similarity dataframe.

    Parameters:
    ---
    nama_movie : str
                 Movie title (a label in the similarity dataframe)
    similarity_data : pd.DataFrame
                      Symmetric similarity dataframe with movie titles as
                      both index and columns
    items : pd.DataFrame
            Contains the titles and the other features used to define similarity
    k : int
        Number of recommendations to return
    ---

    Takes the k entries with the largest similarity values in the
    column of the given movie.
    """

    # Convert the movie's similarity column to numpy and partition it indirectly with argpartition.
    # range(start, stop, step) here lists the last k+1 positions, so the k+1 largest values
    # end up in sorted order; this covers the slice below (k results plus the query movie itself).
    index = similarity_data.loc[:,nama_movie].to_numpy().argpartition(
        range(-1, -(k+2), -1))

    # Take the titles with the largest similarity, in descending order
    closest = similarity_data.columns[index[-1:-(k+2):-1]]

    # Drop nama_movie so the queried movie does not appear in its own recommendations
    closest = closest.drop(nama_movie, errors='ignore')

    return pd.DataFrame(closest).merge(items).head(k)
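For comparison, the same top-k lookup can be sketched with pandas alone, which trades the speed of `argpartition` for readability. This is an illustrative alternative on a toy 4×4 similarity matrix, not the notebook's method; the helper `top_k_similar`, the titles, and the scores are all hypothetical.

```python
import pandas as pd

# Toy symmetric similarity matrix (hypothetical titles and scores).
titles = ["A", "B", "C", "D"]
sim_demo = pd.DataFrame(
    [[1.0, 0.9, 0.1, 0.5],
     [0.9, 1.0, 0.2, 0.4],
     [0.1, 0.2, 1.0, 0.3],
     [0.5, 0.4, 0.3, 1.0]],
    index=titles, columns=titles,
)

def top_k_similar(title, similarity, k=2):
    """Return the k titles most similar to `title`, excluding the title itself."""
    scores = similarity[title].drop(title)  # remove the self-similarity entry
    return scores.nlargest(k).index.tolist()

top_demo = top_k_similar("A", sim_demo)
print(top_demo)
```

When many movies tie at the same score, as happens with single-genre titles, which of the tied movies are returned depends on their order in the data.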

In [34]:
data[data.movies_name.eq('Billy Liar (1963)')]

Unnamed: 0,movies_id,movies_name,movies_genre
3675,4687,Billy Liar (1963),Comedy


In [35]:
movies_recommendations('Billy Liar (1963)')

Unnamed: 0,movies_name,movies_genre
0,Bring It On (2000),Comedy
1,Private School (1983),Comedy
2,Punk's Dead: SLC Punk! 2 (2014),Comedy
3,Porky's Revenge (1985),Comedy
4,Death at a Funeral (2007),Comedy


In [38]:
rekomendasi_relevan = 5
total_rekomendasi = 5

presisi = (rekomendasi_relevan / total_rekomendasi)
presisi

1.0

In the recommendations returned, every suggested movie has the same genre as Billy Liar, namely Comedy, so the precision for this query is 5/5, or 100%.
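The relevance count above was done by hand. A small helper could automate it, as in the sketch below. It assumes a recommendation is relevant when it shares at least one genre with the query movie; that definition, and the function name `precision_at_k`, are assumptions made for illustration.

```python
def precision_at_k(query_genres, recommended_genres, k=5):
    """Fraction of the top-k recommendations sharing at least one genre with the query.

    Both inputs use the pipe-separated genre format of the dataset.
    """
    query_set = set(query_genres.split("|"))
    top_k = recommended_genres[:k]
    if not top_k:
        return 0.0
    # A recommendation is relevant if its genre set overlaps the query's genre set.
    relevant = sum(1 for genres in top_k if query_set & set(genres.split("|")))
    return relevant / len(top_k)

# The Billy Liar example: five Comedy recommendations for a Comedy query.
precision_demo = precision_at_k("Comedy", ["Comedy"] * 5)
print(precision_demo)
```

Averaging this value over many query movies would give a more representative estimate than a single query.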

## 6. Conclusion
This project built a recommendation system using *content-based filtering* on the MovieLens dataset. In the system, TfidfVectorizer builds a weighted representation of each movie's genre feature, while cosine similarity measures the degree of similarity between movie titles. The resulting model suggests movie titles whose genres resemble those of a movie the user previously liked. The evaluation metric is precision, which reached 1.0 (100%) for the sample query.

## 7. References
*Dataset*: [MovieLens (small)](https://www.kaggle.com/sengzhaotoo/movielens-small)