# Movie Recommender with TensorFlow

This notebook implements a movie recommendation system using **TensorFlow** and **Keras**. The goal is to build a *Matrix Factorization* model that predicts the rating a user would give to a movie.

**Steps:**
1.  Install and import libraries.
2.  Load and preprocess the data.
3.  Build the Matrix Factorization model architecture.
4.  Train the model.
5.  Save and evaluate the model.
6.  Build the recommendation function.

## 1. Install and Import Libraries

In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import os

## 2. Load and Preprocess the Data

In [2]:
ratings_df = pd.read_csv("data/ratings.csv")
movies_df = pd.read_csv("data/movies.csv")

print("Ratings data:")
print(ratings_df.head())
print("Movies data:")
print(movies_df.head())

Ratings data:
   userId  movieId  rating   timestamp
0       1      296     5.0  1147880044
1       1      306     3.5  1147868817
2       1      307     5.0  1147868828
3       1      665     5.0  1147878820
4       1      899     3.5  1147868510
Movies data:
   movieId                               title  \
0        1                    Toy Story (1995)   
1        2                      Jumanji (1995)   
2        3             Grumpier Old Men (1995)   
3        4            Waiting to Exhale (1995)   
4        5  Father of the Bride Part II (1995)   

                                        genres  
0  Adventure|Animation|Children|Comedy|Fantasy  
1                   Adventure|Children|Fantasy  
2                               Comedy|Romance  
3                         Comedy|Drama|Romance  
4                                       Comedy  


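### Converting IDs to Sequential Indices
The Keras `Embedding` layer looks up row *k* of a table with `input_dim` rows, so its inputs must be integer indices in the range `[0, input_dim)`. The raw `userId` and `movieId` values are not contiguous (movie IDs in particular have gaps), so we map each ID to a sequential index starting at 0.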
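To illustrate what this encoding does, here is a minimal NumPy-only sketch (not the scikit-learn class itself). `np.unique(..., return_inverse=True)` sorts the distinct IDs and returns each value's position in that sorted list, which is the same mapping `LabelEncoder.fit_transform` produces. The sample IDs are taken from the rows shown above.

```python
import numpy as np

# A few raw movieId values (sparse, with gaps), taken from the rows shown above
raw_movie_ids = np.array([296, 306, 307, 665, 899, 296, 1])

# Sorted distinct IDs ("classes") and each raw value's position among them
classes, movie_idx = np.unique(raw_movie_ids, return_inverse=True)

print(classes)    # [  1 296 306 307 665 899]
print(movie_idx)  # [1 2 3 4 5 1 0]

# An index maps back to its original ID through the sorted classes
assert classes[movie_idx[0]] == 296
# Every index is valid for an Embedding with input_dim=len(classes)
assert movie_idx.min() == 0 and movie_idx.max() == len(classes) - 1
```

The fitted `LabelEncoder` keeps the same sorted array in its `classes_` attribute, which is what `inverse_transform` uses to map an index back to the original ID.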

In [3]:
# Encoder for userId
user_encoder = LabelEncoder()
ratings_df['user_idx'] = user_encoder.fit_transform(ratings_df['userId'])
num_users = len(user_encoder.classes_)  # number of distinct users = embedding table size

# Encoder for movieId
movie_encoder = LabelEncoder()
ratings_df['movie_idx'] = movie_encoder.fit_transform(ratings_df['movieId'])
num_movies = len(movie_encoder.classes_)  # number of distinct rated movies

print(f"Number of unique users: {num_users}")
print(f"Number of unique movies: {num_movies}")
ratings_df.head()

Number of unique users: 162541
Number of unique movies: 59047


   userId  movieId  rating   timestamp  user_idx  movie_idx
0       1      296     5.0  1147880044         0        292
1       1      306     3.5  1147868817         0        302
2       1      307     5.0  1147868828         0        303
3       1      665     5.0  1147878820         0        654
4       1      899     3.5  1147868510         0        878


### Splitting into Training and Validation Sets

In [4]:
X = ratings_df[['user_idx', 'movie_idx']]
y = ratings_df['rating']

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

## 3. Building the Model Architecture

In [5]:
EMBEDDING_SIZE = 50

class RecommenderNet(keras.Model):
    def __init__(self, num_users, num_movies, embedding_size, **kwargs):
        super(RecommenderNet, self).__init__(**kwargs)
        self.user_embedding = layers.Embedding(
            num_users, embedding_size, embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-6)
        )
        self.user_bias = layers.Embedding(num_users, 1)
        self.movie_embedding = layers.Embedding(
            num_movies, embedding_size, embeddings_initializer="he_normal",
            embeddings_regularizer=keras.regularizers.l2(1e-6)
        )
        self.movie_bias = layers.Embedding(num_movies, 1)

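    def call(self, inputs):
        user_vector = self.user_embedding(inputs[:, 0])     # (batch, embedding_size)
        user_bias = self.user_bias(inputs[:, 0])            # (batch, 1)
        movie_vector = self.movie_embedding(inputs[:, 1])   # (batch, embedding_size)
        movie_bias = self.movie_bias(inputs[:, 1])          # (batch, 1)
        # One dot product per example, shape (batch, 1).
        # tf.tensordot(user_vector, movie_vector, 2) would contract BOTH axes
        # and return a single scalar summed over the whole batch.
        dot_user_movie = tf.reduce_sum(user_vector * movie_vector, axis=1, keepdims=True)
        # Add the bias terms
        x = dot_user_movie + user_bias + movie_bias
        return tf.nn.sigmoid(x) * 5  # scale the output to the range (0, 5)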

model = RecommenderNet(num_users, num_movies, EMBEDDING_SIZE)

model.compile(
    loss=tf.keras.losses.MeanSquaredError(),
    optimizer=keras.optimizers.Adam(learning_rate=0.001)
)
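For user $u$ and movie $i$, the `call` method computes

$$\hat r_{ui} = 5\,\sigma\!\left(\mathbf p_u^\top \mathbf q_i + b_u + b_i\right)$$

where $\mathbf p_u$ and $\mathbf q_i$ are the `EMBEDDING_SIZE`-dimensional embeddings, $b_u$ and $b_i$ the bias embeddings, and $\sigma$ the sigmoid. The NumPy sketch below reproduces that forward pass with random, untrained weights and toy table sizes (all assumptions for illustration) to make the shapes explicit. It also shows why the per-row dot product matters: a two-axis `tensordot` collapses the whole batch into one scalar.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_movies, embedding_size = 4, 6, 5  # toy sizes for illustration

# Randomly initialised parameters (stand-ins for the trained embedding tables)
user_emb = rng.normal(size=(num_users, embedding_size))
movie_emb = rng.normal(size=(num_movies, embedding_size))
user_bias = rng.normal(size=(num_users, 1))
movie_bias = rng.normal(size=(num_movies, 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(pairs):
    """pairs: int array of shape (batch, 2) holding (user_idx, movie_idx)."""
    u, m = pairs[:, 0], pairs[:, 1]
    p_u, q_i = user_emb[u], movie_emb[m]               # (batch, embedding_size)
    dot = np.sum(p_u * q_i, axis=1, keepdims=True)     # (batch, 1): one dot product per row
    return 5.0 * sigmoid(dot + user_bias[u] + movie_bias[m])  # (batch, 1), values in (0, 5)

pairs = np.array([[0, 1], [2, 3], [3, 5]])
print(predict(pairs).shape)  # (3, 1)

# Contracting BOTH axes sums over the whole batch and yields a single scalar
collapsed = np.tensordot(user_emb[pairs[:, 0]], movie_emb[pairs[:, 1]], 2)
print(np.ndim(collapsed))  # 0
```

The scalar equals the sum of all per-row dot products, so with the two-axis version every example in a batch would receive the same interaction term.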

## 4. Training the Model

In [6]:
history = model.fit(
    x=X_train.values,
    y=y_train.values,
    batch_size=64,
    epochs=5,
    verbose=1,
    validation_data=(X_val.values, y_val.values)
)

Epoch 1/5
  7905/312502 ━━━━━━━━━━━━━━━━━━━━ 5:08:05 61ms/step - loss: 1.5524

KeyboardInterrupt: 
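The progress bar puts the cost of this configuration in numbers: with `batch_size=64` one epoch is 312,502 steps, and at about 61 ms per step that is more than five hours per epoch, with five epochs requested. The small helpers below (hypothetical names, plain Python) only redo the arithmetic from the log; they say nothing about how per-step time would change with a different batch size, which would have to be measured.

```python
import math

def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of batches per epoch (the last batch may be partial)."""
    return math.ceil(num_samples / batch_size)

def epoch_hours(steps: int, ms_per_step: float) -> float:
    """Wall-clock hours for `steps` batches at a fixed time per batch."""
    return steps * ms_per_step / 1000 / 3600

# From the log: 312,502 steps per epoch at ~61 ms/step
print(round(epoch_hours(312_502, 61), 2))          # 5.3
# Steps still remaining when training was interrupted at step 7,905
print(round(epoch_hours(312_502 - 7_905, 61), 2))  # 5.16

# 312,502 steps at batch size 64 implies roughly 20 million training rows
print(312_501 * 64 + 1, "to", 312_502 * 64)        # 20000065 to 20000128
```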

## 5. Saving the Model

In [None]:
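model_dir = 'models/tf_model'
os.makedirs(model_dir, exist_ok=True)
# Save the learned weights; Keras 3 rejects a bare directory in model.save()
# and requires the ".weights.h5" suffix for save_weights().
# To reload: rebuild RecommenderNet with the same sizes, call it once on a batch, then load_weights().
model.save_weights(os.path.join(model_dir, 'recommender.weights.h5'))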
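The outline lists evaluation as part of this step, but the notebook only saves the model. Below is a minimal sketch of how the validation split could be scored. It assumes RMSE as the metric, a common choice for rating prediction that the notebook does not specify, and compares against a hypothetical baseline that always predicts the mean training rating. In the notebook, `y_pred` would come from `model.predict(X_val.values).flatten()` once training completes.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error between true and predicted ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mean_baseline_rmse(y_train, y_val):
    """RMSE of always predicting the mean training rating (a reference point)."""
    return rmse(y_val, np.full(len(y_val), np.mean(y_train)))

# Toy numbers for illustration only (not results from this notebook)
y_train_toy = [5.0, 3.5, 4.0, 3.5]
y_val_toy = [4.0, 3.0]
y_pred_toy = [4.2, 3.4]

print(round(rmse(y_val_toy, y_pred_toy), 4))                 # 0.3162
print(round(mean_baseline_rmse(y_train_toy, y_val_toy), 4))  # 0.7071
```

A model that does not beat the mean baseline on the validation split has not learned anything useful from the user and movie embeddings.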

## 6. Building the Recommendation Function

In [None]:
def recommend_movies_for_user_tf(user_id, n=10):
    # Movies this user has already rated
    watched_movies = ratings_df.loc[ratings_df['userId'] == user_id, 'movieId'].to_numpy()
    
    # Candidates: movies the user has not rated that the encoder knows about.
    # Movies in movies.csv with no ratings were never encoded, and
    # movie_encoder.transform() raises a ValueError on unseen labels.
    unseen_movie_ids = np.setdiff1d(movie_encoder.classes_, watched_movies)
    
    # Convert IDs to internal indices
    user_idx = user_encoder.transform([user_id])[0]
    unseen_movie_idx = movie_encoder.transform(unseen_movie_ids)
    
    # Build (user_idx, movie_idx) input pairs for the model
    user_movie_pairs = np.column_stack(
        [np.full(len(unseen_movie_idx), user_idx), unseen_movie_idx]
    )
    
    # Predict ratings
    predicted_ratings = model.predict(user_movie_pairs, batch_size=4096, verbose=0).flatten()
    
    # Combine the results
    results = pd.DataFrame({
        'movieId': unseen_movie_ids,
        'predicted_rating': predicted_ratings
    })
    
    # Sort and take the top N
    top_recommendations = results.sort_values(by='predicted_rating', ascending=False).head(n)
    
    # Attach the titles; merging from the ranked frame keeps the ranking order
    recommended_movies = top_recommendations.merge(movies_df, on='movieId')
    
    return recommended_movies
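The ranking step itself does not need a DataFrame. As a sketch (the helper `top_n_unseen` is hypothetical and uses NumPy only), given one predicted score per movie index and the indices the user has already rated, the top N can be selected with `np.argpartition`, which avoids fully sorting tens of thousands of scores; only the N survivors are sorted.

```python
import numpy as np

def top_n_unseen(scores, watched_idx, n):
    """Indices of the n highest scores, excluding already-watched indices, best first."""
    scores = np.asarray(scores, dtype=float).copy()
    scores[np.asarray(list(watched_idx), dtype=int)] = -np.inf  # never recommend rated movies
    n = min(n, int(np.isfinite(scores).sum()))
    if n <= 0:
        return np.array([], dtype=int)
    top = np.argpartition(-scores, n - 1)[:n]  # the n best, in arbitrary order
    return top[np.argsort(-scores[top])]       # sort just those n, descending

scores = [4.1, 3.0, 4.9, 2.5, 4.5]  # toy predicted ratings for movie indices 0..4
print(top_n_unseen(scores, watched_idx=[2], n=2))  # [4 0]
```

In this notebook's setting, `scores` would be the model's predictions for every encoded movie for one user, and the returned indices would be mapped back to `movieId` values with `movie_encoder.inverse_transform`.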

### Trying the Recommendations

In [None]:
# An arbitrary user ID to test with
test_user_id = 33
recommendations = recommend_movies_for_user_tf(test_user_id, n=10)
print(f"Recommendations for user ID: {test_user_id}")
print(recommendations[['movieId', 'title', 'predicted_rating']])