## Introduction

A recommender system suggests items to users based on their preferences and past interactions. This assignment uses a model-based collaborative filtering approach: a model is built from historical user data and then used to predict which items are relevant to the active user.


## Dataset

The dataset is MovieLens, which contains users, items (movies), and ratings. It is loaded from `ratings.csv` and `movies.csv` and serves as the basis for building the recommender system.


In [15]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


In [16]:
ratings = pd.read_csv("ratings.csv")
movies = pd.read_csv("movies.csv")

ratings.head()


| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1 | 4.0 | 964982703 |
| 1 | 1 | 3 | 4.0 | 964981247 |
| 2 | 1 | 6 | 4.0 | 964982224 |
| 3 | 1 | 47 | 5.0 | 964983815 |
| 4 | 1 | 50 | 5.0 | 964982931 |


In [17]:
# Binarize ratings into implicit feedback: 1 if rating >= 3, otherwise 0
ratings['implicit'] = (ratings['rating'] >= 3).astype(int)


In [18]:
R = ratings.pivot_table(
    index='userId',
    columns='movieId',
    values='implicit',
    fill_value=0
)
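
To make the shape of `R` concrete, the sketch below builds the same kind of user-item matrix from a tiny hand-made ratings table. The toy data and the names `toy` and `toy_R` are illustrative assumptions and are not taken from MovieLens.

```python
import pandas as pd

# Hypothetical toy ratings (not MovieLens data)
toy = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating":  [4.0, 2.0, 5.0, 3.0, 1.5],
})

# Same binarization rule as the notebook: rating >= 3 counts as a positive signal
toy["implicit"] = (toy["rating"] >= 3).astype(int)

# Rows are users, columns are movies; user-movie pairs with no rating become 0
toy_R = toy.pivot_table(
    index="userId",
    columns="movieId",
    values="implicit",
    fill_value=0,
)
print(toy_R)
```

One consequence worth keeping in mind: after this step a movie rated below 3 and a movie never rated both end up as 0, so the matrix no longer distinguishes "disliked" from "unseen".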


In [19]:
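def train_test_split_user(R, test_size=0.2, min_item=5, seed=42):
    """Hold out a fraction of each user's positive items as a per-user test set."""
    # seed is an added parameter (assumed default 42) so the split is reproducible
    rng = np.random.default_rng(seed)
    R_train = R.copy()
    R_test = pd.DataFrame(0, index=R.index, columns=R.columns)

    for user in R.index:
        row = R.loc[user]
        items = row[row > 0].index
        # Users with too few positive items stay entirely in the training set
        if len(items) < min_item:
            continue

        test_items = rng.choice(
            items,
            size=int(len(items) * test_size),
            replace=False
        )

        # Move the sampled items from train to test
        R_train.loc[user, test_items] = 0
        R_test.loc[user, test_items] = 1

    return R_train, R_test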


In [21]:
R_train, R_test = train_test_split_user(R)
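
A quick sanity check on a split like this is that no interaction appears in both matrices and that train and test together reproduce the original matrix. The helper `check_split` and the small matrices below are hypothetical and only illustrate the idea.

```python
import pandas as pd

def check_split(R, R_train, R_test):
    """Return True if train and test are disjoint and together reproduce R."""
    # Count user-item pairs that are positive in both train and test (leakage)
    overlap = ((R_train > 0) & (R_test > 0)).to_numpy().sum()
    # Union of train and test positives should equal the original positives
    recombined = ((R_train > 0) | (R_test > 0)).astype(int)
    return bool(overlap == 0) and recombined.equals((R > 0).astype(int))

# Hypothetical 2-user, 3-item example
R = pd.DataFrame([[1, 1, 0], [0, 1, 1]], index=[1, 2], columns=[10, 20, 30])
R_train = pd.DataFrame([[1, 0, 0], [0, 1, 0]], index=R.index, columns=R.columns)
R_test = pd.DataFrame([[0, 1, 0], [0, 0, 1]], index=R.index, columns=R.columns)

print(check_split(R, R_train, R_test))
```

In the notebook the same check could be applied to the real `R`, `R_train`, and `R_test` right after the split.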


In [22]:
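# Popularity baseline: rank items by their number of positive interactions in the training set
item_popularity = R_train.sum(axis=0).sort_values(ascending=False)

def popularity_recommend(user_id, top_n=10):
    # Align the user's row to the popularity ordering before masking seen items
    seen = R_train.loc[user_id].reindex(item_popularity.index)
    unseen = item_popularity[seen == 0]
    return unseen.index[:top_n]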


In [23]:
# Item-item cosine similarity; transpose so each row is an item vector over users
item_similarity = cosine_similarity(R_train.T)

item_similarity_df = pd.DataFrame(
    item_similarity,
    index=R_train.columns,
    columns=R_train.columns
)
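
The cosine similarity between two items is the dot product of their user columns divided by the product of the column norms. The NumPy-only sketch below recomputes it by hand on a toy matrix. The function name `cosine_item_similarity` and the toy data are assumptions, and the zero-norm handling is a choice made for this sketch.

```python
import numpy as np
import pandas as pd

def cosine_item_similarity(R):
    """Item-item cosine similarity computed directly from the user-item matrix."""
    X = R.to_numpy(dtype=float)
    # One norm per item column
    norms = np.linalg.norm(X, axis=0)
    # Avoid division by zero: an all-zero item column gets similarity 0 to everything
    norms[norms == 0] = 1.0
    X_unit = X / norms
    return pd.DataFrame(X_unit.T @ X_unit, index=R.columns, columns=R.columns)

# Hypothetical toy matrix: 3 users (rows) x 3 items (columns)
toy_R = pd.DataFrame(
    [[1, 1, 0],
     [1, 0, 1],
     [1, 1, 0]],
    index=[1, 2, 3],
    columns=[10, 20, 30],
)

toy_sim = cosine_item_similarity(toy_R)
print(toy_sim.round(4))
```

Items 20 and 30 share no users in this toy matrix, so their similarity is 0, while item 10 overlaps with both.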


In [24]:
def item_based_recommend(user_id, top_n=10):
    user_vector = R_train.loc[user_id]
    # Score each item by its summed similarity to the items the user interacted with
    scores = item_similarity_df.dot(user_vector)
    scores = scores[user_vector == 0]  # exclude seen items
    return scores.sort_values(ascending=False).head(top_n).index
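
The scoring step can be traced by hand on a small case. In the sketch below the similarity values are invented for illustration; only the three operations (dot product, masking seen items, sorting) mirror the function above.

```python
import pandas as pd

items = [10, 20, 30]

# Hypothetical item-item similarity matrix (values chosen by hand)
sim = pd.DataFrame(
    [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.5],
     [0.1, 0.5, 1.0]],
    index=items,
    columns=items,
)

# The user has interacted with item 10 only
user_vector = pd.Series([1, 0, 0], index=items)

# score(i) = sum over j of sim(i, j) * user_vector(j)
scores = sim.dot(user_vector)

# Drop items the user has already seen, then rank the rest
ranking = scores[user_vector == 0].sort_values(ascending=False)
print(ranking)
```

Because the user has a single positive item, each score here is just that item's similarity to item 10, so item 20 ranks ahead of item 30.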


In [25]:
def precision_at_k(model_func, R_train, R_test, k=10):
    """Mean fraction of the top-k recommendations that appear in the user's test set."""
    scores = []

    for user in R_train.index:
        if R_test.loc[user].sum() == 0:
            continue

        recs = model_func(user, k)
        relevant = set(R_test.loc[user][R_test.loc[user] > 0].index)
        hit = len(set(recs) & relevant)

        scores.append(hit / k)

    return np.mean(scores)


In [26]:
def hit_rate_at_k(model_func, R_train, R_test, k=10):
    """Fraction of evaluated users with at least one test item in their top-k."""
    hits = []

    for user in R_train.index:
        if R_test.loc[user].sum() == 0:
            continue

        recs = model_func(user, k)
        relevant = set(R_test.loc[user][R_test.loc[user] > 0].index)
        hits.append(1 if len(set(recs) & relevant) > 0 else 0)

    return np.mean(hits)


In [27]:
def recall_at_k(model_func, R_train, R_test, k=10):
    """Mean fraction of each user's test items that appear in the top-k."""
    scores = []

    for user in R_train.index:
        if R_test.loc[user].sum() == 0:
            continue

        recs = model_func(user, k)
        relevant = set(R_test.loc[user][R_test.loc[user] > 0].index)
        hit = len(set(recs) & relevant)

        scores.append(hit / len(relevant))

    return np.mean(scores)
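
For a single user, the three metrics reduce to simple set arithmetic. The recommendation list and held-out set below are made up purely to illustrate the calculation.

```python
# Hypothetical top-5 recommendations and held-out (relevant) items for one user
recs = [10, 20, 30, 40, 50]
relevant = {20, 50, 60}

k = len(recs)
hits = len(set(recs) & relevant)   # recommended items that are in the test set

precision = hits / k               # share of recommendations that were relevant
recall = hits / len(relevant)      # share of relevant items that were recommended
hit_rate = 1 if hits > 0 else 0    # did at least one recommendation hit?

print(precision, round(recall, 4), hit_rate)
```

Note that a user's precision cannot exceed `min(k, len(relevant)) / k`, so users with fewer than `k` held-out items can never reach a Precision@k of 1.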


In [28]:
p_b = precision_at_k(popularity_recommend, R_train, R_test)
r_b = recall_at_k(popularity_recommend, R_train, R_test)
hr_b = hit_rate_at_k(popularity_recommend, R_train, R_test)

print("\nBaseline (Popularity)")
print("Precision@10:", round(p_b,4))
print("Recall@10:", round(r_b,4))
print("Hit Rate@10:", round(hr_b,4))



Baseline (Popularity)
Precision@10: 0.1416
Recall@10: 0.0783
Hit Rate@10: 0.5905
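
To compare several recommenders on the same metrics, the calls above can be wrapped in one table-building helper. The sketch below is one possible way to do that. `evaluate_models` and the toy stand-ins (`always_20`, `always_30`, `toy_hit_rate`) are hypothetical names introduced here so the block runs on its own.

```python
import pandas as pd

def evaluate_models(models, metrics, R_train, R_test, k=10):
    """Build a models x metrics table from dicts of name -> function."""
    results = {
        model_name: {
            metric_name: metric(model_func, R_train, R_test, k)
            for metric_name, metric in metrics.items()
        }
        for model_name, model_func in models.items()
    }
    return pd.DataFrame.from_dict(results, orient="index").round(4)

# --- Hypothetical stand-ins so the sketch is self-contained ---
R_train = pd.DataFrame([[1, 0, 0]], index=[1], columns=[10, 20, 30])
R_test = pd.DataFrame([[0, 1, 0]], index=[1], columns=[10, 20, 30])

def always_20(user_id, top_n=10):
    return [20][:top_n]

def always_30(user_id, top_n=10):
    return [30][:top_n]

def toy_hit_rate(model_func, R_train, R_test, k=10):
    hits = []
    for user in R_train.index:
        row = R_test.loc[user]
        relevant = set(row[row > 0].index)
        hits.append(int(len(set(model_func(user, k)) & relevant) > 0))
    return sum(hits) / len(hits)

table = evaluate_models(
    {"A": always_20, "B": always_30},
    {"HitRate@1": toy_hit_rate},
    R_train, R_test, k=1,
)
print(table)
```

In the notebook, the same helper could be called with `{"Popularity": popularity_recommend, "Item-based CF": item_based_recommend}` and the three metric functions defined earlier, which would put both models side by side.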


In [29]:
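def show_recommendation(user_id, top_n=5):
    movie_ids = item_based_recommend(user_id, top_n)
    # reindex keeps the ranking order, which isin() filtering would discard
    titles = movies.set_index('movieId')['title'].reindex(movie_ids)
    return titles.rename_axis('movieId').reset_index()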


## System Limitations
This recommender relies on collaborative filtering, so it cannot yet handle the cold-start problem for new users and new items. Evaluation is based on historical data, so it does not directly measure user satisfaction.
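
One common mitigation for the new-user side of cold start is to fall back to a non-personalized list when a user has too little history. The sketch below illustrates that idea only. `recommend_with_fallback`, `min_interactions`, and the toy recommenders are hypothetical, the fallback is assumed to work for users absent from the training matrix, and new items are not addressed.

```python
import pandas as pd

def recommend_with_fallback(user_id, R_train, personalized, fallback,
                            top_n=10, min_interactions=1):
    """Use `personalized` when the user has enough training history, else `fallback`."""
    known = user_id in R_train.index
    # Number of positive training interactions; 0 for users missing from the matrix
    n_seen = int((R_train.loc[user_id] > 0).sum()) if known else 0
    if n_seen >= min_interactions:
        return list(personalized(user_id, top_n))
    return list(fallback(user_id, top_n))

# Hypothetical training matrix: user 1 has history, user 2 has none
R_train = pd.DataFrame([[1, 1, 0], [0, 0, 0]], index=[1, 2], columns=[10, 20, 30])

def toy_personalized(user_id, top_n=10):
    return [30][:top_n]

def toy_popular(user_id, top_n=10):
    # Ignores user_id, so it also works for users not in R_train
    return [10, 20][:top_n]

for uid in (1, 2, 99):
    print(uid, recommend_with_fallback(uid, R_train, toy_personalized, toy_popular))
```

The `popularity_recommend` function defined earlier looks the user up in `R_train`, so it would need a small change before it could serve as the fallback for users who are missing from the matrix.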

## Conclusion
Based on the evaluation results, the Item-Based Collaborative Filtering model performs better than the popularity-based baseline (Precision@10 of 0.1416 and Recall@10 of 0.0783), particularly on Precision@10 and Recall@10. This suggests that the model gives more personally relevant recommendations to the active user.