# **Merchant Recommendation System**

This project develops a merchant recommender system that generates personalized merchant suggestions for users.

The dataset used in this project is “Digital Wallet Transaction”. This dataset simulates transactions from a digital wallet platform similar to popular services like PayTm in India or Khalti in Nepal. It contains 5000 synthetic records of various financial transactions across multiple categories, providing a rich source for analysis of digital payment behaviors and trends.

#### Load and Preprocess the Data

In [None]:
import pandas as pd
import numpy as np

# load dataset
df = pd.read_csv("digital_wallet_transactions.csv")

# keep only transactions with a successful status
df = df[df['transaction_status'] == "Successful"]
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 16 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   idx                 5000 non-null   int64  
 1   transaction_id      5000 non-null   object 
 2   user_id             5000 non-null   object 
 3   transaction_date    5000 non-null   object 
 4   product_category    5000 non-null   object 
 5   product_name        5000 non-null   object 
 6   merchant_name       5000 non-null   object 
 7   product_amount      5000 non-null   float64
 8   transaction_fee     5000 non-null   float64
 9   cashback            5000 non-null   float64
 10  loyalty_points      5000 non-null   int64  
 11  payment_method      5000 non-null   object 
 12  transaction_status  5000 non-null   object 
 13  merchant_id         5000 non-null   object 
 14  device_type         5000 non-null   object 
 15  location            5000 non-null   object 
dtypes: flo

| | idx | transaction_id | user_id | transaction_date | product_category | product_name | merchant_name | product_amount | transaction_fee | cashback | loyalty_points | payment_method | transaction_status | merchant_id | device_type | location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 4dac3ea3-6492-46ec-80b8-dc45c3ad0b14 | USER_05159 | 2023-08-19 03:32 | Rent Payment | 2BHK Flat Deposit | Airbnb | 1525.39 | 36.69 | 19.19 | 186 | Debit Card | Successful | MERCH_0083 | iOS | Urban |
| 1 | 2 | a903ed9f-eb84-47e7-b8aa-fd1786c919cf | USER_07204 | 2023-08-19 04:37 | Gas Bill | Commercial Gas Connection | Adani Gas | 1495.4 | 28.19 | 89.99 | 182 | UPI | Successful | MERCH_0163 | iOS | Urban |
| 2 | 3 | 2a393013-733c-4add-9f09-bed1eeb33676 | USER_00903 | 2023-08-19 05:52 | Bus Ticket | Semi-Sleeper | MakeMyTrip Bus | 1267.71 | 11.36 | 95.7 | 994 | UPI | Successful | MERCH_0320 | iOS | Urban |
| 3 | 4 | 9a07ad19-4673-4794-9cd2-9b139f39c715 | USER_01769 | 2023-08-19 06:35 | Internet Bill | 4G Unlimited Plan | Airtel Broadband | 9202.63 | 6.41 | 82.24 | 409 | Debit Card | Successful | MERCH_0194 | Android | Urban |
| 4 | 5 | 76418260-c985-4011-979d-0914604d0d68 | USER_03544 | 2023-08-19 06:36 | Loan Repayment | Home Loan EMI | Axis Bank | 3100.58 | 41.15 | 40.47 | 837 | Debit Card | Successful | MERCH_0504 | Android | Urban |


**Interpretation:**

Filtering on `transaction_status == "Successful"` restricts the dataset to completed transactions, so that subsequent analysis (spending behavior, merchant performance, cashback distribution) is based only on valid, finalized payments. The row count drops from 5,000 to 4,755, meaning 245 transactions (about 4.9%) did not succeed.
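The size of the reduction can be checked by comparing row counts before and after the filter. The following is a minimal sketch on a toy frame: the five rows and the `Failed` and `Pending` labels are made up for illustration, and only the `transaction_status` column name and the `Successful` value come from the dataset.

```python
import pandas as pd

# Toy stand-in for the transactions table (values are illustrative only).
toy = pd.DataFrame({
    "transaction_id": ["t1", "t2", "t3", "t4", "t5"],
    "transaction_status": ["Successful", "Failed", "Successful", "Pending", "Successful"],
})

rows_before = len(toy)

# Boolean filtering keeps the original index labels, so reset them
# to get a contiguous 0..n-1 index on the filtered frame.
successful = toy[toy["transaction_status"] == "Successful"].reset_index(drop=True)
rows_after = len(successful)

dropped = rows_before - rows_after
print(f"kept {rows_after} of {rows_before} rows ({dropped / rows_before:.1%} dropped)")
```

Printing the two counts right after the filter makes it easy to confirm that the reported reduction matches the data actually loaded.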

### **Collaborative Filtering: Matrix Factorization With SVD**


By leveraging transaction histories, the system applies **collaborative filtering** with **matrix factorization (SVD)** to uncover hidden patterns in user–merchant interactions. The goal is to predict user preferences and recommend new merchants that align with individual spending behavior, ultimately enhancing customer engagement and improving the overall digital wallet experience.

In [None]:
# aggregate total spend per (user, merchant) pair
user_item_matrix = df.groupby(['user_id', 'merchant_id'])['product_amount'].sum().reset_index()

# min-max normalize total spend into a 0-1 rating
user_item_matrix['rating'] = (user_item_matrix['product_amount'] - user_item_matrix['product_amount'].min()) / \
                             (user_item_matrix['product_amount'].max() - user_item_matrix['product_amount'].min())

user_item_matrix.head()

| | user_id | merchant_id | product_amount | rating |
|---|---|---|---|---|
| 0 | USER_00001 | MERCH_0917 | 5810.65 | 0.580819 |
| 1 | USER_00002 | MERCH_0909 | 1785.23 | 0.177748 |
| 2 | USER_00019 | MERCH_0081 | 1535.44 | 0.152736 |
| 3 | USER_00019 | MERCH_0299 | 3883.54 | 0.387855 |
| 4 | USER_00020 | MERCH_0797 | 3619.0 | 0.361366 |


**Interpretation:**

The process converts raw transaction data into a structured format where user–merchant interactions are represented as ratings. These normalized values can then be used to train recommender algorithms, enabling merchant recommendations.
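To make the normalization concrete, here is a small worked sketch of min-max scaling on three made-up amounts. The helper name `min_max_rating` and the guard for identical amounts are assumptions added for illustration and are not part of the notebook.

```python
import pandas as pd

def min_max_rating(amounts: pd.Series) -> pd.Series:
    """Rescale amounts linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = amounts.min(), amounts.max()
    if hi == lo:
        # All amounts are equal: the formula would divide by zero,
        # so return 0.0 for every entry instead.
        return pd.Series(0.0, index=amounts.index)
    return (amounts - lo) / (hi - lo)

toy_amounts = pd.Series([100.0, 550.0, 1000.0])
toy_ratings = min_max_rating(toy_amounts)
print(toy_ratings.tolist())  # [0.0, 0.5, 1.0]
```

Because the minimum and maximum are taken over all user-merchant pairs, a single very large payment compresses every other rating toward 0. The resulting value measures relative spend, which the notebook uses as a stand-in for preference.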

In [None]:
# pin NumPy to a 1.x release before installing Surprise
!pip uninstall -y numpy
!pip install numpy==1.26.4
!pip install scikit-surprise

In [None]:
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split
from surprise import accuracy

# prepare data for surprise
reader = Reader(rating_scale=(0, 1))
data = Dataset.load_from_df(user_item_matrix[['user_id', 'merchant_id', 'rating']], reader)

# split data
trainset, testset = train_test_split(data, test_size=0.2)

# model SVD (matrix factorization)
algo = SVD()
algo.fit(trainset)

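# evaluate on the held-out 20%
predictions = algo.test(testset)

# accuracy.rmse prints the score itself by default, so silence it and print once
rmse = accuracy.rmse(predictions, verbose=False)
print("RMSE:", rmse)

RMSE: 0.2950097006475385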


**Conclusion:**

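On the held-out 20% of user-merchant pairs, the SVD model reaches an RMSE of about 0.295 on ratings normalized to the 0-1 range. Because `train_test_split` is called without a `random_state`, the exact value changes slightly from run to run. An error of this size is large relative to a 0-1 scale, so it should be compared against a simple baseline (for example, always predicting the mean rating) before the model is considered accurate. With that caveat, the pipeline provides a working foundation to build on.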
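One way to put an RMSE value in context is to compare it with a constant-prediction baseline. The sketch below computes RMSE by hand for a model and for a baseline that always predicts the mean. All numbers are made up for illustration and are unrelated to the notebook's results.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Illustrative held-out ratings and model predictions (made-up numbers).
actual = [0.2, 0.4, 0.6, 0.8]
model_pred = [0.3, 0.4, 0.5, 0.9]

# Baseline: predict one constant for every pair. Here it is the mean of the
# toy ratings; in a real evaluation it would come from the training set.
baseline_pred = [float(np.mean(actual))] * len(actual)

model_rmse = rmse(actual, model_pred)
baseline_rmse = rmse(actual, baseline_pred)
print("model RMSE:   ", round(model_rmse, 4))
print("baseline RMSE:", round(baseline_rmse, 4))
```

If the model's RMSE is not clearly below the baseline's, the learned factors add little beyond the average rating.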

In [None]:
# recommendation function
def recommend_merchants(user_id, n=5):
    # all unique merchants
    all_merchants = df['merchant_id'].unique()

    # merchants the user has already used
    used_merchants = set(user_item_matrix.loc[user_item_matrix['user_id'] == user_id, 'merchant_id'])

    # candidate merchants the user has not used yet
    candidates = [m for m in all_merchants if m not in used_merchants]

    # predicted rating for each candidate
    predictions = [(m, algo.predict(user_id, m).est) for m in candidates]

    # sort by predicted rating, highest first
    recommendations = sorted(predictions, key=lambda x: x[1], reverse=True)[:n]
    return recommendations

# example recommendation for a sample user
print(recommend_merchants("user_12345", n=5))

[('MERCH_0390', 0.6435038545082995), ('MERCH_0791', 0.6274529116698829), ('MERCH_0759', 0.6136976884700329), ('MERCH_0346', 0.6129566316077666), ('MERCH_0599', 0.6054455012053448)]


**Recommendation Result for user_12345**:

The recommender system suggests the following top 5 merchants that the user has not interacted with before, ranked by predicted preference:

*   MERCH_0390 → predicted rating 0.6435
*   MERCH_0791 → predicted rating 0.6275
*   MERCH_0759 → predicted rating 0.6137
*   MERCH_0346 → predicted rating 0.6130
*   MERCH_0599 → predicted rating 0.6054

**Interpretation:**

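*   The predicted ratings are on a scale from 0 to 1, since transaction amounts were normalized.
*   A higher score means the model predicts a higher normalized spend at that merchant, which is used here as a proxy for how likely the user is to engage with it.
*   For user_12345, MERCH_0390 is expected to be the most relevant recommendation, followed closely by MERCH_0791 and MERCH_0759.
*   The ID `user_12345` does not follow the `USER_XXXXX` format of the IDs in the dataset, so it is most likely unknown to the model. For an unknown user, Surprise's SVD falls back to the global mean plus the merchant's bias, so these scores are not personalized.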
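Surprise's SVD estimates a rating as the global mean plus a user bias, an item bias, and the dot product of the user and item factor vectors. The terms that belong to an unknown user or item are treated as zero. The sketch below reproduces that formula with hypothetical parameter values. The function `svd_estimate` and all numbers are illustrative assumptions, and the clipping to the rating scale that Surprise applies afterwards is left out.

```python
import numpy as np

def svd_estimate(global_mean, user_bias=None, item_bias=None,
                 user_factors=None, item_factors=None):
    """Biased matrix-factorization estimate: mu + b_u + b_i + q_i . p_u.

    Terms that belong to an unknown user or item are passed as None
    and contribute nothing to the estimate.
    """
    est = global_mean
    if user_bias is not None:
        est += user_bias
    if item_bias is not None:
        est += item_bias
    if user_factors is not None and item_factors is not None:
        est += float(np.dot(item_factors, user_factors))
    return est

# Hypothetical learned parameters for one merchant and one known user.
mu = 0.50
b_i = 0.05
q_i = np.array([0.2, -0.1])
b_u = -0.02
p_u = np.array([0.5, 0.4])

# Known user: every term contributes.
known = svd_estimate(mu, user_bias=b_u, item_bias=b_i,
                     user_factors=p_u, item_factors=q_i)

# Unknown user: only the global mean and the merchant bias remain.
unknown = svd_estimate(mu, item_bias=b_i, item_factors=q_i)

print(round(known, 4), round(unknown, 4))
```

For an unknown user the ranking depends only on merchant biases, so every unknown user receives the same list. Checking that the ID exists in the training data before calling the recommender avoids presenting such a list as personalized.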

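### **Content Based Recommender System**

This section recommends merchants using cosine similarity. Note that the similarity is computed from the user-merchant interaction matrix rather than from merchant attributes such as `product_category`, so the approach is, strictly speaking, item-based collaborative filtering.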
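As a small illustration of the similarity measure, the sketch below computes cosine similarity by hand for three toy merchant vectors. Each vector holds one merchant's scaled spend across the same four users. The values and the helper `cosine_sim` are made up for illustration.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / norm) if norm > 0 else 0.0

# Columns of a toy user-merchant matrix (one entry per user).
merchant_a = [1.0, 0.0, 1.0, 0.0]
merchant_b = [1.0, 0.0, 0.0, 0.0]
merchant_c = [0.0, 1.0, 0.0, 1.0]

sim_ab = cosine_sim(merchant_a, merchant_b)  # one user in common
sim_ac = cosine_sim(merchant_a, merchant_c)  # no users in common
print(round(sim_ab, 4), round(sim_ac, 4))
```

Two merchants score above zero only when at least one user has spent at both. In a sparse matrix where most users have only a few transactions, most merchant pairs therefore end up with a similarity of zero.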

In [None]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler

# load data
df = pd.read_csv("digital_wallet_transactions.csv")

# keep successful transactions only
df = df[df['transaction_status'] == "Successful"]

# build the user-merchant matrix (values: total amount spent)
user_merchant = df.groupby(['user_id', 'merchant_id'])['product_amount'].sum().unstack().fillna(0)

# scale each merchant column to the 0-1 range
scaler = MinMaxScaler()
user_merchant_scaled = scaler.fit_transform(user_merchant)

# cosine similarity between merchants
merchant_similarity = cosine_similarity(user_merchant_scaled.T)  # transpose so that rows are merchants
merchant_similarity_df = pd.DataFrame(merchant_similarity,
                                      index=user_merchant.columns,
                                      columns=user_merchant.columns)


In [None]:
def recommend_similar_merchants(merchant_id, n=5):
    if merchant_id not in merchant_similarity_df.index:
        return f"Merchant {merchant_id} is not in the data"

    # get similarity scores against every merchant
    sim_scores = merchant_similarity_df[merchant_id].sort_values(ascending=False)

    # drop the merchant itself
    sim_scores = sim_scores.drop(merchant_id)

    # top-n recommendation
    return sim_scores.head(n)

# example: merchants most similar to a given merchant_id
print(recommend_similar_merchants("MERCH_0083", n=5))


merchant_id
MERCH_0870    0.212613
MERCH_0336    0.111163
MERCH_0434    0.089629
MERCH_0448    0.011195
MERCH_0300    0.006121
Name: MERCH_0083, dtype: float64


**User-Based approach**

In [None]:
def recommend_merchants_for_user(user_id, n=5):
    if user_id not in user_merchant.index:
        return f"User {user_id} is not in the data"

    # merchants the user has already used
    used_merchants = user_merchant.loc[user_id]
    used_merchants = used_merchants[used_merchants > 0].index.tolist()

    # accumulate recommendation scores from merchant similarity
    rec_scores = {}
    for m in used_merchants:
        similar_merchants = merchant_similarity_df[m].drop(m)
        for merchant, score in similar_merchants.items():
            if merchant not in used_merchants:  # only new merchants
                rec_scores[merchant] = rec_scores.get(merchant, 0) + score

    # rank by score
    ranked = sorted(rec_scores.items(), key=lambda x: x[1], reverse=True)
    return ranked[:n]

# example: merchant recommendations for a given user
print(recommend_merchants_for_user("USER_07771", n=5))


[('MERCH_0930', 0.2838250160225582), ('MERCH_0114', 0.2635966203667275), ('MERCH_0207', 0.24900870918633022), ('MERCH_0471', 0.21049762267185526), ('MERCH_0561', 0.1931274488170685)]


In [None]:
pivot_table = user_item_matrix.pivot(index='user_id', columns='merchant_id', values='rating').fillna(0)

item_similarity = cosine_similarity(pivot_table.T)
item_similarity_df = pd.DataFrame(item_similarity,
                                  index=pivot_table.columns,
                                  columns=pivot_table.columns)

def recommend_merchants_for_user(user_id, n=5):
    if user_id not in pivot_table.index:
        return f"User {user_id} not found."

    # the user's ratings as a row vector
    user_ratings = pivot_table.loc[user_id].values.reshape(1, -1)

    # recommendation score = user ratings times item-item similarity
    scores = np.dot(user_ratings, item_similarity_df.values)

    # convert to a Series indexed by merchant
    scores_series = pd.Series(scores.flatten(), index=item_similarity_df.columns)

    # drop merchants the user has already used
    purchased = pivot_table.loc[user_id][pivot_table.loc[user_id] > 0].index
    scores_series = scores_series.drop(purchased, errors='ignore')

    # take the top-N merchants
    return scores_series.sort_values(ascending=False).head(n)

# example recommendation
print("Recommendations for USER_07771:")
print(recommend_merchants_for_user("USER_07771", n=5))

Recommendations for USER_07771:
merchant_id
MERCH_0207    0.225570
MERCH_0930    0.174682
MERCH_0561    0.118861
MERCH_0114    0.117930
MERCH_0471    0.094196
dtype: float64
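The scoring step used in this cell (the user's rating vector multiplied by the item-item similarity matrix) can be traced by hand on a small example. The sketch below uses a made-up 3x3 ratings table. The names `ratings`, `item_sim`, and `score_new_merchants` are illustrative and not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Toy ratings: rows are users, columns are merchants (0 means no interaction).
ratings = pd.DataFrame(
    [[1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]],
    index=["u1", "u2", "u3"],
    columns=["m1", "m2", "m3"],
)

# Item-item cosine similarity computed from the merchant columns.
cols = ratings.to_numpy().T
norms = np.linalg.norm(cols, axis=1, keepdims=True)
item_sim = pd.DataFrame((cols @ cols.T) / (norms @ norms.T),
                        index=ratings.columns, columns=ratings.columns)

def score_new_merchants(user_id):
    """Score = user's rating vector times the item-item similarity matrix."""
    user_vec = ratings.loc[user_id]
    scores = pd.Series(user_vec.to_numpy() @ item_sim.to_numpy(),
                       index=item_sim.columns)
    seen = user_vec[user_vec > 0].index
    # Remove merchants the user already used, then rank the rest.
    return scores.drop(seen).sort_values(ascending=False)

top = score_new_merchants("u1")
print(top)
```

Here `u1` has only used `m1`, so each candidate's score is its similarity to `m1`. `m2` shares a user with `m1` and ranks first, while `m3` scores zero but still appears in the list. Filtering out zero scores before taking the top N may be worth considering.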


In [None]:
# cosine similarity between users
user_similarity = cosine_similarity(user_merchant_scaled)
user_similarity_df = pd.DataFrame(user_similarity,
                                  index=user_merchant.index,
                                  columns=user_merchant.index)

def recommend_merchants_user_based(user_id, n=5):
    if user_id not in user_similarity_df.index:
        return f"User {user_id} is not in the data"

    # get this user's similarity to every user
    sim_users = user_similarity_df[user_id].sort_values(ascending=False)

    # drop the user itself
    sim_users = sim_users.drop(user_id)

    # the most similar users
    top_users = sim_users.head(5).index

    # collect merchants used by the top users
    rec_scores = {}
    for u in top_users:
        merchants = user_merchant.loc[u]
        for m, val in merchants.items():
            if val > 0 and user_merchant.loc[user_id, m] == 0:  # only new merchants
                rec_scores[m] = rec_scores.get(m, 0) + val

    # rank by score (total amount spent by similar users)
    ranked = sorted(rec_scores.items(), key=lambda x: x[1], reverse=True)
    return ranked[:n]

# example recommendation for a given user
print(recommend_merchants_user_based("USER_07771", n=5))


[]


In [None]:
pivot_table = user_item_matrix.pivot(index='user_id', columns='merchant_id', values='rating').fillna(0)

user_similarity = cosine_similarity(pivot_table)
user_similarity_df = pd.DataFrame(user_similarity,
                                  index=pivot_table.index,
                                  columns=pivot_table.index)

# ------------------------------
# 4. User-Based CF recommendation function
# ------------------------------
def recommend_merchants_usercf(user_id, n=5):
    if user_id not in pivot_table.index:
        return f"User {user_id} not found."

    # similarity of user_id to all other users
    sim_users = user_similarity_df[user_id].drop(user_id).sort_values(ascending=False)

    # the most similar users
    top_users = sim_users.head(5).index

    # merchants the target user has already used
    purchased = set(pivot_table.loc[user_id][pivot_table.loc[user_id] > 0].index)

    # collect merchants from similar users, weighted by similarity
    scores = {}
    for u in top_users:
        weight = sim_users[u]
        for merchant, rating in pivot_table.loc[u].items():
            if rating > 0 and merchant not in purchased:
                scores[merchant] = scores.get(merchant, 0) + weight * rating

    # sort the results
    sorted_scores = sorted(scores.items(), key=lambda x: x[1], reverse=True)

    return sorted_scores[:n]

# ------------------------------
# 5. Example call
# ------------------------------
print("User-Based CF recommendations for USER_09328:")
print(recommend_merchants_usercf("USER_09328", n=5))

User-Based CF recommendations for USER_09328:
[('MERCH_0784', 0.07245694654628461)]


In [None]:
# -----------------------------
# 1. Top-N users by transaction FREQUENCY
# -----------------------------
top_users_freq = (
    df.groupby("user_id")["transaction_id"]
      .count()
      .reset_index(name="total_transactions")
      .sort_values(by="total_transactions", ascending=False)
)

print("Top 10 users by number of transactions:")
print(top_users_freq.head(10))

# -----------------------------
# 2. Top-N users by TOTAL transaction AMOUNT
# -----------------------------
top_users_amount = (
    df.groupby("user_id")["product_amount"]
      .sum()
      .reset_index(name="total_amount")
      .sort_values(by="total_amount", ascending=False)
)

print("\nTop 10 users by total transaction amount:")
print(top_users_amount.head(10))

Top 10 users by number of transactions:
         user_id  total_transactions
3508  USER_09328                   4
3235  USER_08584                   4
2256  USER_05939                   4
2506  USER_06591                   4
1632  USER_04319                   4
2208  USER_05836                   4
1518  USER_03945                   4
360   USER_00930                   4
2707  USER_07098                   4
2007  USER_05280                   4

Top 10 users by total transaction amount:
         user_id  total_amount
3366  USER_08946      28799.29
2208  USER_05836      25189.19
1518  USER_03945      25063.95
3235  USER_08584      23842.59
1496  USER_03888      23800.76
368   USER_00949      23748.91
482   USER_01223      23524.42
2007  USER_05280      23482.35
1791  USER_04724      23471.34
3141  USER_08295      22091.62
