<a href="https://colab.research.google.com/github/kfirs127/Improving-Sequential-Recommendation-with-Hybrid-Attention-Weighting-Leveraging-LinRec-and-FMLP-Rec/blob/main/FMLP_paper_reproduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Reproduce FMLP-Rec

In [None]:
!git clone https://github.com/RUCAIBox/FMLP-Rec.git

Cloning into 'FMLP-Rec'...
remote: Enumerating objects: 20, done.
remote: Counting objects: 100% (20/20), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 20 (delta 0), reused 20 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (20/20), 8.41 MiB | 11.16 MiB/s, done.


We run FMLP-Rec on the Amazon Beauty dataset, which ships already preprocessed in the authors' repository. The run uses the repository's default settings:

* Number of hidden layers (learnable filter blocks): 2
* Hidden layer size: 64
* Batch size: 256
* Optimizer: Adam
* Learning rate: 0.001
* Early stopping: training stops once MRR has failed to improve for 10 consecutive evaluations (`patience=10`)
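
The early-stopping rule in the list above can be sketched as a simple patience counter. This is a minimal illustration rather than the repository's own code: the class name `PatienceStopper`, its `step` method, and the MRR values are all hypothetical.

```python
class PatienceStopper:
    """Signal a stop when the score has not improved for `patience` consecutive checks."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_score = None
        self.bad_epochs = 0

    def step(self, score):
        """Record one evaluation score; return True if training should stop."""
        if self.best_score is None or score > self.best_score:
            # Strict improvement resets the counter
            self.best_score = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up MRR values, with a short patience so the effect is visible
stopper = PatienceStopper(patience=3)
mrr_per_epoch = [0.20, 0.25, 0.24, 0.25, 0.23, 0.26]
stopped_at = None
for epoch, mrr in enumerate(mrr_per_epoch):
    if stopper.step(mrr):
        stopped_at = epoch
        break
```

Here a tie with the best score counts as "no improvement", so the loop stops at epoch 4 and never sees the later 0.26. A longer patience trades extra training time for robustness against such temporary dips.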

In [None]:
!cd /content/FMLP-Rec/ && python main.py --data_name='Beauty'

Namespace(data_dir='./data/', output_dir='output/', data_name='Beauty', do_eval=False, load_model=None, model_name='FMLPRec', hidden_size=64, num_hidden_layers=2, num_attention_heads=2, hidden_act='gelu', attention_probs_dropout_prob=0.5, hidden_dropout_prob=0.5, initializer_range=0.02, max_seq_length=50, no_filters=False, lr=0.001, batch_size=256, epochs=200, no_cuda=False, log_freq=1, full_sort=False, patience=10, seed=42, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, gpu_id='0', variance=5, cuda_condition=True, data_file='./data/Beauty.txt', sample_file='./data/Beauty_sample.txt', item_size=12102, log_file='output/FMLPRec-Beauty-Feb-20-2025_07-59-31.txt')
Total Parameters: 851200
Recommendation EP_train:0: 100% 587/587 [00:16<00:00, 35.64it/s]
{'epoch': 0, 'rec_loss': '1.2398'}
Recommendation EP_test:0: 100% 88/88 [00:02<00:00, 39.30it/s]
{'Epoch': 0, 'HIT@1': '0.0850', 'NDCG@1': '0.0850', 'HIT@5': '0.2497', 'NDCG@5': '0.1682', 'HIT@10': '0.3571', 'NDCG@10': '0.2029', 'MRR': '
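
The log reports HIT@k, NDCG@k and MRR. With `full_sort=False`, the held-out item of each user appears to be ranked against the sampled negatives from the sample file. The sketch below shows how such metrics follow from the 0-based rank of the held-out item, assuming the usual single-positive definitions. The function name `rank_metrics` and the example ranks are hypothetical, and the code is not taken from the repository.

```python
import math


def rank_metrics(ranks, ks=(1, 5, 10)):
    """Average HIT@k, NDCG@k and MRR over users, given 0-based ranks of the held-out item."""
    n = len(ranks)
    out = {}
    for k in ks:
        hits = [r for r in ranks if r < k]
        out[f"HIT@{k}"] = len(hits) / n
        # With a single relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 2)
        out[f"NDCG@{k}"] = sum(1.0 / math.log2(r + 2) for r in hits) / n
    out["MRR"] = sum(1.0 / (r + 1) for r in ranks) / n
    return out


# Four hypothetical users whose held-out items were ranked 1st, 4th, 13th and 2nd
metrics = rank_metrics([0, 3, 12, 1])
```

Under these definitions NDCG@1 always equals HIT@1, which matches the identical `HIT@1` and `NDCG@1` values in the log line above.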

Results reported in the paper (Amazon Beauty dataset):

* HR@1 (HIT@1): 0.2011
* HR@5: 0.4025
* NDCG@5: 0.3070
* HR@10: 0.4998
* NDCG@10: 0.3385
* MRR: 0.3051

Results from our reproduction:

* HR@1: 0.1987
* HR@5: 0.4019
* NDCG@5: 0.3055
* HR@10: 0.4956
* NDCG@10: 0.3358
* MRR: 0.3031
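
To quantify how close the reproduction is, the relative gap per metric can be computed directly from the two lists above. The snippet is a small sketch; only the numbers come from this notebook, and the variable names are illustrative.

```python
paper = {"HR@1": 0.2011, "HR@5": 0.4025, "NDCG@5": 0.3070,
         "HR@10": 0.4998, "NDCG@10": 0.3385, "MRR": 0.3051}
repro = {"HR@1": 0.1987, "HR@5": 0.4019, "NDCG@5": 0.3055,
         "HR@10": 0.4956, "NDCG@10": 0.3358, "MRR": 0.3031}

# Relative gap in percent; negative means the reproduction is below the paper
rel_gap = {m: (repro[m] - paper[m]) / paper[m] * 100 for m in paper}

for metric, gap in rel_gap.items():
    print(f"{metric:8s} paper={paper[metric]:.4f} repro={repro[metric]:.4f} gap={gap:+.2f}%")
```

Every reproduced metric is slightly below the paper's value, and the largest relative gap (HR@1) is about 1.2%.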


### Run FMLP-Rec on MovieLens Dataset

#### Preprocess ML-1M Dataset

In [None]:
!git clone https://github.com/RUCAIBox/FMLP-Rec.git

Cloning into 'FMLP-Rec'...
remote: Enumerating objects: 20, done.
remote: Counting objects: 100% (20/20), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 20 (delta 0), reused 20 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (20/20), 8.41 MiB | 15.06 MiB/s, done.


In [None]:
import pandas as pd
import numpy as np
import zipfile
import os
from collections import Counter
import csv
import shutil

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
data_file = '/content/drive/Shareddrives/RecSys/ml-1m.zip'
extract_path = 'ml-1m'
ratings_file = 'ml-1m/ratings.dat'

In [None]:
processed_file = "ML-1M.txt"
sample_file = "ML-1M_sample.txt"
destination_folder = "FMLP-Rec/data/"

In [None]:
def convert_zip_to_df(zip_path, extract_path, ratings_file):
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(extract_path)

    ml1m_path = os.path.join(extract_path, ratings_file)
    df = pd.read_csv(ml1m_path, sep="::", engine="python", names=["user_id", "item_id", "rating", "timestamp"])
    return df

In [None]:
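def filter_k_core(df, k=5):
    """Iteratively drop users and items with fewer than k interactions."""
    while True:
        user_counts = df["user_id"].value_counts()
        item_counts = df["item_id"].value_counts()

        # Item counts are taken before the user filter, so removing users can
        # leave some items below k; the next pass catches them
        df = df[df["user_id"].isin(user_counts[user_counts >= k].index)]
        df = df[df["item_id"].isin(item_counts[item_counts >= k].index)]

        new_user_counts = df["user_id"].value_counts()
        new_item_counts = df["item_id"].value_counts()

        # Stop once a full pass removes no user and no item
        if len(new_user_counts) == len(user_counts) and len(new_item_counts) == len(item_counts):
            break
    return df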
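
To see why the k-core filter has to loop, consider a toy interaction log in which removing a sparse item pushes a user below the threshold. The sketch below is a library-free stand-in for the pandas function above: `k_core` is a hypothetical name, the data is made up, and users and items are filtered in the same pass rather than one after the other.

```python
from collections import Counter


def k_core(interactions, k):
    """Repeatedly drop (user, item) pairs whose user or item has fewer than k interactions."""
    rounds = 0
    while True:
        rounds += 1
        user_counts = Counter(u for u, _ in interactions)
        item_counts = Counter(i for _, i in interactions)
        kept = [(u, i) for u, i in interactions
                if user_counts[u] >= k and item_counts[i] >= k]
        if len(kept) == len(interactions):
            # A full pass removed nothing, so the result is stable
            return kept, rounds
        interactions = kept


# Item 12 has a single interaction; dropping it leaves user 1 with only one
toy = [(1, 10), (1, 12), (2, 10), (2, 11), (3, 10), (3, 11)]
core, rounds = k_core(toy, k=2)
```

A single pass would remove only the pair `(1, 12)` and leave user 1 with one interaction. The second pass removes user 1, and the third confirms that nothing else changes.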

In [None]:
def run_preprocess(data_file, extract_path, ratings_file, min_rating=0):
    """Write the sequence file and the negative-sample file for FMLP-Rec.

    Output paths come from the notebook-level `processed_file` and `sample_file` variables.
    """
    data = convert_zip_to_df(data_file, extract_path, ratings_file)

    # Keep only interactions with rating > min_rating; copy so the column
    # assignments below do not write into a slice of the original frame
    data = data[data["rating"] > min_rating].copy()

    # Optional k-core filtering (each user and item keeps at least k interactions)
    # data = filter_k_core(data, k=5)

    # Re-index users and items to contiguous IDs starting at 1
    user_map = {old: new+1 for new, old in enumerate(data["user_id"].unique())}
    item_map = {old: new+1 for new, old in enumerate(data["item_id"].unique())}

    data["user_id"] = data["user_id"].map(user_map)
    data["item_id"] = data["item_id"].map(item_map)

    # Sort each user's interactions chronologically, then collect them per user
    data = data.sort_values(by=["user_id", "timestamp"])
    grouped = data.groupby("user_id")["item_id"].apply(list).to_dict()

    # One line per user: "<user_id> <item_1> <item_2> ..."
    grouped_data = data.groupby("user_id")["item_id"].apply(lambda x: " ".join(map(str, x))).reset_index()
    grouped_data["formatted"] = grouped_data["user_id"].astype(str) + " " + grouped_data["item_id"]
    grouped_data["formatted"].to_csv(processed_file, index=False, header=False)

    print(f"Processing complete. Saved as {processed_file}")

    # 99 negative samples per user required
    all_items = set(data["item_id"].unique())

    with open(sample_file, "w") as f:
        for user_id, pos_items in grouped.items():
            # Negatives are items the user never interacted with
            negative_samples = list(all_items - set(pos_items))
            sampled_negatives = np.random.choice(negative_samples, 99, replace=False) if len(negative_samples) >= 99 else negative_samples
            f.write(f"{user_id} " + " ".join(map(str, sampled_negatives)) + "\n")

    print(f"Sample file '{sample_file}' created successfully.")
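
The sampling above draws from NumPy's global random state without a fixed seed, so each run produces a different sample file. If reproducible negatives are wanted, one option is to pass an explicitly seeded generator. The sketch below uses only the standard library; `sample_negatives`, the seed value, and the toy item range are illustrative assumptions and not part of the notebook.

```python
import random


def sample_negatives(all_items, positive_items, n=99, rng=None):
    """Return up to n distinct items the user has not interacted with."""
    if rng is None:
        rng = random.Random()
    # Sort the candidates so the result depends only on the seed, not on set ordering
    candidates = sorted(set(all_items) - set(positive_items))
    if len(candidates) <= n:
        return candidates
    return rng.sample(candidates, n)


all_items = range(1, 201)   # hypothetical catalogue of 200 items
history = [3, 7, 42]        # hypothetical user history

negs_a = sample_negatives(all_items, history, rng=random.Random(42))
negs_b = sample_negatives(all_items, history, rng=random.Random(42))
```

Two calls with identically seeded generators return the same list, which makes evaluation runs comparable across sessions.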

In [None]:
run_preprocess(data_file, extract_path, ratings_file)

Processing complete. Saved as ML-1M.txt
Sample file 'ML-1M_sample.txt' created successfully.


In [None]:
os.makedirs(destination_folder, exist_ok=True)
shutil.move(processed_file, os.path.join(destination_folder, processed_file))
shutil.move(sample_file, os.path.join(destination_folder, sample_file))

print(f"Files moved to {destination_folder}")

Files moved to FMLP-Rec/data/


#### Run FMLP on ML-1M Dataset

In [None]:
!sed -i "s/^sequential_data_list = \['Beauty','Sports_and_Outdoors','Toys_and_Games','Yelp'\]/sequential_data_list = \['Beauty','Sports_and_Outdoors','Toys_and_Games','Yelp','ML-1M'\]/" /content/FMLP-Rec/utils.py
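
The `sed` command only takes effect if the list in `utils.py` is spelled exactly as in the pattern. The same edit could also be made from Python in a way that tolerates spacing differences and is safe to run twice. The sketch below works on a source string; `register_dataset` is a hypothetical helper, and only the variable name `sequential_data_list` and its contents are taken from the command above.

```python
import re


def register_dataset(source, name, list_name="sequential_data_list"):
    """Append `name` to the one-line list assigned to `list_name`, unless already present."""
    pattern = re.compile(rf"^({re.escape(list_name)}\s*=\s*\[)(.*?)(\])", re.MULTILINE)

    def add(match):
        entries = [e.strip() for e in match.group(2).split(",") if e.strip()]
        if repr(name) not in entries and f'"{name}"' not in entries:
            entries.append(repr(name))
        return match.group(1) + ",".join(entries) + match.group(3)

    # Only the first matching assignment is rewritten
    return pattern.sub(add, source, count=1)


original = "sequential_data_list = ['Beauty','Sports_and_Outdoors','Toys_and_Games','Yelp']\n"
patched = register_dataset(original, "ML-1M")
```

To apply it to the real file, one would read `utils.py` into a string, pass it through the helper, and write the result back.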

In [None]:
!cd /content/FMLP-Rec/ && python main.py --data_name='ML-1M'

Namespace(data_dir='./data/', output_dir='output/', data_name='ML-1M', do_eval=False, load_model=None, model_name='FMLPRec', hidden_size=64, num_hidden_layers=2, num_attention_heads=2, hidden_act='gelu', attention_probs_dropout_prob=0.5, hidden_dropout_prob=0.5, initializer_range=0.02, max_seq_length=50, no_filters=False, lr=0.001, batch_size=256, epochs=200, no_cuda=False, log_freq=1, full_sort=False, patience=10, seed=42, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, gpu_id='0', variance=5, cuda_condition=True, data_file='./data/ML-1M.txt', sample_file='./data/ML-1M_sample.txt', item_size=3707, log_file='output/FMLPRec-ML-1M-Feb-21-2025_12-56-17.txt')
Total Parameters: 313920
Recommendation EP_train:0: 100% 1047/1047 [00:24<00:00, 42.39it/s]
{'epoch': 0, 'rec_loss': '1.0011'}
Recommendation EP_test:0: 100% 24/24 [00:00<00:00, 56.40it/s]
{'Epoch': 0, 'HIT@1': '0.1916', 'NDCG@1': '0.1916', 'HIT@5': '0.4793', 'NDCG@5': '0.3406', 'HIT@10': '0.6445', 'NDCG@10': '0.3941', 'MRR': '0.3

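Results on the ML-1M dataset with 5-core filtering (`filter_k_core(data, k=5)` enabled, so every user and item has at least 5 interactions):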

* HR@1: 0.3430
* HR@5: 0.6897
* NDCG@5: 0.5292
* HR@10: 0.7997
* NDCG@10: 0.5650
* MRR: 0.4995


Results on the ML-1M dataset without k-core filtering (no minimum interaction count):

* HR@1: 0.3581
* HR@5: 0.6959
* NDCG@5: 0.5381
* HR@10: 0.8015
* NDCG@10: 0.5724
* MRR: 0.5089
