<i>Copyright (c) Recommenders contributors.</i>

<i>Licensed under the MIT License.</i>

# SASRec 

Self-Attentive Sequential Recommendation (SASRec) [1] is a sequential recommendation model that uses Transformer-style self-attention [2] to capture sequential patterns in user-item interactions. Given a user's chronologically ordered interaction history, it predicts the next item the user is likely to interact with.
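One key ingredient is causal (masked) self-attention: each position in a user's item sequence attends only to itself and to earlier positions, so the representation used to predict the next item never sees future items. The sketch below is a minimal single-head illustration in plain Python under simplifying assumptions: the input vectors stand in for item embeddings and are used directly as queries, keys and values. The learned projections, positional embeddings, feed-forward layers and stacked blocks that SASRec [1] adds are omitted, and the name `causal_self_attention` is hypothetical.

```python
import math


def causal_self_attention(x):
    """Single-head scaled dot-product self-attention with a causal mask.

    x is a list of T vectors (lists of floats), one per position of a user's
    item sequence. Simplifying assumption: the inputs are used directly as
    queries, keys and values (no learned projection matrices).
    """
    d = len(x[0])
    outputs = []
    for t, query in enumerate(x):
        # Causal mask: position t only scores positions 0..t, never later ones.
        scores = [
            sum(q * k for q, k in zip(query, x[j])) / math.sqrt(d)
            for j in range(t + 1)
        ]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        exp_scores = [math.exp(s - m) for s in scores]
        total = sum(exp_scores)
        weights = [e / total for e in exp_scores]
        # Output at t is the attention-weighted sum of the visible value vectors.
        outputs.append(
            [sum(w * x[j][k] for j, w in enumerate(weights)) for k in range(d)]
        )
    return outputs


# Toy 2-d "item embeddings" for a 3-item sequence.
seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(seq)
print(out[0])  # [1.0, 0.0]: the first position can only attend to itself
```

Changing the last vector in `seq` leaves the first two outputs unchanged, which is exactly what the mask guarantees. In SASRec the output at the last position is then roughly compared with candidate item embeddings (via a dot product) to rank the next item.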

In [1]:
import sys
import torch
import pandas as pd

from recommenders.datasets import movielens
from recommenders.models.unirec.data.dataset.movielens_utils import merge_category
from recommenders.models.unirec.model.sequential.sasrec import SASRec


print(f"System version: {sys.version}")
print(f"Pandas version: {pd.__version__}")
print(f"PyTorch version: {torch.__version__}")

System version: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0]
Pandas version: 2.2.2
PyTorch version: 2.3.1+cu121


In [None]:
# Number of top-ranked items to recommend
TOP_K = 10

# Select MovieLens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = "100k"

# Column names used throughout the notebook
USER_COL = "userId"
ITEM_COL = "movieId"
RATING_COL = "rating"
TIMESTAMP_COL = "timestamp"
GENRE_COL = "genre"
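`TOP_K` controls how many items end up in each user's recommendation list. A minimal sketch of how such a setting is typically used, assuming model scores are already available as a plain `{item_id: score}` dict: keep the `TOP_K` highest-scoring items (optionally excluding items the user has already seen) and check whether a held-out item is among them. The helpers `top_k_items` and `hit_at_k` are hypothetical and are not part of the Recommenders or UniRec API.

```python
import heapq

TOP_K = 10  # mirrors the configuration cell


def top_k_items(scores, k=TOP_K, exclude=()):
    """Return the k item IDs with the highest scores, best first.

    scores: {item_id: score}; items in `exclude` (e.g. items the user has
    already interacted with) are skipped. Hypothetical helper.
    """
    candidates = ((item, s) for item, s in scores.items() if item not in exclude)
    return [item for item, _ in heapq.nlargest(k, candidates, key=lambda p: p[1])]


def hit_at_k(recommended, target):
    """1.0 if the held-out target item is in the top-k list, else 0.0."""
    return 1.0 if target in recommended else 0.0


# Toy scores, made up for illustration.
scores = {242: 0.9, 302: 0.1, 377: 0.5, 51: 0.7}
recs = top_k_items(scores, k=2, exclude={242})
print(recs)  # [51, 377]
print(hit_at_k(recs, 377))  # 1.0
```

Averaging `hit_at_k` over all test users gives the hit rate at K; `heapq.nlargest` avoids sorting the full catalogue when only the top K items are needed.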


In [2]:
df = movielens.load_pandas_df(
    size=MOVIELENS_DATA_SIZE,
    header=[USER_COL, ITEM_COL, RATING_COL, TIMESTAMP_COL],
    genres_col=GENRE_COL,
    local_cache_path=".",
)
df.head(5)

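|   | userId | movieId | rating | timestamp | genre |
|---|--------|---------|--------|-----------|-------|
| 0 | 196 | 242 | 3.0 | 881250949 | Comedy |
| 1 | 186 | 302 | 3.0 | 891717742 | Crime\|Film-Noir\|Mystery\|Thriller |
| 2 | 22 | 377 | 1.0 | 878887116 | Children's\|Comedy |
| 3 | 244 | 51 | 2.0 | 880606923 | Drama\|Romance\|War\|Western |
| 4 | 166 | 346 | 1.0 | 886397596 | Crime\|Drama |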
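SASRec consumes each user's interactions as a time-ordered item sequence, and the loaded frame already has the columns needed to build one (`userId`, `movieId`, `timestamp`). The sketch below works on plain `(user, item, timestamp)` tuples rather than the DataFrame, and all helper names are hypothetical. It groups and sorts interactions per user, turns a sequence into (history, next item) training pairs truncated to a maximum length, and applies the leave-one-out split described in [1] (most recent item for testing, second most recent for validation, the rest for training). Breaking timestamp ties by item ID and treating users with fewer than three interactions as training-only are assumptions of this sketch.

```python
from collections import defaultdict


def build_sequences(rows):
    """Group (user, item, timestamp) rows into per-user item lists ordered by time.

    Ties on timestamp are broken by item ID (an assumption of this sketch).
    """
    events = defaultdict(list)
    for user, item, ts in rows:
        events[user].append((ts, item))
    return {user: [item for _, item in sorted(evs)] for user, evs in events.items()}


def next_item_pairs(seq, max_len=50):
    """(history, next_item) pairs; each history keeps at most max_len recent items.

    max_len=50 is an arbitrary default chosen for this sketch.
    """
    return [(seq[max(0, t - max_len):t], seq[t]) for t in range(1, len(seq))]


def leave_one_out(seq):
    """Split one sequence into (train_items, valid_target, test_target).

    Users with fewer than 3 interactions keep everything for training
    (an assumption; their targets are None).
    """
    if len(seq) < 3:
        return seq, None, None
    return seq[:-2], seq[-2], seq[-1]


# Toy rows (user, item, timestamp), made up for illustration.
rows = [("u1", 20, 300), ("u1", 10, 100), ("u2", 7, 50), ("u1", 30, 200), ("u1", 40, 400)]
seqs = build_sequences(rows)
print(seqs)  # {'u1': [10, 30, 20, 40], 'u2': [7]}
print(next_item_pairs(seqs["u1"], max_len=2))  # [([10], 30), ([10, 30], 20), ([30, 20], 40)]
print(leave_one_out(seqs["u1"]))  # ([10, 30], 20, 40)
```

The explicit pairs are only for readability; a SASRec-style model processes the whole padded sequence at once and predicts the next item at every position in a single pass, which amounts to the same set of training targets.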


In [3]:
cate_df = df[[ITEM_COL, GENRE_COL]].drop_duplicates()
print(cate_df.shape)
cate_df.head(5)

(1682, 2)


|   | movieId | genre |
|---|---------|-------|
| 0 | 242 | Comedy |
| 1 | 302 | Crime\|Film-Noir\|Mystery\|Thriller |
| 2 | 377 | Children's\|Comedy |
| 3 | 51 | Drama\|Romance\|War\|Western |
| 4 | 346 | Crime\|Drama |


In [4]:
# Extract all unique genres from the data
all_genres = set(genre for genre_string in cate_df[GENRE_COL] for genre in genre_string.split("|"))

# Create a mapping from genre to ID (1-based index)
genre_to_id = {genre: idx + 1 for idx, genre in enumerate(all_genres)}

# Map genres to IDs using the dynamic mapping
cate_df["cateId"] = cate_df[GENRE_COL].apply(
    lambda x: [genre_to_id[genre] for genre in x.split("|")]
)

print("Genre to ID Mapping:", genre_to_id)
print("Number of unique genres:", len(all_genres))
cate_df.drop(columns=[GENRE_COL], inplace=True)
cate_df.head(5)

Genre to ID Mapping: {'Mystery': 1, 'Western': 2, "Children's": 3, 'Drama': 4, 'Film-Noir': 5, 'Adventure': 6, 'Thriller': 7, 'Musical': 8, 'Animation': 9, 'Comedy': 10, 'Sci-Fi': 11, 'Crime': 12, 'Romance': 13, 'Fantasy': 14, 'Action': 15, 'unknown': 16, 'Horror': 17, 'Documentary': 18, 'War': 19}
Number of unique genres: 19


|   | movieId | cateId |
|---|---------|--------|
| 0 | 242 | [10] |
| 1 | 302 | [12, 5, 1, 7] |
| 2 | 377 | [3, 10] |
| 3 | 51 | [4, 13, 19, 2] |
| 4 | 346 | [12, 4] |
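Because `all_genres` is a `set`, the order in which `enumerate` visits the genres depends on string hashing, which Python randomizes per process unless `PYTHONHASHSEED` is fixed; the genre IDs printed above can therefore change from one run to the next. A minimal variant that sorts the genres first yields the same IDs every time. The sketch also shows a multi-hot vector, one common way to turn an item's multi-genre `cateId` list into a fixed-length input; this notebook does not show whether UniRec encodes `cateId` this way, and both helper names are hypothetical.

```python
def build_genre_mapping(genre_strings, sep="|"):
    """Map each distinct genre to a 1-based ID.

    Sorting the genres makes the IDs identical across runs, unlike enumerating
    a set, whose order depends on per-process string hashing.
    """
    genres = sorted({g for s in genre_strings for g in s.split(sep)})
    return {g: i + 1 for i, g in enumerate(genres)}


def multi_hot(cate_ids, num_cates):
    """Encode a list of 1-based category IDs as a 0/1 list of length num_cates."""
    vec = [0] * num_cates
    for c in cate_ids:
        vec[c - 1] = 1
    return vec


genre_strings = ["Comedy", "Crime|Film-Noir|Mystery|Thriller", "Children's|Comedy"]
mapping = build_genre_mapping(genre_strings)
print(mapping)
# {"Children's": 1, 'Comedy': 2, 'Crime': 3, 'Film-Noir': 4, 'Mystery': 5, 'Thriller': 6}
print(multi_hot([mapping[g] for g in "Children's|Comedy".split("|")], len(mapping)))
# [1, 1, 0, 0, 0, 0]
```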


In [5]:
df.drop(columns=[GENRE_COL], inplace=True)
rating_df = pd.merge(df, cate_df, how="inner", on=[ITEM_COL])

# Merge categories that contain fewer than `min_item_in_cate` items into a single category and get the new mappings
cate2idx, item2cate, num_cates = merge_category(rating_df, min_item_in_cate=50)

# Keys are the original genre IDs from genre_to_id; values are the merged category IDs
print("New genre to ID Mapping:", {genre: id for genre, id in cate2idx.items()})
# print(item2cate)
print("Number of unique genres:", num_cates)
rating_df.head()


New genre to ID Mapping: {10: 1, 12: 2, 1: 3, 7: 4, 3: 5, 4: 6, 13: 7, 19: 8, 11: 9, 15: 10, 6: 11, 8: 12, 17: 13, 5: 14, 2: 14, 18: 14, 9: 14, 14: 14, 16: 14}
Number of unique genres: 14

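|   | userId | movieId | rating | timestamp | cateId |
|---|--------|---------|--------|-----------|--------|
| 0 | 196 | 242 | 3.0 | 881250949 | [10] |
| 1 | 186 | 302 | 3.0 | 891717742 | [12, 5, 1, 7] |
| 2 | 22 | 377 | 1.0 | 878887116 | [3, 10] |
| 3 | 244 | 51 | 2.0 | 880606923 | [4, 13, 19, 2] |
| 4 | 166 | 346 | 1.0 | 886397596 | [12, 4] |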
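`merge_category` comes from the Recommenders UniRec utilities, and its exact rule is not reproduced in this notebook. The following is a hypothetical plain-Python sketch of the idea stated in the code comment, not the library's code: count the distinct items in each category, keep a separate ID for every category with at least `min_item_in_cate` items, and give all remaining categories one shared ID. The printed mapping above has this shape (frequent genres get IDs 1 to 13, and six rare genres share ID 14); assigning new IDs in order of first appearance is an assumption of the sketch.

```python
from collections import defaultdict


def merge_rare_categories(item_to_cates, min_item_in_cate):
    """Hypothetical rare-category merging (not the library implementation).

    item_to_cates: dict mapping item ID -> list of original category IDs.
    Categories with fewer than min_item_in_cate distinct items share one ID.
    Returns (old_to_new, num_cates); new IDs are 1-based.
    """
    cate_items = defaultdict(set)
    order = []  # categories in order of first appearance
    for item, cates in item_to_cates.items():
        for c in cates:
            if c not in cate_items:
                order.append(c)
            cate_items[c].add(item)

    frequent = [c for c in order if len(cate_items[c]) >= min_item_in_cate]
    rare = [c for c in order if len(cate_items[c]) < min_item_in_cate]

    # Frequent categories keep their own IDs; all rare ones share the next ID.
    old_to_new = {c: i + 1 for i, c in enumerate(frequent)}
    merged_id = len(frequent) + 1
    for c in rare:
        old_to_new[c] = merged_id
    num_cates = len(frequent) + (1 if rare else 0)
    return old_to_new, num_cates


# Toy item-to-category lists, made up for illustration.
item_to_cates = {1: [10], 2: [12, 5], 3: [10, 12], 4: [10]}
print(merge_rare_categories(item_to_cates, min_item_in_cate=2))
# ({10: 1, 12: 2, 5: 3}, 3)
```

Pooling rare categories keeps the category vocabulary small and avoids category IDs that appear too seldom to learn useful embeddings for.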


## Reference

\[1\] Wang-Cheng Kang and Julian McAuley, *Self-Attentive Sequential Recommendation*, arXiv preprint arXiv:1808.09781, 2018. <br>

\[2\] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin, *Attention is all you need*, in Advances in Neural Information Processing Systems, 5998–6008, 2017. <br>