<i>Copyright (c) Recommenders contributors.</i>

<i>Licensed under the MIT License.</i>

# SASRec 

Self-Attentive Sequential Recommendation (SASRec) [1] is a sequential recommendation model that uses self-attention [2] to capture sequential patterns in user-item interactions. Given a user's interaction history, it predicts the next item the user is likely to interact with.
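To make the next-item framing concrete, the following minimal sketch turns one user's chronological history into (input sequence, target item) pairs. The helper `next_item_pairs`, the `max_len` parameter, and the sample history are hypothetical illustrations, not part of the Recommenders or UniRec API.

```python
def next_item_pairs(history, max_len):
    """Build (input sequence, target item) pairs from one user's chronological history."""
    pairs = []
    for t in range(1, len(history)):
        # Keep only the most recent max_len items as the model input
        prefix = history[max(0, t - max_len):t]
        pairs.append((prefix, history[t]))
    return pairs


# Hypothetical history of movie IDs, ordered by timestamp
history = [242, 302, 377, 51]
pairs = next_item_pairs(history, max_len=2)
for seq, target in pairs:
    print(seq, "->", target)
```

Each pair asks the model to predict the target item from the items that precede it, which is the task described above.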

In [1]:
import sys
import torch
import pandas as pd

from recommenders.datasets import movielens 
from recommenders.models.unirec.data.dataset.movielens_utils import merge_category
from recommenders.models.unirec.model.sequential.sasrec import SASRec


print(f"System version: {sys.version}")
print(f"Pandas version: {pd.__version__}")
print(f"PyTorch version: {torch.__version__}")

System version: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0]
Pandas version: 2.2.2
PyTorch version: 2.3.1+cu121


In [2]:
df = movielens.load_pandas_df(size='100k', header=['userId', 'movieId', 'rating', 'timestamp'], genres_col="genre", local_cache_path=".")
df.head(5)

|  | userId | movieId | rating | timestamp | genre |
|---|---|---|---|---|---|
| 0 | 196 | 242 | 3.0 | 881250949 | Comedy |
| 1 | 186 | 302 | 3.0 | 891717742 | Crime\|Film-Noir\|Mystery\|Thriller |
| 2 | 22 | 377 | 1.0 | 878887116 | Children's\|Comedy |
| 3 | 244 | 51 | 2.0 | 880606923 | Drama\|Romance\|War\|Western |
| 4 | 166 | 346 | 1.0 | 886397596 | Crime\|Drama |


In [3]:
cate_df = df[["movieId", "genre"]].drop_duplicates()
print(cate_df.shape)
cate_df.head(5)

(1682, 2)


|  | movieId | genre |
|---|---|---|
| 0 | 242 | Comedy |
| 1 | 302 | Crime\|Film-Noir\|Mystery\|Thriller |
| 2 | 377 | Children's\|Comedy |
| 3 | 51 | Drama\|Romance\|War\|Western |
| 4 | 346 | Crime\|Drama |


In [4]:
# Extract all unique genres from the data
all_genres = set(genre for genre_string in cate_df['genre'] for genre in genre_string.split('|'))

# Create a mapping from genre to ID (1-based index)
genre_to_id = {genre: idx + 1 for idx, genre in enumerate(all_genres)}

# Map genres to IDs using the dynamic mapping
cate_df['cateId'] = cate_df['genre'].apply(lambda x: [genre_to_id[genre] for genre in x.split('|') if genre in genre_to_id])

print("Genre to ID Mapping:", genre_to_id)
print("Number of unique genres:", len(all_genres))
cate_df.drop(columns=["genre"], inplace=True)
cate_df.head(5)

Genre to ID Mapping: {'Mystery': 1, 'Animation': 2, 'Adventure': 3, 'War': 4, 'Western': 5, 'Musical': 6, 'Comedy': 7, 'Drama': 8, 'Fantasy': 9, 'Crime': 10, 'Sci-Fi': 11, 'Documentary': 12, 'unknown': 13, "Children's": 14, 'Action': 15, 'Thriller': 16, 'Romance': 17, 'Horror': 18, 'Film-Noir': 19}
Number of unique genres: 19


|  | movieId | cateId |
|---|---|---|
| 0 | 242 | [7] |
| 1 | 302 | [10, 19, 1, 16] |
| 2 | 377 | [14, 7] |
| 3 | 51 | [8, 17, 4, 5] |
| 4 | 346 | [10, 8] |
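The genre IDs printed above follow the iteration order of a Python `set`, which for strings can differ between Python processes because of hash randomization, so a rerun may assign different IDs. If reproducible IDs matter, one option is to sort the genres before enumerating them. The sketch below uses a small hypothetical sample of genre strings rather than the full `cate_df`.

```python
# Hypothetical sample of pipe-separated genre strings
genre_strings = ["Comedy", "Crime|Film-Noir|Mystery|Thriller", "Children's|Comedy"]

all_genres = {g for s in genre_strings for g in s.split("|")}

# Sorting before enumerating fixes the genre -> ID assignment across runs
genre_to_id = {genre: idx + 1 for idx, genre in enumerate(sorted(all_genres))}
print(genre_to_id)
```

With this variant the IDs are alphabetical, so they would not match the mapping shown in the output above.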


In [5]:
df.drop(columns=["genre"], inplace=True)
rating_df = pd.merge(df, cate_df, how='inner', on=['movieId'])

# Merge categories containing a small number of items (lower than min_item_in_cate) into one category, and get the new mappings
cate2idx, item2cate, num_cates = merge_category(rating_df, min_item_in_cate=50)

print("New genre to ID Mapping:", dict(cate2idx))
print(item2cate)
print("Number of unique genres:", num_cates)
rating_df.head()

get cate2items: 100000it [00:00, 1409381.08it/s]
get item2cate: 100000it [00:00, 739055.75it/s]

New genre to ID Mapping: {7: 1, 10: 2, 1: 3, 16: 4, 14: 5, 8: 6, 17: 7, 4: 8, 11: 9, 15: 10, 3: 11, 6: 12, 18: 13, 19: 14, 5: 14, 12: 14, 2: 14, 9: 14, 13: 14}
{242: [1], 302: [2, 14, 3, 4], 377: [5, 1], 51: [6, 7, 8, 14], 346: [2, 6], 474: [9, 8], 265: [10, 4], 465: [11, 5, 7], 451: [1, 12, 7], 86: [6], 257: [10, 11, 1, 9], 1014: [1], 222: [10, 11, 9], 40: [1], 29: [10, 11, 1, 2], 785: [1, 7], 387: [6], 274: [1, 7], 1042: [3, 4], 1184: [14], 392: [6], 486: [1, 7], 144: [10, 4], 118: [10, 11, 4], 1: [14, 5, 1], 546: [10, 4], 95: [14, 5, 1, 12], 768: [11, 5], 277: [6], 234: [10, 13], 246: [6, 7], 98: [6, 4], 193: [6], 88: [1, 7], 194: [1, 2], 1081: [2], 603: [3, 4], 796: [1, 7], 32: [14], 16: [1, 7], 304: [11, 5], 979: [6, 4], 564: [1, 13], 327: [2, 6, 3], 201: [10, 11, 1, 13], 1137: [6, 7], 241: [10, 7, 8], 4: [10, 1, 6], 332: [2, 6, 4], 100: [2, 6, 4], 432: [14, 5, 12], 322: [3, 4], 181: [10, 11, 7, 9, 8], 196: [6], 679: [10, 11], 384: [1], 143: [12], 423: [5, 6, 14, 9], 515: [10, 6, ...

|  | userId | movieId | rating | timestamp | cateId |
|---|---|---|---|---|---|
| 0 | 196 | 242 | 3.0 | 881250949 | [7] |
| 1 | 186 | 302 | 3.0 | 891717742 | [10, 19, 1, 16] |
| 2 | 22 | 377 | 1.0 | 878887116 | [14, 7] |
| 3 | 244 | 51 | 2.0 | 880606923 | [8, 17, 4, 5] |
| 4 | 166 | 346 | 1.0 | 886397596 | [10, 8] |
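The merge step above can be pictured with the following sketch. It is not the UniRec `merge_category` implementation: the function `merge_small_categories`, its input format, the first-seen ID ordering, and the de-duplication of remapped IDs are all assumptions inferred from the code comment and the printed mapping, where several rare genres share one ID.

```python
def merge_small_categories(item_to_cates, min_item_in_cate):
    """Illustrative only: not the UniRec merge_category implementation."""
    # Count the distinct items that carry each category
    cate_to_items = {}
    for item, cates in item_to_cates.items():
        for cate in cates:
            cate_to_items.setdefault(cate, set()).add(item)

    # Categories with enough items keep their own new ID (1-based, first-seen order)
    cate2idx = {}
    for cate, items in cate_to_items.items():
        if len(items) >= min_item_in_cate:
            cate2idx[cate] = len(cate2idx) + 1

    # All remaining (small) categories share one extra ID
    num_cates = len(cate2idx)
    small = [cate for cate in cate_to_items if cate not in cate2idx]
    if small:
        num_cates += 1
        for cate in small:
            cate2idx[cate] = num_cates

    # Remap each item's categories, dropping duplicates while keeping order (assumed)
    item2cate = {
        item: list(dict.fromkeys(cate2idx[c] for c in cates))
        for item, cates in item_to_cates.items()
    }
    return cate2idx, item2cate, num_cates


# Hypothetical toy data: item ID -> list of original category IDs
toy_items = {1: [7], 2: [7, 19], 3: [8], 4: [8, 5], 5: [7, 8]}
toy_cate2idx, toy_item2cate, toy_num_cates = merge_small_categories(toy_items, min_item_in_cate=2)
print(toy_cate2idx, toy_item2cate, toy_num_cates)
```

In the toy data, categories 19 and 5 each appear on a single item, so they fall below the threshold and are mapped to the same shared ID.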


## Reference

\[1\] Wang-Cheng Kang and Julian McAuley, *Self-Attentive Sequential Recommendation*, arXiv preprint arXiv:1808.09781, 2018. <br>

\[2\] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin, *Attention is all you need*, in Advances in Neural Information Processing Systems, 5998–6008, 2017. <br>