<font color='tomato'><font color="#CC3D3D"><p>
# CEASE (EASE with Side Information)

- **ADD-EASE**
$$
\begin{aligned}
S^* = {} & \alpha \, \underset{S_x}{\arg\min} \left( \lVert \mathbf{W}_x \circ (\mathbf{X} - \mathbf{X}S_x) \rVert_F^2 + \lambda_x \lVert S_x \rVert_F^2 \right) \\
& + (1 - \alpha) \, \underset{S_r}{\arg\min} \left( \lVert \mathbf{W}_r \circ (\mathbf{T} - \mathbf{T}S_r) \rVert_F^2 + \lambda_r \lVert S_r \rVert_F^2 \right),
\end{aligned}
$$
$$
\text{subject to } \text{diag}(S_x) = \text{diag}(S_r) = 0.
$$
- **CEASE**
$$
S^* = \underset{S}{\arg\min} \lVert \sqrt{\mathbf{W}} \circ (\mathbf{X}' - \mathbf{X}'S) \rVert_F^2 + \lambda \lVert S \rVert_F^2,
$$
$$
\text{subject to } \text{diag}(S) = 0, \text{ where } \mathbf{X}' = \begin{bmatrix} \mathbf{X} \\ \mathbf{T} \end{bmatrix}
$$
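
The two objectives above can be illustrated with a small NumPy sketch. It assumes dense toy matrices, a ridge (squared Frobenius) penalty, and row-wise weights in which the rows of $\mathbf{X}$ get weight `alpha` and the rows of $\mathbf{T}$ get weight `1 - alpha`. The function names and this use of `alpha` are illustrative assumptions; the actual `recom_cease.CEASE` implementation may differ.

```python
import numpy as np


def ease_closed_form(X, lamb, row_weights=None):
    """Ridge-regularised item-item weights with a zero diagonal (EASE-style).

    `row_weights` is a hypothetical per-row weight vector standing in for W.
    """
    X = np.asarray(X, dtype=float)
    if row_weights is not None:
        # Scaling each row by sqrt(w) yields the weighted Gram matrix X^T diag(w) X.
        X = X * np.sqrt(np.asarray(row_weights, dtype=float))[:, None]
    G = X.T @ X + lamb * np.eye(X.shape[1])
    P = np.linalg.inv(G)
    S = -P / np.diag(P)        # column j is divided by P[j, j]
    np.fill_diagonal(S, 0.0)   # enforce diag(S) = 0
    return S


def cease(X, T, lamb, alpha=0.5):
    """Collective variant: stack X on top of T and solve one problem."""
    X, T = np.asarray(X, dtype=float), np.asarray(T, dtype=float)
    stacked = np.vstack([X, T])
    weights = np.concatenate(
        [np.full(X.shape[0], alpha), np.full(T.shape[0], 1.0 - alpha)]
    )
    return ease_closed_form(stacked, lamb, row_weights=weights)


def add_ease(X, T, lamb_x, lamb_t, alpha=0.5):
    """Additive variant: solve two problems and blend the solutions."""
    return alpha * ease_closed_form(X, lamb_x) + (1.0 - alpha) * ease_closed_form(T, lamb_t)


# Toy data: 6 users x 4 items, and 3 side-information rows x 4 items.
rng = np.random.default_rng(0)
X = (rng.random((6, 4)) < 0.5).astype(float)
T = (rng.random((3, 4)) < 0.5).astype(float)

S_collective = cease(X, T, lamb=1.0, alpha=0.5)
S_additive = add_ease(X, T, lamb_x=1.0, lamb_t=1.0, alpha=0.5)
scores = X @ S_collective  # predicted user-item scores
```

In this sketch the collective variant needs one matrix inversion, while the additive variant needs two but lets each source have its own regularisation strength.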

This tutorial adapts the models proposed in the paper below to Cornac.
- "Closed-Form Models for Collaborative Filtering with Side-Information", Jeunen et al., RecSys2020.
- https://github.com/olivierjeunen/ease-side-info-recsys-2020

In addition, the model has been extended so that user features, not only item features, can be used as side information.

### Setup

In [1]:
import pandas as pd
import numpy as np

# MS recommenders API 
import sys
sys.path.append('msr/')  # Change this to where you extracted msr.zip (run pwd in a cell to check).
                         # On Windows, use / or \\ as the path separator.
from msr.cornac_utils import predict_ranking
from msr.python_splitters import python_stratified_split

# Cornac API 
import cornac
print(f"Cornac version: {cornac.__version__}")
from cornac.eval_methods import BaseMethod, RatioSplit, StratifiedSplit, CrossValidation
from cornac.metrics import Precision, Recall, NDCG, AUC, MAP
from cornac.data import FeatureModality
from cornac.models import EASE

# Custom models
from recom_cease import CEASE

FM model is only supported on Linux.
Windows executable can be found at http://www.libfm.org.
Cornac version: 1.17


In [2]:
# Data column definition
DEFAULT_USER_COL = 'resume_seq'
DEFAULT_ITEM_COL = 'recruitment_seq'
DEFAULT_RATING_COL = 'rating'
DEFAULT_PREDICTION_COL = 'prediction'

# Top k items to recommend
TOP_K = 15

# Random seed, Verbose, etc.
SEED = 202311
VERBOSE = True

### Preprocessing

In [3]:
# Load the data
data = pd.read_csv('data/apply_train.csv')
data[DEFAULT_RATING_COL] = 1  # Add a rating column to match the format Cornac expects (UIR: User, Item, Rating)

# Split the data
train, test = python_stratified_split(
    data, 
    filter_by="user", 
    ratio=0.7,
    col_user=DEFAULT_USER_COL, col_item=DEFAULT_ITEM_COL,
    seed=SEED
)
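
To make the effect of a per-user stratified split concrete, here is a minimal pandas sketch on toy data. `split_per_user` is a hypothetical helper written for illustration; it is not the implementation in `msr.python_splitters`, which may shuffle and round differently.

```python
import numpy as np
import pandas as pd


def split_per_user(df, user_col, ratio=0.7, seed=0):
    """Hypothetical helper: send about `ratio` of each user's rows to train."""
    rng = np.random.default_rng(seed)
    train_parts, test_parts = [], []
    for _, rows in df.groupby(user_col):
        shuffled = rows.iloc[rng.permutation(len(rows))]
        n_train = int(round(ratio * len(rows)))
        train_parts.append(shuffled.iloc[:n_train])
        test_parts.append(shuffled.iloc[n_train:])
    return pd.concat(train_parts), pd.concat(test_parts)


# Two toy users with 10 interactions each.
toy = pd.DataFrame({
    "resume_seq": ["u1"] * 10 + ["u2"] * 10,
    "recruitment_seq": [f"r{i}" for i in range(20)],
})
toy_train, toy_test = split_per_user(toy, "resume_seq", ratio=0.7, seed=202311)
```

Because the split is applied within each user, every user keeps interactions on both sides, which matters when evaluation ignores users that are unknown at training time.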

In [4]:
# Load the preprocessed side information (produced by running Recsys_feature_engineering.ipynb)
user_features, item_features = pd.read_pickle('features.pkl')

# Keep only the users and items that appear in the training rating matrix,
# ordered to match the ids in train
train_user_features = user_features[train[DEFAULT_USER_COL].unique()]
train_item_features = item_features[train[DEFAULT_ITEM_COL].unique()]
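
The ordering step can be checked on toy data. The sketch below assumes a feature table indexed by id, which is a hypothetical layout since the structure of `features.pkl` is not shown here. It relies only on the fact that pandas `unique()` returns values in order of first appearance.

```python
import pandas as pd

toy_train = pd.DataFrame({"resume_seq": ["u3", "u1", "u3", "u2"]})

# Hypothetical feature table indexed by user id; u4 never appears in train.
toy_features = pd.DataFrame(
    {"f0": [1.0, 2.0, 3.0, 4.0], "f1": [0.0, 1.0, 0.0, 1.0]},
    index=["u1", "u2", "u3", "u4"],
)

ordered_ids = toy_train["resume_seq"].unique()       # first-appearance order
aligned = toy_features.loc[ordered_ids].to_numpy()   # one row per training user

print(list(ordered_ids))
print(aligned)
```

Row `i` of `aligned` then belongs to the `i`-th distinct user seen in the training data, and users absent from train are dropped.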

### Modeling

In [5]:
params = {
    'lamb': 60,
    'posB': True,
}

ease = EASE(**params, seed=SEED, verbose=VERBOSE)

In [6]:
params = {
    'name': "CEASE-user",
    'feature': "user",
    'lamb': 60,
    'alpha': 0.5,
    'extend': "collective"
}

cease_user = CEASE(**params, seed=SEED, verbose=VERBOSE)

In [7]:
params = {
    'name': "ADD-EASE-item",
    'feature': "item",
    'lamb': 60,
    'alpha': 0.5,
    'extend': "additive", # or "collective"
}

add_ease_item = CEASE(**params, seed=SEED, verbose=VERBOSE)

In [8]:
params = {
    'name': "CEASE-item",
    'feature': "item",
    'lamb': 60,
    'alpha': 0.5,
    'extend': "collective", # or "additive"
}

cease_item = CEASE(**params, seed=SEED, verbose=VERBOSE)

In [9]:
# Create FeatureModality objects, Cornac's mechanism for passing side information to a model
user_feature_modality = FeatureModality(features=train_user_features, ids=None, normalize=True)
item_feature_modality = FeatureModality(features=train_item_features, ids=None, normalize=False)

### Evaluation

In [None]:
# Set up the evaluation method
eval_method = BaseMethod.from_splits(
    train_data=np.array(train), 
    test_data=np.array(test), 
    exclude_unknowns=True,  # Unknown users and items will be ignored.
    user_feature=user_feature_modality,
    item_feature=item_feature_modality,
    verbose=True
)

# Set up the evaluation metrics
metrics = [Recall(k=TOP_K), NDCG(k=TOP_K)]

# Run the experiment
cornac.Experiment(
    eval_method=eval_method,
    models=[ease, cease_user, add_ease_item, cease_item],
    metrics=metrics,
).run()
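
For reference, the two metrics can be worked through by hand for a single user. The sketch below uses the common binary-relevance definitions with a log2 discount; it is an illustration only, and Cornac's `Recall` and `NDCG` may differ in details such as how users without test items are handled.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["a", "b", "c", "d"]   # model ranking, best first
relevant = {"b", "d", "z"}      # held-out items for this user

recall = recall_at_k(ranked, relevant, k=3)
ndcg = ndcg_at_k(ranked, relevant, k=3)
print(f"Recall@3 = {recall:.4f}, NDCG@3 = {ndcg:.4f}")
```

Only `b` is retrieved within the top 3, so recall is one third, while NDCG additionally penalises it for sitting at rank 2 rather than rank 1.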

### Deployment

In [34]:
# Retrain on the full data (ratings and side information)
# data.Dataset.from_uir() does not support FeatureModality, so the features are passed as parameters of .fit()
full_data = cornac.data.Dataset.from_uir(data.itertuples(index=False), seed=SEED)
full_user_features = user_features[data[DEFAULT_USER_COL].unique()]
full_item_features = item_features[data[DEFAULT_ITEM_COL].unique()]

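# `params` is whichever dict was defined last (the CEASE-item settings if the cells were run in order)
model = CEASE(**params, verbose=VERBOSE, seed=SEED)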
model.fit(full_data, user_features=full_user_features, item_features=full_item_features)

# Generate predictions for all items
all_pred = predict_ranking(
    model, data, 
    usercol=DEFAULT_USER_COL, itemcol=DEFAULT_ITEM_COL, 
    remove_seen=True
)

# Select the top-K items per user
top_k = (
    all_pred
    .groupby(DEFAULT_USER_COL)
    .apply(lambda x: x.nlargest(TOP_K, DEFAULT_PREDICTION_COL))
    .reset_index(drop=True)
    # .drop(DEFAULT_PREDICTION_COL, axis=1)
    .sort_values(by=DEFAULT_USER_COL)
)

# Save the submission file
t = pd.Timestamp.now()
fname = f"submit_{model.name}_{t.month:02}{t.day:02}{t.hour:02}{t.minute:02}.csv"
top_k.to_csv(fname, index=False)

  0%|          | 0/8482 [00:00<?, ?it/s]
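
The top-K selection step can be verified on a toy prediction table. The sketch below uses `sort_values` followed by `groupby(...).head(k)`, an alternative to the `apply` plus `nlargest` pattern that gives the same per-user selection on this example. The toy values are made up for illustration.

```python
import pandas as pd

toy_pred = pd.DataFrame({
    "resume_seq": ["u1", "u1", "u1", "u2", "u2"],
    "recruitment_seq": ["r1", "r2", "r3", "r1", "r4"],
    "prediction": [0.2, 0.9, 0.5, 0.1, 0.7],
})

toy_top_k = (
    toy_pred
    # Sort users ascending and scores descending within each user.
    .sort_values(["resume_seq", "prediction"], ascending=[True, False])
    .groupby("resume_seq")
    .head(2)                      # keep the two best-scored items per user
    .reset_index(drop=True)
)
print(toy_top_k)
```

Avoiding `apply` keeps the operation vectorised, which can help when the prediction table covers every user-item pair.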

<font color='tomato'><font color="#CC3D3D"><p>
# End