<font color='tomato'><font color="#CC3D3D"><p>
# A Tutorial on the Cornac Version of ConvNCF

- *"Outer Product-based Neural Collaborative Filtering", X. He, et al., IJCAI 2018.*
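- The original paper trains the model with BPR (Bayesian Personalized Ranking); this version omits BPR to keep the code simple.
- The original paper also applies separate regularization weights to the embedding, convolutional, and output layers; this version does not.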
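For reference, the BPR objective that the paper uses (and this version leaves out) pushes the score of an observed item $i$ above that of a sampled unobserved item $j$ for the same user $u$, minimizing the sum of $-\ln\sigma(\hat{y}_{ui} - \hat{y}_{uj})$ over such triples. The minimal NumPy sketch below assumes the model's scores are already available as arrays; `bpr_loss` is a hypothetical helper, not part of `recom_convncf`, and it drops the regularization term just as this version does.

```python
import numpy as np

def bpr_loss(pos_scores: np.ndarray, neg_scores: np.ndarray) -> float:
    """Mean BPR loss over (user, positive item, negative item) triples.

    pos_scores[k] is the predicted score y_ui of an observed item and
    neg_scores[k] the score y_uj of a sampled unobserved item for the same user.
    Hypothetical helper for illustration; regularization is omitted.
    """
    diff = pos_scores - neg_scores
    # -ln(sigmoid(x)) == ln(1 + exp(-x)); np.logaddexp(0, -x) evaluates it stably
    return float(np.mean(np.logaddexp(0.0, -diff)))

# Equal scores cost ln 2; ranking the positive item higher lowers the loss
print(bpr_loss(np.array([0.0]), np.array([0.0])))   # ≈ 0.6931
print(bpr_loss(np.array([3.0]), np.array([-1.0])))  # ≈ 0.0181
```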

<img align='left' src='ConvNcf.png' width=800/>

### Setup

```python
import pandas as pd
import numpy as np

# Data-splitting and ranking helpers from the local `msr` package
from msr.cornac_utils import predict_ranking
from msr.python_splitters import python_stratified_split

# Cornac API
import cornac
print(f"Cornac version: {cornac.__version__}")
from cornac.eval_methods import BaseMethod, RatioSplit, StratifiedSplit, CrossValidation
from cornac.metrics import Precision, Recall, NDCG, AUC, MAP

# Custom ConvNCF implementation (local module)
from recom_convncf import ConvNCF
```

```python
# Data column definitions
DEFAULT_USER_COL = 'resume_seq'
DEFAULT_ITEM_COL = 'recruitment_seq'
DEFAULT_RATING_COL = 'rating'
DEFAULT_PREDICTION_COL = 'prediction'

# Number of top items to recommend per user
TOP_K = 5

# Random seed and verbosity
SEED = 202311
VERBOSE = True
```

### Preprocessing

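```python
# Load the data
data = pd.read_csv('apply_train.csv')
# Cornac expects UIR (user, item, rating) triples, so each (user, item) row gets an implicit rating of 1
data[DEFAULT_RATING_COL] = 1

# Split per user: about 70% of each user's interactions go to training, the rest to test
train, test = python_stratified_split(
    data,
    filter_by="user",
    ratio=0.7,
    col_user=DEFAULT_USER_COL, col_item=DEFAULT_ITEM_COL,
    seed=SEED
)
```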
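Conceptually, a user-stratified split shuffles each user's interactions and sends roughly `ratio` of them to the training set, so users with enough interactions appear on both sides of the split. The sketch below is a simplified pure-pandas illustration, not the `msr.python_splitters` implementation: `simple_user_split` and the toy data are hypothetical, and rounding each user's cut point with `round()` is an assumption that may differ from the real helper.

```python
import numpy as np
import pandas as pd

def simple_user_split(df, ratio=0.7, col_user="resume_seq", seed=0):
    """Hypothetical per-user split: about `ratio` of each user's rows go to train."""
    rng = np.random.default_rng(seed)
    train_parts, test_parts = [], []
    for _, group in df.groupby(col_user, sort=False):
        # Shuffle this user's rows, then cut at round(ratio * n)
        shuffled = group.iloc[rng.permutation(len(group))]
        n_train = int(round(ratio * len(group)))
        train_parts.append(shuffled.iloc[:n_train])
        test_parts.append(shuffled.iloc[n_train:])
    return pd.concat(train_parts), pd.concat(test_parts)

# Toy data: user U1 has 10 interactions, user U2 has 4
toy = pd.DataFrame({
    "resume_seq": ["U1"] * 10 + ["U2"] * 4,
    "recruitment_seq": [f"R{i}" for i in range(14)],
})
tr, te = simple_user_split(toy, ratio=0.7)
print(tr.groupby("resume_seq").size().to_dict())  # {'U1': 7, 'U2': 3}
```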

### Modeling

```python
# Hyperparameters for the custom ConvNCF model (see recom_convncf for their exact meaning)
params = {
    'num_factors': 64,
    'num_channel': 32,
    'act_fn': "relu",
    'n_epochs': 1000,
    'batch_size': 512,
    'num_neg': 4,
    'learner': "adam",
    'learning_rate': 0.001,
}

model = ConvNCF(**params, seed=SEED, verbose=VERBOSE)
```
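ConvNCF scores a (user, item) pair by taking the outer product of the two embeddings, which yields a `num_factors × num_factors` interaction map, and then shrinking that map with a stack of convolutions. The NumPy sketch below illustrates the shape arithmetic for `num_factors=64` and `num_channel=32`, assuming 2×2 kernels with stride 2 and ReLU (our reading of the paper's setting), so that six layers reduce the 64×64 map to 1×1. It uses random weights and a hand-written convolution purely for illustration; it is not the `recom_convncf` implementation, and `conv2x2_stride2` is a hypothetical helper.

```python
import numpy as np

def conv2x2_stride2(x, w, b):
    """Hypothetical helper: 2x2 convolution with stride 2, followed by ReLU.

    x: (C_in, H, W) feature map with even H and W
    w: (C_out, C_in, 2, 2) kernel, b: (C_out,) bias
    Returns a (C_out, H // 2, W // 2) feature map.
    """
    c_in, h, wd = x.shape
    # Regroup the map so that each non-overlapping 2x2 patch is indexed by (i, j)
    patches = x.reshape(c_in, h // 2, 2, wd // 2, 2)
    # out[o, i, j] = sum over c, a, b of patches[c, i, a, j, b] * w[o, c, a, b]
    out = np.einsum("ciajb,ocab->oij", patches, w) + b[:, None, None]
    return np.maximum(out, 0.0)  # ReLU

rng = np.random.default_rng(0)
num_factors, num_channel = 64, 32   # values taken from `params`

p_u = rng.normal(size=num_factors)  # user embedding (random, illustration only)
q_i = rng.normal(size=num_factors)  # item embedding (random, illustration only)

# Interaction map E = p_u q_i^T, treated as a single-channel 64x64 image
x = np.outer(p_u, q_i)[None, :, :]

n_layers, c_in = 0, 1
while x.shape[-1] > 1:
    w = rng.normal(scale=0.1, size=(num_channel, c_in, 2, 2))
    x = conv2x2_stride2(x, w, np.zeros(num_channel))
    c_in = num_channel
    n_layers += 1
    print(n_layers, x.shape)

# Prediction layer: weighted sum over the final vector of num_channel values
w_out = rng.normal(size=num_channel)
score = float(w_out @ x.reshape(-1))
```

The loop prints six layers (64 → 32 → 16 → 8 → 4 → 2 → 1), ending at shape `(32, 1, 1)`, which is why an embedding size that is a power of two pairs naturally with this kind of stride-2 tower.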

### Evaluation

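```python
# Set up the evaluation method from the existing train/test split (explicit UIR column order)
eval_method = BaseMethod.from_splits(
    train_data=np.array(train[[DEFAULT_USER_COL, DEFAULT_ITEM_COL, DEFAULT_RATING_COL]]),
    test_data=np.array(test[[DEFAULT_USER_COL, DEFAULT_ITEM_COL, DEFAULT_RATING_COL]]),
    exclude_unknowns=True,  # Ignore test users and items that do not appear in the training data
    verbose=VERBOSE
)

# Set up the metrics
metrics = [Recall(k=TOP_K), NDCG(k=TOP_K)]

# Run the experiment
cornac.Experiment(
    eval_method=eval_method,
    models=[model],
    metrics=metrics,
).run()
```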
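Both metrics compare a user's top-K list with that user's held-out test items. With binary relevance, Recall@K is the fraction of the user's test items that appear in the top K, and NDCG@K discounts each hit at rank $r$ by $\log_2(r+1)$ and divides by the best achievable DCG. The sketch below computes these standard definitions for a single user with toy data; Cornac reports averages over users, and edge-case details of its implementation may differ. `recall_at_k` and `ndcg_at_k` are hypothetical helper names.

```python
import numpy as np

def recall_at_k(ranked_items, relevant, k):
    """Fraction of the user's relevant (test) items found in the top k."""
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked_items, relevant, k):
    """Binary-relevance NDCG@k with a log2(rank + 1) discount (ranks start at 1)."""
    dcg = sum(1.0 / np.log2(rank + 1)
              for rank, item in enumerate(ranked_items[:k], start=1)
              if item in relevant)
    # Ideal ranking: all relevant items first, capped at k positions
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / np.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg

ranked = ["R3", "R7", "R1", "R9", "R4"]   # model's top-5 for one user (toy data)
relevant = {"R7", "R4", "R8"}              # that user's held-out test items
print(recall_at_k(ranked, relevant, 5))    # 0.6666666666666666
print(round(ndcg_at_k(ranked, relevant, 5), 4))  # 0.4776
```

Here the hits sit at ranks 2 and 5, so recall is 2/3 while NDCG is lower than recall, penalizing the fact that neither hit is at the top.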

### Deployment

```python
# Retrain on the full data set (explicit UIR column order)
uir_cols = [DEFAULT_USER_COL, DEFAULT_ITEM_COL, DEFAULT_RATING_COL]
full_data = cornac.data.Dataset.from_uir(data[uir_cols].itertuples(index=False), seed=SEED)
model = ConvNCF(**params, verbose=VERBOSE, seed=SEED)
model.fit(full_data)

# Score every item for every user; remove_seen=True drops (user, item) pairs already in `data`
all_pred = predict_ranking(
    model, data,
    usercol=DEFAULT_USER_COL, itemcol=DEFAULT_ITEM_COL,
    remove_seen=True
)

# Keep the TOP_K highest-scoring items per user
top_k = (
    all_pred
    .groupby(DEFAULT_USER_COL)
    .apply(lambda x: x.nlargest(TOP_K, DEFAULT_PREDICTION_COL))
    .reset_index(drop=True)
    .drop(DEFAULT_PREDICTION_COL, axis=1)
    .sort_values(by=DEFAULT_USER_COL)
)

# Save the submission file (uncomment to write a timestamped CSV)
#t = pd.Timestamp.now()
#fname = f"submit_{model.name}_{t.month:02}{t.day:02}{t.hour:02}{t.minute:02}.csv"
#top_k.to_csv(fname, index=False)
```
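The top-K step keeps, for each user, the K highest-scoring items that remain after seen items are removed. An equivalent pattern that avoids a per-group `apply` is to sort once by user and descending score, then take `head(K)` per group. The sketch below uses a made-up score table rather than output from the model above, and ties between equal scores may be broken differently than with `nlargest`.

```python
import pandas as pd

TOP_K = 2
# Toy prediction table (hypothetical scores)
all_pred = pd.DataFrame({
    "resume_seq":      ["U1", "U1", "U1", "U2", "U2", "U2"],
    "recruitment_seq": ["R1", "R2", "R3", "R1", "R2", "R3"],
    "prediction":      [0.9,  0.1,  0.5,  0.2,  0.8,  0.7],
})

top_k = (
    all_pred
    .sort_values(["resume_seq", "prediction"], ascending=[True, False])
    .groupby("resume_seq", sort=False)
    .head(TOP_K)                      # first TOP_K rows per user = highest scores
    .drop(columns="prediction")
    .reset_index(drop=True)
)
print(top_k.values.tolist())
# [['U1', 'R1'], ['U1', 'R3'], ['U2', 'R2'], ['U2', 'R3']]
```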

<font color='tomato'><font color="#CC3D3D"><p>
# End