<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# Item2Item recommendations with DKN 
The second task is knowledge-aware item-to-item recommendation. We again use DKN for the demonstration.
The learning framework is illustrated below:
<img src="https://recodatasets.blob.core.windows.net/kdd2020/images/Item2item-framework.JPG"  width="500">

In [1]:
import os  # used below for os.path.join; imported explicitly rather than relying on the wildcard imports
import sys
import time

sys.path.append("../../../")
from reco_utils.recommender.deeprec.deeprec_utils import *
from reco_utils.recommender.deeprec.models.dkn_item2item import *
from reco_utils.recommender.deeprec.io.dkn_item2item_iterator import *

import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)

In [2]:

data_path = 'data_folder/my/DKN-training-folder'
yaml_file = './dkn.yaml' #os.path.join(data_path, r'../../../../../../dkn.yaml')
train_file = os.path.join(data_path, r'item2item_train_instances.txt')
valid_file = os.path.join(data_path, r'item2item_valid_instances.txt')
news_feature_file = os.path.join(data_path, r'../paper_feature.txt')
wordEmb_file = os.path.join(data_path, r'word_embedding.npy')
entityEmb_file = os.path.join(data_path, r'entity_embedding.npy')
contextEmb_file = os.path.join(data_path, r'context_embedding.npy')
infer_embedding_file = os.path.join(data_path, r'infer_embedding_item2item.txt')


In [3]:
epoch = 10
hparams = prepare_hparams(yaml_file,
                          news_feature_file=news_feature_file,
                          wordEmb_file=wordEmb_file,
                          entityEmb_file=entityEmb_file,
                          contextEmb_file=contextEmb_file,
                          epochs=epoch,
                          is_clip_norm=True,
                          max_grad_norm=0.5,
                          his_size=20,
                          MODEL_DIR=os.path.join(data_path, 'save_models'),
                          learning_rate=0.0002,
                          embed_l2=0.0,
                          layer_l2=0.0,
                          batch_size=32,
                          use_entity=True,
                          use_context=True
                          )
print(hparams.values)

<bound method HParams.values of HParams([('DNN_FIELD_NUM', None), ('EARLY_STOP', 100), ('FEATURE_COUNT', None), ('FIELD_COUNT', None), ('L', None), ('MODEL_DIR', 'data_folder/my/DKN-training-folder/save_models'), ('PAIR_NUM', None), ('SUMMARIES_DIR', None), ('T', None), ('activation', ['sigmoid']), ('att_fcn_layer_sizes', None), ('attention_activation', 'relu'), ('attention_dropout', 0.0), ('attention_layer_sizes', 32), ('attention_size', None), ('batch_size', 32), ('cate_embedding_dim', None), ('cate_vocab', None), ('contextEmb_file', 'data_folder/my/DKN-training-folder/context_embedding.npy'), ('cross_activation', 'identity'), ('cross_l1', 0.0), ('cross_l2', 0.0), ('cross_layer_sizes', None), ('cross_layers', None), ('data_format', 'dkn'), ('decay', None), ('dilations', None), ('dim', 32), ('doc_size', 15), ('dropout', [0.0]), ('dtype', 32), ('embed_l1', 0.0), ('embed_l2', 0.0), ('embed_size', None), ('embedding_dropout', 0.3), ('enable_BN', False), ('entityEmb_file', 'data_folder/my

To build an item2item recommendation model on top of the Recommenders repo, only two files need to be modified:
1. Data loader: `dkn_item2item_iterator.py`
2. Model: `dkn_item2item.py`

<img src="https://recodatasets.blob.core.windows.net/kdd2020/images%2Fcode-changed-item2item.JPG" width="700">

In [4]:
input_creator = DKNItem2itemTextIterator
hparams.neg_num = 9

`neg_num` is a parameter specific to this task. It sets the number of negative instances in each group over which the softmax is computed.
Training and validation instances are organized as follows: 

<img src="https://recodatasets.blob.core.windows.net/kdd2020/images/item2item-instances.JPG" width="700">
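
To make the role of `neg_num` concrete, here is a minimal NumPy sketch of a softmax cross-entropy computed over one group of candidates. It is not the code in `dkn_item2item.py`; it assumes each group holds one positive followed by `neg_num` negatives, and the function name `group_softmax_loss` and the random scores are purely illustrative.

```python
import numpy as np


def group_softmax_loss(scores):
    """Mean cross-entropy over groups.

    Assumption (hypothetical layout): each row is one group and
    column 0 holds the score of the positive item.
    """
    scores = np.asarray(scores, dtype=np.float64)
    # Subtract the row maximum for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-probability assigned to the positive item.
    return float(-log_probs[:, 0].mean())


neg_num = 9
rng = np.random.default_rng(0)
# Four groups, each with 1 positive + neg_num negative scores.
scores = rng.normal(size=(4, 1 + neg_num))
loss = group_softmax_loss(scores)

# With identical scores the model is uninformed, so the loss is log(1 + neg_num).
uniform_loss = group_softmax_loss(np.zeros((1, 1 + neg_num)))
print(loss, uniform_loss)
```

A larger `neg_num` makes each group harder to rank correctly, which is why the uninformed baseline loss grows as `log(1 + neg_num)`.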


In [5]:
model = DKNItem2Item(hparams, input_creator)

In [6]:
t01 = time.time()
print(model.run_eval(valid_file))
t02 = time.time()
print((t02 - t01) / 60)

{'group_auc': 0.8428, 'mean_mrr': 0.6828, 'ndcg@2': 0.6328, 'ndcg@4': 0.7062, 'ndcg@6': 0.7361}
0.8355694611867269


In [7]:
model.fit(train_file, valid_file)

at epoch 1
train info: logloss loss:50.493303010424015
eval info: group_auc:0.951, mean_mrr:0.8777, ndcg@2:0.8724, ndcg@4:0.8962, ndcg@6:0.9019
at epoch 1 , train time: 38.5 eval time: 49.2
at epoch 2
train info: logloss loss:47.23231726804394
eval info: group_auc:0.9542, mean_mrr:0.8896, ndcg@2:0.8861, ndcg@4:0.9053, ndcg@6:0.9103
at epoch 2 , train time: 38.1 eval time: 49.8
at epoch 3
train info: logloss loss:46.11532519220992
eval info: group_auc:0.9553, mean_mrr:0.8954, ndcg@2:0.8915, ndcg@4:0.9095, ndcg@6:0.9143
at epoch 3 , train time: 38.1 eval time: 49.5
at epoch 4
train info: logloss loss:45.50106118659405
eval info: group_auc:0.9558, mean_mrr:0.8973, ndcg@2:0.893, ndcg@4:0.9108, ndcg@6:0.9159
at epoch 4 , train time: 38.2 eval time: 49.4
at epoch 5
train info: logloss loss:45.06542354921413
eval info: group_auc:0.9551, mean_mrr:0.8986, ndcg@2:0.8945, ndcg@4:0.9115, ndcg@6:0.9162
at epoch 5 , train time: 38.0 eval time: 49.4
at epoch 6
train info: logloss loss:44.746302176232

<reco_utils.recommender.deeprec.models.dkn_item2item.DKNItem2Item at 0x7f4c08939a20>

In [8]:
model.run_get_embedding(news_feature_file, infer_embedding_file)

<reco_utils.recommender.deeprec.models.dkn_item2item.DKNItem2Item at 0x7f4c08939a20>
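
Once item embeddings are available, one common way to produce item-to-item recommendations is a nearest-neighbour lookup by cosine similarity. The sketch below works on a small in-memory matrix with made-up ids and vectors; the layout of `infer_embedding_file` is not shown in this notebook, so loading it is deliberately left out, and `top_k_similar` is a hypothetical helper, not part of `reco_utils`.

```python
import numpy as np


def top_k_similar(item_ids, embeddings, query_id, k=3):
    """Return the k items most similar to query_id by cosine similarity."""
    emb = np.asarray(embeddings, dtype=np.float64)
    # Normalise rows so that a dot product equals cosine similarity.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    q = item_ids.index(query_id)
    sims = unit @ unit[q]
    sims[q] = -np.inf  # never recommend the query item itself
    best = np.argsort(sims)[::-1][:k]
    return [(item_ids[i], float(sims[i])) for i in best]


# Toy data (illustrative only, not taken from the dataset).
item_ids = ["p1", "p2", "p3", "p4"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
neighbours = top_k_similar(item_ids, embeddings, "p1", k=2)
print(neighbours)
```

For a large catalogue, the exhaustive dot product would typically be replaced by an approximate nearest-neighbour index.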

Again, we compare DKN with and without knowledge entities (the latter denoted DKN(-)):

| Models | Group-AUC | MRR | NDCG@2 | NDCG@4 |
| :------ | :------: | :------: | :------: | :------: |
| DKN | 0.9557 | 0.8993 | 0.8951 | 0.9123 |
| DKN(-) | 0.9506 | 0.8817 | 0.8758 | 0.8982 |
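
For readers unfamiliar with the ranking metrics in the table, the following sketch computes the reciprocal rank and NDCG@k for a single group with binary labels. It follows the textbook definitions; the exact formulas used by `run_eval` may differ in detail, and the function names and toy values here are assumptions for illustration. The reported numbers would correspond to averaging such per-group values over all validation groups.

```python
import numpy as np


def reciprocal_rank(labels, scores):
    """1 / rank of the first positive item when sorted by descending score."""
    order = np.argsort(scores)[::-1]
    ranked = np.asarray(labels)[order]
    first_hit = np.flatnonzero(ranked)[0]
    return 1.0 / (first_hit + 1)


def ndcg_at_k(labels, scores, k):
    """NDCG@k with binary relevance and a log2 position discount."""
    labels = np.asarray(labels, dtype=np.float64)
    order = np.argsort(scores)[::-1]
    top = labels[order][:k]
    ideal = np.sort(labels)[::-1][:k]
    discounts = 1.0 / np.log2(np.arange(2, len(top) + 2))
    idcg = (ideal * discounts).sum()
    return float((top * discounts).sum() / idcg) if idcg > 0 else 0.0


# One toy group: the positive item (label 1) receives the second-highest score.
labels = [1, 0, 0, 0]
scores = [0.2, 0.9, 0.1, 0.05]
print(reciprocal_rank(labels, scores))  # positive is ranked 2nd
print(ndcg_at_k(labels, scores, 2))
```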

