# Recommender System Demo
This notebook is a simple recommender system demo that can be used for quick tests of various recommendation models.

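utils.py in the same directory contains shared code for data processing and the model training workflow. The dataset folder contains three datasets:
- MovieLens: only user and item ids plus ratings
- Criteo: dozens of categorical and continuous features
- Amazon: includes users' historical behavior features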

Installing the required packages with conda is recommended:
- pytorch
- numpy
- pandas
- scikit-learn (imported as `sklearn`)
- matplotlib


In [1]:
import torch
import torch.nn.functional as F

In [2]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('Training on [{}].'.format(device))

Training on [cpu].


The cell below prepares the MovieLens dataset; it can be swapped for either of the other two datasets.

In [3]:
from utils import create_dataset
dataset = create_dataset('movielens', device=device)
data = dataset.train_valid_test_split()
field_dims, (train_X, train_y), (valid_X, valid_y), (test_X, test_y) = data
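
`create_dataset` lives in utils.py and is not shown here. As a rough illustration of the data layout the rest of the notebook expects, the sketch below builds a tiny synthetic ratings table, derives `field_dims` as the largest id plus one per column, and splits the rows 80/10/10. The helper name `toy_split`, the column names, the rating threshold, and the split ratios are all assumptions for illustration, not necessarily what utils.py does.

```python
import numpy as np
import pandas as pd
import torch

def toy_split(df, train_frac=0.8, valid_frac=0.1, seed=0):
    """Hypothetical stand-in for dataset.train_valid_test_split()."""
    X = torch.tensor(df[['user_id', 'item_id']].values, dtype=torch.long)
    # Binarize ratings into 0/1 labels (the threshold of 4 is an assumption).
    y = torch.tensor((df['rating'] >= 4).values, dtype=torch.float32)
    # One vocabulary size per field: largest id + 1, so every id is a valid embedding index.
    field_dims = (X.max(dim=0).values + 1).tolist()
    # Shuffle once, then cut into three contiguous parts.
    perm = torch.randperm(len(X), generator=torch.Generator().manual_seed(seed))
    n_train = int(len(X) * train_frac)
    n_valid = int(len(X) * valid_frac)
    tr = perm[:n_train]
    va = perm[n_train:n_train + n_valid]
    te = perm[n_train + n_valid:]
    return field_dims, (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])

# Synthetic ratings: 50 users, 200 items, ratings 1..5.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'user_id': rng.integers(0, 50, size=1000),
    'item_id': rng.integers(0, 200, size=1000),
    'rating': rng.integers(1, 6, size=1000),
})
toy_field_dims, (toy_train_X, toy_train_y), (toy_valid_X, toy_valid_y), (toy_test_X, toy_test_y) = toy_split(toy)
print(toy_field_dims, toy_train_X.shape, toy_valid_X.shape, toy_test_X.shape)
```

The returned tuple has the same shape as `data` above: a list with one size per field, followed by three `(X, y)` pairs where each row of `X` holds one integer id per field.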

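The cell below defines a simple model: it concatenates the per-field embeddings, passes them through a single fully connected layer and a sigmoid activation, and returns the result directly.

When implementing other models from papers, modify the model structure here.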

In [18]:
class model(torch.nn.Module):
    def __init__(self, field_dims, embed_dim = 4):
        super(model, self).__init__()
        print(field_dims)
        # One embedding table per field; field_dims[i] is the number of distinct ids in field i.
        self.embedding_list = torch.nn.ModuleList([torch.nn.Embedding(dim, embed_dim) for dim in field_dims])
        print(self.embedding_list)
        self.linear = torch.nn.Linear(len(field_dims)*embed_dim, 1)
    
    def forward(self, X):
        # X: (batch, num_fields) integer ids -> all_emb: (batch, num_fields * embed_dim)
        all_emb = torch.cat([embedding(X[:, i]) for i, embedding in enumerate(self.embedding_list)], dim = 1)
        logit = self.linear(all_emb)
        # torch.sigmoid replaces the deprecated F.sigmoid
        return torch.sigmoid(logit)
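
As one example of swapping in a different architecture, the sketch below is a Factorization Machine with the same constructor and `forward` signature as the model above, so it could be passed to the training cell in its place. The class name `FMSketch` and its layout are illustrative and not part of utils.py.

```python
import torch

class FMSketch(torch.nn.Module):
    def __init__(self, field_dims, embed_dim=4):
        super().__init__()
        # Second-order factors: one embed_dim vector per id in each field.
        self.embedding_list = torch.nn.ModuleList(
            [torch.nn.Embedding(dim, embed_dim) for dim in field_dims])
        # First-order weights: one scalar per id in each field.
        self.linear_list = torch.nn.ModuleList(
            [torch.nn.Embedding(dim, 1) for dim in field_dims])
        self.bias = torch.nn.Parameter(torch.zeros(1))

    def forward(self, X):
        # emb: (batch, num_fields, embed_dim)
        emb = torch.stack([e(X[:, i]) for i, e in enumerate(self.embedding_list)], dim=1)
        # first_order: (batch, 1)
        first_order = sum(l(X[:, i]) for i, l in enumerate(self.linear_list))
        # Pairwise interactions via 0.5 * ((sum v)^2 - sum v^2), summed over embed_dim.
        square_of_sum = emb.sum(dim=1) ** 2
        sum_of_square = (emb ** 2).sum(dim=1)
        second_order = 0.5 * (square_of_sum - sum_of_square).sum(dim=1, keepdim=True)
        return torch.sigmoid(self.bias + first_order + second_order)

# Shape check with the field sizes printed for MovieLens in this notebook.
fm = FMSketch([611, 193610], embed_dim=8)
batch = torch.tensor([[0, 10], [610, 193609]])
fm_out = fm(batch)
print(fm_out.shape)  # torch.Size([2, 1])
```

Because the output is a `(batch, 1)` tensor of probabilities, just like the baseline, the same `BCELoss` criterion applies.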


In [20]:
%%time

from utils import Trainer
EMBEDDING_DIM = 8
LEARNING_RATE = 1e-4
REGULARIZATION = 1e-6
BATCH_SIZE = 4096
EPOCH = 1000
TRIAL = 100

mm = model(field_dims, EMBEDDING_DIM).to(device)
optimizer = torch.optim.Adam(mm.parameters(), lr=LEARNING_RATE, weight_decay=REGULARIZATION)
criterion = torch.nn.BCELoss()
print(train_X.shape)
# print(mm(train_X[0].unsqueeze(0)))
trainer = Trainer(mm, optimizer, criterion, BATCH_SIZE)
trainer.train(train_X, train_y, epoch=EPOCH, trials=TRIAL, valid_X=valid_X, valid_y=valid_y)
test_loss, test_auc = trainer.test(test_X, test_y)
print('test_loss:  {:.5f} | test_auc:  {:.5f}'.format(test_loss, test_auc))


[611, 193610]
ModuleList(
  (0): Embedding(611, 8)
  (1): Embedding(193610, 8)
)
torch.Size([80000, 2])


  2%|▏         | 22/1000 [00:02<02:10,  7.52it/s]


KeyboardInterrupt: 
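
The `Trainer` class comes from utils.py and is not shown in this notebook. The sketch below is one plausible shape for such a loop: mini-batch training with `BCELoss`, validation AUC after each epoch, and early stopping once the AUC has not improved for `trials` consecutive epochs. Reading `trials` as an early-stopping patience is an assumption, as are the names `train_with_early_stopping`, `pairwise_auc`, and `TinyNet`, and the synthetic data. `pairwise_auc` is a small dependency-free stand-in for `sklearn.metrics.roc_auc_score`.

```python
import torch

def pairwise_auc(pred, label):
    # AUC = fraction of (positive, negative) pairs ranked correctly; ties count as half.
    pos, neg = pred[label == 1], pred[label == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).float().mean() + 0.5 * (diff == 0).float().mean()).item()

def train_with_early_stopping(net, optimizer, criterion, train_X, train_y,
                              valid_X, valid_y, batch_size=256, epoch=50, trials=5):
    """Hypothetical loop; returns the best validation AUC seen."""
    best_auc, bad_epochs = 0.0, 0
    for _ in range(epoch):
        net.train()
        perm = torch.randperm(len(train_X))
        for start in range(0, len(train_X), batch_size):
            idx = perm[start:start + batch_size]
            optimizer.zero_grad()
            # Flatten (batch, 1) predictions to match the (batch,) labels.
            loss = criterion(net(train_X[idx]).view(-1), train_y[idx])
            loss.backward()
            optimizer.step()
        net.eval()
        with torch.no_grad():
            auc = pairwise_auc(net(valid_X).view(-1), valid_y)
        if auc > best_auc:
            best_auc, bad_epochs = auc, 0
        else:
            bad_epochs += 1
            if bad_epochs >= trials:
                break  # no improvement for `trials` epochs in a row
    return best_auc

class TinyNet(torch.nn.Module):
    # Same structure as the notebook's baseline: embeddings -> concat -> linear -> sigmoid.
    def __init__(self, field_dims, embed_dim=4):
        super().__init__()
        self.embedding_list = torch.nn.ModuleList(
            [torch.nn.Embedding(d, embed_dim) for d in field_dims])
        self.linear = torch.nn.Linear(len(field_dims) * embed_dim, 1)

    def forward(self, X):
        emb = torch.cat([e(X[:, i]) for i, e in enumerate(self.embedding_list)], dim=1)
        return torch.sigmoid(self.linear(emb))

# Synthetic task: the label depends only on whether the first id is even.
torch.manual_seed(0)
toy_X = torch.stack([torch.randint(0, 20, (2000,)), torch.randint(0, 30, (2000,))], dim=1)
toy_y = (toy_X[:, 0] % 2 == 0).float()
net = TinyNet([20, 30])
opt = torch.optim.Adam(net.parameters(), lr=0.05)
best = train_with_early_stopping(net, opt, torch.nn.BCELoss(),
                                 toy_X[:1600], toy_y[:1600], toy_X[1600:], toy_y[1600:])
print('best valid AUC: {:.3f}'.format(best))
```

Under this reading, `EPOCH = 1000` acts as an upper bound and the run normally ends earlier, once the validation metric stalls for `TRIAL` epochs.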