In [None]:
!pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.10.0+cu113.html

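# Lesson 8: Applications of Graph Deep Learning (Part 3)

In the previous two hands-on sessions we looked at applications of graph deep learning to knowledge graphs and computer vision. In this session we turn to its application in recommender systems.

## GNNs in Recommender Systems

In this section we use LightGCN as the main example of how GNNs are applied to recommender systems.

Note: LightGCN comes from the paper *LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation*; see https://arxiv.org/abs/2002.02126 .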

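## 1. Introducing the Recommender System Dataset

### 1.1 A First Look at MovieLens

MovieLens is a dataset widely used in recommender systems research. As the name suggests, it is a movie dataset: it contains users' ratings of movies, collected through the MovieLens website. It can be downloaded from the official site, https://grouplens.org/datasets/movielens/ , which offers several versions. For this session we use the small one (ml-latest-small.zip), which is only about 1 MB. I have already placed the extracted files under `data/MovieLens/raw/ml-latest-small`: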
 ```
|-- Lecture8.ipynb 
|-- data
    |-- MovieLens
        |-- raw
            |-- ml-latest-small
                |-- README.txt
                |-- links.csv
                |-- movies.csv
                |-- ratings.csv
                |-- tags.csv
 ```
We need two of these files: `movies.csv` and `ratings.csv`.

First we read the table of movie information.

In [None]:
import torch
import pandas as pd
df1 = pd.read_csv('data/MovieLens/raw/ml-latest-small/movies.csv', index_col='movieId')

In [None]:
df1

movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
...,...,...
193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy
193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy
193585,Flint (2017),Drama
193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation


As we can see, the first column is the movie title and the second column lists the movie's genres.

In [None]:
df1['genres'].str.get_dummies('|') # get_dummies() turns the '|'-separated genres into 0/1 indicator columns

movieId,(no genres listed),Action,Adventure,Animation,Children,Comedy,Crime,Documentary,Drama,Fantasy,Film-Noir,Horror,IMAX,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
1,0,0,1,1,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0
2,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0
4,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0
5,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
193581,0,1,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0
193583,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0
193585,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
193587,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


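Note that get_dummies() produces the encoding directly, with one 0/1 column per genre. Because a movie can belong to several genres, each row is strictly a multi-hot vector rather than a one-hot one. For example, the movie with movieId 1 is tagged Adventure, Animation, Children, Comedy, and Fantasy.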

In [None]:
# We can convert the encoding above into a tensor
genres = df1['genres'].str.get_dummies('|').values 
genres = torch.from_numpy(genres).to(torch.float) 
genres

tensor([[0., 0., 1.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 1., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])

In [None]:
# movieId values are not contiguous, so we remap them to contiguous indices starting from zero (used later)
movie_mapping = {idx: i for i, idx in enumerate(df1.index)}
movie_mapping

{1: 0,
 2: 1,
 3: 2,
 4: 3,
 5: 4,
 6: 5,
 7: 6,
 8: 7,
 9: 8,
 10: 9,
 11: 10,
 12: 11,
 13: 12,
 14: 13,
 15: 14,
 16: 15,
 17: 16,
 18: 17,
 19: 18,
 20: 19,
 21: 20,
 22: 21,
 23: 22,
 24: 23,
 25: 24,
 26: 25,
 27: 26,
 28: 27,
 29: 28,
 30: 29,
 31: 30,
 32: 31,
 34: 32,
 36: 33,
 38: 34,
 39: 35,
 40: 36,
 41: 37,
 42: 38,
 43: 39,
 44: 40,
 45: 41,
 46: 42,
 47: 43,
 48: 44,
 49: 45,
 50: 46,
 52: 47,
 53: 48,
 54: 49,
 55: 50,
 57: 51,
 58: 52,
 60: 53,
 61: 54,
 62: 55,
 63: 56,
 64: 57,
 65: 58,
 66: 59,
 68: 60,
 69: 61,
 70: 62,
 71: 63,
 72: 64,
 73: 65,
 74: 66,
 75: 67,
 76: 68,
 77: 69,
 78: 70,
 79: 71,
 80: 72,
 81: 73,
 82: 74,
 83: 75,
 85: 76,
 86: 77,
 87: 78,
 88: 79,
 89: 80,
 92: 81,
 93: 82,
 94: 83,
 95: 84,
 96: 85,
 97: 86,
 99: 87,
 100: 88,
 101: 89,
 102: 90,
 103: 91,
 104: 92,
 105: 93,
 106: 94,
 107: 95,
 108: 96,
 110: 97,
 111: 98,
 112: 99,
 113: 100,
 116: 101,
 117: 102,
 118: 103,
 119: 104,
 121: 105,
 122: 106,
 123: 107,
 125: 108,
 126: 10

Next we read the user ratings table.

In [None]:
df2 = pd.read_csv('data/MovieLens/raw/ml-latest-small/ratings.csv') 

In [None]:
df2

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931
...,...,...,...,...
100831,610,166534,4.0,1493848402
100832,610,168248,5.0,1493850091
100833,610,168250,5.0,1494273047
100834,610,168252,5.0,1493846352


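The user-movie table above corresponds to an interaction matrix whose entries are the ratings users gave to movies. Note that this matrix can be converted into a bipartite graph, as shown in the figure below.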

![](graph.png)
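
To make the matrix-to-graph conversion concrete, here is a minimal sketch on a toy interaction matrix. The matrix `R`, its sizes, and the convention that 0 means "not rated" are assumptions made for illustration; they are not part of the MovieLens data.

```python
import torch

# Toy interaction matrix (hypothetical): 2 users x 3 movies, 0 means "not rated".
R = torch.tensor([[4, 0, 5],
                  [0, 3, 0]])
num_users, num_movies = R.shape

# Every non-zero entry (u, m) becomes one user -> movie edge.
user_idx, movie_idx = R.nonzero(as_tuple=True)

# Movie nodes are numbered after the user nodes, so shift them by num_users.
edge_index = torch.stack([user_idx, movie_idx + num_users])
edge_rating = R[user_idx, movie_idx]  # the rating carried by each edge

print(edge_index)
print(edge_rating)
```

Users keep the node ids `0..num_users-1` and movies take the ids that follow, which is the same offset convention the notebook uses when it builds `movie_dst`.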

In [None]:
# Before converting to a graph, we also remap the user IDs so that they are contiguous and start from zero
user_mapping = {idx: i for i, idx in enumerate(df2['userId'].unique())}
# data['user'].num_nodes = len(user_mapping)

In [None]:
num_users = len(user_mapping)
num_movies = len(movie_mapping)
print('Number of users:', num_users, 'Number of items:', num_movies)

Number of users: 610 Number of items: 9742


In [None]:
user_src = [user_mapping[idx] for idx in df2['userId']] # source nodes (users)
movie_dst = [movie_mapping[idx]+num_users for idx in df2['movieId']] # target nodes (movies, offset by num_users)
edge_index = torch.tensor([user_src, movie_dst])
rating = torch.from_numpy(df2['rating'].values).to(torch.long) # casting to long truncates half-star ratings (e.g. 4.5 -> 4)

In [None]:
print('Edge index:', edge_index)
print('Edge labels:', rating)

Edge index: tensor([[    0,     0,     0,  ...,   609,   609,   609],
        [  610,   612,   615,  ..., 10072, 10073, 10113]])
Edge labels: tensor([4, 4, 4,  ..., 5, 5, 3])


In [None]:
edge_index.shape, rating.shape

(torch.Size([2, 100836]), torch.Size([100836]))

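### 1.2 Building the Graph Dataset

Next we show how to build a MovieLens graph dataset that a graph neural network can consume.

First, what are the labels? Clearly, the labels are the user ratings `rating`: each rating is the label of the corresponding edge in `edge_index`.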

In [None]:
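edge_label_index = torch.arange(len(rating)) # positions of the labeled edges

With `edge_index` we can already represent the user-movie bipartite graph; we could also build a sparse matrix from it (left as an exercise). However, the `edge_index` obtained above is directed (user to movie), so we need to make it undirected (symmetric).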

In [None]:
edge_index

tensor([[    0,     0,     0,  ...,   609,   609,   609],
        [  610,   612,   615,  ..., 10072, 10073, 10113]])

In [None]:
num_nodes = edge_index.max().item() + 1
num_nodes

10352

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
edge_index = edge_index.to(device)
rating = rating.to(device)

In [None]:
def to_undirected(edge_index):
    edge_index_rev = torch.stack([edge_index[1], edge_index[0]]) # reversed edges
    edge_index_sym = torch.cat([edge_index, edge_index_rev], dim=1)
    return edge_index_sym
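
As a quick sanity check of the symmetrization, the following sketch applies the same helper to a toy edge list and verifies that the resulting dense adjacency matrix is symmetric. The toy graph (users 0-1, movies 2-4) is an assumption for illustration only.

```python
import torch

def to_undirected(edge_index):
    # Same helper as in the notebook: append a reversed copy of every edge.
    edge_index_rev = torch.stack([edge_index[1], edge_index[0]])
    return torch.cat([edge_index, edge_index_rev], dim=1)

# Toy bipartite graph (hypothetical): users 0-1, movies 2-4.
toy_edge_index = torch.tensor([[0, 0, 1],
                               [2, 4, 3]])
sym = to_undirected(toy_edge_index)

# Dense adjacency matrix built from the symmetrized edge list.
n = 5
adj = torch.zeros(n, n)
adj[sym[0], sym[1]] = 1.0

print(sym.shape)                  # twice as many columns as the input
print(torch.equal(adj, adj.t()))  # the adjacency matrix equals its transpose
```

This also explains the shapes printed later in the notebook: the undirected edge lists have exactly twice as many columns as the directed ones they were built from.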

Then we need to build the training, validation, and test sets.

In [None]:
_N = len(rating)
indices_perm = torch.randperm(_N)  # random permutation of the edge positions
idx_train = indices_perm[: int(0.8*_N)]  # first 80% for training
train_edge_index = edge_index[:, idx_train]
train_edge_label = rating[idx_train]

idx_val = indices_perm[int(0.8*_N): int(0.9*_N)]  # next 10% for validation
val_edge_index = edge_index[:, idx_val]
val_edge_label = rating[idx_val]

idx_test = indices_perm[int(0.9*_N): ]  # last 10% for testing
test_edge_index = edge_index[:, idx_test]
test_edge_label = rating[idx_test]

In [None]:
train_graph_edge_index = to_undirected(train_edge_index)
test_graph_edge_index = to_undirected(torch.cat([train_edge_index, val_edge_index], dim=1))
train_graph_edge_index.shape, test_graph_edge_index.shape

(torch.Size([2, 161336]), torch.Size([2, 181504]))

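That completes the graph dataset. Some of you may ask: "What are the node features?" For this graph we can use a learnable embedding layer as the node features, which is what the model in Section 2 (LightGCN) does. Others may ask: "Can we use text features from the movie titles and genres?" Certainly, and we cover that approach in Section 3.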
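
A minimal sketch of what "a learnable embedding layer as node features" means, assuming toy sizes (`num_users`, `num_movies`, and `dim` below are hypothetical values, not the notebook's): a single embedding table holds one trainable row per node, with movie rows stored after the user rows.

```python
import torch
from torch.nn import Embedding

num_users, num_movies, dim = 3, 4, 8   # toy sizes (hypothetical)

# One trainable vector per node; users occupy rows 0..num_users-1,
# movies occupy the rows that follow.
embedding = Embedding(num_users + num_movies, dim)

user = 1
movie = 2                                      # movie index before the offset
x_user = embedding.weight[user]                # features of user node 1
x_movie = embedding.weight[num_users + movie]  # features of movie node 2

print(x_user.shape, x_movie.shape)
print(embedding.weight.requires_grad)  # the table is updated by the optimizer
```

Because the table is an ordinary parameter, its rows are learned jointly with the rest of the model rather than computed from movie metadata.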

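## 2. Recommendation with LightGCN

Let $\mathbf{E}^{(k)}$ denote the node embeddings at the $k$-th LightGCN layer, ${\bf A}$ the adjacency matrix of the graph, and $\mathbf{D}$ its diagonal degree matrix. The propagation (neighborhood aggregation) step of LightGCN is: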

$$
\mathbf{E}^{(k+1)}=\left(\mathbf{D}^{-\frac{1}{2}} \mathbf{A D}^{-\frac{1}{2}}\right) \mathbf{E}^{(k)}
$$

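This propagation step has no trainable parameters, unlike GCN, which includes a transformation matrix ${\bf W}$. The final node embedding is a weighted sum of the embeddings from all layers, where the weights $\alpha_{0}=\alpha_{1}=\ldots=\alpha_{K}=\frac{1}{K + 1}$ are constants: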

$$
\begin{aligned}
\mathbf{E} &=\alpha_{0} \mathbf{E}^{(0)}+\alpha_{1} \mathbf{E}^{(1)}+\alpha_{2} \mathbf{E}^{(2)}+\ldots+\alpha_{K} \mathbf{E}^{(K)} \\
&=\alpha_{0} \mathbf{E}^{(0)}+\alpha_{1} \tilde{\mathbf{A}} \mathbf{E}^{(0)}+\alpha_{2} \tilde{\mathbf{A}}^{2} \mathbf{E}^{(0)}+\ldots+\alpha_{K} \tilde{\mathbf{A}}^{K} \mathbf{E}^{(0)}
\end{aligned}
$$
where $\tilde{\mathbf{A}} =\mathbf{D}^{-\frac{1}{2}} \mathbf{A D}^{-\frac{1}{2}}$.
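
The two formulas can be checked numerically with dense matrices. The sketch below is only an illustration on a toy graph (the edge list, `K = 2`, and the embedding size are assumptions); it compares the layer-by-layer propagation with the closed form in powers of $\tilde{\mathbf{A}}$.

```python
import torch

# Toy bipartite graph (hypothetical): users 0-1, movies 2-4.
edges = [(0, 2), (0, 4), (1, 3), (1, 4)]
n, K = 5, 2

# Symmetric adjacency matrix A.
A = torch.zeros(n, n)
for u, m in edges:
    A[u, m] = A[m, u] = 1.0

# Symmetric normalization: A_tilde = D^{-1/2} A D^{-1/2}.
deg = A.sum(dim=1)
d_inv_sqrt = deg.pow(-0.5)
d_inv_sqrt[torch.isinf(d_inv_sqrt)] = 0.0  # guard against isolated nodes
A_tilde = d_inv_sqrt.view(-1, 1) * A * d_inv_sqrt.view(1, -1)

torch.manual_seed(0)
E0 = torch.randn(n, 4)      # layer-0 embeddings
alpha = 1.0 / (K + 1)       # uniform layer weights

# Layer-by-layer propagation, accumulating the weighted sum.
E_k, E = E0, alpha * E0
for _ in range(K):
    E_k = A_tilde @ E_k
    E = E + alpha * E_k

# Closed form: alpha * (I + A_tilde + A_tilde^2) E0 for K = 2.
E_closed = alpha * (E0 + A_tilde @ E0 + A_tilde @ A_tilde @ E0)
print(torch.allclose(E, E_closed, atol=1e-6))
```

The dense version is only practical for tiny graphs; for real data the same computation is done over the sparse edge list.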

Now let's implement LightGCN's propagation step with PyG.

In [None]:
from torch_geometric.nn.conv import MessagePassing
from torch_geometric.nn.conv.gcn_conv import gcn_norm

class LGConv(MessagePassing):
    def __init__(self, normalize=True, **kwargs):
        kwargs.setdefault('aggr', 'add') # aggregate neighbor messages by summation (add)
        super().__init__(**kwargs) # forward kwargs so the 'aggr' setting takes effect
        self.normalize = normalize

    def forward(self, x, edge_index, edge_weight=None):
        """Forward pass: aggregate information from the neighbors."""
        if self.normalize:
            out = gcn_norm(edge_index, edge_weight, x.size(self.node_dim),
                           add_self_loops=False, dtype=x.dtype) # LightGCN does not add self-loops to the graph
            edge_index, edge_weight = out

        return self.propagate(edge_index, x=x, edge_weight=edge_weight, size=None)

    def message(self, x_j, edge_weight):
        """How each neighbor's message is weighted during aggregation."""
        return x_j if edge_weight is None else edge_weight.view(-1, 1) * x_j

In [None]:
from torch.nn import ModuleList, Embedding, Linear, ReLU
import torch.nn.functional as F

class LightGCN(torch.nn.Module):
    def __init__(self, num_nodes, embedding_dim, num_layers):
        super().__init__()

        self.num_nodes = num_nodes
        self.embedding_dim = embedding_dim
        self.num_layers = num_layers

        alpha = 1. / (num_layers + 1)
        self.alpha = torch.tensor([alpha] * (num_layers + 1))

        self.embedding = Embedding(num_nodes, embedding_dim)
        self.convs = ModuleList([LGConv() for _ in range(num_layers)])
        self.decoder = torch.nn.Sequential(Linear(2 * embedding_dim, embedding_dim), 
                                           ReLU(), Linear(embedding_dim, 1))
        self.reset_parameters()

    def reset_parameters(self):
        self.embedding.reset_parameters()

    def get_embedding(self, edge_index):
        x = self.embedding.weight # input features
        out = x * self.alpha[0]

        for i in range(self.num_layers):
            x = self.convs[i](x, edge_index) 
            out = out + x * self.alpha[i + 1]
        return out

    def forward(self, edge_index, edge_label_index):
        """Predict the rating for each node pair."""
        out = self.get_embedding(edge_index) # embedding of every node
        out_src = out[edge_label_index[0]] # embeddings of the source nodes
        out_dst = out[edge_label_index[1]] # embeddings of the target nodes
        pred = self.decoder(torch.cat([out_src, out_dst], dim=-1))
        return pred

In [None]:
# The labels in this dataset are highly imbalanced: most ratings are 3 or 4, and only a few are 0 or 1. We therefore use a weighted MSE loss
weight = torch.bincount(train_edge_label)
weight = weight.max() / weight
weight

tensor([25.9817,  7.7208,  2.6808,  1.0710,  1.0000,  2.6757], device='cuda:0')

In [None]:
def weighted_mse_loss(pred, target, weight=None):
    weight = 1. if weight is None else weight[target].to(pred.dtype)
    return (weight * (pred - target.to(pred.dtype)).pow(2)).mean()
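
To see what the class weights do, here is a small worked example with the same loss function. The toy labels and the constant prediction are assumptions chosen so that the numbers can be checked by hand.

```python
import torch

def weighted_mse_loss(pred, target, weight=None):
    # Same function as in the notebook.
    weight = 1. if weight is None else weight[target].to(pred.dtype)
    return (weight * (pred - target.to(pred.dtype)).pow(2)).mean()

# Toy integer labels (hypothetical): label 1 is the majority class.
labels = torch.tensor([0, 1, 1, 1, 1, 2, 2])
counts = torch.bincount(labels)     # occurrences of each label: 1, 4, 2
weight = counts.max() / counts      # rarer labels get larger weights

pred = torch.ones(7)                # a model that always predicts 1
loss_weighted = weighted_mse_loss(pred, labels, weight)
loss_plain = weighted_mse_loss(pred, labels)

print(weight)
print(float(loss_weighted), float(loss_plain))
```

With these numbers the weights are 4, 1, and 2, so the single error on the rare label 0 contributes as much as the two errors on label 2, and the weighted loss is 8/7 instead of 3/7. One caveat: a label value that never occurs has a count of zero and would receive an infinite weight, so this scheme assumes every label value between 0 and the maximum appears in the training set.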

In [None]:
model = LightGCN(num_nodes=num_nodes, embedding_dim=32, num_layers=10).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def train():
    model.train()
    optimizer.zero_grad()
    pred = model(train_graph_edge_index, train_edge_index)
    target = train_edge_label.view(-1,1)
    loss = weighted_mse_loss(pred, target, weight)
    loss.backward()
    optimizer.step()
    return float(loss)


@torch.no_grad()
def test(graph, dev_edge_index, dev_edge_label):
    model.eval()
    pred = model(graph, dev_edge_index)
    pred = pred.clamp(min=0, max=5) # clamp predictions to the range 0-5
    target = dev_edge_label.view(-1,1).to(pred.dtype) # match pred's float dtype for mse_loss
    rmse = F.mse_loss(pred, target).sqrt()
    return float(rmse)


for epoch in range(1, 301):
    loss = train()
    train_rmse = test(train_graph_edge_index, train_edge_index, train_edge_label)
    val_rmse = test(train_graph_edge_index, val_edge_index, val_edge_label)
    test_rmse = test(test_graph_edge_index, test_edge_index, test_edge_label)
    if epoch % 20 == 0:
        print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, Train: {train_rmse:.4f}, '
              f'Val: {val_rmse:.4f}, Test: {test_rmse:.4f}')

Epoch: 020, Loss: 6.7103, Train: 1.7786, Val: 1.8073, Test: 1.7968
Epoch: 040, Loss: 5.4593, Train: 1.4295, Val: 1.4765, Test: 1.4625
Epoch: 060, Loss: 4.8252, Train: 1.2800, Val: 1.3275, Test: 1.3081
Epoch: 080, Loss: 4.1315, Train: 1.2389, Val: 1.2931, Test: 1.2642
Epoch: 100, Loss: 3.4720, Train: 1.1670, Val: 1.2295, Test: 1.1946
Epoch: 120, Loss: 2.9886, Train: 1.1112, Val: 1.1872, Test: 1.1497
Epoch: 140, Loss: 2.6723, Train: 1.0705, Val: 1.1615, Test: 1.1266
Epoch: 160, Loss: 2.4885, Train: 1.0466, Val: 1.1510, Test: 1.1183
Epoch: 180, Loss: 2.3864, Train: 1.0324, Val: 1.1466, Test: 1.1158
Epoch: 200, Loss: 2.3285, Train: 1.0234, Val: 1.1449, Test: 1.1149
Epoch: 220, Loss: 2.2945, Train: 1.0175, Val: 1.1444, Test: 1.1148
Epoch: 240, Loss: 2.2732, Train: 1.0138, Val: 1.1446, Test: 1.1150
Epoch: 260, Loss: 2.2588, Train: 1.0111, Val: 1.1448, Test: 1.1152
Epoch: 280, Loss: 2.2469, Train: 1.0091, Val: 1.1455, Test: 1.1160
Epoch: 300, Loss: 2.2362, Train: 1.0062, Val: 1.1445, Test: 1.

With the model trained, let's check the quality of the recommendations:

In [None]:
user_id = 0
movies = torch.LongTensor(movie_dst)
movies

tensor([  610,   612,   615,  ..., 10072, 10073, 10113])

In [None]:
edge_index_candidates = torch.vstack([torch.LongTensor(len(movie_dst) * [user_id]), movies])
edge_index_candidates

tensor([[    0,     0,     0,  ...,     0,     0,     0],
        [  610,   612,   615,  ..., 10072, 10073, 10113]])

In [None]:
with torch.no_grad():
    pred = model(test_graph_edge_index, edge_index_candidates).view(-1)

In [None]:
ind = pred.topk(10).indices.cpu() # move the indices to the CPU, where `movies` lives
movie_recommend = movies[ind] - num_users
movie_recommend

tensor([8961, 3505, 3505, 3505, 4595, 4122, 4122, 9131, 8212, 6183])

In [None]:
movie_inverse_mapping = {y: x for x,y in movie_mapping.items()}
final_id = [movie_inverse_mapping[i] for i in movie_recommend.numpy()]

In [None]:
df1.title[final_id].values

array(['Villain (1971)', 'Phantom of the Paradise (1974)',
       'Phantom of the Paradise (1974)', 'Phantom of the Paradise (1974)',
       'Alien Contamination (1980)', 'Android (1982)', 'Android (1982)',
       'Cosmic Scrat-tastrophe (2015)',
       "Craig Ferguson: I'm Here To Help (2013)",
       'Go for Zucker! (Alles auf Zucker!) (2004)'], dtype=object)

We can also wrap this in an interface that serves recommendations.

In [None]:
def recommend(k=10):
    user_id = int(input("Which user would you like recommendations for? Enter the user index:\n"))
    edge_index_candidates = torch.vstack([torch.LongTensor(len(movie_dst) * [user_id]), movies])
    with torch.no_grad():
        pred = model(test_graph_edge_index, edge_index_candidates).view(-1)
    ind = pred.topk(k).indices.cpu() # use k rather than a hard-coded 10
    movie_recommend = movies[ind] - num_users
    final_id = [movie_inverse_mapping[i] for i in movie_recommend.numpy()]
    final_movies = df1.title[final_id].values
    print('The %s movies with the highest predicted ratings for this user are:' % k)
    for ii, m in enumerate(final_movies):
        print('%s Movie: %s' % (ii+1, m))

In [None]:
recommend()

Which user would you like recommendations for? Enter the user index:
0
The 10 movies with the highest predicted ratings for this user are:
1 Movie: Villain (1971)
2 Movie: Phantom of the Paradise (1974)
3 Movie: Phantom of the Paradise (1974)
4 Movie: Phantom of the Paradise (1974)
5 Movie: Alien Contamination (1980)
6 Movie: Android (1982)
7 Movie: Android (1982)
8 Movie: Cosmic Scrat-tastrophe (2015)
9 Movie: Craig Ferguson: I'm Here To Help (2013)
10 Movie: Go for Zucker! (Alles auf Zucker!) (2004)
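
The candidate list used here is built from `movie_dst`, which holds one entry per rating, so a movie with several ratings is scored several times and the same title can recur in the top 10 (as "Phantom of the Paradise" does). The sketch below shows one possible alternative: score each movie once and skip the ones the user has already rated. The toy sizes, the `rated` dictionary, and `fake_model` (a stand-in for the trained LightGCN) are all hypothetical.

```python
import torch

num_users, num_movies = 2, 5      # toy sizes (hypothetical)
rated = {0: [0, 2], 1: [1]}       # movies each user already rated (hypothetical)

def candidate_edges(user_id):
    """Build (user, movie) pairs for every movie the user has not rated yet."""
    seen = torch.zeros(num_movies, dtype=torch.bool)
    seen[rated[user_id]] = True
    # Each unseen movie appears exactly once; shift to graph node ids.
    movies = torch.arange(num_movies)[~seen] + num_users
    users = torch.full_like(movies, user_id)
    return torch.stack([users, movies])

def fake_model(edge_label_index):
    # Stand-in for model(graph, edge_label_index): fixed scores per node id.
    scores = torch.tensor([0.0, 0.0, 1.0, 3.0, 2.0, 5.0, 4.0])
    return scores[edge_label_index[1]]

cand = candidate_edges(0)
pred = fake_model(cand)
k = 2
top = pred.topk(k).indices
recommended = (cand[1, top] - num_users).tolist()  # back to movie indices
print(recommended)
```

In the notebook's setting, the same idea would amount to replacing `movies` with the unique movie node ids and masking the pairs that already appear in the user's rating history.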


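## 3. Recommendation with a Heterogeneous GNN

The LightGCN above treats the user-movie bipartite graph as a homogeneous graph. Below we show how to solve the same problem with PyG's heterogeneous graph neural networks.

We can load the MovieLens graph data directly with PyG. I have already stored the data under the data directory, so the dataset loads without any download.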

In [None]:
import torch
from torch.nn import Linear
import torch.nn.functional as F
import torch_geometric.transforms as T
from datasets import MovieLens, RandomLinkSplit
from torch_geometric.nn import SAGEConv, to_hetero

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
path = 'data/MovieLens'
dataset = MovieLens(path, model_name='all-MiniLM-L6-v2')  # all-MiniLM-L6-v2 is the name of a transformer model
data = dataset[0].to(device)

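Note that while loading the dataset we used a transformer to extract text features from the movie genres and titles, and we use them as node features.

Next we apply some extra preprocessing: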

In [None]:
# Add node features for the user nodes (a one-hot identity matrix)
data['user'].x = torch.eye(data['user'].num_nodes, device=device)
del data['user'].num_nodes

# Add the reverse edges ('movie', 'rev_rates', 'user')
data = T.ToUndirected()(data)
del data['movie', 'rev_rates', 'user'].edge_label  # drop the edge labels on the reverse edges

In [None]:
data

HeteroData(
  movie={ x=[9742, 404] },
  user={ x=[610, 610] },
  (user, rates, movie)={
    edge_index=[2, 100836],
    edge_label=[100836]
  },
  (movie, rev_rates, user)={ edge_index=[2, 100836] }
)

Then we split the dataset:

In [None]:
train_data, val_data, test_data = RandomLinkSplit(
    num_val=0.1,
    num_test=0.1,
    neg_sampling_ratio=0.0,
    edge_types=[('user', 'rates', 'movie')],
    rev_edge_types=[('movie', 'rev_rates', 'user')],
)(data)

In [None]:
train_data

HeteroData(
  movie={ x=[9742, 404] },
  user={ x=[610, 610] },
  (user, rates, movie)={
    edge_index=[2, 80670],
    edge_label=[80670],
    edge_label_index=[2, 80670]
  },
  (movie, rev_rates, user)={ edge_index=[2, 80670] }
)

In [None]:
val_data

HeteroData(
  movie={ x=[9742, 404] },
  user={ x=[610, 610] },
  (user, rates, movie)={
    edge_index=[2, 80670],
    edge_label=[10083],
    edge_label_index=[2, 10083]
  },
  (movie, rev_rates, user)={ edge_index=[2, 80670] }
)

In [None]:
test_data

HeteroData(
  movie={ x=[9742, 404] },
  user={ x=[610, 610] },
  (user, rates, movie)={
    edge_index=[2, 90753],
    edge_label=[10083],
    edge_label_index=[2, 10083]
  },
  (movie, rev_rates, user)={ edge_index=[2, 90753] }
)

In [None]:
# As before, we use a weighted MSE loss
weight = torch.bincount(train_data['user', 'movie'].edge_label)
weight = weight.max() / weight

def weighted_mse_loss(pred, target, weight=None):
    weight = 1. if weight is None else weight[target].to(pred.dtype)
    return (weight * (pred - target.to(pred.dtype)).pow(2)).mean()

We build an autoencoder-style model (a GNN encoder plus an edge decoder) to predict the ratings. Here we use `SAGEConv` (GraphSAGE).

In [None]:
class GNNEncoder(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = SAGEConv((-1, -1), hidden_channels)
        self.conv2 = SAGEConv((-1, -1), out_channels)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        return x


class EdgeDecoder(torch.nn.Module):
    def __init__(self, hidden_channels):
        super().__init__()
        self.lin1 = Linear(2 * hidden_channels, hidden_channels)
        self.lin2 = Linear(hidden_channels, 1)

    def forward(self, z_dict, edge_label_index):
        row, col = edge_label_index
        z = torch.cat([z_dict['user'][row], z_dict['movie'][col]], dim=-1)

        z = self.lin1(z).relu()
        z = self.lin2(z)
        return z.view(-1)


class Model(torch.nn.Module):
    def __init__(self, hidden_channels):
        super().__init__()
        self.encoder = GNNEncoder(hidden_channels, hidden_channels)
        self.encoder = to_hetero(self.encoder, data.metadata(), aggr='sum') # convert the encoder into a model that works on heterogeneous graphs
        self.decoder = EdgeDecoder(hidden_channels)

    def forward(self, x_dict, edge_index_dict, edge_label_index):
        z_dict = self.encoder(x_dict, edge_index_dict)
        return self.decoder(z_dict, edge_label_index)

Now let's train the model.

In [None]:
model = Model(hidden_channels=32).to(device)

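# SAGEConv((-1, -1), ...) infers its input sizes lazily, so run one forward pass to initialize the parameters
with torch.no_grad():
    model.encoder(train_data.x_dict, train_data.edge_index_dict)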

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def train():
    model.train()
    optimizer.zero_grad()
    pred = model(train_data.x_dict, train_data.edge_index_dict,
                 train_data['user', 'movie'].edge_label_index)
    target = train_data['user', 'movie'].edge_label
    loss = weighted_mse_loss(pred, target, weight)
    loss.backward()
    optimizer.step()
    return float(loss)


@torch.no_grad()
def test(data):
    model.eval()
    pred = model(data.x_dict, data.edge_index_dict,
                 data['user', 'movie'].edge_label_index)
    pred = pred.clamp(min=0, max=5)
    target = data['user', 'movie'].edge_label.float()
    rmse = F.mse_loss(pred, target).sqrt()
    return float(rmse)


for epoch in range(1, 301):
    loss = train()
    train_rmse = test(train_data)
    val_rmse = test(val_data)
    test_rmse = test(test_data)
    if epoch % 20 == 0:
        print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, Train: {train_rmse:.4f}, '
              f'Val: {val_rmse:.4f}, Test: {test_rmse:.4f}')

Epoch: 020, Loss: 5.8017, Train: 1.2790, Val: 1.2948, Test: 1.2884
Epoch: 040, Loss: 4.3860, Train: 1.0836, Val: 1.0911, Test: 1.0893
Epoch: 060, Loss: 3.6048, Train: 1.1922, Val: 1.1923, Test: 1.1871
Epoch: 080, Loss: 3.2995, Train: 1.1425, Val: 1.1601, Test: 1.1529
Epoch: 100, Loss: 3.0741, Train: 1.1084, Val: 1.1450, Test: 1.1350
Epoch: 120, Loss: 2.9755, Train: 1.1073, Val: 1.1480, Test: 1.1385
Epoch: 140, Loss: 2.8910, Train: 1.1255, Val: 1.1668, Test: 1.1584
Epoch: 160, Loss: 2.7696, Train: 1.0441, Val: 1.0926, Test: 1.0821
Epoch: 180, Loss: 2.8414, Train: 1.0082, Val: 1.0639, Test: 1.0518
Epoch: 200, Loss: 2.5415, Train: 1.0646, Val: 1.1292, Test: 1.1188
Epoch: 220, Loss: 2.4486, Train: 1.0325, Val: 1.1076, Test: 1.0966
Epoch: 240, Loss: 2.3862, Train: 1.0159, Val: 1.0983, Test: 1.0863
Epoch: 260, Loss: 2.3625, Train: 1.0632, Val: 1.1456, Test: 1.1346
Epoch: 280, Loss: 2.3132, Train: 0.9968, Val: 1.0900, Test: 1.0756
Epoch: 300, Loss: 2.3369, Train: 0.9690, Val: 1.0653, Test: 1.