The following is from [this article](https://medium.com/towards-data-science/hands-on-graph-neural-networks-with-pytorch-pytorch-geometric-359487e221a8) on Medium and [this GitHub repo](https://github.com/khuangaf/Pytorch-Geometric-YooChoose).

You will learn how to construct your own GNN with PyTorch Geometric, and how to use GNN to solve a real-world problem (Recsys Challenge 2015).

In this blog post, we will be using PyTorch and PyTorch Geometric (PyG), a Graph Neural Network framework built on top of PyTorch that runs blazingly fast. It is several times faster than the most well-known GNN framework, DGL.

Aside from its remarkable speed, PyG comes with a collection of well-implemented GNN models illustrated in various papers. Therefore, it would be very handy to reproduce the experiments with PyG.

Given its advantage in speed and convenience, without a doubt, PyG is one of the most popular and widely used GNN libraries. Let’s dive into the topic and get our hands dirty!

# 1. PyTorch Geometric Basics

In [1]:
import csv
import os
import pickle
import warnings

import numpy as np
import pandas as pd
import torch
from torch_geometric.data import Data, InMemoryDataset

warnings.filterwarnings("ignore", category=DeprecationWarning)

This section will walk you through the basics of PyG. Essentially, it will cover `torch_geometric.data` and `torch_geometric.nn`. You will learn how to pass geometric data into your GNN, and how to design a custom MessagePassing layer, the core of GNN.

## 1.1. Data

The `torch_geometric.data` module contains a `Data` class that allows you to create graphs from your data very easily. You only need to specify:

1. the attributes/features associated with each node
2. the connectivity/adjacency of each node (edge index)

Let’s use the following graph to demonstrate how to create a `Data` object:

<img src="example_graph.webp" style="width:400px;height:300px;background-color:white">

So there are 4 nodes in the graph, v0 … v3, each of which is associated with a 2-dimensional feature vector, and a label y indicating its class. These two can be represented as FloatTensors:

In [2]:
# Features
x = torch.tensor([[2, 1], [5, 6], [3, 7], [12, 0]], dtype=torch.float)
# Class
y = torch.tensor([0, 1, 0, 1], dtype=torch.float)

The graph connectivity (edge index) should conform to the COO format, i.e. the first list contains the indices of the source nodes, while the indices of the target nodes are specified in the second list.

In [3]:
edge_index = torch.tensor([[0, 1, 2, 0, 3], [1, 0, 1, 3, 2]], dtype=torch.long)

Note that the order in which the edges are listed in the edge index is irrelevant to the `Data` object you create, since this information is only used to compute the adjacency matrix. Therefore, the above `edge_index` expresses the same information as the following one.

In [4]:
edge_index = torch.tensor([[0, 2, 1, 0, 3], [3, 1, 0, 1, 2]], dtype=torch.long)
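
To see why the column order does not matter, here is a small NumPy sketch (not part of the original notebook; `to_adjacency` is a hypothetical helper introduced for illustration) that expands both `edge_index` variants into a dense adjacency matrix and checks that they agree.

```python
import numpy as np


def to_adjacency(edge_index, num_nodes):
    """Expand a COO edge index of shape [2, E] into a dense [N, N] matrix."""
    adj = np.zeros((num_nodes, num_nodes), dtype=int)
    src, dst = edge_index
    adj[src, dst] = 1  # one entry per (source, target) column
    return adj


# The two orderings used in the cells above.
edge_index_a = np.array([[0, 1, 2, 0, 3], [1, 0, 1, 3, 2]])
edge_index_b = np.array([[0, 2, 1, 0, 3], [3, 1, 0, 1, 2]])

adj_a = to_adjacency(edge_index_a, num_nodes=4)
adj_b = to_adjacency(edge_index_b, num_nodes=4)
print(np.array_equal(adj_a, adj_b))  # True
```

Both variants list the same five directed edges, so the resulting matrices are identical.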

Putting them together, we can create a `Data` object as shown below:

In [5]:
data = Data(x=x, y=y, edge_index=edge_index)

In [6]:
data

Data(x=[4, 2], edge_index=[2, 5], y=[4])

## 1.2. Dataset

The dataset creation procedure is not very straightforward, but it may seem familiar to those who’ve used torchvision, as PyG follows its convention. PyG provides two different types of dataset classes, **InMemoryDataset** and **Dataset**. As the names suggest, the former is for data that fits in your RAM, while the latter is for much larger data. Since their implementations are quite similar, I will only cover InMemoryDataset.

To create an InMemoryDataset object, there are 4 functions you need to implement:

- `raw_file_names()`

It returns a list of raw, unprocessed file names. If you only have one file, the returned list should contain a single element. In fact, you can simply return an empty list and specify your file later in `process()`.

- `processed_file_names()`

Similar to the last function, it returns a list containing the file names of all the processed data. Usually, the returned list has only one element: the name of the single processed data file written by `process()`.

- `download()`

This function should download the data you are working on to the directory specified in `self.raw_dir`. If you don’t need to download data, simply put `pass` in the function body.

- `process()`

This is the **most important method of Dataset**. You need to gather your data into a list of `Data` objects. Then, call `self.collate()` to compute the slices that will be used by the DataLoader object. The following shows an example of a custom dataset from the [PyG official website](https://pytorch-geometric.readthedocs.io/en/latest/tutorial/create_dataset.html).

In [7]:
import torch
from torch_geometric.data import InMemoryDataset, download_url


class MyOwnDataset(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None, pre_filter=None):
        super().__init__(root, transform, pre_transform, pre_filter)
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        return ["some_file_1", "some_file_2", ...]

    @property
    def processed_file_names(self):
        return ["data.pt"]

    def download(self):
        # Download to `self.raw_dir`.
        download_url(url, self.raw_dir)
        ...

    def process(self):
        # Read data into huge `Data` list.
        data_list = [...]

        if self.pre_filter is not None:
            data_list = [data for data in data_list if self.pre_filter(data)]

        if self.pre_transform is not None:
            data_list = [self.pre_transform(data) for data in data_list]

        data, slices = self.collate(data_list)
        # Because saving a huge python list is really slow, we collate the list into one huge
        # torch_geometric.data.Data object via torch_geometric.data.InMemoryDataset.collate()
        # before saving . The collated data object has concatenated all examples into one big data object
        # and, in addition, returns a slices dictionary to reconstruct single examples from this object.
        # Finally, we need to load these two objects in the constructor into the properties self.data
        # and self.slices.

        torch.save((data, slices), self.processed_paths[0])
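
The comment inside `process()` describes collating many small objects into one big object plus a dictionary of slices. The NumPy sketch below illustrates that idea only; it is not PyG's implementation, and `collate` and `get_sample` here are hypothetical stand-ins that handle a single array per sample.

```python
import numpy as np


def collate(arrays):
    """Concatenate per-sample arrays and record where each sample starts and ends."""
    lengths = [len(a) for a in arrays]
    slices = np.concatenate([[0], np.cumsum(lengths)])
    return np.concatenate(arrays, axis=0), slices


def get_sample(big, slices, i):
    """Recover sample i from the concatenated array."""
    return big[slices[i]:slices[i + 1]]


# Three toy "graphs" with 3, 2 and 4 nodes, one feature per node.
samples = [
    np.arange(3).reshape(-1, 1),
    np.arange(10, 12).reshape(-1, 1),
    np.arange(20, 24).reshape(-1, 1),
]
big, slices = collate(samples)
print(slices)                      # [0 3 5 9]
print(get_sample(big, slices, 1))  # the two rows of the second sample
```

Storing one large array and a small index is what makes saving and loading fast compared with pickling a long Python list.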

## 1.3. DataLoader

The `DataLoader` class allows you to feed data by batch into the model effortlessly. To create a `DataLoader` object, you simply specify the `Dataset` and the `batch_size` you want.

Every iteration of a `DataLoader` object yields a `Batch` object, which is very much like a `Data` object but with an additional attribute, `batch`. It indicates which graph each node belongs to. Since a `DataLoader` aggregates `x`, `y`, and `edge_index` from different samples/graphs into batches, the GNN model needs this `batch` information to know which nodes belong to the same graph within a batch.
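
As a rough illustration of what such a batch contains, the NumPy sketch below merges two toy graphs into one disconnected graph and builds the corresponding batch vector. `batch_graphs` is a hypothetical helper written for this post, not the PyG implementation.

```python
import numpy as np


def batch_graphs(graphs):
    """Merge (x, edge_index) pairs into one disconnected graph plus a batch vector."""
    xs, edges, batch = [], [], []
    offset = 0
    for graph_id, (x, edge_index) in enumerate(graphs):
        xs.append(x)
        edges.append(edge_index + offset)        # shift node ids past earlier graphs
        batch.append(np.full(len(x), graph_id))  # graph id for every node
        offset += len(x)
    return np.concatenate(xs), np.concatenate(edges, axis=1), np.concatenate(batch)


g0 = (np.array([[1.0], [2.0], [3.0]]), np.array([[0, 1], [1, 2]]))  # 3 nodes, 2 edges
g1 = (np.array([[4.0], [5.0]]), np.array([[0], [1]]))               # 2 nodes, 1 edge

x, edge_index, batch = batch_graphs([g0, g1])
print(batch)       # [0 0 0 1 1]
print(edge_index)  # [[0 1 3]
                   #  [1 2 4]]
```

The edge of the second graph becomes (3, 4) because its node ids are shifted by the three nodes of the first graph.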

## 1.4. MessagePassing

Message passing is the essence of GNNs: it describes how node embeddings are learned. I have talked about it in my last post, so I will just briefly run through this with terms that conform to the PyG documentation.

<img src="message_passing.webp" style="width:500px;height:50px;background-color:white">

`x` denotes the node embeddings, `e` denotes the edge features, `𝜙` denotes the message function, `□` denotes the aggregation function, and `𝛾` denotes the update function. If the edges in the graph have no feature other than connectivity, `e` is essentially the edge index of the graph. The superscript represents the index of the layer. When `k=1`, the `x` fed into the layer (superscript `k-1 = 0`) is the input feature of each node. Below I will illustrate how each function works:

- `propagate(edge_index, size=None, **kwargs)`:
It takes in the edge index and other optional information, such as node features (embeddings). Calling this function in turn calls `message` and `update`, with the aggregation applied in between.

- `message(**kwargs)`:
You specify how to construct the “message” for each node pair (x_i, x_j). Since it is called by `propagate`, it can take any argument passed to `propagate`. One thing to note is that the suffixes “_i” and “_j” map an argument to specific nodes: by default, `x_j` holds the features of the source (neighboring) node of each edge and `x_i` those of the target node. Therefore, you must be very careful when naming the arguments of this function.

- `update(aggr_out, **kwargs)`:
It takes in the aggregated message and other arguments passed into `propagate`, and assigns a new embedding value to each node.
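
The three steps above can be sketched in plain NumPy. This is a conceptual illustration with max aggregation, not PyG's code: the `propagate` function below is a hypothetical stand-in, and setting the aggregate to 0 for nodes without incoming messages is an assumption of the sketch.

```python
import numpy as np


def propagate(x, edge_index, message, update):
    """One message passing step with max aggregation (conceptual sketch)."""
    src, dst = edge_index
    msgs = message(x[src])                 # message: one row per edge, shape [E, F]
    aggr = np.full((x.shape[0], msgs.shape[1]), -np.inf)
    np.maximum.at(aggr, dst, msgs)         # aggregation: max over incoming messages
    aggr[np.isinf(aggr)] = 0.0             # assumption: nodes without messages get 0
    return update(aggr, x)                 # update: combine with current embedding


x = np.array([[2.0, 1.0], [5.0, 6.0], [3.0, 7.0], [12.0, 0.0]])
edge_index = np.array([[0, 1, 2, 0, 3], [1, 0, 1, 3, 2]])

out = propagate(
    x,
    edge_index,
    message=lambda x_j: x_j,                                   # identity message
    update=lambda aggr, x: np.concatenate([aggr, x], axis=1),  # keep both parts
)
print(out[:, :2])  # aggregated rows: [5, 6], [3, 7], [12, 0], [2, 1]
```

Node 1 receives messages from nodes 0 and 2, so its aggregate is the element-wise maximum of `[2, 1]` and `[3, 7]`.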

### Example

Let’s see how we can implement a SageConv layer from the paper [“Inductive Representation Learning on Large Graphs”](https://arxiv.org/abs/1706.02216). The message passing formula of SageConv is defined as:

<img src="equ_1.webp" style="width:500px;height:100px;background-color:white">

Here, we use max pooling as the aggregation method. Therefore, the right-hand side of the first line can be written as:

<img src="equ_2.webp" style="width:500px;height:50px;background-color:white">

which illustrates how the “message” is constructed. Each neighboring node embedding is multiplied by a weight matrix, a bias is added, and the result is passed through an activation function. This can be easily done with `torch.nn.Linear`.

In [8]:
from torch_geometric.nn import MessagePassing


class SAGEConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super(SAGEConv, self).__init__(aggr="max")
        self.lin = torch.nn.Linear(in_channels, out_channels)
        self.act = torch.nn.ReLU()

    def message(self, x_j):
        # x_j has shape [E, in_channels]

        x_j = self.lin(x_j)
        x_j = self.act(x_j)

        return x_j

As for the update part, the aggregated message and the current node embedding are concatenated. The result is then multiplied by another weight matrix and passed through another activation function.

In [9]:
class SAGEConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super(SAGEConv, self).__init__(aggr="max")
        self.update_lin = torch.nn.Linear(
            in_channels + out_channels, in_channels, bias=False
        )
        self.update_act = torch.nn.ReLU()

    def update(self, aggr_out, x):
        # aggr_out has shape [N, out_channels]

        new_embedding = torch.cat([aggr_out, x], dim=1)
        new_embedding = self.update_lin(new_embedding)
        new_embedding = self.update_act(new_embedding)

        return new_embedding

Putting it together, we have the following SageConv layer.

In [10]:
import torch
from torch.nn import Linear, ReLU
from torch.nn import Sequential as Seq
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import add_self_loops, remove_self_loops

In [11]:
class SAGEConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super(SAGEConv, self).__init__(aggr="max")  # "Max" aggregation.
        self.lin = torch.nn.Linear(in_channels, out_channels)
        self.act = torch.nn.ReLU()
        self.update_lin = torch.nn.Linear(
            in_channels + out_channels, in_channels, bias=False
        )
        self.update_act = torch.nn.ReLU()

    def forward(self, x, edge_index):
        # x has shape [N, in_channels]
        # edge_index has shape [2, E]

        edge_index, _ = remove_self_loops(edge_index)
        edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))

        return self.propagate(edge_index, size=(x.size(0), x.size(0)), x=x)

    def message(self, x_j):
        # x_j has shape [E, in_channels]

        x_j = self.lin(x_j)
        x_j = self.act(x_j)

        return x_j

    def update(self, aggr_out, x):
        # aggr_out has shape [N, out_channels]

        new_embedding = torch.cat([aggr_out, x], dim=1)

        new_embedding = self.update_lin(new_embedding)
        new_embedding = self.update_act(new_embedding)

        return new_embedding

# 2. A Real-World Example — RecSys Challenge 2015

The RecSys Challenge 2015 challenged data scientists to build a session-based recommender system. Participants in this challenge were asked to solve two tasks:

1. Predict whether a sequence of clicks will be followed by a buy event
2. Predict which item will be bought

First, we download the data from the official website of RecSys Challenge 2015 and construct a Dataset. We’ll start with the first task as that one is easier.

The challenge provides two main sets of data, *yoochoose-clicks.dat*, and *yoochoose-buys.dat*, containing click events and buy events, respectively.

## 2.1. Preprocessing

After downloading the data, we preprocess it so that it can be fed to our model. The `item_id` values are categorically encoded so that the encoded ids, which will later be used to index an embedding matrix, start at 0.
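
As a small illustration of what this encoding does, the NumPy sketch below maps a few raw ids (taken from the first rows of the click data) to contiguous codes. For integer ids, `np.unique(..., return_inverse=True)` yields the same kind of sorted-unique mapping; it is used here as a stand-in and is not the call made in the notebook.

```python
import numpy as np

raw_item_ids = np.array([214536502, 214536500, 214536506, 214577561, 214536502])

# `classes` holds the sorted unique ids; `encoded` gives each id's position in it.
classes, encoded = np.unique(raw_item_ids, return_inverse=True)
print(encoded)           # [1 0 2 3 1]
print(classes[encoded])  # recovers the raw ids
```

The codes run from 0 to the number of distinct items minus one, which is exactly the index range an embedding matrix expects.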

In [12]:
from sklearn.preprocessing import LabelEncoder

In [13]:
df = pd.read_csv("data/yoochoose-clicks.dat", header=None)
df.columns = ["session_id", "timestamp", "item_id", "category"]

In [14]:
df

| | session_id | timestamp | item_id | category |
| --- | --- | --- | --- | --- |
| 0 | 1 | 2014-04-07T10:51:09.277Z | 214536502 | 0 |
| 1 | 1 | 2014-04-07T10:54:09.868Z | 214536500 | 0 |
| 2 | 1 | 2014-04-07T10:54:46.998Z | 214536506 | 0 |
| 3 | 1 | 2014-04-07T10:57:00.306Z | 214577561 | 0 |
| 4 | 2 | 2014-04-07T13:56:37.614Z | 214662742 | 0 |
| ... | ... | ... | ... | ... |
| 33003939 | 11299809 | 2014-09-25T09:33:22.412Z | 214819412 | S |
| 33003940 | 11299809 | 2014-09-25T09:43:52.821Z | 214830939 | S |
| 33003941 | 11299811 | 2014-09-24T19:02:09.741Z | 214854855 | S |
| 33003942 | 11299811 | 2014-09-24T19:02:11.894Z | 214854838 | S |


In [15]:
buy_df = pd.read_csv("data/yoochoose-buys.dat", header=None)
buy_df.columns = ["session_id", "timestamp", "item_id", "price", "quantity"]

In [16]:
buy_df

| | session_id | timestamp | item_id | price | quantity |
| --- | --- | --- | --- | --- | --- |
| 0 | 420374 | 2014-04-06T18:44:58.314Z | 214537888 | 12462 | 1 |
| 1 | 420374 | 2014-04-06T18:44:58.325Z | 214537850 | 10471 | 1 |
| 2 | 281626 | 2014-04-06T09:40:13.032Z | 214535653 | 1883 | 1 |
| 3 | 420368 | 2014-04-04T06:13:28.848Z | 214530572 | 6073 | 1 |
| 4 | 420368 | 2014-04-04T06:13:28.858Z | 214835025 | 2617 | 1 |
| ... | ... | ... | ... | ... | ... |
| 1150748 | 11368701 | 2014-09-26T07:52:51.357Z | 214849809 | 554 | 2 |
| 1150749 | 11368691 | 2014-09-25T09:37:44.206Z | 214700002 | 6806 | 5 |
| 1150750 | 11523941 | 2014-09-25T06:14:47.965Z | 214578011 | 14556 | 1 |
| 1150751 | 11423202 | 2014-09-26T18:49:34.024Z | 214849164 | 1046 | 1 |


In [17]:
item_encoder = LabelEncoder()

In [18]:
df["item_id"] = item_encoder.fit_transform(df.item_id)
df

| | session_id | timestamp | item_id | category |
| --- | --- | --- | --- | --- |
| 0 | 1 | 2014-04-07T10:51:09.277Z | 2053 | 0 |
| 1 | 1 | 2014-04-07T10:54:09.868Z | 2052 | 0 |
| 2 | 1 | 2014-04-07T10:54:46.998Z | 2054 | 0 |
| 3 | 1 | 2014-04-07T10:57:00.306Z | 9876 | 0 |
| 4 | 2 | 2014-04-07T13:56:37.614Z | 19448 | 0 |
| ... | ... | ... | ... | ... |
| 33003939 | 11299809 | 2014-09-25T09:33:22.412Z | 39347 | S |
| 33003940 | 11299809 | 2014-09-25T09:43:52.821Z | 42548 | S |
| 33003941 | 11299811 | 2014-09-24T19:02:09.741Z | 50432 | S |
| 33003942 | 11299811 | 2014-09-24T19:02:11.894Z | 50425 | S |


Since the data is quite large, we subsample it for easier demonstration.

In [19]:
# randomly sample a couple of them
sampled_session_id = np.random.choice(df.session_id.unique(), 1000000, replace=False)
# .copy() makes df an independent frame, so adding columns later does not touch a slice
df = df.loc[df.session_id.isin(sampled_session_id)].copy()
df.nunique()

session_id    1000000
timestamp     3561375
item_id         35755
category          237
dtype: int64

To determine the ground truth, i.e. whether there is any buy event for a given session, we simply check whether a session_id in yoochoose-clicks.dat is also present in yoochoose-buys.dat.
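
The same membership test can be sketched on toy arrays with NumPy (the session ids below are made up for illustration):

```python
import numpy as np

click_session_ids = np.array([1, 1, 2, 3, 3, 3])  # one entry per click
buy_session_ids = np.array([3, 7])                # sessions with at least one purchase

# Every click inherits the label of its session.
label = np.isin(click_session_ids, buy_session_ids)
print(label.tolist())  # [False, False, False, True, True, True]
```

All clicks of session 3 are labelled `True`, which is why the dataset later reads the label from the first row of each session group.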

In [20]:
df["label"] = df.session_id.isin(buy_df.session_id)
df.head()

| | session_id | timestamp | item_id | category | label |
| --- | --- | --- | --- | --- | --- |
| 10 | 3 | 2014-04-02T13:17:46.940Z | 28989 | 0 | False |
| 11 | 3 | 2014-04-02T13:26:02.515Z | 35310 | 0 | False |
| 12 | 3 | 2014-04-02T13:30:12.318Z | 43178 | 0 | False |
| 61 | 23 | 2014-04-04T07:43:18.598Z | 29373 | 0 | False |
| 62 | 23 | 2014-04-04T07:49:04.634Z | 29373 | 0 | False |


In [21]:
df["label"].value_counts()

label
False    3208103
True      354202
Name: count, dtype: int64

## 2.2. Dataset Construction

The data is ready to be transformed into a Dataset object after the preprocessing step. Here, we treat each item in a session as a node, and therefore all items in the same session form a graph. To build the dataset, we group the preprocessed data by session_id and iterate over these groups. In each iteration, the item_id in each group are categorically encoded again since for each graph, the node index should count from 0. Thus, we have the following:
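
The per-session construction can be sketched on a toy click sequence. The helper `session_to_graph` below is hypothetical and uses NumPy instead of pandas and `LabelEncoder`, but it follows the same idea: re-index the items inside the session and connect each click to the next one.

```python
import numpy as np


def session_to_graph(item_ids):
    """Turn the ordered clicks of one session into node features and an edge index."""
    nodes, local_ids = np.unique(item_ids, return_inverse=True)  # node ids 0..n-1
    source = local_ids[:-1]  # each click ...
    target = local_ids[1:]   # ... points to the click that follows it
    return nodes.reshape(-1, 1), np.stack([source, target])


# A session that clicks item 50, then 20, then 50 again, then 70.
x, edge_index = session_to_graph(np.array([50, 20, 50, 70]))
print(x.ravel())   # [20 50 70]
print(edge_index)  # [[1 0 1]
                   #  [0 1 2]]
```

The repeated click on item 50 does not create a new node; it only adds edges to and from the existing node 1.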

In [22]:
import torch
from torch_geometric.data import InMemoryDataset
from tqdm import tqdm

In [23]:
class YooChooseBinaryDataset(InMemoryDataset):
    def __init__(self, root, transform=None, pre_transform=None):
        super(YooChooseBinaryDataset, self).__init__(root, transform, pre_transform)
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        return []

    @property
    def processed_file_names(self):
        return ["yoochoose_click_binary_1M_sess.dataset"]

    def download(self):
        pass

    def process(self):
        data_list = []

        # process by session_id
        grouped = df.groupby("session_id")
        for session_id, group in tqdm(grouped):
            sess_item_id = LabelEncoder().fit_transform(group.item_id)
            group = group.reset_index(drop=True)
            group["sess_item_id"] = sess_item_id
            node_features = (
                group.loc[group.session_id == session_id, ["sess_item_id", "item_id"]]
                .sort_values("sess_item_id")
                .item_id.drop_duplicates()
                .values
            )

            node_features = torch.LongTensor(node_features).unsqueeze(1)
            target_nodes = group.sess_item_id.values[1:]
            source_nodes = group.sess_item_id.values[:-1]

            # Stack into one array first; building a tensor from a list of arrays is slow.
            edge_index = torch.tensor(
                np.array([source_nodes, target_nodes]), dtype=torch.long
            )
            x = node_features

            y = torch.FloatTensor([group.label.values[0]])

            data = Data(x=x, edge_index=edge_index, y=y)
            data_list.append(data)

        data, slices = self.collate(data_list)
        torch.save((data, slices), self.processed_paths[0])

After building the dataset, we split it into three sets for training, validation, and testing. Calling `dataset = dataset.shuffle()` before the split randomizes the order of the sessions; the cells below split the dataset in its stored order.

In [24]:
dataset = YooChooseBinaryDataset(root="data/")

In [25]:
train_dataset = dataset[:800000]
val_dataset = dataset[800000:900000]
test_dataset = dataset[900000:]
len(train_dataset), len(val_dataset), len(test_dataset)

(800000, 100000, 100000)

In [26]:
dataset[3]

Data(x=[3, 1], edge_index=[2, 2], y=[1])

## 2.3. Build a Graph Neural Network

The following custom GNN is based on one of the examples in PyG’s official GitHub repository. I replaced the GraphConv layer with our self-implemented SAGEConv layer illustrated above. In addition, the output layer was modified to match a binary classification setup.

In [27]:
embed_dim = 128

In [28]:
import torch.nn.functional as F
from torch_geometric.nn import TopKPooling
from torch_geometric.nn import global_max_pool as gmp
from torch_geometric.nn import global_mean_pool as gap

In [29]:
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        self.conv1 = SAGEConv(embed_dim, 128)
        self.pool1 = TopKPooling(128, ratio=0.8)
        self.conv2 = SAGEConv(128, 128)
        self.pool2 = TopKPooling(128, ratio=0.8)
        self.conv3 = SAGEConv(128, 128)
        self.pool3 = TopKPooling(128, ratio=0.8)
        self.item_embedding = torch.nn.Embedding(
            num_embeddings=df.item_id.max() + 1, embedding_dim=embed_dim
        )
        self.lin1 = torch.nn.Linear(256, 128)
        self.lin2 = torch.nn.Linear(128, 64)
        self.lin3 = torch.nn.Linear(64, 1)
        self.bn1 = torch.nn.BatchNorm1d(128)
        self.bn2 = torch.nn.BatchNorm1d(64)
        self.act1 = torch.nn.ReLU()
        self.act2 = torch.nn.ReLU()

    def forward(self, data):
        x, edge_index, batch = data.x, data.edge_index, data.batch
        x = self.item_embedding(x)
        x = x.squeeze(1)

        x = F.relu(self.conv1(x, edge_index))

        x, edge_index, _, batch, _, _ = self.pool1(x, edge_index, None, batch)
        x1 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)

        x = F.relu(self.conv2(x, edge_index))

        x, edge_index, _, batch, _, _ = self.pool2(x, edge_index, None, batch)
        x2 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)

        x = F.relu(self.conv3(x, edge_index))

        x, edge_index, _, batch, _, _ = self.pool3(x, edge_index, None, batch)
        x3 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)

        x = x1 + x2 + x3

        x = self.lin1(x)
        x = self.act1(x)
        x = self.lin2(x)
        x = self.act2(x)
        x = F.dropout(x, p=0.5, training=self.training)

        x = torch.sigmoid(self.lin3(x)).squeeze(1)

        return x
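
The readout `torch.cat([gmp(x, batch), gap(x, batch)], dim=1)` used after each pooling layer turns a variable number of node embeddings into one fixed-size vector per graph. Below is a NumPy sketch of that idea on toy data; `readout` is a hypothetical helper, not the PyG functions themselves.

```python
import numpy as np


def readout(x, batch):
    """Concatenate per-graph max and mean of node embeddings: [num_graphs, 2 * F]."""
    rows = []
    for graph_id in np.unique(batch):
        nodes = x[batch == graph_id]  # embeddings of the nodes in this graph
        rows.append(np.concatenate([nodes.max(axis=0), nodes.mean(axis=0)]))
    return np.stack(rows)


x = np.array([[1.0, 4.0], [3.0, 2.0], [5.0, 0.0], [2.0, 2.0], [4.0, 6.0]])
batch = np.array([0, 0, 0, 1, 1])  # first three nodes form graph 0, last two graph 1

print(readout(x, batch))
# graph 0: max [5, 4], mean [3, 2]; graph 1: max [4, 6], mean [3, 4]
```

With 128 channels per node, concatenating max and mean gives 256 features per graph, which matches the input size of `lin1` in the network above.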

## 2.4. Training

Training our custom GNN is very easy: we simply iterate over the DataLoader constructed from the training set and back-propagate the loss. Here, we use Adam as the optimizer with the learning rate set to 0.005 and Binary Cross Entropy as the loss function.

In [30]:
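# In PyG 2.x the loaders live in torch_geometric.loader;
# torch_geometric.data.DataLoader is deprecated there.
from torch_geometric.loader import DataLoader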

In [31]:
def train():
    model.train()

    loss_all = 0
    for data in train_loader:
        data = data.to(device)
        optimizer.zero_grad()
        output = model(data)
        label = data.y.to(device)
        loss = crit(output, label)
        loss.backward()
        loss_all += data.num_graphs * loss.item()
        optimizer.step()
    return loss_all / len(train_dataset)
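
The `train()` function above multiplies each batch loss by `data.num_graphs` before summing and divides by the dataset size at the end. The NumPy sketch below, with made-up predictions and a hand-written `bce` helper, shows why: the weighted sum reproduces the mean loss over all samples even when the last batch is smaller.

```python
import numpy as np


def bce(pred, target):
    """Mean binary cross entropy between probabilities and 0/1 targets."""
    return float(np.mean(-(target * np.log(pred) + (1 - target) * np.log(1 - pred))))


pred = np.array([0.9, 0.2, 0.6, 0.1, 0.7])
target = np.array([1.0, 0.0, 1.0, 0.0, 0.0])

# Split into a batch of 3 and a final, smaller batch of 2.
batches = [(pred[:3], target[:3]), (pred[3:], target[3:])]
loss_all = sum(len(p) * bce(p, t) for p, t in batches)

print(np.isclose(loss_all / len(pred), bce(pred, target)))  # True
```

Simply averaging the two batch means would overweight the samples in the smaller batch.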

In [32]:
batch_size = 1024

In [33]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # fall back to CPU
model = Net().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005)
crit = torch.nn.BCELoss()
train_loader = DataLoader(train_dataset, batch_size=batch_size)



## 2.5. Validation

The labels are highly imbalanced, with an overwhelming majority of negatives, since most sessions are not followed by any buy event. In other words, a dumb model guessing all negatives would give you above 90% accuracy. Therefore, instead of accuracy, Area Under the ROC Curve (AUC) is a better metric for this task, as it only cares whether the positive examples are scored higher than the negative examples. We use the off-the-shelf AUC function from scikit-learn.
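
To make the contrast between accuracy and AUC concrete, here is a small NumPy sketch on made-up scores. `pairwise_auc` is a hypothetical helper that computes AUC as the fraction of (positive, negative) pairs ranked correctly; it compares every pair, so it is only meant for tiny examples.

```python
import numpy as np


def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


labels = np.array([0, 0, 0, 0, 1])
all_negative = np.zeros(5)                    # the "dumb" model: same score for everyone
ranked = np.array([0.1, 0.2, 0.3, 0.8, 0.7])  # a model that actually ranks sessions

print(np.mean((all_negative > 0.5) == labels))  # 0.8 accuracy
print(pairwise_auc(labels, all_negative))       # 0.5
print(pairwise_auc(labels, ranked))             # 0.75
```

The all-negative model looks good on accuracy but gets an AUC of 0.5, the same as random ranking.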

In [34]:
from sklearn.metrics import roc_auc_score

In [35]:
def evaluate(loader):
    model.eval()

    predictions = []
    labels = []

    with torch.no_grad():
        for data in loader:
            data = data.to(device)
            pred = model(data).detach().cpu().numpy()

            label = data.y.detach().cpu().numpy()
            predictions.append(pred)
            labels.append(label)

    predictions = np.hstack(predictions)
    labels = np.hstack(labels)

    return roc_auc_score(labels, predictions)

## 2.6. Result

In [36]:
val_loader = DataLoader(val_dataset, batch_size=batch_size)

In [37]:
test_loader = DataLoader(test_dataset, batch_size=batch_size)

In [38]:
for epoch in range(20):
    loss = train()
    train_acc = evaluate(train_loader)
    val_acc = evaluate(val_loader)
    test_acc = evaluate(test_loader)
    print(
        "Epoch: {:03d}, Loss: {:.5f}, Train Auc: {:.5f}, Val Auc: {:.5f}, Test Auc: {:.5f}".format(
            epoch, loss, train_acc, val_acc, test_acc
        )
    )



Epoch: 000, Loss: 0.20935, Train Auc: 0.73691, Val Auc: 0.72700, Test Auc: 0.69942
Epoch: 001, Loss: 0.19090, Train Auc: 0.77644, Val Auc: 0.73261, Test Auc: 0.69701
Epoch: 002, Loss: 0.18170, Train Auc: 0.79200, Val Auc: 0.73049, Test Auc: 0.68921
Epoch: 003, Loss: 0.17503, Train Auc: 0.80633, Val Auc: 0.72338, Test Auc: 0.67858
Epoch: 004, Loss: 0.16821, Train Auc: 0.80995, Val Auc: 0.72259, Test Auc: 0.67965
Epoch: 005, Loss: 0.16320, Train Auc: 0.82540, Val Auc: 0.71272, Test Auc: 0.67124
Epoch: 006, Loss: 0.15854, Train Auc: 0.83057, Val Auc: 0.70657, Test Auc: 0.66338
Epoch: 007, Loss: 0.15418, Train Auc: 0.84155, Val Auc: 0.69290, Test Auc: 0.65316
Epoch: 008, Loss: 0.14954, Train Auc: 0.83834, Val Auc: 0.68186, Test Auc: 0.64070
Epoch: 009, Loss: 0.14703, Train Auc: 0.84243, Val Auc: 0.67614, Test Auc: 0.63447
Epoch: 010, Loss: 0.14428, Train Auc: 0.84583, Val Auc: 0.67663, Test Auc: 0.63867
Epoch: 011, Loss: 0.14210, Train Auc: 0.85269, Val Auc: 0.67275, Test Auc: 0.63442
Epoch: 012, Loss: 0.14031, Train Auc: 0.85781, Val Auc: 0.66431, Test Auc: 0.63264
Epoch: 013, Loss: 0.13942, Train Auc: 0.86169, Val Auc: 0.67141, Test Auc: 0.63711
Epoch: 014, Loss: 0.13816, Train Auc: 0.86897, Val Auc: 0.65952, Test Auc: 0.62737
Epoch: 015, Loss: 0.13683, Train Auc: 0.86566, Val Auc: 0.66992, Test Auc: 0.63191
Epoch: 016, Loss: 0.13541, Train Auc: 0.87268, Val Auc: 0.65312, Test Auc: 0.62646
Epoch: 017, Loss: 0.13353, Train Auc: 0.87514, Val Auc: 0.65024, Test Auc: 0.62247
Epoch: 018, Loss: 0.13171, Train Auc: 0.87840, Val Auc: 0.65044, Test Auc: 0.61964
Epoch: 019, Loss: 0.12978, Train Auc: 0.87814, Val Auc: 0.65587, Test Auc: 0.62458


With only 1 million sampled sessions (around 10% of all data), the model reaches a validation AUC of around 0.73 and a test AUC of around 0.70 within the first two epochs. After that, the training AUC keeps climbing (to about 0.88) while the validation and test AUC decline, which indicates overfitting. The score is very likely to improve if more data is used to train the model.
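
Since the validation AUC in the log peaks early, one common way to read such a run is to pick the epoch by validation AUC and report the test AUC at that epoch. The snippet below is only a sketch of that selection step on values copied from the log; in a real training loop you would also keep the model weights from the selected epoch.

```python
# (epoch, validation AUC, test AUC) copied from the first five lines of the log above.
history = [
    (0, 0.72700, 0.69942),
    (1, 0.73261, 0.69701),
    (2, 0.73049, 0.68921),
    (3, 0.72338, 0.67858),
    (4, 0.72259, 0.67965),
]

# Select the epoch by validation AUC only, then report the test AUC at that epoch.
best_epoch, best_val, test_at_best = max(history, key=lambda row: row[1])
print(best_epoch, best_val, test_at_best)  # 1 0.73261 0.69701
```

Selecting on the validation set rather than the test set keeps the reported test AUC an honest estimate.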