# Graph ConvNet

A neural network based on the paper: Joshi, Laurent, and Bresson, "An Efficient Graph Convolutional Network Technique for the Travelling Salesman Problem".

Open questions:
* What is the difference between a Linear and an Embedding layer?
* Does it make a difference that the edge layer uses a single Linear layer (W_4) for both endpoints instead of two (W_4 and W_5)?
* Why are the biases disabled for eq. 2 and eq. 3 when they are present in the paper?
* How does batch normalisation work?
* How are the class weights calculated?
* log_softmax with NLLLoss vs CrossEntropyLoss: what does the cross-entropy calculation mean?
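
On the first question, a minimal sketch may help: an `nn.Embedding` lookup on integer indices gives the same result as a bias-free `nn.Linear` applied to one-hot vectors, provided both share the same weights. The sizes below (`num_types`, `dim`) and the sample `edge_types` are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

num_types, dim = 3, 4  # 3 edge types (0, 1, 2); dim is an arbitrary example size
embedding = nn.Embedding(num_types, dim)

# A bias-free Linear layer that shares the embedding's weights.
# nn.Linear stores its weight as (out_features, in_features), hence the transpose.
linear = nn.Linear(num_types, dim, bias=False)
with torch.no_grad():
    linear.weight.copy_(embedding.weight.t())

edge_types = torch.tensor([[1, 2, 0], [0, 1, 1]])  # integer class indices
one_hot = F.one_hot(edge_types, num_classes=num_types).float()

via_lookup = embedding(edge_types)  # row lookup, no matrix multiplication
via_matmul = linear(one_hot)        # the one-hot product selects the same rows

same = torch.allclose(via_lookup, via_matmul)
print(same)
```

The lookup avoids materialising the one-hot tensor, which is presumably why the edge types are fed through an Embedding layer rather than a Linear one.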

In [407]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

from sklearn.utils.class_weight import compute_class_weight
from scipy.spatial.distance import pdist, squareform

## Model Inputs

Inputs for the model for the TSP.

The following is how the data is generated:

```python
solver = TSPSolver.from_data(nodes_coord[:,0], nodes_coord[:,1], norm="GEO")  
solution = solver.solve()
f.write( " ".join( str(x)+str(" ")+str(y) for x,y in nodes_coord) )
f.write( str(" ") + str('output') + str(" ") )
f.write( str(" ").join( str(node_idx+1) for node_idx in solution.tour) )
f.write( str(" ") + str(solution.tour[0]+1) + str(" ") )
f.write( "\n" )
```

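Each line has the form `x_1 y_1 ... x_n y_n output 1 5 2 ... 3 1`: space-separated coordinates, the literal token `output`, then the tour as 1-based node indices.

The tours are cyclic: the first node is repeated at the end of the line.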
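
As a rough sketch of how such a line could be turned into model targets, the snippet below parses one line and builds the tour adjacency matrix. The helper names `parse_tsp_line` and `tour_to_adjacency` and the sample line are hypothetical; they are not taken from the original data pipeline.

```python
import numpy as np

def parse_tsp_line(line):
    """Split one data line into coordinates and a 0-based tour (hypothetical helper)."""
    coord_part, tour_part = line.split(" output ")
    coords = np.array([float(v) for v in coord_part.split()]).reshape(-1, 2)
    # Indices are written 1-based; the first node is repeated at the end of the tour.
    tour = np.array([int(v) for v in tour_part.split()]) - 1
    return coords, tour

def tour_to_adjacency(tour, num_nodes):
    """Symmetric 0/1 matrix with a 1 for every edge used by the tour."""
    adj = np.zeros((num_nodes, num_nodes), dtype=int)
    for i, j in zip(tour[:-1], tour[1:]):
        adj[i, j] = adj[j, i] = 1
    return adj

# Made-up example line in the format written by the generator above
line = "0.1 0.2 0.9 0.3 0.8 0.8 0.2 0.7 output 1 2 3 4 1 \n"
coords, tour = parse_tsp_line(line)
edges_target = tour_to_adjacency(tour, len(coords))
print(edges_target)
```

Every node has exactly two tour edges, so each row of the resulting matrix sums to 2.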

| Variable | Meaning | Dimensions |
| -------- | ------- | ---------- |
| batch_edges | Adjacency matrix with special connection types* | B x num_nodes x num_nodes |
| batch_edges_values | Distance matrix | B x num_nodes x num_nodes |
| batch_edges_target | Target adjacency matrix | B x num_nodes x num_nodes |
| batch_nodes | Ones vector | B x num_nodes |
| batch_nodes_coord | Node coordinates | B x num_nodes x 2 |
| *batch_nodes_target* | Position of each node in the tour | B x num_nodes |


*special connections:
* 1 - k-nearest neighbour
* 2 - self connections
* 0 - otherwise
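
The encoding above can be sketched as follows. This is an assumption about how such a matrix might be built, not the original implementation: `knn_edge_types` is a hypothetical helper, `k` is chosen arbitrarily, and the sketch does not symmetrise the neighbour relation.

```python
import numpy as np

def knn_edge_types(coords, k):
    """Edge-type matrix: 1 for k-nearest neighbours, 2 on the diagonal, 0 otherwise."""
    num_nodes = len(coords)
    # Pairwise Euclidean distances via broadcasting: num_nodes x num_nodes
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    edge_types = np.zeros((num_nodes, num_nodes), dtype=int)
    # For distinct points each node is its own nearest neighbour (distance 0),
    # so column 0 of the argsort is skipped.
    neighbours = np.argsort(dist, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(num_nodes), k)
    edge_types[rows, neighbours.ravel()] = 1
    np.fill_diagonal(edge_types, 2)  # self connections
    return edge_types

# The four test coordinates used later in this notebook
coords = np.array([[0.37454012, 0.95071431],
                   [0.73199394, 0.59865848],
                   [0.15601864, 0.15599452],
                   [0.05808361, 0.86617615]])
edge_types = knn_edge_types(coords, k=2)
print(edge_types)
```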

In [396]:
# Some testing data
np.random.seed(42)

batch_nodes_coord = np.random.random((1, 4, 2))
batch_edges_values = np.zeros((1, 4, 4))
batch_edges = np.ones((1, 4, 4))
batch_edges_target = np.random.randint(0, 2, (1, 4, 4))

In [397]:
batch_nodes_coord[0]

array([[0.37454012, 0.95071431],
       [0.73199394, 0.59865848],
       [0.15601864, 0.15599452],
       [0.05808361, 0.86617615]])

In [398]:
batch_edges_target

array([[[1, 1, 1, 0],
        [1, 0, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 0, 0]]])

In [399]:
for i in range(len(batch_nodes_coord)):
    node_coords = batch_nodes_coord[i]
    dist_matrix = squareform(pdist(node_coords, metric='euclidean'))
    batch_edges_values[i] = dist_matrix

In [418]:
nodes_coord = torch.tensor(batch_nodes_coord, dtype=torch.float)
edges_values = torch.tensor(batch_edges_values, dtype=torch.float)
edges = torch.tensor(batch_edges, dtype=torch.long)
edges_target = torch.tensor(batch_edges_target, dtype=torch.long)

## Compute class weights

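In real TSP targets only 2n of the n² adjacency entries lie on the tour, so the two edge classes are heavily imbalanced. We therefore compute class weights to rebalance the loss.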
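
For the open question on how the weights are calculated: the `'balanced'` mode is documented as `n_samples / (n_classes * np.bincount(y))`. The sketch below reimplements that formula with NumPy only, under the assumption that labels are the integers `0..K-1` and every class occurs at least once; `balanced_class_weights` is a hypothetical helper name.

```python
import numpy as np

def balanced_class_weights(labels):
    """n_samples / (n_classes * count_per_class), assuming labels 0..K-1 all occur."""
    labels = np.asarray(labels).ravel()
    counts = np.bincount(labels)  # number of samples per class
    return len(labels) / (len(counts) * counts)

# The random test target shown earlier in this notebook: 4 zeros and 12 ones
target = np.array([[1, 1, 1, 0],
                   [1, 0, 1, 1],
                   [1, 1, 1, 1],
                   [1, 1, 0, 0]])
weights = balanced_class_weights(target)
print(weights)  # class 0: 16 / (2 * 4) = 2.0, class 1: 16 / (2 * 12) = 0.667
```

The rarer class receives the larger weight, so misclassifying it costs more in the weighted loss.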

In [419]:
class_labels = batch_edges_target.flatten()

edge_class_weights = compute_class_weight('balanced',
                                          classes=np.unique(class_labels),
                                          y=class_labels)
edge_class_weights = torch.tensor(edge_class_weights, dtype=torch.float)

## The Model

In [420]:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print("Device", device)

Device cpu


In [421]:
# == NORMALISATION LAYERS ==
# The normalisation layers are required because the tensors need to
# be transposed for batch normalisation

class EdgeNorm(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.batch_norm = nn.BatchNorm2d(hidden_dim, track_running_stats = False)
    
    def forward(self, e):
        """
        Args:
            e: Edge features (batch x num_nodes x num_nodes x hidden_dim)
        """
        # transpose because batch norm works on the second dim
        e_trans = e.transpose(1, 3).contiguous() # B x hidden_dim x num_nodes x num_nodes
        e_trans_batch_norm = self.batch_norm(e_trans)
        e_batch_norm = e_trans_batch_norm.transpose(1, 3).contiguous() # B x num_nodes x num_nodes x hidden_dim
        
        return e_batch_norm

class NodeNorm(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.batch_norm = nn.BatchNorm1d(hidden_dim, track_running_stats = False)
    
    def forward(self, x):
        """
        Args:
            x: Node features (batch x num_nodes x hidden_dim)
        """
        # transpose because batch norm works on the second dim
        x_trans = x.transpose(1, 2).contiguous() # B x hidden_dim x num_nodes
        x_trans_batch_norm = self.batch_norm(x_trans)
        x_batch_norm = x_trans_batch_norm.transpose(1, 2).contiguous() # B x num_nodes x hidden_dim
        
        return x_batch_norm


class EdgeFeatureLayer(nn.Module):
    """
    W_3 e_ij + W_4 (x_i + x_j) <-- currently the case, but should be: W_3 e_ij + W_4 x_i + W_5 x_j
    """
    def __init__(self, hidden_dim):
        super().__init__()
        self.W_3 = nn.Linear(hidden_dim, hidden_dim)
        # TODO: Why is a single Linear layer (W_4) used instead of two (W_4 and W_5) - does it make a difference?
        self.W_4 = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x, e):
        """
        Args:
            x: Node features (batch x num_nodes x hidden_dim)
            e: Edge features (batch x num_nodes x num_nodes x hidden_dim)
        """
        Ue = self.W_3(e)
        Vx = self.W_4(x)
        
        # this enables us to make use of broadcasting to get a B x num_nodes x num_nodes x hidden_dim tensor
        Vx_cols = Vx.unsqueeze(1) # B x num_nodes x hidden_dim => B x 1 x num_nodes x hidden_dim
        Vx_rows = Vx.unsqueeze(2) # B x num_nodes x hidden_dim => B x num_nodes x 1 x hidden_dim
        
        e_new = Ue + Vx_rows + Vx_cols
        
        return e_new
    
class NodeFeatureLayer(nn.Module):
    """
    x_i_new = W_1 x_i + sum_j( n_ij * W_2 x_j )
    
    where: n_ij = gate_ij / ( epsilon + sum_j' gate_ij' )
    """
    def __init__(self, hidden_dim):
        super().__init__()
        self.W_1 = nn.Linear(hidden_dim, hidden_dim)
        self.W_2 = nn.Linear(hidden_dim, hidden_dim)
        self.epsilon = 1e-20
        
    def forward(self, x, edge_gate):
        """
        Args:
            x: Node features (batch x num_nodes x hidden_dim)
            edge_gate: Edge gate run through a sigmoid (batch x num_nodes x num_nodes x hidden_dim)
        """
        W_1_x = self.W_1(x) # B x num_nodes x hidden_dim
        W_2_x = self.W_2(x).unsqueeze(1) # B x 1 x num_nodes x hidden_dim (indexed by neighbour j)
        
        # keepdim=True keeps the neighbour axis, so the division broadcasts per node i (also for B > 1)
        gate_sum = torch.sum(edge_gate, dim=2, keepdim=True) # B x num_nodes x 1 x hidden_dim
        n_ij = edge_gate / (self.epsilon + gate_sum) # B x num_nodes x num_nodes x hidden_dim
        x_new = W_1_x + torch.sum(n_ij * W_2_x, dim=2) # B x num_nodes x hidden_dim
        
        return x_new

# == GRAPH LAYER ==

class GraphLayer(nn.Module):
    """
    Graph layer for x_i and e_ij
    """
    def __init__(self, hidden_dim):
        super().__init__()
        self.node_feat = NodeFeatureLayer(hidden_dim)
        self.node_norm = NodeNorm(hidden_dim)
        self.edge_feat = EdgeFeatureLayer(hidden_dim)
        self.edge_norm = EdgeNorm(hidden_dim)
        
    def forward(self, x, e):
        """
        Args:
            x: Node features (batch x node_num x hidden_dim)
            e: Edge features (batch x num_nodes x num_nodes x hidden_dim)
        Return:
            x: Aggregated node features (batch x node_num x hidden_dim)
            e: Aggregated edge features (batch x num_nodes x num_nodes x hidden_dim)
        """
        # edges
        e_feat = self.edge_feat(x, e)
        
        # edge gates
        e_gates = torch.sigmoid(e_feat)  # F.sigmoid is deprecated
        
        # nodes
        x_feat = self.node_feat(x, e_gates)
        
        # normalisation
        x_norm = self.node_norm(x_feat)
        e_norm = self.edge_norm(e_feat)
        
        # activation
        x_act = F.relu(x_norm)
        e_act = F.relu(e_norm)
        
        # combine
        x_new = x + x_act
        e_new = e + e_act
        
        return x_new, e_new
        
# == MLP (Edge predictions) ==

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, hidden_layers):
        super().__init__()
        
        self.layers = nn.Sequential()
        
        for i in range(1, hidden_layers):
            self.layers.add_module(f'lin{i}', nn.Linear(in_dim, in_dim))
            self.layers.add_module(f'relu{i}', nn.ReLU())
            
        self.layers.add_module('final', nn.Linear(in_dim, out_dim))
        
    def forward(self, e):
        """
        Args:
            e: Edge features (batch x num_nodes x num_nodes x hidden_dim)
        Returns:
            y: Edge predictions (batch x num_nodes x num_nodes x out_dim)
        """
        
        return self.layers(e)

    
# == MAIN NETWORK ==

class GraphNet(nn.Module):
    def __init__(self):
        super().__init__()
        # configs
        self.hidden_dim = 8
        self.num_gcn_layers = 2
        self.num_mlp_layers = 2
        
        # embeddings
        # TODO: Why is bias turned off when in the paper they don't mention anything?
        self.node_coord_embedding = nn.Linear(2, self.hidden_dim, bias=False)
        self.distance_embedding = nn.Linear(1, self.hidden_dim // 2, bias=False)
        # TODO: Don't understand the use of the Embedding layer
        self.edge_type_embedding = nn.Embedding(3, self.hidden_dim // 2) # 3 for the special cases 0, 1, 2 (more memory efficient)
        
        # GCN layers
        self.gcn_layers = nn.ModuleList([
            GraphLayer(hidden_dim=self.hidden_dim) for _ in range(self.num_gcn_layers)
        ])
        
        # edge prediction MLP
        self.mlp_edges = MLP(in_dim=self.hidden_dim, out_dim=2, hidden_layers=self.num_mlp_layers)
        
    def forward(self, node_coords, distance_matrix, edge_types):
        """
        Args:
            node_coords: Coordinates for each node (batch x num_nodes x 2)
            distance_matrix: Distance matrix between nodes (batch x num_nodes x num_nodes)
            edge_types: Edge types to help learn faster (batch x num_nodes x num_nodes)
        """
        # eq 2
        x = self.node_coord_embedding(node_coords) # B x num_nodes x hidden_dim
        
        # eq 3
        dist_unsqueezed = distance_matrix.unsqueeze(3) # B x num_nodes x num_nodes x 1
        e_dist = self.distance_embedding(dist_unsqueezed) # B x num_nodes x num_nodes x hidden_dim // 2
        e_types = self.edge_type_embedding(edge_types) # B x num_nodes x num_nodes x hidden_dim // 2
        e = torch.cat((e_dist, e_types), dim=3) # B x num_nodes x num_nodes x hidden_dim
        
        # eq 4 and 5
        for gcn_layer in self.gcn_layers:
            x, e = gcn_layer(x, e) # B x num_nodes x hidden_dim, B x num_nodes x num_nodes x hidden_dim
            
        # eq 6
        y_edge_preds = self.mlp_edges(e) # B x num_nodes x num_nodes x 2
        
        return y_edge_preds

model = GraphNet().to(device)
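
On the open question of how batch normalisation works: the sketch below compares `nn.BatchNorm1d`, applied with the same transposes as `NodeNorm` above, against a manual computation. It assumes the default affine initialisation (scale 1, shift 0) and `track_running_stats=False`, so only the current batch statistics are used; the tensor sizes are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

B, N, H = 2, 5, 3  # batch size, nodes, hidden dim (arbitrary example sizes)
x = torch.randn(B, N, H)

# Same pattern as NodeNorm: channels must sit in dim 1 for BatchNorm1d
bn = nn.BatchNorm1d(H, track_running_stats=False)
out = bn(x.transpose(1, 2)).transpose(1, 2)  # B x N x H

# Manual version: one mean and (biased) variance per hidden channel,
# computed over the batch and node dimensions together
mean = x.mean(dim=(0, 1), keepdim=True)
var = ((x - mean) ** 2).mean(dim=(0, 1), keepdim=True)
manual = (x - mean) / torch.sqrt(var + bn.eps)

matches = torch.allclose(out, manual, atol=1e-5)
print(matches)
```

After training, the learned scale and shift parameters of the layer would make the two results differ; the comparison holds only for a freshly initialised layer.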

In [422]:
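# Call the module directly (not .forward) and keep the inputs on the model's device
y_pred = model(nodes_coord.to(device), edges_values.to(device), edges.to(device))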

## Loss functions

In [431]:
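def loss(y_pred, y_target, class_weights):
    # CrossEntropyLoss expects the class dimension second: B x 2 x num_nodes x num_nodes
    y = y_pred.permute(0, 3, 1, 2)
    
    # use the class_weights argument rather than the global edge_class_weights
    return nn.CrossEntropyLoss(weight=class_weights.to(y.device))(y, y_target.to(y.device))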

In [432]:
loss(class_weights=edge_class_weights, y_pred=y_pred, y_target=edges_target)

tensor(0.8460, grad_fn=<NllLoss2DBackward0>)
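
On the open question about `log_softmax` plus `NLLLoss` versus `CrossEntropyLoss`: the sketch below checks that the two give the same value and that, with class weights and the default mean reduction, the result is a weighted average of the negative log-probabilities of the target classes. The logits, targets, and weights are made-up example values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

B, N, C = 2, 4, 2  # batch size, nodes, classes (arbitrary example sizes)
logits = torch.randn(B, N, N, C)
target = torch.randint(0, C, (B, N, N))
weights = torch.tensor([0.6, 2.5])  # made-up class weights

y = logits.permute(0, 3, 1, 2)  # class dimension second, as the loss expects

ce = nn.CrossEntropyLoss(weight=weights)(y, target)
nll = nn.NLLLoss(weight=weights)(F.log_softmax(y, dim=1), target)

# Manual version: pick the log-probability of the target class for every edge,
# weight it by the class weight, and normalise by the sum of the weights used
log_probs = F.log_softmax(logits, dim=-1)
picked = log_probs.gather(-1, target.unsqueeze(-1)).squeeze(-1)
w = weights[target]
manual = -(w * picked).sum() / w.sum()

print(ce.item(), nll.item(), manual.item())
```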