## Install the Spektral library

In [3]:
!pip install spektral

Collecting spektral
  Downloading spektral-1.2.0-py3-none-any.whl (140 kB)
Collecting scipy
  Downloading scipy-1.10.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.5 MB)
Collecting pandas
  Downloading pandas-2.0.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)
Collecting networkx
  Downloading networkx-3.1-py3-none-any.whl (2.1 MB)
Collecting tqdm
  Downloading tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting lxml
  Downloading lxml-4.9.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (7.1 MB)

## Load required modules

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import categorical_accuracy
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from spektral.layers import GINConv, GCNConv
from spektral.utils.sparse import sp_matrix_to_sp_tensor
from spektral.data import DisjointLoader, BatchLoader
from spektral.datasets import TUDataset


## Load TUDataset
https://chrsmrrs.github.io/datasets/docs/datasets/

The `TUDataset` class from the Spektral library provides access to the TU benchmark datasets for graph classification. One of them is the "PROTEINS" dataset, a collection of protein structures represented as graphs. In these graphs, nodes represent secondary structure elements (helices, sheets, and turns), and an edge connects two nodes that are neighbors along the amino acid sequence or close to each other in 3D space.

Each graph in the "PROTEINS" dataset has the following properties:

**Graph**: A graph representing the protein structure, with secondary structure elements as nodes and their sequential or spatial proximity as edges.

**Node features**: Each node has a 4-dimensional feature vector that encodes the type of secondary structure element together with a numeric node attribute. These features are the input to the graph neural network.

**Graph label**: Each graph in the dataset is labeled as either "enzymatic" or "non-enzymatic." The goal of the graph classification task is to predict this label based on the graph structure and node features.

When you load the "PROTEINS" dataset using the TUDataset class, it preprocesses the raw data and creates a dataset object with the following properties:

- `n_graphs`: Number of graphs in the dataset.
- `n_node_features`: Dimension of the node features (4 for the PROTEINS dataset).
- `n_labels`: Number of unique labels in the dataset (2 for the PROTEINS dataset, i.e., enzymatic and non-enzymatic).

In [None]:
dataset = TUDataset("PROTEINS")

In [26]:
g = dataset[0]
g

Graph(n_nodes=42, n_node_features=4, n_edge_features=None, n_labels=2)

## Split into train and test sets

In [3]:
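# Shuffle the indices first: a plain slice is only representative if the graphs are not stored grouped by class
idxs = np.random.permutation(len(dataset))
split = int(0.8 * len(dataset))
dataset_train, dataset_test = dataset[idxs[:split]], dataset[idxs[split:]]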
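
Whether a slice-based split is representative depends on the order in which the graphs are stored. The following is a minimal sketch using hypothetical toy labels (60 samples of class 0 followed by 40 of class 1, an assumption made purely for illustration, not the real PROTEINS labels). It compares the class balance of a sequential split with that of a split over shuffled indices.

```python
import numpy as np

# Hypothetical toy labels: 60 samples of class 0 followed by 40 of class 1,
# i.e. a dataset stored grouped by class (an assumption for illustration).
labels = np.array([0] * 60 + [1] * 40)
split = int(0.8 * len(labels))

# Sequential split: the test slice only sees the tail of the dataset.
seq_train, seq_test = labels[:split], labels[split:]

# Shuffled split: permute the indices first, then slice.
rng = np.random.default_rng(0)
idxs = rng.permutation(len(labels))
shuf_train, shuf_test = labels[idxs[:split]], labels[idxs[split:]]


def class1_fraction(y):
    """Fraction of samples belonging to class 1."""
    return float(np.mean(y == 1))


print("sequential:", class1_fraction(seq_train), class1_fraction(seq_test))
print("shuffled:  ", class1_fraction(shuf_train), class1_fraction(shuf_test))
```

In the sequential case the test slice contains a single class, so a classifier biased toward the training majority would fail on it; the shuffled split keeps both classes in both subsets.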

## Create a data loader for batching

In [4]:
batch_size = 32
loader_train = DisjointLoader(dataset_train, batch_size=batch_size, epochs=200, shuffle=False)
loader_test = DisjointLoader(dataset_test, batch_size=batch_size)

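Output of `DisjointLoader()`

Each batch is a tuple `(inputs, labels)`. All graphs in the batch are merged into one large graph made of disconnected components (their disjoint union).

`inputs` is a tuple containing:

- `x`: node attributes of shape `[n_nodes, n_node_features]`;
- `a`: adjacency matrix of shape `[n_nodes, n_nodes]`;
- `e`: edge attributes of shape `[n_edges, n_edge_features]`;
- `i`: batch index of shape `[n_nodes]`, mapping each node to the graph it belongs to.

Here `n_nodes` is the total number of nodes across all graphs in the batch. PROTEINS has no edge features (`n_edge_features=None`), so `e` is omitted and `inputs` is `(x, a, i)`, which matches the three inputs of the model defined next.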
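
To make the disjoint layout concrete, here is a minimal NumPy sketch that stacks two toy graphs into one batch. The helper `to_disjoint_batch` is a hypothetical name introduced for illustration; it is not Spektral's implementation, which additionally returns the adjacency matrix as a sparse tensor.

```python
import numpy as np


def to_disjoint_batch(graphs):
    """Stack (x, a) pairs into one disjoint-union graph.

    Hypothetical helper for illustration; not Spektral's implementation.
    """
    n_total = sum(x.shape[0] for x, _ in graphs)
    x_batch = np.concatenate([x for x, _ in graphs], axis=0)
    a_batch = np.zeros((n_total, n_total))
    i_batch = np.zeros(n_total, dtype=np.int64)
    offset = 0
    for graph_id, (x, a) in enumerate(graphs):
        n = x.shape[0]
        # Each adjacency matrix becomes one block on the diagonal
        a_batch[offset:offset + n, offset:offset + n] = a
        # The batch index records which graph each node came from
        i_batch[offset:offset + n] = graph_id
        offset += n
    return x_batch, a_batch, i_batch


# Two toy graphs with 4 node features each: a 2-node edge and a 3-node path.
g1 = (np.ones((2, 4)), np.array([[0, 1], [1, 0]]))
g2 = (np.ones((3, 4)), np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))

x, a, i = to_disjoint_batch([g1, g2])
print(x.shape, a.shape, i)
```

Because the blocks never overlap, message passing on the merged adjacency matrix cannot leak information between graphs, and the batch index is all that is needed to pool node features back into one vector per graph.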

## Define the GNN model (GINConv layers):

In [5]:
def create_gcn_model():
    # Define input placeholders for node features, adjacency matrix, and segment indices
    X_in = Input(shape=(dataset.n_node_features,))
    A_in = Input((None,), sparse=True)
    I_in = Input(shape=(), dtype=tf.int32)

    # Apply the first GINConv layer with 32 units and ReLU activation
    X_1 = GINConv(32, activation="relu")([X_in, A_in])
    # Apply dropout with a rate of 0.5
    X_1 = Dropout(0.5)(X_1)

    # Apply the second GINConv layer with 32 units and ReLU activation
    X_2 = GINConv(32, activation="relu")([X_1, A_in])
    # Apply dropout with a rate of 0.5
    X_2 = Dropout(0.5)(X_2)

    # Aggregate the node features using the segment_mean function and the segment indices
    X_3 = tf.math.segment_mean(X_2, I_in)
    # Apply a dense output layer with the number of labels and softmax activation
    out = Dense(dataset.n_labels, activation="softmax")(X_3)

    # Create and return the model with the defined inputs and outputs
    model = Model(inputs=[X_in, A_in, I_in], outputs=out)
    return model
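
The two building blocks of this model, neighborhood aggregation and per-graph mean pooling, can be sketched in plain NumPy. This is an illustration under simplifying assumptions: a single weight matrix with ReLU stands in for the MLP of a GIN layer, and `gin_step` and `segment_mean` are hypothetical helper names, not Spektral or TensorFlow functions.

```python
import numpy as np


def gin_step(x, a, w, eps=0.0):
    """One GIN-style update: ReLU(((1 + eps) * x + a @ x) @ w).

    A single weight matrix stands in for the MLP (simplifying assumption).
    """
    aggregated = (1.0 + eps) * x + a @ x  # own features plus sum over neighbors
    return np.maximum(aggregated @ w, 0.0)


def segment_mean(x, segment_ids):
    """Average the rows of x per segment id (ids sorted, starting at 0)."""
    n_segments = int(segment_ids.max()) + 1
    sums = np.zeros((n_segments, x.shape[1]))
    np.add.at(sums, segment_ids, x)
    counts = np.bincount(segment_ids, minlength=n_segments)
    return sums / counts[:, None]


# Disjoint batch of two toy graphs: nodes 0-1 form graph 0, nodes 2-4 graph 1.
x = np.array([[1.0], [3.0], [2.0], [4.0], [6.0]])
a = np.array(
    [
        [0, 1, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 0, 1, 0],
    ],
    dtype=float,
)
i = np.array([0, 0, 1, 1, 1])

h = gin_step(x, a, w=np.ones((1, 1)))
pooled = segment_mean(h, i)
print(h.ravel())       # node embeddings after one step
print(pooled.ravel())  # one value per graph
```

The pooled array has one row per graph regardless of how many nodes each graph contains, which is what allows the final `Dense` layer to output one prediction per graph.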


## Compile the model:

In [6]:
model = create_gcn_model()
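optimizer = Adam(learning_rate=0.01)  # `lr` is a deprecated alias of `learning_rate`
loss_fn = CategoricalCrossentropy()
# The model is not compiled: the custom training loop below uses `optimizer` and `loss_fn` directly
#model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy", metrics=["accuracy"])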


## Train the model:
Train the model using the data loader for the training set:

In [7]:
# Decorate the function with @tf.function to compile as a TensorFlow graph
# Use the input_signature from loader_train and relax shapes for varying graph sizes
@tf.function(input_signature=loader_train.tf_signature(), experimental_relax_shapes=True)
def train_step(inputs, target):
    # Create a GradientTape context to record operations for automatic differentiation
    with tf.GradientTape() as tape:
        # Compute model predictions with the inputs, set training=True for training-specific behaviors
        predictions = model(inputs, training=True)
        # Calculate the loss using the provided loss_fn and add the model's regularization losses
        loss = loss_fn(target, predictions) + sum(model.losses)

    # Compute gradients of the loss with respect to the model's trainable variables
    gradients = tape.gradient(loss, model.trainable_variables)
    # Apply the gradients to the model's variables using the optimizer's apply_gradients method
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    # Compute the accuracy using the categorical_accuracy function from TensorFlow
    # Calculate the mean accuracy using tf.reduce_mean
    acc = tf.reduce_mean(categorical_accuracy(target, predictions))

    # Return the loss and accuracy as output
    return loss, acc
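
For intuition about the two quantities returned by `train_step`, the sketch below computes a cross-entropy loss and an accuracy by hand in NumPy for one-hot targets. The function names and the toy predictions are hypothetical, and the clipping constant is an assumption; this mirrors the usual definitions rather than reproducing the Keras implementation.

```python
import numpy as np


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))


def categorical_accuracy_np(y_true, y_pred):
    """Fraction of rows whose arg-max prediction matches the one-hot target."""
    return float(np.mean(np.argmax(y_true, -1) == np.argmax(y_pred, -1)))


# Toy batch of four graphs with one-hot labels and softmax-like predictions.
y_true = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)
y_pred = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]])

loss = categorical_crossentropy(y_true, y_pred)
acc = categorical_accuracy_np(y_true, y_pred)
print("loss:", loss, "accuracy:", acc)
```

Only the probability assigned to the true class enters the loss, so the confidently wrong third row contributes the largest term, while accuracy counts it as a single miss.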

## Define the evaluation function:

In [8]:
def evaluate(loader):
    """Return the mean loss and accuracy over one epoch, weighted by batch size."""
    output = []
    for _ in range(loader.steps_per_epoch):
        inputs, target = next(loader)
        pred = model(inputs, training=False)
        outs = (
            float(loss_fn(target, pred)),
            float(tf.reduce_mean(categorical_accuracy(target, pred))),
            len(target),  # Keep track of batch size
        )
        output.append(outs)
    output = np.array(output)
    # The last column holds the batch sizes, used as weights for the average
    return np.average(output[:, :-1], 0, weights=output[:, -1])
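
The reason `evaluate` weights each batch by its size is that the last batch of an epoch is usually smaller than the others. A small sketch with hypothetical per-batch numbers (made up for illustration) shows how a plain mean over batches differs from the weighted one:

```python
import numpy as np

# Hypothetical per-batch results: (loss, accuracy, batch size).
# The last batch is smaller, as happens when the dataset size is not
# a multiple of the batch size.
batches = np.array(
    [
        [0.60, 0.75, 32],
        [0.70, 0.50, 32],
        [0.20, 1.00, 4],
    ]
)

# Plain mean: every batch counts equally, whatever its size
plain = batches[:, :-1].mean(axis=0)
# Weighted mean: every sample counts equally
weighted = np.average(batches[:, :-1], axis=0, weights=batches[:, -1])

print("plain mean:   ", plain)
print("weighted mean:", weighted)
```

The plain mean lets the four samples of the last batch count as much as 32 samples of a full batch, whereas the weighted mean equals the per-sample average over the whole set.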


In [11]:
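# Start `step` at 0 so that an epoch closes after exactly `steps_per_epoch` batches
# Start `epoch` at -1 so that the first completed epoch is reported as epoch 0
# Create an empty list for storing training results
epoch, step = -1, 0
results = []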

# Iterate through the batches in the loader_train data loader
for batch in loader_train:
    # Increment the step counter
    step += 1

    # Execute the train_step function with the current batch
    # Obtain the loss and accuracy
    loss, acc = train_step(*batch)

    # Append the loss and accuracy to the results list
    results.append((loss, acc))

    # Check if the current step is equal to the number of steps per epoch (loader_train.steps_per_epoch)
    if step == loader_train.steps_per_epoch:
        # Reset the step counter to 0
        # Increment the epoch counter
        step = 0
        epoch += 1

        # Evaluate the model on the test set using the evaluate function (which should be defined beforehand)
        # Store the test results in results_te
        results_te = evaluate(loader_test)

        # Print the epoch number, mean training loss and accuracy, and test loss and accuracy
        print(
            "Ep. {} - Loss: {:.3f} - Acc: {:.3f} - Test loss: {:.3f} - Test acc: {:.3f}".format(
                epoch, *np.mean(results, 0), *results_te
            )
        )

        # Reset the results list to start collecting results for the next epoch
        results = []


Ep. 0 - Loss: 0.672 - Acc: 0.708 - Test loss: 0.827 - Test acc: 0.000
Ep. 1 - Loss: 0.773 - Acc: 0.661 - Test loss: 0.889 - Test acc: 0.004
Ep. 2 - Loss: 0.706 - Acc: 0.716 - Test loss: 0.921 - Test acc: 0.004
Ep. 3 - Loss: 0.684 - Acc: 0.729 - Test loss: 0.992 - Test acc: 0.000
Ep. 4 - Loss: 0.656 - Acc: 0.735 - Test loss: 0.976 - Test acc: 0.000
Ep. 5 - Loss: 0.650 - Acc: 0.734 - Test loss: 0.949 - Test acc: 0.000
Ep. 6 - Loss: 0.645 - Acc: 0.738 - Test loss: 0.951 - Test acc: 0.000
Ep. 7 - Loss: 0.631 - Acc: 0.738 - Test loss: 0.965 - Test acc: 0.000
Ep. 8 - Loss: 0.626 - Acc: 0.738 - Test loss: 0.975 - Test acc: 0.000
Ep. 9 - Loss: 0.628 - Acc: 0.741 - Test loss: 1.001 - Test acc: 0.000
Ep. 10 - Loss: 0.605 - Acc: 0.735 - Test loss: 1.029 - Test acc: 0.000
Ep. 11 - Loss: 0.620 - Acc: 0.740 - Test loss: 1.057 - Test acc: 0.000
Ep. 12 - Loss: 0.606 - Acc: 0.740 - Test loss: 1.080 - Test acc: 0.004
Ep. 13 - Loss: 0.606 - Acc: 0.740 - Test loss: 1.108 - Test acc: 0.004
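Ep. 14 - Loss: 0

This log was recorded with a sequential split (`dataset[:split]`, `dataset[split:]`). Its pattern, training accuracy stuck near 0.74 while test accuracy stays near 0, is what such a split produces when the graphs are stored grouped by class: the test slice then consists almost entirely of the class that is the minority in the training slice, so a model that mostly predicts the majority class scores close to zero on it.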
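
The epoch boundary in the training loop is detected with `step == loader_train.steps_per_epoch`, so the number of batches assigned to the first epoch depends on the initial value of `step`. The pure-Python sketch below reproduces only the counter logic, with a hypothetical helper `batches_per_epoch` and a made-up value of 4 steps per epoch, to compare the two starting points.

```python
def batches_per_epoch(start_step, steps_per_epoch, n_epochs):
    """Count how many batches each epoch consumes for a given initial counter.

    Mirrors the pattern: increment `step`, process a batch, then close the
    epoch when `step == steps_per_epoch` and reset `step` to 0.
    """
    counts, step, seen = [], start_step, 0
    while len(counts) < n_epochs:
        step += 1
        seen += 1  # one batch processed
        if step == steps_per_epoch:
            counts.append(seen)
            step, seen = 0, 0
    return counts


print(batches_per_epoch(start_step=0, steps_per_epoch=4, n_epochs=3))
print(batches_per_epoch(start_step=-1, steps_per_epoch=4, n_epochs=3))
```

Starting the counter at -1 makes the first epoch absorb one extra batch, which shifts every later epoch boundary by one batch relative to the loader's own epochs.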

In [90]:
for batch in loader_train:
    inputs, target = batch
    #print(len(batch[1]))
    #A_in = GCNConv.preprocess(inputs[1])
    #A_in = sp_matrix_to_sp_tensor(A_in)  # Convert to SparseTensor
    A_in = inputs[1]
    loss, acc = model.train_on_batch([inputs[0], A_in, inputs[2]], target)
    print("Loss:", loss, "Accuracy:", acc)


IndexError: tuple index out of range

In [70]:
dataset.n_graphs

1113

## Evaluate the model: