# Feature-based methods

In this notebook we will explore a very naive (yet powerful) approach to graph-based supervised machine learning. The idea relies on the classic machine learning approach of handcrafted feature extraction.

In Chapter 1 you learned how local and global properties can be extracted from graphs. Those properties describe the graph itself and carry important information that can be useful for classification.

In this demo, we will use the PROTEINS dataset, which is already integrated in StellarGraph.

In [1]:
from stellargraph import datasets
from IPython.display import display, HTML

dataset = datasets.PROTEINS()
print(dataset)
display(HTML(dataset.description))
graphs, graph_labels = dataset.load()
print(graphs)
print(graph_labels)

<stellargraph.datasets.datasets.PROTEINS object at 0x7fbe903c1950>


[<stellargraph.core.graph.StellarGraph object at 0x7fbdec60cf90>, <stellargraph.core.graph.StellarGraph object at 0x7fbdec60ca10>, <stellargraph.core.graph.StellarGraph object at 0x7fbdec60cdd0>, <stellargraph.core.graph.StellarGraph object at 0x7fbdec60ce50>, <stellargraph.core.graph.StellarGraph object at 0x7fbdec5a0450>, <stellargraph.core.graph.StellarGraph object at 0x7fbdec5a0dd0>, <stellargraph.core.graph.StellarGraph object at 0x7fbdec5a92d0>, <stellargraph.core.graph.StellarGraph object at 0x7fbdec5a9790>, <stellargraph.core.graph.StellarGraph object at 0x7fbdec5a9c90>, <stellargraph.core.graph.StellarGraph object at 0x7fbdec5a9d10>, <stellargraph.core.graph.StellarGraph object at 0x7fbdec5a0e50>, <stellargraph.core.graph.StellarGraph object at 0x7fbdec5a9ed0>, <stellargraph.core.graph.StellarGraph object at 0x7fbdec5b3d90>, <stellargraph.core.graph.StellarGraph object at 0x7fbdec5b3450>, <stellargraph.core.graph.StellarGraph object at 0x7fbdec5b8490>, <stellargraph.core.graph

One way to compute the graph metrics is to first retrieve the adjacency matrix representation of each graph.

In [2]:
# convert graphs from StellarGraph format to dense NumPy adjacency matrices
# (.toarray() is equivalent to the older .A shortcut on SciPy sparse matrices)
adjs = [graph.to_adjacency_matrix().toarray() for graph in graphs]
# convert labels from pandas.Series to a NumPy array
labels = graph_labels.to_numpy(dtype=int)

In [3]:
import numpy as np
import networkx as nx

metrics = []
for adj in adjs:
    # from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array accepts NumPy arrays
    G = nx.from_numpy_array(adj)
    # basic properties
    num_edges = G.number_of_edges()
    # clustering measures
    cc = nx.average_clustering(G)
    # measure of efficiency
    eff = nx.global_efficiency(G)

    metrics.append([num_edges, cc, eff])
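
To make the three features concrete, the sketch below computes them for a small hand-built graph: a triangle with one extra node attached. The helper name `graph_features` and the toy adjacency matrix are illustrative assumptions of ours, not part of the PROTEINS pipeline, and the code assumes a NetworkX version that provides `nx.from_numpy_array` (2.0 or later).

```python
import networkx as nx
import numpy as np


def graph_features(adj):
    """Return [number of edges, average clustering, global efficiency] (hypothetical helper)."""
    G = nx.from_numpy_array(adj)
    return [G.number_of_edges(), nx.average_clustering(G), nx.global_efficiency(G)]


# Toy graph: nodes 0, 1, 2 form a triangle and node 3 hangs off node 2
toy_adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

toy_features = graph_features(toy_adj)
print(toy_features)
```

For this graph the edge count is 4, the average clustering coefficient is 7/12 (nodes 0 and 1 score 1, node 2 scores 1/3, node 3 scores 0), and the global efficiency is 5/6 (the mean of the inverse shortest-path lengths over all node pairs).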



We can now use scikit-learn utilities to create a training set and a test set. In our experiments, we will use 70% of the dataset as the training set and the remaining 30% as the test set.

In [4]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(metrics, labels, test_size=0.3, random_state=42)
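
A plain random split can leave the two sets with slightly different class proportions. As a minimal sketch on synthetic data, the example below shows the optional `stratify` argument of `train_test_split`, which keeps the class ratio the same in both sets. The arrays `X_demo` and `y_demo` are made up for illustration, and stratification is not used in the cell above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 samples with 3 features, 60 of class 0 and 40 of class 1
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([0] * 60 + [1] * 40)

# 70/30 split that preserves the 60/40 class ratio in both subsets
X_demo_train, X_demo_test, y_demo_train, y_demo_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print(X_demo_train.shape, X_demo_test.shape)
print(y_demo_train.mean(), y_demo_test.mean())
```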

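As is common in many machine learning workflows, we preprocess the features to have zero mean and unit standard deviation. The scaler is fitted on the training set only, and the same transformation is then applied to the test set.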

In [5]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
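
The following sketch illustrates what the scaler does on a tiny made-up feature matrix (the arrays `train_demo` and `test_demo` are our own illustrative values). After fitting on the training rows, each training column has zero mean and unit standard deviation, while the test row is scaled with the training statistics rather than its own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales
train_demo = np.array([[10.0, 0.1], [20.0, 0.2], [30.0, 0.3], [40.0, 0.4]])
test_demo = np.array([[25.0, 0.25]])

# Learn per-column mean and standard deviation from the training rows only
demo_scaler = StandardScaler().fit(train_demo)
train_demo_scaled = demo_scaler.transform(train_demo)
test_demo_scaled = demo_scaler.transform(test_demo)

print(train_demo_scaled.mean(axis=0), train_demo_scaled.std(axis=0))
print(test_demo_scaled)
```

Because the test row sits exactly at the training mean of each column, it maps to (approximately) zero in both features.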

It is now time to train a classifier on these features. We choose a support vector machine for this task.

In [6]:
from sklearn import svm
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

clf = svm.SVC()
clf.fit(X_train_scaled, y_train)

y_pred = clf.predict(X_test_scaled)

print('Accuracy', accuracy_score(y_test, y_pred))
print('Precision', precision_score(y_test, y_pred))
print('Recall', recall_score(y_test, y_pred))
print('F1-score', f1_score(y_test, y_pred))

Accuracy 0.7455089820359282
Precision 0.7709251101321586
Recall 0.8413461538461539
F1-score 0.8045977011494253
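
To see how these four scores relate to each other, the sketch below derives them by hand from the confusion matrix of a small made-up prediction vector and checks them against the scikit-learn functions. The labels in `y_true_demo` and `y_pred_demo` are illustrative only and are unrelated to the PROTEINS results above.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true_demo = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred_demo = [1, 0, 1, 0, 1, 1, 0, 1]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # fraction of predicted positives that are correct
recall = tp / (tp + fn)     # fraction of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```

Here there are 4 true positives, 2 true negatives, 1 false positive and 1 false negative, so the accuracy is 0.75 and precision, recall and F1-score are all 0.8.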
