# Link Prediction in Ecological Networks using Latent Space Representation of Network Graphs - Optimal Values


This Jupyter Notebook searches for the optimal values of four variables (i.e. hyperparameters) of the link prediction pipeline. The search and its results are analysed in the following sections.


The variables chosen for the analysis are the number of folds and the number of iterations for the classifier, and the number of walks and the number of dimensions for the DeepWalk algorithm. A custom nested loop runs the pipeline for every combination of their values.

# Importing the necessary libraries


In [1]:
try:
  import stellargraph as sg
except ModuleNotFoundError:
  %pip install -q stellargraph[demos]==1.2.1

from stellargraph.data import EdgeSplitter
from stellargraph import StellarGraph

In [2]:
try:
    import karateclub
except ModuleNotFoundError:
    ! pip install karateclub
from karateclub import DeepWalk

In [3]:
import networkx as nx
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

import requests, zipfile, io
import os.path

from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import RocCurveDisplay

## Loading the data

The GATEWAy dataset is downloaded automatically when the notebook is run and the .csv file is not already present, so there is no need to upload it manually in Google Colab or similar platforms.

The download is a .zip file whose contents are extracted; the relevant .csv file is then loaded into a pandas data frame.

In [4]:
file_name='283_2_FoodWebDataBase_2018_12_10.csv'

if not os.path.isfile(file_name):
  zip_file_url="https://idata.idiv.de/ddm/Data/DownloadZip/283?version=756"
  r = requests.get(zip_file_url)
  z = zipfile.ZipFile(io.BytesIO(r.content))
  z.extractall()

In [5]:
df = pd.read_csv(file_name, low_memory=False)
# Replace literal dots in the column names; regex=False keeps the behaviour explicit across pandas versions.
df.columns = df.columns.str.replace(".", "_", regex=False)



## Defining the used functions

In this section, we define the various functions used for the purpose of running the link prediction pipeline. The functions work as follows:

`deepwalk_representations`: this function uses the DeepWalk algorithm to generate latent space representations of the graph's nodes.

`connect_samples_with_node_embeddings`: this function creates the feature representations of edges by combining the learned feature representations of the two end nodes with a binary operator.

`train_model`: this function builds the classifier and fits it on the edge samples and their labels.

`get_classifier`: this function defines the classifier model and its hyperparameters, and wraps it in a pipeline that standardises the data before classification.

`model_evaluation`: this is a general function for evaluating the model; it calls the `connect_samples_with_node_embeddings` and `get_score` functions.

`get_score`: this function returns the ROC AUC score of the examined food web once the model has been trained and evaluated. The `validation_phase` flag marks calls made during the Validation Phase; in this notebook it is accepted but not used, so no ROC curves are drawn.

`find_best_operator`: this function is used in the Binary Operator Selector and Training Phase to search for the best binary operator. It first trains the model on the Training Set and then evaluates the trained model on the Binary Operator Selector Set, for each of the four binary operators. This produces four ROC AUC scores, one per operator, and the operator with the highest score is selected for evaluating the model in the Validation Phase.

`Average, Hadamard, Weighted_L1, Weighted_L2`: these functions combine the two embedding vectors given as input into a single vector, each using a different element-wise operation.
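
As a minimal illustration of the four binary operators, the sketch below applies them to two made-up three-dimensional embeddings. Plain Python lists stand in for the NumPy arrays used in the notebook, and the vectors `u` and `v` as well as the lower-case function names are hypothetical, chosen only for this example.

```python
# Hypothetical 3-dimensional embeddings of a source and a destination node.
u = [1.0, 2.0, 3.0]
v = [3.0, 0.0, -1.0]


def average(a, b):
    # Element-wise mean of the two embeddings.
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def hadamard(a, b):
    # Element-wise product of the two embeddings.
    return [x * y for x, y in zip(a, b)]


def weighted_l1(a, b):
    # Element-wise absolute difference.
    return [abs(x - y) for x, y in zip(a, b)]


def weighted_l2(a, b):
    # Element-wise squared difference.
    return [(x - y) ** 2 for x, y in zip(a, b)]


# One edge feature vector per operator, keyed by the operator's name.
edge_features = {
    op.__name__: op(u, v)
    for op in (average, hadamard, weighted_l1, weighted_l2)
}

for name, feature in edge_features.items():
    print(name, feature)
```

Every operator returns a vector with the same number of dimensions as the node embeddings, so the classifier sees the same feature size whichever operator is selected.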


In [6]:
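def deepwalk_representations(graph):
  # Note: the loop variable number_of_walks is passed to DeepWalk's walk_length
  # (steps per walk); walk_number (walks per node) keeps its default value.
  modelDeep = DeepWalk(walk_length=number_of_walks, dimensions=number_of_dimensions, seed=0)
  modelDeep.fit(graph.to_networkx())
  embeddings = modelDeep.get_embedding()  # computed once, reused for every lookup

  def node_embeddings(vector):
    return embeddings[vector]
  return node_embeddings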

  
def connect_samples_with_node_embeddings(link_samples, node_embeddings, binary_operator):
    _z=[]
    for source, destination in link_samples:
      _z.append(binary_operator(node_embeddings(source), node_embeddings(destination)))
    return _z


def train_model(link_samples, link_labels, node_embeddings, binary_operator):
    clf = get_classifier()
    link_features = connect_samples_with_node_embeddings(link_samples, node_embeddings, binary_operator)
    clf.fit(link_features, link_labels)
    return clf


def get_classifier():
    # number_of_folds and number_of_iterations are globals set by the parameter loop.
    lr_clf = LogisticRegressionCV(Cs=10, cv=number_of_folds, scoring="roc_auc", max_iter=number_of_iterations)
    return Pipeline(steps=[("sc", StandardScaler()), ("clf", lr_clf)])



def model_evaluation(clf, link_samples_test, link_labels_test, node_embeddings, binary_operator, validation_phase=False):
    link_features_test = connect_samples_with_node_embeddings(link_samples_test, node_embeddings, binary_operator)
    score = get_score(clf, link_features_test, link_labels_test, validation_phase)
    return score


def get_score(clf, link_features, link_labels, validation_phase=False):
    predicted = clf.predict_proba(link_features)
    positives = list(clf.classes_).index(1)

    return roc_auc_score(link_labels, predicted[:, positives])



def Average(a, b):
    return (a+b)/2.0

def Hadamard(a, b):
    return a*b

def Weighted_L1(a, b):
    return np.abs(a-b)

def Weighted_L2(a, b):
    return (a-b)**2

binary_operators = [Average, Hadamard, Weighted_L1, Weighted_L2]

def find_best_operator(binary_operator):
    clf = train_model(samples_train, labels_train, embedding_train, binary_operator)
    score = model_evaluation(clf, samples_model_selection, labels_model_selection, embedding_train, binary_operator)
    return {"classifier": clf,"binary_operator": binary_operator,"score": score}
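
The ROC AUC returned by `get_score` can be read as the probability that a randomly chosen positive edge receives a higher predicted score than a randomly chosen negative edge, with ties counted as one half. The sketch below computes this by direct pairwise comparison on made-up labels and scores. The function name `pairwise_auc` and the data are hypothetical, and scikit-learn's `roc_auc_score` reaches the same quantity by a more efficient route.

```python
def pairwise_auc(labels, scores):
    # Split the predicted scores by true label (1 = edge exists, 0 = no edge).
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]

    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities for six candidate edges.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = pairwise_auc(labels, scores)
print(auc)
```

In this example eight of the nine positive/negative pairs are ordered correctly. A classifier that assigns the same score to every edge would obtain 0.5.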

## Model pipeline implementation

This section is the main part of the link prediction pipeline. First, the data frame is sliced so that only one chosen food web is selected, forming a new, individual dataset. A total of seven food webs are chosen: Weddell Sea, Chesapeake Bay, Lough Hyne, Carpinteria, FloridaIslandE3, FloridaIslandE1 and Caribbean Reef.

Each individual dataset is then loaded into NetworkX as a directed graph, with the columns `con_taxonomy` and `res_taxonomy` representing the pairs of nodes. The node labels of the resulting NetworkX object are converted to integers, and the graph is then loaded into the StellarGraph library as a directed multigraph.
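
The relabelling step can be pictured without any graph library. The sketch below is a simplified stand-in for the NetworkX call used in the notebook, not a reproduction of it: it maps made-up consumer and resource names to consecutive integers in order of first appearance and rewrites the edge list accordingly. The species names and the variable names `node_ids` and `integer_edges` are hypothetical.

```python
# Hypothetical consumer -> resource pairs (not taken from the GATEWAy data).
edges = [
    ("seal", "fish"),
    ("fish", "krill"),
    ("seal", "krill"),
    ("krill", "algae"),
]

# Assign each taxon the next free integer the first time it is seen.
node_ids = {}
for consumer, resource in edges:
    for name in (consumer, resource):
        node_ids.setdefault(name, len(node_ids))

# Rewrite the edge list in terms of the integer identifiers.
integer_edges = [(node_ids[c], node_ids[r]) for c, r in edges]

print(node_ids)
print(integer_edges)
```

Integer labels starting at 0 allow a node identifier to be used directly as a row index into the embedding matrix, which is how `deepwalk_representations` looks up a node's vector.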

The data is then separated into three different sets: the Training Set (used for training the model), the Binary Operator Selector Set (used for selecting the optimal binary operator) and the Testing Set (used for evaluating the trained model). These sets are then employed to train and test the link prediction model. For more information on this procedure, refer to the Methodology section of the Dissertation-Research Paper.
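
To make the sizes of the three sets concrete, the sketch below walks through the arithmetic for a hypothetical web with 1,000 edges, using the fractions that appear in the code (`p=0.1` twice, then a 75/25 split). It assumes that a fraction `p` of the edges is sampled as positive examples and matched by an equal number of negative examples, which is consistent with the "Sampled N positive and N negative edges" messages in the output. The exact rounding used by `EdgeSplitter` may differ, and all variable names are illustrative.

```python
n_edges = 1000   # hypothetical number of edges in one food web
p = 0.1          # fraction of edges sampled at each splitting step

# Step 1: the Testing Set. Sampled positive edges are removed from the graph.
test_positives = round(n_edges * p)
testing_set_size = 2 * test_positives          # positives plus as many negatives
edges_after_test_split = n_edges - test_positives

# Step 2: samples for training and operator selection, drawn from the reduced graph.
train_phase_positives = round(edges_after_test_split * p)
train_phase_samples = 2 * train_phase_positives

# Step 3: 75/25 split into Training Set and Binary Operator Selector Set.
training_set_size = round(train_phase_samples * 0.75)
selector_set_size = train_phase_samples - training_set_size

print(testing_set_size, training_set_size, selector_set_size)
```

Under these assumptions the classifier is fitted on 135 labelled edges, the binary operator is chosen on 45, and the final score is computed on 200.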

Four parameters are tested, each with its own set of candidate values: the number of folds and the number of iterations for the classifier, and the number of walks and the number of dimensions for the DeepWalk algorithm. A nested loop goes through all possible combinations of the parameter values.
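
The nested loop is equivalent to iterating over the Cartesian product of the four value lists. As a sketch of the size of the search space, using the candidate values that the notebook defines in its parameter cell, `itertools.product` enumerates the same combinations in the same order. The variable name `grid` is introduced here only for illustration.

```python
from itertools import product

list_of_number_of_folds = [5, 10, 20]
list_of_iterations = [2000, 5000]
list_of_number_of_walks = [10, 16, 32, 64]
list_of_dimensions = [16, 64, 128]

# Each tuple is (folds, iterations, walks, dimensions); the last list varies fastest.
grid = list(product(list_of_number_of_folds,
                    list_of_iterations,
                    list_of_number_of_walks,
                    list_of_dimensions))

print(len(grid))   # 3 * 2 * 4 * 3 = 72 combinations
print(grid[0])     # first combination tried
print(grid[-1])    # last combination tried
```

Each of the 72 combinations is evaluated on all seven food webs, so the full search trains and scores the pipeline 504 times when no run fails.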

In [7]:
list_of_webs=["Weddell Sea", "Chesapeake Bay", "Lough Hyne",
              "Carpinteria", "FloridaIslandE3", "FloridaIslandE1", "Caribbean Reef"]

In [8]:
list_of_number_of_folds=[5,10,20]
list_of_iterations=[2000,5000]
list_of_number_of_walks=[10,16,32,64]
list_of_dimensions=[16,64,128]

In [9]:
record_of_scores=[]
record_of_number_of_folds=[]
record_of_iterations=[]
record_of_number_of_walks=[]
record_of_dimensions=[]
record_of_errors=[]

for number_of_folds in list_of_number_of_folds:
  for number_of_iterations in list_of_iterations:
    for number_of_walks in list_of_number_of_walks:
      for number_of_dimensions in list_of_dimensions:
        list_score=[]
        error_in_run=False
        
        for name in list_of_webs:
          try:
            df_individual_web=df.loc[df.foodweb_name==name]

            G = nx.from_pandas_edgelist(df_individual_web, "con_taxonomy", "res_taxonomy")
            G = nx.DiGraph(G)
            G = nx.convert_node_labels_to_integers(G, first_label=0, ordering='default')
            G = StellarGraph.from_networkx(G)

            edge_splitter_test = EdgeSplitter(G)
            graph_test, samples_test, labels_test = edge_splitter_test.train_test_split(p=0.1, method="global")



            edge_splitter_train = EdgeSplitter(graph_test, G)
            graph_train, samples, labels = edge_splitter_train.train_test_split(p=0.1, method="global")
            (samples_train,samples_model_selection,labels_train,labels_model_selection) = train_test_split(samples, labels, train_size=0.75, test_size=0.25)


              
            embedding_train = deepwalk_representations(graph_train)



            operator_products=[]
            for binary_operator in binary_operators:
              operator_products.append(find_best_operator(binary_operator))

            optimal_operator = max(operator_products, key=lambda product: product["score"])

            embedding_test = deepwalk_representations(graph_test)
            test_score = model_evaluation(optimal_operator["classifier"],samples_test,labels_test,embedding_test,optimal_operator["binary_operator"], validation_phase=True)
            
            list_score.append(test_score)

          except Exception:
            # Record the failure and continue with the next food web.
            error_in_run=True
            print(name, ' had an error')

        record_of_errors.append(error_in_run)
        
        record_of_scores.append(sum(list_score) / len(list_score))
        record_of_number_of_folds.append(number_of_folds)
        record_of_iterations.append(number_of_iterations)
        record_of_number_of_walks.append(number_of_walks)
        record_of_dimensions.append(number_of_dimensions)
        parameters=f'number_of_folds= {number_of_folds}, number_of_iterations= {number_of_iterations}, number_of_walks= {number_of_walks}, number_of_dimensions= {number_of_dimensions}, error_in_run= {error_in_run}'
        print('Average of ROC AUC scores= ', record_of_scores[-1], ', parameters used: ', parameters)

** Sampled 3124 positive and 3124 negative edges. **
** Sampled 2811 positive and 2811 negative edges. **
** Sampled 2822 positive and 2822 negative edges. **
** Sampled 2540 positive and 2540 negative edges. **
** Sampled 1006 positive and 1006 negative edges. **
** Sampled 905 positive and 905 negative edges. **
Carpinteria  had an error
** Sampled 721 positive and 721 negative edges. **
** Sampled 649 positive and 649 negative edges. **
** Sampled 701 positive and 701 negative edges. **
** Sampled 631 positive and 631 negative edges. **
** Sampled 659 positive and 659 negative edges. **
** Sampled 593 positive and 593 negative edges. **
Average of ROC AUC scores=  0.6767038435685855 , parameters used:  number_of_folds= 5, number_of_iterations= 2000, number_of_walks= 10, number_of_dimensions= 16, error_in_run= True
...
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
...
Average of ROC AUC scores=  0.6042441248588153 , parameters used:  number_of_folds= 5, number_of_iterations= 2000, number_of_walks= 32, number_of_dimensions= 128, error_in_run= True
...
Average of ROC AUC scores=  0.5736650477722279 , parameters used:  number_of_folds= 5, number_of_iterations= 2000, number_of_walks= 64, number_of_dimensions= 128, error_in_run= False
...
Average of ROC AUC scores=  0.5461711621681965 , parameters used:  number_of_folds= 10, number_of_iterations= 2000, number_of_walks= 64, number_of_dimensions= 128, error_in_run= False
** Sampled 3124 positive and 3124 negative edges. **
** Sampled 2811 positive and 2811 negative edges. **
** Sampled 2822 positive and 2822 negative edges. **
** Sampled 2540 positive and 2540 negative edges. **
** Sampled 1006 positive and 1006 negative edges. **
** Sampled 905 positive and 905 negative edges. **
** Sampled 589 positive and 589 negative edges. **
** Sampled 530 positive and 530 negative edges. **
** Sampled 721 positive and 721 negative edges. **
** Sampled 649 positive and 649 negative edges. **
** Sampled 701 positive and 701 negative edges. **
** Sampled 631 positive and 631 negative edges. **
** Sampled 659 positive and 659 negative edges. **
** Sampled 593 positive and 593 negative edges. **
Average of ROC AUC scores=  0.7036401451397808 , parameters used:  number_of_folds= 10, numb
...
Average of ROC AUC scores=  0.5924654516163917 , parameters used:  number_of_folds= 20, number_of_iterations= 2000, number_of_walks= 32, number_of_dimensions= 128, error_in_run= True
...
Average of ROC AUC scores=  0.6113593196982505 , parameters used:  number_of_folds= 20, number_of_iterations= 2000, number_of_walks= 64, number_of_dimensions= 128, error_in_run= False
...
Average of ROC AUC scores=  0.7087019291688479 , parameters used:  number_of_folds= 20, numb

## Results reporting and recording

The results of the tests of the link prediction model are displayed here in the form of a pandas data frame. The results are also recorded in a .csv file.

In [18]:
df_parameters=pd.DataFrame(list(zip(record_of_scores,record_of_number_of_folds,record_of_iterations,record_of_number_of_walks,record_of_dimensions,record_of_errors)), 
                           columns=['Average of ROC AUC scores', 'Folds', 'Iterations', 'Walks','Dimensions', 'Error occurred'])
df_parameters

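| | Average of ROC AUC scores | Folds | Iterations | Walks | Dimensions | Error occurred |
|---|---|---|---|---|---|---|
| 0 | 0.676704 | 5 | 2000 | 10 | 16 | True |
| 1 | 0.683876 | 5 | 2000 | 10 | 64 | True |
| 2 | 0.694620 | 5 | 2000 | 10 | 128 | True |
| 3 | 0.727244 | 5 | 2000 | 16 | 16 | True |
| 4 | 0.652129 | 5 | 2000 | 16 | 64 | False |
| ... | ... | ... | ... | ... | ... | ... |
| 67 | 0.637168 | 20 | 5000 | 32 | 64 | False |
| 68 | 0.439988 | 20 | 5000 | 32 | 128 | True |
| 69 | 0.674426 | 20 | 5000 | 64 | 16 | False |
| 70 | 0.524341 | 20 | 5000 | 64 | 64 | False |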


In [11]:
df_parameters.to_csv('parameters.csv', index=False)

## Creating the correlation coefficient matrix

In order to determine the optimal values, a correlation coefficient matrix is computed from the previously generated pandas data frame. The matrix is presented below.

In [19]:
df_parameters.corr().style.background_gradient(cmap='coolwarm')

| | Average of ROC AUC scores | Folds | Iterations | Walks | Dimensions | Error occurred |
|---|---|---|---|---|---|---|
| Average of ROC AUC scores | 1.0 | -0.01747 | -0.102931 | -0.273358 | -0.589225 | 0.09538 |
| Folds | -0.01747 | 1.0 | -0.0 | 0.0 | -0.0 | -0.094491 |
| Iterations | -0.102931 | -0.0 | 1.0 | -0.0 | -0.0 | -0.058926 |
| Walks | -0.273358 | 0.0 | -0.0 | 1.0 | -0.0 | 0.008439 |
| Dimensions | -0.589225 | -0.0 | -0.0 | -0.0 | 1.0 | 0.113024 |
| Error occurred | 0.09538 | -0.094491 | -0.058926 | 0.008439 | 0.113024 | 1.0 |


Below is the same correlation coefficient matrix, now restricted to the runs in which no error occurred.

In [20]:
df_parameters.loc[df_parameters['Error occurred']==False].corr().style.background_gradient(cmap='coolwarm')



| | Average of ROC AUC scores | Folds | Iterations | Walks | Dimensions | Error occurred |
|---|---|---|---|---|---|---|
| Average of ROC AUC scores | 1.0 | 0.169898 | 0.118057 | -0.149976 | -0.652409 | |
| Folds | 0.169898 | 1.0 | 0.019637 | -0.119434 | -0.048538 | |
| Iterations | 0.118057 | 0.019637 | 1.0 | -0.204798 | -0.08345 | |
| Walks | -0.149976 | -0.119434 | -0.204798 | 1.0 | -0.010558 | |
| Dimensions | -0.652409 | -0.048538 | -0.08345 | -0.010558 | 1.0 | |
| Error occurred | | | | | | |
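
The entries of both matrices are Pearson correlation coefficients, which is the default method of `DataFrame.corr()`. The sketch below computes the coefficient in plain Python on made-up numbers and shows why the `Error occurred` row and column are empty in the filtered matrix: a constant column has zero variance, so the coefficient is undefined. The function name `pearson` and the data are hypothetical.

```python
import math


def pearson(x, y):
    # Pearson correlation: covariance divided by the product of the spreads.
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant variable has no spread, so the ratio is undefined.
        return math.nan
    return cov / math.sqrt(var_x * var_y)


# Hypothetical runs: scores fall as the number of dimensions grows.
dimensions = [16, 64, 128, 16, 64, 128]
scores = [0.70, 0.65, 0.55, 0.72, 0.63, 0.57]
error_occurred = [0, 0, 0, 0, 0, 0]   # constant after filtering out failed runs

r = pearson(dimensions, scores)
print(r)                                # negative: more dimensions, lower score
print(pearson(error_occurred, scores))  # nan
```

A coefficient close to -1 or 1 indicates a strong linear relationship between a parameter and the average score, while a value near 0 indicates little linear relationship.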


# References


[1] McKinney, W., 2022. Pandas documentation. pandas documentation - pandas 1.4.3 documentation. Available at: https://pandas.pydata.org/docs/ [Accessed August 12, 2022]. 

[2] Oliphant, T., 2022. NumPy documentation. NumPy documentation - NumPy v1.23 Manual. Available at: https://numpy.org/doc/stable/ [Accessed August 12, 2022]. 

[3] Data61, 2022. Stellargraph documentation. Welcome to StellarGraph's documentation! - StellarGraph 1.2.1 documentation. Available at: https://stellargraph.readthedocs.io/en/stable/ [Accessed August 12, 2022].

[4] Rozemberczki, B., 2022. Karate Club documentation. Karate Club Documentation - karateclub documentation. Available at: https://karateclub.readthedocs.io/en/latest/ [Accessed August 12, 2022]. 

[5] Hagberg, A., Swart, P. & Schult, D., 2014. NetworkX documentation. NetworkX documentation - NetworkX 1.9 documentation. Available at: https://networkx.org/documentation/networkx-1.9/ [Accessed August 12, 2022].

[6] Hunter, J.D., 2022. Matplotlib 3.5.3 documentation. Matplotlib documentation - Matplotlib 3.5.3 documentation. Available at: https://matplotlib.org/stable/index.html [Accessed August 12, 2022]. 

[7] Cournapeau, D., 2022. scikit-learn Machine Learning in Python. scikit. Available at: https://scikit-learn.org/stable/ [Accessed August 12, 2022]. 