### Model architecture and hyperparameter tuning
Now we are ready to define the embeddings model architecture and perform hyperparameter tuning using Maggy.
![Training Dataset](./images/maggy_hp.png)

We are going to use the StellarGraph library to compute node embeddings. StellarGraph supports loading data from Pandas DataFrames, NumPy arrays, Neo4j and NetworkX graphs.

---
**NOTE**:

Large-scale datasets cannot be loaded into StellarGraph for training with the frameworks mentioned above. They require loading data with frameworks such as `tf.data`.

If your training datasets range from a few GB to hundreds of GB or even TB, contact us at Logical Clocks and we will help you set up distributed training pipelines.

---

## Install required libraries, such as StellarGraph
##### To do this, first navigate to the Python environment:
![Incremental Feature Engineering](./images/navigate_to_python.gif)

##### Then install the library by name and version
![Incremental Feature Engineering](./images/install_lib_by_name.gif)

## Define the hyperparameter search space for Maggy

In [1]:
from maggy import Searchspace
sp = Searchspace(walk_number=('DISCRETE', [2, 3]), walk_length=('INTEGER', [2, 3]), emb_size=('DISCRETE', [16, 32]))

Starting Spark application


| ID | YARN Application ID | Kind | State | Spark UI | Driver log |
|---|---|---|---|---|---|
| 12 | application_1631881319280_0004 | pyspark | idle | Link | Link |


SparkSession available as 'spark'.
Hyperparameter added: walk_number
Hyperparameter added: walk_length
Hyperparameter added: emb_size
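
To make the search space concrete, here is a minimal plain-Python sketch of what drawing random trials from it could look like. It is an illustration only, not Maggy's implementation: the `search_space` dictionary and the `sample_trial` helper are hypothetical names, and the sketch assumes that an `INTEGER` entry is a pair of inclusive lower and upper bounds while a `DISCRETE` entry is an explicit list of allowed values.

```python
import random

# Plain-Python mirror of the search space above (illustrative, not Maggy's API).
search_space = {
    "walk_number": ("DISCRETE", [2, 3]),
    "walk_length": ("INTEGER", [2, 3]),
    "emb_size": ("DISCRETE", [16, 32]),
}


def sample_trial(space, rng):
    """Draw one hyperparameter combination at random (hypothetical helper)."""
    trial = {}
    for name, (kind, values) in space.items():
        if kind == "DISCRETE":
            # pick one of the listed values
            trial[name] = rng.choice(values)
        elif kind == "INTEGER":
            # assumption: both bounds are inclusive
            low, high = values
            trial[name] = rng.randint(low, high)
        else:
            raise ValueError(f"unsupported hyperparameter type: {kind}")
    return trial


rng = random.Random(0)
trials = [sample_trial(search_space, rng) for _ in range(2)]
for trial in trials:
    print(trial)
```

Each sampled dictionary has the same keys as the arguments of the training function defined in the next cell, which is how a trial's values reach the training code.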

### Define the Hopsworks experiment wrapper function and put all the training logic in it

In [2]:
def embeddings_computer(walk_number, walk_length, emb_size):
    
    import os
    import sys
    import uuid
    import random    
    
    # pandas and numpy
    import pandas as pd
    import numpy as np

    # hops utility libraries
    from hops import hdfs
    from hops import pandas_helper as pandas
    from hops import model as hops_model
    from hops import tensorboard
    
    # tensorflow
    import tensorflow as tf
    from tensorflow import keras  
    
    # stellargraph library for graph neural networks
    import stellargraph as sg
    from stellargraph import StellarGraph
    from stellargraph import StellarDiGraph
    from stellargraph.data import BiasedRandomWalk
    from stellargraph.data import UnsupervisedSampler
    from stellargraph.mapper import Node2VecLinkGenerator, Node2VecNodeGenerator
    from stellargraph.layer import Node2Vec, link_classification

    ## Connect to hsfs and retrieve datasets for training and evaluation 
    import hsfs
    # Create a connection
    connection = hsfs.connection(engine = "training")
    # Get the feature store handle for the project's feature store
    fs = connection.get_feature_store()

    # Get nodes and edges training datasets metadata objects from hsfs
    node_td = fs.get_training_dataset("node_td", 1)
    edge_td = fs.get_training_dataset("edges_td", 1)
    
    # Read the training datasets as pandas dataframes
    node_pdf = node_td.read()
    edge_pdf = edge_td.read()

    # define static hyperparameters
    batch_size = 512
    epochs = 10
    num_samples = [20, 20]
    layer_sizes = [100, 100]
    learning_rate = 1e-2


    # construct Stellargraph object for training
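    # index the node features by node id; passing index= to pd.DataFrame would
    # reindex the existing rows instead of relabelling them
    node_data = node_pdf.set_index('id')[['type']]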
        
    print('Defining StellarDiGraph')
    G = StellarDiGraph(node_data,
                       edges=edge_pdf,
                       edge_type_column="tx_type")


    nodes = list(G.nodes())

    walker = BiasedRandomWalk(
        G,
        n=walk_number,
        length=walk_length,
        p=0.5,  # defines probability, 1/p, of returning to source node
        q=2.0,  # defines probability, 1/q, for moving to a node away from the source node
    )
    unsupervised_samples = UnsupervisedSampler(G, nodes=list(G.nodes()), walker=walker)
    generator = Node2VecLinkGenerator(G, batch_size)
    node2vec = Node2Vec(emb_size, generator=generator)
    
    x_inp, x_out = node2vec.in_out_tensors()
    prediction = link_classification(
        output_dim=1, output_act="sigmoid", edge_embedding_method="dot"
    )(x_out)

    # define and train keras model
    print('Defining the model')
    model = keras.Model(inputs=x_inp, outputs=prediction)

    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),  # `lr` is a deprecated alias
        loss=keras.losses.binary_crossentropy,
        metrics=[keras.metrics.binary_accuracy],
    )
        
    # Save the weights using the `checkpoint_path` format
    
    print('Training the model')
    history = model.fit(
        generator.flow(unsupervised_samples),
        epochs=epochs,
        verbose=0,
        use_multiprocessing=False,
        workers=4,
        shuffle=True
    )

    binary_accuracy = history.history['binary_accuracy'][-1]
        
    return binary_accuracy    
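
The `p` and `q` arguments of `BiasedRandomWalk` control how each walk moves through the graph. The following is a minimal plain-Python sketch of a node2vec-style biased walk on a toy graph, intended only to illustrate the idea. It is not StellarGraph's implementation: the `biased_random_walk` helper and the toy adjacency dictionary `adj` are hypothetical, and the sketch assumes an unweighted graph where `length` is the maximum number of nodes in a walk.

```python
import random


def biased_random_walk(adj, start, length, p, q, rng):
    """Return one node2vec-style walk of at most `length` nodes (hypothetical helper)."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = adj.get(current, [])
        if not neighbours:
            break  # dead end: no outgoing edges
        if len(walk) == 1:
            # first step has no previous node, so choose uniformly
            walk.append(rng.choice(neighbours))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)  # step back to the previous node
            elif candidate in adj.get(previous, []):
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # move further away
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


# Toy graph stored as an adjacency dictionary (illustrative data only).
adj = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}

rng = random.Random(42)
walk_number, walk_length = 2, 3
walks = [
    biased_random_walk(adj, node, walk_length, p=0.5, q=2.0, rng=rng)
    for node in adj
    for _ in range(walk_number)
]
for walk in walks:
    print(walk)
```

With `p=0.5` and `q=2.0`, stepping back gets weight 2.0 and moving further away gets weight 0.5, so walks tend to stay near their starting node. In the sketch, `walk_number` and `walk_length` play the roles of `n` and `length` in the training function above.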

## Use the wrapper function above to run Maggy hyperparameter search experiments

In [3]:
from maggy import experiment
result = experiment.lagom(embeddings_computer, 
                           searchspace=sp, 
                           optimizer='randomsearch', 
                           direction='max',
                           num_trials=2, 
                           name='EMBEDDINGS',
                           hb_interval=5, 
                           es_interval=5,
                           es_min=5
                          )


0: Connected. Call `.close()` to terminate connection gracefully.
0: Defining StellarDiGraph
0: link_classification: using 'dot' method to combine node embeddings into edge embeddings
0: 
0: Defining the model
0: Training the model
0: 
0: 
0: Connected. Call `.close()` to terminate connection gracefully.
0: Defining StellarDiGraph
0: link_classification: using 'dot' method to combine node embeddings into edge embeddings
0: Defining the model
0: Training the model
Started Maggy Experiment: EMBEDDINGS, application_1631881319280_0004, run 1

------ RandomSearch Results ------ direction(max) 
BEST combination {"walk_number": 3, "walk_length": 3, "emb_size": 32} -- metric 0.6784741878509521
WORST combination {"walk_number": 2, "walk_length": 3, "emb_size": 16} -- metric 0.6134476661682129
AVERAGE metric -- 0.6459609270095825
EARLY STOPPED Trials -- 0
Total job time 0 hours, 1 minutes, 57 seconds

Finished Experiment


## Write the best hyperparameters to a JSON file so we can use them in the next step

In [4]:
import json
from hops import hdfs
EMBEDDINGS_HYPERPARAMS_FILE = 'embeddings_best_hp.json'
hdfs.dump(json.dumps(result['best_hp']), "Resources/" + EMBEDDINGS_HYPERPARAMS_FILE)
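
A later step can read this file back and unpack the values. The sketch below shows the round trip under two assumptions: a local temporary directory stands in for the project's `Resources/` dataset, and `result['best_hp']` holds the BEST combination printed in the experiment output above. On the cluster, the file would be read from the project's file system rather than with the built-in `open`.

```python
import json
import os
import tempfile

EMBEDDINGS_HYPERPARAMS_FILE = "embeddings_best_hp.json"

# Best combination reported by the random search output above.
best_hp = {"walk_number": 3, "walk_length": 3, "emb_size": 32}

# Assumption: a local temporary directory stands in for "Resources/".
resources_dir = tempfile.mkdtemp()
path = os.path.join(resources_dir, EMBEDDINGS_HYPERPARAMS_FILE)

# Write the hyperparameters as a JSON string, as the cell above does.
with open(path, "w") as f:
    f.write(json.dumps(best_hp))

# Read them back and unpack the values for the final training run.
with open(path) as f:
    loaded_hp = json.load(f)

walk_number = loaded_hp["walk_number"]
walk_length = loaded_hp["walk_length"]
emb_size = loaded_hp["emb_size"]
print(walk_number, walk_length, emb_size)
```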

### Managing experiments
The Experiments service provides a unified view of all the experiments run using the `experiment` module.
<br>
As demonstrated in the gif, it provides general information about each experiment and its resulting metric. Experiments can be visualized in TensorBoard during or after training.
<br>
<br>
![Image7-Monitor.png](./images/experiments.gif)