### Anomaly detection model architecture and hyperparameter tuning
Now we are ready to define the GAN-based anomaly detection model architecture and tune its hyperparameters using Maggy. For more details about this model, refer to https://arxiv.org/pdf/1905.11034.pdf.
![Training Dataset](./images/maggy_hp.png)

In [1]:
spark

Starting Spark application


ID,Application ID,Kind,State,Spark UI,Driver log
49,application_1651406915314_0007,pyspark,idle,Link,Link


SparkSession available as 'spark'.
<pyspark.sql.session.SparkSession object at 0x7f179e574850>

In [2]:
from hops import experiment
from hops import hdfs
import json

In [3]:
best_hyperparams_path = "Resources/embeddings_best_hp.json"
best_hyperparams = json.loads(hdfs.load(best_hyperparams_path))
args_dict = {}
for key in best_hyperparams.keys():
    args_dict[key] = [best_hyperparams[key]]
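
The loop above wraps every tuned value in a one-element list, which is why later lookups take the form `args_dict['emb_size'][0]`. A minimal sketch with made-up values follows; the `sample_best_hyperparams` dictionary and its `walk_length` key are hypothetical, and only `emb_size` is actually referenced later in this notebook.

```python
# Hypothetical stand-in for the JSON loaded from Resources/embeddings_best_hp.json.
sample_best_hyperparams = {"emb_size": 16, "walk_length": 10}

# Same transformation as the loop above, written as a dict comprehension.
sample_args_dict = {key: [value] for key, value in sample_best_hyperparams.items()}

# Each entry is a one-element list, so index 0 recovers the original value.
emb_size = sample_args_dict["emb_size"][0]
print(sample_args_dict)
print(emb_size)
```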

### Define the Hopsworks experiment wrapper function and put all the training logic inside it

In [5]:
def experiment_wrapper(
    latent_dim,
    discriminator_n_layers,
    discriminator_activation_fn,
    discriminator_middle_layer_activation_fn,    
    
    discriminator_batch_norm,
    discriminator_dropout_rate, 
    discriminator_learning_rate,
    discriminator_extra_steps,

    generator_start_n_units,
    generator_n_layers,
    generator_activation_fn,
    generator_middle_layer_activation_fn,        
    generator_batch_norm,
    generator_dropout_rate, 
    generator_learning_rate,

    encoder_start_n_units,
    encoder_n_layers,
    encoder_activation_fn,
    encoder_middle_layer_activation_fn,        
    encoder_batch_norm,
    encoder_dropout_rate, 
    encoder_learning_rate):
        
    import tensorflow as tf
    from adversarialaml.gan_enc_ano import GanAnomalyDetector,  GanAnomalyMonitor 
    from hops import tensorboard

    ## Connect to hsfs and retrieve datasets for training and evaluation 
    import hsfs
    # Create a connection
    connection = hsfs.connection(engine = "training")
    # Get the feature store handle for the project's feature store
    fs = connection.get_feature_store()

    ben_td = fs.get_training_dataset("gan_non_sar_training_df", 1)
    eval_td = fs.get_training_dataset("gan_eval_df", 1)
    
    int_to_act_fn = {
        1: 'linear',        
        2: 'relu',
        3: 'leaky_relu',
        4: 'selu',
        5: 'tanh'
    }
    
    # Set the number of epochs for training.
    EPOCHS = 2
    BATCH_SIZE = 32
    TOTAL_SAMPLES = 6366
    STEPS_PER_EPOCH=TOTAL_SAMPLES//BATCH_SIZE

    train_input = ben_td.tf_data(target_name='is_sar', is_training=True)
    train_input_processed = train_input.tf_record_dataset(process=True, batch_size=BATCH_SIZE, num_epochs=EPOCHS)
    eval_input = eval_td.tf_data(target_name='is_sar', is_training=True)
    eval_input_processed = eval_input.tf_record_dataset(process=True, batch_size=1, num_epochs=EPOCHS)    
        
    discriminator_activation_fn=int_to_act_fn[discriminator_activation_fn]
    discriminator_middle_layer_activation_fn=int_to_act_fn[discriminator_middle_layer_activation_fn]
    
    if discriminator_dropout_rate > 0.0:
        discriminator_batch_dropout = True
    else:
        discriminator_batch_dropout = False
    
    if generator_dropout_rate > 0.0:
        generator_batch_dropout=True
    else:
        generator_batch_dropout=False

    if encoder_dropout_rate > 0.0:
        encoder_batch_dropout=True
    else:
        encoder_batch_dropout=False   


    if discriminator_batch_norm==0:
        discriminator_batch_norm = False
    else:
        discriminator_batch_norm = True

    generator_activation_fn=int_to_act_fn[generator_activation_fn]
    generator_middle_layer_activation_fn=int_to_act_fn[generator_middle_layer_activation_fn]

    if generator_batch_norm==0:
        generator_batch_norm = False
    else:
        generator_batch_norm = True

    encoder_activation_fn=int_to_act_fn[encoder_activation_fn]
    encoder_middle_layer_activation_fn=int_to_act_fn[encoder_middle_layer_activation_fn]
                
    if encoder_batch_norm==0:
        encoder_batch_norm=False
    else:
        encoder_batch_norm=True        
        

    discriminator_double_neurons=False
    discriminator_bottleneck_neurons=True
    generator_double_neurons=True
    generator_bottleneck_neurons=False
        
    # Instantiate the GanAnomalyDetector model.
    gan_anomaly_detector = GanAnomalyDetector(
                input_dim=args_dict['emb_size'][0],
                latent_dim=latent_dim,

                discriminator_start_n_units=args_dict['emb_size'][0],
                discriminator_n_layers=discriminator_n_layers,
                discriminator_activation_fn=discriminator_activation_fn,
                discriminator_middle_layer_activation_fn=discriminator_middle_layer_activation_fn,
                discriminator_double_neurons=discriminator_double_neurons,
                discriminator_bottleneck_neurons=discriminator_bottleneck_neurons,
                discriminator_batch_norm=discriminator_batch_norm,
                discriminator_batch_dropout=discriminator_batch_dropout,
                discriminator_dropout_rate=discriminator_dropout_rate,
                discriminator_learning_rate=discriminator_learning_rate,
                discriminator_extra_steps=discriminator_extra_steps,

                generator_start_n_units=generator_start_n_units,
                generator_n_layers=generator_n_layers,
                generator_activation_fn=generator_activation_fn,
                generator_middle_layer_activation_fn=generator_middle_layer_activation_fn,        
                generator_double_neurons=generator_double_neurons,
                generator_bottleneck_neurons=generator_bottleneck_neurons,
                generator_batch_norm=generator_batch_norm,
                generator_batch_dropout=generator_batch_dropout,
                generator_dropout_rate=generator_dropout_rate,
                generator_learning_rate=generator_learning_rate,

                encoder_start_n_units=encoder_start_n_units,
                encoder_n_layers=encoder_n_layers,
                encoder_activation_fn=encoder_activation_fn,
                encoder_middle_layer_activation_fn=encoder_middle_layer_activation_fn,        
                encoder_batch_norm=encoder_batch_norm,
                encoder_batch_dropout=encoder_batch_dropout,
                encoder_dropout_rate=encoder_dropout_rate,
                encoder_learning_rate=encoder_learning_rate,
    )
    
    # Compile the WGAN model.
    gan_anomaly_detector.compile()
    
    # Start training the model.
    history = gan_anomaly_detector.fit(train_input_processed, epochs=EPOCHS, steps_per_epoch=STEPS_PER_EPOCH)

    # Report the generator loss of the first epoch as the metric for Maggy to optimize.
    metrics = {'metric': history.history["g_loss"][0]}
    
    return metrics
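
The wrapper repeats the same decoding three times: integer codes become activation names, 0/1 flags become booleans, and a positive dropout rate switches dropout on. As a sketch only, that logic could be collected in one function. `decode_component` is a hypothetical helper and is not part of `adversarialaml` or Maggy; the mapping is copied from the function above.

```python
# Mapping copied from the wrapper function above.
INT_TO_ACT_FN = {1: "linear", 2: "relu", 3: "leaky_relu", 4: "selu", 5: "tanh"}


def decode_component(activation_fn, middle_layer_activation_fn, batch_norm, dropout_rate):
    """Translate integer-encoded search values into model arguments.

    Hypothetical helper for illustration; not part of adversarialaml or maggy.
    """
    return {
        "activation_fn": INT_TO_ACT_FN[activation_fn],
        "middle_layer_activation_fn": INT_TO_ACT_FN[middle_layer_activation_fn],
        "batch_norm": batch_norm != 0,        # 0 -> False, anything else -> True
        "batch_dropout": dropout_rate > 0.0,  # dropout is enabled only for a positive rate
        "dropout_rate": dropout_rate,
    }


# Example: the generator values from one sampled configuration.
generator_args = decode_component(4, 3, 0, 0.0)
print(generator_args)
```

Calling the helper once per component (discriminator, generator, encoder) with that component's own values avoids mixing up variables between the three blocks.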

## Instantiate the search space with hyperparameter types and ranges

In [6]:
from maggy import Searchspace
sp = Searchspace(

    latent_dim=('DISCRETE', [16, 32]),
    discriminator_n_layers=('INTEGER', [2, 3]),
    discriminator_activation_fn=('INTEGER', [1, 4]),
    discriminator_middle_layer_activation_fn=('INTEGER', [1, 4]),    
    discriminator_batch_norm=('INTEGER', [0, 1]), 
    discriminator_dropout_rate=('DOUBLE', [0.0, 0.1]), 
    discriminator_learning_rate=('DOUBLE', [0.0001, 0.0002]),
    discriminator_extra_steps=('INTEGER', [2, 3]),

    generator_start_n_units=('DISCRETE', [16, 32]),
    generator_n_layers=('INTEGER', [2, 3]),
    generator_activation_fn=('INTEGER', [1, 5]),
    generator_middle_layer_activation_fn=('INTEGER', [1, 4]),    
    generator_batch_norm=('INTEGER', [0, 1]),
    generator_dropout_rate=('DISCRETE', [0.0, 0.1]), 
    generator_learning_rate=('DISCRETE', [0.0001, 0.0002]),

    encoder_start_n_units=('DISCRETE', [16, 32]),
    encoder_n_layers=('INTEGER', [2, 3]),
    encoder_activation_fn=('INTEGER', [2, 4]),
    encoder_middle_layer_activation_fn=('INTEGER', [1, 4]),        
    encoder_batch_norm=('INTEGER', [0, 1]),
    encoder_dropout_rate=('DOUBLE', [0.0, 0.1]), 
    encoder_learning_rate=('DOUBLE', [0.0001, 0.0002]),
)

Hyperparameter added: latent_dim
Hyperparameter added: discriminator_n_layers
Hyperparameter added: discriminator_activation_fn
Hyperparameter added: discriminator_middle_layer_activation_fn
Hyperparameter added: discriminator_batch_norm
Hyperparameter added: discriminator_dropout_rate
Hyperparameter added: discriminator_learning_rate
Hyperparameter added: discriminator_extra_steps
Hyperparameter added: generator_start_n_units
Hyperparameter added: generator_n_layers
Hyperparameter added: generator_activation_fn
Hyperparameter added: generator_middle_layer_activation_fn
Hyperparameter added: generator_batch_norm
Hyperparameter added: generator_dropout_rate
Hyperparameter added: generator_learning_rate
Hyperparameter added: encoder_start_n_units
Hyperparameter added: encoder_n_layers
Hyperparameter added: encoder_activation_fn
Hyperparameter added: encoder_middle_layer_activation_fn
Hyperparameter added: encoder_batch_norm
Hyperparameter added: encoder_dropout_rate
Hyperparameter added:
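
To make the three parameter types concrete, here is a plain-Python sketch of how one configuration might be drawn from such a space. It is not Maggy's implementation. It assumes that `DISCRETE` picks one of the listed values, that `INTEGER` draws an integer between the two bounds (inclusive), and that `DOUBLE` draws a float between the two bounds. `toy_space` and `sample_config` are illustrative names.

```python
import random

# Toy search space using the same (type, values) tuples as above.
toy_space = {
    "latent_dim": ("DISCRETE", [16, 32]),
    "discriminator_n_layers": ("INTEGER", [2, 3]),
    "discriminator_dropout_rate": ("DOUBLE", [0.0, 0.1]),
}


def sample_config(space, rng):
    """Draw one configuration (illustrative only; not Maggy's implementation)."""
    config = {}
    for name, (kind, values) in space.items():
        if kind == "DISCRETE":
            config[name] = rng.choice(values)      # one of the listed values
        elif kind == "INTEGER":
            low, high = values
            config[name] = rng.randint(low, high)  # assumed inclusive on both ends
        elif kind == "DOUBLE":
            low, high = values
            config[name] = rng.uniform(low, high)  # continuous value in the range
        else:
            raise ValueError(f"unsupported type: {kind}")
    return config


rng = random.Random(0)
sampled = sample_config(toy_space, rng)
print(sampled)
```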

## Use the wrapper function above to run the hyperparameter tuning experiment with Maggy

In [7]:
from maggy import experiment
result = experiment.lagom(experiment_wrapper, 
                           searchspace=sp, 
                           optimizer='randomsearch', 
                           direction='min',
                           num_trials=2, 
                           name='ganaml',
                           hb_interval=5, 
                           es_interval=5,
                           es_min=5
                          )


0: Connected. Call `.close()` to terminate connection gracefully.
0: Epoch 1/2
1: Connected. Call `.close()` to terminate connection gracefully.
1: Epoch 1/2
0: Epoch 2/2
1: Epoch 2/2
Started Maggy Experiment: ganaml, application_1651406915314_0007, run 1

------ RandomSearch Results ------ direction(min) 
BEST combination {"latent_dim": 32, "discriminator_n_layers": 3, "discriminator_activation_fn": 1, "discriminator_middle_layer_activation_fn": 4, "discriminator_batch_norm": 1, "discriminator_dropout_rate": 0.0352678688936785, "discriminator_learning_rate": 0.00011325225509138715, "discriminator_extra_steps": 3, "generator_start_n_units": 16, "generator_n_layers": 3, "generator_activation_fn": 4, "generator_middle_layer_activation_fn": 3, "generator_batch_norm": 0, "generator_dropout_rate": 0.0, "generator_learning_rate": 0.0001, "encoder_start_n_units": 32, "encoder_n_layers": 3, "encoder_activation_fn": 2, "encoder_mid
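
The selection step behind a random search with `direction='min'` can be sketched in plain Python: sample a few configurations, evaluate each one, and keep the one with the smallest metric. The code below is an illustration under stated assumptions, not Maggy's code. `toy_objective` is a made-up function standing in for the metric returned by the training function, and the parameter names are hypothetical.

```python
import random


def toy_objective(learning_rate, n_layers):
    """Made-up stand-in for the metric returned by a training function."""
    return (learning_rate - 0.00015) ** 2 * 1e8 + 0.1 * n_layers


def random_search(num_trials, seed=0):
    """Sample num_trials configurations and return the best one plus all trials."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_trials):
        params = {
            "learning_rate": rng.uniform(0.0001, 0.0002),
            "n_layers": rng.randint(2, 3),
        }
        trials.append({"params": params, "metric": toy_objective(**params)})
    # direction='min': the trial with the smallest metric wins.
    best = min(trials, key=lambda trial: trial["metric"])
    return best, trials


best_trial, all_trials = random_search(num_trials=2)
print(best_trial)
```

With only two trials, as in the run above, the search is a smoke test of the pipeline more than a real exploration of the space.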

In [8]:
import json
from hops import hdfs
GAN_HYPERPARAMS_FILE = 'gan_best_hp.json'
hdfs.dump(json.dumps(result['best_hp']), "Resources/" + GAN_HYPERPARAMS_FILE)
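
The saved file can later be read back the same way the embeddings hyperparameters were loaded at the top of this notebook, with `json.loads(hdfs.load(...))`. The sketch below shows the same JSON round trip on the local filesystem so it runs without a Hopsworks cluster. The temporary directory and the `sample_result` values are made up for illustration.

```python
import json
import os
import tempfile

# Hypothetical subset of a tuning result; real values come from result['best_hp'].
sample_result = {"best_hp": {"latent_dim": 32, "generator_n_layers": 3}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "gan_best_hp.json")

    # Write the best hyperparameters as JSON text (local stand-in for hdfs.dump).
    with open(path, "w") as f:
        f.write(json.dumps(sample_result["best_hp"]))

    # Read them back (local stand-in for json.loads(hdfs.load(path))).
    with open(path) as f:
        restored = json.loads(f.read())

print(restored)
```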

### Managing experiments
The Experiments service provides a unified view of all the experiments run using the `experiment` module.
<br>
As the GIF below shows, it provides general information about each experiment and its resulting metric. Experiments can be visualized in TensorBoard during or after training.
<br>
<br>
![Image7-Monitor.png](./images/experiments.gif)