# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 03: Model training & UI Exploration</span>

<span style="font-weight:bold; font-size: 1.4rem;">In this last notebook, we will train a model on the dataset we created in the previous tutorial. We will train our model using TensorFlow, although it could just as well be trained with other machine learning frameworks such as Scikit-learn, PySpark, and PyTorch. We will also show some of the exploration that can be done in Hopsworks, notably the search functions and the lineage. </span>

## **🗒️ This notebook is divided into 3 main sections:**
1. **Loading the training data**
2. **Training the model**
3. **Explore feature groups and views** via the UI.

![tutorial-flow](images/03_model.png)

## Connect to hsfs and retrieve datasets for training and evaluation 

In [3]:
import hsfs
# Create a connection
connection = hsfs.connection()
# Get the feature store handle for the project's feature store
fs = connection.get_feature_store()

Connected. Call `.close()` to terminate connection gracefully.




### Get feature view objects

In [4]:
feature_view_non_sar = fs.get_feature_view('non_sar_transactions_view', 1)
feature_view_sar = fs.get_feature_view('sar_transactions_view', 1)

## <span style="color:#ff5f27;"> ✨ Load Training Data </span>

First, we'll need to fetch the training dataset that we created in the previous notebook. 

In [8]:
version_non_sar_td, non_sar_td = feature_view_non_sar.get_training_dataset()

2022-06-07 17:27:30,047 INFO: USE `aml_demo_featurestore`
2022-06-07 17:27:30,862 INFO: WITH right_fg0 AS (SELECT *
FROM (SELECT `fg2`.`type` `type`, `fg2`.`is_sar` `is_sar`, `fg2`.`id` `join_pk_id`, `fg2`.`tran_timestamp` `join_evt_tran_timestamp`, `fg0`.`monthly_in_count` `monthly_in_count`, `fg0`.`monthly_in_total_amount` `monthly_in_total_amount`, `fg0`.`monthly_in_mean_amount` `monthly_in_mean_amount`, `fg0`.`monthly_in_std_amount` `monthly_in_std_amount`, `fg0`.`monthly_out_count` `monthly_out_count`, `fg0`.`monthly_out_total_amount` `monthly_out_total_amount`, `fg0`.`monthly_out_mean_amount` `monthly_out_mean_amount`, `fg0`.`monthly_out_std_amount` `monthly_out_std_amount`, RANK() OVER (PARTITION BY `fg2`.`id`, `fg2`.`tran_timestamp` ORDER BY `fg0`.`tran_timestamp` DESC) pit_rank_hopsworks
FROM `aml_demo_featurestore`.`party_fg_1` `fg2`
INNER JOIN `aml_demo_featurestore`.`transactions_monthly_fg_1` `fg0` ON `fg2`.`id` = `fg0`.`id` AND `fg2`.`tran_timestamp` >= `fg0`.`tran_timest



In [9]:
version_sar_td, sar_td = feature_view_sar.get_training_dataset()

2022-06-07 17:32:09,009 INFO: USE `aml_demo_featurestore`
2022-06-07 17:32:09,844 INFO: WITH right_fg0 AS (SELECT *
FROM (SELECT `fg2`.`type` `type`, `fg2`.`is_sar` `is_sar`, `fg2`.`id` `join_pk_id`, `fg2`.`tran_timestamp` `join_evt_tran_timestamp`, `fg0`.`monthly_in_count` `monthly_in_count`, `fg0`.`monthly_in_total_amount` `monthly_in_total_amount`, `fg0`.`monthly_in_mean_amount` `monthly_in_mean_amount`, `fg0`.`monthly_in_std_amount` `monthly_in_std_amount`, `fg0`.`monthly_out_count` `monthly_out_count`, `fg0`.`monthly_out_total_amount` `monthly_out_total_amount`, `fg0`.`monthly_out_mean_amount` `monthly_out_mean_amount`, `fg0`.`monthly_out_std_amount` `monthly_out_std_amount`, RANK() OVER (PARTITION BY `fg2`.`id`, `fg2`.`tran_timestamp` ORDER BY `fg0`.`tran_timestamp` DESC) pit_rank_hopsworks
FROM `aml_demo_featurestore`.`party_fg_1` `fg2`
INNER JOIN `aml_demo_featurestore`.`transactions_monthly_fg_1` `fg0` ON `fg2`.`id` = `fg0`.`id` AND `fg2`.`tran_timestamp` >= `fg0`.`tran_timest
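
Before training, it usually helps to separate the label from the features. The sketch below is not part of the original tutorial: it assumes the returned training data behaves like a pandas DataFrame, and it uses a small hand-made frame (`toy_td`, with made-up values) that borrows a few of the column names visible in the query log above, treating `is_sar` as the label. The helper name `split_features_label` is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a training DataFrame; all values are made up.
toy_td = pd.DataFrame({
    "monthly_in_count": [3, 0, 7, 1],
    "monthly_in_total_amount": [120.5, 0.0, 980.0, 15.25],
    "monthly_out_count": [2, 1, 9, 0],
    "is_sar": [0, 0, 1, 0],
})


def split_features_label(df, label="is_sar"):
    """Return (features, labels) as float32 / int64 NumPy arrays."""
    labels = df[label].to_numpy(dtype=np.int64)
    features = df.drop(columns=[label]).to_numpy(dtype=np.float32)
    return features, labels


X, y = split_features_label(toy_td)
print(X.shape, y.tolist())
```

Keeping the label out of the feature matrix this way avoids accidentally leaking `is_sar` into the model input.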



## <span style="color:#ff5f27;"> 🏃 Train Model</span>

Next we'll train a model. Here, we set the class weight of the positive class to be twice as big as the negative class.
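
As a rough illustration of the weighting described above (not the tutorial's own code), a 2:1 class weight can be turned into per-sample weights and applied to a binary cross-entropy loss. The labels, the predicted probabilities, and the helper names `sample_weights` and `weighted_bce` are all hypothetical.

```python
import numpy as np

# Assumed weighting: the positive class counts twice as much as the negative class.
CLASS_WEIGHT = {0: 1.0, 1: 2.0}


def sample_weights(labels, class_weight=CLASS_WEIGHT):
    """Map each 0/1 label to its class weight."""
    labels = np.asarray(labels)
    return np.where(labels == 1, class_weight[1], class_weight[0])


def weighted_bce(labels, probs, class_weight=CLASS_WEIGHT, eps=1e-7):
    """Weighted mean binary cross-entropy over a batch."""
    labels = np.asarray(labels, dtype=np.float64)
    # Clip probabilities so that log() never sees exactly 0 or 1.
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    per_sample = -(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs))
    weights = sample_weights(labels, class_weight)
    return float(np.sum(weights * per_sample) / np.sum(weights))


y_true = [0, 0, 1, 0]          # made-up labels
y_prob = [0.1, 0.2, 0.7, 0.4]  # made-up predicted probabilities
print(sample_weights(y_true))
print(weighted_bce(y_true, y_prob))
```

With Keras, the same dictionary can be passed as the `class_weight` argument of `model.fit`, which applies the weighting inside the training loop.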

## Define a model

In [None]:
import tensorflow as tf


def make_discriminator_model_cnn(input_dim):
    inputs = tf.keras.layers.Input(shape=(input_dim[0], input_dim[1]))
    x = tf.keras.layers.Conv1D(filters=128, kernel_size=1, padding='same', kernel_initializer="uniform")(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.LeakyReLU(alpha=0.2)(x)
    x = tf.keras.layers.MaxPooling1D(pool_size=2, padding='same')(x)
    x = tf.keras.layers.Conv1D(filters=64, kernel_size=1, padding='same', kernel_initializer="uniform")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.LeakyReLU(alpha=0.2)(x)

    # dense output layer
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.LeakyReLU(0.2)(x)
    x = tf.keras.layers.Dense(128)(x)
    x = tf.keras.layers.LeakyReLU(0.2)(x)
    prediction = tf.keras.layers.Dense(1)(x)
    return tf.keras.Model(inputs=inputs, outputs=prediction)


def make_generator_model_cnn(input_dim, latent_dim):
    latent_inputs = tf.keras.layers.Input(shape=(latent_dim[0], latent_dim[1]))
    x = tf.keras.layers.Conv1D(filters=4, kernel_size=1, padding='same', kernel_initializer="uniform")(latent_inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.LeakyReLU(alpha=0.2)(x)
    x = tf.keras.layers.UpSampling1D(2)(x)
    x = tf.keras.layers.Conv1D(filters=8, kernel_size=1, padding='same', kernel_initializer="uniform")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.LeakyReLU(alpha=0.2)(x)
    x = tf.keras.layers.UpSampling1D(2)(x)
    x = tf.keras.layers.Conv1D(filters=16, kernel_size=1, padding='same', kernel_initializer="uniform")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.LeakyReLU(alpha=0.2)(x)
    x = tf.keras.layers.UpSampling1D(2)(x)
    x = tf.keras.layers.Conv1D(filters=input_dim[1], kernel_size=1, padding='same', kernel_initializer="uniform")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.LeakyReLU(alpha=0.2)(x)
    return tf.keras.Model(inputs=latent_inputs, outputs=x)


def make_encoder_model_cnn(input_dim):
    inputs = tf.keras.layers.Input(shape=(input_dim[0], input_dim[1]))
    x = tf.keras.layers.Conv1D(filters=16, kernel_size=1, padding='same', kernel_initializer="uniform")(inputs)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.LeakyReLU(alpha=0.2)(x)
    x = tf.keras.layers.MaxPooling1D(pool_size=2, padding='same')(x)
    x = tf.keras.layers.Conv1D(filters=8, kernel_size=1, padding='same', kernel_initializer="uniform")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.LeakyReLU(alpha=0.2)(x)
    x = tf.keras.layers.MaxPooling1D(pool_size=2, padding='same')(x)
    x = tf.keras.layers.Conv1D(filters=4, kernel_size=1, padding='same', kernel_initializer="uniform")(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.LeakyReLU(alpha=0.2)(x)
    x = tf.keras.layers.MaxPooling1D(pool_size=2, padding='same')(x)
    return tf.keras.Model(inputs=inputs, outputs=x)


def encoder_loss(generated_fake_data, generator_reconstructed_encoded_fake_data, global_batch_size):
    generator_reconstructed_data = tf.cast(generator_reconstructed_encoded_fake_data, tf.float32)
    mse = tf.keras.losses.MeanSquaredError(reduction=tf.keras.losses.Reduction.NONE)
    per_batch_loss = mse(generated_fake_data, generator_reconstructed_data)
    # per_batch_loss = tf.math.reduce_sum(tf.math.pow(generated_fake_data - generator_reconstructed_data, 2), axis=[-1])
    beta_cycle_gen = 10.0
    per_batch_loss = per_batch_loss * beta_cycle_gen
    # loss = tf.nn.compute_average_loss(per_batch_loss, global_batch_size=BATCH_SIZE)
    return tf.reduce_sum(per_batch_loss) * (1. / global_batch_size)

# Define the loss functions for the discriminator,
# which should be (fake_loss - real_loss).
# We will add the gradient penalty later to this loss function.
def discriminator_loss(real_sample, fake_sample, beta_cycle_gen, global_batch_size):
    real_loss = tf.reduce_mean(real_sample)
    fake_loss = tf.reduce_mean(fake_sample)
    per_batch_loss = fake_loss - real_loss
    per_batch_loss = per_batch_loss * beta_cycle_gen
    return tf.reduce_sum(per_batch_loss) * (1. / global_batch_size)


# Define the loss functions for the generator.
def generator_loss(fake_sample, global_batch_size):
    per_batch_loss = -tf.reduce_mean(fake_sample)
    return tf.reduce_sum(per_batch_loss) * (1. / global_batch_size)


def experiment_wrapper():
    
    import os
    import sys
    import uuid
    import random  
    import numpy as np
        
    from pydoop import hdfs as pydoop_hdfs
    from hops import hdfs
    from hops import tensorboard
    from hops import model as hops_model
    
    import tensorflow as tf
    
    from adversarialaml import gan_enc_anomaly_model
    from adversarialaml import gan_enc_anomaly_trainer
    from adversarialaml import orbit
    
    ######################################
    NCCL_SOCKET_NTHREADS = '16'
    NCCL_NSOCKS_PERTHREAD = '16'
    os.environ['NCCL_IB_DISABLE'] = '1'
    os.environ['NCCL_DEBUG'] = 'INFO'
    os.environ['NCCL_SOCKET_NTHREADS'] = NCCL_SOCKET_NTHREADS
    os.environ['NCCL_NSOCKS_PERTHREAD'] = NCCL_NSOCKS_PERTHREAD
    
    """
    from tensorflow.keras import mixed_precision
    policy = tf.keras.mixed_precision.Policy('mixed_float16')
    mixed_precision.set_global_policy(policy)
    """
    
    discriminator_bytes_per_pack= 4 * 1024 * 1024
    generator_bytes_per_pack= 4 * 1024 * 1024
    encoder_bytes_per_pack= 1 * 1024 * 1024
    
    # Define communication options for collective ops (NCCL)
    options = tf.distribute.experimental.CommunicationOptions(
        #bytes_per_pack=1 * 1024 * 1024,
        #timeout_seconds=120.0,
        implementation=tf.distribute.experimental.CommunicationImplementation.NCCL
    )
    # Define distribution strategy
    strategy = tf.distribute.MultiWorkerMirroredStrategy(communication_options=options)
    #strategy = tf.distribute.OneDeviceStrategy(device="/gpu:0")

    data_options = tf.data.Options()
    data_options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.OFF
        
    ######################################
    BATCH_SIZE_PER_REPLICA = 4096
    WINDOW_SIZE = 32
    EPOCHS = 10000000 
    
    # Define global batch size
    BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
    TOTAL_SAMPLES = 100000
    STEPS_PER_EPOCH=5000000 #TOTAL_SAMPLES//BATCH_SIZE
    VALIDATION_STEPS=2000          

    INPUT_DIM = [32, 1] 
    LATENT_DIM = [4, 4] # TODO (davit): this is hard coded 
    D_STEPS = 5
    GP_WEIGHT = 10 
    
    def windowed_dataset(dataset, window_size):
        ds = dataset.window(window_size, shift=1, drop_remainder=True)
        ds = ds.flat_map(lambda w: w.batch(window_size))
        return ds

    def input_fn(batch_size, window_size, epochs, steps_per_epoch):
        # Synthetic stand-in data: 100001 random scalar samples
        x = np.random.random((100001, 1))
        x = tf.cast(x, tf.float32)
        dataset = tf.data.Dataset.from_tensor_slices(x)
        dataset = windowed_dataset(dataset, window_size)
        # Cache the windowed data once, then repeat it; caching after
        # repeat() would try to materialise every repetition
        dataset = dataset.cache()
        dataset = dataset.repeat(epochs * steps_per_epoch)
        dataset = dataset.batch(batch_size)
        dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)

        # Disable auto-sharding so every worker reads the full dataset
        options = tf.data.Options()
        options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.OFF
        return dataset.with_options(options)

    train_dataset = input_fn(BATCH_SIZE, WINDOW_SIZE, EPOCHS, STEPS_PER_EPOCH)
    
    #####################################################
    # construct model under distribution strategy scope
    with strategy.scope(): 
        discriminator_model = gan_enc_anomaly_model.make_discriminator_model_cnn(INPUT_DIM)
        generator_model = gan_enc_anomaly_model.make_generator_model_cnn(INPUT_DIM, LATENT_DIM)
        encoder_model = gan_enc_anomaly_model.make_encoder_model_cnn(INPUT_DIM)
        #train_dataset = strategy.distribute_datasets_from_function(
        #  lambda input_context: data_input(train_dataset_files, input_context, BATCH_SIZE, WINDOW_SIZE, EPOCHS))        
        
    # Define optimizers
    with strategy.scope():
        discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)
        generator_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)
        encoder_optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)
        
    # construct model under distribution strategy scope,
    # using the CNN builders defined at the top of this cell
    with strategy.scope():
        discriminator_model = make_discriminator_model_cnn(INPUT_DIM)
        generator_model = make_generator_model_cnn(INPUT_DIM, LATENT_DIM)
        encoder_model = make_encoder_model_cnn(INPUT_DIM)
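
To make the windowing step concrete: `windowed_dataset` slides a window of `window_size` elements over the series with a shift of 1 and drops incomplete windows, so a series of length n yields n - window_size + 1 windows. The NumPy sketch below mirrors that behaviour outside of `tf.data`. The helper name `sliding_windows` is hypothetical, and this is only an illustration, not the pipeline used for training.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_windows(series, window_size):
    """Return all full windows of length `window_size` with shift 1.

    `series` has shape (n, 1); the result has shape
    (n - window_size + 1, window_size, 1).
    """
    series = np.asarray(series, dtype=np.float32)
    # sliding_window_view appends the window axis last: (n - w + 1, 1, w)
    windows = sliding_window_view(series, window_size, axis=0)
    # Swap axes to get the (window, feature) layout per sample.
    return np.swapaxes(windows, 1, 2)


x = np.arange(6, dtype=np.float32).reshape(-1, 1)
w = sliding_windows(x, window_size=3)
print(w.shape)
print(w[0].ravel())
```

With the 100001 random samples and `WINDOW_SIZE = 32` used in `input_fn`, the same arithmetic gives 99970 windows of shape `(32, 1)`, which matches `INPUT_DIM`.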

        

## <span style="color:#ff5f27;">  Use the model to score transactions </span>
We trained the model on data from January and February. Now let's retrieve the March data and score whether its transactions are fraudulent or not.
