# BiLSTM with _goalReduced_
## Characteristics
1. **Network type**: BiLSTM cells with Attention System<br><br>
2. **goalReduced**: the goal of this network is to predict **one** of the goal's letters<br>
    2.1 You must train 4 of these networks, one for each letter of the goal<br>
    2.2 The network that predicts position *N* of the goal takes the previous *N-1* letters of the goal as input<br>
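
The chaining of the four networks can be sketched in plain Python. The helper name `split_goal` and the example password `'ACDF'` are illustrative assumptions, not part of the notebook; positions are 0-based here, matching `GOAL_POS` in the hyperparameters.

```python
def split_goal(password, goal_pos):
    """Return (known_prefix, target_letter) for the network at 0-based goal_pos."""
    if not 0 <= goal_pos < len(password):
        raise ValueError('goal_pos out of range: {}'.format(goal_pos))
    return password[:goal_pos], password[goal_pos]


# One (input prefix, target) pair per network; 'ACDF' is a made-up password.
for pos in range(4):
    prefix, target = split_goal('ACDF', pos)
    print('network {}: sees {!r}, predicts {!r}'.format(pos, prefix, target))
```

At prediction time the prefix would presumably come from the networks trained for the earlier positions, while at training time it is read from the known password.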

-----------------
## 1. Prepare the environment

In [1]:
#!rm -r model_*
!rm -r __py*
!rm tf.log
!rm temp_model.info

rm: cannot remove '__py*': No such file or directory
rm: cannot remove 'tf.log': No such file or directory
rm: cannot remove 'temp_model.info': No such file or directory


In [2]:
%load_ext tensorboard.notebook

In [3]:
import tensorflow as tf
from tensorflow import data
from tensorflow.nn import rnn_cell
from tensorflow.nn import static_bidirectional_rnn
import multiprocessing
import random as rnd
import logging
import sys
import json
import time
import datetime

tf.logging.info('version:{}'.format(tf.__version__))
tf.logging.set_verbosity(tf.logging.DEBUG)
random_int = rnd.randint(0,1000)

INFO:tensorflow:version:1.13.1


--------------------
## 2. Hyperparameters

In [4]:
# General params
NETWORK_TYPE, GOAL_TYPE = 'BiLSTM_AttentionSystem', 'goalReduced'
LOGGING_TO_FILE = True

# Debug params
DEBUG_HOOK = False


# Input params

CSV_PATH_TRAIN = "../database/dbML/knuthFast/knuthFast_2019-07-30_0.9_75844_train.csv"
CSV_PATH_EVAL = "../database/dbML/knuthFast/knuthFast_2019-07-30_0.9_8516_eval.csv"


# Goal position
GOAL_POS = 0 #[0-3]


# Model dir
MODEL_DIR_PATH = "../networks/Trained/hopeful/model_dir_647_0/"


# Train params
LEN_TRAIN_DB = 75844
EPOCHS = 10
ALPHA = 0.001


# Network
FW_FORGET_BIAS = 1
BW_FORGET_BIAS = 1
LSTM_SIZE = 256
ATTENTION_SIZE = 256
MAXOUT_SIZE = 256
DROP_REPRESENTATION = 0.3 

L2_ACTIVATED = True
L2_SCALE = 0.001


# Input 
BATCH_TRAIN = 4
MULTI_THREADING = True

# Eval
BATCH_EVAL = 4
SHUFFLE_EVAL = True
EVAL_STEPS = None
NUM_EVALS = EPOCHS

### 2.1 Epochs to steps

In [5]:
STEPS = int(EPOCHS * (LEN_TRAIN_DB / BATCH_TRAIN))
SAVE_CHECKPOINTS_STEPS = int(STEPS / NUM_EVALS) # Do eval each N steps (and save the model)

print('[INFO] {} epochs are equivalent to {} steps'.format(EPOCHS, STEPS))
print('[INFO] Eval will be executed each {} steps for a total of {} times'.format(SAVE_CHECKPOINTS_STEPS, NUM_EVALS))

[INFO] 10 epochs are equivalent to 189610 steps
[INFO] Eval will be executed each 18961 steps for a total of 10 times
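
As a cross-check of the numbers printed above, the same arithmetic can be written as two small helpers. The function names are hypothetical; `batches_per_epoch` uses `math.ceil` because the last batch of an epoch may be partial when the row count is not a multiple of the batch size.

```python
import math


def epochs_to_steps(epochs, num_rows, batch_size):
    """Total training steps, mirroring int(EPOCHS * (LEN_TRAIN_DB / BATCH_TRAIN))."""
    return int(epochs * (num_rows / batch_size))


def batches_per_epoch(num_rows, batch_size):
    """Batches needed to see every row once; the last batch may be partial."""
    return math.ceil(num_rows / batch_size)


steps = epochs_to_steps(epochs=10, num_rows=75844, batch_size=4)
print(steps)                        # 189610
print(steps // 10)                  # 18961 steps between evaluations
print(batches_per_epoch(8516, 4))   # 2129 batches in one pass over the eval file
```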


## 3. Persistence

In [6]:
# What to save in model.info
model_params = {}
model_params['_NETWORK_TYPE'] = NETWORK_TYPE
model_params['CSV_PATH_TRAIN'] = CSV_PATH_TRAIN
model_params['CSV_PATH_EVAL'] = CSV_PATH_EVAL
model_params['MODEL_DIR_PATH'] = MODEL_DIR_PATH
model_params['EPOCHS'] = EPOCHS
model_params['ALPHA'] = ALPHA
model_params['FW_FORGET_BIAS'] = FW_FORGET_BIAS
model_params['BW_FORGET_BIAS'] = BW_FORGET_BIAS
model_params['LSTM_SIZE'] = LSTM_SIZE
model_params['NUM_EVALS'] = NUM_EVALS
model_params['BATCH_TRAIN'] = BATCH_TRAIN
model_params['BATCH_EVAL'] = BATCH_EVAL
model_params['SHUFFLE_EVAL'] = SHUFFLE_EVAL
model_params['EVAL_STEPS'] = EVAL_STEPS
model_params['MULTI_THREADING'] = MULTI_THREADING
model_params['INPUT_ENCODING'] = 'Peg: Hot encoding, Tips: One Hot Encoding'
model_params['GOAL'] = GOAL_TYPE
model_params['ATTENTION_SIZE'] = ATTENTION_SIZE
model_params['MAXOUT_SIZE'] = MAXOUT_SIZE
model_params['DROP_REPRESENTATION']= DROP_REPRESENTATION
model_params['GOAL_POS'] = GOAL_POS
model_params['loss_fn'] = "mean_pairwise_squared_error"
if L2_ACTIVATED: model_params['L2_SCALE'] = L2_SCALE

In [7]:
if LOGGING_TO_FILE:
    tf_log = logging.getLogger('tensorflow')
    for el in list(tf_log.handlers):  # iterate over a copy: removing while iterating skips handlers
        tf_log.removeHandler(el)
    fh = logging.FileHandler('tf.log')
    fh.setLevel(logging.DEBUG)
    tf_log.addHandler(fh)

------------------------------
## 4. Estimator components
### 4.1 Input functions

In [8]:
FEATURES_NAME = ['Guess 1', 'Guess 2', 'Guess 3', 'Guess 4', 'Guess 5',
          'Guess 6', 'Guess 7', 'Guess 8', 'Guess 9', 'Guess 10']

TARGET_NAME = 'PASSWORD'

HEADER = FEATURES_NAME + [TARGET_NAME]

HEADER_DEFAULTS = [['<pad>'], ['<pad>'],['<pad>'],['<pad>'],['<pad>'],['<pad>'],
                  ['<pad>'],['<pad>'],['<pad>'],['<pad>'],['<pad>']]

def input_fn_builder(files_name_pattern, mode, skip_header_lines=0, num_epochs=1, batch_size=32):   
    '''
    Input function builder; the returned input_fn can be used
    to feed a tf.estimator.
    
    # Params
        mode = {"train", "eval"}
    '''    
    
    # utils
    def parse_csv_row(csv_row):
        columns = tf.decode_csv(csv_row, record_defaults=HEADER_DEFAULTS, field_delim=',')
        features = dict(zip(HEADER, columns))    
        target = features[TARGET_NAME]
        return features, target

    # input function definition
    def input_fn_def(files_name_pattern, mode, skip_header_lines, num_epochs, batch_size):
        shuffle = True if mode == tf.estimator.ModeKeys.TRAIN else False
        num_threads = multiprocessing.cpu_count() if MULTI_THREADING else 1
        buffer_size = 2 * batch_size + 1
        file_names = tf.matching_files(files_name_pattern) # <matching_files> accepts wildcards
        dataset = data.TextLineDataset(filenames=file_names)
        dataset = dataset.skip(skip_header_lines)
        if shuffle:
            dataset = dataset.shuffle(buffer_size)
        dataset = dataset.map(lambda csv_row: parse_csv_row(csv_row), 
                              num_parallel_calls=num_threads)
        dataset = dataset.batch(batch_size)
        dataset = dataset.repeat(num_epochs)
        dataset = dataset.prefetch(buffer_size)
        return dataset
    
    # function builder
    if mode == 'train':
        mode_key = tf.estimator.ModeKeys.TRAIN
    elif mode == 'eval':
        mode_key = tf.estimator.ModeKeys.EVAL
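    else:
        tf.logging.error('[input_fn_builder] invalid mode:{}'.format(mode))
        # Fail here: otherwise mode_key is undefined when input_fn is called
        raise ValueError('mode must be "train" or "eval", got: {}'.format(mode))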
                         
    input_fn = lambda: input_fn_def(files_name_pattern, mode_key, 
                                    skip_header_lines, num_epochs, batch_size)
    return input_fn

In [9]:
input_fn = input_fn_builder(files_name_pattern=CSV_PATH_TRAIN, mode='train', 
                            num_epochs=STEPS, batch_size=BATCH_TRAIN, skip_header_lines=1)

eval_fn = input_fn_builder(files_name_pattern=CSV_PATH_EVAL, mode='eval',
                           num_epochs=1, batch_size=BATCH_EVAL, skip_header_lines=1)
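
To make the row layout concrete, here is a plain-Python sketch of what `parse_csv_row` does to one line: eleven comma-separated fields, with empty fields replaced by the `'<pad>'` default, as `record_defaults` does in `tf.decode_csv`. The helper `parse_row` and the sample row are assumptions made for illustration; the real files may store padding differently.

```python
import csv

FEATURES = ['Guess {}'.format(i) for i in range(1, 11)]
TARGET = 'PASSWORD'
COLUMNS = FEATURES + [TARGET]


def parse_row(line, default='<pad>'):
    """Split one CSV line into (features, target); empty fields become the default."""
    fields = next(csv.reader([line]))
    if len(fields) != len(COLUMNS):
        raise ValueError('expected {} fields, got {}'.format(len(COLUMNS), len(fields)))
    values = [field if field != '' else default for field in fields]
    features = dict(zip(COLUMNS, values))
    # As in the notebook, the target is also left inside the features dict.
    return features, features[TARGET]


# Hypothetical row: two guesses, eight empty slots, then the password.
row = 'AABB10,ACDF40' + ',' * 8 + ',ACDF'
features, target = parse_row(row)
print(features['Guess 1'], features['Guess 3'], target)
```

Each guess is read as four letters followed by the black and white tip counts, so `'AABB10'` means guess `AABB` with one black and zero white pegs.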

### 4.2 Train and Evaluate

In [10]:
# Used for prediction
def serving_input_fn(): 
    receiver_tensor = {}
    for el in HEADER:
        receiver_tensor[el] = tf.placeholder(tf.string, [None])

    features = {key: tensor
                for key, tensor in receiver_tensor.items()
                }

    return tf.estimator.export.ServingInputReceiver(features, receiver_tensor)


# Train & eval the model and export for prediction
def my_train_and_evaluate(model_fn, input_fn, eval_fn, train_steps, params = None):    
        
    
    exporter=tf.estimator.FinalExporter(name='predict', # Needed for prediction
                                        serving_input_receiver_fn=serving_input_fn,
                                        as_text=False)

    eval_spec = tf.estimator.EvalSpec(input_fn=eval_fn,
                                      exporters=exporter, 
                                      steps=EVAL_STEPS,
                                      start_delay_secs = 10, 
                                      throttle_secs=1,
                                      name='eval_spec'
                                      )

    # Create estimator        
    run_config = tf.estimator.RunConfig(tf_random_seed=42, # consistency
                                        save_checkpoints_steps = SAVE_CHECKPOINTS_STEPS,
                                        keep_checkpoint_max=10
                                        )
    
    my_estimator = tf.estimator.Estimator(model_fn=model_fn,
                                          model_dir=MODEL_DIR_PATH,
                                          params=params,
                                          config=run_config
                                         )
    
    # Early stopping: 'my_acc' is an accuracy, so stop when it no longer increases
    early_stopping = tf.contrib.estimator.stop_if_no_increase_hook(
                                                    my_estimator,
                                                    metric_name='my_acc',
                                                    max_steps_without_increase=10000,
                                                    min_steps=10000)
    # Train spec
    train_spec = tf.estimator.TrainSpec(input_fn,
                                        max_steps=train_steps,
                                        hooks=[early_stopping]
                                       )
    
    # Launch estimator
    tf.estimator.train_and_evaluate(my_estimator, train_spec, eval_spec)
    tf.logging.info('****[END]**** \n')
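
The early-stopping rule can be illustrated without TensorFlow. The sketch below is a simplified reading of how such a hook decides, not TensorFlow's implementation; the function `should_stop`, its `higher_is_better` flag, and the accuracy values in `history` are all invented for the example.

```python
def should_stop(eval_history, max_steps_without_improvement, min_steps=0,
                higher_is_better=True):
    """eval_history: list of (global_step, metric_value) pairs in step order."""
    best_value, best_step = None, None
    for step, value in eval_history:
        if step < min_steps:
            continue  # evaluations before min_steps are ignored
        if best_value is None:
            improved = True
        elif higher_is_better:
            improved = value > best_value
        else:
            improved = value < best_value
        if improved:
            best_value, best_step = value, step
        if step - best_step >= max_steps_without_improvement:
            return True
    return False


# Made-up accuracies, one evaluation every 18961 steps.
history = [(18961, 0.41), (37922, 0.47), (56883, 0.46), (75844, 0.47)]
print(should_stop(history, max_steps_without_improvement=10000, min_steps=10000))  # True
```

Under this reading, a patience of 10000 steps combined with an evaluation every 18961 steps means a single evaluation without improvement is enough to stop training, which may be stricter than intended.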

In [25]:
def create_estimator(model_fn, input_fn, eval_fn, train_steps, params = None):
    # Builds the estimator only, without training. input_fn, eval_fn and
    # train_steps are unused; they are kept so the call mirrors my_train_and_evaluate.

    # Create estimator        
    run_config = tf.estimator.RunConfig(tf_random_seed=42, # consistency
                                        save_checkpoints_steps = SAVE_CHECKPOINTS_STEPS,
                                        keep_checkpoint_max=10
                                        )
    
    my_estimator = tf.estimator.Estimator(model_fn=model_fn,
                                          model_dir=MODEL_DIR_PATH,
                                          params=params,
                                          config=run_config
                                         )
    return my_estimator

### 4.3 Input hot encoding

In [11]:
def pad_encoding(code, tips):
    with tf.name_scope('Pad_encoder') as scope:
        if tips:
            zeros_dims = tf.stack([tf.shape(code)[0], 34]) 
        else:
            zeros_dims = tf.stack([tf.shape(code)[0], 24]) 
        pad_one_hot = tf.fill(zeros_dims, 0.0)
        return pad_one_hot   

    
def tips_converter(peg):
     with tf.name_scope('Tips_encoder_slave') as scope:
        peg_ones = tf.ones(shape=[peg]) 
        peg_zeros = tf.zeros(shape=[5 - peg]) 
        peg_encoded = tf.concat([peg_ones, peg_zeros], axis=0)
        return peg_encoded

    
def guess_and_psw_encoding(code, tips, pegs_table, tips_table):
    with tf.name_scope('Encoder') as scope:
        with tf.name_scope('Pegs_extractor') as scope:
            for idx in range(4):
                piece = tf.strings.substr(code, idx, 1)
                piece_id = pegs_table.lookup(piece)
                piece_one_hot = tf.one_hot(piece_id, 6) # [A-->F] = 6
                if idx == 0:
                    code_one_hot = piece_one_hot
                else:
                    code_one_hot = tf.concat([code_one_hot, piece_one_hot],axis=1) 

        # <code> is the password (last column of csv)
        if not tips:
            return code_one_hot

        # <code> is a guess
        with tf.name_scope('Tips_encoder_Master') as scope:
            blk_peg = tf.strings.substr(code, 4, 1) # Extract tips: e.g. '0'
            wht_peg = tf.strings.substr(code, 5, 1)
            blk_peg = tf.to_float(tips_table.lookup(blk_peg)) # string to number: e.g. '0'
            wht_peg = tf.to_float(tips_table.lookup(wht_peg))
            blk_peg = tf.add(blk_peg, tf.constant(1,dtype=tf.float32)) # e.g. '0' will have one '1'
            wht_peg = tf.add(wht_peg, tf.constant(1,dtype=tf.float32))
            blk_peg = tf.map_fn(tips_converter, blk_peg) # e.g. '0' --> [10000] 
            wht_peg = tf.map_fn(tips_converter, wht_peg)
            code_one_hot = tf.concat([code_one_hot, blk_peg], axis=1) # Attach peg to the psw encoded
            code_one_hot = tf.concat([code_one_hot, wht_peg], axis=1) 

            return code_one_hot


def one_hot_converter(code, tips=True): # code.shape = (?,) - where ? is the batch size
    """ 
    Convert each element of a match into machine-readable data:
    Code: A --> [100000] B --> [010000] C --> [001000] ... F --> [000001]
    Tips: 0 --> [10000]  1 --> [11000]  2 --> [11100]  ... 4 --> [11111]
    """        
    
    # Char to Int mapping, the position will be the idx
    with tf.name_scope('Hot_encoding_slave') as scope:
        mapping_strings = tf.constant(['A', 'B', 'C', 'D', 'E', 'F'])
        mapping_tips = tf.constant(['0','1','2','3','4'])

        pegs_table = tf.contrib.lookup.index_table_from_tensor(mapping=mapping_strings, default_value=-1)
        tips_table = tf.contrib.lookup.index_table_from_tensor(mapping=mapping_tips, default_value=-1)

        # Check if code is pad
        is_pad = tf.strings.regex_full_match(code, tf.constant('<pad>'))        
        is_pad = tf.math.reduce_all(is_pad)

        # Code encoding
        code_one_hot = tf.cond(is_pad, \
                               lambda: pad_encoding(code, tips), \
                               lambda: guess_and_psw_encoding(code, tips, pegs_table, tips_table))     
    
        return code_one_hot


def feature_encoder(match):
    with tf.name_scope('Hot_encoding_match') as scope:
        match_converted = []
        for guess in match.values():
            code_one_hot = one_hot_converter(guess, tips=True)
            match_converted.append(code_one_hot)
        return match_converted


def labels_encoder(password):
    with tf.name_scope('Hot_encoding_psw') as scope:
        password_hot = one_hot_converter(password, tips=False)
        return password_hot
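
The resulting vector layout is easier to see in plain Python. This sketch encodes one string at a time, whereas the TensorFlow functions above work on whole batches; the names `one_hot`, `thermometer` and `encode` are illustrative and not part of the notebook.

```python
PEGS = 'ABCDEF'


def one_hot(letter):
    """One of six positions set to 1.0, e.g. 'B' -> [0, 1, 0, 0, 0, 0]."""
    return [1.0 if letter == peg else 0.0 for peg in PEGS]


def thermometer(count, width=5):
    """count in 0..4 -> (count + 1) ones, then zeros, e.g. 0 -> [1, 0, 0, 0, 0]."""
    return [1.0] * (count + 1) + [0.0] * (width - count - 1)


def encode(code, tips=True):
    """Encode a guess ('AABB10' -> 34 values) or a password ('ACDF' -> 24 values)."""
    if code == '<pad>':
        return [0.0] * (34 if tips else 24)
    vector = []
    for letter in code[:4]:
        vector += one_hot(letter)            # 4 letters x 6 values = 24
    if tips:
        vector += thermometer(int(code[4]))  # black pegs, 5 values
        vector += thermometer(int(code[5]))  # white pegs, 5 values
    return vector


guess = encode('AABB10')
print(len(guess), guess[24:29], guess[29:])
print(len(encode('ACDF', tips=False)))
```

The widths 34 and 24 used in `pad_encoding` follow directly: 4 x 6 letter values, plus 2 x 5 tip values for guesses.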

--------------------
### 4.4 Model Function

In [12]:
# Custom Maxout
def max_out(inputs, num_units, axis=None):
    shape = inputs.get_shape().as_list()
    if shape[0] is None:
        shape[0] = -1
    if axis is None:  # Assume that channel is the last dimension
        axis = -1
    num_channels = shape[axis]
    if num_channels % num_units:
        raise ValueError('number of features({}) is not '
                         'a multiple of num_units({})'.format(num_channels, num_units))
    shape[axis] = num_units
    shape += [num_channels // num_units]
    outputs = tf.reduce_max(tf.reshape(inputs, shape), axis=-1, keepdims=False) # keepdims replaces the deprecated keep_dims
    return outputs
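
For a single feature vector, the reshape-then-max in `max_out` amounts to taking the maximum of each group of consecutive values. A minimal plain-Python sketch, with the hypothetical name `max_out_1d`:

```python
def max_out_1d(features, num_units):
    """Maxout over one feature vector: keep the max of each group of consecutive values."""
    if len(features) % num_units:
        raise ValueError('number of features({}) is not '
                         'a multiple of num_units({})'.format(len(features), num_units))
    group = len(features) // num_units
    return [max(features[i * group:(i + 1) * group]) for i in range(num_units)]


print(max_out_1d([0.1, 0.7, -0.3, 0.2, 0.5, 0.4], num_units=3))  # [0.7, 0.2, 0.5]
```

With the hyperparameters above, the 512 BiLSTM features (2 x `LSTM_SIZE`) are reduced to `MAXOUT_SIZE` = 256 values, each the larger of two neighbours.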

In [13]:
# Model function
def test_model_fn(features, labels, mode, params=None):
    
    ##################
    # INPUT ENCODING 
    ##################
    tf.logging.debug('Params:{}'.format(params))
    with tf.name_scope('Hot_encoding_master') as scope:
        
        psw = features.pop('PASSWORD') # Used features.pop instead of labels because "goalReduced"        
        psw_encoded = labels_encoder(psw) #(?, 24)
        goal_pos_start = GOAL_POS * 6
        psw_cutted = psw_encoded[:,goal_pos_start:goal_pos_start+6] # Reduced Goal
    
        match_encoded = feature_encoder(features) # [(?,?=34) ...] x 10
        match_encoded = tf.stack(match_encoded, axis=1) # (?,10,?=34)
        match_encoded = tf.reshape(match_encoded, shape=[-1,10,34])
        match_encoded = tf.unstack(match_encoded, axis=1) # [(?,34)...] x 10
        
        # Add psw pre-predicted
        psw_pre_predicted = psw_encoded[:,0:GOAL_POS*6] #(?,[0,6,12,18]) The part of the psw that is already known
        paddings = [[0,0],[0, (4-GOAL_POS) * 6 + 10]]
        psw_pre_predicted_padded = tf.pad(psw_pre_predicted, paddings) #(?,34)
        match_encoded.append(psw_pre_predicted_padded)  # [(?,34)...] x 11 (10 guesses + padded known prefix)
    
    
    ####################
    # NETWORK STRUCTURE
    ####################
    
    # Network design
    with tf.name_scope('Network_Design') as scope:
        lstm_fw_cell = rnn_cell.LSTMCell(LSTM_SIZE, forget_bias=FW_FORGET_BIAS, name="fw_cell")
        lstm_bw_cell = rnn_cell.LSTMCell(LSTM_SIZE, forget_bias=BW_FORGET_BIAS, name="bw_cell")
        
        bilstm, _, _ = static_bidirectional_rnn(lstm_fw_cell, 
                                                lstm_bw_cell, 
                                                inputs=match_encoded, 
                                                dtype=tf.float32)
        
        lstm_output = tf.stack(bilstm, axis=0) # (11, ?, 2 * LSTM_SIZE) = (11, ?, 512)
        
        # Attention system
        with tf.variable_scope('Attention_system') as scope:
            inputs = tf.transpose(lstm_output, [1, 0, 2]) #(T,B,D) => (B,T,D) 
            hidden_size = inputs.shape[2].value 
            w_omega = tf.Variable(tf.random_normal([hidden_size, ATTENTION_SIZE], stddev=0.1), name="w_omega")
            b_omega = tf.Variable(tf.random_normal([ATTENTION_SIZE], stddev=0.1), name="b_omega")
            u_omega = tf.Variable(tf.random_normal([ATTENTION_SIZE], stddev=0.1), name="u_omega")
            with tf.name_scope('1_v_build'):
                v = tf.tanh(tf.tensordot(inputs, w_omega, axes=1) + b_omega)
            vu = tf.tensordot(v, u_omega, axes=1, name='2_vu_4softmax')
            alphas = tf.nn.softmax(vu, name='3_softmax_build_alphas')       
            representation = tf.reduce_sum(inputs * tf.expand_dims(alphas, -1), 1,name='4_r_build')
        
        is_training = True if mode == tf.estimator.ModeKeys.TRAIN else False
        representation = tf.layers.dropout(representation,
                                           rate=DROP_REPRESENTATION,
                                           training=is_training)
        
        representation = max_out(representation, MAXOUT_SIZE)
        logits = tf.layers.dense(inputs=representation, units=6, name='logits_layer', activation=tf.nn.softmax) # (?, 6)

    
    #####################
    # 1 - PREDICT MODE
    ####################
    
    # Prediction Form
    with tf.name_scope('Prediction_Prepare') as scope:
        piece_id = tf.math.argmax(logits, axis=1) 
        pieces_one_hot = tf.one_hot(piece_id, 6) # convert logits to nearest legal peg
        prediction = pieces_one_hot
        
    
    if mode == tf.estimator.ModeKeys.PREDICT:
        probabilities = logits # the dense layer already applies softmax, so "logits" are probabilities
        predictions = {'Prediction': prediction, 'probabilites': probabilities, 'logits': logits}
        export_outputs = {'prediction': tf.estimator.export.PredictOutput(predictions)}
        return tf.estimator.EstimatorSpec(mode, 
                                          predictions=predictions, 
                                          export_outputs=export_outputs)
    
    #############
    # METRICS
    #############
    
    # Custom accuracy - Numbers of "1" that coincide between hot(psw_predicted) and hot(psw)
    with tf.name_scope('Custom_accuracy') as scope:
        equality = tf.equal(prediction, psw_cutted)
        equality = tf.math.reduce_all(equality, axis=1)
        custom_accuracy = tf.reduce_mean(tf.cast(equality, tf.float32))
        
        ones_acc = tf.math.multiply(psw_cutted, prediction)
        ones_acc = tf.math.reduce_sum(ones_acc, axis=1)
    
    
    with tf.name_scope('Custom_metrics'):
        accuracy_of_ones = tf.metrics.accuracy(labels=tf.fill(tf.shape(ones_acc), 1.0), 
                                               predictions=ones_acc, 
                                               name='acc_of_ones')
        
        accuracy = tf.metrics.accuracy(labels=psw_cutted, 
                                       predictions=prediction, 
                                       name='my_accuracy')

        precision = tf.metrics.precision(labels=psw_cutted, 
                                         predictions=prediction, 
                                         name='my_precision')

        tf.summary.scalar('my_acc', accuracy[1])
        tf.summary.scalar('my_prec', precision[1])
        tf.summary.scalar('my_acc_of_one', accuracy_of_ones[1])
        
        
    # Loss
    with tf.name_scope('Loss_prepare') as scope:
        loss = tf.losses.mean_pairwise_squared_error(labels=psw_cutted, predictions=logits) # [!] model.info
        if L2_ACTIVATED:
            l2_regularizer = tf.contrib.layers.l2_regularizer(scale=L2_SCALE, scope='l2_regularization') 
            weights = tf.trainable_variables() 
            regularization_penalty = tf.contrib.layers.apply_regularization(l2_regularizer, weights)
            regularized_loss = loss + regularization_penalty
            loss = regularized_loss
            
            
    #################
    # DEBUG INSIGHTS
    #################
    
    dict_debug = {}
    dict_debug['Match encoded[0]'] = match_encoded[0]
    dict_debug['Logits'] = logits
    dict_debug['Prediction'] = prediction
    dict_debug['Password'] = psw_encoded
    dict_debug['Password_cutted'] = psw_cutted
    dict_debug['Equality'] = equality
    dict_debug['Accuracy_of_ones'] = ones_acc
    dict_debug['Real_Accuracy'] = custom_accuracy
    dict_debug['Psw_Pre_Predicted'] = psw_pre_predicted
    dict_debug['Psw_Pre_Predicted_Padded_1'] = psw_pre_predicted_padded[:,:18]
    dict_debug['Psw_Pre_Predicted_Padded_2'] = psw_pre_predicted_padded[:,18:]
    
    # Hooks
    debug_hook = [tf.train.LoggingTensorHook(dict_debug, every_n_iter=30)] if DEBUG_HOOK else []     
  

        
    #####################
    # TRAIN & EVAL MODE
    ####################
    
    # 2 - Evaluation mode
    if mode == tf.estimator.ModeKeys.EVAL: 
        custom_metrics = {'my_acc':accuracy, 'my_prec':precision, 'my_acc_of_one':accuracy_of_ones}
        return tf.estimator.EstimatorSpec(mode, 
                                          loss=loss,
                                          eval_metric_ops=custom_metrics)
    
    # 3 - Train mode
    optimizer = tf.train.AdamOptimizer(learning_rate=ALPHA)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())

    return tf.estimator.EstimatorSpec(mode,
                                      loss=loss,
                                      training_hooks = debug_hook,
                                      train_op=train_op
                                      )
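
The attention block inside the model function computes, for each time step, a score from `tanh(h W + b) . u`, normalises the scores with a softmax over the time axis, and returns the weighted sum of the hidden states. The sketch below reproduces that computation for a single example in plain Python; the function name `attention` and the toy numbers are assumptions chosen for illustration.

```python
import math


def attention(hidden_states, w_omega, b_omega, u_omega):
    """hidden_states: T vectors of size D; w_omega: D x A; b_omega, u_omega: size A."""
    scores = []
    for h in hidden_states:
        # v = tanh(h . W + b): one value per attention unit
        v = [math.tanh(sum(h[d] * w_omega[d][a] for d in range(len(h))) + b_omega[a])
             for a in range(len(b_omega))]
        # score = v . u
        scores.append(sum(v_a * u_a for v_a, u_a in zip(v, u_omega)))
    # softmax over the time axis (shifted by the max for numerical stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # representation = sum over time of alpha_t * h_t
    size = len(hidden_states[0])
    representation = [sum(alpha * h[d] for alpha, h in zip(alphas, hidden_states))
                      for d in range(size)]
    return alphas, representation


# Toy input: T=3 time steps, D=2 features, A=2 attention units.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
alphas, representation = attention(states,
                                   w_omega=[[0.5, -0.5], [0.25, 0.75]],
                                   b_omega=[0.0, 0.0],
                                   u_omega=[1.0, 1.0])
print(alphas, representation)
```

In the notebook T is 11 (ten guesses plus the padded known prefix), D is 2 x `LSTM_SIZE`, and A is `ATTENTION_SIZE`.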

-------------------------------
## 5. Run
### Tensorboard

### Train

In [15]:
# ##########
# # RUN
# ##########
# def save_params(params, path):
#     with open(path, 'w') as f:
#         json.dump(params, f, indent=1, sort_keys=True)
#         print('{} saved'.format(path))

# save_params(model_params,"temp_model.info")

# time_start = time.time()
# my_train_and_evaluate(test_model_fn, input_fn, eval_fn, STEPS, model_params)
# total_time = time.time() - time_start

# time_consumed = str(datetime.timedelta(seconds=total_time))[:-7]
# model_params['train_duration'] = time_consumed

# save_params(model_params, MODEL_DIR_PATH+"/model.info")

----------------------
## 6. What-If Tool (WIT)

In [16]:
#@title Invoke What-If Tool for test data and the trained model {display-mode: "form"}

num_datapoints = 2000  #@param {type: "number"}
tool_height_in_px = 1000  #@param {type: "number"}

from witwidget.notebook.visualization import WitConfigBuilder
from witwidget.notebook.visualization import WitWidget

In [17]:
import pandas as pd
import tensorflow as tf
import numpy as np

In [18]:
FEATURES_NAME = ['Guess 1', 'Guess 2', 'Guess 3', 'Guess 4', 'Guess 5',
          'Guess 6', 'Guess 7', 'Guess 8', 'Guess 9', 'Guess 10']

TARGET_NAME = 'PASSWORD'

features_and_labels = FEATURES_NAME + [TARGET_NAME]

In [19]:
# Optimize the df load
feature_type = {}
cuts_type = {}
feature_type[TARGET_NAME] = str
for feature in FEATURES_NAME:
    feature_type[feature] = str
    cuts_type[feature] = int
    
path_df_train = "../database/dbML/hopeful/JOINED_hopeful_2019-07-01_0.7_train.csv"
df_train = pd.read_csv(path_df_train, delimiter=',',encoding='utf-8', skip_blank_lines=True, dtype=feature_type)

In [20]:
# Creates a tf feature spec from the dataframe and columns specified.
def create_feature_spec(df, columns=None):
    feature_spec = {}
    if columns is None:
        columns = df.columns.values.tolist()
    for f in columns:
        if df[f].dtype == np.dtype(np.int64):
            feature_spec[f] = tf.FixedLenFeature(shape=(), dtype=tf.int64)
        elif df[f].dtype == np.dtype(np.float64):
            feature_spec[f] = tf.FixedLenFeature(shape=(), dtype=tf.float32)
        else:
            feature_spec[f] = tf.FixedLenFeature(shape=(), dtype=tf.string)
    return feature_spec

In [21]:
# Create a feature spec for the classifier
feature_spec = create_feature_spec(df_train, features_and_labels)

In [22]:
# Converts a dataframe into a list of tf.Example protos.
def df_to_examples(df, columns=None):
    examples = []
    if columns is None:
        columns = df.columns.values.tolist()
    for index, row in df.iterrows():
        example = tf.train.Example()
        for col in columns:
            if df[col].dtype == np.dtype(np.int64):
                example.features.feature[col].int64_list.value.append(int(row[col]))
            elif df[col].dtype == np.dtype(np.float64):
                example.features.feature[col].float_list.value.append(row[col])
            elif row[col] == row[col]:  # skip missing values (NaN != NaN)
                example.features.feature[col].bytes_list.value.append(row[col].encode('utf-8'))
        examples.append(example)
    return examples

In [23]:
num_datapoints = 2000  
tool_height_in_px = 1000  

test_examples = df_to_examples(df_train[0:num_datapoints])

In [27]:
classifier = create_estimator(test_model_fn, input_fn, eval_fn, STEPS, model_params)

I0823 18:20:45.915928 139886312556352 estimator.py:201] Using config: {'_model_dir': '../networks/Trained/hopeful/model_dir_647_0/', '_tf_random_seed': 42, '_save_summary_steps': 100, '_save_checkpoints_steps': 18961, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 10, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f3977128f28>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


In [28]:
# Setup the tool with the test examples and the trained classifier
config_builder = WitConfigBuilder(test_examples).set_estimator_and_feature_spec(classifier, feature_spec)

WitWidget(config_builder, height=tool_height_in_px)

WitWidget(config={'model_type': 'classification', 'label_vocab': [], 'are_sequence_examples': False, 'inferenc…

W0823 18:20:54.355251 139886312556352 deprecation.py:323] From /mnt/c/Users/Simone Guardati/UbuntuWorkspace/envMM/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
I0823 18:20:54.425079 139886312556352 estimator.py:1111] Calling model_fn.
I0823 18:20:54.425833 139886312556352 <ipython-input-13-2ac873cfbdee>:7] Params:{'_NETWORK_TYPE': 'BiLSTM_AttentionSystem', 'CSV_PATH_TRAIN': '../database/dbML/knuthFast/knuthFast_2019-07-30_0.9_75844_train.csv', 'CSV_PATH_EVAL': '../database/dbML/knuthFast/knuthFast_2019-07-30_0.9_8516_eval.csv', 'MODEL_DIR_PATH': '../networks/Trained/hopeful/model_dir_647_0/', 'EPOCHS': 10, 'ALPHA': 0.001, 'FW_FORGET_BIAS': 1, 'BW_FORGET_BIAS': 1, 'LSTM_SIZE': 256, 'NUM_EVALS': 10, 'BATCH_TRAIN': 4, 'BATCH_EVAL': 4, 'SHUFFLE_EVAL': True, 'EVAL_STEPS': None, 'MU


For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.



W0823 18:20:57.004391 139886312556352 deprecation.py:323] From <ipython-input-13-2ac873cfbdee>:33: LSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
W0823 18:20:57.005520 139886312556352 deprecation.py:323] From <ipython-input-13-2ac873cfbdee>:39: static_bidirectional_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `keras.layers.Bidirectional(keras.layers.RNN(cell, unroll=True))`, which is equivalent to this API
W0823 18:20:57.006254 139886312556352 deprecation.py:323] From /mnt/c/Users/Simone Guardati/UbuntuWorkspace/envMM/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:1565: static_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
Instructions for updating:
Plea