# Deep Products: Deep Tag Labeler

This is the first project for the book Deep Products, about using NLP and weakly supervised learning to build complete machine learning products. Using the non-code text of Stack Overflow posts (question and answers) to tag them using a multi-class, multi-label classifier using LSTMs and Emlo embeddings.

In [1]:
import json
import os
import re

import pandas as pd
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

from tqdm import tqdm_notebook
#import tensorflow_addons as tfa

print(
    tf.test.is_gpu_available(
        cuda_only=False,
        min_cuda_compute_capability=None
    )
)
print()
print(
    tf.compat.v2.config.experimental.list_physical_devices('GPU')
)

True

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]


In [2]:
np.random.seed(seed=1337)

## Load a Stratified Sample of Answered Stack Overflow Questions with Tags

We load a sample pulled from all answered questions from Stack Overflow. This data was converted from XML to parquet format via [code/stackoverflow/xml_to_parquet.py](stackoverflow/sample_json.spark.py) and then a more balanced stratified sample was computed for tags with over 50,000, 20,000 and 10,000 instances that reduced the maximum imbalance from 100-1000:1 to 8:1 using [code/stackoverflow/get_questions.spark.py](stackoverflow/get_questions.spark.py).

These scripts were run using a Spark cluster via Amazon Elastic MapReduce using 13 r5.12xlarge machines for about 24 hours at a cost of about \\$300 per full run, and about \\$1,500 overall to create and debug. Big data is expensive.

With this dataset the challenge isn't the number of records per say but rather the imbalance of the dataset if we wish to expand the number of tags the model can predict beyond low 3 digits. This leads us to some of the other techniques we'll cover involving weakly supervised learning.

In [3]:
sorted_all_tags = json.load(open('data/stackoverflow/08-05-2019/sorted_all_tags.50000.json'))
max_index = sorted_all_tags[-1][0] + 1

In [4]:
import pyarrow
posts_df = pd.read_parquet(
    'data/stackoverflow/08-05-2019/Questions.Stratified.Final.50000.parquet',
    columns=['_Body'] + ['label_{}'.format(i) for i in range(0, max_index)],
    engine='pyarrow'
)
posts_df.head(5)

Unnamed: 0,_Body,label_0,label_1,label_2,label_3,label_4,label_5,label_6,label_7,label_8,...,label_14,label_15,label_16,label_17,label_18,label_19,label_20,label_21,label_22,label_23
0,"[C, Mono, Winforms, MessageBox, problem, I, fi...",1,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"[Are, NET, data, providers, Oracle, require, O...",1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"[How, I, focus, foreign, window, I, applicatio...",1,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"[Default, button, hit, windows, forms, trying,...",1,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"[Can, I, avoid, JIT, net, Say, code, always, g...",1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [5]:
print(
    '{:,}'.format(
        len(posts_df.index)
    )
)

1,293,018


## Map from Tags to IDs

In [6]:
tag_index = json.load(open('data/stackoverflow/08-05-2019/tag_index.50000.json'))
index_tag = json.load(open('data/stackoverflow/08-05-2019/index_tag.50000.json'))

## Count the Most Common Tags

In [7]:
label_counts = json.load(open('data/stackoverflow/08-05-2019/label_counts.50000.json'))

# Sanity check the difference files
assert(len(label_counts.keys()) == len(tag_index.keys()) == len(index_tag.keys()) == len(sorted_all_tags))

## Make Record Count a Multiple of the Batch Size and Post Sequence Length

The Elmo embedding requires that the number of records be a multiple of the batch size times the number of tokens in the padded posts.

In [8]:
import math

BATCH_SIZE = 1024
MAX_LEN = 100
TOKEN_COUNT = 10000
EMBED_SIZE = 50

# Convert label columns to numpy array
labels = posts_df[list(posts_df.columns)[1:]].to_numpy()

# training_count must be a multiple of the BATCH_SIZE times the MAX_LEN for the Elmo embedding layer
highest_factor = math.floor(len(posts_df.index) / (BATCH_SIZE * MAX_LEN))
training_count = highest_factor * BATCH_SIZE * MAX_LEN
print('Highest Factor: {:,} Training Count: {:,}'.format(highest_factor, training_count))

# Remove stopwords - now done in Spark, so can remove once that runs
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer
stop_words = set(stopwords.words('english'))
tokenizer = RegexpTokenizer(r'\w+')

documents = []
for body in posts_df[0:training_count]['_Body'].values.tolist():
    words = body.tolist()
    documents.append(' '.join(words))

labels = labels[0:training_count]

# Lengths for x and y match
assert( len(documents) == training_count == labels.shape[0])

Highest Factor: 12 Training Count: 1,228,800


[nltk_data] Downloading package stopwords to /home/ubuntu/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [9]:
# sentences = [' '.join(x) for x in posts_text]
# sentences

## Create an Elmo Embedding Layer using Tensorflow Hub

Note that this layer takes a padded two-dimensional array of strings.

In [10]:
# # From https://www.depends-on-the-definition.com/named-entity-recognition-with-residual-lstm-and-elmo/
# tf.compat.v1.disable_eager_execution()

# sess = tf.compat.v1.Session()

# elmo_model = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)

# sess.run(tf.global_variables_initializer())
# sess.run(tf.tables_initializer())

# def ElmoEmbedding(x):
#     return elmo_model(inputs={
#                             "tokens": tf.squeeze(tf.cast(x, tf.string)),
#                             "sequence_len": tf.constant(BATCH_SIZE*[MAX_LEN])
#                       },
#                       signature="tokens",
#                       as_dict=True)["elmo"]

# text_input = Input(shape=(max_len,), dtype=tf.string)
# elmo_embedding = Lambda(ElmoEmbedding, output_shape=(max_len, 1024))(text_input)

## Create a Glove Embedding Layer

In [11]:
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(num_words=TOKEN_COUNT)
tokenizer.fit_on_texts(documents)
# encoded_docs = tokenizer.texts_to_matrix(posts_text, mode='tfidf')
sequences = tokenizer.texts_to_sequences(documents)

padded_sequences = pad_sequences(
    sequences,
    maxlen=MAX_LEN,
    dtype='int32',
    padding='post',
    truncating='pre',
    value=1
)

print(max([len(x) for x in padded_sequences]), min([len(x) for x in padded_sequences]))
assert( min([len(x) for x in padded_sequences]) == MAX_LEN == max([len(x) for x in padded_sequences]))

padded_sequences.shape

100 100


(1228800, 100)

In [12]:
padded_sequences

array([[  70, 3348, 2215, ...,    1,    1,    1],
       [  56,  170,   34, ...,    1,    1,    1],
       [  21,    2, 1487, ...,    1,    1,    1],
       ...,
       [   2,   60,   14, ...,    1,    1,    1],
       [   2,   25,   67, ...,    1,    1,    1],
       [  25,   44,    4, ...,    1,    1,    1]], dtype=int32)

In [13]:
def get_coefs(word,*arr): 
    return word, np.asarray(arr, dtype='float32')

embeddings_index = dict(get_coefs(*o.strip().split()) for o in open('data/GloVe/glove.6B.50d.txt'))

In [14]:
# Create embeddings matrix
all_embs = np.stack(embeddings_index.values())
emb_mean, emb_std = all_embs.mean(), all_embs.std()

# Create embedding matrix using our vocabulary
word_index = tokenizer.word_index
print('word_index', word_index)
nb_words = min(TOKEN_COUNT, len(word_index))

# Initialize embedding matrix
embedding_matrix = np.random.normal(emb_mean, emb_std, (nb_words, EMBED_SIZE))

# Loop through each word and get its embedding vector
missing_count = 0
too_short_count = 0
for word, i in word_index.items():
    if i >= TOKEN_COUNT: 
        too_short_count += 1
        continue # Skip words appearing less than the minimum allowed
    
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None: 
        embedding_matrix[i] = embedding_vector
    else:
        missing_count += 1

# print(missing_count, too_short_count, embedding_matrix[0])

  if (await self.run_code(code, result,  async_=asy)):
IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



## Experimental Setup

We `train_test_split` rather than k-fold cross validate because it is too expensive.

In [38]:
from sklearn.model_selection import train_test_split

TEST_SPLIT = 0.1

# X_train, X_test, y_train, y_test = train_test_split(
#     posts_text,
#     labels,
#     test_size=TEST_SPLIT,
#     random_state=1337
# )
X_train, X_test, y_train, y_test = train_test_split(
    padded_sequences,
    labels,
    test_size=TEST_SPLIT,
    random_state=1337
)

assert(X_train.shape[0] == y_train.shape[0])
assert(X_train.shape[1] == MAX_LEN)
assert(X_test.shape[0] == y_test.shape[0]) 
assert(X_test.shape[1] == MAX_LEN)

## Start with a sub-sample

In [39]:
# X_train = X_train[:100000]
# y_train = y_train[:100000]
# X_test  = X_test[:10000]
# y_test  = y_test[:10000]

## Create an LSTM Model to Classify Posts into Tags

We use the padded/tokenized posts as input, an Elmo embedding feeding an Long-Short-Term-Memory (LSTM) layer followed by a Dense layer with the same number of output neurons as our tag list.

We use focal loss as a loss function, which is used in appliations like object detection, because it 

In [40]:
# from keras.layers import Input, concatenate, Activation, Dense, LSTM, BatchNormalization, Embedding, Dropout, Lambda, Bidirectional
# from keras.metrics import categorical_accuracy, top_k_categorical_accuracy
# from keras.models import Model
# from keras.optimizers import Adam
# from keras_metrics import precision, f1_score, false_negative, true_positive, false_positive, true_negative

# # Text model
# text_input = Input(shape=(MAX_LEN,), dtype=tf.string)

# elmo_embedding = Lambda(ElmoEmbedding, output_shape=(MAX_LEN, 1024))(text_input)

# text_lstm = LSTM(
#     input_shape=(MAX_LEN, 1024,),
#     units=512,
#     recurrent_dropout=0.2,
#     dropout=0.2)(elmo_embedding)

# text_dense = Dense(200, activation='relu')(text_lstm)

# text_output = Dense(record_count, activation='sigmoid')(text_dense)

# text_model = Model(
#     inputs=text_input, 
#     outputs=text_output
# )



# from sklearn.metrics import hamming_loss

# from keras.optimizers import Adam
# adam = Adam(lr=0.0005)

# text_model.compile(
#     loss='binary_crossentropy',
#     optimizer=adam,
#     metrics=[
#         precision_m,
#         recall_m,
#         f1_m,
#         'mae',
#         abs_KL_div,
#         'accuracy'
#     ]
# )
# 
# text_model.summary()

## Compute Sample and Class Weights

Because we have skewed classes and multiple classes per example, we employ sample or class weights which weight the importance of each row according to the relative frequency of their labels.

In [41]:
from sklearn.utils.class_weight import compute_sample_weight

train_sample_weights = compute_sample_weight('balanced', y_train)
test_sample_weights = compute_sample_weight('balanced', y_test)

train_sample_weights, test_sample_weights

(array([1.33795272e-07, 1.33795272e-07, 4.87422163e-06, ...,
        1.33795272e-07, 4.44708048e-06, 1.33795272e-07]),
 array([4.88338303e-05, 6.51990004e-06, 1.35878540e-04, ...,
        1.87774806e-06, 5.86159129e-06, 3.94508491e-06]))

In [42]:
train_weight_vec = list(np.max(np.sum(y_train, axis=0))/np.sum(y_train, axis=0))
train_class_weights = {i: train_weight_vec[i] for i in range(y_train.shape[1])}

test_weight_vec = list(np.max(np.sum(y_test, axis=0))/np.sum(y_test, axis=0))
test_class_weights = {i: test_weight_vec[i] for i in range(y_test.shape[1])}

sorted(list(train_class_weights.items()), key=lambda x: x[1]), sorted(list(test_class_weights.items()), key=lambda x: x[1])

([(16, 1.0),
  (5, 1.1135069654754695),
  (12, 1.273924190977756),
  (17, 1.5361604345101316),
  (20, 1.7833393961440525),
  (10, 1.8090481930674802),
  (8, 1.8764928039195672),
  (9, 1.9153990414669724),
  (7, 2.033234716730721),
  (13, 2.0413624628709437),
  (22, 2.053848731985253),
  (14, 2.15067852129153),
  (11, 2.2125406186063303),
  (1, 2.2765858642147303),
  (23, 2.296708101692798),
  (6, 2.457425477877289),
  (21, 2.4888648209571516),
  (2, 2.5072794844692967),
  (15, 2.780819845711693),
  (0, 2.847693916276188),
  (3, 2.8789100732098816),
  (18, 3.0464827243350734),
  (4, 3.236049991198733),
  (19, 3.757396147360891)],
 [(16, 1.0),
  (5, 1.1531073446327684),
  (12, 1.2846577498033045),
  (17, 1.506551024174202),
  (20, 1.7459366980325064),
  (10, 1.7942857142857143),
  (8, 1.8462234283129806),
  (9, 1.8596810933940775),
  (7, 1.995600097775605),
  (13, 2.0123243776189303),
  (22, 2.027819175360159),
  (14, 2.145597897503285),
  (23, 2.214266341198807),
  (11, 2.22512946306895

## Establish a Log for Performance

In [43]:
performance_log = []

## Simple Baseline Model using `Conv1D`

In [44]:
def hamming_loss(y_true, y_pred, mode='multilabel'):
    if mode not in ['multiclass', 'multilabel']:
        raise TypeError('mode must be: [multiclass, multilabel])')

    if mode == 'multiclass':
        nonzero = tf.cast(tf.math.count_nonzero(y_true * y_pred, axis=-1), tf.float32)
        print(nonzero)
        return 1.0 - nonzero

    else:
        nonzero = tf.cast(tf.math.count_nonzero(y_true - y_pred, axis=-1), 
            tf.float32)
        return nonzero / y_true.get_shape()[-1]

In [45]:
import sys

## Model imports
from tensorflow.keras.callbacks import ReduceLROnPlateau, ModelCheckpoint, EarlyStopping
from tensorflow.keras.layers import ( Input, Embedding, GlobalMaxPooling1D, Conv1D, Dense, Activation, 
                                      Dropout, Lambda, BatchNormalization, concatenate )
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.preprocessing.text import Tokenizer

# Fit imports
from tensorflow.keras.losses import hinge, mae, binary_crossentropy, kld, Huber, squared_hinge

# Hyperparameter/method search space
import itertools

# For 4 GPUs
DIST_BATCH_SIZE = int(BATCH_SIZE/4)
EPOCHS = 10


print('Starting experiment loop...')

EXPERIMENT_NAME = 'kld dense dims x filter_lengths x adam/sgd'
learning_rates = [0.005]
losses = [kld] #binary_crossentropy, kld, hinge, mae]
activations = ['selu'] # 'selu'
optimizers = ['adam', 'sgd']
dropout_ratios = [0.2]
filter_lengths = [64, 128]
class_weight_set = [train_class_weights]
sample_weight_set = [train_sample_weights]
test_sample_weight_set = [None] #, test_sample_weights]
dense_dims = [16, 32, 64, 128]

args = list(itertools.product(
    learning_rates,
    losses,
    activations,
    optimizers,
    dropout_ratios,
    filter_lengths,
    class_weight_set,
    sample_weight_set,
    test_sample_weight_set,
    dense_dims
))
print()
print(f'{len(args):,} total iterations...')
sys.stdout.flush()

Starting experiment loop...

16 total iterations...


In [46]:
# Weights and Biases Monitoring
import wandb
from wandb.keras import WandbCallback
wandb.init(project="weakly-supervised-learning", name=EXPERIMENT_NAME)
config = wandb.config

# tqdm_notebook
for learning_rate, loss_function, activation, optimizer, dropout_ratio, filter_length, class_weights, \
    sample_weights, test_sample_weights, dense_dim in args:
    
    cw_label  = 'class_weights' if isinstance(class_weights, dict) else 'no_class_weights'
    sw_label  = 'sample_weights' if isinstance(sample_weights, np.ndarray) else 'no_sample_weights'
    tsw_label = 'test_sample_weights' if isinstance(test_sample_weights, list) else 'no_test_sample_weights'
    
    model_name = str(loss_function.__name__) + ' ' + str(learning_rate) + ' ' + str(optimizer) + ' ' + \
                 str(activation) + ' ' + str(EPOCHS) + ' ' + cw_label + ' ' + sw_label + ' ' + tsw_label + ' ' + \
                 str(dense_dim)
    
    # Log wandb config
    config.update(
        {
            'class_weights': cw_label,
            'sample_weights': sw_label,
            'test_sample_weights': tsw_label,
        },
        allow_val_change=True
    )
    
    print(model_name)
    sys.stdout.flush()
    
    #
    # Build ze model...
    #
    def build_model(
        token_count=None,
        max_words=None,
        embedding_dim=None,
        label_count=None,
        dropout_ratio=None,
        filter_length=None,
        loss_function=None,
        learning_rate=None,
        optimizer=None,
        activation=None,
        dense_dim=None,
    ):
        """Build the model using this experiment's parameters"""
        
        # Store config in wandb
        config.update(
            {
                'token_count': token_count,
                'max_words': max_words,
                'embedding_dim': embedding_dim,
                'label_count': label_count,
                'dropout_ratio': dropout_ratio,
                'filter_length': filter_length,
                'loss_function': loss_function.__name__,
                'learning_rate': learning_rate,
                'optimizer': optimizer,
                'activation': activation,
                'dense_dim': dense_dim,
            },
            allow_val_change=True
        )
        
        mirrored_strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1", "/gpu:2", "/gpu:3"])
        with mirrored_strategy.scope():
            
            print('Number of devices: {}'.format(mirrored_strategy.num_replicas_in_sync))
        
            hashed_input = Input(shape=(X_train.shape[1],), dtype='int64')

            emb = Embedding(token_count, embedding_dim, weights=[embedding_matrix])(hashed_input)

            # Specify each convolution layer and their kernel siz i.e. n-grams 
            conv1_1 = Conv1D(filters=filter_length, kernel_size=3)(emb)
            btch1_1 = BatchNormalization()(conv1_1)
            drp1_1  = Dropout(dropout_ratio)(btch1_1)
            actv1_1 = Activation(activation)(drp1_1)
            glmp1_1 = GlobalMaxPooling1D()(actv1_1)

            conv1_2 = Conv1D(filters=filter_length, kernel_size=4)(emb)
            btch1_2 = BatchNormalization()(conv1_2)
            drp1_2  = Dropout(dropout_ratio)(btch1_2)
            actv1_2 = Activation(activation)(drp1_2)
            glmp1_2 = GlobalMaxPooling1D()(actv1_2)

            conv1_3 = Conv1D(filters=filter_length, kernel_size=5)(emb)
            btch1_3 = BatchNormalization()(conv1_3)
            drp1_3  = Dropout(dropout_ratio)(btch1_3)
            actv1_3 = Activation(activation)(drp1_3)
            glmp1_3 = GlobalMaxPooling1D()(actv1_3)

            conv1_4 = Conv1D(filters=filter_length, kernel_size=6)(emb)
            btch1_4 = BatchNormalization()(conv1_4)
            drp1_4  = Dropout(dropout_ratio)(btch1_4)
            actv1_4 = Activation(activation)(drp1_4)
            glmp1_4 = GlobalMaxPooling1D()(actv1_4)

            # Gather all convolution layers
            cnct = concatenate([glmp1_1, glmp1_2, glmp1_3, glmp1_4], axis=1)
            drp1 = Dropout(dropout_ratio)(cnct)

            dns1  = Dense(dense_dim, activation=activation)(drp1)
            btch1 = BatchNormalization()(dns1)
            drp2  = Dropout(dropout_ratio)(btch1)

            out = Dense(y_train.shape[1], activation='sigmoid')(drp2)

            text_model = Model(
                inputs=hashed_input, 
                outputs=out
            )

            if activation == 'adam':
                activation = Adam(lr=learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
            if activation == 'sgd':
                activation = SGD(lr=learning_rate)

            text_model.compile(
                optimizer=optimizer,
                loss=loss_function,
                metrics=[
                    'categorical_accuracy',
                    tf.keras.metrics.Precision(),
                    tf.keras.metrics.Recall(),
                    tf.keras.metrics.BinaryAccuracy(),
                    tf.keras.metrics.Hinge(),
                    tf.keras.metrics.AUC(),
                    tf.keras.metrics.Accuracy(),
                    tf.keras.metrics.MeanAbsoluteError(),
                    tf.keras.metrics.MeanAbsolutePercentageError(),
                    tf.keras.metrics.TruePositives(),
                    tf.keras.metrics.FalsePositives(),
                    tf.keras.metrics.TrueNegatives(),
                    tf.keras.metrics.FalseNegatives()
                ]
            )
            #text_model.summary()

            return text_model

    #
    # Train ze model...
    #
    def train_model(
        model=None,
        X_train=None,
        X_test=None,
        learning_rate=None,
        loss_function=None,
        optimizer=None,
        activation=None,
        epochs=None,
        class_weights=None,
        sample_weights=None,
        test_sample_weights=None,
    ):
        """Train the model using the current parameters and evaluate performance"""
        
        # Log wandb config
        config.update(
            { 'epochs': epochs },
            allow_val_change=True,
        )
        
        callbacks = [
            ReduceLROnPlateau(
                patience=1,
                verbose=1,
                min_delta=0.001,
                min_lr=0.0005,
            ), 
            EarlyStopping(
                patience=2,
                min_delta=0.001,
                verbose=1,
                restore_best_weights=True
            ), 
            WandbCallback()
            #ModelCheckpoint(filepath='model-conv1d.h5', save_best_only=True)
        ]

        history = text_model.fit(
            X_train, 
            y_train,
            class_weight=class_weights,
            sample_weight=sample_weights,
            epochs=epochs,
            batch_size=DIST_BATCH_SIZE,
            validation_data=(X_test, y_test),
            callbacks=callbacks
        )
    
        # Evaluate to our log and return a description key and a list of metrics
        accr = text_model.evaluate(X_test, y_test, sample_weight=test_sample_weights)
        f1_score = 2.0 * (accr[1] * accr[2]) / \
                         (accr[1] + accr[2])
        return_val = [i for i in zip([j.item() for j in accr + [f1_score]], text_model.metrics_names + ['val_f1_score'])]

        return return_val

    #
    # main()
    #
    text_model = build_model(
        token_count=TOKEN_COUNT,
        max_words=100,
        embedding_dim=50,
        label_count=y_train.shape[1],
        filter_length=128,
        loss_function=loss_function,
        learning_rate=learning_rate,
        optimizer=optimizer,
        activation=activation,
        dropout_ratio=dropout_ratio,
        dense_dim=dense_dim,
    )
    try:
        accuracies = train_model(
            model=text_model,
            X_train=X_train,
            X_test=X_test,
            learning_rate=learning_rate,
            loss_function=loss_function,
            optimizer=optimizer,
            activation=activation,
            epochs=EPOCHS,
            class_weights=class_weights,
            sample_weights=sample_weights,
            test_sample_weights=test_sample_weights,
        )

        log_record = (model_name, accuracies)
        performance_log.append(log_record)

        with open('data/performance_log.jsonl', 'w') as f:
            for record in performance_log:
                f.write(json.dumps(record) + '\n')

        print(log_record)
        sys.stdout.flush()
    except KeyboardInterrupt as e:
        print('Aboring training run!')
        sys.stdout.flush()

print('Completed experiment loop!')

kullback_leibler_divergence 0.005 adam selu 10 class_weights sample_weights no_test_sample_weights 16
Number of devices: 4
Train on 1105920 samples, validate on 122880 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 00005: ReduceLROnPlateau reducing learning rate to 0.0005.
Epoch 6/10
Epoch 00006: ReduceLROnPlateau reducing learning rate to 0.0005.
Restoring model weights from the end of the best epoch.
Epoch 00006: early stopping


IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



('kullback_leibler_divergence 0.005 adam selu 10 class_weights sample_weights no_test_sample_weights 16', [(0.0010656878421522779, 'loss'), (0.05405273288488388, 'categorical_accuracy'), (0.03301475569605827, 'precision_35'), (0.9999178051948547, 'recall_35'), (0.03324517980217934, 'binary_accuracy'), (1.9324488639831543, 'hinge'), (0.5297659039497375, 'auc_35'), (0.004717678297311068, 'accuracy'), (0.9654586315155029, 'mean_absolute_error'), (965447360.0, 'mean_absolute_percentage_error'), (97341.0, 'true_positives_35'), (2851068.0, 'false_positives_35'), (703.0, 'true_negatives_35'), (8.0, 'false_negatives_35'), (0.04099206047835016, 'val_f1_score')])
kullback_leibler_divergence 0.005 adam selu 10 class_weights sample_weights no_test_sample_weights 32
Number of devices: 4
Train on 1105920 samples, validate on 122880 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 00005: ReduceLROnPlateau reducing learning rate to 0.0005.
Epoch 6/10
Epoch 00006: ReduceLROnPlateau 

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



('kullback_leibler_divergence 0.005 adam selu 10 class_weights sample_weights no_test_sample_weights 32', [(0.000601937157067578, 'loss'), (0.04169107973575592, 'categorical_accuracy'), (0.033014386892318726, 'precision_36'), (0.9999281167984009, 'recall_36'), (0.03322449326515198, 'binary_accuracy'), (1.9315993785858154, 'hinge'), (0.5491718053817749, 'auc_36'), (0.001862250384874642, 'accuracy'), (0.9646093845367432, 'mean_absolute_error'), (964596992.0, 'mean_absolute_percentage_error'), (97342.0, 'true_positives_36'), (2851130.0, 'false_positives_36'), (641.0, 'true_negatives_36'), (7.0, 'false_negatives_36'), (0.03684885556186531, 'val_f1_score')])
kullback_leibler_divergence 0.005 adam selu 10 class_weights sample_weights no_test_sample_weights 64
Number of devices: 4
Train on 1105920 samples, validate on 122880 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 00005: ReduceLROnPlateau reducing learning rate to 0.0005.
Epoch 6/10
Epoch 00006: ReduceLROnPlateau 

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



('kullback_leibler_divergence 0.005 adam selu 10 class_weights sample_weights no_test_sample_weights 64', [(2.742876708188291e-05, 'loss'), (0.03614095225930214, 'categorical_accuracy'), (0.0330096073448658, 'precision_37'), (1.0, 'recall_37'), (0.033012568950653076, 'binary_accuracy'), (1.9334360361099243, 'hinge'), (0.508758544921875, 'auc_37'), (0.0015075684059411287, 'accuracy'), (0.9664439558982849, 'mean_absolute_error'), (966440576.0, 'mean_absolute_percentage_error'), (97349.0, 'true_positives_37'), (2851762.0, 'false_positives_37'), (9.0, 'true_negatives_37'), (0.0, 'false_negatives_37'), (0.03450438039953967, 'val_f1_score')])
kullback_leibler_divergence 0.005 adam selu 10 class_weights sample_weights no_test_sample_weights 128
Number of devices: 4
Train on 1105920 samples, validate on 122880 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 00005: ReduceLROnPlateau reducing learning rate to 0.0005.
Epoch 6/10
Epoch 00006: ReduceLROnPlateau reducing learnin

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



('kullback_leibler_divergence 0.005 adam selu 10 class_weights sample_weights no_test_sample_weights 128', [(-1.2996239576660665e-06, 'loss'), (0.04990234225988388, 'categorical_accuracy'), (0.033009517937898636, 'precision_38'), (1.0, 'recall_38'), (0.0330098532140255, 'binary_accuracy'), (1.933701992034912, 'hinge'), (0.5031223893165588, 'auc_38'), (0.002716064453125, 'accuracy'), (0.9667128324508667, 'mean_absolute_error'), (966711424.0, 'mean_absolute_percentage_error'), (97349.0, 'true_positives_38'), (2851770.0, 'false_positives_38'), (1.0, 'true_negatives_38'), (0.0, 'false_negatives_38'), (0.03973501890505257, 'val_f1_score')])
kullback_leibler_divergence 0.005 adam selu 10 class_weights sample_weights no_test_sample_weights 16
Number of devices: 4
Train on 1105920 samples, validate on 122880 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 00005: ReduceLROnPlateau reducing learning rate to 0.0005.
Epoch 6/10
Epoch 00006: ReduceLROnPlateau reducing learning 

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



('kullback_leibler_divergence 0.005 adam selu 10 class_weights sample_weights no_test_sample_weights 16', [(0.00018007040202080068, 'loss'), (0.0381673164665699, 'categorical_accuracy'), (0.03300952911376953, 'precision_39'), (1.0, 'recall_39'), (0.033010195940732956, 'binary_accuracy'), (1.9326425790786743, 'hinge'), (0.5290810465812683, 'auc_39'), (0.004009670577943325, 'accuracy'), (0.965654730796814, 'mean_absolute_error'), (965645760.0, 'mean_absolute_percentage_error'), (97349.0, 'true_positives_39'), (2851769.0, 'false_positives_39'), (2.0, 'true_negatives_39'), (0.0, 'false_negatives_39'), (0.035401546723535605, 'val_f1_score')])
kullback_leibler_divergence 0.005 adam selu 10 class_weights sample_weights no_test_sample_weights 32
Number of devices: 4
Train on 1105920 samples, validate on 122880 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 00005: ReduceLROnPlateau reducing learning rate to 0.0005.
Epoch 6/10
Epoch 00006: ReduceLROnPlateau reducing learnin

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



('kullback_leibler_divergence 0.005 adam selu 10 class_weights sample_weights no_test_sample_weights 32', [(1.3968878259647729e-05, 'loss'), (0.05078125, 'categorical_accuracy'), (0.033010125160217285, 'precision_40'), (1.0, 'recall_40'), (0.033028170466423035, 'binary_accuracy'), (1.9334862232208252, 'hinge'), (0.5070258378982544, 'auc_40'), (0.00433315709233284, 'accuracy'), (0.966495156288147, 'mean_absolute_error'), (966493632.0, 'mean_absolute_percentage_error'), (97349.0, 'true_positives_40'), (2851716.0, 'false_positives_40'), (55.0, 'true_negatives_40'), (0.0, 'false_negatives_40'), (0.040011168574021934, 'val_f1_score')])
kullback_leibler_divergence 0.005 adam selu 10 class_weights sample_weights no_test_sample_weights 64
Number of devices: 4
Train on 1105920 samples, validate on 122880 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 00005: ReduceLROnPlateau reducing learning rate to 0.0005.
Epoch 6/10
Epoch 00006: ReduceLROnPlateau reducing learning rate 

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



('kullback_leibler_divergence 0.005 adam selu 10 class_weights sample_weights no_test_sample_weights 64', [(3.6175796661404765e-05, 'loss'), (0.05732421949505806, 'categorical_accuracy'), (0.03300955146551132, 'precision_41'), (1.0, 'recall_41'), (0.033010873943567276, 'binary_accuracy'), (1.9333159923553467, 'hinge'), (0.5117529630661011, 'auc_41'), (0.0030127631034702063, 'accuracy'), (0.9663252234458923, 'mean_absolute_error'), (966321856.0, 'mean_absolute_percentage_error'), (97349.0, 'true_positives_41'), (2851767.0, 'false_positives_41'), (4.0, 'true_negatives_41'), (0.0, 'false_negatives_41'), (0.04189455778160954, 'val_f1_score')])
kullback_leibler_divergence 0.005 adam selu 10 class_weights sample_weights no_test_sample_weights 128
Number of devices: 4
Train on 1105920 samples, validate on 122880 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 00004: ReduceLROnPlateau reducing learning rate to 0.0005.
Epoch 5/10
Epoch 00005: ReduceLROnPlateau reducing learning rate t

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



('kullback_leibler_divergence 0.005 adam selu 10 class_weights sample_weights no_test_sample_weights 128', [(0.00024194382681509788, 'loss'), (0.03868814930319786, 'categorical_accuracy'), (0.03300962597131729, 'precision_42'), (0.9999897480010986, 'recall_42'), (0.033022742718458176, 'binary_accuracy'), (1.931443691253662, 'hinge'), (0.5554824471473694, 'auc_42'), (0.0005296495510265231, 'accuracy'), (0.9644525051116943, 'mean_absolute_error'), (964441728.0, 'mean_absolute_percentage_error'), (97348.0, 'true_positives_42'), (2851731.0, 'false_positives_42'), (40.0, 'true_negatives_42'), (1.0, 'false_negatives_42'), (0.03562401707992176, 'val_f1_score')])
kullback_leibler_divergence 0.005 sgd selu 10 class_weights sample_weights no_test_sample_weights 16
Number of devices: 4
Train on 1105920 samples, validate on 122880 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



('kullback_leibler_divergence 0.005 sgd selu 10 class_weights sample_weights no_test_sample_weights 16', [(0.4377009195624851, 'loss'), (0.0345052070915699, 'categorical_accuracy'), (0.039889052510261536, 'precision_43'), (0.736874520778656, 'recall_43'), (0.40584999322891235, 'binary_accuracy'), (1.4949902296066284, 'hinge'), (0.6242903470993042, 'auc_43'), (0.00010884602670557797, 'accuracy'), (0.5279985666275024, 'mean_absolute_error'), (514548608.0, 'mean_absolute_percentage_error'), (71734.0, 'true_positives_43'), (1726604.0, 'false_positives_43'), (1125167.0, 'true_negatives_43'), (25615.0, 'false_negatives_43'), (0.03700231862144002, 'val_f1_score')])
kullback_leibler_divergence 0.005 sgd selu 10 class_weights sample_weights no_test_sample_weights 32
Number of devices: 4
Train on 1105920 samples, validate on 122880 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



('kullback_leibler_divergence 0.005 sgd selu 10 class_weights sample_weights no_test_sample_weights 32', [(0.4458680520571458, 'loss'), (0.04587402194738388, 'categorical_accuracy'), (0.04468986764550209, 'precision_44'), (0.6831092238426208, 'recall_44'), (0.5075202584266663, 'binary_accuracy'), (1.4707436561584473, 'hinge'), (0.6430634260177612, 'auc_44'), (0.00015360514225903898, 'accuracy'), (0.5037552714347839, 'mean_absolute_error'), (490695104.0, 'mean_absolute_percentage_error'), (66500.0, 'true_positives_44'), (1421533.0, 'false_positives_44'), (1430238.0, 'true_negatives_44'), (30849.0, 'false_negatives_44'), (0.04527420388506065, 'val_f1_score')])
kullback_leibler_divergence 0.005 sgd selu 10 class_weights sample_weights no_test_sample_weights 64
Number of devices: 4
Train on 1105920 samples, validate on 122880 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



('kullback_leibler_divergence 0.005 sgd selu 10 class_weights sample_weights no_test_sample_weights 64', [(0.3795206688848945, 'loss'), (0.02621256560087204, 'categorical_accuracy'), (0.04330582171678543, 'precision_45'), (0.761589765548706, 'recall_45'), (0.43675464391708374, 'binary_accuracy'), (1.4981143474578857, 'hinge'), (0.6616048812866211, 'auc_45'), (0.00010816786380019039, 'accuracy'), (0.5311237573623657, 'mean_absolute_error'), (519735680.0, 'mean_absolute_percentage_error'), (74140.0, 'true_positives_45'), (1637870.0, 'false_positives_45'), (1213901.0, 'true_negatives_45'), (23209.0, 'false_negatives_45'), (0.03265774039182904, 'val_f1_score')])
kullback_leibler_divergence 0.005 sgd selu 10 class_weights sample_weights no_test_sample_weights 128
Number of devices: 4
Train on 1105920 samples, validate on 122880 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



('kullback_leibler_divergence 0.005 sgd selu 10 class_weights sample_weights no_test_sample_weights 128', [(0.4251003760029562, 'loss'), (0.04270833358168602, 'categorical_accuracy'), (0.04995804280042648, 'precision_46'), (0.7185898423194885, 'recall_46'), (0.5396268367767334, 'binary_accuracy'), (1.4529510736465454, 'hinge'), (0.6805363893508911, 'auc_46'), (3.696017665788531e-05, 'accuracy'), (0.4859619736671448, 'mean_absolute_error'), (473772640.0, 'mean_absolute_percentage_error'), (69954.0, 'true_positives_46'), (1330301.0, 'false_positives_46'), (1521470.0, 'true_negatives_46'), (27395.0, 'false_negatives_46'), (0.046049603536732146, 'val_f1_score')])
kullback_leibler_divergence 0.005 sgd selu 10 class_weights sample_weights no_test_sample_weights 16
Number of devices: 4
Train on 1105920 samples, validate on 122880 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



('kullback_leibler_divergence 0.005 sgd selu 10 class_weights sample_weights no_test_sample_weights 16', [(0.3843434723947818, 'loss'), (0.10877278447151184, 'categorical_accuracy'), (0.044584665447473526, 'precision_47'), (0.7783747315406799, 'recall_47'), (0.44208651781082153, 'binary_accuracy'), (1.4994323253631592, 'hinge'), (0.6732448935508728, 'auc_47'), (5.560980935115367e-05, 'accuracy'), (0.5324429273605347, 'mean_absolute_error'), (520477888.0, 'mean_absolute_percentage_error'), (75774.0, 'true_positives_47'), (1623779.0, 'false_positives_47'), (1227992.0, 'true_negatives_47'), (21575.0, 'false_negatives_47'), (0.06324568382855061, 'val_f1_score')])
kullback_leibler_divergence 0.005 sgd selu 10 class_weights sample_weights no_test_sample_weights 32
Number of devices: 4
Train on 1105920 samples, validate on 122880 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



('kullback_leibler_divergence 0.005 sgd selu 10 class_weights sample_weights no_test_sample_weights 32', [(0.3808861748276589, 'loss'), (0.04011230543255806, 'categorical_accuracy'), (0.044146228581666946, 'precision_48'), (0.7751286625862122, 'recall_48'), (0.43857547640800476, 'binary_accuracy'), (1.4992929697036743, 'hinge'), (0.6661351323127747, 'auc_48'), (0.00014851888408884406, 'accuracy'), (0.532303512096405, 'mean_absolute_error'), (520749984.0, 'mean_absolute_percentage_error'), (75458.0, 'true_positives_48'), (1633816.0, 'false_positives_48'), (1217955.0, 'true_negatives_48'), (21891.0, 'false_negatives_48'), (0.04203270292331171, 'val_f1_score')])
kullback_leibler_divergence 0.005 sgd selu 10 class_weights sample_weights no_test_sample_weights 64
Number of devices: 4
Train on 1105920 samples, validate on 122880 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



('kullback_leibler_divergence 0.005 sgd selu 10 class_weights sample_weights no_test_sample_weights 64', [(0.39128414961354185, 'loss'), (0.02984212152659893, 'categorical_accuracy'), (0.04533908888697624, 'precision_49'), (0.7541628479957581, 'recall_49'), (0.467705100774765, 'binary_accuracy'), (1.4864524602890015, 'hinge'), (0.6714581251144409, 'auc_49'), (0.00012512206740211695, 'accuracy'), (0.519461989402771, 'mean_absolute_error'), (507769568.0, 'mean_absolute_percentage_error'), (73417.0, 'true_positives_49'), (1545870.0, 'false_positives_49'), (1305901.0, 'true_negatives_49'), (23932.0, 'false_negatives_49'), (0.03599342537463639, 'val_f1_score')])
kullback_leibler_divergence 0.005 sgd selu 10 class_weights sample_weights no_test_sample_weights 128
Number of devices: 4
Train on 1105920 samples, validate on 122880 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



('kullback_leibler_divergence 0.005 sgd selu 10 class_weights sample_weights no_test_sample_weights 128', [(0.3705319869176795, 'loss'), (0.0391845703125, 'categorical_accuracy'), (0.048270270228385925, 'precision_50'), (0.7710197567939758, 'recall_50'), (0.49063268303871155, 'binary_accuracy'), (1.4774867296218872, 'hinge'), (0.6933334469795227, 'auc_50'), (8.104112203000113e-05, 'accuracy'), (0.5104959011077881, 'mean_absolute_error'), (499449344.0, 'mean_absolute_percentage_error'), (75058.0, 'true_positives_50'), (1479895.0, 'false_positives_50'), (1371876.0, 'true_negatives_50'), (22291.0, 'false_negatives_50'), (0.04325546260475601, 'val_f1_score')])
Completed experiment loop!


In [47]:
%%wandb

UsageError: %%wandb is a cell magic, but the cell body is empty. Did you mean the line magic %wandb (single %)?


In [None]:
# from keras.callbacks import EarlyStopping

# EPOCHS = 4

# history = text_model.fit(
#     X_train,
#     y_train,
#     epochs=EPOCHS,
#     batch_size=BATCH_SIZE,
#     callbacks=[
#         EarlyStopping(monitor='loss', patience=1, min_delta=0.0001),
#         EarlyStopping(monitor='val_loss', patience=1, min_delta=0.0001),
#     ],
#     class_weight=class_weights,
#     # sample_weight=train_sample_weights,
#     validation_data=(X_test, y_test)
# )

In [None]:
accr = text_model.evaluate(X_test, y_test) #, sample_weight=test_sample_weights)
[i for i in zip(accr, text_model.metrics_names)]

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt

print(history.history)
# summarize history for accuracy
plt.plot(history.history['val_loss'])
plt.plot(history.history['f1_m'])
plt.plot(history.history['categorical_accuracy'])
plt.plot(history.history['mean_absolute_error'])
plt.plot(history.history['precision_m'])
plt.title('model accuracy')
plt.ylabel('metric')
plt.xlabel('epoch')
plt.legend(['val_loss', 'f1', 'categorical accuracy', 'MAE', 'precision'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

In [None]:
import statistics

from sklearn.metrics import hamming_loss, jaccard_score
import keras.backend as K
import tensorflow as tf

y_pred = text_model.predict(X_test)

sess = tf.Session()
best_cutoff = 0
max_score = 0
with sess.as_default():
    for cutoff in [0.0001, 0.001, 0.01, 0.1, 0.2, 0.4, 0.5, 0.6, 0.8]:
        y_pred_bin = K.greater(y_pred, cutoff).eval()
        print('Cutoff: {:,}'.format(cutoff))
        print('Hamming loss: {:,}'.format(
            hamming_loss(y_test, y_pred_bin)
        ))
        scores = []
        for j_type in ['micro', 'macro', 'weighted']:
            j_score = jaccard_score(y_test, y_pred_bin, average=j_type)
            print('Jaccard {} score: {:,}'.format(
                j_type,
                j_score
            ))
            scores.append(j_score)
        print('')
        mean_score = statistics.mean(scores)
        if mean_score > max_score:
            best_cutoff = cutoff
            max_score = mean_score

print('Best cutoff was: {:,} with mean jaccard score of {:,}'.format(best_cutoff, max_score))

In [None]:
y_pred

In [None]:
from sklearn.metrics import classification_report, multilabel_confusion_matrix

y_pred = text_model.predict(X_test, batch_size=32, verbose=1)
y_pred_bool = np.where(y_pred > best_cutoff, 1, 0)

print(classification_report(y_test, y_pred_bool))

print(multilabel_confusion_matrix(y_test, y_pred_bool))

## View the Results

Now lets map from the one-hot-encoded tags back to the text tags and view them alongside the text of the original posts to sanity check the model and see if it really works.

In [None]:
predicted_tags = []
for test, pred in zip(y_test, y_pred_bool):
    tags = []
    for i, val in enumerate(test):
        if pred[i] == 1.0:
            tags.append(sorted_all_tags[i])
    predicted_tags.append(tags)

for text, tags in zip(X_test, predicted_tags):
    print(' '.join(text), tags)