## Review

Our last classifier was very poor: it operated at chance, so a coin flip would have had the same predictive power. A few things may have caused us to find no signal. It could be that we downsampled our images too much and lost useful information, it could be that our model was poorly configured (it was), it could be that we were using the wrong model (we were), or it could be all of these. To address these issues, we'll spend a little more time constructing the model this time, examining the underlying structure of the data itself, what we really want to learn from it, and how best to model that.

Convolutional neural networks (CNNs) are commonly used in image classification, but in this particular case, the strength of CNNs is something we want to avoid: namely, that they treat a feature's location within the image as largely irrelevant. In our case, where a specific feature shows up in our plot images is very important in classifying whether it came from good or bad underlying data. For this reason, we'll go with a different type of neural network, a type of recurrent neural network (RNN) called a "long short-term memory" (LSTM) network. This network structure processes its input as a sequence of time steps and carries a memory of earlier steps forward as it reads later ones. Although our data don't have a time dimension, we can pretend that one dimension of our image, in this case the one represented by the X axis of the plot, is "time", and pass the Y position and red/green/blue intensity together as our other dimension. This will allow our neural network to take into account how various features (say, a cluster of outlying points) relate to the rest of the image.
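To make the "X axis as time" idea concrete, here is a minimal NumPy sketch of the rearrangement. It assumes each image is stored as a (height, width, channels) array, as in Keras's default `channels_last` layout; the tiny 2x4 toy image and the variable names are illustrative only.

```python
import numpy as np

# A toy "image": 2 rows (Y), 4 columns (X), 3 colour channels.
height, width, channels = 2, 4, 3
img = np.arange(height * width * channels).reshape(height, width, channels)

# Put the X axis first so that each column of the plot becomes one time step.
x_first = img.transpose(1, 0, 2)  # shape (width, height, channels)

# Flatten Y and colour into a single feature vector per time step.
sequence = x_first.reshape(width, height * channels)

print(sequence.shape)  # (4, 6)
```

Each row of `sequence` holds every pixel in one vertical slice of the plot, so a recurrent network fed this array reads the image from left to right.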

We'll use Keras for training this model, since we need more fine-grained control over our model creation. Keras uses TensorFlow as a backend, which allows us to run these computations on a GPU and vastly reduces training time.

## Preparation

We start by importing a few Python modules we'll need: `warnings` and `os` for logistics, and the `keras` modules for building the model, loading and processing data, and logging results.

In [1]:
# keeps warnings from printing for nicer blog output
import warnings 
warnings.filterwarnings('ignore') 

import os

import numpy as np

from keras.models import Sequential, load_model
from keras.layers import Permute, Reshape, LSTM, Dropout, TimeDistributed, Dense, Activation, Flatten
from keras import optimizers

from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import CSVLogger, EarlyStopping

Using TensorFlow backend.


These "callbacks" will be used during training to tell Keras to write statistics about training progress to disk and to halt training if our validation loss stops decreasing (a sign we've had enough training and we're starting to overfit our model).

In [None]:
# keras callbacks
csv_logger = CSVLogger('tf-log/epoch-log.csv', append=True, separator=';')
early_stopper = EarlyStopping(monitor='val_loss',
                              min_delta=0,
                              patience=2,
                              verbose=0, mode='auto')
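The patience rule that `EarlyStopping` applies can be sketched in plain Python. This is a simplified illustration rather than Keras's actual implementation; the function name `stopping_epoch` and the loss values are made up for the example and are not results from this model.

```python
def stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the index of the epoch at which training would halt, or None."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Validation loss improved: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Illustrative validation losses only (not taken from a real run).
losses = [0.70, 0.55, 0.50, 0.52, 0.51, 0.49]
print(stopping_epoch(losses))  # 4
```

With `patience=2`, training halts after two consecutive epochs that fail to beat the best validation loss seen so far, even if a later epoch would have improved on it.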

In [2]:
os.chdir(os.path.expanduser('~/share/rkingdc-blog/regplot'))

## Building the Model

We start by designing our model. The `input_dim1` is the dimension of our images: 256 pixels on each side. Since one axis will be our time axis, we need this value so that our model knows we have 256 time steps, each carrying 256\*3 pixel values.

In [3]:
input_dim1 = 256
lstm_size = 150
hidden_layer_size = 100
adam_parms = {'lr': 1e-4, 'beta_1': 0.9, 'beta_2': 0.999}

mod = Sequential()

mod.add(Permute((2,1,3), input_shape=(input_dim1,input_dim1,3)))
mod.add(Reshape(target_shape = (input_dim1,input_dim1*3)))

# our hidden layers
mod.add(LSTM(lstm_size, return_sequences=True))
mod.add(LSTM(lstm_size, return_sequences=True))

# dropout 
mod.add(Dropout(0.5))

mod.add(TimeDistributed(Dense(hidden_layer_size), input_shape=(input_dim1, lstm_size) ))

mod.add(Flatten())

mod.add(Dense(1, activation='sigmoid'))

mod.compile(optimizer=optimizers.Adam(**adam_parms), loss='binary_crossentropy', metrics=['accuracy'])
mod.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
permute_1 (Permute)          (None, 256, 256, 3)       0         
_________________________________________________________________
reshape_1 (Reshape)          (None, 256, 768)          0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 256, 150)          551400    
_________________________________________________________________
lstm_2 (LSTM)                (None, 256, 150)          180600    
_________________________________________________________________
dropout_1 (Dropout)          (None, 256, 150)          0         
_________________________________________________________________
time_distributed_1 (TimeDist (None, 256, 100)          15100     
_________________________________________________________________
flatten_1 (Flatten)          (None, 25600)             0         
__________
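The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for an LSTM layer (four gates, each with input weights, recurrent weights, and a bias) and for a dense layer; the helper names `lstm_params` and `dense_params` are hypothetical and not part of Keras.

```python
def lstm_params(input_size, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * ((input_size + units) * units + units)


def dense_params(input_size, units):
    # One weight per input/output pair plus one bias per output unit.
    return input_size * units + units


print(lstm_params(256 * 3, 150))  # 551400, matches lstm_1
print(lstm_params(150, 150))      # 180600, matches lstm_2
print(dense_params(150, 100))     # 15100, matches time_distributed_1
```

The `TimeDistributed` wrapper applies the same dense layer at every time step, which is why its count does not grow with the 256 steps.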

## Pre-processing Data

In image processing we often want to pre-process our images before we train a model on them, for example by adding some random stretching, blurring, or rotating. Keras has utilities included to make this easier. Here we use them only to rescale pixel values from the 0-255 range to the 0-1 range.

In [4]:
train_gen = ImageDataGenerator(rescale = 1/255)
test_gen = ImageDataGenerator(rescale = 1/255)

In [5]:
train = train_gen.flow_from_directory('data/imgs/train',
                                      shuffle=True,
                                      batch_size=36,
                                      class_mode='binary')
test = test_gen.flow_from_directory('data/imgs/test',
                                    shuffle=True,
                                    batch_size=36,
                                    class_mode='binary')

Found 16000 images belonging to 2 classes.
Found 4000 images belonging to 2 classes.
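As a quick sanity check on what one epoch means here, the number of batches each generator yields per pass can be worked out from the image counts and the batch size of 36. This sketch assumes the generator yields one final, smaller batch for any leftover images; `batches_per_epoch` is a hypothetical helper, not a Keras function.

```python
import math


def batches_per_epoch(n_images, batch_size):
    """Return (number of batches, size of the final batch)."""
    remainder = n_images % batch_size
    n_batches = math.ceil(n_images / batch_size)
    return n_batches, (remainder or batch_size)


print(batches_per_epoch(16000, 36))  # (445, 16)
print(batches_per_epoch(4000, 36))   # (112, 4)
```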


In [6]:
mod.fit_generator(train,
       epochs=15,
       verbose=0,
       validation_data=test,
       callbacks=[csv_logger, early_stopper])

<keras.callbacks.History at 0x7f031ccea9b0>

In [7]:
from datetime import date
mod.save(f'trained_model_1_{str(date.today())}.h5')

In [8]:
model_eval = mod.evaluate_generator(test, use_multiprocessing=True, workers=2)
print(mod.metrics_names)
print(model_eval)

['loss', 'acc']
[0.02431063937046565, 0.9917499966025353]
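To put the accuracy figure in terms of images, the short calculation below converts it into a count, assuming the reported accuracy is the fraction of the 4000 test images classified correctly.

```python
n_test = 4000
accuracy = 0.9917499966025353  # value printed by the evaluation above

correct = round(accuracy * n_test)
wrong = n_test - correct

print(correct, wrong)  # 3967 33
```

Under that assumption, the model misclassifies about 33 of the 4000 test plots, a large improvement over the chance-level classifier described at the start.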
