# CNN1D With Generator
In this notebook I will create a basic 1D convolutional network and train it on a generator.  1D convolutional networks can be a useful tool if the present observation's outcome is in some way dependent on the previous datapoints.  Note that CNNs are not limited to data in this format, as seen by @baosenguo's genius [2nd place solution][1] in the MOA competition.  

The primary takeaway from this kernel is the generator, in my opinion.  Creating a CNN is relatively simple in whatever framework you are using.  Getting the data into the appropriate format is much more difficult.  A sample for a 1D CNN is not just 1 ts_id with its features, shape (1,130).  It is the current ts_id together with the n previous ts_ids, shape (1+n,130).  This notebook's 1D CNN looks at the previous 2 ts_ids, so its samples are of shape (1+2,130).  How do we rearrange the data into this shape?  

## Data Preparation
A simple way to prepare the data for the model is to divide the data into non-overlapping chunks of size 3.  As the train has 2.4 million observations, this will create 800k samples of size (3,130).  The advantage of this approach is the simplicity of implementation.  The disadvantage is the reduction of training samples.  Looking at the 2 previous observations reduces the number of training samples by a factor of 3.  Looking at the previous 1000 observations reduces the training samples by a factor of around 1000!  
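
To make the chunking idea concrete, here is a minimal sketch on a toy array.  The helper name `chunk_into_windows` and the toy data are my own stand-ins, not part of the notebook, and the sketch assumes leftover rows that do not fill a whole chunk are simply dropped.

```python
import numpy as np

def chunk_into_windows(data, window_size):
    """Split rows into non-overlapping windows, dropping any leftover rows."""
    n_windows = data.shape[0] // window_size
    trimmed = data[: n_windows * window_size]
    return trimmed.reshape(n_windows, window_size, data.shape[1])

# Toy stand-in for the training features: 10 rows, 4 features.
toy = np.arange(40, dtype=np.float32).reshape(10, 4)
chunks = chunk_into_windows(toy, window_size=3)
print(chunks.shape)  # (3, 3, 4): 10 rows yield only 3 samples
```

Only every third row ends up as the "current" observation of a sample, which is where the loss of training samples comes from.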

An alternative is to duplicate the data such that each individual observation is stored together with its 2 preceding observations.  This yields 2.4 million samples of size (3,130), effectively tripling the memory footprint of the data.  If you have the memory, go for it!  Otherwise, this might not be feasible.
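
A rough sketch of the duplication approach, again on toy data.  `materialize_windows` is a hypothetical helper of mine; it ignores day boundaries and labels and only shows how the stored array grows with the window size.

```python
import numpy as np

def materialize_windows(data, window_size):
    """Return a copied array of shape (n - window_size + 1, window_size, n_features)."""
    n_rows = data.shape[0]
    # Row i of idx holds the row numbers that make up window i.
    idx = np.arange(n_rows - window_size + 1)[:, None] + np.arange(window_size)[None, :]
    return data[idx]

toy = np.arange(40, dtype=np.float32).reshape(10, 4)
windows = materialize_windows(toy, window_size=3)
print(windows.shape)                # (8, 3, 4)
print(windows.nbytes / toy.nbytes)  # 2.4 here; approaches 3 as the row count grows
```

The first `window_size - 1` rows have no full history, so they cannot be the last row of a window.  Every other row gets its own sample.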

One last alternative is to create a generator that builds samples on the fly during training.  A generator cycles through each observation during runtime, then creates a sample by combining it with the previous 2 observations to form a sample of size (3,130).  The advantage of a generator is that you can use all data points in the train to train your model without increasing the memory footprint.  The disadvantage is that samples are generated during training, leading to somewhat slower training time.  For this competition, the drawback is relatively small; all 5 folds were trained in under 1 hour.  To me, the biggest disadvantage of generators is coding them.  Perhaps I am a n00b, but coding a generator can be rather difficult.  If you wish to use a generator, it is my sincere hope that this code can help you get started.  If you want a more in-depth look at creating generators, I strongly recommend this [Stanford blog post][3].  I used that post to create the generator for this notebook.
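
Before the full Keras version, here is a stripped-down sketch of the on-the-fly idea as a plain Python generator.  `iter_window_batches` is a hypothetical name, and the sketch leaves out labels, weights and day boundaries.  It only shows that the base array is stored once and each batch is assembled when it is requested.

```python
import numpy as np

def iter_window_batches(data, window_size, batch_size, rng=None):
    """Yield batches of shape (<=batch_size, window_size, n_features) built on demand."""
    # Only rows with a full history can serve as the "current" observation.
    current_rows = np.arange(window_size - 1, data.shape[0])
    if rng is not None:
        rng.shuffle(current_rows)
    offsets = np.arange(-(window_size - 1), 1)  # e.g. [-2, -1, 0]
    for start in range(0, len(current_rows), batch_size):
        batch_rows = current_rows[start:start + batch_size]
        # Fancy indexing copies just this batch; the base array is stored once.
        yield data[batch_rows[:, None] + offsets[None, :]]

toy = np.arange(40, dtype=np.float32).reshape(10, 4)
batches = list(iter_window_batches(toy, window_size=3, batch_size=4))
print([b.shape for b in batches])  # [(4, 3, 4), (4, 3, 4)]
```

Passing `rng=np.random.default_rng(0)` would shuffle the order of the samples while leaving each window's internal order intact.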


[1]: https://www.kaggle.com/c/lish-moa/discussion/202256
[3]: https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly

In [None]:
import warnings
warnings.filterwarnings('ignore')

import os, gc
from time import time
import pandas as pd
import numpy as np
import sklearn
import janestreet
from sklearn.model_selection import GroupKFold
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm

import tensorflow as tf
tf.random.set_seed(42)
import tensorflow.keras.backend as K
import tensorflow.keras.layers as layers
from tensorflow.keras.callbacks import Callback, ReduceLROnPlateau, ModelCheckpoint, EarlyStopping

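SUBMIT = True
PATH='../input/cnn1d-pretrain4/model_4_finetune.hdf5'
if SUBMIT and PATH == '':
    #Submissions load trained weights, so fail early with a clear message
    raise ValueError('PATH must point to saved model weights when SUBMIT is True')
SAVED_MODEL_DIR = None #only for submissions.  They use trained models.  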

### Hyperparameters for the Model

Below are the main hyperparameters for the model.  They are 
1. **NUM_FOLDS**: the number of cross val folds for the model
1. **window_size**: the number of observations in a sample. window_size = 1 (the current observation) + the number of previous observations  
1. **num_features**: the number of features your model will use in the data.  
1. **filters**: the number of filters in each Conv1D layer of the 1D CNN, one list entry per layer.
1. **bottleneck_layer**: Dense layer preceding the convolutions.  <=0 if no bottleneck used, >0 for number of neurons in Dense layer.

In [None]:
HYPERPARAMS = dict( NUM_FOLDS = 5,
                    window_size = 3,
                    num_features = 130,
                    filters = [64,64,64],
                    bottleneck_layer= 64, #put <=0 for no bottleneck layer
                    dropout_rate = .1,
                    kernel_size = 3,
                    batch_size = 4096,
                    learning_rate = .001,
                    label_smoothing = 1e-2,
                    pretraining = True)

In [None]:
def nontimeseries_split(df,val_len = 720000, NUM_FOLDS = HYPERPARAMS['NUM_FOLDS']):
    tot_len = df.shape[0]
    
    val_starts = [int((tot_len - val_len)/(NUM_FOLDS -1)*i) for i in range(NUM_FOLDS)]
    folds = []#list of (train_idx, val_idx)
    for val_start in val_starts:
        dates_val =  df.loc[val_start:(val_start + val_len -1), 'date'].unique()
        dates_train= list(set(df.date.unique()) - set(dates_val) )
        
        val_idx = df[df.date.isin(dates_val)].index
        train_idx = df[df.date.isin(dates_train)].index
        
        folds.append((train_idx, val_idx))
    return folds


#From Yirun Zhang: https://www.kaggle.com/gogo827jz/optimise-speed-of-filling-nan-function
from numba import njit
@njit
def fillna_npwhere_njit(array, values= 0):
    if np.isnan(array.sum()):
        array = np.where(np.isnan(array), values, array)
    return array

def Feature_Engineering(X, inference = False):
    X = X.loc[:, X.columns.str.contains('feature')]
    X = fillna_npwhere_njit(X.values, 0)
    return X

#Loading the data
train = pd.read_parquet('../input/parquet/train.parquet')

FEATURES = [feat for feat in train.columns if 'feature' in feat]
f_mean = train[FEATURES].mean()
train[FEATURES] = train[FEATURES].fillna(f_mean)
train['action'] = (train['resp'] > 0).astype('int')
f_mean = train[FEATURES].mean().values

# The 1D Convolutional Network

In [None]:
#####################################################################
#shape: (tuple) the input shape, (window_size, num_features)
#dropout_rate: (float) the dropout rate for network.
#filters: (list of ints) the number of filters in each conv block. This 
    #is effectively the size of the network
#kernel_size: (int) the size of the kernel
######################################################################
def makeConv(shape = (HYPERPARAMS['window_size'], HYPERPARAMS['num_features']), 
             filters = HYPERPARAMS['filters'], dropout_rate = HYPERPARAMS['dropout_rate'],
             learning_rate = HYPERPARAMS['learning_rate'], label_smoothing =HYPERPARAMS['label_smoothing'],
             kernel_size = HYPERPARAMS['kernel_size']):
    inp = tf.keras.layers.Input(shape = shape)
    x = tf.keras.layers.BatchNormalization()(inp)
    x = tf.keras.layers.Dropout(dropout_rate)(x)
    
    #Bottleneck Layer
    if HYPERPARAMS['bottleneck_layer'] >0:
        x = tf.keras.layers.Dense(HYPERPARAMS['bottleneck_layer'])(x)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.Dropout(dropout_rate)(x)
    
    
    #Conv Blocks
    for filter_ in filters:
        x = tf.keras.layers.Conv1D(filters=filter_, kernel_size=kernel_size, strides=1, 
                   padding='causal')(x)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.Activation(tf.keras.activations.swish)(x)
        

    
    #Dense output
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(100, activation='relu', name='last_dense')(x)
    out = tf.keras.layers.Dense(1, activation='sigmoid', name = 'output')(x)
    
   
    model = tf.keras.models.Model(inputs=inp, outputs=out)
    model.compile(optimizer = tf.keras.optimizers.Adam(learning_rate = learning_rate),
                  loss = tf.keras.losses.BinaryCrossentropy(label_smoothing = label_smoothing), 
                  metrics = [tf.keras.metrics.AUC(name = 'AUC')], 
                 )
    
    return model
    
model = makeConv()
tf.keras.utils.plot_model(model, show_shapes=True)

# The Generator
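
The generator below only uses a row as the "current" observation if its whole window lies inside one day and its weight is non-zero.  Here is a small sketch of that index selection in plain numpy.  `valid_window_ends` is a hypothetical helper of mine, and it assumes the rows are sorted so that each day forms one contiguous run.

```python
import numpy as np

def valid_window_ends(dates, weights, window_size):
    """Indices whose window stays within one day and whose weight is non-zero.

    Assumes `dates` is sorted so that each day forms one contiguous run.
    """
    dates = np.asarray(dates)
    weights = np.asarray(weights)
    # first_row[k] is the first row of the k-th distinct day; inverse maps rows to days.
    _, first_row, inverse = np.unique(dates, return_index=True, return_inverse=True)
    position_in_day = np.arange(len(dates)) - first_row[inverse]
    mask = (position_in_day >= window_size - 1) & (weights != 0)
    return np.flatnonzero(mask)

dates   = [0, 0, 0, 0, 1, 1, 1]
weights = [1, 1, 0, 2, 1, 1, 3]
print(valid_window_ends(dates, weights, window_size=3))  # [3 6]
```

Row 2 has a full window but zero weight, so it is skipped as a target.  It still appears inside the window of row 3, which is why the weight==0 rows have to stay in the data.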

In [None]:
###############################################################################
#Generator: used to generate (minibatch, window_size, num_features)
#INPUTS:
    #train: (pandas df) the train with all feature engineering done. DO NOT TAKE OUT WEIGHT==0 ts_id DAYS!!! THIS IS IMPORTANT! MAYBE!!!
    #features: (list) all features to be trained on
    #TARGET: (list) all target features 
    #batch_size: (int) size of each minibatch
    #window_size: (int) number of ts_id that the CNN will look at per sample
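#STEPS:
    #1.  Keeps only the indices that have a full window inside their day and a nonzero weight
    #2.  For each minibatch, stacks every index with its (window_size - 1) preceding rows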
###############################################################################

class Generator(tf.keras.utils.Sequence):
    'Generates data for Keras'
    def __init__(self, train, features, TARGET, batch_size=32, window_size=100, shuffle = True):
        'Initialization'
        self.features = features
        self.TARGET = TARGET
        self.batch_size = batch_size
        self.window_size = window_size
        
        
        #Getting the indices that have the correct window specs
        train['date_day'] = train['ts_id'] - train['date'].map(train.groupby('date')['ts_id'].min())
        self.idx = train.index[(train.date_day>=(self.window_size-1)) & (train.weight !=0)].to_numpy()
        
        self.date_weight_resp = train.loc[self.idx, ['date','weight','resp']]
        
        self.train = train[self.features].values
        self.labels = train[self.TARGET].values
        self.num_features = len(self.features)
        self.num_labels = len(TARGET)
        
        #Remember the flag so on_epoch_end only reshuffles when requested
        self.shuffle = shuffle
        if self.shuffle:
            self.on_epoch_end()

    def __len__(self):
        'Denotes the number of batches per epoch'
        return len(self.idx) // self.batch_size

    def __getitem__(self, index):
        'Generate one batch of data'
        indices_for_minibatch = self.idx[(index*self.batch_size):(self.batch_size * (index+1))]
        train, target = self.get_minibatch(indices_for_minibatch)
        
        return train, target

    def on_epoch_end(self):
        'Shuffles the sample order between epochs when shuffle is True'
        if self.shuffle:
            np.random.shuffle(self.idx)
        gc.collect()
            
    def get_minibatch(self, indices_for_minibatch):
        'Generates one minibatch and its val'
        train_minibatch = np.zeros(shape= (len(indices_for_minibatch), self.window_size, self.num_features))
        val_minibatch = np.zeros(shape= (len(indices_for_minibatch), self.num_labels))
        for minibatch_index, train_index in enumerate(indices_for_minibatch):
            train_minibatch[minibatch_index] = self.train[(train_index - (self.window_size - 1)):(train_index+1)]
            val_minibatch[minibatch_index] =  self.labels[train_index]
            
        return train_minibatch, val_minibatch
    
    def get_labels(self):
        return self.labels
    
    def get_date_weight_resp(self):
        'Only is useful if shuffle is False'
        return self.date_weight_resp

In [None]:
####################
#Pretraining
###################
if SUBMIT ==False and HYPERPARAMS['pretraining']:
    
    #Creating the model trained from resp_n; n:1-4
    model = makeConv()
    new_model = tf.keras.Model(inputs= model.input, outputs=model.get_layer(name='last_dense').output)
    out = tf.keras.layers.Dense(4, activation='sigmoid')(new_model.output)
    newest_model = tf.keras.Model(inputs=new_model.input, outputs = out)
    newest_model.compile(optimizer=tf.optimizers.Adam(), loss='mse')
    
    #Creating the data
    TARGET = [f'resp_{i}' for i in range(1,5)]
    train_gen = Generator(train= train.copy(), features = [feat for feat in train.columns if 'feature' in feat], 
            TARGET= TARGET, batch_size = HYPERPARAMS['batch_size'], 
            window_size= HYPERPARAMS['window_size'])

    #Pretraining Model
    newest_model.fit(train_gen, epochs = 3)
    
    #Saving the model_weights
    ckp_path = f'pretraining.hdf5'
    model.save_weights(ckp_path)
    
    del train_gen; gc.collect()

In [None]:
if SUBMIT ==False:
    for fold_num, (tr_idx, val_idx) in enumerate(nontimeseries_split(train)):
    #for fold_num, (tr_idx, val_idx) in enumerate(date_ts_split(train)):
        start = time()
        tf.keras.backend.clear_session()
        print(f'Starting fold {fold_num+1} of {HYPERPARAMS["NUM_FOLDS"]}')
        
        
        #Preparing the generators
        train_fold = train.iloc[tr_idx].copy()
        train_fold.reset_index(drop=True,inplace=True)
        val_fold = train.iloc[val_idx].copy()
        val_fold.reset_index(drop=True, inplace=True)
        train_gen = Generator(train= train_fold, features = [feat for feat in train.columns if 'feature' in feat], 
                TARGET=['action'], batch_size = HYPERPARAMS['batch_size'], 
                window_size= HYPERPARAMS['window_size'])
        val_gen = Generator(train= val_fold, features = [feat for feat in train.columns if 'feature' in feat], 
                TARGET=['action'], batch_size = HYPERPARAMS['batch_size'], 
                window_size= HYPERPARAMS['window_size'], shuffle=False)
        del train_fold, val_fold; gc.collect()
        
        
        #Preparing Callbacks
        ckp_path = f'model_{fold_num}.hdf5'
        rlr = ReduceLROnPlateau(monitor = 'val_AUC', factor = 0.1, patience = 3, verbose = 0, 
                             min_delta = 1e-4, mode = 'max')
        ckp = ModelCheckpoint(ckp_path, monitor = 'val_AUC', verbose = 0, 
                           save_best_only = True, save_weights_only = True, mode = 'max')
        es = EarlyStopping(monitor = 'val_AUC', min_delta = 1e-4, patience = 7, mode = 'max', 
                        baseline = None, restore_best_weights = True, verbose = 0)
        
        #Creating and Training Model
        model = makeConv()
        if HYPERPARAMS['pretraining']:
            model.load_weights('pretraining.hdf5')
        H = model.fit(train_gen, validation_data = val_gen, epochs = 1, callbacks = [rlr, ckp, es])
        
        #Plotting the models
        df = pd.DataFrame(H.history)
        df.plot(y=['loss','val_loss'], kind='line')
        df.plot(y=['AUC', 'val_AUC'], kind='line')
        
        
        #Training the model for 3 more epochs on only the val at very weak lr
        model = makeConv(learning_rate = HYPERPARAMS['learning_rate']/100)
        model.load_weights(ckp_path)
        model.fit(val_gen, epochs = 3)
        model.save_weights(f'model_{fold_num}_finetune.hdf5')
        
        #Freeing memory
        del train_gen, val_gen; gc.collect()

else:
    model = makeConv()
    model.load_weights(PATH)
    env = janestreet.make_env()
    env_iter = env.iter_test()
    
    tmp = np.zeros((1, 4000000, HYPERPARAMS['num_features']))
    opt_th = 0.5
    for index, (test_df, pred_df) in enumerate(tqdm(env_iter)):
        #Cleaning the row and putting it in the numpy container
        row = Feature_Engineering(test_df)
        tmp[0, index, :] = row

        if index < HYPERPARAMS['window_size']-1:
            pred_df.action = 0
        elif test_df['weight'].item() > 0:
            pred = model(tmp[0:1, (index - HYPERPARAMS['window_size'] +1):index+1, :])
            pred_df.action = np.where(pred >= opt_th, 1, 0).astype(int)
        else:
            pred_df.action = 0
        env.predict(pred_df)

# When you love TensorFlow but TensorFlow doesn't love you back
If you have trained a network, you have undoubtedly noticed that the RAM is near maxed out.  After all this talk about generator memory efficiency, why are we almost out of memory after running the models?  Let's investigate.
```
train_gen = Generator(train= train.copy(), features = [feat for feat in train.columns if 'feature' in feat], 
                TARGET=['action'], batch_size = HYPERPARAMS['batch_size'], 
                window_size= HYPERPARAMS['window_size'])
for index in range(len(train_gen)):
    a = train_gen[index]
del train_gen; gc.collect()
```
The above code creates a generator from all the train, then cycles through the data creating batches of shape (batch_size, window_size, num_features).
Deleting the generator frees all memory.  No memory problems are created when using the generator to cycle through all the data.
```
#Ludicrous callback to delete and gc.collect() every minibatch after creating via the generator
#Unnecessary, but done to remove all doubt that this is a tf problem.
class Ludicrous_Memory_Freer(tf.keras.callbacks.Callback):
    def on_train_batch_begin(self, batch, logs=None):
        del batch; gc.collect()
        
train_gen = Generator(train= train.copy(), features = [feat for feat in train.columns if 'feature' in feat], 
                TARGET=['action'], batch_size = HYPERPARAMS['batch_size'], 
                window_size= HYPERPARAMS['window_size'])
model = makeConv()
model.fit(train_gen, callbacks=[Ludicrous_Memory_Freer()])
del train_gen, model; gc.collect()
tf.keras.backend.clear_session()
```
The above code creates a generator, cycles through the data creating batches of shape (batch_size, window_size, num_features), runs the batches through a tf model, deletes the batches immediately after consumption, deletes everything else, then clears the tf session.  

Unlike the previous block, the RAM does not go down after this code.  Both blocks ran a generator through all data, yet only the first freed up memory.  I believe tf has a memory leak under the hood.  The good news is that the program runs and that tf is usually pretty good at dealing with bugs.  
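
If you want numbers rather than eyeballing the RAM gauge, one option is the standard library's `tracemalloc`.  The sketch below is an assumption-laden stand-in: `traced_memory_after` and `build_and_discard` are hypothetical names, and the workload is a plain Python list, not the generator or a model.  Also note that `tracemalloc` only sees allocations made through Python's allocator, so memory held natively by tf would not show up here.

```python
import gc
import tracemalloc

def traced_memory_after(fn):
    """Run fn, drop its result, and report (bytes still held, peak bytes) as seen by tracemalloc."""
    gc.collect()
    tracemalloc.start()
    result = fn()
    del result
    gc.collect()
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current, peak

def build_and_discard():
    # Stand-in workload: allocate about 8 MB of Python objects, then let them go.
    return [bytes(1024) for _ in range(8 * 1024)]

current, peak = traced_memory_after(build_and_discard)
print(f"still held: {current / 1e6:.2f} MB, peak: {peak / 1e6:.2f} MB")
```

If a measurement like this shows the Python-side memory being returned while the process RAM stays high, that would be consistent with the memory being held on the native side, though it does not prove a leak.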