# Part 1.2 - Extract Predictions for RNN 
In this notebook, we load a pre-trained RNN model, run our train and test data through it, and extract the bottleneck features: the outputs of the layer just before the fully connected classification layer. These features serve as fixed-length embeddings of each object's time series, which we can concatenate with other features to train our final XGBoost classification model.
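To make the idea of bottleneck features concrete, here is a minimal NumPy sketch of a two-layer classifier that returns both its penultimate-layer activations and its class probabilities. It is not the model in `rnn.py`: the random weights, the 8-value input, and the ReLU activation are assumptions for illustration. Only the 16-dimensional bottleneck and the 14 classes mirror the numbers in this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumed, not read from rnn.py): 8 input values,
# a 16-unit bottleneck, and 14 output classes.
n_in, n_bottleneck, n_classes = 8, 16, 14

W1 = rng.normal(size=(n_in, n_bottleneck))       # weights into the bottleneck layer
W2 = rng.normal(size=(n_bottleneck, n_classes))  # weights of the classification layer

def forward(x):
    """Return (bottleneck, class_probs) for a batch x of shape (batch, n_in)."""
    bottleneck = np.maximum(x @ W1, 0.0)         # penultimate layer (ReLU is an assumption)
    logits = bottleneck @ W2                     # fully connected classification layer
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return bottleneck, probs

x = rng.normal(size=(5, n_in))
features, probs = forward(x)
print(features.shape)  # (5, 16): the embeddings we would keep for XGBoost
print(probs.shape)     # (5, 14): class probabilities, not needed here
```

The classifier output is discarded; only the intermediate representation is carried forward as input features for a different model.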

In [1]:
import os
GPU_id = 0
# Expose only this GPU to TensorFlow; set this before TensorFlow initializes CUDA
os.environ['CUDA_VISIBLE_DEVICES'] = str(GPU_id)

In [2]:
import warnings
warnings.filterwarnings("ignore")
import math
import pandas as pd
import numpy as np
import time
import tensorflow as tf
from rnn import PlasticcRNN
import matplotlib.pyplot as plt
%matplotlib inline

print(tf.__version__)

1.11.0


### Load Train & Test Data

In [3]:
train = pd.read_pickle('train_rnn.pkl')
test = pd.read_pickle('test_rnn.pkl')

### Load pre-trained RNN model

Set relevant parameters and load the model. You can find the code for this model in `rnn.py` if you'd like to look further into the RNN implementation.

In [4]:
params = {
        'load_path':'weight/rnn.npy',
        'hidden':64,
        'bottleneck':True,
        'classes':14,
        'num_features':4,
        'embedding_size':4,
        'stratified':True,
        'objective':'multiclassification',
        'metric':'cross_entropy',
        'save_path':'weights',      
        'epochs':100,
        'early_stopping_epochs':10,
        'learning_rate':0.01,
        'batch_size':2048,
        'verbosity':10,
    }

In [5]:
model = PlasticcRNN(**params)

Call `predict_bottleneck` to feed each training example through the RNN and extract the outputs of the layer just before the final classification layer.
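Large datasets are usually fed through a network in fixed-size batches. The progress bars in this notebook report 4 iterations for the training set and 722 for the test set, which equals ceil(N / 2048) for the 7,848 and 1,477,022 objects printed later, with `batch_size` set to 2048. This suggests, but does not confirm, that `predict_bottleneck` batches over objects. The sketch below shows generic batched inference; `predict_in_batches` and the toy `predict_fn` are hypothetical and not part of `rnn.py`.

```python
import math
import numpy as np

def predict_in_batches(predict_fn, X, batch_size=2048):
    """Apply predict_fn to consecutive slices of X and stack the results.

    predict_fn is a hypothetical stand-in for a model's forward pass that
    maps an array of shape (n, ...) to an array of shape (n, d).
    """
    outputs = []
    for start in range(0, len(X), batch_size):
        outputs.append(predict_fn(X[start:start + batch_size]))
    return np.concatenate(outputs, axis=0)

# Batch counts for the object counts reported in this notebook
for n_objects in (7848, 1477022):
    print(n_objects, math.ceil(n_objects / 2048))
# 7848 4
# 1477022 722

# Toy "model" that keeps the first 16 columns, to show the stacking
X = np.random.default_rng(0).normal(size=(5000, 20))
emb = predict_in_batches(lambda batch: batch[:, :16], X)
print(emb.shape)  # (5000, 16)
```

Because each batch is processed independently, the last, partial batch (414 objects for the test set) is handled with no special case.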

In [6]:
train_bn = model.predict_bottleneck(train)

restore RNN/rnn3/bidirectional_rnn/fw/gru_cell/gates/kernel:0
restore RNN/rnn3/bidirectional_rnn/fw/gru_cell/gates/bias:0
restore RNN/rnn3/bidirectional_rnn/fw/gru_cell/candidate/kernel:0
restore RNN/rnn3/bidirectional_rnn/fw/gru_cell/candidate/bias:0
restore RNN/rnn3/bidirectional_rnn/bw/gru_cell/gates/kernel:0
restore RNN/rnn3/bidirectional_rnn/bw/gru_cell/gates/bias:0
restore RNN/rnn3/bidirectional_rnn/bw/gru_cell/candidate/kernel:0
restore RNN/rnn3/bidirectional_rnn/bw/gru_cell/candidate/bias:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/gru_cell/gates/kernel:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/gru_cell/gates/bias:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/gru_cell/candidate/kernel:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/gru_cell/candidate/bias:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/kernel:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/bia


restore RNN/rnn4/bidirectional_rnn/bw/gru_cell/candidate/bias:0


4it [00:08,  2.22s/it]                       


Call `predict_bottleneck` to do the same with the test data. The test set is far larger than the training set, so this step is slow (it took about 16.5 minutes in the run shown here). It may be worthwhile to move on to the next notebook and return to this one once it completes.

In [8]:
test_bn = model.predict_bottleneck(test)

restore RNN/rnn3/bidirectional_rnn/fw/gru_cell/gates/kernel:0
restore RNN/rnn3/bidirectional_rnn/fw/gru_cell/gates/bias:0
restore RNN/rnn3/bidirectional_rnn/fw/gru_cell/candidate/kernel:0
restore RNN/rnn3/bidirectional_rnn/fw/gru_cell/candidate/bias:0
restore RNN/rnn3/bidirectional_rnn/bw/gru_cell/gates/kernel:0
restore RNN/rnn3/bidirectional_rnn/bw/gru_cell/gates/bias:0
restore RNN/rnn3/bidirectional_rnn/bw/gru_cell/candidate/kernel:0
restore RNN/rnn3/bidirectional_rnn/bw/gru_cell/candidate/bias:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/gru_cell/gates/kernel:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/gru_cell/gates/bias:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/gru_cell/candidate/kernel:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/gru_cell/candidate/bias:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/kernel:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/bia

722it [16:30,  1.08s/it]                         


Let's verify that we have embedded each of our timeseries into 16-dimensional space. 
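A slightly stricter check than reading the printed shapes is to assert that there is exactly one 16-dimensional row per unique `object_id` and that no value is NaN or infinite. The helper `check_bottleneck` and the toy `raw` table (with a hypothetical `flux` column) are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def check_bottleneck(raw, bn, dim=16):
    """Raise ValueError unless bn holds one finite dim-sized row per object in raw."""
    n_objects = raw['object_id'].nunique()
    if bn.shape != (n_objects, dim):
        raise ValueError(f"expected shape {(n_objects, dim)}, got {bn.shape}")
    if not np.isfinite(bn).all():
        raise ValueError("bottleneck features contain NaN or inf")

# Toy time-series table: several observations per object
raw = pd.DataFrame({'object_id': [615, 615, 713, 713, 713, 730],
                    'flux': [1.0, 2.0, 0.5, 0.7, 0.9, 3.0]})
check_bottleneck(raw, np.ones((3, 16)))  # passes silently
```

In this notebook the calls would be `check_bottleneck(train, train_bn)` and `check_bottleneck(test, test_bn)`, assuming both tables carry the `object_id` column that the later cells use.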

In [9]:
print(train_bn.shape)
print(test_bn.shape)

(7848, 16)
(1477022, 16)


### Convert Bottleneck Features to DataFrames

In [19]:
train_bn = pd.DataFrame(train_bn,columns=['bottleneck%d'%i for i in range(train_bn.shape[1])])
train_bn['object_id'] = train.object_id.unique()
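Assigning `train.object_id.unique()` pairs row *i* of the bottleneck array with the *i*-th object in order of first appearance, because pandas' `unique()` preserves that order rather than sorting. The pairing is therefore only correct if `predict_bottleneck` emits rows in that same order, which we assume here but cannot confirm without reading `rnn.py`. A toy illustration (the `obs` and `emb` values are made up):

```python
import numpy as np
import pandas as pd

# Toy stand-ins: `obs` has several rows per object, deliberately unsorted;
# `emb` mimics a (n_objects, 16) array returned by the model.
obs = pd.DataFrame({'object_id': [730, 730, 615, 713, 615, 713]})
emb = np.arange(3 * 16, dtype=float).reshape(3, 16)

# unique() keeps order of first appearance; it does not sort
ids = obs['object_id'].unique()
print(ids)  # [730 615 713]

bn = pd.DataFrame(emb, columns=[f'bottleneck{i}' for i in range(emb.shape[1])])
bn['object_id'] = ids  # row 0 -> 730, row 1 -> 615, row 2 -> 713
```

If the input table is already sorted by `object_id`, first-appearance order and sorted order coincide, which makes the alignment easier to reason about.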

In [13]:
train_bn.head()

| | bottleneck0 | bottleneck1 | bottleneck2 | bottleneck3 | bottleneck4 | bottleneck5 | bottleneck6 | bottleneck7 | bottleneck8 | bottleneck9 | bottleneck10 | bottleneck11 | bottleneck12 | bottleneck13 | bottleneck14 | bottleneck15 | object_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39.732151 | 0.481644 | 9.54537 | 0.620394 | 0.356216 | 36.236847 | 0.001318 | 0.045979 | 0.075107 | 7.429893 | 0.131433 | 0.000731 | 0.459892 | 56.466637 | 0.147015 | 20.742081 | 615 |
| 1 | 2.141679 | 6.474417 | 8.759014 | 0.114259 | 2.240714 | 9.736774 | 7.122031 | 10.171884 | 3.942687 | 1.04294 | 1.473609 | 0.025091 | 6.322166 | 1.655637 | 3.261946 | 18.709688 | 713 |
| 2 | 1.139722 | 28.513437 | 0.475543 | 0.954774 | 9.411217 | 1.062097 | 18.426241 | 2.884042 | 2.424168 | 1.874315 | 0.666815 | 10.709197 | 5.510744 | 0.388814 | 8.288029 | 2.827859 | 730 |
| 3 | 13.054775 | 24.500601 | 3.417211 | 2.564081 | 0.949257 | 1.549924 | 21.927143 | 6.542567 | 6.207689 | 4.959152 | 2.384895 | 2.566335 | 9.735003 | 0.954085 | 17.784241 | 3.097171 | 745 |
| 4 | 2.80456 | 22.51988 | 5.93181 | 2.715422 | 2.990606 | 2.720281 | 20.145365 | 4.909352 | 5.238781 | 2.418495 | 0.306975 | 3.099148 | 7.483629 | 0.263368 | 8.544585 | 1.836024 | 1124 |


In [14]:
test_bn = pd.DataFrame(test_bn,columns=['bottleneck%d'%i for i in range(test_bn.shape[1])])
test_bn['object_id'] = test.object_id.unique()

In [15]:
test_bn.head()

| | bottleneck0 | bottleneck1 | bottleneck2 | bottleneck3 | bottleneck4 | bottleneck5 | bottleneck6 | bottleneck7 | bottleneck8 | bottleneck9 | bottleneck10 | bottleneck11 | bottleneck12 | bottleneck13 | bottleneck14 | bottleneck15 | object_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.951065 | 0.251944 | 3.318631 | 0.000736 | 2.441335 | 2.927944 | 1.147541 | 4.429285 | 3.313077 | 0.003929 | 1.19665 | 0.306425 | 1.503735 | 0.039145 | 0.131615 | 0.641173 | 49433749 |
| 1 | 10.752063 | 2.291001 | 3.327534 | 1.669377 | 1.118063 | 1.246555 | 3.30348 | 7.597233 | 10.859297 | 0.491332 | 6.444759 | 3.714056 | 2.977883 | 0.727499 | 8.415609 | 5.365536 | 49433769 |
| 2 | 0.78358 | 7.8496 | 0.511415 | 2.363375 | 3.164018 | 1.02177 | 7.064287 | 3.281813 | 5.019171 | 1.784291 | 4.517396 | 2.340689 | 1.91958 | 0.080436 | 5.031866 | 4.927161 | 49433826 |
| 3 | 5.999391 | 4.163189 | 2.212541 | 0.434298 | 1.154951 | 4.638716 | 7.524286 | 6.470578 | 6.138537 | 1.092754 | 4.615686 | 5.806997 | 3.477196 | 0.701255 | 8.968193 | 10.473463 | 49433842 |
| 4 | 2.767841 | 8.570056 | 2.039268 | 0.764214 | 0.91889 | 2.032314 | 1.962744 | 1.229086 | 11.210052 | 3.119526 | 1.505225 | 8.089894 | 3.605137 | 4.458409 | 10.702964 | 5.939608 | 49433919 |


### Store Features to Disk

In [17]:
train_bn.to_pickle('train_bn.pkl')
test_bn.to_pickle('test_bn.pkl')
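These pickled frames are meant to be concatenated with other per-object features before training XGBoost. A minimal sketch, assuming the other features live in a DataFrame keyed by `object_id` (the `extra_feature` column and all values are hypothetical, and only 2 bottleneck dimensions are used for brevity): it round-trips a bottleneck table through `to_pickle`/`read_pickle` and joins on `object_id`, so the result does not depend on row order.

```python
import os
import tempfile
import pandas as pd

# Toy bottleneck table (2 dimensions instead of 16)
bn = pd.DataFrame({'bottleneck0': [0.1, 0.2, 0.3],
                   'bottleneck1': [1.0, 2.0, 3.0],
                   'object_id': [615, 713, 730]})

# Hypothetical table of other per-object features, in a different row order
other = pd.DataFrame({'object_id': [730, 615, 713],
                      'extra_feature': [0.5, 0.0, 1.2]})

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 'train_bn.pkl')
    bn.to_pickle(path)
    restored = pd.read_pickle(path)

# Join on object_id rather than on row position; validate='one_to_one'
# raises if either side contains duplicate object_ids.
features = other.merge(restored, on='object_id', how='left', validate='one_to_one')
print(features)
```

Merging by key instead of concatenating side by side guards against the two tables being stored in different object orders.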