# Part 1.2 - Extract Predictions for RNN 
In this notebook, we will load a pre-trained RNN model and run both our train and test samples through it, extracting the bottleneck features (the outputs of the layer just before the fully-connected / classification layer(s)). 

These features serve as a learned embedding of each sample's time series. They will be concatenated with other features to train our final XGBoost classification model.
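To make the idea of bottleneck features concrete, here is a minimal sketch using a toy dense network in NumPy. It is not the model in `rnn.py` (which is recurrent); the layer sizes, the random weights, and the `forward` helper are all assumptions made for illustration. The point is only that the bottleneck is the activation feeding the classification layer, and that it can be returned alongside the class probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained network: one hidden layer followed by a
# classification layer. Sizes and weights are hypothetical.
n_inputs, n_bottleneck, n_classes = 8, 16, 3
W_hidden = rng.normal(size=(n_inputs, n_bottleneck))
W_out = rng.normal(size=(n_bottleneck, n_classes))

def forward(x):
    """Return (bottleneck, class_probabilities) for a batch x."""
    bottleneck = np.maximum(x @ W_hidden, 0.0)           # hidden layer (ReLU)
    logits = bottleneck @ W_out                          # classification layer
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=1, keepdims=True)         # softmax
    return bottleneck, probs

x = rng.normal(size=(5, n_inputs))
bottleneck, probs = forward(x)
print(bottleneck.shape, probs.shape)
```

Downstream models consume `bottleneck` rather than `probs`, since it carries a richer representation than the final class scores.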

In [1]:
import os
GPU_id = 0
os.environ['CUDA_VISIBLE_DEVICES'] = str(GPU_id)

In [2]:
import warnings
warnings.filterwarnings("ignore")
import math
import pandas as pd
import numpy as np
import time
import tensorflow as tf
from rnn import PlasticcRNN
import matplotlib.pyplot as plt
%matplotlib inline

print(tf.__version__)

1.13.1


### Load Train & Test Data

In [3]:
train = pd.read_pickle('train_rnn.pkl')
test = pd.read_pickle('test_rnn.pkl')

### Load pre-trained RNN model

Load the model from its saved weights. The implementation is in `rnn.py` if you'd like to look further into how the RNN is built.

In [4]:
model = PlasticcRNN('weight/rnn.npy')

Call `predict_bottleneck` to feed each training example through the pre-trained RNN model and extract the outputs from the layer just before the final classification layer.

In [5]:
train_bn = model.predict_bottleneck(train)

Instructions for updating:
Colocations handled automatically by placer.

For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

Instructions for updating:
This class is equivalent as tf.keras.layers.GRUCell, and will be replaced by that in Tensorflow 2.0.
Instructions for updating:
Please use `keras.layers.Bidirectional(keras.layers.RNN(cell))`, which is equivalent to this API
Instructions for updating:
Please use `keras.layers.RNN(cell)`, which is equivalent to this API
restore RNN/rnn3/bidirectional_rnn/fw/gru_cell/gates/kernel:0
restore RNN/rnn3/bidirectional_rnn/fw/gru_cell/gates/bias:0
restore RNN/rnn3/bidirectional_rnn/fw/gru_cell/candidate/kernel:0
restore RNN/rnn3/bidirectional_rnn/fw/gru_cell/candidate/bias:0
restore RNN/rnn3/bidirectional_rnn/bw/gru_cell/gates/kernel:0
restore RNN/rnn3/bidirectional

  0%|          | 0/3 [00:00<?, ?it/s]

restore RNN/rnn4/bidirectional_rnn/bw/gru_cell/gates/kernel:0
restore RNN/rnn4/bidirectional_rnn/bw/gru_cell/gates/bias:0
restore RNN/rnn4/bidirectional_rnn/bw/gru_cell/candidate/kernel:0
restore RNN/rnn4/bidirectional_rnn/bw/gru_cell/candidate/bias:0


4it [00:04,  1.15s/it]                       


Call `predict_bottleneck` to do the same with the testing data. This can take a while (roughly nine minutes in the run below), so it might be worthwhile to move on to the next notebook and return to this one once it's complete.

In [6]:
test_bn = model.predict_bottleneck(test)

restore RNN/rnn3/bidirectional_rnn/fw/gru_cell/gates/kernel:0
restore RNN/rnn3/bidirectional_rnn/fw/gru_cell/gates/bias:0
restore RNN/rnn3/bidirectional_rnn/fw/gru_cell/candidate/kernel:0
restore RNN/rnn3/bidirectional_rnn/fw/gru_cell/candidate/bias:0
restore RNN/rnn3/bidirectional_rnn/bw/gru_cell/gates/kernel:0
restore RNN/rnn3/bidirectional_rnn/bw/gru_cell/gates/bias:0
restore RNN/rnn3/bidirectional_rnn/bw/gru_cell/candidate/kernel:0
restore RNN/rnn3/bidirectional_rnn/bw/gru_cell/candidate/bias:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/gru_cell/gates/kernel:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/gru_cell/gates/bias:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/gru_cell/candidate/kernel:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/gru_cell/candidate/bias:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/kernel:0
restore RNN/rnn5/bidirectional_rnn/fw/output_projection_wrapper/bia

722it [08:55,  1.82it/s]                         
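The progress bars above suggest that the samples are processed in batches. The actual batching lives in `rnn.py` and is not shown here, so the following is only a sketch of the general pattern; `predict_in_batches`, `toy_predict`, and the batch size are hypothetical names and values.

```python
import math
import numpy as np

def predict_in_batches(predict_fn, samples, batch_size):
    """Apply predict_fn to consecutive slices of samples and stack the results."""
    outputs = []
    for start in range(0, len(samples), batch_size):
        outputs.append(predict_fn(samples[start:start + batch_size]))
    return np.concatenate(outputs, axis=0)

def toy_predict(batch):
    """Hypothetical stand-in for the model: one 16-dim vector per input row."""
    return np.repeat(batch.sum(axis=1, keepdims=True), 16, axis=1)

samples = np.arange(50, dtype=float).reshape(10, 5)
batch_size = 4
features = predict_in_batches(toy_predict, samples, batch_size)

# The last batch may be smaller, so the batch count is a ceiling division.
n_batches = math.ceil(len(samples) / batch_size)
print(features.shape, n_batches)
```

Batching keeps memory use bounded, which matters when the test set is far larger than the training set.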


Let's verify that each time series has been embedded into a 16-dimensional space: both arrays should have one row per object and 16 columns.

In [8]:
print(train_bn.shape)
print(test_bn.shape)

(7848, 16)
(1477022, 16)


### Convert Bottleneck Features to DataFrames

In [9]:
train_bn = pd.DataFrame(train_bn, columns=['bottleneck%d' % i for i in range(train_bn.shape[1])])
train_bn['object_id'] = train.object_id.unique()
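
The assignment above relies on the rows of the bottleneck array being in the same order as `object_id.unique()`, which returns values in order of first appearance rather than sorted. Whether `predict_bottleneck` preserves that order depends on `rnn.py`. The sketch below shows a cheap sanity check on a toy frame; the toy data and its long format (several rows per object) are assumptions.

```python
import numpy as np
import pandas as pd

# Toy frame with several rows per object (values are made up).
toy = pd.DataFrame({
    'object_id': [713, 713, 615, 615, 615, 730],
    'flux': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
toy_bn = np.zeros((3, 16))  # stand-in for the model output

ids = toy.object_id.unique()  # order of first appearance, not sorted

# Fail early if the model did not return exactly one row per object.
assert len(ids) == toy_bn.shape[0], "expected one feature row per object"

toy_bn_df = pd.DataFrame(
    toy_bn, columns=['bottleneck%d' % i for i in range(toy_bn.shape[1])]
)
toy_bn_df['object_id'] = ids
print(ids.tolist())
```

A length check does not prove the ordering is right, but it catches the most common mismatch before the IDs are attached.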

In [10]:
train_bn.head()

| | bottleneck0 | bottleneck1 | bottleneck2 | bottleneck3 | bottleneck4 | bottleneck5 | bottleneck6 | bottleneck7 | bottleneck8 | bottleneck9 | bottleneck10 | bottleneck11 | bottleneck12 | bottleneck13 | bottleneck14 | bottleneck15 | object_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39.732155 | 0.481644 | 9.545387 | 0.620394 | 0.356216 | 36.23687 | 0.001318 | 0.045979 | 0.075107 | 7.4299 | 0.131433 | 0.000731 | 0.45989 | 56.466602 | 0.147015 | 20.742071 | 615 |
| 1 | 2.14168 | 6.474412 | 8.759027 | 0.11426 | 2.240716 | 9.736775 | 7.122033 | 10.171883 | 3.942687 | 1.042941 | 1.473614 | 0.025092 | 6.322167 | 1.655638 | 3.261944 | 18.709698 | 713 |
| 2 | 1.139721 | 28.513445 | 0.475544 | 0.954771 | 9.411226 | 1.062097 | 18.426237 | 2.884042 | 2.42417 | 1.874314 | 0.666819 | 10.709209 | 5.510745 | 0.388815 | 8.28803 | 2.827858 | 730 |
| 3 | 13.054786 | 24.500618 | 3.417213 | 2.564089 | 0.94926 | 1.549925 | 21.927141 | 6.542571 | 6.207672 | 4.959156 | 2.384908 | 2.566332 | 9.735011 | 0.954083 | 17.784239 | 3.097169 | 745 |
| 4 | 2.804562 | 22.519892 | 5.931814 | 2.715422 | 2.990609 | 2.720287 | 20.145376 | 4.909363 | 5.238785 | 2.41849 | 0.306978 | 3.099148 | 7.483637 | 0.263367 | 8.544586 | 1.836025 | 1124 |


In [11]:
test_bn = pd.DataFrame(test_bn, columns=['bottleneck%d' % i for i in range(test_bn.shape[1])])
test_bn['object_id'] = test.object_id.unique()

In [12]:
test_bn.head()

| | bottleneck0 | bottleneck1 | bottleneck2 | bottleneck3 | bottleneck4 | bottleneck5 | bottleneck6 | bottleneck7 | bottleneck8 | bottleneck9 | bottleneck10 | bottleneck11 | bottleneck12 | bottleneck13 | bottleneck14 | bottleneck15 | object_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.951065 | 0.251945 | 3.318633 | 0.000736 | 2.441339 | 2.927944 | 1.147542 | 4.429283 | 3.313076 | 0.003929 | 1.196651 | 0.306426 | 1.503734 | 0.039145 | 0.131615 | 0.641174 | 49433749 |
| 1 | 10.752055 | 2.290999 | 3.327535 | 1.669374 | 1.118064 | 1.246555 | 3.30348 | 7.597229 | 10.859291 | 0.491331 | 6.444759 | 3.714051 | 2.977882 | 0.7275 | 8.415606 | 5.365532 | 49433769 |
| 2 | 0.78358 | 7.8496 | 0.511416 | 2.363377 | 3.164022 | 1.021771 | 7.064289 | 3.281812 | 5.019171 | 1.784289 | 4.517395 | 2.340687 | 1.91958 | 0.080435 | 5.031864 | 4.927163 | 49433826 |
| 3 | 5.999395 | 4.163189 | 2.212542 | 0.434298 | 1.15495 | 4.638714 | 7.524287 | 6.470583 | 6.13854 | 1.092754 | 4.615685 | 5.806994 | 3.47719 | 0.701255 | 8.968193 | 10.473462 | 49433842 |
| 4 | 2.767842 | 8.570052 | 2.039266 | 0.764213 | 0.91889 | 2.032316 | 1.962744 | 1.229088 | 11.210056 | 3.119527 | 1.505223 | 8.089897 | 3.605143 | 4.458409 | 10.70297 | 5.939607 | 49433919 |


### Store Features to Disk

In [13]:
train_bn.to_pickle('train_bn.pkl')
test_bn.to_pickle('test_bn.pkl')
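
The stored features are meant to be combined with other per-object features before training the final classifier. A minimal sketch of that join is shown below, using small in-memory stand-ins instead of the pickled files; the `other` table and its `hypothetical_feature` column are invented for illustration, and the real pipeline may combine the features differently.

```python
import pandas as pd

# Stand-in for the stored bottleneck features (values are made up).
# In the real pipeline this would come from pd.read_pickle('train_bn.pkl').
bn = pd.DataFrame({
    'object_id': [615, 713, 730],
    'bottleneck0': [39.7, 2.1, 1.1],
    'bottleneck1': [0.5, 6.5, 28.5],
})

# Hypothetical per-object features produced elsewhere, in a different row order.
other = pd.DataFrame({
    'object_id': [730, 615, 713],
    'hypothetical_feature': [0.1, 0.2, 0.3],
})

# Join on object_id so rows line up regardless of their order in each frame;
# validate raises if either side has duplicate object_ids.
combined = other.merge(bn, on='object_id', how='left', validate='one_to_one')
print(combined.shape)
```

Merging on `object_id` rather than concatenating column-wise avoids silently misaligned rows when the two tables are ordered differently.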