# S16 RNN LSTM Assignment

### Data Location 
http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption

**Attribute Information:**

1. date: Date in format dd/mm/yyyy 
2. time: time in format hh:mm:ss 
3. global_active_power: household global minute-averaged active power (in kilowatt) 
4. global_reactive_power: household global minute-averaged reactive power (in kilowatt) 
5. voltage: minute-averaged voltage (in volt) 
6. global_intensity: household global minute-averaged current intensity (in ampere) 
7. sub_metering_1: energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered). 
8. sub_metering_2: energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light. 
9. sub_metering_3: energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.


## Assignment:
This is a supervised learning problem. Formulate it to predict `Global_active_power` at the current time `t` given the `Global_active_power` measurement and all other features at the prior time step.

You can choose to use `Global_active_power` alone and compare the results.
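One way to frame the problem is to shift every column back by one step and pair it with the target at the current step. The sketch below assumes a pandas DataFrame; the helper name `make_supervised` and the toy values are illustrative only and not part of the assignment.

```python
import pandas as pd

# Toy frame standing in for the real data; the values are made up.
df = pd.DataFrame({
    "Global_active_power": [1.0, 2.0, 3.0, 4.0],
    "Voltage": [240.0, 241.0, 239.0, 238.0],
})

def make_supervised(frame, target="Global_active_power"):
    """Pair every feature at t-1 with the target at t (hypothetical helper)."""
    lagged = frame.shift(1).add_suffix("(t-1)")  # inputs: all columns one step back
    out = lagged.copy()
    out[target + "(t)"] = frame[target]          # output: target at the current step
    return out.dropna()                          # the first row has no predecessor

supervised = make_supervised(df)                             # all features
univariate = make_supervised(df[["Global_active_power"]])   # target-only variant
print(supervised)
```

The `univariate` frame is the target-only formulation mentioned above, so both variants can be trained and compared with the same downstream code.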


In [1]:
# Import the packages


In [2]:
# Global Parameters

In [3]:
## Import the data and preprocess
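A possible starting point for loading, assuming the raw file is semicolon-separated with a header row and marks missing readings with `?`. The two rows below are made up to mimic that layout; in practice `raw` would be replaced by the path to the downloaded file.

```python
import io
import pandas as pd

# Made-up rows in the layout assumed for the raw file (';' separators, '?' = missing).
raw = io.StringIO(
    "Date;Time;Global_active_power;Global_reactive_power;Voltage;"
    "Global_intensity;Sub_metering_1;Sub_metering_2;Sub_metering_3\n"
    "16/12/2006;17:24:00;4.2;0.4;234.8;18.4;0.0;1.0;17.0\n"
    "16/12/2006;17:25:00;?;?;?;?;?;?;\n"
)

# Treat '?' as NaN so the measurement columns are parsed as floats.
df = pd.read_csv(raw, sep=";", na_values=["?"])

# Combine the separate date and time columns into one datetime index.
df["datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"],
                                format="%d/%m/%Y %H:%M:%S")
df = df.drop(columns=["Date", "Time"]).set_index("datetime")
print(df.dtypes)
```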

In [4]:
## Clean up missing values and 'nan'
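Two common ways to fill gaps in a minute-level series are forward filling and time-based interpolation. The sketch below uses made-up readings; which strategy suits this dataset is a judgment call left to the assignment.

```python
import numpy as np
import pandas as pd

# Made-up minute-level readings with a two-minute gap.
idx = pd.date_range("2007-01-01 00:00", periods=5, freq="min")
power = pd.Series([1.0, np.nan, np.nan, 4.0, 5.0],
                  index=idx, name="Global_active_power")

n_missing = int(power.isna().sum())               # count the missing readings
filled_ffill = power.ffill()                      # carry the last valid reading forward
filled_interp = power.interpolate(method="time")  # linear in time between neighbours

print(n_missing)
print(filled_ffill.tolist())
print(filled_interp.tolist())
```

Forward filling keeps only values that were actually observed, while interpolation produces smoother transitions across a gap.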

## LSTM Data Preparation and feature engineering

- Think about how you can prepare features from the given records.
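A minimal sketch of the remaining preparation steps, assuming the lagged features are already in a 2-D array: split chronologically, scale with statistics from the training rows only, then reshape to the 3-D layout an LSTM layer expects. The random data and the 80/20 split are placeholders, not assignment requirements.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 10, 7
X = rng.uniform(0.0, 5.0, size=(n_samples, n_features))  # stand-in for features at t-1
y = rng.uniform(0.0, 5.0, size=n_samples)                # stand-in for the target at t

# Chronological split: no shuffling, so the test rows come after the train rows.
split = int(0.8 * n_samples)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Min-max scaling using the minimum and maximum of the training rows only.
x_min, x_max = X_train.min(axis=0), X_train.max(axis=0)
X_train_scaled = (X_train - x_min) / (x_max - x_min)
X_test_scaled = (X_test - x_min) / (x_max - x_min)

# Keras LSTM layers expect [samples, timesteps, features]; here timesteps = 1.
X_train_3d = X_train_scaled.reshape(-1, 1, n_features)
X_test_3d = X_test_scaled.reshape(-1, 1, n_features)
print(X_train_3d.shape, X_test_3d.shape)
```

Taking the scaling statistics from the training rows alone keeps information about the test period out of the model's inputs.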

# Model Architecture

1. LSTM with 100 neurons in the first visible layer (**Q:** would you like to change it, and why?)
2. Dropout of 20%.
3. 1 neuron in the output layer for predicting Global_active_power.
4. The input shape will be 1 time step with 7 features.
5. Use the Mean Absolute Error (MAE) loss function and the Adam optimizer.
6. The model will be fit for a suitable number of training epochs with a suitable batch size.

In [None]:
model = tf.keras.Sequential()
initializer = tf.keras.initializers.HeNormal()
model.add(tf.keras.layers.LSTM(100,
                               activation='relu',
                               kernel_initializer=initializer,
                               input_shape=(X_train.shape[1],
                                            X_train.shape[2])))
model.add(tf.keras.layers.Dropout(0.2))

model.add(tf.keras.layers.Dense(1))

# MAE loss, matching the architecture described above
model.compile(loss='mean_absolute_error', optimizer='adam')

> tf.keras.layers.LSTM(
    units, activation='tanh', recurrent_activation='sigmoid', use_bias=True,
    kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal',
    bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None,
    recurrent_regularizer=None, bias_regularizer=None, activity_regularizer=None,
    kernel_constraint=None, recurrent_constraint=None, bias_constraint=None,
    dropout=0.0, recurrent_dropout=0.0, implementation=2, return_sequences=False,
    return_state=False, go_backwards=False, stateful=False, time_major=False,
    unroll=False, **kwargs
)

## Arguments: 
Arguments|Description
:--|:--
units|Positive integer, dimensionality of the output space.
activation|Activation function to use. Default: hyperbolic tangent (tanh). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
recurrent_activation|Activation function to use for the recurrent step. Default: sigmoid (sigmoid). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
use_bias|Boolean (default True), whether the layer uses a bias vector.
kernel_initializer|Initializer for the kernel weights matrix, used for the linear transformation of the inputs. Default: glorot_uniform.
recurrent_initializer|Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state. Default: orthogonal.
bias_initializer|Initializer for the bias vector. Default: zeros.
unit_forget_bias|Boolean (default True). If True, add 1 to the bias of the forget gate at initialization. Setting it to True will also force bias_initializer="zeros". This is recommended in Jozefowicz et al.
kernel_regularizer|Regularizer function applied to the kernel weights matrix. Default: None.
recurrent_regularizer|Regularizer function applied to the recurrent_kernel weights matrix. Default: None.
bias_regularizer|Regularizer function applied to the bias vector. Default: None.
activity_regularizer|Regularizer function applied to the output of the layer (its "activation"). Default: None.
kernel_constraint|Constraint function applied to the kernel weights matrix. Default: None.
recurrent_constraint|Constraint function applied to the recurrent_kernel weights matrix. Default: None.
bias_constraint|Constraint function applied to the bias vector. Default: None.
dropout|Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs. Default: 0.
recurrent_dropout|Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state. Default: 0.
implementation|Implementation mode, either 1 or 2. Mode 1 will structure its operations as a larger number of smaller dot products and additions, whereas mode 2 will batch them into fewer, larger operations. These modes will have different performance profiles on different hardware and for different applications. Default: 2.
return_sequences|Boolean. Whether to return the last output in the output sequence, or the full sequence. Default: False.
return_state|Boolean. Whether to return the last state in addition to the output. Default: False.
go_backwards|Boolean (default False). If True, process the input sequence backwards and return the reversed sequence.
stateful|Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
time_major|The shape format of the inputs and outputs tensors. If True, the inputs and outputs will be in shape [timesteps, batch, feature], whereas in the False case, it will be [batch, timesteps, feature]. Using time_major = True is a bit more efficient because it avoids transposes at the beginning and end of the RNN calculation. However, most TensorFlow data is batch-major, so by default this function accepts input and emits output in batch-major form.
unroll|Boolean (default False). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed up an RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences.

In [None]:
model.summary()
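As a cross-check on the parameter counts that `model.summary()` reports, the numbers can be worked out by hand. The sketch assumes the usual LSTM layout of four gates, each with input weights, recurrent weights and one bias vector, followed by a single-output dense layer; the helper names are illustrative.

```python
def lstm_params(units, n_features):
    # Four gates, each with input weights, recurrent weights and a bias vector.
    return 4 * (units * (n_features + units) + units)

def dense_params(n_inputs, n_outputs):
    # One weight per input-output pair plus one bias per output.
    return n_inputs * n_outputs + n_outputs

lstm_count = lstm_params(100, 7)     # 100 units, 7 input features
dense_count = dense_params(100, 1)   # 100 LSTM outputs feeding 1 neuron
total = lstm_count + dense_count     # the dropout layer has no parameters
print(lstm_count, dense_count, total)  # 43200 101 43301
```

Because the recurrent term grows with the square of the unit count, changing the number of LSTM units changes the model size far more than adding input features does.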

In [None]:
# fit network


In [None]:
history.history.keys()

In [None]:
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper right')
plt.show()

## Making Predictions

In [None]:
# let's scale the X_test too


In [None]:
# Invert scale pred

# invert scale for actual
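If the target was min-max scaled, predictions come out in the scaled range and need mapping back to kilowatts before evaluation. The sketch below assumes the scaling statistics came from the training target; the values and the `pred_scaled` array (a stand-in for the output of `model.predict`) are made up.

```python
import numpy as np

y_train = np.array([0.5, 1.5, 2.5, 4.5])   # made-up training targets in kW
y_min, y_max = y_train.min(), y_train.max()

def scale(y):
    return (y - y_min) / (y_max - y_min)

def invert_scale(y_scaled):
    return y_scaled * (y_max - y_min) + y_min

y_test = np.array([1.0, 3.0])               # made-up test targets in kW
y_test_scaled = scale(y_test)
pred_scaled = np.array([0.25, 0.5])         # stand-in for model.predict output

pred_kw = invert_scale(pred_scaled)         # predictions back in kilowatts
actual_kw = invert_scale(y_test_scaled)     # recovers the original test targets
print(pred_kw, actual_kw)
```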


In [None]:
# calculate RMSE

# calculate R2 Score
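Both metrics can be computed directly with NumPy. The arrays below are made-up values standing in for the inverted actuals and predictions.

```python
import numpy as np

def rmse(actual, predicted):
    # Root of the mean squared error, in the same units as the target.
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def r2_score(actual, predicted):
    # 1 minus the ratio of residual variation to total variation around the mean.
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)
    return float(1.0 - ss_res / ss_tot)

actual = np.array([1.0, 2.0, 3.0, 4.0])
predicted = np.array([1.5, 2.0, 2.5, 4.0])

rmse_value = rmse(actual, predicted)
r2_value = r2_score(actual, predicted)
print(rmse_value, r2_value)
```

RMSE is reported in kilowatts, while R2 is unitless, so the two complement each other when comparing the multivariate and target-only models.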
