<a href="https://colab.research.google.com/github/rahiakela/deep-learning--from-basics-to-practice/blob/24-keras-part-2/rnn_returning_sequence_shape_fundamentals.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# RNN Returning Sequence Shape Fundamentals

In our deep networks, we used the output of one RNN as the input to another RNN. The earlier RNN layer (the one providing the input to the next) needed a new argument. Its name was return_sequences, and we set it to True (the default is False).

Now it’s time to make good on our promise to discuss what that argument
is about.

Let’s take this simple network of an RNN followed by a dense layer. Our goal was to hand the network a sequence of time steps, and then have it predict the next value after the sequence.

<img src='https://github.com/rahiakela/img-repo/blob/master/returning-sequences-shape-0.png?raw=1' width='800'/>

Our samples contained just one feature, which held a series of values
from a 1D curve. These made up the time steps.

When we gave the RNN our sample, it read the first time step and produced
an output. This output was the contents of the internal state,
after passing through the RNN’s internal selection gate. The output
could be thought of as the RNN’s prediction of the next value of the
curve.

<img src='https://github.com/rahiakela/img-repo/blob/master/returning-sequences-shape-1.png?raw=1' width='800'/>

But we didn’t care about that prediction, because we already knew the
second time step. Keras knew we had more time steps to come, so it
automatically ignored that output, and didn’t even send it to the dense
layer.

Instead, it gave the RNN the second time step. Again, the RNN
produced an output, and again, Keras ignored it.

Here we’ve handed the RNN the third time step, and it’s produced the third output, which we’re ignoring.

We did this over and over, handing the RNN sequential time steps and
ignoring the outputs, until we gave it the last time step in the sample.
The output of that time step was the prediction for the value of the
sequence after the end of our inputs, so that output was the value we
were after all along. We fed that to our dense layer, and the output was
the prediction.
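
The loop described above can be sketched in a few lines of NumPy. This is only a toy: it uses a plain tanh cell rather than an LSTM with gates, treats the new state as the output, and the weights `w_in` and `w_state` are random values made up for illustration. The sample values are also invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for a toy recurrent cell with 4 state elements and 1 feature.
state_size = 4
w_in = rng.normal(size=(1, state_size))               # input feature -> state
w_state = rng.normal(size=(state_size, state_size))   # previous state -> state

def step(x_t, state):
    """One time step: combine the current input with the previous state."""
    return np.tanh(x_t @ w_in + state @ w_state)

sample = np.array([[0.1], [0.4], [0.7], [0.9], [1.0]])  # 5 time steps, 1 feature

state = np.zeros(state_size)
outputs = []
for x_t in sample:
    state = step(x_t, state)   # in this toy cell the output is the new state
    outputs.append(state)

last_output = outputs[-1]      # the only output we hand to the dense layer
print(len(outputs), last_output.shape)   # 5 (4,)
```

Every time step produces an output, but only `last_output` moves on to the next layer.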

Suppose our inputs had more than one feature. If our data held
weather measurements at the top of a mountain, maybe each sample
held temperature, wind speed, and humidity.

<img src='https://github.com/rahiakela/img-repo/blob/master/returning-sequences-shape-2.png?raw=1' width='800'/>

So at each time step we give the RNN the values for all three features at that time step. The output is again the RNN’s internal state after the selection gate, so it has as many elements as there are elements in the internal state.

As before, we get back one such output for every time step input we provide, and we only pay attention to the last one.

First, our input is a 3D tensor. In this example, it has 1 sample, 7 time
steps, and 3 features, so it has dimensions 1 by 7 by 3. 

The number of time steps and the number of features don’t appear in the output, which is a 2D tensor of shape 1 by 4. The 1 is because we only care
about one output (the last one), and the 4 comes from the internal
state of the RNN, which we’re assuming has 4 elements.

We “lost” the number of features because they are used internally by
the RNN to control the forgetting, remembering, and selecting of the
internal state. We “lost” the number of time steps because we chose to
ignore all but the last one.
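
To see where the two dimensions go, here is a minimal shape-tracking sketch, again with a toy tanh cell instead of a real LSTM. The weight names and random values are assumptions for illustration. The feature axis is absorbed by the input weight matrix, and the time axis disappears because we overwrite the state at every step and keep only the final one.

```python
import numpy as np

rng = np.random.default_rng(1)

n_samples, n_steps, n_features, n_state = 1, 7, 3, 4
inputs = rng.normal(size=(n_samples, n_steps, n_features))   # 3D: 1 by 7 by 3

# Hypothetical weights of a toy tanh cell (not an LSTM).
w_in = rng.normal(size=(n_features, n_state))   # absorbs the feature axis
w_state = rng.normal(size=(n_state, n_state))

state = np.zeros((n_samples, n_state))
for t in range(n_steps):
    # inputs[:, t, :] has shape (1, 3); the matrix product maps it to (1, 4).
    state = np.tanh(inputs[:, t, :] @ w_in + state @ w_state)

print(inputs.shape, "->", state.shape)   # (1, 7, 3) -> (1, 4)
```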

Let’s expand the picture a little to include the unrolled RNN diagram.

<img src='https://github.com/rahiakela/img-repo/blob/master/returning-sequences-shape-3.png?raw=1' width='800'/>

Once again, the RNN’s internal state has 4 elements. We can see in the
figure that at each time step, an entire row of features is fed to the RNN,
which produces an output. Then the RNN’s state changes, which sets
up the RNN for the next input, as shown by the open downward-pointing
arrow. We only pay attention to the last output. The output is a 2D
grid of shape 1 by 4.

If we want to feed this output to a dense layer, we don’t have to do a
thing.

But suppose we want to take our sequence of outputs and present them
as a sequence of inputs to another RNN layer, as we did in some of our
deep RNN models above. We know that an RNN needs a 3D input, and
the output here is 2D.

We could just give it a depth of 1, producing a shape that’s 1 by 1 by 4.
While this is now legal for an RNN, it doesn’t make any sense. A tensor
with this shape would be interpreted as a single sample (the first 1)
with 1 time step (the second 1), containing 4 features (the 4 at the end).
That’s nothing like our single sample of 5 time steps and 3 features.
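
A quick NumPy check makes the problem concrete. The array below is a stand-in of zeros for the RNN’s last output, not real model output. Adding an axis makes the tensor 3D, but the time axis then has length 1.

```python
import numpy as np

last_output = np.zeros((1, 4))                 # 1 sample, 4 state elements
as_3d = np.expand_dims(last_output, axis=1)    # insert a time axis of length 1

print(as_3d.shape)   # (1, 1, 4)
samples, time_steps, features = as_3d.shape
print(f"{samples} sample, {time_steps} time step, {features} features")
```

The shape is legal, but the sequence is gone: there is a single time step left.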

Losing the time step information is a big problem, because that’s the
idea at the heart of an RNN. We’re giving our first layer 5 time steps,
and it’s producing 5 outputs. We then want to hand those 5 outputs to
the next layer. Each output will have 4 elements (since we’re supposing
that our RNN has 4 elements of internal state), so those 4 values
will be interpreted by the next RNN as 4 features. But we need the 5
time steps.

<img src='https://github.com/rahiakela/img-repo/blob/master/returning-sequences-shape-4.png?raw=1' width='800'/>

That’s actually easy to do. We just tell Keras to not ignore the output
after each step. We tell it to take the outputs and stack them up to
make a grid. It will be as tall as there are time steps, and as wide as
there are elements in the internal state. Now we can give that grid a
depth of 1, and it makes sense as an input to an RNN.

To tell Keras to remember the output after each time step and build
up this grid, we tell it that we want the RNN to return not just a single
output, but the whole sequence of outputs corresponding to the
sequence of inputs.

By setting the optional argument return_sequences to True, we’re
telling Keras to do exactly that.
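
Here is a rough sketch of what stacking the outputs looks like, using a hypothetical `toy_rnn` function built on a plain tanh cell. It is not how Keras implements its layers. Its `return_sequences` parameter only mimics the Keras argument of the same name, and the weights are random placeholders.

```python
import numpy as np

def toy_rnn(inputs, w_in, w_state, return_sequences=False):
    """Toy tanh RNN over inputs of shape (samples, time steps, features)."""
    n_samples, n_steps, _ = inputs.shape
    state = np.zeros((n_samples, w_state.shape[0]))
    outputs = []
    for t in range(n_steps):
        state = np.tanh(inputs[:, t, :] @ w_in + state @ w_state)
        outputs.append(state)
    if return_sequences:
        return np.stack(outputs, axis=1)   # (samples, time steps, state size)
    return outputs[-1]                     # (samples, state size)

rng = np.random.default_rng(2)
w_in = rng.normal(size=(3, 4))      # 3 features -> 4 state elements
w_state = rng.normal(size=(4, 4))
inputs = rng.normal(size=(1, 5, 3))  # 1 sample, 5 time steps, 3 features

last = toy_rnn(inputs, w_in, w_state)
full = toy_rnn(inputs, w_in, w_state, return_sequences=True)
print(last.shape, full.shape)   # (1, 4) (1, 5, 4)
```

The final row of the stacked grid is the same value we get when only the last output is returned.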










## Setup

In [1]:
from __future__ import absolute_import, division, print_function, unicode_literals

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf
from tensorflow import keras
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

from tensorflow.keras import backend as keras_backend
from tensorflow.keras.utils import to_categorical

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM

keras_backend.set_image_data_format('channels_last')

TensorFlow 2.x selected.


## Experiment with RNN return sequence shape

Now that we know what return_sequences is all about, we can usually
invoke it without even thinking about all of this. If our RNN’s
output is going into another RNN, just set return_sequences to
True. If we want only the output after the last time step, we can set
return_sequences to False, or just leave it off, since that’s the default
value.
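
As a small illustration of that rule, here is a sketch of two stacked LSTM layers followed by a dense layer. The layer sizes and the input shape of 5 time steps by 3 features are arbitrary choices for this example, not the models from earlier, and the model is untrained.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(5, 3)),                     # 5 time steps, 3 features
    tf.keras.layers.LSTM(4, return_sequences=True),   # feeds another RNN: keep every step
    tf.keras.layers.LSTM(4),                          # last RNN: default return_sequences=False
    tf.keras.layers.Dense(1),
])

batch = np.zeros((1, 5, 3), dtype="float32")   # 1 sample of zeros, just to check shapes
prediction = model(batch)
print(prediction.shape)   # (1, 1)
```

Only the first LSTM needs return_sequences set to True, because only its output goes into another RNN.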

<img src='https://github.com/rahiakela/img-repo/blob/master/returning-sequences-shape-5.png?raw=1' width='800'/>

The example below shows a few input shapes and the resulting output
shapes with return_sequences set first to False and then to True.

In [0]:
def show_output_shapes(data_shape):

  def make_model(X_train, return_seq):
    # Build a single LSTM layer with 4 units. The model is never trained;
    # we only run predict() to inspect the output shape.
    model = Sequential()
    model.add(LSTM(4, input_shape=X_train[0].shape, return_sequences=return_seq))
    model.compile(loss='mse', optimizer='adam')

    return model

  data = np.zeros(data_shape)
  model = make_model(data, False)
  pred = model.predict(data, batch_size=1, verbose=2)
  print(f'Input shape: {data.shape}')
  print(f'  Without sequences: {pred.shape}')

  model = make_model(data, True)
  pred = model.predict(data, batch_size=1, verbose=2)
  print(f'  With sequences: {pred.shape}')

In [5]:
# 1 sample, 3 time steps, 1 feature: input shape (1, 3, 1)
show_output_shapes([1, 3, 1])

1/1 - 0s
Input shape: (1, 3, 1)
  Without sequences: (1, 4)
1/1 - 0s
  With sequences: (1, 3, 4)


In [6]:
# 1 sample, 5 time steps, 1 feature: input shape (1, 5, 1)
show_output_shapes([1, 5, 1])

1/1 - 0s
Input shape: (1, 5, 1)
  Without sequences: (1, 4)
1/1 - 0s
  With sequences: (1, 5, 4)


In [7]:
# 1 sample, 3 time steps, 2 features: input shape (1, 3, 2)
show_output_shapes([1, 3, 2])

1/1 - 0s
Input shape: (1, 3, 2)
  Without sequences: (1, 4)
1/1 - 0s
  With sequences: (1, 3, 4)


In [8]:
# 1 sample, 5 time steps, 2 features: input shape (1, 5, 2)
show_output_shapes([1, 5, 2])

1/1 - 0s
Input shape: (1, 5, 2)
  Without sequences: (1, 4)
1/1 - 0s
  With sequences: (1, 5, 4)


In [9]:
# 2 samples, 3 time steps, 2 features: input shape (2, 3, 2)
show_output_shapes([2, 3, 2])

2/2 - 0s
Input shape: (2, 3, 2)
  Without sequences: (2, 4)
2/2 - 0s
  With sequences: (2, 3, 4)


In [10]:
# 2 samples, 5 time steps, 2 features: input shape (2, 5, 2)
show_output_shapes([2, 5, 2])

2/2 - 0s
Input shape: (2, 5, 2)
  Without sequences: (2, 4)
2/2 - 0s
  With sequences: (2, 5, 4)


In [11]:
# 2 samples, 3 time steps, 1 feature: input shape (2, 3, 1)
show_output_shapes([2, 3, 1])

2/2 - 0s
Input shape: (2, 3, 1)
  Without sequences: (2, 4)
2/2 - 0s
  With sequences: (2, 3, 4)


In [12]:
# 2 samples, 5 time steps, 1 feature: input shape (2, 5, 1)
show_output_shapes([2, 5, 1])

2/2 - 0s
Input shape: (2, 5, 1)
  Without sequences: (2, 4)
2/2 - 0s
  With sequences: (2, 5, 4)


In [13]:
# 3 samples, 3 time steps, 3 features: input shape (3, 3, 3)
show_output_shapes([3, 3, 3])

3/3 - 0s
Input shape: (3, 3, 3)
  Without sequences: (3, 4)
3/3 - 0s
  With sequences: (3, 3, 4)


In [14]:
# 3 samples, 5 time steps, 3 features: input shape (3, 5, 3)
show_output_shapes([3, 5, 3])

3/3 - 0s
Input shape: (3, 5, 3)
  Without sequences: (3, 4)
3/3 - 0s
  With sequences: (3, 5, 4)
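
The pattern in these results can be written down as a small rule. The helper below is a hypothetical function of our own, not part of Keras. It simply restates what the experiments show: the number of features never appears in the output, and the number of time steps appears only when sequences are returned.

```python
def rnn_output_shape(input_shape, units, return_sequences=False):
    """Output shape of a recurrent layer for a (samples, time steps, features) input."""
    samples, time_steps, _features = input_shape
    if return_sequences:
        return (samples, time_steps, units)
    return (samples, units)

for shape in [(1, 3, 1), (2, 5, 2), (3, 5, 3)]:
    print(shape, rnn_output_shape(shape, 4), rnn_output_shape(shape, 4, True))
```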


## Conclusion

It’s useful to see at a glance whether an RNN returns just the final output
or the full sequence. We mark the icon for an RNN that returns a
sequence with a small box on the output side, suggesting multiple outputs.

<img src='https://github.com/rahiakela/img-repo/blob/master/returning-sequences-shape-6.png?raw=1' width='800'/>