# How to Use the TimeDistributed Layer for Long Short-Term Memory Networks in Python
Long Short-Term Memory networks, or LSTMs, are a popular and powerful type of Recurrent Neural Network, or RNN.

They can be quite difficult to configure and apply to arbitrary sequence prediction problems, even with well-defined and “easy to use” interfaces like those provided in the Keras deep learning library in Python.

One reason for this difficulty in Keras is the use of the TimeDistributed wrapper layer and the need for some LSTM layers to return sequences rather than single values.

In this tutorial, you will discover different ways to configure LSTM networks for sequence prediction, the role that the TimeDistributed layer plays, and exactly how to use it.

After completing this tutorial, you will know:

* How to design a one-to-one LSTM for sequence prediction.
* How to design a many-to-one LSTM for sequence prediction without the TimeDistributed Layer.
* How to design a many-to-many LSTM for sequence prediction with the TimeDistributed Layer.

Let’s get started.

## Tutorial Overview
This tutorial is divided into 5 parts; they are:

1. TimeDistributed Layer
2. Sequence Learning Problem
3. One-to-One LSTM for Sequence Prediction
4. Many-to-One LSTM for Sequence Prediction (without TimeDistributed)
5. Many-to-Many LSTM for Sequence Prediction (with TimeDistributed)

## TimeDistributed Layer
LSTMs are powerful, but hard to use and hard to configure, especially for beginners.

An added complication is the [TimeDistributed Layer](https://keras.io/layers/wrappers/#timedistributed) (and the former TimeDistributedDense layer) that is cryptically described as a layer wrapper:

*"This wrapper allows us to apply a layer to every temporal slice of an input."*

How and when are you supposed to use this wrapper with LSTMs?

The confusion is compounded when you search through discussions about the wrapper layer on the Keras GitHub issues and StackOverflow.

For example, in the issue “[When and How to use TimeDistributedDense,](https://github.com/keras-team/keras/issues/1029)” fchollet (Keras’ author) explains:

*"TimeDistributedDense applies a same Dense (fully-connected) operation to every timestep of a 3D tensor."*

This makes perfect sense if you already understand what the TimeDistributed layer is for and when to use it, but is no help at all to a beginner.

This tutorial aims to clear up the confusion around using the TimeDistributed wrapper with LSTMs, with worked examples that you can inspect, run, and play with to make your understanding concrete.
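
To make the phrase "every temporal slice" concrete before touching Keras, here is a minimal NumPy sketch of the idea. It is not how Keras implements the wrapper; the array sizes, the random weights, and the `dense` helper are assumptions chosen only for illustration. One set of Dense weights is applied to each time step of a 3D array, and the results are stacked back along the time axis.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 3D input: (samples, time steps, features); sizes are arbitrary for the demo
x = rng.normal(size=(2, 5, 3))

# One shared set of Dense weights: 3 input features -> 1 output
W = rng.normal(size=(3, 1))
b = np.zeros(1)

def dense(inputs_2d):
    """A plain fully connected layer (linear activation) on (samples, features)."""
    return inputs_2d @ W + b

# Apply the same layer to each temporal slice, then stack along the time axis
per_step = [dense(x[:, t, :]) for t in range(x.shape[1])]
time_distributed = np.stack(per_step, axis=1)

print(time_distributed.shape)  # (2, 5, 1)
```

The key point is that `W` and `b` are created once and reused for all 5 time steps, so the number of weights does not depend on the sequence length.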

## Sequence Learning Problem
We will use a simple sequence learning problem to demonstrate the TimeDistributed layer.

In this problem, the sequence [0.0, 0.2, 0.4, 0.6, 0.8] will be given as input one item at a time and must be in turn returned as output, one item at a time.

Think of it as learning a simple echo program. We give 0.0 as input, we expect to see 0.0 as output, repeated for each item in the sequence.

We can generate this sequence directly as follows:

In [1]:
from numpy import array
length = 5
seq = array([i/float(length) for i in range(length)])
print(seq)

[0.  0.2 0.4 0.6 0.8]


Running this example prints the generated sequence above. The example is configurable, and you can experiment with longer or shorter sequences yourself later if you like.

## One-to-One LSTM for Sequence Prediction
Before we dive in, it is important to show that this sequence learning problem can be learned piecewise.

That is, we can reframe the problem into a dataset of input-output pairs for each item in the sequence. Given 0, the network should output 0, given 0.2, the network must output 0.2, and so on.

This is the simplest formulation of the problem and requires the sequence to be split into input-output pairs and for the sequence to be predicted one step at a time and gathered outside of the network.

The input-output pairs are as follows:

In [None]:
X, 	y
0.0,	0.0
0.2,	0.2
0.4,	0.4
0.6,	0.6
0.8,	0.8

The input for LSTMs must be three dimensional. We can reshape the 1D sequence into a 3D array with 5 samples, 1 time step, and 1 feature. We will define the output as 5 samples with 1 feature.

In [None]:
X = seq.reshape(5, 1, 1)
y = seq.reshape(5, 1)
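
As a quick sanity check on the reshape above, the following sketch prints the array shapes and the resulting input-output pairs. It only uses NumPy and the same `seq` definition as earlier in the tutorial.

```python
from numpy import array

length = 5
seq = array([i / float(length) for i in range(length)])

# One-to-one framing: every item is its own sample with a single time step
X = seq.reshape(len(seq), 1, 1)   # (samples, time steps, features)
y = seq.reshape(len(seq), 1)      # (samples, features)

print(seq.shape, X.shape, y.shape)
for sample_in, sample_out in zip(X, y):
    print(sample_in.ravel(), '->', sample_out)
```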

We will define the network model as having 1 input with 1 time step. The first hidden layer will be an LSTM with 5 units. The output layer will be a fully connected layer with 1 output.

The model will be fit with the efficient Adam optimization algorithm and the mean squared error loss function.

The batch size is set to the number of samples in the epoch (5) so that we do not have to make the LSTM stateful and manage state resets manually. The model could just as easily be made stateful in order to update the weights after each sample is shown to the network.
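
The practical effect of the batch size on training can be sketched with a little arithmetic. The helper below is hypothetical (it is not part of Keras) and simply counts how many weight updates happen in one pass over the data for a given batch size.

```python
from math import ceil

def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one pass over the training data."""
    return ceil(n_samples / batch_size)

n_samples = 5
print(updates_per_epoch(n_samples, batch_size=5))  # 1
print(updates_per_epoch(n_samples, batch_size=1))  # 5
```

With all 5 samples in one batch there is a single update per epoch, which is part of why this configuration is given 1000 epochs.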

The complete code listing is provided below:

In [2]:
from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
# prepare sequence
length = 5
seq = array([i/float(length) for i in range(length)])
X = seq.reshape(len(seq), 1, 1)
y = seq.reshape(len(seq), 1)
# define LSTM configuration
n_neurons = length
n_batch = length
n_epoch = 1000
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(1, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)
# evaluate
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result:
	print('%.1f' % value[0])



_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 5)                 140       
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 6         
Total params: 146
Trainable params: 146
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/1000
 - 1s - loss: 0.2012
Epoch 2/1000
 - 0s - loss: 0.1995
Epoch 3/1000
 - 0s - loss: 0.1978
Epoch 4/1000
 - 0s - loss: 0.1961
Epoch 5/1000
 - 0s - loss: 0.1945
Epoch 6/1000
 - 0s - loss: 0.1928
Epoch 7/1000
 - 0s - loss: 0.1911
Epoch 8/1000
 - 0s - loss: 0.1895
Epoch 9/1000
 - 0s - loss: 0.1878
Epoch 10/1000
 - 0s - loss: 0.1862
Epoch 11/1000
 - 0s - loss: 0.1846
Epoch 12/1000
 - 0s - loss: 0.1829
Epoch 13/1000
 - 0s - loss: 0.1813
Epoch 14/1000
 - 0s - loss: 0.1797
Epoch 15/1000
 - 0s - loss: 0.1781
Epoch 16/1000
 - 0s

Epoch 215/1000
 - 0s - loss: 0.0346
Epoch 216/1000
 - 0s - loss: 0.0345
Epoch 217/1000
 - 0s - loss: 0.0344
Epoch 218/1000
 - 0s - loss: 0.0343
Epoch 219/1000
 - 0s - loss: 0.0342
Epoch 220/1000
 - 0s - loss: 0.0341
Epoch 221/1000
 - 0s - loss: 0.0339
Epoch 222/1000
 - 0s - loss: 0.0338
Epoch 223/1000
 - 0s - loss: 0.0337
Epoch 224/1000
 - 0s - loss: 0.0336
Epoch 225/1000
 - 0s - loss: 0.0335
Epoch 226/1000
 - 0s - loss: 0.0334
Epoch 227/1000
 - 0s - loss: 0.0333
Epoch 228/1000
 - 0s - loss: 0.0332
Epoch 229/1000
 - 0s - loss: 0.0331
Epoch 230/1000
 - 0s - loss: 0.0330
Epoch 231/1000
 - 0s - loss: 0.0329
Epoch 232/1000
 - 0s - loss: 0.0328
Epoch 233/1000
 - 0s - loss: 0.0327
Epoch 234/1000
 - 0s - loss: 0.0326
Epoch 235/1000
 - 0s - loss: 0.0325
Epoch 236/1000
 - 0s - loss: 0.0324
Epoch 237/1000
 - 0s - loss: 0.0323
Epoch 238/1000
 - 0s - loss: 0.0322
Epoch 239/1000
 - 0s - loss: 0.0321
Epoch 240/1000
 - 0s - loss: 0.0320
Epoch 241/1000
 - 0s - loss: 0.0319
Epoch 242/1000
 - 0s - loss:

Epoch 443/1000
 - 0s - loss: 0.0137
Epoch 444/1000
 - 0s - loss: 0.0136
Epoch 445/1000
 - 0s - loss: 0.0135
Epoch 446/1000
 - 0s - loss: 0.0135
Epoch 447/1000
 - 0s - loss: 0.0134
Epoch 448/1000
 - 0s - loss: 0.0133
Epoch 449/1000
 - 0s - loss: 0.0132
Epoch 450/1000
 - 0s - loss: 0.0131
Epoch 451/1000
 - 0s - loss: 0.0131
Epoch 452/1000
 - 0s - loss: 0.0130
Epoch 453/1000
 - 0s - loss: 0.0129
Epoch 454/1000
 - 0s - loss: 0.0128
Epoch 455/1000
 - 0s - loss: 0.0127
Epoch 456/1000
 - 0s - loss: 0.0127
Epoch 457/1000
 - 0s - loss: 0.0126
Epoch 458/1000
 - 0s - loss: 0.0125
Epoch 459/1000
 - 0s - loss: 0.0124
Epoch 460/1000
 - 0s - loss: 0.0124
Epoch 461/1000
 - 0s - loss: 0.0123
Epoch 462/1000
 - 0s - loss: 0.0122
Epoch 463/1000
 - 0s - loss: 0.0121
Epoch 464/1000
 - 0s - loss: 0.0120
Epoch 465/1000
 - 0s - loss: 0.0120
Epoch 466/1000
 - 0s - loss: 0.0119
Epoch 467/1000
 - 0s - loss: 0.0118
Epoch 468/1000
 - 0s - loss: 0.0117
Epoch 469/1000
 - 0s - loss: 0.0117
Epoch 470/1000
 - 0s - loss:

Epoch 671/1000
 - 0s - loss: 0.0019
Epoch 672/1000
 - 0s - loss: 0.0018
Epoch 673/1000
 - 0s - loss: 0.0018
Epoch 674/1000
 - 0s - loss: 0.0018
Epoch 675/1000
 - 0s - loss: 0.0018
Epoch 676/1000
 - 0s - loss: 0.0017
Epoch 677/1000
 - 0s - loss: 0.0017
Epoch 678/1000
 - 0s - loss: 0.0017
Epoch 679/1000
 - 0s - loss: 0.0017
Epoch 680/1000
 - 0s - loss: 0.0017
Epoch 681/1000
 - 0s - loss: 0.0016
Epoch 682/1000
 - 0s - loss: 0.0016
Epoch 683/1000
 - 0s - loss: 0.0016
Epoch 684/1000
 - 0s - loss: 0.0016
Epoch 685/1000
 - 0s - loss: 0.0016
Epoch 686/1000
 - 0s - loss: 0.0015
Epoch 687/1000
 - 0s - loss: 0.0015
Epoch 688/1000
 - 0s - loss: 0.0015
Epoch 689/1000
 - 0s - loss: 0.0015
Epoch 690/1000
 - 0s - loss: 0.0015
Epoch 691/1000
 - 0s - loss: 0.0015
Epoch 692/1000
 - 0s - loss: 0.0014
Epoch 693/1000
 - 0s - loss: 0.0014
Epoch 694/1000
 - 0s - loss: 0.0014
Epoch 695/1000
 - 0s - loss: 0.0014
Epoch 696/1000
 - 0s - loss: 0.0014
Epoch 697/1000
 - 0s - loss: 0.0014
Epoch 698/1000
 - 0s - loss:

Epoch 881/1000
 - 0s - loss: 1.5464e-04
Epoch 882/1000
 - 0s - loss: 1.5341e-04
Epoch 883/1000
 - 0s - loss: 1.5220e-04
Epoch 884/1000
 - 0s - loss: 1.5100e-04
Epoch 885/1000
 - 0s - loss: 1.4983e-04
Epoch 886/1000
 - 0s - loss: 1.4868e-04
Epoch 887/1000
 - 0s - loss: 1.4755e-04
Epoch 888/1000
 - 0s - loss: 1.4643e-04
Epoch 889/1000
 - 0s - loss: 1.4534e-04
Epoch 890/1000
 - 0s - loss: 1.4426e-04
Epoch 891/1000
 - 0s - loss: 1.4321e-04
Epoch 892/1000
 - 0s - loss: 1.4217e-04
Epoch 893/1000
 - 0s - loss: 1.4115e-04
Epoch 894/1000
 - 0s - loss: 1.4014e-04
Epoch 895/1000
 - 0s - loss: 1.3915e-04
Epoch 896/1000
 - 0s - loss: 1.3818e-04
Epoch 897/1000
 - 0s - loss: 1.3723e-04
Epoch 898/1000
 - 0s - loss: 1.3629e-04
Epoch 899/1000
 - 0s - loss: 1.3537e-04
Epoch 900/1000
 - 0s - loss: 1.3447e-04
Epoch 901/1000
 - 0s - loss: 1.3358e-04
Epoch 902/1000
 - 0s - loss: 1.3270e-04
Epoch 903/1000
 - 0s - loss: 1.3184e-04
Epoch 904/1000
 - 0s - loss: 1.3100e-04
Epoch 905/1000
 - 0s - loss: 1.3017e-04


Running the example first prints the structure of the configured network.

We can see that the LSTM layer has 140 parameters. This is calculated based on the number of inputs (1) and the number of outputs (5 for the 5 units in the hidden layer), as follows:

In [None]:
n = 4 * ((inputs + 1) * outputs + outputs**2)
n = 4 * ((1 + 1) * 5 + 5**2)
n = 4 * 35
n = 140

We can also see that the fully connected layer only has 6 parameters for the number of inputs (5 for the 5 inputs from the previous layer), number of outputs (1 for the 1 neuron in the layer), and the bias.

In [None]:
n = inputs * outputs + outputs
n = 5 * 1 + 1
n = 6
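
The two calculations above can be wrapped in small helper functions so they are easy to reuse for other layer sizes. The function names are made up for this sketch; the formulas are the ones given above.

```python
def lstm_params(inputs, units):
    """Weights in an LSTM layer: four weight sets (three gates plus the
    candidate cell state), each with input weights, a bias, and recurrent weights."""
    return 4 * ((inputs + 1) * units + units ** 2)

def dense_params(inputs, outputs):
    """Weights in a fully connected layer: one per connection plus one bias per output."""
    return inputs * outputs + outputs

print(lstm_params(1, 5))   # 140
print(dense_params(5, 1))  # 6
```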

The network correctly learns the prediction problem.

## Many-to-One LSTM for Sequence Prediction (without TimeDistributed)
In this section, we develop an LSTM to output the sequence all at once, although without the TimeDistributed wrapper layer.

The input for LSTMs must be three dimensional. We can reshape the 1D sequence into a 3D array with 1 sample, 5 time steps, and 1 feature. We will define the output as 1 sample with 5 features.

In [None]:
X = seq.reshape(1, 5, 1)
y = seq.reshape(1, 5)

Immediately, you can see that the problem definition must be slightly adjusted to support a network for sequence prediction without a TimeDistributed wrapper. Specifically, the network must output one vector rather than build up an output sequence one step at a time. The difference may sound subtle, but it is important to understanding the role of the TimeDistributed wrapper.

We will define the model as having one input with 5 time steps. The first hidden layer will be an LSTM with 5 units. The output layer is a fully-connected layer with 5 neurons.

In [None]:
# create LSTM
model = Sequential()
model.add(LSTM(5, input_shape=(5, 1)))
model.add(Dense(length))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())

Next, we fit the model for only 500 epochs and a batch size of 1 for the single sample in the training dataset.

In [None]:
# train LSTM
model.fit(X, y, epochs=500, batch_size=1, verbose=2)

Putting this all together, the complete code listing is provided below.

In [3]:
from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
# prepare sequence
length = 5
seq = array([i/float(length) for i in range(length)])
X = seq.reshape(1, length, 1)
y = seq.reshape(1, length)
# define LSTM configuration
n_neurons = length
n_batch = 1
n_epoch = 500
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(length, 1)))
model.add(Dense(length))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)
# evaluate
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result[0,:]:
	print('%.1f' % value)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_2 (LSTM)                (None, 5)                 140       
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 30        
Total params: 170
Trainable params: 170
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/500
 - 1s - loss: 0.2001
Epoch 2/500
 - 0s - loss: 0.1978
Epoch 3/500
 - 0s - loss: 0.1956
Epoch 4/500
 - 0s - loss: 0.1933
Epoch 5/500
 - 0s - loss: 0.1910
Epoch 6/500
 - 0s - loss: 0.1887
Epoch 7/500
 - 0s - loss: 0.1865
Epoch 8/500
 - 0s - loss: 0.1842
Epoch 9/500
 - 0s - loss: 0.1819
Epoch 10/500
 - 0s - loss: 0.1796
Epoch 11/500
 - 0s - loss: 0.1773
Epoch 12/500
 - 0s - loss: 0.1751
Epoch 13/500
 - 0s - loss: 0.1728
Epoch 14/500
 - 0s - loss: 0.1705
Epoch 15/500
 - 0s - loss: 0.1682
Epoch 16/500
 - 0s - loss: 0.1660


 - 0s - loss: 9.5386e-06
Epoch 215/500
 - 0s - loss: 8.7332e-06
Epoch 216/500
 - 0s - loss: 7.9912e-06
Epoch 217/500
 - 0s - loss: 7.3067e-06
Epoch 218/500
 - 0s - loss: 6.6758e-06
Epoch 219/500
 - 0s - loss: 6.0946e-06
Epoch 220/500
 - 0s - loss: 5.5596e-06
Epoch 221/500
 - 0s - loss: 5.0674e-06
Epoch 222/500
 - 0s - loss: 4.6143e-06
Epoch 223/500
 - 0s - loss: 4.1990e-06
Epoch 224/500
 - 0s - loss: 3.8179e-06
Epoch 225/500
 - 0s - loss: 3.4684e-06
Epoch 226/500
 - 0s - loss: 3.1482e-06
Epoch 227/500
 - 0s - loss: 2.8551e-06
Epoch 228/500
 - 0s - loss: 2.5871e-06
Epoch 229/500
 - 0s - loss: 2.3421e-06
Epoch 230/500
 - 0s - loss: 2.1184e-06
Epoch 231/500
 - 0s - loss: 1.9144e-06
Epoch 232/500
 - 0s - loss: 1.7284e-06
Epoch 233/500
 - 0s - loss: 1.5588e-06
Epoch 234/500
 - 0s - loss: 1.4047e-06
Epoch 235/500
 - 0s - loss: 1.2647e-06
Epoch 236/500
 - 0s - loss: 1.1375e-06
Epoch 237/500
 - 0s - loss: 1.0221e-06
Epoch 238/500
 - 0s - loss: 9.1753e-07
Epoch 239/500
 - 0s - loss: 8.2283e-07


Epoch 425/500
 - 0s - loss: 1.5568e-14
Epoch 426/500
 - 0s - loss: 1.5568e-14
Epoch 427/500
 - 0s - loss: 1.5257e-14
Epoch 428/500
 - 0s - loss: 1.5257e-14
Epoch 429/500
 - 0s - loss: 1.5257e-14
Epoch 430/500
 - 0s - loss: 1.5257e-14
Epoch 431/500
 - 0s - loss: 1.5257e-14
Epoch 432/500
 - 0s - loss: 1.5257e-14
Epoch 433/500
 - 0s - loss: 1.4924e-14
Epoch 434/500
 - 0s - loss: 1.4924e-14
Epoch 435/500
 - 0s - loss: 1.4924e-14
Epoch 436/500
 - 0s - loss: 1.4702e-14
Epoch 437/500
 - 0s - loss: 1.4702e-14
Epoch 438/500
 - 0s - loss: 1.4702e-14
Epoch 439/500
 - 0s - loss: 1.5035e-14
Epoch 440/500
 - 0s - loss: 1.5035e-14
Epoch 441/500
 - 0s - loss: 9.7949e-15
Epoch 442/500
 - 0s - loss: 9.7949e-15
Epoch 443/500
 - 0s - loss: 9.5035e-15
Epoch 444/500
 - 0s - loss: 9.3509e-15
Epoch 445/500
 - 0s - loss: 9.2037e-15
Epoch 446/500
 - 0s - loss: 9.0622e-15
Epoch 447/500
 - 0s - loss: 8.9262e-15
Epoch 448/500
 - 0s - loss: 8.7957e-15
Epoch 449/500
 - 0s - loss: 7.8715e-15
Epoch 450/500
 - 0s - los

Running the example first prints a summary of the configured network.

We can see that the LSTM layer has 140 parameters as in the previous section.

The LSTM units have been crippled and will each output a single value, providing a vector of 5 values as inputs to the fully connected layer. The time dimension or sequence information has been thrown away and collapsed into a vector of 5 values.

We can see that the fully connected output layer has 5 inputs and is expected to output 5 values. We can account for the 30 weights to be learned as follows:

In [None]:
n = inputs * outputs + outputs
n = 5 * 5 + 5
n = 30

The model is fit, printing loss information before finalizing and printing the predicted sequence.

The sequence is reproduced correctly, but as a single piece rather than stepwise through the input data. We could have used a Dense layer as the first hidden layer instead of an LSTM, as this usage of LSTMs does not take much advantage of their full capability for sequence learning and processing.

## Many-to-Many LSTM for Sequence Prediction (with TimeDistributed)
In this section, we will use the TimeDistributed layer to process the output from the LSTM hidden layer.

There are two key points to remember when using the TimeDistributed wrapper layer:

* **The input must be (at least) 3D**. This often means that you will need to configure your last LSTM layer prior to your TimeDistributed wrapped Dense layer to return sequences (e.g. set the “return_sequences” argument to “True”).
* **The output will be 3D**. This means that if your TimeDistributed wrapped Dense layer is your output layer and you are predicting a sequence, you will need to reshape your y array into a 3D array.

We can define the shape of the output as having 1 sample, 5 time steps, and 1 feature, just like the input sequence, as follows:

In [None]:
y = seq.reshape(1, length, 1)
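
It can help to see the input and target shapes for all three framings in this tutorial side by side. The sketch below only uses NumPy; the `framings` dictionary is just a convenience for this comparison.

```python
from numpy import array

length = 5
seq = array([i / float(length) for i in range(length)])

# (X, y) for each framing of the same echo problem
framings = {
    'one-to-one': (seq.reshape(length, 1, 1), seq.reshape(length, 1)),
    'many-to-one': (seq.reshape(1, length, 1), seq.reshape(1, length)),
    'many-to-many': (seq.reshape(1, length, 1), seq.reshape(1, length, 1)),
}
for name, (X, y) in framings.items():
    print('%-12s X%s y%s' % (name, X.shape, y.shape))
```

Only the many-to-many framing keeps a time step axis in the target, which is what the TimeDistributed output layer produces.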

We can define the LSTM hidden layer to return sequences rather than single values by setting the “*return_sequences*” argument to True.

In [None]:
model.add(LSTM(n_neurons, input_shape=(length, 1), return_sequences=True))

This has the effect of each LSTM unit returning a sequence of 5 outputs, one for each time step in the input data, instead of a single output value as in the previous example.
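
To see what returning sequences means in terms of shapes, here is a simplified LSTM forward pass written in NumPy. It is a sketch under stated assumptions, not Keras' implementation: it handles a single sample without a batch axis, uses random untrained weights, and the function and variable names are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, W, U, b, return_sequences=False):
    """x: (time steps, features). Returns (units,) or (time steps, units)."""
    units = U.shape[0]
    h = np.zeros(units)  # hidden state (the layer's output)
    c = np.zeros(units)  # cell state
    outputs = []
    for x_t in x:
        # All four weight sets computed at once, shape (4 * units,)
        z = x_t @ W + h @ U + b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    # Either every per-step output, or only the last one
    return np.stack(outputs) if return_sequences else h

rng = np.random.default_rng(1)
n_features, units, length = 1, 5, 5
W = rng.normal(size=(n_features, 4 * units))  # input weights
U = rng.normal(size=(units, 4 * units))       # recurrent weights
b = np.zeros(4 * units)                       # biases

x = np.arange(length).reshape(length, 1) / float(length)
last = lstm_forward(x, W, U, b)
full = lstm_forward(x, W, U, b, return_sequences=True)
print(last.shape, full.shape)    # (5,) (5, 5)
print(W.size + U.size + b.size)  # 140
```

The last row of `full` is identical to `last`: without returned sequences, the outputs for the earlier time steps are simply discarded. The weight count also matches the 140 parameters reported in the model summaries.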

We can also use the TimeDistributed wrapper on the output layer to wrap a fully connected Dense layer with a single output.

In [None]:
model.add(TimeDistributed(Dense(1)))

The single output value in the output layer is key. It highlights that we intend to output one time step from the sequence for each time step in the input. It just so happens that we will process 5 time steps of the input sequence at a time.

The TimeDistributed wrapper achieves this trick by applying the same Dense layer (same weights) to the LSTM outputs one time step at a time. In this way, the output layer only needs one connection to each LSTM unit (plus one bias).

For this reason, the number of training epochs needs to be increased to account for the smaller network capacity. I doubled it from 500 to 1000 to match the first one-to-one example.

Putting this together, the full code listing is provided below.

In [4]:
from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import TimeDistributed
from keras.layers import LSTM
# prepare sequence
length = 5
seq = array([i/float(length) for i in range(length)])
X = seq.reshape(1, length, 1)
y = seq.reshape(1, length, 1)
# define LSTM configuration
n_neurons = length
n_batch = 1
n_epoch = 1000
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(length, 1), return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)
# evaluate
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result[0,:,0]:
	print('%.1f' % value)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_3 (LSTM)                (None, 5, 5)              140       
_________________________________________________________________
time_distributed_1 (TimeDist (None, 5, 1)              6         
Total params: 146
Trainable params: 146
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/1000
 - 1s - loss: 0.1550
Epoch 2/1000
 - 0s - loss: 0.1524
Epoch 3/1000
 - 0s - loss: 0.1502
Epoch 4/1000
 - 0s - loss: 0.1477
Epoch 5/1000
 - 0s - loss: 0.1451
Epoch 6/1000
 - 0s - loss: 0.1428
Epoch 7/1000
 - 0s - loss: 0.1403
Epoch 8/1000
 - 0s - loss: 0.1378
Epoch 9/1000
 - 0s - loss: 0.1354
Epoch 10/1000
 - 0s - loss: 0.1330
Epoch 11/1000
 - 0s - loss: 0.1307
Epoch 12/1000
 - 0s - loss: 0.1284
Epoch 13/1000
 - 0s - loss: 0.1260
Epoch 14/1000
 - 0s - loss: 0.1238
Epoch 15/1000
 - 0s - loss: 0.1215
Epoch 16/1000
 - 0s

Epoch 215/1000
 - 0s - loss: 0.0034
Epoch 216/1000
 - 0s - loss: 0.0034
Epoch 217/1000
 - 0s - loss: 0.0034
Epoch 218/1000
 - 0s - loss: 0.0034
Epoch 219/1000
 - 0s - loss: 0.0034
Epoch 220/1000
 - 0s - loss: 0.0034
Epoch 221/1000
 - 0s - loss: 0.0034
Epoch 222/1000
 - 0s - loss: 0.0034
Epoch 223/1000
 - 0s - loss: 0.0033
Epoch 224/1000
 - 0s - loss: 0.0033
Epoch 225/1000
 - 0s - loss: 0.0033
Epoch 226/1000
 - 0s - loss: 0.0033
Epoch 227/1000
 - 0s - loss: 0.0033
Epoch 228/1000
 - 0s - loss: 0.0033
Epoch 229/1000
 - 0s - loss: 0.0033
Epoch 230/1000
 - 0s - loss: 0.0033
Epoch 231/1000
 - 0s - loss: 0.0032
Epoch 232/1000
 - 0s - loss: 0.0032
Epoch 233/1000
 - 0s - loss: 0.0032
Epoch 234/1000
 - 0s - loss: 0.0032
Epoch 235/1000
 - 0s - loss: 0.0032
Epoch 236/1000
 - 0s - loss: 0.0032
Epoch 237/1000
 - 0s - loss: 0.0032
Epoch 238/1000
 - 0s - loss: 0.0032
Epoch 239/1000
 - 0s - loss: 0.0031
Epoch 240/1000
 - 0s - loss: 0.0031
Epoch 241/1000
 - 0s - loss: 0.0031
Epoch 242/1000
 - 0s - loss:

Epoch 443/1000
 - 0s - loss: 0.0019
Epoch 444/1000
 - 0s - loss: 0.0019
Epoch 445/1000
 - 0s - loss: 0.0019
Epoch 446/1000
 - 0s - loss: 0.0019
Epoch 447/1000
 - 0s - loss: 0.0019
Epoch 448/1000
 - 0s - loss: 0.0019
Epoch 449/1000
 - 0s - loss: 0.0019
Epoch 450/1000
 - 0s - loss: 0.0019
Epoch 451/1000
 - 0s - loss: 0.0019
Epoch 452/1000
 - 0s - loss: 0.0018
Epoch 453/1000
 - 0s - loss: 0.0018
Epoch 454/1000
 - 0s - loss: 0.0018
Epoch 455/1000
 - 0s - loss: 0.0018
Epoch 456/1000
 - 0s - loss: 0.0018
Epoch 457/1000
 - 0s - loss: 0.0018
Epoch 458/1000
 - 0s - loss: 0.0018
Epoch 459/1000
 - 0s - loss: 0.0018
Epoch 460/1000
 - 0s - loss: 0.0018
Epoch 461/1000
 - 0s - loss: 0.0018
Epoch 462/1000
 - 0s - loss: 0.0018
Epoch 463/1000
 - 0s - loss: 0.0018
Epoch 464/1000
 - 0s - loss: 0.0018
Epoch 465/1000
 - 0s - loss: 0.0018
Epoch 466/1000
 - 0s - loss: 0.0018
Epoch 467/1000
 - 0s - loss: 0.0018
Epoch 468/1000
 - 0s - loss: 0.0018
Epoch 469/1000
 - 0s - loss: 0.0018
Epoch 470/1000
 - 0s - loss:

Epoch 671/1000
 - 0s - loss: 0.0013
Epoch 672/1000
 - 0s - loss: 0.0013
Epoch 673/1000
 - 0s - loss: 0.0013
Epoch 674/1000
 - 0s - loss: 0.0013
Epoch 675/1000
 - 0s - loss: 0.0013
Epoch 676/1000
 - 0s - loss: 0.0013
Epoch 677/1000
 - 0s - loss: 0.0013
Epoch 678/1000
 - 0s - loss: 0.0013
Epoch 679/1000
 - 0s - loss: 0.0013
Epoch 680/1000
 - 0s - loss: 0.0013
Epoch 681/1000
 - 0s - loss: 0.0013
Epoch 682/1000
 - 0s - loss: 0.0013
Epoch 683/1000
 - 0s - loss: 0.0013
Epoch 684/1000
 - 0s - loss: 0.0013
Epoch 685/1000
 - 0s - loss: 0.0012
Epoch 686/1000
 - 0s - loss: 0.0012
Epoch 687/1000
 - 0s - loss: 0.0012
Epoch 688/1000
 - 0s - loss: 0.0012
Epoch 689/1000
 - 0s - loss: 0.0012
Epoch 690/1000
 - 0s - loss: 0.0012
Epoch 691/1000
 - 0s - loss: 0.0012
Epoch 692/1000
 - 0s - loss: 0.0012
Epoch 693/1000
 - 0s - loss: 0.0012
Epoch 694/1000
 - 0s - loss: 0.0012
Epoch 695/1000
 - 0s - loss: 0.0012
Epoch 696/1000
 - 0s - loss: 0.0012
Epoch 697/1000
 - 0s - loss: 0.0012
Epoch 698/1000
 - 0s - loss:

 - 0s - loss: 8.3874e-04
Epoch 890/1000
 - 0s - loss: 8.3699e-04
Epoch 891/1000
 - 0s - loss: 8.3524e-04
Epoch 892/1000
 - 0s - loss: 8.3350e-04
Epoch 893/1000
 - 0s - loss: 8.3175e-04
Epoch 894/1000
 - 0s - loss: 8.3001e-04
Epoch 895/1000
 - 0s - loss: 8.2828e-04
Epoch 896/1000
 - 0s - loss: 8.2654e-04
Epoch 897/1000
 - 0s - loss: 8.2481e-04
Epoch 898/1000
 - 0s - loss: 8.2308e-04
Epoch 899/1000
 - 0s - loss: 8.2135e-04
Epoch 900/1000
 - 0s - loss: 8.1963e-04
Epoch 901/1000
 - 0s - loss: 8.1791e-04
Epoch 902/1000
 - 0s - loss: 8.1619e-04
Epoch 903/1000
 - 0s - loss: 8.1447e-04
Epoch 904/1000
 - 0s - loss: 8.1276e-04
Epoch 905/1000
 - 0s - loss: 8.1104e-04
Epoch 906/1000
 - 0s - loss: 8.0934e-04
Epoch 907/1000
 - 0s - loss: 8.0763e-04
Epoch 908/1000
 - 0s - loss: 8.0593e-04
Epoch 909/1000
 - 0s - loss: 8.0423e-04
Epoch 910/1000
 - 0s - loss: 8.0253e-04
Epoch 911/1000
 - 0s - loss: 8.0083e-04
Epoch 912/1000
 - 0s - loss: 7.9914e-04
Epoch 913/1000
 - 0s - loss: 7.9745e-04
Epoch 914/1000


Running the example, we can see the structure of the configured network.

We can see that as in the previous example, we have 140 parameters in the LSTM hidden layer.

The fully connected output layer is a very different story. In fact, it matches the one-to-one example exactly: one neuron that has one weight for each LSTM unit in the previous layer, plus one for the bias input.

This does two important things:

* Allows the problem to be framed and learned as it was defined, that is one input to one output, keeping the internal process for each time step separate.
* Simplifies the network by requiring far fewer weights such that only one time step is processed at a time.

The one simpler fully connected layer is applied to each time step in the sequence provided from the previous layer to build up the output sequence.
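
The saving in output-layer weights becomes clearer as the sequence gets longer. The sketch below compares the two output layers used in this tutorial; the helper names are hypothetical, and the number of LSTM units is held fixed at 5 as an assumption (in the listings above it is tied to the sequence length).

```python
def dense_output_params(units, length):
    """Dense(length) fed by the LSTM's final output vector."""
    return units * length + length

def time_distributed_output_params(units):
    """TimeDistributed(Dense(1)): one weight per LSTM unit plus a bias,
    shared across all time steps."""
    return units * 1 + 1

units = 5
for length in (5, 50, 500):
    print(length, dense_output_params(units, length), time_distributed_output_params(units))
```

The Dense output layer grows linearly with the sequence length, while the TimeDistributed output layer stays at 6 weights regardless of how many time steps are predicted.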

Again, the network learns the sequence.

We can think of the framing of the problem with time steps and a TimeDistributed layer as a more compact way of implementing the one-to-one network in the first example. It may even be more efficient (space or time wise) at a larger scale.

## Summary
In this tutorial, you discovered how to develop LSTM networks for sequence prediction and the role of the TimeDistributed layer.

Specifically, you learned:

* How to design a one-to-one LSTM for sequence prediction.
* How to design a many-to-one LSTM for sequence prediction without the TimeDistributed Layer.
* How to design a many-to-many LSTM for sequence prediction with the TimeDistributed Layer.