# LSTM for Daily Minimum Temperature Prediction

# Overview

The code for this project is modified from [Thushan Ganegedara's Datacamp tutorial](https://www.datacamp.com/community/tutorials/lstm-python-stock-market).

# Background for LSTM
The long short-term memory (LSTM) unit is a gated recurrent architecture. It is closely related to, and more general than, the later and simpler gated recurrent unit (GRU). Its gates try to mitigate the [vanishing gradient problem](http://neuralnetworksanddeeplearning.com/chap5.html) and keep the long-term "memory" active.
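
Here is a rough numeric illustration of why gating helps. It is a simplified sketch and is not taken from the referenced sources. In a plain sigmoid RNN, the backpropagated gradient is multiplied at every step by σ'(z)·w, and σ'(z) is at most 0.25. Along the LSTM cell-state path, the corresponding per-step factor is the forget gate. The sketch makes three assumptions: a recurrent weight w = 1, pre-activations at 0, and a forget gate held at 0.99. It also ignores the gates' own dependence on earlier states.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

T = 50  # number of time steps the gradient flows back through

# Plain RNN: per-step factor sigma'(z) * w, with z = 0 (where sigma' peaks)
# and an assumed recurrent weight w = 1.
z = np.zeros(T)
sigmoid_grad = sigmoid(z) * (1.0 - sigmoid(z))  # 0.25 at every step
rnn_gradient = np.prod(sigmoid_grad * 1.0)

# LSTM cell-state path: dc[t]/dc[t-1] along the direct path is the forget
# gate, assumed here to stay at 0.99.
forget_gates = np.full(T, 0.99)
lstm_gradient = np.prod(forget_gates)

print(f"plain RNN after {T} steps: {rnn_gradient:.2e}")       # 7.89e-31
print(f"LSTM cell path after {T} steps: {lstm_gradient:.3f}")  # 0.605
```

Even in its best case, the plain chain shrinks by a factor of 4 per step. A forget gate near one lets the cell state carry the gradient across many steps.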

![alt text](LSTM_rnn.png "LSTM rnn")
![alt text](LSTM.png "LSTM cell")

* Picture summary ([Andrew Ng's lecture](https://www.coursera.org/specializations/deep-learning)):

> Four parallel layers of interacting networks.

> The weighted sum of the input and the previous hidden output is transformed by the activation functions (sigmoid σ, range 0 to 1, and tanh, range -1 to 1). The `dot`/`add` signs represent elementwise `multiplication`/`addition`.

> The cell state c[t] stores the memory. It is the previous cell state multiplied elementwise by the `forgetness` (the forget gate, 0 to 1), plus the new candidate value scaled by the update gate. The previous cell state is therefore forgotten when the forget gate is near zero and kept when it is near one.

> The activation/hidden state (last equation) is the output `filter` (the output gate, 0 to 1) multiplied elementwise by the tanh of the current cell state (-1 to 1), which carries the previous memory. The hidden state connects to the output through a softmax layer for prediction.

> Note that the [`peephole connection`](ftp://ftp.idsia.ch/pub/juergen/TimeCount-IJCNN2000.pdf), used in some LSTM variants, is not shown in the figure. It adds a weighted term of the previous cell state c[t-1] to the forget and update gates, and a weighted term of the current cell state c[t] to the output gate.
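
The four layers and the two state equations summarized above can be sketched as a single NumPy forward step. This is a minimal illustration that uses the gate names from the summary (forget, update, output) and is not the implementation used in this project. Several details are hypothetical choices for this example:

* the function name `lstm_step`
* the `params` dictionary layout
* the concatenation `[a[t-1], x[t]]`
* the elementwise (diagonal) peephole weights `p_f`, `p_u`, `p_o`

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, a_prev, c_prev, params, peephole=None):
    """One forward step of an LSTM cell (illustrative sketch).

    x_t: input x[t], shape (n_x,)
    a_prev, c_prev: hidden state a[t-1] and cell state c[t-1], shape (n_a,)
    params: W_f, W_u, W_c, W_o of shape (n_a, n_a + n_x); b_f, b_u, b_c, b_o of shape (n_a,)
    peephole: optional dict of elementwise weights p_f, p_u, p_o, shape (n_a,)
    """
    concat = np.concatenate([a_prev, x_t])  # [a[t-1], x[t]]

    # Four parallel layers: three sigmoid gates and one tanh candidate.
    z_f = params["W_f"] @ concat + params["b_f"]
    z_u = params["W_u"] @ concat + params["b_u"]
    z_o = params["W_o"] @ concat + params["b_o"]
    c_tilde = np.tanh(params["W_c"] @ concat + params["b_c"])

    if peephole is not None:
        # Peephole variant: forget and update gates also see c[t-1].
        z_f = z_f + peephole["p_f"] * c_prev
        z_u = z_u + peephole["p_u"] * c_prev

    gamma_f = sigmoid(z_f)  # forget gate, 0 to 1
    gamma_u = sigmoid(z_u)  # update gate, 0 to 1

    # Cell state: keep part of the old memory, add part of the candidate.
    c_t = gamma_f * c_prev + gamma_u * c_tilde

    if peephole is not None:
        # The output gate peeks at the new cell state c[t].
        z_o = z_o + peephole["p_o"] * c_t
    gamma_o = sigmoid(z_o)  # output gate, 0 to 1

    # Hidden state: output gate times the squashed cell state (-1 to 1).
    a_t = gamma_o * np.tanh(c_t)
    return a_t, c_t

# Usage: run three steps over the first three training temperatures
# with small random weights.
rng = np.random.default_rng(0)
n_x, n_a = 1, 4
params = {f"W_{g}": rng.normal(scale=0.1, size=(n_a, n_a + n_x)) for g in "fuco"}
params.update({f"b_{g}": np.zeros(n_a) for g in "fuco"})
a, c = np.zeros(n_a), np.zeros(n_a)
for x in [20.7, 17.9, 18.8]:
    a, c = lstm_step(np.array([x]), a, c, params)
```

Because the output gate and tanh are both bounded, every entry of the hidden state stays strictly between -1 and 1. The cell state itself is unbounded, which is what lets it carry memory.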


# Data 
Generate batches of sequential input and label data for training:

```python
import numpy as np

class DataGeneratorSeq(object):
    # prices: training time series (pandas Series or 1-D array)
    # batch_size: number of sequences (cursors) sampled in parallel per batch
    # num_unroll: number of consecutive time steps returned by unroll_batches()
    # segments: length of each of the batch_size segments the series is split into

    def __init__(self, prices, batch_size, num_unroll):
        self._prices = prices
        # Plain float32 array for positional indexing (avoids Series label lookup)
        self._values = np.asarray(prices, dtype=np.float32)
        self._prices_length = len(self._prices) - num_unroll
        self._batch_size = batch_size
        self._num_unroll = num_unroll
        self._segments = self._prices_length // self._batch_size
        # Each cursor starts at the beginning of its own segment
        self._cursor = [offset * self._segments for offset in range(self._batch_size)]
        print(self._cursor)
        print(self._segments)

    def next_batch(self):
        batch_data = np.zeros((self._batch_size), dtype=np.float32)
        batch_labels = np.zeros((self._batch_size), dtype=np.float32)

        for b in range(self._batch_size):
            if self._cursor[b] + 1 >= self._prices_length:
                # Near the end: jump to a random position within the first b+1 segments
                self._cursor[b] = np.random.randint(0, (b + 1) * self._segments)

            batch_data[b] = self._values[self._cursor[b]]
            # Label: the value 0 to 4 steps ahead (offset 0 repeats the input),
            # clipped so it never runs past the end of the series
            label_idx = min(self._cursor[b] + np.random.randint(0, 5), len(self._values) - 1)
            batch_labels[b] = self._values[label_idx]

            self._cursor[b] = (self._cursor[b] + 1) % self._prices_length

        return batch_data, batch_labels

    def unroll_batches(self):
        # Collect num_unroll consecutive batches, one per time step
        unroll_data, unroll_labels = [], []
        for ui in range(self._num_unroll):
            data, labels = self.next_batch()
            unroll_data.append(data)
            unroll_labels.append(labels)
        return unroll_data, unroll_labels

    def reset_indices(self):
        # Restart every cursor at a random position within its first b+1 segments
        for b in range(self._batch_size):
            self._cursor[b] = np.random.randint(0, min((b + 1) * self._segments, self._prices_length - 1))
```
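
The hypothetical helper `cursor_layout` below recomputes the initial cursor positions with NumPy to show how the cursors lay out a batch. It is not part of the project code. It also ignores the random cursor reset near the end of the series and the random 0 to 4 step label offset. Each row of `positions` corresponds to one `next_batch()` call, and each column to one of the `batch_size` parallel sequences. Reading down a column therefore gives consecutive days.

The example uses the settings from the loading step: 3285 training points, `batch_size=18` and `num_unroll=4`. With these settings it reproduces the cursor list and segment length that the constructor prints.

```python
import numpy as np

def cursor_layout(n_points, batch_size, num_unroll):
    """Hypothetical helper mirroring DataGeneratorSeq's initial cursors.

    Returns (segments, starts, positions), where positions[t, b] is the index
    of the input read for sequence b at unrolled step t.
    """
    usable = n_points - num_unroll               # like self._prices_length
    segments = usable // batch_size              # like self._segments
    starts = np.arange(batch_size) * segments    # like the initial self._cursor
    steps = np.arange(num_unroll)[:, None]       # shape (num_unroll, 1)
    positions = (starts[None, :] + steps) % usable  # shape (num_unroll, batch_size)
    return segments, starts, positions

segments, starts, positions = cursor_layout(3285, 18, 4)
print(segments)                   # 182
print(starts[:3].tolist())        # [0, 182, 364]
print(positions[:, 1].tolist())   # [182, 183, 184, 185]
```

So `unroll_batches()` returns `num_unroll` arrays of length `batch_size`. Element `b` of successive arrays walks forward one day at a time from its own segment start.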



Load the data and generate batches using the code above:

```python
import pandas as pd

# Skip malformed rows (on_bad_lines replaces the removed error_bad_lines; pandas >= 1.3)
series = pd.read_csv('~/Downloads/daily-minimum-temperatures-in-me.csv', on_bad_lines='skip')
series.rename(columns={'Daily minimum temperatures in Melbourne, Australia, 1981-1990': 'mint'}, inplace=True)  # rename minimum temp to 'mint'
y = pd.to_numeric(series['mint'], downcast='float')
# Daily index; assumes exactly one row per calendar day
y.index = pd.date_range(start='1981-01-01', end='1990-12-31', freq='D')
freq = 365  # days per year: first 9*365 rows for training, the rest for validation
train, valid = y.iloc[:freq*9], y.iloc[freq*9:]  # slices keep their DatetimeIndex

dg = DataGeneratorSeq(train, 18, 4)
u_data, u_labels = dg.unroll_batches()
print(dg._prices.head(25))
for ui, (dat, lbl) in enumerate(zip(u_data, u_labels)):
    print('\n\nUnrolled index %d' % ui)
    print('\tInputs: ', dat)
    print('\n\tOutput:', lbl)
```

```
[0, 182, 364, 546, 728, 910, 1092, 1274, 1456, 1638, 1820, 2002, 2184, 2366, 2548, 2730, 2912, 3094]
182
1981-01-01    20.700001
1981-01-02    17.900000
1981-01-03    18.799999
1981-01-04    14.600000
1981-01-05    15.800000
1981-01-06    15.800000
1981-01-07    15.800000
1981-01-08    17.400000
1981-01-09    21.799999
1981-01-10    20.000000
1981-01-11    16.200001
1981-01-12    13.300000
1981-01-13    16.700001
1981-01-14    21.500000
1981-01-15    25.000000
1981-01-16    20.700001
1981-01-17    20.600000
1981-01-18    24.799999
1981-01-19    17.700001
1981-01-20    15.500000
1981-01-21    18.200001
1981-01-22    12.100000
1981-01-23    14.400000
1981-01-24    16.000000
1981-01-25    16.500000
Freq: D, Name: mint, dtype: float32


Unrolled index 0
	Inputs:  [20.7 10.  17.4  4.2 17.7  5.5 16.1  7.8 12.   8.  13.3  6.9 10.5  7.
 11.2 10.6 15.2  5. ]

	Output: [15.8  7.4 17.4  4.2 10.9  9.5 19.5  2.6 12.  10.4 16.3  6.9 10.5  7.5
 12.1  8.1  9.5  5.3]


Unrolled index 1
	Inputs:  [17.9 
```
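
The `freq*9` split above counts rows, not calendar years. Assume one row per calendar day from 1981-01-01 to 1990-12-31, as the date index built above does. Because of the leap days in 1984 and 1988, the first 3285 rows then end on 1989-12-29, and the validation set starts on 1989-12-30.

The sketch below checks this on a placeholder series (zeros, not the temperature data). It also shows a date-based alternative using `.loc` string slicing. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Placeholder values (zeros, NOT the temperature data); only the index matters.
index = pd.date_range(start="1981-01-01", end="1990-12-31", freq="D")
y_demo = pd.Series(np.zeros(len(index), dtype=np.float32), index=index)

freq = 365
# Positional split, as in the loading code: first 9*365 rows for training.
train_pos, valid_pos = y_demo.iloc[: freq * 9], y_demo.iloc[freq * 9 :]

# Date-based split: all of 1981-1989 for training, 1990 for validation.
train_date, valid_date = y_demo.loc[:"1989-12-31"], y_demo.loc["1990-01-01":]

print(len(index))                         # 3652
print(train_pos.index[-1].date())         # 1989-12-29
print(valid_pos.index[0].date())          # 1989-12-30
print(len(train_date), len(valid_date))   # 3287 365
```

The date-based slices keep the validation set aligned to exactly one calendar year regardless of leap days.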

# References
* [Understanding LSTM](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
* [Why use LSTM? (paper collection)](http://people.idsia.ch/~juergen/rnn.html)
* [LSTM for stock prediction, reference project](https://www.datacamp.com/community/tutorials/lstm-python-stock-market)
* [Vanishing gradient problem explained](http://neuralnetworksanddeeplearning.com/chap5.html)