# Basic RNN
- Objective: understand the basics of RNNs and LSTMs

## Recurrent Neural Networks
- Feedforward neural networks (e.g. MLPs and CNNs) are powerful, but they are not designed to handle **"sequential"** data
- In other words, they have no **"memory"** of previous inputs
- For instance, when translating a text, you need to consider the **"context"** to predict the next word

<img src="http://2.bp.blogspot.com/-9GIdV292xV4/UwOIy6B6koI/AAAAAAAAHi4/X6UGlyHI-_U/s1600/tumblr_ms5qzpFY671r9nm7io1_500.gif" style="width: 500px"/>

<br>
- RNNs are suitable for sequential data because they have a **"recurrent"** structure
- Put differently, they keep a **"memory"** of earlier inputs in the sequence
</br>
<img src="http://www.wildml.com/wp-content/uploads/2015/09/rnn.jpg" style="width: 600px"/>

<br>
- To keep the number of parameters small, the same parameters are shared across all time steps
</br>

<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png" style="width: 600px"/>
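The recurrence and the parameter sharing described above can be sketched in a few lines of NumPy. This is only an illustration, not how Keras implements its layers: the names `rnn_forward`, `W_x`, `W_h` and `b` are made up for this sketch, and the weights are random rather than trained.

```python
import numpy as np

def rnn_forward(x, W_x, W_h, b):
    """Run a simple tanh RNN over one sequence.

    x has shape (timesteps, input_dim); the result holds every hidden
    state and has shape (timesteps, units).
    """
    units = W_h.shape[0]
    h = np.zeros(units)            # initial hidden state ("empty memory")
    states = []
    for x_t in x:                  # the same W_x, W_h, b are reused at every step
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
timesteps, input_dim, units = 49, 1, 50   # sizes chosen to match the models used later
W_x = rng.normal(scale=0.1, size=(input_dim, units))
W_h = rng.normal(scale=0.1, size=(units, units))
b = np.zeros(units)

states = rnn_forward(rng.normal(size=(timesteps, input_dim)), W_x, W_h, b)
print(states.shape)
```

Each hidden state depends on the current input and on the previous hidden state, which is where the "memory" comes from. The loop never creates new weights, so the parameter count does not grow with the sequence length.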

## Load Dataset

In [34]:
import numpy as np

from sklearn.metrics import accuracy_score
from keras.datasets import reuters
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

In [103]:
# parameters for data load
num_words = 30000
maxlen = 50
test_split = 0.3

In [104]:
(X_train, y_train), (X_test, y_test) = reuters.load_data(num_words = num_words, maxlen = maxlen, test_split = test_split)

In [105]:
X_train.shape


(1395,)

In [106]:
np.unique(y_train)

array([ 0,  1,  2,  3,  4,  6,  8,  9, 10, 11, 12, 13, 14, 16, 17, 18, 19,
       20, 21, 23, 24, 25, 28, 30, 32, 34, 36, 37, 38, 39, 45])

In [107]:
len(np.unique(y_test))

24

In [108]:
# pad the sequences with zeros 
# padding parameter is set to 'post' => 0's are appended to end of sequences
X_train = pad_sequences(X_train, padding = 'post')
X_test = pad_sequences(X_test, padding = 'post')

In [109]:
X_train.shape


(1395, 49)
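To make the effect of `padding = 'post'` concrete, here is a NumPy-only sketch of right-padding. The helper `pad_post` and the toy sequences are hypothetical; it only mimics the default behaviour of padding to the longest sequence in the call and is not the Keras implementation.

```python
import numpy as np

def pad_post(sequences, value=0):
    """Right-pad variable-length integer sequences to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    padded = np.full((len(sequences), max_len), value, dtype=np.int32)
    for i, seq in enumerate(sequences):
        padded[i, :len(seq)] = seq   # copy the tokens, leave the tail as padding
    return padded

toy = [[5, 8, 2], [7], [3, 9]]
padded = pad_post(toy)
print(padded)
```

Because the target length is taken from the longest sequence in each call, the train and test arrays only end up with the same width (49) when both happen to contain a sequence of that length. Passing the `maxlen` argument of `pad_sequences` explicitly would guard against a mismatch.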

In [110]:
X_train = np.array(X_train).reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = np.array(X_test).reshape((X_test.shape[0], X_test.shape[1], 1))

In [111]:
X_train.shape

(1395, 49, 1)

In [113]:
y_actual = y_test 

In [114]:
y_data = np.concatenate((y_train, y_test))
y_data = to_categorical(y_data)

In [115]:
# split the one-hot labels back into their train and test parts
y_train = y_data[:len(X_train)]
y_test = y_data[len(X_train):]

In [116]:
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(1395, 49, 1)
(599, 49, 1)
(1395, 46)
(599, 46)
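The labels are concatenated before encoding so that the train and test label matrices share the same 46 columns, even though not every class occurs in both splits. A minimal NumPy sketch of one-hot encoding with a fixed width is shown below; `one_hot` and the toy labels are illustrative names, not part of Keras.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Encode integer labels as rows with a single 1 in the label's column."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

toy_labels = [0, 3, 45]
encoded = one_hot(toy_labels, num_classes=46)
print(encoded.shape)
print(encoded.argmax(axis=1))   # argmax recovers the original integer labels
```

`to_categorical` also accepts a `num_classes` argument, so encoding each split separately with `num_classes = 46` should give the same shapes without the concatenation step.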


## 1. Vanilla RNN
- Vanilla RNNs have a simple structure
- However, they suffer from the problem of **"long-term dependencies"**: gradients tend to shrink (or blow up) as they are propagated back through many time steps
- Hence, they are not able to keep the **"sequential memory"** for long

<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-SimpleRNN.png" style="width: 600px"/>
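One way to see why long-term dependencies are hard for a vanilla RNN is to track how much the final hidden state still depends on the initial one. The sketch below uses a scalar toy RNN, `h_t = tanh(w * h_{t-1})`, with an arbitrarily chosen weight `w = 0.9`; the function name and all values are assumptions made for illustration.

```python
import numpy as np

def gradient_through_time(w, h0, steps):
    """Return |d h_T / d h_0| for the scalar RNN h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = np.tanh(w * h)
        grad *= w * (1.0 - h ** 2)   # chain rule through one tanh step
    return abs(grad)

for steps in (1, 10, 49):
    print(steps, gradient_through_time(w=0.9, h0=0.5, steps=steps))
```

Every step multiplies the gradient by a factor smaller than 1 in magnitude, so the influence of early inputs decays roughly exponentially with the number of time steps.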

In [117]:
!pip install scikeras


In [118]:
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN, Activation
from keras import optimizers


In [119]:
def vanilla_rnn():
    model = Sequential()
    model.add(SimpleRNN(50, input_shape = (49,1), return_sequences = False))
    model.add(Dense(46))
    model.add(Activation('softmax'))
    
    adam = optimizers.Adam()
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    
    return model
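As a rough check of the model's size, the trainable parameters can be counted by hand. The helper functions below are hypothetical; they use the standard formulas for a simple recurrent layer (input kernel, recurrent kernel, bias) and a dense layer (weights, bias).

```python
def simple_rnn_params(units, input_dim):
    # input kernel + recurrent kernel + bias
    return units * input_dim + units * units + units

def dense_params(inputs, outputs):
    # weight matrix + bias
    return inputs * outputs + outputs

rnn = simple_rnn_params(units=50, input_dim=1)
head = dense_params(inputs=50, outputs=46)
total = rnn + head
print(rnn, head, total)
```

The sequence length (49) does not appear in either formula, because the recurrent layer reuses the same weights at every time step.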

In [120]:
model = vanilla_rnn()



In [121]:
model.fit(X_train, y_train, epochs = 200, batch_size = 50, verbose = 1)

Epoch 1/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.1765 - loss: 3.6656
Epoch 2/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.6742 - loss: 2.1307
Epoch 3/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7241 - loss: 1.2155
Epoch 4/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7240 - loss: 1.1480
Epoch 5/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7182 - loss: 1.1728
Epoch 6/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7026 - loss: 1.1554
Epoch 7/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7106 - loss: 1.1694
Epoch 8/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.7297 - loss: 1.0922
...

<keras.src.callbacks.history.History at 0x2f33f6550>
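The `28/28` shown for each epoch is the number of batches per epoch, which follows from the training set size and `batch_size = 50`. The short calculation below reproduces it and also the `19/19` that the later `model.predict` calls report, assuming `predict` runs with its default batch size of 32.

```python
import math

train_samples, test_samples = 1395, 599   # sizes printed earlier in the notebook
fit_batch_size = 50                        # passed to model.fit
predict_batch_size = 32                    # Keras default for model.predict

steps_per_epoch = math.ceil(train_samples / fit_batch_size)
predict_steps = math.ceil(test_samples / predict_batch_size)
print(steps_per_epoch, predict_steps)
```

The last batch in each case is smaller than the others, which is why the division is rounded up.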

In [122]:
y_pred = model.predict(X_test)

19/19 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step


In [126]:
y_test_ = np.argmax(y_pred, axis = 1)

In [129]:
print(accuracy_score(y_actual, y_test_))

0.7512520868113522
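The evaluation step converts each row of predicted class probabilities into a single class index with `argmax` and compares it with the integer labels saved in `y_actual` before one-hot encoding. A toy version with made-up probabilities for three classes might look like this:

```python
import numpy as np

# hypothetical softmax outputs for 4 samples and 3 classes
probs = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.5, 0.4, 0.1]])
actual = np.array([1, 0, 2, 1])

predicted = probs.argmax(axis=1)          # most probable class per row
accuracy = (predicted == actual).mean()   # fraction of matching labels
print(predicted, accuracy)
```

For integer labels, this fraction of matches is what `accuracy_score` computes.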


## 2. Stacked Vanilla RNN
- RNN layers can be stacked to form a deeper network

<img src="https://lh6.googleusercontent.com/rC1DSgjlmobtRxMPFi14hkMdDqSkEkuOX7EW_QrLFSymjasIM95Za2Wf-VwSC1Tq1sjJlOPLJ92q7PTKJh2hjBoXQawM6MQC27east67GFDklTalljlt0cFLZnPMdhp8erzO" style="width: 500px"/>

In [131]:
def stacked_vanilla_rnn():
    model = Sequential()
    model.add(SimpleRNN(50, input_shape = (49,1), return_sequences = True))   # return_sequences must be True so that the next recurrent layer receives the full sequence
    model.add(SimpleRNN(50, return_sequences = False))
    model.add(Dense(46))
    model.add(Activation('softmax'))
    
    adam = optimizers.Adam()
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    
    return model
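The role of `return_sequences` when stacking can be illustrated with a toy NumPy layer. This is a sketch with random, untrained weights and a hypothetical helper `simple_rnn`; it only demonstrates the output shapes, not the Keras implementation.

```python
import numpy as np

def simple_rnn(x, units, return_sequences, rng):
    """Toy tanh RNN layer with random weights; x has shape (timesteps, features)."""
    W_x = rng.normal(scale=0.1, size=(x.shape[1], units))
    W_h = rng.normal(scale=0.1, size=(units, units))
    h = np.zeros(units)
    states = []
    for x_t in x:
        h = np.tanh(x_t @ W_x + h @ W_h)
        states.append(h)
    states = np.stack(states)
    # either every hidden state, or only the one from the final time step
    return states if return_sequences else states[-1]

rng = np.random.default_rng(0)
x = rng.normal(size=(49, 1))
first = simple_rnn(x, 50, return_sequences=True, rng=rng)
second = simple_rnn(first, 50, return_sequences=False, rng=rng)
print(first.shape, second.shape)
```

The first layer has to emit one vector per time step, since the second layer again expects a sequence as input. Only the last recurrent layer collapses the sequence into a single vector for the dense classifier.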

In [133]:
model = stacked_vanilla_rnn()

In [134]:
model.fit(X_train, y_train, epochs = 200, batch_size = 50, verbose = 1)

Epoch 1/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.3954 - loss: 2.9866
Epoch 2/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7186 - loss: 1.3096
Epoch 3/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7027 - loss: 1.2192
Epoch 4/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7309 - loss: 1.1206
Epoch 5/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7043 - loss: 1.2019
Epoch 6/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7054 - loss: 1.2130
Epoch 7/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7211 - loss: 1.0815
Epoch 8/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7142 - loss: 1.1580
...

<keras.src.callbacks.history.History at 0x2fd8e9b90>

In [135]:
y_pred = model.predict(X_test)

19/19 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step


In [136]:
y_test_ = np.argmax(y_pred, axis = 1)

In [137]:
print(accuracy_score(y_actual, y_test_))

0.7562604340567612


## 3. LSTM
- LSTM (long short-term memory) is an improved recurrent structure designed to mitigate the problem of long-term dependencies

<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png" style="width: 600px"/>
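The diagram above can be read as the following single-step computation. This NumPy sketch uses the standard LSTM equations with random, untrained weights; the names `lstm_step`, `W`, `U`, `b` and the gate ordering inside the concatenated weight matrices are choices made for this illustration and are not taken from the Keras source.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)."""
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)        # input gate, forget gate, candidate, output gate
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c_prev + i * g             # cell state: keep part of the old, add part of the new
    h = o * np.tanh(c)                 # hidden state exposed to the next layer
    return h, c

rng = np.random.default_rng(0)
input_dim, units, timesteps = 1, 50, 49
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(timesteps, input_dim)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

The cell state `c` is updated additively rather than being squashed through `tanh` at every step, which is what lets information (and gradients) survive over longer spans than in a vanilla RNN.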

In [138]:
from keras.layers import LSTM

In [139]:
def lstm():
    model = Sequential()
    model.add(LSTM(50, input_shape = (49,1), return_sequences = False))
    model.add(Dense(46))
    model.add(Activation('softmax'))
    
    adam = optimizers.Adam()
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    
    return model

In [140]:
model = lstm()



In [141]:
model.fit(X_train, y_train, epochs = 200, batch_size = 50, verbose = 1)

Epoch 1/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 6ms/step - accuracy: 0.5188 - loss: 3.4110
Epoch 2/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7239 - loss: 1.3660
Epoch 3/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7067 - loss: 1.2237
Epoch 4/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7287 - loss: 1.1204
Epoch 5/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7246 - loss: 1.0445
Epoch 6/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7165 - loss: 1.1039
Epoch 7/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.6945 - loss: 1.1497
Epoch 8/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.7141 - loss: 1.0912
...

<keras.src.callbacks.history.History at 0x30155a650>

In [142]:
y_pred = model.predict(X_test)

19/19 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step


In [143]:
y_test_ = np.argmax(y_pred, axis = 1)

In [144]:
# accuracy improves by adopting LSTM structure
print(accuracy_score(y_actual, y_test_))

0.8480801335559266


## 4. Stacked LSTM
- LSTM layers can be stacked as well

In [160]:
def stacked_lstm():
    model = Sequential()
    model.add(LSTM(50, input_shape = (49,1), return_sequences = True))
    model.add(LSTM(50, return_sequences = True, return_state= False))
    model.add(LSTM(50, return_sequences = False, return_state= False))
    model.add(Dense(512))
    model.add(Activation('relu'))
    model.add(Dense(46))
    model.add(Activation('softmax'))
    
    adam = optimizers.Adam()
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    
    return model
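A hand count of the trainable parameters gives a feel for how much larger this model is than the single-layer ones. The helpers below are hypothetical and use the standard formulas: an LSTM layer has four sets of input kernel, recurrent kernel and bias, and a dense layer has a weight matrix plus bias.

```python
def lstm_params(units, input_dim):
    # four gates, each with input kernel + recurrent kernel + bias
    return 4 * (units * input_dim + units * units + units)

def dense_params(inputs, outputs):
    # weight matrix + bias
    return inputs * outputs + outputs

layers = [
    lstm_params(50, 1),       # first LSTM reads the 1-dimensional input
    lstm_params(50, 50),      # second LSTM reads 50-dimensional hidden states
    lstm_params(50, 50),      # third LSTM
    dense_params(50, 512),    # hidden dense layer
    dense_params(512, 46),    # output layer, one unit per class
]
total = sum(layers)
print(layers, total)
```

With only 1,395 training sequences, a model of this size can memorize the training set fairly easily, so it is worth watching for overfitting when comparing training accuracy with test accuracy.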

In [161]:
model = stacked_lstm()

In [162]:
model.fit(X_train, y_train, epochs = 200, batch_size = 50, verbose = 1)

Epoch 1/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 2s 22ms/step - accuracy: 0.5802 - loss: 2.9859
Epoch 2/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 22ms/step - accuracy: 0.6994 - loss: 1.2967
Epoch 3/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 22ms/step - accuracy: 0.7152 - loss: 1.1997
Epoch 4/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 24ms/step - accuracy: 0.6952 - loss: 1.2656
Epoch 5/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 25ms/step - accuracy: 0.7041 - loss: 1.1794
Epoch 6/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 24ms/step - accuracy: 0.7189 - loss: 1.1466
Epoch 7/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 24ms/step - accuracy: 0.7312 - loss: 1.0649
Epoch 8/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 24ms/step - accuracy: 0.7012 - loss: 1.0457
...

<keras.src.callbacks.history.History at 0x32b2685d0>

In [163]:
y_pred = model.predict(X_test)

19/19 ━━━━━━━━━━━━━━━━━━━━ 0s 13ms/step


In [164]:
y_test_ = np.argmax(y_pred, axis = 1)

In [165]:
print(accuracy_score(y_actual, y_test_))

0.8614357262103506


In [None]:
# Logistic - 80 - simple - faster
# Random Forest - 82 - weeks
# Deep Learning - 83 - months - resource
# Generative - 85 - money

In [166]:
def stacked_lstm_hidden():
    model = Sequential()
    model.add(LSTM(50, input_shape = (49,1), return_sequences = True))
    model.add(LSTM(50, return_sequences = True, return_state= False))
    model.add(LSTM(50, return_sequences = False, return_state= False))
    model.add(Dense(512))
    model.add(Activation('relu'))
    model.add(Dense(46))
    model.add(Activation('softmax'))
    
    adam = optimizers.Adam()
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    
    return model

In [167]:
model = stacked_lstm_hidden()



In [168]:
model.fit(X_train, y_train, epochs = 200, batch_size = 50, verbose = 1)

Epoch 1/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 2s 22ms/step - accuracy: 0.6111 - loss: 2.8149
Epoch 2/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 22ms/step - accuracy: 0.7197 - loss: 1.1916
Epoch 3/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 22ms/step - accuracy: 0.7037 - loss: 1.2114
Epoch 4/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 24ms/step - accuracy: 0.7083 - loss: 1.1397
Epoch 5/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 25ms/step - accuracy: 0.7168 - loss: 1.0994
Epoch 6/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 25ms/step - accuracy: 0.7114 - loss: 1.1466
Epoch 7/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 26ms/step - accuracy: 0.8030 - loss: 0.8454
Epoch 8/200
28/28 ━━━━━━━━━━━━━━━━━━━━ 1s 26ms/step - accuracy: 0.7523 - loss: 0.8961
...

<keras.src.callbacks.history.History at 0x32c6ff9d0>

In [169]:
y_pred = model.predict(X_test)

19/19 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step


In [170]:
y_test_ = np.argmax(y_pred, axis = 1)

In [171]:
print(accuracy_score(y_actual, y_test_))

0.8397328881469115
