Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN) designed to overcome the limitations of traditional RNNs, particularly the vanishing gradient problem. Here's a high-level explanation of how LSTMs work internally and the differences between RNNs and LSTMs:

### LSTM Internal Working:
An LSTM unit is composed of a cell and three gates: an input gate, an output gate, and a forget gate³. The cell is responsible for maintaining the state over arbitrary time intervals, and the gates regulate the flow of information into and out of the cell. Here's a simplified view of the operations within an LSTM cell:

- **Forget Gate**: Decides what information is discarded from the cell state.
- **Input Gate**: Decides which new candidate values, computed from the current input and the previous hidden state, are written into the cell state.
- **Output Gate**: Determines which parts of the cell state are exposed as the next hidden state, which is used for predictions and passed to the next time step.

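Each gate applies a sigmoid activation, whose outputs between 0 and 1 act as soft switches on the flow of information. A tanh activation produces the candidate values and squashes the cell state before it is output. Together, these allow the network to learn what to keep or forget over long sequences.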
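The gate operations described above can be sketched in NumPy as a single time step. This is a minimal illustration of the commonly used LSTM formulation (no peephole connections), not the code of any particular library. The function name `lstm_step`, the stacked weight layout `[forget, input, candidate, output]`, and the toy sizes are assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """Run one LSTM time step (illustrative sketch).

    W: (4*hidden, input_dim), U: (4*hidden, hidden), b: (4*hidden,)
    Rows are stacked as [forget, input, candidate, output]; this ordering
    is an arbitrary choice for the sketch.
    """
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0 * hidden:1 * hidden])   # forget gate: what to discard from c_prev
    i = sigmoid(z[1 * hidden:2 * hidden])   # input gate: how much of the candidate to write
    g = np.tanh(z[2 * hidden:3 * hidden])   # candidate values for the cell state
    o = sigmoid(z[3 * hidden:4 * hidden])   # output gate: what to expose as hidden state
    c_t = f * c_prev + i * g                # additive cell-state update
    h_t = o * np.tanh(c_t)                  # next hidden state
    return h_t, c_t

# Run a toy sequence of 5 steps with random weights (sizes are arbitrary).
rng = np.random.default_rng(0)
input_dim, hidden = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden, input_dim))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, input_dim)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

Note that the cell state is updated by element-wise scaling and addition rather than by a full matrix multiplication followed by a squashing function, which is what lets information persist across many steps when the forget gate stays close to 1.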

### Difference Between RNN and LSTM:
The main difference between RNNs and LSTMs lies in their structure and ability to handle sequential data:

- **RNNs** are simpler networks that process sequences by maintaining a hidden state that is updated at each time step. However, they struggle with long-term dependencies due to the vanishing gradient problem: as the error signal is backpropagated through time, it tends to shrink geometrically with the number of steps, making it hard to learn connections between distant events in a sequence⁶.

- **LSTMs** include additional mechanisms (the gates) that allow them to control the flow of information and maintain a memory over longer sequences. This design helps them to remember important information and forget the irrelevant, making them more effective for tasks involving long-term dependencies⁶.
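To make the geometric decay concrete, here is a small sketch using a scalar recurrence `h_t = tanh(w * h_{t-1})` as a stand-in for a plain RNN. The function name `rnn_gradient_through_time` and the values `w=0.9`, `h0=0.5` are assumptions chosen for illustration. A real RNN has weight matrices and inputs at every step, so this shows only the mechanism, not a realistic magnitude.

```python
import math

def rnn_gradient_through_time(w, h0, steps):
    """Return |d h_t / d h_0| for t = 1..steps, with h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    magnitudes = []
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
        magnitudes.append(abs(grad))
    return magnitudes

mags = rnn_gradient_through_time(w=0.9, h0=0.5, steps=50)
for t in (1, 10, 25, 50):
    print(f"after {t:2d} steps: |dh_t/dh_0| = {mags[t - 1]:.3e}")
```

Each step multiplies the gradient by a factor of magnitude at most `|w|`, so with `|w| < 1` it is bounded by `|w|**t`. In an LSTM, by contrast, the direct path through the cell state scales the gradient by the forget gate value at each step, which the network can learn to hold close to 1.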

In summary, while both RNNs and LSTMs are used for sequential data, LSTMs are better suited for tasks where the sequence has long-range temporal dependencies. They have more parameters and cost more to compute per time step, but they address the difficulty plain RNNs have in learning from longer sequences.
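As a rough measure of the added complexity, the sketch below counts trainable parameters for a plain recurrent layer and an LSTM layer of the same size. It assumes the standard formulation with one bias vector per gate and no peephole connections; some implementations store extra bias terms, so their totals differ. The helper names are hypothetical, and the sizes (128 inputs, 64 units) match the Keras model in the code cell below.

```python
def simple_rnn_params(input_dim, units):
    """Input weights + recurrent weights + one bias vector."""
    return units * (input_dim + units) + units

def lstm_params(input_dim, units):
    """Four weight sets: forget gate, input gate, candidate values, output gate."""
    return 4 * simple_rnn_params(input_dim, units)

print(simple_rnn_params(128, 64))  # 12352
print(lstm_params(128, 64))        # 49408
```

Under these assumptions an LSTM layer carries four times the parameters of a plain recurrent layer of equal width.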

Source: Conversation with Copilot, 9/6/2024
(1) Long short-term memory - Wikipedia. https://en.wikipedia.org/wiki/Long_short-term_memory.
(2) Main Difference Between RNN and LSTM- (RNN vs LSTM) - theiotacademy. https://www.theiotacademy.co/blog/what-is-the-main-difference-between-rnn-and-lstm/.
(3) LSTMs Explained: A Complete, Technically Accurate, Conceptual ... - Medium. https://medium.com/analytics-vidhya/lstms-explained-a-complete-technically-accurate-conceptual-guide-with-keras-2a650327e8f2.
(4) What is LSTM? Introduction to Long Short-Term Memory - Analytics Vidhya. https://www.analyticsvidhya.com/blog/2021/03/introduction-to-long-short-term-memory-lstm/.
(5) Deep Learning | Introduction to Long Short Term Memory. https://www.geeksforgeeks.org/deep-learning-introduction-to-long-short-term-memory/.
(6) Time Series Prediction with LSTM Recurrent Neural Networks in Python .... https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/.
(7) RNN vs GRU vs LSTM - Medium. https://medium.com/analytics-vidhya/rnn-vs-gru-vs-lstm-863b0b7b1573.
(8) neural networks - What is the difference between LSTM and RNN .... https://ai.stackexchange.com/questions/18198/what-is-the-difference-between-lstm-and-rnn.
(9) The Ultimate Showdown: RNN vs LSTM vs GRU – Which is the Best? - Shiksha. https://www.shiksha.com/online-courses/articles/rnn-vs-gru-vs-lstm/.
(10) undefined. https://www.analyticsvidhya.com/blog/2017/12/fundamentals-of-deep-learning-introduction-to-lstm/.

In [None]:
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM
from tensorflow.keras.preprocessing import sequence

# Set the number of words to consider as features
max_features = 20000
# Pad or truncate each review to this many tokens; with the default settings,
# pad_sequences keeps the LAST maxlen tokens of longer reviews
maxlen = 100
batch_size = 32

# Load the IMDB dataset
print('Loading data...')
(input_train, y_train), (input_test, y_test) = imdb.load_data(num_words=max_features)

# Pad sequences to a uniform length
input_train = sequence.pad_sequences(input_train, maxlen=maxlen)
input_test = sequence.pad_sequences(input_test, maxlen=maxlen)

# Build the LSTM model
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(LSTM(64, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

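# Train the model (the test set doubles as validation data here, so the
# final test accuracy is not a fully held-out estimate)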
model.fit(input_train, y_train,
          batch_size=batch_size,
          epochs=10,
          validation_data=(input_test, y_test))

# Evaluate the model; with one metric, evaluate() returns [loss, accuracy]
score, acc = model.evaluate(input_test, y_test,
                            batch_size=batch_size)
print('Test loss:', score)
print('Test accuracy:', acc)
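
The `pad_sequences` calls above rely on the Keras defaults `padding='pre'` and `truncating='pre'`: short reviews are left-padded with zeros, and long reviews lose tokens from the start. The pure-Python helper below, `pad_pre`, is a hypothetical stand-in written only to illustrate that behavior on one sequence; it is not part of Keras and assumes `maxlen` is at least 1.

```python
def pad_pre(seq, maxlen, value=0):
    """Hypothetical helper mimicking pad_sequences defaults for one sequence."""
    seq = list(seq)[-maxlen:]                    # truncating='pre': keep the last maxlen tokens
    return [value] * (maxlen - len(seq)) + seq   # padding='pre': zeros go in front

print(pad_pre([1, 2, 3], 5))               # [0, 0, 1, 2, 3]
print(pad_pre([1, 2, 3, 4, 5, 6, 7], 5))   # [3, 4, 5, 6, 7]
```

With `maxlen = 100`, the model therefore sees the final 100 tokens of each long review rather than the first 100.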
