### LSTM (Long Short-Term Memory)

#### What is LSTM?

LSTM (Long Short-Term Memory) is a type of Recurrent Neural Network (RNN) designed to mitigate the vanishing gradient problem so that the network can learn long-term dependencies in sequential data. Unlike a standard RNN, an LSTM unit keeps a separate cell state and uses gates to control what is stored, forgotten, and exposed, which lets it capture longer-range dependencies.

!["lstm"](../images/4/4-lstm.png)
<br><br>

---

#### LSTM's Improvement Over RNN

Standard RNNs suffer from the vanishing gradient problem: gradients shrink as they are backpropagated through time, which makes it difficult for the network to learn long-term dependencies. LSTMs improve on this by introducing a memory cell whose state is updated additively and regulated by gates, so information (and gradient signal) can persist over many time steps. This mitigates the vanishing gradient problem, although it does not remove it entirely.
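
To make the shrinking concrete, here is a minimal numeric sketch. It assumes a toy scalar RNN, `h_t = tanh(w * h_{t-1})`, with a hypothetical weight `w = 0.9` and starting state `h0 = 0.5`; these values and the function name are illustrative only and are not taken from the notebook. It tracks the magnitude of the gradient of the current state with respect to the initial state as the number of time steps grows.

```python
import math


def gradient_through_time(w, h0, steps):
    """Return |d h_t / d h_0| for t = 1..steps in the toy RNN h_t = tanh(w * h_{t-1})."""
    h = h0
    grad = 1.0
    grads = []
    for _ in range(steps):
        h = math.tanh(w * h)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
        grads.append(abs(grad))
    return grads


# Hypothetical values chosen only for illustration
grads = gradient_through_time(w=0.9, h0=0.5, steps=50)
for t in (1, 10, 25, 50):
    print(f"step {t:2d}: gradient magnitude = {grads[t - 1]:.2e}")
```

Every factor in the product is below 1 in this setup, so the gradient magnitude decays roughly geometrically with the sequence length.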

#### LSTM Architecture, Components, and Working

An LSTM cell maintains a cell state (its long-term memory) that is regulated by three gates:

1. **Forget Gate**: Decides what information from the cell state should be discarded.
2. **Input Gate**: Determines what new information should be added to the cell state.
3. **Output Gate**: Decides what part of the cell state is exposed as the new hidden state.

The working of LSTM can be broken down as follows:

- **Forget Gate**: It takes the previous hidden state and the current input and applies a sigmoid activation, producing values between 0 and 1 that decide how much of each component of the previous cell state is kept.
- **Input Gate**: A sigmoid layer decides which components of the cell state to update, while a tanh layer creates candidate values that could be added to the memory. Their element-wise product is the new information.
- **Cell State Update**: The new cell state is the previous cell state scaled by the forget gate’s output, plus the candidate values scaled by the input gate’s output.
- **Output Gate**: A sigmoid layer decides which parts of the cell state to expose. The new hidden state is the output gate’s value multiplied by the tanh of the updated cell state, and it is passed on to the next time step.
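
The steps above can be sketched as a single time step in NumPy. This is a minimal illustration, not how Keras implements its `LSTM` layer: the function name `lstm_step`, the parameter names `W`, `U`, `b`, the gate ordering inside the stacked parameters, and the toy sizes are all assumptions made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4 * hidden, input_dim), U: (4 * hidden, hidden), b: (4 * hidden,)
    Stacked gate order (a choice made for this sketch):
    forget, input, candidate, output.
    """
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b

    f = sigmoid(z[0:hidden])                 # forget gate
    i = sigmoid(z[hidden:2 * hidden])        # input gate
    g = np.tanh(z[2 * hidden:3 * hidden])    # candidate values
    o = sigmoid(z[3 * hidden:4 * hidden])    # output gate

    c_t = f * c_prev + i * g                 # cell state update
    h_t = o * np.tanh(c_t)                   # new hidden state
    return h_t, c_t


# Toy sizes and random parameters, for illustration only
rng = np.random.default_rng(0)
input_dim, hidden = 4, 3
W = rng.normal(scale=0.1, size=(4 * hidden, input_dim))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, input_dim)):  # a toy sequence of 5 time steps
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

Note that the cell state is updated by addition (`f * c_prev + i * g`) rather than by repeated matrix multiplication, which is what lets information survive across many steps when the forget gate stays close to 1.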

#### Applications of LSTM

LSTMs are commonly used in tasks that involve sequential data, such as:

- **Natural Language Processing (NLP)**: Sentiment analysis, text generation, machine translation.
- **Speech Recognition**: Converting spoken language into text.
- **Time Series Prediction**: Forecasting stock prices, weather predictions.
- **Video Processing**: Action recognition, object tracking.

#### RNN vs LSTM (Graphical Comparison)

Here is a graphical comparison of RNN and LSTM:

![rnn-vs-lstm](../images/4/4-rnn-vs-lstm.png)


---


#### Real-Life Application of LSTM Using the Daily Dialog Dataset

- The dataset link &rarr; [Daily_Dialog_Dataset.csv](https://www.kaggle.com/datasets/va6573/daily-dialog-clean)


In [105]:
# Import libraries
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer  # Deprecated

In [106]:
# Import dataset
df = pd.read_csv("../data/Daily_Dialog_Dataset.csv")
print(df.head())

   Emotion                                               Text
0      joy          yes now i have got it thank you very much
1  neutral  if i do a few exercises at home like crunches ...
2  neutral  ok i hope you can have these goods delivered b...
3  neutral                            well she is quite short
4      joy    oh thank you i am looking for the train station


In [107]:
# Create dataset
texts = df["Text"].astype(str).tolist()[:2000]

In [108]:
# Preparing tokenizer and sequences
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
total_words = len(tokenizer.word_index) + 1

In [109]:
# Build n-gram prefix sequences and pad them to a common length
input_sequences = []
for text in texts:
    token_list = tokenizer.texts_to_sequences([text])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[: i + 1]
        input_sequences.append(n_gram_sequence)

max_sequence_length = max(len(x) for x in input_sequences)
input_sequences = pad_sequences(
    input_sequences, maxlen=max_sequence_length, padding="pre"
)

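# Inputs are all tokens except the last; the target is the last token
X, Y = input_sequences[:, :-1], input_sequences[:, -1]
# One-hot encode the target word index for categorical cross-entropy
Y = tf.keras.utils.to_categorical(Y, num_classes=total_words)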
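
To see what the cell above produces, here is a small pure-Python sketch of the same idea on one sentence. The token IDs and the helper names `make_prefix_sequences` and `left_pad` are hypothetical and stand in for the `Tokenizer` output and `pad_sequences`; the sketch only mirrors the pre-padding behaviour used here.

```python
def make_prefix_sequences(token_ids):
    """Return every prefix of length >= 2: [t0, t1], [t0, t1, t2], ..."""
    return [token_ids[: i + 1] for i in range(1, len(token_ids))]


def left_pad(sequence, length, pad_value=0):
    """Left-pad a sequence with pad_value up to the given length."""
    return [pad_value] * (length - len(sequence)) + list(sequence)


token_ids = [4, 7, 2, 9]  # hypothetical IDs for a four-word sentence
prefixes = make_prefix_sequences(token_ids)
max_len = max(len(p) for p in prefixes)
padded = [left_pad(p, max_len) for p in prefixes]

# Split each padded row into model input (all but last) and target (last)
inputs = [row[:-1] for row in padded]
targets = [row[-1] for row in padded]

for x, y in zip(inputs, targets):
    print(x, "->", y)
```

Each sentence of n tokens therefore yields n - 1 training examples, one per next-word prediction.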

In [115]:
# Create LSTM model
model = Sequential()
model.add(Embedding(total_words, 50))
model.add(LSTM(200, return_sequences=False))
model.add(Dense(total_words, activation="softmax"))

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
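
As a rough sanity check on model size, the parameter counts of the three layers can be worked out by hand. The sketch below assumes Keras defaults (bias terms enabled, no peephole connections) and a hypothetical vocabulary size of 3000, since the real value of `total_words` is not shown in the notebook output; the helper names are illustrative.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary entry
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    # Four weight sets (forget, input, candidate, output), each with
    # input weights, recurrent weights, and a bias vector
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, output_dim):
    # Weight matrix plus one bias per output unit
    return input_dim * output_dim + output_dim


vocab_size = 3000           # hypothetical; the notebook's real value is total_words
embed_dim, units = 50, 200  # as in the model above

counts = {
    "embedding": embedding_params(vocab_size, embed_dim),
    "lstm": lstm_params(embed_dim, units),
    "dense": dense_params(units, vocab_size),
}
for name, n in counts.items():
    print(f"{name:>9}: {n:,}")
print(f"{'total':>9}: {sum(counts.values()):,}")
```

Under these assumptions the output layer dominates, because its size grows with the vocabulary. Calling `model.summary()` once the model has been built should report the actual counts.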

In [119]:
# Train LSTM model
model.fit(
    X,
    Y,
    epochs=50,
    batch_size=64,
    verbose=1,
)

Epoch 1/50
334/334 ━━━━━━━━━━━━━━━━━━━━ 27s 79ms/step - accuracy: 0.2693 - loss: 3.5756
Epoch 2/50
334/334 ━━━━━━━━━━━━━━━━━━━━ 26s 78ms/step - accuracy: 0.2947 - loss: 3.3791
Epoch 3/50
334/334 ━━━━━━━━━━━━━━━━━━━━ 26s 77ms/step - accuracy: 0.3222 - loss: 3.2243
Epoch 4/50
334/334 ━━━━━━━━━━━━━━━━━━━━ 26s 77ms/step - accuracy: 0.3484 - loss: 3.0661
Epoch 5/50
334/334 ━━━━━━━━━━━━━━━━━━━━ 29s 87ms/step - accuracy: 0.3856 - loss: 2.8821
Epoch 6/50
334/334 ━━━━━━━━━━━━━━━━━━━━ 28s 84ms/step - accuracy: 0.4123 - loss: 2.7295
Epoch 7/50
334/334 ━━━━━━━━━━━━━━━━━━━━ 30s 89ms/step - accuracy: 0.4419 - loss: 2.5897
Epoch 8/50
334/334 ━━━━━━━━━━━━━━━━━━━━ 27s 80ms/step - accuracy: 0.4724 - loss: 2.4587
Epoch 9/50
...

<keras.src.callbacks.history.History at 0x21eb8a18f50>

In [131]:
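# Evaluation: text completion task
def predict_next_word(seed_text, next_words):
    """Greedily append up to `next_words` predicted words to `seed_text`."""
    for _ in range(next_words):
        # Encode the current text and left-pad it to the model's input length
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences(
            [token_list], maxlen=max_sequence_length - 1, padding="pre"
        )
        # Pick the most probable next token (greedy decoding)
        predicted_probs = model.predict(token_list, verbose=0)
        predicted_word_index = int(np.argmax(predicted_probs, axis=-1)[0])
        # Index 0 is reserved for padding and has no entry in index_word
        predicted_word = tokenizer.index_word.get(predicted_word_index)
        if predicted_word is None:
            break

        seed_text += " " + predicted_word

    return seed_text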

In [154]:
seed_text = "this"
print(predict_next_word(seed_text, 1))

this is


In [156]:
seed_text = "it is"
print(predict_next_word(seed_text, 3))

it is a good thing


In [159]:
seed_text = "did you"
print(predict_next_word(seed_text, 5))

did you work as a salesperson before
