<a href="https://colab.research.google.com/github/shstreuber/Data-Mining/blob/master/Module10_Recurrent_Neural_Networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**What is a Recurrent Neural Network**
A recurrent neural network (RNN) is a type of artificial neural network (ANN) used in applications such as Apple’s Siri and Google’s voice search.
<center>
<img src="https://www.apple.com/v/siri/h/images/overview/routers_tile_1__gds6mleh3lea_large.png">
</center>

At a high level, a recurrent neural network (RNN) **processes sequences** (daily stock prices, sentences, or sensor measurements, for example) one element at a time **while retaining a memory** (called a state) of what has come previously in the sequence. In other words, an RNN remembers past inputs through its internal memory, which makes it useful for predicting stock prices, generating text, transcribing speech, and translating between languages.

<img src = "https://media.geeksforgeeks.org/wp-content/uploads/20231204130132/RNN-vs-FNN-660.png">

**RECURRENT** means that the hidden state computed at the current time step is fed back into the network at the next time step. At each element of the sequence, the model considers not just the current input, but also what it remembers about the preceding elements.

In an RNN, information cycles through a loop, so the output is determined by the current input and the previously received inputs. The input layer processes the initial input and passes it to the recurrent middle layer. Conceptually, the middle layer behaves like a stack of hidden layers, one per time step, each with its own activation function, weights, and biases. In an RNN these parameters are shared across all time steps, so instead of creating multiple hidden layers, the network creates one and loops over it.
<center>
<img src="https://cdn-images-1.medium.com/max/1500/1*czgLJc2bXADt9N7yJX6S1w.png">
</center>
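
To make the loop concrete, here is a minimal NumPy sketch of a single recurrent layer. The sizes, the random weights, and the names `W_x`, `W_h`, and `rnn_forward` are illustrative assumptions, not part of the notebook's model; the point is only that the same weights are reused at every time step.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size = 3, 4                      # illustrative sizes
W_x = rng.normal(size=(hidden_size, input_size))    # input-to-hidden weights
W_h = rng.normal(size=(hidden_size, hidden_size))   # hidden-to-hidden weights
b = np.zeros(hidden_size)

def rnn_forward(sequence):
    """Run the same cell over every element and return all hidden states."""
    h = np.zeros(hidden_size)          # the "memory" starts empty
    states = []
    for x_t in sequence:
        # The same W_x, W_h and b are reused at every time step
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.array(states)

sequence = rng.normal(size=(5, input_size))   # 5 time steps
states = rnn_forward(sequence)
print(states.shape)   # (5, 4)
```

Each row of `states` depends on the current input and, through `h`, on everything that came before it.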

Instead of using traditional backpropagation, like this:
<center>
<img src = "https://editor.analyticsvidhya.com/uploads/18870backprop2.png">
</center>

Recurrent Neural Networks use **backpropagation through time (BPTT) algorithms** to determine the gradient.



In [None]:
from IPython.display import IFrame  # This is just for me so I can embed videos
IFrame(src="https://www.youtube.com/embed/0XdPIqi0qpg", width=560, height=315)


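In backpropagation, the model adjusts its parameters by propagating errors backward from the output layer to the input layer. BPTT does the same across the unrolled sequence: because an RNN shares its parameters across all time steps, BPTT sums the gradient contributions from each time step.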
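
The summing can be seen in a tiny worked example. The sketch below assumes a scalar RNN with a single shared weight `w` and takes the final hidden state as the loss; these simplifications (and the function names) are assumptions made for illustration. The result is checked against a numerical derivative.

```python
import math

def forward(w, xs):
    """Scalar RNN: h_t = tanh(w * h_{t-1} + x_t), starting from h_0 = 0."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + x))
    return hs

def bptt_grad(w, xs):
    """Gradient of the final state h_T with respect to the shared weight w."""
    hs = forward(w, xs)
    grad_w = 0.0
    grad_h = 1.0                                  # dL/dh_T for the loss L = h_T
    for t in range(len(xs), 0, -1):               # walk backward through time
        grad_pre = grad_h * (1.0 - hs[t] ** 2)    # back through tanh
        grad_w += grad_pre * hs[t - 1]            # this time step's contribution
        grad_h = grad_pre * w                     # pass the error on to step t-1
    return grad_w

xs = [0.5, -0.3, 0.8, 0.1]
w = 0.7
eps = 1e-6
numeric = (forward(w + eps, xs)[-1] - forward(w - eps, xs)[-1]) / (2 * eps)
print(bptt_grad(w, xs), numeric)
```

The two printed numbers should agree closely, which confirms that adding up the per-step contributions gives the gradient of the shared weight.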

NOW, take a look at this great video that explains RNNs in very clear terms:



In [None]:
IFrame(src="https://www.youtube.com/embed/AsNTP8Kwu80", width=560, height=315)

##**Recurrent Neural Network Example: Text Prediction**
RNNs are often used to predict text. So, that's what we will do.
<center>
<img src = "https://m.media-amazon.com/images/I/819+5EFvhBL._AC_UF1000,1000_QL80_.jpg" width = 300>
</center>

In the example below, we will use a Recurrent Neural Network to predict lines from [William Shakespeare's famous Sonnets](https://www.shakespeare.org.uk/explore-shakespeare/shakespedia/shakespeares-poems/).




##**0. Importing the Libraries and Preparing the Data**
1. We import the libraries we need (including numpy and tensorflow):

 1. **`import requests`**:
   - **Purpose**: This library helps your program communicate with websites.
   - **Example Use**: If you want to download a text file from the internet, you can use `requests` to fetch the file.

 2. **`from tensorflow.keras.preprocessing.text import Tokenizer`**:
   - **Purpose**: This tool helps convert text into numerical data that a computer can understand.
   - **Example Use**: If you have a sentence and you want to turn each word into a number, the `Tokenizer` can do that for you.

 3. **`from tensorflow.keras.preprocessing.sequence import pad_sequences`**:
   - **Purpose**: This tool ensures that all your sequences of numbers are the same length by adding extra zeros where needed.
   - **Example Use**: If you have sentences of different lengths and you need them all to be the same length for processing, `pad_sequences` can add padding to make them uniform.

Together, these libraries and tools help us fetch data from the internet, process it into a format suitable for machine learning, and build models to learn from and make predictions with this data.

2. Then we download the text of Shakespeare's Sonnets and break it down into smaller pieces (words and sentences).

In [1]:
import requests
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Download the text of Shakespeare's Sonnets
url = "https://raw.githubusercontent.com/shstreuber/Data-Mining/master/data/ShakespeareSonnets.txt"
response = requests.get(url)
text = response.text

 3. Next, we **tokenize** the Sonnets as follows:

In [2]:
# Tokenize the text
tokenizer = Tokenizer()
tokenizer.fit_on_texts([text])
total_words = len(tokenizer.word_index) + 1

1. **`tokenizer = Tokenizer()`**:
   - This line creates an instance of the `Tokenizer` class from the `tensorflow.keras.preprocessing.text` module. The `Tokenizer` is a tool that helps convert text into sequences of numbers. Each unique word in the text will be assigned a unique number.

2. **`tokenizer.fit_on_texts([text])`**:
   - The `fit_on_texts` method updates the internal vocabulary based on the list of texts provided. In this case, we are passing the entire text (which contains Shakespeare's Sonnets) as a list with one element.
   - What happens here is that the `Tokenizer` goes through the entire text and builds a dictionary (`word_index`) where each unique word is assigned a unique integer index. For example, the word "the" might be assigned the index 1, "and" might be assigned 2, and so on.

3. **`total_words = len(tokenizer.word_index) + 1`**:
   - This line computes the vocabulary size that the model will use: the number of unique words plus one.
   - `tokenizer.word_index` is a dictionary where the keys are the words and the values are their corresponding integer indices.
   - `len(tokenizer.word_index)` gives the number of unique words in the text. The Keras `Tokenizer` numbers words starting at 1 and never assigns index `0` to a word, so `0` is free to serve as the padding value later. Adding 1 accounts for that extra index, which makes `total_words` the right vocabulary size for the embedding and output layers.

In summary, this code snippet prepares the text for use in a machine learning model by converting words into numerical representations. This numerical representation is essential because our RNNs work with numbers rather than raw text.
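
To see what such a word index looks like without loading TensorFlow, here is a simplified pure-Python stand-in. It is a sketch, not the Keras implementation: the helper `build_word_index` is hypothetical, and it only lowercases and splits on whitespace, whereas the real `Tokenizer` also strips punctuation.

```python
from collections import Counter

def build_word_index(text):
    """Rank words by frequency; index 1 goes to the most frequent word."""
    words = text.lower().split()
    counts = Counter(words)
    # Index 0 is left unused so it can serve as the padding value
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

sample = "the rose and the thorn and the bloom"
word_index = build_word_index(sample)
total_words = len(word_index) + 1

print(word_index)   # {'the': 1, 'and': 2, 'rose': 3, 'thorn': 4, 'bloom': 5}
print(total_words)  # 6
```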

The next code snippet creates sequences of tokenized words from the text. For each line in the text, it generates multiple n-gram sequences. Each n-gram sequence starts from the beginning of the line and extends one word at a time. These sequences are stored in the input_sequences list and will be used to train a model to predict the next word in a sequence, which is a common task in text generation and language modeling.

For example, if the line is "shall I compare thee," the sequences created will be:

- "shall I"
- "shall I compare"
- "shall I compare thee"

Each sequence helps the model learn to predict the next word given a sequence of words.
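
The same idea on token IDs, as a small sketch. The IDs below are made up for illustration (the real ones depend on the fitted tokenizer), and `prefix_sequences` is a hypothetical helper name.

```python
def prefix_sequences(token_list):
    """Return every prefix of length 2 or more."""
    return [token_list[:i + 1] for i in range(1, len(token_list))]

# Hypothetical token IDs for "shall i compare thee"
tokens = [12, 5, 48, 9]
sequences = prefix_sequences(tokens)
print(sequences)   # [[12, 5], [12, 5, 48], [12, 5, 48, 9]]
```

In each prefix, the last ID is the word to predict and the IDs before it are the context.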

In [3]:
# Create input sequences using the tokenized text
input_sequences = []
for line in text.split('\n'):
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i+1]
        input_sequences.append(n_gram_sequence)

The next code snippet ensures that all sequences in input_sequences have the same length by padding shorter sequences with zeros at the beginning. This is necessary because neural networks require input data to have a uniform shape. By padding the sequences, the model can process them in batches without encountering shape mismatches. The longest sequence determines the length to which all other sequences are padded.

In [4]:
# Pad sequences for equal length
max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))
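
For intuition, pre-padding can be sketched in a few lines of plain Python. The helper `pad_pre` is a hypothetical stand-in for `pad_sequences(..., padding='pre')` and assumes no sequence is longer than `maxlen`.

```python
def pad_pre(sequences, maxlen):
    """Left-pad each list with zeros up to maxlen (assumes none is longer)."""
    return [[0] * (maxlen - len(seq)) + seq for seq in sequences]

sequences = [[12, 5], [12, 5, 48], [12, 5, 48, 9]]
max_len = max(len(seq) for seq in sequences)
padded = pad_pre(sequences, max_len)
print(padded)   # [[0, 0, 12, 5], [0, 12, 5, 48], [12, 5, 48, 9]]
```

Padding at the front keeps the word to predict in the last position of every row.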

##**1. Preprocessing**
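Now that our data is prepared, we can create our predictors (input attributes) and label (output attribute). For each padded sequence, the predictors are all words except the last one, and the label is the last word. Because we will train with the `categorical_crossentropy` loss, we one-hot encode the label so that each target word becomes a vector with a single 1 at its index.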

In [5]:
# Create predictors and label
X, y = input_sequences[:,:-1],input_sequences[:,-1]

# One-hot encode the label
y = tf.keras.utils.to_categorical(y, num_classes=total_words)
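
Here is the same split and encoding on a toy array, as a sketch. The numbers are made up, and `np.eye` is used here as a simple stand-in for `to_categorical`.

```python
import numpy as np

padded = np.array([[0, 0, 1, 2],
                   [0, 1, 2, 3],
                   [1, 2, 3, 4]])
vocab_size = 5                      # indices 0..4, with 0 reserved for padding

X, y = padded[:, :-1], padded[:, -1]
y_one_hot = np.eye(vocab_size)[y]   # row i of the identity matrix encodes class i

print(X.tolist())            # [[0, 0, 1], [0, 1, 2], [1, 2, 3]]
print(y.tolist())            # [2, 3, 4]
print(y_one_hot[0].tolist()) # [0.0, 0.0, 1.0, 0.0, 0.0]
```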

##**2. Building and Compiling the Model**
Now we are ready to build and compile a recurrent neural network for text generation. The model consists of an embedding layer that converts word indices into dense vectors, a simple RNN layer with 150 units that processes the sequences, and a dense output layer with softmax activation that produces a probability distribution over the vocabulary. The model is compiled with categorical cross-entropy loss and the Adam optimizer, and it tracks accuracy as a performance metric.

In [6]:
# Build the model
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(total_words, 100, input_length=max_sequence_len-1),
    tf.keras.layers.SimpleRNN(150, return_sequences=False),
    tf.keras.layers.Dense(total_words, activation='softmax')
])

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

###**Here is how we are building the Sequential model as an RNN:**

**1. Embedding Layer:**

`tf.keras.layers.Embedding(total_words, 100, input_length=max_sequence_len-1)`

This layer turns positive integers (indexes) into dense vectors of fixed size.
* total_words is the size of the vocabulary.
* 100 is the dimension of the dense embedding.
* input_length=max_sequence_len-1 specifies the length of input sequences after padding.

This layer effectively learns a vector representation for each word in the vocabulary.

**2. SimpleRNN Layer:**

`tf.keras.layers.SimpleRNN(150, return_sequences=False)`

This is a simple recurrent neural network (RNN) layer with 150 units.
* 150 is the number of units in the RNN.
* return_sequences=False means the layer will only return the output of the last time step, not the entire sequence.

This is suitable for many-to-one problems.

**3. Dense Layer:**

`tf.keras.layers.Dense(total_words, activation='softmax')`

This is a fully connected (Dense) layer with total_words units.
* total_words corresponds to the size of the vocabulary: each unit outputs the probability of one word.
* activation='softmax' ensures the output is a probability distribution over the vocabulary, making it suitable for classification tasks.

**Compile the Model:**

`model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])`
* **loss**:
categorical_crossentropy is used as the loss function, which is suitable for multi-class classification problems.
* **optimizer:**
adam is an optimization algorithm that adapts the step size for each parameter during training. It is widely used due to its efficiency and effectiveness.
* **metrics:**
['accuracy'] indicates that the model will also track accuracy as a metric during training and evaluation.

Not much of this should be new at this point.
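
If you want to check the numbers that `model.summary()` would report, the parameter counts for this architecture can be worked out by hand. The sketch below assumes the layer sizes used above (100-dimensional embeddings, 150 RNN units); the vocabulary size of 3000 is a hypothetical value, not the real `total_words`.

```python
def count_parameters(total_words, embed_dim=100, rnn_units=150):
    embedding = total_words * embed_dim
    # SimpleRNN: input weights + recurrent weights + one bias per unit
    simple_rnn = rnn_units * (embed_dim + rnn_units + 1)
    # Dense: one weight per (unit, word) pair + one bias per word
    dense = (rnn_units + 1) * total_words
    return {"embedding": embedding, "simple_rnn": simple_rnn, "dense": dense}

# 3000 is a hypothetical vocabulary size, not the real value of total_words
params = count_parameters(total_words=3000)
print(params)
# {'embedding': 300000, 'simple_rnn': 37650, 'dense': 453000}
```

Note that the recurrent layer is small compared with the two layers whose size grows with the vocabulary.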

##**3. Training the Model**
The next code snippet trains the neural network model using the input data (`X`) and target data (`y`). The model is trained for 100 epochs, meaning the entire dataset is passed through the network 100 times. The training process is monitored and printed to the console due to the `verbose=1` parameter. The result of this training process is stored in the `history` variable, which contains details about the training loss and accuracy for each epoch. This information can be useful for evaluating the model's performance and for plotting training curves.

In [None]:
# Fit the model
history = model.fit(X, y, epochs=100, verbose=1)

### Step-by-Step Breakdown:

1. **Fit the Model**:
   ```python
   history = model.fit(X, y, epochs=100, verbose=1)
   ```

   - **model.fit**:
     - This method trains the model on the given data (`X` and `y`).
     - `X` represents the input sequences used for training.
     - `y` represents the corresponding labels (one-hot encoded target words).

   - **Parameters**:
     - `X`: The input data, which are the sequences of tokens created from the text. Each sequence is padded to the same length.
     - `y`: The target data, which are the one-hot encoded next words corresponding to each input sequence.
     - `epochs=100`: The number of times the entire training dataset is passed forward and backward through the neural network. Training for more epochs usually leads to better performance, up to a point where the model starts to overfit.
     - `verbose=1`: This parameter controls how much information is displayed during training. A value of `1` means that the progress bar and training status will be printed to the console for each epoch.
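
The object returned by `model.fit` keeps its per-epoch numbers in `history.history`, a dictionary of lists. As a sketch of how you might inspect it, the hypothetical helper below finds the epoch with the lowest loss; the numbers in `example_history` are made up for illustration only.

```python
def best_epoch(history_dict, metric="loss"):
    """Return the 1-based epoch with the lowest value of `metric`."""
    values = history_dict[metric]
    best = min(range(len(values)), key=values.__getitem__)
    return best + 1, values[best]

# Made-up numbers in the shape of history.history, for illustration only
example_history = {
    "loss": [6.2, 5.1, 4.3, 3.9, 4.0],
    "accuracy": [0.03, 0.08, 0.15, 0.21, 0.20],
}
print(best_epoch(example_history))   # (4, 3.9)
```

After training, you could call it as `best_epoch(history.history)`.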

##**4. Using the Model**
This is different from what you have seen before--because now we are generating text.

The generate_text function below iteratively generates text using a deep learning model (model) trained on sequences of tokens. It starts with a seed_text, predicts the next word based on the model's output probabilities, and continues until the desired number of words (next_words) is generated. The function relies on the tokenization and padding we did initially, and on the prediction mechanisms we just set up to produce coherent text based on the learned patterns in the training data.

In [8]:
# Function to generate text
def generate_text(seed_text, next_words, model, max_sequence_len):
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
        predicted = np.argmax(model.predict(token_list, verbose=0), axis=-1)
        output_word = ""
        for word, index in tokenizer.word_index.items():
            if index == predicted:
                output_word = word
                break
        seed_text += " " + output_word
    return seed_text

Here's an explanation of what happens in this function:

### Step-by-Step Breakdown:

1. **Function Definition**:
   ```python
   def generate_text(seed_text, next_words, model, max_sequence_len):
   ```
   - **Parameters**:
     - `seed_text`: The starting text sequence from which to generate additional words.
     - `next_words`: The number of words to generate beyond the `seed_text`.
     - `model`: The trained deep learning model used for text generation.
     - `max_sequence_len`: The length of the longest training sequence, including the word to predict. The model itself expects inputs of length `max_sequence_len-1`.

2. **Tokenization and Padding**:
   ```python
   token_list = tokenizer.texts_to_sequences([seed_text])[0]
   token_list = pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')
   ```
   - **`tokenizer.texts_to_sequences([seed_text])[0]`**:
     - Converts `seed_text` into a sequence of tokens (numbers) based on the tokenizer's vocabulary.
   - **`pad_sequences([token_list], maxlen=max_sequence_len-1, padding='pre')`**:
     - Pads the token sequence to match the expected input shape of the model (`max_sequence_len-1`).

3. **Prediction and Output**:
   ```python
   predicted = np.argmax(model.predict(token_list, verbose=0), axis=-1)
   ```
   - **`model.predict(token_list, verbose=0)`**:
     - Generates predictions from the model for the padded token sequence `token_list`.
   - **`np.argmax(..., axis=-1)`**:
     - Retrieves the index of the word with the highest predicted probability from the model's output.

4. **Mapping Predictions to Words**:
   ```python
   output_word = ""
   for word, index in tokenizer.word_index.items():
       if index == predicted:
           output_word = word
           break
   ```
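   - **`tokenizer.word_index.items()`**:
     - Provides the (word, index) pairs from the tokenizer's vocabulary, which maps each word to its integer index.
   - **Matching Predicted Index**:
     - Loops over these pairs until it finds the word whose index equals the one returned by `np.argmax`. If no word matches (for example, index 0, which is reserved for padding), `output_word` stays empty.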

5. **Generating Text**:
   ```python
   seed_text += " " + output_word
   ```
   - Appends the predicted word to `seed_text`, preparing it for the next iteration or as the final generated text.

6. **Return**:
   ```python
   return seed_text
   ```
   - Returns the generated text sequence after the specified number of words (`next_words`) have been added.
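
The word lookup in step 4 scans the whole vocabulary for every generated word. One alternative, sketched below on a made-up miniature vocabulary, is to invert the dictionary once and then look words up directly. The Keras `Tokenizer` also exposes a ready-made `index_word` dictionary after fitting, which could serve the same purpose.

```python
# Hypothetical miniature vocabulary in the same shape as tokenizer.word_index
word_index = {"the": 1, "and": 2, "rose": 3}

# Invert once: index -> word
index_word = {index: word for word, index in word_index.items()}

predicted_index = 3
print(index_word.get(predicted_index, ""))   # rose
print(repr(index_word.get(0, "")))           # ''  (0 is the padding index)
```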

##**5. Processing the User-Generated Input**
Another new thing! The user now supplies input.

This part of the code shows how to use our trained model (`model`) to generate text from an initial `seed_text`. It calls the `generate_text` function, which repeatedly predicts a word and appends it to the `seed_text`, producing a longer text sequence (`generated_text`). The `next_words` parameter controls how many words are generated beyond the `seed_text`.

In [None]:
# Generate text
seed_text = "shall i compare thee to a summer's day?"
next_words = 100
generated_text = generate_text(seed_text, next_words, model, max_sequence_len)
print(generated_text)

This code snippet generates text based on a provided `seed_text` using the model we just trained (`model`). Here's a breakdown of what each line does:

1. **Seed Text Definition**:
   ```python
   seed_text = "shall i compare thee to a summer's day?"
   ```
   - Defines the starting text sequence (`seed_text`) from which the text generation process will begin.

2. **Next Words to Generate**:
   ```python
   next_words = 100
   ```
   - Specifies the number of words (`next_words`) to generate beyond the `seed_text`.

3. **Text Generation**:
   ```python
   generated_text = generate_text(seed_text, next_words, model, max_sequence_len)
   ```
   - Calls the `generate_text` function with parameters:
     - `seed_text`: The initial text sequence to start generating from.
     - `next_words`: The number of words to generate beyond the `seed_text`.
     - `model`: The trained deep learning model used for text generation.
     - `max_sequence_len`: The length of the longest training sequence, including the word to predict. The model itself expects inputs of length `max_sequence_len-1`.

4. **Print Generated Text**:
   ```python
   print(generated_text)
   ```
   - Outputs (`print`) the generated text (`generated_text`) to the console.

And that is your first Recurrent Neural Network in action!

## Your Turn
Run the code above a number of times. How does the output change when you run the code twice, 5 times, 10 times? Record your observations below!

##**Limitations**
Simple RNN models usually run into **two major issues**. Both are related to the gradient, which is the slope of the loss function with respect to the model's weights.

1. **Vanishing Gradient problem**
  <img src = "https://www.kdnuggets.com/wp-content/uploads/vanishing-gradient-problem-12.png">
  occurs when the gradient becomes so small that parameter updates become insignificant; eventually, the algorithm stops learning.
2. **Exploding Gradient problem**
  <img src = "https://emergency.princeton.edu/sites/g/files/toruqf5936/files/styles/freeform_750w/public/2023-07/adobestock_explosion.jpeg?itok=clWTY0kY">
  occurs when the gradient becomes too large, which makes the model unstable. In this case, large error gradients accumulate, and the updates to the model weights become too large. This issue can cause longer training times and poor model performance. It is less common than the Vanishing Gradient problem.

Advanced RNN architectures such as LSTM and GRU mitigate the Vanishing Gradient problem.
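
A quick numeric sketch shows why both problems get worse with longer sequences. It assumes, as a simplification, that the gradient is multiplied by the same factor at every time step; the factors, the 50 steps, and the clipping limit are arbitrary illustrative values. Clipping is shown as one common remedy for exploding gradients.

```python
def repeated_product(factor, steps):
    """Multiply a gradient of 1.0 by the same factor once per time step."""
    grad = 1.0
    for _ in range(steps):
        grad *= factor
    return grad

def clip(value, limit):
    """Cap the magnitude of a scalar gradient, a common remedy for explosion."""
    return max(-limit, min(limit, value))

vanishing = repeated_product(0.5, 50)
exploding = repeated_product(1.5, 50)
print(f"{vanishing:.2e}")     # 8.88e-16
print(f"{exploding:.2e}")     # 6.38e+08
print(clip(exploding, 5.0))   # 5.0
```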

Read more about RNNs [here](https://www.datacamp.com/tutorial/tutorial-for-recurrent-neural-network).

To preview LSTMs, go [here](https://towardsdatascience.com/recurrent-neural-networks-by-example-in-python-ffd204f99470).


#**MasterCard Stock Price Prediction Using LSTM & GRU**
Text generation is only one thing that we can do with Recurrent Neural Networks. We can also use them to predict numbers.

In the next project, we are going to use Kaggle’s MasterCard stock dataset, covering May 25, 2006 to October 11, 2021, and train LSTM and GRU models to forecast the stock price. This is a simple project-based tutorial in which we will analyze the data, preprocess it for training advanced RNN models, and finally evaluate the results.

The project requires Pandas and Numpy for data manipulation, Matplotlib.pyplot for data visualization, scikit-learn for scaling and evaluation, and TensorFlow for modeling. We will also set seeds for reproducibility.

##**0. Importing the Libraries and the Data**

In [None]:
# Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout, GRU, Bidirectional
from tensorflow.keras.optimizers import SGD
from tensorflow.random import set_seed
set_seed(455)
np.random.seed(455)

dataset = pd.read_csv(
    "https://raw.githubusercontent.com/kalilurrahman/MasterCardStockData/main/Mastercard_stock_history.csv", index_col="Date", parse_dates=["Date"]
).drop(["Dividends", "Stock Splits"], axis=1)
print(dataset.head())

##**1. Exploratory Data Analysis (EDA)**

In [None]:
dataset.describe()

In [None]:
dataset.isna().sum()

In [None]:
# Checking the data distribution
tstart = 2016
tend = 2020

def train_test_plot(dataset, tstart, tend):
    dataset.loc[f"{tstart}":f"{tend}", "High"].plot(figsize=(16, 4), legend=True)
    dataset.loc[f"{tend+1}":, "High"].plot(figsize=(16, 4), legend=True)
    plt.legend([f"Train (Before {tend+1})", f"Test ({tend+1} and beyond)"])
    plt.title("MasterCard stock price")
    plt.show()

train_test_plot(dataset,tstart,tend)

##**2. Preprocessing**


In [None]:
#Setting up Training and Test Sets
def train_test_split(dataset, tstart, tend):
    train = dataset.loc[f"{tstart}":f"{tend}", "High"].values
    test = dataset.loc[f"{tend+1}":, "High"].values
    return train, test
training_set, test_set = train_test_split(dataset, tstart, tend)

In [None]:
#Normalizing the inputs to the range [0, 1] with MinMaxScaler (the scaler is fit on the training set only)
sc = MinMaxScaler(feature_range=(0, 1))
training_set = training_set.reshape(-1, 1)
training_set_scaled = sc.fit_transform(training_set)

In [None]:
print("This is the beginning of the Training Set BEFORE scaling \n",training_set)
print("This is the beginning of the Training Set AFTER scaling \n",training_set_scaled)
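
What the scaler computes can be sketched by hand: subtract the minimum and divide by the range. The helper `min_max_scale` and the prices below are made up for illustration; the last line mirrors what `inverse_transform` does later.

```python
import numpy as np

def min_max_scale(values):
    """Rescale to [0, 1] using the minimum and maximum of `values`."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), lo, hi

prices = np.array([100.0, 150.0, 125.0, 200.0])   # made-up prices
scaled, lo, hi = min_max_scale(prices)
restored = scaled * (hi - lo) + lo                  # the inverse transform

print(scaled.tolist())     # [0.0, 0.5, 0.25, 1.0]
print(restored.tolist())   # [100.0, 150.0, 125.0, 200.0]
```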

In [None]:
#The split_sequence function uses a training dataset and converts it into inputs (X_train) and outputs (y_train).

def split_sequence(sequence, n_steps):
    X, y = list(), list()   # initialize two empty lists called X and y
    for i in range(len(sequence)): # loop through the sequence argument and calculate the end_ix variable by adding the current index i to the n_steps argument
        end_ix = i + n_steps
        if end_ix > len(sequence) - 1: # If end_ix is greater than the length of the sequence minus 1, the loop is broken
            break
        seq_x, seq_y = sequence[i:end_ix], sequence[end_ix] # create two variables seq_x and seq_y by slicing the sequence from the current index i to end_ix and selecting the value at end_ix
        X.append(seq_x) # append to the X and y lists
        y.append(seq_y)
    return np.array(X), np.array(y) # return X and y as numpy arrays


n_steps = 60 # initialize with 60
features = 1
# split into samples
X_train, y_train = split_sequence(training_set_scaled, n_steps) # assign the output of calling the split_sequence function with the training_set_scaled argument and n_steps variable
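
A tiny worked example makes the windowing easier to picture. The function `sliding_windows` below is a hypothetical compact version that is intended to produce the same windows as `split_sequence`; the series and the window length of 3 are chosen only for illustration.

```python
import numpy as np

def sliding_windows(sequence, n_steps):
    """Pair each window of n_steps values with the value that follows it."""
    n_samples = len(sequence) - n_steps
    X = [sequence[i:i + n_steps] for i in range(n_samples)]
    y = [sequence[i + n_steps] for i in range(n_samples)]
    return np.array(X), np.array(y)

series = np.array([10, 20, 30, 40, 50, 60])
X_demo, y_demo = sliding_windows(series, n_steps=3)
print(X_demo.tolist())   # [[10, 20, 30], [20, 30, 40], [30, 40, 50]]
print(y_demo.tolist())   # [40, 50, 60]
```

With `n_steps = 60`, each training sample is 60 consecutive prices and the label is the price on the following day.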

In [None]:
# Keras recurrent layers expect input shaped [samples, timesteps, features]. We are working with a univariate series, so the number of features is one.
# The reshape below makes that 3D shape explicit before X_train is passed to the LSTM model.

# Reshaping X_train for model
X_train = X_train.reshape(X_train.shape[0],X_train.shape[1],features)

##**3. LSTM Model**
The Long Short-Term Memory (LSTM) network is an advanced type of RNN, designed to mitigate the gradient problems described above, in particular the vanishing (decaying) gradient. Just like a simple RNN, an LSTM has repeating modules, but their structure is different. Instead of having a single tanh layer, an LSTM module has four interacting layers that communicate with each other through three gates: the Forget Gate, the Input Gate, and the Output Gate. This structure helps the LSTM retain long-term memory, which makes it useful for many sequential problems, including machine translation, speech synthesis, speech recognition, and handwriting recognition.

This is what an LSTM model looks like:

<img src="https://cdn-images-1.medium.com/max/1500/1*Mw4W7FZUbSr4EoriB5GuqQ.jpeg">
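
The gates can be sketched in NumPy as a single time step. This is an illustration under assumptions, not the Keras implementation: the weights are random, the sizes are tiny, and the order in which the four layers are stacked (`f, i, g, o`) is a choice made for this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b hold the four layers stacked as f, i, g, o."""
    z = W @ x_t + U @ h_prev + b
    f, i, g, o = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)   # forget, input, output gates
    g = np.tanh(g)                                 # candidate values
    c_t = f * c_prev + i * g                       # update the cell state
    h_t = o * np.tanh(c_t)                         # expose part of it as output
    return h_t, c_t

rng = np.random.default_rng(0)
units, features = 4, 1                             # illustrative sizes
W = rng.normal(size=(4 * units, features))
U = rng.normal(size=(4 * units, units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(6, features)):         # six time steps
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)   # (4,) (4,)
```

The cell state `c` is changed only by element-wise scaling and addition, which is what lets information survive over many time steps.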

Our model will consist of a single hidden LSTM layer and an output layer. You can experiment with the number of units: more units give the model more capacity, which often improves the fit but also increases training time and the risk of overfitting. For this experiment, we will use 125 LSTM units with tanh activation and set the input shape to `(n_steps, features)`.

We will compile the model with an RMSprop optimizer and mean square error as a loss function.

###**3.1 Build the Model**

In [None]:
# The LSTM architecture
model_lstm = Sequential()
model_lstm.add(LSTM(units=125, activation="tanh", input_shape=(n_steps, features)))
model_lstm.add(Dense(units=1))
# Compiling the model
model_lstm.compile(optimizer="RMSprop", loss="mse")

model_lstm.summary()

###**3.2 Train the Model**

In [None]:
model_lstm.fit(X_train, y_train, epochs=50, batch_size=32) # train for 50 epochs with a batch size of 32

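In this case, the loss value is very small, which indicates that the model fits the training data well. Keep in mind that the loss is computed on the scaled prices (between 0 and 1), and that a low training loss alone does not tell us how the model performs on unseen data.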

###**3.3 Running the model on the Test Set**
We are going to repeat the preprocessing for the test set: first we scale the inputs with the scaler fitted on the training set, then we split them into samples, reshape them, predict, and finally inverse transform the predictions back to actual prices.

In [None]:
dataset_total = dataset.loc[:,"High"] # select the "High" column from the full dataset
# Take the test period plus the n_steps values before it, so that the first test window has a full history
inputs = dataset_total[len(dataset_total) - len(test_set) - n_steps :].values
inputs = inputs.reshape(-1, 1) # reshape inputs to a 2D array with one column
#scaling
inputs = sc.transform(inputs) # scale inputs with the scaler that was fit on the training set

# Split into samples
X_test, y_test = split_sequence(inputs, n_steps)
# reshape
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], features) # reshape X_test array to a 3D array with dimensions (number of samples, n_steps, features)
#prediction
predicted_stock_price = model_lstm.predict(X_test) # predict the stock prices based on the input sequences in X_test
#inverse transform the values
predicted_stock_price = sc.inverse_transform(predicted_stock_price) # inverse transform using sc.inverse_transform() to obtain the actual stock prices.

###**3.4 Plot Predictions vs. Actual Values from the Test Set**

In [None]:
def plot_predictions(test, predicted): #  take in two arguments, test and predicted, which are arrays representing the real and predicted values of a stock price over time
    plt.plot(test, color="gray", label="Real") # use the matplotlib  to plot these two arrays on a graph, with the real values in gray and the predicted values in red.
    plt.plot(predicted, color="red", label="Predicted")
    plt.title("MasterCard Stock Price Prediction")
    plt.xlabel("Time")
    plt.ylabel("MasterCard Stock Price")
    plt.legend()
    plt.show()


def return_rmse(test, predicted):
    rmse = np.sqrt(mean_squared_error(test, predicted)) #  use numpy to calculate the root mean squared error (RMSE) between the two arrays
    print("The root mean squared error is {:.2f}.".format(rmse))

plot_predictions(test_set,predicted_stock_price)
return_rmse(test_set,predicted_stock_price)
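
The RMSE is expressed in the same units as the stock price, so it can be read as a typical prediction error in dollars. Here is the calculation by hand on made-up numbers; the helper `rmse` is a plain-Python sketch of what `return_rmse` computes.

```python
import math

def rmse(actual, predicted):
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

actual = [350.0, 352.0, 349.0, 355.0]      # made-up prices
predicted = [348.0, 353.0, 351.0, 352.0]   # made-up predictions
print(f"{rmse(actual, predicted):.2f}")    # 2.12
```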


##**4. GRU Model**

The gated recurrent unit (GRU) is a variation of LSTM: the two share design similarities and, in some cases, produce similar results. GRU uses an update gate and a reset gate to address the vanishing gradient problem. These gates decide what information is important and pass it to the output. They can be trained to retain information from long ago so that it does not vanish over time, and to discard information that is irrelevant.

Unlike LSTM, GRU does not have a cell state Ct. It only has a hidden state ht, and because of its simpler architecture, GRU typically trains faster than LSTM. The GRU takes the input x(t) and the hidden state from the previous time step h(t-1) and outputs the new hidden state h(t).

<img src ="https://cdn-images-1.medium.com/max/1500/1*zFhmhw_SZcX4kUVQH-z2aw.jpeg">
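
One GRU time step can be sketched in NumPy as follows. As with any sketch, the details are assumptions: the weights are random, the sizes are tiny, and the way the update gate blends old and new state follows one common convention (some references swap the roles of `z` and `1 - z`).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU time step; W, U, b hold the layers stacked as z, r, candidate."""
    W_z, W_r, W_h = np.split(W, 3)
    U_z, U_r, U_h = np.split(U, 3)
    b_z, b_r, b_h = np.split(b, 3)
    z = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)              # update gate
    r = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)              # reset gate
    h_cand = np.tanh(W_h @ x_t + U_h @ (r * h_prev) + b_h)   # candidate state
    return (1.0 - z) * h_prev + z * h_cand                   # blend old and new

rng = np.random.default_rng(0)
units, features = 4, 1                                       # illustrative sizes
W = rng.normal(size=(3 * units, features))
U = rng.normal(size=(3 * units, units))
b = np.zeros(3 * units)

h = np.zeros(units)
for x_t in rng.normal(size=(6, features)):                   # six time steps
    h = gru_step(x_t, h, W, U, b)
print(h.shape)   # (4,)
```

There is no separate cell state here: the hidden state `h` is the only thing carried from one time step to the next.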

We are going to keep everything the same and just replace the LSTM layer with the GRU layer so we can compare the results. The model structure contains a single GRU layer with 125 units and an output layer.


###**4.1 Build the Model**

In [None]:
model_gru = Sequential()
model_gru.add(GRU(units=125, activation="tanh", input_shape=(n_steps, features)))
model_gru.add(Dense(units=1))
# Compiling the RNN
model_gru.compile(optimizer="RMSprop", loss="mse")

model_gru.summary()
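
The simpler architecture also shows up in the parameter counts of the recurrent layers. The sketch below works them out by hand for 125 units and one feature. It assumes the standard LSTM layout (four layers with one bias each) and Keras' default GRU configuration (`reset_after=True`, which keeps two bias vectors per layer); compare the results with the numbers your own summaries report.

```python
def lstm_params(units, features):
    # Four layers, each with input weights, recurrent weights, and a bias
    return 4 * (units * (features + units) + units)

def gru_params(units, features, reset_after=True):
    # Three layers; with reset_after=True there are two bias vectors per layer
    biases = 2 * units if reset_after else units
    return 3 * (units * (features + units) + biases)

print(lstm_params(125, 1))   # 63500
print(gru_params(125, 1))    # 48000
```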

###**4.2 Train the Model**

In [None]:
model_gru.fit(X_train, y_train, epochs=50, batch_size=32)

###**4.3 Run the Model on the Test Set**

In [None]:
GRU_predicted_stock_price = model_gru.predict(X_test)
GRU_predicted_stock_price = sc.inverse_transform(GRU_predicted_stock_price)

###**4.4 Plot Predictions vs. Actual Values from the Test Set**

In [None]:
plot_predictions(test_set, GRU_predicted_stock_price)
return_rmse(test_set,GRU_predicted_stock_price)