# LSTM Stock Predictor Using Closing Prices

## Data Preparation

In this section, we will prepare the training and testing data for the LSTM model.

We will need to:
1. Use the `window_data` function to generate the X and y values for the model.
2. Split the data into 70% training and 30% testing.
3. Apply the MinMaxScaler to the `X` and `y` values
4. Reshape the `X_train` and `X_test` data for the model.

**Note:** The required input format for the LSTM is:

```python
reshape((X_train.shape[0], X_train.shape[1], 1))
```
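
As a minimal sketch of that reshape, assuming a small made-up 2D array standing in for `X_train` (the name `X_demo` and its values are illustrative only):

```python
import numpy as np

# Hypothetical stand-in for X_train: 4 samples, 3 values per sample
X_demo = np.arange(12, dtype=float).reshape(4, 3)

# Add a trailing axis so the array becomes (samples, time steps, features per step)
X_demo_3d = X_demo.reshape((X_demo.shape[0], X_demo.shape[1], 1))

print(X_demo.shape)     # (4, 3)
print(X_demo_3d.shape)  # (4, 3, 1)
```

The values are unchanged; only the array gains a third axis of length 1.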

In [188]:
# Initial imports
import numpy as np
import pandas as pd
from pathlib import Path
from collections import Counter
from sklearn.metrics import balanced_accuracy_score
from sklearn.metrics import confusion_matrix
from imblearn.metrics import classification_report_imbalanced

%matplotlib inline

In [189]:
file_beef = Path("Beef84_22.csv")
file_eggs = Path("Eggs80_22.csv")
file_bread = Path("Bread80_22.csv")
file_chicken = Path("Chicken80_22.csv")
file_diesel = Path("Diesel98_22s.csv")
file_electric = Path("Electricity79_22.csv")
file_energy = Path("Energy00_22.csv")
file_flour = Path("Flour80_22.csv")
file_fuel = Path("Fuel79_22.csv")
file_gas = Path("Gasoline80_22.csv")
file_malt = Path("Malt96_22.csv")
file_medical = Path("Medical00_22.csv")
file_milk = Path("Milk95_22.csv")
file_pres = Path("Prescription80_22.csv")
file_shelter = Path("Shelter80_22.csv")
file_sugar = Path("Sugar80_22.csv")
file_utility = Path("Utility00_22.csv")

file_cpi = Path("CPI_Average.csv")

In [190]:
beef_file = pd.read_csv(file_beef)
eggs_file = pd.read_csv(file_eggs)
bread_file = pd.read_csv(file_bread)
chicken_file = pd.read_csv(file_chicken)
diesel_file = pd.read_csv(file_diesel)
electricity_file = pd.read_csv(file_electric)
energy_file = pd.read_csv(file_energy)
flour_file = pd.read_csv(file_flour)
fuel_file = pd.read_csv(file_fuel)
gasoline_file = pd.read_csv(file_gas)
malt_file = pd.read_csv(file_malt)
medical_file = pd.read_csv(file_medical)
milk_file = pd.read_csv(file_milk)
presecription_file = pd.read_csv(file_pres)
shelter_file = pd.read_csv(file_shelter)
sugar_file = pd.read_csv(file_sugar)
utility_file = pd.read_csv(file_utility)

CPI_file = pd.read_csv(file_cpi)

In [191]:
beef_file.head()
eggs_file.head()

,Year,Eggs
0,01-01-1980,0.879
1,02-01-1980,0.774
2,03-01-1980,0.812
3,04-01-1980,0.797
4,05-01-1980,0.737


In [192]:
def to_pct_change(frame):
    """Index a raw price table by its parsed 'Year' column and return percentage changes."""
    df = pd.DataFrame(frame)
    df = df.set_index(pd.to_datetime(df['Year'], infer_datetime_format=True))
    df = df.drop(columns=['Year'])
    return df.pct_change()

df_beef = to_pct_change(beef_file)
df_eggs = to_pct_change(eggs_file)
df_bread = to_pct_change(bread_file)
df_chicken = to_pct_change(chicken_file)
df_diesel = to_pct_change(diesel_file)
df_electric = to_pct_change(electricity_file)
df_energy = to_pct_change(energy_file)
df_flour = to_pct_change(flour_file)
df_fuel = to_pct_change(fuel_file)
df_gas = to_pct_change(gasoline_file)
df_malt = to_pct_change(malt_file)
df_medical = to_pct_change(medical_file)
df_milk = to_pct_change(milk_file)
df_pres = to_pct_change(presecription_file)
df_shelter = to_pct_change(shelter_file)
df_sugar = to_pct_change(sugar_file)
df_utility = to_pct_change(utility_file)
df_cpi = to_pct_change(CPI_file)
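
As a small illustration of what the `pct_change()` calls above produce, here is a sketch on a made-up three-row price table (the column name `Price` and the values are illustrative only). The first row has no previous observation, so its change is `NaN`.

```python
import numpy as np
import pandas as pd

# Made-up monthly prices; the column name and values are illustrative only
prices = pd.DataFrame(
    {"Price": [2.00, 2.10, 1.89]},
    index=pd.to_datetime(["2000-01-01", "2000-02-01", "2000-03-01"]),
)

# Each row becomes (current - previous) / previous; the first row is NaN
changes = prices.pct_change()
print(changes)
```

This is why the first value of each transformed series is missing.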

In [193]:
df_beef.head()

Year,Beef
1980-01-01,
1984-02-01,0.03876
1984-03-01,-0.023881
1984-04-01,0.017584
1984-05-01,-0.022539


In [194]:
group_data = pd.concat(
    [df_beef, df_chicken, df_eggs, df_bread, df_diesel, df_electric,
     df_energy, df_flour, df_fuel, df_gas, df_malt, df_medical,
     df_milk, df_utility, df_sugar, df_shelter, df_cpi],
    axis="columns",
    join="inner",
)

group_data.head(100)

Year,Beef,Chicken,Eggs,Bread,Diesel,Electric79_22,Energy,Flour,Fuel,Gasoline,Malt,Medical,Milk,Utility,Sugar,Shelter,CPI
2000-01-01,-0.029450,0.005698,0.059783,0.008899,0.025792,-0.011765,,0.071161,0.092831,0.003587,0.058419,,-0.031304,,0.025822,0.007953,0.002971
2000-02-01,0.022252,-0.012276,-0.013333,0.018743,0.123563,0.011905,0.034930,-0.066434,0.357443,0.048257,-0.046537,0.005970,-0.002873,0.020937,-0.011442,0.004734,0.005924
2000-03-01,0.010554,0.017208,-0.032225,0.000000,0.014706,0.000000,0.050145,0.067416,-0.157993,0.115942,0.069240,0.003956,-0.010443,-0.005859,-0.006944,0.006283,0.008245
2000-04-01,0.033943,0.004699,0.008593,0.003247,-0.038437,0.000000,-0.010101,-0.007018,-0.053716,-0.022918,-0.057325,0.001970,0.009098,0.000000,-0.034965,0.000520,0.000584
2000-05-01,-0.008838,-0.015903,-0.092652,-0.012945,-0.009174,0.000000,0.001855,0.067138,-0.017885,-0.002346,0.052928,0.002950,0.003246,0.008841,0.024155,0.000520,0.001168
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2007-12-01,-0.024028,0.004307,0.127282,0.036437,0.000292,0.000000,-0.006429,0.025773,0.024613,-0.015915,-0.039439,0.002534,-0.008709,0.004128,-0.013725,0.000681,-0.000671
2008-01-01,0.042077,-0.002573,0.036208,0.000781,-0.002921,0.008696,0.009080,0.057789,0.027718,0.009923,0.055657,0.007051,0.000258,0.008501,0.031809,0.006185,0.004971
2008-02-01,0.022766,-0.003439,-0.003218,0.031226,0.010841,0.000000,0.000512,0.080760,0.000300,-0.005757,-0.032844,0.005105,-0.000517,0.006404,-0.011561,0.003752,0.002904
2008-03-01,-0.036959,0.007765,0.016144,0.021953,0.135652,0.000000,0.050315,0.074725,0.108149,0.072338,0.044683,0.000830,-0.022745,0.017878,-0.017544,0.004939,0.008668


In [195]:
group_data.tail(100)

Year,Beef,Chicken,Eggs,Bread,Diesel,Electric79_22,Energy,Flour,Fuel,Gasoline,Malt,Medical,Milk,Utility,Sugar,Shelter,CPI
2014-02-01,0.025382,-0.016351,-0.004980,0.016850,0.019370,0.000000,0.012323,0.022099,0.043033,0.011190,0.062998,0.007514,0.002534,0.011339,-0.020701,0.002489,0.003698
2014-03-01,0.040225,0.026596,0.031532,-0.020893,0.011654,0.007463,0.034338,-0.007207,-0.029470,0.050550,-0.064516,0.000990,0.030329,0.014671,-0.017886,0.003788,0.006440
2014-04-01,0.029746,-0.009067,0.028142,0.021339,-0.006011,-0.029630,0.007494,-0.007260,-0.030870,0.035705,0.067362,0.001574,0.004906,-0.019606,-0.008278,0.001743,0.003297
2014-05-01,0.012605,0.016993,-0.058046,0.009366,-0.009322,0.038168,0.013481,-0.027422,-0.003916,0.006656,-0.069121,0.001815,0.013019,0.018151,0.013356,0.003224,0.003493
2014-06-01,0.006224,-0.034062,-0.024048,-0.000714,-0.003306,0.051471,0.016759,0.000000,-0.009436,0.002835,0.083132,0.000671,-0.029183,0.028472,0.004942,0.002031,0.001862
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-01-01,-0.010860,0.009963,0.078859,0.015013,0.003824,0.035211,0.018268,0.092784,0.096718,0.000796,0.008940,0.009266,0.011755,0.029073,0.021708,0.003994,0.008415
2022-02-01,0.016689,0.006165,0.039399,0.014791,0.060952,0.006803,0.025413,0.009434,0.074682,0.053520,0.024051,0.003491,0.023237,0.000597,0.016997,0.005846,0.009134
2022-03-01,0.027430,0.056373,0.020449,0.018378,0.235701,0.013514,0.105373,0.046729,0.214391,0.197870,0.004944,0.005085,0.010839,0.014263,0.029248,0.005836,0.013351
2022-04-01,0.033424,0.040603,0.231672,0.003111,0.062474,0.006667,0.001888,0.015625,0.043628,-0.010087,-0.004305,0.003156,0.024253,0.010226,0.001353,0.005109,0.005583
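
With `join="inner"`, `pd.concat` keeps only the index labels present in every input frame. That is consistent with `group_data` starting at 2000-01-01, and with the `NaN` values in its first row for the series that begin in 2000. The sketch below reproduces the effect on two made-up frames (all names and values are illustrative). The `dropna()` step is one possible way to remove such rows, not something the notebook does.

```python
import pandas as pd

# Two made-up series with different start dates (names and values are illustrative)
long_series = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0]},
    index=pd.to_datetime(["1999-11-01", "1999-12-01", "2000-01-01", "2000-02-01"]),
)
short_series = pd.DataFrame(
    {"B": [10.0, 11.0]},
    index=pd.to_datetime(["2000-01-01", "2000-02-01"]),
)

# Inner join keeps only the dates present in both frames
joined = pd.concat(
    [long_series.pct_change(), short_series.pct_change()],
    axis="columns",
    join="inner",
)

# The first row of B is NaN (no prior value); dropna() removes such rows
clean = joined.dropna()
print(joined)
print(clean)
```

Whether dropping or filling the missing rows is appropriate depends on the modelling goal. Either way, `NaN` inputs are worth handling before training.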


In [196]:
# Set the random seed for reproducibility
# Note: This is used for model prototyping, but it is good practice to comment this out and run multiple experiments to evaluate your model.
#from numpy.random import seed

#seed(1)
#from tensorflow import random

#random.set_seed(2)

### Data Loading

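In this activity, we use monthly price series for consumer goods and services (beef, eggs, fuel, shelter, and so on) together with the CPI average, each converted to month-over-month percentage changes, to make predictions of future values based on the temporal data of each series.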

### Creating the Features `X` and Target `y` Data

The first step towards preparing the data is to create the input features vectors `X` and the target vector `y`. We will use the `window_data()` function to create these vectors.

This function chunks the data up with a rolling window of _X<sub>t</sub> - window_ to predict _X<sub>t</sub>_.

The function returns two `numpy` arrays:

* `X`: The input features vectors.

* `y`: The target vector.

The function has the following parameters:

* `df`: The original DataFrame with the time series data.

* `window`: The window size, in number of rows (time steps), of previous values that will be used for the prediction.

* `feature_col_number`: The column number from the original DataFrame where the features are located.

* `target_col_number`: The column number from the original DataFrame where the target is located.

In [197]:
def window_data(df, window, feature_col_number, target_col_number):
    """
    This function accepts the column number for the features (X) and the target (y).
    It chunks the data up with a rolling window of Xt - window to predict Xt.
    It returns two numpy arrays of X and y.
    """
    X = []
    y = []
    for i in range(len(df) - window):
        features = df.iloc[i : (i + window), feature_col_number]
        target = df.iloc[(i + window), target_col_number]
        X.append(features)
        y.append(target)
    return np.array(X), np.array(y).reshape(-1, 1)
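
To make the windowing concrete, the sketch below runs the same logic on a made-up six-row DataFrame (the `toy` frame and its `value` column are illustrative only). The function is repeated so the example runs on its own.

```python
import numpy as np
import pandas as pd

def window_data(df, window, feature_col_number, target_col_number):
    # Same logic as the function above, repeated so the example is self-contained
    X, y = [], []
    for i in range(len(df) - window):
        X.append(df.iloc[i : (i + window), feature_col_number])
        y.append(df.iloc[(i + window), target_col_number])
    return np.array(X), np.array(y).reshape(-1, 1)

toy = pd.DataFrame({"value": [10, 11, 12, 13, 14, 15]})
X_toy, y_toy = window_data(toy, window=3, feature_col_number=0, target_col_number=0)

print(X_toy)          # rows: [10 11 12], [11 12 13], [12 13 14]
print(y_toy.ravel())  # [13 14 15]
```

Each row of `X_toy` holds three consecutive values, and the matching entry of `y_toy` is the value that immediately follows them.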

In [198]:
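# Features: the first 14 columns (Beef through Utility)
X = group_data.iloc[:, 0:14].values
# Target: the slice 15:15 is empty (start equals stop), so y has shape (n_rows, 0)
y = group_data.iloc[:, 15:15].values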

In [199]:
# Use 70% of the data for training and the remainder for testing
split = int(0.7 * len(X))
X_train = X[: split]
X_test = X[split:]
y_train = y[: split]
y_test = y[split:]

In the next cell, we set a window size of `30` and set both the feature and target column numbers to `16` (the position of the `CPI` column in `group_data`). Note that the call to `window_data` is commented out, so the printed `X` and `y` are still the arrays sliced directly from `group_data` above.

In [200]:
# Creating the features (X) and target (y) data using the window_data() function.
window_size = 30

#feature_column = range(15)
feature_column = 16
target_column = 16
#X, y = window_data(group_data, window_size, feature_column, target_column)
print (f"X sample values:\n{X[:5]} \n")
print (f"y sample values:\n{y[:5]}")

X sample values:
[[-0.02945026  0.00569801  0.05978261  0.00889878  0.02579219 -0.01176471
          nan  0.07116105  0.09283088  0.00358744  0.05841924         nan
  -0.03130435         nan]
 [ 0.02225219 -0.01227573 -0.01333333  0.01874311  0.12356322  0.01190476
   0.03493014 -0.06643357  0.35744323  0.04825737 -0.0465368   0.00597015
  -0.00287253  0.02093719]
 [ 0.01055409  0.01720841 -0.03222453  0.          0.01470588  0.
   0.05014465  0.06741573 -0.15799257  0.11594203  0.0692395   0.00395648
  -0.01044292 -0.00585938]
 [ 0.03394256  0.00469925  0.00859291  0.00324675 -0.0384373   0.
  -0.01010101 -0.00701754 -0.05371597 -0.02291826 -0.05732484  0.00197044
   0.00909753  0.        ]
 [-0.00883838 -0.01590271 -0.09265176 -0.01294498 -0.00917431  0.
   0.00185529  0.06713781 -0.01788491 -0.00234558  0.05292793  0.00294985
   0.00324558  0.00884086]] 

y sample values:
[]


### Splitting Data Between Training and Testing Sets

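Because this is time-series data, the split must preserve chronological order. We therefore split the data manually with array slicing (the first 70% of rows for training, the rest for testing) instead of sampling rows at random.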
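
A minimal sketch of such an order-preserving split, using made-up arrays (the `_demo` names and values are illustrative only):

```python
import numpy as np

# Made-up ordered data: 10 time steps, 2 features (illustrative only)
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10).reshape(-1, 1)

split = int(0.7 * len(X_demo))  # index of the first test row

# Everything before the split index is training data, the rest is test data
X_train_demo, X_test_demo = X_demo[:split], X_demo[split:]
y_train_demo, y_test_demo = y_demo[:split], y_demo[split:]

print(len(X_train_demo), len(X_test_demo))  # 7 3
```

Every training row precedes every test row, so the model is never evaluated on observations older than the ones it was trained on.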

In [201]:
# Reshape the features for the model
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
print (f"X_train sample values:\n{X_train[:5]} \n")
print (f"X_test sample values:\n{X_test[:5]}")

X_train sample values:
[[[-0.02945026]
  [ 0.00569801]
  [ 0.05978261]
  [ 0.00889878]
  [ 0.02579219]
  [-0.01176471]
  [        nan]
  [ 0.07116105]
  [ 0.09283088]
  [ 0.00358744]
  [ 0.05841924]
  [        nan]
  [-0.03130435]
  [        nan]]

 [[ 0.02225219]
  [-0.01227573]
  [-0.01333333]
  [ 0.01874311]
  [ 0.12356322]
  [ 0.01190476]
  [ 0.03493014]
  [-0.06643357]
  [ 0.35744323]
  [ 0.04825737]
  [-0.0465368 ]
  [ 0.00597015]
  [-0.00287253]
  [ 0.02093719]]

 [[ 0.01055409]
  [ 0.01720841]
  [-0.03222453]
  [ 0.        ]
  [ 0.01470588]
  [ 0.        ]
  [ 0.05014465]
  [ 0.06741573]
  [-0.15799257]
  [ 0.11594203]
  [ 0.0692395 ]
  [ 0.00395648]
  [-0.01044292]
  [-0.00585938]]

 [[ 0.03394256]
  [ 0.00469925]
  [ 0.00859291]
  [ 0.00324675]
  [-0.0384373 ]
  [ 0.        ]
  [-0.01010101]
  [-0.00701754]
  [-0.05371597]
  [-0.02291826]
  [-0.05732484]
  [ 0.00197044]
  [ 0.00909753]
  [ 0.        ]]

 [[-0.00883838]
  [-0.01590271]
  [-0.09265176]
  [-0.01294498]
  [-0.009

### Scaling Data with `MinMaxScaler`

Once the training and test datasets are created, we need to scale the data before training the LSTM model. We will use the `MinMaxScaler` from `sklearn` to scale all values between `0` and `1`.

Note that we scale both features and target sets.
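
The following is a hedged sketch of this scaling step on made-up arrays. It assumes two separate scalers, here called `x_scaler` and `y_scaler` (hypothetical names), so that the target can later be inverse-transformed on its own. `MinMaxScaler` expects 2D arrays, so in this sketch scaling happens before any reshape to 3D.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up 2D arrays (illustrative values only)
X_train_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test_demo = np.array([[4.0, 25.0]])
y_train_demo = np.array([[0.1], [0.2], [0.3]])
y_test_demo = np.array([[0.25]])

# Fit on the training data only, then apply the same scaling to the test data
x_scaler = MinMaxScaler().fit(X_train_demo)
y_scaler = MinMaxScaler().fit(y_train_demo)

X_train_s = x_scaler.transform(X_train_demo)
X_test_s = x_scaler.transform(X_test_demo)  # test values may fall outside [0, 1]
y_train_s = y_scaler.transform(y_train_demo)
y_test_s = y_scaler.transform(y_test_demo)

print(X_test_s)
print(y_test_s)
```

Fitting on the training set only keeps information from the test period out of the scaling parameters.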

In [202]:

# Use the MinMaxScaler to scale data between 0 and 1.
#from sklearn.preprocessing import MinMaxScaler

# Create a MinMaxScaler object
#scaler = MinMaxScaler()

# Fit the MinMaxScaler object with the training feature data X_train
#scaler.fit(X_train)

# Scale the features training and testing sets
#X_train = scaler.transform(X_train)
#X_test = scaler.transform(X_test)

# Fit the MinMaxScaler object with the training target data y_train
#scaler.fit(y_train)

# Scale the target training and testing sets
#y_train = scaler.transform(y_train)
#y_test = scaler.transform(y_test)




In [203]:
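from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# shuffle=False keeps the rows in chronological order (train_test_split shuffles by default)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)

# Standardize the features using statistics from the training set only
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)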

### Reshape Features Data for the LSTM Model

The Keras `LSTM` layer expects 3D input with shape `(samples, time steps, features)`, so we need to reshape the `X` data using `reshape((X_train.shape[0], X_train.shape[1], 1))`.

Both the training and testing sets are reshaped.

In [204]:
# Reshape the features for the model
#X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
#X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
#print (f"X_train sample values:\n{X_train[:5]} \n")
#print (f"X_test sample values:\n{X_test[:5]}")

---

## Build and Train the LSTM RNN

In this section, we will design a custom LSTM RNN in Keras and fit (train) it using the training data we defined.

We will need to:

1. Define the model architecture in Keras.

2. Compile the model.

3. Fit the model to the training data.

### Importing the Keras Modules

The LSTM RNN model in Keras uses the `Sequential` model and the `LSTM` layer as we did before. However, there is a new type of layer called `Dropout`.

* `Dropout`: Dropout is a regularization technique for reducing overfitting in neural networks. This type of layer applies the dropout technique to the input.

In [205]:
# Import required Keras modules
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

### Defining the LSTM RNN Model Structure

To create an LSTM RNN model, we will add `LSTM` layers. The `return_sequences` parameter needs to be set to `True` every time we add a new `LSTM` layer, excluding the final layer. The `input_shape` is the number of time steps and the number of indicators (features) per time step.

After each `LSTM` layer, we add a `Dropout` layer to prevent overfitting. The parameter passed to the `Dropout` layer is the fraction of units that will be dropped at random on each training step. For this demo, we will use a dropout value of `0.2`, which means that on each training step roughly `20%` of the units are randomly dropped.

The number of units in each `LSTM` layer is a hyperparameter. In this demo, each layer uses `16` units (`number_units = 16`).
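
To illustrate the idea, here is a NumPy sketch of "inverted" dropout. This is the scheme Keras documents for its `Dropout` layer: during training, dropped units are set to 0 and the remaining ones are scaled by `1 / (1 - rate)`. It is a sketch, not Keras's actual implementation, and the array and seed are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.2  # fraction of units to drop, as in the text

activations = np.ones((4, 5))  # made-up layer outputs

# Keep each unit with probability 1 - rate; scale the kept units by 1 / (1 - rate)
keep_mask = rng.random(activations.shape) >= rate
dropped = np.where(keep_mask, activations / (1.0 - rate), 0.0)

print(dropped)
```

The scaling keeps the expected sum of the activations unchanged. At inference time the layer passes its input through untouched.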

In [206]:
# Define the LSTM RNN model.
model = Sequential()

number_units = 16
dropout_fraction = 0.2

# Layer 1
model.add(LSTM(
    units=number_units,
    return_sequences=True,
    input_shape=(X_train.shape[1], 1))
    )
model.add(Dropout(dropout_fraction))
# Layer 2
model.add(LSTM(units=number_units, return_sequences=True))
model.add(Dropout(dropout_fraction))
# Layer 3
model.add(LSTM(units=number_units))
model.add(Dropout(dropout_fraction))
# Output layer
model.add(Dense(14))

### Compiling the LSTM RNN Model

We will compile the model using the `adam` optimizer. As the loss function, we will use `mean_squared_error`, since the value we want to predict is continuous.

In [207]:
# Compile the model
model.compile(optimizer="adam", loss="mean_squared_error")

In [208]:
# Summarize the model
model.summary()

Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 lstm_15 (LSTM)              (None, 14, 16)            1152      
                                                                 
 dropout_15 (Dropout)        (None, 14, 16)            0         
                                                                 
 lstm_16 (LSTM)              (None, 14, 16)            2112      
                                                                 
 dropout_16 (Dropout)        (None, 14, 16)            0         
                                                                 
 lstm_17 (LSTM)              (None, 16)                2112      
                                                                 
 dropout_17 (Dropout)        (None, 16)                0         
                                                                 
 dense_5 (Dense)             (None, 14)               
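
The `LSTM` parameter counts shown in the summary can be checked by hand. The helper below (a hypothetical name, not part of Keras) uses the standard count for an LSTM layer with biases: four gates, each with an input kernel, a recurrent kernel, and a bias vector.

```python
def lstm_param_count(units, input_dim):
    """Trainable parameters in a standard LSTM layer with biases.

    Each of the 4 gates has an input kernel (input_dim x units),
    a recurrent kernel (units x units), and a bias (units).
    """
    return 4 * (units * (input_dim + units) + units)

print(lstm_param_count(16, 1))   # 1152, first LSTM layer (1 feature per step)
print(lstm_param_count(16, 16))  # 2112, later LSTM layers (16 inputs per step)
```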

### Training the Model

Once the model is defined, we train (fit) the model using `10` epochs. Since we are working with time-series data, it's important to set `shuffle=False` to keep the sequential order of the data.

We can experiment with the `batch_size` parameter; smaller batch sizes are generally recommended here. In this demo, we will use `batch_size=4`.

In [209]:
# Train the model
model.fit(X_train, y_train, epochs=10, shuffle=False, batch_size=4, verbose=1)

Epoch 1/10


ValueError: in user code:

    File "c:\Users\scott\anaconda3\envs\pyvizenv\envs\pyvizenv2\lib\site-packages\keras\engine\training.py", line 1051, in train_function  *
        return step_function(self, iterator)
    File "c:\Users\scott\anaconda3\envs\pyvizenv\envs\pyvizenv2\lib\site-packages\keras\engine\training.py", line 1040, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "c:\Users\scott\anaconda3\envs\pyvizenv\envs\pyvizenv2\lib\site-packages\keras\engine\training.py", line 1030, in run_step  **
        outputs = model.train_step(data)
    File "c:\Users\scott\anaconda3\envs\pyvizenv\envs\pyvizenv2\lib\site-packages\keras\engine\training.py", line 890, in train_step
        loss = self.compute_loss(x, y, y_pred, sample_weight)
    File "c:\Users\scott\anaconda3\envs\pyvizenv\envs\pyvizenv2\lib\site-packages\keras\engine\training.py", line 949, in compute_loss
        y, y_pred, sample_weight, regularization_losses=self.losses)
    File "c:\Users\scott\anaconda3\envs\pyvizenv\envs\pyvizenv2\lib\site-packages\keras\engine\compile_utils.py", line 201, in __call__
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    File "c:\Users\scott\anaconda3\envs\pyvizenv\envs\pyvizenv2\lib\site-packages\keras\losses.py", line 139, in __call__
        losses = call_fn(y_true, y_pred)
    File "c:\Users\scott\anaconda3\envs\pyvizenv\envs\pyvizenv2\lib\site-packages\keras\losses.py", line 243, in call  **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    File "c:\Users\scott\anaconda3\envs\pyvizenv\envs\pyvizenv2\lib\site-packages\keras\losses.py", line 1327, in mean_squared_error
        return backend.mean(tf.math.squared_difference(y_pred, y_true), axis=-1)

    ValueError: Dimensions must be equal, but are 14 and 0 for '{{node mean_squared_error/SquaredDifference}} = SquaredDifference[T=DT_FLOAT](sequential_5/dense_5/BiasAdd, IteratorGetNext:1)' with input shapes: [?,14], [?,0].
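
The shapes in this error, `[?,14]` and `[?,0]`, point to two mismatches. The model's `Dense(14)` layer emits 14 values per sample, while `y` has zero columns because the slice `iloc[:, 15:15]` selects nothing. The NumPy sketch below shows the slicing behaviour on a made-up array. Which column is the intended target is an assumption here; position 16 (`CPI`) is used only as an example.

```python
import numpy as np

# Stand-in for group_data.values: 6 rows, 17 columns (values are arbitrary)
data = np.arange(6 * 17, dtype=float).reshape(6, 17)

empty_target = data[:, 15:15]  # start == stop, so no columns are selected
one_column = data[:, 16:17]    # a single column, kept 2D (assumed target position)

print(empty_target.shape)  # (6, 0)
print(one_column.shape)    # (6, 1)
```

For a single-target regression, a one-column `y` would normally be paired with a `Dense(1)` output layer so that predictions and targets have matching shapes.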


---
## Model Performance

In this section, we will evaluate the model using the test data. 

We will need to:

1. Evaluate the model using the `X_test` and `y_test` data.

2. Use the `X_test` data to make predictions.

3. Create a DataFrame of real (`y_test`) vs predicted values.

4. Plot the Real vs predicted values as a line chart.

### Evaluate the Model

It's time to evaluate our model to assess its performance. We will use the `evaluate` method using the testing data.

In [None]:
# Evaluate the model
model.evaluate(X_test, y_test)

### Making Predictions

We will make some closing price predictions using our brand new LSTM RNN model and our testing data.

In [None]:
# Make some predictions
predicted = model.predict(X_test)

Since the values were scaled before training, we need to recover the original scale to better understand the predictions.

We will use the `inverse_transform()` method of the scaler to decode the scaled values to their original scale.
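
`inverse_transform()` expects the same number of columns the scaler was fitted on, so recovering the target values requires a scaler fitted on the target itself. The sketch below assumes such a scaler, named `y_scaler` here (a hypothetical name), and uses made-up numbers.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

y_train_demo = np.array([[0.01], [0.03], [0.05]])  # made-up target values

# Scaler fitted on the target only, so its inverse maps back to target units
y_scaler = MinMaxScaler().fit(y_train_demo)

scaled_predictions = np.array([[0.0], [0.5], [1.0]])  # hypothetical model output
recovered = y_scaler.inverse_transform(scaled_predictions)

print(recovered.ravel())
```

A scaler fitted on the 14 feature columns would reject a one-column array, which is why the target usually gets its own scaler.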

In [None]:
# Recover the original prices instead of the scaled version
predicted_prices = scaler.inverse_transform(predicted)
real_prices = scaler.inverse_transform(y_test.reshape(-1, 1))

### Plotting Predicted Vs. Real Prices

To plot the predicted vs. the real values, we will create a DataFrame.

In [None]:
# Create a DataFrame of Real and Predicted values
stocks = pd.DataFrame({
    "Real": real_prices.ravel(),
    "Predicted": predicted_prices.ravel()
    }, index = group_data.index[-len(real_prices): ])
stocks.head()

In [None]:
# Plot the real vs predicted prices as a line chart
stocks.plot()