# LSTM Stock Predictor Using FNG



## Data Preparation

This section prepares the training and testing data for the model. The model uses a rolling 10-day window of FNG values to predict the closing price on the 11th day.

You will need to:
1. Use the `window_data` function to generate the `X` and `y` values for the model.
2. Split the data into 70% training and 30% testing.
3. Apply `MinMaxScaler` to the `X` and `y` values.
4. Reshape the `X_train` and `X_test` data for the model. Note: The LSTM requires input in the following format:

```python
reshape((X_train.shape[0], X_train.shape[1], 1))
```
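As a minimal sketch of what this reshape does, the snippet below uses a made-up array of 3 samples with 10 time steps each (placeholder values, not data from this notebook). It adds a trailing axis so the array has the shape `(samples, time steps, features)`, with one feature per time step.

```python
import numpy as np

# Hypothetical toy data: 3 samples, each a window of 10 time steps
X_toy = np.arange(30, dtype=float).reshape(3, 10)

# Add a trailing axis so each time step carries exactly one feature
X_toy_3d = X_toy.reshape((X_toy.shape[0], X_toy.shape[1], 1))

print(X_toy.shape)     # (3, 10)
print(X_toy_3d.shape)  # (3, 10, 1)
```

The values themselves are unchanged; only the array's dimensions differ.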

In [24]:
import numpy as np
import pandas as pd
from pathlib import Path

%matplotlib inline
import hvplot.pandas

In [25]:
# Set the random seed for reproducibility

from numpy.random import seed
seed(1)

from tensorflow import random
random.set_seed(2)

## Data Loading

In [27]:
# Load the FNG sentiment data for BTC
df_fng = pd.read_csv('btc_sentiment.csv', index_col="date", infer_datetime_format=True, parse_dates=True)
df_fng = df_fng.drop(columns="fng_classification")
df_fng.head()

date,fng_value
2019-07-29,19
2019-07-28,16
2019-07-27,47
2019-07-26,24
2019-07-25,42


In [28]:
# Load BTC historic closing price
df_btc = pd.read_csv('btc_historic.csv', index_col="Date", infer_datetime_format=True, parse_dates=True)['Close']
df_btc = df_btc.sort_index()
df_btc.tail()

Date
2019-07-25    9882.429688
2019-07-26    9847.450195
2019-07-27    9478.320313
2019-07-28    9531.769531
2019-07-29    9529.889648
Name: Close, dtype: float64

In [31]:
# Join both DataFrames to a single one.
df = df_fng.join(df_btc, how="inner")
df.head()

,fng_value,Close
2018-02-01,30,9114.719727
2018-02-02,15,8870.820313
2018-02-03,40,9251.269531
2018-02-04,24,8218.049805
2018-02-05,11,6937.080078


## Creating the Features X and Target y Data

The first step in preparing the data is to create the input feature vectors `X` and the target vector `y`. We will use the `window_data()` function to create these vectors. This function chunks the data with a rolling window, using the values from X<sub>t-window</sub> through X<sub>t-1</sub> to predict the target at time t. The function returns two NumPy arrays:

`X`: The input feature vectors.

`y`: The target vector.
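The windowing idea can be illustrated on a tiny series. The sketch below uses a simplified, hypothetical helper named `make_windows` (not the notebook's `window_data`) that works on a single 1-D array and a made-up series, so the pairing of each window with the value that follows it is easy to see.

```python
import numpy as np

def make_windows(values, window):
    """Pair each run of `window` consecutive values with the value that follows it."""
    X, y = [], []
    for i in range(len(values) - window):
        X.append(values[i:i + window])   # the window of inputs
        y.append(values[i + window])     # the value right after the window
    return np.array(X), np.array(y).reshape(-1, 1)

# Hypothetical toy series, not data from this notebook
series = np.array([10, 20, 30, 40, 50, 60])
X_toy, y_toy = make_windows(series, window=3)

print(X_toy)
# [[10 20 30]
#  [20 30 40]
#  [30 40 50]]
print(y_toy.ravel())  # [40 50 60]
```

In the notebook itself the features and the target come from different columns (`fng_value` and `Close`), but the indexing pattern is the same.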

In [32]:

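def window_data(df, window, feature_col_number, target_col_number):
    """Build rolling-window samples: `window` feature values per row of X, the next target value in y."""
    X = []
    y = []
    # Note: the `- 1` means the final row of `df` is never used as a target
    for i in range(len(df) - window - 1):
        # Features: `window` consecutive values starting at row i
        features = df.iloc[i:(i + window), feature_col_number]
        # Target: the value in the row immediately after the window
        target = df.iloc[(i + window), target_col_number]
        X.append(features)
        y.append(target)
    return np.array(X), np.array(y).reshape(-1, 1)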

In [49]:
# Predict closing prices using a 10-day window of previous FNG values
# Try a window size anywhere from 1 to 10 and see how the model performance changes
window_size = 10

# Column index 0 is the `fng_value` column (feature)
# Column index 1 is the `Close` column (target)
feature_column = 0
target_column = 1
X, y = window_data(df, window_size, feature_column, target_column)
print(f"X sample values:\n{X[:5]} \n")
print(f"y sample values:\n{y[:5]}")

X sample values:
[[30 15 40 24 11  8 36 30 44 54]
 [15 40 24 11  8 36 30 44 54 31]
 [40 24 11  8 36 30 44 54 31 42]
 [24 11  8 36 30 44 54 31 42 35]
 [11  8 36 30 44 54 31 42 35 55]] 

y sample values:
[[ 8084.609863]
 [ 8911.269531]
 [ 8544.69043 ]
 [ 9485.639648]
 [10033.75    ]]


## Splitting Data Between Training and Testing Sets

In [50]:
# Use 70% of the data for training and the remainder for testing
split = int(.7 * len(X))
X_train = X[:split - 1]
X_test = X[split:]
y_train = y[:split - 1]
y_test = y[split:]
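Because this is time-series data, the split is chronological rather than shuffled. The cell above ends the training slices at `split - 1`, which leaves the sample at index `split - 1` out of both sets. The sketch below shows one possible variant, wrapped in a hypothetical helper `chronological_split` and run on made-up arrays, that keeps every sample.

```python
import numpy as np

def chronological_split(X, y, train_fraction=0.7):
    """Split arrays in order: the first rows train, the remaining rows test."""
    split = int(train_fraction * len(X))
    return X[:split], X[split:], y[:split], y[split:]

# Hypothetical toy arrays: 10 samples with 2 features each
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.arange(10).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = chronological_split(X_toy, y_toy)
print(len(X_tr), len(X_te))  # 7 3
```

Using this variant in the notebook would change the training set size by one sample, so the outputs shown in later cells would differ slightly.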

## Scaling Data with MinMaxScaler

In [51]:
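# Use MinMaxScaler to scale the data between 0 and 1
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
# Scale the features (the scaler is fit on all of X, including the test rows)
scaler.fit(X)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
# Refit the same scaler on y; it is reused later to invert the predictions
scaler.fit(y)
y_train = scaler.transform(y_train)
y_test = scaler.transform(y_test)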
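The cell above fits the scaler on all of `X` and `y`, so the test rows influence the minimum and maximum used for scaling. A common alternative is to learn those statistics from the training rows only. The sketch below shows the per-column min-max arithmetic in plain NumPy on made-up data; it is an illustration of the idea, not the notebook's method, and the equivalent with scikit-learn would be to fit `MinMaxScaler` on `X_train` alone.

```python
import numpy as np

# Hypothetical toy feature matrix: 6 samples, 2 columns (not notebook data)
X_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0],
                  [4.0, 40.0],
                  [5.0, 50.0],
                  [6.0, 60.0]])
X_tr, X_te = X_toy[:4], X_toy[4:]

# Learn the per-column minimum and range from the training rows only
col_min = X_tr.min(axis=0)
col_range = X_tr.max(axis=0) - col_min

# Apply the same training statistics to both splits
X_tr_scaled = (X_tr - col_min) / col_range
X_te_scaled = (X_te - col_min) / col_range

print(X_tr_scaled.min(), X_tr_scaled.max())  # 0.0 1.0
print(X_te_scaled[:, 0])  # test values can fall outside [0, 1]
```

The trade-off is that test values may land outside the 0 to 1 range, which is expected when the test period contains values the training period never saw.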

## Reshape Features Data for the LSTM Model

In [52]:
# Reshape the features for the model
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
print(f"X_train sample values:\n{X_train[:5]} \n")
print(f"X_test sample values:\n{X_test[:5]}")

X_train sample values:
[[[0.25287356]
  [0.08045977]
  [0.36781609]
  [0.18390805]
  [0.03448276]
  [0.        ]
  [0.31395349]
  [0.24418605]
  [0.40697674]
  [0.52325581]]

 [[0.08045977]
  [0.36781609]
  [0.18390805]
  [0.03448276]
  [0.        ]
  [0.32183908]
  [0.24418605]
  [0.40697674]
  [0.52325581]
  [0.25581395]]

 [[0.36781609]
  [0.18390805]
  [0.03448276]
  [0.        ]
  [0.32183908]
  [0.25287356]
  [0.40697674]
  [0.52325581]
  [0.25581395]
  [0.38372093]]

 [[0.18390805]
  [0.03448276]
  [0.        ]
  [0.32183908]
  [0.25287356]
  [0.4137931 ]
  [0.52325581]
  [0.25581395]
  [0.38372093]
  [0.30232558]]

 [[0.03448276]
  [0.        ]
  [0.32183908]
  [0.25287356]
  [0.4137931 ]
  [0.52873563]
  [0.25581395]
  [0.38372093]
  [0.30232558]
  [0.53488372]]] 

X_test sample values:
[[[0.36781609]
  [0.43678161]
  [0.34482759]
  [0.45977011]
  [0.45977011]
  [0.40229885]
  [0.39534884]
  [0.37209302]
  [0.3372093 ]
  [0.62790698]]

 [[0.43678161]
  [0.34482759]
  [0.459770

---

## Build and Train the LSTM RNN

In this section, you will design a custom LSTM RNN and fit (train) it using the training data.

We will:
1. Define the model architecture

2. Compile the model

3. Fit the model to the training data



## Importing the Keras Modules

In [53]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

## Defining the LSTM RNN Model Structure

In [95]:
model = Sequential()

number_units = 2
dropout_fraction = 0.2

# Layer 1
model.add(LSTM(
    units=number_units, return_sequences=True,
    input_shape=(X_train.shape[1], 1)))
model.add(Dropout(dropout_fraction))

# Layer 2
model.add(LSTM(units=number_units, return_sequences=True))
model.add(Dropout(dropout_fraction))

# Layer 3
model.add(LSTM(units=number_units))
model.add(Dropout(dropout_fraction))

# Output layer
model.add(Dense(1))

## Compiling the LSTM RNN Model

In [96]:
# Compile the model
model.compile(optimizer="adam", loss="mean_squared_error")

In [97]:
# Summarize the model
model.summary()

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_21 (LSTM)               (None, 10, 2)             32        
_________________________________________________________________
dropout_21 (Dropout)         (None, 10, 2)             0         
_________________________________________________________________
lstm_22 (LSTM)               (None, 10, 2)             40        
_________________________________________________________________
dropout_22 (Dropout)         (None, 10, 2)             0         
_________________________________________________________________
lstm_23 (LSTM)               (None, 2)                 40        
_________________________________________________________________
dropout_23 (Dropout)         (None, 2)                 0         
_________________________________________________________________
dense_7 (Dense)              (None, 1)                
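
The LSTM parameter counts in the summary can be checked by hand. The sketch below uses a hypothetical helper, `lstm_param_count`, that applies the usual formula for a standard LSTM layer with biases: four weight sets (input, forget, cell candidate, output), each with an input kernel, a recurrent kernel, and a bias.

```python
def lstm_param_count(input_dim, units):
    """Trainable parameters in a standard LSTM layer with biases.

    Each of the 4 weight sets has an input kernel (input_dim x units),
    a recurrent kernel (units x units), and a bias vector (units).
    """
    return 4 * (units * (input_dim + units) + units)

# First LSTM layer: 1 input feature per time step, 2 units
print(lstm_param_count(input_dim=1, units=2))  # 32

# Second and third LSTM layers: 2 inputs from the previous layer, 2 units
print(lstm_param_count(input_dim=2, units=2))  # 40
```

These match the 32, 40, and 40 parameters reported for the three LSTM layers; the dropout layers add no trainable parameters.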

## Training the Model

In [98]:
model.fit(X_train, y_train, epochs=10, shuffle=False, batch_size=1, verbose=1)

Train on 371 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x1a5901e048>

---

## Model Performance

In this section, you will evaluate the model using the test data. 

We will:
1. Evaluate the model using the `X_test` and `y_test` data.
2. Use the `X_test` data to make predictions.
3. Create a DataFrame of real (`y_test`) vs. predicted values.
4. Plot the real vs. predicted values as a line chart.

## Evaluate the Model

In [99]:
model.evaluate(X_test, y_test)



0.09338335664942861
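
Since the model was compiled with `loss="mean_squared_error"` and `y_test` was scaled, this number is a mean squared error in scaled units, not in dollars. The sketch below shows the calculation with a hypothetical helper and made-up values, then takes the square root of the reported loss to express it as a root mean squared error on the same scaled axis.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical scaled targets and predictions, for illustration only
y_true_toy = np.array([0.2, 0.4, 0.6])
y_pred_toy = np.array([0.3, 0.4, 0.4])
toy_mse = mean_squared_error(y_true_toy, y_pred_toy)
print(toy_mse)

# Square root of the loss reported above, still in scaled units
scaled_rmse = np.sqrt(0.09338335664942861)
print(round(scaled_rmse, 3))
```

A scaled RMSE of roughly 0.31 is large relative to a target range of about 0 to 1, which is consistent with the gap between real and predicted prices shown later.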

## Making Predictions

In [100]:
predicted = model.predict(X_test)

In [101]:
# Recover the original prices instead of the scaled version
# (`scaler` was last fit on `y`, so it inverts back to price units)
predicted_prices = scaler.inverse_transform(predicted)
real_prices = scaler.inverse_transform(y_test.reshape(-1, 1))
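
For intuition, inverting a min-max scaling multiplies by the original range and adds back the original minimum. The sketch below shows the round trip in plain NumPy on made-up prices; it illustrates the arithmetic rather than the scikit-learn internals.

```python
import numpy as np

# Hypothetical prices, for illustration only
prices = np.array([[3900.0], [4100.0], [5200.0]])
p_min, p_max = prices.min(axis=0), prices.max(axis=0)

# Forward: map prices onto the 0 to 1 range
scaled = (prices - p_min) / (p_max - p_min)

# Inverse: map scaled values back to price units
recovered = scaled * (p_max - p_min) + p_min

print(np.allclose(recovered, prices))  # True
```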

## Plotting Predicted Vs. Real Prices

In [102]:
# Create a DataFrame of Real and Predicted values
stocks = pd.DataFrame({
    "Real": real_prices.ravel(),
    "Predicted": predicted_prices.ravel()
})
stocks.head()

,Real,Predicted
0,3924.23999,4826.541504
1,3974.050049,4880.321777
2,3937.040039,4957.229004
3,3983.530029,5083.90332
4,4149.089844,5192.344238
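
To put a number on the gap, the sketch below computes the mean absolute error and mean absolute percentage error for the five rows displayed above. It covers only those five rows, so it is a quick sanity check rather than a full evaluation of the test set.

```python
import numpy as np

# The five rows displayed above
real = np.array([3924.23999, 3974.050049, 3937.040039, 3983.530029, 4149.089844])
predicted = np.array([4826.541504, 4880.321777, 4957.229004, 5083.90332, 5192.344238])

errors = predicted - real
mae = np.mean(np.abs(errors))               # average miss in price units
mape = np.mean(np.abs(errors) / real) * 100  # average miss as a percentage

print(f"MAE:  {mae:.2f}")
print(f"MAPE: {mape:.2f}%")
print("All five predictions above the real price:", bool((errors > 0).all()))
```

On these rows the model overshoots every time, by roughly a thousand dollars on average.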


In [103]:
# Plot the real vs predicted values as a line chart
stocks.head(100).hvplot()