# RNN Activity: Predicting the stock market
![](https://imgs.xkcd.com/comics/marketwatch.png)

Note: I do not recommend making any kind of financial decisions based on RNNs.

## Setup
To fetch the data, we'll need the `yfinance` package. Activate your virtual environment, then run `pip install yfinance`.

In [None]:
import yfinance as yf
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
import numpy as np

In [None]:
# feel free to play around with different ticker values
ticker = "BTC-USD"
description = "Bitcoin"

# get the data for the last 10 years. This is daily data by default, so it's not actually that much.
data = yf.download(ticker, period="10y")

# plot the daily closing price
plt.plot(data['Close'])
plt.ylabel(f'{description} Closing Price (USD)')
plt.xticks(rotation=45)
plt.show()

data.head()

## Split the data into training and testing sets
Pandas handles date indexes well, so we can slice the data with date strings. Let's train on everything through 2022, use 2023 for validation, and keep 2024 onward as our test set.

In [None]:
train_val = '2022-12-31'
val_test = '2023-12-31'

train = data['Close'][:train_val].values
val = data['Close'][train_val:val_test].values
test = data['Close'][val_test:].values
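One detail of date-string slicing is worth seeing on a small example: label-based slices on a `DatetimeIndex` include both endpoints. The sketch below uses a made-up six-day series (not the downloaded data) and `.loc`, which slices by label the same way.

```python
import pandas as pd

# Toy daily series spanning the train/validation boundary (values are made up).
idx = pd.date_range("2022-12-29", "2023-01-03", freq="D")
prices = pd.Series(range(len(idx)), index=idx, dtype=float)

boundary = "2022-12-31"

# Label-based slices on a DatetimeIndex include both endpoints.
left = prices.loc[:boundary]
right = prices.loc[boundary:]

# The boundary date therefore appears in both pieces.
shared = left.index.intersection(right.index)
print(list(shared.strftime("%Y-%m-%d")))
```

For a ticker that trades every day, such as BTC-USD, this means the boundary dates land in two splits at once. Whether a one-day overlap matters is a judgment call; starting the later slice one day after the boundary would avoid it.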

In [None]:
# Build windowed datasets for the RNN
window = 7

# try using the last 7 days as input to predict the next day
def to_ds(data, input_width, batch_size=32):
    input_data = data[:-input_width]
    targets = data[input_width:]
    return tf.keras.utils.timeseries_dataset_from_array(
        input_data, 
        targets, 
        sequence_length=input_width,
        batch_size=batch_size)

train_ds = to_ds(train, window)
val_ds = to_ds(val, window)
test_ds = to_ds(test, window)
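To see which inputs get paired with which targets, here is a NumPy-only sketch of the alignment that `to_ds` sets up. It assumes the documented behavior of `timeseries_dataset_from_array`, where the window starting at index `i` is paired with `targets[i]`. The toy series and the `window = 3` value are made up for illustration.

```python
import numpy as np

series = np.arange(10.0)  # stand-in for a price series: 0, 1, ..., 9
window = 3                # illustrative; the notebook uses 7

inputs = series[:-window]   # drop the last `window` values
targets = series[window:]   # shift forward by `window`

# Every length-`window` slice of the inputs, one per row.
windows = np.lib.stride_tricks.sliding_window_view(inputs, window)

# Window i is paired with targets[i]; surplus targets go unused.
paired_targets = targets[:len(windows)]

for x, y in zip(windows, paired_targets):
    print(x, "->", y)
```

Each window is followed immediately by its target, so the model sees `window` days and predicts the next one. The paired targets are exactly `series[window:-window+1]`, which is the same slice used later to build `truth`.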

## SimpleRNN
In the cell below, add a [SimpleRNN](https://www.tensorflow.org/api_docs/python/tf/keras/layers/SimpleRNN) layer and try to get it to behave. I recommend starting with a single layer and relatively few units.
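Before filling in the layer, it may help to see what a simple RNN computes. The NumPy sketch below walks one input window through the recurrence `h_t = tanh(x_t W + h_{t-1} U + b)`, which is the default form of a Keras `SimpleRNN`. The sizes and random weights are assumptions for illustration, not values from this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

units = 4       # hypothetical layer width
timesteps = 7   # matches the 7-day window
features = 1    # one price per day

# Randomly initialized parameters (a trained layer would learn these).
W = rng.normal(size=(features, units))  # input weights
U = rng.normal(size=(units, units))     # recurrent weights
b = np.zeros(units)                     # bias

x = rng.normal(size=(timesteps, features))  # one normalized input window

# Carry the hidden state through the window, one day at a time.
h = np.zeros(units)
for x_t in x:
    h = np.tanh(x_t @ W + h @ U + b)

print(h.shape)  # the final state is what the Dense layer would receive
```

By default the layer returns only the final hidden state, so the output has one value per unit regardless of the window length.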

In [None]:
# Inputs should be roughly in the ±1 range rather than raw prices, which can reach tens of thousands of USD
norm = tf.keras.layers.Normalization(axis=None)
norm.adapt(train)

simple_RNN = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(None,1)),
    norm,
    # TODO: SimpleRNN layer here
    tf.keras.layers.Dense(1),
])

metrics = [
    tf.keras.metrics.MeanAbsoluteError(),
]

simple_RNN.compile(optimizer='adam', loss='mse', metrics=metrics)
history = simple_RNN.fit(train_ds, validation_data=val_ds, epochs=100)
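As a rough picture of what the `Normalization(axis=None)` layer does after `adapt`, the sketch below standardizes values with a single scalar mean and standard deviation. It uses made-up prices and leaves out the small epsilon Keras uses to guard against division by zero; `normalize` and `denormalize` are hypothetical helper names.

```python
import numpy as np

train_prices = np.array([100.0, 200.0, 300.0, 400.0])  # made-up values

# One scalar mean and standard deviation over all training values.
mean = train_prices.mean()
std = train_prices.std()

def normalize(x):
    """Map raw prices to roughly zero mean and unit variance."""
    return (x - mean) / std

def denormalize(z):
    """Invert normalize, mapping standardized values back to prices."""
    return z * std + mean

z = normalize(train_prices)
print(z.mean(), z.std())
```

Note that in the model above only the inputs pass through the normalization layer. The targets remain in USD, so the final `Dense` layer has to learn the rescaling back to price units.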

In [None]:
# plot the training curve
pd.DataFrame(history.history)[['mean_absolute_error', 'val_mean_absolute_error']].plot()
plt.ylim(0, 2000)

In [None]:
# define the true vector (target)
truth = val[window:-window+1]

# Predict on the validation set and see how well it does
predictions = simple_RNN.predict(val_ds)
r_mae = np.abs(predictions - truth).mean()

# also define the naive prediction - today is the same as yesterday
naive = val[window-1:-window]
n_mae = np.abs(naive - truth).mean()

# grab the time axis
t = data["Close"][train_val:val_test].index
t = t[window:-window+1]

plt.plot(t, truth, label='True')
plt.plot(t, naive, label=f"Naive (MAE = {n_mae:.1f})")
plt.plot(t, predictions.flatten(), label=f'RNN (MAE = {r_mae:.1f})')
plt.legend()
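The slicing for `truth` and `naive` is easy to get wrong by one, so here is the same indexing on a tiny made-up array with `window = 3`. It is only a sketch of the alignment, not a result from the real data.

```python
import numpy as np

val = np.array([10.0, 11.0, 13.0, 12.0, 15.0, 14.0, 16.0, 18.0, 17.0, 19.0])
window = 3  # illustrative; the notebook uses 7

# Targets the windowed dataset produces, and the value one day before each.
truth = val[window:-window + 1]
naive = val[window - 1:-window]

# Mean absolute error of "tomorrow equals today".
naive_mae = np.abs(naive - truth).mean()

print(truth)
print(naive)
print(naive_mae)
```

The naive series is the truth shifted by one day, which is why the two curves look nearly identical when overlaid on a plot.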

## Exercises and Questions
1. Can you improve the model? Try playing around with the number of layers, units, activation functions, etc.
2. Why does it look okay when you overlay the naive model, but the MAE is so high?
3. Can you modify the model to try to predict the **direction** of the stock instead of the price (i.e. up or down)?
4. Can you modify the model to make it predict X days in the future instead of just 1?