<a href="https://www.kaggle.com/code/hkaragah/meta-stock-price?scriptVersionId=191834991" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Meta Stock Price Prediction
The objectives of this notebook are to:
* Load and preprocess the Meta stock price data series
* Train a deep model that combines CNN, LSTM, and dense (DNN) layers on a portion of the data
* Use the trained model to predict the tail portion of the series (which was not used for training)

### 1. Import Data

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
file_path = '/kaggle/input/meta-stock-price-dataset/Meta Dataset.csv'

df = pd.read_csv(file_path, index_col='Date')
display(df)

### 2. Explore Data

In [None]:
# Check the range of the values in each column
df.describe()

All columns except Volume are of the same order of magnitude. The Volume column is on the order of 10^6 to 10^8.

In [None]:
# Check the data type of each column and whether there is any null (NaN)
df.info()

In [None]:
df.count()

In [None]:
df.isna().any()

In [None]:
df.dtypes

In [None]:
# Check the data type of the index column
df.index.dtype

Data type 'O' is the generic Python object type. Here it means the dates in the index are stored as strings rather than as datetimes.

In [None]:
# Convert index to 'datetime' data type
df.index = pd.to_datetime(df.index)
df.index.dtype

In [None]:
fig, axes = plt.subplots(nrows=len(df.columns), ncols=1, figsize=(10, 2 * len(df.columns)), sharex=True)

for col, ax in zip(df.columns, axes):
    df[col].plot(ax=ax, title=col)
    ax.set_xlabel('Date')
    ax.set_ylabel('Values')

# Adjust layout to prevent overlap
plt.tight_layout()

# Display the plot
plt.show()

All column values, except the Volume, demonstrate similar trends over time.

### 3. Create Dataset
Let's use 'High' values to create a dataset.<br>
*Disclaimer: most of the code in this section is copied over from course 4 in ref. [1].*

In [None]:
highs = np.array(df['High']) # numpy.ndarray of shape (3028,)
time = np.arange(len(highs)) # numpy.ndarray of shape (3028,)

In [None]:
print(f'First 5 values of series: {highs[:5]}')
print(f'Last 5 values of series: {highs[-5:]}')
print(f'Length of the time series: {len(highs)}')

#### 3.1. Split the Dataset

In [None]:
split_time = 2500

time_train = time[:split_time] # numpy.ndarray of shape (2500,)
X_train = highs[:split_time] # numpy.ndarray of shape (2500,)

time_valid = time[split_time:] # numpy.ndarray of shape (528,)
X_valid = highs[split_time:] # numpy.ndarray of shape (528,)

#### 3.2. Prepare Features and Labels

In [None]:
def windowed_dataset(series, window_size, batch_size, shuffle_buffer):
    """Generates dataset windows

    Args:
      series (array of float) - contains the values of the time series
      window_size (int) - the number of time steps to include in the feature
      batch_size (int) - the batch size
      shuffle_buffer(int) - buffer size to use for the shuffle method

    Returns:
      dataset (TF Dataset) - TF Dataset containing time windows
    """
  
    explore = {} # Store converted data in each for further exploration
    
    # Generate a TF Dataset from the series values
    dataset = tf.data.Dataset.from_tensor_slices(series) # Each value in series will be a Tensor
    explore['from_tensor_slices'] = dataset
          
    # Window the data but only take those with the specified size
    dataset = dataset.window(window_size + 1, shift=1, drop_remainder=True)
    explore['window'] = dataset
    
    # Flatten the windows by putting its elements in a single batch
    dataset = dataset.flat_map(lambda window: window.batch(window_size + 1))
    explore['flat_map'] = dataset

    # Create tuples with features and labels 
    dataset = dataset.map(lambda window: (window[:-1], window[-1]))
    explore['map'] = dataset

    # Shuffle the windows
    dataset = dataset.shuffle(shuffle_buffer)
    explore['shuffle'] = dataset

    # Create batches of windows
    dataset = dataset.batch(batch_size).prefetch(1)
    explore['batch'] = dataset

    return dataset, explore

In [None]:
window_size = 120
batch_size = 10
shuffle_buffer_size = 1000

train_set, explore = windowed_dataset(X_train, window_size, batch_size, shuffle_buffer_size)

Let's break down the steps and see how the "windowed_dataset" function transforms "X_train" into "train_set".

In [None]:
# Explore slices
slices = list(explore['from_tensor_slices'].as_numpy_iterator())
print(f"Total no. of slices: {len(slices)}\n")

print("First 5 slices:")
for count, element in enumerate(explore['from_tensor_slices'].take(5), start=1):
    print(f"Slice #{count}: {element.numpy()} of type {type(element)}")

X_train is a numpy.ndarray of shape (2500,). The `tf.data.Dataset.from_tensor_slices(series)` call transforms it into 2500 individual slices of type _tf.Tensor_ (`tensorflow.python.framework.ops.EagerTensor`).

In [None]:
# Explore windows
for count, window in enumerate(explore['window'], start=1):
    if count in [1,2,3,4,5]:
        print(f"First 5 elements of window #{count}: {list(window.as_numpy_iterator())[:5]}")
        if count==1: window_len = len(list(window.as_numpy_iterator()))
        
print(f"\nTotal no. of windows: {count}")
print(f"Window length: {window_len}")


The first five elements of the first five windows are printed above. Note that we used `shift=1` and, by default, `stride=1`. With `shift=1`, each new window starts one element after the previous one, which is why the first element of each window is the second element of the previous window. The `stride=1` setting determines the spacing between input elements within a window. Also, with `shift=1` and `drop_remainder=True`, the total number of windows is the number of slices minus the window size (2500-120=2380).<br>
Note that the length of each window is 121. This is because we used `window_size + 1` to include the target in each window. Keep in mind that the goal is to use 120 values to predict the next one (i.e., the 121st element's value).
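
The windowing rule described above can be reproduced without TensorFlow. The following is a minimal sketch in plain Python; `sliding_windows` is a hypothetical helper (not part of the notebook or of `tf.data`) that mimics `window(size, shift, drop_remainder=True)` on a list, and the integer series is only a stand-in for the real prices.

```python
def sliding_windows(values, size, shift=1):
    """Yield full-length windows only, mimicking drop_remainder=True."""
    for start in range(0, len(values) - size + 1, shift):
        yield values[start:start + size]


window_size = 120
series = list(range(2500))  # stand-in for X_train; the real values are prices

windows = list(sliding_windows(series, window_size + 1, shift=1))

print(len(windows))                    # 2380 full windows
print(len(windows[0]))                 # 121 = 120 features + 1 target
print(windows[1][0] == windows[0][1])  # True: each window starts one step later
```

Increasing `shift` in this sketch reduces the number of windows accordingly, which is one way to see why `shift=1` gives the largest possible training set.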

In [None]:
#Explore flat_map
for count, batch in enumerate(explore['flat_map'].as_numpy_iterator(), start=1):
    if count in [1,2,3]:
        print(f"Batch #{count}: shape={batch.shape}")

Here are some highlights:

* __.batch(...)__: The batch method in `lambda window: window.batch(window_size + 1)` groups consecutive elements into batches of the specified size (window_size + 1).

* __.flat_map(...)__: The flat_map method is useful to expand each element into multiple elements or datasets. It allows us to transform each element of the dataset into another dataset and then flatten the results into a single dataset.

In [None]:
# Explore map
for count, batch in enumerate(explore['map'].as_numpy_iterator(), start=1):
    if count in [1,2,3,121,122,123]:
        if count==121: print("...")
        print(f"Batch #{count}: features[0:5]={batch[0][:5]}, target={batch[1]}")
            

The `.map(...)` method applies a "map function" to each element of the dataset and returns a new dataset containing the transformed elements in the same order as they appeared in the input.<br>
The "map function" here is `lambda window: (window[:-1], window[-1])`. It transforms each window of shape (121,) into a tuple of a feature array of shape (120,) and a scalar target of shape (), which forms the feature-target pair for that window. Remember, our `window_size` is set to 120. When we created the windows, we used `window_size + 1` to include the target in each window.
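
The same feature/target split can be illustrated with plain NumPy slicing. This is a sketch on a toy series, not the notebook's pipeline: the small `window_size` and the `arange` data are chosen for readability, and `sliding_window_view` stands in for the `window` + `flat_map` steps.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

window_size = 3                      # small value for readability (the notebook uses 120)
series = np.arange(10, dtype=float)  # toy series: 0.0, 1.0, ..., 9.0

# All windows of length window_size + 1, each shifted by one step
windows = sliding_window_view(series, window_size + 1)

features = windows[:, :-1]  # first window_size values of each window
targets = windows[:, -1]    # the value that immediately follows them

print(features.shape, targets.shape)  # (7, 3) (7,)
print(features[0], targets[0])
```

Each row of `features` pairs with the entry of `targets` at the same index, which is the relationship the `map` step sets up element by element.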

#### 3.3. Build the Model

In [None]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Conv1D(filters=64, kernel_size=3,
                      strides=1,
                      activation="relu",
                      padding='causal',
                      input_shape=[window_size, 1]),
  tf.keras.layers.LSTM(64, return_sequences=True),
  tf.keras.layers.LSTM(64),
  tf.keras.layers.Dense(30, activation="relu"),
  tf.keras.layers.Dense(10, activation="relu"),
  tf.keras.layers.Dense(1),
  tf.keras.layers.Lambda(lambda x: x * 400)
])

# Print the model summary
model.summary()

#### 3.4. Tune the Learning Rate

In [None]:
init_weights = model.get_weights()

In [None]:
# Set the learning rate scheduler
lr_schedule = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch: 1e-8 * 10**(epoch / 20))

# Initialize the optimizer
optimizer = tf.keras.optimizers.SGD(momentum=0.9)

# Set the training parameters
model.compile(loss=tf.keras.losses.Huber(), optimizer=optimizer)

# Train the model
history = model.fit(train_set, epochs=100, callbacks=[lr_schedule])

In [None]:
# Define the learning rate array
lrs = 1e-8 * (10 ** (np.arange(100) / 20))

# Set the figure size
plt.figure(figsize=(10, 6))

# Set the grid
plt.grid(True)

# Plot the loss in log scale
plt.semilogx(lrs, history.history["loss"])

# Increase the tickmarks size
plt.tick_params('both', length=10, width=1, which='both')

# Set the plot boundaries
plt.axis([1e-8, 1e-3, 0, 100])

Based on the results shown on the graph, I choose $2 \times 10^{-7}$ for training the model.
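
For reference, the sweep used above raises the learning rate by a factor of 10 every 20 epochs. The sketch below restates that schedule in plain Python and inverts it, so that a learning rate read off the plot can be mapped back to the epoch at which it was tried. The function names `sweep_lr` and `epoch_for_lr` are hypothetical helpers introduced only for this illustration.

```python
import math


def sweep_lr(epoch, start=1e-8, epochs_per_decade=20):
    """Learning rate used at a given epoch of the sweep."""
    return start * 10 ** (epoch / epochs_per_decade)


def epoch_for_lr(lr, start=1e-8, epochs_per_decade=20):
    """Inverse of sweep_lr: the (fractional) epoch at which lr is reached."""
    return epochs_per_decade * math.log10(lr / start)


print(sweep_lr(0))         # learning rate in the first epoch
print(sweep_lr(99))        # learning rate in the last epoch, just under 1e-3
print(epoch_for_lr(2e-7))  # roughly epoch 26
```

Under this schedule, the chosen value of $2 \times 10^{-7}$ corresponds to about epoch 26 of the 100-epoch sweep.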

#### 3.5. Train the Model
Before starting the training with the chosen learning rate, we need to reset the model weights to their initial (untrained) state, since the learning-rate sweep has already modified them.

In [None]:
# Reset states generated by Keras
tf.keras.backend.clear_session()

# Reset the weights
model.set_weights(init_weights)

In [None]:
# Set the learning rate
learning_rate = 2e-7

# Set the optimizer 
optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9)

# Set the training parameters
model.compile(loss=tf.keras.losses.Huber(),
              optimizer=optimizer,
              metrics=["mae"])

In [None]:
# Train the model
history = model.fit(train_set,epochs=100)

In [None]:
def plot_series(x, y, format="-", start=0, end=None, 
                title=None, xlabel=None, ylabel=None, legend=None ):
    """
    Visualizes time series data

    Args:
      x (array of int) - contains values for the x-axis
      y (array of int or tuple of arrays) - contains the values for the y-axis
      format (string) - line style when plotting the graph
      start (int) - first time step to plot
      end (int) - last time step to plot
      title (string) - title of the plot
      xlabel (string) - label for the x-axis
      ylabel (string) - label for the y-axis
      legend (list of strings) - legend for the plot
    """

    # Setup dimensions of the graph figure
    plt.figure(figsize=(10, 6))
    
    # Check if there is more than one series to plot
    if type(y) is tuple:

      # Loop over the y elements
      for y_curr in y:

        # Plot the x and current y values
        plt.plot(x[start:end], y_curr[start:end], format)

    else:
      # Plot the x and y values
      plt.plot(x[start:end], y[start:end], format)

    # Label the x-axis
    plt.xlabel(xlabel)

    # Label the y-axis
    plt.ylabel(ylabel)

    # Set the legend
    if legend:
      plt.legend(legend)

    # Set the title
    plt.title(title)

    # Overlay a grid on the graph
    plt.grid(True)

    # Draw the graph on screen
    plt.show()

In [None]:
# Get mae and loss from history log
mae=history.history['mae']
loss=history.history['loss']

# Get number of epochs
epochs=range(len(loss)) 

# Plot mae and loss
plot_series(
    x=epochs, 
    y=(mae, loss), 
    title='MAE and Loss', 
    xlabel='Epoch',
    ylabel='Value',
    legend=['MAE', 'Loss']
    )

# Only plot the last 80% of the epochs
zoom_split = int(epochs[-1] * 0.2)
epochs_zoom = epochs[zoom_split:]
mae_zoom = mae[zoom_split:]
loss_zoom = loss[zoom_split:]

# Plot zoomed mae and loss
plot_series(
    x=epochs_zoom, 
    y=(mae_zoom, loss_zoom), 
    title='MAE and Loss', 
    xlabel='Epoch',
    ylabel='Value',
    legend=['MAE', 'Loss']
    )
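
When reading the two curves together, it may help to recall how the Huber loss relates to MAE. The sketch below is a NumPy re-implementation written for illustration, assuming the Keras default `delta=1.0` (the notebook does not override it). The `huber` function and the sample values are hypothetical. For absolute errors larger than `delta`, each sample contributes `|e| - 0.5`, so the loss curve can be expected to sit roughly 0.5 below the MAE curve as long as typical errors stay above 1.

```python
import numpy as np


def huber(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones."""
    error = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    quadratic = 0.5 * error ** 2
    linear = delta * error - 0.5 * delta ** 2
    return float(np.mean(np.where(error <= delta, quadratic, linear)))


# Made-up values for illustration only
y_true = np.array([100.0, 102.0, 105.0])
y_pred = np.array([97.0, 106.0, 103.0])  # absolute errors: 3, 4, 2

mae = float(np.mean(np.abs(y_true - y_pred)))
print(mae, huber(y_true, y_pred))  # 3.0 2.5
```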

#### 3.6. Model Prediction

In [None]:
def model_forecast(model, series, window_size, batch_size):
    """Uses an input model to generate predictions on data windows

    Args:
      model (TF Keras Model) - model that accepts data windows
      series (array of float) - contains the values of the time series
      window_size (int) - the number of time steps to include in the window
      batch_size (int) - the batch size

    Returns:
      forecast (numpy array) - array containing predictions
    """

    # Generate a TF Dataset from the series values
    dataset = tf.data.Dataset.from_tensor_slices(series)

    # Window the data but only take those with the specified size
    dataset = dataset.window(window_size, shift=1, drop_remainder=True)

    # Flatten the windows by putting its elements in a single batch
    dataset = dataset.flat_map(lambda w: w.batch(window_size))
    
    # Create batches of windows
    dataset = dataset.batch(batch_size).prefetch(1)
    
    # Get predictions on the entire dataset
    forecast = model.predict(dataset)
    
    return forecast

In [None]:
# Reduce the original series
forecast_series = highs[split_time-window_size:-1]

# Use helper function to generate predictions
forecast = model_forecast(model, forecast_series, window_size, batch_size)

# Drop single dimensional axis
results = forecast.squeeze()

# Plot the results
plot_series(time_valid, (X_valid, results))
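
The slice bounds passed to `model_forecast` are chosen so that each validation point receives exactly one prediction. The following is a quick check of that arithmetic, assuming the series length of 3028 stated in the comments earlier in the notebook, together with `split_time = 2500` and `window_size = 120`.

```python
series_len = 3028
split_time = 2500
window_size = 120

# highs[split_time - window_size : -1] keeps indices 2380 .. 3026
forecast_len = (series_len - 1) - (split_time - window_size)

# One window per start position; the window starting at index i
# is used to predict the value at index i + window_size
n_windows = forecast_len - window_size + 1
n_valid = series_len - split_time

print(forecast_len, n_windows, n_valid)  # 647 528 528
```

The first window covers indices 2380 to 2499 and predicts index 2500, while the last covers 2907 to 3026 and predicts index 3027, so `results` lines up with `X_valid` one to one.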

The prediction (after time step 2500) is obviously off, and the model couldn't predict the sudden rise in the high value. This could be due to the fact that the model never saw such a sharp rise in the training set.<br>
Nonetheless, the model successfully predicted the dip that happened shortly after time step 3000.
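
To put a number on how far off the forecast is, the validation error can be computed directly from the two arrays plotted above. A minimal sketch with NumPy follows. The `forecast_errors` helper is hypothetical, and the short arrays are made-up stand-ins for `X_valid` and `results`, so the printed values say nothing about the actual model.

```python
import numpy as np


def forecast_errors(actual, predicted):
    """Return (MAE, MSE) between two equally shaped 1-D arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted must have the same shape")
    diff = actual - predicted
    return float(np.mean(np.abs(diff))), float(np.mean(diff ** 2))


# Hypothetical values for illustration only
actual = [300.0, 310.0, 330.0, 350.0]
predicted = [298.0, 305.0, 320.0, 330.0]

mae, mse = forecast_errors(actual, predicted)
print(f"MAE={mae:.2f}, MSE={mse:.2f}")
```

In the notebook, the equivalent call would be `forecast_errors(X_valid, results)`.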

### Reference
1. [TensorFlow Developer Professional Certificate by DeepLearning.AI](https://www.coursera.org/professional-certificates/tensorflow-in-practice)