# Stock Prediction Capstone Project

In this project, you will work on predicting stock prices using Long Short-Term Memory (LSTM) networks implemented in PyTorch. This project will guide you through the steps of preparing the data, building the LSTM model, training the model, and making predictions.

## Note/Disclaimer:
Before starting this project, it is highly recommended to read up on LSTM networks so that you understand, at a high level, how they work and how they are typically implemented in PyTorch. This will help you grasp the concepts better as you work through the project. Here are some resources to get you started:
- [Video to Help Understand LSTMs](https://www.youtube.com/watch?v=YCzL96nL7j0)
- [Resource for Implementing LSTMs for Time Series Prediction with PyTorch](https://machinelearningmastery.com/lstm-for-time-series-prediction-in-pytorch/)
- [Resource for Implementing LSTMs with PyTorch](https://wandb.ai/sauravmaheshkar/LSTM-PyTorch/reports/Using-LSTM-in-PyTorch-A-Tutorial-With-Examples--VmlldzoxMDA2NTA5)
- [PyTorch Documentation for Implementing LSTMs](https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html)


## Project Steps Overview:
1. **Data Preprocessing:** Load and preprocess the stock price data.
2. **Model Building:** Define the LSTM model architecture using PyTorch.
3. **Model Training:** Train the LSTM model with the prepared data.
4. **Prediction:** Use the trained model to make stock price predictions.

Let's get started! Provided below is some starter code to get you set up with the data and packages/imports.



1. **Importing Libraries**: First, we import the necessary libraries for data manipulation, visualization, scaling, and model building.
   - `numpy` and `pandas` are used for data manipulation.
   - `matplotlib.pyplot` is used for data visualization.
   - `MinMaxScaler` from `sklearn.preprocessing` is used for scaling the data.
   - `torch` and `torch.nn` are used for building and training the LSTM model.
   - `yfinance` is used to download stock data.
   - `datetime` is used for handling date and time.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
import torch
import torch.nn as nn
import yfinance as yf
from datetime import datetime


2. **Downloading Stock Data**: We define a list of stock symbols and specify the date range (the past year). We then download the stock data for each symbol using the `yfinance` library, tag each row with its symbol, and concatenate the results into a single DataFrame sorted by date.


In [None]:
# Download stock data
stocks_list = ['AAPL', 'GOOG', 'MSFT', 'AMZN']
end = datetime.now()
start = end - pd.DateOffset(years=1)  # one year back; also works when today is Feb 29

data_frames = []
for stock in stocks_list:
    df = yf.download(stock, start, end)
    df['Symbol'] = stock
    data_frames.append(df)

df = pd.concat(data_frames)
df = df.sort_index()


3. **Scaling Data**: We use `MinMaxScaler` to scale the 'Close' prices to a range between -1 and 1. Scaling matters for training the LSTM model because inputs on a small, consistent range usually help the optimizer converge faster.


In [None]:
# Scale the 'Close' column, which serves as both the input feature and the target
scaler = MinMaxScaler(feature_range=(-1, 1))
df['Close'] = scaler.fit_transform(df['Close'].values.reshape(-1, 1))
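
To make the scaling step concrete, here is a minimal sketch on four made-up prices (the values and the `toy_` names are illustrative assumptions, not part of the starter code). It shows that the minimum maps to -1, the maximum maps to 1, and that `inverse_transform` recovers the original units, which is useful later when you want to report predictions as prices.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy closing prices (made-up values, not real market data)
toy_prices = np.array([100.0, 105.0, 110.0, 120.0]).reshape(-1, 1)

# Same scaler configuration as the starter code
toy_scaler = MinMaxScaler(feature_range=(-1, 1))
toy_scaled = toy_scaler.fit_transform(toy_prices)

# inverse_transform maps scaled values back to the original price units
toy_restored = toy_scaler.inverse_transform(toy_scaled)

print(toy_scaled.ravel())    # approximately -1, -0.5, 0, 1
print(toy_restored.ravel())  # approximately 100, 105, 110, 120
```

Keep a reference to the fitted `scaler` from the starter code so that you can apply the same inverse mapping to your model's outputs.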


4. **Preparing Data for LSTM**:

Next, we define the sequence length and build sequences from the scaled 'Close' prices so that the data is in a form the LSTM can learn from. The `create_sequences` function generates input-output pairs: each input is a window of `seq_length` consecutive values, and its output is the single value that immediately follows that window. This gives the model the historical context it needs to predict the next price.

We chose a sequence length of **50** so that the model has enough historical context to pick up patterns, while the sequences are neither too long nor too short. Play around with this hyperparameter and compare the results!

In [None]:
# Prepare data for LSTM
sequence_length = 50  # Length of the sequence for LSTM
data = df['Close'].values

def create_sequences(data, seq_length):
    xs = []
    ys = []
    for i in range(len(data) - seq_length):
        x = data[i:i+seq_length]
        y = data[i+seq_length]
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

x_data, y_data = create_sequences(data, sequence_length)
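
To see what the windowing produces, the sketch below repeats `create_sequences` from the cell above and runs it on a tiny array of the integers 0 to 9 with a window of 3. The `toy_` names and the input values are assumptions chosen only for illustration.

```python
import numpy as np

def create_sequences(data, seq_length):
    # Same logic as the starter code: sliding windows plus the next value
    xs, ys = [], []
    for i in range(len(data) - seq_length):
        xs.append(data[i:i + seq_length])
        ys.append(data[i + seq_length])
    return np.array(xs), np.array(ys)

toy_data = np.arange(10)  # 0, 1, ..., 9
toy_x, toy_y = create_sequences(toy_data, seq_length=3)

print(toy_x.shape, toy_y.shape)  # (7, 3) (7,)
print(toy_x[0], toy_y[0])        # [0 1 2] 3
print(toy_x[-1], toy_y[-1])      # [6 7 8] 9
```

With `n` values and a window of `seq_length`, you get `n - seq_length` pairs. Note that `data` in the starter code comes from a DataFrame holding four symbols sorted by date, so neighboring values may belong to different stocks; filtering to a single symbol before building sequences (for example with `df[df['Symbol'] == 'AAPL']`) is one option worth considering.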


# Data Preprocessing
In this step, you need to split the data into training and test sets and convert them to PyTorch tensors. Follow these steps:
1. **Split the data into training and test sets.** Use a 70-30 split for training and test data. Since `x_data` and `y_data` are NumPy arrays, you can do this manually with array slicing.
2. **Expand the dimensions of the data.** This step gives the inputs the shape the LSTM model expects: (samples, sequence length, features). For example, for your `x_train` set: `x_train = np.expand_dims(x_train, axis=-1)`.
3. **Convert the data to PyTorch tensors.** Use `torch.from_numpy` to convert the NumPy arrays to PyTorch tensors, and call the `float()` method on each tensor so that it matches the model's default `float32` weights. Hint: if needed, call `view(-1, 1)` on the `y_train` and `y_test` tensors to reshape them into a single column.
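
The three steps above can be sketched as follows. This is one possible approach under stated assumptions, not the only valid solution: `x_demo` and `y_demo` are random stand-ins with the same layout as `x_data` and `y_data`, and `split_idx` is a hypothetical helper name.

```python
import numpy as np
import torch

# Random stand-ins shaped like x_data / y_data (100 windows of length 50)
rng = np.random.default_rng(0)
x_demo = rng.uniform(-1, 1, size=(100, 50))
y_demo = rng.uniform(-1, 1, size=(100,))

# 1. Chronological 70-30 split via slicing (no shuffling for time series)
split_idx = int(len(x_demo) * 0.7)
x_train, x_test = x_demo[:split_idx], x_demo[split_idx:]
y_train, y_test = y_demo[:split_idx], y_demo[split_idx:]

# 2. Add a trailing feature dimension: (samples, seq_len) -> (samples, seq_len, 1)
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# 3. Convert to float32 tensors; targets become a single column
x_train = torch.from_numpy(x_train).float()
x_test = torch.from_numpy(x_test).float()
y_train = torch.from_numpy(y_train).float().view(-1, 1)
y_test = torch.from_numpy(y_test).float().view(-1, 1)

print(x_train.shape, y_train.shape)  # torch.Size([70, 50, 1]) torch.Size([70, 1])
print(x_test.shape, y_test.shape)    # torch.Size([30, 50, 1]) torch.Size([30, 1])
```

Slicing keeps the rows in their original order, so the test set contains only the most recent windows.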
    


# Model Building
Define the LSTM model architecture using PyTorch:
1. **Import the necessary PyTorch libraries.** Import `torch`, `torch.nn`, and other relevant modules.
2. **Define the LSTM model class.** Create a class that inherits from `nn.Module` and define the LSTM layers and the forward pass.
3. **Initialize your hyperparameters (e.g. `input_dim`, `hidden_dim`, `num_layers`, `output_dim`, `num_epochs`, learning rate), initialize the model, and define the loss function and the optimizer.** Use `nn.MSELoss` for the loss function and `torch.optim.Adam` or any other optimizer of your choice.
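
As a starting point, here is a minimal sketch of what such a model might look like. The class name `StockLSTM` and the hyperparameter values are assumptions chosen for illustration, so treat them as something to tune rather than as recommended settings.

```python
import torch
import torch.nn as nn

class StockLSTM(nn.Module):  # hypothetical name
    def __init__(self, input_dim=1, hidden_dim=32, num_layers=2, output_dim=1):
        super().__init__()
        # batch_first=True means inputs are (batch, seq_len, features)
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, seq_len, hidden_dim)
        return self.fc(out[:, -1, :])  # predict from the last time step

# Illustrative hyperparameters (assumed values)
model = StockLSTM(input_dim=1, hidden_dim=32, num_layers=2, output_dim=1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Shape check with a dummy batch of 4 sequences of length 50
dummy_batch = torch.zeros(4, 50, 1)
print(model(dummy_batch).shape)  # torch.Size([4, 1])
```

Using only the last time step's hidden output is a common choice for next-value prediction; other designs, such as pooling over all time steps, are also possible.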
    


# Model Training
Train the LSTM model with the prepared data:
1. **Implement the training loop.** Iterate over the training data, perform forward and backward passes, and update the model weights.
2. **Use a loss function to track training performance.** Record or print the loss as training progresses so that you can check that it is decreasing.
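
The loop described above could look like the sketch below. To keep it self-contained it trains on a synthetic sine wave rather than stock data, and `TinyLSTM`, `loss_history`, and all hyperparameter values are assumptions made for illustration. It uses full-batch updates; mini-batches via a `DataLoader` are an equally reasonable choice.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic sine-wave data standing in for the scaled prices
wave = np.sin(np.linspace(0, 20, 200)).astype(np.float32)
seq_len = 10
xs = np.stack([wave[i:i + seq_len] for i in range(len(wave) - seq_len)])
ys = wave[seq_len:]
x_train = torch.from_numpy(xs).unsqueeze(-1)  # (190, 10, 1)
y_train = torch.from_numpy(ys).view(-1, 1)    # (190, 1)

class TinyLSTM(nn.Module):  # hypothetical small model for the demo
    def __init__(self, hidden_dim=16):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])

model = TinyLSTM()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

num_epochs = 30
loss_history = []
for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()             # clear gradients from the previous step
    preds = model(x_train)            # forward pass
    loss = criterion(preds, y_train)
    loss.backward()                   # backward pass
    optimizer.step()                  # update the weights
    loss_history.append(loss.item())
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch + 1}/{num_epochs}, loss: {loss.item():.4f}")
```

Plotting `loss_history` afterwards is a quick way to see whether training has plateaued.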
    


# Prediction
Use the trained model to make stock price predictions:
1. **Implement the prediction function.** Use the trained model to generate predictions on the test data.
2. **Evaluate the model's performance on the test data.** Calculate metrics like MSE or MAE to assess the model's performance. If you're using PyTorch's `nn.MSELoss`, call the `item()` method on the resulting loss tensor to get the MSE as a Python float.
3. **Visualize the predicted vs actual stock prices.** Plot the predicted and actual prices to visually inspect the model's performance.
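
The mechanics of these three steps can be sketched as follows. The block is a hedged illustration only: it uses an untrained stand-in model (`TinyLSTM`, a hypothetical name) and random test tensors, so the numbers it prints are meaningless. In the project you would substitute your trained model and your real `x_test` and `y_test`.

```python
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyLSTM(nn.Module):  # hypothetical stand-in for your trained model
    def __init__(self, hidden_dim=16):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])

model = TinyLSTM()

# Random stand-ins with the project's layout, values in [-1, 1)
x_test = torch.rand(20, 50, 1) * 2 - 1
y_test = torch.rand(20, 1) * 2 - 1

# 1. Predict without tracking gradients
model.eval()
with torch.no_grad():
    test_preds = model(x_test)

# 2. Evaluate: MSE and MAE as Python floats
test_mse = nn.MSELoss()(test_preds, y_test).item()
test_mae = nn.L1Loss()(test_preds, y_test).item()
print(f"Test MSE: {test_mse:.4f}, Test MAE: {test_mae:.4f}")

# 3. Plot predicted vs actual (still in scaled units here)
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(y_test.numpy().ravel(), label="Actual (scaled)")
ax.plot(test_preds.numpy().ravel(), label="Predicted (scaled)")
ax.set_xlabel("Test sample")
ax.set_ylabel("Scaled close price")
ax.legend()
```

To report results in price units, pass both the predictions and the targets through the fitted `scaler.inverse_transform` before computing metrics and plotting. In a script, as opposed to a notebook, call `plt.show()` to display the figure.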
    