# JPMC QR Mentorship - Case Study 1
## Khushmeet Chandi

This notebook analyzes NVIDIA Corporation ("NVDA") from January 1, 2020, to January 1, 2024. The stock ticker symbol and date range can be changed. It provides a framework for beginning to analyze stocks and makes a first attempt at answering the guidance questions.

In [None]:
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data

In [None]:
stock_ticker_symbol = "NVDA" # NVIDIA Corporation ("NVDA")

nvda = yf.Ticker(stock_ticker_symbol)

# get stock info
nvda.info

In [None]:
# get historical market data
nvda.history(period="max")

In [None]:
# show actions (dividends, splits)
nvda.actions

In [None]:
# show dividends
nvda.dividends

In [None]:
# show splits
nvda.splits

### 1) What was the change in price of the stock over time?

In [None]:
# Fetch historical stock data for "NVDA" from January 1, 2020, to January 1, 2024 (the end date is exclusive)
stock_data = yf.download(stock_ticker_symbol, start="2020-01-01", end="2024-01-01")

In [None]:
# Calculate the daily change in closing price
stock_data['Daily Change'] = stock_data['Close'].diff()
# Plot the closing price over time
plt.figure(figsize=(14, 7))
plt.plot(stock_data.index, stock_data['Close'], label='Close Price', color='blue')
plt.title('Change in Stock Price over Time for {}'.format(stock_ticker_symbol))
plt.xlabel('Date')
plt.ylabel('Stock Price ($)')
plt.legend()
plt.grid(True)
plt.show()

### 2) What was the daily return of the stock on average?

In [None]:
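# Calculate the daily return (in percent) and its average
stock_data['Daily Return'] = stock_data['Close'].pct_change() * 100
average_daily_return = stock_data['Daily Return'].mean()
print("Average daily return for {}: {:.2f}%".format(stock_ticker_symbol, average_daily_return))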

In [None]:
# Plot the distribution of daily returns
plt.figure(figsize=(10, 6))
plt.hist(stock_data['Daily Return'].dropna(), bins=50, edgecolor='black', alpha=0.7)
plt.title('Daily Returns Distribution for {}'.format(stock_ticker_symbol))
plt.xlabel('Daily Return (%)')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

This histogram represents the distribution of daily returns, which shows the volatility of the stock and how often different return values occur.
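To make the return calculation concrete, here is a minimal sketch on a toy price series. The prices are made-up values, not NVDA data. It shows that `pct_change()` computes `(P_t / P_{t-1}) - 1` and that the first row has no return.

```python
import pandas as pd

# Toy closing prices (made-up values, not NVDA data)
close = pd.Series([100.0, 102.0, 99.96, 104.958], name="Close")

# pct_change computes (P_t / P_{t-1}) - 1; the first row has no prior price, so it is NaN
daily_return_pct = close.pct_change() * 100

# mean() skips NaN by default, so the first row does not affect the average
average_daily_return = daily_return_pct.mean()
print(f"Average daily return: {average_daily_return:.2f}%")
```

The three returns here are roughly +2%, -2%, and +5%, so the arithmetic mean is about 1.67%.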

### 3) What was the moving average of the various stocks?

In [None]:
# Calculate moving averages
stock_data['MA_20'] = stock_data['Close'].rolling(window=20).mean()
stock_data['MA_50'] = stock_data['Close'].rolling(window=50).mean()

In [None]:
plt.figure(figsize=(12, 6))
plt.plot(stock_data.index, stock_data['Close'], label='Close Price', color='blue')
plt.plot(stock_data.index, stock_data['MA_20'], label='20-Day Moving Avg', color='red')
plt.plot(stock_data.index, stock_data['MA_50'], label='50-Day Moving Avg', color='green')
plt.title('Moving Averages for {}'.format(stock_ticker_symbol))
plt.xlabel('Date')
plt.ylabel('Price ($)')
plt.legend()
plt.grid(True)
plt.show()

The 20-day and 50-day moving averages smooth out short-term fluctuations and make trends in the stock price easier to observe.
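The sketch below illustrates how `rolling(window=...)` behaves on a small toy series with made-up values. By default the first `window - 1` rows are NaN because the window is not yet full, which is why the moving-average lines in the plot start later than the price line. The `min_periods=1` variant is shown only for comparison.

```python
import pandas as pd

# Toy closing prices (made-up values)
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

# Default behavior: NaN until a full 3-day window is available
ma_3 = close.rolling(window=3).mean()

# With min_periods=1 the average is taken over however many days exist so far
ma_3_partial = close.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"Close": close, "MA_3": ma_3, "MA_3_partial": ma_3_partial}))
```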


### 5) How much value do we put at risk by investing in a particular stock?

To estimate how much value is at risk by investing in a particular stock, we can calculate the volatility or standard deviation of its daily returns. Volatility measures the variability of a stock's price over time and is a common metric used to assess risk in financial markets. 


In [None]:
# 'Daily Return' is already expressed in percent, so no further scaling is needed
volatility = stock_data['Daily Return'].std()

print("Volatility (Risk) for {}: {:.2f}%".format(stock_ticker_symbol, volatility))

The volatility calculated represents the daily risk associated with the stock. Higher volatility indicates greater fluctuations in the stock price, implying higher risk.

A daily volatility of about 3.42% means that NVDA's daily returns typically deviate from their average by roughly 3.4 percentage points, so the stock price can swing significantly from one day to the next. High volatility implies the potential for substantial gains or losses over short periods. Stocks with high volatility like NVDA may attract traders and speculators looking to capitalize on short-term price movements. For long-term investors, however, high volatility may signal higher uncertainty and the potential for larger losses if not managed properly. It is also important to compare this volatility figure across different stocks and sectors and to place it in historical context.
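Standard deviation describes typical fluctuations, but it does not directly answer how much value is at risk. One common complement, not used in the notebook above, is historical value at risk (VaR). The sketch below uses synthetic returns rather than NVDA data. The 252-trading-day annualization, the 95% confidence level, and the `position_value` of $10,000 are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic daily returns in percent (illustrative only, not NVDA data)
daily_return_pct = rng.normal(loc=0.2, scale=3.4, size=1000)

# Sample standard deviation (ddof=1 matches the pandas .std() default)
daily_vol = daily_return_pct.std(ddof=1)
# Annualize assuming 252 trading days and independent daily returns
annual_vol = daily_vol * np.sqrt(252)

confidence = 0.95
# Historical VaR: the daily loss that was exceeded on only 5% of days
var_95 = -np.percentile(daily_return_pct, (1 - confidence) * 100)

position_value = 10_000  # hypothetical position size in dollars
value_at_risk = position_value * var_95 / 100

print(f"Daily volatility: {daily_vol:.2f}%, annualized: {annual_vol:.2f}%")
print(f"1-day 95% VaR: {var_95:.2f}% (${value_at_risk:,.0f} on ${position_value:,})")
```

With real data, `daily_return_pct` would be replaced by `stock_data['Daily Return'].dropna()`.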

### 6) How can we attempt to predict future stock behavior? Specifically, how can we predict the closing stock price of NVIDIA using an LSTM?

In [None]:
# Extract the closing prices
timeseries = stock_data['Close'].values.astype('float32').reshape(-1, 1)

In [None]:
# train-test split for time series
train_size = int(len(timeseries) * 0.67)
test_size = len(timeseries) - train_size
train, test = timeseries[:train_size], timeseries[train_size:]

In [None]:
def create_dataset(dataset, lookback):
    """Transform a time series into a prediction dataset
    
    Args:
        dataset: A numpy array of time series, first dimension is the time steps
        lookback: Size of window for prediction
    """
    X, y = [], []
    for i in range(len(dataset)-lookback):
        feature = dataset[i:i+lookback]
        target = dataset[i+lookback]
        X.append(feature)
        y.append(target)
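    # Stack the windows into one numpy array first; building a tensor from a list of arrays is slow
    return torch.tensor(np.array(X)), torch.tensor(np.array(y))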

In [None]:
# lookback is the number of days (e.g., 60) used to predict the next day
lookback = 60
X_train, y_train = create_dataset(train, lookback=lookback)
X_test, y_test = create_dataset(test, lookback=lookback)
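To see what the chronological split and the sliding window produce, the sketch below applies the same idea to a ten-point toy series using numpy only. The helper `make_windows` is a hypothetical stand-in that mirrors `create_dataset` without the tensor conversion.

```python
import numpy as np

def make_windows(series, lookback):
    """Return (X, y) where each X[i] holds `lookback` steps and y[i] is the next step."""
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

# Toy series 0..9 shaped (time steps, 1), like the closing-price array
series = np.arange(10, dtype="float32").reshape(-1, 1)

# Chronological split: the earliest 67% for training, the rest for testing
train_size = int(len(series) * 0.67)
train, test = series[:train_size], series[train_size:]

X_train, y_train = make_windows(train, lookback=3)
X_test, y_test = make_windows(test, lookback=3)

print(X_train.shape, y_train.shape)  # (samples, lookback, features), (samples, features)
print(X_train[0].ravel(), "->", y_train[0])
```

Note that each split loses its first `lookback` points as targets, so a short test set yields few samples.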

In [None]:
# Reshape input data to match LSTM expected input
# If batch_first=True: (batch_size, sequence_length, input_size)
# If batch_first=False (default): (sequence_length, batch_size, input_size)
X_train = X_train.view(-1, lookback, 1)  # Reshape to (batch_size, sequence_length, input_size)
X_test = X_test.view(-1, lookback, 1)

In [None]:
# Create DataLoader for batch processing
train_loader = data.DataLoader(data.TensorDataset(X_train, y_train), shuffle=True, batch_size=8)

In [None]:
class StockLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.linear(out[:, -1, :])
        return out

In [None]:
# Instantiate the model, optimizer, and loss function
input_size = 1
hidden_size = 50
output_size = 1
num_layers = 1

model = StockLSTM(input_size, hidden_size, output_size, num_layers)
optimizer = optim.Adam(model.parameters())
criterion = nn.MSELoss()

# Train the model
n_epochs = 100
for epoch in range(n_epochs):
    model.train()
    epoch_loss = 0.0
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        # outputs and targets are both (batch_size, 1); an extra dimension would trigger unintended broadcasting
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    
    if epoch % 10 == 0:
        print(f'Epoch [{epoch+1}/{n_epochs}], Loss: {epoch_loss/len(train_loader):.6f}')

print('Finished Training')

In [None]:
# Evaluate the model and make predictions
model.eval()
with torch.no_grad():
    train_pred = model(X_train).numpy()
    test_pred = model(X_test).numpy()

In [None]:
# Plot predictions
# Training targets start at index `lookback`; test targets start at `train_size + lookback`
train_idx = np.arange(lookback, train_size)
test_idx = np.arange(train_size + lookback, len(timeseries))
plt.figure(figsize=(14, 7))
plt.plot(train_idx, y_train.numpy(), label='Actual Prices (Train)', color='blue')
plt.plot(test_idx, y_test.numpy(), label='Actual Prices (Test)', color='green')
plt.plot(train_idx, train_pred, label='Predicted Prices (Train)', color='red')
plt.plot(test_idx, test_pred, label='Predicted Prices (Test)', color='orange')
plt.title('Stock Price Prediction for ' +  stock_ticker_symbol + ' using LSTM')
plt.xlabel('Time')
plt.ylabel('Stock Price ($)')
plt.legend()
plt.grid(True)
plt.show()
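A plot alone does not quantify how good the predictions are. One possible check, not part of the notebook above, is to compare the model's root mean squared error (RMSE) against a naive baseline that predicts each day's close as the previous day's close. The numbers below are made up for illustration, and `model_pred` is a hypothetical stand-in for `test_pred`.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equally sized arrays."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Toy values (made up); with the notebook's data these would be y_test and test_pred
actual = np.array([100.0, 102.0, 101.0, 105.0, 107.0])
model_pred = np.array([99.0, 101.5, 102.0, 104.0, 106.0])

# Persistence baseline: tomorrow's close is predicted to equal today's close
naive_pred = actual[:-1]

# Compare on the same days (the first day has no previous close for the baseline)
model_rmse = rmse(actual[1:], model_pred[1:])
naive_rmse = rmse(actual[1:], naive_pred)

print(f"Model RMSE: {model_rmse:.3f}, naive RMSE: {naive_rmse:.3f}")
```

If the model does not beat the persistence baseline on the test set, its predictions may simply be tracking the previous day's price.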


### 7) Suppose we have a derivative maturing in 5 trading days, with payoff function f(x) where f is some given function and x is the closing price at maturity date. How could we use/modify our model to estimate its payoff?

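The trained LSTM predicts only the next day's closing price, so it must be extended to reach the maturity date, which is 5 trading days after the last available date in the historical data. One option is to forecast recursively: predict the next close, append it to the input window, and repeat five times. Another is to retrain the model with the close 5 trading days ahead as the target. The predicted closing price x is then passed to the payoff function f(x) to estimate the derivative's payoff.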
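One way to reach a 5-day horizon with a one-step-ahead model is recursive forecasting, sketched below with numpy only. Everything named here is an assumption for illustration: `forecast_recursive` is a hypothetical helper, `drift_model` is a toy stand-in for the LSTM, and the call-option payoff with a strike of 100 is just one example of f(x).

```python
import numpy as np

def forecast_recursive(predict_next, last_window, horizon):
    """Feed each prediction back into the window to forecast `horizon` steps ahead."""
    window = np.asarray(last_window, dtype=float).copy()
    path = []
    for _ in range(horizon):
        next_price = float(predict_next(window))
        path.append(next_price)
        # Drop the oldest price and append the newest prediction
        window = np.append(window[1:], next_price)
    return np.array(path)

def call_payoff(x, strike=100.0):
    """Example payoff f(x): a European call with a hypothetical strike."""
    return max(x - strike, 0.0)

def drift_model(window):
    """Toy one-step predictor: last price plus the average daily change in the window."""
    return window[-1] + np.mean(np.diff(window))

last_window = np.array([96.0, 97.0, 98.0, 99.0, 100.0])
path = forecast_recursive(drift_model, last_window, horizon=5)
estimated_payoff = call_payoff(path[-1], strike=100.0)

print("Forecast path:", path)
print("Estimated payoff at maturity:", estimated_payoff)
```

With the notebook's model, `predict_next` could wrap a call such as `model(torch.tensor(window, dtype=torch.float32).view(1, -1, 1)).item()`. Two caveats apply: errors compound as predictions are fed back in, and for a nonlinear f the payoff of a single point forecast generally differs from the expected payoff, so a distribution of simulated paths may give a better estimate.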

### 8) How can you make a neural network model (e.g., LSTM) more interpretable?

- We can use feature importance to determine which features have the biggest impact on predictions. 
- We can also look at how the model focuses on different parts of the input data. For example, some time periods, such as the COVID-19 pandemic, might have a different impact on the model. 
- We can experiment with different settings, such as hyperparameters, to see how they affect the predictions.
- We can perform residual analysis to pinpoint areas where the model can improve. 
- We can also take a closer look at the data to pinpoint any anomalies in the time series that might be affecting the LSTM.
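As one illustration of seeing which parts of the input the model relies on, the sketch below implements a simple occlusion sensitivity check: each time step in the lookback window is replaced by the window mean, and the change in the prediction is recorded. This is a generic technique, not something taken from the notebook. `occlusion_sensitivity` and `toy_model` are hypothetical names, and `toy_model` is a weighted average that stands in for the LSTM.

```python
import numpy as np

def occlusion_sensitivity(predict, window):
    """Score each time step by how much the prediction moves when that step is masked."""
    window = np.asarray(window, dtype=float)
    baseline = float(predict(window))
    scores = np.zeros(len(window))
    for t in range(len(window)):
        perturbed = window.copy()
        perturbed[t] = window.mean()  # mask step t with a neutral value
        scores[t] = abs(float(predict(perturbed)) - baseline)
    return scores

def toy_model(window):
    """Stand-in predictor: a weighted average that favors recent prices."""
    weights = np.linspace(0.0, 1.0, len(window))
    return float(np.dot(weights, window) / weights.sum())

window = np.array([100.0, 101.0, 103.0, 102.0, 110.0])
scores = occlusion_sensitivity(toy_model, window)

print("Sensitivity per time step:", np.round(scores, 3))
print("Most influential step:", int(np.argmax(scores)))
```

For this toy predictor the most recent day dominates, as expected from its weights. Applied to the LSTM, the same loop would show which days in the 60-day window most affect the predicted close.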


To make the model interpretable for different stakeholders, we can include clear documentation and visualization tools that explain the different parts of the model, moving away from a black-box interpretation and providing clear background information. 

### Sources:
- https://www.investopedia.com/terms/v/volatility.asp
- https://aroussi.com/post/python-yahoo-finance
- https://machinelearningmastery.com/lstm-for-time-series-prediction-in-pytorch/