Apple Stock Analysis:

---

The provided dataset contains historical stock data for Apple Inc. from the past year. It features daily records with essential attributes such as Date, Open, High, Low, Close, and Volume. For this task, your primary focus should be on the Close price, which indicates the stock’s final trading price at the end of each day. The time-series nature of the dataset makes it ideal for sequential modeling, and its numerical structure is well-suited for regression analysis.

Problem Statement / Objective :

The primary objective is to develop a predictive model capable of accurately forecasting Apple Inc.’s stock prices for the next 10 days. Due to the complexity of stock market data, which can display both short-term sequential dependencies and broader trends, relying on a single model type may be insufficient. The challenge is to effectively capture these intricate patterns. Therefore, a hybrid modeling approach is required, combining LSTM (to capture time dependencies and short-term trends) and Linear Regression (to capture long-term trends).

In [60]:
# Importing Libraries 

import pandas as pd 

file_path = "D:/VS/Data Science Projects/Hybrid Model/apple_stock_data.csv" 
df = pd.read_csv(file_path) 
print(df.head())

                        Date   Adj Close       Close        High         Low  \
0  2023-11-02 00:00:00+00:00  176.665985  177.570007  177.779999  175.460007   
1  2023-11-03 00:00:00+00:00  175.750671  176.649994  176.820007  173.350006   
2  2023-11-06 00:00:00+00:00  178.317520  179.229996  179.429993  176.210007   
3  2023-11-07 00:00:00+00:00  180.894333  181.820007  182.440002  178.970001   
4  2023-11-08 00:00:00+00:00  181.958893  182.889999  183.449997  181.589996   

         Open    Volume  
0  175.520004  77334800  
1  174.240005  79763700  
2  176.380005  63841300  
3  179.179993  70530000  
4  182.350006  49340300  


In [61]:
df.dtypes # Here we can see that Date is in object format, we need to change it into datetime format 

Date          object
Adj Close    float64
Close        float64
High         float64
Low          float64
Open         float64
Volume         int64
dtype: object

In [62]:
df['Date'] = pd.to_datetime(df['Date']) # Changing Date into datetime format 
df.set_index('Date', inplace=True) # Setting Date as index
data = df[['Close']] # Selecting Close column as data, because we are going to predict Close column. 

LSTM & LR :

---

We will be utilizing LSTM (Long Short-Term Memory) and Linear Regression models for this task. LSTM is chosen because it effectively captures sequential dependencies and patterns in time-series data, making it suitable for modeling stock price movements influenced by historical trends.

Linear Regression, on the other hand, is a straightforward model that captures simple linear relationships and long-term trends in data. By combining these two models into a hybrid approach, we leverage the LSTM’s ability to model complex time-dependent patterns alongside the Linear Regression’s ability to identify and follow broader trends. This combination aims to create a more balanced and accurate prediction system.

Standardize / Scaling:

---

Scale the 'Close' price data between 0 and 1 using MinMaxScaler to ensure compatibility with the LSTM model and to standardize our data on a unified scale.

In [63]:
from sklearn.preprocessing import MinMaxScaler 
scaler  = MinMaxScaler(feature_range = (0,1)) # Scaling data between 0 and 1  

data['Close'] = scaler.fit_transform(data[["Close"]]) # Scaling Close column which is our target column 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['Close'] = scaler.fit_transform(data[["Close"]]) # Scaling Close column which is our target column


In [64]:
# let’s prepare the data for LSTM by creating sequences of a defined length (e.g., 60 days) to predict the next day’s price:

import numpy as np 
def sequence(data, seq_length = 60):
    x = []
    y = [] 
    
    for i in range(len(data) - seq_length):
        x.append(data[i:i + seq_length])
        y.append(data[i + seq_length]) 
    return np.array(x), np.array(y)

seq_length = 60
x, y = sequence(data['Close'].values, seq_length)

In [65]:
# Splitting data into training and testing data eg. 80 % of training data and 20 % of testing data. 

train_size = int(len(x) * 0.8)
X_train, X_test = x[:train_size], x[train_size:]
Y_train, Y_test = y[:train_size], y[train_size:]

In [66]:
# Now, we will build a sequential LSTM model with layers to capture the temporal dependencies in the data
from tensorflow.keras.models import Sequential 
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1))) 
model.add(LSTM(units = 50))
model.add(Dense(1)) 

  super().__init__(**kwargs)


In [67]:
# we will compile the model using an appropriate optimizer and loss function, and fit it into the training data 
model.compile(optimizer = 'adam', loss = 'mean_squared_error') 
model.fit(X_train, Y_train, epochs = 20, batch_size = 32)  

# epochs = 20 means model will run 20 times on the dataset.
# batch_size = 32 means it will change its parameter after 32 data points.  

Epoch 1/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m5s[0m 70ms/step - loss: 0.2989
Epoch 2/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 65ms/step - loss: 0.0881
Epoch 3/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 66ms/step - loss: 0.0313
Epoch 4/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 65ms/step - loss: 0.0194
Epoch 5/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 66ms/step - loss: 0.0201
Epoch 6/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 64ms/step - loss: 0.0201
Epoch 7/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 67ms/step - loss: 0.0120
Epoch 8/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 68ms/step - loss: 0.0153
Epoch 9/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 68ms/step - loss: 0.0123
Epoch 10/20
[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 66ms/step - loss: 0.0106
Epoch 11/20
[1m5/5

<keras.src.callbacks.history.History at 0x23df437df40>

In [68]:
data['Lag_1'] = data['Close'].shift(1) # Creating Lag_1 column which is shifted by 1 
data['Lag_2'] = data['Close'].shift(2) # Creating Lag_2 column which is shifted by 2
data['Lag_3'] = data['Close'].shift(3) # Creating Lag_3 column which is shifted by 3     
''' '
 Lag_1': Each value is the 'Close' price from the previous day.
'Lag_2': Each value is the 'Close' price from two days prior.
'Lag_3': Each value is the 'Close' price from three days prior.
'''
data = data.dropna() # Dropping NA values   

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['Lag_1'] = data['Close'].shift(1) # Creating Lag_1 column which is shifted by 1
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['Lag_2'] = data['Close'].shift(2) # Creating Lag_2 column which is shifted by 2
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data['Lag_3'] = data['Close'].shift

In [69]:
# Split the data into training and testing data 

X_LR = data[['Lag_1', 'Lag_2', 'Lag_3']] # Independent Variables 
Y_LR = data['Close'] # Dependent Variable 

X_train_LR, X_test_LR = X_LR[:train_size], X_LR[train_size:] 
Y_train_LR, Y_test_LR = Y_LR[:train_size], Y_LR[train_size:]   

In [70]:
from sklearn.linear_model import LinearRegression 

LR = LinearRegression() 
LR.fit(X_train_LR, Y_train_LR) 

In [71]:
# Making Predictions  
# This line reshapes X_test to a 3-dimensional array with the shape (samples, time_steps, features), which is the required input format for LSTM models in Keras.

X_test_lstm = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))  

lstm_predictions = model.predict(X_test_lstm) 
lstm_predictions = scaler.inverse_transform(lstm_predictions) 

# Since the model's outputs are scaled, this line applies the inverse transformation to convert the predictions back to their original scale. This step is crucial for interpreting the predictions in the context of the original data. 

[1m2/2[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 813ms/step


In [72]:
# Making Prediction using Linear Regression Model 

LR_Prediction = LR.predict(X_test_LR) 
LR_Prediction = scaler.inverse_transform(LR_Prediction.reshape(-1, 1)) 

In [73]:
# Check if both models predictions are with same shape or not 

print(lstm_predictions.shape)
print(LR_Prediction.shape)

(39, 1)
(96, 1)


In [74]:
# As we have seen LR_Predictions have different shape than lstm_predictions, so we need to reshape LR_Predictions to match the shape of lstm_predictions.

if lstm_predictions.shape != LR_Prediction.shape:
    
    if len(lstm_predictions.shape) == 1: 
        lstm_predictions = lstm_predictions.reshape(-1, 1) 
    if len(LR_Prediction.shape) == 1:
        LR_Prediction = LR_Prediction.reshape(-1, 1)
        
    # Now Slice the array to match the shape of both the models 
    min_length = min(len(lstm_predictions), len(LR_Prediction))
    lstm_predictions = lstm_predictions[:min_length] 
    LR_Prediction = LR_Prediction[:min_length]
    
print(lstm_predictions.shape)
print(LR_Prediction.shape)

(39, 1)
(39, 1)


In [75]:
# Creating Hybrid Predictions using weighted average of LSTM and Linear Regression Model 

hybrid_predictions = scaler.inverse_transform(0.7 * lstm_predictions) + (0.3 * LR_Prediction)

In [76]:
# Lets predict for next 10 days using LSTM model

lstm_future = [] 
last_60_days = x[-1].reshape(1, seq_length, 1) 

for _ in range(10):
    lstm_pred = model.predict(last_60_days)[0,0]
    lstm_future.append(lstm_pred)
    lstm_pred_reshape = np.array([[lstm_pred]]).reshape(1, 1, 1) # Creating 3D array, so LSTM model can accept it [[[lstm_pred]]] 
    last_60_days = np.append(last_60_days[:, 1:, :], lstm_pred_reshape, axis = 1) # Shift the sequence by one day to the left and appending the new prediction to the end of the sequence.

lstm_future = scaler.inverse_transform(np.array(lstm_future).reshape(-1,1)) 

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 96ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 93ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 88ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 80ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 75ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 78ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 83ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 84ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 81ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 72ms/step


---
We always keep 60 values (dropping the first, appending the new).
Each sequence gradually shifts forward. Our each sequence consist of 60 days. 
The model predicts one step at a time, and we keep using the new predictions to make further predictions. 

Step 1: Days - [ 1, 2, 3, 4, ... 58, 59, 60 ] # Shape: (1, 60, 1)
--> We feed this into the LSTM model, which predicts the next value (e.g., 61.5). 

Step 2: Shift Left & Append New Prediction, After prediction, we drop the first value (1) and append the new predicted value (61.5).
--> [ 2, 3, 4, 5, ... 59, 60, 61.5 ]  # Shape: (1, 60, 1), Now we have a new sequence of 60 values, ready for the next prediction.

Step 3: Repeat the Process, Remove the first value (2). Add the next predicted value (e.g., 62.3).
--> [ 3, 4, 5, 6, ... 60, 61.5, 62.3 ]  # Shape: (1, 60, 1)

---

In [77]:
#  Lets predict for next 10 days using LR model

recent_data = data['Close'].values[-3:]
LR_Future = [] 

for _ in range(10): 
    LR_pred = LR.predict(recent_data.reshape(1,-1)) 
    LR_Future.append(LR_pred[0])
    recent_data = np.append(recent_data[1:], LR_pred)

LR_Future = scaler.inverse_transform(np.array(LR_Future).reshape(-1,1))     # Transferring the scaled predictions back to their original scale.



In [80]:
hybrid_future_predictions = (0.7 * lstm_future) + (0.3 * LR_Future) # Creating Hybrid Predictions for next 10 days

In [81]:
future_data = pd.date_range(start = data.index[-1] + pd.Timedelta(days=1), periods=10)
predictions_onFuture = pd.DataFrame({'Date' : future_data, 
                                     'LSTM' : lstm_future.flatten(), 
                                     'LR' : LR_Future.flatten(),
                                     'Hybrid' : hybrid_future_predictions.flatten()})
print(predictions_onFuture) 

                       Date        LSTM          LR      Hybrid
0 2024-11-02 00:00:00+00:00  231.300964  230.355192  231.017233
1 2024-11-03 00:00:00+00:00  231.014114  225.707291  229.422072
2 2024-11-04 00:00:00+00:00  230.732025  222.703426  228.323449
3 2024-11-05 00:00:00+00:00  230.455582  230.631535  230.508369
4 2024-11-06 00:00:00+00:00  230.184204  225.486380  228.774851
5 2024-11-07 00:00:00+00:00  229.917084  222.494588  227.690332
6 2024-11-08 00:00:00+00:00  229.652863  230.930195  230.036062
7 2024-11-09 00:00:00+00:00  229.390533  225.245599  228.147044
8 2024-11-10 00:00:00+00:00  229.129303  222.284007  227.075705
9 2024-11-11 00:00:00+00:00  228.868515  231.252375  229.583675
