# 🔁 CNN-BiLSTM Hybrid Model with PSO Optimization

This notebook implements a **hybrid deep learning model** that combines **CNN** and **Bidirectional LSTM (BiLSTM)** layers for stock price forecasting. The convolutional layers extract local patterns from the financial time series, while the BiLSTM layers model longer-range temporal dependencies. After training, a **Particle Swarm Optimization (PSO)** algorithm searches for the weights of a weighted average that combines the CNN and LSTM predictions into a single forecast.

---

## 🧱 Workflow Overview

### 1. 📥 Data Collection
- Daily Shanghai Stock Index data (`000001.SS`) from Yahoo Finance, downloaded with `yfinance`  
- Date range: `2010-01-04` to `2020-01-23` (the `end` date is exclusive in `yfinance`)

### 2. 🧹 Preprocessing
- Outlier removal on the training set using a **Z-score** threshold of 3
- Min-Max scaling (fitted only on the training data)
- Look-back window of **5 trading days**; each window is used to predict the next day's `Adj Close`
- Chronological Train/Validation/Test split: 70% / 10% / 20%
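The windowing step can be sketched in plain Python. This is a minimal illustration rather than the notebook's `create_dataset`: `make_windows` is a hypothetical helper, and it assumes the rows are ordered oldest first with `Adj Close` at column index 4 of `['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']`.

```python
def make_windows(rows, look_back=5, target_col=4):
    """Turn a list of feature rows into (window, next-step target) pairs.

    rows: equal-length feature lists, oldest first (hypothetical layout).
    target_col: index of the target feature (4 = 'Adj Close' in the assumed column order).
    """
    X, y = [], []
    # Each window needs one more row after it for the target,
    # so a series of n rows yields n - look_back samples.
    for i in range(len(rows) - look_back):
        X.append(rows[i:i + look_back])
        y.append(rows[i + look_back][target_col])
    return X, y


# Toy data: 8 days x 6 features, where every feature of day d equals d
rows = [[float(d)] * 6 for d in range(8)]
X, y = make_windows(rows, look_back=5)
print(len(X), y)  # → 3 [5.0, 6.0, 7.0]
```

Each input window covers days `i` to `i + 4`, and its label is the `Adj Close` of day `i + 5`, so no window ever sees its own target.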

### 3. 🧠 Model Architecture
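- **Three Conv1D layers** (3 filters, kernel size 2, ReLU) followed by **max pooling** to extract short-term patterns
- **Two BiLSTM layers** (72 units per direction) to capture temporal dependencies
- **Dropout (0.5 and 0.2) and Dense layers** (10 units, then a single output) for regularization and output
- Optimizer: **Adam**, learning rate: 0.001, loss: MSE, epochs: 200, batch size: 32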
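A quick way to check the tensor shapes is to trace the sequence length through the layers. The sketch below assumes Keras defaults (`padding='valid'` and stride 1 for `Conv1D`, stride equal to `pool_size` for `MaxPooling1D`); `conv1d_valid_len` and `maxpool1d_len` are hypothetical helpers.

```python
def conv1d_valid_len(length, kernel_size, stride=1):
    """Output length of a Conv1D layer with padding='valid'."""
    return (length - kernel_size) // stride + 1


def maxpool1d_len(length, pool_size):
    """Output length of MaxPooling1D with stride = pool_size and padding='valid'."""
    return (length - pool_size) // pool_size + 1


steps = 5  # look-back window
for _ in range(3):  # three Conv1D layers with kernel_size=2
    steps = conv1d_valid_len(steps, kernel_size=2)
    print("after Conv1D:", steps)        # → 4, then 3, then 2
steps = maxpool1d_len(steps, pool_size=2)
print("after MaxPooling1D:", steps)      # → 1
```

Under these defaults, the first BiLSTM layer receives a sequence of a single time step with 3 channels (one per Conv1D filter).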

### 4. ⚖️ PSO-Based Post-Training Optimization
- CNN and LSTM predictions are fused using an **optimized weighted average**
- The weights are found via **Particle Swarm Optimization (PSO)** to minimize the combined RMSE
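The weighting idea can be illustrated without Keras. The sketch below is a minimal, self-contained particle swarm optimizer in plain Python (not pyswarm's implementation), applied to two synthetic prediction series. It parameterizes the blend with one weight `w` and gives the second series `1 - w`, so the weights sum to 1 by construction instead of through a constraint. `rmse`, `blend` and `pso_1d` are hypothetical helpers; the coefficients of 0.5 mirror the defaults of pyswarm's `pso` (`omega`, `phip`, `phig`).

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def blend(w, pred_a, pred_b):
    """Weighted average w * a + (1 - w) * b; the two weights always sum to 1."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def pso_1d(objective, lo=0.0, hi=1.0, n_particles=20, n_iter=100,
           omega=0.5, phip=0.5, phig=0.5, seed=0):
    """Minimize `objective` over [lo, hi] with a basic particle swarm."""
    rng = random.Random(seed)
    span = hi - lo
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [rng.uniform(-span, span) for _ in range(n_particles)]
    best_pos = pos[:]                         # personal best positions
    best_val = [objective(x) for x in pos]    # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g], best_val[g]   # swarm-wide best
    for _ in range(n_iter):
        for i in range(n_particles):
            rp, rg = rng.random(), rng.random()
            # Inertia + pull toward the particle's own best + pull toward the swarm's best
            vel[i] = (omega * vel[i]
                      + phip * rp * (best_pos[i] - pos[i])
                      + phig * rg * (g_pos - pos[i]))
            pos[i] = min(max(pos[i] + vel[i], lo), hi)  # keep the particle inside the bounds
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val < g_val:
                    g_pos, g_val = pos[i], val
    return g_pos, g_val


# Synthetic example: model A overshoots by 1, model B undershoots by 1
y_true = [3.0, 3.1, 3.2, 3.15, 3.3]
pred_a = [v + 1.0 for v in y_true]
pred_b = [v - 1.0 for v in y_true]

w, err = pso_1d(lambda w: rmse(y_true, blend(w, pred_a, pred_b)))
print(f"weight on model A: {w:.3f}, RMSE: {err:.4f}")
```

Because the two synthetic series err in opposite directions, the best blend lies near `w = 0.5`, where the RMSE approaches zero. If the two series were identical, every `w` would give the same RMSE and the search would have nothing to optimize.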

### 5. 📊 Evaluation Metrics
Performance is reported on Train, Validation, and Test sets using:
- **RMSE** (Root Mean Squared Error)
- **MAE** (Mean Absolute Error)
- **MAPE** (Mean Absolute Percentage Error)
- **R²** (Coefficient of Determination)
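The four metrics can be written out directly from their standard definitions. This plain-Python sketch expresses MAPE in percent, like the notebook's own `mean_absolute_percentage_error`; scikit-learn's `mean_absolute_percentage_error` returns a fraction instead, so its values are 100 times smaller. `regression_metrics` is a hypothetical helper and the numbers are toy values.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent) and R^2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when a true value is 0; index prices are strictly positive
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}


metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
print(metrics)
```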

---

## 🔬 Paper Context

This notebook corresponds to **Section 3.5** of the following article, and its results are reported in **Table 2**:

**"The Application and Effectiveness of Machine Learning and Deep Learning Methods in Analyzing and Predicting the Shanghai Stock Index"**

---

## ✅ Key Highlights
- The hybrid deep learning approach improves prediction accuracy
- PSO enhances the ensemble output of the CNN and BiLSTM branches
- The model achieves a **test R² of 0.95** and **MAPE below 1.3%**


In [1]:
!pip install pyswarm

Collecting pyswarm
  Downloading pyswarm-0.6.tar.gz (4.3 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: pyswarm
  Building wheel for pyswarm (setup.py) ... done
  Created wheel for pyswarm: filename=pyswarm-0.6-py3-none-any.whl size=4464 sha256=94b8eb97071ca3e910320c4d9676ca3a59790516801009744a138b4ab82b6703
  Stored in directory: /root/.cache/pip/wheels/71/67/40/62fa158f497f942277cbab8199b05cb61c571ab324e67ad0d6
Successfully built pyswarm
Installing collected packages: pyswarm
Successfully installed pyswarm-0.6


In [2]:
!pip install deap

Collecting deap
  Downloading deap-1.4.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Downloading deap-1.4.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (135 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 135.4/135.4 kB 2.7 MB/s eta 0:00:00
Installing collected packages: deap
Successfully installed deap-1.4.1


In [3]:
!pip install tensorflow



In [4]:
!pip install keras



In [5]:
!pip install keras-tuner

Collecting keras-tuner
  Downloading keras_tuner-1.4.7-py3-none-any.whl.metadata (5.4 kB)
Collecting kt-legacy (from keras-tuner)
  Downloading kt_legacy-1.0.5-py3-none-any.whl.metadata (221 bytes)
Downloading keras_tuner-1.4.7-py3-none-any.whl (129 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 129.1/129.1 kB 2.3 MB/s eta 0:00:00
Downloading kt_legacy-1.0.5-py3-none-any.whl (9.6 kB)
Installing collected packages: kt-legacy, keras-tuner
Successfully installed keras-tuner-1.4.7 kt-legacy-1.0.5


In [6]:
!pip install --upgrade mplfinance

Collecting mplfinance
  Downloading mplfinance-0.12.10b0-py3-none-any.whl.metadata (19 kB)
Downloading mplfinance-0.12.10b0-py3-none-any.whl (75 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 75.0/75.0 kB 1.8 MB/s eta 0:00:00
Installing collected packages: mplfinance
Successfully installed mplfinance-0.12.10b0


In [7]:
!pip install statsmodels



In [8]:
!pip install scikit-learn



In [9]:
# !pip install scikeras[tensorflow]  # For GPU compute platform
!pip install scikeras[tensorflow-cpu]  # For CPU

Collecting scikeras[tensorflow-cpu]
  Downloading scikeras-0.13.0-py3-none-any.whl.metadata (3.1 kB)
Downloading scikeras-0.13.0-py3-none-any.whl (26 kB)
Installing collected packages: scikeras
Successfully installed scikeras-0.13.0


In [10]:
!pip install yfinance



In [11]:
import numpy as np
import pandas as pd
import mplfinance as mpf
import matplotlib.pyplot as plt
import tensorflow as tf
import yfinance as yf
from math import sqrt
from pyswarm import pso
from scipy import stats
from scipy.optimize import minimize
from scipy.stats import zscore, randint as sp_randint
# MAPE is defined later in this notebook (in percent), so it is not imported from sklearn
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import TimeSeriesSplit, RandomizedSearchCV
from statsmodels.tsa import stattools as tsast
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.arima.model import ARIMA, ARIMAResults
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from tensorflow import keras, random as tf_random
from tensorflow.keras import layers
from keras.models import Sequential, Model
from keras.layers import LSTM, Dense, Dropout, Bidirectional, SimpleRNN, Input, Conv1D, MaxPooling1D, Flatten, LeakyReLU
from keras.optimizers import Adam, SGD, RMSprop, Adamax, Nadam, Adadelta, Adagrad, Ftrl
from keras.initializers import GlorotUniform
from keras.callbacks import EarlyStopping, ModelCheckpoint, LearningRateScheduler
from keras_tuner import RandomSearch, HyperParameters, Objective  # replaces the deprecated `kerastuner` package
from scikeras.wrappers import KerasRegressor
from hyperopt import Trials, fmin, tpe, hp, STATUS_OK




# ***Get Data***

In [None]:
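start = '2010-01-04'
end = '2020-01-23'  # yfinance treats `end` as exclusive

# auto_adjust=False keeps the 'Adj Close' column (newer yfinance releases adjust prices by default and drop it)
data = yf.download('000001.SS', start=start, end=end, auto_adjust=False)

# Newer yfinance releases return (field, ticker) MultiIndex columns even for a single ticker
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)

data = data.reset_index()

# Fix the column order: later cells select 'Adj Close' by position (index 4 once 'Date' is dropped)
data = data[['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']]

data = data.dropna()

data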


[*********************100%%**********************]  1 of 1 completed


| | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2010-01-04 | 3289.750000 | 3295.279053 | 3243.319092 | 3243.760010 | 3243.760010 | 109400 |
| 1 | 2010-01-05 | 3254.468018 | 3290.511963 | 3221.461914 | 3282.178955 | 3282.178955 | 126200 |
| 2 | 2010-01-06 | 3277.517090 | 3295.867920 | 3253.043945 | 3254.215088 | 3254.215088 | 123600 |
| 3 | 2010-01-07 | 3253.990967 | 3268.819092 | 3176.707031 | 3192.775879 | 3192.775879 | 128600 |
| 4 | 2010-01-08 | 3177.259033 | 3198.919922 | 3149.017090 | 3195.997070 | 3195.997070 | 98400 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2437 | 2020-01-16 | 3095.733887 | 3096.372070 | 3070.884033 | 3074.081055 | 3074.081055 | 203400 |
| 2438 | 2020-01-17 | 3081.464111 | 3091.951904 | 3067.252930 | 3075.496094 | 3075.496094 | 190300 |
| 2439 | 2020-01-20 | 3082.113037 | 3096.311035 | 3070.479980 | 3095.787109 | 3095.787109 | 210500 |
| 2440 | 2020-01-21 | 3085.790039 | 3085.790039 | 3051.229980 | 3052.139893 | 3052.139893 | 234800 |


In [None]:
# Drop the 'Date' column
data = data.drop(columns=['Date'])


In [None]:
# Determine the length of the training data (70%)
train_len = int(len(data["Adj Close"]) * 0.7)

# Determine the length of the validation data (10%)
val_len = int(len(data["Adj Close"]) * 0.1)

# Set the training, validation, and test data
train_data = data.iloc[:train_len]
val_data = data.iloc[train_len:train_len + val_len]
test_data = data.iloc[train_len + val_len:]


# ***1) Removing outliers and scaling the training data with the min-max scaler***

In [None]:
# Selecting columns
columns = ['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']

# Calculating the Z-score of each column (training data only)
z_scores = zscore(train_data[columns])

# Keeping only rows where every column lies within 3 standard deviations of its mean (two-sided test)
train_data_without_outliers = train_data[(np.abs(z_scores) < 3).all(axis=1)]
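Z-score filtering can be one-sided or two-sided. A two-sided test (`|z| < 3`) drops values far below the mean as well as far above it, while a one-sided test (`z < 3`) keeps unusually low values such as very low-volume days. The plain-Python sketch below contrasts the two on a toy series; `zscore_mask` is a hypothetical helper, and it assumes the population standard deviation (`ddof=0`), which matches the default of `scipy.stats.zscore`.

```python
import statistics


def zscore_mask(values, threshold=3.0):
    """True for values whose absolute Z-score is below `threshold` (population std, ddof=0)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [abs((v - mean) / std) < threshold for v in values]


# Twenty ordinary values and one extreme low value
values = [10.0] * 20 + [-100.0]

two_sided = zscore_mask(values)
mean, std = statistics.fmean(values), statistics.pstdev(values)
one_sided = [(v - mean) / std < 3.0 for v in values]

print(two_sided[-1], one_sided[-1])  # → False True
```

The low outlier has a Z-score of about -4.47, so only the two-sided test flags it.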


In [None]:
# Initialize the scaler
scaler = MinMaxScaler()

train_data_scaled = train_data_without_outliers.copy()

# Fit the scaler to the training data and transform
train_data_scaled[columns] = scaler.fit_transform(train_data_without_outliers[columns])

train_data_scaled


| | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 0 | 0.548970 | 0.534932 | 0.559070 | 0.524307 | 0.524307 | 0.131500 |
| 1 | 0.534668 | 0.533024 | 0.550302 | 0.539877 | 0.539877 | 0.163150 |
| 2 | 0.544011 | 0.535168 | 0.562971 | 0.528544 | 0.528544 | 0.158252 |
| 3 | 0.534475 | 0.524339 | 0.532349 | 0.503645 | 0.503645 | 0.167671 |
| 4 | 0.503369 | 0.496354 | 0.521241 | 0.504950 | 0.504950 | 0.110776 |
| ... | ... | ... | ... | ... | ... | ... |
| 1704 | 0.495030 | 0.483586 | 0.516126 | 0.480941 | 0.480941 | 0.261492 |
| 1705 | 0.485672 | 0.474754 | 0.507988 | 0.473864 | 0.473864 | 0.205916 |
| 1706 | 0.478570 | 0.468967 | 0.502445 | 0.471219 | 0.471219 | 0.219857 |
| 1707 | 0.473872 | 0.458809 | 0.479230 | 0.467436 | 0.467436 | 0.411266 |
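Because `MinMaxScaler` stores a separate minimum and range for each column, a scaled value can only be mapped back with the statistics of the column it came from. The sketch below reproduces the per-column formulas for the default `feature_range=(0, 1)` in plain Python, using made-up price ranges; `fit_minmax`, `scale` and `inverse_scale` are hypothetical helpers.

```python
def fit_minmax(column):
    """Return (min, range) for one column, as MinMaxScaler learns with feature_range=(0, 1)."""
    lo, hi = min(column), max(column)
    return lo, hi - lo


def scale(x, lo, rng):
    """Map a raw value into [0, 1] using one column's statistics."""
    return (x - lo) / rng


def inverse_scale(x_scaled, lo, rng):
    """Map a scaled value back to the original units of the same column."""
    return x_scaled * rng + lo


# Made-up training ranges for two columns with different spreads
open_lo, open_rng = fit_minmax([2000.0, 3000.0, 4000.0])
adj_lo, adj_rng = fit_minmax([1900.0, 3000.0, 4100.0])

y_scaled = scale(3550.0, adj_lo, adj_rng)            # target scaled with its own column
print(inverse_scale(y_scaled, adj_lo, adj_rng))      # → 3550.0 (correct column)
print(inverse_scale(y_scaled, open_lo, open_rng))    # → 3500.0 (another column's statistics)
```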


# ***2) Scaling the validation data with the same min-max scaler***

In [None]:
val_data_scaled = val_data.copy()

val_data_scaled[columns] = scaler.transform(val_data[columns])


val_data_scaled


| | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 1709 | 0.473983 | 0.466247 | 0.501010 | 0.471320 | 0.471320 | 0.173700 |
| 1710 | 0.474066 | 0.463067 | 0.499173 | 0.466573 | 0.466573 | 0.158817 |
| 1711 | 0.470356 | 0.467023 | 0.499658 | 0.475424 | 0.475424 | 0.155991 |
| 1712 | 0.482356 | 0.475103 | 0.511775 | 0.480950 | 0.480950 | 0.175396 |
| 1713 | 0.486073 | 0.476580 | 0.514101 | 0.483292 | 0.483292 | 0.162773 |
| ... | ... | ... | ... | ... | ... | ... |
| 1948 | 0.599384 | 0.588955 | 0.621458 | 0.596473 | 0.596473 | 0.319329 |
| 1949 | 0.599980 | 0.587462 | 0.624185 | 0.597896 | 0.597896 | 0.252826 |
| 1950 | 0.603343 | 0.591041 | 0.629136 | 0.599354 | 0.599354 | 0.253391 |
| 1951 | 0.605399 | 0.593875 | 0.622850 | 0.591875 | 0.591875 | 0.362472 |


# ***3) Scaling the test data with the same min-max scaler***

In [None]:
test_data_scaled = test_data.copy()

test_data_scaled[columns] = scaler.transform(test_data[columns])

test_data_scaled


| | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 1953 | 0.609300 | 0.602964 | 0.634162 | 0.605728 | 0.605728 | 0.417106 |
| 1954 | 0.613883 | 0.607505 | 0.641495 | 0.617920 | 0.617920 | 0.339864 |
| 1955 | 0.626749 | 0.616268 | 0.651725 | 0.623233 | 0.623233 | 0.391673 |
| 1956 | 0.624874 | 0.618251 | 0.652279 | 0.628703 | 0.628703 | 0.335154 |
| 1957 | 0.635961 | 0.635801 | 0.663781 | 0.646998 | 0.646998 | 0.375094 |
| ... | ... | ... | ... | ... | ... | ... |
| 2437 | 0.470321 | 0.455298 | 0.489898 | 0.455542 | 0.455542 | 0.308591 |
| 2438 | 0.464537 | 0.453528 | 0.488441 | 0.456116 | 0.456116 | 0.283911 |
| 2439 | 0.464800 | 0.455273 | 0.489736 | 0.464339 | 0.464339 | 0.321967 |
| 2440 | 0.466290 | 0.451061 | 0.482014 | 0.446651 | 0.446651 | 0.367747 |


In [None]:
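# MAPE in percent; the 1e-10 term guards against division by zero.
# (sklearn's mean_absolute_percentage_error returns a fraction, not a percentage.)
def mean_absolute_percentage_error(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / (y_true + 1e-10))) * 100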


# ***CNN-BiLSTM***

In [None]:
# Define a function to create the dataset: each input is `look_back` consecutive rows of all
# features, and the target is the next row's 'Adj Close' (column index 4)
def create_dataset(dataset, look_back=5):
    dataX, dataY = [], []
    dataset = dataset.values  # Convert the DataFrame to a numpy array
    for i in range(len(dataset) - look_back):  # every full window that still has a next-day target
        a = dataset[i:(i + look_back), :]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 4])  # column index 4 is 'Adj Close'
    return np.array(dataX), np.array(dataY)

# Create the dataset
look_back = 5
trainX, trainY = create_dataset(train_data_scaled, look_back)
valX, valY = create_dataset(val_data_scaled, look_back)
testX, testY = create_dataset(test_data_scaled, look_back)

# The windows are already shaped (samples, look_back, features), which is what Conv1D expects
trainX_CNN, valX_CNN, testX_CNN = trainX, valX, testX

# Define the CNN-BiLSTM model
model = Sequential()
model.add(Input(shape=(look_back, len(columns))))  # (time steps, features); avoids the input_shape warning
model.add(Conv1D(filters=3, kernel_size=2, activation='relu'))  # CNN layer
model.add(Conv1D(filters=3, kernel_size=2, activation='relu'))  # CNN layer
model.add(Conv1D(filters=3, kernel_size=2, activation='relu'))  # CNN layer
model.add(MaxPooling1D(pool_size=2))
model.add(Bidirectional(LSTM(72, activation='relu', return_sequences=True)))  # BiLSTM layer
model.add(Bidirectional(LSTM(72, activation='relu')))  # BiLSTM layer
model.add(Dropout(0.5))
model.add(Dense(10, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1))

# Compile the model with learning rate 0.001
model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')

# Train the model
history = model.fit(trainX_CNN, trainY, validation_data=(valX_CNN, valY), epochs=200, batch_size=32)

# Generate predictions. Both the "CNN" and "LSTM" arrays below come from the same CNN-BiLSTM
# model, so they are identical and any normalized weighting of them returns the same forecast.
train_predictions_cnn = model.predict(trainX_CNN)
train_predictions_lstm = model.predict(trainX_CNN)

# Similarly for validation and test sets
val_predictions_cnn = model.predict(valX_CNN)
val_predictions_lstm = model.predict(valX_CNN)

test_predictions_cnn = model.predict(testX_CNN)
test_predictions_lstm = model.predict(testX_CNN)

# Define the objective function for PSO: combined RMSE on the training and validation sets.
# The test set is left out so that it does not influence the fitted weights.
def objective(weights):
    epsilon = 1e-10  # small constant that avoids division by zero
    train_predictions = (weights[0] * train_predictions_cnn + weights[1] * train_predictions_lstm) / (sum(weights) + epsilon)
    val_predictions = (weights[0] * val_predictions_cnn + weights[1] * val_predictions_lstm) / (sum(weights) + epsilon)

    train_rmse = np.sqrt(mean_squared_error(trainY, train_predictions))
    val_rmse = np.sqrt(mean_squared_error(valY, val_predictions))

    return train_rmse + val_rmse

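# Define the constraint: pyswarm treats f_ieqcons values >= 0 as feasible,
# so this requires sum(weights) >= 1 (the predictions are normalized by the weight sum anyway)
def constraint(weights):
    return sum(weights) - 1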

# Initial guess
x0 = [0.5, 0.5]
# Show initial objective
print('Initial Objective: ' + str(objective(x0)))

# Define the lower and upper bounds for the weights
lb = [0, 0]
ub = [1, 1]

# Optimize with PSO
xopt, fopt = pso(objective, lb, ub, f_ieqcons=constraint)

print('Optimal weights (CNN, LSTM):')
print('    {}'.format(xopt))
print('Optimal objective value:')
print('    {}'.format(fopt))

# Use the optimized weights to make predictions
train_predictions = (xopt[0] * train_predictions_cnn + xopt[1] * train_predictions_lstm) / sum(xopt)
val_predictions = (xopt[0] * val_predictions_cnn + xopt[1] * val_predictions_lstm) / sum(xopt)
test_predictions = (xopt[0] * test_predictions_cnn + xopt[1] * test_predictions_lstm) / sum(xopt)

# Invert the scaling of 'Adj Close'. MinMaxScaler keeps a separate min and range per column, so the
# target must be mapped back with the statistics of its own column (index 4), not column 0 ('Open').
# The formula below assumes the default feature_range=(0, 1).
target_idx = columns.index('Adj Close')

def invert_adj_close(values):
    values = np.asarray(values).reshape(-1)
    return values * scaler.data_range_[target_idx] + scaler.data_min_[target_idx]

# Invert the predictions to the original scale
train_predictions_original = invert_adj_close(train_predictions)
val_predictions_original = invert_adj_close(val_predictions)
test_predictions_original = invert_adj_close(test_predictions)

# Invert the actual values to the original scale
trainY_original = invert_adj_close(trainY)
valY_original = invert_adj_close(valY)
testY_original = invert_adj_close(testY)

# Calculate RMSE for training, validation and test data
train_rmse = np.sqrt(mean_squared_error(trainY_original, train_predictions_original))
val_rmse = np.sqrt(mean_squared_error(valY_original, val_predictions_original))
test_rmse = np.sqrt(mean_squared_error(testY_original, test_predictions_original))

print('Train RMSE: ', train_rmse)
print('Validation RMSE: ', val_rmse)
print('Test RMSE: ', test_rmse)

# Calculate MAE for training, validation and test data
train_mae = mean_absolute_error(trainY_original, train_predictions_original)
val_mae = mean_absolute_error(valY_original, val_predictions_original)
test_mae = mean_absolute_error(testY_original, test_predictions_original)

print('Train MAE: ', train_mae)
print('Validation MAE: ', val_mae)
print('Test MAE: ', test_mae)

# Calculate MAPE for training, validation and test data
train_mape = mean_absolute_percentage_error(trainY_original, train_predictions_original)
val_mape = mean_absolute_percentage_error(valY_original, val_predictions_original)
test_mape = mean_absolute_percentage_error(testY_original, test_predictions_original)

print('Train MAPE: ', train_mape)
print('Validation MAPE: ', val_mape)
print('Test MAPE: ', test_mape)

# Calculate R^2 for training, validation and test data
train_r2 = r2_score(trainY_original, train_predictions_original)
val_r2 = r2_score(valY_original, val_predictions_original)
test_r2 = r2_score(testY_original, test_predictions_original)

print('Train R^2: ', train_r2)
print('Validation R^2: ', val_r2)
print('Test R^2: ', test_r2)




Epoch 1/200
52/52 ━━━━━━━━━━━━━━━━━━━━ 10s 22ms/step - loss: 0.0814 - val_loss: 0.0215
Epoch 2/200
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 0.0166 - val_loss: 0.0038
Epoch 3/200
52/52 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - loss: 0.0088 - val_loss: 0.0025
Epoch 4/200
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 0.0071 - val_loss: 0.0016
Epoch 5/200
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - loss: 0.0064 - val_loss: 0.0050
Epoch 6/200
52/52 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - loss: 0.0059 - val_loss: 0.0031
Epoch 7/200
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - loss: 0.0052 - val_loss: 0.0025
Epoch 8/200
52/52 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 0.0052 - val_loss: 0.0045
Epoch 9/200
52/52 ━━━━━━━━━━━━━━━