Real-Time Machine-Learning-Based Optimization Using Input Convex Long Short-Term Memory Network

Zihao Wang, Donghan Yu, Zhe Wu
Applied Energy
Paper: https://doi.org/10.1016/j.apenergy.2024.124472

Requires: Python 3.11.3, TensorFlow Keras 2.13.0, Pyipopt, NumPy, scikit-learn

File description

  • ICLSTM_poster.pdf is the poster version of the paper
  • docker.pptx includes instructions on how to install Pyipopt in Docker on your laptop
  • ICLSTM_toy_examples.ipynb demonstrates the input convexity of ICLSTM on some 3D toy examples of surface fitting for non-convex bivariate scalar functions (we have constructed three examples for you to play around with)
  • Under CSTR subfolder:
    1. CSTR_ICLSTM.ipynb and CSTR_NNs.ipynb are used to train neural networks to learn the system dynamics
  • Under MPC subfolder:
    1. rnn.h5, lstm.h5, icrnn.h5, iclstm.h5 are the trained RNN, LSTM, ICRNN, and ICLSTM models, respectively. You may regenerate them using CSTR_ICLSTM.ipynb and CSTR_NNs.ipynb (see the loading sketch after this list)
    2. mpc_rnn.ipynb, mpc_lstm.ipynb, mpc_icrnn.ipynb, mpc_iclstm.ipynb are used to integrate NNs into LMPC and solve the MPC optimization problem
    3. mpc_first_principles.ipynb uses the first-principles model to solve the MPC optimization problem
  • FYI:
    1. .ipynb files can be run on Jupyter Notebook or Google Colab
    2. Pyipopt can be installed and run on Docker. mpc_rnn.ipynb, mpc_lstm.ipynb, mpc_icrnn.ipynb, mpc_iclstm.ipynb use Pyipopt
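
A minimal sketch (not part of the repository) of loading one of the trained models under the MPC subfolder and numerically spot-checking its input convexity via the midpoint inequality; the input shape (1, 10, 4) is an illustrative assumption, and depending on how the network was built, load_model may additionally need custom_objects for custom layers or constraints:

```python
import numpy as np
import tensorflow as tf

# Load a trained model from the MPC subfolder (compile=False: inference only).
model = tf.keras.models.load_model("MPC/iclstm.h5", compile=False)

# Two random input sequences with an assumed shape of (batch, timesteps, features).
x1 = np.random.rand(1, 10, 4).astype(np.float32)
x2 = np.random.rand(1, 10, 4).astype(np.float32)

f1 = model.predict(x1, verbose=0)
f2 = model.predict(x2, verbose=0)
f_mid = model.predict(0.5 * (x1 + x2), verbose=0)

# Midpoint convexity check: f((x1 + x2) / 2) <= (f(x1) + f(x2)) / 2 element-wise.
print(np.all(f_mid <= 0.5 * (f1 + f2) + 1e-6))
```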

Motivation

  • Traditional model-based optimization and control rely on the development of first-principles models, a process that is resource-intensive
  • Neural network-based optimization suffers from slow computation times, limiting its applicability in real-time tasks
  • Computational efficiency is a critical factor for the real-world, real-time implementation of neural network-based optimization
  • The optima of convex optimization problems are easier and faster to obtain than those of non-convex optimization problems
  • The Long Short-Term Memory (LSTM) network's gating architecture captures long-term temporal dependencies more effectively than plain RNNs, a property that has been well documented in the literature

Objective

  • Proposes an Input Convex Long Short-Term Memory (ICLSTM) neural network that preserves convexity in neural network-based optimization, thereby improving computational efficiency for real-time applications such as model predictive control (MPC)

Architecture

The ICLSTM cell follows the structure:

(Figure: ICLSTM cell architecture)

Specifically,

$f_t = g[D_f * (W_h h_{t-1} + W_x [x_t, -x_t]) + b_f]$
$i_t = g[D_i * (W_h h_{t-1} + W_x [x_t, -x_t]) + b_i]$
$c_{temp} = g[D_c * (W_h h_{t-1} + W_x [x_t, -x_t]) + b_c]$
$o_t = g[D_o * (W_h h_{t-1} + W_x [x_t, -x_t]) + b_o]$
$c_t = f_t * c_{t-1} + i_t * c_{temp}$
$h_t = o_t * g(c_t)$

where

  • $D_f$, $D_i$, $D_c$, $D_o$ are non-negative trainable scaling vectors that differentiate the gates
  • $W_h$, $W_x$ are non-negative trainable weights (i.e., weights are shared across all gates)
  • $b_f$, $b_i$, $b_c$, $b_o$ are trainable biases
  • $g$ is a convex, non-negative, and non-decreasing activation function (e.g., ReLU)
  • $*$ denotes element-wise multiplication (i.e., Hadamard product)
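
For concreteness, the following is a minimal NumPy sketch of one ICLSTM cell step following the equations above (an illustration, not the authors' Keras implementation); the dimensions and the dict-based parameter grouping are assumptions for readability, and the non-negativity of $W_h$, $W_x$, and the $D$ vectors is taken as given rather than enforced by a training constraint such as Keras' NonNeg:

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def iclstm_step(x_t, h_prev, c_prev, W_h, W_x, D, b):
    """One ICLSTM recurrent step with g = ReLU.

    x_t: (n,) input; h_prev, c_prev: (m,) previous hidden and cell states.
    W_h: (m, m) and W_x: (m, 2n) non-negative weights shared by all gates.
    D, b: dicts keyed by 'f', 'i', 'c', 'o' holding the (m,) non-negative
    scaling vectors and the (m,) biases of each gate.
    """
    x_hat = np.concatenate([x_t, -x_t])       # expanded input [x_t, -x_t]
    pre = W_h @ h_prev + W_x @ x_hat          # pre-activation shared by all gates
    f_t = relu(D['f'] * pre + b['f'])         # forget gate
    i_t = relu(D['i'] * pre + b['i'])         # input gate
    c_tmp = relu(D['c'] * pre + b['c'])       # candidate cell state (c_temp)
    o_t = relu(D['o'] * pre + b['o'])         # output gate
    c_t = f_t * c_prev + i_t * c_tmp          # Hadamard products
    h_t = o_t * relu(c_t)
    return h_t, c_t
```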

The output of an $L$-layer ICLSTM follows the structure:

(Figure: output structure of an $L$-layer ICLSTM)

Specifically,

$z = g^d(W_d h_t + b_d) + [x_t, -x_t]$
$y = g^y(W_y z + b_y)$

where

  • $W_d$, $W_y$ are non-negative trainable weights
  • $b_d$, $b_y$ are trainable biases
  • $g^d$ is a convex, non-negative, and non-decreasing activation function (e.g., ReLU)
  • $g^y$ is a convex, non-decreasing activation function
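
A matching sketch of the output computation, again in NumPy with assumed dimensions; the identity map is used for $g^y$ purely as one admissible choice, since the paper only requires $g^y$ to be convex and non-decreasing:

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def iclstm_output(h_t, x_t, W_d, b_d, W_y, b_y, g_y=lambda v: v):
    """Output of the final ICLSTM layer with an input skip connection.

    h_t: (m,) final hidden state; x_t: (n,) current input.
    W_d: (2n, m), b_d: (2n,) so that z matches the dimension of [x_t, -x_t].
    W_y: (p, 2n), b_y: (p,); g_y is a convex, non-decreasing output activation.
    """
    x_hat = np.concatenate([x_t, -x_t])
    z = relu(W_d @ h_t + b_d) + x_hat      # z = g^d(W_d h_t + b_d) + [x_t, -x_t]
    return g_y(W_y @ z + b_y)              # y = g^y(W_y z + b_y)
```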

Results

  • On a continuous stirred tank reactor (CSTR) example with 15 different initial conditions, ICLSTM-based MPC converged in all 15 cases and achieved the fastest convergence in 13 of them, with average reductions in computation time of 54.4%, 40.0%, and 41.3% relative to plain RNN, plain LSTM, and ICRNN, respectively
  • On a solar PV energy system example, ICLSTM-based MPC solved at least 4 $\times$ faster than LSTM-based MPC; for a scaled-up solar PV energy system or a longer prediction horizon, the time discrepancy becomes even greater

Citation

If you find our work relevant to your research, please cite:

@article{wang2025real,
  title={Real-time machine-learning-based optimization using Input Convex Long Short-Term Memory network},
  author={Wang, Zihao and Yu, Donghan and Wu, Zhe},
  journal={Applied Energy},
  volume={377},
  pages={124472},
  year={2025},
  publisher={Elsevier}
}

About

[Applied Energy] This work proposes an Input Convex LSTM neural network for real-time neural network-based optimization.
