# Physics-Informed Neural Networks (PINNs) for Ordinary Differential Equations

## 1. Mathematical Formulation

Consider the solution $u: [0,T] \times \Omega \to \mathbb{R}$ of an evolution equation
$$\begin{align*} \partial_t u(t, x) + \mathcal{D}[u](t, x) &= 0, &&(t, x) \in (0, T] \times \Omega, \tag{1a} \\ u(0, x) &= u_0(x), && x \in \Omega, \tag{1b} \\ u(t, x) &= u_b(t, x), && (t, x) \in (0, T] \times \partial \Omega, \tag{1c}\end{align*}$$ where $\mathcal{D}$ is a nonlinear differential operator acting on $u$, $\Omega \subset \mathbb{R}^d$ is a bounded domain, $T$ denotes the final time, $u_0 : \Omega \to \mathbb{R}$ is the prescribed initial data, and $u_b: (0, T] \times \partial \Omega \to \mathbb{R}$ gives the boundary data.

- The PINN method constructs a neural network approximation $u_{\boldsymbol \theta} \approx u$ of the solution of (1), where $u_{\boldsymbol \theta}: [0, T] \times \Omega \to \mathbb{R}$ denotes the function realized by a neural network with parameters $\boldsymbol \theta$.
- Purely data-driven approaches instead fit a neural network to a set of input-output pairs $\{(t_i, x_i, u(t_i, x_i))\}_{i = 1}^N$ without using the equation. PINNs, by contrast, build the underlying PDE into the learning process: the physics enters through the loss function.

**Constructing the Loss Function**

- The **residual** of a neural network approximation $u_{\boldsymbol \theta}: [0, T] \times \Omega \to \mathbb{R}$ of the solution $u$ with respect to (1a) is $$r_{\boldsymbol{\theta}}(t, x) := \partial_t u_{\boldsymbol{\theta}}(t, x) + \mathcal{D}[u_{\boldsymbol \theta}](t, x). \tag{2}$$
- The feedforward neural network (FNN) considered here consists of alternating affine linear maps $W^l(\cdot) + b^l$ and activation functions $\sigma^l(\cdot)$, $$u_{\boldsymbol \theta} := W^m \sigma^m (W^{m - 1} \sigma^{m - 1}(\cdots \sigma^1(W^0(t, x)^T + b^0) \cdots) + b^{m - 1}) + b^m,$$ where $W^l$ and $b^l$ are weight matrices and bias vectors, and $\boldsymbol \theta = \{W^l, b^l\}_{l = 0}^m$.
- The PINN approach for (1) proceeds by minimizing the loss function $$L_{\boldsymbol \theta}(X) := L_{\boldsymbol \theta}^r(X^r) + L_{\boldsymbol \theta}^0(X^0) + L_{\boldsymbol \theta}^b(X^b),$$ where $X$ is the set of training data and the loss function $L_{\boldsymbol \theta}(X)$ contains the following terms:
    - The *mean squared residual* $$L_{\boldsymbol \theta}^r(X^r) := \frac{1}{N_r} \sum_{i = 1}^{N_r} \left|r_{\boldsymbol \theta}(t_i^r, x_i^r)\right|^2$$ in a number of **collocation points** $X^r := \{(t_i^r, x_i^r)\}_{i = 1}^{N_r} \subset (0, T] \times \Omega$.
    - The MSE with respect to the initial and boundary conditions, $$L_{\boldsymbol \theta}^0(X^0) := \frac{1}{N_0} \sum_{i = 1}^{N_0} \left|u_{\boldsymbol \theta}(t_i^0, x_i^0) - u_0(x_i^0)\right|^2, \qquad L_{\boldsymbol \theta}^b(X^b) := \frac{1}{N_b} \sum_{i = 1}^{N_b} \left|u_{\boldsymbol \theta} (t_i^b, x_i^b) - u_b(t_i^b, x_i^b)\right|^2$$ in a number of points $X^0 := \{(t_i^0, x_i^0)\}_{i = 1}^{N_0} \subset \{0\} \times \Omega$ and $X^b := \{(t_i^b, x_i^b)\}_{i = 1}^{N_b} \subset (0, T] \times \partial \Omega$.
    - The training data is $X = X^r \cup X^0 \cup X^b$.
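As an illustration, the loss $L_{\boldsymbol \theta}(X)$ can be assembled for a concrete instance of (1). The sketch below assumes the 1D heat equation $\partial_t u - \nu \partial_{xx} u = 0$ on $\Omega = (0, 1)$ (so $\mathcal{D}[u] = -\nu \partial_{xx} u$), initial data $u_0(x) = \sin(\pi x)$, homogeneous boundary data $u_b = 0$, and a network with one tanh hidden layer; these choices and all function names are assumptions for the example. To stay within NumPy, $\partial_t u_{\boldsymbol\theta}$ and $\partial_{xx} u_{\boldsymbol\theta}$ are computed by a hand-coded chain rule, whereas PINN implementations normally obtain them by automatic differentiation.

```python
import numpy as np

NU = 0.1  # assumed diffusion coefficient of the example PDE u_t - NU * u_xx = 0

def init_params(hidden=20, seed=0):
    """Random weights for u_theta(t, x) = W1 . tanh(W0 [t, x]^T + b0) + b1."""
    rng = np.random.default_rng(seed)
    return {
        "W0": rng.normal(size=(hidden, 2)),
        "b0": rng.normal(size=hidden),
        "W1": rng.normal(size=hidden) / np.sqrt(hidden),
        "b1": 0.0,
    }

def forward(params, t, x):
    """Return u_theta, du/dt and d2u/dx2 at the points (t[i], x[i])."""
    W0, b0, W1, b1 = params["W0"], params["b0"], params["W1"], params["b1"]
    z = np.outer(t, W0[:, 0]) + np.outer(x, W0[:, 1]) + b0   # (N, hidden)
    h = np.tanh(z)
    dh = 1.0 - h**2                                          # tanh'(z)
    u = h @ W1 + b1
    u_t = (dh * W0[:, 0]) @ W1                               # chain rule in t
    u_xx = (-2.0 * h * dh * W0[:, 1] ** 2) @ W1              # tanh'' = -2 tanh tanh'
    return u, u_t, u_xx

def pinn_loss(params, Xr, X0, Xb):
    """L_theta(X) = mean squared residual + initial MSE + boundary MSE."""
    _, u_t, u_xx = forward(params, *Xr)
    residual = u_t - NU * u_xx                               # r_theta = u_t + D[u_theta]
    u_init, _, _ = forward(params, *X0)
    u_bdry, _, _ = forward(params, *Xb)
    loss_r = np.mean(residual**2)
    loss_0 = np.mean((u_init - np.sin(np.pi * X0[1])) ** 2)  # assumed u_0(x) = sin(pi x)
    loss_b = np.mean(u_bdry**2)                              # assumed u_b = 0
    return loss_r + loss_0 + loss_b

# Training data X = X^r ∪ X^0 ∪ X^b, sampled uniformly with T = 1 and Omega = (0, 1).
rng = np.random.default_rng(1)
T, N_r, N_0, N_b = 1.0, 256, 64, 64
Xr = (rng.uniform(0, T, N_r), rng.uniform(0, 1, N_r))
X0 = (np.zeros(N_0), rng.uniform(0, 1, N_0))
Xb = (rng.uniform(0, T, N_b), rng.integers(0, 2, N_b).astype(float))  # x in {0, 1}
params = init_params()
print(pinn_loss(params, Xr, X0, Xb))
```

In practice $\boldsymbol\theta$ would then be updated by a gradient-based optimizer on this loss, with the derivatives with respect to both the inputs and the parameters supplied by an autodiff framework.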

## 2. Solving for Unknown Parameters using PINN

Consider the PDE $$\partial_t u(t, x) + \mathcal{D}^{\lambda}[u](t, x) = 0, \qquad (t, x) \in (0, T] \times \Omega, \tag{3}$$ where $\mathcal{D}^\lambda$ is a nonlinear partial differential operator depending on an unknown parameter $\lambda \in \mathbb{R}$.
- The parameter identification setting assumes a set of data $X^d := \{(t_i^d, x_i^d, u_i^d)\}_{i = 1}^{N_d}$, where $u_i^d \approx u(t_i^d, x_i^d)$ are (possibly noisy) observations of the solution of (3). These data are used in a *new* loss function that combines a mean squared misfit term with a mean squared residual term: $$L_{\boldsymbol \theta, \lambda}(X^d) = \frac{1}{N_d} \sum_{i = 1}^{N_d} \left|u_{\boldsymbol \theta}(t_i^d, x_i^d) - u_i^d\right|^2 + \frac{1}{N_d} \sum_{i = 1}^{N_d} \left|r_{\boldsymbol \theta, \lambda} (t_i^d, x_i^d)\right|^2,$$ where $r_{\boldsymbol \theta, \lambda} := \partial_t u_{\boldsymbol \theta} + \mathcal{D}^\lambda[u_{\boldsymbol \theta}]$ is the residual (2) with $\mathcal{D}$ replaced by $\mathcal{D}^\lambda$.
- Hence $\lambda$ is learned in the same way as the weight matrices and bias vectors: the loss is minimized jointly over $(\boldsymbol \theta, \lambda)$, with its gradient with respect to $\lambda$ obtained by automatic differentiation.
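A minimal sketch of parameter identification under strong simplifying assumptions: the ODE $u'(t) + \lambda u(t) = 0$ (so $\mathcal{D}^\lambda[u] = \lambda u$), synthetic noise-free data from $u(t) = e^{-2t}$ (true $\lambda = 2$), and a surrogate whose hidden layer is frozen at random values so that only the output weights are trained. With that simplification both the misfit and the residual are linear in the output weights, so instead of joint gradient descent the sketch alternates two exact least-squares updates (weights with $\lambda$ fixed, then $\lambda$ with the weights fixed), neither of which can increase the loss. The alternating scheme, the random-feature surrogate, and all names are illustrative assumptions, not the general PINN training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
lam_true = 2.0                                # assumed true parameter for the synthetic data
t_d = np.linspace(0.0, 1.0, 50)               # observation times t_i^d
u_d = np.exp(-lam_true * t_d)                 # observations u_i^d (noise-free here)

# Frozen random hidden layer: phi_j(t) = tanh(a_j t + c_j) and its t-derivative.
H = 30
a, c = rng.normal(scale=3.0, size=H), rng.normal(size=H)
Phi = np.tanh(np.outer(t_d, a) + c)           # (N_d, H): u_theta(t_i) = Phi @ w
dPhi = (1.0 - Phi**2) * a                     # (N_d, H): u_theta'(t_i) = dPhi @ w

def loss(w, lam):
    """Mean squared misfit + mean squared residual r = u' + lam * u at the data points."""
    misfit = Phi @ w - u_d
    residual = dPhi @ w + lam * (Phi @ w)
    return np.mean(misfit**2) + np.mean(residual**2)

lam, history = 0.0, []                        # initial guess for the unknown parameter
for _ in range(1000):
    # w-step: min_w |Phi w - u_d|^2 + |(dPhi + lam Phi) w|^2 as one stacked least-squares problem
    M = np.vstack([Phi, dPhi + lam * Phi])
    rhs = np.concatenate([u_d, np.zeros_like(u_d)])
    w = np.linalg.lstsq(M, rhs, rcond=None)[0]
    # lam-step: the residual is linear in lam, so its minimizer is explicit
    u, du = Phi @ w, dPhi @ w
    lam = -np.dot(du, u) / np.dot(u, u)
    history.append(loss(w, lam))

print(f"estimated lambda = {lam:.4f}")
```

The alternating scheme is used only because it keeps this example short and exact in NumPy; with a fully trainable network one would minimize over $(\boldsymbol\theta, \lambda)$ jointly by gradient descent, as described above.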

## 3. PINN for Deterministic Optimal Control Problems

A **general continuous-time deterministic control problem** involves finding a control policy that optimizes a given objective function over time, subject to the dynamics of the system and any constraints. This section covers:
- General continuous-time deterministic control problems.
- Hamilton-Jacobi-Bellman (HJB) equations.

### Introduction to General Continuous-Time Deterministic Control Problems

- Let $\mathbf{x}_t \in \mathbb{R}^n$ be the state of the system at time $t$ and $\mathbf{u}_t \in \mathbb{R}^m$ be the **control variable** or **decision variable** at time $t$.

- **Dynamics**: The evolution of the state variable is governed by a system of ODEs:
   $$
  \dot{\mathbf{x}}_t = f(t, \mathbf{x}_t, \mathbf{u}_t) \tag{4}
   $$
  where $f: \mathbb{R} \times \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n$ is a known function that describes the system dynamics.

- **Objective function**: The goal is to maximize (or minimize) an objective function over a finite or infinite time horizon. The objective function typically takes the form
  $$
  V(0, \mathbf{x}_0) = \max_{\mathbf{u}_t} J(\mathbf{x}_0; \mathbf{u}_t), \tag{5}
   $$
  where $J(\mathbf{x}_0; \mathbf{u}_t)$ is the value of the objective function starting from the initial state $\mathbf{x}_0$ under the control path $\{\mathbf{u}_t\}_{t \in [0, T]}$:
   $$
  J(\mathbf{x}_0; \mathbf{u}_t) = \int_0^T e^{-\rho t} L(t, \mathbf{x}_t, \mathbf{u}_t) \, dt + e^{-\rho T} \Phi(\mathbf{x}_T),
   $$
  in which $\Phi(\mathbf{x}_T)$ is the terminal cost or reward function at the end time $T$, $L(t, \mathbf{x}_t, \mathbf{u}_t)$ is the running reward function, and $\rho \geq 0$ is the discount rate.

- **Constraints**: The problem may include constraints on the state and control variables:
   $$
  \mathbf{x}_t \in \mathcal{X}, \quad \mathbf{u}_t \in \mathcal{U}, \quad \mathbf{x}_0 \text{ given},
   $$
  where $\mathcal{X}$ and $\mathcal{U}$ are the admissible sets for the state and control variables, respectively.

### Hamilton-Jacobi-Bellman Equations

The HJB equation for the control problem (5) is a PDE that characterizes the value function $V(t, \mathbf{x})$, the optimal value of the objective function (discounted to time $0$) when starting from state $\mathbf{x}$ at time $t$: $$\frac{\partial V(t, \mathbf{x})}{\partial t} + \max_{\mathbf{u} \in \mathcal{U}} \left[e^{-\rho t} L(t, \mathbf{x}, \mathbf{u}) + \nabla V(t, \mathbf{x}) \cdot f(t, \mathbf{x}, \mathbf{u})\right] = 0, \tag{6}$$ with terminal condition $$V(T, \mathbf{x}) = e^{-\rho T} \Phi(\mathbf{x}). \tag{7}$$

Assuming the maximum in (6) is attained at an interior point and $L$, $f$ are differentiable in $\mathbf{u}$, the first-order condition is $$e^{-\rho t} \frac{\partial L}{\partial \mathbf{u}} + \left(\frac{\partial f}{\partial \mathbf{u}}\right)^T \nabla V(t, \mathbf{x}) = \mathbf{0}. \tag{8}$$ If the solution of (8) for $\mathbf{u}$ is denoted by $\mathbf{u}^\ast(t, \mathbf{x})$, then the HJB equation (6) becomes a PDE in $(t, \mathbf{x})$ alone: $$\frac{\partial V(t, \mathbf{x})}{\partial t} + e^{-\rho t} L(t, \mathbf{x}, \mathbf{u}^\ast(t, \mathbf{x})) + \nabla V(t, \mathbf{x}) \cdot f(t, \mathbf{x}, \mathbf{u}^\ast(t, \mathbf{x})) = 0. \tag{9}$$

**Linear Quadratic Regulators**

- The LQR problem aims to find the optimal control policy for a linear dynamic system.
- It involves minimizing a quadratic cost function over a finite time horizon.
    - **Dynamics**: $\dot x(t) = Ax(t) + Bu(t)$, $x(0) = x_0$.
        - $x(t) \in \mathbb{R}^n$: State vector
        - $u(t) \in \mathbb{R}^m$: Control vector
        - $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$: System matrices
        - $x_0$: Initial state
    - **Objective**: Minimize the *quadratic cost*: $$J(u) = \frac{1}{2} \int_0^T \left(x(t)^T Qx(t) + u(t)^T Ru(t)\right) \ dt + \frac{1}{2} x(T)^T Q_fx(T).$$
        - $Q \in \mathbb{R}^{n \times n}$: Penalizes state deviations (positive semi-definite)
        - $R \in \mathbb{R}^{m \times m}$: Penalizes control effort (positive definite, so that $R^{-1}$ exists)
        - $Q_f \in \mathbb{R}^{n \times n}$: Terminal cost matrix (positive semi-definite)

**Hamiltonian** (with costate $\lambda(t) \in \mathbb{R}^n$): $$H(x, u, \lambda) = \frac{1}{2} x^T Qx + \frac{1}{2} u^T Ru + \lambda^T(Ax + Bu)$$

**Optimal Control**: $$\frac{\partial H}{\partial u} = 0 \implies Ru + B^T \lambda = 0 \implies u^\ast = -R^{-1} B^T \lambda.$$

**Costate Evolution**: $$\dot \lambda(t) = -\frac{\partial H}{\partial x} = -Qx - A^T \lambda.$$

**Value Function** $V(t, x)$: $$V(t, x) = \frac{1}{2} x^T P(t) x,$$ with $P(t)$ symmetric; along the optimal trajectory the costate is $\lambda(t) = \nabla_x V(t, x(t)) = P(t) x(t)$.

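**HJB Equation** (continuous time, minimization): $$-\frac{\partial V}{\partial t}(t, x) = \min_u \left[\frac{1}{2} x^T Qx + \frac{1}{2} u^T Ru + \nabla_x V(t, x)^T (Ax + Bu)\right], \qquad V(T, x) = \frac{1}{2} x^T Q_f x.$$ Substituting $V(t, x) = \frac{1}{2} x^T P(t) x$ and $u^\ast = -R^{-1} B^T P(t) x$ and matching the quadratic terms in $x$ gives the Riccati equation.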

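**Riccati Equation**:
$$\begin{align*}\dot P(t) &= -P(t) A - A^T P(t) + P(t) BR^{-1} B^T P(t) - Q, \\ P(T) &= Q_f.\end{align*}$$

**Optimal Feedback Gain**: $$K(t) = R^{-1}B^TP(t)$$

**Control Law**: $$u(t) = -K(t)x(t) = -R^{-1}B^T P(t) x(t)$$
- Minimizes the quadratic cost $J(u)$ over $[0, T]$.
- In the infinite-horizon limit $T \to \infty$ (with $(A, B)$ stabilizable and $(A, Q^{1/2})$ detectable), $P$ converges to the stabilizing solution of the algebraic Riccati equation $A^T P + PA - PBR^{-1}B^TP + Q = 0$, and the resulting constant gain $K$ stabilizes the closed-loop system.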
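Because the Riccati equation is a terminal-value problem, it is integrated backward from $P(T) = Q_f$. A minimal NumPy sketch, assuming classical fourth-order Runge-Kutta in the time-to-go variable $\tau = T - t$ with a fixed step count; the integrator choice and the function names are illustrative assumptions:

```python
import numpy as np

def riccati_rhs(P, A, B, Q, R_inv):
    """dP/dtau in time-to-go tau = T - t, i.e. minus the right-hand side of the Riccati ODE."""
    return P @ A + A.T @ P - P @ B @ R_inv @ B.T @ P + Q

def solve_riccati(A, B, Q, R, Qf, T, steps=2000):
    """Integrate the Riccati equation backward from P(T) = Qf with classical RK4.

    Returns the grid t_k = k * T / steps and an array Ps with Ps[k] approximating P(t_k)."""
    R_inv = np.linalg.inv(R)
    h = T / steps
    P = np.array(Qf, dtype=float)
    Ps = [P]
    for _ in range(steps):
        k1 = riccati_rhs(P, A, B, Q, R_inv)
        k2 = riccati_rhs(P + 0.5 * h * k1, A, B, Q, R_inv)
        k3 = riccati_rhs(P + 0.5 * h * k2, A, B, Q, R_inv)
        k4 = riccati_rhs(P + h * k3, A, B, Q, R_inv)
        P = P + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        P = 0.5 * (P + P.T)          # keep P symmetric despite round-off
        Ps.append(P)
    Ps.reverse()                     # Ps was built from t = T backward to t = 0
    return np.linspace(0.0, T, steps + 1), np.array(Ps)

def feedback_gain(P, B, R):
    """K(t) = R^{-1} B^T P(t), so that u(t) = -K(t) x(t)."""
    return np.linalg.solve(R, B.T @ P)

# Example: double integrator x1' = x2, x2' = u with Q = I, R = 1, Q_f = 0.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R, Qf = np.eye(2), np.array([[1.0]]), np.zeros((2, 2))
t, Ps = solve_riccati(A, B, Q, R, Qf, T=10.0)
print(feedback_gain(Ps[0], B, R))
```

For this double integrator with a long horizon, $K(0)$ approaches the infinite-horizon gain $[1, \sqrt{3}]$ given by the algebraic Riccati equation.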

## 4. Merton's Problem

- Merton's problem is a classic optimal control problem in financial economics.
- The deterministic version removes stochastic elements for simplification.
- Focuses on optimizing consumption and investment over an infinite horizon.

### Wealth Process

**Wealth dynamics**: $$\begin{align*} \frac{dW(t)}{dt} &= rW(t) + \pi(t)(\mu - r) - C(t) \\ W(0) &= W_0\end{align*}$$ where 
- $W(t)$: Wealth at time $t$.
- $\pi(t)$: Amount of wealth invested in the risky asset (so that the excess return earned is $\pi(t)(\mu - r)$).
- $\mu$: Expected return of the risky asset.
- $r$: Risk-free rate.
- $C(t)$: Consumption at time $t$.
- $W_0$: Initial wealth. 

**Utility Function**: 

$$U(C(t)) = \frac{C(t)^{1 - \gamma}}{1 - \gamma},$$ where $\gamma > 0$, $\gamma \neq 1$, is the coefficient of relative risk aversion. The utility function captures the satisfaction derived from consumption.

**Objective**: Maximize the discounted utility of consumption over an infinite horizon, $$J = \int_0^\infty e^{-\rho t} \frac{C(t)^{1 - \gamma}}{1 - \gamma} \, dt,$$ where $\rho > 0$ is the rate of time preference (discount rate).

**HJB Equation**: $$\rho V(W) = \max_{C, \pi} \left\{\frac{C^{1 - \gamma}}{1 - \gamma} + \frac{\partial V}{\partial W} [rW + \pi (\mu - r) - C]\right\}$$ for value function $V(W)$. The HJB equation characterizes the optimal control policy.

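**Optimal Consumption**: The first-order condition in $C$ gives $$C^\ast = \left(\frac{\partial V}{\partial W}\right)^{-1/\gamma}.$$ Without a risky position ($\pi = 0$, or $\mu = r$), the value function has the form $V(W) = a W^{1 - \gamma}/(1 - \gamma)$ for a constant $a > 0$; substituting into the HJB equation gives $a^{-1/\gamma} = (\rho - (1 - \gamma) r)/\gamma$, so optimal consumption is a constant fraction of wealth: $$C^\ast(t) = \frac{\rho - (1 - \gamma) r}{\gamma}\, W(t),$$ provided $\rho - (1 - \gamma) r > 0$.

**Optimal Investment**: In the stochastic Merton problem, where the return of the risky asset has volatility $\sigma$, the optimal share of wealth invested in the risky asset is the constant $$\frac{\mu - r}{\gamma \sigma^2}.$$ This formula has no deterministic counterpart: the HJB equation above is linear in $\pi$, so for $\mu > r$ the maximization over $\pi$ is unbounded, and the deterministic problem is well posed only if $\mu = r$ or $\pi$ is constrained.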
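The consumption rule can be checked numerically. The sketch below assumes the case without a risky position ($\pi \equiv 0$), made-up parameter values, a truncated horizon, and a forward Euler step for the wealth equation; it accumulates the discounted CRRA utility of the policy $C = mW$ for a given consumption-wealth ratio $m$. Under these assumptions $m^\ast = (\rho - (1 - \gamma) r)/\gamma$ should give the largest value, equal to $V(W_0) = (m^\ast)^{-\gamma} W_0^{1 - \gamma}/(1 - \gamma)$. The function name is hypothetical.

```python
import math

r, rho, gamma, W0 = 0.03, 0.05, 2.0, 1.0   # assumed parameters (gamma != 1)

def discounted_utility(m, T=400.0, dt=0.01):
    """Approximate J = int_0^T e^{-rho t} C^{1-gamma}/(1-gamma) dt for the policy C = m W,
    simulating dW/dt = r W - C with forward Euler (pi = 0, no risky position)."""
    W, J = W0, 0.0
    for k in range(int(round(T / dt))):
        C = m * W
        J += math.exp(-rho * k * dt) * C ** (1 - gamma) / (1 - gamma) * dt
        W += (r * W - C) * dt
    return J

m_star = (rho - (1 - gamma) * r) / gamma     # candidate optimal consumption-wealth ratio
V_W0 = m_star ** (-gamma) * W0 ** (1 - gamma) / (1 - gamma)
J_star = discounted_utility(m_star)
print(m_star, J_star, V_W0)
```

Perturbing $m$ away from $m^\ast$ in either direction should lower the simulated value, which is a quick sanity check on the closed-form rule.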

## 5. Ramsey-Cass-Koopmans Model

- Consider an infinite-horizon economy in continuous time.
- Suppose the economy admits an aggregate production function and a representative household.
- Let $k_t$ be the capital stock at time $t$ and $c_t$ the consumption at time $t$.
- The representative household has an **instantaneous utility function** $u(c) : \mathbb{R}^+ \to \mathbb{R}$ that is strictly increasing, strictly concave, and twice differentiable, with $u'(c) > 0$ and $u''(c) < 0$ for all $c$ in the interior of its domain.
- Suppose the population is constant. Then the utility of each household at time $t = 0$ can be written as $$\int_0^\infty e^{-\rho t} u(c_t) \ dt,$$ where $c_t$ is the consumption per capita at time $t$ and $\rho > 0$ is the discount rate.
- Suppose that the economy has no technological progress. Factor and product markets are competitive, and the per capita production function $f(k)$ satisfies $f'(k) > 0$, $f''(k) < 0$, and the Inada conditions $\lim_{k \to 0^+} f'(k) = \infty$, $\lim_{k \to \infty} f'(k) = 0$.
- Given the initial per-capita capital $k_0$, the representative household needs to choose a plan of consumption to maximize her utility $$\max_{\{c_t\}_{t = 0}^\infty} \int_0^\infty e^{-\rho t} u(c_t) \ dt \tag{10}$$ subject to the law of capital accumulation $$\dot k_t = f(k_t) - \delta k_t - c_t, \qquad k_0 > 0, \tag{11}$$ in which $\delta \in (0, 1)$ is the depreciation rate.

If $V(k)$ is the value function of the optimization problem (10)-(11), then $V(k)$ solves $$-\rho V(k) + \sup_c [(f(k) - \delta k - c)V'(k) + u(c)] = 0, \tag{12}$$ where the first-order condition for the supremum yields $V'(k) = u'(c)$.

Alternatively, consider the present-value Hamiltonian $$H(k_t, c_t, \mu_t) = e^{-\rho t} u(c_t) + \mu_t[f(k_t) - \delta k_t - c_t].$$ Under the assumptions on the utility and production functions made here, necessary and sufficient conditions for a path to be optimal are $H_c = 0$, $H_k = -\dot \mu_t$, and the transversality condition $\lim_{t \to \infty} \mu_t k_t = 0$. These conditions yield the transitional dynamics $$\begin{cases} \dot k_t = f(k_t) - \delta k_t - c_t, \\ \frac{\dot c_t}{c_t} = \frac{1}{\theta} (f'(k_t) - \delta - \rho),\end{cases}\tag{13}$$ where $\theta = -c_t u''(c_t)/u'(c_t) > 0$ is the elasticity of marginal utility (constant for the CRRA utility below).

The transversality condition and the transitional dynamics above indicate that there exists a steady state $(k_{ss}, c_{ss})$, obtained by setting $\dot c_t = 0$ and $\dot k_t = 0$ in (13), that solves $$f'(k_{ss}) = \rho + \delta, \qquad c_{ss} = f(k_{ss}) - \delta k_{ss}. \tag{14}$$ If the representative household starts with capital $k_0 = k_{ss}$, the optimal consumption is always $c_{ss}$. Hence the maximized utility is $$V(k_{ss}) = \int_0^\infty e^{-\rho t} u(c_{ss}) \ dt = \frac{1}{\rho} u(c_{ss}). \tag{15}$$
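The steady state (14) and the saddle path leading to it can be computed numerically by shooting. The sketch below assumes, purely for illustration, the Cobb-Douglas technology and CRRA utility of the next section with made-up parameter values. For a given $k_0 < k_{ss}$ it bisects on initial consumption $c_0$, classifying a guess as too high if capital starts to fall and too low if consumption starts to fall, and integrates (13) with classical RK4. The step size, horizon, iteration count, and function names are assumptions, not part of the model.

```python
# Assumed illustrative parameters: f(k) = A k^alpha and CRRA utility with theta = 2.
A, alpha, delta, rho, theta = 1.0, 0.3, 0.05, 0.04, 2.0

def f(k):
    return A * k**alpha

def f_prime(k):
    return alpha * A * k ** (alpha - 1)

def rhs(k, c):
    """Transitional dynamics (13): returns (dk/dt, dc/dt)."""
    return f(k) - delta * k - c, c / theta * (f_prime(k) - delta - rho)

def rk4_step(k, c, h):
    """One classical Runge-Kutta step of size h for (13)."""
    k1 = rhs(k, c)
    k2 = rhs(k + 0.5 * h * k1[0], c + 0.5 * h * k1[1])
    k3 = rhs(k + 0.5 * h * k2[0], c + 0.5 * h * k2[1])
    k4 = rhs(k + h * k3[0], c + h * k3[1])
    return (k + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            c + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

# Steady state (14) for Cobb-Douglas: alpha A k^(alpha-1) = rho + delta.
k_ss = (alpha * A / (rho + delta)) ** (1 / (1 - alpha))
c_ss = f(k_ss) - delta * k_ss

def classify(k0, c0, h=0.1, t_max=600.0):
    """+1 if c0 is too high (capital starts falling), -1 if too low (consumption
    starts falling), 0 if neither happens before t_max. Assumes k0 < k_ss."""
    k, c = k0, c0
    for _ in range(int(t_max / h)):
        k, c = rk4_step(k, c, h)
        dk, dc = rhs(k, c)
        if dk < 0:
            return 1
        if dc < 0:
            return -1
    return 0

def shoot(k0, iters=50):
    """Bisection on c0 for the saddle path starting at k0 < k_ss."""
    lo, hi = 0.0, f(k0) - delta * k0      # c0 below hi keeps dk/dt > 0 initially
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        side = classify(k0, mid)
        if side == 0:
            return mid
        if side > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

k0 = 0.5 * k_ss
c0 = shoot(k0)
print(k_ss, c_ss, c0)
```

Because the steady state is a saddle point, any error in $c_0$ grows along the unstable direction, so in floating point the computed path tracks the true saddle path only for a finite time; this is why the sketch stops bisecting once a guess produces no event within the horizon.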

## 6. Cobb-Douglas Production Function and CRRA Utility

Consider the Cobb-Douglas production function $f(k) = Ak^\alpha$ with $A > 0$, $\alpha \in (0, 1)$, and the CRRA (**C**onstant **R**elative **R**isk **A**version) utility $$u(c) = \begin{cases} \dfrac{c^{1 - \theta} - 1}{1 - \theta}, & \theta > 0, \ \theta \neq 1, \\ \log c, & \theta = 1. \end{cases}$$

The first-order condition now becomes $c^{-\theta} = V'(k)$. Plugging it into the HJB equation (12) yields $$-\rho V(k) + [f(k) - \delta k]V'(k) + \frac{\theta V'(k)^{1 - 1/\theta} - 1}{1 - \theta} = 0, \qquad \theta > 0, \ \theta \neq 1, \tag{16a}$$ or $$-\rho V(k) + [f(k) - \delta k] V'(k) - \log V'(k) - 1 = 0, \qquad \theta = 1. \tag{16b}$$ The value function at the steady state $(k_{ss}, c_{ss})$ is given by (15).
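The steady-state value (15) can serve as a point condition for the HJB equation (16), for example as an additional loss term if (16) is solved with a PINN. As a quick consistency check, the sketch below (illustrative parameter values and hypothetical function names) evaluates the left-hand side of (16a) and (16b) at $k_{ss}$ with $V(k_{ss}) = u(c_{ss})/\rho$ and $V'(k_{ss}) = u'(c_{ss}) = c_{ss}^{-\theta}$; both residuals should vanish up to round-off.

```python
import math

# Assumed illustrative parameters for f(k) = A k^alpha.
A, alpha, delta, rho = 1.0, 0.3, 0.05, 0.04

def f(k):
    return A * k**alpha

def steady_state():
    """(14) for Cobb-Douglas: alpha A k^(alpha-1) = rho + delta, c = f(k) - delta k."""
    k_ss = (alpha * A / (rho + delta)) ** (1 / (1 - alpha))
    return k_ss, f(k_ss) - delta * k_ss

def utility(c, theta):
    """CRRA utility, with the log case at theta = 1."""
    return math.log(c) if theta == 1 else (c ** (1 - theta) - 1) / (1 - theta)

def hjb_residual(k, V, Vp, theta):
    """Left-hand side of (16a) for theta != 1, or of (16b) for theta == 1."""
    drift = (f(k) - delta * k) * Vp
    if theta == 1:
        return -rho * V + drift - math.log(Vp) - 1
    return -rho * V + drift + (theta * Vp ** (1 - 1 / theta) - 1) / (1 - theta)

k_ss, c_ss = steady_state()
for theta in (0.5, 1.0, 2.0):
    V_ss = utility(c_ss, theta) / rho          # (15)
    Vp_ss = c_ss ** (-theta)                   # first-order condition V'(k) = u'(c)
    print(theta, hjb_residual(k_ss, V_ss, Vp_ss, theta))
```

The check relies on $f(k_{ss}) - \delta k_{ss} = c_{ss}$, so it would fail if the second relation in (14) were replaced by one involving $f'$.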