We have the following HJB equation for the value function $V(w,t)$:

$$
0 = \sup_{(\pi,c)\in \mathcal{U}} \left\{ \partial_tV(w,t) + \partial_wV(w,t)\,\big(rw + \pi_t(\mu-r)w - c_t\big) + \frac{1}{2}\partial_{ww}V(w,t)\,w^2\pi_t^2\sigma^2 + e^{-\rho t}U(c_t,t) \right\}, \quad w\geq0,\ w_0\geq 0,\ c_t \geq 0
$$

A closed-form solution can be found for specific choices of $U(c,t)$, for example utilities with constant relative risk aversion (CRRA), $U(c, t) = \frac{c^{1-\gamma}}{1-\gamma}$ with $\gamma > 0$, $\gamma \neq 1$.
However, for more complicated utility functions a numerical solver is needed. 

One way of doing this is with finite differences. The equation comes with a terminal condition at $t = T$, so we march backwards in time from $T$ (equivalently, forwards in $\tau = T-t$). An explicit time-stepping scheme is possible, but we remember from numerical analysis that it is only conditionally stable: the time step has to shrink like $(\Delta w)^2$ as the grid is refined. An implicit scheme with a monotone spatial discretisation has no such restriction, so that is what we use.

The implicit scheme is as follows: 

### Step 1: Finite Differences
- Discretise the spatial variable uniformly: $w_j = (j-1)\Delta w$, with $j = 1,\dots,N$, so that $w_1 = 0$ and $w_N = w_{max}$ for a suitably large $w_{max}$.
- Discretise the time variable uniformly: $t_k = t_0 + k\Delta t$ with $k = 0,\dots,M$, $t_M = T$. For simplicity we take $t_0 = 0$ without loss of generality.
- We use central differences for the second-order (diffusion) derivative and upwind differences for the first-order (advection) derivative; upwinding keeps the scheme monotone, which is what guarantees stability.

Let $v_j^k$ be the value function evaluated at grid point $(w_j, t_k)$, $\alpha_j$ the advection coefficient evaluated at $w_j$, $d_j$ the diffusion coefficient evaluated at $w_j$, and $\Theta^k$ the source (discounted utility) term.
$$\begin{align*}
\alpha_j(\pi,c) &:= rw_j + \pi(\mu -r)w_j - c \\
d_j(\pi,c) &:= \frac{1}{2}w_j^2\pi^2\sigma^2 \\
\Theta^k(\pi,c) &:= e^{-\rho t_{k}}U(c,t_{k})
\end{align*}$$
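The three coefficients translate directly into code. The following is a minimal sketch, assuming CRRA utility $U(c,t) = c^{1-\gamma}/(1-\gamma)$ with no explicit time dependence and the standard wealth drift $rw_j + \pi(\mu-r)w_j - c$ (the riskless rate applied to the whole of wealth); the function and parameter names are hypothetical.

```python
import math


def advection(w: float, pi: float, c: float, r: float, mu: float) -> float:
    """alpha_j(pi, c) = r w_j + pi (mu - r) w_j - c: the drift of wealth."""
    return r * w + pi * (mu - r) * w - c


def diffusion(w: float, pi: float, sigma: float) -> float:
    """d_j(pi, c) = 0.5 w_j^2 pi^2 sigma^2: half the variance rate of wealth."""
    return 0.5 * w ** 2 * pi ** 2 * sigma ** 2


def source(t: float, c: float, rho: float, gamma: float) -> float:
    """Theta^k(pi, c) = exp(-rho t_k) U(c), with CRRA U(c) = c^(1-gamma)/(1-gamma) (assumed)."""
    return math.exp(-rho * t) * c ** (1.0 - gamma) / (1.0 - gamma)


# Illustrative evaluation at a single grid point (made-up parameters).
print(advection(w=1.0, pi=0.5, c=0.1, r=0.02, mu=0.08))  # riskless + risky return - consumption
print(diffusion(w=2.0, pi=0.5, sigma=0.2))
print(source(t=0.0, c=1.0, rho=0.05, gamma=0.5))
```

Keeping the coefficients as separate functions makes it easy to evaluate them both for the current policy and for every candidate control during the optimisation step.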

The standard backward Euler scheme for the HJB is thus:

$$\frac{v^{k}_j - v^{k-1}_j}{\Delta t} = - \sup_{(\pi,c)}\left[ D^{up}_w(v^{k-1}_j)\alpha_j(\pi,c) + D^+_wD^-_w(v^{k-1}_j)d_j(\pi,c)+ \Theta^{k-1}(\pi,c)\right]$$

Where:
- $D^{up}_w$ is the upwind difference operator: the forward difference $D^+_w(v^k_j) = \frac{v^k_{j+1} - v^k_{j}}{\Delta w}$ if $\alpha_j > 0$, and the backward difference $D^-_w(v^k_j) = \frac{v^k_{j} - v^k_{j-1}}{\Delta w}$ if $\alpha_j < 0$. Because the equation is solved backwards from $T$, the value at $w_j$ depends on values in the direction the drift points, so we difference towards that side; this keeps the coefficients of $v_{j\pm1}$ non-negative.
- $D^+_wD^-_w$ is the second-order central difference operator $D^+_wD^-_w(v^k_j) = \frac{v^k_{j+1} - 2v^k_{j} + v^k_{j-1}}{(\Delta w)^2}$.

Taking the supremum across the admissible set of controls makes the discrete equations non-linear in the unknowns $v^{k-1}_j$: the maximising control depends on $v^{k-1}$ itself, and the objective is quadratic in $\pi$. Therefore, we rely on policy iteration to find the optimal control at each time step.

### Step 2: Policy Iteration

Policy iteration (Howard's algorithm) is a dynamic programming method, also used in reinforcement learning, that alternates between evaluating a fixed policy and improving it. At the PDE level policy iteration can be shown to converge to the optimal policy. If we are sufficiently careful with the discretisation (in particular, if the matrix of each linear system is an M-matrix, which the upwind choice above guarantees in the interior), this convergence carries over to our finite difference scheme.

In practice we guess a policy at the required time step, evaluate the value function under that policy, then find a new optimal policy based on that evaluation. We use an extra subscript to index the policy iteration. Let the initial guess policy be $(\pi_0, c_0)$, and let $v^{*k}_j$ denote the optimal value function determined at the previous time step.

$$\frac{v^{*k}_j - v^{k-1}_{j,0}}{\Delta t} = - \left[ D^{up}_w(v^{k-1}_{j,0})\alpha_j(\pi_0, c_0) + D^+_wD^-_w(v^{k-1}_{j,0})d_j(\pi_0, c_0)+ \Theta_{j}^{k-1}(\pi_0, c_0)\right]$$

This yields a linear system of equations which we solve for $\mathbf{v}^{k-1}_0$. We then find the optimal policy with respect to the resulting value function, a non-linear optimisation problem solved pointwise at each $w_j$:

$$(\pi_1,c_1) = \operatorname*{arg\,max}_{(\pi,c)\in\mathcal{U}} \left[ D^{up}_w(v^{k-1}_{j,0})\alpha_j(\pi, c) + D^+_wD^-_w(v^{k-1}_{j,0})d_j(\pi, c)+ \Theta_{j}^{k-1}(\pi, c)\right]$$

Here the direction of $D^{up}_w$ follows the sign of $\alpha_j(\pi,c)$ for each candidate control. We iterate this process until $\|\mathbf{v}^{k-1}_n-\mathbf{v}^{k-1}_{n-1}\| < \epsilon$ and set

$$(\pi^*, c^*) := (\pi_{n+1}, c_{n+1}) = \operatorname*{arg\,max}_{(\pi,c)\in\mathcal{U}} \left[ D^{up}_w(v^{k-1}_{j,n})\alpha_j(\pi, c) + D^+_wD^-_w(v^{k-1}_{j,n})d_j(\pi, c)+ \Theta_{j}^{k-1}(\pi, c)\right]$$

The converged $\mathbf{v}^{k-1}_n$ then plays the role of $\mathbf{v}^{*(k-1)}$ at the next time step.

Writing out the upwind differences for the policy-evaluation step at iteration $n-1$, with $\alpha_j$, $d_j$ and $\Theta^{k-1}_j$ all evaluated at the policy $(\pi_{n-1}, c_{n-1})$ to lighten notation, we obtain

$$
\begin{align*}
\frac{v^{*k}_{j} - v^{k-1}_{j,n-1}}{\Delta t} &= - \left[ \left(\frac{\alpha_j}{\Delta w} + \frac{d_j}{(\Delta w)^2}\right)v^{k-1}_{j+1,n-1} + \left(-\frac{\alpha_j}{\Delta w} - 2\frac{d_j}{(\Delta w)^2}\right)v^{k-1}_{j,n-1} + \frac{d_j}{(\Delta w)^2}v^{k-1}_{j-1,n-1} + \Theta^{k-1}_j\right], \quad \text{for } \alpha_j >0\\
\frac{v^{*k}_{j} - v^{k-1}_{j,n-1}}{\Delta t} &= - \left[ \frac{d_j}{(\Delta w)^2}v^{k-1}_{j+1,n-1} + \left(\frac{\alpha_j}{\Delta w} - 2\frac{d_j}{(\Delta w)^2}\right)v^{k-1}_{j,n-1} + \left(\frac{d_j}{(\Delta w)^2}-\frac{\alpha_j}{\Delta w}\right)v^{k-1}_{j-1,n-1} + \Theta^{k-1}_j\right], \quad \text{for } \alpha_j <0
\end{align*}
$$

Reformulating in matrix notation, we define a matrix $\mathbf{M}$ whose rows for $2 \leq j \leq N-1$ are

$$\begin{align*}
{[M]}_{j,j} &= 1 + \Delta t\left(\frac{|\alpha_j|}{\Delta w} + 2\frac{d_j}{(\Delta w)^2}\right) \\
{[M]}_{j,j+1} &= - \Delta t\left(1_{\alpha_j>0}\frac{\alpha_j}{\Delta w} + \frac{d_j}{(\Delta w)^2}\right) \\
{[M]}_{j,j-1} &= - \Delta t\left(\frac{d_j}{(\Delta w)^2} - 1_{\alpha_j<0}\frac{\alpha_j}{\Delta w}\right)\\
{[M]}_{1,\cdot} &= [1, 0, 0,\dots,0]\\
{[M]}_{N,\cdot} &= [0,\dots,0,1,-4,3]
\end{align*}
$$

With this choice every off-diagonal entry of the interior rows is non-positive and each interior row satisfies $[M]_{j,j} - |[M]_{j,j-1}| - |[M]_{j,j+1}| = 1$, which is the M-matrix structure behind the monotonicity of the scheme. The last row is the second-order one-sided approximation $\partial_w V(w_{max}) \approx \frac{3v_N - 4v_{N-1} + v_{N-2}}{2\Delta w}$.
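A minimal sketch of assembling $\mathbf{M}$ for one fixed policy, assuming the coefficients $\alpha_j$ and $d_j$ have already been evaluated on the grid. It uses plain Python lists with 0-based indices (code row `j` is row $j+1$ in the text: row 0 is the Dirichlet row at $w = 0$, the last row the Neumann row at $w_{max}$), upwinds by the sign of $\alpha_j$, and checks the sign and diagonal-dominance pattern of the interior rows. The helper names and the coefficient values are hypothetical.

```python
def assemble_matrix(alpha, diff, dw, dt):
    """Dense matrix M of one implicit step for a fixed policy.

    alpha[j] and diff[j] are the advection and diffusion coefficients at grid
    point j (0-based: j = 0 is w = 0, j = n - 1 is w_max).
    """
    n = len(alpha)
    M = [[0.0] * n for _ in range(n)]
    M[0][0] = 1.0                                  # Dirichlet row at w = 0
    for j in range(1, n - 1):
        a, d = alpha[j], diff[j]
        fwd = max(a, 0.0) / dw                     # forward difference when alpha_j > 0
        bwd = max(-a, 0.0) / dw                    # backward difference when alpha_j < 0
        M[j][j + 1] = -dt * (fwd + d / dw ** 2)
        M[j][j - 1] = -dt * (bwd + d / dw ** 2)
        M[j][j] = 1.0 + dt * (abs(a) / dw + 2.0 * d / dw ** 2)
    M[n - 1][n - 3:] = [1.0, -4.0, 3.0]            # one-sided Neumann row at w_max
    return M


def interior_rows_monotone(M):
    """True if each interior row has non-positive off-diagonals and a strictly dominant diagonal."""
    n = len(M)
    for j in range(1, n - 1):
        off = [M[j][i] for i in range(n) if i != j]
        if any(x > 0.0 for x in off) or M[j][j] <= sum(abs(x) for x in off):
            return False
    return True


# Made-up coefficients with drift of both signs on a 6-point grid.
alpha = [0.0, -0.3, -0.1, 0.2, 0.4, 0.0]
diff = [0.0, 0.01, 0.02, 0.03, 0.04, 0.0]
M = assemble_matrix(alpha, diff, dw=0.1, dt=0.05)
print(interior_rows_monotone(M))
```

Differencing against the drift instead would give an off-diagonal entry of $\Delta t\,(|\alpha_j|/\Delta w - d_j/(\Delta w)^2)$, which is positive whenever $|\alpha_j|\Delta w > d_j$; upwinding rules this out for any $\Delta t$.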

The first three lines are the interior difference equations, and the last two encode a Dirichlet boundary condition at $w = 0$ and a Neumann boundary condition at $w = w_{max}$.
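A second-order one-sided approximation of $\partial_w V$ at the right end of the grid is $\frac{3v_N - 4v_{N-1} + v_{N-2}}{2\Delta w}$, which is exact for quadratics. A quick check on the test function $f(w) = w^2$ (chosen purely for illustration), compared with the first-order backward difference:

```python
def backward_first_order(f, x, h):
    """(f(x) - f(x - h)) / h: first-order accurate one-sided derivative."""
    return (f(x) - f(x - h)) / h


def backward_second_order(f, x, h):
    """(3 f(x) - 4 f(x - h) + f(x - 2h)) / (2h): second-order accurate one-sided derivative."""
    return (3.0 * f(x) - 4.0 * f(x - h) + f(x - 2.0 * h)) / (2.0 * h)


def square(x):
    return x * x


x_max, h = 1.0, 0.1
print(backward_first_order(square, x_max, h))   # exact derivative of x^2 at 1 is 2
print(backward_second_order(square, x_max, h))
```

Setting the second-order expression to zero gives a Neumann row with entries $(1, -4, 3)$ in the last three columns.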

We define the vectors $\mathbf{v}^{*k}$ and $\mathbf{v}^{k-1}_{n-1}$:
$$
\begin{align*}
[\mathbf{v}^{*k}]_j &= v^{*k}_j, \quad \text{for } 2 \leq j \leq N-1\\
{[\mathbf{v}^{*k}]}_1 &= 0\\
{[\mathbf{v}^{*k}]}_N &= 0\\[1ex]
{[\mathbf{v}^{k-1}_{n-1}]}_j &= v^{k-1}_{j,n-1}, \quad \text{for } 1 \leq j \leq N
\end{align*}
$$

Finally we define the vector 
$$
\begin{align*}
[\mathbf{\Theta}^{k-1}(\pi_{n-1},c_{n-1})]_j &= \Theta^{k-1}(\pi_{j,n-1}, c_{j,n-1}), \quad \text{for } 1 \leq j \leq N-1\\
{[\mathbf{\Theta}^{k-1}(\pi_{n-1},c_{n-1})]}_N &= 0 
\end{align*}$$

where $(\pi_{j,n-1}, c_{j,n-1})$ is the $(n-1)$th policy iterate at the point $w_j$. 

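The difference equation then becomes
$$ \mathbf{v}^{*k} + \Delta t\,\mathbf{\Theta}^{k-1}(\pi_{n-1},c_{n-1}) = \mathbf{M}\mathbf{v}^{k-1}_{n-1}, $$
a linear system that we solve for $\mathbf{v}^{k-1}_{n-1}$ at each policy iteration.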
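Each policy-evaluation step is therefore one linear solve. A minimal sketch, assuming $\mathbf{M}$ is stored as a dense list of rows: the interior is tridiagonal, but the Neumann row has three entries, so a general Gaussian elimination is the simplest choice for small grids. `gauss_solve` and `evaluation_rhs` are hypothetical helpers; the right-hand side sets the boundary components as in the vector definitions (0-based in code).

```python
def gauss_solve(A, b):
    """Solve A x = b with Gaussian elimination and partial pivoting (A: list of rows)."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[i][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x


def evaluation_rhs(v_star, theta, dt):
    """v*^k + dt * Theta^{k-1}, with the boundary entries of v*^k and Theta_N set to 0."""
    rhs = [v + dt * th for v, th in zip(v_star, theta)]
    rhs[0] = dt * theta[0]   # first component of v*^k is 0 on the Dirichlet row
    rhs[-1] = 0.0            # Neumann row: zero right-hand side
    return rhs


# Toy 3-point system: Dirichlet row, one interior row, Neumann row (made-up numbers).
M = [[1.0, 0.0, 0.0],
     [-0.5, 2.0, -0.5],
     [1.0, -4.0, 3.0]]
rhs = evaluation_rhs(v_star=[0.3, 1.0, 2.0], theta=[0.0, 0.4, 0.4], dt=0.5)
v_next = gauss_solve(M, rhs)
print(rhs, v_next)
```

For large grids a banded or sparse solver would be the natural replacement; the dense version keeps the sketch dependency-free.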

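Note that because $C(0,t) = 0$ (consumption cannot exceed wealth), the source term at $w = 0$ is $e^{-\rho t}U(0,t)$, which vanishes provided $U(0,t) = 0$ (as for CRRA utility with $0 < \gamma < 1$). The top row therefore reads correctly as the Dirichlet boundary condition, holding the value at $w = 0$ at zero. For the bottom row to read correctly as the Neumann boundary condition, the last component of the right-hand side, $[\mathbf{v}^{*k} + \Delta t\,\mathbf{\Theta}^{k-1}(\pi_{n-1},c_{n-1})]_{N}$, must be 0, which is why both $[\mathbf{v}^{*k}]_N$ and $[\mathbf{\Theta}^{k-1}]_N$ are set to zero.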
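Putting the pieces together, the following is a minimal end-to-end sketch rather than a reference implementation. Several choices are assumptions not fixed by the text: CRRA utility with $0 < \gamma < 1$ (so that $U(0) = 0$); a bequest-type terminal condition $V(w,T) = e^{-\rho T}U(w)$; consumption parametrised as $c = \kappa w$ with $\kappa$ from a finite grid (which enforces $c = 0$ at $w = 0$); and policy improvement by brute-force search over a finite grid of $(\pi, \kappa)$ pairs in place of a continuous optimiser. The initial guess is $(\pi, \kappa) = (0, 0)$ at the first step and the previous step's policy afterwards. All names and parameter values are hypothetical. A natural sanity check is to compare the interior $\pi$ with the Merton fraction $(\mu - r)/(\gamma\sigma^2)$, which equals $0.5$ for the default parameters below.

```python
import math


def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A: list of rows)."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[p] = aug[p], aug[col]
        for i in range(col + 1, n):
            f = aug[i][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[i][k] -= f * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x


def solve_hjb(r=0.02, mu=0.06, sigma=0.4, rho=0.05, gamma=0.5,
              w_max=5.0, n_w=21, T=1.0, n_t=10, tol=1e-8, max_iter=50):
    """Backward-in-time implicit upwind scheme with policy iteration at each step."""
    dw, dt = w_max / (n_w - 1), T / n_t
    w = [j * dw for j in range(n_w)]
    pis = [0.1 * i for i in range(11)]        # candidate risky fractions in [0, 1]
    kappas = [0.05 * i for i in range(11)]    # candidate consumption rates c / w in [0, 0.5]

    def utility(c):                           # CRRA with 0 < gamma < 1, so utility(0) = 0
        return c ** (1.0 - gamma) / (1.0 - gamma)

    def coeffs(j, pi, kappa, t):              # (alpha_j, d_j, Theta_j) with c = kappa * w_j
        c = kappa * w[j]
        alpha = r * w[j] + pi * (mu - r) * w[j] - c
        return alpha, 0.5 * (w[j] * pi * sigma) ** 2, math.exp(-rho * t) * utility(c)

    v = [math.exp(-rho * T) * utility(x) for x in w]   # assumed terminal condition V(w, T)
    policy = [(0.0, 0.0)] * n_w                          # initial guess, reused across steps
    for k in range(n_t, 0, -1):                          # from t_k back to t_{k-1}
        t = (k - 1) * dt
        v_star, v_old = v, None
        for _ in range(max_iter):
            # Policy evaluation: solve M v = v_star + dt * Theta for the current policy.
            M = [[0.0] * n_w for _ in range(n_w)]
            rhs = [0.0] * n_w                            # rhs[0] = 0: Theta_0 = 0 since c = 0
            M[0][0] = 1.0                                # Dirichlet row
            for j in range(1, n_w - 1):
                a, d, theta = coeffs(j, *policy[j], t)
                M[j][j + 1] = -dt * (max(a, 0.0) / dw + d / dw ** 2)
                M[j][j - 1] = -dt * (max(-a, 0.0) / dw + d / dw ** 2)
                M[j][j] = 1.0 + dt * (abs(a) / dw + 2.0 * d / dw ** 2)
                rhs[j] = v_star[j] + dt * theta
            M[-1][-3:] = [1.0, -4.0, 3.0]                # Neumann row with zero right-hand side
            v_new = gauss_solve(M, rhs)
            # Policy improvement: brute-force argmax of the discrete Hamiltonian at each w_j.
            new_policy = list(policy)
            for j in range(1, n_w - 1):
                fwd = (v_new[j + 1] - v_new[j]) / dw
                bwd = (v_new[j] - v_new[j - 1]) / dw
                second = (v_new[j + 1] - 2.0 * v_new[j] + v_new[j - 1]) / dw ** 2
                best, best_val = policy[j], -math.inf
                for pi in pis:
                    for kappa in kappas:
                        a, d, theta = coeffs(j, pi, kappa, t)
                        val = a * (fwd if a > 0 else bwd) + d * second + theta
                        if val > best_val:
                            best, best_val = (pi, kappa), val
                new_policy[j] = best
            done = v_old is not None and max(abs(x - y) for x, y in zip(v_new, v_old)) < tol
            v_old, policy = v_new, new_policy
            if done:
                break
        v = v_old
    return w, v, policy


w, v, policy = solve_hjb()
print([round(x, 4) for x in v[:5]])
print(policy[5])
```

The brute-force search keeps the sketch short and makes the upwind direction easy to choose per candidate control; with a smooth utility, a continuous optimiser or a first-order condition would normally replace it.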